-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stepfunctions): Distributed Map #23306
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
This would be helpful to have :) |
Would it be good to have the bucket in S3ReaderProps as an IBucket in stead of a string? Also then it could be arranged the resource policy of the S3 bucket is amended properly? |
|
||
Step Functions provides a high-concurrency mode for the Map state known as Distributed mode. In this mode, the Map state can accept input from large-scale Amazon S3 data sources. For example, your input can be a JSON or CSV file stored in an Amazon S3 bucket, or a JSON array passed from a previous step in the workflow. | ||
|
||
A `Map` state with `mode` set to Distributed is known as a Distributed Map state. To create a Distributed Map state, set the `mode` property to `MapProcessorMode.DISTRIBUTED`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of this over let's say a new DistributedMap
state?
@@ -500,6 +485,27 @@ export class StateMachine extends StateMachineBase { | |||
enabled: true, | |||
}; | |||
} | |||
|
|||
private addDistributedMapStatePolicies(stateMachineName?: string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this optional and using the wildcard *
later on seems like it's not required?
This is a private method and we only ever call it with the argument.
/** | ||
* Distributed Map State Resources | ||
*/ | ||
export enum DistributedMapResource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this provides a good user experience. What are these values for? What do they mean?
I could only find getObject
in the docs you've linked.
Preferably we would refactor this to represent the semantic meaning instead of leaking the technical implementation. At least we should provide documentation explaining each of these.
/** | ||
* The location of the header in the CSV file | ||
* | ||
* @default CSVHeaderLocation.FIRST_ROW | ||
*/ | ||
readonly headerLocation?: CSVHeaderLocation; | ||
|
||
/** | ||
* The headers of the CSV file | ||
* | ||
* @default - no headers | ||
*/ | ||
readonly headers?: string[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering about UX again. Maybe an enum like class would be better:
new S3CSVReader({
headers: CSVHeaders.useFirstRow(),
});
new S3CSVReader({
headers: CSVHeaders.use(['one', 'two', 'three']),
});
/** | ||
* The S3 bucket | ||
*/ | ||
readonly bucket: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reasons not to use IBucket
?
private validateProps(props: MapProps) { | ||
if (props.itemSelector && props.parameters) { | ||
throw new Error('Only one of itemSelector or parameters can be provided'); | ||
} | ||
if (props.mode === MapProcessorMode.INLINE && props.distributatedMapOptions) { | ||
throw new Error('distributatedMapOptions can only be used with DISTRIBUTED mode'); | ||
} | ||
if (props.distributatedMapOptions?.label && props.distributatedMapOptions?.label.length > 40) { | ||
throw new Error('Label can only be 40 characters long'); | ||
} | ||
if (props.distributatedMapOptions?.itemBatcher) { | ||
const itemBatcher = props.distributatedMapOptions?.itemBatcher; | ||
if (itemBatcher.maxItemsPerBatch && itemBatcher.maxItemsPerBatchPath) { | ||
throw new Error('Only one of maxItemsPerBatch or maxItemsPerBatchPath can be provided'); | ||
} | ||
if (itemBatcher.maxInputBytesPerBatch && itemBatcher.maxInputBytesPerBatchPath) { | ||
throw new Error('Only one of maxInputBytesPerBatch or maxInputBytesPerBatchPath can be provided'); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This long list of validations is an indication that we didn't get the UX right. There are so many ways a user can configure this wrong. We need to improve this by design.
@@ -59,9 +166,17 @@ export interface MapProps { | |||
* The JSON that you want to override your default iteration input | |||
* | |||
* @default $ | |||
* @deprecated use `itemSelector` instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? We need to also provide an explanation for deprecations. If not for the users, than for the people working on this code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is deprecated by StepFunctions service: https://docs.aws.amazon.com/step-functions/latest/dg/input-output-itemselector.html
Would highly recommend going through the complete AWS documentation for the feature to understand how these things work together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. This explanation should go into the docstring ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, but we need to iterate over the design. Or as a first start if you could share what your design thinking was.
@mrgrain Thanks for the comments and I do agree we could probably make it a bit simpler in some aspects. My overall thought around the design was:
Feel free to provide any other comments/thoughts and we can take it from there. |
Does this support the tolerated failure threshold parameters? I don't see them in the diff. https://docs.aws.amazon.com/step-functions/latest/dg/maprun-fail-threshold.html |
This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week. |
@mrgrain I have marked as re-requested for review since I want to have someone form CDK team review it as well once so that I do any major refactoring only once and I do not want this PR to be closed by the bot. |
Hi @ayush987goyal I'm not sure I understand what you mean by this, I am from the CDK team and I kindly ask you to address my specific comments (either by code change or discussion) to move this forward. If you are concerned (which is understandable) about multiple rounds of refactoring, maybe our RFC process would be more suitable for you. I'd say this feature is just on the edge of needing a RFC or just being a PR. Finally, you could also publish a custom construct independently of the core package. This way you can gauge interest and confirm the API before graduating it to core. |
This PR has been deemed to be abandoned, and will be automatically closed. Please create a new PR for these changes if you think this decision has been made in error. |
I really hope not 😟. I think this Distributed Map is critical to be inside the main CDK repo. I've been waiting on this for a while now |
Also very much looking forward to using this out of the box! |
ayush987goyal, could you create a cdk-constructs package of this? I would be super thankful to use your patch from there. |
I created a new PR for this feature linked above. I tried to take some of the feedback from this PR into account. Thanks @ayush987goyal for some inspiration and concepts. |
feat(stepfunctions): Distributed Map
Add native support for distributed map.
TODO: Write better documentation of the new features.
Refs:
closes #23216
All Submissions:
Adding new Construct Runtime Dependencies:
New Features
yarn integ
to deploy the infrastructure and generate the snapshot (i.e.yarn integ
without--dry-run
)?By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license