Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-step-function-tasks: Allow removing launchType from RunEcsEc2Task #7967

Open
1 of 2 tasks
mb-dev opened this issue May 13, 2020 · 16 comments
Open
1 of 2 tasks

aws-step-function-tasks: Allow removing launchType from RunEcsEc2Task #7967

mb-dev opened this issue May 13, 2020 · 16 comments
Labels
@aws-cdk/aws-stepfunctions-tasks effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2

Comments

@mb-dev
Copy link

mb-dev commented May 13, 2020

capacity providers allow using ECS with dynamic capacity. https://aws.amazon.com/tw/about-aws/whats-new/2019/12/amazon-ecs-capacity-providers-now-available/ , yet CloudFormation and CDK do not yet support this feature #5471.

I propose as a stop gap solution to allow users to submit EC2 tasks without launchType:
https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RunTask.html#ECS-RunTask-request-launchType

if launchType is provided, there is no way to get the task to use the capacity providers that are created outside CDK.

Use Case

I want to create capacity providers using boto3 and use them when launching EC2 tasks using step functions.

Proposed Solution

Provider a way to make launchType optional here:
https://github.com/aws/aws-cdk/blob/master/packages/%40aws-cdk/aws-stepfunctions-tasks/lib/ecs/run-ecs-ec2-task.ts#L62

Or to change it after construction.

Other

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

@mb-dev mb-dev added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels May 13, 2020
@SomayaB SomayaB added the @aws-cdk/aws-ecs Related to Amazon Elastic Container label May 18, 2020
@nija-at nija-at assigned shivlaks and unassigned nija-at May 26, 2020
@shivlaks shivlaks added effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Jun 30, 2020
@shivlaks
Copy link
Contributor

since RunEcsEc2Task and RunEcsFargateTask are going to be marked as @deprecated, this feature should be introduced in RunTask

@shivlaks shivlaks added effort/small Small work item – less than a day of effort and removed effort/medium Medium work item – several days of effort labels Jun 30, 2020
@shivlaks
Copy link
Contributor

shivlaks commented Jun 30, 2020

@mb-dev PR incoming once I merge in #8451 to make launchType optional. Step Functions does not support specifying capacityProviderStrategy as a parameter.

As per documentation you've referenced

If a capacityProviderStrategy is specified, the launchType parameter must be omitted. If no capacityProviderStrategy or launchType is specified, the defaultCapacityProviderStrategy for the cluster is used.

Although this unlocks some usage, it's still incomplete until users can supply a capacity provider strategy. Since we're limited I'm inclined to agree with your suggested stop-gap. It will still be compatible when capacityProviderStrategy can be supplied.

edit: added following question:

What happens if a cluster does not have a default capacity provider strategy? I believe that's optional too?

@shivlaks
Copy link
Contributor

per @mb-dev in comment

@shivlaks what alarmed me is that the PR makes the definition depend more on launch type than before. launchTarget sounds like launchType, and it's tied to either EC2/Fargate. I am not sure if it will be clear how to express a undefined launchType while still specifying parameters to be used at creation.

Currently no launchType might look like:

const runTask = new tasks.EcsRunTask(stack, 'Run', {
    integrationPattern: sfn.IntegrationPattern.RUN_JOB,
    cluster,
    taskDefinition,
    launchTarget: new tasks.EcsEc2LaunchTarget({
      launchType: undefined,
    }),
  });

If launchTarget would be named something like clusterConfiguration it might be easier to parse.

@shivlaks
Copy link
Contributor

@mb-dev launchType was never really exposed as a property (with the old classes or with the new implementation)
The thinking was to associate launchTarget closely to launchType.

I was thinking of introducing something like IEcsCapacityProvider to add the capability to specify capacity providers. Users would specify either capacity providers or a launch target (which would become optional).

const runTask = new tasks.EcsRunTask(stack, 'Run', {
    integrationPattern: sfn.IntegrationPattern.RUN_JOB,
    cluster,
    taskDefinition,
    capacityProviders: [new tasks.EcsEc2CapacityProvider({
      ...
    })],
  });

and adding types for EcsFargateCapacityProvider and EcsEc2CapacityProvider classes to help populate that list of capacity providers.

what do you think?

@mb-dev
Copy link
Author

mb-dev commented Jun 30, 2020

@shivlaks thanks for moving the discussion here. The capacity providers are a property of ECS cluster, while the task level property is choosing one of the providers specified in the cluster definition. Given that users might create ECS cluster or capacity providers outside of CDK, and that there's an option for default capacity provider (as this stop gap suggests)

there are actually 3 scenarios for using runTask in step functions:

  1. users specifying launchType
  2. users specifying capacityProvider
  3. users specifying neither capacity provider or launchtype - in this case the default capacity provider is used. See:
    https://docs.aws.amazon.com/AmazonECS/latest/userguide/cluster-capacity-providers.html
Default capacity provider strategy
A default capacity provider strategy is associated with each Amazon ECS cluster. This determines the capacity provider strategy the cluster will use if no other capacity provider strategy or launch type is specified when running a task or creating a service.

One idea is to expose DEFAULT_CAPACITY_PROVIDER constant that can be assigned to launchType or launchTarget that will make it clear we want default capacity provider

@mb-dev
Copy link
Author

mb-dev commented Jul 1, 2020

Capacity Providers were added to Cloudformation:
4ce27f4#diff-4a4e15a081904ee16b4d84e2f5cf5aee

So now it's even more important to be able to omit launchType.

Related: #5471

@shivlaks
Copy link
Contributor

shivlaks commented Jul 2, 2020

@mb-dev

The recommended workaround for coverage that isn't quite supported in the aws-stepfunctions-tasks module is to supply the needed ASL as a custom state. You can create your task, and supply the .toStateJson() as input (you'd need to adjust for any parameters you want to add/remove/modify)

We're still discussing options, but are not convinced that the stop-gap needs to be introduced directly into the task or folded into the EcsEc2LaunchTarget / EcsFargateLaunchTarget classes yet.

launchtarget as a whole should likely be something we can omit. Can you help me better understand the EC2 / Fargate specific parameters that would need to be supplied (with the omission of launch target). I don't mind the constant idea but will want to better understand what the ASL looks like for these use cases.

@mb-dev
Copy link
Author

mb-dev commented Jul 2, 2020

Ooo, i totally missed the escape hatch in the changelog. I will try that tomorrow and maybe that will be enough until full capacity providers support is added.

My pre-refactor EC2 integration is straightforward (in Python):

sft.RunEcsEc2Task(
            cluster=cluster,
            task_definition=task_definition,
            integration_pattern=sfn.ServiceIntegrationPattern.SYNC,
            container_overrides={...},
        )

So I personally won't need launchTarget, just not launchType.

@shivlaks
Copy link
Contributor

shivlaks commented Jul 7, 2020

@mb-dev let me know if I can help with checking out the implementation of the custom state or if it would help for me to whip up an example!

@shivlaks shivlaks added the p2 label Aug 21, 2020
@PierreKiwi
Copy link

Hello!
I have been using the CustomState to avoid this problem but now I am facing the problem of losing ResultPath during the "translation" (cf. this issue #8754).

Not really sure there is an easy way to avoid the problem (I can wrap my step in a parallel step) but annoying...

@trobert2
Copy link

is there any progress here? what does the roadmap look like?
One year later, I still can't get the task to run on the correct capacity provider.

@ferrarijefferson
Copy link

ferrarijefferson commented May 29, 2023

is there any progress here? what does the roadmap look like? One year later, I still can't get the task to run on the correct capacity provider.

You can use FARGATE_SPOT with Step Functions, as shown in my tests. You just need to omit the LaunchType field in your state machine definition.

image

Remove the field "LaunchType": "FARGATE" from the parameters.

{
  "Comment": "State machine integrated with ECS",
  "StartAt": "Initial",
  "States": {
    "Initial": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "Cluster": "${ECS_CLUSTER_ARN}",
        "PlatformVersion": "LATEST",
        "TaskDefinition": "${ECS_TASK_DEFINITION_ARN}",
        "NetworkConfiguration": {
          "AwsvpcConfiguration": {
            "Subnets": ${SUBNETS},
            "AssignPublicIp": "ENABLED"
          }
        },
        "Overrides": {
          "ContainerOverrides": ${ECS_CONTAINER_OVERRIDES}
        }
      },
      "TimeoutSeconds": 3600,
      "End": true
    }
  }
}

@kevinbader
Copy link

Just bumped into this as well. My use case is spawning long-running tasks on (expensive) GPU instances that are scaled to zero most of the time. When the StepFunction workflow is at EcsRunTask, the expectation is that the cluster spins up the GPU instance to do the work. But that only works if the default capacity provider is used; currently, the execution always fails with

No Container Instances were found in your cluster. (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException; Request ID: 5652eb66-58cf-4720-a4d5-40bedf7ddf22; Proxy: null)

@danw-mpl
Copy link

I created a class which implements IEcsLaunchTarget so this is a drop-in workaround:

export class EcsFargateSpotLaunchTarget implements sfnTasks.IEcsLaunchTarget {
    /**
     * Launch the ECS task using Fargate Spot.
     * 
     * This functionality is not built into CDK yet.
     */
    constructor(private readonly cluster: ecs.Cluster, private readonly options?: sfnTasks.EcsFargateLaunchTargetOptions) { }

    /**
     * Called when the Fargate launch type configured on RunTask
     */
    public bind(_task: sfnTasks.EcsRunTask, launchTargetOptions: sfnTasks.LaunchTargetBindOptions): sfnTasks.EcsLaunchTargetConfig {
        this.cluster.enableFargateCapacityProviders();

        if (!launchTargetOptions.taskDefinition.isFargateCompatible) {
            throw new Error('Supplied TaskDefinition is not compatible with Fargate');
        }

        return {
            parameters: {
                PlatformVersion: this.options?.platformVersion,
                "CapacityProviderStrategy": [
                    {
                        "CapacityProvider": "FARGATE_SPOT",
                        "Weight": 1
                    }
                ],
            },
        };
    }
}
const task = new sfnTasks.EcsRunTask(this, 'RunTask', {
    ...
    launchTarget: new EcsFargateSpotLaunchTarget(ecsCluster)
});

@Muppets
Copy link

Muppets commented Nov 24, 2023

I like @danw-mpl solution. I went for a slightly different approach and removed the LaunchType property entirely, allowing the ECS task to take the default capacity provider settings from the cluster.

C# example:

    public class EcsRunTaskWithoutLaunchType : EcsRunTask
    {
        public EcsRunTaskWithoutLaunchType(Construct scope, string id, IEcsRunTaskProps props) : base(scope, id, props)
        {
        }

        protected EcsRunTaskWithoutLaunchType(ByRefValue reference) : base(reference)
        {
        }

        protected EcsRunTaskWithoutLaunchType(DeputyProps props) : base(props)
        {
        }

        public override JObject ToStateJson()
        {
            var stateJson = base.ToStateJson();

            ((JObject)stateJson["Parameters"]!).Remove("LaunchType");

            return stateJson;
        }
    }

@danw-mpl
Copy link

@Muppets I tried that initially and ran into an error from Step Functions - something like 'no container instances found in your cluster'.

I'd assume that's a problem with my cluster config though. Even though I had fargate capacity providers enabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-stepfunctions-tasks effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2
Projects
None yet
Development

No branches or pull requests