fix(eks): missing support for "InstanceTypes" attribute assignment for AL2023 AMIs #29505

Merged
merged 2 commits into from Mar 26, 2024

Conversation

Contributor

@guessi guessi commented Mar 15, 2024

Issue # (if applicable)

Closes #29546

Reason for this change

After #29335, @aws-eks should support AL2023. Although GPU types are not yet supported, it should at least let users customize the instance type. However, the AL2023 AMI types are missing from the NodegroupAmiType[] lists, which triggers a validation error, so users can only create node groups with the default instance types (t3.medium or t4g.medium).

$ cat ./lib/cluster.ts
...
    cluster.addNodegroupCapacity("mng-al2023", {
      amiType: eks.NodegroupAmiType.AL2023_X86_64_STANDARD,
      instanceTypes: [new ec2.InstanceType("t3.medium")],
      ...
    });
...
$ npx cdk version
2.133.0 (build dcc1e75)
$  npx cdk synth

...
                                                                                                                                                                                ^
Error: The specified AMI does not match the instance types architecture, either specify one of AL2_X86_64, BOTTLEROCKET_X86_64, WINDOWS_CORE_2019_X86_64, WINDOWS_CORE_2022_X86_64, WINDOWS_FULL_2019_X86_64, WINDOWS_FULL_2022_X86_64 or don't specify any
    at new Nodegroup (.../node_modules/aws-cdk-lib/aws-eks/lib/managed-nodegroup.js:1:3921)
    at Cluster.addNodegroupCapacity (.../node_modules/aws-cdk-lib/aws-eks/lib/cluster.js:1:19807)
    at EksCluster.createManagedNodeGroups (.../lib/cluster.ts:85:13)
    at new EksCluster (.../lib/cluster.ts:30:10)
    at Object.<anonymous> (.../bin/eks-basic.ts:10:1)
    at Module._compile (node:internal/modules/cjs/loader:1376:14)
    at Module.m._compile (.../node_modules/ts-node/src/index.ts:1618:23)
    at Module._extensions..js (node:internal/modules/cjs/loader:1435:10)
    at Object.require.extensions.<computed> [as .ts] (.../node_modules/ts-node/src/index.ts:1621:12)
    at Module.load (node:internal/modules/cjs/loader:1207:32)

Subprocess exited with error 1

Description of changes

Add support for eks.NodegroupAmiType.AL2023_X86_64_STANDARD and eks.NodegroupAmiType.AL2023_ARM_64_STANDARD to the node group module.
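
The check that produced the error above can be modeled roughly as follows. This is a minimal, self-contained sketch, not the aws-cdk-lib source: the function name `assertAmiMatchesArch`, the `Arch` type, and the use of plain strings in place of the `NodegroupAmiType` enum are assumptions for illustration. The AMI type names are copied from the error messages quoted in this PR.

```typescript
// Hypothetical model of the AMI/architecture check; not the aws-cdk-lib implementation.
type Arch = 'arm64' | 'x86_64';

// AMI type names as they appear in the error messages quoted in this PR,
// with the two AL2023 entries added.
const arm64AmiTypes: string[] = [
  'AL2_ARM_64', 'AL2023_ARM_64_STANDARD', 'BOTTLEROCKET_ARM_64',
];
const x8664AmiTypes: string[] = [
  'AL2_X86_64', 'AL2023_X86_64_STANDARD', 'BOTTLEROCKET_X86_64',
  'WINDOWS_CORE_2019_X86_64', 'WINDOWS_CORE_2022_X86_64',
  'WINDOWS_FULL_2019_X86_64', 'WINDOWS_FULL_2022_X86_64',
];

// Assumption: the caller already knows the architecture of the instance types.
function assertAmiMatchesArch(amiType: string | undefined, arch: Arch): void {
  const allowed = arch === 'arm64' ? arm64AmiTypes : x8664AmiTypes;
  // An unspecified AMI type is always accepted ("or don't specify any").
  if (amiType !== undefined && allowed.indexOf(amiType) === -1) {
    throw new Error(
      'The specified AMI does not match the instance types architecture, ' +
      `either specify one of ${allowed.join(', ')} or don't specify any`,
    );
  }
}
```

With the AL2023 entries absent from the x86_64 list, a call such as `assertAmiMatchesArch('AL2023_X86_64_STANDARD', 'x86_64')` would throw, which mirrors the failure reported here.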

Description of how you validated changes

$ npx jest aws-eks/test/cluster.test.ts
 PASS  aws-eks/test/cluster.test.ts (37.298 s)
...

=============================== Coverage summary ===============================
Statements   : 48.86% ( 10077/20621 )
Branches     : 27.88% ( 2388/8565 )
Functions    : 33.03% ( 1509/4568 )
Lines        : 49.71% ( 9905/19923 )
================================================================================
Jest: "global" coverage threshold for statements (55%) not met: 48.86%
Jest: "global" coverage threshold for branches (35%) not met: 27.88%
Test Suites: 1 passed, 1 total
Tests:       119 passed, 119 total
Snapshots:   0 total
Time:        41.064 s
$ npx jest aws-eks/test/nodegroup.test.ts
 RUNS  aws-eks/test/nodegroup.test.ts
...
=============================== Coverage summary ===============================
Statements   : 43.94% ( 9062/20621 )
Branches     : 22.08% ( 1892/8565 )
Functions    : 27.16% ( 1241/4568 )
Lines        : 44.8% ( 8927/19923 )
================================================================================
Jest: "global" coverage threshold for statements (55%) not met: 43.94%
Jest: "global" coverage threshold for branches (35%) not met: 22.08%
Test Suites: 1 passed, 1 total
Tests:       59 passed, 59 total
Snapshots:   0 total
Time:        23.334 s

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added the repeat-contributor [Pilot] contributed between 3-5 PRs to the CDK label Mar 15, 2024
@aws-cdk-automation aws-cdk-automation requested a review from a team March 15, 2024 10:12
@github-actions github-actions bot added the p2 label Mar 15, 2024
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@guessi guessi force-pushed the missing-instance-types-support-al2023 branch from aee3332 to 81ab615 Compare March 15, 2024 14:12
-const arm64AmiTypes: NodegroupAmiType[] = [NodegroupAmiType.AL2_ARM_64, NodegroupAmiType.BOTTLEROCKET_ARM_64];
-const x8664AmiTypes: NodegroupAmiType[] = [NodegroupAmiType.AL2_X86_64, NodegroupAmiType.BOTTLEROCKET_X86_64,
+const arm64AmiTypes: NodegroupAmiType[] = [NodegroupAmiType.AL2_ARM_64, NodegroupAmiType.AL2023_ARM_64_STANDARD, NodegroupAmiType.BOTTLEROCKET_ARM_64];
+const x8664AmiTypes: NodegroupAmiType[] = [NodegroupAmiType.AL2_X86_64, NodegroupAmiType.AL2023_X86_64_STANDARD, NodegroupAmiType.BOTTLEROCKET_X86_64,
Contributor Author

This is the root of the issue: the AmiType definitions for AL2023_[ARM_64|X86_64]_STANDARD were missing.

@guessi guessi force-pushed the missing-instance-types-support-al2023 branch 3 times, most recently from d87fba6 to 80f6bbc Compare March 15, 2024 15:37
@aws-cdk-automation aws-cdk-automation dismissed their stale review March 15, 2024 15:39

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@guessi guessi force-pushed the missing-instance-types-support-al2023 branch from 80f6bbc to a9dd543 Compare March 15, 2024 15:43
@guessi guessi force-pushed the missing-instance-types-support-al2023 branch from a9dd543 to a181afc Compare March 21, 2024 14:42
@guessi guessi mentioned this pull request Mar 21, 2024
1 task
@github-actions github-actions bot added bug This issue is a bug. effort/medium Medium work item – several days of effort p1 and removed p2 labels Mar 21, 2024
@pahud
Contributor

pahud commented Mar 21, 2024

Thank you. This PR LGTM. Are you able to update the integ.eks-cluster.ts?

And you'll need to fix the CI before we can move this PR forward.

@guessi guessi force-pushed the missing-instance-types-support-al2023 branch from a181afc to c48d46b Compare March 22, 2024 03:38
@guessi guessi requested a review from pahud March 22, 2024 03:39
@guessi guessi force-pushed the missing-instance-types-support-al2023 branch from c48d46b to 811afb5 Compare March 22, 2024 03:59
@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Mar 22, 2024
Comment on lines +664 to +672
    // THEN
    expect(() => cluster.addNodegroupCapacity('ng', {
      amiType: NodegroupAmiType.AL2023_X86_64_STANDARD,
      instanceTypes: [
        new ec2.InstanceType('c6g.large'),
        new ec2.InstanceType('t4g.large'),
      ],
    })).toThrow(/The specified AMI does not match the instance types architecture, either specify one of AL2_ARM_64, AL2023_ARM_64_STANDARD, BOTTLEROCKET_ARM_64 or don't specify any/);
  });
Contributor Author

@guessi guessi Mar 22, 2024

This ensures that an MNG with an LT raises the invalid instance type exception.
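
The test above depends on c6g.large and t4g.large being ARM-based while the requested AMI type is x86_64. As a rough illustration of how an instance type name can be mapped to an architecture, here is a naming heuristic. The helper `guessArchitecture` and the `CpuArch` type are hypothetical and not part of aws-cdk-lib, and the rule (family `a` or a `g` capability letter means ARM) is an assumption that can misclassify some families.

```typescript
// Hypothetical helper for illustration only; a naming heuristic, not an authoritative lookup.
type CpuArch = 'arm64' | 'x86_64';

function guessArchitecture(instanceType: string): CpuArch {
  // Split "<family letters><generation digits><capability letters>.<size>".
  const match = /^([a-z]+)(\d{1,2})([a-z-]*)\.([a-z0-9-]+)$/.exec(instanceType);
  if (match === null) {
    throw new Error(`Unrecognized instance type: ${instanceType}`);
  }
  const family = match[1];
  const capabilities = match[3];
  // Assumption: family "a" (e.g. a1) and a "g" capability letter indicate Graviton (ARM).
  const isArm = family === 'a' || capabilities.indexOf('g') !== -1;
  return isArm ? 'arm64' : 'x86_64';
}
```

Under this heuristic the instance types used in the tests of this PR fall on the expected sides: c6g.large and t4g.large are classified as ARM, while m5.large and c5.large are classified as x86_64.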

Comment on lines +686 to +694
    // THEN
    expect(() => cluster.addNodegroupCapacity('ng', {
      amiType: NodegroupAmiType.AL2023_ARM_64_STANDARD,
      instanceTypes: [
        new ec2.InstanceType('m5.large'),
        new ec2.InstanceType('c5.large'),
      ],
    })).toThrow(/The specified AMI does not match the instance types architecture, either specify one of AL2_X86_64, AL2023_X86_64_STANDARD, BOTTLEROCKET_X86_64, WINDOWS_CORE_2019_X86_64, WINDOWS_CORE_2022_X86_64, WINDOWS_FULL_2019_X86_64, WINDOWS_FULL_2022_X86_64 or don't specify any/);
  });
Contributor Author

@guessi guessi Mar 22, 2024

This ensures that an MNG with an LT raises the invalid instance type exception.

Ref: 'VPCPrivateSubnet2SubnetCFCDAA7A',
},
],
AmiType: 'AL2023_x86_64_STANDARD',
Contributor Author

This ensures that an MNG can be created without specifying an instance type.

];
const x8664AmiTypes: NodegroupAmiType[] = [
NodegroupAmiType.AL2_X86_64,
NodegroupAmiType.AL2023_X86_64_STANDARD,
Contributor Author

This is the core of the issue: the AL2023 definition was missing.

const versionMap: { [key: string]: any } = {
1.24: KubectlV24Layer,
1.29: KubectlV29Layer,
};
Contributor Author

This keeps it compatible with the existing integ tests.

@guessi
Contributor Author

guessi commented Mar 22, 2024

@pahud please have a look when you are free, thanks!

Contributor

@pahud pahud left a comment

LGTM. We'll need another review and approval from the maintainers.

@GavinZZ GavinZZ self-assigned this Mar 22, 2024
Contributor

@GavinZZ GavinZZ left a comment

Can we please add an AL2023_ARM_64_STANDARD usage example to README.md?

@@ -187,7 +187,6 @@ cluster.addNodegroupCapacity('custom-node-group', {
   instanceTypes: [new ec2.InstanceType('m5.large')],
   minSize: 4,
   diskSize: 100,
-  amiType: eks.NodegroupAmiType.AL2_X86_64_GPU,
Contributor Author

Reasons for removing this line:

  • This section should focus on addNodegroupCapacity, not amiType.
  • The instance type here, m5.large, does not support GPU, which makes the example misleading.

@guessi guessi force-pushed the missing-instance-types-support-al2023 branch from 18de6ee to 5d74fdb Compare March 23, 2024 07:20
instanceTypes: [new ec2.InstanceType('m6g.medium')], // NOTE: if amiType is ARM-based image, the instance types here must be ARM-based.
amiType: eks.NodegroupAmiType.AL2023_ARM_64_STANDARD,
});
```
Contributor Author

Discussion: should we use AL2023 here, or keep the default AL2_ARM_64?

// X86_64 based AMI managed node group
cluster.addNodegroupCapacity('custom-node-group', {
instanceTypes: [new ec2.InstanceType('m5.large')], // NOTE: if amiType is x86_64-based image, the instance types here must be x86_64-based.
amiType: eks.NodegroupAmiType.AL2023_X86_64_STANDARD,
Contributor Author

Discussion: should we use AL2023 here, or keep the default AL2_X86_64?

@guessi guessi force-pushed the missing-instance-types-support-al2023 branch from 5d74fdb to 0bee78e Compare March 23, 2024 08:04
@guessi guessi requested review from GavinZZ and pahud March 23, 2024 08:11
@guessi
Contributor Author

guessi commented Mar 23, 2024

@GavinZZ I've updated README.md in a separate commit; please review it again, thanks!

As discussed with @pahud offline, I'm leaving the ARM64 Support section untouched. Changing addAutoScalingGroupCapacity() would require changing the MachineImageType enum, which would trigger the integration tests for cluster.ts, and that is not something we want in this PR.

I can see many if... else... blocks in the MachineImageType detection logic for bootstrapping.

I think AL2023 support for addAutoScalingGroupCapacity() would be better done in a separate PR; this PR should focus on MNG first. WDYT?

GavinZZ previously approved these changes Mar 25, 2024
Contributor

mergify bot commented Mar 25, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@GavinZZ
Contributor

GavinZZ commented Mar 25, 2024

@guessi can you please fix the merge conflict? After that, this PR should be good to go.

@guessi guessi force-pushed the missing-instance-types-support-al2023 branch from 0bee78e to 86b3169 Compare March 26, 2024 00:17
@mergify mergify bot dismissed GavinZZ’s stale review March 26, 2024 00:18

Pull request has been modified.

@guessi
Contributor Author

guessi commented Mar 26, 2024

@GavinZZ Just rebased, should have no conflict now 👍

@guessi guessi requested a review from GavinZZ March 26, 2024 00:24
@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 86b3169
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Contributor

mergify bot commented Mar 26, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit e77ce26 into aws:main Mar 26, 2024
12 checks passed
@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Mar 26, 2024
@guessi guessi deleted the missing-instance-types-support-al2023 branch April 2, 2024 08:28
Labels
bug This issue is a bug. effort/medium Medium work item – several days of effort p1 repeat-contributor [Pilot] contributed between 3-5 PRs to the CDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

eks: Amazon Linux 2023 is not fully supported in nodegroup
4 participants