
aws_eks: error retrieving RESTMappings to prune #23376

Closed
charlesakalugwu opened this issue Dec 16, 2022 · 21 comments
Labels
  • @aws-cdk/aws-eks: Related to Amazon Elastic Kubernetes Service
  • closed-for-staleness: This issue was automatically closed because it hadn't received any attention in a while.
  • documentation: This is a problem with documentation.
  • p2
  • response-requested: Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments

@charlesakalugwu commented Dec 16, 2022

Describe the bug

After using aws_eks.Cluster.add_manifest to apply Kubernetes objects to my cluster for the first time, subsequent attempts to update my application or any other manifest result in the following error:

[INFO]	2022-12-19T18:03:17.565Z	4adf551c-1846-45a6-82bf-d16d68c20512	Running command: ['kubectl', 'apply', '--kubeconfig', '/tmp/kubeconfig', '-f', '/tmp/manifest.yaml', '--prune', '-l', 'aws.cdk.eks/prune-c890e450b1abcaee1ebedde5645f1196f5ae447ab8']
[ERROR] Exception: b'ingress.networking.k8s.io/gitlab configured\nerror: error retrieving RESTMappings to prune: invalid resource extensions/v1beta1, Kind=Ingress, Namespaced=true: no matches for kind "Ingress" in version "extensions/v1beta1"\n'
Traceback (most recent call last):
  File "/var/task/index.py", line 14, in handler
    return apply_handler(event, context)
  File "/var/task/apply/__init__.py", line 69, in apply_handler
    kubectl('apply', manifest_file, *kubectl_opts)
  File "/var/task/apply/__init__.py", line 91, in kubectl
    raise Exception(output)

I do not have any Ingress resources from the extensions/v1beta1 API group in my cluster, but I do have one from the networking.k8s.io group.


% kubectl get ingress --all-namespaces
NAMESPACE   NAME     CLASS    HOSTS                                          ADDRESS                                                           PORTS   AGE
gitlab      gitlab   <none>   gitlab.cdk-eks-fargate.cakalu.people.aws.dev   k8s-gitlab-xxx.elb.amazonaws.com   80      49m

% kubectl get ingress.networking.k8s.io -n gitlab
NAME     CLASS    HOSTS                                          ADDRESS                                                           PORTS   AGE
gitlab   <none>   gitlab.cdk-eks-fargate.example.com   k8s-gitlab-xxx.elb.amazonaws.com   80      49m

The cluster was created with the default prune=True, since I left the field unspecified:

kubernetes_cluster = eks.Cluster(
            self,
            id=f"{prefix}-cluster",
            version=version,
            vpc=vpc,
            vpc_subnets=[
                ec2.SubnetSelection(
                    subnet_group_name="private-subnet",
                ),
            ],
            cluster_logging=[
                eks.ClusterLoggingTypes.AUDIT,
            ],
            default_capacity=0,
            endpoint_access=eks.EndpointAccess.PUBLIC_AND_PRIVATE,
            kubectl_layer=kubectl_v24.KubectlV24Layer(self, id=f"{prefix}-kubectl"),
            masters_role=masters_role,
            output_masters_role_arn=False,
            place_cluster_handler_in_vpc=True,
            secrets_encryption_key=kms_key_data,
            output_cluster_name=False,
            output_config_command=False,
            tags=tags,
        )
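
For reference, a minimal sketch of setting the field explicitly (prune=False disables the pruning behavior entirely, which would avoid this code path at the cost of automatic cleanup of removed manifests):

kubernetes_cluster = eks.Cluster(
    self,
    id=f"{prefix}-cluster",
    version=version,
    # Explicit; defaults to True, which makes the handler pass --prune
    # with a label selector to every kubectl apply (as in the log above)
    prune=False,
    # ...remaining props as above
)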

As you can see, I am using the KubectlV24 layer, which supposedly has the correct version of kubectl to match the cluster version I'm working with, 1.24.

I have seen this issue on Fargate EKS 1.22, 1.23, and 1.24.

Related Issues

#19843
#15736
#15072

Expected Behavior

I should be able to continuously update my application without it failing.

Current Behavior

Updating any part of the application always fails with the aforementioned error.

Reproduction Steps

1. Create a new EKS Fargate 1.22+ cluster.
2. Use the aws_eks.Cluster.add_manifest method to apply a manifest, e.g. a GitLab deployment (see the sketch after this list).
3. Run cdk deploy.
4. Update the GitLab deployment, for example by updating the image tag, adding an environment variable, or changing an environment variable's value.
5. Run cdk deploy again.
6. The error occurs.
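
A minimal sketch of step 2, with an illustrative Deployment manifest (the construct id, namespace, labels, and image tag are placeholders):

kubernetes_cluster.add_manifest(
    "gitlab-deployment",
    {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "gitlab", "namespace": "gitlab"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": "gitlab"}},
            "template": {
                "metadata": {"labels": {"app": "gitlab"}},
                "spec": {
                    "containers": [
                        # Changing this tag on a later deploy triggers the error (step 4)
                        {"name": "gitlab", "image": "gitlab/gitlab-ce:15.7.0-ce.0"},
                    ]
                },
            },
        },
    },
)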

Possible Solution

N/A

Additional Information/Context

No response

CDK CLI Version

2.55.0

Framework Version

No response

Node.js Version

18.10.0

OS

Ubuntu 22.04

Language

Python

Language Version

3.9.14

Other information

No response

charlesakalugwu added the bug and needs-triage labels on Dec 16, 2022
github-actions bot added the @aws-cdk/aws-eks label on Dec 16, 2022
@peterwoodworth (Contributor)

Can you share exactly what you're changing in your app between the first and second deployment, as well as the cdk diff output between the first and second deployment?

peterwoodworth added the p2 and response-requested labels and removed the needs-triage label on Dec 16, 2022
@github-actions (bot)

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

github-actions bot added the closing-soon label on Dec 19, 2022
@charlesakalugwu (Author)

@peterwoodworth The change can be as trivial as updating the image tag for the deployment pod, updating environment variable values on the deployment, or changing the ingress. Any change at all will cause this error.

For what it's worth, this is definitely a CDK issue, because when I run kubectl apply directly I do not see this error.

github-actions bot removed the closing-soon and response-requested labels on Dec 20, 2022
@charlesakalugwu (Author)

@peterwoodworth I discovered that none of the kubectl layers distributed by the CDK team contains a kubectl newer than 1.20.

Until that's fixed, we are building our own custom kubectl layer containing kubectl 1.24.8. That will also let us evolve the kubectl version independently as we upgrade EKS versions in the future. Sounds alright to me.
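
In case it helps others, a minimal sketch of such a custom layer in Python (assumes layer.zip is a hypothetical pre-built archive packaging the kubectl 1.24.8 binary as kubectl/kubectl, and helm as helm/helm, matching the layout of the official layers so the binaries land under /opt/kubectl and /opt/helm):

from aws_cdk import aws_lambda as lambda_

custom_kubectl_layer = lambda_.LayerVersion(
    self,
    "CustomKubectlLayer",
    code=lambda_.Code.from_asset("layer.zip"),  # hypothetical pre-built zip
    description="kubectl 1.24.8 + helm",
)

# Then pass it to the cluster instead of the bundled layer:
# kubectl_layer=custom_kubectl_layer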

@zorrofox (Contributor) commented Jan 28, 2023

I don't have any Ingress in my cluster, but I also have the same issue.

(.venv) xxx@xxxx cdk2-eks % cdk diff
Stack Cdk2EksStack
Resources
[~] Custom::AWSCDK-EKS-KubernetesResource hello-eks/manifest-ADOT-ClusterRoleBinding/Resource helloeksmanifestADOTClusterRoleBindingB50C231E 
 └─ [~] Manifest
     ├─ [-] [{"kind":"ClusterRoleBinding","apiVersion":"rbac.authorization.k8s.io/v1","metadata":{"name":"aoc-agent-role-binding","namespace":"amazon-metrics","labels":{"aws.cdk.eks/prune-c80bb8ca89a214719f1c394396bfe48688e41b066e":""}},"subjects":[{"kind":"ServiceAccount","name":"aws-otel-sa","namespace":"aws-otel-eks"}],"roleRef":{"kind":"ClusterRole","name":"aoc-agent-role","apiGroup":"rbac.authorization.k8s.io"}}]
     └─ [+] [{"kind":"ClusterRoleBinding","apiVersion":"rbac.authorization.k8s.io/v1","metadata":{"name":"aoc-agent-role-binding","namespace":"amazon-metrics","labels":{"aws.cdk.eks/prune-c80bb8ca89a214719f1c394396bfe48688e41b066e":""}},"subjects":[{"kind":"ServiceAccount","name":"aws-otel-sa","namespace":"amazon-metrics"}],"roleRef":{"kind":"ClusterRole","name":"aoc-agent-role","apiGroup":"rbac.authorization.k8s.io"}}]

Error message:

Cdk2EksStack: creating CloudFormation changeset...
11:34:10 PM | UPDATE_FAILED        | Custom::AWSCDK-EKS-KubernetesResource | helloeksmanifestAD...oleBindingB50C231E
Received response status [FAILED] from custom resource. Message returned: Error: b'clusterrolebinding.rbac.authorization.k8s.io/a
oc-agent-role-binding configured\nerror: error retrieving RESTMappings to prune: invalid resource extensions/v1beta1, Kind=Ingres
s, Namespaced=true: no matches for kind "Ingress" in version "extensions/v1beta1"\n'

Logs: /aws/lambda/Cdk2EksStack-awscdkawseksKubectlPr-Handler886CB40B-cB9kptoidq5f

at invokeUserFunction (/var/task/framework.js:2:6)
at processTicksAndRejections (internal/process/task_queues.js:95:5)
at async onEvent (/var/task/framework.js:1:365)
at async Runtime.handler (/var/task/cfn-response.js:1:1543) (RequestId: 70cd68fa-720f-4a21-9112-de3cdd50d6c3)

11:34:29 PM | UPDATE_FAILED        | Custom::AWSCDK-EKS-KubernetesResource | helloeksmanifestAD...oleBindingB50C231E
Received response status [FAILED] from custom resource. Message returned: Error: b'clusterrolebinding.rbac.authorization.k8s.io/a
oc-agent-role-binding configured\nerror: error retrieving RESTMappings to prune: invalid resource extensions/v1beta1, Kind=Ingres
s, Namespaced=true: no matches for kind "Ingress" in version "extensions/v1beta1"\n'

Logs: /aws/lambda/Cdk2EksStack-awscdkawseksKubectlPr-Handler886CB40B-cB9kptoidq5f

at invokeUserFunction (/var/task/framework.js:2:6)
at processTicksAndRejections (internal/process/task_queues.js:95:5)
at async onEvent (/var/task/framework.js:1:365)
at async Runtime.handler (/var/task/cfn-response.js:1:1543) (RequestId: 473c8b09-c148-47cc-84b5-cd2f4e974b11)


 ❌  Cdk2EksStack failed: Error: The stack named Cdk2EksStack failed to deploy: UPDATE_ROLLBACK_FAILED (The following resource(s) failed to update: [helloeksmanifestADOTClusterRoleBindingB50C231E]. ): Received response status [FAILED] from custom resource. Message returned: Error: b'clusterrolebinding.rbac.authorization.k8s.io/aoc-agent-role-binding configured\nerror: error retrieving RESTMappings to prune: invalid resource extensions/v1beta1, Kind=Ingress, Namespaced=true: no matches for kind "Ingress" in version "extensions/v1beta1"\n'

Logs: /aws/lambda/Cdk2EksStack-awscdkawseksKubectlPr-Handler886CB40B-cB9kptoidq5f

    at invokeUserFunction (/var/task/framework.js:2:6)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async onEvent (/var/task/framework.js:1:365)
    at async Runtime.handler (/var/task/cfn-response.js:1:1543) (RequestId: 70cd68fa-720f-4a21-9112-de3cdd50d6c3)
    at FullCloudFormationDeployment.monitorDeployment (/Users/huadebin/node_modules/aws-cdk/lib/api/deploy-stack.ts:505:13)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at deployStack2 (/Users/huadebin/node_modules/aws-cdk/lib/cdk-toolkit.ts:265:24)
    at /Users/huadebin/node_modules/aws-cdk/lib/deploy.ts:39:11
    at run (/Users/huadebin/node_modules/p-queue/dist/index.js:163:29)

 ❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named Cdk2EksStack failed to deploy: UPDATE_ROLLBACK_FAILED (The following resource(s) failed to update: [helloeksmanifestADOTClusterRoleBindingB50C231E]. ): Received response status [FAILED] from custom resource. Message returned: Error: b'clusterrolebinding.rbac.authorization.k8s.io/aoc-agent-role-binding configured\nerror: error retrieving RESTMappings to prune: invalid resource extensions/v1beta1, Kind=Ingress, Namespaced=true: no matches for kind "Ingress" in version "extensions/v1beta1"\n'

Logs: /aws/lambda/Cdk2EksStack-awscdkawseksKubectlPr-Handler886CB40B-cB9kptoidq5f

    at invokeUserFunction (/var/task/framework.js:2:6)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async onEvent (/var/task/framework.js:1:365)
    at async Runtime.handler (/var/task/cfn-response.js:1:1543) (RequestId: 70cd68fa-720f-4a21-9112-de3cdd50d6c3)
    at deployStacks (/Users/huadebin/node_modules/aws-cdk/lib/deploy.ts:61:11)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at CdkToolkit.deploy (/Users/huadebin/node_modules/aws-cdk/lib/cdk-toolkit.ts:339:7)
    at exec4 (/Users/huadebin/node_modules/aws-cdk/lib/cli.ts:384:12)

Stack Deployments Failed: Error: The stack named Cdk2EksStack failed to deploy: UPDATE_ROLLBACK_FAILED (The following resource(s) failed to update: [helloeksmanifestADOTClusterRoleBindingB50C231E]. ): Received response status [FAILED] from custom resource. Message returned: Error: b'clusterrolebinding.rbac.authorization.k8s.io/aoc-agent-role-binding configured\nerror: error retrieving RESTMappings to prune: invalid resource extensions/v1beta1, Kind=Ingress, Namespaced=true: no matches for kind "Ingress" in version "extensions/v1beta1"\n'

Logs: /aws/lambda/Cdk2EksStack-awscdkawseksKubectlPr-Handler886CB40B-cB9kptoidq5f

    at invokeUserFunction (/var/task/framework.js:2:6)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async onEvent (/var/task/framework.js:1:365)
    at async Runtime.handler (/var/task/cfn-response.js:1:1543) (RequestId: 70cd68fa-720f-4a21-9112-de3cdd50d6c3)

@AlyIbrahim

The main problem was introduced by PR #22677 last November.

Trying to externalize Lambda assets to reduce their size ended up hardcoding the kubectl layer to version 1.20, as in this file:

packages/@aws-cdk/lambda-layer-kubectl/lib/kubectl-layer.ts
import { ASSET_FILE, LAYER_SOURCE_DIR } from '@aws-cdk/asset-kubectl-v20';

And of course the related package.json dependency.

Before this PR, the kubectl version was parameterized with ARG KUBECTL_VERSION=1.22.0 in layer/Dockerfile.

We need a more flexible solution that follows the installed Kubernetes version, or at least exposes it as a parameter. Something like the sketch below.
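
A Python sketch of that idea (the kubectl_layer_for helper and the registry are hypothetical; today each kubectl minor version ships as a separate lambda-layer-kubectl-vXX package):

from aws_cdk.lambda_layer_kubectl_v24 import KubectlV24Layer

# Hypothetical registry keyed by Kubernetes minor version
KUBECTL_LAYERS = {
    "1.24": KubectlV24Layer,
}

def kubectl_layer_for(scope, construct_id, k8s_version):
    # Fail loudly instead of silently falling back to a stale bundled kubectl
    try:
        layer_cls = KUBECTL_LAYERS[k8s_version]
    except KeyError:
        raise ValueError(f"no kubectl layer registered for {k8s_version}")
    return layer_cls(scope, construct_id)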

@AlyIbrahim

OK, digging deeper I found the solution!

First, you need to add the kubectl layer version that matches your current cluster to the CDK project dependencies.
If you are using a TypeScript project and your Kubernetes version is 1.24, run the following command in your project directory:
npm install -s @aws-cdk/lambda-layer-kubectl-v24
You can use other versions, though the highest I found was v24.

Now, in your stack, import the kubectl layer:
import { KubectlV24Layer } from '@aws-cdk/lambda-layer-kubectl-v24';

And while creating the cluster, specify the kubectl layer as one of the ClusterProps:
kubectlLayer: new KubectlV24Layer(this, 'Kubectlv24Layer')

As a complete example:

const myCluster = new eks.Cluster(this, 'my-cluster', {
  clusterName: 'my-cluster',
  version: eks.KubernetesVersion.V1_24,
  kubectlLayer: new KubectlV24Layer(this, 'Kubectlv24Layer'),
  vpc: myVPC,
  // ...remaining props
});

I tested this solution and it is working fine.

The documentation should be clearer, and this prop should arguably be made mandatory, as leaving it unset can cause breaking problems.
Also, it's not clear whether AWS will provide versions beyond v24 or whether that will be left to the community.

@peterwoodworth (Contributor)

Thank you so much for your investigation, @AlyIbrahim. How would you specifically suggest improving the documentation here?

@AlyIbrahim

@peterwoodworth Thanks for your response.

I suggest the following:

  • Add the kubectlLayer prop to the main example, with the required import and a description of how to add the required layer (npm install -s @aws-cdk/lambda-layer-kubectl-v24)
  • Provide a link to the available lambda-layer-kubectl-vXX packages
  • Add a warning about the default behavior if kubectlLayer is not specified
  • Add a troubleshooting section for this kind of error

I see that part of this is already covered in the prop's documentation, but it's buried further down, and a regular user won't realize that this prop is effectively required above v1.20. Unless users read the docs for every prop, this breaking change is easy to miss.

Another option is to make the prop mandatory, forcing users to learn about it, but proper documentation can avoid that.

@AlyIbrahim

@peterwoodworth More importantly, are you planning to maintain the lambda-layer-kubectl package for versions above v24?

@peterwoodworth (Contributor)

Thanks for the suggestions @AlyIbrahim, are you interested in creating a PR with your suggested changes? I think we'd want to keep the prop optional and stick with documentation adjustments.

are you planning to maintain lambda-layer-kubectl package for versions above v24?

I'm not sure - @pahud do you know what the plan for this is? Also @pahud, could you see about creating a PR for this if @AlyIbrahim isn't able to contribute?

peterwoodworth added the documentation label on Feb 2, 2023
@AlyIbrahim

@peterwoodworth
Yeah, I can create a PR. Can you share the link to the docs?

@pahud (Contributor) commented Feb 6, 2023

@peterwoodworth Yes, lambda-layer-kubectl is maintained by the CDK core team at https://github.com/cdklabs/awscdk-asset-kubectl, and v24 is upgraded on a daily basis with GitHub Actions: https://github.com/cdklabs/awscdk-asset-kubectl/actions/workflows/upgrade-kubectl-v24-main.yml

I agree we should have a PR to elaborate on the info and usage.

@AlyIbrahim I believe you can find the required info at https://github.com/cdklabs/awscdk-asset-kubectl. Is this something you are looking for?

@AlyIbrahim

@pahud I am looking for newer versions (25, 26, etc.) once they are available.

For the PR, I couldn't find the API reference documentation in this repository; I may have missed it. Can you point me to it so I can file the PR?

@peterwoodworth (Contributor)

@AlyIbrahim our API reference documentation is automatically generated from our codebase. To update a module's overview tab, you'll want to edit the module's README.

If you want to change documentation particular to one of the constructs, you'd want to edit the comments above whichever construct/property you want to change. See here: the comments in the code match up with the descriptions in the API reference:

/**
* A Cluster represents a managed Kubernetes Service (EKS)
*
* This is a fully managed cluster of API Servers (control-plane)
* The user is still required to create the worker nodes.
*/
export class Cluster extends ClusterBase {

@srinivasreddych

Hello team. We are impacted as well by the old kubectl version bundled into the Lambda assets. For now, I am importing the kubectl v24 Lambda layer and passing it during EKS cluster creation.

@YikaiHu commented Feb 9, 2023

> OK, digging deeper I found the solution! […]

Thanks @AlyIbrahim, this is really helpful.

@pahud (Contributor) commented Feb 23, 2023

@AlyIbrahim

We have a feature request for aws-eks 1.25 support now - #24282

In the meantime, if you need the kubectl 1.25 layer assets, please give an upvote on cdklabs/awscdk-asset-kubectl#166.

@pahud (Contributor) commented Feb 23, 2023

So I assume this issue is due to the lack of an explicit kubectlLayer property. I am setting this to auto-close if there is no further response.

pahud added the response-requested label on Feb 23, 2023
@p5k6 commented Feb 23, 2023

For those using the Python CDK: I was able to get this running with the following:

npm install -s @aws-cdk/lambda-layer-kubectl-v24
pip install aws-cdk.lambda-layer-kubectl-v24

from aws_cdk.lambda_layer_kubectl_v24 import KubectlV24Layer

cluster = aws_eks.Cluster(self, 'cluster',
                          masters_role=self._eks_admin_role,
                          vpc=self._host_vpc,
                          # narrow to the private subnets by id
                          vpc_subnets=[ec2.SubnetSelection(subnet_filters=[ec2.SubnetFilter.by_ids(private_subnet_ids)])],
                          default_capacity=0,
                          version=aws_eks.KubernetesVersion.V1_24,
                          output_cluster_name=True,
                          output_masters_role_arn=True,
                          role=self._eks_admin_role,
                          kubectl_layer=KubectlV24Layer(self, 'KubectlV24Layer'),
                          )

Not that much different from the TypeScript version, but it took a bit of digging to figure out some of the namespacing, as it isn't explicitly listed in the Python package. Hopefully this helps someone else out. Thanks for the original solution, @AlyIbrahim!

github-actions bot removed the response-requested label on Feb 24, 2023
pahud added the response-requested label and removed the bug label on Mar 1, 2023
@github-actions (bot) commented Mar 1, 2023

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

github-actions bot added the closed-for-staleness label and removed the closing-soon label on Mar 1, 2023
github-actions bot closed this as completed on Mar 6, 2023