Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lambda): allow to delete log group of a lambda function when it is removed #21820

Closed
wants to merge 20 commits into from

Conversation

tmokmss
Copy link
Contributor

@tmokmss tmokmss commented Aug 30, 2022

No more orphaned log groups after your trial and error!

image

closes #21804

Now we can set autoDeleteLogGroup: true to delete an associated Lambda log automatically.

new lambda.Function(stack, 'AutoDeleteLogGroup', {
  code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
  handler: 'index.handler',
  runtime: lambda.Runtime.NODEJS_14_X,
  autoDeleteLogGroup: true,
});

Consideration on property removal

If a user remove autoDeleteLogGroup property as below, the logRetention resource will be removed from a synthesized template, and it will result in the removal of log group itself. This implicit behavior should be clearly warned in the doc.

new lambda.Function(stack, 'AutoDeleteLogGroup', {
  code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
  handler: 'index.handler',
  runtime: lambda.Runtime.NODEJS_14_X,
-  autoDeleteLogGroup: true,
});

There was an issue regarding s3 Bucket's autoDeleteObjects feature: #16603. We might need to implement a similar approach. It might be difficult since we are not directly managing LogGroup in CFn though.


All Submissions:

Adding new Unconventional Dependencies:

  • This PR adds new unconventional dependencies following the process described here

New Features

  • Have you added the new feature to an integration test?
    • Did you use yarn integ to deploy the infrastructure and generate the snapshot (i.e. yarn integ without --dry-run)?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Aug 30, 2022

@github-actions github-actions bot added effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2 labels Aug 30, 2022
@aws-cdk-automation aws-cdk-automation requested a review from a team August 30, 2022 04:28
Copy link
Contributor

@TheRealAmazonKendra TheRealAmazonKendra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. Please see my comments inline.

*
* @default false
*/
readonly autoDeleteLogGroup?: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with our contract elsewhere when the user sets the Removal Policy. Please see other places where we have done this and follow that model. You use the enum below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @TheRealAmazonKendra thank you for the review!

This API is actually inspired by s3.Bucket.autoDeleteObjects. In this case the only concern for developers is whether the log group is automatically deleted or not. It is only a binary choice and there is no need to use RemovalPolicy enum to specify that. One of the main advantages of CDK constructs is abstraction, and here we abstract away RemovalPolicy to provide better developer experience.

Also internally we do not use CloudFormation RemovalPolicy here but using a feature implemented in the LogRetention custom resource below. I'm not sure if it is inconsistent to use the different contract from other CFn removal policies. Isn't it natural that different things will have different contracts?

RemovalPolicy: props.removalPolicy,

I'm open to discussion about this :) currently I don't see much benefits to use enum here though.

@mergify mergify bot dismissed TheRealAmazonKendra’s stale review August 31, 2022 08:19

Pull request has been modified.

Copy link
Contributor

@TheRealAmazonKendra TheRealAmazonKendra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#21113 is the experience this should be consistent with. In fact, we should be using LogRetentionProps here instead of the individual props. As a CDK team member, I understand that abstraction is a key advantage of constructs. Consistent user experience is also an important.

@tmokmss
Copy link
Contributor Author

tmokmss commented Sep 1, 2022

Hi @TheRealAmazonKendra I really appreciate your review and effort you expended on my PR! So let me summarize the discussion so far.

I assume there are currently three possible options to be considered:

// #1
// Pros: Less typing. Directly expressing developer's intent.
// Cons: Due to more abstraction, the API becomes different from LogRetention construct. 
new lambda.Function(stack, 'AutoDeleteLogGroup', {
    code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
    handler: 'index.handler',
    runtime: lambda.Runtime.NODEJS_14_X,
    autoDeleteLogGroup: true,
});

// #2
// Pros: Similar API to LogRetention construct.
// Cons: * Requires more typing. We have to import aws-cdk-lib.RemovalPolicy, and the property name seems a little redundant.
//       * We can also pass RemovalPolicy.SNAPSHOT although it means nothing here, which is unnecessarily confusing.
//       * It may look like removal policy of Lambda function itself, which is also misleading.
new lambda.Function(stack, 'AutoDeleteLogGroup', {
    code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
    handler: 'index.handler',
    runtime: lambda.Runtime.NODEJS_14_X,
    logRetentionRemovalPolicy: RemovalPolicy.DESTROY,
});

// #3
// If we go with this approach we need much refactor work. I guess we can ignore this approach for now.
new lambda.Function(stack, 'AutoDeleteLogGroup', {
    code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
    handler: 'index.handler',
    runtime: lambda.Runtime.NODEJS_14_X,
    logRetentionProps: {
         // also move other logRetention* props here
        removalPolicy: RemovalPolicy.DESTROY,
    },
});

I agree with API consistency is important, and I somewhat agree with that we need to be consistent with props of internal constructs. But LogRetentionProps and FunctionPros are in different abstraction layers. Developers do not necessarily have to know about implementation detail of LogRetention when they want to automatically delete log groups. To me using RemovalPolicy here seems kind of leaky abstraction where we can achieve simpler API.

What we also need to be care about is the consistency between constructs that are using LogRetention. I did some study about this and found there are now five constructs using LogRetention: AppSync, ChatBot, DocumentDB, RDS, and Lambda. None of them currently have implemented log removal feature yet, so I believe we don’t have to care about consistency in terms of this point, and we can design the API from scratch. Also as you can see, they do not currently have consistent API with each other at all anyway:(

To be honest after the release of this feature I’ll be specifying this option every time I create a Lambda function. And that's why I’m inclined to keep this API as simple as possible... I'd like to avoid to specify cdk.RemovalPolicy repetitively where a single boolean is enough. As a side note, I asked my colleagues about this today and most of them prefer #1 over #2 due to its simplicity.

That being said I'm willing to follow your final decision. Considering all of the above, how do you think about #1 and #2? And sorry for writing such a long message with my poor English. Let me know if there are any sentences that does not make sense! Thanks.

@mergify mergify bot dismissed TheRealAmazonKendra’s stale review September 1, 2022 14:50

Pull request has been modified.

Copy link
Contributor

@TheRealAmazonKendra TheRealAmazonKendra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3 is the direction I think is the right one, though I wouldn't necessarily call the field logRetentionProps, it just needs to be of the type LogRetentionProps. This way, the user gets access to all the functionality there within. The only change in logs is that logGroupName now needs to be optional. This is the contract that should have existed from the beginning so we're not rewriting the same stuff over and over again. I'm not sure why we deviated here, but I think that was a mistake. If we're changing the contract now, it should be to fix that and not further deviate.

@tmokmss
Copy link
Contributor Author

tmokmss commented Sep 5, 2022

@TheRealAmazonKendra
According to the design guidelines, flat props structure is prefferred over nested structure. (doc). I guess it partially explains why most of the log properties are currently designed as such instead of #3 approach.

If we are to fully replace the existing contract design, and if the contract consistency is a top priority, how about adding a new interface to set log retention?

e.g.

interface ILoggable {
    setLogRetention(props: LogRetentionProps): LogRetention;
    logGroup: ILogGroup;
}

class Function implements ILoggable {
    setLogRetention(props: LogRetentionProps) {
        // ...
    }
}

const func = new lambda.Function(stack, 'AutoDeleteLogGroup', {
    code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
    handler: 'index.handler',
    runtime: lambda.Runtime.NODEJS_14_X,
});

func.setLogRetention({ removalPolicy: RemovalPolicy.DESTROY })

As long as they implement this interface, the API will be ensured to be consistent between constructs.

However I'm not sure if the above approach will work well with some constructs such as AppSync.GraphqlApi, which seemingly want to provide optimized experience by defining their own props (e.g. LogConfig). I guess we also need to discuss with people who are developing the constructs?

Also I think we should, ideally, define a dedicated props interface for setLogRetention instead of re-using the existing LogRetentionProps, because setting logGroupName here does not make sense at all even if logGroupName is optional. The log group name should always be computed from the resource properties (e.g. function name.) So we need something like Omit<LogRetentionProps, 'logGroupName'> instead (of course we cannot use generics with jsii so it should be manually defined.)

@tmokmss
Copy link
Contributor Author

tmokmss commented Sep 7, 2022

Hi @TheRealAmazonKendra
As I noted on the PR description, I found there will be an issue similar to #16603 in LogRetention construct (i.e. users may remove a log group unintentionally.) Is it something we must resolve before adding this PR's feature? If so I'll work on it before this PR.

@aws-cdk-automation
Copy link
Collaborator

This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

@mergify mergify bot dismissed TheRealAmazonKendra’s stale review September 24, 2022 08:54

Pull request has been modified.

@stec00
Copy link

stec00 commented Oct 12, 2022

@tmokmss Please could you advise on the current state of this PR (e.g. active, blocked or abandoned)?
It looks very useful to what my team is currently trying to achieve if it can be progressed/merged.

@l0b0
Copy link
Contributor

l0b0 commented Nov 21, 2022

Ping - We're hoping to avoid implementing a hacky workaround for the most obvious non-working solution.

@aws-cdk-automation
Copy link
Collaborator

This PR has been in the MERGE CONFLICTS state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

@aws-cdk-automation
Copy link
Collaborator

This PR has been in the MERGE CONFLICTS state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

@aws-cdk-automation
Copy link
Collaborator

This PR has been in the MERGE CONFLICTS state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label May 5, 2023
@aws-cdk-automation
Copy link
Collaborator

This PR cannot be merged because it has conflicts. Please resolve them. The PR will be considered stale and closed if it remains in an unmergeable state.

@mrgrain
Copy link
Contributor

mrgrain commented Jul 31, 2023

@tmokmss Apologies for the delay. Let's wrap up the open questions:

  1. what API will look like? (ref)

The API for logRetention is a huge mess and desperately needs to be refactored. In the meantime #2 is the way to go as per the design docs. It has a flat structure and uses RemovalPolicy. S3 Bucket actually is the only (documented) special case, so it kind of deserves a different property. Although arguably we could have created a RemovalPolicy.DESTROY_WITH_CONTENTS at the time.

  1. a log group can be accidentally deleted, as described in the PR body; do we need to handle this issue?

Yes, but that's already implemented in the CR anyway.

I also have one more thing I'd like to clarify:

If a user remove autoDeleteLogGroup property as below, the logRetention resource will be removed from a synthesized template, and it will result in the removal of log group itself. This implicit behavior should be clearly warned in the doc.

I don't see how this is happening with the code you have on the PR. Is this still the case, if so can you help me understand how?

@tmokmss
Copy link
Contributor Author

tmokmss commented Jul 31, 2023

Hi @mrgrain, thanks for the review!

I don't see how this is happening with the code you have on the PR. Is this still the case, if so can you help me understand how?

Sure. Let's say we have the following template deployed.

new lambda.Function(stack, 'Function', {
  code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
  handler: 'index.handler',
  runtime: lambda.Runtime.NODEJS_14_X,
  autoDeleteLogGroup: true,
});

The CFn template for this code looks like this:

  "FunctionF74A7E0D": {
   "Type": "AWS::Lambda::Function",
   "Properties": { ...
   }
  },
  "FunctionLogRetentionAC501FFB": {
   "Type": "Custom::LogRetention",
   "Properties": {
    ...
    "RemovalPolicy": "destroy"
   }
  }

Now if we remove the autoDeleteLogGroup property:

new lambda.Function(stack, 'Function', {
  code: new lambda.InlineCode('exports.handler = (event) => console.log(JSON.stringify(event));'),
  handler: 'index.handler',
  runtime: lambda.Runtime.NODEJS_14_X,
-   autoDeleteLogGroup: true,
});

the CFn template will look like this:

  "FunctionF74A7E0D": {
   "Type": "AWS::Lambda::Function",
   "Properties": { ...
   }
  },
-  "FunctionLogRetentionAC501FFB": {
-   "Type": "Custom::LogRetention",
-   "Properties": {
-    ...
-    "RemovalPolicy": "destroy"
-   }
-  }

Note that only the log retention resource is deleted, and the function remains as-is.

Here is where the unexpected behavior comes in. When a Custom::LogRetention with "RemovalPolicy": "destroy" is removed, it will always remove the log group, even though the log group is still used by the function.

//When the requestType is delete, delete the log group if the removal policy is delete
if (event.RequestType === 'Delete' && event.ResourceProperties.RemovalPolicy === 'destroy') {
await deleteLogGroup(logGroupName, logGroupRegion, retryOptions);
//else retain the log group
}

Actually there is no way for the custom resource to know if the corresponding function is removed or not. I thought we could query the Lambda service if the function exists or not, but that is also difficult, because of the dependency between a LogRetention and a function. A LogRetention depends on a corresponding function (to get the function name), so it must be deleted before the function.

const logRetention = new logs.LogRetention(this, 'LogRetention', {
logGroupName: `/aws/lambda/${this.functionName}`,

I still has not found a good solution for this problem, but any suggestion is welcome! :) I think the easiest approach is just to accept this behavior as expected, and document details about this.

@mrgrain
Copy link
Contributor

mrgrain commented Aug 1, 2023

@tmokmss Thanks for the explanation. I understand the problem now. I think the issue comes down to creating a false equality between the LogRetention custom resource and the LogGroup that the Lambda function uses. Before #21113 the CR did not concern itself with a the lifecycle of the log group. It was a simple fix to overwrite the retention. Actually #26049 has a similar issue in this regard.

Not quite sure what the solution would be for all of these... 🤔

@tmokmss
Copy link
Contributor Author

tmokmss commented Aug 1, 2023

@mrgrain Another solution would be to always define a LogRetention custom resource regardless of a function's properties. Then even if we remove those properties, the log retention resource and the log group will still be retained.

I hesitated to do this because it would require updates to many many existing integ tests. But If other features are also facing this problem, it might be reasonable to change the current behavior.

@mrgrain
Copy link
Contributor

mrgrain commented Aug 1, 2023

Been thinking about this all morning. It's fairly easy to implement a variation of the fix that S3's autoDeleteObjects has. But only if the parent resource (here: a Lambda Function) is known. It get's complicated for the general case because each resource uses a slightly different tagging interface. 😕

@mrgrain mrgrain marked this pull request as draft August 2, 2023 13:58
@mrgrain
Copy link
Contributor

mrgrain commented Aug 2, 2023

@tmokmss Thanks again for this PR. I'm drafting this for now. We have to solve the deletion issue first. Same as here: #26049

I have some ideas for it though.

@tmokmss
Copy link
Contributor Author

tmokmss commented Aug 2, 2023

@mrgrain ok, thanks! I'm looking forward to that:)

@mrgrain
Copy link
Contributor

mrgrain commented Aug 2, 2023

I'm planning to create a generic DependentLogGroup (or similar) that actually supports all log group properties.
To avoid the deletion of the log group, we need to tag the parent (e.g. Lambda Function) with a special tag. The only blocker is that it's not really possible to read tags from generic parent resource. I think we will have to configure them manually.

@tmokmss
Copy link
Contributor Author

tmokmss commented Aug 2, 2023

Nice! I just found issue #16756 and the discussions there appear to cover most related considerations. After reading them, I'm now convinced that tagging a parent resource is (maybe only) a viable solution. (I didn't know the special ordering of tags.)

Looking up the current deployed template (reverted in d220b94) is interesting but I guess it will also requires special handling for each parent resource type anyway.

@aws-cdk-automation
Copy link
Collaborator

This PR has been in the MERGE CONFLICTS state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 5d74c13
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@aws-cdk-automation aws-cdk-automation removed the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Aug 4, 2023
@mrgrain mrgrain closed this Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(lambda): Add property for log removal policy of Lambda function log groups
7 participants