`NewInstancesProtectedFromScaleIn` causing ASGs to take ages to update. #977

toothbrush · 2021-12-22T00:06:47Z

Good day! 👋

Describe the bug

We use realestate's stackup to manage rollouts of the aws-stack.yml you provide. Mostly works great. In 178c253 you added `NewInstancesProtectedFromScaleIn: true` to the ASG, but the behaviour i'm now seeing is that when i make a change to the AWS Elastic stack (e.g. an updated AMI ID), the old ASG takes ages (1h+) to delete/stabilise, since the members are protected from scale-in.

Steps To Reproduce

Steps to reproduce the behavior:

1. Spin up the AWS elastic stack
2. wait for it to be ready and CREATE_COMPLETE
3. change a parameter, e.g. AMI ID
4. a new LaunchTemplate will be created and the old one will attempt to delete (along with its instances) but will hang for... long.

Expected behavior

Previously, updates were pretty snappy because the old ASG members would just be terminated.

Stack parameters (please complete the following information):

AWS Region: us-east-1
Version: v5.7.2

toothbrush · 2021-12-22T00:12:37Z

Ah just looking around, maybe i'm in fact being re-bitten by #927?

In any case i'm not convinced that the instance protection thing is good for us.

eleanorakh · 2022-01-11T04:40:17Z

Hey hey @toothbrush. Thanks for reporting, we'll look into it!

freewil · 2022-01-24T15:48:58Z

Somewhat related is #768

Also been working to create custom AMIs and update the stack via ImageIdParameter. This causes a new ASG to be created, which is a problem in my case, since the instances in the old ASG will be terminated, which causes in-progress jobs to fail.

freewil · 2022-01-24T15:53:27Z

> when i make a change to the AWS Elastic stack (e.g. an updated AMI ID), the old ASG takes ages (1h+) to delete/stabilise, since the members are protected from scale-in.

I've run into this issue myself when i want to scale down a stack rapidly (~400 instances to 0) via manually changing the ASG desired count/min/max values. It hasn't been a major issue for me as this is typically only done when launching new stacks to replace old stacks. I simply go to the instance management tab for the ASG in the AWS console and manually remove scale-in protection to speed up the scale down.

huguesb · 2022-06-10T19:16:06Z

I am running into this issue with v5.9.0 of the stack and have had to repeatedly go into the AWS console to manually disable protection for instances in the old stacks to allow the update to complete. This is obnoxious! It makes even the smallest configuration changes a major PITA, especially when some stacks have thousands of instances and AWS only allows removal of scale-in protection in batches of 50...

Please fix this asap.
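
The manual console workaround described in this thread can probably be scripted. Below is a minimal sketch, not an official fix: `remove_scale_in_protection` and `chunked` are hypothetical helper names, `client` is assumed to be a boto3 `autoscaling` client, and the batch size of 50 is taken from the limit mentioned above.

```python
BATCH_SIZE = 50  # assumption: at most 50 instance IDs per set_instance_protection call


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def remove_scale_in_protection(client, asg_name, batch_size=BATCH_SIZE):
    """Unprotect every protected instance in `asg_name` and return their IDs."""
    groups = client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    # Only touch instances that are currently protected from scale-in.
    protected = [
        inst["InstanceId"]
        for group in groups
        for inst in group["Instances"]
        if inst.get("ProtectedFromScaleIn")
    ]
    # Send the IDs in batches so no single call exceeds the assumed limit.
    for batch in chunked(protected, batch_size):
        client.set_instance_protection(
            AutoScalingGroupName=asg_name,
            InstanceIds=batch,
            ProtectedFromScaleIn=False,
        )
    return protected
```

It could be called as `remove_scale_in_protection(boto3.client("autoscaling"), "name-of-the-old-asg")`, where the ASG name is a placeholder. Passing the client in keeps the helper easy to exercise against a stub before pointing it at a real account.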

gitlon · 2022-08-25T04:52:48Z

We made a hacky but effective fix for this problem by co-opting the AzRebalancingSuspenderFunction to remove scale-in-protection for running instances when the stack is updated or deleted. We're able to do this in our solution because we fork the ElasticCI template for other reasons. This also required changes to the function's role/permission and timeout/duration.

e.g.:

    client = boto3.client('autoscaling')
    props = event['ResourceProperties']

    if event['RequestType'] in ('Delete', 'Update'):
      # Stack update or delete: lift scale-in protection so the old ASG can drain.
      instances = client.describe_auto_scaling_instances()['AutoScalingInstances']
      instances = [i['InstanceId'] for i in instances if i['AutoScalingGroupName'] == props['AutoScalingGroupName']]
      if instances:
        response = client.set_instance_protection(InstanceIds=instances, AutoScalingGroupName=props['AutoScalingGroupName'], ProtectedFromScaleIn=False)
    else:
      # Create: the function's original job, suspending AZ rebalancing.
      response = client.suspend_processes(AutoScalingGroupName=props['AutoScalingGroupName'], ScalingProcesses=['AZRebalance'])

etc.
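
One caveat for anyone adapting the snippet: as far as I know, `describe_auto_scaling_instances` called without arguments covers every ASG in the account and region and returns its results in pages, so a single call can miss instances on larger stacks. A minimal sketch of a paginated lookup follows; `instance_ids_in_group` is a hypothetical helper name and `client` is assumed to be a boto3 `autoscaling` client.

```python
def instance_ids_in_group(client, asg_name):
    """Collect the IDs of all instances that belong to `asg_name`."""
    paginator = client.get_paginator("describe_auto_scaling_instances")
    ids = []
    # Walk every page instead of relying on the first response only.
    for page in paginator.paginate():
        for inst in page["AutoScalingInstances"]:
            if inst["AutoScalingGroupName"] == asg_name:
                ids.append(inst["InstanceId"])
    return ids
```

The returned list can still be long, so it may need splitting before it is handed to `set_instance_protection`.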
