[deliver] increase chances of success when creating a new app version even when Apple servers are degraded #21742

Merged Feb 9, 2024 (9 commits)
9 changes: 9 additions & 0 deletions deliver/lib/deliver/options.rb
@@ -182,6 +182,15 @@ def self.available_options
description: "Rejects the previously submitted build if it's in a state where it's possible",
type: Boolean,
default_value: false),
FastlaneCore::ConfigItem.new(key: :version_check_wait_retry_limit,
env_name: "DELIVER_VERSION_CHECK_WAIT_RETRY_LIMIT",
description: "After submitting a new version, App Store Connect takes some time to recognize the new version and we must wait until it's available before attempting to upload metadata for it. There is a mechanism that will check if it's available and retry with an exponential backoff if it's not available yet. " \
"This option specifies how many times we should retry before giving up. Setting this to a value below 5 is not recommended and will likely cause failures. Increase this parameter when Apple servers seem to be degraded or slow",
type: Integer,
default_value: 7,
verify_block: proc do |value|
UI.user_error!("'#{value}' needs to be greater than 0") if value <= 0
end),

# release
FastlaneCore::ConfigItem.new(key: :automatic_release,
36 changes: 22 additions & 14 deletions deliver/lib/deliver/upload_metadata.rb
@@ -89,7 +89,7 @@ def upload(options)
enabled_languages = detect_languages(options)

app_store_version_localizations = verify_available_version_languages!(options, app, enabled_languages) unless options[:edit_live]
app_info = fetch_edit_app_info(app)
app_info = fetch_edit_app_info(app, max_retries: options[:version_check_wait_retry_limit])
Collaborator

I would consider creating an object called RetryConfig and moving all fields related to the retry (max_retries, initial_wait_time) into it.

This would change the retry_if_nil method like this:

    def retry_if_nil(message, retry_config:)
      wait_time = retry_config[:initial_wait_time]
      tries = retry_config[:tries]

and all other calls would pass a single parameter that could be initialized in one place.

Does that make sense?
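
A minimal sketch of the RetryConfig idea above, for illustration only. `RetryConfig` as a `Struct` and the injectable `sleeper:` parameter are assumptions made here so the example runs without real delays; neither is part of fastlane. The sketch also returns before sleeping once the tries are exhausted.

```ruby
# Hypothetical value object bundling the retry settings (not fastlane API).
RetryConfig = Struct.new(:tries, :initial_wait_time, keyword_init: true)

# Retries the block until it returns a non-nil value or the tries run out.
# `sleeper` is an assumption: it lets callers replace the real sleep.
def retry_if_nil(message, retry_config:, sleeper: ->(seconds) { sleep(seconds) })
  wait_time = retry_config[:initial_wait_time]
  tries = retry_config[:tries]
  loop do
    tries -= 1
    value = yield
    return value unless value.nil?
    return nil if tries.zero?

    puts "#{message}... Retrying after #{wait_time} seconds (remaining: #{tries})"
    sleeper.call(wait_time)
    wait_time *= 2 # Double the wait time for the next iteration
  end
end

# Usage: the config is built once and passed to every call site.
config = RetryConfig.new(tries: 3, initial_wait_time: 10)
attempts = 0
recorded_waits = []
result = retry_if_nil("Cannot find edit app info", retry_config: config,
                      sleeper: ->(seconds) { recorded_waits << seconds }) do
  attempts += 1
  attempts == 3 ? :app_info : nil
end
puts "result=#{result.inspect} attempts=#{attempts} waits=#{recorded_waits.inspect}"
```

With this shape, call sites pass one `retry_config:` argument instead of repeating `max_retries:` and `initial_wait_time:`.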

Member Author

Yeah, it does. I'm just not sure it's worth unifying these into an object, since it's only 2 configs. I usually do this type of thing once there are 3 or more parameters 👀 Do you think this would make a significant difference, e.g. in terms of UX/DX, testability, or maintainability? @lacostej

Collaborator

We fetch options[:version_check_wait_retry_limit] 5 times in the code; I suspect there is a way to make this a bit better.

Collaborator

Here's a proposal

  1. We do not add any retry-related parameter to the fetch_* methods. We even remove the wait_time one.
  2. We also remove the parameters from retry_if_nil and instead initialize initial_wait_time from options within the implementation.

This should remove most retry-specific information from most of the code.

  3. That leaves the question of the tests. I guess we pass a value of 0.01 to ensure that they do not take too long. We could stub the sleep call instead and just ensure it gets called the right number of times. It will even make the tests slightly faster. Stubbing sleep requires a bit more work to do right.
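
To illustrate the "stub sleep and count the calls" idea from the last point, here is a rough plain-Ruby sketch. `RetryingFetcher` and `RecordingFetcher` are hypothetical stand-ins, not fastlane classes, and unlike the diff in this PR the sketch does not sleep after the final failed try. In the real spec the same effect would more likely come from RSpec's `allow(uploader).to receive(:sleep)` combined with `have_received(:sleep)`.

```ruby
# Simplified stand-in for the uploader; names are hypothetical.
class RetryingFetcher
  def initialize(tries:, initial_wait_time:)
    @tries = tries
    @initial_wait_time = initial_wait_time
  end

  # Retry settings come from the instance, so callers pass no retry arguments.
  def retry_if_nil(message)
    wait_time = @initial_wait_time
    tries = @tries
    loop do
      tries -= 1
      value = yield
      return value unless value.nil?
      return nil if tries.zero?

      puts "#{message}... Retrying after #{wait_time} seconds (remaining: #{tries})"
      sleep(wait_time) # resolved on self, so a subclass or stub can replace it
      wait_time *= 2
    end
  end
end

# Test double: records the requested sleeps instead of waiting.
class RecordingFetcher < RetryingFetcher
  def sleeps
    @sleeps ||= []
  end

  def sleep(seconds)
    sleeps << seconds
  end
end

fetcher = RecordingFetcher.new(tries: 5, initial_wait_time: 10)
calls = 0
result = fetcher.retry_if_nil("Cannot find edit app store version") do
  calls += 1
  nil # simulate App Store Connect never returning the version
end
puts "calls=#{calls} sleeps=#{fetcher.sleeps.inspect} result=#{result.inspect}"
```

Because no real waiting happens, the test can use production-sized wait times and still assert on the exact backoff sequence.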

Member Author

@rogerluan Jan 15, 2024

@lacostej I think this would be a good approach, I'll try to come back to this later this week 🙏

If you have a clear understanding of how to implement this and want to take a stab at it before I have time to get to it, that'd probably help us get this out the door faster, as I don't think I'll have much availability in the next few days 🙇 If not, all good too! 🙏

Member Author

Hey team, I won't have time to look into this in the near future 😢 @lacostej or anyone else, feel free to pick this up 🙏 I'm also OK with merging this as is and creating a separate task to refactor later (although I suspect it won't be picked up because it won't be a high priority 😞)

Member

@mollyIV Feb 8, 2024

Hello 👋 I will take it over 😊

Member

The follow-up pull request that addresses requested changes: #21861 🙇

Collaborator

@mollyIV

Yes, this is good. If you look at the overall combined PR, I think the code is cleaner.

Thanks for spending the time to do this!

app_info_localizations = verify_available_info_languages!(options, app, app_info, enabled_languages) unless options[:edit_live] || !updating_localized_app_info?(options, app, app_info)

if options[:edit_live]
@@ -100,7 +100,7 @@ def upload(options)

if version.nil?
UI.message("Couldn't find live version, editing the current version on App Store Connect instead")
version = fetch_edit_app_store_version(app, platform)
version = fetch_edit_app_store_version(app, platform, max_retries: options[:version_check_wait_retry_limit])
# we don't want to update the localised_options and non_localised_options
# as we also check for `options[:edit_live]` at other areas in the code
# by not touching those 2 variables, deliver is more consistent with what the option says
@@ -109,7 +109,7 @@ def upload(options)
UI.message("Found live version")
end
else
version = fetch_edit_app_store_version(app, platform)
version = fetch_edit_app_store_version(app, platform, max_retries: options[:version_check_wait_retry_limit])
localised_options = (LOCALISED_VERSION_VALUES.keys + LOCALISED_APP_VALUES.keys)
non_localised_options = NON_LOCALISED_VERSION_VALUES.keys
end
@@ -427,41 +427,49 @@ def detect_languages(options)
.uniq
end

def fetch_edit_app_store_version(app, platform, wait_time: 10)
retry_if_nil("Cannot find edit app store version", wait_time: wait_time) do
def fetch_edit_app_store_version(app, platform, max_retries:, initial_wait_time: 10)
retry_if_nil("Cannot find edit app store version", tries: max_retries, initial_wait_time: initial_wait_time) do
app.get_edit_app_store_version(platform: platform)
end
end

def fetch_edit_app_info(app, wait_time: 10)
retry_if_nil("Cannot find edit app info", wait_time: wait_time) do
def fetch_edit_app_info(app, max_retries:, initial_wait_time: 10)
retry_if_nil("Cannot find edit app info", tries: max_retries, initial_wait_time: initial_wait_time) do
app.fetch_edit_app_info
end
end

def fetch_live_app_info(app, wait_time: 10)
retry_if_nil("Cannot find live app info", wait_time: wait_time) do
def fetch_live_app_info(app, max_retries:, initial_wait_time: 10)
retry_if_nil("Cannot find live app info", tries: max_retries, initial_wait_time: initial_wait_time) do
app.fetch_live_app_info
end
end

def retry_if_nil(message, tries: 5, wait_time: 10)
# Retries a block of code if the return value is nil, with an exponential backoff.
def retry_if_nil(message, tries:, initial_wait_time: 10)
Member

CI will time out in this retry method when the number of tries is high: because we double the wait after every retry, a single wait can exceed 10 minutes with no console output.
A better solution could be to keep printing the retry status at regular intervals to avoid the CI timeout. Please see the attached modified version of this method, the diff, and the output. cc: @rogerluan

  def retry_if_nil(message, tries:, initial_wait_time: 10)
    wait_time = initial_wait_time
    loop do
      tries -= 1

      value = yield
      return value if value
      UI.message("#{message}... Retrying after #{wait_time} seconds (remaining: #{tries} tries)")
      
      retrying_status_time = 0
      while retrying_status_time < wait_time  do
        UI.message("Retrying status: #{retrying_status_time}/#{wait_time} seconds")
        sleep(initial_wait_time)
        retrying_status_time += initial_wait_time
      end

      return nil if tries.zero?

      wait_time *= 2 # Double the wait time for the next iteration
    end
  end

Diff: [screenshot, 2024-01-06 at 00:35:52]

Output: [screenshot, 2024-01-06 at 00:19:48]
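
To make the timeout concern concrete, here is a small sketch that lists the waits produced by pure doubling. `uncapped_waits` and `CI_SILENCE_LIMIT` are illustrative names, and the 10-minute limit is the figure assumed in this thread, not a value read from any CI provider.

```ruby
# Returns the wait (in seconds) before each retry when the delay doubles every time.
def uncapped_waits(tries:, initial_wait_time: 10)
  Array.new(tries) { |attempt| initial_wait_time * (2**attempt) }
end

CI_SILENCE_LIMIT = 10 * 60 # assumed "no console output" timeout, in seconds

uncapped_waits(tries: 7).each_with_index do |wait, index|
  flag = wait >= CI_SILENCE_LIMIT ? " <- longer than the assumed CI limit" : ""
  puts "retry #{index + 1}: sleep #{wait}s#{flag}"
end
```

With the default of 7 tries and a 10-second initial wait, the seventh wait is 640 seconds, which is longer than 10 minutes of silence.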

Member Author

This is a very good point @crazymanish !

I'm torn between your implementation (which is great IMO) and incorporating the feedback discussed earlier in this PR about backing off exponentially only up to a certain threshold (e.g. 5 minutes) and retrying every 5 minutes from then on. As long as this threshold is under 10 minutes (the default console log timeout used in many CI services), we should be good.

What do you think? Should we incorporate your code regardless of the decision above? There's no harm in it, just makes code more complicated (justifiably so)

Member

Aha! I missed the threshold comment. Any retry logic that stays under the CI timeout (10 minutes) sounds good.
We should be good as long as CI does not time out and DELIVER_VERSION_CHECK_WAIT_RETRY_LIMIT is respected even when the retry limit is high, e.g. 10 or 15 retries.


Another small thing we could do is make the threshold customizable. This would give our users the freedom to retry every 1 minute, 3 minutes, 5 minutes, or any X minutes (based on their CI timeout).

  • Define a DELIVER_VERSION_CHECK_WAIT_RETRY_THRESHOLD env variable or version_check_wait_retry_threshold param
  • Make this threshold 3 minutes by default to avoid CI timeouts; it should be easy to implement too:
wait_time = min(wait_time * 2, version_check_wait_retry_threshold * 60)

btw, Thank you very much for doing the good work in this PR, it seems lots of people are facing this issue!!
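
A rough sketch of the capped backoff suggested above. `capped_waits` and its `threshold_minutes:` argument are hypothetical; the argument only mirrors the proposed `version_check_wait_retry_threshold` option, which does not exist in fastlane as far as this thread shows.

```ruby
# Returns the wait (in seconds) before each retry: doubling, but never above the cap.
# threshold_minutes mirrors the hypothetical version_check_wait_retry_threshold option.
def capped_waits(tries:, initial_wait_time: 10, threshold_minutes: 3)
  cap = threshold_minutes * 60
  wait_time = [initial_wait_time, cap].min
  Array.new(tries) do
    current = wait_time
    wait_time = [wait_time * 2, cap].min # Ruby spelling of min(a, b)
    current
  end
end

p capped_waits(tries: 7)                       # capped at 3 minutes
p capped_waits(tries: 7, threshold_minutes: 5) # capped at 5 minutes
```

Either cap keeps every individual wait below the 10-minute silence window discussed here, so the per-retry log message is enough to keep CI alive.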

Member Author

Thanks for the suggestion @crazymanish 🙏 I usually try to avoid bloating up the actions/tools with so many options, and I think in this case there's no need to allow that level of customization. I'd go with that implementation and go for a safe fixed number of 5 minutes which seems to be reasonable and work with every CI provider I can think of 👀 do you feel strongly we should allow customization of this by the users?

I'm going to go ahead and implement this now but I can change later too 🙏

wait_time = initial_wait_time
loop do
tries -= 1

value = yield
return value if value

UI.message("#{message}... Retrying after #{wait_time} seconds (remaining: #{tries})")
sleep(wait_time)
# Calculate sleep time to be the lesser of the exponential backoff or 5 minutes.
# This prevents problems with CI's console output timeouts (of usually 10 minutes), and also
# speeds up the retry time for the user, as waiting longer than 5 minutes is a too long wait for a retry.
sleep_time = [wait_time * 2, 5 * 60].min
UI.message("#{message}... Retrying after #{sleep_time} seconds (remaining: #{tries})")
sleep(sleep_time)

return nil if tries.zero?

wait_time *= 2 # Double the wait time for the next iteration
end
end

# Checking if the metadata to update includes localised App Info
def updating_localized_app_info?(options, app, app_info)
app_info ||= fetch_live_app_info(app)
app_info ||= fetch_live_app_info(app, max_retries: options[:version_check_wait_retry_limit])
unless app_info
UI.important("Can't find edit or live App info. Skipping upload.")
return false
@@ -533,7 +541,7 @@ def verify_available_info_languages!(options, app, app_info, languages)
# Finding languages to enable
def verify_available_version_languages!(options, app, languages)
platform = Spaceship::ConnectAPI::Platform.map(options[:platform])
version = fetch_edit_app_store_version(app, platform)
version = fetch_edit_app_store_version(app, platform, max_retries: options[:version_check_wait_retry_limit])

unless version
UI.user_error!("Cannot update languages - could not find an editable version for '#{platform}'")
40 changes: 25 additions & 15 deletions deliver/spec/upload_metadata_spec.rb
@@ -182,22 +182,22 @@ def create_metadata(path, text)
it "no retry" do
expect(app).to receive(:get_edit_app_store_version).and_return(version)

edit_version = uploader.fetch_edit_app_store_version(app, 'IOS', wait_time: 0.1)
edit_version = uploader.fetch_edit_app_store_version(app, 'IOS', max_retries: 5, initial_wait_time: 0.1)
expect(edit_version).to eq(version)
end

it "1 retry" do
expect(app).to receive(:get_edit_app_store_version).and_return(nil)
expect(app).to receive(:get_edit_app_store_version).and_return(version)

edit_version = uploader.fetch_edit_app_store_version(app, 'IOS', wait_time: 0.1)
edit_version = uploader.fetch_edit_app_store_version(app, 'IOS', max_retries: 5, initial_wait_time: 0.01)
expect(edit_version).to eq(version)
end

it "5 retry" do
expect(app).to receive(:get_edit_app_store_version).and_return(nil).exactly(5).times

edit_version = uploader.fetch_edit_app_store_version(app, 'IOS', wait_time: 0.1)
edit_version = uploader.fetch_edit_app_store_version(app, 'IOS', max_retries: 5, initial_wait_time: 0.01)
expect(edit_version).to eq(nil)
end
end
@@ -206,22 +206,22 @@ def create_metadata(path, text)
it "no retry" do
expect(app).to receive(:fetch_edit_app_info).and_return(app_info)

edit_app_info = uploader.fetch_edit_app_info(app, wait_time: 0.1)
edit_app_info = uploader.fetch_edit_app_info(app, max_retries: 5, initial_wait_time: 0.01)
expect(edit_app_info).to eq(app_info)
end

it "1 retry" do
expect(app).to receive(:fetch_edit_app_info).and_return(nil)
expect(app).to receive(:fetch_edit_app_info).and_return(app_info)

edit_app_info = uploader.fetch_edit_app_info(app, wait_time: 0.1)
edit_app_info = uploader.fetch_edit_app_info(app, max_retries: 5, initial_wait_time: 0.01)
expect(edit_app_info).to eq(app_info)
end

it "5 retry" do
expect(app).to receive(:fetch_edit_app_info).and_return(nil).exactly(5).times

edit_app_info = uploader.fetch_edit_app_info(app, wait_time: 0.1)
edit_app_info = uploader.fetch_edit_app_info(app, max_retries: 5, initial_wait_time: 0.01)
expect(edit_app_info).to eq(nil)
end
end
@@ -258,7 +258,8 @@ def create_metadata(path, text)
platform: "ios",
metadata_path: metadata_path,
name: { "en-US" => "App name" },
description: { "en-US" => "App description" }
description: { "en-US" => "App description" },
version_check_wait_retry_limit: 5,
}

# Get number of versions (used for if whats_new should be sent)
@@ -295,7 +296,8 @@ def create_metadata(path, text)
platform: "ios",
metadata_path: metadata_path,
privacy_url: { "en-US" => "https://fastlane.tools" },
apple_tv_privacy_policy: { "en-US" => "https://fastlane.tools/tv" }
apple_tv_privacy_policy: { "en-US" => "https://fastlane.tools/tv" },
version_check_wait_retry_limit: 5,
}

# Get number of versions (used for if whats_new should be sent)
@@ -319,7 +321,8 @@ def create_metadata(path, text)
options = {
platform: "ios",
metadata_path: metadata_path,
auto_release_date: 1_595_395_800_000
auto_release_date: 1_595_395_800_000,
version_check_wait_retry_limit: 5,
}

# Get number of version (used for if whats_new should be sent)
@@ -343,7 +346,8 @@ def create_metadata(path, text)
platform: "ios",
metadata_path: metadata_path,
phased_release: true,
automatic_release: false
automatic_release: false,
version_check_wait_retry_limit: 5,
}

# Get number of version (used for if whats_new should be sent)
@@ -372,7 +376,8 @@ def create_metadata(path, text)
options = {
platform: "ios",
metadata_path: metadata_path,
phased_release: false
phased_release: false,
version_check_wait_retry_limit: 5,
}

# Get number of version (used for if whats_new should be sent)
@@ -400,7 +405,8 @@ def create_metadata(path, text)
options = {
platform: "ios",
metadata_path: metadata_path,
reset_ratings: true
reset_ratings: true,
version_check_wait_retry_limit: 5,
}

# Get number of version (used for if whats_new should be sent)
@@ -425,7 +431,8 @@ def create_metadata(path, text)
options = {
platform: "ios",
metadata_path: metadata_path,
reset_ratings: false
reset_ratings: false,
version_check_wait_retry_limit: 5,
}

# Get number of version (used for if whats_new should be sent)
@@ -455,6 +462,7 @@ def create_metadata(path, text)
options = {
platform: "ios",
metadata_path: metadata_path,
version_check_wait_retry_limit: 5,
}

# Get live app info
@@ -471,7 +479,8 @@ def create_metadata(path, text)
options = {
platform: "ios",
metadata_path: metadata_path,
name: { "en-US" => "App name" }
name: { "en-US" => "App name" },
version_check_wait_retry_limit: 5,
}

# Get live app info
@@ -497,7 +506,8 @@ def create_metadata(path, text)
options = {
platform: "ios",
metadata_path: metadata_path,
name: { "en-US" => "New app name" }
name: { "en-US" => "New app name" },
version_check_wait_retry_limit: 5,
}

allow(Deliver).to receive(:cache).and_return({ app: app })