NXDOMAIN errors should not be retried #2083

Merged
merged 1 commit into from Apr 7, 2023

Conversation

rem7 commented Apr 5, 2023

The standard retryer keeps retrying requests to hosts that do not resolve, which blocks the caller until all retry attempts are exhausted.
NXDOMAIN ('no such host') errors are not transient and should fail immediately. One example is a user entering an invalid region, so that the endpoint hostname does not resolve.

rem7 requested a review from a team as a code owner April 5, 2023 20:38

aajtodd left a comment

Thanks for the PR. You've provided motivation for the change, but a code example of what is currently not working would be helpful.

There are some minor correctness issues, but otherwise this looks ok.


switch {
case errors.As(err, &dnsError):
// NXDOMAIN errors should not be retried
retryable = false

correctness: Not all instances of net.DNSError are NXDOMAIN errors. The decision should be predicated on the fields of net.DNSError.

For example, something like:

case errors.As(err, &dnsError):
    // NXDOMAIN and other non-temporary DNS errors should not be retried
    retryable = !dnsError.IsNotFound && dnsError.IsTemporary

rem7 commented Apr 6, 2023

Hi @aajtodd, thanks for reviewing my code and providing feedback. I've updated the PR to include your suggestion and tested it.

Here is a code example. The use case is a user who tries to upload an object to Amazon S3 but enters the wrong region. If the user increases MaxAttempts, the delay is even more noticeable, because the request is retried after every NXDOMAIN failure.

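package main

import (
	"bytes"
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/retry"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

var (
	region = flag.String("region", "us-west-2", "region")
)

func main() {
	// Parse flags in main rather than init so the flag set is complete.
	flag.Parse()

	fmt.Printf("region input: %s\n", *region)
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		config.WithRegion(*region),
		// Raise MaxAttempts to make the retry delay easy to observe.
		config.WithRetryer(func() aws.Retryer {
			return retry.NewStandard(func(o *retry.StandardOptions) {
				o.MaxAttempts = 10
			})
		}))
	if err != nil {
		log.Fatal(err.Error())
	}

	client := s3.NewFromConfig(cfg)
	bucket := "bolyanko-temp"
	key := "test-object"
	// Upload an empty object; with an invalid region the hostname does not resolve.
	_, err = client.PutObject(context.TODO(), &s3.PutObjectInput{
		Bucket: &bucket,
		Key:    &key,
		Body:   new(bytes.Buffer),
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("object uploaded successfully\n")
}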
# with the code as-is today, it takes 1 min 49 s to fail:
time ./pathtest --region us-west-x
region input: us-west-x
panic: operation error S3: PutObject, exceeded maximum number of attempts, 10, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Put "https://bolyanko-temp.s3.us-west-x.amazonaws.com/test-object?x-id=PutObject": dial tcp: lookup bolyanko-temp.s3.us-west-x.amazonaws.com: no such host

goroutine 1 [running]:
main.main()
	/Users/bolyanko/src/s3-obj-lambda/pathTests/main.go:45 +0x318
./pathtest --region us-west-x  0.01s user 0.02s system 0% cpu 1:48.99 total
# with this PR it fails immediately
time ./pathtest --region us-west-x
region input: us-west-x
panic: operation error S3: PutObject, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Put "https://bolyanko-temp.s3.us-west-x.amazonaws.com/test-object?x-id=PutObject": dial tcp: lookup bolyanko-temp.s3.us-west-x.amazonaws.com: no such host

goroutine 1 [running]:
main.main()
	/Users/bolyanko/src/s3-obj-lambda/pathTests/main.go:45 +0x318
./pathtest --region us-west-x  0.00s user 0.01s system 1% cpu 0.706 total 

aajtodd merged commit d40a16e into aws:main Apr 7, 2023
18 of 19 checks passed