Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor instance/region discovery #763

Merged
merged 4 commits into from
Dec 20, 2023

Conversation

Avery-Dunn
Copy link
Contributor

During the discussions about #762 it was pointed out that over the years the code flow for instance and region discovery has gotten messy and inefficient.

This PR attempts to refactor the logic around it to make the code flow clearer and avoid making unnecessary/repetitive method calls.

if (msalRequest.application().azureRegion() != null) {
host = getRegionalizedHost(authorityUrl.getHost(), msalRequest.application().azureRegion());
} else if (msalRequest.application().autoDetectRegion() && detectedRegion != null) {
msalRequest.application().azureRegion = detectedRegion;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you also update the host in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest commit I refactored some of the logic here:

  1. The part in the if block has been moved up closer to the start of the method, so if the application's azureRegion value is set it will do the host = getRegionalizedHost... and the regional host will be used when looking through the cache
  2. The part in the else if block has been adjusted so the host = getRegionalizedHost... is done if autodetection succeeds and the azureRegion value was not already set

}

InstanceDiscoveryMetadataEntry result = cache.get(host);
//If there is no cached instance metadata, do instance and region discovery and cache the result
if (cache.get(host) == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this top level logic needs logging statements, which will help investiagte issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest commit I added debug logs for all the major if blocks: when instance discovery is disabled, when there's no cached metadata, when region API is used, etc.

serviceBundle.getServerSideTelemetry().getCurrentRequest().regionOutcome(
determineRegionOutcome(detectedRegion, msalRequest.application().azureRegion(), msalRequest.application().autoDetectRegion()));
//If instanceDiscovery flag set to false or instance discovery previously failed, do not do instance discovery
if (!msalRequest.application().instanceDiscovery() || instanceDiscoveryFailed) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it might be simpler to do smth like:

  • if a cached entry exists, use it
  • if not and instance discovery has been disabled, add the 1-host dumb entry to the cache. Same if instance discovery fails.

makes the code shorter and removes the need for having instanceDiscoveryFailed flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest commit I rearranged it so this is now the pattern:

  1. If instance discovery is disabled, just add a default entry (if it doesn't already exist) and return it
  2. If a region is set at the application level, adjust the host variable to use it
  3. If no cached entry for host exists, then do instance discovery
  4. If any part of the region API is used, try to detect the region
  5. If region autodetect enabled and a region wasn't set at the application level, set the region at the application level and adjust the host to use it
  6. Do instance discovery for host, and add the default entry if it fails
  7. Create cache entries for whatever host is at the end of all that
  8. Return the cached result

This means on future runs the cached entry will be returned quicker, because:

  • Instance discovery was disabled, so step 1 returns immediately
  • Instance discovery failed and a default entry for host was cached, so it skips steps 3-7 and returns the cached entry
  • Instance discovery succeeded and either a regional or non-regional entry for host was cached, so it skips steps 3-7 and returns the cached entry

if (cache.get(host) == null) {
if (shouldUseRegionalEndpoint(msalRequest)) {
//Server side telemetry requires the result from region discovery when any part of the region API is used
String detectedRegion = discoverRegion(msalRequest, serviceBundle);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the request provides an azureRegion, do you still need to do the discovery? Might be better to use the if(masalRequest.application().azureRegion() != null) logic before making this call.

Suggested change
String detectedRegion = discoverRegion(msalRequest, serviceBundle);
if (msalRequest.application().azureRegion() != null) {
host = getRegionalizedHost(authorityUrl.getHost(), msalRequest.application().azureRegion());
}
else if (msalRequest.application().autoDetectRegion() ) {
//Server side telemetry requires the result from region discovery when any part of the region API is used
String detectedRegion = discoverRegion(msalRequest, serviceBundle);
msalRequest.application().azureRegion = detectedRegion;
host = getRegionalizedHost(authorityUrl.getHost(), msalRequest.application().azureRegion());
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble finding the design doc, but there's a list of region telemetry values for different scenarios, including:

  • A developer set the region, and MSAL is able to detect the same one
  • A developer set the region, but MSAL found a different one
  • A developer set the region, but MSAL's autodetect failed for some reason

This means that we need to do autodetection whenever any part of the region API is used, unless we decide to just not collect this telemetry. I believe this behavior is the same in .NET and other MSALs too (feel free to correct me on that @bgavrilMS).

Our RegionTelemetry enum gives a good overview of all the different scenarios we look for: https://github.com/AzureAD/microsoft-authentication-library-for-java/blob/dev/msal4j-sdk/src/main/java/com/microsoft/aad/msal4j/RegionTelemetry.java

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we will try to auto-detect the region even if the app developer tells us "use region X". Auto-detection should only happen once per process (even if it fails) and I believe is set to timeout after 2 sec.

If this is too slow, we can discuss about removing this logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to these changes we were incorrectly doing autodetection every time a region was used, since the code to do it was happening before the code to check if instance metadata was cached.

With these changes autodetection happens only if no instance is cached, so it'll only happen once for a given authority/region and thus the 2 second timeout will also only happen once.

@Avery-Dunn Avery-Dunn merged commit c5a14c4 into dev Dec 20, 2023
5 checks passed

//If region autodetection is enabled and a specific region was not already set, set the application's
// region to the discovered region so that future requests can skip the IMDS endpoint call
if (msalRequest.application().azureRegion() == null

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Avery-Dunn don't you need to record the telemetry before this block? As is if the developer didn't provide the region and only did auto-detect it is going to look like the developer provided the region and the auto-detect got the same result as provided.

@Avery-Dunn Avery-Dunn deleted the avdunn/instance-discovery-improvements branch March 11, 2024 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants