Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Alpine Target RID #40843

Merged
merged 3 commits into from
May 23, 2024
Merged

Fix Alpine Target RID #40843

merged 3 commits into from
May 23, 2024

Conversation

ellahathaway
Copy link
Member

Closes dotnet/source-build#4393

Looking at the artifacts produced in this build (internal Microsoft SDK), the alpine target RID was 3.19-x64 but is now 3.19.1-x64. This needs to be updated in the sdk diff test pipeline so that the correct build artifacts can be found.

@ellahathaway ellahathaway requested review from a team as code owners May 13, 2024 16:17
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Infrastructure untriaged Request triage from a team member labels May 13, 2024
@ellahathaway ellahathaway enabled auto-merge (squash) May 13, 2024 17:04
Copy link
Member

@mthalman mthalman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did the RID change though?

@richlander
Copy link
Member

I have the same question as @mthalman. This seems very concerning from a maintenance standpoint. Also, why are we using Alpine RIDs at all?

@mthalman mthalman disabled auto-merge May 13, 2024 17:40
@ellahathaway
Copy link
Member Author

Why did the RID change though?

Hm I'm not sure. Looks like when the build starts, the build environment is Build Environment: x64 Release linux alpine.3.19.1-x64 (internal link). Continuing to investigate.

@richlander
Copy link
Member

I'm not entirely sure how these RIDs are being used. We're no longer using RIDs like these "in the product". We've switched entirely to portable RIDs like linux and linux-musl. It would be good if we could do the same elsewhere.

Context: dotnet/runtime#83246

If we need a solution that describes a specific OS, it would be good to call them something other than RIDs.

@ellahathaway
Copy link
Member Author

ellahathaway commented May 13, 2024

The issue is that the _DistroRid value was changed in dotnet/runtime#100580. This was then ported to Arcade in dotnet/arcade#14752. Not sure how to proceed given this finding. A simple fix would be to add the calculation in ./build.sh if it can't go back into init-distro-rid.sh.

That said, I'm not entirely sure why this change in calculation was made to begin with. @am11 - Can you give guidance here please?

We've switched entirely to portable RIDs like linux and linux-musl. It would be good if we could do the same elsewhere.

The issue of using a portable rid like linux-musl in the SB product is a bit beyond the scope of the original issue. @MichaelSimons, do you have thoughts on this?

@richlander
Copy link
Member

OK. I'm getting a bit more clarity with those links.

Let's we set the RID as 3.19.1 (the architecture doesn't matter for this conversation). Which other RIDs are compatible / configured for fall back? In particular, would 3.19 or 3.18 assets be considered compatible?

@am11
Copy link
Member

am11 commented May 13, 2024

Explicitly specifying the TargetRid for non-portable builds is the right way to go about it.

That said, I'm not entirely sure why this change in calculation was made to begin with. @am11 - Can you give guidance here please?

@ellahathaway, the caller of VMR's build.sh can pass -p:TargetRid= from the command line. The reason is we don't want to maintain a long list of distros (there are hundreds of them out there). This gives VMR freedom to pass in the value based on given platform community's preferred naming convention for non-portable RID. Some prefer no version (rolling releases), some have major version only etc. so maintaining that info by ourselves across the repos is less meaningful since .NET 8. For MSFT builds, I think having major.minor for Alpine is enough, but in the end, it doesn't really have any meaningful effect (now that the RID graph is gone).

@mthalman
Copy link
Member

Let's we set the RID as 3.19.1 (the architecture doesn't matter for this conversation). Which other RIDs are compatible / configured for fall back? In particular, would 3.19 or 3.18 assets be considered compatible?

There would be no fallback from 3.19.1 since there's no RID graph, right? We wouldn't want 3.19.1 for our build anyway because we'd just have to update the reference RID again when 3.19.2 is released. I think we'd want to explicitly set --outputrid to a stable value.

@richlander
Copy link
Member

set --outputrid to a stable value

And that stable value can be either linux-musl or alpine (with fallback/compatibility to linux-musl). I don't think there are other options.

@am11
Copy link
Member

am11 commented May 13, 2024

There would be no fallback from 3.19.1 since there's no RID graph, right?

The fallback is the portable platform, which in this case is linux-musl. The fallback to linux-musl is linux and so on.

@richlander
Copy link
Member

there's no RID graph

We still have a RID graph. It's just WAY smaller. The link I provided documents it.

@ellahathaway
Copy link
Member Author

To solve this issue, the question I'm currently trying to answer is whether our assets are portable or not. To my knowledge, if assets do not have dependencies on native libraries or do not have runtime dependencies that vary by operating system or other runtime-environment characteristics, then they are portable.

Given the above definition, I think that our assets are portable. If this thought is true and our assets are in fact portable, then we can use a portable RID without issue. Otherwise, we need to know if using the portable RID means that our assets are also portable. It is the latter question that I do not have the answers to yet.

@richlander
Copy link
Member

The typical test is whether the assets are bound to a single version of OpenSSL. If that's the case, then they are distro-specific. If they can tolerate running on multiple versions, they are portable.

@am11
Copy link
Member

am11 commented May 18, 2024

Normally, in the binlogs, if PortableBuild is evaluated to true, then that's the easiest way to tell the difference. Other toggles are derived from there. Another way is to run dotnet with corehost tracing variable COREHOST_TRACE=1 dotnet and look for Host RID list:'s first value; if it is some sort of a distro name like alpine.3.19.1-x64, the build was non-portable.

I said normally because technically someone can mix and match these concepts by passing some internal properties, which make the build look partially portable, but it's not the normal course of action. 😅

@richlander
Copy link
Member

Pragmatically, we want two patterns:

  • Portable Linux builds, using linux RID
  • Distro-specific builds, using as broad a RID as possible like alpine.3.19 or redhat.7. We want to avoid super specific RIDs like alpine.3.19.1.

There are at least two problems with super specific RIDs:

  • They create a maintenance burden.
  • They are one level even more incompatible unless we make alpine.3.19.1 compatible with alpine.3.19, for example.

I know that Red Hat has had this same problem. I'm curious how they solved it.

@omajid

@am11
Copy link
Member

am11 commented May 19, 2024

If Microsoft source builds should be using non-portable RIDs, then we should be using linux-musl-x64 in this place.

For distro maintainers, they will call -p:TargetRid=<name of their choice> which they can derive from /etc/os-release or other means.

If Microsoft source builds indeed need non-portable RIDs for some reason, we can pass -p:TargetRid=<that value> from pipeline in the build which has the same handling as runtime/build.sh -outputrid used in runtime pipeline: https://github.com/dotnet/runtime/blob/5474ab5c50fc639e780e0322cd500b2e43ee9b0d/eng/pipelines/runtime.yml#L501 (note the 3-parts version which that folks prefer)

We removed this handling from arcade to simplify the process which prevents us from maintaining many distro infos and their preferred style of 0,1,2,3 parts of versions in RID.

@omajid
Copy link
Member

omajid commented May 21, 2024

I know that Red Hat has had this same problem. I'm curious how they solved it.

We haven't solved it yet, but I plan to use an explicit -p:TargetRid/--outputrid with values like rhel.8-x64 and rhel.9-arm64, etc.

@am11
Copy link
Member

am11 commented May 21, 2024

@omajid, just an FYI, we can cleanup this special handling from VMR as well: https://github.com/dotnet/dotnet/blob/41de9105d46ea7d01fd14a10ab5a12d1474bbc75/Directory.Build.props#L54 and make --outputrid a required arg (maps to -p:TargetRID) to keep things really explicit. I've found cycles to do the cleanup and may not get around it anytime soon, so when you start looking into it, go ahead and do the honors. ;)

@ViktorHofer
Copy link
Member

When it makes sense, please use the GNU configure terms, i.e. the outputrid CLI arg should probably just be called --target-rid / -targetRid.

@am11
Copy link
Member

am11 commented May 21, 2024

Can't agree more @ViktorHofer! The term __DistroRid mapping to TargetRid, which in turn mapping to OuputRID is not needed after the recent simplifications made in this area. So -r, --target-rid build.sh argument sounds ... POSIX-y enough and we can just map to TargetRid in msbuild scripts all the way to runtme / src / native / corehost build, where it is finally used. 👌

@ellahathaway
Copy link
Member Author

ellahathaway commented May 21, 2024

I wanted to give a heads up on some testing I did of the source-build build using portable RIDs:

I ran a build where I set the TargetRid property to the portable RIDs for centos and alpine (linux-x64 and linux-musl-x64). The centos build was successful, but the Alpine build encountered a failure during the aspnetcore build process. The build failed because aspnetcore sets the TargetOsName as 'linux', and this is then used to determine the version of crossgen to use. Consequently, aspnetcore expects to use the linux version of crossgen, when in reality, the linux-musl version is being produced.

Furthermore, after building SB on centos9 with TargetRid=linux-x64, I tried to run the produced sdk on a prereqs centos8 container. I received the following output when doing this:

[root@dc446d83b92f dotnet]# .dotnet/dotnet 
.dotnet/dotnet: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by .dotnet/dotnet)
.dotnet/dotnet: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by .dotnet/dotnet)

Seems like a container issue rather than an issue with the sdk having a portable RID because I was able to successfully run the produced sdk on a prereqs ubuntu-22.04 container.

@omajid
Copy link
Member

omajid commented May 21, 2024

Furthermore, after building SB on centos9 with TargetRid=linux-x64

I think this illustrates a problem with naming in the context of of CI. The linux-x64 is an identifier/name that implies certain properties ("this is a portable build that will work anywhere"). But just giving something that name doesn't give it the properties that are associated with the name.

To get something that works like a general linux-x64 build, we need to build it in a special configuration - use a portable build, build on the oldest glibc (~ CentOS 8, not CentOS 9), etc. Otherwise we get something that claims it's linux-x64 but wont behave like it.

Should we force portable-build mode when a portable RID is specified?

Should we check and fail if we are building with a portable RID but the glibc we are building against is newer than our actual baseline?

@am11
Copy link
Member

am11 commented May 21, 2024

I wanted to give a heads up on some testing I did of the source-build build using portable RIDs:

VMR defaults to false, so we just need to pass the prop: -p:PortableBuild=true. runtime defaults to true.

If reverse mapping is cheap, we can do it, but PortableBuild is the property we have for a while to override the flavor when needed.

@ellahathaway
Copy link
Member Author

ellahathaway commented May 21, 2024

The linux-x64 is an identifier/name that implies certain properties ("this is a portable build that will work anywhere"). But just giving something that name doesn't give it the properties that are associated with the name.

This is a fantastic point and something that I think is missing in our current infra. IMO, if we apply a RID that has certain properties, the build should follow those properties... it's clear now that this is not the case. We'd likely have to set other properties in the build configuration (i.e. -p:PortableBuild=true). I think that requiring developers to set these properties really hurts the out of the box experience. Rather, as you suggest Omair, maybe there should be infra in place to force portable builds or fail building if certain conditions are not met when we set the TargetRID to be something portable.

VMR defaults to false, so we just need to pass the prop: -p:PortableBuild=true

I understand this. I think that the question I'm trying to answer is whether or not our artifacts are actually portable (is source-build actually portable if we set the portable build property to true). I don't have the answer to this question.

Overall, I worry also about the upcoming preview5 deadline. If we release preview5 with our alpine artifact as 3.19.1, it's going to cause breaks in our source-build pipeline (something similar happened when we went from alpine 3.18 to alpine 3.19). Given the current focus on other work, I'd like to avoid a setback like this and come to a solution quickly. My current proposed solution is to specify the target RID to be the correct 3.19 version for alpine. This is a quick solution (albeit a bit hacky) and gives us time to return to the greater issue at hand when there's a bit more time.

@am11
Copy link
Member

am11 commented May 21, 2024

I understand this. I think that the question I'm trying to answer is whether or not our artifacts are actually portable (is source-build actually portable if we set the portable build property to true). I don't have the answer to this question.

Yes. PortableBuild=true makes the build portable and PortableBuild=false makes it non-portable.

@ellahathaway
Copy link
Member Author

Test build: https://dev.azure.com/dnceng/internal/_build/results?buildId=2457692&view=results (internal Microsoft link)

@ellahathaway
Copy link
Member Author

The test build failed. The issue for Alpine is the same as what I had described earlier (crossgen gets produced as linux-musl and aspnetcore expects crossgen to be linux).

As for the centos legs, those fail when running the scenario tests. The error is:

System.InvalidOperationException : Failed to execute /vmr/artifacts/obj/extracted-dotnet-sdk/dotnet new console --name SdkTemplateTestsComplex_Console_CSharp --output /vmr/artifacts/scenario-tests/artifacts/SdkTemplateTestsComplex_Console_CSharp --language "C#"
    Exit code: 145
    9.0.100-preview.5.24266.1 [/vmr/artifacts/obj/extracted-dotnet-sdk/sdk]
    The command could not be loaded, possibly because:
      * You intended to execute a .NET application:
          The application 'new' does not exist.
      * You intended to execute a .NET SDK command:
          A compatible .NET SDK was not found.

cc @mthalman

@am11
Copy link
Member

am11 commented May 22, 2024

The test build failed. The issue for Alpine is the same as what I had described earlier (crossgen gets produced as linux-musl and aspnetcore expects crossgen to be linux).

@ellahathaway, looking at https://github.com/dotnet/dotnet/blob/1f01ca0ae82f4cd3a74756ef8b94c99f85290bf7/repo-projects/aspnetcore.proj#L21, for aspnetcore it is setting TargetRuntimeIdentifier=$(TargetRid), which should override the hardcoded value on line 108 here https://github.com/dotnet/aspnetcore/blob/86ee868589c556f63c105cb361988a0f26ffa6bc/src/Framework/App.Runtime/src/Microsoft.AspNetCore.App.Runtime.csproj#L109. Perhaps some precondition is failing?

@ellahathaway
Copy link
Member Author

@am11 - Thanks for your help! I logged dotnet/source-build#4418 and dotnet/source-build#4419 so that these conversations can be continued there rather than in this PR.

I've updated this PR with a quick fix to unblock SB.

@ellahathaway ellahathaway merged commit 119d6c7 into dotnet:main May 23, 2024
36 checks passed
@ellahathaway ellahathaway deleted the apline-artifact branch May 23, 2024 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Infrastructure DO NOT MERGE untriaged Request triage from a team member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Alpine Artifact Changed from 3.19 to 3.19.1
7 participants