[mono] Restore failure due to The process cannot access the file x because it is being used by another process #77364

Closed
ayakael opened this issue Oct 24, 2022 · 68 comments · Fixed by #79856

@ayakael
Contributor

ayakael commented Oct 24, 2022

Description

When building runtime with a Mono-flavored runtime, restore often fails in CI environments with "The process cannot access the file x because it is being used by another process". This affects runtime as well as other builds (confirmed with roslyn). It is filed against runtime because it does not occur with a CoreCLR-flavored runtime.

Reproduction Steps

In an Alpine Edge linux-musl-x64 environment, you can use this modified aport to reproduce the bug. Steps to reproduce:

git clone https://gitlab.alpinelinux.org/ayakael/aports -b dotnet7/mono-restore
cd aports/testing/dotnet7-stage0
abuild deps unpack prepare build

It should eventually fail.

The aport builds a minimal set of components (runtime-mono, roslyn, sdk, aspnetcore, installer), enough to produce an SDK tarball, then uses that tarball, with its Mono-flavored runtime, to build the whole stack again. This aport is normally used to cross-build for other platforms, but here I am using it to reproduce the bug easily.

You can likely reproduce this on linux-x64 by building runtime with /p:PrimaryRuntimeFlavor=Mono and then trying to build runtime with the produced artifacts.

Expected behavior

Restore should complete without errors.

Actual behavior

Restore fails with the following error:

/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error : The process cannot access the file '/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/artifacts/obj/System.IO.Ports' because it is being used by another process. [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at System.IO.FileSystem.CreateDirectory(String fullPath, UnixFileMode unixCreateMode) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at System.IO.FileSystem.CreateDirectory(String fullPath) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at System.IO.Directory.CreateDirectory(String path) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Commands.BuildAssetsUtils.WriteFiles(IEnumerable`1 files, ILogger log) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Commands.RestoreResult.CommitAssetsFileAsync(LockFileFormat lockFileFormat, IRestoreResult result, ILogger log, Boolean toolCommit, CancellationToken token) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Commands.RestoreResult.CommitAsync(ILogger log, CancellationToken token) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Commands.RestoreRunner.CommitAsync(RestoreResultPair restoreResult, CancellationToken token) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Commands.RestoreRunner.ExecuteAndCommitAsync(RestoreSummaryRequest summaryRequest, IRestoreProgressReporter progressReporter, CancellationToken token) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Commands.RestoreRunner.CompleteTaskAsync(List`1 restoreTasks) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Commands.RestoreRunner.RunAsync(IEnumerable`1 restoreRequests, RestoreArgs restoreArgs, CancellationToken token) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Commands.RestoreRunner.RunAsync(RestoreArgs restoreContext, CancellationToken token) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Build.Tasks.BuildTasksUtility.RestoreAsync(DependencyGraphSpec dependencyGraphSpec, Boolean interactive, Boolean recursive, Boolean noCache, Boolean ignoreFailedSources, Boolean disableParallel, Boolean force, Boolean forceEvaluate, Boolean hideWarningsAndErrors, Boolean restorePC, Boolean cleanupAssetsForUnsupportedProjects, ILogger log, CancellationToken cancellationToken) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
/builds/ayakael/aports/testing/dotnet7-stage0/src/bootstrap/sdk/7.0.100-rtm.22519.39/NuGet.RestoreEx.targets(19,5): error :    at NuGet.Build.Tasks.Console.MSBuildStaticGraphRestore.RestoreAsync(String entryProjectFilePath, IDictionary`2 globalProperties, IReadOnlyDictionary`2 options) [/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/Build.proj]
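
When comparing several CI logs, it can help to pull out just the contended paths from the error lines. The following is a minimal sketch in Python, assuming the message format shown in the log above. The helper name `contended_paths` and the shortened sample line are illustrative only and are not part of any .NET or NuGet tooling.

```python
import re
from collections import Counter

# Matches the message format seen in the log above; the path is captured
# from between the single quotes.
_BUSY = re.compile(
    r"The process cannot access the file '([^']+)' "
    r"because it is being used by another process"
)


def contended_paths(log_text):
    """Count how often each path appears in a 'being used by another process' error."""
    return Counter(_BUSY.findall(log_text))


if __name__ == "__main__":
    # Hypothetical, shortened log line used only to exercise the helper.
    sample = (
        "NuGet.RestoreEx.targets(19,5): error : The process cannot access the file "
        "'/src/runtime/artifacts/obj/System.IO.Ports' because it is being used by "
        "another process. [/src/runtime/Build.proj]\n"
    )
    for path, count in contended_paths(sample).most_common():
        print(count, path)
```

Running this over the ppc64le and s390x logs would show whether the failures cluster on a few directories or are spread across the tree.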

Regression?

This error is confirmed absent with .NET runtime 6.0.10 on s390x, so it appears to be a regression.

Known Workarounds

None so far.

Configuration

Version: SDK 7.0.100-rtm.22519.39
OS: Alpine Linux Edge, Docker-based CI pipelines (it also occurs, to a lesser degree, in an s390x VM)
Architecture: Confirmed reproducible on s390x, ppc64le and x64 (with Mono-flavored runtime)
Specificity: specific to mono on all platforms

Other information

logs
ppc64le with runtime: link
s390x with runtime: link
s390x with roslyn: link

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 24, 2022
@ayakael
Contributor Author

ayakael commented Oct 24, 2022

This is confirmed reproducible on x64. I've updated the post for steps to reproduce on x64.

@ayakael
Contributor Author

ayakael commented Oct 24, 2022

False alarm - updating to latest commit of release/7.0.1xx on installer did the trick.

@ayakael ayakael closed this as completed Oct 24, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Oct 24, 2022
@ayakael
Contributor Author

ayakael commented Oct 24, 2022

Never mind: it does happen less often, but it still occurs often enough to make our build fail consistently:

/builds/ayakael/aports/testing/dotnet7-build/src/dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/Tools/source-built/Microsoft.DotNet.Arcade.Sdk/tools/SourceBuild/SourceBuildArcadeBuild.targets(116,5): error MSB3026: Could not copy "/builds/ayakael/aports/testing/dotnet7-build/src/dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/src/fsharp/vsintegration/src/FSharp.ProjectSystem.PropertyPages/PropertyPages/xlf/ApplicationPropPage.de.xlf" to "/builds/ayakael/aports/testing/dotnet7-build/src/dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/src/fsharp/artifacts/source-build/self/src/vsintegration/src/FSharp.ProjectSystem.PropertyPages/PropertyPages/xlf/ApplicationPropPage.de.xlf". Beginning retry 1 in 1000ms. The process cannot access the file '/builds/ayakael/aports/testing/dotnet7-build/src/dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/src/fsharp/artifacts/source-build/self/src/vsintegration/src/FSharp.ProjectSystem.PropertyPages/PropertyPages/xlf' because it is being used by another process.  [/builds/ayakael/aports/testing/dotnet7-build/src/dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/Tools/source-built/Microsoft.DotNet.Arcade.Sdk/tools/Build.proj]

@ayakael ayakael reopened this Oct 24, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 24, 2022
@ghost

ghost commented Oct 24, 2022

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: ayakael
Assignees: -
Labels: area-Infrastructure, untriaged

Milestone: -

@uweigand
Contributor

Not sure what causes this; I've never seen this particular error in my builds. Looking at the runtime sources, it seems this message is triggered only if some Linux native library call returns with errno set to EWOULDBLOCK, but I'm not sure why this would happen when just copying a file.
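
For readers unfamiliar with EWOULDBLOCK: one common way to get it on Linux is a non-blocking advisory lock request (`flock` with `LOCK_NB`) on a file that is already locked through another open file description. The sketch below only illustrates how that errno arises. Whether this is the code path actually hit during restore is an assumption that the thread has not established, and the helper names `try_lock` and `demo` are made up for this example.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(fd):
    """Try a non-blocking exclusive flock; return None on success, else the errno."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return None
    except OSError as exc:
        return exc.errno


def demo():
    """Lock one file through two independent descriptors and report both results."""
    with tempfile.NamedTemporaryFile() as tmp:
        # Two separate open() calls give two open file descriptions, so their
        # flock locks conflict even inside a single process.
        fd1 = os.open(tmp.name, os.O_RDONLY)
        fd2 = os.open(tmp.name, os.O_RDONLY)
        try:
            return try_lock(fd1), try_lock(fd2)
        finally:
            os.close(fd2)
            os.close(fd1)


if __name__ == "__main__":
    first, second = demo()
    print("first lock failed with EWOULDBLOCK:", first == errno.EWOULDBLOCK)
    print("second lock failed with EWOULDBLOCK:", second == errno.EWOULDBLOCK)
```

On Linux, EWOULDBLOCK and EAGAIN share the same numeric value, so tools that print the symbolic name may show either one.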

@ayakael
Contributor Author

ayakael commented Oct 25, 2022

> Not sure what causes this; I've never seen this particular error in my builds. Looking at the runtime sources, it seems this message is triggered only if some Linux native library call returns with errno set to EWOULDBLOCK, but I'm not sure why this would happen when just copying a file.

Indeed, I can't make heads or tails of it. One data point is that going from the tarball at dotnet/installer commit d41bfec to the one at e6dd91c lowered the occurrence of this bug substantially. Still, the build always fails at fsharp with the above error on s390x, and always fails during runtime restore on ppc64le.

The diff between those two commits is hard to interpret.

@omajid Have you started trying to build dotnet7 on s390x? Do you encounter this error?

@ayakael
Contributor Author

ayakael commented Oct 25, 2022

The diff to git-info shows the following:

--- dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/git-info/AllRepoVersions.props
+++ dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/git-info/AllRepoVersions.props
@@ -15,8 +15,8 @@
     <templatingOutputPackageVersion>7.0.100-rtm.22510.12</templatingOutputPackageVersion>
     <xdtGitCommitHash>9a1c3e1b7f0c8763d4c96e593961a61a72679a7b</xdtGitCommitHash>
     <xdtOutputPackageVersion>7.0.0-preview.22423.2</xdtOutputPackageVersion>
-    <arcadeGitCommitHash>720af493900b2f2bdc48e9ee12577983a5c9be36</arcadeGitCommitHash>
-    <arcadeOutputPackageVersion>7.0.0-beta.22464.4</arcadeOutputPackageVersion>
+    <arcadeGitCommitHash>02e28316bf35d1028683ee313f0794776bff18d1</arcadeGitCommitHash>
+    <arcadeOutputPackageVersion>7.0.0-beta.22513.4</arcadeOutputPackageVersion>
     <aspnetcoreGitCommitHash>c6865355c01e1fb170b65427846d939559ab3789</aspnetcoreGitCommitHash>
     <aspnetcoreOutputPackageVersion>7.0.0-rtm.22512.1</aspnetcoreOutputPackageVersion>
     <deploymenttoolsGitCommitHash>c3ad00ae84489071080a606f6a8e43c9a91a5cc2</deploymenttoolsGitCommitHash>
@@ -49,7 +49,7 @@
     <vstestOutputPackageVersion>17.4.0-release-20220926-01</vstestOutputPackageVersion>
     <xlifftasksGitCommitHash>740189d758fb3bbdc118c5b6171ef1a7351a8c44</xlifftasksGitCommitHash>
     <xlifftasksOutputPackageVersion>1.0.0-beta.22427.1</xlifftasksOutputPackageVersion>
-    <installerGitCommitHash>d41bfecf5090e9163aa2da251246abed9e756e53</installerGitCommitHash>
+    <installerGitCommitHash>e6dd91c290b808f971a1ac69c2fb29395bbf1051</installerGitCommitHash>
     <installerOutputPackageVersion>7.0.100</installerOutputPackageVersion>
   </PropertyGroup>
 </Project>

The following components have thus changed:

  • arcade
  • installer
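
The same comparison can be scripted when more tarballs need checking. This is a minimal sketch in Python, assuming the property naming seen in the diff above (`<name>GitCommitHash` and `<name>OutputPackageVersion`). The function names `read_versions` and `changed_components` are hypothetical helpers, not part of the source-build tooling.

```python
import xml.etree.ElementTree as ET

_SUFFIXES = ("GitCommitHash", "OutputPackageVersion")


def read_versions(props_xml):
    """Map each *GitCommitHash / *OutputPackageVersion property to its value."""
    root = ET.fromstring(props_xml)
    versions = {}
    for element in root.iter():
        # Strip an XML namespace prefix such as '{http://...}' if one is present.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag.endswith(_SUFFIXES):
            versions[tag] = (element.text or "").strip()
    return versions


def changed_components(old_xml, new_xml):
    """Return sorted component names whose hash or package version differs."""
    old, new = read_versions(old_xml), read_versions(new_xml)
    changed = set()
    for name in old.keys() | new.keys():
        if old.get(name) != new.get(name):
            for suffix in _SUFFIXES:
                if name.endswith(suffix):
                    changed.add(name[: -len(suffix)])
    return sorted(changed)


if __name__ == "__main__":
    # Values copied from the diff above; only three properties are included.
    old = """<Project><PropertyGroup>
      <xdtGitCommitHash>9a1c3e1b7f0c8763d4c96e593961a61a72679a7b</xdtGitCommitHash>
      <arcadeGitCommitHash>720af493900b2f2bdc48e9ee12577983a5c9be36</arcadeGitCommitHash>
      <installerGitCommitHash>d41bfecf5090e9163aa2da251246abed9e756e53</installerGitCommitHash>
    </PropertyGroup></Project>"""
    new = """<Project><PropertyGroup>
      <xdtGitCommitHash>9a1c3e1b7f0c8763d4c96e593961a61a72679a7b</xdtGitCommitHash>
      <arcadeGitCommitHash>02e28316bf35d1028683ee313f0794776bff18d1</arcadeGitCommitHash>
      <installerGitCommitHash>e6dd91c290b808f971a1ac69c2fb29395bbf1051</installerGitCommitHash>
    </PropertyGroup></Project>"""
    print(changed_components(old, new))  # prints ['arcade', 'installer']
```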

@ayakael
Contributor Author

ayakael commented Oct 25, 2022

In turn, the diff of arcade between those two commits is as follows:

diff --git a/.config/tsaoptions.json b/.config/tsaoptions.json
new file mode 100644
index 00000000..ae219a82
--- /dev/null
+++ b/.config/tsaoptions.json
@@ -0,0 +1,10 @@
+{  
+    "instanceUrl": "https://devdiv.visualstudio.com/",
+    "template": "TFSDEVDIV",
+    "projectName": "DEVDIV",
+    "areaPath": "DevDiv\\NET Fundamentals\\Infrastructure\\Arcade\\SDL",
+    "iterationPath": "DevDiv",
+    "notificationAliases": [ "dnceng@microsoft.com" ],
+    "repositoryName":"Arcade",
+    "codebaseName": "Arcade"
+}
\ No newline at end of file
diff --git a/azure-pipelines-codeql.yml b/azure-pipelines-codeql.yml
index ea648627..d85d52d2 100644
--- a/azure-pipelines-codeql.yml
+++ b/azure-pipelines-codeql.yml
@@ -1,49 +1,63 @@
+parameters:
+  # Optionally do not publish to TSA. Useful for e.g. verifying fixes before PR.
+- name: TSAEnabled
+  displayName: Publish results to TSA
+  type: boolean
+  default: true
+
 variables:
-  - name: _TeamName
-    value: DotNetCore
-  - group: SDL_Settings
-    
+- template: eng/common-variables.yml
+  # CG is handled in the primary CI pipeline
+- name: skipComponentGovernanceDetection
+  value: true
+  # Force CodeQL enabled so it may be run on any branch
+- name: Codeql.Enabled
+  value: true
+  # Do not let CodeQL 3000 Extension gate scan frequency
+- name: Codeql.Cadence
+  value: 0
+  # CodeQL needs this plumbed along as a variable to enable TSA
+- name: Codeql.TSAEnabled
+  value: ${{ parameters.TSAEnabled }}
+
+  # Build variables
+- name: _BuildConfig
+  value: Release
+
 trigger: none
 
 schedules:
   - cron: 0 12 * * 1
-    displayName: Weekly Monday CodeQL/Semmle run
+    displayName: Weekly Monday CodeQL run
     branches:
       include:
       - main
+      - release/6.0
+      - release/7.0
     always: true
 
-stages:
-- stage: build
-  displayName: Build
-  # Three phases for each of the three OSes we want to run on
-  jobs:
-  - template: /eng/common/templates/jobs/codeql-build.yml
-    parameters:
-      jobs:
-      - job: Windows_NT_CSharp
-        timeoutInMinutes: 90
-        pool:
-          name: NetCore1ESPool-Svc-Internal
-          demands: ImageOverride -equals windows.vs2019.amd64
+jobs:
+- job: codeql
+  displayName: CodeQL
+  pool:
+    name: NetCore1ESPool-Svc-Internal
+    demands: ImageOverride -equals 1es-windows-2022
+  timeoutInMinutes: 90
 
-        steps:
-        - checkout: self
-          clean: true
+  steps:
 
-        - template: /eng/common/templates/steps/execute-codeql.yml
-          parameters:
-            executeAllSdlToolsScript: 'eng/common/sdl/execute-all-sdl-tools.ps1'
-            buildCommands: 'eng\common\cibuild.cmd -configuration Release -prepareMachine /p:Test=false /p:Sign=false'
-            language: csharp
-            publishGuardianDirectoryToPipeline: true
-            additionalParameters: '-SourceToolsList @("semmle")
-            -TsaInstanceURL $(_TsaInstanceURL)
-            -TsaProjectName $(_TsaProjectName)
-            -TsaNotificationEmail $(_TsaNotificationEmail)
-            -TsaCodebaseAdmin $(_TsaCodebaseAdmin)
-            -TsaBugAreaPath $(_TsaBugAreaPath)
-            -TsaIterationPath $(_TsaIterationPath)
-            -TsaRepositoryName "Arcade"
-            -TsaCodebaseName "Arcade"
-            -TsaPublish $True'
+  - task: UseDotNet@2
+    inputs:
+      useGlobalJson: true
+
+  - task: CodeQL3000Init@0
+    displayName: CodeQL Initialize
+
+  - script: eng\common\cibuild.cmd
+      -configuration $(_BuildConfig)
+      -prepareMachine
+      /p:Test=false
+    displayName: Windows Build
+
+  - task: CodeQL3000Finalize@0
+    displayName: CodeQL Finalize
diff --git a/eng/Version.Details.xml b/eng/Version.Details.xml
index 5916476a..6002233c 100644
--- a/eng/Version.Details.xml
+++ b/eng/Version.Details.xml
@@ -47,9 +47,9 @@
       <Uri>https://github.com/dotnet/xharness</Uri>
       <Sha>fbeb09787a4cdcf8a375382cf7a4f5edfaf1b9d7</Sha>
     </Dependency>
-    <Dependency Name="Microsoft.Net.Compilers.Toolset" Version="4.4.0-2.22426.8">
+    <Dependency Name="Microsoft.Net.Compilers.Toolset" Version="4.4.0-3.22472.2">
       <Uri>https://github.com/dotnet/roslyn</Uri>
-      <Sha>a9ddbc56659be7d9fdd956c926af3b7cd5b8e44f</Sha>
+      <Sha>7a48ae565f3c36155fc8f606384e20b2a54f98c8</Sha>
     </Dependency>
     <Dependency Name="Microsoft.NET.ILLink.Tasks" Version="6.0.100-1.22103.2">
       <Uri>https://github.com/dotnet/linker</Uri>
diff --git a/eng/Versions.props b/eng/Versions.props
index e83705a2..540ea350 100644
--- a/eng/Versions.props
+++ b/eng/Versions.props
@@ -40,7 +40,7 @@
     <MicrosoftExtensionsFileSystemGlobbingVersion>2.0.0</MicrosoftExtensionsFileSystemGlobbingVersion>
     <MicrosoftExtensionsLoggingConsoleVersion>2.1.1</MicrosoftExtensionsLoggingConsoleVersion>
     <MicrosoftNETCorePlatformsVersion>2.1.0</MicrosoftNETCorePlatformsVersion>
-    <MicrosoftNetCompilersToolsetVersion>4.4.0-2.22426.8</MicrosoftNetCompilersToolsetVersion>
+    <MicrosoftNetCompilersToolsetVersion>4.4.0-3.22472.2</MicrosoftNetCompilersToolsetVersion>
     <MicrosoftNetTestSdkVersion>17.4.0-preview-20220707-01</MicrosoftNetTestSdkVersion>
     <MicrosoftNETILLinkTasksVersion>6.0.100-1.22103.2</MicrosoftNETILLinkTasksVersion>
     <MicrosoftSignedWixVersion>1.0.0-v3.14.0.5722</MicrosoftSignedWixVersion>
diff --git a/eng/common/build.ps1 b/eng/common/build.ps1
index 8943da24..e0420a64 100644
--- a/eng/common/build.ps1
+++ b/eng/common/build.ps1
@@ -26,6 +26,7 @@ Param(
   [string] $runtimeSourceFeed = '',
   [string] $runtimeSourceFeedKey = '',
   [switch] $excludePrereleaseVS,
+  [switch] $nativeToolsOnMachine,
   [switch] $help,
   [Parameter(ValueFromRemainingArguments=$true)][String[]]$properties
 )
@@ -66,6 +67,7 @@ function Print-Usage() {
   Write-Host "  -prepareMachine         Prepare machine for CI run, clean up processes after build"
   Write-Host "  -warnAsError <value>    Sets warnaserror msbuild parameter ('true' or 'false')"
   Write-Host "  -msbuildEngine <value>  Msbuild engine to use to run build ('dotnet', 'vs', or unspecified)."
+  Write-Host "  -nativeToolsOnMachine   Sets the native tools on machine environment variable (indicating that the script should use native tools on machine)"
   Write-Host "  -excludePrereleaseVS    Set to exclude build engines in prerelease versions of Visual Studio"
   Write-Host ""
 
@@ -146,6 +148,9 @@ try {
     $nodeReuse = $false
   }
 
+  if ($nativeToolsOnMachine) {
+    $env:NativeToolsOnMachine = $true
+  }
   if ($restore) {
     InitializeNativeTools
   }
diff --git a/eng/common/init-tools-native.ps1 b/eng/common/init-tools-native.ps1
index 8d48ec56..fbc67eff 100644
--- a/eng/common/init-tools-native.ps1
+++ b/eng/common/init-tools-native.ps1
@@ -98,11 +98,12 @@ try {
               Write-Error "Arcade tools directory '$ArcadeToolsDirectory' was not found; artifacts were not properly installed."
               exit 1
             }
-            $ToolDirectory = (Get-ChildItem -Path "$ArcadeToolsDirectory" -Filter "$ToolName-$ToolVersion*" | Sort-Object -Descending)[0]
-            if ([string]::IsNullOrWhiteSpace($ToolDirectory)) {
+            $ToolDirectories = (Get-ChildItem -Path "$ArcadeToolsDirectory" -Filter "$ToolName-$ToolVersion*" | Sort-Object -Descending)
+            if ($ToolDirectories -eq $null) {
               Write-Error "Unable to find directory for $ToolName $ToolVersion; please make sure the tool is installed on this image."
               exit 1
             }
+            $ToolDirectory = $ToolDirectories[0]
             $BinPathFile = "$($ToolDirectory.FullName)\binpath.txt"
             if (-not (Test-Path -Path "$BinPathFile")) {
               Write-Error "Unable to find binpath.txt in '$($ToolDirectory.FullName)' ($ToolName $ToolVersion); artifact is either installed incorrectly or is not a bootstrappable tool."
@@ -112,6 +113,7 @@ try {
             $ToolPath = Convert-Path -Path $BinPath
             Write-Host "Adding $ToolName to the path ($ToolPath)..."
             Write-Host "##vso[task.prependpath]$ToolPath"
+            $env:PATH = "$ToolPath;$env:PATH"
             $InstalledTools += @{ $ToolName = $ToolDirectory.FullName }
           }
         }
diff --git a/eng/common/sdk-task.ps1 b/eng/common/sdk-task.ps1
index c35087a0..39be08d4 100644
--- a/eng/common/sdk-task.ps1
+++ b/eng/common/sdk-task.ps1
@@ -64,7 +64,7 @@ try {
       $GlobalJson.tools | Add-Member -Name "vs" -Value (ConvertFrom-Json "{ `"version`": `"16.5`" }") -MemberType NoteProperty
     }
     if( -not ($GlobalJson.tools.PSObject.Properties.Name -match "xcopy-msbuild" )) {
-      $GlobalJson.tools | Add-Member -Name "xcopy-msbuild" -Value "17.2.1" -MemberType NoteProperty
+      $GlobalJson.tools | Add-Member -Name "xcopy-msbuild" -Value "17.3.1" -MemberType NoteProperty
     }
     if ($GlobalJson.tools."xcopy-msbuild".Trim() -ine "none") {
         $xcopyMSBuildToolsFolder = InitializeXCopyMSBuild $GlobalJson.tools."xcopy-msbuild" -install $true
diff --git a/eng/common/tools.ps1 b/eng/common/tools.ps1
index aba6308a..44912694 100644
--- a/eng/common/tools.ps1
+++ b/eng/common/tools.ps1
@@ -365,8 +365,8 @@ function InitializeVisualStudioMSBuild([bool]$install, [object]$vsRequirements =
 
   # If the version of msbuild is going to be xcopied,
   # use this version. Version matches a package here:
-  # https://dev.azure.com/dnceng/public/_packaging?_a=package&feed=dotnet-eng&package=RoslynTools.MSBuild&protocolType=NuGet&version=17.2.1&view=overview
-  $defaultXCopyMSBuildVersion = '17.2.1'
+  # https://dev.azure.com/dnceng/public/_packaging?_a=package&feed=dotnet-eng&package=RoslynTools.MSBuild&protocolType=NuGet&version=17.3.1view=overview
+  $defaultXCopyMSBuildVersion = '17.3.1'
 
   if (!$vsRequirements) {
     if (Get-Member -InputObject $GlobalJson.tools -Name 'vs') {
diff --git a/global.json b/global.json
index 00c6f187..4a5989c8 100644
--- a/global.json
+++ b/global.json
@@ -1,6 +1,6 @@
 {
   "tools": {
-    "dotnet": "7.0.100-rc.1.22431.12"
+    "dotnet": "7.0.100-rc.2.22477.23"
   },
   "msbuild-sdks": {
     "Microsoft.DotNet.Arcade.Sdk": "7.0.0-beta.22426.8",
diff --git a/src/Microsoft.DotNet.Arcade.Sdk/tools/SdkTasks/VisualStudio.BuildIbcTrainingSettings.proj b/src/Microsoft.DotNet.Arcade.Sdk/tools/SdkTasks/VisualStudio.BuildIbcTrainingSettings.proj
index abf2b732..ce73ed93 100644
--- a/src/Microsoft.DotNet.Arcade.Sdk/tools/SdkTasks/VisualStudio.BuildIbcTrainingSettings.proj
+++ b/src/Microsoft.DotNet.Arcade.Sdk/tools/SdkTasks/VisualStudio.BuildIbcTrainingSettings.proj
@@ -17,7 +17,6 @@
   -->
 
   <Import Project="Directory.Build.props" />
-  <Import Project="Directory.Build.targets" />
 
   <PropertyGroup>
     <_VisualStudioBuildTasksAssembly>$(NuGetPackageRoot)microsoft.dotnet.build.tasks.visualstudio\$(MicrosoftDotNetBuildTasksVisualStudioVersion)\tools\net472\Microsoft.DotNet.Build.Tasks.VisualStudio.dll</_VisualStudioBuildTasksAssembly>
diff --git a/src/Microsoft.DotNet.Arcade.Sdk/tools/SourceBuild/SourceBuildArcadeBuild.targets b/src/Microsoft.DotNet.Arcade.Sdk/tools/SourceBuild/SourceBuildArcadeBuild.targets
index 336aafb0..02c7d543 100644
--- a/src/Microsoft.DotNet.Arcade.Sdk/tools/SourceBuild/SourceBuildArcadeBuild.targets
+++ b/src/Microsoft.DotNet.Arcade.Sdk/tools/SourceBuild/SourceBuildArcadeBuild.targets
@@ -82,11 +82,6 @@
       <!-- The inner build needs to reference the overall output dir for nupkg transport etc. -->
       <InnerBuildArgs>$(InnerBuildArgs) /p:SourceBuildOutputDir=$(SourceBuildOutputDir)</InnerBuildArgs>
       <InnerBuildArgs>$(InnerBuildArgs) /p:SourceBuiltBlobFeedDir=$(SourceBuiltBlobFeedDir)</InnerBuildArgs>
-
-      <!-- Work around issue where local clone may cause failure using non-origin remote fallback: https://github.com/dotnet/sourcelink/issues/629 -->
-      <InnerBuildArgs>$(InnerBuildArgs) /p:EnableSourceControlManagerQueries=false</InnerBuildArgs>
-      <InnerBuildArgs>$(InnerBuildArgs) /p:EnableSourceLink=false</InnerBuildArgs>
-      <InnerBuildArgs>$(InnerBuildArgs) /p:DeterministicSourcePaths=false</InnerBuildArgs>
     </PropertyGroup>
 
     <ItemGroup>
diff --git a/src/Microsoft.DotNet.Helix/Sdk/tools/dotnet-cli/DotNetCli.props b/src/Microsoft.DotNet.Helix/Sdk/tools/dotnet-cli/DotNetCli.props
index b0b5978c..f594af9f 100644
--- a/src/Microsoft.DotNet.Helix/Sdk/tools/dotnet-cli/DotNetCli.props
+++ b/src/Microsoft.DotNet.Helix/Sdk/tools/dotnet-cli/DotNetCli.props
@@ -1,7 +1,7 @@
 <Project>
   <PropertyGroup>
     <IncludeDotNetCli Condition=" '$(IncludeDotNetCli)' != 'true' ">false</IncludeDotNetCli>
-    <AspNetCoreRuntimeVersion>7.0.0-rc.1.22423.7</AspNetCoreRuntimeVersion>
+    <AspNetCoreRuntimeVersion>7.0.0-rc.2.22476.2</AspNetCoreRuntimeVersion>
     <DotNetCliPackageType Condition=" '$(DotNetCliPackageType)' == '' ">runtime</DotNetCliPackageType>
     <DotNetCliVersion Condition=" '$(DotNetCliVersion)' == '' AND '$(DotNetCliPackageType)' == 'runtime' ">$(BundledNETCoreAppPackageVersion)</DotNetCliVersion>
     <!-- TODO (https://github.com/dotnet/arcade/issues/7022): We are hardcoding this version to use the one tied to the SDK version from global.json -->
diff --git a/src/Microsoft.DotNet.Helix/Sdk/tools/xharness-runner/XHarnessRunner.targets b/src/Microsoft.DotNet.Helix/Sdk/tools/xharness-runner/XHarnessRunner.targets
index 5af41dc7..c1c8414d 100644
--- a/src/Microsoft.DotNet.Helix/Sdk/tools/xharness-runner/XHarnessRunner.targets
+++ b/src/Microsoft.DotNet.Helix/Sdk/tools/xharness-runner/XHarnessRunner.targets
@@ -2,7 +2,7 @@
   <PropertyGroup Condition=" '$(IncludeXHarnessCli)' == 'true' ">
     <IncludeDotNetCli>true</IncludeDotNetCli>
     <XHarnessTargetFramework Condition=" '$(XHarnessTargetFramework)' == '' ">net7.0</XHarnessTargetFramework>
-    <DotNetCliVersion Condition=" '$(XHarnessTargetFramework)' == 'net7.0' ">7.0.100-rc.1.22431.12</DotNetCliVersion>
+    <DotNetCliVersion Condition=" '$(XHarnessTargetFramework)' == 'net7.0' ">7.0.100-rc.2.22477.23</DotNetCliVersion>
     <DotNetCliVersion Condition=" '$(XHarnessTargetFramework)' == 'net6.0' ">6.0.202</DotNetCliVersion>
     <DotNetCliPackageType>sdk</DotNetCliPackageType>
   </PropertyGroup>
diff --git a/src/Microsoft.DotNet.SharedFramework.Sdk/targets/sharedfx.targets b/src/Microsoft.DotNet.SharedFramework.Sdk/targets/sharedfx.targets
index 902c8db7..09ea6fb3 100644
--- a/src/Microsoft.DotNet.SharedFramework.Sdk/targets/sharedfx.targets
+++ b/src/Microsoft.DotNet.SharedFramework.Sdk/targets/sharedfx.targets
@@ -360,7 +360,7 @@
   </Target>
 
   <Target Name="_CreatePlatformManifest" DependsOnTargets="_GenerateTemplatedPlatformManifest;_GeneratePlatformManifestFromRuntimePack"
-          Condition="'$(PlatformPackageType)' == 'TargetingPack'">
+          Condition="'$(PlatformPackageType)' == 'TargetingPack' and '$(SkipGeneratingPlatformManifest)' == ''">
     <ItemGroup>
       <_PlatformManifestFile Include="$(IntermediateOutputPath)PlatformManifest.txt" TargetPath="data" GeneratedBuildFile="true" />
     </ItemGroup>

Anything look suspicious?

@omajid
Member

omajid commented Oct 25, 2022

I have not seen this error on s390x, though I still occasionally run into roslyn hangs.

@ViktorHofer
Member

Please share an msbuild binlog when that happens again which might help us to root cause the issue.

@ayakael
Contributor Author

ayakael commented Nov 13, 2022

Finally got the error to reappear on my s390x VM. This is with 7.0.100 GA. The error occurs much more often on Alpine pipelines, while in other environments it occurs from time to time. In this run it was during restore process of format:

format.binlog.log
source-build.binlog.log

Had to append .log to allow upload on github.

@ayakael
Contributor Author

ayakael commented Nov 14, 2022

Higher quality binlogs extracted directly from Alpine pipelines for ppc64le. For some reason, ppc64le seems to have this error more than s390x.

ppc64le - runtime:
runtime.binlog.log
ppc64le - roslyn
dotnet7-stage0-roslyn.binlog.log

The roslyn binlog seems much richer in information when replaying in diag mode.

Strange observation: when building with -v diag the error occurs much less often. Working hypothesis is that it slows down the build. Since this error occurs only when restoring, is there a way to slow down restore operations, or disable parallel operations that might be stepping on each other's toes?

@ayakael
Contributor Author

ayakael commented Nov 14, 2022

Binlogs mention src/NuGet.Core/NuGet.Common/ConcurrencyUtilities.cs
@tmds has a few commits related to file locking issues. Would you have any insights on this error?

@ayakael
Contributor Author

ayakael commented Nov 14, 2022

Exporting NUGET_ConcurrencyUtils_DeleteOnClose=1 did not work. This seems more like a nuget problem, so I'll report this issue over there.

@ayakael
Contributor Author

ayakael commented Nov 14, 2022

Closing in favor of NuGet/Home#12242

@ayakael ayakael closed this as completed Nov 14, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Nov 14, 2022
@ayakael ayakael reopened this Nov 16, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 16, 2022
@ayakael
Contributor Author

ayakael commented Nov 16, 2022

@tmds @uweigand Moving the conversation back to runtime, as a NuGet dev has confirmed that this is likely a mono issue. The referenced issue has a number of binlogs from runs where the issue occurred, as data points. A workaround was setting /p:RestoreDisableParallel=true, but while that lowered the occurrence of the bug, it does not eliminate it. Either the flag is not applied consistently everywhere or it is simply not working: the build of the full SDK always fails on ppc64le and often fails on s390x.

The error: « while trying to create a directory, it looks like some syscall (mkdir?) returns EAGAIN/EWOULDBLOCK. This is not expected. » (@tmds) Because this occurs on both ppc64le and s390x, on Alpine pipelines and s390x VM provided by IBM, I don't know to what extent this is filesystem related. It seems more likely to be a mono bug introduced between version 6.0.11 and 7.0.0 of runtime.

Exploring the above, this patch was attempted, but unfortunately there was no positive result.

diff --git a/src/native/libs/System.Native/pal_io.c b/src/native/libs/System.Native/pal_io.c
index d211328f71..fe5ebc25b9 100644
--- a/src/native/libs/System.Native/pal_io.c
+++ b/src/native/libs/System.Native/pal_io.c
@@ -227,7 +227,7 @@ int32_t SystemNative_Stat(const char* path, FileStatus* output)
 {
     struct stat_ result;
     int ret;
-    while ((ret = stat_(path, &result)) < 0 && errno == EINTR);
+    while ((ret = stat_(path, &result)) < 0 && (errno == EINTR || errno == EAGAIN));
 
     if (ret == 0)
     {
@@ -689,7 +689,7 @@ int32_t SystemNative_FcntlGetIsNonBlocking(intptr_t fd, int32_t* isNonBlocking)
 int32_t SystemNative_MkDir(const char* path, int32_t mode)
 {
     int32_t result;
-    while ((result = mkdir(path, (mode_t)mode)) < 0 && errno == EINTR);
+    while ((result = mkdir(path, (mode_t)mode)) < 0 && (errno == EINTR || errno == EAGAIN));
     return result;
 }

@tmds
Member

tmds commented Nov 16, 2022

Have you tried building with mono runtime on x64 (should be possible with dotnet/installer#14792)? Does that also reproduce the issue?

That @omajid and @uweigand have not seen it on RHEL builds suggests it may be related to the Alpine build environment (musl/filesystem/...).

If you can reproduce it with an Alpine mono x64 build, I will try if I can reproduce it on Fedora.

Errno is stored thread local (both unmanaged and managed). Maybe we're picking up someone else's errno.

@jkotas
Member

jkotas commented Dec 12, 2022

The design of the interop marshalling source generators assumes that it is possible to reliably retrieve errno right after PInvoke. We have introduced public APIs to do that in .NET 6 #46843 and shipped with a ton of source generated interop marshalling based on these public APIs in .NET 7. It is too late to change this design.

We had a few bugs where various lazy helpers did not preserve errno correctly (e.g. #75922). This looks like another one of these bugs.
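The assumption described above has a direct analogue in plain C: errno is only meaningful if it is read immediately after the failing call, before anything else on the same thread has a chance to overwrite it. The following is a minimal sketch of that pattern, not code from the runtime; the helper name `mkdir_capture_errno` is hypothetical.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not part of dotnet/runtime): calls mkdir and
 * captures errno right away, before any other library call can run
 * on this thread and overwrite it.
 * Returns mkdir's result; *last_error is 0 on success. */
int mkdir_capture_errno(const char *path, mode_t mode, int *last_error)
{
    int rc = mkdir(path, mode);
    /* Read errno immediately: only valid because nothing ran in between. */
    *last_error = (rc < 0) ? errno : 0;
    return rc;
}
```

If any runtime code runs between the native call and the errno read (as can happen in a pinvoke wrapper), the captured value may no longer belong to the call that failed.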

@ayakael

This comment was marked as resolved.

@vargaz
Contributor

vargaz commented Dec 12, 2022

It might be possible to modify mono to save/restore errno, but there are a lot of random code paths which could get executed between the 2 pinvokes, so this seems like a brittle approach.

@ayakael

This comment was marked as resolved.

@tmds
Member

tmds commented Dec 19, 2022

> It might be possible to modify mono to save/restore errno, but there are a lot of random code paths which could get executed between the 2 pinvokes, so this seems like a brittle approach.

Yes, it requires special care to ensure every path preserves errno.

Anyone source-building .NET using mono runtime is now using the LibraryImportAttribute and mono not preserving errno can be a source of random failures.

@lambdageek
Member

lambdageek commented Dec 19, 2022

Looking at emit_native_wrapper_ilgen, one thing that could be happening between the return of the underlying native function and the return of the managed-to-native wrapper is a call to mono_threads_exit_gc_safe_region_unbalanced to exit the GC Safe mode (GetLastSystemError is marked SuppressGCTransition, so it doesn't have GC transitions, but the normal pinvoked method isn't marked with it).

In mono_threads_exit_gc_safe_region_unbalanced_internal we have

/* Common to use enter/exit gc safe around OS API's affecting last error. */
/* This method can call OS API's that will reset last error on some platforms. */
/* To reduce errors, we need to restore last error before exit gc safe. */
W32_DEFINE_LAST_ERROR_RESTORE_POINT;

Which points to mono/mono#14101 which describes a very similar situation:

Added restore logic to Win32 last error when exiting GC safe mode. This is needed since we have code that depends on GetLastError that could be clobbered if exiting GC safe mode enter a wait. Also added last error restore logic to uninstall interrupt handler since it is commonly used right after exiting GC safe mode.

So it seems likely that when we exit from GC safe mode while there's already a GC underway, we end up blocking which calls some underlying primitive that clobbers errno. (Which explains why it's hard to repro - we need a GC STW to start shortly after the pinvoke starts, before it returns).

Seems likely we need xplat analogues of W32_DEFINE_LAST_ERROR_RESTORE_POINT/W32_RESTORE_LAST_ERROR_FROM_RESTORE_POINT

/fyi @lateralusX
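A cross-platform analogue of the restore-point macros could look roughly like the sketch below. This is an illustration of the idea only, not the change that was eventually made: the macro names `SAVE_ERRNO_POINT` and `RESTORE_ERRNO_POINT` and the stub `wait_for_resume_stub` are hypothetical, and the stub merely simulates a blocking primitive that overwrites errno.

```c
#include <errno.h>

/* Hypothetical macro pair, modeled on the W32_* restore-point idea:
 * remember errno on entry, put it back before returning. */
#define SAVE_ERRNO_POINT    int saved_errno_ = errno
#define RESTORE_ERRNO_POINT errno = saved_errno_

/* Stand-in for a blocking wait that clobbers errno as a side effect.
 * (Assumption: simulates what a real semaphore wait may do.) */
static void wait_for_resume_stub(void)
{
    errno = EAGAIN;
}

/* Without protection, the caller's errno is lost. */
void exit_safe_region_unprotected(void)
{
    wait_for_resume_stub();
}

/* With the restore point, the caller's errno survives the wait. */
void exit_safe_region_protected(void)
{
    SAVE_ERRNO_POINT;
    wait_for_resume_stub();
    RESTORE_ERRNO_POINT;
}
```

The design choice is to make the transition code transparent with respect to errno, so that callers reading errno right after a native call do not need to know a suspension happened in between.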

@lambdageek
Member

lambdageek commented Dec 19, 2022

Interesting thing about musl: sem_wait(sem) is just sem_timedwait(sem, 0), which calls sem_trywait(sem), which fails with EAGAIN if the semaphore value is 0. sem_wait in glibc doesn't look like it does the same kind of delegation, so I don't think it can return EAGAIN from sem_wait. (Update: glibc can return EAGAIN if there's a spurious wakeup, I think; I'm not totally sure.)

mono_thread_info_wait_for_resume (called when we exit GC Safe mode and have to wait for the runtime to resume) just calls sem_wait on Linux.

So this is more evidence that the exit from GC safe mode is responsible for the clobbered errno.
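The clobbering effect of a non-blocking semaphore poll can be observed in isolation. The sketch below assumes a platform with unnamed POSIX semaphores (for example Linux with glibc or musl); the function name `errno_after_empty_trywait` is hypothetical. It relies only on the POSIX-specified behaviour that sem_trywait on a semaphore with value 0 fails with errno set to EAGAIN, and makes no claim about how any particular libc implements sem_wait.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical demo helper: sets errno to `before`, polls a semaphore
 * whose value is 0, and returns the errno value seen afterwards.
 * Returns -1 if the semaphore cannot be created or the poll
 * unexpectedly succeeds. */
int errno_after_empty_trywait(int before)
{
    sem_t sem;
    if (sem_init(&sem, 0, 0) != 0)   /* process-private, initial value 0 */
        return -1;

    errno = before;                  /* pretend an earlier call failed */
    int rc = sem_trywait(&sem);      /* fails: nothing to decrement */
    int after = errno;               /* capture before sem_destroy runs */

    sem_destroy(&sem);
    return (rc == -1) ? after : -1;
}
```

Calling `errno_after_empty_trywait(ENOENT)` yields EAGAIN rather than ENOENT, which matches the symptom in this issue: an unrelated EAGAIN showing up where a file-system error code was expected.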

@lambdageek lambdageek added area-VM-threading-mono and removed area-Infrastructure untriaged New issue has not been triaged by the area owner labels Dec 20, 2022
@lambdageek lambdageek added this to the 8.0.0 milestone Dec 20, 2022
@lambdageek lambdageek self-assigned this Dec 20, 2022
lambdageek added a commit to lambdageek/runtime that referenced this issue Dec 20, 2022
We already save/restore GetLastError on win32.  Do it on posix
platforms, too.

If one thread is in a pinvoke wrapper, while another thread triggers a
STW, the pinvoke wrapper will self-suspend the thread and wait for a
notification to resume.  Depending on the platform we can use win32
primitives, Mach semaphores or POSIX semaphores.  win32 and posix can
both change the value of last error (errno, respectively) while the
thread is suspended.

That means that code like this (generated by the
LibraryImportAttribute source generator) cannot reliably retrieve the
error from the last pinvoke:

```csharp
__retVal = __PInvoke(__path_native, mode); // there is a pinvoke wrapper here, that transitions from GC Safe to GC Unsafe mode
__lastError = System.Runtime.InteropServices.Marshal.GetLastSystemError();
```

The solution is to explicitly preserve the value of GetLastError/errno
when exiting from GC Safe.

Fixes dotnet#77364
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 20, 2022
@lambdageek
Member

@ayakael would you be able to try a backport of #79856 in your environment to see if it fixes the issue?

@ayakael
Contributor Author

ayakael commented Dec 23, 2022

> @ayakael would you be able to try a backport of #79856 in your environment to see if it fixes the issue?

I tested your fix against x64 and s390x with positive results. Seems to be fixed, thanks!

@ayakael
Contributor Author

ayakael commented Dec 23, 2022

Ah well this is funny - just as I sent the last comment, my build of roslyn on s390x failed with the error. Attached is a binlog:
dotnet7-stage0-roslyn.binlog.log

Edit: Redoing tests to cover ppc64le here: https://gitlab.alpinelinux.org/ayakael/aports/-/merge_requests/12
Edit: Further tests show that this doesn't completely fix x64 either. Attached binlog for x64:
dotnet7-stage0-roslyn-x64.binlog.log

@richlander
Member

richlander commented Feb 8, 2023

This appears to be the best summary of the issue: #77364 (comment)

@ViktorHofer looks like there is a binlog to look at.

FYI @marek-safar

@ViktorHofer
Member

> @ViktorHofer looks like there is a binlog to look at.

The binlog unfortunately doesn't tell where the double write / lock comes from:

/var/build/dotnet7/testing/dotnet7-stage0/src/produced-bootstrap/sdk/7.0.100/NuGet.targets(132,5): The process cannot access the file '/var/build/dotnet7/testing/dotnet7-stage0/src/produced-nuget/microsoft.codeanalysis.publicapianalyzers/3.3.4-beta1.22204.1/build/config' because it is being used by another process. [/var/build/dotnet7/testing/dotnet7-stage0/src/dotnet-v7.0.100-rtm.22521.12/src/roslyn/Compilers.sln]

@tmds
Member

tmds commented Feb 8, 2023

> Ah well this is funny - just as I sent the last comment, my build of roslyn on s390x failed with the error. Attached is a binlog:
> dotnet7-stage0-roslyn.binlog.log
> Edit: Redoing tests to cover ppc64le here: https://gitlab.alpinelinux.org/ayakael/aports/-/merge_requests/12
> Edit: Further tests show that this doesn't completely fix x64 either. Attached binlog for x64:
> dotnet7-stage0-roslyn-x64.binlog.log

If the issue occurs less often after applying #79856, we could assume that it fixes a case where errno gets clobbered.

The failures observed after applying the change could be another case to be handled (or even a different bug).

@ayakael did you see a clear improvement after applying #79856?

@ayakael
Contributor Author

ayakael commented Mar 1, 2023

> > Ah well this is funny - just as I sent the last comment, my build of roslyn on s390x failed with the error. Attached is a binlog:
> > dotnet7-stage0-roslyn.binlog.log
> > Edit: Redoing tests to cover ppc64le here: https://gitlab.alpinelinux.org/ayakael/aports/-/merge_requests/12
> > Edit: Further tests show that this doesn't completely fix x64 either. Attached binlog for x64:
> > dotnet7-stage0-roslyn-x64.binlog.log
>
> If the issue occurs less often after applying #79856, we could assume that it fixes a case where errno gets clobbered.
>
> The failures observed after applying the change could be another case to be handled (or even a different bug).
>
> @ayakael did you see a clear improvement after applying #79856?

Apologies for the late reply, I wanted to package dotnet8-preview1 to test. I've tested on linux-musl-s390x and linux-musl-ppc64le after applying the fix on dotnet8, and the bug seems to be fixed. dotnet7 still has restore failures despite applying it. Might I be missing a backport of another bug fix?

lambdageek added a commit that referenced this issue Jul 12, 2023
* [threads] Save errno when using posix semaphores for thread transitions

   We already save/restore GetLastError on win32.  Do it on posix platforms, too.

   If one thread is in a pinvoke wrapper, while another thread triggers a STW, the pinvoke wrapper will self-suspend the thread and wait for a notification to resume.  Depending on the platform we can use win32 primitives, Mach semaphores or POSIX semaphores.  win32 and posix can both change the value of last error (errno, respectively) while the thread is suspended.

   That means that code like this (generated by the LibraryImportAttribute source generator) cannot reliably retrieve the error from the last pinvoke:

   ```csharp
   __retVal = __PInvoke(__path_native, mode); // there is a pinvoke wrapper here, that transitions from GC Safe to GC Unsafe mode
   __lastError = System.Runtime.InteropServices.Marshal.GetLastSystemError();
   ```

   The solution is to explicitly preserve the value of GetLastError/errno when exiting from GC Safe.

   Fixes #77364

* rename W32_DEFINE_LAST_ERROR_RESTORE_POINT to MONO_DEFINE_LAST_ERROR_RESTORE_POINT

   and W32_RESTORE_LAST_ERROR_FROM_RESTORE_POINT to MONO_RESTORE_LAST_ERROR_FROM_RESTORE_POINT
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 12, 2023
@ayakael
Contributor Author

ayakael commented Aug 11, 2023

It seems like there's a regression. I tested preview.7 on ppc64le, and still occasionally get this issue. Should I open a new issue?

@dotnet dotnet locked as resolved and limited conversation to collaborators Sep 10, 2023