Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't inline expand boxes with many GC pointers #101669

Closed
wants to merge 1 commit into from

Conversation

MichalStrehovsky
Copy link
Member

No description provided.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 29, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@MichalStrehovsky
Copy link
Member Author

/benchmark plaintext,json,fortunes aspnet-citrine-win runtime,libs

Copy link

pr-benchmarks bot commented Apr 29, 2024

Benchmark started for plaintext, json, fortunes on aspnet-citrine-win with runtime, libs. Logs: link

Copy link

pr-benchmarks bot commented Apr 29, 2024

plaintext - aspnet-citrine-win

application plaintext.base plaintext.pr
Max CPU Usage (%) 80 70 -12.50%
Max Cores usage (%) 2,233 1,961 -12.18%
Max Working Set (MB) 72 71 -1.39%
Max Private Memory (MB) 94 94 0.00%
Build Time (ms) 8,219 6,994 -14.90%
Start Time (ms) 358 352 -1.68%
Published Size (KB) 105,748 105,748 0.00%
Symbols Size (KB) 54 54 0.00%
.NET Core SDK Version 9.0.100-preview.5.24227.1 9.0.100-preview.5.24227.1
load plaintext.base plaintext.pr
Max CPU Usage (%) 83 82 -1.20%
Max Cores usage (%) 2,328 2,305 -0.99%
Max Working Set (MB) 46 46 0.00%
Max Private Memory (MB) 370 370 0.00%
Build Time (ms) 6,432 5,524 -14.12%
Start Time (ms) 0 0
Published Size (KB) 72,259 72,259 0.00%
Symbols Size (KB) 0 0
.NET Core SDK Version 8.0.204 8.0.204
First Request (ms) 90 89 -1.11%
Requests/sec 9,995,353 9,944,556 -0.51%
Requests 150,903,360 150,159,616 -0.49%
Mean latency (ms) 15.58 17.14 +10.01%
Max latency (ms) 551.67 442.85 -19.73%
Bad responses 0 0
Socket errors 0 0
Read throughput (MB/s) 1,198.08 1,198.08 0.00%
Latency 50th (ms) 0.90 0.90 0.00%
Latency 75th (ms) 8.77 8.99 +2.51%
Latency 90th (ms) 51.77 56.33 +8.81%
Latency 99th (ms) 177.66 214.30 +20.62%

json - aspnet-citrine-win

application json.base json.pr
Max CPU Usage (%) 77 79 +2.60%
Max Cores usage (%) 2,160 2,203 +1.99%
Max Working Set (MB) 61 62 +1.64%
Max Private Memory (MB) 83 84 +1.20%
Build Time (ms) 3,563 3,604 +1.15%
Start Time (ms) 362 353 -2.49%
Published Size (KB) 105,748 105,748 0.00%
Symbols Size (KB) 54 54 0.00%
.NET Core SDK Version 9.0.100-preview.5.24227.1 9.0.100-preview.5.24227.1
load json.base json.pr
Max CPU Usage (%) 78 78 0.00%
Max Cores usage (%) 2,185 2,176 -0.41%
Max Working Set (MB) 46 46 0.00%
Max Private Memory (MB) 363 363 0.00%
Build Time (ms) 3,288 3,329 +1.25%
Start Time (ms) 0 0
Published Size (KB) 72,259 72,259 0.00%
Symbols Size (KB) 0 0
.NET Core SDK Version 8.0.204 8.0.204
First Request (ms) 124 119 -4.03%
Requests/sec 1,163,836 1,148,431 -1.32%
Requests 17,573,830 17,341,261 -1.32%
Mean latency (ms) 2.39 2.34 -2.09%
Max latency (ms) 104.85 104.56 -0.28%
Bad responses 0 0
Socket errors 0 0
Read throughput (MB/s) 162.05 159.90 -1.33%
Latency 50th (ms) 0.28 0.29 +1.05%
Latency 75th (ms) 0.98 1.02 +4.08%
Latency 90th (ms) 7.82 7.36 -5.88%
Latency 99th (ms) 27.17 27.34 +0.63%

fortunes - aspnet-citrine-win

db fortunes.base fortunes.pr
Max CPU Usage (%) 31 32 +3.23%
Max Cores usage (%) 873 886 +1.49%
Max Working Set (MB) 53 53 0.00%
Build Time (ms) 1,194 1,185 -0.75%
Start Time (ms) 1,194 1,198 +0.34%
Published Size (KB) 421,392 421,392 0.00%
application fortunes.base fortunes.pr
Max CPU Usage (%) 89 88 -1.12%
Max Cores usage (%) 2,500 2,452 -1.92%
Max Working Set (MB) 530 561 +5.85%
Max Private Memory (MB) 540 589 +9.07%
Build Time (ms) 3,703 3,837 +3.62%
Start Time (ms) 2,244 2,136 -4.81%
Published Size (KB) 105,755 105,755 0.00%
Symbols Size (KB) 55 55 0.00%
.NET Core SDK Version 9.0.100-preview.5.24227.1 9.0.100-preview.5.24227.1
load fortunes.base fortunes.pr
Max CPU Usage (%) 39 38 -2.56%
Max Cores usage (%) 1,081 1,069 -1.11%
Max Working Set (MB) 46 46 0.00%
Max Private Memory (MB) 363 363 0.00%
Build Time (ms) 3,397 3,423 +0.77%
Start Time (ms) 0 0
Published Size (KB) 72,259 72,259 0.00%
Symbols Size (KB) 0 0
.NET Core SDK Version 8.0.204 8.0.204
First Request (ms) 106 105 -0.94%
Requests/sec 439,111 436,789 -0.53%
Requests 6,630,672 6,595,331 -0.53%
Mean latency (ms) 1.51 1.49 -1.32%
Max latency (ms) 40.65 24.59 -39.51%
Bad responses 0 0
Socket errors 0 0
Read throughput (MB/s) 577.48 574.43 -0.53%
Latency 50th (ms) 1.06 1.07 +0.94%
Latency 75th (ms) 1.32 1.32 0.00%
Latency 90th (ms) 1.93 1.92 -0.52%
Latency 99th (ms) 11.57 11.78 +1.82%

// After a certain number of GC pointers, the write barriers used
// in inline expansion stop being profitable.
ClassLayout* layout = typGetObjLayout(pResolvedToken->hClass);
if (layout->GetGCPtrCount() > 3)
Copy link
Member

@EgorBo EgorBo Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this might need more work, e.g. when we box a struct with 4 fields but all of them are null or e.g. nongc strings - this will be a regression. Also, I suspect this might ruin some "optimize boxing" optimizations.

I think we need to either fix all those places to deal with a helper call or we need to emit a bulk copy in codegen, just like I did in #99096 (but to do it for batched-copy, not batched-precise-barrier).

@jkotas
Copy link
Member

jkotas commented Apr 29, 2024

We have optimized JIT_Box helper variants only on Windows:

#ifdef TARGET_UNIX
SetJitHelperFunction(CORINFO_HELP_NEWSFAST, JIT_NewS_MP_FastPortable);
SetJitHelperFunction(CORINFO_HELP_NEWSFAST_ALIGN8, JIT_NewS_MP_FastPortable);
SetJitHelperFunction(CORINFO_HELP_NEWARR_1_VC, JIT_NewArr1VC_MP_FastPortable);
SetJitHelperFunction(CORINFO_HELP_NEWARR_1_OBJ, JIT_NewArr1OBJ_MP_FastPortable);
ECall::DynamicallyAssignFCallImpl(GetEEFuncEntryPoint(AllocateString_MP_FastPortable), ECall::FastAllocateString);
#else // TARGET_UNIX
// if (multi-proc || server GC)
if (GCHeapUtilities::UseThreadAllocationContexts())
{
SetJitHelperFunction(CORINFO_HELP_NEWSFAST, JIT_TrialAllocSFastMP_InlineGetThread);
SetJitHelperFunction(CORINFO_HELP_NEWSFAST_ALIGN8, JIT_TrialAllocSFastMP_InlineGetThread);
SetJitHelperFunction(CORINFO_HELP_BOX, JIT_BoxFastMP_InlineGetThread);
. We would need to implemented the optimized non-Windows JIT_Box helper variants on non-Windows too to make this optimization work well, using the same strategy as the optimized non-Windows allocation helpers are implemented.

@MichalStrehovsky
Copy link
Member Author

We would need to implemented the optimized non-Windows JIT_Box helper variants on non-Windows too to make this optimization work well, using the same strategy as the optimized non-Windows allocation helpers are implemented.

For ContainsPointers the optimized assembly helper needs to thunk out to C++ anyway so the optimization might not amount to much for the relevant cases. I'll add this to the issue. Not actually planning to work on this based on Egor's comments.

@EgorBo
Copy link
Member

EgorBo commented Apr 29, 2024

Not actually planning to work on this based on Egor's comments.

I can try myself, but I think it should be more or less trivial, you just need to introduce a new helper (where you'll just do memcpyGC) and call it in genCodeForCpObj (so the actuall allocation for the box will be handled efficiently via the NEWS helper as is).

@MichalStrehovsky
Copy link
Member Author

I can try myself, but I think it should be more or less trivial, you just need to introduce a new helper

If we want to keep the NEWS, we might just be able to use the CORINFO_HELP_ASSIGN_STRUCT to do the copying.

@EgorBo
Copy link
Member

EgorBo commented Apr 29, 2024

I can try myself, but I think it should be more or less trivial, you just need to introduce a new helper

If we want to keep the NEWS, we might just be able to use the CORINFO_HELP_ASSIGN_STRUCT to do the copying.

Yep, that's what I meant, the helper is completely unused/unimplemented, so at least we won't have to bump R2R format 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants