[mono] Interpret static constructors without a runtime invoke wrapper #101324

kg · 2024-04-20T03:26:41Z

Right now when mono needs to run a static constructor it does so through the generic runtime invoke path, which (for the interpreter at least) relies on a dynamically generated invoke wrapper to do things like argument/return value marshaling and exception handling.

cctors don't have arguments, a this-reference, or return values, so those parts of the wrapper aren't useful. And they shouldn't throw, so it isn't necessary for their exception handling to be optimized, as long as it works.

This PR is an attempt at bypassing the wrapper when invoking a cctor from inside the interpreter. It seems to work, and based on my initial profiling it makes startup measurably faster. For my test app we invoke 50 cctors during startup, and each cctor has a wrapper with 112 bytes of IL in it, so it makes sense that it would deliver a speedup.

cc @BrzVlad @kotlarmilos is this a horrible idea? Ideas on how to do this better? Should I remove the cctor name check and do this for any static no-args-no-result method?

…invoke wrapper

kg · 2024-04-21T23:50:19Z

Exception handling in here is broken as I expected:


[04:54:09] info: 04:54:09.417 Running test: JIT/Methodical/eh/basics/throwinclassconstructor_d/throwinclassconstructor_d.dll
[04:54:09] info:  try
[04:54:09] info: 	 try
[04:54:09] info: 	 finally
[04:54:09] info: interp_runtime_invoke_cctor reporting exception
[04:54:09] info: 	 finally
[04:54:09] info:  catch
[04:54:09] info: 
[04:54:09] info: FAILED!
[04:54:09] info: 
[04:54:09] info: [EXPECTED OUTPUT]
[04:54:09] info:  try
[04:54:09] info: 	 try
[04:54:09] info: 	 finally
[04:54:09] info:  catch
[04:54:09] info: 
[04:54:09] info: [DIFF RESULT]
[04:54:09] info: <  catch
[04:54:09] info: ---
[04:54:09] info: > 	 finally
[04:54:09] info: 
[04:54:09] info: Xunit.Sdk.EqualException: Assert.Equal() Failure: Values differ
[04:54:09] info: Expected: 100
[04:54:09] info: Actual:   1
[04:54:09] info:    at Xunit.Assert.Equal[Int32](Int32 expected, Int32 actual, IEqualityComparer`1 comparer)
[04:54:09] info:    at Xunit.Assert.Equal[Int32](Int32 expected, Int32 actual)
[04:54:09] info:    at Program.<>c__DisplayClass0_0.<<Main>$>g__TestExecutor101|102(TestFilter filter, StreamWriter tempLogSw, StreamWriter statsCsvSw)
[04:54:09] info: 04:54:09.431 Failed test: JIT/Methodical/eh/basics/throwinclassconstructor_d/throwinclassconstructor_d.dll

kotlarmilos · 2024-04-22T08:39:10Z

Ideas on how to do this better?

I believe this is a good idea but I may lack more context.

Should I remove the cctor name check and do this for any static no-args-no-result method?

Could this affect inlining if a wrapper is omitted? Additionally, how does this affect wrappers for P/Invoke functions?

kg · 2024-04-23T03:17:23Z

Ideas on how to do this better?

I believe this is a good idea but I may lack more context.

Should I remove the cctor name check and do this for any static no-args-no-result method?

Could this affect inlining if a wrapper is omitted? Additionally, how does this affect wrappers for P/Invoke functions?

Wrappers are always optimized, so potentially methods were being inlined into the wrapper, and then their call targets were getting inlined. With this change, there will be no wrapper and the call target (the cctor) won't automatically be optimized, so its call targets won't get inlined. But I believe in most cases the wrapper's call target wouldn't have gotten inlined to begin with given our default inlining rules. Vlad would probably know better.

lambdageek · 2024-04-23T18:37:30Z

Should I remove the cctor name check and do this for any static no-args-no-result method?

Why does it have to be no-args-no-result? In essence you're making a specialized entrypoint into the interpreter that adjusts from the calling convention of interp_runtime_invoke (args on the native stack, callee-allocated return value, out-arg exception to the calling convention of the interpreter: (args on the interp stack, return on the interp stack, exceptions in the interpreter state (not sure about this)). In principle we can do all the work of placing the arguments correctly based on the MonoMethodSignature, no need to go through the IL wrapper.

One other thing I'm unclear about is whether we can do this during unwinding

lambdageek · 2024-04-23T18:20:54Z

src/mono/mono/mini/interp/interp.c

+	ThreadContext *context = get_context ();
+	stackval *sp = (stackval*)context->stack_pointer;
+	InterpMethod *imethod = mono_interp_get_imethod (method);
+	g_assert (!imethod->transformed);


cctors shouldn't run twice, and I put that there to verify that I understood how all of this code flows

lewing · 2024-05-14T15:53:29Z

@BrzVlad can you take a look

BrzVlad · 2024-05-15T08:23:47Z

Seems that this PR has broken handling of exceptions in cctors, so doesn't look like we can go with this approach. Also I would imagine the wrapper adds very few instructions to be executed while being shared for all cctor invocations. It is unclear to me why there would be a speedup. 50 cctors is not that much, should only amount to a few hundred interp instructions additionally executed.

kg · 2024-05-15T09:39:05Z

Seems that this PR has broken handling of exceptions in cctors, so doesn't look like we can go with this approach. Also I would imagine the wrapper adds very few instructions to be executed while being shared for all cctor invocations. It is unclear to me why there would be a speedup. 50 cctors is not that much, should only amount to a few hundred interp instructions additionally executed.

In my testing, the wrapper is not shared across all cctors, we compile one new one for each type. Does that sound like a bug, then? It looked like we're doing it intentionally for some reason, but the comment didn't explain why.

My hope was that we could make the exception handling work, it sounds like you think that isn't possible. I don't know how to fix it.

In my profiling so far, we spend 50-70% of startup time underneath do_transform_method (this includes execution of cctors and vtable setup, though), so my goal was to cut down the number of methods we generate as much as possible, and cctor wrappers were a good target since they're optimized (which makes them more expensive to generate). This motivated disabling of inlining for exception ctors since those showed up in wrappers. The cost of executing the instructions doesn't seem to be important.

BrzVlad · 2024-05-15T10:08:07Z

This suggests that the startup problem with cctors is not that we are going through a wrapper (that I understand should just be a try/catch negligible in overhead) but rather that we compile a new wrapper for each invocation. I don't remember the specifics but mono_marshal_get_runtime_invoke_full receives a need_direct_wrapper. If this is false then we cache wrappers based on signature. I think it is worthwhile to investigate whether we can pass false here instead.

kg · 2024-05-15T16:35:52Z

This suggests that the startup problem with cctors is not that we are going through a wrapper (that I understand should just be a try/catch negligible in overhead) but rather that we compile a new wrapper for each invocation. I don't remember the specifics but mono_marshal_get_runtime_invoke_full receives a need_direct_wrapper. If this is false then we cache wrappers based on signature. I think it is worthwhile to investigate whether we can pass false here instead.

I tried:

	if (method->flags & METHOD_ATTRIBUTE_PINVOKE_IMPL)
		target_method = mono_marshal_get_native_wrapper (target_method, FALSE, FALSE);
	else if ((sig->param_count == 0) && !sig->hasthis && (sig->ret == mono_get_void_type ()))
		need_direct_wrapper = FALSE;
	MonoMethod *invoke_wrapper = mono_marshal_get_runtime_invoke_full (target_method, FALSE, need_direct_wrapper);

where the need_direct_wrapper = FALSE is the narrowest change I could come up with, and it causes the invokes to fail with an invalid function pointer. I'm guessing I need to do something to account for a different calling convention for non-direct wrappers?

Maybe I'm changing the wrong code.

kg · 2024-05-16T16:16:18Z

From more investigation, what appears to be happening in this PR is that the unwinder is unwinding out of the cctor when it throws, into the closest managed frame. If there is some way to disable unwinding in this scenario, everything should work.

BrzVlad · 2024-05-16T17:06:38Z

We can't really catch an exception from native code. We need some sort of managed wrapper with a try/catch. I don't see any clean way around it.

kg · 2024-05-16T17:16:54Z

We can't really catch an exception from native code. We need some sort of managed wrapper with a try/catch. I don't see any clean way around it.

Makes sense. Do you have any theories for why it needs a direct wrapper? I've tried a bunch of stuff and it's not working. I'm not really sure where to look to find an example of doing this correctly and it seems like the calling convention should be the same?

Add fast path for interpreting static constructors without a runtime …

bbd4e62

…invoke wrapper

kg added the area-Codegen-Interpreter-mono label Apr 20, 2024

dotnet-policy-service bot assigned kg Apr 20, 2024

build-analysis bot mentioned this pull request Apr 20, 2024

Timeout in System.Text.Json.Tests.Utf8JsonWriterTests.WriteNumbers #101193

Open

lambdageek reviewed Apr 23, 2024

View reviewed changes

kg marked this pull request as ready for review April 26, 2024 18:35

kg requested review from BrzVlad and kotlarmilos as code owners April 26, 2024 18:35

kg closed this May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[mono] Interpret static constructors without a runtime invoke wrapper #101324

[mono] Interpret static constructors without a runtime invoke wrapper #101324

kg commented Apr 20, 2024

kg commented Apr 21, 2024

kotlarmilos commented Apr 22, 2024

kg commented Apr 23, 2024

lambdageek commented Apr 23, 2024

lambdageek Apr 23, 2024

kg Apr 23, 2024

lewing commented May 14, 2024

BrzVlad commented May 15, 2024

kg commented May 15, 2024

BrzVlad commented May 15, 2024 •

edited

kg commented May 15, 2024

kg commented May 16, 2024 •

edited

BrzVlad commented May 16, 2024

kg commented May 16, 2024

[mono] Interpret static constructors without a runtime invoke wrapper #101324

[mono] Interpret static constructors without a runtime invoke wrapper #101324

Conversation

kg commented Apr 20, 2024

kg commented Apr 21, 2024

kotlarmilos commented Apr 22, 2024

kg commented Apr 23, 2024

lambdageek commented Apr 23, 2024

lambdageek Apr 23, 2024

Choose a reason for hiding this comment

kg Apr 23, 2024

Choose a reason for hiding this comment

lewing commented May 14, 2024

BrzVlad commented May 15, 2024

kg commented May 15, 2024

BrzVlad commented May 15, 2024 • edited

kg commented May 15, 2024

kg commented May 16, 2024 • edited

BrzVlad commented May 16, 2024

kg commented May 16, 2024

BrzVlad commented May 15, 2024 •

edited

kg commented May 16, 2024 •

edited