Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono] Interpret static constructors without a runtime invoke wrapper #101324

Closed
wants to merge 1 commit into from

Conversation

kg
Copy link
Contributor

@kg kg commented Apr 20, 2024

Right now when mono needs to run a static constructor it does so through the generic runtime invoke path, which (for the interpreter at least) relies on a dynamically generated invoke wrapper to do things like argument/return value marshaling and exception handling.

cctors don't have arguments, a this-reference, or return values, so those parts of the wrapper aren't useful. And they shouldn't throw, so it isn't necessary for their exception handling to be optimized, as long as it works.

This PR is an attempt at bypassing the wrapper when invoking a cctor from inside the interpreter. It seems to work, and based on my initial profiling it makes startup measurably faster. For my test app we invoke 50 cctors during startup, and each cctor has a wrapper with 112 bytes of IL in it, so it makes sense that it would deliver a speedup.

cc @BrzVlad @kotlarmilos is this a horrible idea? Ideas on how to do this better? Should I remove the cctor name check and do this for any static no-args-no-result method?

@kg
Copy link
Contributor Author

kg commented Apr 21, 2024

Exception handling in here is broken as I expected:


[04:54:09] info: 04:54:09.417 Running test: JIT/Methodical/eh/basics/throwinclassconstructor_d/throwinclassconstructor_d.dll
[04:54:09] info:  try
[04:54:09] info: 	 try
[04:54:09] info: 	 finally
[04:54:09] info: interp_runtime_invoke_cctor reporting exception
[04:54:09] info: 	 finally
[04:54:09] info:  catch
[04:54:09] info: 
[04:54:09] info: FAILED!
[04:54:09] info: 
[04:54:09] info: [EXPECTED OUTPUT]
[04:54:09] info:  try
[04:54:09] info: 	 try
[04:54:09] info: 	 finally
[04:54:09] info:  catch
[04:54:09] info: 
[04:54:09] info: [DIFF RESULT]
[04:54:09] info: <  catch
[04:54:09] info: ---
[04:54:09] info: > 	 finally
[04:54:09] info: 
[04:54:09] info: Xunit.Sdk.EqualException: Assert.Equal() Failure: Values differ
[04:54:09] info: Expected: 100
[04:54:09] info: Actual:   1
[04:54:09] info:    at Xunit.Assert.Equal[Int32](Int32 expected, Int32 actual, IEqualityComparer`1 comparer)
[04:54:09] info:    at Xunit.Assert.Equal[Int32](Int32 expected, Int32 actual)
[04:54:09] info:    at Program.<>c__DisplayClass0_0.<<Main>$>g__TestExecutor101|102(TestFilter filter, StreamWriter tempLogSw, StreamWriter statsCsvSw)
[04:54:09] info: 04:54:09.431 Failed test: JIT/Methodical/eh/basics/throwinclassconstructor_d/throwinclassconstructor_d.dll

@kotlarmilos
Copy link
Member

Ideas on how to do this better?

I believe this is a good idea but I may lack more context.

Should I remove the cctor name check and do this for any static no-args-no-result method?

Could this affect inlining if a wrapper is omitted? Additionally, how does this affect wrappers for P/Invoke functions?

@kg
Copy link
Contributor Author

kg commented Apr 23, 2024

Ideas on how to do this better?

I believe this is a good idea but I may lack more context.

Should I remove the cctor name check and do this for any static no-args-no-result method?

Could this affect inlining if a wrapper is omitted? Additionally, how does this affect wrappers for P/Invoke functions?

Wrappers are always optimized, so potentially methods were being inlined into the wrapper, and then their call targets were getting inlined. With this change, there will be no wrapper and the call target (the cctor) won't automatically be optimized, so its call targets won't get inlined. But I believe in most cases the wrapper's call target wouldn't have gotten inlined to begin with given our default inlining rules. Vlad would probably know better.

@lambdageek
Copy link
Member

Should I remove the cctor name check and do this for any static no-args-no-result method?

Why does it have to be no-args-no-result? In essence you're making a specialized entrypoint into the interpreter that adjusts from the calling convention of interp_runtime_invoke (args on the native stack, callee-allocated return value, out-arg exception to the calling convention of the interpreter: (args on the interp stack, return on the interp stack, exceptions in the interpreter state (not sure about this)). In principle we can do all the work of placing the arguments correctly based on the MonoMethodSignature, no need to go through the IL wrapper.

One other thing I'm unclear about is whether we can do this during unwinding

ThreadContext *context = get_context ();
stackval *sp = (stackval*)context->stack_pointer;
InterpMethod *imethod = mono_interp_get_imethod (method);
g_assert (!imethod->transformed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cctors shouldn't run twice, and I put that there to verify that I understood how all of this code flows

@kg kg marked this pull request as ready for review April 26, 2024 18:35
@lewing
Copy link
Member

lewing commented May 14, 2024

@BrzVlad can you take a look

@BrzVlad
Copy link
Member

BrzVlad commented May 15, 2024

Seems that this PR has broken handling of exceptions in cctors, so doesn't look like we can go with this approach. Also I would imagine the wrapper adds very few instructions to be executed while being shared for all cctor invocations. It is unclear to me why there would be a speedup. 50 cctors is not that much, should only amount to a few hundred interp instructions additionally executed.

@kg
Copy link
Contributor Author

kg commented May 15, 2024

Seems that this PR has broken handling of exceptions in cctors, so doesn't look like we can go with this approach. Also I would imagine the wrapper adds very few instructions to be executed while being shared for all cctor invocations. It is unclear to me why there would be a speedup. 50 cctors is not that much, should only amount to a few hundred interp instructions additionally executed.

In my testing, the wrapper is not shared across all cctors, we compile one new one for each type. Does that sound like a bug, then? It looked like we're doing it intentionally for some reason, but the comment didn't explain why.

My hope was that we could make the exception handling work, it sounds like you think that isn't possible. I don't know how to fix it.

In my profiling so far, we spend 50-70% of startup time underneath do_transform_method (this includes execution of cctors and vtable setup, though), so my goal was to cut down the number of methods we generate as much as possible, and cctor wrappers were a good target since they're optimized (which makes them more expensive to generate). This motivated disabling of inlining for exception ctors since those showed up in wrappers. The cost of executing the instructions doesn't seem to be important.

@BrzVlad
Copy link
Member

BrzVlad commented May 15, 2024

This suggests that the startup problem with cctors is not that we are going through a wrapper (that I understand should just be a try/catch negligible in overhead) but rather that we compile a new wrapper for each invocation. I don't remember the specifics but mono_marshal_get_runtime_invoke_full receives a need_direct_wrapper. If this is false then we cache wrappers based on signature. I think it is worthwhile to investigate whether we can pass false here instead.

@kg
Copy link
Contributor Author

kg commented May 15, 2024

This suggests that the startup problem with cctors is not that we are going through a wrapper (that I understand should just be a try/catch negligible in overhead) but rather that we compile a new wrapper for each invocation. I don't remember the specifics but mono_marshal_get_runtime_invoke_full receives a need_direct_wrapper. If this is false then we cache wrappers based on signature. I think it is worthwhile to investigate whether we can pass false here instead.

I tried:

	if (method->flags & METHOD_ATTRIBUTE_PINVOKE_IMPL)
		target_method = mono_marshal_get_native_wrapper (target_method, FALSE, FALSE);
	else if ((sig->param_count == 0) && !sig->hasthis && (sig->ret == mono_get_void_type ()))
		need_direct_wrapper = FALSE;
	MonoMethod *invoke_wrapper = mono_marshal_get_runtime_invoke_full (target_method, FALSE, need_direct_wrapper);

where the need_direct_wrapper = FALSE is the narrowest change I could come up with, and it causes the invokes to fail with an invalid function pointer. I'm guessing I need to do something to account for a different calling convention for non-direct wrappers?

Maybe I'm changing the wrong code.

@kg
Copy link
Contributor Author

kg commented May 16, 2024

From more investigation, what appears to be happening in this PR is that the unwinder is unwinding out of the cctor when it throws, into the closest managed frame. If there is some way to disable unwinding in this scenario, everything should work.

@BrzVlad
Copy link
Member

BrzVlad commented May 16, 2024

We can't really catch an exception from native code. We need some sort of managed wrapper with a try/catch. I don't see any clean way around it.

@kg
Copy link
Contributor Author

kg commented May 16, 2024

We can't really catch an exception from native code. We need some sort of managed wrapper with a try/catch. I don't see any clean way around it.

Makes sense. Do you have any theories for why it needs a direct wrapper? I've tried a bunch of stuff and it's not working. I'm not really sure where to look to find an example of doing this correctly and it seems like the calling convention should be the same?

@kg kg closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants