JObject.implement deadlocks #1908
Comments
Note that I'm having same issue with using ffi with |
Yes, that's why we didn't do this before. Maybe we can only do this when we detect that we're on the main isolate of a Flutter application where the thread is indeed pinned. cc @dcharkes @liamappelbe @mkustermann for ideas. |
Can you elaborate on this? Are you seeing a deadlock with
IIUC, jnigen does blocking callbacks similarly to ffigen, and there are 2 code paths. When the callback is coming from a random thread, it sends a message to the target isolate and waits for a reply. When the callback is coming from the same thread as the target isolate, the callback is invoked synchronously. And it sounds like the issue here is that the check that decides which code path to take is a bit unreliable on flutter. jnigen is using
Another option would be to discard the message sending code path entirely, and just enter the target isolate and invoke the callback synchronously. In fact, this is one of the
|
Apologies for confusion, perhaps shouldn't have mixed these under same issue. The

```c
// This works because dart_ffi_callback is called while the isolate is active.
void dart_ffi_callback(void (*isolate_local_trampoline)(void)) {
  isolate_local_trampoline();
}

// This doesn't work, even though the trampoline is invoked on the same thread,
// because the trampoline is invoked while pumping the dispatch queue and the
// isolate is no longer active.
void dart_ffi_callback(void (*isolate_local_trampoline)(void)) {
  dispatch_async(dispatch_get_main_queue(), ^{
    // Same thread, fails.
    isolate_local_trampoline();
  });
}

// This works again. This could be done automatically by the trampoline if we
// saved the thread ID with the callback metadata, but it might be the wrong
// thing to do if we don't know that the isolate is always running on a
// particular thread (i.e. like the Flutter UI thread).
void dart_ffi_callback(void (*isolate_local_trampoline)(void)) {
  Dart_Isolate isolate = Dart_CurrentIsolate_DL();
  dispatch_async(dispatch_get_main_queue(), ^{
    Dart_EnterIsolate_DL(isolate);
    isolate_local_trampoline();
    // Dart_ExitIsolate takes no arguments; it exits the current isolate.
    Dart_ExitIsolate_DL();
  });
}
```

As far as I can tell, unlike jnigen, dart ffi trampolines never block? |
@liamappelbe is in the process of designing a solution for this. |
Since it's going to take a while until the fix lands on Flutter stable, I'll use a workaround of only calling |
Which fix is meant to land on flutter stable? |
Flutter promising that it only runs the main isolate on the platform thread, and us being able to query that, so that we can check whether it's safe to "enter isolate" when we are not currently entered in the isolate. |
Ah, ownership isolate API. Nice, was not aware that was in the works. Can that also be used for |
We won't be changing how |
go/dart-isolate-ownership-api Change-Id: Ia778a916de3fecec9f0aa1a5c8bc9fd7dd421267 Bug: dart-lang/native#1908 TEST=runtime/vm/dart_api_impl_test.cc Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/407700 Reviewed-by: Martin Kustermann <kustermann@google.com> Commit-Queue: Liam Appelbe <liama@google.com>
we just got bitten by this trying to run our app with native video player with the latest Flutter version 3.29.0. any idea when and how this will be fixed? |
I've started working on this now. It will be fixed by the end of the week. |
The isolate ownership API was [introduced recently](https://dart-review.googlesource.com/c/sdk/+/407700) to solve [some deadlock bugs](dart-lang/native#1908) in native callbacks.

A native callback is a call from native code into a Dart function. Currently all such callbacks must run that Dart function in the isolate that created the callback (called the target isolate). The only native callback primitives at the moment are `NativeCallable.isolateLocal` (blocking, but must be invoked from the same thread as the target isolate, and the target isolate must be currently entered on that thread) and `NativeCallable.listener` (non-blocking, can be invoked from any thread).

To build blocking callbacks that can be called from any thread, we can use a `NativeCallable.listener`, and use a synchronization object like a mutex or a condition variable to block until the callback is complete. However, if we try to do this on the thread that is currently entered in the target isolate, we will deadlock: we invoke the listener, a message is sent to the target isolate, and we block waiting for the message to be handled, so we never pass control flow back to the isolate to handle the message, and never stop waiting.

To fix this deadlock, Ffigen and Jnigen both have a mechanism that checks if we're on the target isolate's thread first:

- If the native caller is already on the same thread as the target isolate, and the target isolate is entered:
  - Call the Dart function directly using `NativeCallable.isolateLocal` or similar
- Otherwise, if the native caller is coming from a different thread:
  - Call the Dart function asynchronously using `NativeCallable.listener` or similar
  - Block until the callback finishes

However, this neglects the case where we're on the target isolate's thread, but not entered into the isolate. This case happens in Flutter when the callback is invoked from the UI thread (or the platform thread when thread merging is enabled), and the target isolate is the root isolate. When the native callback is invoked, the root isolate is not entered, so we hit the second case: we send a message to the root isolate, and block to wait for a response. Since the root isolate is exclusively run on the UI thread, and we're blocking the UI thread, the message will never be handled, and we deadlock.

The isolate ownership API fixes this by allowing the embedder to inform the VM that it will run a particular isolate exclusively on a particular thread, using `Dart_SetCurrentThreadOwnsIsolate`. Other native code can then query that ownership using `Dart_GetCurrentThreadOwnsIsolate`. This lets us add a third case to our conditional:

- If the native caller is on the thread that is currently entered in the target isolate:
  - Call the Dart function directly using `NativeCallable.isolateLocal` or similar
- Otherwise, if the native caller is on the thread that owns the target isolate:
  - Enter the target isolate
  - Call the Dart function directly using `NativeCallable.isolateLocal` or similar
  - Exit the target isolate
- Otherwise, the native caller is coming from an unrelated thread:
  - Call the Dart function asynchronously using `NativeCallable.listener` or similar
  - Block until the callback finishes

**Note:** We don't need to set the ownership of VM managed threads, because they run in a thread pool exclusively used by the VM, so there's no way for native code to be executed on the thread (except by FFI, in which case we're entered into the isolate anyway). We only need this for Flutter's root isolate because work can be sent to the UI thread/platform thread using OS specific APIs like Android's `Looper.getMainLooper()`.
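The three-way conditional described in this commit message can be sketched in C as a pure decision function. This is only an illustration under stated assumptions, not the ffigen or jnigen implementation: `callback_path` and `choose_callback_path` are hypothetical names, isolates are represented as opaque pointers, and the result of the ownership query is passed in as a boolean because the exact signature of `Dart_GetCurrentThreadOwnsIsolate` is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three cases of the conditional. */
typedef enum {
  CALL_DIRECT,     /* thread is currently entered in the target isolate */
  ENTER_CALL_EXIT, /* thread owns the target isolate but is not entered */
  POST_AND_BLOCK   /* unrelated thread: send a message and wait */
} callback_path;

/*
 * current_isolate:    what a "current isolate" query returned on this thread
 *                     (NULL when no isolate is entered).
 * target_isolate:     the isolate that created the callback.
 * thread_owns_target: result of an ownership query for the target isolate,
 *                     assumed to have been obtained by the caller.
 */
callback_path choose_callback_path(const void *current_isolate,
                                   const void *target_isolate,
                                   bool thread_owns_target) {
  assert(target_isolate != NULL);
  if (current_isolate == target_isolate) {
    return CALL_DIRECT;
  }
  if (thread_owns_target) {
    return ENTER_CALL_EXIT;
  }
  return POST_AND_BLOCK;
}
```

The Flutter deadlock corresponds to the middle case: before the ownership query existed, a caller on the UI thread with no isolate entered would have fallen through to `POST_AND_BLOCK`.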
> When will there be a hot fix?

Today! |
Uhm including Flutter?
|
I landed a fix in the package only which is not dependent on the Flutter fix so you can use it in any version. |
Ah, as this was filed in the language repo I was fearing that it is a problem in the language compiler itself
|
Not sure if I'm following, this is the native repo. |
Sorry, you are right, my mistake. I just wasn't aware that it only affects a package.
|
Java code
Dart code
Java Stack trace
Native Stack trace
The problem is that `Java_com_github_dart_1lang_jni_PortProxyBuilder__1invoke` checks `Dart_CurrentIsolate_DL` to determine whether the call is coming from another thread, and if that returns `null` it sends a message on the port and waits. However, in the case of Flutter on Android, the platform thread is the isolate thread, which means it is essentially blocking the main thread. Note that `Dart_CurrentIsolate_DL` returns `null` because the isolate has been exited after posting the callback to the main looper.

The solution that would work in the context of Flutter is to remember the thread ID alongside the isolate and, if the thread ID matches, call `Dart_EnterIsolate_DL` and `Dart_ExitIsolate_DL` around the trampoline.

Now while this works for Flutter, I'm not sure the solution is generic enough, since it makes an assumption about the isolate being "pinned" to a specific thread.
cc @HosseinYousefi