Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block GdCell::borrow and GdCell::borrow_mut on other threads #736

Merged
merged 1 commit into from
Jun 9, 2024

Conversation

TitanNano
Copy link
Contributor

@TitanNano TitanNano commented May 27, 2024

Should fix: #709 and #556.

This introduces a new threads feature to godot-cell. When the feature is enabled, GdCell::borrow will block the current thread if any other thread is currently holding a mutable reference. GdCell::borrow_mut on the other hand, will block if any other thread is currently holding a shared (or mutable reference), but the current thread is not. In any other case, the methods panic as before.

I named the feature threads and not experimental-threads as this should not be considered experimental inside godot-cell. The feature is then enabled by godot-cores experimental-threads feature.

This also adds parking_lot as a dependency of godot-cell as the standard library RwLock does not support write lock downgrading.

@GodotRust
Copy link

API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-736

Copy link
Member

@Bromeon Bromeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for looking into this! 👍

Should fix: #709 and #556.

That would be awesome! Did you have a chance to test one of them?

Also, do you know if this could have an impact on any of the other issues in #713?

@Bromeon Bromeon added bug c: core Core components labels May 28, 2024
@Bromeon Bromeon linked an issue May 28, 2024 that may be closed by this pull request
@TitanNano
Copy link
Contributor Author

Also, do you know if this could have an impact on any of the other issues in #713?

As far as I see, the other two issues are both solved by enabling the experimental-threads feature, so this change should have no impact on them.

@TitanNano TitanNano force-pushed the jovan/blocking_godot_cell branch from 2a90b98 to 09850a8 Compare May 28, 2024 14:23
@TitanNano
Copy link
Contributor Author

That would be awesome! Did you have a chance to test one of them?

I now had the time to test #556, and it no longer panics during a reimport.

Copy link
Member

@Bromeon Bromeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

If ever needed in the future, we might also consider a separate cell with the threading logic, but for now I'd keep complexity low. As long as people don't use experimental-threads, they still have the full performance.

There's one part where I'm not completely following -- in claim_mut_thread, the caller intends to have mutable access. But why is a read lock (not a write one) returned, and why is the downgrade needed? If you could elaborate this in code, would be perfect.

@lilizoey
Copy link
Member

id like to have a look at this later before it's merged, since i made the original gd-cell I'd just wanna be sure this seems correct

@TitanNano
Copy link
Contributor Author

There's one part where I'm not completely following -- in claim_mut_thread, the caller intends to have mutable access. But why is a read lock (not a write one) returned, and why is the downgrade needed? If you could elaborate this in code, would be perfect.

So the RwLock is used to track all existing RefGuards and MutGuards across all threads. If a MutGuard would hold a write lock, then all additional borrows would block their thread and cause deadlocks. By giving every guard a read lock, we can track if there are any active guards. We can then use the write lock to block the current thread until either all shared references have been dropped or the singular mutable reference has been dropped.

The constraints on how many shared or mutable references can exist at the same time continues to be enforced by the existing implementation of GdCell.

I will try to better describe this in the code.

@lilizoey
Copy link
Member

lilizoey commented May 28, 2024

I'm not sure this is the right approach. It's a bit difficult to tell whether this is gonna be sound or not, it'd be a lot better if you could somehow separate the blocking logic entirely from the gd-cell logic. Since i dont think any unsafe would be needed for blocking.

If you could somehow make a

pub struct GdCellBlocking<T> {
    cell: Pin<BoxGdCell<T>>>,
    // Other stuff you need to check blocking
}

Which has the same API as GdCell, but it's methods do not touch any internal state of GdCell at all and are entirely written in safe rust. That is they only call public functions of GdCell. It would be a lot easier to be confident that none of what's done here impacts the safety of GdCell.

Additionally your usage of RwLock is rather confusing. It doesn't seem like you're using RwLock for its intended purpose (to ensure thread-safe access to some variable) but rather just for its blocking logic. It may be better to use something like Condvar if possible, since that's more inline with its intended purpose (blocking threads until some condition is fulfilled).

@TitanNano TitanNano force-pushed the jovan/blocking_godot_cell branch from c4d0d15 to a002864 Compare May 28, 2024 22:19
@TitanNano
Copy link
Contributor Author

It's a bit difficult to tell whether this is gonna be sound or not, it'd be a lot better if you could somehow separate the blocking logic entirely from the gd-cell logic. Since i dont think any unsafe would be needed for blocking.

This PR does not add any unsafe code or change the way the GdCell behaves. The changes only introduce the option to block a thread under certain conditions instead of panicking straight away.

Yes, the unsafe RefGuard::new() and MutGuard::new() functions gained a new argument, but this is not unsafe code. As far as I can see, you only marked this function unsafe because certain invariants of the guard must be upheld, and not because the contents of the function itself are unsafe.

[...] it'd be a lot better if you could somehow separate the blocking logic entirely from the gd-cell logic. [...]

If you'd like, I can move the new parts of borrow() and borrow_mut() into their own functions, but I don't think that would add much to the comprehensiveness of the implementation.

If you could somehow make a GdCellBlocking<T> which has the same API as GdCell [...]

That is certainly possible but would just add more overhead to everything. I would also have to wrap the two guard types to track when they are dropped. The guard types would also need an unsafe new function, as their invariants would be the same as the ones of GdCell

[..] but it's methods do not touch any internal state of GdCell at all and are entirely written in safe rust. That is they only call public functions of GdCell. [...]

What is the problem with inspecting the internal state of the cell? The internal state is not being mutated by these changes.

By only using the current public interface, I would have to re-track the internal state that is inspected to make decisions on whether to block a thread or not, adding more overhead.

It doesn't seem like you're using RwLock for its intended purpose (to ensure thread-safe access to some variable) but rather just for its blocking logic.

It's primarily a lock, and we are locking the access to the exclusive mutable reference. Do you see an actual issue with using it?

It may be better to use something like Condvar if possible, since that's more inline with its intended purpose (blocking threads until some condition is fulfilled).

The same behavior could be replicated with a Condvar but we would have to reimplement the behavior of the RwLock and add notify calls to all Drop implementations. This makes the implementation somewhat more spread out, while a RwLock can do the same job right away.

@lilizoey
Copy link
Member

lilizoey commented May 28, 2024

It's a bit difficult to tell whether this is gonna be sound or not, it'd be a lot better if you could somehow separate the blocking logic entirely from the gd-cell logic. Since i dont think any unsafe would be needed for blocking.

This PR does not add any unsafe code or change the way the GdCell behaves. The changes only introduce the option to block a thread under certain conditions instead of panicking straight away.

Yes, the unsafe RefGuard::new() and MutGuard::new() functions gained a new argument, but this is not unsafe code. As far as I can see, you only marked this function unsafe because certain invariants of the guard must be upheld, and not because the contents of the function itself are unsafe.

Yes, because the implementation relies on these invariants to be upheld for safety elsewhere. By adding more logic and arguments to these function you are increasing the surface area of what needs to be verified, even if it's all correct it's still more that needs to be kept in mind when ensuring things are safe.

[...] it'd be a lot better if you could somehow separate the blocking logic entirely from the gd-cell logic. [...]

If you'd like, I can move the new parts of borrow() and borrow_mut() into their own functions, but I don't think that would add much to the comprehensiveness of the implementation.

When dealing with unsafe and designing a safe api, it is very useful that anything which touches the safety invariants of some structure are minimal. As this makes it easier to ensure nothing is done incorrectly.

If you could somehow make a GdCellBlocking<T> which has the same API as GdCell [...]

That is certainly possible but would just add more overhead to everything. I would also have to wrap the two guard types to track when they are dropped. The guard types would also need an unsafe new function, as their invariants would be the same as the ones of GdCell

I dont see why? The outer functions can just call borrow/borrow_mut and handle the error case without panicking but instead by blocking when it's caused by other threads containing references. This might be better if you add some extra cases to the error-struct these methods return that signal those conditions explicitly.

[..] but it's methods do not touch any internal state of GdCell at all and are entirely written in safe rust. That is they only call public functions of GdCell. [...]

What is the problem with inspecting the internal state of the cell? The internal state is not being mutated by these changes.

By only using the current public interface, I would have to re-track the internal state that is inspected to make decisions on whether to block a thread or not, adding more overhead.

You can add new read-only public (or crate-public ideally, avoid leaking internal state where possible) methods that track the internal state. But if you have direct access to the internal state then there's nothing stopping you from mutating it. Which means that when verifying that the implementation is correct and doesn't break anything i need to also ensure that the implementation doesn't mutate any of the internal state it has access to.

We're dealing with unsafe here so need to be extra cautious. When im verifying that you're doing things correctly i have to take on the role of the compiler in verifying that it's all sound. But im a human so im much more liable to make mistakes. So it's preferable to make the job as easy as possible, by avoiding complexity in unsafe and making things like this more straightforward. And the less opportunities for fucking up, the easier it is to ensure that nothing bad is missed.

It doesn't seem like you're using RwLock for its intended purpose (to ensure thread-safe access to some variable) but rather just for its blocking logic.

It's primarily a lock, and we are locking the access to the exclusive mutable reference. Do you see an actual issue with using it?

It's confusing is my main point. Usually the RwLock directly protects the data it's holding. But here you're both

  1. Protecting data that's external to the RwLock
  2. Only holding read-locks, and only momentarily grabbing the write lock to ensure exclusivity.

These are both non-standard uses of RwLock, which makes it a lot harder to track how the control-flow works here.

It may be better to use something like Condvar if possible, since that's more inline with its intended purpose (blocking threads until some condition is fulfilled).

The same behavior could be replicated with a Condvar but we would have to reimplement the behavior of the RwLock and add notify calls to all Drop implementations. This makes the implementation somewhat more spread out, while a RwLock can do the same job right away.

I dont think it's that much more straight-forward. You end up needing this claim_mut_thread function for instance which isn't all that easy to understand the purpose of. The reason it all works (i assume) isn't straight-forward to figure out. Which again is because you're using an RwLock in a non-standard case here.

Whereas a Condvar implementation would likely be much more explicit about:

  1. When blocking happens
  2. What conditions we wait for when blocking
  3. When threads are unblocked

All of which happen much more implicitly with the RwLock.


Overall, having a "dumber" implementation which has more boilerplate/overhead, is preferable imo over a "clever" implementation with minimal boilerplate/overhead. Entirely because we're in a safety-critical situation, and it's very important that the unsafe code is correct. Keep in mind that rust can often optimize some amount of boilerplate and trivial overhead. At least until it's clear that this is something that needs to be optimized for performance, at which point there are likely several other places to optimize that are bigger gains than this.

If we're confident the current unsafe code is correct, then i think we should touch it as little as possible when adding functionality. And i do think that in this case it is possible to implement blocking functionality while minimally changing the unsafe code (perhaps not at all), and just adding some extra functions which inspects the internal state without modifying it.

@Bromeon
Copy link
Member

Bromeon commented May 29, 2024

I'm a bit split on this. I think @lilizoey is making a good point that it's safer to not touch existing logic as the surface area increases. However, that also means:

  • Following this line of thought strictly, we can't really maintain code inside GdCell once it's "correct", because any changes have a risk to introduce bugs.
  • Creating a wrapper cell also comes with some complexity for the interaction, which itself adds surface that can be prone to errors. Possibly handling it externally makes it harder to verify that external interaction. But you're right that the existing GdCell remains safe, which is a big gain.

All in all, it's hard for me to estimate how complex an external solution (wrapper cell) would be, so maybe we should try to find that out?

@lilizoey
Copy link
Member

yeah, if an external solution is a lot worse/more complicated in some ways then it's preferable to do something internal to GdCell. i just dont think it is gonna be much worse here, it may even be simpler to understand. and if that is the case then it'd be better to avoid touching the current unsafe code if we can.

@TitanNano
Copy link
Contributor Author

That is certainly possible but would just add more overhead to everything. I would also have to wrap the two guard types to track when they are dropped. The guard types would also need an unsafe new function, as their invariants would be the same as the ones of GdCell…

I dont see why? The outer functions can just call borrow/borrow_mut and handle the error case without panicking but instead by blocking when it's caused by other threads containing references. This might be better if you add some extra cases to the error-struct these methods return that signal those conditions explicitly.

@lilizoey yes, but we also have to track (in the case of Condvar, call a notify method) reference guards are being dropped, so we know when to unblock.

All in all, it's hard for me to estimate how complex an external solution (wrapper cell) would be, so maybe we should try to find that out?

@Bromeon I don't think it would be overly complex and primarily just introduce more overhead and boilerplate. I will give it a shot and report back if I think it's getting out of hand.

@Bromeon
Copy link
Member

Bromeon commented May 29, 2024

Thank you! Maybe makes sense to do it in a new commit, so we can compare side by side?

@TitanNano TitanNano force-pushed the jovan/blocking_godot_cell branch from a002864 to 1ac3e44 Compare May 29, 2024 17:07
@TitanNano
Copy link
Contributor Author

This PR now has the new version, which is implemented as a wrapper type. I just did it as quickly as possible, and we can streamline it and remove unnecessary stuff. I left some new crate internal functions on GdCell which are probably not actually necessary and a bunch of debugging output that I can remove once we choose this version.

The previous version is here: 747a372

@lilizoey
Copy link
Member

It is a bit unfortunate to have to duplicate some of the reference counting logic, and having to add a new set of guards. but in my opinion this new version is easier to reason through and simpler to understand. The fact that no unsafe is needed for the implementation here, and none of the existing unsafe functions are touched is also a big plus imo.

@Bromeon
Copy link
Member

Bromeon commented May 30, 2024

This is now our 5th guard type/pair in the library 😀
(GdRef/Mut, BaseRef/Mut, GlobalGuard, godot_cell::Mut/RefGuard so far).

But generally I agree, it's nice that this can be done non-intrusively and safely, and I think this is a typical case of "the complexity has to live somewhere": as much as I'd like to make things simpler, I think we can't deny reality that interfacing with Godot is inherently difficult, especially when it comes to re-entrancy and threads. As long as we can shield users from all that complexity, I'm OK with a bit of internal duplication, if it in turn makes 2 separate concepts (blocking access in this PR, and safe re-entrancy in the existing GdCell) easier to understand on their own.

Currently the library would enable either the blocking implementations (in threads feature) or the panicking one (without the feature). Do you think there are situations where we need both available? Because that would be another reason to keep them separate.


Looking at the implementation, there's possibly some code that can be deduplicated between borrow/borrow_mut and between block_mut/block_immut.

Also, you currently use .lock().unwrap() -- are there scenarios where poisoning can happen, e.g. a panic in user code (which will poison guards even if panics are caught).

@TitanNano
Copy link
Contributor Author

Do you think there are situations where we need both available?

I can't think of a reason why someone would want to use the existing GdCell in a multithreaded context and risk running into random panics. The only case I can think of is when someone can identify the additional reference counting and mutex locking as a bottleneck in their code path, and they are absolutely certain they are accessing the cell from a single thread only. But we have to consider that GdCell is currently an internal type and supposed to remain this way, so any usage should be general purpose, I think.

Also, you currently use .lock().unwrap() -- are there scenarios where poisoning can happen, e.g. a panic in user code (which will poison guards even if panics are caught).

There shouldn't be any reason for a panic as far as I can see. The mutex lock is always very short-lived when doing the reference counting and acquiring the inner RefGuard. Once borrow and borrow_mut return the lock is dropped.

This is now our 5th guard type/pair in the library 😀

Maybe we want to consider hiding the GdCellBlocking from the public interface of the godot-cell crate. We could rename the existing cell to something like RawGdCell or InternalGdCell and introduce a new wrapper GdCell, that is a new-type without the threads feature and a GdCellBlocking when the feature is on. This would also hide the Pin<Box<GdCell>> from the consumer. A downside to this is that the reference guard types would differ unless we new-type them as well.

@lilizoey
Copy link
Member

i can imagine in the future allowing people to choose whichever user-data wrapper they want. (like in gdnative i believe?) in that case having both available would be nice. But for that we'd probably want to introduce a trait to abstract over userdata wrappers.

the existing implementation should be equivalent to the blocking one for single-threaded uses, but the blocking impl might be slightly less performant.

@Bromeon
Copy link
Member

Bromeon commented May 31, 2024

Maybe we want to consider hiding the GdCellBlocking from the public interface of the godot-cell crate. We could rename the existing cell to something like RawGdCell or InternalGdCell and introduce a new wrapper GdCell, that is a new-type without the threads feature and a GdCellBlocking when the feature is on. This would also hide the Pin<Box<GdCell>> from the consumer. A downside to this is that the reference guard types would differ unless we new-type them as well.

I think the current naming is OK, maybe I'd use BlockingGdCell instead of GdCellBlocking. But keeping the other one at just GdCell makes it clear that the blocking one adds "more" on top and isn't just a different flavor/implementation.

Regarding conditional compilation, wouldn't it be possible to provide the same API for both towards the outside and hide any differences behind use statements or type aliases?

#[cfg(feature = "threads")]
mod types {
    pub use crate::BlockingGdCell as Cell;
    pub use ... as RefGuard;
    pub use ... as MutGuard;
}

#[cfg(not(feature = "threads"))]
mod types {
    pub use crate::GdCell as Cell;
    pub use ... as RefGuard;
    pub use ... as MutGuard;
}

// The public API is then this:
pub use types::*;

To avoid exposing the basic GdCell in such a case, it can be moved to a crate-private module, i.e. crate::basic::GdCell. But the type itself needs to be pub to remain re-exportable.

@TitanNano
Copy link
Contributor Author

@Bromeon sure we can do this, it's just that GdCell is instantiated as Pin<Box<GdCell<T>>> and blocking would be BlockingGdCell<T>, so the public interface of the base cell currently exposes the pin dynamics of the cell. This is something BlockingGdCell hides.

All this means that the usage of the two types is slightly different. (this is something visible in the changes I made to the mock tests). If everyone is fine with these differences, then I'm happy to just use a type alias.

@Bromeon
Copy link
Member

Bromeon commented May 31, 2024

But it doesn't seem like the Pin can be properly abstracted (i.e. fully contained inside godot-cell crate)?

For example, https://github.com/godot-rust/gdext/blob/master/godot-core/src/obj/script.rs mentiones both owned and borrowed types:

Pin<Box<GdCell<T>>>
Pin<&'a GdCell<T>>

Or maybe you have a suggestion? Would you wrap those separately?

@TitanNano
Copy link
Contributor Author

Or maybe you have a suggestion? Would you wrap those separately?

Pin<&'a GdCell> can be obtained inside a &Cell. I just pushed a commit where the public Cell type now has the same interface as the BlockingGdCell.

Everything is still WIP, and I'm still working on cleaning things up and documenting the new types and methods.

@TitanNano TitanNano force-pushed the jovan/blocking_godot_cell branch from d0d0ab2 to ce50e82 Compare June 2, 2024 23:03
@TitanNano
Copy link
Contributor Author

Documentation is currently messed up due to everything being private except the type aliases. I haven't looked into solving it yet, but if you have suggestions on what you would prefer, please let me know.

The GdCell type is now also dead code if the threads feature is on.

@TitanNano TitanNano force-pushed the jovan/blocking_godot_cell branch from ce50e82 to 146fce3 Compare June 2, 2024 23:17
@Bromeon
Copy link
Member

Bromeon commented Jun 3, 2024

Documentation is currently messed up due to everything being private except the type aliases. I haven't looked into solving it yet, but if you have suggestions on what you would prefer, please let me know.

Which documentation are you referring to?

The GdCell type is now also dead code if the threads feature is on.

That should be easy to solve with a #[cfg(not(feature = "threads"))], no? 🙂

@lilizoey
Copy link
Member

lilizoey commented Jun 3, 2024

The GdCell type is now also dead code if the threads feature is on.

That should be easy to solve with a #[cfg(not(feature = "threads"))], no? 🙂

I think there's another way to solve that, just make both the blocking and non-blocking version public to gd-cell. then pick the appropriate one in godot-core. like

// in gd-cell/src/lib.rs
pub mod blocking {
  pub use some::path::to::GdCellBlocking as GdCell;
  pub use some::path::to::RefGuardBlocking as RefGuard;
  ...
}

pub mod panicking {
  pub use some::other::path::to::GdCell;
  pub use some::other::path::to::RefGuard;
  ...
}

// in godot-core somewhere
#[cfg(feature = experimental_threads)]
use gd_cell::blocking::*;
#[cfg(not(feature = experimental_threads))]
use gd_cell::panicking::*;

Since both the blocking and non-blocking version are public here, and thus accessible outside of gd-cell, this shouldn't trigger unused code warnings.

@TitanNano TitanNano force-pushed the jovan/blocking_godot_cell branch 2 times, most recently from 455db8d to ddd331c Compare June 3, 2024 22:28
@TitanNano TitanNano requested a review from Bromeon June 4, 2024 20:13
Copy link
Member

@Bromeon Bromeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the update!

@TitanNano TitanNano force-pushed the jovan/blocking_godot_cell branch from ddd331c to f4fcdc7 Compare June 5, 2024 11:17
Copy link
Member

@lilizoey lilizoey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks pretty good, some nitpicks here and there and also just noticed that i didn't include a Send bound in the Sync impl for GdCell which should be there.

@TitanNano TitanNano force-pushed the jovan/blocking_godot_cell branch from f4fcdc7 to eec138d Compare June 5, 2024 21:23
@TitanNano TitanNano force-pushed the jovan/blocking_godot_cell branch from 8859d01 to efa52b7 Compare June 5, 2024 21:39
@TitanNano
Copy link
Contributor Author

@Bromeon @lilizoey thanks for all the improvement suggestions.

@TitanNano TitanNano requested a review from Bromeon June 7, 2024 14:08
@lilizoey
Copy link
Member

lilizoey commented Jun 9, 2024

I think this looks good now! Might cause a minor conflict with #741 but i can just fix that on my end.

@lilizoey lilizoey added this pull request to the merge queue Jun 9, 2024
Merged via the queue into godot-rust:master with commit dcfd5df Jun 9, 2024
15 checks passed
@TitanNano TitanNano deleted the jovan/blocking_godot_cell branch June 9, 2024 18:13
@Bromeon Bromeon added the c: script-instance Script-instance APIs label Mar 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug c: core Core components c: script-instance Script-instance APIs
Projects
None yet
4 participants