-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ETS atomically on tracker update #172
Update ETS atomically on tracker update #172
Conversation
state = bump_clock(%State{state | clouds: pruned_clouds, delta: new_delta}) | ||
|
||
# Update ETS entry and produce add-like delta | ||
state = bump_clock(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we need to bump the clock twice, I just mirrored the exact behaviour of leave + join.
❤️❤️❤️🐥🔥 |
Released in v2.1.2. Thank you so much! |
Beautiful, thanks! <3 |
With this change I am seeing issues where the presence ETS values tables on different nodes are drifting out of sync. It seems that presence entries from processeses on other nodes are not always cleaned up. This forced me to roll back to 2.1.1 for the time being. I will see if I can come up with a reproducible case. |
@arjan are you comparing the ETS rows, or the result of I went through the changes a couple times and so far haven't pinpointed anything that would introduce any meaningful difference. |
I noticed it because our stats use the output of |
Tracker update is handled as leave + join and ETS updates are applied in this notion, that is, a delete followed by an insert. Since tracker lookup reads ETS on the caller side, it can happen concurrently to tracker update, so there is a race condition when the ETS read happens between said delete and insert.
The "values" ETS table (which keeps the metadata) is an :ordered_set, which means that we can do an atomic update just by inserting the new value, without prior delete. So I made two changes:
State.leave_join
, which works just likeState.leave
+State.join
, but without the unnecessary deleteState.merge
to gather desired inserts/deletes to ETS, do a simple diff and only then apply the changes (which makes an entry update atomic)The changes only affect applying changes to the internal ETS storage, the update is reported as leave + join in exactly the same way.
Context: in Livebook we use tracker for registering notebook session processes and we also store basic session information necessary for listing, which gets updated. We have a lot of tests that create and lookup sessions and we started to observe more and more intermittent test failures that basically go: create session -> lookup session -> session not found. Running tests in a loop I could reproduce this behaviour and tracked it down to this race condition, with this PR it no longer happens.