Update ETS atomically on tracker update #172

jonatanklosko · 2023-05-24T00:00:20Z

A tracker update is handled as leave + join, and the ETS changes are applied in the same order, that is, a delete followed by an insert. Since tracker lookup reads ETS on the caller side, it can run concurrently with a tracker update, so there is a race condition when the ETS read happens between that delete and insert.
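To make the window concrete, here is a minimal sketch using plain :ets calls. The table name, key shape and metadata are hypothetical and do not match the tracker's actual schema. The lookup placed between the delete and the insert stands in for a concurrent reader on the caller side.

```elixir
# Hypothetical table and key; not the tracker's real "values" table layout.
table = :ets.new(:values_demo, [:ordered_set, :public])
key = {"sessions", self(), "session-1"}

:ets.insert(table, {key, %{name: "old"}})

# Update applied as delete + insert: between the two calls the entry is gone.
:ets.delete(table, key)
during_update = :ets.lookup(table, key)
:ets.insert(table, {key, %{name: "new"}})
after_update = :ets.lookup(table, key)

IO.inspect(during_update, label: "during update")
IO.inspect(after_update, label: "after update")
```

A reader that hits the table at the point of `during_update` sees no entry at all, which matches the "session not found" symptom described further down.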

The "values" ETS table (which keeps the metadata) is an :ordered_set, which means that we can update an entry atomically just by inserting the new value, without a prior delete. So I made two changes:

for the local case, I added State.leave_join, which works just like State.leave + State.join, but without the unnecessary delete
for the remote case, I updated State.merge to gather the desired ETS inserts/deletes, do a simple diff and only then apply the changes (which makes an entry update atomic)
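The "gather, diff, then apply" idea from the second item can be sketched roughly as follows. This is an illustration only, not the actual State.merge code: the module name, the `apply_changes/3` helper and the row shape are assumptions made for the example.

```elixir
defmodule EtsDiffSketch do
  # Hypothetical helper, not part of Phoenix.Tracker.
  # `inserts` is a list of {key, value} rows, `deletes` is a list of keys.
  def apply_changes(table, inserts, deletes) do
    inserted_keys = MapSet.new(inserts, fn {key, _value} -> key end)

    # A key that is both deleted and re-inserted is an update: skip its delete
    # and let the insert overwrite the row in place.
    for key <- deletes, not MapSet.member?(inserted_keys, key) do
      :ets.delete(table, key)
    end

    :ets.insert(table, inserts)
    :ok
  end
end

table = :ets.new(:values_demo, [:ordered_set, :public])
:ets.insert(table, [{:a, 1}, {:b, 1}])

# :a is updated (delete + insert), :b is removed, :c is new.
EtsDiffSketch.apply_changes(table, [{:a, 2}, {:c, 1}], [:a, :b])

rows = :ets.tab2list(table)
IO.inspect(rows, label: "rows")
```

Because the row for `:a` is never deleted, a concurrent lookup sees either the old value or the new one, never a missing entry.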

These changes only affect how updates are applied to the internal ETS storage; the update is still reported as leave + join in exactly the same way.

Context: in Livebook we use tracker for registering notebook session processes, and we also store the basic session information needed for listing, which gets updated. We have a lot of tests that create and look up sessions, and we started to observe more and more intermittent test failures that basically go: create session -> lookup session -> session not found. Running the tests in a loop, I could reproduce this behaviour and tracked it down to this race condition. With this PR it no longer happens.

jonatanklosko · 2023-05-24T00:01:04Z

lib/phoenix/tracker/state.ex

+    state = bump_clock(%State{state | clouds: pruned_clouds, delta: new_delta})
+
+    # Update ETS entry and produce add-like delta
+    state = bump_clock(state)


Not sure if we need to bump the clock twice, I just mirrored the exact behaviour of leave + join.

chrismccord · 2023-05-24T20:25:54Z

❤️❤️❤️🐥🔥

chrismccord · 2023-05-24T20:28:59Z

Released in v2.1.2. Thank you so much!

jonatanklosko · 2023-05-24T21:00:40Z

Beautiful, thanks! <3

arjan · 2023-06-13T14:04:47Z

With this change I am seeing issues where the presence ETS "values" tables on different nodes drift out of sync. It seems that presence entries from processes on other nodes are not always cleaned up. This forced me to roll back to 2.1.1 for the time being. I will see if I can come up with a reproducible case.

jonatanklosko · 2023-06-13T17:07:31Z

@arjan are you comparing the ETS rows, or the result of Presence.list? Also, do they differ just in excess processes or the metadata itself?

I went through the changes a couple times and so far haven't pinpointed anything that would introduce any meaningful difference.

arjan · 2023-06-14T06:29:07Z

I noticed it because our stats use the output of Presence.list and the counts were way off yesterday morning. Hopefully I can reproduce it today.

1345035

chrismccord merged commit fc5686f into phoenixframework:main May 24, 2023

jonatanklosko deleted the jk-atomic-ets-update branch May 24, 2023 20:26

jonatanklosko mentioned this pull request May 24, 2023

Bump phoenix_pubsub livebook-dev/livebook#1923

Merged

arjan mentioned this pull request Jun 14, 2023

Presence list keep growing when using Presence.update #175

Closed
