Add a busy_handler_timeout setter #443

fractaledmind · 2023-12-21T16:40:23Z

One of the largest pain-points when using SQLite in a Rails app is the limited concurrency support. While SQLite does support only one writer, from a Rails app's point-of-view this does not mean that that app simply cannot run in, for example, Puma clustered mode. As I have detailed, the primary issues are that the GVL isn't released while the SQLite busy_timeout C function is running and transactions are run in DEFERRED mode.

We can solve the first of these by providing a Ruby busy_handler callback that will release the GVL between retries. This allows multiple threads to naturally coordinate and "queue up" to acquire SQLite's single write lock.
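A minimal sketch of that mechanism, not the code proposed in this PR: the lambda below stands in for a Ruby busy handler, which SQLite would call with the retry count. The short sleep is what gives up the GVL, so a second thread can make progress while the first one waits. The names `busy_handler`, `progress`, and `other_thread` are illustrative, and the 1ms sleep and 20 simulated callbacks are assumed values.

```ruby
# Stand-in for a Ruby busy handler: SQLite would call it with the retry count.
# The short sleep is what releases the GVL between retries.
busy_handler = lambda do |count|
  sleep(0.001)
  true # any value other than false asks SQLite to try again
end

progress = 0
other_thread = Thread.new { loop { progress += 1 } }

# Simulate 20 consecutive "database is busy" callbacks on the main thread.
20.times { |count| busy_handler.call(count) }

other_thread.kill
other_thread.join
progress_during_wait = progress # work done by the other thread while we waited
```

With the sqlite3 gem, a callable like this would be registered through the existing `busy_handler` block hook, roughly `db.busy_handler { |count| ... }`.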

I initially proposed providing this in the Rails codebase (see: rails/rails#50370); however, it was suggested that this functionality more naturally belongs in this codebase. So, I would love to start brainstorming the interface for this. Once built and released, we can then update Rails to use this instead of the busy_timeout.

Jean Boussier suggested lock_wait_timeout as a name. I'm opening this PR to get the ball rolling. I'm open to naming suggestions as well as test suggestions.

fractaledmind · 2023-12-27T15:29:58Z

@suwyn Do you happen to have a better alternative name for this? I know that you have been thinking about this problem space for a while, so I imagine you might have some language around distinguishing a Ruby busy_timeout from the SQLite/C busy_timeout in a way that is succinct but communicative.

suwyn · 2023-12-27T16:11:15Z

Oh boy, naming is hard! Maybe busy_handler_timeout?

fractaledmind · 2023-12-27T16:13:37Z

I like that. I'll switch to that

suwyn · 2023-12-27T16:16:53Z

@fractaledmind Also consider we'll want to configure this from the database config file so adding it as an option would be beneficial. Here is an example from #426

The default would be nil, which wouldn't set the busy handler.

suwyn · 2023-12-27T16:29:15Z

Also noting for posterity why this is needed. The underlying issue with the interpreter is documented here.

From my experiments, there is a lot of speed to be gained if there were a way to fix the underlying issue.

fractaledmind · 2023-12-27T16:35:46Z

This will be set in Rails the same way that busy_timeout is currently set. Rails' database.yml won't change at all, it will just be that the timeout value will be passed to this function and not the busy_timeout function.

flavorjones · 2023-12-27T18:28:43Z

Ignore any CI failures for "native packaging" with "head", that's only because the HEAD version rolled over to 3.4.dev (you should be able to fix with a rebase now that #447 has been merged)

fractaledmind · 2023-12-30T18:39:38Z

@djmb @rosa: I was just reading thru the SolidQueue source as I work on integrating it into a project I have at work. I came across this commit that patches Rails' SQLite3 adapter to improve concurrency. Reading the commit description, it sounds like you have had similar experiences as I have working to get SQLite to handle concurrency in a reasonable and resilient way.

I note that in your comment and patch, you lean on the retries option and patch it to have linear backoff (sleeping count milliseconds on each busy_handler invocation). I'd love to hear more about what, in particular, was causing problems with the current Rails implementation of retries that doesn't backoff between retries. I'd also be keen to get your perspective on this implementation of a busy_handler and the general plan for Rails. To recap conversations had across repos, the plan as it stands currently is:

1. get this PR merged into the SQLite3 driver to provide a busy_handler_timeout method that accepts a timeout in milliseconds, but uses a Ruby busy_handler instead of the SQLite C busy_timeout to ensure that the GVL is released between retries
2. deprecate the retries option in Rails for the SQLite adapter
3. have the timeout option in Rails for the SQLite adapter call this busy_handler_timeout method from the SQLite3 driver instead of the busy_timeout method

After working with the retries option for some time, and getting feedback from the community, I believe that the mental model of timeout is clearer and people better understand what kind of value to set for their needs. However, we need to deal with the GVL lock issue of the busy_timeout method in this driver. Does the current plan make sense given you and your team's experience with SQLite usage in real-world apps? Does the current implementation of the busy_handler_timeout here make sense given you and your team's experience?

I believe your perspective could help ensure that both this PR and the larger plan for Rails' SQLite support are as solid as possible.

fractaledmind · 2024-01-02T00:49:54Z

@flavorjones @byroot: Added two simple tests focusing on the GVL blocking difference between busy_timeout and busy_handler_timeout. AppVeyor is green now. Is there anything still blocking here?

suwyn · 2024-01-02T14:34:19Z

lib/sqlite3/database.rb

+    # while SQLite sleeps and retries.
+    def busy_handler_timeout=( milliseconds )
+      timeout_seconds = milliseconds.fdiv(1000)
+      timeout_deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds


@fractaledmind Looking at the code again, I believe there is a bug here.

This is setting a clock based timeout deadline at the time busy_handler_timeout is set rather than when the busy_handler gets invoked for the first time. Essentially setting a hard timeout for the timeout_deadline for all invocations of the busy handler.

Consider the following (I didn't test it but I believe this should show the issue, if I am reading the code correctly)

busy_handler_timeout_db = SQLite3::Database.open( "test.db" )
busy_handler_timeout_db.busy_handler_timeout = 1000
sleep 1.5
# timeout_deadline should be expired now since that was set
# at the point in time when we set busy_handler_timeout to 1000.
# Any invocations of the busy handler will now throw busy errors
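To make the difference concrete without a real database, here is a hedged sketch that uses an injectable fake clock. Both builder methods and the `clock` parameter are hypothetical; they only model where the deadline gets computed, not the PR's actual code.

```ruby
# Hypothetical builders; `clock` is any callable returning the time in seconds.
def deadline_at_setter_time(timeout_ms, clock)
  # The deadline is fixed once, at the moment the timeout is configured.
  deadline = clock.call + timeout_ms.fdiv(1000)
  ->(count) { clock.call <= deadline }
end

def deadline_at_first_invocation(timeout_ms, clock)
  deadline = nil
  lambda do |count|
    now = clock.call
    # count is 0 on the first callback of each lock wait, so restart the clock there.
    deadline = now + timeout_ms.fdiv(1000) if count.zero? || deadline.nil?
    now <= deadline
  end
end

fake_now = 0.0
clock = -> { fake_now }

eager = deadline_at_setter_time(1000, clock)
lazy  = deadline_at_first_invocation(1000, clock)

fake_now = 1.5 # the connection sits idle for 1.5s before the first contention

eager_result = eager.call(0) # false: the deadline passed before any lock wait began
lazy_result  = lazy.call(0)  # true: the timeout only starts at the first busy callback
```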

fractaledmind · 2024-01-02

I think you're right. Great catch. Will fix when I'm back at my laptop.

fractaledmind · 2024-01-02

@suwyn Check out 48daae1

suwyn · 2024-01-03

Looks better @fractaledmind

I'm still wondering whether it's thread safe, though, since it uses an instance variable. If two threads used the same connection, you'd have an issue. I did some experimenting in Rails, and the connection pool appears to be thread safe (as the docs state).

Experimenting with the gem on its own, I wasn't quickly able to cause any issue. SQLite doesn't have a way to sleep, so I stopped short of writing a long-running UDF.

I think the ideal solution is either to bake the busy handler into the C code or to pass the time of first invocation into the handler from C, as it does with count. I'd obviously strongly prefer the former, but I have no idea what that would take, and it's outside the scope of this PR.

So, short of using the count to estimate the elapsed time (e.g. the (count * RETRY_INTERVAL) > timeout_seconds approach) instead of reading the clock, we'll just have to accept that a race condition probably exists.
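For reference, the count-based alternative mentioned above could look roughly like the sketch below. It keeps no state between invocations, so there is no shared instance variable to race on. `count_based_handler` and `RETRY_INTERVAL_SECONDS` are hypothetical names, and the 1ms interval is an assumed value.

```ruby
RETRY_INTERVAL_SECONDS = 0.001 # assumed sleep per retry

# Hypothetical builder: estimates elapsed time from the retry count alone.
def count_based_handler(timeout_ms)
  timeout_seconds = timeout_ms.fdiv(1000)
  lambda do |count|
    # No clock and no mutable state: the estimate is derived from count only.
    next false if (count * RETRY_INTERVAL_SECONDS) > timeout_seconds

    sleep(RETRY_INTERVAL_SECONDS) # releases the GVL between retries
    true
  end
end

handler = count_based_handler(5)
first_retry = handler.call(0)  # estimate 0s, under the 5ms budget: sleep and retry
late_retry  = handler.call(10) # estimate 10ms, over the 5ms budget: give up
```

The trade-off is accuracy: each retry takes at least the sleep interval plus the time SQLite spends re-attempting the lock, so the real wait will tend to run longer than the estimate.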

fractaledmind · 2024-01-03

@suwyn: You should join this Discord server where we talk a lot about the SQLite ecosystem in Ruby: https://discord.gg/ehdDh5C4

That would allow us to connect more easily and even pair on this. I'd love some help on writing useful tests. I have a long running UDF already written:

WITH RECURSIVE r(i) AS (
  VALUES(0)
  UNION ALL
  SELECT i FROM r
  LIMIT 10000000
)
SELECT i FROM r
ORDER BY i
LIMIT 1;

What I don't have are resilient and expansive tests. Want to work on getting this over the line together?

djmb · 2024-01-03T11:20:36Z

> I note that in your comment and patch, you lean on the retries option and patch it to have linear backoff (sleeping count milliseconds on each busy_handler invocation). I'd love to hear more about what, in particular, was causing problems with the current Rails implementation of retries that doesn't backoff between retries.

Hi @fractaledmind! The issue with using retries without a backoff was that we would still get SQLite3::BusyExceptions.

I've just tried it out with no sleep and to avoid the BusyExceptions, I need to set the retries to about 1,000,000, at which point it seems to hang. I don't know exactly what the mechanism is there though - whether the thread holding the lock is actually blocked or if it is being starved.

> I'd also be keen to get your perspective on this implementation of a busy_handler and the general plan for Rails. To recap conversations had across repos, the plan as it stands currently is:
>
> 1. get this PR merged into the SQLite3 driver to provide a busy_handler_timeout method that accepts a timeout in milliseconds, but uses a Ruby busy_handler instead of the SQLite C busy_timeout to ensure that the GVL is released between retries
> 2. deprecate the retries option in Rails for the SQLite adapter
> 3. have the timeout option in Rails for the SQLite adapter call this busy_handler_timeout method from the SQLite3 driver instead of the busy_timeout method
>
> After working with the retries option for some time, and getting feedback from the community, I believe that the mental model of timeout is clearer and people better understand what kind of value to set for their needs. However, we need to deal with the GVL lock issue of the busy_timeout method in this driver. Does the current plan make sense given you and your team's experience with SQLite usage in real-world apps? Does the current implementation of the busy_handler_timeout here make sense given you and your team's experience?
>
> I believe your perspective could help ensure that both this PR and the larger plan for Rails' SQLite support are as solid as possible.

We only used retries here because the current timeout wouldn't release the GVL lock, so your plan here sounds perfect. I agree that timeout is much more intuitive.

It seems likely we will still need a sleep in the handler to avoid the issues we've been seeing, but it would be good to know why. I'll see if I can debug what's going on in Solid Queue myself. It's tricky because the I/O of logging anything from the handler seems to have the same effect as a sleep and the BusyExceptions go away.

If you want to dig into it yourself: the sleep patch is there to allow the tests to pass, and you can run them with TARGET_DB=sqlite rails test. Removing the patch should give you the BusyExceptions.
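As a rough illustration of the linear backoff discussed in this thread (sleeping count milliseconds on each invocation, bounded by a retry limit), here is a hedged sketch. It is not the Solid Queue patch itself; `linear_backoff_handler` and `max_retries` are hypothetical names.

```ruby
# Hypothetical builder for a retry-limited busy handler with linear backoff.
def linear_backoff_handler(max_retries)
  lambda do |count|
    # Stop retrying once the limit is reached; SQLite then reports the busy error.
    next false if count >= max_retries

    sleep(count * 0.001) # wait `count` milliseconds, so each retry waits a little longer
    true
  end
end

handler = linear_backoff_handler(3)
allowed = (0...3).map { |count| handler.call(count) } # sleeps 0ms, 1ms, 2ms
refused = handler.call(3)

# Total time spent sleeping before giving up: 0 + 1 + ... + (n - 1) milliseconds.
total_sleep_seconds = (0...3).sum { |count| count * 0.001 }
```

This also shows why a timeout is easier to reason about than a retry count: the total wait grows quadratically with the retry limit rather than being stated directly.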

tenderlove · 2024-01-03T17:28:09Z

lib/sqlite3/database.rb

+        elsif now > @timeout_deadline
+          next false
+        else
+          sleep(0.001)


Is this sleep necessary? I think the VM is interruptible on any method call so the calls to clock_gettime etc should do the trick (I think).

byroot · 2024-01-03

@tenderlove it's not about being interruptible, it's about forcibly releasing the GVL. The assumption is that the client that is currently holding the SQLite3 write lock might be another thread in the same process, hence we should try to switch back to it rather than busy loop for 100ms until the thread scheduler quantum is reached.

And even if it's not in that process, yielding the GVL allows other unrelated threads to proceed.

NB: for the "in same process" case, a shared Ruby mutex would be way more efficient, but we'd need some way to tell that two clients are pointing at the same database and hence should share a single Mutex.
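One way the shared-mutex idea might be sketched is a process-wide registry keyed by the database's expanded path. This is purely illustrative: `WriteLocks` and its `fetch` method are hypothetical and not part of the sqlite3 gem, and `File.expand_path` does not resolve symlinks or hard links, so two different paths to the same file would still get separate mutexes.

```ruby
# Hypothetical registry, for illustration only.
module WriteLocks
  REGISTRY = {}
  REGISTRY_GUARD = Mutex.new

  # Return the same Mutex for every path that expands to the same absolute path.
  def self.fetch(path)
    key = File.expand_path(path)
    REGISTRY_GUARD.synchronize { REGISTRY[key] ||= Mutex.new }
  end
end

a = WriteLocks.fetch("db/test.sqlite3")
b = WriteLocks.fetch("./db/test.sqlite3")
c = WriteLocks.fetch("db/other.sqlite3")

same_lock = a.equal?(b)      # both spellings expand to the same absolute path
different_lock = !a.equal?(c)

a.synchronize do
  # A write transaction would run here, serialized against other threads in this process.
end
```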

tenderlove · 2024-01-03

> The assumption is that the client that is currently holding the SQLite3 write lock might be another thread in the same process, hence we should try to switch back to it

I see. In that case wouldn't Thread.pass also do the trick?

byroot · 2024-01-03

It would, yes, but then the risk is that there's no other ready thread, causing the process to essentially busy loop, which would pin one CPU at 100% and may not be desirable. A very short sleep actually makes sense IMO.

tenderlove · 2024-01-03

Sounds good to me!

tenderlove · 2024-01-03 (approved these changes)

I'm fine with the patch if @byroot is OK with it. I had one question wrt the sleep (but including the sleep is fine if it's necessary)

flavorjones · 2024-01-04T12:28:06Z

@fractaledmind this is causing failures in CI on windows

  1) Failure:
TC_Integration_Pending#test_busy_timeout_blocks_gvl [D:/a/sqlite3-ruby/sqlite3-ruby/test/test_integration_pending.rb:100]:
SQLite3::BusyException expected but nothing was raised.

see e.g. https://github.com/sparklemotion/sqlite3-ruby/actions/runs/7403622873/job/20143757496

Can you please take a look? Or let me know if you'd prefer me to revert.

fractaledmind · 2024-01-04T12:57:17Z

I have been trying to debug, but can't reproduce. I am now trying to research a more direct way to test "holds GVL" vs "releases GVL" for these two. I opened a PR with a fix that grounds the tests in the same kind of setup that was used in the other tests, which I am confident is more deterministic: #456

On a related note, is there a way to ensure local tests are running with current head version of SQLite? It seems my tests run against my default macOS version of SQLite.

flavorjones · 2024-01-04T17:27:49Z

> On a related note, is there a way to ensure local tests are running with current head version of SQLite? It seems my tests run against my default macOS version of SQLite.

By default this should not happen. Try:

bundle exec rake clean clobber
bundle exec rake compile
bundle exec rake test

If you're still using the system libraries, open a new issue and attach:

- full output from these commands
- the Makefile and mkmf.log from the compilation phase (which should be under ./tmp/x86_64-darwin/sqlite3_native/3.2.2 or similarly-named directory)

byroot reviewed Dec 27, 2023

fractaledmind changed the title from "Add a lock_wait_timeout setter" to "Add a busy_handler_timeout setter" Dec 27, 2023

byroot reviewed Dec 28, 2023

fractaledmind added 5 commits December 29, 2023 15:40

b05600e Add a lock_wait_timeout setter to set a busy_handler that releases the GVL between connection retries, but also errors after the timeout passes
0b2ff58 Rename to busy_handler_timeout
ec687cf Update test_integration_pending.rb
6ef4db2 Simplify the busy_handler_timeout (Don't use a modulo and pre-compute the timeout_deadline)
17c9db8 Update database.rb

fractaledmind force-pushed the lock-wait-timeout branch from 9179ede to 17c9db8 December 29, 2023 14:40

fractaledmind mentioned this pull request Jan 1, 2024: Ensure SQLite transaction default to IMMEDIATE mode rails/rails#50371 (Open)

fractaledmind added 3 commits January 2, 2024 00:32

e84b777 Write better tests of the busy_handler_timeout
a8d1715 Make the tests more expressive to make failures (why) easier to debug (hopefully)
77906c4 Try to create a more resilient test for the busy_handler_timeout

fractaledmind added 2 commits January 2, 2024 02:12

59ecd77 Add additional assertion to ensure that timings are within a range
9444b9b Make test directly compare time using busy_timeout vs busy_handler_timeout

suwyn reviewed Jan 2, 2024

fractaledmind added 2 commits January 2, 2024 17:17

48daae1 Fix bug with busy_handler_timeout and write better tests
0c93d30 Ensure that db connections are closed

tenderlove reviewed Jan 3, 2024

tenderlove approved these changes Jan 3, 2024

tenderlove merged commit 0d487ae into sparklemotion:main Jan 3, 2024
100 of 107 checks passed

flavorjones added a commit that referenced this pull request Jan 4, 2024

c248cde Revert pull request #443 from fractaledmind/lock-wait-timeout (This is the outcome of running: git revert f80c5ff..0c93d30)

tenderlove added a commit that referenced this pull request Jan 4, 2024

5361528 Merge pull request #457 from sparklemotion/flavorjones-revert-busy-handler-timeout-for-now (Revert pull request #443 from fractaledmind/lock-wait-timeout)

fractaledmind deleted the lock-wait-timeout branch January 7, 2024 11:55

fractaledmind mentioned this pull request Jan 7, 2024: busy_handler_timeout pt2 #456 (Merged)
