
remove synchronized from LocalCache::get(key, loader) to allow for VirtualThread-friendly value-loading #6845

Closed
3 tasks done
cortlepp opened this issue Nov 21, 2023 · 12 comments · Fixed by #6857 or #6972
Labels: P2 · package=cache · type=enhancement (Make an existing feature better)

Comments

@cortlepp
Contributor

API(s)

com.google.common.cache.LocalCache$Segment.lockedGetOrLoad

How do you want it to be improved?

This load should be possible without using synchronized.

Why do we need it to be improved?

Using synchronized here is not performant when the current Thread is a Java 21 VirtualThread and the loader does a blocking operation, because the synchronized block prevents the JVM from unmounting the Thread. The performance could be drastically improved by using, e.g., a ReentrantLock.

Example

Thread.ofVirtual().start(() -> {
  Database database = openDatabase(); // hypothetical DAO that queries a database over the network
  var databaseLoader = new CacheLoader<String, Object>() {
    @Override
    public Object load(String key) {
      return database.load(key); // blocking network I/O
    }
  };
  var cache = CacheBuilder.newBuilder().build(databaseLoader);
  var fooObject = cache.getUnchecked("foo"); // cache miss, invokes the loader
});

Current Behavior

In the above example, the loader is invoked. This invocation happens inside a synchronized block in com.google.common.cache.LocalCache$Segment.lockedGetOrLoad. Because the loader does I/O inside the synchronized block, the VirtualThread cannot be unmounted and therefore blocks its carrier. This is bad for the throughput of the application. The behavior can be observed with any loader that does a blocking operation when it is invoked from a VirtualThread (just start your JVM with -Djdk.tracePinnedThreads=full). The performance impact depends on how long the loader takes, but since one generally only caches things that are expensive to compute, I believe this is a problem for pretty much anyone using LocalCache in combination with VirtualThreads.
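
For anyone who wants to reproduce the pinning outside of Guava, here is a minimal standalone sketch (independent of LocalCache; the lock and the sleep merely stand in for the entry monitor and the loader's I/O). Run it on Java 21 with -Djdk.tracePinnedThreads=full and the JVM should report the pinned stack trace:

    import java.util.concurrent.CountDownLatch;

    public class PinningDemo {
      private static final Object LOCK = new Object();

      public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread.ofVirtual().start(() -> {
          synchronized (LOCK) { // stand-in for the cache entry's synchronized block
            try {
              // Blocking while holding a monitor pins the virtual thread to its
              // carrier on current JDKs; with -Djdk.tracePinnedThreads=full the
              // JVM prints the pinned stack trace.
              Thread.sleep(1_000); // stand-in for the loader's network I/O
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          }
          done.countDown();
        });
        done.await();
      }
    }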

Desired Behavior

com.google.common.cache.LocalCache$Segment.lockedGetOrLoad should work gracefully with VirtualThreads and not block the carrier Thread if a loader does I/O or some other blocking operation.

The method should therefore be refactored to lock via some other mechanism, e.g. a ReentrantLock.
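
As an illustration of that direction (not Guava's actual code; the class and method names are made up), an explicit lock can be held across the load without pinning a virtual thread, and its isHeldByCurrentThread() can provide the same fail-fast recursion check that Thread.holdsLock(entry) gives today:

    import java.util.concurrent.locks.ReentrantLock;
    import java.util.function.Supplier;

    final class EntryLoadLockSketch<V> {
      // Unlike a monitor, a ReentrantLock does not pin a virtual thread to its
      // carrier while the loader blocks on I/O.
      private final ReentrantLock loadLock = new ReentrantLock();

      V loadUnderLock(Supplier<V> loader) {
        if (loadLock.isHeldByCurrentThread()) {
          // Mirrors the existing Thread.holdsLock(entry) fail-fast for a
          // recursive load of the same entry.
          throw new IllegalStateException("Recursive load");
        }
        loadLock.lock();
        try {
          return loader.get();
        } finally {
          loadLock.unlock();
        }
      }
    }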

Concrete Use Cases

We use a LocalCache to cache frequently used database objects. Our loader therefore executes a SQL query over the network, which is a blocking operation and takes comparatively long. We are currently experimenting with using VirtualThreads in order to improve the throughput of I/O heavy workloads.

Checklist

@cortlepp cortlepp added the type=enhancement Make an existing feature better label Nov 21, 2023
@cortlepp
Contributor Author

I would be happy to work on a PR for this myself, if someone can explain to me what exactly the synchronized is currently trying to prevent. The comment suggests that otherwise something can load "recursively". How can one recursively call this method? And how does a synchronized prevent this? Can't it at best only delay it?

I have been trying to wrap my head around this for some time now (without success), and I can't find a test case that covers this.

@cpovirk
Member

cpovirk commented Nov 21, 2023

The fix was made before common.cache was open-sourced, so I can't link to a specific commit for it.

The fix cited #369. It added the following test, which was removed as part of a switch to "white-box" tests, again before the code was open-sourced:

    public void testRecursiveDuplicateComputation() throws InterruptedException {
      final class RecursiveFunction implements Function<String, String> {
        Map<String, String> map;
        @Override public String apply(String key) {
          return map.get(key);
        }
      }
      final RecursiveFunction computeFunction = new RecursiveFunction();
      final ConcurrentMap<String, String> map = new MapMaker()
          .makeComputingMap(computeFunction);
      computeFunction.map = map;

      // tells the test when the computation has completed
      final CountDownLatch doneSignal = new CountDownLatch(1);

      Thread thread = new Thread() {
        @Override public void run() {
          try {
            map.get("foo");
          } finally {
            doneSignal.countDown();
          }
        }
      };
      thread.setUncaughtExceptionHandler(new UncaughtExceptionHandler() {
        @Override public void uncaughtException(Thread t, Throwable e) {}
      });
      thread.start();

      boolean done = doneSignal.await(1, TimeUnit.SECONDS);
      if (!done) {
        StringBuilder builder = new StringBuilder();
        for (StackTraceElement trace : thread.getStackTrace()) {
          builder.append("\tat ").append(trace).append('\n');
        }
        fail(builder.toString());
      }
    }

The diff to the prod code was:

@@ -17,6 +17,7 @@
 package com.google.common.collect;
 
 import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
 
 import com.google.common.base.Equivalence;
 import com.google.common.base.Function;
@@ -119,7 +120,13 @@
             // This thread solely created the entry.
             boolean success = false;
             try {
-              V value = computingValueReference.compute(key, hash);
+              V value = null;
+              // Synchronizes on the entry to allow failing fast when a
+              // recursive computation is detected. This is not full-proof
+              // since the entry may be copied when the segment is written to.
+              synchronized (entry) {
+                value = computingValueReference.compute(key, hash);
+              }
               checkNotNull(value, "compute() returned null unexpectedly");
               success = true;
               return value;
@@ -136,6 +143,7 @@
         try {
           while (true) {
             try {
+              checkState(!Thread.holdsLock(entry), "Recursive computation");
               ValueReference<K, V> valueReference = entry.getValueReference();
               V value = valueReference.waitForValue();
               if (value == null) {

The holdsLock check still exists. I assume that that's what the lock is for.

Perhaps we should be using a ThreadLocal instead? (ThreadLocal also comes up in discussions of "things to avoid when using virtual threads." But I think the concern there is with things like ThreadLocal<byte[]> that can hold lots of memory. So if all we did was set a boolean and then remove it afterward, we'd likely be fine.)
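
A minimal sketch of that idea (purely illustrative, not LocalCache's actual internals): mark the thread before invoking the loader, clear the marker in a finally, and fail fast if the marker is already set.

    import java.util.function.Supplier;

    final class RecursionGuardSketch {
      // Set only for the duration of a load and removed afterwards, so it holds
      // no memory between loads.
      private static final ThreadLocal<Boolean> LOADING = new ThreadLocal<>();

      static <V> V loadWithGuard(Supplier<V> loader) {
        if (Boolean.TRUE.equals(LOADING.get())) {
          throw new IllegalStateException("Recursive load");
        }
        LOADING.set(Boolean.TRUE);
        try {
          return loader.get();
        } finally {
          LOADING.remove();
        }
      }
    }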

All that said: Our usual recommendation is to use Caffeine instead of our caches. Caffeine has so far taken the position that it's not worth restructuring the code to support virtual threads because eventually synchronized won't force threads to be pinned (ben-manes/caffeine#1018, ben-manes/caffeine#860).

I don't know whether we'd act on a PR here or not. On the one hand, I'm shocked that this is the only use of synchronized in common.cache, so eliminating it might actually provide meaningful improvements. (I gather from your comment about -Djdk.tracePinnedThreads=full that you haven't seen other pinning, such as inside JDK libraries that common.cache uses?) On the other hand, we really don't want to encourage people to move away from Caffeine, nor give the impression that Guava is virtual-thread-friendly on the whole. (Notably, we have one of those ThreadLocal<char[]> instances in escape.Platform.)

@ben-manes
Contributor

ben-manes commented Nov 21, 2023

My recollection is that this was added in one of my first bug fixes while learning that code. The original author was on sabbatical and no one else knew the code very well, but a commonly reported footgun was deadlocks in production due to recursive loads. I didn't want to use a ThreadLocal because those had a reputation for performance issues (multiple rewrites). The alternative was to stash the thread id in the loading entry as extra bookkeeping, but that needed care for proper cleanup. I advocated for the holdsLock trick because an uncontended lock is free, it needed no extra memory (a header bit), there were no observed degradations, and it was a very low-risk change to unfamiliar code. We then unofficially supported recursive loads despite discouraging them, and the same problem came up in later discussions about ConcurrentHashMap. (My name might not be on the internal commit because my manager preferred that our team ship CLs to others, to avoid the perception that 20% projects might distract from the core project's timeline.)

Virtual thread support in mutexes is ongoing work (latest status). It may have been premature to release the feature in Java 21 without it, as it is very easy to deadlock in JDK-only code due to the ample use of synchronization throughout the standard library. The answer isn't to remove all of that; switching to ReentrantLock was only meant as advice for those wanting to experiment and give feedback. I wouldn't use VTs in production until these footguns are resolved, which will likely happen before most teams are ready to upgrade to 21 anyway.

I still stand by my position in Caffeine that it is not worth the effort and that the JDK team will instead solve it soonish. It is good to be aware of, but primarily as a reason to avoid adopting VTs until they are more robust.

@cortlepp
Contributor Author

Ok, a couple of things first, just so I understand correctly:

  • the recursive call is not something that regularly happens. It is always an error; this synchronized is only a fail-fast mechanism and as such doesn't fix anything
  • if I were to remove both the synchronized and the Thread.holdsLock call, everything would work fine as long as I don't try to load a value recursively. If I did, I'd get a deadlock

Regarding your question about other pinning, @cpovirk: so far I haven't discovered anything else, but I'm still working my way through it.

Regarding the possible fix using a ThreadLocal: I also think this would work, and I would agree that it should not be a problem performance- or memory-wise with either platform or virtual threads (and since we are on the slow path anyway, I at least would not worry too much about it). Even though it pains me a bit to use a ThreadLocal, I think in this case it should really be manageable, since we can clean everything up inside the finally (ThreadLocal creation and destruction would only be ~10 lines of code apart). If we just put a static object inside the ThreadLocal (and then check whether it is there or not), the memory footprint should also be fairly minimal. For what it's worth, I think this would be a fairly small fix that would bring a lot of value to everyone who already wants to adopt VirtualThreads, all of their problems aside.

The only question that remains for me is: would you consider a PR for this? I will probably end up implementing either the ThreadLocal solution or just throwing out the synchronized in my fork (I am doing this for a thesis, so I don't absolutely need a real, released, permanent solution, but I would definitely prefer one).

I also absolutely understand your point, @ben-manes. I just happen to need a solution now, so I kind of have to deal with all the quirks that VirtualThreads currently have ^^

@cortlepp
Contributor Author

I went ahead and implemented the ThreadLocal solution in my fork. Unlike what I said yesterday, though, I didn't put a static object inside the ThreadLocal but the entry itself, because otherwise we would also fail on the (maybe a bit unusual, but) legal case where someone loads a different key from within their loader.

An edge case that this implementation would not detect is a recursive load of the same entry with another "normal" load in the middle. In that case the ThreadLocal would be reset by the second loader, and the third one could no longer determine that loader no. 1 had already tried to load this entry. But this really seems very niche to me.

I also wrote a test for it. Please do let me know if you would like me to submit a PR, or if you have other suggestions or comments.

@ben-manes
Contributor

ben-manes commented Nov 22, 2023

Unfortunately, circular recursive calls (a -> b -> a) are not unheard of in longer dependency chains. The recursion is rarely intentional but rather a side effect of calling an application API that eventually circles back. A fast fail is much better than a deadlock, since the developer won't at first understand what went wrong.

Thus, the ThreadLocal would need to capture the list of loading entries. That’s still not horrible or difficult to implement.

Alternatively, the loading reference entry would need to capture the thread. This entry type is temporary and swapped out once the load completes. That would avoid the entry-copying concern the doc comment mentions, since the copy could capture the thread as needed. This is probably the ideal approach, just a bit more work and think time to avoid regressions.
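
Roughly, that alternative looks like the sketch below (illustrative names only; Guava's real loading value reference is more involved): the in-flight entry records the thread that started the load, and a thread that arrives back at its own entry fails fast instead of waiting on itself.

    import java.util.concurrent.CountDownLatch;

    final class LoadingEntrySketch<V> {
      // Thread that started this load, recorded so that a recursive arrival by
      // the same thread can fail fast instead of deadlocking on itself.
      private final Thread loadingThread = Thread.currentThread();
      private final CountDownLatch done = new CountDownLatch(1);
      private volatile V value;

      V waitForValue() throws InterruptedException {
        if (loadingThread == Thread.currentThread()) {
          throw new IllegalStateException("Recursive load");
        }
        done.await();
        return value;
      }

      void set(V newValue) {
        value = newValue;
        done.countDown();
      }
    }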

If you can use Caffeine, then AsyncCache is compatible, as it defers the computation to a future, so the monitor is only held briefly to establish the mapping. The executor could be set to a virtual-thread executor and the synchronous view used for convenience.
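
For reference, a sketch of that Caffeine setup (assuming Java 21+; Database is the hypothetical DAO from the issue's example):

    import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;
    import java.util.concurrent.Executors;

    class CaffeineVirtualThreadSketch {
      interface Database { Object load(String key); } // hypothetical DAO

      static LoadingCache<String, Object> build(Database database) {
        AsyncLoadingCache<String, Object> asyncCache = Caffeine.newBuilder()
            .maximumSize(10_000)
            // Loads run on virtual threads; the cache's internal monitor is only
            // held briefly to install the in-flight future, not during the I/O.
            .executor(Executors.newVirtualThreadPerTaskExecutor())
            .buildAsync(database::load);
        // The synchronous view blocks the caller until the future completes,
        // which is cheap when the caller is itself a virtual thread.
        return asyncCache.synchronous();
      }
    }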

@google google deleted a comment from SteveBombardier Nov 22, 2023
@google google deleted a comment from SteveBombardier Nov 22, 2023
@cortlepp
Contributor Author

@ben-manes I updated my implementation to remember the loading Thread like you suggested and added a test for the "proxied recursive load" scenario that this is supposed to fix (it seems to be working for me locally). I didn't find any copying of the loading entry, though, so I didn't adjust anything there.
Also, @cpovirk, what do you think about this now?

Thanks for the hint, I'm going to have a look at it. But if possible I would still prefer to fix the issue in Guava. We use Guava caches quite extensively, and I'm guessing that switching to Caffeine would require a pretty major rework.

@ben-manes
Contributor

Those changes look good. From a quick review it doesn't seem that this entry type needs to be copied (copyEntry / copyFor return the entry itself), so presumably the original doc comment reflected an incorrect concern.

Caffeine is mostly a drop-in replacement for Guava's cache, with minor differences. The performance difference can be quite stark; see below for a zipfian workload on my MacBook with 16 threads, varying Guava's concurrencyLevel (which defaults to 4). Whether that translates into realizable gains is application specific, so you might consider capturing a JFR profile to see whether the caches remain a bottleneck after your fix.

[Screenshot (2023-11-24): benchmark chart for the zipfian workload described above]
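
To make the "mostly drop-in" point concrete, here is a sketch of a minimal migration under the assumptions of the issue's example (the map stands in for the database-backed loader):

    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class CaffeineMigrationSketch {
      public static void main(String[] args) {
        // Stand-in for the database from the issue example.
        Map<String, Object> database = new ConcurrentHashMap<>(Map.of("foo", "bar"));

        // Guava: CacheBuilder.newBuilder().maximumSize(10_000).build(databaseLoader)
        // Caffeine accepts a plain lambda or method reference as the loader:
        LoadingCache<String, Object> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .build(database::get);

        // Unlike Guava's LoadingCache.get(K), Caffeine's get(K) throws no
        // checked ExecutionException.
        System.out.println(cache.get("foo"));
      }
    }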

@cortlepp
Contributor Author

Ok great. Can I open a PR for this now?

Cool. A migration to Caffeine is definitely on our radar and will happen at some point.

@cpovirk
Member

cpovirk commented Nov 25, 2023

No guarantees, but I'd say that opening a PR is likely to be worth your time. I can have a look at it and run some tests. Thanks.

copybara-service bot pushed a commit that referenced this issue Dec 1, 2023
Fixes #6851
Fixes #6845

RELNOTES=n/a
PiperOrigin-RevId: 586666218
@cpovirk
Member

cpovirk commented Dec 18, 2023

The fix is now released as part of 33.0.0.

copybara-service bot pushed a commit that referenced this issue Feb 7, 2024
…12730).

The `LocalCache` change led to [false reports of recursion during refresh](#6851 (comment)).

This CL keeps the new tests from the `LocalCache` change, which already would have passed before the CL. Ideally we would additionally include [tests that demonstrate the refresh problem](#6851 (comment)), but we'll need to see if the contributor can sign a CLA.

This rollback un-fixes #6845. Users of `common.cache` and virtual threads will likely have to wait until `synchronized` no longer pins threads. (And at that point, if not earlier, they should migrate to [Caffeine](https://github.com/ben-manes/caffeine) :))

RELNOTES=`cache`: Fixed a bug that could cause [false "recursive load" reports during refresh](#6851 (comment)).
PiperOrigin-RevId: 605049501
@cpovirk
Member

cpovirk commented Mar 20, 2024

...and, as #6851 (comment) notes, the fix was rolled back for 33.1.0.

Given that the fix was really more of a workaround for a temporary issue with virtual threads (on which there has been progress), and given that we continue to recommend Caffeine over Guava's cache, and given that our previous fix caused problems, I don't anticipate that we'll work on this further.
