Eliminate leak of non-concrete subclass references #8591

headius · 2025-01-27T04:50:59Z

My implementation of a new subclasses structure in #8462
was a bit too "clever", leading to a potential leak of subclass
references. To avoid using a ReferenceQueue, I tied the "cleaning"
of the subclasses list to encountering a percentage of empty
subclass references during traversal. However, if traversal only
walks concrete subclass references (via Class#subclasses),
empty non-concrete (singleton, included, prepended) subclass
references could accumulate. They would only be cleared out by
a full (not concrete-only) subclass walk, which only happened for
a few internal uses of this list.
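To make the failure mode concrete, here is a deliberately simplified Java model of traversal-driven cleaning. The class names and the 50% threshold are hypothetical and this is not the actual #8462 code; the point is only that a concrete-only walk never counts the empty non-concrete entries, so it can never reach the cleaning threshold.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of "clean while traversing"; not the actual #8462 code.
final class SubclassEntry {
    final WeakReference<Object> ref;
    final boolean concrete; // false for singleton, included, or prepended classes

    SubclassEntry(Object subclass, boolean concrete) {
        this.ref = new WeakReference<>(subclass);
        this.concrete = concrete;
    }
}

final class CleaningSubclassList {
    // Assumed threshold: purge when at least half of the visited entries are empty.
    static final double CLEAN_THRESHOLD = 0.5;

    final List<SubclassEntry> entries = new ArrayList<>();

    void add(Object subclass, boolean concrete) {
        entries.add(new SubclassEntry(subclass, concrete));
    }

    void forEach(boolean concreteOnly, Consumer<Object> action) {
        int visited = 0;
        int empty = 0;
        for (SubclassEntry e : entries) {
            if (concreteOnly && !e.concrete) {
                continue; // a concrete-only walk never looks at non-concrete entries
            }
            visited++;
            Object subclass = e.ref.get();
            if (subclass == null) {
                empty++;
            } else {
                action.accept(subclass);
            }
        }
        // Cleaning is triggered only by what this particular walk observed.
        if (visited > 0 && (double) empty / visited >= CLEAN_THRESHOLD) {
            entries.removeIf(e -> e.ref.get() == null);
        }
    }
}
```

Clearing the weak references by hand (`WeakReference.clear()`) simulates collection deterministically: with one live concrete entry and 100 dead singleton entries, any number of concrete-only walks leaves all 101 entries in place, while a single full walk purges the 100 dead ones.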

The patch here reverts to the Map-based approach previously
used, with two changes:

* Instead of a fully-concurrent or synchronized WeakHashMap, we
  use read/write locking to favor mostly-read operations like
  Class#subclasses and hierarchy-walking.
* Concrete subclasses are tracked separately in a second map, so
  that the user-facing Class#subclasses stays fast.
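A minimal sketch of the two-map split, assuming plain `synchronized` methods in place of the read/write locking and hypothetical names throughout (`SubclassRegistry`, `addSubclass`). In OpenJDK, `java.util.WeakHashMap` drains its own internal reference queue during most operations, including `put`, so stale entries in either map are purged whenever that map is touched, without relying on traversal-driven cleaning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch of tracking concrete subclasses in a second map.
// Locking is simplified to `synchronized` here for brevity.
final class SubclassRegistry {
    // Every subclass: concrete, singleton, included, and prepended classes.
    private final Map<Object, Boolean> allMap = new WeakHashMap<>();
    // Only concrete subclasses, i.e. what Class#subclasses reports.
    private final Map<Object, Boolean> concreteMap = new WeakHashMap<>();

    synchronized void addSubclass(Object subclass, boolean concrete) {
        allMap.put(subclass, Boolean.TRUE);
        if (concrete) {
            concreteMap.put(subclass, Boolean.TRUE);
        }
    }

    // User-facing path: never has to skip non-concrete entries.
    synchronized List<Object> concreteSubclasses() {
        return new ArrayList<>(concreteMap.keySet());
    }

    // Internal full walk, e.g. for invalidation.
    synchronized List<Object> allSubclasses() {
        return new ArrayList<>(allMap.keySet());
    }
}
```

The cost of this design is one extra map entry per concrete subclass; in exchange, the user-facing query reads only the entries it reports.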

Both map fields are lazily initialized:

* If the operation is a read and the field is null, it is
  initialized to Collections.EMPTY_MAP.
* If the operation is a write and the field is null or
  Collections.EMPTY_MAP, it is initialized to a new
  read/write locking WeakHashMap.
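The two lazy-initialization rules can be sketched as follows. This assumes an `AtomicReference` holding the field and a `WeakHashMap` subclass that carries its own `ReentrantReadWriteLock`; `LazySubclassMap` and `LockedWeakMap` are hypothetical names, not JRuby classes. `Collections.emptyMap()` returns the shared `Collections.EMPTY_MAP` instance, so a class that is only ever read never allocates a map or a lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the lazy-initialization rules above.
final class LazySubclassMap {
    // A WeakHashMap that carries its own read/write lock; callers must hold it.
    // Caveat: WeakHashMap may purge stale entries inside read methods, so
    // readers sharing the read lock are not strictly read-only.
    static final class LockedWeakMap extends WeakHashMap<Object, Boolean> {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    }

    // null = never touched; EMPTY_MAP = read before any write;
    // LockedWeakMap = at least one write has happened.
    private final AtomicReference<Map<Object, Boolean>> field = new AtomicReference<>();

    private Map<Object, Boolean> forRead() {
        Map<Object, Boolean> m = field.get();
        if (m == null) {
            field.compareAndSet(null, Collections.emptyMap());
            m = field.get(); // EMPTY_MAP, or a map installed by a racing writer
        }
        return m;
    }

    private LockedWeakMap forWrite() {
        while (true) {
            Map<Object, Boolean> m = field.get();
            if (m instanceof LockedWeakMap) {
                return (LockedWeakMap) m;
            }
            // null or EMPTY_MAP: try to install a real map, retry if we lose the race.
            LockedWeakMap fresh = new LockedWeakMap();
            if (field.compareAndSet(m, fresh)) {
                return fresh;
            }
        }
    }

    void add(Object subclass) {
        LockedWeakMap m = forWrite();
        m.lock.writeLock().lock();
        try {
            m.put(subclass, Boolean.TRUE);
        } finally {
            m.lock.writeLock().unlock();
        }
    }

    List<Object> snapshot() {
        Map<Object, Boolean> m = forRead();
        if (!(m instanceof LockedWeakMap)) {
            return new ArrayList<>(); // EMPTY_MAP: nothing to lock or copy
        }
        LockedWeakMap locked = (LockedWeakMap) m;
        locked.lock.readLock().lock();
        try {
            return new ArrayList<>(locked.keySet());
        } finally {
            locked.lock.readLock().unlock();
        }
    }

    boolean holdsEmptyMap() {
        return field.get() == Collections.EMPTY_MAP;
    }

    boolean holdsLockedMap() {
        return field.get() instanceof LockedWeakMap;
    }
}
```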

headius · 2025-01-27T04:57:02Z

Performance comparison with 9.4.10.0, which leaked but had a more efficient data structure for traversal (needed for Class#subclasses):

9.4.10.0:

1 thread Numeric.subclasses
                          2.638k (± 7.8%) i/s  (379.10 μs/i) -     13.167k in   5.028838s
1 thread Object.subclasses
                        226.221 (± 3.5%) i/s    (4.42 ms/i) -      1.134k in   5.020683s
1 thread Custom.subclasses with 1000 singletons
                          5.680k (± 3.2%) i/s  (176.04 μs/i) -     28.779k in   5.072108s
5 thread Numeric.subclasses
                          3.022k (± 8.1%) i/s  (330.87 μs/i) -     14.994k in   5.001531s
5 thread Object.subclasses
                        433.825 (±13.6%) i/s    (2.31 ms/i) -      2.115k in   4.998935s
5 thread Custom.subclasses with 1000 singletons
                          3.568k (± 7.5%) i/s  (280.24 μs/i) -     17.982k in   5.068942s
10 thread Numeric.subclasses
                          1.995k (± 6.3%) i/s  (501.32 μs/i) -     10.098k in   5.089584s
10 thread Object.subclasses
                        465.116 (±12.7%) i/s    (2.15 ms/i) -      2.279k in   5.004614s
10 thread Custom.subclasses with 1000 singletons
                          1.981k (± 5.2%) i/s  (504.75 μs/i) -     10.036k in   5.079595s
50 thread Numeric.subclasses
                        418.415 (± 6.2%) i/s    (2.39 ms/i) -      2.100k in   5.037772s
50 thread Object.subclasses
                        318.736 (± 8.5%) i/s    (3.14 ms/i) -      1.584k in   5.010015s
50 thread Custom.subclasses with 1000 singletons
                        422.564 (± 3.5%) i/s    (2.37 ms/i) -      2.142k in   5.076173s

New impl:

1 thread Numeric.subclasses
                          1.736k (± 5.6%) i/s  (575.96 μs/i) -      8.692k in   5.024334s
1 thread Object.subclasses
                         98.895 (± 3.0%) i/s   (10.11 ms/i) -    495.000 in   5.009933s
1 thread Custom.subclasses with 1000 singletons
                          4.784k (± 4.6%) i/s  (209.03 μs/i) -     23.922k in   5.011886s
5 thread Numeric.subclasses
                        497.323 (± 3.8%) i/s    (2.01 ms/i) -      2.499k in   5.032506s
5 thread Object.subclasses
                         76.209 (±10.5%) i/s   (13.12 ms/i) -    371.000 in   5.039466s
5 thread Custom.subclasses with 1000 singletons
                          2.996k (± 5.4%) i/s  (333.75 μs/i) -     14.950k in   5.005631s
10 thread Numeric.subclasses
                        412.690 (± 6.1%) i/s    (2.42 ms/i) -      2.100k in   5.106619s
10 thread Object.subclasses
                         73.359 (±10.9%) i/s   (13.63 ms/i) -    364.000 in   5.048339s
10 thread Custom.subclasses with 1000 singletons
                          1.832k (±10.7%) i/s  (545.74 μs/i) -      9.071k in   5.048197s
50 thread Numeric.subclasses
                        294.017 (±41.5%) i/s    (3.40 ms/i) -      1.218k in   5.308313s
50 thread Object.subclasses
                         66.175 (±24.2%) i/s   (15.11 ms/i) -    308.000 in   5.006706s
50 thread Custom.subclasses with 1000 singletons
                        421.576 (± 3.6%) i/s    (2.37 ms/i) -      2.142k in   5.087584s

Notice that performance is not as fast as the (broken) linked list implementation, but compare it to CRuby and JRuby 9.4.8.0 (before any of these changes):

CRuby 3.4

1 thread Numeric.subclasses
                          1.020k (± 0.9%) i/s -      5.202k in   5.098416s
1 thread Object.subclasses
                         78.389 (± 1.3%) i/s -    392.000 in   5.001083s
1 thread Custom.subclasses with 1000 singletons
                         11.712 (± 0.0%) i/s -     59.000 in   5.037649s
5 thread Numeric.subclasses
                        902.478 (± 2.3%) i/s -      4.512k in   5.002271s
5 thread Object.subclasses
                         76.954 (± 1.3%) i/s -    385.000 in   5.003450s
5 thread Custom.subclasses with 1000 singletons
                         11.701 (± 0.0%) i/s -     59.000 in   5.042449s
10 thread Numeric.subclasses
                        766.111 (± 1.7%) i/s -      3.864k in   5.045062s
10 thread Object.subclasses
                         76.226 (± 1.3%) i/s -    385.000 in   5.051375s
10 thread Custom.subclasses with 1000 singletons
                         11.605 (± 0.0%) i/s -     59.000 in   5.084250s
50 thread Numeric.subclasses
                        399.096 (± 2.0%) i/s -      2.009k in   5.035707s
50 thread Object.subclasses
                         68.591 (± 1.5%) i/s -    348.000 in   5.075215s
50 thread Custom.subclasses with 1000 singletons
                         11.266 (± 8.9%) i/s -     57.000 in   5.168617s

JRuby is much faster than CRuby for Class#subclasses when the target class has many singletons. It is also faster for Numeric.subclasses and Object.subclasses when not under contention, but falls behind CRuby on those two benchmarks as the thread count grows. The read/write locking is not as efficient as we would like.

JRuby 9.4.8.0

1 thread Numeric.subclasses
                          1.272k (± 3.1%) i/s -      6.426k in   5.057134s
1 thread Object.subclasses
                         55.546  (± 3.6%) i/s -    280.000  in   5.045423s
1 thread Custom.subclasses with 1000 singletons
                         10.783  (± 0.0%) i/s -     54.000  in   5.011619s
5 thread Numeric.subclasses
                          2.447k (± 3.8%) i/s -     12.375k in   5.065684s
5 thread Object.subclasses
                        151.642  (± 2.0%) i/s -    768.000  in   5.067208s
5 thread Custom.subclasses with 1000 singletons
                         29.373  (±13.6%) i/s -    144.000  in   5.051597s
10 thread Numeric.subclasses
                          1.719k (±12.9%) i/s -      8.624k in   5.151087s
10 thread Object.subclasses
                         96.859  (±36.1%) i/s -    437.000  in   5.107834s
10 thread Custom.subclasses with 1000 singletons
                         26.941  (±18.6%) i/s -    129.000  in   5.052812s
50 thread Numeric.subclasses
                        377.859  (±10.6%) i/s -      1.880k in   5.060021s
50 thread Object.subclasses
                        141.944  (±15.5%) i/s -    714.000  in   5.225651s
50 thread Custom.subclasses with 1000 singletons
                         12.148  (±32.9%) i/s -     57.000  in   5.060756s

The new impl is again far faster than 9.4.8.0 for all singleton-heavy Class#subclasses benchmarks, and faster for non-concurrent access to Numeric.subclasses and Object.subclasses. It is slower on those two benchmarks under concurrency, again likely due to the cost of read/write locking.


This commit introduces specialized collection types adapted for subclass traversal:

* SubclassList and SubclassArray, extending ArrayList and RubyArray respectively but also implementing BiConsumer, so they can be passed as the lambda for traversal.
* InvalidatorList, extending ArrayList and implementing BiConsumer and Consumer, so it can be passed in place of the lambda versions.

This reduces allocation for Class#subclasses to just the eventual Ruby Array, and does the same for the internal (and rarely-used) RubyClass#subclasses(boolean) logic. Allocation for invalidator gathering is reduced to the invalidator list. All cases now avoid multiple levels of lambda allocation due to recursion and use of forEach internal iteration.
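The idea behind these collection types can be sketched with a hypothetical `DescendantList` that extends `ArrayList` and implements `BiConsumer`, so the list itself is handed to `Map.forEach` at every level of a recursive walk. `ClassNode` is a stand-in for a class with a weak subclass map; neither type is JRuby's `SubclassList` or `InvalidatorList` API.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a class node with a weak subclass map.
final class ClassNode {
    final String name;
    final Map<ClassNode, Object> subclasses = new WeakHashMap<>();

    ClassNode(String name) {
        this.name = name;
    }

    ClassNode newSubclass(String name) {
        ClassNode child = new ClassNode(name);
        subclasses.put(child, Boolean.TRUE);
        return child;
    }
}

// The collection is its own traversal callback: recursion passes `this`
// down, avoiding a new capturing lambda at each level of the hierarchy.
final class DescendantList extends ArrayList<ClassNode>
        implements BiConsumer<ClassNode, Object> {
    @Override
    public void accept(ClassNode child, Object ignored) {
        add(child);
        child.subclasses.forEach(this); // recurse with the same object
    }

    static DescendantList of(ClassNode root) {
        DescendantList list = new DescendantList();
        root.subclasses.forEach(list);
        return list;
    }
}
```

With a lambda-based walk, each recursive call would typically capture the collector and the current node in a fresh lambda; here the only object the traversal itself creates is the result list.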

This makes a few changes to improve concurrent traversal of the subclasses collections:

* Replace ReentrantReadWriteLock with StampedLock.
* Avoid CAS operations once the fields have been fully initialized.
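A sketch of both changes, under stated assumptions: the map is a `WeakHashMap` guarded by `java.util.concurrent.locks.StampedLock` using full read and write stamps (the commit message does not say whether optimistic reads are used), and the field is an `AtomicReference` whose fast path is a plain volatile read, so `compareAndSet` runs only until the map is installed. `StampedWeakMap` and `SubclassField` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.StampedLock;

// Hypothetical: a WeakHashMap guarded by a StampedLock instead of a
// ReentrantReadWriteLock. StampedLock is not reentrant, so code holding a
// read stamp must not try to write to the same map.
final class StampedWeakMap {
    private final Map<Object, Boolean> map = new WeakHashMap<>();
    private final StampedLock lock = new StampedLock();

    void add(Object key) {
        long stamp = lock.writeLock();
        try {
            map.put(key, Boolean.TRUE);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    List<Object> snapshot() {
        // Full read stamp: an optimistic stamp is only validated afterwards,
        // and iterating a hash table mid-write could fail before validation.
        long stamp = lock.readLock();
        try {
            return new ArrayList<>(map.keySet());
        } finally {
            lock.unlockRead(stamp);
        }
    }
}

final class SubclassField {
    private final AtomicReference<StampedWeakMap> ref = new AtomicReference<>();

    StampedWeakMap get() {
        StampedWeakMap m = ref.get(); // fast path: a plain volatile read, no CAS
        if (m != null) {
            return m;
        }
        // Slow path, taken only until the field is initialized.
        StampedWeakMap fresh = new StampedWeakMap();
        return ref.compareAndSet(null, fresh) ? fresh : ref.get();
    }
}
```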

headius · 2025-01-27T07:59:19Z

With all changes up to this point, here are the new numbers. We are faster than CRuby on every benchmark except 5- and 10-thread Numeric.subclasses. Performance is comparable to the old ConcurrentWeakHashMap-based implementation but is slower to traverse under contention. Performance is slower than the (broken) linked list implementation but not by a significant degree.

1 thread Numeric.subclasses
                          1.845k (± 9.2%) i/s  (542.05 μs/i) -      9.275k in   5.084890s
1 thread Object.subclasses
                        108.973 (± 1.8%) i/s    (9.18 ms/i) -    550.000 in   5.049141s
1 thread Custom.subclasses with 1000 singletons
                          5.164k (±10.3%) i/s  (193.65 μs/i) -     25.750k in   5.091201s
5 thread Numeric.subclasses
                        546.254 (±19.4%) i/s    (1.83 ms/i) -      2.597k in   5.084985s
5 thread Object.subclasses
                         88.478 (±28.3%) i/s   (11.30 ms/i) -    408.000 in   5.026991s
5 thread Custom.subclasses with 1000 singletons
                          3.483k (± 4.7%) i/s  (287.11 μs/i) -     17.450k in   5.022823s
10 thread Numeric.subclasses
                        474.393 (± 9.3%) i/s    (2.11 ms/i) -      2.392k in   5.084416s
10 thread Object.subclasses
                        110.091 (± 5.5%) i/s    (9.08 ms/i) -    552.000 in   5.028974s
10 thread Custom.subclasses with 1000 singletons
                          1.948k (± 7.8%) i/s  (513.24 μs/i) -      9.690k in   5.020453s
50 thread Numeric.subclasses
                        436.501 (± 2.7%) i/s    (2.29 ms/i) -      2.200k in   5.044202s
50 thread Object.subclasses
                        105.087 (± 6.7%) i/s    (9.52 ms/i) -    530.000 in   5.071900s
50 thread Custom.subclasses with 1000 singletons
                        442.992 (± 6.8%) i/s    (2.26 ms/i) -      2.226k in   5.043026s

headius · 2025-01-27T08:00:39Z

It is worth pointing out that the implementation at this point also reduces allocation of lambda instances for all subclass walking, including cache invalidation, method invalidator gathering, and the usual subclass list aggregation operations. This will improve performance of several core runtime operations.

headius added this to the JRuby 9.4.11.0 milestone Jan 27, 2025

headius force-pushed the subclasses_leak branch from ee6ce75 to 9d2f6bd January 27, 2025 06:07

headius added 2 commits January 27, 2025 01:26

Improve concurrency of subclass traversal


9cc6d49


headius marked this pull request as ready for review January 27, 2025 08:00

NotMineNevaWasGp mentioned this pull request Jan 27, 2025

https://github.com/jruby/jruby/issues/new/choose #8594

Closed

skunkworker mentioned this pull request Jan 27, 2025

Memory leak from ActiveRecord_Relation after upgrading from JRuby 9.4.9.0 to 9.4.10.0 #8598

Closed

headius merged commit 6a89fee into jruby:master Jan 27, 2025
95 checks passed

headius deleted the subclasses_leak branch January 27, 2025 23:28

headius linked an issue Jan 28, 2025 that may be closed by this pull request

Memory leak from ActiveRecord_Relation after upgrading from JRuby 9.4.9.0 to 9.4.10.0 #8598

Closed

jsvd mentioned this pull request Jan 30, 2025

upgrade jruby to 9.4.12.0 elastic/logstash#16986

Open
