Race condition in DataBufferUtils.readAsynchronousFileChannel resulting in FD leak (Channel remains open) #25831

gitdude1458 · 2020-09-29T02:29:15Z

Affects: 5.2.x (but probably others too)

DataBufferUtils.readAsynchronousFileChannel is prone to file descriptor leak with certain race condition.

Channel is created using method Flux.using and provided channel supplier.

Flux<DataBuffer> flux = Flux.using(channelSupplier,
				channel -> Flux.create(sink -> {
					ReadCompletionHandler handler =
							new ReadCompletionHandler(channel, sink, position, bufferFactory, bufferSize);
					sink.onCancel(handler::cancel);
					sink.onRequest(handler::request);
				}),
				channel -> {
					// Do not close channel from here, rather wait for the current read callback
					// and then complete after releasing the DataBuffer.
				});

However, the builtin Flux cleanup method isn't used here and instead it relies on sink and channel callbacks. This is probably the core of this issue, although it seems reasonable why it was done like this (to prevent DataBuffer leaks as each DataBuffer needs to be released).

Let's presume the Flux is subscribed and something like this happens:

onSubscribe([Fuseable] FluxOnAssembly.OnAssemblyConditionalSubscriber)
request(128)
onNext(PooledUnsafeDirectByteBuf(ridx: 0, widx: 4100, cap: 8192))
cancel()

This can result in following chain of events in ReadCompletionHandler:

completed() method is called (but not finished) with disposed=false, passing the initial condition isNotDisposed and second condition read != -1
cancel() method is called while having disposed = false and reading = true, which results in:
- Channel is not closed (because of condition if (!reading.get()) {closeChannel(channel)}
- disposed is set to true
read() method is called from completed handler while disposed=true, which result in NOOP (because of condition java sink.requestedFromDownstream() > 0 && isNotDisposed && reading.compareAndSet(false, true)
completed() method finishes

The outcome of this chain of events is that the Channel is left open and nobody ever closes it as no further callbacks are delivered (the sink is cancelled and cancel event was delivered, the channel finished last reading operation and no further operations were performed).

The race condition is inevitable because while AtomicBooleans are used to synchronize the callbacks between NIO2 and Reactor. The outcomes are driven by 2 AtomicBooleans which in this case results in non atomic synchronization of operations if certain order of operations happen across different threads.

I would be willing to help with PR, but I don't believe there is simple fix as this looks as more fundamental design flaw in this code. While it would be easy to check if disposed==true in read() method and if yes and no reading is going to happen, then close the channel, I don't believe this will cover all possible scenarios how the methods can be executed in parallel and the resulting order of operations serialized. In my opinion the Flux cleanup handler should be used to atomically set disposed to true, also checking if reading is in progress and if not, close the channel or single atomic state needs to be used instead of 2 independent booleans.

Also it's pretty fundamental part of framework as this method (readAsynchronousFileChannel) is used by the framework itself to for example serve resources, so it may be worth having bit of discussion how this should be fixed for all possible cases.

The text was updated successfully, but these errors were encountered:

gaoyeming · 2020-09-29T03:13:29Z

???

rstoyanchev · 2020-10-15T16:28:14Z

Thanks for bringing this up. There was indeed a race condition between a concurrent read completion and cancellation. Arguably it is also less than ideal to have any delay in closing the channel.

I've made changes to close the channel immediately which should trigger a failed read callback to release the DataBuffer, as well as consolidated the 2 AtomicBoolean flags, and made a further improvement to stay in READING mode without switching states when there is sufficient demand.

spring-projects-issues added the status: waiting-for-triage label Sep 29, 2020

rstoyanchev self-assigned this Oct 15, 2020

rstoyanchev added in: core type: bug and removed status: waiting-for-triage labels Oct 15, 2020

rstoyanchev added this to the 5.2.10 milestone Oct 15, 2020

rstoyanchev closed this as completed in c45b106 Oct 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition in DataBufferUtils.readAsynchronousFileChannel resulting in FD leak (Channel remains open) #25831

Race condition in DataBufferUtils.readAsynchronousFileChannel resulting in FD leak (Channel remains open) #25831

gitdude1458 commented Sep 29, 2020

gaoyeming commented Sep 29, 2020

rstoyanchev commented Oct 15, 2020 •

edited

Loading

Race condition in DataBufferUtils.readAsynchronousFileChannel resulting in FD leak (Channel remains open) #25831

Race condition in DataBufferUtils.readAsynchronousFileChannel resulting in FD leak (Channel remains open) #25831

Comments

gitdude1458 commented Sep 29, 2020

gaoyeming commented Sep 29, 2020

rstoyanchev commented Oct 15, 2020 • edited Loading

rstoyanchev commented Oct 15, 2020 •

edited

Loading