Affects: 5.2.x (but probably others too)

`DataBufferUtils.readAsynchronousFileChannel` is prone to a file descriptor leak under a certain race condition.
The channel is created using `Flux.using` and the provided channel supplier:
```java
Flux<DataBuffer> flux = Flux.using(channelSupplier,
        channel -> Flux.create(sink -> {
            ReadCompletionHandler handler =
                    new ReadCompletionHandler(channel, sink, position, bufferFactory, bufferSize);
            sink.onCancel(handler::cancel);
            sink.onRequest(handler::request);
        }),
        channel -> {
            // Do not close channel from here, rather wait for the current read callback
            // and then complete after releasing the DataBuffer.
        });
```
However, the built-in Flux cleanup callback isn't used here; instead the code relies on sink and channel callbacks. This is probably the core of the issue, although the reason for doing it this way seems sound (to prevent DataBuffer leaks, as each DataBuffer needs to be released).
Let's presume the Flux is subscribed and the downstream cancels at the same moment a read completes. This can result in the following chain of events in ReadCompletionHandler:
1. `completed()` is called (but has not yet finished) with `disposed = false`, passing the initial `isNotDisposed` check and the second condition `read != -1`.
2. `cancel()` is called while `disposed = false` and `reading = true`, which results in:
   - the channel not being closed (because of the condition `if (!reading.get()) { closeChannel(channel); }`),
   - `disposed` being set to `true`.
3. `read()` is called from the `completed()` handler while `disposed = true`, which results in a no-op (because of the condition `sink.requestedFromDownstream() > 0 && isNotDisposed && reading.compareAndSet(false, true)`).
4. `completed()` finishes.
The outcome of this chain of events is that the channel is left open and nobody ever closes it, since no further callbacks are delivered: the sink is cancelled and the cancel event has already been delivered, and the channel has finished its last read operation with no further operations issued.
The race condition is possible because, although AtomicBooleans are used to synchronize the callbacks between NIO2 and Reactor, the outcome is driven by two independent AtomicBooleans, which does not give atomic synchronization of the operations when they interleave in a particular order across threads.
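To make the interleaving concrete, here is a minimal, self-contained model of the two-flag logic. The class and field names are invented for illustration and this is not the actual Spring source; it simply replays the conditions quoted above, in the problematic order, on a single thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified, hypothetical model of the two-flag logic described above.
public class TwoFlagRace {

    static class Handler {
        final AtomicBoolean reading = new AtomicBoolean();
        final AtomicBoolean disposed = new AtomicBoolean();
        boolean channelOpen = true;
        long demand = 1; // pretend the downstream still has outstanding demand

        // Mirrors cancel(): only closes the channel if no read is in flight.
        void cancel() {
            if (!reading.get()) {
                closeChannel();
            }
            disposed.set(true);
        }

        // Mirrors the tail of completed(): clear the reading flag, try to read again.
        void completedTail() {
            reading.set(false);
            read();
        }

        // Mirrors read(): a no-op once disposed, so nobody ever closes the channel.
        void read() {
            if (demand > 0 && !disposed.get() && reading.compareAndSet(false, true)) {
                // would issue channel.read(...) here
            }
        }

        void closeChannel() {
            channelOpen = false;
        }
    }

    public static void main(String[] args) {
        Handler h = new Handler();

        // Step 1: a read completion is being processed, so reading == true.
        h.reading.set(true);

        // Step 2: cancel() arrives concurrently; reading == true, so the
        // channel is not closed here, and disposed becomes true.
        h.cancel();

        // Steps 3-4: completed() finishes; read() sees disposed == true and
        // does nothing, so the close path is never reached again.
        h.completedTail();

        System.out.println("channel still open: " + h.channelOpen); // prints true -> leaked
    }
}
```

Running this prints `channel still open: true`: once the order of steps 1 and 2 falls out this way, no remaining code path closes the channel.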
I would be willing to help with a PR, but I don't believe there is a simple fix, as this looks like a more fundamental design flaw in this code. While it would be easy to check for disposed == true in the read() method and, if so and no read is going to happen, close the channel there, I don't believe that would cover all possible ways the methods can execute in parallel and all possible orders in which the resulting operations get serialized. In my opinion, either the Flux cleanup handler should be used to atomically set disposed to true, also checking whether a read is in progress and closing the channel if not, or a single atomic state should be used instead of two independent booleans.
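For illustration only, the single-atomic-state idea could look roughly like the sketch below. The state names and methods are assumptions rather than a proposed patch, and demand handling is left out so that the handoff between `cancel()` and the completion callback stays visible.

```java
import java.io.IOException;
import java.nio.channels.AsynchronousFileChannel;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: collapse `reading` and `disposed` into one atomic state,
// so the cancel vs. read-completion decision is taken on a single CAS.
class SingleStateSketch {

    private enum State { IDLE, READING, DISPOSED }

    private final AtomicReference<State> state = new AtomicReference<>(State.IDLE);
    private final AsynchronousFileChannel channel;

    SingleStateSketch(AsynchronousFileChannel channel) {
        this.channel = channel;
    }

    void cancel() {
        // If no read is in flight we own the close; otherwise the in-flight
        // completion callback observes DISPOSED and closes the channel itself.
        if (state.getAndSet(State.DISPOSED) == State.IDLE) {
            closeChannel();
        }
    }

    boolean tryStartRead() {
        // Only one party can move IDLE -> READING, so a concurrent cancel()
        // either sees IDLE (and closes) or sees READING (and defers the close).
        return state.compareAndSet(State.IDLE, State.READING);
    }

    void onReadCompleted() {
        // Called at the end of a completion callback: either go back to IDLE,
        // or, if a cancel happened meanwhile, close the channel here.
        if (!state.compareAndSet(State.READING, State.IDLE)) {
            closeChannel();
        }
    }

    private void closeChannel() {
        try {
            channel.close();
        }
        catch (IOException ignored) {
            // best effort
        }
    }
}
```

With a single state, the interleaving described above can no longer lose the close: whichever side observes DISPOSED (or the failed CAS back to IDLE) performs it.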
Also, this is a fairly fundamental part of the framework, since readAsynchronousFileChannel is used by the framework itself, for example to serve resources, so it may be worth having a bit of discussion about how this should be fixed for all possible cases.
Thanks for bringing this up. There was indeed a race condition between a concurrent read completion and cancellation. Arguably it is also less than ideal to have any delay in closing the channel.
I've made changes to close the channel immediately, which should trigger a failed read callback to release the DataBuffer. I've also consolidated the 2 AtomicBoolean flags and made a further improvement to stay in READING mode, without switching states, when there is sufficient demand.
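As a rough, hypothetical sketch of that direction (assumed names and simplified demand accounting, not the actual commit): cancellation closes the channel immediately, the interrupted read surfaces through failed(...), and completed(...) keeps issuing reads while demand remains instead of switching states for every buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch only, not the actual fix: single state machine,
// immediate close on cancel, and a read loop that stays in READING
// as long as there is demand.
class FixedReadSketch implements CompletionHandler<Integer, ByteBuffer> {

    private enum State { IDLE, READING, DISPOSED }

    private final AsynchronousFileChannel channel;
    private final AtomicReference<State> state = new AtomicReference<>(State.IDLE);
    private final AtomicLong demand = new AtomicLong();
    private long position;

    FixedReadSketch(AsynchronousFileChannel channel) {
        this.channel = channel;
    }

    void request(long n) {
        demand.addAndGet(n);
        if (state.compareAndSet(State.IDLE, State.READING)) {
            readNext();
        }
    }

    void cancel() {
        state.set(State.DISPOSED);
        closeChannel(); // no waiting: any in-flight read fails over to failed(...)
    }

    private void readNext() {
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        channel.read(buffer, position, buffer, this);
    }

    @Override
    public void completed(Integer read, ByteBuffer buffer) {
        if (read == -1 || state.get() == State.DISPOSED) {
            // in the real handler the outstanding DataBuffer would be released here
            closeChannel();
            return;
        }
        position += read;
        // emit the buffer downstream here, then decide how to continue:
        if (demand.decrementAndGet() > 0) {
            readNext(); // stay in READING, no state switch
            return;
        }
        if (!state.compareAndSet(State.READING, State.IDLE)) {
            closeChannel(); // a cancel raced with this completion
        }
        else if (demand.get() > 0 && state.compareAndSet(State.IDLE, State.READING)) {
            readNext(); // demand arrived while we were switching to IDLE
        }
    }

    @Override
    public void failed(Throwable ex, ByteBuffer buffer) {
        // the read interrupted by the immediate close lands here; the real
        // handler would release the outstanding DataBuffer before closing
        closeChannel();
    }

    private void closeChannel() {
        try {
            channel.close();
        }
        catch (IOException ignored) {
        }
    }
}
```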