CancelWrite after Close should be a no-op #4404
@nisdas Very excited to hear that! Please feel free to reach out with any problems you might run into, happy to help!
I'm not sure I understand what the problem is. In general, the flow is the following:
The distinction between normal and abrupt termination is important here. If you close the write side of the stream, all data will be delivered reliably, i.e. quic-go will retransmit stream data until it is acknowledged by the peer (modulo a race with connection termination, which is what this issue is about). If you reset a stream, you do so because something went wrong and you don't want to send the entire response (or because the peer asked you to via CancelRead). In that case, you don't care about any of the data being delivered, so (1) the sender will not retransmit any data, and (2) the receiver will immediately surface the reset error and discard any data it has received. To avoid leaking streams, you therefore need to make sure that every code path calls either Close or CancelWrite at some point (or kills the entire connection). Does that make sense?
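The two termination modes can be illustrated with a toy model of a stream's send side. This is a hypothetical sketch, not quic-go's implementation: `sendSide`, `Ack` and `MustRetransmit` are names made up for illustration, and acknowledgements are simplified to a byte count. It shows that Close keeps unacknowledged data queued for retransmission, while CancelWrite drops it.

```go
package main

// sendSide is a hypothetical toy model of the send half of a QUIC stream.
// It is not quic-go's implementation; it only contrasts graceful (Close)
// and abrupt (CancelWrite) termination.
type sendSide struct {
	unacked []byte // written but not yet acknowledged by the peer
	closed  bool   // FIN queued: remaining data is still delivered reliably
	reset   bool   // stream reset: remaining data is abandoned
}

// Write queues data for (re)transmission until the peer acknowledges it.
// Writes after either kind of termination are ignored in this model.
func (s *sendSide) Write(p []byte) {
	if s.closed || s.reset {
		return
	}
	s.unacked = append(s.unacked, p...)
}

// Close terminates gracefully: queued data is kept and retransmitted
// until acknowledged.
func (s *sendSide) Close() { s.closed = true }

// CancelWrite terminates abruptly: queued data is dropped and never
// retransmitted, whether or not Close was called before.
func (s *sendSide) CancelWrite() {
	s.reset = true
	s.unacked = nil
}

// Ack records that the peer acknowledged the first n queued bytes.
func (s *sendSide) Ack(n int) {
	if n > len(s.unacked) {
		n = len(s.unacked)
	}
	s.unacked = s.unacked[n:]
}

// MustRetransmit reports whether lost data would still be resent.
func (s *sendSide) MustRetransmit() bool {
	return !s.reset && len(s.unacked) > 0
}
```

In this model a CancelWrite issued after Close also drops the queued data, which is exactly the interaction this issue is about.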
@marten-seemann Ok, thank you for the explanation of the termination flow for QUIC. I now have a good idea of why we are running into issues with resetting streams on QUIC connections. The following is the flow for our stream handler:
For yamux, 4) is a no-op if 3) was successful. For QUIC, however, the semantics appear to be very different: abrupt termination will in fact cause the remote peer to drop the data it received. We only need to abruptly terminate QUIC streams if closing them fails; in the happy case, we simply don't reset them. Thanks for all your help, feel free to close the issue.
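A minimal sketch of that handler flow: close on the happy path, and reset only when writing or closing fails. It assumes a local `sendStream` interface mirroring the Write, Close and CancelWrite methods of quic-go's send stream (in quic-go, CancelWrite takes a `StreamErrorCode`; a plain `uint64` keeps this self-contained). `sendAndClose`, `errCodeInternal`, `fakeStream` and `errCloseFailed` are hypothetical names for illustration.

```go
package main

import "errors"

// sendStream is a hypothetical stand-in for the subset of a quic-go
// send stream used here (error code simplified to uint64).
type sendStream interface {
	Write(p []byte) (int, error)
	Close() error
	CancelWrite(code uint64)
}

// errCodeInternal is a hypothetical application-defined error code.
const errCodeInternal uint64 = 1

// sendAndClose writes msg and terminates the send side: gracefully on the
// happy path, abruptly only when something failed.
func sendAndClose(s sendStream, msg []byte) error {
	if _, err := s.Write(msg); err != nil {
		s.CancelWrite(errCodeInternal)
		return err
	}
	if err := s.Close(); err != nil {
		// Graceful close failed: fall back to a reset so the stream is not leaked.
		s.CancelWrite(errCodeInternal)
		return err
	}
	// No unconditional reset here: after a successful Close, a reset
	// could abort delivery of data that is still in flight.
	return nil
}

// fakeStream is a hypothetical test double that records which
// termination methods were called.
type fakeStream struct {
	written  []byte
	closeErr error
	closed   bool
	canceled bool
}

func (f *fakeStream) Write(p []byte) (int, error) {
	f.written = append(f.written, p...)
	return len(p), nil
}

func (f *fakeStream) Close() error {
	f.closed = true
	return f.closeErr
}

func (f *fakeStream) CancelWrite(code uint64) { f.canceled = true }

// errCloseFailed simulates a failing Close in the example below.
var errCloseFailed = errors.New("close failed")
```

The key design choice is that the reset lives only in the error branches; nothing resets a stream after Close has already succeeded.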
What issues would that be? At least for quic-go, calling
I see where the misunderstanding lies. The logic I described above only applies to the first call that terminates a stream (i.e.
Actually, this is how it's supposed to work. But it doesn't! This is pretty bad. Fix incoming.
Ok, great, thanks for clarifying, @marten-seemann. This would explain why it got triggered for us.
Interestingly, there's a failing test on #4408. Apparently, a few years ago, when most of the stream state machine was written, we thought that resetting after closing was a feature (see lines 893 to 902 in commit 183d42a).
I still stand by the conclusion of this issue (reset after close should be a no-op), but it's interesting to see that this was not just an oversight but a conscious design decision back then.
Thanks for the update. Would this being part of the specification (RESET after CLOSE) block #4408 from merging right now?
No, it doesn’t block us from merging #4408. Just because the spec allows this state transition doesn’t mean that we need to expose an API for it. What I described in #4404 (comment) is an optimization building on top of #4408. Release-wise, I’m planning to cut a patch release for #4408 (possibly including this optimization) in the next few days. Does that work for you?
@marten-seemann Can you elaborate on
How is it a misuse of the API? I would argue that the current behaviour is correct. If data has been sent but not yet acknowledged, it is buffered in case it is lost in transit. By calling Reset, I want to free all that memory. If the data in transit is not lost and is delivered correctly, that's great. If it isn't, I don't want it to be retransmitted.
@sukunrt Ok, let me try to explain: We’re only looking at the send direction here. Assume you received a request for a resource, and you started generating the response. Now two things can happen:
Now what is the meaning of calling I assume you’re asking now “What if the receiver wants to stop receiving data?”. It would do so by calling
Currently we’re doing (1). This is the most efficient way to implement things, for obvious reasons. With #4088, we’d (temporarily) do (2), until we implement the optimization in #4404 (comment), which will bring us back to (1). Does this make sense?
Perfect, @marten-seemann, that works great for us.
It does explain things better, thank you. I have one question: how do I ask quic-go to discard queued write data?
The peer, however, is not responsive. Is there no way to drop the queued data?
I'm beginning to wonder if the suggested API change is the right thing to do. While I agree that in many cases, calling For users, it is easy to make
#4419 fixes the documentation for the SendStream interface, making it clear that
I think this is the right thing to do. As you've explained, users can wrap and make
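A user-side wrapper of the kind discussed here might look like the following sketch. It assumes a local `sendStream` interface mirroring the Write, Close and CancelWrite methods of quic-go's send stream (error code simplified to `uint64`); `closeOnceStream` and `countingStream` are hypothetical names, not part of quic-go. Once Close has succeeded, later CancelWrite calls are swallowed so that data already handed to the stream can still be delivered.

```go
package main

import "sync"

// sendStream is a hypothetical stand-in for the subset of a quic-go
// send stream used here (error code simplified to uint64).
type sendStream interface {
	Write(p []byte) (int, error)
	Close() error
	CancelWrite(code uint64)
}

// closeOnceStream wraps a sendStream so that CancelWrite becomes a no-op
// after a successful Close. Write is promoted from the embedded stream.
type closeOnceStream struct {
	sendStream
	mu     sync.Mutex
	closed bool
}

// Compile-time check that the wrapper still satisfies sendStream.
var _ sendStream = (*closeOnceStream)(nil)

func (s *closeOnceStream) Close() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if err := s.sendStream.Close(); err != nil {
		return err // not marked closed, so a later CancelWrite still resets
	}
	s.closed = true
	return nil
}

func (s *closeOnceStream) CancelWrite(code uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return // gracefully closed: let in-flight data be delivered
	}
	s.sendStream.CancelWrite(code)
}

// countingStream is a hypothetical test double counting termination calls.
type countingStream struct{ closes, cancels int }

func (c *countingStream) Write(p []byte) (int, error) { return len(p), nil }
func (c *countingStream) Close() error                { c.closes++; return nil }
func (c *countingStream) CancelWrite(code uint64)     { c.cancels++ }
```

The mutex keeps the `closed` flag consistent if Close and CancelWrite race from different goroutines. The trade-off is the one raised in this thread: a wrapped stream can no longer discard queued data after Close, even if the peer has become unresponsive.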
Thank you to everyone who participated in this discussion! This was very enlightening: we got to consider multiple different options for the API, and we fixed the current documentation.
@marten-seemann Hey, any update on this? We are adding QUIC support to Prysm and are running into this issue. It appears this is also related to #4139, but that still seems to be an RFC.
Currently, we ensure that all open streams are eventually reset so that they can be cleaned up appropriately. However, because libp2p's Stream API also has to support other multiplexers (yamux, mplex), resetting QUIC streams the same way unfortunately causes data loss: the remote peer cannot read the transmitted data once we initiate a reset.
We could fix this by adding a special sleep for QUIC streams so that data is reliably sent before we reset the stream, but we would prefer a cleaner, more graceful solution.
Originally posted by @nisdas in #3291 (comment)