Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client-side streaming early OK fails with StatusCode.CANCELLED from python client #992

Closed
ebraraktas opened this issue May 2, 2022 · 10 comments

Comments

@ebraraktas
Copy link

Bug Report

Version

└── tonic v0.7.1
└── tonic-build v0.7.1

Platform

Darwin Mustafas-MacBook-Pro.local 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 x86_64

Description

I implemented simple client-side streaming endpoint using tonic-rs. It may return early if some condition occurs:

async fn record_route(
    &self,
    request: Request<Streaming<Point>>,
) -> Result<Response<RouteSummary>, Status> {
    let mut stream = request.into_inner();

    let mut summary = RouteSummary::default();

    while let Some(point) = stream.next().await {
        let point = point?;
        summary.point_count += 1;

        if point.latitude == 0 {
            println!("Return early: {}", summary.point_count);
            return Ok(Response::new(summary));
        }
    }
    println!("Return normal: {}", summary.point_count);
    Ok(Response::new(summary))
}

However, when I call this server from a simple python client it fails with the exception (StatusCode.CANCELLED) below:

# python client.py 5000
port = 5000
early_return_index = 2
Sleep 500 ms
Sleep 500 ms
Sleep 500 ms
Traceback (most recent call last):
  File "client.py", line 46, in <module>
    main()
  File "client.py", line 42, in main
    call_stream(client)
  File "client.py", line 34, in call_stream
    print(client.RecordRoute(iter_points()))
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 1131, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.CANCELLED
        details = "Received RST_STREAM with error code 8"
        debug_error_string = "{"created":"@1651470233.217719000","description":"Error received from peer ipv6:[::1]:5000","file":"src/core/lib/surface/call.cc","file_line":1075,"grpc_message":"Received RST_STREAM with error code 8","grpc_status":1}"
>

GitHub repo to reproduce issue: https://github.com/ebraraktas/tonic-client-stream-early-ok

As the Issue section of README of the linked repo states, it runs ok with rust client. I know this may be related with grpc Python, but same server implemented in Python works smoothly if it is called from same Python client.

@LucioFranco
Copy link
Member

Yeah interesting, I believe the cancelled comes from the python code since the h2 connection is getting RST. Curious if you could reproduce this with a rust client?

@ebraraktas
Copy link
Author

Hi @LucioFranco , as I stated in my first message I can call rust server from rust client without an issue. I am not familiar with the internals of the tonic, but can you explain why you think that the cancelled comes from python?

In addition, I implemented same server in go and I could manage to call go server from the same python client without an issue. Interestingly, go client can call rust server with no issue as well. Here is a matrix describing the issue:

client\server rust go python
rust ⛔*
go
python

*: My rust client gives error below when I try to connect to the go server:

Error: tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 61, kind: ConnectionRefused, message: "Connection refused" })))

@LucioFranco
Copy link
Member

So the cancelled is likely returned from the client when it encounters a GoAway/connection drop/request ended before getting the full trailers etc. The error is not passed from server -> client. As for why go works and python does not....that one I don't know I would maybe open an issue in the python implementation?

As for your last error there that is just a regular tcp error coming from the tcp connection in the underlying OS, so maybe you're not connecting on the right port etc.

@ebraraktas
Copy link
Author

@LucioFranco I opened an issue in the official gRPC repository and they responded that they think the issue arises from gRPC implementation of the tonic-rs. The linked message has the details that I am not very familiar with, but I hope they may be helpful.

@ebraraktas
Copy link
Author

@LucioFranco are there any update on this issue?

@LucioFranco
Copy link
Member

Hi @ebraraktas I've started to look but have had a few other high priority items come up. I am about to head out on vacation so not sure how far I will get on this until I get back. This sounds like it will need some low level investigation to understand where the gap is.

@LucioFranco
Copy link
Member

Small update: I dug pretty deep into the issue on Friday afternoon (weekend hit before I had time to reply here). From what I can tell the behavior from the Rust side is interesting in that we send a RST_STREAM.Cancel even though we reply with a regular Ok status. My theory has to do with the fact that we drop the RecvStream before the sender side has half-closed it. Will keep looking into this and return with a better answer.

@LucioFranco
Copy link
Member

Once we get the next release out of h2 hyperium/h2#633 this will fix this issue I believe.

@erebe
Copy link

erebe commented Aug 16, 2022

I tried the repro repo with the patch, and indeed it should fix the issue.

/tmp/tonic-client-stream-early-ok/proto main 16s
venv ❯ python client.py
port = 5000
early_return_index = 3
Sleep 500 ms
Sleep 500 ms
Sleep 500 ms
Sleep 500 ms
point_count: 4
/tmp/tonic-client-stream-early-ok main 1m 25s
❯ cargo run --bin routeguide-server
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/routeguide-server`
Return early: 4

At least no traceback

@ebraraktas
Copy link
Author

I confirm that it is fixed. Updated & archived the repro repo. Thanks for your efforts @erebe and @LucioFranco .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants