Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_message() sometimes missing close reason #158

Closed
pikeas opened this issue May 21, 2021 · 17 comments · Fixed by #159
Closed

get_message() sometimes missing close reason #158

pikeas opened this issue May 21, 2021 · 17 comments · Fixed by #159
Labels
bug Something isn't working

Comments

@pikeas
Copy link

pikeas commented May 21, 2021

while True:
    try:
        raw: str = await ws.get_message()
    except trio_websocket.ConnectionClosed:
        logger.info(f"Websocket closed: {ws.closed}")
        await trio.sleep(1)
        logger.info(f"Websocket closed: {ws.closed}")
[14:55:33] INFO:app:Websocket closed: None                                                                                                                     
[14:55:34] INFO:app:Websocket closed: CloseReason<code=1000, name=NORMAL_CLOSURE, reason=None> 

There appears to be some sort of race condition between closing the memory channel and setting _close_reason.

@belm0
Copy link
Member

belm0 commented May 22, 2021

Thank you for reporting. Are you doing anything special on the other end to make this happen?

@belm0
Copy link
Member

belm0 commented May 22, 2021

It's concerning, since the implementation is often using _close_reason to determine whether the connection is closed.

if self._close_reason:
    raise ConnectionClosed(self._close_reason)

@pikeas
Copy link
Author

pikeas commented May 22, 2021

Thank you for reporting. Are you doing anything special on the other end to make this happen?

Nope, the other side is a 3rd party API. I have a) a receive task that consumes messages from the websocket and pushes them to my own MemoryReceiveChannel, b) a send task that pushes out messages with await ws.send_message(ujson.dumps({...})), and c) a heartbeat task that ping/sleeps.

Here's my receive function in full:

async def receive(self):
    async with self.channel_tx:
        # TODO: consider async for raw in self.ws._recv_channel ?
        while True:
            try:
                raw: str = await self.ws.get_message()
            except trio_websocket.ConnectionClosed:
                await trio.sleep(1) # https://github.com/HyperionGray/trio-websocket/issues/158
                logger.info(f"Websocket closed: {self.ws.closed}")
                break
            message = parse_message(raw)
            if isinstance(message, dict):
                logger.info(f"Skipping message: {message}")
            else:
                await self.channel_tx.send(message)

@belm0
Copy link
Member

belm0 commented May 22, 2021

Perhaps it's happening in this path, where there is a context switch (sleep(0)) as well as blocking send in the REMOTE_CLOSING case, that delay setting the close reason via _close_web_socket().

https://github.com/HyperionGray/trio-websocket/blob/15c64e8193270208f1f5b0eb475efe3abbb99f32/trio_websocket/_impl.py#L1085-L1095

I may not have time to address it for a while-- help appreciated (debug, write a test, verify autobahn test suite, etc.).

@belm0
Copy link
Member

belm0 commented May 22, 2021

I think you can confirm the problem source by instrumenting trio-websocket as follows, and run your sleep(1) example.

async def _handle_close_connection_event(self, event):
    self._for_testing_peer_closed_connection.set()
    print('_handle_close_connection_event() sleep')
    await trio.sleep(0)
    if self._wsproto.state == ConnectionState.REMOTE_CLOSING:
        print('_handle_close_connection_event() send response')
        await self._send(event.response())
    print('_handle_close_connection_event() set reason')
    await self._close_web_socket(event.code, event.reason or None)

@belm0
Copy link
Member

belm0 commented May 22, 2021

the example client has this exception after executing the close command

cmd> close
DEBUG:trio-websocket:client-0 sending 8 bytes
DEBUG:trio-websocket:client-0 websocket closed ConnectionClosed<CloseReason<code=1006, name=ABNORMAL_CLOSURE, reason=None>>
DEBUG:trio-websocket:client-0 reader task finished
Traceback (most recent call last):
  File "client.py", line 72, in handle_connection
    nursery.start_soon(get_messages, ws)
  File "/.../site-packages/trio/_core/_run.py", line 815, in __aexit__
    raise combined_error_from_nursery
  File "client.py", line 130, in get_messages
    message = await ws.get_message()
  File "/.../trio_websocket/_impl.py", line 873, in get_message
    raise ConnectionClosed(self._close_reason) from None
trio_websocket._impl.ConnectionClosed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "client.py", line 136, in <module>
    if not trio.run(main, parse_args()):
  File "/.../site-packages/trio/_core/_run.py", line 1928, in run
    raise runner.main_task_outcome.error
  File "client.py", line 58, in main
    await handle_connection(conn, args.heartbeat)
  File "client.py", line 74, in handle_connection
    reason = '<no reason>' if cc.reason.reason is None else '"{}"'.format(
AttributeError: 'NoneType' object has no attribute 'reason'

@belm0
Copy link
Member

belm0 commented May 22, 2021

The problem seems to have been introduced way back in #79 ("Delay connection closed")

@pikeas
Copy link
Author

pikeas commented May 22, 2021

Instrumenting my client:

[22:40:43] INFO:app.ws:Websocket closed: None
[22:40:44] INFO:temp:_close
           INFO:temp:_close after sleep
           INFO:temp:_close after send
           INFO:temp:_close after _close_web_socket
           INFO:app.ws:Websocket closed: CloseReason<code=1000, name=NORMAL_CLOSURE, reason=None>

@belm0
Copy link
Member

belm0 commented May 22, 2021

So the problem is happening prior to the close connection handler-- that's helpful, thank you.

@belm0
Copy link
Member

belm0 commented May 23, 2021

Pending PR #159 fixes the case in the example client close command, but probably not the original report here.

@pikeas in your example would you print the ConnectionClosed exception? Because the exception itself also holds reason, and I'm wondering if that is populated even though the websocket closed property isn't.

@belm0
Copy link
Member

belm0 commented May 23, 2021

Also, I may have missed it, but would you clarify which side is closing the connection in your case?

@pikeas
Copy link
Author

pikeas commented May 23, 2021

@pikeas in your example would you print the ConnectionClosed exception? Because the exception itself also holds reason, and I'm wondering if that is populated even though the websocket closed property isn't.

Reason is None on the exception.

Also, I may have missed it, but would you clarify which side is closing the connection in your case?

I'm calling await ws.aclose().

Hope this helps, I'll try the PR when it lands!

@pikeas
Copy link
Author

pikeas commented May 23, 2021

Also, ConnectionClosed is instantiated as ConnectionClosed(self._close_reason), so when _close_reason is None, printing the exception (rather than the repr) displays a blank string.

@belm0
Copy link
Member

belm0 commented May 24, 2021

I'm calling await ws.aclose().

Hope this helps, I'll try the PR when it lands!

oh, if you're calling aclose() on the side having the missing reason, then the pending PR may fix it

perhaps you could confirm the fix by installing it from source:

git clone -b close_reason_race https://github.com/HyperionGray/trio-websocket.git
cd trio-websocket/
pip install -e .

@pikeas
Copy link
Author

pikeas commented May 27, 2021

Confirmed, with the new branch I'm seeing CloseReason<code=1000, name=NORMAL_CLOSURE, reason=None> when catching trio_websocket.ConnectionClosed.

@belm0
Copy link
Member

belm0 commented May 27, 2021

Thank you for confirming. Please give me some time to write tests, merge, and release.

I'm still concerned about a possible regression, since internally the code often uses _close_reason to signal being in connection closed state.

@belm0 belm0 added the bug Something isn't working label Mar 18, 2023
@belm0
Copy link
Member

belm0 commented Mar 19, 2023

the scope of the bug is wider than just the timing of setting the closed attribute

This will cause send_message() during the close handshake to silently be ignored (wsproto < 0.2.0) or leak LocalProtocolError (wsproto >= 0.2.0). It should raise ConnectionClosed.

belm0 added a commit that referenced this issue Mar 19, 2023

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
During a local-initiated close handshake, the following incorrect
behavior was observed:
  * `closed` attribute would be `None`
  * `send_message()` would be silently ignored (wsproto < 0.2.0) or leak a
`LocalProtocolException` (wsproto >= 0.2.0)

Upon local-initiated close, `closed` will now have the reason, and
`send_message()` will raise `ConnectionClosed`.

Fixes #158.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants