
change incoming_message to bytearray to improve += performance #15

Conversation

Mitch-Martinez

Using += on bytes is inefficient because bytes is immutable, so every append copies the entire buffer. Making incoming_message a bytearray lets += extend the buffer in place when incoming messages are large.

It is not the usual use case for websockets, but when a client attempts to send a 50MB file, the += operation on a bytes object becomes extremely expensive. Changing incoming_message to a bytearray greatly improved the performance of reading large messages, mainly those greater than 10MB.
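The quadratic-copy behavior described above can be sketched with a small benchmark (illustrative only, not code from this PR; chunk size and iteration count are arbitrary):

```python
import timeit

# Sketch: += on an immutable bytes buffer typically allocates and copies
# the whole accumulated buffer on each append, while += on a bytearray
# extends in place with amortized cost.
CHUNK = b"x" * 1024
N = 500

def append_bytes():
    buf = b""
    for _ in range(N):
        buf += CHUNK          # copies the entire buffer each iteration
    return buf

def append_bytearray():
    buf = bytearray()
    for _ in range(N):
        buf += CHUNK          # in-place extend, amortized O(1) per byte
    return bytes(buf)

print("bytes:    ", timeit.timeit(append_bytes, number=3))
print("bytearray:", timeit.timeit(append_bytearray, number=3))
```

The gap widens as the message grows, which matches the 10MB+ slowdown reported above.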

@miguelgrinberg
Owner

The code that you edited handles both text and binary messages, so it cannot simply be changed to bytearray, as that would break text messages.
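A quick way to see the breakage: text frame payloads arrive as str, and str cannot be appended to a bytearray (a minimal illustration, not the library's code):

```python
# Minimal illustration: a shared bytearray buffer cannot accumulate
# text payloads, because appending str to a bytearray raises TypeError.
buf = bytearray()
try:
    buf += "hello"            # text frame payloads are str
except TypeError as exc:
    print("text append fails:", exc)
buf += b"hello"               # binary payloads extend fine
print(bytes(buf))
```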

@Mitch-Martinez
Author

Mitch-Martinez commented Jun 24, 2022 via email

@Mitch-Martinez
Author

Made an update to differentiate between text and binary data.

I opted to split checking for ByteMessage and TextMessage to avoid always having to call isinstance() twice.

Would returning bytearray vs bytes be an issue for anything using this module? I would assume anybody calling receive would appreciate that the returned data is now mutable, but I could definitely be missing something.
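The split described above might look something like this sketch (class and function names are illustrative, not the PR's actual code; the point is that the hot path makes a single isinstance() call per frame):

```python
# Hypothetical sketch of splitting the ByteMessage/TextMessage check so
# each frame triggers one isinstance() call instead of two.
class TextMessage:
    def __init__(self, data):
        self.data = data          # str payload

class ByteMessage:
    def __init__(self, data):
        self.data = data          # bytes payload

def extend_message(incoming_message, frame):
    """Append one frame's payload to the message accumulated so far."""
    if isinstance(frame, ByteMessage):
        if incoming_message is None:
            return bytearray(frame.data)   # mutable buffer for binary
        incoming_message += frame.data     # in-place extend
        return incoming_message
    # text path: plain str concatenation, unchanged
    if incoming_message is None:
        return frame.data
    return incoming_message + frame.data
```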

@Mitch-Martinez
Author

Update to fix flake8 errors. I also separated test_recieve_large in the server tests, so coverage should now include the entire diff. I appreciate your patience with all this.

@codecov-commenter

Codecov Report

Merging #15 (21045e3) into main (9e024fd) will increase coverage by 0.17%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #15      +/-   ##
==========================================
+ Coverage   97.66%   97.83%   +0.17%     
==========================================
  Files           2        2              
  Lines         171      185      +14     
  Branches       35       39       +4     
==========================================
+ Hits          167      181      +14     
  Misses          3        3              
  Partials        1        1              
Impacted Files              Coverage Δ
src/simple_websocket/ws.py  97.82% <100.00%> (+0.17%) ⬆️


@miguelgrinberg
Owner

Would returning bytearray vs bytes be an issue for anything using this module?

I'm not sure. I need to find out if bytearray(some_bytes) and bytes(some_bytearray) are cheap operations, because in that case the array can be converted to bytes. If no data is copied in these conversions, then I'd rather have the payload be delivered as bytes to keep things compatible with the current version.

@Mitch-Martinez
Author

Data is copied in these conversions. Doing the following shows that the result has a different id, which should mean it lives at a different place in memory.

>>> a = b'somebytedata'
>>> b = bytearray(a)
>>> id(a)
140171735861408
>>> id(b)
140171735743728

So, knowing that a copy takes place, I tested with timeit.
Running timeit on the bytearray(some_bytes) operation takes a few hundred nanoseconds. I used 4096 bytes because that should be the default maximum message size.

>>> timeit.timeit("b = bytearray(a)", setup="a=b'0'*4096", number=100000)/100000
3.420370000003459e-07

Would this be expensive for clients/servers that are sending a ton of small packets at a given time?

I've been focusing on doing large transfers of data and not paying too much attention to the efficiency of small packets.

Just out of curiosity I did a similar test with 1MB of data and saw that it takes about 1/10th of a millisecond. Doing this type of conversion only once is a very small price compared to the many copy operations being done currently, though that only holds for large messages at the moment.

>>> timeit.timeit("b = bytearray(a)", setup="a=b'0'*1048576", number=100000)/100000
8.481695999999829e-05

So, without any further testing, would it make sense to generalize that this change adds about 350 nanoseconds per message, but starts to save time as messages get much larger? (The difference becomes visually noticeable on the command line once messages reach the MB range.) Once messages exceed 10MB, the receiver currently slows down significantly, when the transfer should only take a few tens to hundreds of milliseconds depending on network speed.

I'll probably take some time to test out the time differences with and without the changes to be a little more thorough. It might make sense to only do these conversions if the incoming message has to be reassembled.

@miguelgrinberg
Owner

miguelgrinberg commented Jul 3, 2022

Thanks, this was a useful discussion. I ended up implementing a slightly different solution that switches both text and byte messages to bytearray for faster appending, but only on arrival of the second part. So single part messages, which are the most common, do not suffer any performance penalties. Multipart messages are then converted back to str or bytes once the final part is received.
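The strategy described above can be sketched roughly as follows (assumed names, not the actual simple-websocket code): single-part messages keep their native type, the buffer switches to bytearray only when a second part arrives, and the final result is converted back to str or bytes:

```python
# Sketch of the "convert on second part" strategy: single-part messages
# pay no conversion cost; multipart messages use one bytearray buffer.
def assemble(parts):
    """parts: the str or bytes fragments of one message, in order."""
    incoming = None
    is_text = None
    for part in parts:
        if incoming is None:
            incoming = part                 # single-part fast path: no copy
            is_text = isinstance(part, str)
        else:
            if not isinstance(incoming, bytearray):
                # second part arrived: switch to a mutable buffer once
                incoming = bytearray(
                    incoming.encode() if is_text else incoming)
            incoming += part.encode() if is_text else part
    if isinstance(incoming, bytearray):
        # final part received: convert back to the native type
        incoming = incoming.decode() if is_text else bytes(incoming)
    return incoming
```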

Could you please give the main branch a try, and let me know if this is satisfactory for your use case?

@Mitch-Martinez Mitch-Martinez deleted the increase-receive-speed branch July 12, 2022 17:05
@Mitch-Martinez
Author

Sorry for the late reply, just got back from a short vacation.

These changes do work for me when using simple-websockets. How can these changes be reflected in the flask-sock package? This was how I was originally turned onto simple-websockets.

@miguelgrinberg
Owner

@Mitch-Martinez This fix is now released. Upgrade simple-websocket and you'll have access to it from Flask-Sock.
