
Replaying messages on a topic stream with a lot of messages is very slow if they are at the end of the stream #5072

Closed · david-wakeo opened this issue Feb 13, 2024 · 18 comments
Labels: defect (Suspected defect such as a bug or regression)

Comments

@david-wakeo commented Feb 13, 2024

Observed behavior

Given a stream with:

  • 3M messages with a total of 10 GiB
  • 200k subjects in the form domain.{type}.fixed_part.{id}.{name}
  • cardinality on {type} or {name} is low and should be assumed to be around 10
  • cardinality on {id} is ~40k

Retrieving messages on domain.*.fixed_part.{id}.> takes up to 30 seconds and causes memory usage on the server to spike to ~2.3 GiB.

Note that the delay is strongly correlated with where the messages are located in the stream, and it is worse when the matching messages are toward the end of the stream.
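
For illustration, here is a minimal sketch of what such a filtered replay might look like with the nats.ws client mentioned below. The server URL, stream contents, and concrete {id} value are placeholders, not taken from the report:

```ts
// Sketch only: replays all messages for one {id} with an ordered push consumer.
// "wss://nats.example.com" and the id value are made-up placeholders.
import { connect, consumerOpts } from "nats.ws";

async function replayForId(id: string) {
  const nc = await connect({ servers: "wss://nats.example.com" });
  const js = nc.jetstream();

  const opts = consumerOpts();
  opts.orderedConsumer();                              // ephemeral, in-order replay
  opts.filterSubject(`domain.*.fixed_part.${id}.>`);   // the filter from this report

  const sub = await js.subscribe(`domain.*.fixed_part.${id}.>`, opts);
  for await (const m of sub) {
    // process m.subject / m.data here; stop once the server reports
    // no more pending messages for this consumer
    if (m.info.pending === 0) break;
  }
  await nc.close();
}
```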

Expected behavior

If this use case is supported, then retrieving those messages should be fast (<1 s?) and memory usage should stay at a reasonable level.
If it isn't, the documentation could perhaps be improved to explain what performance to expect and/or which patterns to avoid or use.

Server and client version

Reproduced on nats-server:

  • nats 2.10.9
  • nats 2.10.10
  • nats nightly 20240121
  • nats nightly 20240118

with 2 different NATS clients:

  • nats CLI v0.1.1 (installed with Nix)
  • nats.ws@1.16.1

Works as expected on:
  • nats nightly 20240119

See this comment for details.

Host environment

No response

Steps to reproduce

No response

@david-wakeo added the defect (Suspected defect such as a bug or regression) label Feb 13, 2024
@david-wakeo (Author)

Hello,

Some additional context on this issue:

Thanks for the help

@derekcollison (Member)

Can you send me a snapshot of the stream in question?

@david-wakeo changed the title from "Replaying messages on a stream with a lot of messages is very slow" to "Replaying messages on a topic stream with a lot of messages is very slow if they are at the end of the stream" Feb 13, 2024
@david-wakeo (Author)

I have edited the title/observation because I forgot to mention that this mostly affects messages that are at the end of the stream.

Can you send me a snapshot of the stream in question?

I'll ask if I can provide the one I use as is. If not, I'll try to generate an equivalent one.

@david-wakeo (Author)

I sent you the information to download the backup by email.

@david-wakeo (Author)

I tested the nightly 20240214, which includes #5080, which addresses this bug.

It's significantly faster now and on par with what one would expect when replaying from the start.
I still noticed some delays when replaying with a custom start time, though (see the sketch after this list). In the example I gave you:

  • no date works fast
  • --since=45d works fast (but a bit slower)
  • --since=35d sometimes works
  • --since=40d never works
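
As mentioned above, here is a rough sketch of the time-bounded variant, again with placeholder values; the day count is just one of the cases from the list, and `js`/`id` are the same assumed placeholders as in the earlier sketch:

```ts
// Same idea as the earlier sketch, but starting from a point in time
// (roughly what the CLI's --since flag does).
import { consumerOpts, JetStreamClient } from "nats.ws";

async function replaySince(js: JetStreamClient, id: string, days: number) {
  const since = new Date(Date.now() - days * 24 * 60 * 60 * 1000);
  const opts = consumerOpts();
  opts.orderedConsumer();
  opts.filterSubject(`domain.*.fixed_part.${id}.>`);
  opts.startTime(since); // deliver only messages stored at or after `since`
  return js.subscribe(`domain.*.fixed_part.${id}.>`, opts);
}
```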

As I understand it, there is an ongoing design effort on a new filestore that should handle all of these use cases. With that in mind, the current performance is quite good for my use case, so I wouldn't mind closing this issue with the current solution.

@derekcollison (Member)

We discovered one more optimization with memory that will also land in 2.10.11.

#5083

@derekcollison (Member)

Should the --since 35d and 40d cases work?

@david-wakeo (Author)

We discovered one more optimization with memory that will also land in 2.10.11.

#5083

Wow, thanks!

Should the --since 35d and 40d cases work?

There is data to return for both queries, so yes, I would have expected them to work. But let me confirm after I test again on the version with #5083.

@derekcollison (Member)

OK, keep us posted. And this is the same stream we have, correct?

@david-wakeo (Author)

I confirm that the 35d and 40d cases should have worked, since both 45d (fast) and 30d (fast) did work and had data (on #5080).

On #5083, the behavior is still the same, but the memory used is indeed much lower and I end up at the same ~300 MiB you mentioned by email.

I also tried it on another {id} with a different pattern, where messages are present from the very start to the very end of the stream at regular intervals, and for the full read I noticed an improvement from 2.6-3.0 GiB on 2.10.9 (2.10.10 didn't even load the messages) to 1.6-2.0 GiB on the latest nightly. The speed is comparable.

So it looks like an overall great improvement.

@derekcollison (Member)

I will take a look at why those messages are not being retrieved. Thanks.

@david-wakeo (Author)

Thanks a lot, although to be fair it may be a peculiarity of the nats CLI client. With the nats.ws client, I could retrieve those messages on nats-server 2.10.10, although it was very slow.

@derekcollison (Member) commented Feb 14, 2024 via email

@derekcollison (Member) commented Feb 14, 2024 via email

@derekcollison (Member) commented Feb 15, 2024

#5088 fixes the slow startup by time, e.g. --since 30d. It will be in 2.10.11.

Thanks for the info, much appreciated!

@derekcollison (Member)

OK, merged to main and kicked off a nightly build.

@david-wakeo (Author)

Thanks again for the fix and the explanation.

I tested it and it works great indeed!

Unless you have more ideas on this topic, I think we can close this issue?

Essentially the time-to-sequence lookup is fine, it's fast enough (binary search), but we then blindly take that and do not consider that we will walk through a lot of stuff (in terms of blocks) to get to the real first one. So I will see if there is anything we can do there. We do not have perfect information, but we have some that might improve it.

As I understand it, you now jump to the first sequence of the consumer with a fast binary search, and indeed it works great. But what about the next messages in the consumer? Are those found by a linear walk starting from the first message, or is there also some form of indexing that allows skipping ahead?

I ask because in the example I gave you, the messages are all bundled together and are therefore fast to retrieve. But in another example where the messages are spread out, retrieval is slower and memory consumption is larger.

To be fair, unlike previous instances, the performance is totally acceptable for a replay use case, and I ask out of curiosity.
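
To make the quoted explanation concrete: the server's filestore is written in Go and its internals are not shown in this thread, so the following is purely an illustrative sketch of the "binary search to a start point, then walk blocks" idea, with a hypothetical per-block subject summary standing in for the "imperfect but useful information" used to skip ahead:

```ts
// Illustration only — not nats-server code. Each block knows the timestamp of
// its first message; the `subjects` summary is a hypothetical field that would
// allow skipping blocks that cannot match the consumer's filter.
interface Block {
  firstTime: number;       // unix millis of the first message in the block
  subjects: Set<string>;   // assumed per-block subject summary
}

// Binary search: last block whose first message is at or before `since`.
// This is the cheap "time to sequence" step mentioned in the quoted reply.
function startBlock(blocks: Block[], since: number): number {
  let lo = 0, hi = blocks.length - 1, ans = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (blocks[mid].firstTime <= since) { ans = mid; lo = mid + 1; }
    else hi = mid - 1;
  }
  return ans;
}

// The expensive part: walking forward block by block from the start point.
// Skipping blocks whose subject summary cannot match the filter is the kind
// of optimization the quoted reply alludes to.
function* blocksToScan(blocks: Block[], since: number, matches: (s: string) => boolean) {
  for (let i = startBlock(blocks, since); i < blocks.length; i++) {
    if ([...blocks[i].subjects].some(matches)) yield blocks[i];
  }
}
```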

@derekcollison (Member)

We improved that as well with #5089.

Thanks for the information and report. Will close for now, but feel free to reopen if needed.
