
Replaying messages on a topic stream with a lot of messages is very slow if they are at the end of the stream #5072

Closed · david-wakeo opened this issue Feb 13, 2024 · 18 comments
Labels: defect (Suspected defect such as a bug or regression)

Comments

@david-wakeo commented Feb 13, 2024

Observed behavior

Given a stream with:

  • 3M messages with a total of 10 GiB
  • 200k subjects in the form domain.{type}.fixed_part.{id}.{name}
  • cardinality on {type} or {name} is low and should be assumed to be around 10
  • cardinality on {id} is ~40k

Retrieving messages on domain.*.fixed_part.{id}.> takes up to 30 seconds and causes memory usage on the server to spike to ~2.3 GiB.

Note that the delay is strongly correlated with where the messages are located in the stream, and it is worse when the matching messages are toward the end of the stream.
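
For illustration, here is a minimal sketch of what such a filtered replay might look like with the nats.ws client mentioned below. The server URL, stream contents, and concrete {id} value are placeholders, not taken from the report:

```ts
// Sketch only: replays all messages for one {id} with an ordered push consumer.
// "wss://nats.example.com" and the id value are made-up placeholders.
import { connect, consumerOpts } from "nats.ws";

async function replayForId(id: string) {
  const nc = await connect({ servers: "wss://nats.example.com" });
  const js = nc.jetstream();

  const opts = consumerOpts();
  opts.orderedConsumer();                              // ephemeral, in-order replay
  opts.filterSubject(`domain.*.fixed_part.${id}.>`);   // the filter from this report

  const sub = await js.subscribe(`domain.*.fixed_part.${id}.>`, opts);
  for await (const m of sub) {
    // process m.subject / m.data here; stop once the server reports
    // no more pending messages for this consumer
    if (m.info.pending === 0) break;
  }
  await nc.close();
}
```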

Expected behavior

If this use case is supported, then retrieving those messages should be fast (<1 s?) and memory usage should stay at a reasonable level.
If it isn't, the documentation could perhaps be improved to explain what performance to expect and/or which patterns to avoid or use.

Server and client version

Reproduced on nats-server:

  • nats 2.10.9
  • nats 2.10.10
  • nats nightly 20240121
  • nats nightly 20240118

with 2 different NATS clients:

  • nats CLI v0.1.1 (installed with Nix)
  • nats.ws@1.16.1

Works as expected on:
  • nats nightly 20240119

See this comment for details.

Host environment

No response

Steps to reproduce

No response

@david-wakeo added the defect (Suspected defect such as a bug or regression) label Feb 13, 2024
@david-wakeo (Author)

Hello,

Some additional context on this issue:

Thanks for the help

@derekcollison (Member)

Can you send me a snapshot of the stream in question?

@david-wakeo changed the title from "Replaying messages on a stream with a lot of messages is very slow" to "Replaying messages on a topic stream with a lot of messages is very slow if they are at the end of the stream" Feb 13, 2024
@david-wakeo (Author)

I have edited the title/observation because I forgot to mention that this mostly affects messages that are at the end of the stream.

Can you send me a snapshot of the stream in question?

I'll ask if I can provide the one I use as is. If not, I'll try to generate an equivalent one.

@david-wakeo (Author)

I sent you the information to download the backup by email.

@david-wakeo (Author)

I tested the nightly 20240214, which includes #5080, which addresses this bug.

It's significantly faster now and on par with what one would expect when replaying from the start.
I still noticed some delays when replaying with a custom start time, though (see the sketch after this list). In the example I gave you:

  • no date works fast
  • --since=45d works fast (but a bit slower)
  • --since=35d sometimes works
  • --since=40d never works
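
As mentioned above, here is a rough sketch of the time-bounded variant, again with placeholder values; the day count is just one of the cases from the list, and `js`/`id` are the same assumed placeholders as in the earlier sketch:

```ts
// Same idea as the earlier sketch, but starting from a point in time
// (roughly what the CLI's --since flag does).
import { consumerOpts, JetStreamClient } from "nats.ws";

async function replaySince(js: JetStreamClient, id: string, days: number) {
  const since = new Date(Date.now() - days * 24 * 60 * 60 * 1000);
  const opts = consumerOpts();
  opts.orderedConsumer();
  opts.filterSubject(`domain.*.fixed_part.${id}.>`);
  opts.startTime(since); // deliver only messages stored at or after `since`
  return js.subscribe(`domain.*.fixed_part.${id}.>`, opts);
}
```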

As I understand it, there is an ongoing design effort on a new filestore that should handle all of these use cases. With that in mind, the current performance is quite good for my use case, so I wouldn't mind closing this issue with the current solution.

@derekcollison (Member)

We discovered one more optimization with memory that will also land in 2.10.11.

#5083

@derekcollison (Member)

Should the --since 35d and 40d cases work?

@david-wakeo (Author)

We discovered one more optimization with memory that will also land in 2.10.11.

#5083

Wow, thanks!

Should the --since 35d and 40d cases work?

There is data to return for both queries, so yes, I would have expected them to work. But let me confirm after I test again on the version with #5083.

@derekcollison (Member)

OK, keep us posted. And this is the same stream we have, correct?

@david-wakeo (Author)

I confirm that the 35d and 40d cases should have worked, since both 45d (fast) and 30d (fast) did work and had data (on #5080).

On #5083, the behavior is still the same, but the memory used is indeed much lower and I end up at the same ~300 MiB you mentioned by email.

I also tried it on another {id} with a different pattern, where messages are present from the very start to the very end of the stream at regular intervals, and for the full read I noticed an improvement from 2.6-3.0 GiB on 2.10.9 (2.10.10 didn't even load the messages) to 1.6-2.0 GiB on the latest nightly. The speed is comparable.

So it looks like an overall great improvement.

@derekcollison (Member)

I will take a look at why those messages are not being retrieved. Thanks.

@david-wakeo (Author)

Thanks a lot, although to be fair it may be a peculiarity of the nats CLI client. With the nats.ws client, I could retrieve those messages on nats-server 2.10.10, although it was very slow.

@derekcollison (Member) commented Feb 14, 2024 via email

@derekcollison (Member) commented Feb 14, 2024 via email

@derekcollison (Member) commented Feb 15, 2024

#5088 fixes the slow startup by time, e.g. --since 30d. It will be in 2.10.11.

Thanks for the info, much appreciated!

@derekcollison (Member)

OK, merged to main and kicked off a nightly build.

@david-wakeo (Author)

Thanks again for the fix and the explanation.

I tested it and it works great indeed!

Unless you have more ideas on this topic, I think we can close this issue?

Essentially the time-to-sequence lookup is fine, it's fast enough (binary search), but we then blindly take that and do not consider that we will walk through a lot of stuff (in terms of blocks) to get to the real first one. So I will see if there is anything we can do there. We do not have perfect information, but we have some that might improve it.

As I understand it, you now jump to the first sequence of the consumer with a fast binary search, and indeed it works great. But what about the next messages in the consumer? Are those found by a linear walk starting from the first message, or is there also some form of indexing that allows skipping ahead?

I ask because in the example I gave you, the messages are all bundled together and are therefore fast to retrieve. But in another example where the messages are spread out, retrieval is slower and memory consumption is larger.

To be fair, unlike previous instances, the performance is totally acceptable for a replay use case, and I ask out of curiosity.
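
To make the quoted explanation concrete: the server's filestore is written in Go and its internals are not shown in this thread, so the following is purely an illustrative sketch of the "binary search to a start point, then walk blocks" idea, with a hypothetical per-block subject summary standing in for the "imperfect but useful information" used to skip ahead:

```ts
// Illustration only — not nats-server code. Each block knows the timestamp of
// its first message; the `subjects` summary is a hypothetical field that would
// allow skipping blocks that cannot match the consumer's filter.
interface Block {
  firstTime: number;       // unix millis of the first message in the block
  subjects: Set<string>;   // assumed per-block subject summary
}

// Binary search: last block whose first message is at or before `since`.
// This is the cheap "time to sequence" step mentioned in the quoted reply.
function startBlock(blocks: Block[], since: number): number {
  let lo = 0, hi = blocks.length - 1, ans = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (blocks[mid].firstTime <= since) { ans = mid; lo = mid + 1; }
    else hi = mid - 1;
  }
  return ans;
}

// The expensive part: walking forward block by block from the start point.
// Skipping blocks whose subject summary cannot match the filter is the kind
// of optimization the quoted reply alludes to.
function* blocksToScan(blocks: Block[], since: number, matches: (s: string) => boolean) {
  for (let i = startBlock(blocks, since); i < blocks.length; i++) {
    if ([...blocks[i].subjects].some(matches)) yield blocks[i];
  }
}
```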

@derekcollison (Member)

We improved that as well with #5089.

Thanks for the information and report. Will close for now, but feel free to reopen if needed.
