
pubsub: dynamically choose the number of messages for ReceiveBatch #1200

Merged
merged 6 commits into google:master on Jan 31, 2019

Conversation

@jba (Contributor) commented Jan 27, 2019

To decide how many messages to pull at a time, we aim for the
in-memory queue of messages to be a certain size. That gives us a
buffer of messages to draw from, ensuring high throughput, without
pulling so many messages that the unconsumed ones languish.

We measure the size by time instead of message count. Time is more
relevant, because ack deadlines are expressed in time, and it's easier
to think about lost work (in the event of a crash) in terms of time
lost rather than messages lost.

We keep track of the average time it takes to process a message. Then
we can convert a queue size in time to a number of messages.

We compute processing time by measuring the time between when Receive
returns and when it is next called. Although this is incorrect in the
short term, because multiple goroutines may call Receive at the same time,
in the long run it is accurate enough.

We rejected the obvious alternative, measuring time from Receive to
Ack, because not every message will be acked. It is perfectly
reasonable for a subscriber to nack (or fail to ack) a significant
fraction of the messages it receives, but processing time for those
unacked messages should still be included in the calculation of how
many messages to pull.

This change significantly improves the Receive benchmark -- messages
per second is more than quadrupled. But there is more work to do. We
should pre-emptively pull messages when the queue size gets low, and
we should issue multiple ReceiveBatch calls concurrently.

Besides performance, this change also improves behavior over current
master at very low processing rates. Currently we pull a constant 10
messages per ReceiveBatch. If it takes a long time to process one
message, then the other 9 will sit in RAM and may expire. With this
change, we will pull just one message at a time if need be.

Addresses #691.
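
A minimal sketch of the size calculation described above, assuming the 2-second target queue duration and the 1000-message cap mentioned later in the review; the function and constant names are illustrative, not the PR's exact code.

```go
package pubsub

import "time"

const (
	// How much "runway" of already-fetched messages to keep in memory,
	// expressed as time rather than a message count.
	desiredQueueDuration = 2 * time.Second
	// Upper bound on a single ReceiveBatch call (the cap discussed below).
	maxBatchSize = 1000
)

// batchSize converts the desired queue duration into a number of messages,
// using the recent average time to process one message.
func batchSize(avgProcessTimePerMessage time.Duration) int {
	if avgProcessTimePerMessage <= 0 {
		// No history yet: start small; the discussion below notes the
		// implementation starts at a constant of 1.
		return 1
	}
	n := int(desiredQueueDuration / avgProcessTimePerMessage)
	if n < 1 {
		n = 1 // very slow consumers pull one message at a time
	}
	if n > maxBatchSize {
		n = maxBatchSize
	}
	return n
}
```

For example, a 20ms average processing time yields a batch of 100 messages, while a 3s average yields a batch of 1.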

@jba requested review from ijt and vangent, January 27, 2019 12:01
@googlebot added the "cla: yes" (Google CLA has been signed!) label, Jan 27, 2019
pubsub/averager.go (outdated)
"time"
)

// An averager keeps track of an average value over a time interval.
Contributor Author

Well, I didn't know about it. (How did you find it?) It looks much more sophisticated than what I have, which is nice. But it says

Current implementations assume an implicit time interval of 1.0 between every sample added. That is, the passage of time is treated as though it's the same as the arrival of samples. If you need time-based decay when samples are not arriving precisely at set intervals, then this package will not support your needs at present.

which means we can't use it.

But you raise the question whether time-based decay is what we want. I thought about implementing it, but decided the complexity wasn't worth it. I'm not sure it's going to matter much whether the contribution of 1-minute-old points is reduced or not. It's more important that their contribution eventually goes to zero, which my implementation achieves rather abruptly.
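
A minimal sketch of a time-bucketed averager along the lines described above. Only the newAverager(dur time.Duration, nBuckets int) and average() signatures appear later in the diff; the add and rotate methods, the bucket layout, and the NaN behavior for an empty window are guesses for illustration, not the PR's actual code.

```go
package pubsub

import (
	"math"
	"time"
)

// bucket accumulates the values added during one slice of the interval.
type bucket struct {
	sum float64
	n   int
}

// averager keeps a moving average over a fixed time interval by splitting
// the interval into buckets and discarding buckets older than the interval.
type averager struct {
	bucketDur time.Duration // duration covered by each bucket
	buckets   []bucket      // circular buffer of buckets spanning the interval
	cur       int           // index of the bucket currently being filled
	curStart  time.Time     // start time of the current bucket
}

// newAverager returns an averager covering the last dur, split into nBuckets.
func newAverager(dur time.Duration, nBuckets int) *averager {
	return &averager{
		bucketDur: dur / time.Duration(nBuckets),
		buckets:   make([]bucket, nBuckets),
		curStart:  time.Now(),
	}
}

// add records a value observed now.
func (a *averager) add(x float64) {
	a.rotate(time.Now())
	a.buckets[a.cur].sum += x
	a.buckets[a.cur].n++
}

// average returns the mean of the values added within the interval,
// or NaN if there were none.
func (a *averager) average() float64 {
	a.rotate(time.Now())
	var sum float64
	var n int
	for _, b := range a.buckets {
		sum += b.sum
		n += b.n
	}
	if n == 0 {
		return math.NaN()
	}
	return sum / float64(n)
}

// rotate advances past buckets whose time has expired, clearing them so
// old values stop contributing (abruptly, as described in the comment above).
func (a *averager) rotate(now time.Time) {
	if now.Sub(a.curStart) >= a.bucketDur*time.Duration(len(a.buckets)) {
		// Everything recorded is older than the interval: start fresh.
		for i := range a.buckets {
			a.buckets[i] = bucket{}
		}
		a.cur = 0
		a.curStart = now
		return
	}
	for now.Sub(a.curStart) >= a.bucketDur {
		a.cur = (a.cur + 1) % len(a.buckets)
		a.buckets[a.cur] = bucket{}
		a.curStart = a.curStart.Add(a.bucketDur)
	}
}
```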

Contributor

which means we can't use it.

Is that true? I am convinced by your argument about using the predicted time-to-process of the queue to determine the batch size, but I'm not sure that that implies that we need to use time-based bucketing for the estimate of how long each message takes to process.

I'm not sure that that library will do better than what you have here, but it's worth thinking about. Are there scenarios where exponential decay will do the wrong thing? (probably).

What about an app that gets bursts of messages periodically (e.g., every 15m it gets 1000 messages)? IIUC, your impl will start at a constant (currently 1) for each burst and ramp up (because after 1m it forgets all about previous history). The decaying moving average wouldn't forget anything during the idle 15m. Which is better? (not obvious, TBH).

Contributor Author

I see your point. It seemed obvious to me that we wanted the number to decay in proportion to elapsed time, but you're right, there's no strong reason for that (aside from a hand-wavy argument about things generally having some time locality).

ewma will forget in proportion to the number of messages processed, rather than time. Maybe that's fine. If it took you 100ms to process a message three days ago, why wouldn't it take the same time now? On the other hand, system behavior tends to be spiky. You want to ride the spikes when they happen, then quickly forget about them. But I'm really just speculating here.

I'm fine doing any of the following:

  1. Keeping my code.
  2. Switching to ewma
  3. Doing a better job of time-based moving average. I think doing it exactly requires saving every point, but we could combine bucketing with a decay factor.

In any case, we should add this discussion to the issue, or a new issue.

Contributor

Let's discuss at stand-up tomorrow.

pubsub/averager_test.go (outdated)
pubsub/pubsub.go
pubsub/pubsub.go (outdated)
pubsub/pubsub.go (outdated)
s.waitc = make(chan struct{})
s.mu.Unlock()
// Even though the mutex is unlocked, only one goroutine can be here.
// The only way here is if s.waitc was nil. This goroutine just set
// s.waitc to non-nil while holding the lock.
- msgs, err := s.getNextBatch(ctx)
+ msgs, err := s.getNextBatch(ctx, nMessages)
Contributor

Test is failing:

 drivertest.go:271: pubsub (code=Unknown): replayer: request not found: subscription:"projects/go-cloud-test-216917/subscriptions/TestConformance_TestSendReceiveTwo-subscription-1" max_messages:936

That looks likely to be flaky....

Contributor Author

It's actually 9364. The Mac tests run a bit slower. I changed the cap to 1000 for now.

Contributor

9364? Why? (why isn't this flaky?) Seems like it's dependent on timing, no?

Contributor Author

I got the number 9364 from the actual travis log. I think you dropped a digit when you copied it. The relevant point is that it's > 1000.

It is time-dependent. But with a cap of 1000, it would have to take longer than 1ms between calls to Receive on average, and considering that we do nothing but call Ack, that's extremely unlikely.

I agree, though, that this solution isn't great.

pubsub/averager.go (outdated)
@@ -49,10 +49,10 @@ func newAverager(dur time.Duration, nBuckets int) *averager {
func (a *averager) average() float64 {
Contributor

Nit: consider making this Average since it's an "external" function? (even though the whole struct isn't). Not sure what typical Go style for this is, but I think it can be useful to be clear about what the expected API of internal objects is even if it's not enforced.

This code could also be moved into the internal/batcher package.

Contributor Author

Removed the whole thing.

pubsub/pubsub.go (outdated)
// messages will wait in memory for a long time, possibly timing out (that is,
// their ack deadline will be exceeded). Those messages could have been handled
// by another process receiving from the same subscription.
desiredQueueLength = 2 * time.Second
Contributor

Queue lengths are not defined in units of time. They are counts, so this should have a different name. Would desiredTimeInQueuePerMessage be more accurate?

Contributor Author

Are you okay with desiredQueueDuration?

Contributor

Sure, as long as the comment explains what it means. I think it's how long we want a message to spend in the queue, although I find this idea a bit strange since I don't have a preference about how long messages stay in the queue so long as it's a lot less than the ack deadline.

Contributor Author

although I find this idea a bit strange

Exactly, that's why I don't think that's the right way to think about it. It's more like, we want to keep a few messages around in case Receive speeds up and starts chewing through them faster than it has been. In other words, a buffer. How many messages do we want to keep in the buffer? No, that's the wrong question: how much runway (in time) do we want before we run out of messages?

Yes, if Receive always takes the same time, then a message will spend all that time in the queue. But that's a side effect, the price we pay for having messages available on demand to improve throughput.

Contributor

I see, so to write it in Javanese, it would be more like desiredAmountOfTimeBeforeQueueProbablyRunsOut. Makes sense.

@vangent (Contributor) commented Jan 30, 2019

There's a lot going on here (not just on this PR), and we need a path forward.

  1. How to estimate "time to process" and "time to fetch"? Time-bucket-based moving average, decaying average, pick a constant, ....
  2. For "time to process" in particular, which timepoints to use (only Receive, or Receive -> Ack).
  3. When to fetch (wait till buffer is empty, predicted-buffer-empty-time minus "time to fetch" estimate, that plus some padding, multiple concurrent fetches makes it even more complicated).
  4. How many to fetch.

We don't have to solve all of these at once. Let's start with simple changes that make big improvements and see how far that gets us. I propose:

  1. Something like this PR that uses a simple estimate for "time to process" to decide how many to fetch. I do think we need to resolve question 2 above (which timepoints to use), and I think I agree with @ijt that using Receive + Ack is simpler.

  2. Next step IMHO would be pre-fetching (question 3 above) with a constant padding (e.g., 1s before we predict our buffer will be empty, make the RPC; see the sketch below).

  3. After that we can try different models for "time to process", try modeling "time to fetch" instead of using a constant, and making multiple fetches concurrently.

Thoughts?
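
A rough sketch of the prefetch trigger proposed in step 2 above, not something this PR implements; the padding constant and function name are illustrative only.

```go
package pubsub

import "time"

// prefetchPadding is the constant padding proposed above: start the next
// fetch this long before the in-memory queue is predicted to be empty.
const prefetchPadding = 1 * time.Second

// shouldPrefetch reports whether it is time to issue the next ReceiveBatch,
// given how many messages are still queued and the recent average time to
// process one message.
func shouldPrefetch(queued int, avgProcessTimePerMessage time.Duration) bool {
	timeUntilEmpty := time.Duration(queued) * avgProcessTimePerMessage
	return timeUntilEmpty <= prefetchPadding
}
```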

@jba (Contributor Author) commented Jan 30, 2019

Totally agree.

Let me fix the name of the constant as @ijt requested, and change the measurement to end at ack. Then I think this is a workable and useful PR.

@jba (Contributor Author) commented Jan 30, 2019

PTAL.

pubsub/pubsub.go (outdated)
pubsub/pubsub.go (outdated)
@jba (Contributor Author) commented Jan 30, 2019

PTAL.

@vangent (Contributor) commented Jan 30, 2019

Separately, it might be worth writing a fake Subscription driver implementation that can simulate timing for getting batches of messages. E.g., give it a minimum RPC duration, an incremental duration-per-message-requested, a distribution, etc. Then we could run benchmarks against it at various settings.

Similarly for a fake Receiver that simulates how long it takes to process messages.
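
A rough sketch of the kind of fake described here, with a minimum RPC duration and a per-message cost; it is not wired to the real driver.Subscription interface, and all names are illustrative.

```go
package faketiming

import (
	"context"
	"fmt"
	"time"
)

// fakeSub simulates the timing of a subscription's ReceiveBatch RPC.
type fakeSub struct {
	baseRPC    time.Duration // minimum duration of any ReceiveBatch call
	perMessage time.Duration // additional duration per message requested
	nextID     int
}

// ReceiveBatch sleeps for the simulated RPC duration and returns
// maxMessages synthetic message bodies.
func (s *fakeSub) ReceiveBatch(ctx context.Context, maxMessages int) ([][]byte, error) {
	d := s.baseRPC + time.Duration(maxMessages)*s.perMessage
	select {
	case <-time.After(d):
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	msgs := make([][]byte, 0, maxMessages)
	for i := 0; i < maxMessages; i++ {
		s.nextID++
		msgs = append(msgs, []byte(fmt.Sprintf("msg-%d", s.nextID)))
	}
	return msgs, nil
}
```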

@ijt (Contributor) left a comment

LGTM

@jba merged commit 35df681 into google:master Jan 31, 2019
@jba deleted the ps-faster-receive branch March 16, 2019 22:26