add idleTimeoutDecay param to QTP #9354
Conversation
porting over comments from magibney@dfab46b
```java
long itNanos = TimeUnit.MILLISECONDS.toNanos(idleTimeout);
if (NanoTime.elapsed(idleBaseline, now) > itNanos &&
    NanoTime.elapsed(last = _lastShrink.get(), now) > (siNanos = getShrinkInterval()) &&
    _lastShrink.compareAndSet(last, Math.max(last, now - itNanos) + siNanos))
```
I'm not a big fan of the inline assignment as it is hard to read.
Also, I'm not sure we actually need to check the idle timeout here, nor that we need an `idleBaseline`.
Couldn't this just be:
Suggested change:

```java
if (NanoTime.elapsed(last, now) > getShrinkInterval() && _lastShrink.compareAndSet(last, now))
```
b082e96 removes the inline variable assignment (and adds some comments for clarity).

iiuc though, we definitely do still want to check idle timeout here (and need `idleBaseline`), otherwise `idleTimeout` is effectively a ceiling on the amount of time a thread might wait before considering itself potentially eligible to be removed from the pool, and I had thought we'd want `idleTimeout` to be a floor for the amount of time a thread must be idle before considering itself eligible for removal.

If we don't check `idleTimeout` here, then the only place we're really using `idleTimeout` is in `idleJobPoll()` -- so a thread will always try to remove itself (wrt only the `shrinkInterval`), unless a job is immediately available. Respecting `idleTimeout` as a floor for determining when a thread is eligible to die should help the pool grow more consistently (to accommodate high load), while allowing (via `idleTimeoutDecay`) the pool to shrink more efficiently once we're past the `idleTimeout` window of a transient spike.
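To make the floor-vs-ceiling point concrete, here's a minimal self-contained sketch of the check (using plain `java.util.concurrent` stand-ins for Jetty's `NanoTime`; illustrative only, not the PR's exact code):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class ShrinkCheck
{
    final AtomicLong _lastShrink = new AtomicLong(System.nanoTime());

    // Floor semantics: a thread may exit only if (a) it has itself been idle
    // for at least idleTimeout, and (b) the pool-wide shrink gate is open.
    boolean mayExit(long idleBaselineNanos, long idleTimeoutMs, long shrinkIntervalNanos)
    {
        long now = System.nanoTime();
        long itNanos = TimeUnit.MILLISECONDS.toNanos(idleTimeoutMs);
        if (now - idleBaselineNanos <= itNanos)
            return false; // floor: this thread has not been idle long enough
        long last = _lastShrink.get();
        // rate limit: at most one exit per shrinkInterval across the whole pool
        return now - last > shrinkIntervalNanos && _lastShrink.compareAndSet(last, now);
    }
}
```

Dropping the first check collapses this to the rate limit alone, which is the ceiling behavior described above.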
@magibney Probably time to mark this as non-draft.
Thanks again for the review/consideration, and please let me know if there's anything more I can do to assist! Not sure how much more work is anticipated on this PR from your end, but if there's any chance this might be able to make it into 10/11.0.14, that would be awesome. For context: Solr would immediately benefit from this feature (any memory not being used for idle threadpool capacity will passively be used by OS page cache). Of course, no worries if this doesn't make it into 10.0.14, but I figured it couldn't hurt to ask explicitly!
@magibney it likely will not make Jetty 1x.0.14, since this mechanism is delicate, we likely have people relying on the current behavior, and the new behavior does not have a test case, so we would like to give it more consideration.
What currently happens is that a thread idle times out, and looks at another "time" to decide whether to exit or go back to polling from the task queue. I'd call this extra time the shrink interval.

Let's consider the case of 10 threads idling out at the same moment, which may be a common case during load spikes; let's assume one of them exits, so it sets the `_lastShrink` timestamp while the other 9 go back to polling the task queue.

What is the right name for this property?

@magibney sorry if we're being picky on this one, but it's a key Jetty component that we don't want to modify lightly. Let us know if you would be down to implement the above, along with a test case, otherwise we'll keep your contribution and do it ourselves.
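For concreteness, my reading of that 10-thread scenario under the existing one-exit-per-interval behavior (illustrative numbers only, assuming idleTimeout=60s):

```java
// Illustrative timeline (existing behavior, at most one exit per idleTimeout):
// t=60s  : 10 threads idle out together; one wins the _lastShrink CAS and exits;
//          the other 9 go back to polling the queue for another full idleTimeout.
// t=120s : 9 threads idle out; one exits; 8 resume polling.
// ...
// t=600s : the last of the original 10 idle threads finally exits.
```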
Totally understandable!
Agreed that this is a useful case to consider, and likely to be common in practice; but the behavior you describe is not exactly what should happen with the PR as currently implemented, and in fact I think the current PR should exhibit the desired behavior. When a thread expires, it doesn't set `_lastShrink` to `now`; the CAS advances it by one `shrinkInterval` (see the snippet above), so the next thread becomes eligible to exit after only a `shrinkInterval`, not a full `idleTimeout`.
The proposed `idleTimeoutDecay` determines the shrink interval as a fraction of `idleTimeout` (a decay of 1 is functionally identical to the existing behavior). So in the example case, once the first thread exits at the `idleTimeout` boundary, the remaining threads become eligible at successive `shrinkInterval` boundaries rather than each waiting another full `idleTimeout`.

I have some test cases (preparing a plugin variant to tide us over until a release), but wasn't sure where/how you'd want to add test cases here. If you can provide rough guidance on where/how you'd like to see tests added, I'd be happy to take a crack at porting my existing tests over.
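For reference, a hypothetical configuration sketch (`setIdleTimeoutDecay` is the accessor this PR would add; the exact name and semantics may differ, so it's commented out to stay runnable against current Jetty):

```java
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class PoolConfig
{
    public static void main(String[] args) throws Exception
    {
        QueuedThreadPool pool = new QueuedThreadPool(10_000, 8); // maxThreads, minThreads
        pool.setIdleTimeout(120_000); // ms: a thread must be idle >= 2 min before it may exit
        // Hypothetical accessor proposed by this PR: a decay of N would allow up
        // to N threads to exit per idleTimeout window.
        // pool.setIdleTimeoutDecay(10);
        pool.start();
    }
}
```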
The behavior should be nearly identical for the default case. I say "nearly" because there is a subtle difference: previously, the determination of whether a thread was eligible for removal was coordinated solely via the pool-scoped `_lastShrink` timestamp. Ideally (even without this PR) one would arguably want pool shrinkage to be considered based on how long some thread has been idle (indicating sustained idle capacity in the pool). This distinction matters more when decoupling the shrink interval from `idleTimeout`.
The worst-case limit of the worst-case scenario would be: many threads, serving very quick jobs, spending the vast majority of their time idle, but each receiving a job just often enough to keep them all alive. Perhaps it is indeed worth explicitly guarding against this somehow ...
I'm just stepping back again on this one. What is the issue with just setting a very short idleTimeout? Threads do nothing on idle but check if they need to shrink, so why have a shrinkTime that is different from the idleTime?

Also, I'm not sure the 10 thread example works the way you think. Even with idleTimeout=60 and idleTimeoutDecay=5, then 60s after a spike, 10 threads idle out. 1 wins the CAS and advances the lastShrink timeout. Then another of the 9 could shrink in 5s time, but it now waits 60s for another job, so at that time, 9 threads wake up, 1 shrinks, 8 can shrink in 5s, but they all sleep for 60s. Secondly, why not just have an idleTimeout of 5s?

Also, perhaps we need some randomness in the time that threads sleep, so they don't all wake up at the same time? Perhaps an idle timeout of 10s means 0-10s? This way peaks will be spread out and it might make sense to have a shrink time different from the idle time (see the sketch below).

So more work is definitely needed, if only to convince ourselves this is the right approach.
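A minimal sketch of the randomized-wakeup idea (illustrative only, not Jetty code; the method name just echoes `idleJobPoll` from the discussion above):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

class RandomizedIdlePoll
{
    // Rather than every idle thread polling for exactly idleTimeout (so a whole
    // spike's worth of threads wakes at once), each poll waits a random duration
    // in (0, idleTimeout], spreading wakeups and shrink checks over the window.
    static <T> T idleJobPoll(BlockingQueue<T> queue, long idleTimeoutMs) throws InterruptedException
    {
        long waitMs = 1 + ThreadLocalRandom.current().nextLong(idleTimeoutMs);
        return queue.poll(waitMs, TimeUnit.MILLISECONDS);
    }
}
```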
@magibney no, the current PR does not work as you think it does 😃 I think I'm leaning with @gregw on this: a short idle timeout is all that is needed to recover memory quickly, without the need to introduce another property that we struggle to name.
Possible ... though I do have tests that verify the behavior as far as I've described it (perhaps they could stand to be made more robust?). In any event, agreed to put this on hold while I gather the tests together, which should add clarity/confidence. I'll be doing this either way for a pluggable implementation, and hope to pick this back up before long.

The main downside of a short `idleTimeout` is the potential for thread churn/thrashing under bursty load. I'll workshop this a bit though. Thanks again to you all for your consideration and guidance!
@magibney I don't think thrashing is a problem. Either the server is busy, in which case threads wake up with a job; or it's not, and then a timeout every few seconds is no big deal while the pool shrinks to a reasonable size.
I've addressed the race condition inherent in tracking "idle baseline" per-thread, now precisely tracking idle capacity globally for the pool. The new tests exercise and demonstrate the expected behavior (I think the http3 test failure on the CI build is unrelated?).

Thrashing is not a problem when you use a long `idleTimeout` together with a shorter shrink interval. But with the proposed workaround, we'd be using `idleTimeout` for two purposes at once: the minimum time a thread must be idle before exiting, and the rate at which the pool shrinks.

It's obviously a bigger problem at the limit, e.g.: I want to accommodate spikes from 2k to 10k threads, but beyond a 2-minute window, I want the pool to shrink as quickly as possible (to make memory available to the OS page cache). We're forced to choose between "2-minute window" and "shrink as quickly as possible". If I prioritize the latter, e.g. by setting a very short `idleTimeout`, I give up the 2-minute window entirely.
@magibney this PR has now become quite complicated -- I have reservations. Can you please detail what would be the problem with a short idle timeout that this PR would solve? From your last comment, I gather that the case of "I spiked to 10k threads, I would like to keep them around for a couple of minutes, but beyond that shrink as fast as possible" seems really specific, and I gather this PR still won't solve this particular case?
Thanks for taking a look.
This PR definitely solves that case -- if it didn't, then there'd be no purpose to this PR at all! The fundamental problem is that as long as threads are alive and not being used, they consume memory. For large numbers of threads, even a very short timeout adds up to idle threads hanging around way longer than they're useful (to decay all of 8k threads in 2 minutes would currently require an absurdly short `idleTimeout`; see the arithmetic below).

Most of the complexity of this PR now is in the test cases, and to really adequately exercise the common and edge cases I don't see a way around the complexity. The main class changes are actually pretty straightforward. It may also be worth noting that it would be possible to make the shrink logic pluggable.
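To make the arithmetic explicit (my numbers, for the 8k-threads-in-2-minutes example; the current behavior retires at most one thread per shrink period, which is effectively `idleTimeout`):

```java
public class ShrinkMath
{
    public static void main(String[] args)
    {
        // Back-of-envelope (illustrative): with at most one thread retired per
        // idleTimeout, draining 8000 idle threads within a 120s window needs an
        // idleTimeout of 120_000 ms / 8000 = 15 ms -- far too short to also
        // serve as a sane "minimum idle time before exit" floor.
        long windowMs = 120_000, threads = 8_000;
        System.out.println(windowMs / threads + " ms per exit"); // 15 ms per exit
    }
}
```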
Maybe. I have to say that it's difficult for me to follow the code. I find the calls to the new idle-tracking methods hard to follow. IIUC a random thread writes its "idle" nanoTime at a random index, and another thread reads the "idle" nanoTime of another thread from a different random index, to decide whether to exit or not; it's not intuitive (at least for me). With the right interleaving, only stale values might ever be observed.

Sorry! I think I have a simpler alternative.
Sounds good; I'm all for a simpler alternative if that's achievable (and again, thoroughly appreciate the caution and energy with which you're approaching this issue)! wrt this PR "solving that case": I put a considerable amount of effort into the test cases, and am quite confident in the behavior.
That's by design. Idle capacity at the pool level is tracked instead of idle capacity of individual threads, since the latter is dependent on random thread scheduling, and the former is more meaningful anyway.
This also is by design, though I'd emphasize that after the first writes every slot holds a real idle timestamp, so any read reflects some thread's actual idle age.
It's good to call this out; I did this knowingly and I think it should be ok. For practical purposes, we can lean on the fact that the stored values are effectively relevant within the tolerance of the `idleTimeout`/`shrinkInterval` windows involved.

Again, the tests bear all this out, and are designed to stress the system pretty hard, asserting very tight tolerances wrt expected shrink behavior.
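For readers following along, a rough sketch of the kind of pool-level tracking being discussed (names and structure are mine, not the PR's actual code):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Threads write "last busy" nanoTimes into random slots and read random slots
// back, so any read reflects *some* thread's idle age -- a lock-free
// approximation of pool-wide idle capacity rather than per-thread state.
class PooledIdleTracker
{
    private final AtomicLongArray slots;

    PooledIdleTracker(int stripes)
    {
        slots = new AtomicLongArray(stripes);
        long now = System.nanoTime();
        for (int i = 0; i < stripes; i++)
            slots.set(i, now); // seed so early reads are never garbage
    }

    void recordBusy() // called when a thread takes a job
    {
        slots.set(ThreadLocalRandom.current().nextInt(slots.length()), System.nanoTime());
    }

    boolean poolIdleLongerThan(long idleNanos) // called by a thread pondering exit
    {
        long sample = slots.get(ThreadLocalRandom.current().nextInt(slots.length()));
        return System.nanoTime() - sample > idleNanos;
    }
}
```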
includes some changes to make it easy to run against current `main` branch (which doesn't support configurable shrink rate)
0bf3c16 illustrates an approach that extracts all the shrink logic into a single interface. This helps with clarity (at least as a point of reference for discussion), and also allows for the default shrink rate of 1 to be functionally identical to the existing implementation on `main`.
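A hypothetical shape for such an interface (names are mine; the commit has the actual extraction):

```java
// Sketch of an extracted, pluggable shrink policy. Making the strategy an
// interface lets the default (shrink rate 1) reproduce today's behavior
// exactly while allowing alternative shrink strategies to be swapped in.
interface ShrinkPolicy
{
    // Decide whether the calling idle thread may exit the pool now.
    boolean mayExit(long nowNanos);

    // Notify the policy that a thread took a job (pool is not fully idle).
    void onJob(long nowNanos);
}
```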
also bring in the QTPBenchmark class
See discussion on #9237