The first invocation of mockk exceeds the runTest default timeout #3800

Closed
PaulWoitaschek opened this issue Jul 4, 2023 · 23 comments

@PaulWoitaschek
Contributor

Run Test Timeout Issues

The new 10-second runTest timeout, introduced by #3603, is currently creating a lot of issues.

There are various discussions about it here:

It boils down to basically this: users of ByteBuddy (for instance, mockk users) face a very long initialization phase on the very first mock creation.

Example

On a Mac with an M1 Max, this test:

  @Test
  fun myMockingTest() {
    measureTime {
      mockk<UUID>()
    }.also {
      println("That took $it")
    }
  }

Takes about 1.5 seconds on my machine. CI machines are often slower than that and might run compilations in parallel or even run multiple projects at once using virtualization.
This has led to a lot of sporadically failing CI jobs for tests that don't do much besides switching to an IO dispatcher to write to a test database.

Proposed Solution

I proposed several solutions here:
https://kotlinlang.slack.com/archives/C1CFAFJSK/p1688326989830229?thread_ts=1688240798.449439&cid=C1CFAFJSK

With my preferred one being:

Making the test timeout configurable through a global option (e.g. an environment variable that can be set for all tests from Gradle).

That way, tests run locally could still time out quickly, while developers could configure a longer timeout on CI, where they know the machines are actually slow.
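
For illustration, a minimal sketch of how such a global option could be wired up from Gradle, assuming a hypothetical environment variable name that the library would read (no such option exists at this point):

```kotlin
// build.gradle.kts (sketch only; the variable name is hypothetical)
tasks.withType<Test>().configureEach {
    // Give CI machines, which are often slower, a more generous budget.
    val timeout = if (System.getenv("CI") != null) "60s" else "10s"
    environment("kotlinx.coroutines.test.default_timeout", timeout)
}
```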

@qwwdfsad qwwdfsad added the test label Jul 19, 2023
@dkhalanskyjb dkhalanskyjb changed the title Make the runTest timeout globally configurable The first invocation of mockk exceeds the runTest default timeout Jul 24, 2023
@dkhalanskyjb
Collaborator

I edited the title of the issue. The original one didn't describe the problem but proposed a solution. So far, it's unclear whether adding a global parameter is the way to go, and I haven't seen any indication that the timeout we chose isn't sensible; we're dealing with an edge case here.

A solution to this specific problem, proposed in that same thread, is

/**
* Initialize ByteBuddy in advance so it doesn't skew the reported runtime of the test code due to a long initialization sequence.
*/
@BeforeTest
fun initializeByteBuddy() {
  // initialize here
}

This is an easy, straightforward fix for code bases with a class that every test class inherits from, and if you're using kotlinx-coroutines-test, it's likely that you have one in order to call Dispatchers.setMain.
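
For concreteness, a minimal sketch of such a base class, assuming mockk; the class name and the throwaway Runnable mock used to trigger the one-time agent installation are illustrative:

```kotlin
import io.mockk.mockk
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.test.StandardTestDispatcher
import kotlinx.coroutines.test.resetMain
import kotlinx.coroutines.test.setMain
import kotlin.test.AfterTest
import kotlin.test.BeforeTest

abstract class BaseCoroutineTest {
    @BeforeTest
    fun setUpBase() {
        Dispatchers.setMain(StandardTestDispatcher())
        // Creating any mock triggers ByteBuddy's one-time agent installation,
        // so the cost is paid outside the timed runTest body.
        mockk<Runnable>(relaxed = true)
    }

    @AfterTest
    fun tearDownBase() {
        Dispatchers.resetMain()
    }
}
```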

@gildor
Contributor

gildor commented Aug 15, 2023

This is an easy, straightforward fix for code bases with a class that every test class inherits from

This solution assumes that there is a class that all tests inherit from; that's not the case in our project, nor in many projects I know of.

It would be more reasonable if there were a simple way in a multi-module project to specify a custom test runner for all unit tests in all modules, but there isn't.

One more problem with this proposal is that it will slow down every single affected test because of the bytebuddy init, even ones that do not use mocking, making the whole test suite slower, not faster.

it's likely that you have one in order to call Dispatchers.setMain

We do not override setMain by default because it's unnecessary for most tests, so there is no base test class.
The slow-test-init and setMain situations are also very different: if setMain is required, the test fails immediately and gets fixed, but a timeout caused by long environment initialization will most probably pass locally, and maybe even on CI, only to become a flaky test.

10 seconds is a very arbitrary timeout, and an option to configure it would be very beneficial. I see at least two reasonable scenarios:

  1. The original issue mentions one more arbitrary but reasonable baseline: "10 seconds works for 99.9% of cases". But if 10 seconds starts working for only 99% or 97% of test runs, that directly hits test stability. And if one wants to bring stability back to 99.9%, increasing the timeout to, say, 15 seconds would be an easy fix that doesn't require creating a wrapper on top of runTest.
  2. If I don't have any potential environment-init issues, 10 seconds is just too long; 1-2 seconds may be more than enough for a small classpath without mocking libraries on it. Currently, one simply cannot configure a smaller timeout. I would also point out that even if the proposed workaround of moving initializeByteBuddy before the test works for the first case, it doesn't work for this one, where I just want to make sure that my tests are fast.

There will never be a timeout that works for all projects, and the suggestion to significantly change the structure of all tests, by using a user-defined wrapper on top of runTest or by having some base class, doesn't look very friendly for the projects affected by this.
Another issue is that it increases the chance of developer error: if one accidentally uses runTest directly or creates a test class without extending the base class, the test will probably still pass but will introduce flakiness to the whole suite.

Using jvmArgs for the JUnit runner would probably be the best solution: at least the timeout would be configured not at the level of a particular test but at the level of a module (and for most projects it would be easy to apply it to all modules if necessary, or only to the ones with a larger classpath, mocking libraries, and so on). A sketch of the idea is below.
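
A minimal sketch of that module-level idea; the property name is hypothetical, as no such option exists in kotlinx-coroutines-test at this point:

```kotlin
// build.gradle.kts (sketch only; the property name is hypothetical)
tasks.withType<Test>().configureEach {
    // Applied per module, e.g. only in modules with mocking libraries
    // or an otherwise large classpath.
    systemProperty("kotlinx.coroutines.test.default_timeout", "30s")
}
```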

@dkhalanskyjb
Collaborator

One more problem with this proposal is that it will slow down every single affected test because of the bytebuddy init, even ones that do not use mocking, making the whole test suite slower, not faster.

Are you sure? I see mentions of it being a one-time thing, like

Users of ByteBuddy (for instance, mockk users) face a very long initialization phase on the very first mock creation.

If that's not the case, you are right: multiplying such heavy initialization costs throughout test suites is not acceptable.

There will never be a timeout that works for all projects

So far, 10 seconds seems to be just fine for everyone but bytebuddy users. If it turns out bytebuddy is indeed the only case where this issue is significant, maybe we can work out some workaround specific to bytebuddy and keep the timeout.

@gildor
Contributor

gildor commented Sep 12, 2023

Are you sure? I see mentions of it being a one-time thing, like

One-time, yes, but there are multiple cases where one wants to run tests only in a single module or in a file, or a particular test (during development); in all those cases, a single test run will load bytebuddy even when it's not needed.

Also, it's not necessarily one-time: multiple parallel test executions in different classloaders are possible, and each will pay this cost.

So far, 10 seconds seems to be just fine for everyone but bytebuddy users

Any slow init will cause it. The fact that bytebuddy is often mentioned is because it's part of the most popular mocking library, so many people are affected; it doesn't mean there are no other cases. I don't see why bytebuddy-specific handling is needed: let users override the timeout and keep 10 seconds as the default.

maybe we can work out some workaround specific to bytebuddy and keep the timeout.

Why not just expose the test timeout via system properties? For one project, 10 seconds can be too little; for another, too long.

I don't see any actual reason why the timeout should be 10 seconds specifically, rather than 100 or 5.

@dkhalanskyjb
Collaborator

wants to run tests only in a single module or in a file, or a particular test (during development)

Got it, thanks!

Any slow init will cause it

Does any other slow init commonly happen?

let users override the timeout and keep 10 seconds as the default

If it turns out there is a use case for this (or if it turns out there is no nicer way to support the slow ByteBuddy init; I haven't looked into this yet), we'll do it.

Why not just expose the test timeout via system properties?

kotlinx-coroutines-test also works on JS and Native. If it turns out that, indeed, only ByteBuddy is the culprit, system properties are enough. If there is a legitimate use case that is not just a workaround for unfortunate behavior, they won't be enough; some programmatic API will be needed.

@rolgalan

rolgalan commented Sep 12, 2023

We have ~10k unit tests in one of our Android projects. We also use mockk quite extensively, and sometimes Robolectric (which is famously renowned for quite a slow loading time). We did this coroutines update last week, and we also had some timeout issues in about a dozen tests.

After inspection, we noticed that the problem was actually that those tests were mocking heavily inside the runTest execution scope (which was not happening everywhere in our project, just in some places). As soon as we broke that pattern in favor of mocking outside of this scope, we stopped having timeouts in our tests: the mockk/bytebuddy initialization penalty then happens outside the test scope and is not subject to any timeout.

Maybe @gildor something like this is what is happening to you and can solve your issue? See an example below:

@Test
fun `this test might time out due to mocking inside the test scope`() = runTest {
    // Test setup
    val what: Ever = mockk()
    every { some.method() } returns mockk()

    // The actual test
    sut.testMe()
    verify { what.ever() }
}

@Test
fun `this test should work fine`() {
    // Test setup
    val what: Ever = mockk()
    every { some.method() } returns mockk()

    runTest {
        // The actual test
        sut.testMe()
        verify { what.ever() }
    }
}

I believe this pattern might be quite widespread because the official Android documentation recommends using runTest as an expression body. I believe this pattern is mainly meant to avoid using multiple runTest calls, so there is no other reason to follow it in tests that need to initialize multiple mocks. Actually, their example contains only the actual test execution, so most probably this use case (test methods that need to declare mocks) was not considered when designing that guideline. 🤷

@JakeWharton
Contributor

so there is no other reason to follow it

It's required to be used as an expression body for multiplatform tests where the return type of the function changes based on platform.

@rolgalan

so there is no other reason to follow it

It's required to be used as an expression body for multiplatform tests where the return type of the function changes based on platform.

I was not aware of that requirement for multiplatform, good point. But the suggestion is still valid for pure Java/Android projects. I don't have much experience with KMP, but I wonder whether multiplatform requires an expression body, or just requires TestResult as the function's return type? In the latter case, the suggested idea would still apply by returning the runTest call executed inside the function body. What would be the problem with such an approach?

@Test
fun `this test should work fine`(): TestResult {
    // Test setup

    return runTest {
        // The actual test
    }
}

@PaulWoitaschek
Contributor Author

Let's have a look at that linked PR that "fixes the test flakiness":
element-hq/element-x-android#1226

These are 406 lines of code which contribute to the overall test complexity and draw the developers' attention away from what really matters.

"Add this warmup rule whenever you write a test" puts a lot of burden on developers. And it will also introduce a lot of new feedback loops: sometimes developers will forget the rule, and they won't notice directly, as it's just a source of flakiness. This will lead to situations where a new PR suddenly starts failing for reasons completely unrelated to the changed code.

A property would fix the JVM tests. For us, that's the vast majority. Native tests are another story.
I have no real data on whether 10 seconds is a good default for all currently existing multiplatform targets. From our observations, running tests on iOS/watchOS simulators can sometimes be really slow, for reasons I don't understand.

My preferred solution for now would be to set the default back to one minute and (maybe) introduce a property to change it on the jvm.

@dkhalanskyjb
Collaborator

Let me list the things that it looks like we agree on:

  • The ten-second timeout is too low for codebases that rely on ByteBuddy for testing.
  • In a significant portion of codebases, mitigating this is too cumbersome/unreliable, so a more comfortable solution is needed.
  • Some system property to configure the timeout would solve the immediate need.

What we don't know yet:

  • Is this just ByteBuddy, or are there other cases where the timeout is too small?
  • Are there non-JVM cases where this matters?
  • Does a ByteBuddy-specific workaround exist? What would be its tradeoffs?

I think it would be more productive to focus on these questions. They are important because a system property is a very subpar API and should be avoided. Having some magic incantation in the depths of build.gradle.kts is problematic:

  • If we fix the issue some other way down the line, there's no way to communicate this to the people who have already set the property.
  • On the other hand, removing this incantation will not break anything immediately and instead introduce flakiness. Together with the previous item, it's a perfect recipe for code rot.
  • If we do have to introduce global configuration on all platforms somewhere down the line, the API surface will bloat even more. Better to get this right on our first try if possible.

Because of all this, in this project, system properties are usually reserved for workarounds for tricky, uncommon requirements.

@gildor
Contributor

gildor commented Sep 20, 2023

@dkhalanskyjb Let's discuss it, but I feel that until this issue is solved, kotlinx.coroutines should revert the 10-second timeout and return to the 60-second one. It's not perfect, but at least it reduces the chance of false-positive timeouts.

@gildor
Contributor

gildor commented Sep 20, 2023

@rolgalan

Maybe @gildor something like this is what is happening to you and can solve your issue?

We solved this issue by refactoring all tests in the app to use our own version of runTest (a sketch of such a wrapper is below) and writing a custom lint check to prevent using kotlinx.coroutines.test.runTest. What you are suggesting is indeed a solution, but it worsens test readability IMO and requires much more refactoring than just replacing an import, as our solution does. It's also error-prone: it's very easy to forget to do this, or to call a method that does mocking under the hood.
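
A minimal sketch of what such a project-local wrapper could look like; the 60-second default and the names are illustrative (runTest accepts a timeout parameter since 1.7.0):

```kotlin
import kotlinx.coroutines.test.TestResult
import kotlinx.coroutines.test.TestScope
import kotlin.time.Duration
import kotlin.time.Duration.Companion.seconds

// Project-local replacement for runTest; a lint check can then ban direct
// use of kotlinx.coroutines.test.runTest in favor of this wrapper.
fun runTest(
    timeout: Duration = 60.seconds,
    testBody: suspend TestScope.() -> Unit,
): TestResult = kotlinx.coroutines.test.runTest(timeout = timeout, testBody = testBody)
```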

@dkhalanskyjb

it won't be enough, some programmatic API will be needed.

It's indeed a good point about multiplatform. But system properties are already part of the kotlinx.coroutines API: there is a system property to enable kotlinx.coroutines.debug and to configure the thread pool size with kotlinx.coroutines.scheduler.max.pool.size.

So maybe a more generic multiplatform solution is required, but I think it's reasonable to support a system property first, considering that this mechanism is already used for other APIs, and allow configuring the timeout.

Is this just ByteBuddy, or are there other cases where the timeout is too small?

Thanks to the link by @PaulWoitaschek to element-hq/element-x-android#1226:
If you check, it looks like that case has nothing to do with bytebuddy, but with Molecule, which doesn't use bytebuddy.

Are there non-JVM cases where this matters?

I think it's just a matter of the number of large/complex projects before cases show up on other platforms. On JS, it can easily be caused by dynamic module loading, which can even use the network (so it may be slow only on the first run, and only while the module is not cached).

Does a ByteBuddy-specific workaround exist? What would be its tradeoffs?

Warm-up is the only way I see right now, and the tradeoff is increased complexity plus warming up tests that may not use mockk at all, which is especially harmful when they run in isolation from the rest of the suite.

They are important because a system property is a very subpar API and should be avoided

They are not good, but I feel that a universal MPP solution shouldn't block this particular request and should be extracted to a separate issue, maybe even outside of kotlinx.coroutines, at the level of the Kotlin plugin and MPP.

@dkhalanskyjb
Collaborator

I feel that until this issue is solved, kotlinx.coroutines should revert the 10-second timeout

I hope this issue will get solved before the next release one way or another, so there's no need to go back and forth.

But system properties are already part of the kotlinx.coroutines API: there is a system property to enable kotlinx.coroutines.debug

kotlinx.coroutines.debug is JVM-only, so it's okay.

configure the thread pool size with kotlinx.coroutines.scheduler.max.pool.size

Back when the property was introduced, there were no thread pools on other platforms, so it made sense at the time. If we introduced it now, we would think about whether it makes sense to expose the same API on Native.

I think it's reasonable to support a system property first

I don't, and I listed the reasons why in #3800 (comment)

If you check, it looks like that case has nothing to do with bytebuddy, but with Molecule, which doesn't use bytebuddy

and

On JS, it can easily be caused by dynamic module loading, which can even use the network (so it may be slow only on the first run, and only while the module is not cached)

Thanks for the find w.r.t. Molecule! We'll have to dig into why Molecule is also taking too long, but I'm wondering: if it's always the first invocation, maybe we could special-case it in the library. Something like, if the timeout is not set explicitly, allow a couple of tests per run to exceed the timeout.

Warm-up is the only way I see right now, and the tradeoff is increased complexity plus warming up tests that may not use mockk at all, which is especially harmful when they run in isolation from the rest of the suite

Another way I can think of immediately is to check the stack trace at the moment of cancellation and, if it contains a bytebuddy (or Molecule) initialization frame, prolong the timeout. Or we could use reflection at the moment of the timeout to check whether bytebuddy was asked to initialize. This could probably work, and if it fixes the issue for everyone, it may be a good decision. There are probably more options I don't see yet.

They are not good, but I feel that a universal MPP solution shouldn't block this particular request and should be extracted to a separate issue, maybe even outside of kotlinx.coroutines, at the level of the Kotlin plugin and MPP

Nothing is blocked until a release is near.

@mreichelt

Just to chime in on this: we have thousands of tests, and we're quite affected by the new coroutines change dropping the timeout from 60s to 10s. Some teams use mockk, some use Robolectric, and some just do more things (e.g. Koin, etc.).

Not having a global timeout that we can change now forces us to spend developer hours finding the tests that have become flaky. This is also quite tricky, considering that most tests run fine under normal circumstances, but somehow fail when run on an older Intel Mac, or on a slow CI instance, or when some other application (Slack/Teams/Chrome/etc.) wastes CPU cycles on a personal developer machine, or when some Gradle workers decide to run on the same CPU.

Having a (let's say opt-in or deprecated) timeout that we could change globally would have at least eased our pain, because then we would have had more time to upgrade.

Another plus for a configurable global runTest timeout: by setting it to an extreme value (e.g. 1 second), we would be able to:

  • find slow tests in our codebase much more easily 👌
  • level up our testing speed 💡: start at a slow setting (e.g. 30s) and then make our codebase healthier by iterating down through 20s, 10s, 5s, etc.

@gildor
Contributor

gildor commented Nov 16, 2023

Another way I can think of immediately is to check the stack trace at the moment of cancellation and, if it contains a bytebuddy (or Molecule) initialization frame, prolong the timeout

Personally, I would be firmly against such a solution: it tries to solve what is not a problem and moves the burden of the decision onto the coroutines library instead of letting the owner of the codebase decide. Who knows how many similar cases are out there? And if the slow init is caused by your own code, there is no way to add a hack for it at the level of kotlinx.coroutines. If I have my own project/library with a long init, say a dynamic network classloader, an external mock server starting on a separate machine, or any other complex setup, it will be affected by this, and it cannot be easily fixed; as @mreichelt stated in the previous comment, it takes a lot of time just to find the affected tests, not to mention fixing them.

@dkhalanskyjb
Collaborator

A heads-up.

The lead about Molecule turned out to be very valuable. Digging in deeper, I discovered that Molecule has nothing to do with the problem, and the warm-up procedure works only somewhat incidentally. The real culprit is the coverage checking enabled for these tests: the coverage framework performs bytecode instrumentation for all classes it encounters. So, the reason warm-up works is that it touches sufficiently many classes that this instrumentation doesn't happen as much during the test execution itself.

This certainly rules out the option to patch the timeout on a case-by-case basis: here, the time losses are not behind some global switch like "bytebuddy was successfully initialized"; they are spread throughout execution, even if they are much more pronounced when tests only start executing. The option to special-case timeouts so that a couple of them may slightly miss the time goal also doesn't seem all that robust anymore: I'm not even sure the Element X project won't need to introduce another warm-up procedure like the first one, but touching a set of classes largely disjoint from the first, so how many tests should be allowed to miss the mark? 2? 3? 5? This behavior would be a bit too brittle and magical.

The problem in the Element X project also seems general enough to potentially affect other projects: what if the code is heavily instrumented, significantly slowing it down?

So far, the solution seems to be twofold:

  • Introduce some API to globally control the timeout. There doesn't seem to be a way around it.
  • But also detect whether bytebuddy was initialized during the test and apply a more lenient timeout in that case, so that far fewer projects need to even learn about the API or give up the 10-second timeout (which so far still seems like an appropriate default for a typical project).

I looked into detecting whether bytebuddy is installed, and it seems easy enough. Mockk seems to work via ByteBuddyAgent (https://www.javadoc.io/doc/net.bytebuddy/byte-buddy-agent/1.7.3/net/bytebuddy/agent/ByteBuddyAgent.html), and getInstrumentation allows checking whether the instrumentation was already installed. One check before runTest and one check after a timeout: if they aren't equal, then prolong the timeout a bit. A sketch of this is below.
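
A minimal sketch of that check (not actual library code; the helper name is illustrative). Per the ByteBuddyAgent javadoc, getInstrumentation throws if no agent has been installed yet:

```kotlin
import net.bytebuddy.agent.ByteBuddyAgent

// ByteBuddyAgent.getInstrumentation() throws IllegalStateException while
// no agent has been installed in the current JVM.
fun byteBuddyAgentInstalled(): Boolean =
    try {
        ByteBuddyAgent.getInstrumentation()
        true
    } catch (e: IllegalStateException) {
        false
    }

// Idea: record byteBuddyAgentInstalled() before running the test body; if it
// flipped to true by the time the timeout fires, the first-mock initialization
// likely consumed part of the budget, so the timeout can be prolonged.
```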

If anyone has any specific arguments for why 10 seconds is not enough time for a test, now is the time to provide them.

@mreichelt

mreichelt commented Nov 16, 2023

If anyone has any specific arguments for why 10 seconds is not enough time for a test, now is the time to provide them.

As mentioned before, there might be multiple reasons for having slower tests:

  • bytebuddy (mocking frameworks, etc.)
  • Robolectric
  • code coverage additionally slowing tests down
  • the test is just doing a lot
  • Dependency Injection frameworks doing lots of work
  • many more

Most of these tests should actually fit into the 10s threshold, if run individually under good circumstances.
But the circumstances might be not so good:

  • your colleague has a slower machine than you
  • Lots of Gradle workers running (other workers slow down the one worker that is close to the 10s mark - sometimes)
  • Some other application decides to use CPU (Android Studio, Teams, iMovie, Slack, Chrome, ...)
  • The JVMs are not warmed up yet
  • The CI server might be slower
  • Many, many more, which cannot be foreseen

Don't get me wrong: I think having a 10s timeout can actually lead to some good, if we carefully migrate.
But is the change from 60s to 10s breaking things for many developers? Absolutely yes. Will they now spend cumulative developer years trying to fix all the things? Also yes.

If we consider this to be a breaking change, then it becomes clearer what a good migration could look like:

  • go back to 60s as a default and deprecate it
  • think of a new API that ups the stakes (lower timeout), but as an opt-in
  • wait for developers to migrate: they can now choose to update on their own time

@gildor
Contributor

gildor commented Nov 16, 2023

I don't have anything against a 10-second timeout as the default; it's reasonable by itself.
But I agree with Marc's comment above: there are many factors in larger projects, and an option to set the timeout globally solves the issue for many cases.

I would also suggest creating a doc explaining the timeout and how to change it if necessary, referenced from the timeout failure message (or at least from the KDoc of the exception).

I'm still not sure about handling ByteBuddy explicitly, but it probably is indeed beneficial, as it avoids raising the timeout globally for all tests. How many such cases will exist in the future is hard to say, though, especially in MPP projects (maybe it makes sense to have some kind of global timeout-policy handler for custom cases, but I'm not sure how feasible that is).

@dkhalanskyjb
Collaborator

I'd like to ask everyone once again to avoid descending into theoretical rhetoric and what-ifs that don't add anything to the discussion. We aren't proving theorems here, where a single theoretical counterexample breaks the whole argument; we are doing engineering.

Examples of good contributions, the ones that meaningfully help:

Thank you, these are valuable leads that explain more about why some tests may take too long! On the other hand, just restating your points or aggressively trying to push your preferred solution won't get us anywhere.

Also, I don't understand the sense of desperation that seems to creep into the discussion, like

Will they now spend cumulative developer years trying to fix all the things? Also yes.

First of all, what's going on here? Where is this dramatic exaggeration even coming from? Reading this, one could get the impression that an industry-wide outage is in effect, even though the overwhelming majority of tests will always finish in a second, no matter if you're running a dozen copies of Slack in parallel.

Even if the tests that are slower are common, the change to the timeout is only since 1.7.0, and not much has happened in the test framework since 1.6.4. If this change causes much grief, are there any problems with sticking to kotlinx-coroutines-test:1.6.4 until we resolve this somehow? We won't leave you hanging. If you have shared the details of your problem here, it will be accounted for.

Most of these tests should actually fit into the 10s threshold, if run individually under good circumstances. But the circumstances might be not so good

I think my question, "is 10 seconds enough?" wasn't detailed enough.

Here are some examples of just how long 10 seconds is:

  • The Mocha test framework for JS has a timeout of 2 seconds by default, which does seem too extreme: it's possible to brush against it on a hiccup if your test does a lot of work or connects to the network. Many tests override this default: https://grep.app/search?q=this.timeout%28. Still, from a brief search, it looks like, for most projects, this default is fine.
  • I've recently brushed against the 10-second barrier when testing the code that formatted and parsed more than 100_000 different values with a backtracking parser, which is a lot of work.
  • In the Element X case, the CI occasionally hit the 10-second mark per test because every class gets instrumented, and that's a lot of code to instrument! See Add automatic instrumentation filter to project classes kotlinx-kover#418. Currently, adding a couple of obvious filters like "we don't want to check the test coverage of JetPack Compose or the Kotlin standard library" seems to reduce the amount of time taken by every test down to a couple of seconds at most without impacting what the tests actually do.

The point is that 10 seconds is a lot of time for a computer, and if you hit this limit somehow, there's a good chance your computer is doing a lot of meaningless work. (And you're also literally wasting cumulative developer-years of extra time staring at the screen and waiting for the tests to run). Of course, some tests are expected to take more than a couple of seconds to run, and that's mostly stress tests and exhaustive enumeration tests. But such tests are not at all standard!

So, the hypothesis under which we operate is that 2-4 seconds should be enough for almost any given test. Under especially bad circumstances, it should be able to run in 8 seconds. 10 should be enough even under bad circumstances. If it's not, then 10 seconds is not enough for that test.

At least—I should emphasize this—it's the way that I personally currently understand the situation. I may well be wrong. Hence the question: does anyone have sprawling codebases that contain lots of tests that are not just burning the CPU time for no clear reason but actually do something meaningful? What kinds of tests are these? @mreichelt, you mentioned some tests "just doing a lot"—could you elaborate?

go back to 60s as a default and deprecate it

We could close this issue today by restoring the timeout to 60 seconds and/or introducing a JVM system property to control it (I don't think anyone mentioned encountering this issue outside the JVM). No big deal. Most people would walk away happy, and we'd all forget about this. The problem is that this would be the lazy, suboptimal way out, and we still have the time to do better than that.

There are three conflicting goals at play:

  1. Developers want their tests not to eat too much CI time when they accidentally hang.
  2. Developers want their tests not to fail when the tested code is okay.
  3. Developers want not to spend time configuring their infrastructure's moving parts (though @mreichelt mentioned that someone might want to reduce the timeout gradually, so maybe this is not universally true)

With a 10-second timeout, we prioritized goal 1, compromising goal 2, causing some headaches. We should improve w.r.t. goal 2—while minimally compromising goals 1 and 3.

There are several different universes, and we don't know yet in which we live:

  • In one universe, 10 seconds is enough for all but a few select tests in every codebase. If a test takes longer, it needs to be investigated and fixed, so it's desirable that tests fail after 10 seconds: this way, the developer velocity could be improved. However, for some codebases, abruptly adding this new requirement is too problematic, so they need a temporary way to introduce the timeout gradually. In this case, the 10-second timeout is a splendid decision that we certainly shouldn't doubt, and what sort of "training wheels" for codebases we'll introduce doesn't matter much: it will always just be some temporary, discouraged API used on a need-to-know basis.
  • In another universe, 10 seconds is enough for all tests in most projects, but in some other projects, many tests take a lot of time to execute, and that's entirely expected. Adding a 10-second timeout to such projects is just an annoyance they don't want to ever deal with, so they would gladly disable this functionality and never look back. In this case, we need a well-defined, clearly documented, prominent way to increase the timeout, but 10 seconds is still a reasonable default.
  • In yet another universe, 10 seconds is enough only in some simple cases, but once your project reaches a particular stage of complexity, you'll have to go outside the limit. Then the 10-second limit is simply too low and mostly causes frustration, and we should revert its introduction.

Which one is it?

@dkhalanskyjb
Collaborator

After a small interview with @mreichelt, here are some crucial points that I believe weren't raised before but are relevant to real-world projects that are large enough:

  • Even in the ideal world where all software is efficient, in addition to stress tests and enumeration tests, in large codebases, there are many wide-scale integration tests that naturally take a long time. Sometimes it's possible to separate the initialization of services into a @BeforeTest function, but not always. There's no clear way to make such tests run faster.
  • Inexperienced developers produce heavily inefficient test code that nonetheless does check what it needs to check. Poorly written tests can take 4-5 seconds under normal circumstances and look fine during code review. Inherently, these tests should run much faster, but investing time in not writing them in a maximally naive manner is often not worth it.
  • 10 seconds is a lot of time. Because of that, most tests, even suboptimal or inherently slow ones, won't encounter a timeout during development. So, most tests will pass code review and the initial CI without a manually set larger timeout even when one is needed. However, the test actually is slow, even though the 10-second timeout didn't demonstrate it during development, so under load, it will annoyingly resurface. If the timeout were 2 seconds, an inefficient test would be noticeable from the start.

I think these are very solid arguments for restoring the 60-second timeout and providing a well-established way to configure it globally. Let's consider two groups of projects:

  • The projects with excellent code quality standards. Deeply technical, infrastructural work does need to follow all best practices. In such teams, the tests will be meaningful, well-isolated, clear, and to the point, and will naturally either execute fast or be known in advance not to execute fast. Such teams reap the benefits of having such high standards by setting the timeout to something small and not wasting the build agent's time needlessly when something goes wrong; they'll be able to catch significant performance regressions and in general, tightly fit the timeout to what they realistically can ensure.
  • The typical projects. In a typical project that can't afford resource-wise to keep all code excellent, if a test fails in the CI but not on the developers' machines due to a timeout, one of the following will likely happen:
    • The test gets fixed. This is unlikely, but it can happen. This is the good scenario.
    • A developer researches the topic of timeouts in kotlinx-coroutines-test, finds the global property, messes around with the build system and hopefully succeeds in setting the timeout to something more appropriate to their project. All of this is in the 20 minutes that the developer allocated to work on this.
    • The timeout of this specific test will be set to a large number. The codebase will grow full of arbitrary numbers passed to runTest that no one will be able to explain later.
    • The flaky test will just get ignored, occasionally painting the build red.

From this perspective, I get where the "cumulative developer-years" figure came from and now am inclined to agree.

If 10 seconds is a timeout that's prone to letting most tests pass most of the time (as opposed to 2 seconds, which won't let most slow tests pass even on the developer's machine, or 60 seconds, which is reserved only for outliers), many typical projects can be negatively affected by us choosing it. On the other hand, the excellent-engineering-quality teams won't significantly benefit from any specific timeout that we'd set if we provide a way to configure it, as they will typically do their homework, as with everything, and will be able to tinker with the setting.

Huge kudos to @mreichelt for getting us out of this stalemate!

@JavierSegoviaCordoba

Another argument, as I shared before: you can't run parameterized tests in Kotlin Multiplatform. One solution is using Kotest property testing, which means you can run N tests "in 1 test" (see the sketch below).

Running N iterations was working perfectly on the JVM within 10 seconds, but it was crashing on Native or JS. I think it was on Windows.

Obviously, there is no golden timeout value for every iteration count, as you can set 1,000,000 iterations, but with the default (1,000) it was working fine with 60 seconds.
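
For context, a minimal sketch of the pattern being described, assuming Kotest's checkAll; the encode/decode pair and the test name are stand-ins:

```kotlin
import io.kotest.property.checkAll
import kotlinx.coroutines.test.runTest
import kotlin.test.Test

// Stand-ins for the code under test.
private fun encode(value: Int): String = value.toString()
private fun decode(encoded: String): Int = encoded.toInt()

class RoundTripTest {
    @Test
    fun encodeDecodeRoundTrip() = runTest {
        // Kotest runs 1,000 iterations by default, and all of them execute
        // inside this single runTest call, so the whole loop has to fit
        // into one timeout window.
        checkAll<Int> { value ->
            check(decode(encode(value)) == value)
        }
    }
}
```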

dkhalanskyjb added a commit that referenced this issue Nov 20, 2023
This commit attempts to fix #3800.
The first part of the fix, reverting the timeout to 60 seconds, is
successful.
The second part, allowing global configuration, not so much.

On JVM, Native, and Node JS, it's possible to use environment
variables to communicate data to the process, and it could in
theory be the solution, but it doesn't seem to interoperate well
with Gradle.

The best attempt so far was to use this:
```kotlin
tasks.withType(AbstractTestTask::class).all {
    if (this is ProcessForkOptions) {
        environment("kotlinx.coroutines.test.default_timeout", "1ms")
    }
}
```

Unfortunately, only `jvmTest` implements `ProcessForkOptions`.

Without a clear way to configure the `runTest` timeout in Gradle
builds, we can't claim that this is a proper solution.
@mreichelt

Small note: thanks to the tip by @dkhalanskyjb to force kotlinx-coroutines-test:1.6.4 in our dependencies, we were able to temporarily fix our issues with a flaky test suite while still using coroutines 1.7.x (a sketch of the pin is below). This is a good workaround until we can move to a proper solution, and it allows us to have a look at the slow tests on our own time. 👍
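
A minimal sketch of that pin, using Gradle's resolution strategy:

```kotlin
// build.gradle.kts: keep coroutines 1.7.x on the classpath but pin the
// test module to the version that still uses the 60-second timeout.
configurations.all {
    resolutionStrategy {
        force("org.jetbrains.kotlinx:kotlinx-coroutines-test:1.6.4")
    }
}
```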

@gildor
Contributor

gildor commented Nov 23, 2023

@dkhalanskyjb Thanks for your last comment. It looks very reasonable to me.

I would also add a bit of context about "The test gets fixed": I'm fine with a test failing when it takes too long because of an issue in the test itself (i.e. the code of the test), but timeouts often stem from the fact that tests rely on the test environment (classpath, additional services, the app initialization logic, the mocking framework, and so on). Fixing one particular test rarely fixes that.
Considering that the test execution order is not deterministic, warm-up becomes cumbersome and error-prone. Of course, this can be addressed by encapsulating all the potentially slow init into some abstraction with init memoization (a sketch is below), but that is quite a significant technical and API challenge. This is why I see a global baseline for the test timeout as crucial for supporting a wide range of projects with different needs.
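
A minimal sketch of that memoization idea, assuming mockk; the object and its members are hypothetical:

```kotlin
import io.mockk.mockk

// Each slow initialization is wrapped in a lazy, so it runs exactly once per
// classloader, no matter which test happens to execute first. Kotlin's lazy
// is thread-safe by default, covering parallel tests within one classloader;
// separate classloaders (as noted above) still each pay the cost.
object TestEnvironment {
    private val byteBuddyWarmUp by lazy { mockk<Runnable>(relaxed = true) }

    fun ensureMockingReady() {
        byteBuddyWarmUp // first access runs the lazy initializer
    }
}

// In tests that rely on mocking:
// @BeforeTest fun warmUp() = TestEnvironment.ensureMockingReady()
```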
