Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/common/util: Add possibility to set timeout for UCommand and UChild. Add rstest with timeout macro to dev dependencies #4231

Merged
merged 3 commits into from Dec 23, 2022

Conversation

Joining7943
Copy link
Contributor

This pr adds a builder method UCommand::timeout to set a timeout as Duration which is applied in UChild::wait_with_output and all dependent methods like UCommand:run, UChild::wait etc. The default timeout in UChild::kill can also be overwritten that way.

With rstest comes a timeout macro which can be used in test functions to complete the needs for setting timeouts in tests.

Closes #4226

@github-actions
Copy link

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! I wonder what the default timeout should be. Most commands should not take more than a couple of seconds and ideally those have all timeouts, but we don't want intermittent failures of course. The None that you currently set as default is probably safest.

@sylvestre what do you think of this change and what the default timeout should be?

@sylvestre
Copy link
Sponsor Contributor

10 seconds ?
we can always increase it if it causes too much issues

@Joining7943
Copy link
Contributor Author

Any consent regarding the timeout? I can also just try different versions with 10s, 30s, 60s or similar in the ci and then let's see...

I also want to add, that the #[timeout(...)] from rstest currently reports wrong error messages if the timeout wasn't reached but a panic is triggered within the test function. I posted an issue there and we've fixed it. However, the fix isn't released, yet.

@tertsdiepraam
Copy link
Member

10 seconds sounds good. Great work on the fix in rstest!

@Joining7943
Copy link
Contributor Author

Looks like clippy got updated and there are a lot of new warnings but they are unrelated to this pr as far as I can see.

I'll summarize the test failures from this run https://github.com/uutils/coreutils/actions/runs/3706672957

Summary of the test failures because of a timeout:

  • linux, windows, macos
    test_factor::test_parallel
    test_factor::test_random_big
    test_sort::test_merge_batches

How to deal with this? Set the timeout to something higher in the affected tests or increase the default timeout?

Summary of the test failures because of stack overflow:

  • windows
    test_printf::sub_any_specifiers_after_first_param
    test_seq::test_hex_identifier_in_wrong_place

This is most likely due to the wait in a separate thread instead of the main thread. The default stack size is 2MB afaik. So there are two possibilities: Increase the stack size of the waiting thread or interprete it that way, that we should allocate the affected variables content on the heap instead of the stack and fix this in printf and seq

Other maybe related test failures:

  • macos
---- test_pr::test_with_stdin stdout ----
current_directory_resolved: 
open: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/.tmpnHi5ad/stdin.log
run: /Users/runner/work/coreutils/coreutils/target/debug/coreutils pr --pages=1:2 -n -
---- test_pr::test_with_stdin stderr ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: IndeterminateOffset', tests/by-util/test_pr.rs:45:51
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

1 similar comment
@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

@tertsdiepraam
Copy link
Member

How to deal with this? Set the timeout to something higher in the affected tests or increase the default timeout?

I'd say increase the timeout for these specific tests. Not sure about the other failures.

@Joining7943
Copy link
Contributor Author

I'd say increase the timeout for these specific tests. Not sure about the other failures.

Ok, I'll set a timeout of 30s for the tests, which are timing out and then let's see. But, maybe we should do both? I'm not sure if we're running into more intermittent timeout failures with the 10s timeout, especially on macos. macos in the ci is sometimes just slow and a simple echo can take 1-2s.

Regarding the stack size of the wait thread, I've increased it testwise to 8MB. I'm not sure, but I think ulimit -s shows the default stack size of the main thread on linux hosts and it's 8MB on my system.

Here's a quick update from the last run.

The following failures of this run https://github.com/uutils/coreutils/actions/runs/3707853030 add to the existing ones above

Timeout related failures

  • macos, windows
    test_install::test_install_and_strip
    test_sort::test_merge_batches

Other maybe related failures

  • macos
---- test_touch::test_touch_set_mdhms_time stdout ----
current_directory_resolved: 
run: /Users/runner/work/coreutils/coreutils/target/debug/coreutils touch -t 01011234.56 test_touch_set_mdhms_time
---- test_touch::test_touch_set_mdhms_time stderr ----
thread 'main' panicked at 'Error The system's UTC offset could not be determined retrieving the OffsetDateTime::now_local', tests/by-util/test_touch.rs:55:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- test_touch::test_touch_leap_second stdout ----
current_directory_resolved: 
run: /home/runner/work/coreutils/coreutils/target/debug/coreutils touch -t 197001010000.60 test_touch_leap_sec
---- test_touch::test_touch_leap_second stderr ----
thread 'main' panicked at 'Error The system's UTC offset could not be determined retrieving the OffsetDateTime::now_local', tests/by-util/test_touch.rs:55:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@tertsdiepraam
Copy link
Member

But, maybe we should do both?

Yeah if it causes intermittent failures then it should be increased.

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!
GNU test failed: tests/misc/timeout. tests/misc/timeout is passing on 'main'. Maybe you have to rebase?

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

2 similar comments
@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

@Joining7943
Copy link
Contributor Author

I'm pretty sure, that I could fix the intermittent stack overflow errors occurring on windows and the OffsetDateTime errors occuring on macos. I didn't join the wait thread with the main thread even though it would have been possible without errors. After changing to join where possible, there were no such errors anymore in the last 10 or so ci runs. I guess the errors disappeared because the join frees the resources held by the thread(s). Before the change, these errors occurred in almost every ci run, so I think we're on the safe side.

I've also increased the default timeout to 30s, because there were intermittent timeout failures with 10s.

I'm not sure how to deal with timeout being an Option. If there is no plan to disable the timeout entirely for example with a UCommand::disable_timeout, the timeout field could actually be a plain Duration, since we set a default timeout.

I had some additional ideas regarding runtimes of tests and commands (not necessarily meant to be implemented within this pr). It's sometimes hard to figure out how long a single run takes. So, we could record the runtime of each process in UChild and then for example

  • create kind of a report for tests how long the runs of each invoked UCommand::run, run_no_wait etc. took. This could be triggered for example via an env variable like UUTILS_TEST_SHOW_RUNTIMES for all tests or with a builder method in UCommand for single runs.
  • make assertions in CmdResult about the runtime of a single run. This could be useful in performance related tests or just helps figuring out how long a run takes and set the timeout appropriately.

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

…hods in UCommand and UChild.

Additional notes:
* Set default timeout to 30s
* Fix UChild::kill to always fail on timeout if zero
@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/misc/timeout is no longer failing!

@Joining7943
Copy link
Contributor Author

What do you think? The pr would be ready, so far.

@tertsdiepraam
Copy link
Member

Yeah looks good! I'm rerunning the android job to see whether it can succeed, but regardless of that result we can merge this in my opinion.

@tertsdiepraam tertsdiepraam merged commit bdd6f56 into uutils:main Dec 23, 2022
@Joining7943 Joining7943 deleted the add-timeout-in-tests branch January 2, 2023 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tests: Add possibility to set timeout for tests.
3 participants