Support for running criterion benches as tests? #96

Closed · jszwedko opened this issue Mar 4, 2022 · 8 comments
jszwedko commented Mar 4, 2022

Hi all!

I'm a big fan of what this project is doing.

I noticed, while trying to integrate this into https://github.com/vectordotdev/vector, that it fails to run test binaries built from Criterion benchmarks, since those binaries don't support the same --format flag that normal test binaries do:

```
cargo nextest run --workspace --no-fail-fast --no-default-features --features "default metrics-benches codecs-benches language-benches remap-benches statistic-benches dnstap-benches benches"
    Finished test [unoptimized + debuginfo] target(s) in 1.31s
error: Found argument '--format' which wasn't expected, or isn't valid in this context

USAGE:
    limit-2c27c6bee8522ca1 --list

For more information try --help
Error:
   0: error building test list
   1: running ''/Users/jesse.szwedko/workspace/vector/target/debug/deps/limit-2c27c6bee8522ca1 --list --format terse'' failed
   2: command ["/Users/jesse.szwedko/workspace/vector/target/debug/deps/limit-2c27c6bee8522ca1", "--list", "--format", "terse"] exited with code 1

Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
make: *** [test] Error 1
```

Running --help on the test binary:

```
Criterion Benchmark

USAGE:
    limit-2c27c6bee8522ca1 [FLAGS] [OPTIONS] [FILTER]

FLAGS:
    -h, --help       Prints help information
        --list       List all benchmarks
    -n, --noplot     Disable plot and HTML generation.
    -v, --verbose    Print additional statistical information.

OPTIONS:
    -b, --baseline <baseline>                        Compare to a named baseline.
    -c, --color <color>
            Configure coloring of output. always = always colorize output, never = never colorize output, auto =
            colorize output if output is a tty and compiled for unix. [default: auto]  [possible values: auto, always,
            never]
        --confidence-level <confidence-level>        Changes the default confidence level for this run. [default: 0.95]
        --load-baseline <load-baseline>              Load a previous baseline instead of sampling new data.
        --measurement-time <measurement-time>        Changes the default measurement time for this run. [default: 5]
        --noise-threshold <noise-threshold>          Changes the default noise threshold for this run. [default: 0.01]
        --nresamples <nresamples>
            Changes the default number of resamples for this run. [default: 100000]

        --output-format <output-format>
            Change the CLI output format. By default, Criterion.rs will use its own format. If output format is set to
            'bencher', Criterion.rs will print output in a format that resembles the 'bencher' crate. [default:
            criterion]  [possible values: criterion, bencher]
        --plotting-backend <plotting-backend>
            Set the plotting backend. By default, Criterion.rs will use the gnuplot backend if gnuplot is available, or
            the plotters backend if it isn't. [possible values: gnuplot, plotters]
        --profile-time <profile-time>
            Iterate each benchmark for approximately the given number of seconds, doing no analysis and without storing
            the results. Useful for running the benchmarks in a profiler.
        --sample-size <sample-size>                  Changes the default size of the sample for this run. [default: 100]
    -s, --save-baseline <save-baseline>              Save results under a named baseline. [default: base]
        --significance-level <significance-level>
            Changes the default significance level for this run. [default: 0.05]

        --warm-up-time <warm-up-time>                Changes the default warm up time for this run. [default: 3]

ARGS:
    <FILTER>    Skip benchmarks whose names do not contain FILTER.


This executable is a Criterion.rs benchmark.
See https://github.com/bheisler/criterion.rs for more details.

To enable debug output, define the environment variable CRITERION_DEBUG.
Criterion.rs will output more debug information and will save the gnuplot
scripts alongside the generated plots.

To test that the benchmarks work, run `cargo test --benches`

NOTE: If you see an 'unrecognized option' error using any of the options above, see:
https://bheisler.github.io/criterion.rs/book/faq.html
```

I was just curious to get thoughts on handling this. Should I stick with normal `cargo test --benches` for that target for now and use nextest for the other targets?

sunshowers commented Mar 4, 2022

Thanks for reporting this! I think the easiest thing to do would be to make criterion support the --format terse option and output lines in the format nextest accepts: https://nexte.st/book/custom-test-harnesses.html
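
For anyone picking this up, here's a minimal sketch of what the listing side of that protocol could look like in a custom harness. The grounded parts are the `--list --format terse` invocation (visible in the error output above) and the `<name>: test` / `<name>: bench` line format from the linked page; the test names and the run path are hypothetical placeholders:

```rust
// Minimal sketch of a custom test harness's `main`, assuming the protocol
// described at https://nexte.st/book/custom-test-harnesses.html.
use std::env;

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    // Hypothetical set of tests this binary knows how to run.
    let tests = ["limit_one", "limit_many"];

    if args.iter().any(|a| a == "--list") {
        // nextest invokes `<binary> --list --format terse` and expects one
        // `<name>: test` line per runnable test; lines ending in `: bench`
        // are currently skipped.
        for name in tests {
            println!("{name}: test");
        }
        return;
    }

    // Non-`--list` invocation: treat the first non-flag argument as a name
    // filter and run whatever matches (greatly simplified).
    let filter = args.iter().find(|a| !a.starts_with("--")).cloned();
    for name in tests {
        if filter.as_deref().map_or(true, |f| name.contains(f)) {
            println!("running {name}");
            // ...call the actual test function here...
        }
    }
}
```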

Note that following the standard Rust benchmark library, lines ending in ": bench" are ignored. We may want to add an option to run benchmarks as tests if that's a desired use case (is that what you want?)

jszwedko commented Mar 4, 2022

Thanks for the response! That makes sense, I missed that section in the book. I'll open an issue over on criterion to see if they'd be open to adding those flags.

> Note that following the standard Rust benchmark library, lines ending in ": bench" are ignored. We may want to add an option to run benchmarks as tests if that's a desired use case (is that what you want?)

Yeah, we've had cases where benches were accidentally broken, so we started running them as tests in CI; that seems to run just one iteration of each benchmark (for Criterion at least) and lets us validate that no assertions failed. It would be great if nextest supported them, but I can run `cargo test --benches -- --bench` separately for now.
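
For context, a Criterion bench target is an ordinary binary with the default harness disabled (`harness = false` under `[[bench]]` in Cargo.toml), so the one-iteration test-mode run exercises whatever assertions the benchmark body contains. A minimal sketch, with a made-up benchmark name and workload:

```rust
// benches/limit.rs -- hypothetical Criterion benchmark. Under
// `cargo test --benches`, Criterion runs this in test mode (one quick
// iteration) instead of doing a full statistical run.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_limit(c: &mut Criterion) {
    c.bench_function("limit", |b| {
        b.iter(|| {
            // Hypothetical workload; the assertion is what a test-mode
            // run in CI would catch if it started failing.
            let total: u64 = (0..1_000u64).map(black_box).sum();
            assert_eq!(total, 499_500);
            total
        })
    });
}

criterion_group!(benches, bench_limit);
criterion_main!(benches);
```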

jszwedko added a commit to vectordotdev/vector that referenced this issue Mar 4, 2022

`nextest` doesn't work with criterion benches:

nextest-rs/nextest#96

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
sunshowers commented:
There's also #38 which might be useful.

jszwedko added a commit to vectordotdev/vector that referenced this issue Mar 8, 2022
* chore: Try out nextest in CI

Starting with flakey test retries.

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Split up benches and non-benches tests

`nextest` doesn't work with criterion benches:

nextest-rs/nextest#96

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Avoid testing benches with nextest

Since criterion doesn't support the flags mentioned on

https://nexte.st/book/custom-test-harnesses.html

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Fix cargo nextest installation

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Run docs tests too since nextest doesn't support them

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Fix docs test

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Don't test benches since there isn't a way to test just them

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Install nextest for Windows

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Fix CLI test running

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
sunshowers commented:
I think we can support this with a --run-benches option, similar to --run-ignored today.

sunshowers commented Apr 19, 2022

I looked at this a bit more, and it seems like criterion binaries built with cargo test don't behave correctly when passed --list:

```
/home/rain/dev/cargo-guppy/target/debug/deps/package_graph-4455225dc7640272 --list
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
Testing make_package_graph
Success

Testing depends_on
Success

Testing depends_on_cache
Success

Testing into_ids
Success

Testing resolve_package_name
Success

Testing make_package_name_hashmap
Success

Testing make_cycles
Success
```

I think this will need to be fixed in criterion first -- marking this "help wanted" in case someone wants to pick up the work in criterion. See https://nexte.st/book/custom-test-harnesses for more documentation. (Note that we also accept lines ending with ": bench" and currently skip them, but we may choose not to do that in the future.)
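
For comparison, when invoked with `--list --format terse`, a harness that follows the protocol would emit one line per entry rather than the "Testing .../Success" output above; using the names from that run, a compliant listing might look something like:

```
make_package_graph: test
depends_on: test
depends_on_cache: test
into_ids: test
```

(`: bench` suffixes would also be accepted but, as noted, nextest currently skips those lines.)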

sunshowers added the "help wanted" label on Apr 19, 2022
jszwedko commented:
Thanks for following up, @sunshowers! I agree bheisler/criterion.rs#562 will need to be resolved first.

sunshowers commented:
bheisler/criterion.rs#602 addresses this.

sunshowers commented:
Criterion 0.5, which came out a few days ago, supports nextest.
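
For anyone landing here now: with Criterion 0.5+, bench targets should be listable and runnable through nextest's usual cargo-style target selection, e.g. something like:

```
cargo nextest run --benches
```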
