Support for running criterion benches as tests? #96

Closed · jszwedko opened this issue Mar 4, 2022 · 8 comments
jszwedko commented Mar 4, 2022

Hi all!

I'm a big fan of what this project is doing.

I noticed, while trying to integrate this into https://github.com/vectordotdev/vector, that it fails to run test binaries built from Criterion benchmarks, since those binaries don't support the same --format flag that normal test binaries do:

```
cargo nextest run --workspace --no-fail-fast --no-default-features --features "default metrics-benches codecs-benches language-benches remap-benches statistic-benches dnstap-benches benches"
    Finished test [unoptimized + debuginfo] target(s) in 1.31s
error: Found argument '--format' which wasn't expected, or isn't valid in this context

USAGE:
    limit-2c27c6bee8522ca1 --list

For more information try --help
Error:
   0: error building test list
   1: running ''/Users/jesse.szwedko/workspace/vector/target/debug/deps/limit-2c27c6bee8522ca1 --list --format terse'' failed
   2: command ["/Users/jesse.szwedko/workspace/vector/target/debug/deps/limit-2c27c6bee8522ca1", "--list", "--format", "terse"] exited with code 1

Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
make: *** [test] Error 1
```

Running --help on the test binary:

```
Criterion Benchmark

USAGE:
    limit-2c27c6bee8522ca1 [FLAGS] [OPTIONS] [FILTER]

FLAGS:
    -h, --help       Prints help information
        --list       List all benchmarks
    -n, --noplot     Disable plot and HTML generation.
    -v, --verbose    Print additional statistical information.

OPTIONS:
    -b, --baseline <baseline>                        Compare to a named baseline.
    -c, --color <color>
            Configure coloring of output. always = always colorize output, never = never colorize output, auto =
            colorize output if output is a tty and compiled for unix. [default: auto]  [possible values: auto, always,
            never]
        --confidence-level <confidence-level>        Changes the default confidence level for this run. [default: 0.95]
        --load-baseline <load-baseline>              Load a previous baseline instead of sampling new data.
        --measurement-time <measurement-time>        Changes the default measurement time for this run. [default: 5]
        --noise-threshold <noise-threshold>          Changes the default noise threshold for this run. [default: 0.01]
        --nresamples <nresamples>
            Changes the default number of resamples for this run. [default: 100000]

        --output-format <output-format>
            Change the CLI output format. By default, Criterion.rs will use its own format. If output format is set to
            'bencher', Criterion.rs will print output in a format that resembles the 'bencher' crate. [default:
            criterion]  [possible values: criterion, bencher]
        --plotting-backend <plotting-backend>
            Set the plotting backend. By default, Criterion.rs will use the gnuplot backend if gnuplot is available, or
            the plotters backend if it isn't. [possible values: gnuplot, plotters]
        --profile-time <profile-time>
            Iterate each benchmark for approximately the given number of seconds, doing no analysis and without storing
            the results. Useful for running the benchmarks in a profiler.
        --sample-size <sample-size>                  Changes the default size of the sample for this run. [default: 100]
    -s, --save-baseline <save-baseline>              Save results under a named baseline. [default: base]
        --significance-level <significance-level>
            Changes the default significance level for this run. [default: 0.05]

        --warm-up-time <warm-up-time>                Changes the default warm up time for this run. [default: 3]

ARGS:
    <FILTER>    Skip benchmarks whose names do not contain FILTER.


This executable is a Criterion.rs benchmark.
See https://github.com/bheisler/criterion.rs for more details.

To enable debug output, define the environment variable CRITERION_DEBUG.
Criterion.rs will output more debug information and will save the gnuplot
scripts alongside the generated plots.

To test that the benchmarks work, run `cargo test --benches`

NOTE: If you see an 'unrecognized option' error using any of the options above, see:
https://bheisler.github.io/criterion.rs/book/faq.html
```

I was just curious to get thoughts on handling this. Should I stick with normal `cargo test --benches` for that target for now and use nextest for the other targets?

sunshowers commented Mar 4, 2022

Thanks for reporting this! I think the easiest thing to do would be to make criterion support the --format terse option and output lines in the format nextest accepts: https://nexte.st/book/custom-test-harnesses.html
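
For anyone picking this up, here's a minimal sketch of what the listing side of that protocol could look like in a custom harness. The grounded parts are the `--list --format terse` invocation (visible in the error output above) and the `<name>: test` / `<name>: bench` line format from the linked page; the test names and the run path are hypothetical placeholders:

```rust
// Minimal sketch of a custom test harness's `main`, assuming the protocol
// described at https://nexte.st/book/custom-test-harnesses.html.
use std::env;

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    // Hypothetical set of tests this binary knows how to run.
    let tests = ["limit_one", "limit_many"];

    if args.iter().any(|a| a == "--list") {
        // nextest invokes `<binary> --list --format terse` and expects one
        // `<name>: test` line per runnable test; lines ending in `: bench`
        // are currently skipped.
        for name in tests {
            println!("{name}: test");
        }
        return;
    }

    // Non-`--list` invocation: treat the first non-flag argument as a name
    // filter and run whatever matches (greatly simplified).
    let filter = args.iter().find(|a| !a.starts_with("--")).cloned();
    for name in tests {
        if filter.as_deref().map_or(true, |f| name.contains(f)) {
            println!("running {name}");
            // ...call the actual test function here...
        }
    }
}
```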

Note that following the standard Rust benchmark library, lines ending in ": bench" are ignored. We may want to add an option to run benchmarks as tests if that's a desired use case (is that what you want?)

jszwedko commented Mar 4, 2022

Thanks for the response! That makes sense, I missed that section in the book. I'll open an issue over on criterion to see if they'd be open to adding those flags.

> Note that following the standard Rust benchmark library, lines ending in ": bench" are ignored. We may want to add an option to run benchmarks as tests if that's a desired use case (is that what you want?)

Yeah, we've had cases where benches were accidentally broken, so we started running them as tests in CI; that seems to run just one iteration of each benchmark (for Criterion at least) and lets us validate that no assertions failed. It would be great if nextest supported them, but I can run `cargo test --benches -- --bench` separately for now.
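
For context, a Criterion bench target is an ordinary binary with the default harness disabled (`harness = false` under `[[bench]]` in Cargo.toml), so the one-iteration test-mode run exercises whatever assertions the benchmark body contains. A minimal sketch, with a made-up benchmark name and workload:

```rust
// benches/limit.rs -- hypothetical Criterion benchmark. Under
// `cargo test --benches`, Criterion runs this in test mode (one quick
// iteration) instead of doing a full statistical run.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_limit(c: &mut Criterion) {
    c.bench_function("limit", |b| {
        b.iter(|| {
            // Hypothetical workload; the assertion is what a test-mode
            // run in CI would catch if it started failing.
            let total: u64 = (0..1_000u64).map(black_box).sum();
            assert_eq!(total, 499_500);
            total
        })
    });
}

criterion_group!(benches, bench_limit);
criterion_main!(benches);
```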

jszwedko added a commit to vectordotdev/vector that referenced this issue Mar 4, 2022

`nextest` doesn't work with criterion benches:

nextest-rs/nextest#96

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
sunshowers commented:
There's also #38 which might be useful.

jszwedko added a commit to vectordotdev/vector that referenced this issue Mar 8, 2022
* chore: Try out nextest in CI

Starting with flakey test retries.

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Split up benches and non-benches tests

`nextest` doesn't work with criterion benches:

nextest-rs/nextest#96

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Avoid testing benches with nextest

Since criterion doesn't support the flags mentioned on

https://nexte.st/book/custom-test-harnesses.html

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Fix cargo nextest installation

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Run docs tests too since nextest doesn't support them

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Fix docs test

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Don't test benches since there isn't a way to test just them

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Install nextest for Windows

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

* Fix CLI test running

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
sunshowers commented:
I think we can support this with a --run-benches option, similar to --run-ignored today.

sunshowers commented Apr 19, 2022

I looked at this a bit more, and it seems like criterion binaries built with cargo test don't behave correctly when passed --list:

```
/home/rain/dev/cargo-guppy/target/debug/deps/package_graph-4455225dc7640272 --list
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
Testing make_package_graph
Success

Testing depends_on
Success

Testing depends_on_cache
Success

Testing into_ids
Success

Testing resolve_package_name
Success

Testing make_package_name_hashmap
Success

Testing make_cycles
Success
```

I think this will need to be fixed in criterion first -- marking this "help wanted" in case someone wants to pick up the work in criterion. See https://nexte.st/book/custom-test-harnesses for more documentation. (Note that we also accept lines ending with ": bench" and currently skip them, but we may choose not to do that in the future.)
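
For comparison, when invoked with `--list --format terse`, a harness that follows the protocol would emit one line per entry rather than the "Testing .../Success" output above; using the names from that run, a compliant listing might look something like:

```
make_package_graph: test
depends_on: test
depends_on_cache: test
into_ids: test
```

(`: bench` suffixes would also be accepted but, as noted, nextest currently skips those lines.)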

sunshowers added the "help wanted" label on Apr 19, 2022
jszwedko commented:
Thanks for following up, @sunshowers! I agree bheisler/criterion.rs#562 will need to be resolved first.

sunshowers commented:
bheisler/criterion.rs#602 addresses this.

sunshowers commented:
Criterion 0.5, which came out a few days ago, supports nextest.
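
For anyone landing here now: with Criterion 0.5+, bench targets should be listable and runnable through nextest's usual cargo-style target selection, e.g. something like:

```
cargo nextest run --benches
```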
