Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
First iteration of streaming PromQL engine (#7693)
* Use forked Prometheus module * Use promql.QueryEngine rather than *promql.Engine, add config option to select engine * Add parallel query path with streaming engine. * Add IntelliJ debugging configuration for streaming querier. * Add Grafana datasource for streaming querier * Use correct query-scheduler port and enable debugging * Set up load test. * Fix p99 latency panel * Commit settings used in previous test. * Enable chunks streaming. * Disable load generator. * Create separate load testing query paths to ensure meta-monitoring queries don't interfere with load test results. * Add initial k6-based load test script * Add script to run load tests against both engines. * Add todo * Extract friendly name to hostname logic to separate script * Rename k6 output file * Check both engines produce the same result. * Summarise results after each test run. * Enable streaming chunks from store-gateways to queriers. * Refactor test script, print max and average CPU and memory after test * Reorder variables to match output order * Don't include query name in k6 output file name. * Add initial script to generate data. * Rename script * Add script to generate data generation config. * Make it easier to test different queries. * Rename PromQL engine CLI flag. * Add Thanos' engine. * Add Thanos engine to benchmarking setup. * Set default test duration to 5 minutes. * Rework data generation setup to work around limitation in prometheus-toolbox. * Load test data faster. * Record test config for first test * Record test config for second test * Record test config for third test * Disable gRPC compression for ingester client * Record test config for re-run of second test * Record test config for re-run of first test * Record test config for re-run of third test * Log URL used for test. * Record test config for fourth test * Record test config for fifth test * Record test config for 10k series test * Record test config for 1k series test * Record test config for 50k series test * Fix configuration issue that does not affect outcome of testing. This does not affect the outcome of testing because all tests use queriers directly (ie. do not use query-frontends or query-schedulers) * Record test config for seventh test * Move production code to main Mimir code tree * Copy test files into repo * Add license and provenance comments * Add a readme. * Update package path in scripts and remove unnecessary testing script * Add more notes to code * Explain benefit of streaming engine. * Mention benchmarking tools in readme. * Remove references to Thanos' engine. * Undo unnecessary go.mod changes * Remove benchmarking tool for now We can add something more robust in the future if we need it. * Remove docker-compose changes * Reset whitespace changes * Remove readme * Vendor in branch of mimir-prometheus with changes from prometheus/prometheus#13713 * Fix issues introduced with recent Prometheus changes and fix linting issues * Detect and reject configuration not supported by streaming engine * Remove TODOs tracked elsewhere * Bring in latest changes to `QueryEngine` interface * Remove more TODOs tracked elsewhere * Change signature of `Next()` to return an error when the stream is exhausted * Ignore warnings when comparing results from Prometheus and streaming engines. * Correct naming * Rename `Operator` to clarify that it produces instant vectors. * Rename test file * Reorganise test file * Bring in latest benchmark changes from upstream. * Run tests and benchmarks with isolation disabled, to mirror what Mimir itself does * Fix script permissions * Simplify test setup and don't bother testing 2000 series case to speed up tests. * Use assertion helpers in benchmark * Improve engine benchmark comparison script: fix issue with package name containing "streaming", clearly label results * Remove unnecessary files * Add note to engine benchmark comparison script * Make test names clearer * Ensure both engines produce the same result before benchmarking * Add tests for unsupported expressions and improve error messages returned when unsupported expression used. * Fix linting warnings * Add notes for invalid cases that should be caught by the PromQL parser. * Implement `Query.Statement()` * Give variable a better name * Remove Pool interface * Remove unnecessary types * Update comments * Remove unused lookback delta for range vector selector * Extract selector logic to a shared type * Add more TODOs * Use Selector for range vector selectors. * Compute values used on every call to Next() once in instant and range vector selectors. This gives a 0-7% latency improvement for queries with many series at small absolute cost to queries with few series. Memory consumption is slightly higher (tens of bytes), but not concerning. * Remove unnecessary script and update comment * Validate query time range * Enforce that range query expression produces a vector or scalar result. * Add CLI flag to experimental features list in docs * Add changelog entry * Fix linting warnings * Move common PromQL engine configuration to its own package * Don't bother with 10 series benchmarks. They're not very interesting, and the benchmarks take too long to run. * Sort imports * Add missing license header. * Use bucketed pool based on zeropool. This greatly improves latency and allocations, with latency improvements between 0 and 15% on our benchmarks. * Fix import sorting * Set upper limits for sizes of pooled slices. * Set size for SeriesBatch pool * Add integration test. * Move RingBuffer out of util package and pool slices used. This improves latency of rate-related benchmarks by between 0 and 11%. * Add tests for ring buffer type. * Return RingBuffer point slices to pool when no longer required. * Fix linting * Remove TODOs about pooling group slices and maps. Pooling these slices did not have a substantial performance impact. * Pool groups in aggregation operator. This reduces latency of aggregation operations by 0-2% in our benchmarks. * Remove TODOs about pooling label builders. Pooling these had a negative performance impact. * Remove more TODOs, clarify interface expectations * Release memory sooner after query execution * Fix typo in comment * Remove unecessary context cancellation checks from operators. Context cancellation is handled by the underlying queryable, at least for the PromQL expressions we currently support. * Fix linting warnings * Address remaining TODOs in Selector * Fix integration test flakiness * Don't panic when encountering a native histogram * Fix issue where bucketed pool could return a slice with capacity less than the requested size. * Remove redundant checks * Add support for @ in instant and range vector selectors. This is used extensively by Prometheus' test scripts, so it's easier to just implement support for this rather than try to workaround it or disable it. * Run Prometheus' test cases against streaming engine. There are still some failing test cases, but these appear to be genuine issues that need to be investigated. * Fix failing staleness test: don't emit empty series for instant query * Upgrade mimir-prometheus * Add test case for stale markers in a range query. * Add test cases for @ modifier with range queries * Check for errors while reading test script * Fix whitespace * Remove duplicate timestamp computation in Selector and operators * Update `rate` calculation to match behaviour in prometheus/prometheus#13725 * Add an overview of the engine. * Fix linting issues and clarify docs. * Add test case for scenario where input to aggregation returns no results or no points, and remove stale TODOs * Implement `Query.String()`, and remove stale TODOs. * Clarify BucketedPool contract, and add test case for case where min size is greater than 1. * Expand test coverage to include scenarios not covered by upstream tests or those uniquely likely to cause issues in the streaming engine. * Elaborate on comment * Expand example in docs and fix typo * Remove unnecessary sorting and use non-experimental slices package in comparison tests. * Add missing comments. * Clean up temporary files in benchmark comparison script, and improve formatting of output * Add note explaining purpose of script. * Address PR feedback * Replace use of `DurationMilliseconds` with `d.Milliseconds()` * Rename `Series()` to `SeriesMetadata()` * Include names of acceptable values in CLI flag description * Remove TODOs * Add explanation for RangeVectorSelectorWithTransformation * Move to top-level `streamingpromql` package * Move common query engine config back to original location (revert e3933a2)
- Loading branch information