Export key rocksdb metrics via nodectrl /metrics endpoint #1073

Closed
Contributor

@AhmedSoliman AhmedSoliman commented Jan 15, 2024

Export key rocksdb metrics via nodectrl /metrics endpoint

This exports key tickers, histograms, and properties from RocksDB through the /metrics endpoint. The exported metrics are reported in the Prometheus exposition format.
Additionally, another HTTP endpoint, /rocksdb-stats, returns the raw RocksDB statistics output, which can be useful in performance investigations.

Note that RocksDB metric reporting is only triggered when the scraping endpoint is queried.
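
The scrape-triggered reporting described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `render_on_scrape` and the `read_counters` closure are hypothetical names, and the closure stands in for whatever reads tickers out of RocksDB.

```rust
use std::fmt::Write;

/// Renders counters in Prometheus text exposition format.
///
/// `read_counters` is a hypothetical stand-in for the code that pulls
/// tickers out of RocksDB. It is only invoked here, so the statistics
/// are read once per scrape and never in the background.
pub fn render_on_scrape<F>(read_counters: F) -> String
where
    F: Fn() -> Vec<(&'static str, u64)>,
{
    let mut out = String::new();
    for (name, value) in read_counters() {
        // Writing to a String cannot fail, so the results are ignored.
        let _ = writeln!(out, "# TYPE {} counter", name);
        let _ = writeln!(out, "{} {}", name, value);
        out.push('\n');
    }
    out
}
```

An HTTP handler for /metrics would call a function like this per request, so an idle node that nobody scrapes pays no cost for reading the statistics.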

The change requires changes in rust-rocksdb to support pulling individual tickers and histograms out of RocksDB; these are proposed in rust-rocksdb/rust-rocksdb#853.
Additionally, the rust-rocksdb interface doesn't make it easy to get memory usage information when using an optimistic transactional database (which we do). Another change, rust-rocksdb/rust-rocksdb#854, is needed to get access to that information.

Both changes are merged into a restate branch at https://github.com/restatedev/rust-rocksdb/tree/next. I updated Cargo to use that branch until the changes are merged upstream.

Test Plan:

> http localhost:5122/metrics | head -n 10



# TYPE invoker_invocation_task_started_total counter
invoker_invocation_task_started_total{rpc_service="CheckoutProcess"} 12

# TYPE invoker_invocation_task_failed_total counter
invoker_invocation_task_failed_total{rpc_service="CheckoutProcess",transient="true"} 12

# TYPE invoker_inflight_invocations_total gauge
invoker_inflight_invocations_total{rpc_service="CheckoutProcess"} 0

# TYPE rocksdb_memtable_miss_total counter
rocksdb_memtable_miss_total 2066

# TYPE rocksdb_bytes_read_total counter
rocksdb_bytes_read_total 1736

# TYPE rocksdb_bytes_written_total counter
rocksdb_bytes_written_total 0

# TYPE rocksdb_db_get_seconds summary
rocksdb_db_get_seconds{quantile="0.5"} 0.000006297482837528604
rocksdb_db_get_seconds{quantile="0.95"} 0.000019823636363636356
rocksdb_db_get_seconds{quantile="0.99"} 0.00012004999999999938
rocksdb_db_get_seconds{quantile="1.0"} 0.000667
rocksdb_db_get_seconds_sum 0.022774
rocksdb_db_get_seconds_count 2066

# TYPE rocksdb_db_write_seconds summary
rocksdb_db_write_seconds{quantile="0.5"} 0
rocksdb_db_write_seconds{quantile="0.95"} 0
rocksdb_db_write_seconds{quantile="0.99"} 0
rocksdb_db_write_seconds{quantile="1.0"} 0
rocksdb_db_write_seconds_sum 0
rocksdb_db_write_seconds_count 0

# TYPE rocksdb_db_seek_seconds summary
rocksdb_db_seek_seconds{quantile="0.5"} 0.000007813397129186603
rocksdb_db_seek_seconds{quantile="0.95"} 0.00002298974358974355
rocksdb_db_seek_seconds{quantile="0.99"} 0.000049989444444444286
rocksdb_db_seek_seconds{quantile="1.0"} 0.000573
rocksdb_db_seek_seconds_sum 0.032024
rocksdb_db_seek_seconds_count 3107

# TYPE rocksdb_db_multiget_seconds summary
rocksdb_db_multiget_seconds{quantile="0.5"} 0
rocksdb_db_multiget_seconds{quantile="0.95"} 0
rocksdb_db_multiget_seconds{quantile="0.99"} 0
rocksdb_db_multiget_seconds{quantile="1.0"} 0
rocksdb_db_multiget_seconds_sum 0
rocksdb_db_multiget_seconds_count 0

# TYPE rocksdb_bytes_per_write_bytes summary
rocksdb_bytes_per_write_bytes{quantile="0.5"} 0
rocksdb_bytes_per_write_bytes{quantile="0.95"} 0
rocksdb_bytes_per_write_bytes{quantile="0.99"} 0
rocksdb_bytes_per_write_bytes{quantile="1.0"} 0
rocksdb_bytes_per_write_bytes_sum 0
rocksdb_bytes_per_write_bytes_count 0

# TYPE rocksdb_bytes_per_read_bytes summary
rocksdb_bytes_per_read_bytes{quantile="0.5"} 0.5048899755501223
rocksdb_bytes_per_read_bytes{quantile="0.95"} 0.9592909535452323
rocksdb_bytes_per_read_bytes{quantile="0.99"} 0.999682151589242
rocksdb_bytes_per_read_bytes{quantile="1.0"} 106
rocksdb_bytes_per_read_bytes_sum 1736
rocksdb_bytes_per_read_bytes_count 2065

# TYPE rocksdb_num_immutable_mem_table_count gauge
rocksdb_num_immutable_mem_table_count 0

# TYPE rocksdb_mem_table_flush_pending_count gauge
rocksdb_mem_table_flush_pending_count 0

# TYPE rocksdb_compaction_pending_count gauge
rocksdb_compaction_pending_count 0

# TYPE rocksdb_background_errors_count gauge
rocksdb_background_errors_count 0

# TYPE rocksdb_cur_size_active_mem_table_bytes gauge
rocksdb_cur_size_active_mem_table_bytes 2048

# TYPE rocksdb_cur_size_all_mem_tables_bytes gauge
rocksdb_cur_size_all_mem_tables_bytes 2048

# TYPE rocksdb_size_all_mem_tables_bytes gauge
rocksdb_size_all_mem_tables_bytes 2048

# TYPE rocksdb_num_entries_active_mem_table_count gauge
rocksdb_num_entries_active_mem_table_count 0

# TYPE rocksdb_num_entries_imm_mem_tables_count gauge
rocksdb_num_entries_imm_mem_tables_count 0

# TYPE rocksdb_num_deletes_active_mem_table_count gauge
rocksdb_num_deletes_active_mem_table_count 0

# TYPE rocksdb_num_deletes_imm_mem_tables_count gauge
rocksdb_num_deletes_imm_mem_tables_count 0

# TYPE rocksdb_estimate_num_keys_count gauge
rocksdb_estimate_num_keys_count 0

# TYPE rocksdb_estimate_table_readers_mem_bytes gauge
rocksdb_estimate_table_readers_mem_bytes 0

# TYPE rocksdb_num_live_versions_count gauge
rocksdb_num_live_versions_count 1

# TYPE rocksdb_estimate_live_data_size_bytes gauge
rocksdb_estimate_live_data_size_bytes 0

# TYPE rocksdb_min_log_number_to_keep_count gauge
rocksdb_min_log_number_to_keep_count 240

# TYPE rocksdb_live_sst_files_size_bytes gauge
rocksdb_live_sst_files_size_bytes 0

# TYPE rocksdb_estimate_pending_compaction_bytes_bytes gauge
rocksdb_estimate_pending_compaction_bytes_bytes 0

# TYPE rocksdb_num_running_compactions_count gauge
rocksdb_num_running_compactions_count 0

# TYPE rocksdb_actual_delayed_write_rate_count gauge
rocksdb_actual_delayed_write_rate_count 0

# TYPE rocksdb_block_cache_capacity_bytes gauge
rocksdb_block_cache_capacity_bytes 33554432

# TYPE rocksdb_block_cache_usage_bytes gauge
rocksdb_block_cache_usage_bytes 87

# TYPE rocksdb_block_cache_pinned_usage_bytes gauge
rocksdb_block_cache_pinned_usage_bytes 87

# TYPE rocksdb_num_files_at_level0_count gauge
rocksdb_num_files_at_level0_count 0

# TYPE rocksdb_num_files_at_level1_count gauge
rocksdb_num_files_at_level1_count 0

# TYPE rocksdb_num_files_at_level2_count gauge
rocksdb_num_files_at_level2_count 0

# TYPE rocksdb_memory_approximate_cache_bytes gauge
rocksdb_memory_approximate_cache_bytes 87

# TYPE rocksdb_memory_approx_memtable_bytes gauge
rocksdb_memory_approx_memtable_bytes 18432

# TYPE rocksdb_memory_approx_memtable_unflushed_bytes gauge
rocksdb_memory_approx_memtable_unflushed_bytes 18432

# TYPE rocksdb_memory_approx_memtable_readers_bytes gauge
rocksdb_memory_approx_memtable_readers_bytes 8863
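
As a hedged illustration of how a latency summary like `rocksdb_db_get_seconds` above could be produced: RocksDB reports its latency histograms in microseconds, so the values need to be divided by 1,000,000 before being exposed in seconds. The `HistogramSnapshot` struct and `render_summary_seconds` function below are hypothetical names for this sketch and are not taken from the PR or from rust-rocksdb.

```rust
use std::fmt::Write;

/// Hypothetical snapshot of a RocksDB latency histogram, in microseconds.
pub struct HistogramSnapshot {
    pub median: f64,
    pub p95: f64,
    pub p99: f64,
    pub max: f64,
    pub sum: u64,
    pub count: u64,
}

/// Renders the snapshot as a Prometheus summary, converting microseconds to seconds.
pub fn render_summary_seconds(name: &str, h: &HistogramSnapshot) -> String {
    const MICROS_PER_SEC: f64 = 1_000_000.0;
    let mut out = String::new();
    let _ = writeln!(out, "# TYPE {} summary", name);
    let quantiles = [("0.5", h.median), ("0.95", h.p95), ("0.99", h.p99), ("1.0", h.max)];
    for (q, micros) in quantiles.iter() {
        let _ = writeln!(out, "{}{{quantile=\"{}\"}} {}", name, q, micros / MICROS_PER_SEC);
    }
    let _ = writeln!(out, "{}_sum {}", name, h.sum as f64 / MICROS_PER_SEC);
    // The count is a number of observations, so it is not scaled.
    let _ = writeln!(out, "{}_count {}", name, h.count);
    out
}
```

With a maximum of 667 and a sum of 22774 microseconds, this renders the same 0.000667 and 0.022774 values that appear in the output above.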


Stack created with Sapling. Best reviewed with ReviewStack.

@AhmedSoliman AhmedSoliman marked this pull request as ready for review January 15, 2024 16:08
@AhmedSoliman
Contributor Author

This fixes #63

github-actions bot commented Jan 15, 2024

Test Results

102 files  ±0  102 suites  ±0   11m 24s ⏱️ + 1m 23s
 93 tests ±0   93 ✅ ±0  0 💤 ±0  0 ❌ ±0 
232 runs  ±0  232 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 67d4e63. ± Comparison against base commit 6d2469c.

♻️ This comment has been updated with latest results.

@AhmedSoliman
Contributor Author

An example of how this can be used:
[image]
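
One way to consume this output outside a dashboard is to derive values from it, such as the mean get latency (sum divided by count). The sketch below is an illustration only: `parse_samples` and `mean_get_latency_seconds` are hypothetical helpers, and the parser handles just the simple `name value` lines shown in the test plan, not the full exposition format (for example, it does not handle timestamps).

```rust
use std::collections::HashMap;

/// Parses `name value` sample lines, skipping blank lines and `#` comments.
/// Label sets, if present, are kept as part of the key.
pub fn parse_samples(text: &str) -> HashMap<String, f64> {
    let mut samples = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // The value is whatever follows the last space on the line.
        if let Some((key, value)) = line.rsplit_once(' ') {
            if let Ok(v) = value.parse::<f64>() {
                samples.insert(key.trim().to_string(), v);
            }
        }
    }
    samples
}

/// Mean get latency in seconds, or None if no gets were recorded.
pub fn mean_get_latency_seconds(samples: &HashMap<String, f64>) -> Option<f64> {
    let sum = *samples.get("rocksdb_db_get_seconds_sum")?;
    let count = *samples.get("rocksdb_db_get_seconds_count")?;
    if count > 0.0 {
        Some(sum / count)
    } else {
        None
    }
}
```

For the sample in the test plan (0.022774 seconds over 2066 calls), this works out to roughly 11 microseconds per get.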

Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for creating this PR @AhmedSoliman. LGTM. +1 for merging.

crates/node-ctrl/src/handler.rs (outdated, resolved)
@tillrohrmann
Contributor

I'm not a Grafana expert: is it possible to export dashboards and provide them to interested users/colleagues to get started? I believe your dashboard could be a good start, @AhmedSoliman.

@AhmedSoliman
Contributor Author

Merged manually.

@AhmedSoliman AhmedSoliman deleted the pr1073 branch February 27, 2024 12:16