Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connection: expose setters for self-identity STARTUP options #1039

Merged
merged 13 commits into from
Aug 3, 2024

Conversation

wprzytula
Copy link
Collaborator

Resolves: #1028

Motivation

See #1028.

What's done

Preparatory steps

Reduce allocations by using Cow<str> for SUPPORTED/STARTUP options

  • types::write_string_map was genericised to accept HashMaps with keys and values being any form of strings, including Cow<str>.
  • as majority of keys and significant number of values of STARTUP options are known at compile time and thus statically allocated, Startup request now holds HashMap<Cow<str>, Cow<str> instead of HashMap<String, String> to reduce needless allocations.

refactors for readability

  • setup_tcp_keepalive() was extracted from Connection::new() to reduce clutter.
  • keys of SUPPORTED/STARTUP options were extracted as constants to options module to have them in one place and avoid using plain string literals in code.
  • resolved a dated TODO about whether logic in open_named_connection should be part of Connection::new - answered the question, justified the answer and added docstrings to convey that answer.
  • added many comments to open_named_connection, as the function is big and crucial for driver operation, so future readers will benefit.

Compression-related small enhancements

  • Compression name is now no longer allocated, as its name is a &'static str anyway.
  • warning is now emitted if requested Compression is not supported; before, it would silently fall back to no compression.

SelfIdentity struct and API

  • grouped self-identity-related STARTUP options in SelfIdentity struct. It includes:

    • DRIVER_NAME
    • DRIVER_VERSION
    • APPLICATION_NAME
    • APPLICATION_VERSION
    • CLIENT_ID

    DRIVER_NAME and DRIVER_VERSION are set by default to "ScyllaDB Rust Driver" (previously "scylla-rust-driver", changed to be consistent with "ScyllaDB Python Driver") and Rust driver's package version as declared by Cargo. The remaining ones are unset by default.

  • exposed SelfIdentity's API in SessionBuilder, and added a docstring with an example usage.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

Sorry, something went wrong.

@wprzytula wprzytula requested review from Lorak-mmk and muzarski and removed request for Lorak-mmk July 10, 2024 15:55
@wprzytula wprzytula added enhancement New feature or request cpp-rust-driver-p0 Functionality required by cpp-rust-driver labels Jul 10, 2024
@wprzytula wprzytula self-assigned this Jul 10, 2024
Copy link

github-actions bot commented Jul 10, 2024

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: 9aadac8

@Lorak-mmk Lorak-mmk self-assigned this Jul 10, 2024
@wprzytula wprzytula force-pushed the self-identity-customisation branch from 6ab7774 to bffad0d Compare July 10, 2024 16:10
Copy link
Contributor

@muzarski muzarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job on adding comments to connection module. I always feel like I'm lost trying to understand the logic that resides there.

The commit message mentions:

Settable identity include:
- DRIVER_NAME
- DRIVER_VERSION
- APPLICATION_NAME
- APPLICATION_VERSION
- CLIENT_ID

All five are visible in Cassandra's `system_views.clients` in
`client_options` column.

DRIVER_NAME & DRIVER_VERSION are visible in ScyllaDB's `system.clients`.

Maybe we should add some test cases that verify it? Especially for the newly introduced STARTUP options.

@Lorak-mmk
Copy link
Collaborator

Where does APPLICATION_NAME and APPLICATION_VERSION come from?
PR description reads like those attributes are as supported and known as DRIVER_NAME / DRIVER_VERSION, but:

Please explain in the PR description (and maybe commit message too):

  • Why support those specific keys instead of arbitrary keys
  • The difference in support between driver variables and application variables
  • Where those application variables come from.

scylla-cql/src/frame/request/options.rs Outdated Show resolved Hide resolved
Comment on lines +1620 to +1795
tracing::warn!(
"Requested compression <{}> is not supported by the cluster. Falling back to no compression",
compression_str
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when can this happen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may happen if SUPPORTED message sent by a node does not contain the particular compression string in the value for the "COMPRESSION" key.

scylla/src/transport/session_builder.rs Outdated Show resolved Hide resolved
scylla/src/transport/connection.rs Show resolved Hide resolved
scylla/src/transport/connection.rs Outdated Show resolved Hide resolved
@Lorak-mmk Lorak-mmk removed their assignment Jul 31, 2024
This enables serializing maps with non-necessarily owned kinds of
string, e.g. HashMap<&str, &str> or HashMap<Cow<str>, Cow<str>>.
Majority of, if not all, keys sent in the string map in STARTUP message
by the driver are known in the compile time. Some values may not be
known at compile time, but still it makes more sense to borrow them from
a String rather than clone each time a connection is opened. To reduce
needless allocations while preserving ability to add keys and values
dynamically, the HashMap type is changed to
`HashMap<Cow<'a, str>, Cow<'a, str>>`.
This is going to reduce number of allocations upon opening each
connection.
The tcp keepalive logic cluttered Connection::new(). I extracted it to
a separate function.
@wprzytula wprzytula force-pushed the self-identity-customisation branch from bffad0d to a8ab3f3 Compare August 1, 2024 10:03
@wprzytula
Copy link
Collaborator Author

Rebased on main.

As these strings bear strict meaning specified by the CQL protocol,
it makes sense to have them in one place as constants.
Before, `scylla-rust-driver` name was used. First, we should move to
to ScyllaDB because of the established naming. Second, Python driver
advertises itself as "ScyllaDB Python Driver", so let's be consistent.
Answering the question there: no, that logic shouldn't be in
`Connection::new`. `Connection::new`'s responsibility is to create a
connection and make it ready to send CQL frames. OTOH,
`open_named_connection` is responsible for connection setup on CQL
level, i.e. for OPTIONS/STARTUP handshake and registering for events
(if this is a control connection).

Informative docstrings are added that convey the above separation.
Compression's name is known in compile time, so let's not allocate it
needlessly.
Before, the driver would silently fall back to no compression.
The function is quite big and at the same time crucial for driver's
operation, so comments there will bring more ease to future readers.
@wprzytula
Copy link
Collaborator Author

Where does APPLICATION_NAME and APPLICATION_VERSION come from? PR description reads like those attributes are as supported and known as DRIVER_NAME / DRIVER_VERSION, but:

* there are no columns in system.clients dedicated to those attributes

* `APPLICATION_NAME` / `APPLICATION_VERSION` are not present in Scylla's source code at all

* only 2 results in Cassandra source is some example in docs: https://github.com/apache/cassandra/blob/d3cbf9c1f72057d2a5da9df8ed567d20cd272931/doc/modules/cassandra/pages/managing/operating/virtualtables.adoc?plain=1#L218 . `APPLICATION_NAME` and `APPLICATION_VERSION` appears in `client_options` which IIUC is an arbitrary dict where client can send any keys.

* driver variables are mentioned in protocol v5 (https://github.com/apache/cassandra/blob/d3cbf9c1f72057d2a5da9df8ed567d20cd272931/doc/native_protocol_v5.spec#L480 ), application variables are not.

Please explain in the PR description (and maybe commit message too):

* Why support those specific keys instead of arbitrary keys

* The difference in support between driver variables and application variables

* Where those application variables come from.

They come directly from CPP driver. See https://github.com/scylladb/cpp-driver/blob/fa0f27069a625057984d1fa58f434ea99b86c83f/include/cassandra.h#L2916.
As we want to support as big subset of its API as possible in cpp-rust-driver, I decided to expose API for setting those particular key-value pairs, similarly to what cpp-driver does, and not an API to set arbitrary key-value pairs.

Allowing users to set arbitrary options could break the driver by overwriting options that bear special meaning, e.g. the shard-aware port. Therefore, I'm against such liberal API. OTOH, we need to expose APPLICATION_NAME, APPLICATION_VERSION and CLIENT_ID for cpp-rust-driver.

Does this convince you @Lorak-mmk?

@Lorak-mmk
Copy link
Collaborator

Where does APPLICATION_NAME and APPLICATION_VERSION come from? PR description reads like those attributes are as supported and known as DRIVER_NAME / DRIVER_VERSION, but:

* there are no columns in system.clients dedicated to those attributes

* `APPLICATION_NAME` / `APPLICATION_VERSION` are not present in Scylla's source code at all

* only 2 results in Cassandra source is some example in docs: https://github.com/apache/cassandra/blob/d3cbf9c1f72057d2a5da9df8ed567d20cd272931/doc/modules/cassandra/pages/managing/operating/virtualtables.adoc?plain=1#L218 . `APPLICATION_NAME` and `APPLICATION_VERSION` appears in `client_options` which IIUC is an arbitrary dict where client can send any keys.

* driver variables are mentioned in protocol v5 (https://github.com/apache/cassandra/blob/d3cbf9c1f72057d2a5da9df8ed567d20cd272931/doc/native_protocol_v5.spec#L480 ), application variables are not.

Please explain in the PR description (and maybe commit message too):

* Why support those specific keys instead of arbitrary keys

* The difference in support between driver variables and application variables

* Where those application variables come from.

They come directly from CPP driver. See https://github.com/scylladb/cpp-driver/blob/fa0f27069a625057984d1fa58f434ea99b86c83f/include/cassandra.h#L2916. As we want to support as big subset of its API as possible in cpp-rust-driver, I decided to expose API for setting those particular key-value pairs, similarly to what cpp-driver does, and not an API to set arbitrary key-value pairs.

Allowing users to set arbitrary options could break the driver by overwriting options that bear special meaning, e.g. the shard-aware port. Therefore, I'm against such liberal API. OTOH, we need to expose APPLICATION_NAME, APPLICATION_VERSION and CLIENT_ID for cpp-rust-driver.

Does this convince you @Lorak-mmk?

It does. Please include this explanation in commit message and in a comment.

@wprzytula wprzytula force-pushed the self-identity-customisation branch from a8ab3f3 to 0ea5a4e Compare August 1, 2024 11:44
SelfIdentity is intended to group STARTUP options that can be used
to uniquely identify driver, application and particular client instance.

DRIVER_NAME and DRIVER_VERSION were already present, but they were
hardcoded to "scylla-rust-driver" and current crate version, retrieved
from Cargo. Now it is possible to set custom values for them, which
can be useful for custom driver builds, as well as for drivers running
on top of Rust driver (think cpp-rust-driver).

The remaining options are inspired by cpp-driver, and added for
cpp-rust-driver purposes as well:
- APPLICATION_NAME and APPLICATION_VERSION identify a single build of
an application using the driver.
- CLIENT_ID uniquely identifies a single instance of an application.

All the above information are visible in Cassandra in
`system_views.clients` in `client_options` column.
DRIVER_NAME and DRIVER_VERSION are visible in ScyllaDB in
`system.clients`.

FAQ
Q: Where do APPLICATION_NAME, APPLICATION_VERSION and CLIENT_ID come from?
- there are no columns in system.clients dedicated to those attributes,
- APPLICATION_NAME / APPLICATION_VERSION are not present in Scylla's
  source code at all,
- only 2 results in Cassandra source is some example in docs;
  APPLICATION_NAME and APPLICATION_VERSION appears in client_options
  which is an arbitrary dict where client can send any keys.
- driver variables are mentioned in protocol v5, application variables
  are not.

A:
The following options are not exposed anywhere in Scylla tables.
They come directly from CPP driver, and they are supported in Cassandra.
As we want to support as big subset of its API as possible in
cpp-rust-driver, I decided to expose API for setting those particular
key-value pairs, similarly to what cpp-driver does, and not an API
to set arbitrary key-value pairs. Allowing users to set arbitrary
options could break the driver by overwriting options that bear special
meaning, e.g. the shard-aware port. Therefore, I'm against such liberal
API. OTOH, we need to expose APPLICATION_NAME, APPLICATION_VERSION
and CLIENT_ID for cpp-rust-driver.
The exposed option enabled setting custom identity that is advertised
in STARTUP message by the driver, upon opening each connection.

Settable identity include:
- DRIVER_NAME
- DRIVER_VERSION
- APPLICATION_NAME
- APPLICATION_VERSION
- CLIENT_ID

All five are visible in Cassandra's `system_views.clients` in
`client_options` column.

DRIVER_NAME & DRIVER_VERSION are visible in ScyllaDB's `system.clients`.
@wprzytula wprzytula force-pushed the self-identity-customisation branch from 0ea5a4e to 9aadac8 Compare August 1, 2024 13:39
@wprzytula
Copy link
Collaborator Author

v2:

  • addressed review suggestions;
  • made SelfIdentity more convenient by accepting impl Into<Cow<...>> instead of only Cow<...>;
  • wrote a test using proxy that asserts proper self identity being set in STARTUP frame.

@wprzytula wprzytula requested review from Lorak-mmk and muzarski August 1, 2024 13:43
@wprzytula wprzytula merged commit 36d5edd into scylladb:main Aug 3, 2024
11 checks passed
@wprzytula wprzytula deleted the self-identity-customisation branch August 3, 2024 07:32
@wprzytula wprzytula changed the title connection: expose setters for STARTUP options connection: expose setters for self-identity STARTUP options Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp-rust-driver-p0 Functionality required by cpp-rust-driver enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose setters for STARTUP options
3 participants