Feature Request: configurable number of digits in generated shard ranges #15744

Open
maxenglander opened this issue Apr 17, 2024 · 0 comments

Feature Description

key.GenerateShardRanges uses 2 hex digits in shard ranges when there are 256 or fewer shards, and 4 hex digits when there are up to 65536 shards.

vitess/go/vt/key/key.go

Lines 369 to 384 in f11de06

func GenerateShardRanges(shards int) ([]string, error) {
	var format string
	var maxShards int
	switch {
	case shards <= 0:
		return nil, errors.New("shards must be greater than zero")
	case shards <= 256:
		format = "%02x"
		maxShards = 256
	case shards <= 65536:
		format = "%04x"
		maxShards = 65536
	default:
		return nil, errors.New("this function does not support more than 65336 shards in a single keyspace")
	}

Add a new parameter to this function, or add a new function, that lets users configure the number of hex digits in a shard range.
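As a rough illustration of the "new function" option, here is a minimal sketch. The name `generateShardRangesWithDigits`, the `hexDigits` parameter, and the 7-digit cap are all hypothetical and not part of Vitess. The sketch computes each boundary as the integer floor of `i * total / shards`, which is intended to give the same truncated boundaries as tracking a "real" end per shard, but it is not the existing implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// generateShardRangesWithDigits is a hypothetical variant of
// key.GenerateShardRanges that takes the number of hex digits as a parameter.
// It is an illustration only and does not exist in Vitess.
func generateShardRangesWithDigits(shards, hexDigits int) ([]string, error) {
	if shards <= 0 {
		return nil, errors.New("shards must be greater than zero")
	}
	// Assumption: an arbitrary cap for this sketch so that i*total below
	// cannot overflow int64.
	if hexDigits < 1 || hexDigits > 7 {
		return nil, errors.New("hexDigits must be between 1 and 7")
	}
	total := int64(1) << (4 * uint(hexDigits)) // 16^hexDigits distinct prefixes
	if int64(shards) > total {
		return nil, fmt.Errorf("%d hex digits cannot address %d shards", hexDigits, shards)
	}

	ranges := make([]string, 0, shards)
	start := int64(0)
	for i := 1; i <= shards; i++ {
		// Truncate the ideal boundary i*total/shards to an integer on every
		// iteration, so rounding loss is spread across shards.
		end := int64(i) * total / int64(shards)

		var startKey, endKey string
		if start != 0 {
			startKey = fmt.Sprintf("%0*x", hexDigits, start)
		}
		if end != total {
			endKey = fmt.Sprintf("%0*x", hexDigits, end)
		}
		ranges = append(ranges, startKey+"-"+endKey)
		start = end
	}
	return ranges, nil
}

func main() {
	for _, digits := range []int{2, 4} {
		ranges, err := generateShardRangesWithDigits(5, digits)
		if err != nil {
			panic(err)
		}
		fmt.Println(ranges)
	}
	// Prints:
	// [-33 33-66 66-99 99-cc cc-]
	// [-3333 3333-6666 6666-9999 9999-cccc cccc-]
}
```

Keeping the existing function's signature and adding a separate function would avoid breaking current callers; whether that or an extra parameter is preferable is left to the maintainers.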

If this FR is accepted, a follow-up FR will be to plumb this configurability into planetscale/vitess-operator.

Use Case(s)

When the number of shards is not a power of 2, there is a degree of "lossiness": shard boundaries are truncated to a fixed number of hex digits, so the last shard is larger than it should be and the rest are smaller than they should be.

vitess/go/vt/key/key.go

Lines 406 to 420 in f11de06

// If shards does not divide evenly into maxShards, then there is some lossiness,
// where each shard is smaller than it should technically be (if, for example, size == 25.6).
// If we choose to keep everything in ints, then we have two choices:
// - Have every shard in #numshards be a uniform size, tack on an additional shard
// at the end of the range to account for the loss. This is bad because if you ask for
// 7 shards, you'll actually get 7 uniform shards with 1 small shard, for 8 total shards.
// It's also bad because one shard will have much different data distribution than the rest.
// - Expand the final shard to include whatever is left in the keyrange. This will give the
// correct number of shards, which is good, but depending on how lossy each individual shard is,
// you could end with that final shard being significantly larger than the rest of the shards,
// so this doesn't solve the data distribution problem.
//
// By tracking the "real" end (both in the real number sense, and in the truthfulness of the value sense),
// we can re-truncate the integer end on each iteration, which spreads the lossiness more
// evenly across the shards.

Allowing users to configure the number of hex digits in their shard ranges lets them control the degree of lossiness.

For example, generating a 5-shard cluster with 2-digit shard ranges results in: -33 33-66 66-99 99-cc cc-.

If we increase the shard ranges to 4 digits for the same 5-shard cluster, we end up with: -3333 3333-6666 6666-9999 9999-cccc cccc-, which is less lossy.
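To put a number on "less lossy", the following sketch measures how far each shard's width deviates from the ideal width for a given digit count. The helpers `shardWidths` and `maxRelativeSkew` are hypothetical names introduced here for illustration. They assume the same truncation rule as the examples above, where boundary `i` is the integer floor of `i * 16^digits / shards`.

```go
package main

import (
	"fmt"
	"math"
)

// shardWidths returns how many key prefixes each shard covers when shard
// boundaries are truncated to hexDigits hex digits.
// Assumptions for this sketch: shards > 0 and 1 <= hexDigits <= 7.
func shardWidths(shards, hexDigits int) []int64 {
	total := int64(1) << (4 * uint(hexDigits)) // 16^hexDigits
	widths := make([]int64, shards)
	start := int64(0)
	for i := 1; i <= shards; i++ {
		end := int64(i) * total / int64(shards) // truncated boundary
		widths[i-1] = end - start
		start = end
	}
	return widths
}

// maxRelativeSkew returns the largest relative deviation of any shard's
// width from the ideal (fractional) width total/shards.
func maxRelativeSkew(shards, hexDigits int) float64 {
	total := float64(int64(1) << (4 * uint(hexDigits)))
	ideal := total / float64(shards)
	worst := 0.0
	for _, w := range shardWidths(shards, hexDigits) {
		if dev := math.Abs(float64(w)-ideal) / ideal; dev > worst {
			worst = dev
		}
	}
	return worst
}

func main() {
	for _, digits := range []int{2, 4} {
		fmt.Printf("%d digits: widths=%v, max skew=%.4f%%\n",
			digits, shardWidths(5, digits), 100*maxRelativeSkew(5, digits))
	}
}
```

For 5 shards, the 2-digit layout gives four shards of width 51 and one of width 52 against an ideal of 51.2, a worst-case skew of roughly 1.56%. The 4-digit layout gives four shards of width 13107 and one of width 13108 against an ideal of 13107.2, a skew of roughly 0.006%.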

maxenglander added the "Type: Feature Request" and "Needs Triage" labels on Apr 17, 2024.
maxenglander changed the title from "Feature Request: configurable granularity of generated shard ranges" to "Feature Request: configurable number of digits in generated shard ranges" on Apr 17, 2024.
rohit-nayak-ps added the "Component: Topology" label and removed the "Needs Triage" label on Apr 18, 2024.