Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: Update VSchema not send version if there are multiple vtgates operating at the same time. #15794

Open
javamyth opened this issue Apr 25, 2024 · 5 comments
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)

Comments

@javamyth
Copy link

javamyth commented Apr 25, 2024

Overview of the Issue

hi,There is no version number passed here. There may be concurrency issues when updating VSchema if there are multiple vtgates operating at the same time.

20240425183642557
20240425184440471

Reproduction Steps

N/A

Binary Version

vtgate version Version: 18.0.4

Operating System and Environment details

N/A

Log Fragments

N/A
@javamyth javamyth added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Apr 25, 2024
@mattlord mattlord removed Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Apr 25, 2024
@mattlord
Copy link
Contributor

The vschema has a timestamp in it. But more importantly it's versioned in the topology server (e.g. etcd). I'm going to close this for now as it's not clear to me what bug is being called out. If I'm missing something please let us know and we can re-open this any time. Thanks!

@javamyth
Copy link
Author

The vschema has a timestamp in it. But more importantly it's versioned in the topology server (e.g. etcd). I'm going to close this for now as it's not clear to me what bug is being called out. If I'm missing something please let us know and we can re-open this any time. Thanks!

The UpdateVSchema method does not pass the version number and will be overwritten directly. This will cause it to be unsafe when multiple vtgate processes execute UpdateVSchema at the same time. Please take a look at the version of ts.globalCell.Update that passes nil, thank you.

@javamyth
Copy link
Author

javamyth commented Apr 26, 2024

What is passed here is null, not version. If the data of a certain node is updated, there will be security issues.
20240425184440471
20240425183642557
20240426092442173

@mattlord
Copy link
Contributor

Thanks! I think you are correct here and we’re not providing serializability (although we could via etcd key versions and revisions) so you could read an old version, update it, and save the updated old structure overwriting intermediate changes. Because there’s no synchronization or concurrency controls concurrent writes from multiple vtgates will have a non-deterministic result and we could lose updates. I’ll reopen and let the query serving team comment.

@mattlord mattlord reopened this Apr 26, 2024
@mattlord mattlord added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving labels Apr 26, 2024
@mattlord
Copy link
Contributor

mattlord commented May 6, 2024

We should be able to enforce serializable changes across vtgates by leveraging the topo key version. For an example of where we're doing this today, look at the topo Tablet records/keys and the TabletInfo struct used in memory — same for the topo Shard records/keys and the ShardInfo in memory structure. For example:

vitess/go/vt/topo/shard.go

Lines 165 to 222 in f27d287

// GetShard is a high level function to read shard data.
// It generates trace spans.
func (ts *Server) GetShard(ctx context.Context, keyspace, shard string) (*ShardInfo, error) {
if err := ValidateKeyspaceName(keyspace); err != nil {
return nil, err
}
if _, _, err := ValidateShardName(shard); err != nil {
return nil, err
}
span, ctx := trace.NewSpan(ctx, "TopoServer.GetShard")
span.Annotate("keyspace", keyspace)
span.Annotate("shard", shard)
defer span.Finish()
shardPath := shardFilePath(keyspace, shard)
data, version, err := ts.globalCell.Get(ctx, shardPath)
if err != nil {
return nil, err
}
value := &topodatapb.Shard{}
if err = value.UnmarshalVT(data); err != nil {
return nil, vterrors.Wrapf(err, "GetShard(%v,%v): bad shard data", keyspace, shard)
}
return NewShardInfo(keyspace, shard, value, version), nil
}
// updateShard updates the shard data, with the right version.
// It also creates a span, and dispatches the event.
func (ts *Server) updateShard(ctx context.Context, si *ShardInfo) error {
span, ctx := trace.NewSpan(ctx, "TopoServer.UpdateShard")
span.Annotate("keyspace", si.keyspace)
span.Annotate("shard", si.shardName)
defer span.Finish()
data, err := si.Shard.MarshalVT()
if err != nil {
return err
}
shardPath := shardFilePath(si.keyspace, si.shardName)
newVersion, err := ts.globalCell.Update(ctx, shardPath, data, si.version)
if err != nil {
return err
}
si.version = newVersion
event.Dispatch(&events.ShardChange{
KeyspaceName: si.Keyspace(),
ShardName: si.ShardName(),
Shard: si.Shard,
Status: "updated",
})
return nil
}

Writing the Shard topo key will fail if the version provided in the Update() call is older than the key's current version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

No branches or pull requests

2 participants