-
Notifications
You must be signed in to change notification settings - Fork 18.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libnet/i/defaultipam: Disambiguate PoolID string format #47837
libnet/i/defaultipam: Disambiguate PoolID string format #47837
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A much better format!
I guess it'll be impossible to downgrade without deleting networks, because old builds won't recognise the new format? If a pool doesn't have any new features that make it unsafe for an old build to use (the normal case, for all existing networks) - perhaps it'd be better to generate the pool id in the old format?
str = strings.TrimSuffix(strings.TrimPrefix(str, "PoolID{"), "}") | ||
|
||
for _, field := range strings.FieldsFunc(str, func(c rune) bool { return c == ';' }) { | ||
p := strings.SplitN(field, "=", 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth checking SplitN found an =
and returned two strings, before using p[1]
.
(Or, perhaps better - the docs for SplitN say "To split around the first instance of a separator, see Cut", which returns two strings and an "ok".)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strings.Cut
is a better option overall; https://pkg.go.dev/strings#Cut, but it would ignore multiple =
in the string etc, in case we need to care about those.
k2, err = PoolIDFromString(expected) | ||
if err != nil { | ||
t.Fatal(err) | ||
} | ||
assert.NilError(t, err) | ||
|
||
if k2.AddressSpace != k.AddressSpace || k2.Subnet != k.Subnet || k2.ChildSubnet != k.ChildSubnet { | ||
t.Fatalf("SubnetKey.FromString() failed. Expected %v. Got %v", k, k2) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth testing PoolIDFromString with a v2 string that has no ChildSubnet? (And some invalid strings to check the InvalidParameter code paths?)
(Could also convert more of the old test to use Equal rather than comparing individual fields, and use assert.Check(t, is.Equal(...))
.)
switch p[0] { | ||
case "AddressSpace": | ||
pID.AddressSpace = p[1] | ||
case "Subnet": | ||
if pID.Subnet, err = netip.ParsePrefix(p[1]); err != nil { | ||
return PoolID{}, types.InvalidParameterErrorf("invalid string form for subnetkey: %s", str) | ||
} | ||
case "ChildSubnet": | ||
if pID.ChildSubnet, err = netip.ParsePrefix(p[1]); err != nil { | ||
return PoolID{}, types.InvalidParameterErrorf("invalid string form for subnetkey: %s", str) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to add more fields? ... this will silently drop fields it doesn't recognise, is that likely to be safe behaviour on downgrade?
I guess it's impossible to predict, but it might be safest to bomb out? Or, could come up with a naming convention that says whether the address space is safe to use without understanding a field?
(Could log unknown fields, but that might not help much.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to add more fields? ... this will silently drop fields it doesn't recognise, is that likely to be safe behaviour on downgrade?
Yeah, we might need a new field to properly support duplicate static allocations (to uniquely identify each duplicate).
this will silently drop fields it doesn't recognise, is that likely to be safe behaviour on downgrade? I guess it's impossible to predict, but it might be safest to bomb out?
Indeed, my intention was to make it easy to add new fields while providing a safe / easy downgrade path.
Or, could come up with a naming convention that says whether the address space is safe to use without understanding a field?
Hm not sure to see what you mean 🤔 We can't predict what future fields (if any) will look like. Each major release will understand a specific set of fields, matching a specific implementation of the Allocator / addrSpace.
(Could log unknown fields, but that might not help much.)
We could log, yeah. But this would happen only on downgrade (so not really useful) or if we commit a big mistake (eg. a typo in a field name in the serialize code). Not sure if that's really valuable 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm not sure to see what you mean 🤔 We can't predict what future fields (if any) will look like. Each major release will understand a specific set of fields, matching a specific implementation of the Allocator / addrSpace.
I was thinking there may be two categories of new field in the id ... informational, some annotation for debug/logging/inspect that old builds can safely ignore. Or, something new that means an old daemon can't safely use the pool because it doesn't know how it works.
The duplicate-tracking field is probably in the second category? I guess an old daemon that didn't understand the field would see duplicate pool-ids, maybe lose data from one of the pools, and generally be a bit weird? (So, we'd want it to fail if the new id-field is used. In which case, it's an example of a field that shouldn't be silently ignored.)
For future backwards-compatibility - we could come up with a convention along the lines of "an unknown field name with prefix 'Info' can be silently ignored, other unknown fields mean the pool is unusable". We should probably make sure the daemon starts with unusable pools, so that they can be deleted and re-created. Maybe it's not necessary, I don't have examples, but this would be the time to build it in if it might be needed.
Yeah, correct. This implementation doesn't offer a downgrade path. I think it'd be better to ask users to backup their networkdb before upgrading, and restore it if they need to downgrade.
We could, but we'll have to transition to the v2 format at some point. I'd prefer to kill it right away. |
Once the new format's existed for a release-or-so, it'll be fine to store the new format even where the old format would work, because old builds will understand the new format. Most people won't read/follow instructions to back up the network-db (!). Even if they do, if they make changes between upgrading and discovering a problem that means they need to roll-back, it'll still be a problem - and they'll already be annoyed by that point. Also, after downgrading, I think there will just be a strange 'invalid parameter' error logged during startup (?) ... so it won't be obvious that the network-db has to be deleted, and networks re-created. (Bugs that make downgrades necessary will have a much bigger impact.) Even for development, I sometimes end up flipping between versions to compare behaviour. Re-creating networks in that case wouldn't be the end of the world, just more-faff. |
IIRC in the IPAM contract, the PoolID string is an opaque handle to an allocated IPAM pool. I don't think it is even exposed to the user anywhere in the Engine API. It might show up in logs so a printable string would be nice to have, but that's about it. Aside from that, all defaultipam needs is to be able to round-trip some KV pairs through a string. Why do we need to roll a fully custom microformat when there are so many codecs — in stdlib, even! — which are fit for purpose? Strawman: encode the KV pairs using some codec that we don't have to write the marshaler or unmarshaller for, prepending it with some magic string identifying it as v2 format. The only parsing necessary would be
Speaking of downgrades, while we can't support downgrades to a version that does not understand the v2 PoolID string format, we could plan ahead a little bit to make the v2 format more amenable to backwards compatibility to go along with the extensibility. The PNG format, for instance, has a neat trick for extensibility: each "chunk" of the file is encoded with a header that signifies the chunk type. The really clever bit is that the chunk type also encodes a "critical/ancillary" flag. PNG parsers encountering an unknown chunk examine the flag to determine whether to skip over the chunk or error out. (PNG chunk tags are 4-char strings. Ones starting with an uppercase char are critical, lowercase ancillary.) I think we should come up with some scheme to signify whether a particular PoolID KV pair is critical or ancillary, and have the parser fail to unmarshal unknown critical KVs and discard ancillary ones. |
Silly question (sorry didn't go through all the comments); is this parsing in a "hot-path", or would using JSON work for this? (at least with JSON we'd have a format that we know works, and we wouldn't have to come up with our own; it would also be extensible (adding more fields)). |
Oh, LOL, looks like Cory and I commented at the same time. 😂 (his post is definitely more in-depth than mine) |
We could backport this change to 26.1.x - then at least it'll be possible to roll back to (There's quite a lot of networking change going into 27.0, and our track record isn't great. I think it's worth planning for problems.) |
We talked about downgrades with @robmry during a 1:1 and his idea is pretty neat: backport the deserializer to v26.1 but keep the v1 serializer. As he mentioned, v27 is going to be quite heavy in terms of networking changes, so it carries some risks. This would help offset those risks by giving users an escape hatch if things go wrong. And I guess we'd need to backport the deserializer to v25 too for MCR.
I excluded JSON marshalling specifically because
That seems to be quite involved / over-engineered. At this point we don't plan to make any changes to |
I'm not exactly sure what scenario you had in mind here; wouldn't the current code already produce errors in many scenarios? It may (currently) ignore some case, such as a string starting with |
Serialization is done in Another, and maybe more important downside of stdlib's codecs: if we want something custom we have to implement a specific marshaler interface. One concrete example: to add proper support for duplicate static allocations, I need to add a new field 'AllocID' to 'PoolID', but I need to distinguish between that field being not set and being the zero value. If I use json marshaling, I'll end up writing my own marshaler. FWIW, I'm ruling out url encoding as the stringified PoolID might end up in logs and error messages. I'd prefer to keep it easily readable both for us and for end-users. |
Don't ignore the error, then! if err != nil {
panic(err)
}
Well, you wouldn't. The PoolID message is an unordered mapping of Designing and implementing an extensible serialization format for structured data, even flat K-V pairs, is not a trivial endeavour. The grammar needs to be unambiguous, invalid syntax needs to be rejected, and some scheme is necessary for escaping syntactically-significant productions embedded in user data. Case in point: with the marshaling to a JSON object is trivial. Just marshal a |
b3f1071
to
d3584ae
Compare
data := strings.TrimPrefix(str, poolIDV2Prefix) | ||
|
||
if err := json.Unmarshal([]byte(data), &pID); err != nil { | ||
return PoolID{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably get the types.InvalidParameterErrorf()
treatment too? (Although they might all be internal errors really, as this isn't user input?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed that in the original commit and added a new one to convert all InvalidParameter
into InternalError
s.
3e6645a
to
eab09b1
Compare
eab09b1
to
cd9ccba
Compare
I've updated this PR to use a JSON codec instead of my handwritten (un)marshaler. I'm using a map of strings as I'll need to analyze which fields are set in a future PR. -- I'm still a lot skeptical about the 'critical / anciliary' fields thingy. First, because I fail to see an example where that would be useful. I'm planning to add a new 'AllocID' field and we might consider making the allocator VNI-aware. But that's about it, there's no other fields we plan to add in the foreseeable future. Second, because we really want downgrades to be possible without any errors -- 'critical' fields are going to make networks unusable. It seems the backporting strategy suggested by @robmry is more suited for that. It'd allow to decide on a case by case how downgrade scenarios should be handled for every new fields (at least if older versions shouldn't ignore those fields). This gives the ability to handle them graciously. |
Prior to this change PoolID microformat was using slashes to separate fields. Those fields include subnet prefixes in CIDR notation, which also include a slash. This makes future evolution harder than it should be. This change introduces a 'v2' microformat based on JSON. This has two advantages: 1. Fields are clearly named to ensure each value is associated to the right field. 2. Field values and separators are clearly distinguished to remove any ambiguity. The 'v1' encoding will be kept until the next major MCR LTS is released. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
…rrof InvalidParameterErrorf was used whenever an invalid value was found during PoolID unmarshaling. This error is converted to a 400 HTTP code by the HTTP server. However, users never provide PoolIDs directly -- these are constructed from user-supplied values which are already validated when the PoolID is marshaled. Hence, if such erroneous value is found, it's an internal error and should be converted to a 500. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
cd9ccba
to
5a2fa59
Compare
It's for the unforeseen circumstances.
The two are not mutually exclusive. I would say that they go hand in hand. Flagging a field as critical affords fail fast behaviour when downgrading too far, alerting the operator that they may need to look for a newer patch version of the daemon with the backport. Otherwise, if the field was truly critical, the network may be silently (or subtly) broken on a downgraded engine that does not understand the field. If a downgraded engine does not need to understand the field to behave correctly, the field is ancillary by definition. |
if strings.HasPrefix(str, poolIDV2Prefix) { | ||
return parsePoolIDV2(str) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May as well test and trim in the same operation.
if strings.HasPrefix(str, poolIDV2Prefix) { | |
return parsePoolIDV2(str) | |
} | |
if v, ok := strings.CutPrefix(str, poolIDV2Prefix); ok { | |
return parsePoolIDV2(v) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I considered that but decided to pass the unaltered string instead to have it in error messages.
- What I did
Prior to this change PoolID microformat was using slashes to separate fields. Those fields include subnet prefixes in CIDR notation, which also includes a slash. This makes future evolution harder than it should be.
This change introduces a 'v2' microformat which uses: 1. named fields to disambiguate which field each value is associated to; 2. semicolons as a separator.
The 'v1' encoding will be kept until the next major MCR LTS is released after v27.
- How to verify it
- A picture of a cute animal (not mandatory but encouraged)