New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
msgpack: truncate long string prefix refinements #167
msgpack: truncate long string prefix refinements #167
Conversation
Thanks for working on this! The string prefix refinement is primarily intended for short prefixes like I considered handling this during serialization to be a pragmatic compromise because it means that applications which don't serialize their values can still benefit from the longer prefixes in the less common cases where they are relevant -- these strings were presumably already in RAM anyway, so that storage cost has already been paid -- and then (as you said) this becomes a reasonable "approximation" for applications that do use serialized values to transmit refinements, as Terraform does with its providers. It is true that this is a situation where values will not round-trip losslessly. While that's non-ideal I think it's a reasonable compromise. It does mean, I suppose, that the |
Prior to this commit, serialising an unknown value with a string prefix refinement with a sufficiently long prefix would result in an error on deserialisation, since cty limits the size of the refinements blob to 1024 bytes. We choose an arbitrary maximum length of 256 bytes and truncate any string prefix refinement on serialisation. This behaviour is an exception to the general case where refined values will round-trip losslessly.
e702e81
to
55a402b
Compare
@@ -26,10 +26,6 @@ import ( | |||
// application can guarantee that the remainder of the string will not begin | |||
// with combining marks then it is safe to instead just normalize the prefix | |||
// string with [Normalize]. | |||
// | |||
// Note that this function only takes into account normalization boundaries | |||
// and does _not_ take into account grapheme cluster boundaries as defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is a mistake, unless "takes into account grapheme cluster boundaries" is intended to have a different meaning here than in line 14.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what I was intending to say here is that this function trims off zero or more normalization chunks so as to avoid the possibility that the resulting string might end with one that can be the prefix of a grapheme cluster chunk. Or in other words, this function potentially violates the usual rule that cty treats grapheme clusters as indivisible units under substring operations; it in fact intentionally divides them because it's trying to detect the possibility of partial ones.
But I agree that this comment is more confusing than helpful in retrospect, so I'm fine with removing it and if some confusion arises later about this the discussion about it will hopefully suggest a clearer way to explain the tradeoff this function is making.
Thanks @apparentlymart, I've created a new test for this case demonstrating the truncation behaviour. |
cty.UnknownVal(cty.String).Refine().StringPrefix(strings.Repeat("c", 255) + "-").NewValue(), | ||
}, | ||
{ | ||
cty.UnknownVal(cty.String).Refine().StringPrefix(strings.Repeat("d", 255) + "🤷🤷").NewValue(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to add comments for any of these that need it. For example, in this case the StringPrefix()
call trims the final "🤷", then the remaining "🤷" is trimmed during serialisation.
This looks great to me, so I'm going to merge it now. Thanks! |
Serialising an unknown value with a string prefix refinement with a sufficiently long prefix will result in an error on deserialisation, since cty limits the size of the refinements blob to 1024 bytes.
This PR is an implementation of the approach described in hashicorp/terraform#33464 (comment) for getting around this. Of course we would have to truncate the value while deserialising as well, otherwise the value does not round-trip correctly.
@apparentlymart I'm assuming that we don't want to impose a limit on prefix length when constructing the refinement in the first place (nor does the RefinementBuilder seem to allow for this, as far as I can tell). Silently truncating the prefix during serialisation seems like it suffices to preserve "approximations of refinements" under msgpack as documented, but if we really believe there is no proper use case for string prefix refinements greater than 256 bytes, can we let the consumer know more explicitly?