
msgpack: truncate long string prefix refinements #167

Merged — 2 commits merged into zclconf:main on Aug 24, 2023

Conversation


@kmoe (Contributor) commented on Aug 23, 2023

Serialising an unknown value that carries a sufficiently long string prefix refinement results in an error on deserialisation, since cty limits the size of the refinements blob to 1024 bytes.

This PR is an implementation of the approach described in hashicorp/terraform#33464 (comment) for getting around this. Of course we would have to truncate the value while deserialising as well, otherwise the value does not round-trip correctly.

@apparentlymart I'm assuming that we don't want to impose a limit on prefix length when constructing the refinement in the first place (nor does the RefinementBuilder seem to allow for this, as far as I can tell). Silently truncating the prefix during serialisation seems like it suffices to preserve "approximations of refinements" under msgpack as documented, but if we really believe there is no proper use case for string prefix refinements greater than 256 bytes, can we let the consumer know more explicitly?

@apparentlymart (Collaborator):

Thanks for working on this!

The string prefix refinement is primarily intended for short prefixes like https:// or ami- or similar, so I find it a reasonable compromise to truncate longer prefixes during serialization to limit the amount of overhead refinements can create -- not all consumers of values will do anything with the refinements anyway, but every user of the msgpack serialization must still pay the cost of storing/transmitting them.

I considered handling this during serialization to be a pragmatic compromise because it means that applications which don't serialize their values can still benefit from the longer prefixes in the less common cases where they are relevant -- these strings were presumably already in RAM anyway, so that storage cost has already been paid -- and then (as you said) this becomes a reasonable "approximation" for applications that do use serialized values to transmit refinements, as Terraform does with its providers.

It is true that this is a situation where values will not round-trip losslessly. While that's non-ideal I think it's a reasonable compromise. It does mean, I suppose, that the TestRoundTrip test model will not work for this particular scenario, and so this would either need a new test specialized for this situation or a special rule in the TestRoundTrip test loop which considers an unknown string to have "round-tripped" correctly if the expected prefix has been preserved. The former (a new test) is probably the easier to implement, because then we can hand-write the expected value rather than having to describe a general rule for deriving it from arbitrary input.

Prior to this commit, serialising an unknown value with a string prefix refinement with a sufficiently long prefix would result in an error on deserialisation, since cty limits the size of the refinements blob to 1024 bytes.

We choose an arbitrary maximum length of 256 bytes and truncate any
string prefix refinement on serialisation.

This behaviour is an exception to the general case where refined values
will round-trip losslessly.
@kmoe force-pushed the msgpack-truncate-prefix-refinement branch from e702e81 to 55a402b on August 24, 2023 at 13:52
@@ -26,10 +26,6 @@ import (
// application can guarantee that the remainder of the string will not begin
// with combining marks then it is safe to instead just normalize the prefix
// string with [Normalize].
//
// Note that this function only takes into account normalization boundaries
// and does _not_ take into account grapheme cluster boundaries as defined
@kmoe (Contributor, Author):

I assume this is a mistake, unless "takes into account grapheme cluster boundaries" is intended to have a different meaning here than in line 14.

@apparentlymart (Collaborator):


I think what I was intending to say here is that this function trims off zero or more normalization chunks so as to avoid the possibility that the resulting string might end with one that can be the prefix of a grapheme cluster chunk. Or in other words, this function potentially violates the usual rule that cty treats grapheme clusters as indivisible units under substring operations; it in fact intentionally divides them because it's trying to detect the possibility of partial ones.

But I agree that this comment is more confusing than helpful in retrospect, so I'm fine with removing it and if some confusion arises later about this the discussion about it will hopefully suggest a clearer way to explain the tradeoff this function is making.

@kmoe (Contributor, Author) commented on Aug 24, 2023

Thanks @apparentlymart, I've created a new test for this case demonstrating the truncation behaviour.

cty.UnknownVal(cty.String).Refine().StringPrefix(strings.Repeat("c", 255) + "-").NewValue(),
},
{
cty.UnknownVal(cty.String).Refine().StringPrefix(strings.Repeat("d", 255) + "🤷🤷").NewValue(),
@kmoe (Contributor, Author):

Happy to add comments for any of these that need it. For example, in this case the StringPrefix() call trims the final "🤷", then the remaining "🤷" is trimmed during serialisation.

@apparentlymart (Collaborator):

This looks great to me, so I'm going to merge it now. Thanks!

@apparentlymart apparentlymart merged commit 560dd28 into zclconf:main Aug 24, 2023