
Fix unknown handling in impl_writeable_tlv_based_enum_upgradable #2969

Merged

Conversation

TheBlueMatt
Collaborator

impl_writeable_tlv_based_enum_upgradable professed to support upgrades by returning None from MaybeReadable when unknown variants written by newer versions of LDK were read. However, it generally didn't support this as it didn't discard bytes for unknown types, resulting in corrupt reading.

This is fixed here for enum variants written as a TLV stream, however we don't have a length prefix for tuple enum variants, so the documentation on the macro is updated to mention that downgrades are not supported for tuple variants.
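
The distinction can be sketched outside of LDK's macros. The encoding below is hypothetical (a single type byte, a length byte, then the value), not LDK's actual wire format, but it shows why a length prefix is what makes returning `None` for unknown variants safe:

```rust
use std::io::{self, Cursor, Read};

// Hypothetical length-prefixed encoding: type byte, length byte, then
// `length` value bytes. Because the length is known up front, an
// unknown odd type can be skipped cleanly and reported as `None`.
fn read_variant(cursor: &mut Cursor<&[u8]>) -> io::Result<Option<u8>> {
    let mut ty = [0u8; 1];
    cursor.read_exact(&mut ty)?;
    let mut len = [0u8; 1];
    cursor.read_exact(&mut len)?;
    let mut value = vec![0u8; len[0] as usize];
    cursor.read_exact(&mut value)?; // value bytes consumed even if unknown
    match ty[0] {
        // The one known variant here carries a single-byte payload.
        0 => Ok(Some(value[0])),
        // Unknown odd type: ignored, but its bytes are already consumed,
        // so the next read starts at the following record.
        t if t % 2 == 1 => Ok(None),
        t => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unknown even type {}", t),
        )),
    }
}
```

Without the length byte there would be no way to know how many bytes to discard, which is exactly why tuple variants, written without a length prefix, cannot be skipped on downgrade.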

lightning/src/util/ser_macros.rs (outdated)
@@ -1065,6 +1065,10 @@ macro_rules! impl_writeable_tlv_based_enum {
/// when [`MaybeReadable`] is practical instead of just [`Readable`] as it provides an upgrade path for
/// new variants to be added which are simply ignored by existing clients.
///
/// Note that variants written as `$tuple_variant`s will not support downgrading no matter the
/// `$tuple_variant_id` used, however any variants written as a normal variant will support
Contributor

Maybe $variant_id rather than "variants written as a normal variant"? That sentence is a bit of a mouthful with the number of times "variant" is used.

Collaborator Author

I went the other way and went with simply "tuple variants" in the first part of the sentence; referring to the type by the name argument seems weird. Tried to reword it, but not sure it's that much clearer.

@codecov-commenter

codecov-commenter commented Mar 27, 2024

Codecov Report

Attention: Patch coverage is 0%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 91.05%. Comparing base (b8b1ef3) to head (f852d16).
Report is 56 commits behind head on main.

Files Patch % Lines
lightning/src/util/ser_macros.rs 0.00% 5 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2969      +/-   ##
==========================================
+ Coverage   89.51%   91.05%   +1.54%     
==========================================
  Files         117      117              
  Lines       95146   108426   +13280     
  Branches    95146   108426   +13280     
==========================================
+ Hits        85169    98730   +13561     
+ Misses       7767     7534     -233     
+ Partials     2210     2162      -48     


@TheBlueMatt
Collaborator Author

Grrr, yet more issues, see new commits. Not quite sure what to do about this...we could just say "look, downgrade is broken, don't downgrade", but that's kinda shitty. Alternatively, we could ship an 0.0.122 that has just this PR, then say you can't downgrade past that if you upgrade to 0.0.123?

/// Note that variants written as tuple variants will not support downgrading no matter the
/// `$tuple_variant_id` used. Any variants written as a normal variant will support downgrading by
/// having [`MaybeReadable`] return `None` for odd `$variant_id` variants not understood.
/// Note that only normal variants (not tuple variants) will support downgrading, thus any new odd
Contributor

s/normal variants/struct and unit variants

Contributor

Could you address this before squashing?

Collaborator Author

Oops, sorry, somehow I missed this entirely.

lightning/src/util/ser_macros.rs
@valentinewallace
Contributor

Grrr, yet more issues, see new commits. Not quite sure what to do about this...we could just say "look, downgrade is broken, don't downgrade", but that's kinda shitty. Alternatively, we could ship an 0.0.122 that has just this PR, then say you can't downgrade past that if you upgrade to 0.0.123?

Yeah, I don't see what choice we have besides doing one of those :(

It seems possible that certain specific downgrades aren't broken because no upgradable odd TLVs were added in between them. Maybe it would help users to figure out what versions those are?

@valentinewallace
Contributor

FYI, I was having trouble wrapping my mind around the fixes here so I ended up writing some small tests that repro the bugs on main: 8da8295

@TheBlueMatt
Collaborator Author

It seems possible that certain specific downgrades aren't broken because no upgradable odd TLVs were added in between them. Maybe it would help users to figure out what versions those are?

Yea, there are definitely quite a few upgrades/downgrades that are totally fine, but trying to figure it all out would be quite a bit of work. Given we haven't yet received any reports of this issue in the wild (not that that means no one has ever hit it), I'd say it's quite uncommon (then again, in normal operation we probably very rarely serialize with any pending events or monitor events, which is at least where we would break downgrade on the current git HEAD).

We definitely have users who have deployed multi-device, so we generally have to support downgrade (i.e., users switching from an upgraded node to an un-upgraded one), but we can ask our users if they're comfortable with just having an 0.0.122 with this fix (and then waiting a while for 0.0.123) or if they want us to consider some kind of backport to earlier LDK versions.

@MaxFangX
Contributor

MaxFangX commented Apr 5, 2024

Saw the request for some comments on Discord so putting in my two sats here. For us, being able to downgrade is not as hard of a requirement as it would be for say a multi-device wallet (e.g. Mutiny). We operate on a model where each user implicitly approves the latest three Lexe node versions that they have provisioned to. We as the Lexe operators are prevented from running anything other than these three approved versions, which helps protect the user from downgrade attacks, but also gives us (the operators) some flexibility to do some limited downgrades in the more mundane case that one of our releases is broken. In this case we would mark the broken version as 'yanked' in our system and downgrade the user to the latest version they have approved which is not broken.

Just an example, feel free to skip if it's clear already:

  • User has approved 0.1.0 by provisioning to it
  • We release 0.1.1 which the user also approves
  • We discover that 0.1.1 is broken, and mark it as yanked.
    • When we run the user's node we run 0.1.0 to avoid the broken version
    • We also release 0.1.2 with the fix
  • User approves 0.1.2, and since the 'yanked' list is updated in 0.1.2, this prevents Lexe from ever running 0.1.1.
  • Lexe runs 0.1.2 for the user from here on out but still have the option to downgrade to 0.1.0 if needed.
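
The policy described above can be sketched in a few lines (names hypothetical, not Lexe's actual code): given the user's approved versions in release order and a yanked set, run the newest approved version that has not been yanked.

```rust
// Picks the newest approved version that has not been yanked.
// `approved` is assumed to be in ascending release order.
fn pick_version<'a>(approved: &[&'a str], yanked: &[&str]) -> Option<&'a str> {
    approved.iter().copied().rev().find(|v| !yanked.contains(v))
}
```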

If LDK breaks downgrades in a point release, it would prevent us from downgrading the user in the (hopefully very rare) case that one of our releases is broken. This is not a big deal most of the time, since the necessity for downgrades should be rare. But there is a non-zero chance that it affects us.

The worst case scenario for us is if LDK breaks downgrades, and our release with the updated LDK also happens to be irrecoverably broken in some way (possibly due to the LDK release). If any of our users approve the broken version and have their node run by us (thus updating their persisted data), we would not be able to run their node, or could only run the broken version, until we release a fix and wait for the user to re-provision (which could take months if they're not active).

we could ship an 0.0.122 that has just this PR

This seems like a good idea in general, as any LDK changes which could break the node in other ways would be isolated from the change which breaks downgrades. This would allow us (Lexe) to release a version of our own which only includes the LDK version bump, and make sure things are OK before we 'commit' and release other functionality which might break. After confirming that LDK 0.0.122 works fine, we would have a safe version to downgrade to if a newer Lexe release is broken.

I would like to emphasize however that once again, LDK breaking downgrades is not that huge of a deal for Lexe specifically. I think we would actually prefer that LDK break downgrades every once in a while if a downgrade would allow you to improve the codebase and pay down technical debt. The hard work you guys do to provide these guarantees for us is warmly appreciated as always.

@TheBlueMatt TheBlueMatt modified the milestones: 0.0.123, 0.0.122 Apr 6, 2024
// Thus, we consume everything left in the `$outer_reader` here, ensuring that if
// we're being read as a part of another TLV stream we don't spuriously fail to
// deserialize the outer object due to a TLV length mismatch.
$crate::io_extras::copy($outer_reader, &mut $crate::io_extras::sink()).unwrap();
Contributor

Why do we use $crate::io_extras::copy but $reader.eat_remaining()? below?

Collaborator Author

We're not reading the inner reader (which is always a LengthLimitingReader over just the specific TLV's value and thus supports eat_remaining()), but rather the outer reader, which may be of any type, so we can't assume eat_remaining is available.
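
A minimal illustration of the distinction, using only `std` types (the helper name here is illustrative, not LDK's): `std::io::copy` into a sink drains any `Read` implementor, whereas an `eat_remaining`-style helper would need to know the reader's remaining length.

```rust
use std::io::{self, Read};

// Drains whatever is left in an arbitrary reader, returning the number
// of bytes discarded. Works for any `Read` type, unlike a helper that
// assumes a length-limited reader.
fn drain_remaining<R: Read>(reader: &mut R) -> io::Result<u64> {
    io::copy(reader, &mut io::sink())
}
```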

Contributor

Ah, I see. Makes sense.

@valentinewallace
Contributor

LGTM, feel free to squash.

`impl_writeable_tlv_based_enum_upgradable` professed to support
upgrades by returning `None` from `MaybeReadable` when unknown
variants written by newer versions of LDK were read. However, it
generally didn't support this as it didn't discard bytes for
unknown types, resulting in corrupt reading.

This is fixed here for enum variants written as a TLV stream,
however we don't have a length prefix for tuple enum variants, so
the documentation on the macro is updated to mention that
downgrades are not supported for tuple variants.
If we are reading an object that is `MaybeReadable` in a TLV stream
using `upgradable_required`, it may return early with `Ok(None)`.
In this case, it will not read any further TLVs from the TLV
stream. This is fine, except that we generally expect
`MaybeReadable` to always consume the correct number of bytes for
the full object, even if it doesn't understand it.

This could pose a problem, for example, in cases where we're
reading a TLV-stream `MaybeReadable` object inside another
TLV-stream object. In that case, the `MaybeReadable` object may
return `Ok(None)` and not consume all the available bytes, causing
the outer TLV read to fail as the TLV length does not match.
While this is generally not supported, issues in our
`MaybeReadable` implementations may occur, and we should try to be
robust against them.
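
The failure mode described above can be sketched with plain `std::io` types (this is an illustration, not LDK's actual reader machinery): limiting the inner read to the record's stated length and draining whatever the inner parser left behind keeps the outer stream aligned on record boundaries.

```rust
use std::io::{self, Cursor, Read};

// Hypothetical record format: one length byte, then that many body
// bytes. Limiting the inner read with `take` and draining whatever the
// inner parser left keeps the outer cursor at the next record.
fn skip_record(outer: &mut Cursor<&[u8]>) -> io::Result<()> {
    let mut len = [0u8; 1];
    outer.read_exact(&mut len)?;
    let mut limited = outer.by_ref().take(len[0] as u64);
    // ... an inner `MaybeReadable`-style parser might stop early here ...
    io::copy(&mut limited, &mut io::sink())?; // discard unread body bytes
    Ok(())
}
```
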
@TheBlueMatt
Collaborator Author

Squashed with no further changes.

@valentinewallace valentinewallace merged commit eaf76f6 into lightningdevkit:main Apr 8, 2024
16 checks passed
@jkczyz
Contributor

jkczyz commented Apr 9, 2024

FYI, I was having trouble wrapping my mind around the fixes here so I ended up writing some small tests that repro the bugs on main: 8da8295

Might be worth including these in a follow-up PR.

@TheBlueMatt
Collaborator Author

Oh duh, yea.

TheBlueMatt added a commit that referenced this pull request Apr 9, 2024