
Fix unknown handling in impl_writeable_tlv_based_enum_upgradable #2969

Merged

Conversation

TheBlueMatt
Collaborator

impl_writeable_tlv_based_enum_upgradable professed to support upgrades by returning None from MaybeReadable when unknown variants written by newer versions of LDK were read. However, it generally didn't support this as it didn't discard bytes for unknown types, resulting in corrupt reading.

This is fixed here for enum variants written as a TLV stream, however we don't have a length prefix for tuple enum variants, so the documentation on the macro is updated to mention that downgrades are not supported for tuple variants.
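
The distinction can be sketched outside of LDK's macros. The encoding below is hypothetical (a single type byte, a length byte, then the value), not LDK's actual wire format, but it shows why a length prefix is what makes returning `None` for unknown variants safe:

```rust
use std::io::{self, Cursor, Read};

// Hypothetical length-prefixed encoding: type byte, length byte, then
// `length` value bytes. Because the length is known up front, an
// unknown odd type can be skipped cleanly and reported as `None`.
fn read_variant(cursor: &mut Cursor<&[u8]>) -> io::Result<Option<u8>> {
    let mut ty = [0u8; 1];
    cursor.read_exact(&mut ty)?;
    let mut len = [0u8; 1];
    cursor.read_exact(&mut len)?;
    let mut value = vec![0u8; len[0] as usize];
    cursor.read_exact(&mut value)?; // value bytes consumed even if unknown
    match ty[0] {
        // The one known variant here carries a single-byte payload.
        0 => Ok(Some(value[0])),
        // Unknown odd type: ignored, but its bytes are already consumed,
        // so the next read starts at the following record.
        t if t % 2 == 1 => Ok(None),
        t => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unknown even type {}", t),
        )),
    }
}
```

Without the length byte there would be no way to know how many bytes to discard, which is exactly why tuple variants, written without a length prefix, cannot be skipped on downgrade.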

lightning/src/util/ser_macros.rs (outdated)
@@ -1065,6 +1065,10 @@ macro_rules! impl_writeable_tlv_based_enum {
/// when [`MaybeReadable`] is practical instead of just [`Readable`] as it provides an upgrade path for
/// new variants to be added which are simply ignored by existing clients.
///
/// Note that variants written as `$tuple_variant`s will not support downgrading no matter the
/// `$tuple_variant_id` used, however any variants written as a normal variant will support
Contributor

Maybe $variant_id rather than "variants written as a normal variant"? That sentence is a bit of a mouthful with the number of times "variant" is used.

Collaborator Author

I went the other way and went with simply "tuple variants" in the first part of the sentence; referring to the type by the name argument seems weird. Tried to reword it, but not sure it's that much clearer.

@codecov-commenter

codecov-commenter commented Mar 27, 2024

Codecov Report

Attention: Patch coverage is 0%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 91.05%. Comparing base (b8b1ef3) to head (f852d16).
Report is 56 commits behind head on main.

Files Patch % Lines
lightning/src/util/ser_macros.rs 0.00% 5 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2969      +/-   ##
==========================================
+ Coverage   89.51%   91.05%   +1.54%     
==========================================
  Files         117      117              
  Lines       95146   108426   +13280     
  Branches    95146   108426   +13280     
==========================================
+ Hits        85169    98730   +13561     
+ Misses       7767     7534     -233     
+ Partials     2210     2162      -48     


@TheBlueMatt
Collaborator Author

Grrr, yet more issues, see new commits. Not quite sure what to do about this...we could just say "look, downgrade is broken, don't downgrade", but that's kinda shitty. Alternatively, we could ship an 0.0.122 that has just this PR, then say you can't downgrade past that if you upgrade to 0.0.123?

/// Note that variants written as tuple variants will not support downgrading no matter the
/// `$tuple_variant_id` used. Any variants written as a normal variant will support downgrading by
/// having [`MaybeReadable`] return `None` for odd `$variant_id` variants not understood.
/// Note that only normal variants (not tuple variants) will support downgrading, thus any new odd
Contributor

s/normal variants/struct and unit variants

Contributor

Could you address this before squashing?

Collaborator Author

Oops, sorry, somehow I missed this entirely.

lightning/src/util/ser_macros.rs
@valentinewallace
Contributor

Grrr, yet more issues, see new commits. Not quite sure what to do about this...we could just say "look, downgrade is broken, don't downgrade", but that's kinda shitty. Alternatively, we could ship an 0.0.122 that has just this PR, then say you can't downgrade past that if you upgrade to 0.0.123?

Yeah, I don't see what choice we have besides doing one of those :(

It seems possible that certain specific downgrades aren't broken because no upgradable odd TLVs were added in between them. Maybe it would help users to figure out what versions those are?

@valentinewallace
Contributor

FYI, I was having trouble wrapping my mind around the fixes here so I ended up writing some small tests that repro the bugs on main: 8da8295

@TheBlueMatt
Collaborator Author

It seems possible that certain specific downgrades aren't broken because no upgradable odd TLVs were added in between them. Maybe it would help users to figure out what versions those are?

Yea, there are definitely quite a few upgrades/downgrades that are totally fine, but trying to figure it all out would be quite a bit of work. Given we haven't yet received any reports of this issue in the wild (not that that means no one has ever hit it), I'd say it's quite uncommon (then again, in normal operation we probably very rarely serialize with any pending events or monitor events, which is at least where we would break downgrade on the current git HEAD).

We definitely have users who have deployed multi-device, so we generally have to support downgrade (i.e., users switching from an upgraded node to an un-upgraded one), but we can ask our users if they're comfortable with just having an 0.0.122 with this fix (and then waiting a while for 0.0.123) or if they want us to consider some kind of backport to earlier LDK versions.

@MaxFangX
Contributor

MaxFangX commented Apr 5, 2024

Saw the request for some comments on Discord so putting in my two sats here. For us, being able to downgrade is not as hard of a requirement as it would be for say a multi-device wallet (e.g. Mutiny). We operate on a model where each user implicitly approves the latest three Lexe node versions that they have provisioned to. We as the Lexe operators are prevented from running anything other than these three approved versions, which helps protect the user from downgrade attacks, but also gives us (the operators) some flexibility to do some limited downgrades in the more mundane case that one of our releases is broken. In this case we would mark the broken version as 'yanked' in our system and downgrade the user to the latest version they have approved which is not broken.

Just an example, feel free to skip if it's clear already:

  • User has approved 0.1.0 by provisioning to it
  • We release 0.1.1 which the user also approves
  • We discover that 0.1.1 is broken, and mark it as yanked.
    • When we run the user's node we run 0.1.0 to avoid the broken version
    • We also release 0.1.2 with the fix
  • User approves 0.1.2, and since the 'yanked' list is updated in 0.1.2, this prevents Lexe from ever running 0.1.1.
  • Lexe runs 0.1.2 for the user from here on out but still have the option to downgrade to 0.1.0 if needed.
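
The policy described above can be sketched in a few lines (names hypothetical, not Lexe's actual code): given the user's approved versions in release order and a yanked set, run the newest approved version that has not been yanked.

```rust
// Picks the newest approved version that has not been yanked.
// `approved` is assumed to be in ascending release order.
fn pick_version<'a>(approved: &[&'a str], yanked: &[&str]) -> Option<&'a str> {
    approved.iter().copied().rev().find(|v| !yanked.contains(v))
}
```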

If LDK breaks downgrades in a point release, it would prevent us from downgrading the user in the (hopefully very rare) case that one of our releases is broken. This is not a big deal most of the time, since the necessity for downgrades should be rare. But there is a non-zero chance that it affects us.

The worst case scenario for us is if LDK breaks downgrades, and our release with the updated LDK also happens to be irrecoverably broken in some way (possibly due to the LDK release). If any of our users approve the broken version and have their node run by us (thus updating their persisted data), we would not be able to run their node, or could only run the broken version, until we release a fix and wait for the user to re-provision (which could take months if they're not active).

we could ship an 0.0.122 that has just this PR

This seems like a good idea in general, as any LDK changes which could break the node in other ways would be isolated from the change which breaks downgrades. This would allow us (Lexe) to release a version of our own which only includes the LDK version bump, and make sure things are OK before we 'commit' and release other functionality which might break. After confirming that LDK 0.0.122 works fine, we would have a safe version to downgrade to if a newer Lexe release is broken.

I would like to emphasize however that once again, LDK breaking downgrades is not that huge of a deal for Lexe specifically. I think we would actually prefer that LDK break downgrades every once in a while if a downgrade would allow you to improve the codebase and pay down technical debt. The hard work you guys do to provide these guarantees for us is warmly appreciated as always.

@TheBlueMatt TheBlueMatt modified the milestones: 0.0.123, 0.0.122 Apr 6, 2024
// Thus, we consume everything left in the `$outer_reader` here, ensuring that if
// we're being read as a part of another TLV stream we don't spuriously fail to
// deserialize the outer object due to a TLV length mismatch.
$crate::io_extras::copy($outer_reader, &mut $crate::io_extras::sink()).unwrap();
Contributor

Why do we use $crate::io_extras::copy but $reader.eat_remaining()? below?

Collaborator Author

We're not reading the inner reader (which is always a LengthLimitingReader over just the specific TLV's value and thus supports eat_remaining()), but rather the outer reader, which may be of any type, so we can't assume eat_remaining is available.
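
A minimal illustration of the distinction, using only `std` types (the helper name here is illustrative, not LDK's): `std::io::copy` into a sink drains any `Read` implementor, whereas an `eat_remaining`-style helper would need to know the reader's remaining length.

```rust
use std::io::{self, Read};

// Drains whatever is left in an arbitrary reader, returning the number
// of bytes discarded. Works for any `Read` type, unlike a helper that
// assumes a length-limited reader.
fn drain_remaining<R: Read>(reader: &mut R) -> io::Result<u64> {
    io::copy(reader, &mut io::sink())
}
```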

Contributor

Ah, I see. Makes sense.

@valentinewallace
Contributor

LGTM, feel free to squash.

`impl_writeable_tlv_based_enum_upgradable` professed to support
upgrades by returning `None` from `MaybeReadable` when unknown
variants written by newer versions of LDK were read. However, it
generally didn't support this as it didn't discard bytes for
unknown types, resulting in corrupt reading.

This is fixed here for enum variants written as a TLV stream,
however we don't have a length prefix for tuple enum variants, so
the documentation on the macro is updated to mention that
downgrades are not supported for tuple variants.
If we are reading an object that is `MaybeReadable` in a TLV stream
using `upgradable_required`, it may return early with `Ok(None)`.
In this case, it will not read any further TLVs from the TLV
stream. This is fine, except that we generally expect
`MaybeReadable` to always consume the correct number of bytes for
the full object, even if it doesn't understand it.

This could pose a problem, for example, in cases where we're
reading a TLV-stream `MaybeReadable` object inside another
TLV-stream object. In that case, the `MaybeReadable` object may
return `Ok(None)` and not consume all the available bytes, causing
the outer TLV read to fail as the TLV length does not match.
While this is generally not supported, issues in our
`MaybeReadable` implementations may occur, and we should try to be
robust against them.
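
The failure mode described above can be sketched with plain `std::io` types (this is an illustration, not LDK's actual reader machinery): limiting the inner read to the record's stated length and draining whatever the inner parser left behind keeps the outer stream aligned on record boundaries.

```rust
use std::io::{self, Cursor, Read};

// Hypothetical record format: one length byte, then that many body
// bytes. Limiting the inner read with `take` and draining whatever the
// inner parser left keeps the outer cursor at the next record.
fn skip_record(outer: &mut Cursor<&[u8]>) -> io::Result<()> {
    let mut len = [0u8; 1];
    outer.read_exact(&mut len)?;
    let mut limited = outer.by_ref().take(len[0] as u64);
    // ... an inner `MaybeReadable`-style parser might stop early here ...
    io::copy(&mut limited, &mut io::sink())?; // discard unread body bytes
    Ok(())
}
```
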
@TheBlueMatt
Collaborator Author

Squashed with no further changes.

@valentinewallace valentinewallace merged commit eaf76f6 into lightningdevkit:main Apr 8, 2024
16 checks passed
@jkczyz
Contributor

jkczyz commented Apr 9, 2024

FYI, I was having trouble wrapping my mind around the fixes here so I ended up writing some small tests that repro the bugs on main: 8da8295

Might be worth including these in a follow-up PR.

@TheBlueMatt
Collaborator Author

Oh duh, yea.

TheBlueMatt added a commit that referenced this pull request Apr 9, 2024