Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FromBytes::(mut_)?slice_from_[prefix|suffix] also returns the prefix/suffix instead of just the read slice #884

Closed
kupiakos opened this issue Feb 15, 2024 · 6 comments
Labels
compatibility-breaking Changes that are (likely to be) breaking customer-request Documents customer requests. documentation Improvements or additions to documentation experience-easy This issue is easy, and shouldn't require much experience good first issue Good for newcomers

Comments

@kupiakos
Copy link
Contributor

kupiakos commented Feb 15, 2024

  • slice_from_prefix returns Option<(&[Self], &[u8])>
  • slice_from_suffix returns Option<(&[u8], &[Self])>
  • mut_slice_from_prefix returns Option<(&mut [Self], &mut [u8])>
  • mut_slice_from_suffix returns Option<(&mut [u8], &mut [Self])>

However, all of the other methods drop the prefix/suffix:

  • FromBytes::read_from_prefix returns Option<Self>
  • FromBytes::read_from_suffix returns Option<Self>
  • FromBytes::ref_from_prefix returns Option<&Self>
  • FromBytes::ref_from_suffix returns Option<&Self>
  • FromBytes::mut_from_prefix returns Option<&mut Self>
  • FromBytes::mut_from_suffix returns Option<&mut Self>

This is desirable: the majority of the time I drop the prefix anyways and these are convenience methods for methods in Ref.
[ref|mut]_from_[prefix|suffix] in their docs both point to using Ref to preserve the prefix/suffix.

@kupiakos kupiakos added customer-request Documents customer requests. documentation Improvements or additions to documentation good first issue Good for newcomers experience-easy This issue is easy, and shouldn't require much experience compatibility-breaking Changes that are (likely to be) breaking labels Feb 15, 2024
dorryspears added a commit to dorryspears/zerocopy that referenced this issue Feb 18, 2024
dorryspears added a commit to dorryspears/zerocopy that referenced this issue Feb 18, 2024
@joshlf joshlf mentioned this issue Feb 21, 2024
51 tasks
@joshlf
Copy link
Member

joshlf commented Feb 21, 2024

This will be a breaking change, so we'll need to make sure we either address it as part of 0.8 or punt it to a future release. I've added it to #671 to make sure we make a decision one way or another before we release 0.8.

@joshlf
Copy link
Member

joshlf commented Mar 27, 2024

Heads up about #1051, which relates to this, and about #1059, which resolves #1051 in the opposite direction from this request.

Unless we want even more method duplication (which we'd really like to avoid), it seems that we'll need to just pick one of the following two approaches:

  • The FromBytes methods drop the suffix/prefix, and users who want them need to use Ref
  • The FromBytes methods return the suffix/prefix, and users who don't need them need to explicitly discard them

#1059 takes the latter approach. From a survey of uses in the ecosystem, it seems that needing the suffix/prefix is more common than not needing it. We also feel like discarding the suffix/prefix is more ergonomic than using the Ref API equivalents.

@kupiakos want to make sure you're aware that this is something we're considering and give you a chance to chime in.

@kupiakos
Copy link
Contributor Author

kupiakos commented Mar 29, 2024

From a survey of uses in the ecosystem

How was this survey performed? Via usage of the current unstable FromBytes APIs in open-source code? Did you also include usage of the exact-size APIs that are preceded by a .get(..SIZE)?

it seems that needing the suffix/prefix is more common than not needing it

I don't doubt that keeping the suffix is a common operation, especially in networking code. Firmware, unfortunately, tends to not be public. We need something ergonomic to translate something like this C code, which is extremely prevalent, into fully safe Rust:

int foo(char *data, size_t size) {
    if size < sizeof(struct Foo) {
        return ERROR_TOO_SMALL;
    }
    // Foo is a packed struct.
    struct Foo foo* = (Foo*)data;
    ...
}

Essentially, we want the safe version of a bounds-check followed by a pointer cast (also checking alignment if necessary). Requiring an exact size for the buffer rather than the buffer simply be large enough is rare and unnecessarily restrictive. When working with embedded code bases, it's pretty common to have a single buffer scratch space to do operations that may be larger than is necessary, or is reused as part of the response, such as with TPM.

Doing extra work and then throwing it away is antithetical to good embedded software design. Not only are methods that return more than two pointers a risk for code size bloat (even with inlining), but splitting currently introduces a panic path. The optimizer has an extremely hard job, and it is ideal to limit how much it needs to consider in order to produce optimal results, especially compared to the original C.

The mut_from_prefix method as-is currently uses Ref::new_from_prefix but that puts extra burden on the optimizer. I plan on rewriting it to instead start with a bytes.get(..size_of::<Self>()). Splitting the bytes is an extra operation, and we should only do that if we actually need to do it.

@joshlf
Copy link
Member

joshlf commented Apr 8, 2024

I don't think we can promise anything regarding what code we write internal to zerocopy. We currently make heavy use of the optimizer so that we can have reasonable internal abstractions, and it would be a big blow to the quality and reliability of our codebase to have to start reimplementing things by hand, especially when that involves adding rather than removing instances of the unsafe keyword.

That said, I think there's a third way. Would the following work for your use case? It uses unrelated APIs in zerocopy and provides much clearer visibility into exactly which operations are being performed. It would presumably be straightforward to wrap in a utility macro/method if desired, which would allow you to write roughly the code you currently write, but with the guarantees you're looking for.

// runtime check: do we have sufficient bytes?
let Ok(bytes) = <&[u8; size_of::<Foo>()]>::try_from(buffer) else {
    panic!("wrong number of bytes");
};

// no runtime checks
let des: &Unalign<Foo> = transmute_ref!(bytes);

// runtime check: is the deserialization well-aligned?
let Some(des) = des.try_deref() else {
    panic!("wrong alignment");
};

In particular:

  • TryFrom<&[u8]> for &[u8; N] is in the standard library
  • transmute_ref! desugars to a single transmute
  • Unalign::try_deref performs a single alignment check

@joshlf
Copy link
Member

joshlf commented Apr 19, 2024

Related: #1125

@joshlf
Copy link
Member

joshlf commented Apr 24, 2024

Closing in favor of #1051, which was implemented in #1059

@joshlf joshlf closed this as completed Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compatibility-breaking Changes that are (likely to be) breaking customer-request Documents customer requests. documentation Improvements or additions to documentation experience-easy This issue is easy, and shouldn't require much experience good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants