offset_from: always allow pointers to point to the same address #124921

Draft

RalfJung
Member

@RalfJung RalfJung commented May 9, 2024

This PR implements the last remaining part of the t-opsem consensus in rust-lang/unsafe-code-guidelines#472: always permit offset_from when both pointers have the same address, no matter how they were computed. This is required to achieve provenance monotonicity.

Tracking issue: #117945

TODO: how sure are we that this is future-proof wrt an eventual LLVM ptrsub?

What is provenance monotonicity and why does it matter?

Provenance monotonicity is the property that adding arbitrary provenance to any no-provenance pointer must never make the program UB. More specifically, in the program state, data in memory is stored as a sequence of abstract bytes, where each byte can optionally carry provenance. When a pointer is stored in memory, all of the bytes it is stored in carry that provenance. Provenance monotonicity means: if we take some byte that does not have provenance, and give it some arbitrary provenance, then that cannot change program behavior or introduce UB into a UB-free program.

We care about provenance monotonicity because we want to allow the optimizer to remove provenance-stripping operations. Removing a provenance-stripping operation effectively means the program after the optimization has provenance where the program before the optimization did not -- since the provenance removal does not happen in the optimized program. IOW, the compiler transformation added provenance to previously provenance-free bytes. This is exactly what provenance monotonicity lets us do.

We care about removing provenance-stripping operations because *ptr = *ptr is, in general, (likely) a provenance-stripping operation. Specifically, consider ptr: *mut usize (or any integer type), and imagine the data at *ptr is actually a pointer (i.e., we are type-punning between pointers and integers). Then *ptr on the right-hand side evaluates to the data in memory without any provenance (because integers do not have provenance). Storing that back to *ptr means that the abstract bytes ptr points to are the same as before, except their provenance is now gone. This makes *ptr = *ptr a provenance-stripping operation. (Here we assume *ptr is fully initialized. If it is not initialized, evaluating *ptr to a value is UB, so removing *ptr = *ptr is trivially correct.)
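
The type-punning scenario above can be sketched in a few lines of Rust. This is only an illustration under the model described in this PR: the function name `roundtrip_preserves_address` and the variable `slot` are made up for the example, and the claim that the store strips provenance is the "(likely)" semantics discussed above, not something the code can observe at run time.

```rust
/// Performs `*ptr = *ptr` at integer type on a slot that actually holds a pointer.
/// Returns true if the address stored in the slot is unchanged afterwards.
fn roundtrip_preserves_address() -> bool {
    let x = 42_i32;
    let mut slot: *const i32 = &x;
    let original = slot;

    // Type-pun: view the pointer-sized slot as an integer.
    let ptr = (&mut slot as *mut *const i32).cast::<usize>();

    // SAFETY: `ptr` is aligned and points to initialized, pointer-sized memory.
    // The load yields a plain integer (no provenance); the store writes it back.
    unsafe { *ptr = *ptr };

    // The address bits are the same as before. Under the model discussed above,
    // the bytes of `slot` may no longer carry provenance, so this sketch
    // deliberately does not dereference `slot` after the round trip.
    slot == original
}

fn main() {
    assert!(roundtrip_preserves_address());
}
```

An optimizer that deletes the `*ptr = *ptr` line leaves the original provenance in place, which is exactly the "adding provenance" direction that provenance monotonicity is meant to justify.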

What does offset_from have to do with provenance monotonicity?

With ptr = without_provenance(N), ptr.offset_from(ptr) is always well-defined and returns 0. By provenance monotonicity, I can now add provenance to the two arguments of offset_from and it must still be well-defined. Crucially, I can add different provenance to the two arguments, and it must still be well-defined. In other words, this must always be allowed: ptr1.with_addr(N).offset_from(ptr2.with_addr(N)) (and it returns 0). But the current spec for offset_from says that the two pointers must either both be derived from an integer or both be derived from the same allocation, which is not in general true for arbitrary ptr1, ptr2.
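
A minimal sketch of the two cases, assuming the semantics proposed in this PR. To stay on long-stable APIs, it uses two hypothetical helpers as stand-ins: `no_provenance` (for `without_provenance`) and `with_addr_sketch` (for `with_addr`); the address `N = 0x1000` is arbitrary. Unlike the real `with_addr`, the stand-in reads the address with an `as usize` cast, which is fine for illustration but not equivalent in every respect.

```rust
use std::ptr;

/// Arbitrary address used for the illustration.
const N: usize = 0x1000;

/// Hypothetical stand-in for `without_provenance(addr)`: the null pointer
/// carries no provenance, and `wrapping_add` only changes the address.
fn no_provenance(addr: usize) -> *const u8 {
    ptr::null::<u8>().wrapping_add(addr)
}

/// Hypothetical stand-in for `p.with_addr(addr)`: keeps the provenance of `p`
/// and moves it to address `addr` using wrapping arithmetic.
fn with_addr_sketch(p: *const u8, addr: usize) -> *const u8 {
    p.wrapping_add(addr.wrapping_sub(p as usize))
}

/// Case 1: both arguments are the same no-provenance pointer.
fn distance_without_provenance() -> isize {
    let p = no_provenance(N);
    // SAFETY: same pointer on both sides; the distance is zero.
    unsafe { p.offset_from(p) }
}

/// Case 2: same address, but provenance taken from two different allocations.
fn distance_with_different_provenance() -> isize {
    let a = 1_u8;
    let b = 2_u8;
    let p1 = with_addr_sketch(&a, N);
    let p2 = with_addr_sketch(&b, N);
    // SAFETY: relies on the rule this PR proposes, namely that `offset_from`
    // is well-defined whenever both pointers have the same address.
    unsafe { p1.offset_from(p2) }
}

fn main() {
    assert_eq!(distance_without_provenance(), 0);
    assert_eq!(distance_with_different_provenance(), 0);
}
```

Case 2 is obtained from case 1 purely by adding provenance to the arguments, so if case 1 is defined, provenance monotonicity requires case 2 to be defined as well.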

To obtain provenance monotonicity, this PR hence changes the spec for offset_from to say that if both pointers have the same address, the function is always well-defined.

What further consequences does this have?

It means the compiler can no longer transform end2 = begin.offset(end.offset_from(begin)) into end2 = end. However, it can still be transformed into end2 = begin.with_addr(end.addr()), which later parts of the backend (when provenance has been erased) can trivially turn into end2 = end.
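
The round trip in question can be sketched as follows. The function `roundtrip_end` is hypothetical, and the third pointer emulates begin.with_addr(end.addr()) with wrapping arithmetic rather than calling the real method. In this example begin and end come from the same allocation, so all three results are interchangeable; the transformation to end2 = end is only a problem in general, where end may carry different provenance than begin and end2 inherits the provenance of begin.

```rust
/// Returns (end, end2, end3) for a byte slice, where
/// end2 = begin.offset(end.offset_from(begin)) and
/// end3 emulates begin.with_addr(end.addr()).
fn roundtrip_end(data: &[u8]) -> (*const u8, *const u8, *const u8) {
    let begin = data.as_ptr();

    // SAFETY: one-past-the-end of the same allocation is in bounds for `add`.
    let end = unsafe { begin.add(data.len()) };

    // SAFETY: both pointers are derived from `data`, and the result stays
    // within the bounds of (or one past the end of) that allocation.
    let end2 = unsafe { begin.offset(end.offset_from(begin)) };

    // Provenance of `begin`, address of `end`. The `as usize` casts are a
    // simplification standing in for `addr()`.
    let end3 = begin.wrapping_add((end as usize).wrapping_sub(begin as usize));

    (end, end2, end3)
}

fn main() {
    let (end, end2, end3) = roundtrip_end(&[1, 2, 3, 4]);
    // Raw pointer equality compares addresses only.
    assert_eq!(end, end2);
    assert_eq!(end2, end3);
}
```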

The only alternative I am aware of is a fundamentally different handling of zero-sized accesses, where a "no provenance" pointer is not allowed to do zero-sized accesses and instead we have a special provenance that indicates "may be used for zero-sized accesses (and nothing else)". offset and offset_from would then always be UB on a "no provenance" pointer, and permit zero-sized offsets on a "zero-sized provenance" pointer. This achieves provenance monotonicity. That is, however, a breaking change as it contradicts what we landed in #117329. It's also a whole bunch of extra UB, which doesn't seem worth it just to achieve that transformation.

@rustbot
Collaborator

rustbot commented May 9, 2024

r? @oli-obk

rustbot has assigned @oli-obk.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 9, 2024
@RalfJung RalfJung added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 9, 2024
@RalfJung RalfJung changed the title offset_from intrinsic: always allow pointers to point to the same addreess offset_from intrinsic: always allow pointers to point to the same address May 9, 2024
@RalfJung RalfJung force-pushed the offset-from-same-addr branch 2 times, most recently from 5862cbb to 6d7b305 Compare May 9, 2024 16:13
@RalfJung RalfJung changed the title offset_from intrinsic: always allow pointers to point to the same address offset_from: always allow pointers to point to the same address May 9, 2024
@bors
Contributor

bors commented May 9, 2024

☔ The latest upstream changes (presumably #124934) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Contributor

bors commented May 13, 2024

☔ The latest upstream changes (presumably #124914) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung
Member Author

#117329 landed, so we can consider this now.

@nikic the only thing giving me pause here is compatibility with a potential future ptrsub operation in LLVM. If such an operation gets added, do you think there will be a variant of it (with the right flags set) that matches the semantics given in this PR? I think all the provenance monotonicity arguments also apply to LLVM, but it's hard to say as LLVM hasn't decided yet how integer-typed loads of data with provenance behave.

Labels
S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.