Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backtraces do not work for binaries loaded through ld.so #101913

Open
jonhoo opened this issue Sep 16, 2022 · 12 comments
Open

Backtraces do not work for binaries loaded through ld.so #101913

jonhoo opened this issue Sep 16, 2022 · 12 comments
Assignees
Labels
C-bug Category: This is a bug. E-needs-test Call for participation: Writing correctness tests. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jonhoo
Copy link
Contributor

jonhoo commented Sep 16, 2022

When a Rust binary that is executed through ld.so — the Linux dynamic linker — triggers a panic, the backtraces are useless. I've included a discussion of why executing through ld.so is valuable below, but first, to reproduce, build a simple binary that panics on execution:

$ cargo new --bin ld-so-backtraces
     Created binary (application) `ld-so-backtraces` package
$ cd ld-so-backtraces
$ echo 'fn main() { panic!() }' > src/main.rs
$ cargo b
   Compiling ld-so-backtraces v0.1.0 (/local/home/jongje/ld-so-backtraces)
    Finished dev [unoptimized + debuginfo] target(s) in 0.46s

Backtraces work fine when executing the binary directly:

$ env RUST_BACKTRACE=1 $PWD/target/debug/ld-so-backtraces
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
   2: core::panicking::panic
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:48:5
   3: ld_so_backtraces::main
             at ./src/main.rs:1:13
   4: core::ops::function::FnOnce::call_once
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

But when executing through ld.so, the debug symbols vanish:

$ /usr/lib/"ld-linux-$(uname -m).so.1" $PWD/target/debug/ld-so-backtraces
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
$ env RUST_BACKTRACE=1 /usr/lib/"ld-linux-$(uname -m).so.1" $PWD/target/debug/ld-so-backtraces
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
$ env RUST_BACKTRACE=full /usr/lib/"ld-linux-$(uname -m).so.1" $PWD/target/debug/ld-so-backtraces
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
stack backtrace:
   0:     0xffffab78c468 - stpcpy
   1:     0xffffab7a3b70 - tunable_list
   2:     0xffffab78a7b4 - __getcwd
   3:     0xffffab78da00 - <unknown>
   4:     0xffffab78d754 - <unknown>
   5:     0xffffab78df4c - <unknown>
   6:     0xffffab78ddf8 - <unknown>
   7:     0xffffab78c910 - getauxval
   8:     0xffffab78db80 - <unknown>
   9:     0xffffab779e60 - _dl_map_object_from_fd
  10:     0xffffab779d7c - _dl_map_object_from_fd
  11:     0xffffab77a0d0 - _dl_map_object_from_fd
  12:     0xffffab77a2e0 - open_verify.constprop.7
  13:     0xffffab77a170 - _dl_map_object_from_fd
  14:     0xffffab77a21c - _dl_map_object_from_fd
  15:     0xffffab788470 - __tunable_set_val
  16:     0xffffab77a1e8 - _dl_map_object_from_fd
  17:     0xffffab77a0fc - _dl_map_object_from_fd
  18:     0xffffab584da4 - __libc_start_main
  19:     0xffffab779fe4 - _dl_map_object_from_fd

I suspect this happens because backtraces attempt to load debug symbols through /proc/self/exe, but /proc/self/exe points to ld.so in this case, not the real binary. I don't know of any good solutions to this, but one solution is to prefer argv[0] if /proc/self/exe is ld.so. It won't be perfect (callers need to use exec -a among other things), but it'll be better than what happens today. I suspect this happens with other execution wrappers than just ld.so (does Valgrind do this for certain kinds of execution for example?), but I don't have any concrete examples.

Why ld.so?

Where I work, we execute binaries through the dynamic linker so that we can explicitly set the shared library search path to ensure that the same shared libraries used to build a given binary are used when it is executed. It has a similar effect as setting LD_LIBRARY_PATH, but with the added benefit that it does not also set the shared library search path of any programs executed down the line. There is a lot of underlying complexity here that is probably not worth digging too much into, but the idea is that a binary should ship "with its environment" (think Nix-style binaries), and so if the built Rust binary and some binary it in turn invokes are built separately, they should use different shared library search paths, namely the ones that map to exactly what they were built with. Executing through the dynamic linker achieves that, LD_LIBRARY_PATH does not. (This is also done by things like go-appimage).

Cargo has a similar issue, though its solution is going to be a little different since it can also be used a library: rust-lang/cargo#10119.

rust-analyzer also run into issues with ld.so, though from what I can tell there are fixes in the pipeline there already.

Meta

rustc --version --verbose:

rustc 1.63.0 (4b91a6ea7 2022-08-08)
binary: rustc
commit-hash: 4b91a6ea7258a947e59c6522cd5898e7c0a6a88f
commit-date: 2022-08-08
host: aarch64-unknown-linux-gnu
release: 1.63.0
LLVM version: 14.0.5
@jonhoo jonhoo added the C-bug Category: This is a bug. label Sep 16, 2022
@stephenrkell
Copy link

(Thanks @jryans for pointing me at this issue.)

If I've understood, the main problem is getting the file path associated with a particular region in the program's virtual address space. A secondary problem is that the link map leaves out the executable's filename (it is recorded as the empty string), and the target of /proc/self/exe is not a correct substitute in this run-via-ld.so case.

The only reliable way I know to solve the main problem is to use /proc/self/maps or the equivalent (e.g. FreeBSD has a sysctl KERN_PROC_VMMAP). This would sidestep the secondary problem.

For example, a backtrace routine could look for the /proc/self/maps line that covers the program counter in each frame. You would still need to match that against a link map entry, to get the object's base address, but at least you would be sure of the filename. (While you can get a fair way using just dl_iterate_phdr(), dladdr() or similar, I'm not aware of a public interface that will get the 'real' executable filename out of the dynamic linker in this scenario -- if it even saves it privately anywhere, which is probably not guaranteed.)

I've grappled with this sort of thing in my liballocs and librunt projects. E.g. there is a gotcha about reading the maps file: if your program might map memory while reading it, the file might change under your feet.

@the8472
Copy link
Member

the8472 commented Oct 17, 2022

While you can get a fair way using just dl_iterate_phdr(), dladdr() or similar, I'm not aware of a public interface that will get the 'real' executable filename out of the dynamic linker in this scenario

Would that be enough to get the debug info though? even if the file name wouldn't resolve, at least the symbols would.

@stephenrkell
Copy link

Would that be enough to get the debug info though? even if the file name wouldn't resolve, at least the symbols would.

If you don't want to call out to the filesystem, then only the dynamic linking symbols are available (.dynsym but not .symtab). Exactly which symbols that includes depends on how the binary was linked... in executables, you can get away without exporting much, so sticking with this limitation will not reliably give good backtraces.

But 'debugging information' usually doesn't refer to either of these sections, but rather to the more detailed information that is generated for a debugger's benefit (.debug_info and friends, if using DWARF). If you want reliably good backtraces then it's worth looking at that, as even .symtab isn't enough to give you a symbolic name for all code. You need to go to the filesystem to get either of those, anyway.

@pnkfelix
Copy link
Member

@stephenrkell wrote:

The only reliable way I know to solve the main problem is to use /proc/self/maps or the equivalent (e.g. FreeBSD has a sysctl KERN_PROC_VMMAP). This would sidestep the secondary problem.

I've started looking at this approach. It looks like it may be reasonably straight-foward to implement (something on the order of 100 lines of code in a standalone prototype), and does seem to resolve the ld.so case outlined by @jonhoo .

I'm going to see if I can make the corresponding change in https://github.com/rust-lang/backtrace-rs now

@pnkfelix pnkfelix self-assigned this Apr 14, 2023
@pnkfelix pnkfelix added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Apr 14, 2023
@pnkfelix
Copy link
Member

(This might be fixed, as I believe that the change from rust-lang/backtrace-rs#488 has been pulled into the current stable rust, and may just need a test. But I haven't checked carefully yet.)

@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-low

@rustbot rustbot added P-low Low priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Apr 19, 2023
@pnkfelix
Copy link
Member

Weird:

Things seem to work here when I use the nightly build, but not when I use the 1.70 build:

% cat Cargo.toml
[package]
name = "ld-so-backtraces"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
% cat src/main.rs
fn main() { panic!() }
% rustc +1.70 --version
rustc 1.70.0 (90c541806 2023-05-31)
% rustc +nightly --version
rustc 1.72.0-nightly (2d0aa5768 2023-06-18)
% cargo +1.70 build
    Finished dev [unoptimized + debuginfo] target(s) in 0.18s
% RUST_BACKTRACE=1 /lib64/ld-linux-x86-64.so.2 ./target/debug/ld-so-backtraces
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
% cargo +nightly build
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
% RUST_BACKTRACE=1 /lib64/ld-linux-x86-64.so.2 ./target/debug/ld-so-backtraces
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
stack backtrace:
   0: rust_begin_unwind
             at /rustc/2d0aa57684e10f7b3d3fe740ee18d431181583ad/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/2d0aa57684e10f7b3d3fe740ee18d431181583ad/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/2d0aa57684e10f7b3d3fe740ee18d431181583ad/library/core/src/panicking.rs:117:5
   3: ld_so_backtraces::main
             at ./src/main.rs:1:13
   4: core::ops::function::FnOnce::call_once
             at /rustc/2d0aa57684e10f7b3d3fe740ee18d431181583ad/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
%

So I must need to do something deeper here than what I had expected. Looking.

@pnkfelix
Copy link
Member

@rustbot label -P-low +P-medium

@rustbot rustbot added P-medium Medium priority and removed P-low Low priority labels Jun 19, 2023
@pnkfelix
Copy link
Member

Curiouser and cusiouser:

% cat src/main.rs
fn main() { panic!() }
% rustc +stable --version
rustc 1.70.0 (90c541806 2023-05-31)
% rustc +stable -g src/main.rs -o main
% RUST_BACKTRACE=1 /lib64/ld-linux-x86-64.so.2 ./main
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
% rustc +beta --version
rustc 1.71.0-beta.3 (78a6ac0a8 2023-06-08)
% rustc +beta -g src/main.rs -o main
% RUST_BACKTRACE=1 /lib64/ld-linux-x86-64.so.2 ./main
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
stack backtrace:
   0: std::panicking::begin_panic
             at /rustc/78a6ac0a86c3feda182dea496c99fc7f866a22e9/library/std/src/panicking.rs:625:12
   1: main::main
             at ./src/main.rs:1:13
   2: core::ops::function::FnOnce::call_once
             at /rustc/78a6ac0a86c3feda182dea496c99fc7f866a22e9/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
%

I.e. it works in the beta (and nightly) channels, but not in the stable channel.

Why?

  • Was I just wrong about the timing of when the fix would be incorporated into the stable release?
  • Or is there something fundamentally different about how stable is built, compared to both beta and nightly (and local development builds) that would cause the behavior to differ here...

@pnkfelix
Copy link
Member

pnkfelix commented Jun 22, 2023

Hmm. My change to backtrace-rs only just "recently" made it into a release of that crate, in backtrace-rs 0.3.67, back in December 2022.

and the rust project's Cargo.lock file only "recently" was bumped to use that version of backtrace-rs, in #106704 (see 7c8c9cf#diff-13ee4b2252c9e516a0547f2891aa2105c3ca71c6d7a1e682c69be97998dfc87eR215 )

Now, I had thought that the rust project itself was using backtrace-rs as a submodule, and that would cause us to pull in the development version of backtrace that is referenced by the git metadata when it comes to how the stdlib computes backtraces. But the discovery in my earlier comment that the beta branch "works", while the stable branch does not, is leading me to wonder if I've misunderstood how backtrace-rs is linked in (and thus misunderstood which version of backtrace-rs would be linked in...)

This is all "fine", in the sense that I'll be very happy if this has all just been a big misunderstanding on my part of how long it would take for my fix to make it into Rust itself (in part because I wasn't aggressively pushing for changes to version of backtrace-rs in the Cargo.lock file, and that was because, as mentioned above, I didn't realize they would be necessary...)

@pnkfelix
Copy link
Member

pnkfelix commented Jun 22, 2023

Ah, okay, brilliant. Thanks to the awesome https://github.com/kennytm/rustup-toolchain-install-master I was able to just compare the behavior of the compiler immediately before and after PR #106704, where before is 9c51cf7 and after is 39c6804

% rustup-toolchain-install-master 39c6804b92aa202369e402525cee329556bc1db0
detecting the channel of the `39c6804b92aa202369e402525cee329556bc1db0` toolchain...
downloading <https://ci-artifacts.rust-lang.org/rustc-builds/39c6804b92aa202369e402525cee329556bc1db0/rustc-nightly-x86_64-unknown-linux-gnu.tar.xz>...
77.32 MB / 77.32 MB [=============================================================================================================] 100.00 % 17.19 MB/s
downloading <https://ci-artifacts.rust-lang.org/rustc-builds/39c6804b92aa202369e402525cee329556bc1db0/rust-std-nightly-x86_64-unknown-linux-gnu.tar.xz>...
29.54 MB / 29.54 MB [=============================================================================================================] 100.00 % 14.49 MB/s
toolchain `39c6804b92aa202369e402525cee329556bc1db0` is successfully installed!
% rustup-toolchain-install-master 9c51cf7e7ffd4ca1b7159657a76e1d4fbab18276
detecting the channel of the `9c51cf7e7ffd4ca1b7159657a76e1d4fbab18276` toolchain...
downloading <https://ci-artifacts.rust-lang.org/rustc-builds/9c51cf7e7ffd4ca1b7159657a76e1d4fbab18276/rustc-nightly-x86_64-unknown-linux-gnu.tar.xz>...
77.45 MB / 77.45 MB [=============================================================================================================] 100.00 % 17.24 MB/s
downloading <https://ci-artifacts.rust-lang.org/rustc-builds/9c51cf7e7ffd4ca1b7159657a76e1d4fbab18276/rust-std-nightly-x86_64-unknown-linux-gnu.tar.xz>...
29.39 MB / 29.39 MB [=============================================================================================================] 100.00 % 14.47 MB/s
toolchain `9c51cf7e7ffd4ca1b7159657a76e1d4fbab18276` is successfully installed!
% rustc +9c51cf7e7ffd4ca1b7159657a76e1d4fbab18276 -g src/main.rs -o main
% RUST_BACKTRACE=1 /lib64/ld-linux-x86-64.so.2 ./main
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
% rustc +39c6804b92aa202369e402525cee329556bc1db0 -g src/main.rs -o main
% RUST_BACKTRACE=1 /lib64/ld-linux-x86-64.so.2 ./main
thread 'main' panicked at 'explicit panic', src/main.rs:1:13
stack backtrace:
   0: std::panicking::begin_panic
             at /rustc/39c6804b92aa202369e402525cee329556bc1db0/library/std/src/panicking.rs:610:12
   1: main::main
             at ./src/main.rs:1:13
   2: core::ops::function::FnOnce::call_once
             at /rustc/39c6804b92aa202369e402525cee329556bc1db0/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
%

So that's enough to tell me that this was indeed fixed by #106704, and that we should see the effects of that fix flow into Rust 1.71.

(I had erroneously thought that 1.67 would be the first release that had the fixed behavior.)

@pnkfelix
Copy link
Member

(at this point I think the only remaining task here is to write a test. It will probably need to be a run-make test, due to the very specialized nature of how the test will need to invoke the binary and then post-process the output...)

@bjorn3 bjorn3 added the E-needs-test Call for participation: Writing correctness tests. label Jul 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. E-needs-test Call for participation: Writing correctness tests. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants