read: check that sizes are smaller than the file when reading #630

symphorien · 2024-02-10T17:24:33Z

related to #204

Added a regression test case for a file that used to make the reader attempt to allocate 5 exabytes of memory.

test files added in gimli-rs/object-testfiles#12
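For readers following along, the idea in the PR title can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `read_bytes_checked` is a hypothetical helper, and it assumes a seekable reader whose total length can be obtained by seeking to the end.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads `size` bytes starting at `offset`, rejecting any range that extends
/// past the end of the stream *before* allocating a buffer for it.
/// Hypothetical helper for illustration; not part of the `object` crate.
pub fn read_bytes_checked<R: Read + Seek>(
    reader: &mut R,
    offset: u64,
    size: u64,
) -> io::Result<Vec<u8>> {
    // Total length of the underlying stream.
    let len = reader.seek(SeekFrom::End(0))?;

    // checked_add guards against `offset + size` wrapping around.
    let end = offset
        .checked_add(size)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "range overflows u64"))?;
    if end > len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "requested range exceeds file size",
        ));
    }
    if size > usize::MAX as u64 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "size does not fit in usize",
        ));
    }

    // At this point `size <= len`, so the allocation is bounded by the real file size.
    let mut buf = vec![0u8; size as usize];
    reader.seek(SeekFrom::Start(offset))?;
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

With a check like this, a corrupt header that claims a multi-exabyte section yields an error instead of an allocation attempt.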

tests/read/elf.rs

philipc

Thanks! I think this is a good fix.

What's the motivation for the second test? It's fine to have it, but it appears to work prior to this fix.

Can you fix the CI failures and update the submodule to the merged commit.

philipc · 2024-02-11T03:46:57Z

Note that ReadCache is a bit of a hack. It's better to use memmap if you can. (This might be worth documenting.)

symphorien · 2024-02-11T10:34:38Z

> Can you fix the CI failures and update the submodule to the merged commit.

done

> What's the motivation for the second test? It's fine to have it, but it appears to work prior to this fix.

My initial implementation had an off-by-one error, and get_buildid would always return Err(), even on legitimate binaries. The test is there to ensure that ReadCache can succeed.

symphorien · 2024-02-11T10:48:23Z

> Note that ReadCache is a bit of a hack. It's better to use memmap if you can. (This might be worth documenting.)

If that's the case, then yes, please document it, along with the preferred alternatives. My use case is finding the build ID of thousands of ELF files in parallel. When I consulted the documentation of object, it seemed that I needed to call File::parse, and so needed an implementation of ReadRef.

This leaves me with three options:

mmap: most Rust mmap wrappers come with huge unsafe warnings in their README
reading the whole file into memory, which seemed wasteful at best
ReadCache, which is built into object

You can imagine that the third option seemed better at the time.

After using ReadCache in my code, I noticed that memory usage was very spiky because of ReadCache, so I actually gave an mmap implementation a try. I quickly abandoned the experiment, and if memory serves, these were the reasons:

it was slower
mmap would often (1 file out of ten) yield unexpected error codes like EINVAL (I don't remember exactly which)
spooky unsafe (please don't SIGBUS my process!)

I would be happy to discuss potential solutions or compare things more seriously and report the result if you are interested. (But maybe not on this PR)

philipc · 2024-02-11T11:51:19Z

I default to using mmap and have never had problems with it. I sometimes use fs::read for small things that don't care about performance.

> it was slower

I guess that might be true if you are only using this to read the build ID, where the total amount that you need to read is small.

> mmap would often (1 file out of ten) yield unexpected error codes like EINVAL (I don't remember exactly which)

I haven't encountered that, and I don't know what could cause it. I'm interested in learning how to reproduce this.

> spooky unsafe (please don't SIGBUS my process!)

Yes, there are technically ways that this can be unsound, but mmap has a long history of being a useful tool. In isolation you'll never have a problem; the problem occurs when other code or other processes modify the file while it is mapped. I don't think that's a reason not to use it, though, as long as you are aware of this limitation and evaluate how likely it is to occur for your use case. If you do know that the file will be modified, you'll have to take steps to prevent that from happening while you read, such as using locks.


symphorien · 2024-02-11T14:16:28Z

I tried again, and ReadCache is 47% faster for my use case. I also found the source of the EINVAL: quite a large proportion of the files were empty, and mmapping an empty file results in EINVAL.
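One way to sidestep the empty-file EINVAL is to check the file length before any mapping is attempted. The sketch below uses only the standard library; `MapDecision` and `decide_mapping` are hypothetical names introduced for illustration, and the actual mapping call (through whichever mmap wrapper is in use) is deliberately left out.

```rust
use std::fs::File;
use std::io;

/// What to do with an input file before handing it to a memory-mapping wrapper.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub enum MapDecision {
    /// Zero-length file: mapping it would fail, so skip it (it cannot be an ELF file anyway).
    SkipEmpty,
    /// Non-empty file with the given length in bytes.
    Map(u64),
}

/// Inspects the file's metadata and decides whether mapping is worth attempting.
pub fn decide_mapping(file: &File) -> io::Result<MapDecision> {
    let len = file.metadata()?.len();
    Ok(if len == 0 {
        MapDecision::SkipEmpty
    } else {
        MapDecision::Map(len)
    })
}
```

Note that this only filters out empty inputs; it does nothing about the concurrent-modification concern raised earlier in the thread.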

symphorien · 2024-02-11T14:20:15Z

Also, about getting SIGBUS in case of concurrent modification of the file: I suppose that if you are reimplementing a CLI tool, the difference between SIGBUS and getting a real error is just the quality of the error message. But I'm implementing a server (debuginfod), and crashing the whole long-lived process because of a single "wrong" file is quite annoying in this case. I suppose I could avoid that by forking and using the child process as a sacrificial victim just in case, but the increase in complexity is off-putting.

symphorien · 2024-03-03T12:32:51Z

Would it be possible to get a new release with this change?

philipc · 2024-03-05T03:35:13Z

Published 0.33.0

This was referenced Feb 10, 2024

Don't use Vec::with_capacity with untrusted lengths #204

Closed

Tries to allocate 5.6EB of memory when parsing yara.src symphorien/nixseparatedebuginfod#13

Closed

bjorn3 reviewed Feb 10, 2024


tests/read/elf.rs (outdated, resolved)

symphorien force-pushed the read_size_bound_check branch from ee5318d to 6ef1345 Compare February 10, 2024 17:31

philipc requested changes Feb 11, 2024


symphorien force-pushed the read_size_bound_check branch from 6ef1345 to 04dde3d Compare February 11, 2024 10:32

philipc approved these changes Feb 11, 2024


philipc merged commit f1a0ec9 into gimli-rs:master Feb 11, 2024
11 of 12 checks passed

read: check that sizes are smaller than the file when reading

04dde3d

related to gimli-rs#204

philipc mentioned this pull request Feb 12, 2024

read: use Vec::try_reserve_exact for large allocations #632

Merged
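The follow-up's title names Vec::try_reserve_exact, a standard-library method that reports allocation failure as an error rather than aborting the process. A minimal sketch of that pattern follows; `alloc_buffer` is a hypothetical helper and this is not the code from #632.

```rust
use std::collections::TryReserveError;

/// Allocates a zeroed buffer of `len` bytes, returning an error if the
/// allocation cannot be satisfied instead of aborting.
/// Hypothetical helper for illustration only.
pub fn alloc_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf: Vec<u8> = Vec::new();
    // Fails with an error on capacity overflow or allocator failure.
    buf.try_reserve_exact(len)?;
    // Capacity is already at least `len`, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```

This complements a size check against the file length: the check rejects lengths that cannot be valid, while the fallible reservation covers lengths that are plausible but still too large to allocate.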

symphorien deleted the read_size_bound_check branch March 3, 2024 12:32
