wast: Parsing giant wat takes too much memory resources #1095
This commit shrinks the size of the `Instruction` enum on x86_64 from 152 bytes to 80 bytes. This is in an effort to reduce the resource consumption reported in bytecodealliance#1095 since it's definitely excessively large at this point. This reduces the peak memory usage from 6G to 5G when parsing that input `*.wat` file. Shrinking this enum has been done by running a build with `RUSTFLAGS=-Zprint-type-sizes` and then adding a `Box` to variants with excessively large payloads. The current size of 80 bytes is still a bit large, but I didn't want to go too crazy with inserting boxes just yet.
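The boxing technique can be illustrated with a minimal sketch; the enum and payload below are hypothetical stand-ins, not wast's actual definitions:

```rust
use std::mem::size_of;

// A variant with a large inline payload forces every value of the enum to
// reserve that much space, even for payload-free variants like `Nop`.
#[allow(dead_code)]
enum FatInstruction {
    Nop,
    BrTable([u64; 18]), // 144 bytes stored inline
}

// Boxing the oversized payload moves it behind a pointer, so the enum only
// needs room for that pointer (plus any discriminant space).
#[allow(dead_code)]
enum SlimInstruction {
    Nop,
    BrTable(Box<[u64; 18]>),
}

fn main() {
    println!("fat:  {} bytes", size_of::<FatInstruction>());
    println!("slim: {} bytes", size_of::<SlimInstruction>());
    // The boxed form is dramatically smaller on 64-bit targets; every
    // element of a `Vec<SlimInstruction>` benefits.
    assert!(size_of::<SlimInstruction>() < size_of::<FatInstruction>());
}
```

Since most instructions in a large file are the small variants, the per-variant heap allocation for rare large payloads is a net win for peak memory.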
Otherwise parsing an `Index` requires heap allocations due to the error-message tracking in `lookahead1`, which typically gets thrown away as other branches succeed in parsing. I found this via DHAT while looking into bytecodealliance#1095, and while it wasn't much of a perf improvement, it drastically reduced the number of temporary allocations performed at this location, which seems like a nice little micro-optimization to have nonetheless.
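The allocation pattern being avoided can be sketched as follows; the functions and error type here are hypothetical illustrations, not wast's actual API:

```rust
// Building an error String on every failed speculative parse allocates,
// even though a recursive-descent caller usually discards the error and
// tries another branch instead.
fn parse_index_slow(tok: &str) -> Result<u32, String> {
    tok.parse::<u32>()
        .map_err(|_| format!("expected an index, found `{tok}`")) // allocates per failure
}

// Deferring the message: return a cheap, allocation-free error marker and
// only render a human-readable String if the error is actually reported.
#[derive(Debug, Clone, Copy)]
struct NotAnIndex;

fn parse_index_fast(tok: &str) -> Result<u32, NotAnIndex> {
    tok.parse::<u32>().map_err(|_| NotAnIndex)
}

fn main() {
    assert_eq!(parse_index_fast("42").unwrap(), 42);
    assert!(parse_index_fast("x").is_err());
    assert!(parse_index_slow("x").is_err());
}
```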
Thanks for the report! I've been digging in a bit and there's quite a bit of fruit to pick in this regard, so I'll try to make some PRs in the future to reduce the overhead here. If I may be so bold as to ask, I'm curious if this caused issues for you somewhere? Or basically was curious what led to the motivation to make a comparison? |
This commit is another improvement towards addressing bytecodealliance#1095, where the goal is to shrink the size of `Token` and reduce the allocated memory that it retains. Currently the entire input string is tokenized and stored as a list of tokens for `Parser` to process. This means that the size of a token has a large effect on the size of this vector for large inputs. Even before this commit, tokens had been slightly optimized for size, where some variants were heap-allocated with a `Box`. In profiling with DHAT, however, it appears that a large portion of peak memory was these boxes, namely for integer/float tokens, which appear quite a lot in many inputs. The changes in this commit were to:

* Shrink the size of `Token` to two words. This is done by removing all pointers from `Token` and instead only storing a `TokenKind`, which is packed to 32 bits or less. Span information is still stored in a `Token`, however.

* With no more payloads, tokens which previously had a payload, such as integers, strings, and floats, are now re-parsed. They're sort of parsed once while lexing, then again when the token is interpreted later on. Some of this is fundamental, since the parsing currently happens in a type-specific context and that context isn't known during lexing (e.g. if something is parsed as a `u8` then it shouldn't accept `256` as input).

The hypothesis behind this is that tokens are far more often keywords, whitespace, and comments rather than integers, strings, and floats. This means that if these tokens require some extra work then that should hopefully "come out in the wash", and this representation would otherwise allow for other speedups. Locally the example in bytecodealliance#1095 has its peak memory usage reduced from 5G to 4G by this commit, and additionally the parsing time drops from 8.9s to 7.6s.
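The two-word, pointer-free token idea can be sketched like this; the layout and names are illustrative, not the crate's actual definitions:

```rust
use std::mem::size_of;

// A pointer-free token: the kind (with an inline length) packs into one
// word alongside the byte offset. Payloads are not stored; they're
// recovered from the original source text via the span when needed.
#[derive(Clone, Copy)]
#[allow(dead_code)]
enum TokenKind {
    LParen,
    RParen,
    Keyword { len: u32 },
    Integer { len: u32 },
}

#[derive(Clone, Copy)]
struct Token {
    offset: usize,
    kind: TokenKind,
}

impl Token {
    // Re-slice the token's text from the input rather than storing a copy.
    fn src<'a>(&self, input: &'a str) -> &'a str {
        let len = match self.kind {
            TokenKind::Keyword { len } | TokenKind::Integer { len } => len as usize,
            _ => 1,
        };
        &input[self.offset..self.offset + len]
    }
}

fn main() {
    // Two words total: no per-token heap allocation.
    assert!(size_of::<Token>() <= 2 * size_of::<usize>());

    let input = "(func 123)";
    let int = Token { offset: 6, kind: TokenKind::Integer { len: 3 } };
    // The integer is parsed again, on demand, in its type-specific context.
    assert_eq!(int.src(input).parse::<u32>().unwrap(), 123);
}
```

The vector of tokens for a large input then costs two words per token rather than two words plus a heap box for every number literal.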
First of all, thank you for optimizing the memory usage. I'd be happy to share the origin of this issue. |
Makes sense and thanks for the context! |
This commit is a change to the internals of parsing for wasm text files. Previously the entire input would be lex'd and stored into an array for parsing to access. This is, however, the largest contributing factor to the peak memory usage reported in bytecodealliance#1095. Tokens are only used a handful of times, so buffering the entire file in memory is quite wasteful.

This change was tricky to apply, however, because the original rationale for lexing in this manner was performance-related. The recursive-descent style `Parser` trait encourages `peek`-ing tokens and will often involve attempting a parse only to later unwind and try something else instead. This means that each individual token, especially whitespace and comments, would otherwise naively be lexed many times. For example, when the buffer of all tokens was "simply" removed, some instrumented analysis showed that over half of all tokens in the input file were lexed more than once, so simply removing the buffer resulted in a performance regression.

This performance regression is in some sense inherently not addressable with a lazy-lexing strategy. I implemented a fixed-width cache of the latest tokens lex'd, but it still didn't perform as well as caching all tokens. I think this is because lexing is quite fast, and the added layer spent a lot of time checking and managing the cache.

While this performance regression may not be 100% fixable, I've settled on a strategy that's a bit more of a half-measure. The general idea is that the `Parser` now stores the current position in the file plus the next "significant" token at the same time. Here "significant" means, for example, not-whitespace and not-comments. This enables the parser to always know the next token, having pre-skipped whitespace and comments. Thus any single-token "peek" operations don't actually need to do any lexing; they can instead look at the current token and decide what to do based on that.

This is enabled with a few new `Cursor::peek_*` methods to avoid generating the next `Cursor`, which would otherwise require a lexing operation. Overall this means that the majority of tokens in the case of bytecodealliance#1095 are lexed only once. There's still ~10% of tokens that are lexed 2+ times, but as for the performance numbers: before this commit parsing that file took 7.6s with 4G of memory, and after this commit it takes 7.9s with 2G of memory. This means that for a 4% regression in time we're reducing memory usage by half.
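The "current position plus next significant token" idea can be sketched with a toy lexer; the types below are a hypothetical miniature, not wast's actual `Parser`/`Cursor` API:

```rust
// The parser caches the next significant (non-whitespace) token so that
// repeated single-token peeks do no further lexing; only `bump` advances.
struct Parser<'a> {
    input: &'a str,
    pos: usize,                    // current byte position in the input
    peeked: Option<(usize, char)>, // cached next significant token, if any
}

impl<'a> Parser<'a> {
    fn new(input: &'a str) -> Self {
        Parser { input, pos: 0, peeked: None }
    }

    // Return the next non-whitespace char without consuming it. The result
    // is cached, so peeking twice lexes (skips whitespace) at most once.
    fn peek(&mut self) -> Option<char> {
        if self.peeked.is_none() {
            let mut offset = self.pos;
            for c in self.input[self.pos..].chars() {
                if !c.is_whitespace() {
                    self.peeked = Some((offset, c));
                    break;
                }
                offset += c.len_utf8();
            }
        }
        self.peeked.map(|(_, c)| c)
    }

    // Consume the cached token and advance past it.
    fn bump(&mut self) -> Option<char> {
        self.peek()?;
        let (off, c) = self.peeked.take().unwrap();
        self.pos = off + c.len_utf8();
        Some(c)
    }
}

fn main() {
    let mut p = Parser::new("  ( a");
    assert_eq!(p.peek(), Some('(')); // skips the leading whitespace once
    assert_eq!(p.peek(), Some('(')); // served from the cache, no re-lexing
    assert_eq!(p.bump(), Some('('));
    assert_eq!(p.bump(), Some('a'));
    assert_eq!(p.bump(), None);
}
```

Unwinding a speculative parse in the real crate is what still forces the remaining ~10% of tokens to be lexed more than once, since a rewound position invalidates tokens lexed past it.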
* Update `peek`-related methods to return a `Result`. This is in preparation for an upcoming change where lexing will be performed lazily rather than eagerly. In that situation lexing is possibly done during a `peek` operation, which means that errors can happen at that time. This commit purely updates the related methods to return `Result<Option<T>>` instead of `Option<T>`; it doesn't introduce any functional change beyond that. The return value from these methods is always `Ok(...)` at this time, and a future commit will change that to possibly being an error. This is split out from the future change since it's a fair amount of churn, with lots of new `?` operators popping up everywhere.

* Fix some issues with support for annotations. This commit fixes a few issues with annotation-related support in the wasm text format: the `@producers` section is now parsed and recognized for components where previously it was ignored; similarly, the `@name` annotation is now properly recognized for components instead of being ignored in a few places; and registration of annotations has been moved to top-level parsers to avoid adding and re-adding entries to the known-annotations map.

* Don't lex the entire input immediately. This is the change described in the preceding commit message: parsing no longer buffers every token up front, and the `Parser` instead tracks the current position plus the next significant token, trading a 4% time regression (7.6s to 7.9s) for halving peak memory (4G to 2G).

* Fix tests.
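The first bullet's signature change can be shown in miniature; the parser below is a hypothetical illustration, not wast's actual API:

```rust
// Lazy lexing means a `peek` can itself hit a lex error, so the signature
// grows from `Option<T>` to `Result<Option<T>, E>` and callers gain a `?`.
struct Parser {
    tokens: Vec<Result<u32, String>>, // stand-in for lazily lexed tokens
    pos: usize,
}

impl Parser {
    // Before: fn peek(&self) -> Option<u32>
    // After: any lexing done during the peek can surface an error here.
    fn peek(&self) -> Result<Option<u32>, String> {
        match self.tokens.get(self.pos) {
            Some(Ok(t)) => Ok(Some(*t)),
            Some(Err(e)) => Err(e.clone()),
            None => Ok(None), // end of input is not an error
        }
    }
}

fn main() -> Result<(), String> {
    let p = Parser { tokens: vec![Ok(1)], pos: 0 };
    // Callers now use `?` to propagate potential lex errors.
    assert_eq!(p.peek()?, Some(1));

    let end = Parser { tokens: vec![], pos: 0 };
    assert_eq!(end.peek()?, None);
    Ok(())
}
```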
Ok after #1110 I think we're in a much better state now. Runtime still isn't fantastic but 2G is much better than the previous 6G. More improvements are still possible, but is that suitable for your use case? If so I'll probably go ahead and close this, but if not there's a thing or two that may be possible to optimize this further. |
Thank you very much for your efforts. I made a comparison based on the latest commit, and now the memory peak of wasm-tools is lower than that of wat2wasm. |
I have a giant wat (385,794,195 bytes) with 127384 functions (568 of which are imported functions), and I used heaptrack to analyze the memory usage when converting that wat to wasm using wasm-tools and wat2wasm (from wabt).

wasm-tools:
heaptrack target/release/wasm-tools parse test.wat -o test1.wasm

wat2wasm:
heaptrack wabt-1.0.33/bin/wat2wasm test.wat -o test2.wasm --enable-annotations

To my surprise, the memory peak of wasm-tools was more than double that of wat2wasm. This is the wat for testing.