
wast: Parsing giant wat takes too much memory #1095

Closed
Xyah3PBeHB opened this issue Jul 2, 2023 · 5 comments

@Xyah3PBeHB

I have a giant wat file (385,794,195 bytes) with 127,384 functions (568 of which are imports), and I used heaptrack to analyze the memory usage when converting that wat to wasm with wasm-tools and with wat2wasm (from wabt).

wasm-tools

heaptrack target/release/wasm-tools parse test.wat -o test1.wasm

total runtime: 37.60s.
calls to allocation functions: 20932994 (556743/s)
temporary memory allocations: 8617065 (229183/s)
peak heap memory consumption: 6.20G
peak RSS (including heaptrack overhead): 6.06G
total memory leaked: 352B

wat2wasm

heaptrack wabt-1.0.33/bin/wat2wasm test.wat -o test2.wasm --enable-annotations

total runtime: 31.10s.
calls to allocation functions: 23266595 (748122/s)
temporary memory allocations: 1561856 (50220/s)
peak heap memory consumption: 2.83G
peak RSS (including heaptrack overhead): 3.01G
total memory leaked: 0B

To my surprise, the peak memory of wasm-tools was more than double that of wat2wasm. This is the wat file used for testing.

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this issue Jul 5, 2023
This commit shrinks the size of the `Instruction` enum on x86_64 from
152 bytes to 80 bytes. This is in an effort to reduce the resource
consumption reported in bytecodealliance#1095 since it's definitely excessively large at
this point. This reduces the peak memory usage from 6G to 5G when
parsing that input `*.wat` file.

Shrinking this enum has been done by running a build with
`RUSTFLAGS=-Zprint-type-sizes` and then adding a `Box` to variants with
excessively large payloads. The current size of 80 bytes is still a bit
large, but I didn't want to go too crazy with inserting boxes just yet.
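
A rough illustration of the boxing trick, using hypothetical stand-in types rather than the real `Instruction` variants:

```rust
// A Rust enum is at least as large as its largest variant, so boxing one
// oversized payload shrinks every variant of the enum at once.
use std::mem::size_of;

// Hypothetical oversized payload standing in for a large instruction immediate.
struct BigImmediate {
    lanes: [u8; 64],
    memarg: (u64, u64, u32),
}

enum InstrUnboxed {
    Nop,
    Big(BigImmediate), // stored inline, inflating the size of the whole enum
}

enum InstrBoxed {
    Nop,
    Big(Box<BigImmediate>), // stored on the heap; this variant is pointer-sized
}

fn main() {
    println!("unboxed: {} bytes", size_of::<InstrUnboxed>());
    println!("boxed:   {} bytes", size_of::<InstrBoxed>());
}
```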
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this issue Jul 5, 2023
Otherwise parsing an `Index` requires heap allocations due to the error
message tracking in `lookahead1` which typically gets thrown away as
other branches succeed in parsing.

I found this via DHAT while looking into bytecodealliance#1095 and while it wasn't much
of a perf improvement it drastically reduced the number of temporary
allocations performed at this location which seems like a nice little
micro-optimization to have nonetheless.
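
The pattern, sketched here with hypothetical types (the real code goes through the `wast` parser's lookahead machinery), is to pick the branch from a cheap peek and only build an error message on the path that actually fails:

```rust
// Hypothetical sketch: branch on an already-known token kind instead of
// constructing an "expected one of ..." error string for every attempt.
#[derive(Clone, Copy, PartialEq)]
enum TokenKind {
    Integer,
    Identifier,
    Other,
}

enum Index {
    Numeric(u32),
    Named(String),
}

fn parse_index(peeked: TokenKind, text: &str) -> Result<Index, String> {
    match peeked {
        TokenKind::Integer => {
            let n = text.parse().map_err(|_| "invalid integer".to_string())?;
            Ok(Index::Numeric(n))
        }
        TokenKind::Identifier => Ok(Index::Named(text.to_string())),
        // The error message (and its allocation) is only produced when
        // parsing genuinely fails, not on every speculative attempt.
        _ => Err("expected an index (integer or identifier)".to_string()),
    }
}
```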
@alexcrichton
Member

Thanks for the report! I've been digging in a bit and there's quite a bit of fruit to pick in this regard, so I'll try to make some PRs in the future to reduce the overhead here.

If I may be so bold as to ask, did this cause issues for you somewhere? Basically, I'm curious what motivated you to make the comparison.

fitzgen pushed a commit that referenced this issue Jul 6, 2023
fitzgen pushed a commit that referenced this issue Jul 6, 2023
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this issue Jul 6, 2023
This commit is another improvement towards addressing bytecodealliance#1095 where the
goal here is to shrink the size of `Token` and reduce the allocated
memory that it retains. Currently the entire input string is tokenized
and stored as a list of tokens for `Parser` to process. This means that
the size of a token has a large effect on the size of this vector for
large inputs.

Even before this commit tokens had been slightly optimized for size
where some variants were heap-allocated with a `Box`. In profiling with
DHAT, however, it appears that a large portion of peak memory was these
boxes, namely for integer/float tokens which appear quite a lot in many
inputs.

The changes in this commit were to:

* Shrink the size of `Token` to two words. This is done by removing all
  pointers from `Token` and instead only storing a `TokenKind` which is
  packed to 32-bits or less. Span information is still stored in a
  `Token`, however.

* With no more payload, tokens which previously had a payload, such as
  integers, strings, and floats, are now re-parsed. They're sort of
  parsed once while lexing, then again when the token is interpreted
  later on. Some of this is fundamental where the parsing currently
  happens in a type-specific context but the context isn't known during
  lexing (e.g. if something is parsed as `u8` then that shouldn't accept
  `256` as input).

The hypothesis behind this is that tokens are far more often keywords,
whitespace, and comments rather than integers, strings, and floats. This
means that if these tokens require some extra work then that should
hopefully "come out in the wash" after and this representation would
otherwise allow for other speedups.

Locally the example in bytecodealliance#1095 has a peak memory usage reduced from 5G to
4G from this commit and additionally the parsing time drops from 8.9s to
7.6s.
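
A minimal sketch of the resulting shape, with hypothetical types rather than the real wast lexer:

```rust
// Hypothetical sketch: a token is just a small kind tag plus its span, and
// any payload is re-parsed from the original source text on demand, in a
// type-aware context.
#[derive(Clone, Copy)]
enum TokenKind {
    Keyword,
    Integer,
    Whitespace,
    Comment,
}

#[derive(Clone, Copy)]
struct Token {
    kind: TokenKind,
    offset: u32, // byte offset into the source
    len: u32,    // token length in bytes
}

impl Token {
    /// Re-parse the integer payload once the parser knows the target type;
    /// in a `u8` context, for example, this correctly rejects "256".
    fn parse_u8(&self, src: &str) -> Option<u8> {
        match self.kind {
            TokenKind::Integer => src
                .get(self.offset as usize..(self.offset + self.len) as usize)?
                .parse()
                .ok(),
            _ => None,
        }
    }
}
```

With this layout the sketched token is a dozen bytes, comfortably within two machine words on x86_64, so the buffered token list shrinks accordingly.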
@Xyah3PBeHB
Author

First of all, thank you for optimizing the memory usage. I'd be happy to share the origin of this issue.
In the past I used wat2wasm to analyze wasm. Recently I planned to use wasmparser to automate some operations, so I started by trying the wasm-tools tool set. But an operation I had always been able to complete before (converting a giant wat back to wasm) ran out of memory; my device hasn't been upgraded in many years and only has 16G of memory. So I wanted to compare it with the tools I had used in the past.

@alexcrichton
Member

Makes sense and thanks for the context!

fitzgen pushed a commit that referenced this issue Jul 7, 2023
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this issue Jul 8, 2023
This commit is a change to the internals of parsing for wasm text files.
Previously the entire input would be lex'd and stored into an array for
parsing to access. This is, however, the largest contributing factor to
the peak memory usage reported in bytecodealliance#1095. Tokens are only used a handful
of times so buffering the entire file in-memory is quite wasteful.

This change was tricky to apply, however, because the original rationale
for lexing in this manner was performance related. The recursive-descent
style `Parser` trait encourages `peek`-ing tokens and will often involve
attempting a parse only to later unwind and try something else instead.
This means that each individual token, especially whitespace and
comments, would otherwise naively be lexed many times. For example if
the buffer of all tokens is "simply" removed some instrumented analysis
showed that over half of all tokens in the input file were lexed more
than once. This means that simply removing the buffer resulted in a
performance regression.

This performance regression is in some sense inherently not addressable
with a lazy-lexing strategy. I implemented a fixed-width cache of the
latest tokens lex'd but it still didn't perform as well as caching all
tokens. I think this is because lexing is quite fast and adding the
layer of the cache spent a lot of time in checking and managing the
cache. While this performance regression may not be 100% fixable,
I've settled on a strategy that's a bit more of a half-measure.

The general idea for the `Parser` is now that it stores the current
position in the file plus the next "significant" token at the same time.
Here "significant" means not-whitespace and not-comments for example.
This enables the parser to always know the next token having pre-skipped
whitespace and comments. Thus any single-token "peek" operations don't
actually need to do any lexing, they can instead look at the current
token and decide what to do based on that. This is enabled with a few
new `Cursor::peek_*` methods to avoid generating the next `Cursor` which
would otherwise require a lexing operation.

Overall this means that the majority of tokens in the case of bytecodealliance#1095 are
lexed only once. There's still ~10% of tokens that are lexed 2+ times,
but as for the performance numbers: before this commit parsing that file
took 7.6s with 4G of memory, and after this commit it takes 7.9s with 2G
of memory. This means that for a 4% regression in time we're reducing
memory usage by half.
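
Sketched with hypothetical types (the real implementation lives in the wast lexer and `Parser`), the idea looks roughly like this:

```rust
// Hypothetical sketch: the parser tracks its byte position plus the next
// "significant" (non-whitespace, non-comment) token, so single-token peeks
// never need to lex anything.
#[derive(Clone, Copy)]
enum TokenKind {
    Whitespace,
    Comment,
    Keyword,
    Integer,
}

#[derive(Clone, Copy)]
struct Token {
    kind: TokenKind,
    start: usize,
    len: usize,
}

struct ParserState<'a> {
    src: &'a str,
    pos: usize,
    next: Option<Token>, // pre-lexed next significant token
}

impl<'a> ParserState<'a> {
    /// Lex forward until a significant token (or end of input) is found and
    /// cache it; whitespace and comments are skipped and never stored.
    fn refill(&mut self) {
        self.next = None;
        while let Some(tok) = lex_one(self.src, self.pos) {
            self.pos = tok.start + tok.len;
            if !matches!(tok.kind, TokenKind::Whitespace | TokenKind::Comment) {
                self.next = Some(tok);
                break;
            }
        }
    }

    /// Lex-free, allocation-free peek at the upcoming token.
    fn peek_kind(&self) -> Option<TokenKind> {
        self.next.map(|t| t.kind)
    }

    /// Consume the cached token and pre-lex the one after it.
    fn bump(&mut self) -> Option<Token> {
        let tok = self.next;
        self.refill();
        tok
    }
}

// Placeholder lexer for the sketch; the real lexer tokenizes WebAssembly text.
fn lex_one(src: &str, pos: usize) -> Option<Token> {
    let _ = (src, pos);
    None
}
```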
fitzgen pushed a commit that referenced this issue Jul 11, 2023
* Update `peek`-related methods with a `Result`

This is in preparation for an upcoming change where lexing will be
performed lazily rather than eagerly. In this situation lexing is
possibly done during a `peek` operation which means that errors can
happen at that time.

This commit purely updates the related methods to return
`Result<Option<T>>` instead of `Option<T>`, but it doesn't actually
introduce any functional change other than that. The return value from
these methods is always `Ok(...)` at this time, and a future commit will
change that to possibly being an error.

This is split out from the future change since it's a fair amount of
churn with lots of new `?` operators popping up everywhere.
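
In signature terms the change is essentially the following (hypothetical method names, not the exact `Parser`/`Cursor` API):

```rust
// Hypothetical sketch of the signature change. Once lexing happens lazily
// inside a peek, the peek itself can hit a lex error, so that error has to
// be surfaced through the return type.
struct LexError;

trait PeekEager {
    /// Old shape: the entire input was lexed up front, so peeking can't fail.
    fn peek_keyword(&self) -> Option<&str>;
}

trait PeekLazy {
    /// New shape: `Err(..)` reports a lex error, `Ok(None)` means "no keyword here".
    fn peek_keyword(&self) -> Result<Option<&str>, LexError>;
}
```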

* Fix some issues with support for annotations

This commit fixes a few issues with annotation-related support in the
wasm text format:

* The `@producers` section is now parsed and recognized for components
  where previously it was ignored.
* Similarly the `@name` annotation is now properly recognized for
  components instead of being ignored in a few places.
* Registration of annotations has been moved to top-level parsers to
  avoid adding and re-adding entries to the known annotations map.

* Don't lex the entire input immediately

* Fix tests
@alexcrichton
Member

Ok, after #1110 I think we're in a much better state now. Runtime still isn't fantastic, but 2G is much better than the previous 6G. More improvements are still possible, but is that suitable for your use case? If so I'll probably go ahead and close this; if not, there are still a couple of things that could be optimized further.

@Xyah3PBeHB
Author

Thank you very much for your efforts. I made a comparison based on the latest commit. Now not only is the peak memory lower than wat2wasm's, but the runtime has also improved dramatically and is far ahead. This exceeds the improvement I expected, and it is more than enough to use on my old device.

wasm-tools

heaptrack target/release/wasm-tools parse sample.wat -o test1.wasm

total runtime: 11.11s.
calls to allocation functions: 4072605 (366538/s)
temporary memory allocations: 1748744 (157388/s)
peak heap memory consumption: 2.04G
peak RSS (including heaptrack overhead): 2.01G
total memory leaked: 368B

wat2wasm

heaptrack wabt-1.0.33/bin/wat2wasm sample.wat -o test2.wasm

total runtime: 25.57s.
calls to allocation functions: 23266595 (910024/s)
temporary memory allocations: 1561856 (61088/s)
peak heap memory consumption: 2.83G
peak RSS (including heaptrack overhead): 3.02G
total memory leaked: 0B
