Don't lex the entire input immediately (#1110)
* Update `peek`-related methods with a `Result`

This is in preparation for an upcoming change where lexing will be
performed lazily rather than eagerly. In that situation lexing may happen
during a `peek` operation, which means that errors can occur at that time.

This commit purely updates the related methods to return
`Result<Option<T>>` instead of `Option<T>`, but it doesn't actually
introduce any functional change other than that. The return value from
these methods is always `Ok(...)` at this time, and a future commit will
change that to possibly being an error.
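
As a rough illustration of the shape of this change (a self-contained sketch,
not the actual `wast` API; all names in it are invented), wrapping the peek
result in a `Result` is what forces the `?` churn mentioned below:

```rust
// Sketch only: `Peeker`, `LexError`, and the `peek_*` methods are hypothetical.
#[derive(Debug)]
struct LexError;

struct Peeker {
    tokens: Vec<&'static str>,
    pos: usize,
}

impl Peeker {
    // Before: peeking an already-buffered token can never fail.
    fn peek_buffered(&self) -> Option<&'static str> {
        self.tokens.get(self.pos).copied()
    }

    // After: peeking may need to lex lazily, so a lexer error is possible and
    // the return type becomes `Result<Option<T>>`.
    fn peek_lazy(&self) -> Result<Option<&'static str>, LexError> {
        Ok(self.tokens.get(self.pos).copied())
    }
}

fn caller(p: &Peeker) -> Result<bool, LexError> {
    // The `?` here is the churn this change spreads through the parser.
    Ok(p.peek_lazy()?.is_some())
}

fn main() -> Result<(), LexError> {
    let p = Peeker { tokens: vec!["(", "module", ")"], pos: 0 };
    assert_eq!(p.peek_buffered(), Some("("));
    assert!(caller(&p)?);
    Ok(())
}
```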

This is split out from the future change since it's a fair amount of
churn with lots of new `?` operators popping up everywhere.

* Fix some issues with support for annotations

This commit fixes a few issues with annotation-related support in the
wasm text format:

* The `@producers` section is now parsed and recognized for components,
  where previously it was ignored.
* Similarly, the `@name` annotation is now properly recognized for
  components instead of being ignored in a few places.
* Registration of annotations has been moved to top-level parsers to
  avoid adding and re-adding entries to the known annotations map (a
  sketch of this pattern follows this list).
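
The registration pattern mentioned in the last bullet, as it appears in the
`component.rs` diff further down, looks roughly like this (`MyToplevel` is a
made-up stand-in type; `register_annotation`, the `Parse` trait, and
`ParseBuffer` are the real `wast` APIs visible in the diffs):

```rust
use wast::parser::{self, Parse, ParseBuffer, Parser, Result};

// Hypothetical top-level AST node, standing in for `Component` or `Module`.
struct MyToplevel;

impl<'a> Parse<'a> for MyToplevel {
    fn parse(parser: Parser<'a>) -> Result<Self> {
        // Register known annotations once at the top level; the `_r` guards
        // keep the registrations alive while nested fields are parsed, so
        // nested parsers don't add and re-add entries to the map themselves.
        let _r = parser.register_annotation("custom");
        let _r = parser.register_annotation("producers");
        let _r = parser.register_annotation("name");

        // ... parse the fields that may contain these annotations ...
        Ok(MyToplevel)
    }
}

fn main() -> Result<()> {
    let buf = ParseBuffer::new("")?;
    let _t: MyToplevel = parser::parse(&buf)?;
    Ok(())
}
```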

* Don't lex the entire input immediately

This commit is a change to the internals of parsing for wasm text files.
Previously the entire input would be lex'd and stored into an array for
parsing to access. This is, however, the largest contributing factor to
the peak memory usage reported in #1095. Tokens are only used a handful
of times so buffering the entire file in-memory is quite wasteful.

This change was tricky to apply, however, because the original rationale
for lexing in this manner was performance related. The recursive-descent
style `Parser` trait encourages `peek`-ing tokens and will often involve
attempting a parse only to later unwind and try something else instead.
This means that each individual token, especially whitespace and
comments, would otherwise naively be lexed many times. For example, when the
buffer of all tokens was "simply" removed, instrumented analysis showed that
over half of all tokens in the input file were lexed more than once, so
dropping the buffer outright resulted in a performance regression.
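
For a concrete picture of the peek-heavy style in question, here is a small
`Parse` impl in the same shape as the `alias.rs` and `component.rs` diffs
below (`MyField` is a made-up type; `lookahead1` and `peek` are the real
`wast` APIs, shown with the `Result`-returning signatures from this change):

```rust
use wast::kw;
use wast::parser::{self, Parse, ParseBuffer, Parser, Result};

// Hypothetical field type used only for illustration.
enum MyField {
    Func,
    Export,
}

impl<'a> Parse<'a> for MyField {
    fn parse(parser: Parser<'a>) -> Result<Self> {
        let mut l = parser.lookahead1();
        // Every branch inspects the token at the same position, so without
        // some form of caching that token would be re-lexed per `peek` call.
        if l.peek::<kw::func>()? {
            parser.parse::<kw::func>()?;
            Ok(MyField::Func)
        } else if l.peek::<kw::export>()? {
            parser.parse::<kw::export>()?;
            Ok(MyField::Export)
        } else {
            Err(l.error())
        }
    }
}

fn main() -> Result<()> {
    let buf = ParseBuffer::new("func")?;
    let _field: MyField = parser::parse(&buf)?;
    Ok(())
}
```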

This performance regression is, in some sense, inherently not addressable
with a lazy-lexing strategy. I implemented a fixed-width cache of the latest
tokens lex'd, but it still didn't perform as well as caching all tokens. I
think this is because lexing is quite fast and the extra layer spent a lot of
time checking and managing the cache. While this performance regression may
not be 100% fixable, I've settled on a strategy that's a bit more of a
half-measure.

The general idea is that the `Parser` now stores both the current position in
the file and the next "significant" token, where "significant" means, for
example, not whitespace and not comments. This enables the parser to always
know the next token, having already skipped whitespace and comments. Thus any
single-token "peek" operation doesn't actually need to do any lexing; it can
instead look at the current token and decide what to do based on that. This is
enabled by a few new `Cursor::peek_*` methods that avoid generating the next
`Cursor`, which would otherwise require a lexing operation.
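
A simplified, self-contained model of that strategy (all names here are
invented rather than the real `wast` internals, and comment skipping is
elided) might look like this:

```rust
// `TokenCursor` is an invented stand-in, not the real `wast` ParseBuffer: it
// tracks a byte position in the input plus the next "significant" token
// starting at that position, pre-lexed exactly once.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Token<'a> {
    LParen,
    RParen,
    Keyword(&'a str),
}

struct TokenCursor<'a> {
    input: &'a str,
    pos: usize,             // byte offset just past the cached token
    cur: Option<Token<'a>>, // next significant token, already lexed
}

impl<'a> TokenCursor<'a> {
    fn new(input: &'a str) -> Self {
        let mut c = TokenCursor { input, pos: 0, cur: None };
        c.advance();
        c
    }

    /// Peek at the current significant token; no lexing happens here.
    fn peek(&self) -> Option<Token<'a>> {
        self.cur
    }

    /// Consume the current token and lex exactly one more.
    fn bump(&mut self) -> Option<Token<'a>> {
        let tok = self.cur.take();
        self.advance();
        tok
    }

    fn advance(&mut self) {
        let (tok, pos) = lex_one(self.input, self.pos);
        self.cur = tok;
        self.pos = pos;
    }
}

/// Skip leading whitespace (comment skipping elided) and lex a single token,
/// returning it along with the position just past it.
fn lex_one(input: &str, mut pos: usize) -> (Option<Token<'_>>, usize) {
    let rest = &input[pos..];
    pos += rest.len() - rest.trim_start().len();
    let rest = &input[pos..];
    let tok = if rest.starts_with('(') {
        pos += 1;
        Some(Token::LParen)
    } else if rest.starts_with(')') {
        pos += 1;
        Some(Token::RParen)
    } else if rest.is_empty() {
        None
    } else {
        let end = rest
            .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
            .unwrap_or(rest.len());
        pos += end;
        Some(Token::Keyword(&rest[..end]))
    };
    (tok, pos)
}

fn main() {
    let mut c = TokenCursor::new("(module (memory 0))");
    assert_eq!(c.peek(), Some(Token::LParen)); // answered from the cached token
    assert_eq!(c.bump(), Some(Token::LParen)); // lexes exactly one more token
    assert_eq!(c.peek(), Some(Token::Keyword("module")));
}
```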

Overall this means that in the case of #1095 the majority of tokens are lexed
only once; there's still ~10% of tokens that are lexed 2+ times. As for the
performance numbers: before this commit parsing that file took 7.6s with 4G of
memory, and after this commit it takes 7.9s with 2G of memory. This means that
for a 4% regression in time we're reducing memory usage by half.

* Fix tests
alexcrichton committed Jul 11, 2023
1 parent e4230eb commit 3186350
Showing 35 changed files with 987 additions and 911 deletions.
12 changes: 12 additions & 0 deletions crates/wasm-compose/tests/compositions/defs/composed.wat
@@ -31,6 +31,9 @@
(memory (;0;) 0)
(export "memory" (memory 0))
(export "cabi_realloc" (func 2))
(@producers
(processed-by "wit-component" "0.11.0")
)
)
(core module (;1;)
(type (;0;) (func (param i32)))
@@ -48,13 +48,19 @@
(export "0" (func $indirect-wasi:cli-base/environment-get-environment))
(export "1" (func $indirect-wasi:cli-base/environment-get-arguments))
(export "$imports" (table 0))
(@producers
(processed-by "wit-component" "0.11.0")
)
)
(core module (;2;)
(type (;0;) (func (param i32)))
(import "" "0" (func (;0;) (type 0)))
(import "" "1" (func (;1;) (type 0)))
(import "" "$imports" (table (;0;) 2 2 funcref))
(elem (;0;) (i32.const 0) func 0 1)
(@producers
(processed-by "wit-component" "0.11.0")
)
)
(core instance (;0;) (instantiate 1))
(alias core export 0 "0" (core func (;0;)))
@@ -74,6 +83,9 @@
(core func (;3;) (canon lower (func 0) (memory 0) (realloc 2) string-encoding=utf8))
(alias export 0 "get-arguments" (func (;1;)))
(core func (;4;) (canon lower (func 1) (memory 0) (realloc 2) string-encoding=utf8))
(@producers
(processed-by "wit-component" "0.11.0")
)
(core instance (;3;)
(export "$imports" (table 0))
(export "0" (func 3))
30 changes: 15 additions & 15 deletions crates/wast/src/component/alias.rs
@@ -79,15 +79,15 @@ impl<'a> Parse<'a> for Alias<'a> {

let mut l = parser.lookahead1();

let (target, id, name) = if l.peek::<kw::outer>() {
let (target, id, name) = if l.peek::<kw::outer>()? {
parser.parse::<kw::outer>()?;
let outer = parser.parse()?;
let index = parser.parse()?;
let (kind, id, name) =
parser.parens(|parser| Ok((parser.parse()?, parser.parse()?, parser.parse()?)))?;

(AliasTarget::Outer { outer, index, kind }, id, name)
} else if l.peek::<kw::export>() {
} else if l.peek::<kw::export>()? {
parser.parse::<kw::export>()?;
let instance = parser.parse()?;
let export_name = parser.parse()?;
@@ -103,7 +103,7 @@ impl<'a> Parse<'a> for Alias<'a> {
id,
name,
)
} else if l.peek::<kw::core>() {
} else if l.peek::<kw::core>()? {
parser.parse::<kw::core>()?;
parser.parse::<kw::export>()?;
let instance = parser.parse()?;
@@ -155,28 +155,28 @@ pub enum ComponentExportAliasKind {
impl<'a> Parse<'a> for ComponentExportAliasKind {
fn parse(parser: Parser<'a>) -> Result<Self> {
let mut l = parser.lookahead1();
if l.peek::<kw::core>() {
if l.peek::<kw::core>()? {
parser.parse::<kw::core>()?;
let mut l = parser.lookahead1();
if l.peek::<kw::module>() {
if l.peek::<kw::module>()? {
parser.parse::<kw::module>()?;
Ok(Self::CoreModule)
} else {
Err(l.error())
}
} else if l.peek::<kw::func>() {
} else if l.peek::<kw::func>()? {
parser.parse::<kw::func>()?;
Ok(Self::Func)
} else if l.peek::<kw::value>() {
} else if l.peek::<kw::value>()? {
parser.parse::<kw::value>()?;
Ok(Self::Value)
} else if l.peek::<kw::r#type>() {
} else if l.peek::<kw::r#type>()? {
parser.parse::<kw::r#type>()?;
Ok(Self::Type)
} else if l.peek::<kw::component>() {
} else if l.peek::<kw::component>()? {
parser.parse::<kw::component>()?;
Ok(Self::Component)
} else if l.peek::<kw::instance>() {
} else if l.peek::<kw::instance>()? {
parser.parse::<kw::instance>()?;
Ok(Self::Instance)
} else {
@@ -201,22 +201,22 @@ pub enum ComponentOuterAliasKind {
impl<'a> Parse<'a> for ComponentOuterAliasKind {
fn parse(parser: Parser<'a>) -> Result<Self> {
let mut l = parser.lookahead1();
if l.peek::<kw::core>() {
if l.peek::<kw::core>()? {
parser.parse::<kw::core>()?;
let mut l = parser.lookahead1();
if l.peek::<kw::module>() {
if l.peek::<kw::module>()? {
parser.parse::<kw::module>()?;
Ok(Self::CoreModule)
} else if l.peek::<kw::r#type>() {
} else if l.peek::<kw::r#type>()? {
parser.parse::<kw::r#type>()?;
Ok(Self::CoreType)
} else {
Err(l.error())
}
} else if l.peek::<kw::r#type>() {
} else if l.peek::<kw::r#type>()? {
parser.parse::<kw::r#type>()?;
Ok(Self::Type)
} else if l.peek::<kw::component>() {
} else if l.peek::<kw::component>()? {
parser.parse::<kw::component>()?;
Ok(Self::Component)
} else {
15 changes: 14 additions & 1 deletion crates/wast/src/component/binary.rs
@@ -1,6 +1,6 @@
use crate::component::*;
use crate::core;
use crate::token::{Id, Index, NameAnnotation};
use crate::token::{Id, Index, NameAnnotation, Span};
use wasm_encoder::{
CanonicalFunctionSection, ComponentAliasSection, ComponentDefinedTypeEncoder,
ComponentExportSection, ComponentImportSection, ComponentInstanceSection, ComponentNameSection,
@@ -43,6 +43,7 @@ fn encode_fields(
ComponentField::Import(i) => e.encode_import(i),
ComponentField::Export(ex) => e.encode_export(ex),
ComponentField::Custom(c) => e.encode_custom(c),
ComponentField::Producers(c) => e.encode_producers(c),
}
}

@@ -179,6 +180,18 @@ impl<'a> Encoder<'a> {
self.component.section(custom);
}

fn encode_producers(&mut self, custom: &core::Producers) {
use crate::encode::Encode;

let mut data = Vec::new();
custom.encode(&mut data);
self.encode_custom(&Custom {
name: "producers",
span: Span::from_offset(0),
data: vec![&data],
})
}

fn encode_core_module(&mut self, module: &CoreModule<'a>) {
// Flush any in-progress section before encoding the module
self.flush(None);
41 changes: 24 additions & 17 deletions crates/wast/src/component/component.rs
@@ -1,5 +1,6 @@
use crate::annotation;
use crate::component::*;
use crate::core::Producers;
use crate::kw;
use crate::parser::{Parse, Parser, Result};
use crate::token::Index;
@@ -108,12 +109,14 @@ impl<'a> Component<'a> {
impl<'a> Parse<'a> for Component<'a> {
fn parse(parser: Parser<'a>) -> Result<Self> {
let _r = parser.register_annotation("custom");
let _r = parser.register_annotation("producers");
let _r = parser.register_annotation("name");

let span = parser.parse::<kw::component>()?.0;
let id = parser.parse()?;
let name = parser.parse()?;

let kind = if parser.peek::<kw::binary>() {
let kind = if parser.peek::<kw::binary>()? {
parser.parse::<kw::binary>()?;
let mut data = Vec::new();
while !parser.is_empty() {
@@ -150,6 +153,7 @@ pub enum ComponentField<'a> {
Import(ComponentImport<'a>),
Export(ComponentExport<'a>),
Custom(Custom<'a>),
Producers(Producers<'a>),
}

impl<'a> ComponentField<'a> {
@@ -164,47 +168,50 @@ impl<'a> ComponentField<'a> {

impl<'a> Parse<'a> for ComponentField<'a> {
fn parse(parser: Parser<'a>) -> Result<Self> {
if parser.peek::<kw::core>() {
if parser.peek2::<kw::module>() {
if parser.peek::<kw::core>()? {
if parser.peek2::<kw::module>()? {
return Ok(Self::CoreModule(parser.parse()?));
}
if parser.peek2::<kw::instance>() {
if parser.peek2::<kw::instance>()? {
return Ok(Self::CoreInstance(parser.parse()?));
}
if parser.peek2::<kw::r#type>() {
if parser.peek2::<kw::r#type>()? {
return Ok(Self::CoreType(parser.parse()?));
}
if parser.peek2::<kw::func>() {
if parser.peek2::<kw::func>()? {
return Ok(Self::CoreFunc(parser.parse()?));
}
} else {
if parser.peek::<kw::component>() {
if parser.peek::<kw::component>()? {
return Ok(Self::Component(parser.parse()?));
}
if parser.peek::<kw::instance>() {
if parser.peek::<kw::instance>()? {
return Ok(Self::Instance(parser.parse()?));
}
if parser.peek::<kw::alias>() {
if parser.peek::<kw::alias>()? {
return Ok(Self::Alias(parser.parse()?));
}
if parser.peek::<kw::r#type>() {
if parser.peek::<kw::r#type>()? {
return Ok(Self::Type(Type::parse_maybe_with_inline_exports(parser)?));
}
if parser.peek::<kw::import>() {
if parser.peek::<kw::import>()? {
return Ok(Self::Import(parser.parse()?));
}
if parser.peek::<kw::func>() {
if parser.peek::<kw::func>()? {
return Ok(Self::Func(parser.parse()?));
}
if parser.peek::<kw::export>() {
if parser.peek::<kw::export>()? {
return Ok(Self::Export(parser.parse()?));
}
if parser.peek::<kw::start>() {
if parser.peek::<kw::start>()? {
return Ok(Self::Start(parser.parse()?));
}
if parser.peek::<annotation::custom>() {
if parser.peek::<annotation::custom>()? {
return Ok(Self::Custom(parser.parse()?));
}
if parser.peek::<annotation::producers>()? {
return Ok(Self::Producers(parser.parse()?));
}
}
Err(parser.error("expected valid component field"))
}
@@ -226,12 +233,12 @@ impl<'a> Parse<'a> for Start<'a> {
parser.parse::<kw::start>()?;
let func = parser.parse()?;
let mut args = Vec::new();
while !parser.is_empty() && !parser.peek2::<kw::result>() {
while !parser.is_empty() && !parser.peek2::<kw::result>()? {
args.push(parser.parens(|parser| parser.parse())?);
}

let mut results = Vec::new();
while !parser.is_empty() && parser.peek2::<kw::result>() {
while !parser.is_empty() && parser.peek2::<kw::result>()? {
results.push(parser.parens(|parser| {
parser.parse::<kw::result>()?;
parser.parens(|parser| {
5 changes: 4 additions & 1 deletion crates/wast/src/component/expand.rs
@@ -127,7 +127,10 @@ impl<'a> Expander<'a> {
}
None
}
ComponentField::Start(_) | ComponentField::Alias(_) | ComponentField::Custom(_) => None,
ComponentField::Start(_)
| ComponentField::Alias(_)
| ComponentField::Custom(_)
| ComponentField::Producers(_) => None,
};

if let Some(expanded) = expanded {
