Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Wast table initialisation syntactic sugar with typeful references #952

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
84 changes: 55 additions & 29 deletions crates/wast/src/core/table.rs
Original file line number Diff line number Diff line change
Expand Up @@ -64,12 +64,12 @@ impl<'a> Parse<'a> for Table<'a> {
let elem = parser.parse()?;
let payload = parser.parens(|p| {
p.parse::<kw::elem>()?;
let ty = if parser.peek::<LParen>() {
Some(elem)
} else {
None
let ty = match elem {
rt @ RefType { heap: HeapType::Index(_), .. } => Some(rt),
elem if parser.peek::<LParen>() => Some(elem),
_ => None,
};
ElemPayload::parse_tail(parser, ty)
Ok(ElemPayload::parse_tail(parser, ty)?)
})?;
TableKind::Inline { elem, payload }
} else if l.peek::<u32>() {
Expand Down Expand Up @@ -203,39 +203,65 @@ impl<'a> Parse<'a> for ElemPayload<'a> {

impl<'a> ElemPayload<'a> {
fn parse_tail(parser: Parser<'a>, ty: Option<RefType<'a>>) -> Result<Self> {
let (must_use_indices, ty) = match ty {

// We are in the middle of parsing an element
// (elem XXXX indices/expressions)
// ^^^^
// | we are here. We use the provided context `ty` + parser lookahead to learn which
// abbreviation notation we are currently parsing.
match ty {
None => {
// case: "XXXX" is "func", we expect to parse a list
// of untyped function references indices.
parser.parse::<Option<kw::func>>()?;
(true, RefType::func())
let mut elems = Vec::new();
while !parser.is_empty() {
elems.push(parser.parse()?);
}
Ok(ElemPayload::Indices(elems))
}
Some(ty) => (false, ty),
};
if let HeapType::Func = ty.heap {
if must_use_indices || parser.peek::<Index<'_>>() {
Some(RefType { heap: HeapType::Func, .. }) if parser.peek::<Index<'_>>() => {
// case: "XXXX" is the empty string. The context tells
// us to expect a list of untyped function references.
let mut elems = Vec::new();
while !parser.is_empty() {
elems.push(parser.parse()?);
}
return Ok(ElemPayload::Indices(elems));
Ok(ElemPayload::Indices(elems))
}
}
let mut exprs = Vec::new();
while !parser.is_empty() {
let expr = parser.parens(|parser| {
if parser.peek::<kw::item>() {
parser.parse::<kw::item>()?;
parser.parse()
} else {
// Without `item` this is "sugar" for a single-instruction
// expression.
let insn = parser.parse()?;
Ok(Expression {
instrs: [insn].into(),
})
Some(ty @ RefType { heap: HeapType::Index(_), .. }) if parser.peek::<Index<'_>>() => {
// case: "XXXX" is the empty string. The context tells
// us to expect a list of typeful function references.
let mut instrs = Vec::new();
while !parser.is_empty() {
instrs.push(Instruction::RefFunc(parser.parse()?));
}
})?;
exprs.push(expr);
Ok(ElemPayload::Exprs {
ty,
exprs: vec![Expression { instrs:instrs.into() }] })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel quite right to me because this is implicitly transforming an index-based list into an expression-based list which is represented differently in the binary format. In the sense that the text format is trying to be a reflection of the binary format I think it would be best if indices remained indices and expressions remained expressions as this also assists in round-tripping a module.

Reading over the current online spec for function references I'm not sure that the elaboration depicted there is possible because the binary format for elements specifies no means by which function indices can be typed as anything other than funcref.

I suppose another way to word my concerns here are that there's precisely one test in the test suite which is implicitly switching indices in the text format to expressions in the binary format. That doesn't feel intentional to me and at least from a text-parser perspective I'd prefer to not do that since it feels like it's deviating from the original intent. In that sense I think that either the test in the test suite should be updated or the binary encoding for element segments should be updated to allow for non-funcref segments specified as lists of indices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel quite right to me because this is implicitly transforming an index-based list into an expression-based list which is represented differently in the binary format. In the sense that the text format is trying to be a reflection of the binary format I think it would be best if indices remained indices and expressions remained expressions as this also assists in round-tripping a module.

I agree, that's a slight hack. I suppose I can make it more principled by extending the definition of Indices with a type component like Exprs, and then slightly tweak the binary encoding/decoding as you suggested.

Reading over the current online spec for function references I'm not sure that the elaboration depicted there is possible because the binary format for elements specifies no means by which function indices can be typed as anything other than funcref.

My guess is that this is a typo/oversight. CC @rossberg

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, seems I was too fast with the "fix" last night, which replaced func with reftype. Before, the expansion was actually reducing to the abbreviation for element lists defined here, so that was at least correct, even if surprisingly restrictive.

I can either revert that, or inline the ref.func wrapping. Probably will do the latter, which matches the reference interpreter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. That makes sense to me. My patch here ought to produce the same binary encoding as the reference interpreter.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be coming at this from a different angle which may be worth correcting if you're up for it. My understanding of the text format is that it's intended to reflect the various structures of the binary format. To that end my expectation is that each binary representation of an element segment would be printed in the text format with a different syntactic structure. For example the binary format for elements lists 8 different encodings for element segments and I'd personally expect that each form would be printed in the text format differently. Furthermore when that text format is parsed it would then reproduce the same binary encoding, avoiding translating between different representations.

In that sense I interpreted the equivalence you noted in your link between list-of-indices and list-of-expressions as more of a semantic equivalence rather than a binary equivalence. Given all this my expectation is that because there is no binary encoding for list-of-indices with a non-funcref type then that would not be supported in the text format.

Is the expectation, however, that everything is transformed to list-of-expressions? Or are my expectations of the binary/text format misaligned here?

Copy link

@rossberg rossberg Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec only specifies the meaning of binary and text format. It does so by defining how each is parsed into abstract syntax (which involves some desugaring to reduce spec redundancy). It does not specify anything about the inverse, or about tools doing binary-to-text conversion or back. The reference interpreter simply does the simplest thing for the inverse direction.

Other tools are of course welcome to implement perfect roundtripping (and both formats are almost closely enough aligned to do that, except for some minor coding details like LEBs and local declarations). But that's a tooling question, outside the scope of the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok well sounds like the spec is staying put. @dhil I don't have a ton of time right now to take this over per se, so if you're up for doing a sort of sync with the current state of the parsing here and with the spec that would be appreciated.

I'll note though that given the state of the spec I don't think the goal of this implementation will be 100% strict spec complicance because it doesn't make sense for our use case. The goal of the text implementation here is to closely resemble the binary format where it can, so for example the text implementation should not silently translate all list-of-index representations to list-of-expressions.

Copy link
Contributor Author

@dhil dhil Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to make the necessary changes. I will, however, ask you to nudge me in the right direction. So what do we need to change specifically? I understand that we should compile to something like Indices -- I suppose the question is whether I create a new enum member for this use case, or extend Indices with typing information (I guess both will be breaking changes for consumers of the API).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in general, a once over of the code and corresponding to exising tests and the spec would be good to do. For example this specific case I've highlighted here is not correct as it creates a single expression of many ref.func instructions. This only happens to work for the 1-element table showing up in the tests at the moment.

Additionally the comments in this function about what's being parsed aren't quite correct in the context of which this PR is being sent because an element segment isn't being parsed but rather an inline portion of a table definition.

I also think it would be better perhaps to refactor this in to "figure out how to parse" and "then parse" steps. Right now the index parsing is duplicated in two places and it'd be good to deduplicate that.

Finally when this function is originally called there's some nontrivial logic to calculate what type to pass in -- either None or Some of what was already parsed. I think it would be best to either unconditionally pass it in and rely on logic in this function to figure out what to do, or for comments to be up above explaining the cases for when it's passed in and what effect it has on element segment parsing.


Since this is a new previously-unused syntax that's expected to be valid I think it would be good to add more tests as well under the tests/local/* folder since the one instance of this in the upstream suite is only a minor slice of what's being added.

}
Some(ty) => {
// default case: "XXXX" is the empty string. We expect
// to parse a list of constant expressions.
let mut exprs = Vec::new();
while !parser.is_empty() {
let expr = parser.parens(|parser| {
if parser.peek::<kw::item>() {
parser.parse::<kw::item>()?;
parser.parse()
} else {
// Without `item` this is "sugar" for a single-instruction
// expression.
let insn = parser.parse()?;
Ok(Expression {
instrs: [insn].into(),
})
}
})?;
exprs.push(expr);
}
Ok(ElemPayload::Exprs { exprs, ty })
}
}
Ok(ElemPayload::Exprs { exprs, ty })
}
}
15 changes: 1 addition & 14 deletions tests/roundtrip.rs
Original file line number Diff line number Diff line change
Expand Up @@ -291,20 +291,7 @@ impl TestState {
fn test_wast_directive(&self, test: &Path, directive: WastDirective, idx: usize) -> Result<()> {
// Only test parsing and encoding of modules which wasmparser doesn't
// support test (basically just test `wast`, nothing else)
let skip_verify = test.iter().any(|t| t == "gc")
// This specific test contains a module along the lines of:
//
// (module
// (type $t (func))
// (func $tf)
// (table $t (ref null $t) (elem $tf))
// )
//
// which doesn't currently validate since the injected element
// segment has a type of `funcref` which isn't compatible with the
// table's type. The spec interpreter thinks this should validate,
// however, and I'm not entirely sure why.
|| test.ends_with("function-references/br_table.wast");
let skip_verify = test.iter().any(|t| t == "gc");

match directive {
WastDirective::Wat(mut module) => {
Expand Down

Large diffs are not rendered by default.