Support for Wast table initialisation syntactic sugar with typeful references #952

dhil · 2023-03-07T10:21:42Z

This PR fixes a bug introduced by #701, which caused the test function-references/br_table.wast to fail.

The bug occurred during elaboration of table initialisation when the initial element had a typeful reference type. Previously, the elaborator would assume that the initialising element had type funcref. Concretely, in code this meant that the following module

(module
  (type $t (func))
  (func $tf)
  (table $t (ref null $t) (elem $tf)))

would elaborate to something like

(module
  (type $t (func))
  (func $tf (type $t))
  (table $t 1 1 (ref null 0))
  (elem 0 (i32.const 0) func (ref.func $tf)))

The problem here is that the type of ref.func $tf is ref null 0 which is not a subtype of func.

Now the wast parser correctly elaborates the above module, such that the it reads (ref null 0) rather than func in the type component of elem.

(Note until recently the spec had a typo, which would make it seem like the previous elaboration of the above module was correct. C.f. WebAssembly/function-references@f919dde)

This commit fixes a bug in the elaboration of table initialisation when the element has a typeful reference type. Previsouly, the elaborator would assume that the initialising element has type `funcref`. Concretely, in code this meant that the following module ```wast (module (type $t (func)) (func $tf) (table $t (ref null $t) (elem $tf))) ``` would elaborate to something like ```wast (module (type $t (func)) (func $tf (type $t)) (table $t 1 1 (ref null 0)) (elem 0 (i32.const 0) func (ref.func $tf))) ``` The problem here is that the type of `ref.func $tf` is `ref null 0` which is not a subtype of `func`. So, now the wast parser correctly elaborates the above module, such that the it reads `(ref null 0)` rather than `func` in the type component of `elem`. The current implementation uses a slight hack to back patch a parsed `ElemPayload` as the parser do not have access to the sufficient context information to parse `ElemPayload` correctly for the above example.

…x-br-table

alexcrichton · 2023-03-07T15:39:57Z

crates/wast/src/core/table.rs

-            exprs.push(expr);
+                Ok(ElemPayload::Exprs {
+                    ty,
+                    exprs: vec![Expression { instrs:instrs.into() }] })


This doesn't feel quite right to me because this is implicitly transforming an index-based list into an expression-based list which is represented differently in the binary format. In the sense that the text format is trying to be a reflection of the binary format I think it would be best if indices remained indices and expressions remained expressions as this also assists in round-tripping a module.

Reading over the current online spec for function references I'm not sure that the elaboration depicted there is possible because the binary format for elements specifies no means by which function indices can be typed as anything other than funcref.

I suppose another way to word my concerns here are that there's precisely one test in the test suite which is implicitly switching indices in the text format to expressions in the binary format. That doesn't feel intentional to me and at least from a text-parser perspective I'd prefer to not do that since it feels like it's deviating from the original intent. In that sense I think that either the test in the test suite should be updated or the binary encoding for element segments should be updated to allow for non-funcref segments specified as lists of indices.

This doesn't feel quite right to me because this is implicitly transforming an index-based list into an expression-based list which is represented differently in the binary format. In the sense that the text format is trying to be a reflection of the binary format I think it would be best if indices remained indices and expressions remained expressions as this also assists in round-tripping a module.

I agree, that's a slight hack. I suppose I can make it more principled by extending the definition of Indices with a type component like Exprs, and then slightly tweak the binary encoding/decoding as you suggested.

Reading over the current online spec for function references I'm not sure that the elaboration depicted there is possible because the binary format for elements specifies no means by which function indices can be typed as anything other than funcref.

My guess is that this is a typo/oversight. CC @rossberg

Oh, seems I was too fast with the "fix" last night, which replaced func with reftype. Before, the expansion was actually reducing to the abbreviation for element lists defined here, so that was at least correct, even if surprisingly restrictive.

I can either revert that, or inline the ref.func wrapping. Probably will do the latter, which matches the reference interpreter.

OK. That makes sense to me. My patch here ought to produce the same binary encoding as the reference interpreter.

I may be coming at this from a different angle which may be worth correcting if you're up for it. My understanding of the text format is that it's intended to reflect the various structures of the binary format. To that end my expectation is that each binary representation of an element segment would be printed in the text format with a different syntactic structure. For example the binary format for elements lists 8 different encodings for element segments and I'd personally expect that each form would be printed in the text format differently. Furthermore when that text format is parsed it would then reproduce the same binary encoding, avoiding translating between different representations.

In that sense I interpreted the equivalence you noted in your link between list-of-indices and list-of-expressions as more of a semantic equivalence rather than a binary equivalence. Given all this my expectation is that because there is no binary encoding for list-of-indices with a non-funcref type then that would not be supported in the text format.

Is the expectation, however, that everything is transformed to list-of-expressions? Or are my expectations of the binary/text format misaligned here?

The spec only specifies the meaning of binary and text format. It does so by defining how each is parsed into abstract syntax (which involves some desugaring to reduce spec redundancy). It does not specify anything about the inverse, or about tools doing binary-to-text conversion or back. The reference interpreter simply does the simplest thing for the inverse direction.

Other tools are of course welcome to implement perfect roundtripping (and both formats are almost closely enough aligned to do that, except for some minor coding details like LEBs and local declarations). But that's a tooling question, outside the scope of the spec.

Ok well sounds like the spec is staying put. @dhil I don't have a ton of time right now to take this over per se, so if you're up for doing a sort of sync with the current state of the parsing here and with the spec that would be appreciated.

I'll note though that given the state of the spec I don't think the goal of this implementation will be 100% strict spec complicance because it doesn't make sense for our use case. The goal of the text implementation here is to closely resemble the binary format where it can, so for example the text implementation should not silently translate all list-of-index representations to list-of-expressions.

I am happy to make the necessary changes. I will, however, ask you to nudge me in the right direction. So what do we need to change specifically? I understand that we should compile to something like Indices -- I suppose the question is whether I create a new enum member for this use case, or extend Indices with typing information (I guess both will be breaking changes for consumers of the API).

Well, in general, a once over of the code and corresponding to exising tests and the spec would be good to do. For example this specific case I've highlighted here is not correct as it creates a single expression of many ref.func instructions. This only happens to work for the 1-element table showing up in the tests at the moment.

Additionally the comments in this function about what's being parsed aren't quite correct in the context of which this PR is being sent because an element segment isn't being parsed but rather an inline portion of a table definition.

I also think it would be better perhaps to refactor this in to "figure out how to parse" and "then parse" steps. Right now the index parsing is duplicated in two places and it'd be good to deduplicate that.

Finally when this function is originally called there's some nontrivial logic to calculate what type to pass in -- either None or Some of what was already parsed. I think it would be best to either unconditionally pass it in and rely on logic in this function to figure out what to do, or for comments to be up above explaining the cases for when it's passed in and what effect it has on element segment parsing.

Since this is a new previously-unused syntax that's expected to be valid I think it would be good to add more tests as well under the tests/local/* folder since the one instance of this in the upstream suite is only a minor slice of what's being added.

This commit started off by updating the spec testsuite submodule to the latest version and I ended up fixing various bits of parsing element segments. The changes here are a little subtle, but they are: * The text format for element segments was refactored and audited to attempt to match what's currently written in the spec. The spec testsuite doesn't look like it follows the spec perfectly so a few adjustments are made. The main change here functionally is that inline element segments on `table` definitions might now automatically switch a list-of-index text format into a list-of-expression binary format. This is what bytecodealliance#952 and its discussion was about. Other than this though some parsing has been tightened up as previously some invalid element segments were allowed and now require some more specifiers. For example `(elem)` was previously considered valid but that is no longer valid, instead it must be `(elem func)`. * The `wasmparser` and `wasm-encoder` crates have both removed the type field from their element segment representations, instead only having this type information on the list-of-expressions payload. This reflects how in the binary format the list-of-indexes form of element segments does not have type information. This was primarily done to adjust validation to pass a new test where a list-of-indexes element segment seems to be expected to fit in a table of `(ref func)`. This means that the type of the element segment previously inferred as `funcref` was no longer valid and instead it should be `(ref func)`. To better represent this these pieces were refactored internally. * Validation of the `table_init` instruction was updated to perform a subtyping check between the element segment and the destination table to allow initializing a nullable table with a non-nullable element segment, which is a valid operation where the types are not exactly equal. * One extra error message had to be allowed for some new spec tests about invalid modules. Overall this isn't expected to be a major update of any form. Just another in the long line of updates and adjustments to how element segments work and validate.

alexcrichton · 2023-07-15T04:44:41Z

I've updated this in spirit into #1126 which will serve the purpose to resolve the failing test.

* Update handling and parsing of element segments This commit started off by updating the spec testsuite submodule to the latest version and I ended up fixing various bits of parsing element segments. The changes here are a little subtle, but they are: * The text format for element segments was refactored and audited to attempt to match what's currently written in the spec. The spec testsuite doesn't look like it follows the spec perfectly so a few adjustments are made. The main change here functionally is that inline element segments on `table` definitions might now automatically switch a list-of-index text format into a list-of-expression binary format. This is what #952 and its discussion was about. Other than this though some parsing has been tightened up as previously some invalid element segments were allowed and now require some more specifiers. For example `(elem)` was previously considered valid but that is no longer valid, instead it must be `(elem func)`. * The `wasmparser` and `wasm-encoder` crates have both removed the type field from their element segment representations, instead only having this type information on the list-of-expressions payload. This reflects how in the binary format the list-of-indexes form of element segments does not have type information. This was primarily done to adjust validation to pass a new test where a list-of-indexes element segment seems to be expected to fit in a table of `(ref func)`. This means that the type of the element segment previously inferred as `funcref` was no longer valid and instead it should be `(ref func)`. To better represent this these pieces were refactored internally. * Validation of the `table_init` instruction was updated to perform a subtyping check between the element segment and the destination table to allow initializing a nullable table with a non-nullable element segment, which is a valid operation where the types are not exactly equal. * One extra error message had to be allowed for some new spec tests about invalid modules. Overall this isn't expected to be a major update of any form. Just another in the long line of updates and adjustments to how element segments work and validate. * Fix benchmark

alexcrichton · 2023-07-17T17:08:25Z

I believe this is resolved by #1126

* Update handling and parsing of element segments This commit started off by updating the spec testsuite submodule to the latest version and I ended up fixing various bits of parsing element segments. The changes here are a little subtle, but they are: * The text format for element segments was refactored and audited to attempt to match what's currently written in the spec. The spec testsuite doesn't look like it follows the spec perfectly so a few adjustments are made. The main change here functionally is that inline element segments on `table` definitions might now automatically switch a list-of-index text format into a list-of-expression binary format. This is what bytecodealliance#952 and its discussion was about. Other than this though some parsing has been tightened up as previously some invalid element segments were allowed and now require some more specifiers. For example `(elem)` was previously considered valid but that is no longer valid, instead it must be `(elem func)`. * The `wasmparser` and `wasm-encoder` crates have both removed the type field from their element segment representations, instead only having this type information on the list-of-expressions payload. This reflects how in the binary format the list-of-indexes form of element segments does not have type information. This was primarily done to adjust validation to pass a new test where a list-of-indexes element segment seems to be expected to fit in a table of `(ref func)`. This means that the type of the element segment previously inferred as `funcref` was no longer valid and instead it should be `(ref func)`. To better represent this these pieces were refactored internally. * Validation of the `table_init` instruction was updated to perform a subtyping check between the element segment and the destination table to allow initializing a nullable table with a non-nullable element segment, which is a valid operation where the types are not exactly equal. * One extra error message had to be allowed for some new spec tests about invalid modules. Overall this isn't expected to be a major update of any form. Just another in the long line of updates and adjustments to how element segments work and validate. * Fix benchmark

dhil added 5 commits March 6, 2023 14:37

Generalise

a35a876

Refactor and document

4aee763

Merge branch 'main' of github.com:bytecodealliance/wasm-tools into fi…

f17bf92

…x-br-table

Add br_table.wast snapshot

7c025ed

alexcrichton reviewed Mar 7, 2023

View reviewed changes

alexcrichton mentioned this pull request Apr 19, 2023

Function references bytecodealliance/wasmtime#5288

Merged

dhil mentioned this pull request May 25, 2023

Function references unresolved issues bytecodealliance/wasmtime#6455

Closed

6 tasks

alexcrichton mentioned this pull request Jul 15, 2023

Update handling and parsing of element segments #1126

Merged

alexcrichton closed this Jul 17, 2023

dhil deleted the fix-br-table branch July 17, 2023 17:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for Wast table initialisation syntactic sugar with typeful references #952

Support for Wast table initialisation syntactic sugar with typeful references #952

dhil commented Mar 7, 2023 •

edited

alexcrichton Mar 7, 2023

dhil Mar 7, 2023

rossberg Mar 7, 2023

dhil Mar 7, 2023

rossberg Mar 7, 2023

alexcrichton Mar 8, 2023

rossberg Mar 8, 2023 •

edited

alexcrichton Mar 9, 2023

dhil Mar 9, 2023 •

edited

alexcrichton Mar 10, 2023

alexcrichton commented Jul 15, 2023

alexcrichton commented Jul 17, 2023

Support for Wast table initialisation syntactic sugar with typeful references #952

Support for Wast table initialisation syntactic sugar with typeful references #952

Conversation

dhil commented Mar 7, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rossberg Mar 8, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhil Mar 9, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcrichton commented Jul 15, 2023

alexcrichton commented Jul 17, 2023

dhil commented Mar 7, 2023 •

edited

rossberg Mar 8, 2023 •

edited

dhil Mar 9, 2023 •

edited