Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Wast table initialisation syntactic sugar with typeful references #952

Closed
wants to merge 5 commits into from

Conversation

dhil
Copy link
Contributor

@dhil dhil commented Mar 7, 2023

This PR fixes a bug introduced by #701, which caused the test function-references/br_table.wast to fail.

The bug occurred during elaboration of table initialisation when the initial element had a typeful reference type. Previously, the elaborator would assume that the initialising element had type funcref. Concretely, in code this meant that the following module

(module
  (type $t (func))
  (func $tf)
  (table $t (ref null $t) (elem $tf)))

would elaborate to something like

(module
  (type $t (func))
  (func $tf (type $t))
  (table $t 1 1 (ref null 0))
  (elem 0 (i32.const 0) func (ref.func $tf)))

The problem here is that the type of ref.func $tf is ref null 0 which is not a subtype of func.

Now the wast parser correctly elaborates the above module, such that the it reads (ref null 0) rather than func in the type component of elem.

(Note until recently the spec had a typo, which would make it seem like the previous elaboration of the above module was correct. C.f. WebAssembly/function-references@f919dde)

dhil added 5 commits March 6, 2023 14:37
This commit fixes a bug in the elaboration of table initialisation
when the element has a typeful reference type. Previsouly, the
elaborator would assume that the initialising element has type
`funcref`. Concretely, in code this meant that the following module

```wast
(module
  (type $t (func))
  (func $tf)
  (table $t (ref null $t) (elem $tf)))
```

would elaborate to something like

```wast
(module
  (type $t (func))
  (func $tf (type $t))
  (table $t 1 1 (ref null 0))
  (elem 0 (i32.const 0) func (ref.func $tf)))
```

The problem here is that the type of `ref.func $tf` is `ref null 0`
which is not a subtype of `func`. So, now the wast parser correctly
elaborates the above module, such that the it reads `(ref null 0)`
rather than `func` in the type component of `elem`.

The current implementation uses a slight hack to back patch a parsed
`ElemPayload` as the parser do not have access to the sufficient
context information to parse `ElemPayload` correctly for the above
example.
exprs.push(expr);
Ok(ElemPayload::Exprs {
ty,
exprs: vec![Expression { instrs:instrs.into() }] })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel quite right to me because this is implicitly transforming an index-based list into an expression-based list which is represented differently in the binary format. In the sense that the text format is trying to be a reflection of the binary format I think it would be best if indices remained indices and expressions remained expressions as this also assists in round-tripping a module.

Reading over the current online spec for function references I'm not sure that the elaboration depicted there is possible because the binary format for elements specifies no means by which function indices can be typed as anything other than funcref.

I suppose another way to word my concerns here are that there's precisely one test in the test suite which is implicitly switching indices in the text format to expressions in the binary format. That doesn't feel intentional to me and at least from a text-parser perspective I'd prefer to not do that since it feels like it's deviating from the original intent. In that sense I think that either the test in the test suite should be updated or the binary encoding for element segments should be updated to allow for non-funcref segments specified as lists of indices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel quite right to me because this is implicitly transforming an index-based list into an expression-based list which is represented differently in the binary format. In the sense that the text format is trying to be a reflection of the binary format I think it would be best if indices remained indices and expressions remained expressions as this also assists in round-tripping a module.

I agree, that's a slight hack. I suppose I can make it more principled by extending the definition of Indices with a type component like Exprs, and then slightly tweak the binary encoding/decoding as you suggested.

Reading over the current online spec for function references I'm not sure that the elaboration depicted there is possible because the binary format for elements specifies no means by which function indices can be typed as anything other than funcref.

My guess is that this is a typo/oversight. CC @rossberg

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, seems I was too fast with the "fix" last night, which replaced func with reftype. Before, the expansion was actually reducing to the abbreviation for element lists defined here, so that was at least correct, even if surprisingly restrictive.

I can either revert that, or inline the ref.func wrapping. Probably will do the latter, which matches the reference interpreter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. That makes sense to me. My patch here ought to produce the same binary encoding as the reference interpreter.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be coming at this from a different angle which may be worth correcting if you're up for it. My understanding of the text format is that it's intended to reflect the various structures of the binary format. To that end my expectation is that each binary representation of an element segment would be printed in the text format with a different syntactic structure. For example the binary format for elements lists 8 different encodings for element segments and I'd personally expect that each form would be printed in the text format differently. Furthermore when that text format is parsed it would then reproduce the same binary encoding, avoiding translating between different representations.

In that sense I interpreted the equivalence you noted in your link between list-of-indices and list-of-expressions as more of a semantic equivalence rather than a binary equivalence. Given all this my expectation is that because there is no binary encoding for list-of-indices with a non-funcref type then that would not be supported in the text format.

Is the expectation, however, that everything is transformed to list-of-expressions? Or are my expectations of the binary/text format misaligned here?

Copy link

@rossberg rossberg Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec only specifies the meaning of binary and text format. It does so by defining how each is parsed into abstract syntax (which involves some desugaring to reduce spec redundancy). It does not specify anything about the inverse, or about tools doing binary-to-text conversion or back. The reference interpreter simply does the simplest thing for the inverse direction.

Other tools are of course welcome to implement perfect roundtripping (and both formats are almost closely enough aligned to do that, except for some minor coding details like LEBs and local declarations). But that's a tooling question, outside the scope of the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok well sounds like the spec is staying put. @dhil I don't have a ton of time right now to take this over per se, so if you're up for doing a sort of sync with the current state of the parsing here and with the spec that would be appreciated.

I'll note though that given the state of the spec I don't think the goal of this implementation will be 100% strict spec complicance because it doesn't make sense for our use case. The goal of the text implementation here is to closely resemble the binary format where it can, so for example the text implementation should not silently translate all list-of-index representations to list-of-expressions.

Copy link
Contributor Author

@dhil dhil Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to make the necessary changes. I will, however, ask you to nudge me in the right direction. So what do we need to change specifically? I understand that we should compile to something like Indices -- I suppose the question is whether I create a new enum member for this use case, or extend Indices with typing information (I guess both will be breaking changes for consumers of the API).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in general, a once over of the code and corresponding to exising tests and the spec would be good to do. For example this specific case I've highlighted here is not correct as it creates a single expression of many ref.func instructions. This only happens to work for the 1-element table showing up in the tests at the moment.

Additionally the comments in this function about what's being parsed aren't quite correct in the context of which this PR is being sent because an element segment isn't being parsed but rather an inline portion of a table definition.

I also think it would be better perhaps to refactor this in to "figure out how to parse" and "then parse" steps. Right now the index parsing is duplicated in two places and it'd be good to deduplicate that.

Finally when this function is originally called there's some nontrivial logic to calculate what type to pass in -- either None or Some of what was already parsed. I think it would be best to either unconditionally pass it in and rely on logic in this function to figure out what to do, or for comments to be up above explaining the cases for when it's passed in and what effect it has on element segment parsing.


Since this is a new previously-unused syntax that's expected to be valid I think it would be good to add more tests as well under the tests/local/* folder since the one instance of this in the upstream suite is only a minor slice of what's being added.

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jul 15, 2023
This commit started off by updating the spec testsuite submodule to the
latest version and I ended up fixing various bits of parsing element
segments. The changes here are a little subtle, but they are:

* The text format for element segments was refactored and audited to
  attempt to match what's currently written in the spec. The spec
  testsuite doesn't look like it follows the spec perfectly so a few
  adjustments are made. The main change here functionally is that inline
  element segments on `table` definitions might now automatically switch
  a list-of-index text format into a list-of-expression binary format.
  This is what bytecodealliance#952 and its discussion was about. Other than this though
  some parsing has been tightened up as previously some invalid element
  segments were allowed and now require some more specifiers. For
  example `(elem)` was previously considered valid but that is no longer
  valid, instead it must be `(elem func)`.

* The `wasmparser` and `wasm-encoder` crates have both removed the type
  field from their element segment representations, instead only having
  this type information on the list-of-expressions payload. This
  reflects how in the binary format the list-of-indexes form of element
  segments does not have type information. This was primarily done to
  adjust validation to pass a new test where a list-of-indexes element
  segment seems to be expected to fit in a table of `(ref func)`. This
  means that the type of the element segment previously inferred as
  `funcref` was no longer valid and instead it should be `(ref func)`.
  To better represent this these pieces were refactored internally.

* Validation of the `table_init` instruction was updated to perform a
  subtyping check between the element segment and the destination table
  to allow initializing a nullable table with a non-nullable element
  segment, which is a valid operation where the types are not exactly
  equal.

* One extra error message had to be allowed for some new spec tests
  about invalid modules.

Overall this isn't expected to be a major update of any form. Just
another in the long line of updates and adjustments to how element
segments work and validate.
@alexcrichton
Copy link
Member

I've updated this in spirit into #1126 which will serve the purpose to resolve the failing test.

alexcrichton added a commit that referenced this pull request Jul 17, 2023
* Update handling and parsing of element segments

This commit started off by updating the spec testsuite submodule to the
latest version and I ended up fixing various bits of parsing element
segments. The changes here are a little subtle, but they are:

* The text format for element segments was refactored and audited to
  attempt to match what's currently written in the spec. The spec
  testsuite doesn't look like it follows the spec perfectly so a few
  adjustments are made. The main change here functionally is that inline
  element segments on `table` definitions might now automatically switch
  a list-of-index text format into a list-of-expression binary format.
  This is what #952 and its discussion was about. Other than this though
  some parsing has been tightened up as previously some invalid element
  segments were allowed and now require some more specifiers. For
  example `(elem)` was previously considered valid but that is no longer
  valid, instead it must be `(elem func)`.

* The `wasmparser` and `wasm-encoder` crates have both removed the type
  field from their element segment representations, instead only having
  this type information on the list-of-expressions payload. This
  reflects how in the binary format the list-of-indexes form of element
  segments does not have type information. This was primarily done to
  adjust validation to pass a new test where a list-of-indexes element
  segment seems to be expected to fit in a table of `(ref func)`. This
  means that the type of the element segment previously inferred as
  `funcref` was no longer valid and instead it should be `(ref func)`.
  To better represent this these pieces were refactored internally.

* Validation of the `table_init` instruction was updated to perform a
  subtyping check between the element segment and the destination table
  to allow initializing a nullable table with a non-nullable element
  segment, which is a valid operation where the types are not exactly
  equal.

* One extra error message had to be allowed for some new spec tests
  about invalid modules.

Overall this isn't expected to be a major update of any form. Just
another in the long line of updates and adjustments to how element
segments work and validate.

* Fix benchmark
@alexcrichton
Copy link
Member

I believe this is resolved by #1126

@dhil dhil deleted the fix-br-table branch July 17, 2023 17:10
dhil pushed a commit to dhil/wasm-tools that referenced this pull request Jul 21, 2023
* Update handling and parsing of element segments

This commit started off by updating the spec testsuite submodule to the
latest version and I ended up fixing various bits of parsing element
segments. The changes here are a little subtle, but they are:

* The text format for element segments was refactored and audited to
  attempt to match what's currently written in the spec. The spec
  testsuite doesn't look like it follows the spec perfectly so a few
  adjustments are made. The main change here functionally is that inline
  element segments on `table` definitions might now automatically switch
  a list-of-index text format into a list-of-expression binary format.
  This is what bytecodealliance#952 and its discussion was about. Other than this though
  some parsing has been tightened up as previously some invalid element
  segments were allowed and now require some more specifiers. For
  example `(elem)` was previously considered valid but that is no longer
  valid, instead it must be `(elem func)`.

* The `wasmparser` and `wasm-encoder` crates have both removed the type
  field from their element segment representations, instead only having
  this type information on the list-of-expressions payload. This
  reflects how in the binary format the list-of-indexes form of element
  segments does not have type information. This was primarily done to
  adjust validation to pass a new test where a list-of-indexes element
  segment seems to be expected to fit in a table of `(ref func)`. This
  means that the type of the element segment previously inferred as
  `funcref` was no longer valid and instead it should be `(ref func)`.
  To better represent this these pieces were refactored internally.

* Validation of the `table_init` instruction was updated to perform a
  subtyping check between the element segment and the destination table
  to allow initializing a nullable table with a non-nullable element
  segment, which is a valid operation where the types are not exactly
  equal.

* One extra error message had to be allowed for some new spec tests
  about invalid modules.

Overall this isn't expected to be a major update of any form. Just
another in the long line of updates and adjustments to how element
segments work and validate.

* Fix benchmark
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants