Subgroup Operations #2523

Closed
wants to merge 26 commits into from

Conversation

Contributor

@exrook exrook commented Sep 30, 2023

Moved to gfx-rs/wgpu#4190

Adds new capability SUBGROUP, required for a shader to use subgroup builtin functions or parameters.

Adds new validator settings subgroup_operations and subgroup_stages determining which sets of the below subgroup operations are valid, and which stages they are valid in.

BASIC operations:

# Performs a control and memory barrier across all invocations in the subgroup
subgroupBarrier()

VOTE operations:

subgroupAll(bool) -> bool
subgroupAny(bool) -> bool

ARITHMETIC operations:

# Operations on scalars and vectors of f32, i32, u32:
# Computes a single result value using values from all active lanes
subgroupAdd(value) -> value
subgroupMul(value) -> value
subgroupMin(value) -> value
subgroupMax(value) -> value
# Operations on scalars and vectors of i32, u32:
# Computes a single result value using values from all active lanes
subgroupAnd(value) -> value
subgroupOr(value) -> value
subgroupXor(value) -> value
# Computes a prefix scan across all active lanes
subgroupPrefixInclusiveAdd(value) -> value
subgroupPrefixInclusiveMul(value) -> value
subgroupPrefixExclusiveAdd(value) -> value
subgroupPrefixExclusiveMul(value) -> value
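
To make the difference between a reduction and the two prefix scans concrete, here is a minimal CPU-side sketch in Rust. It is not Naga code: the function names and the `Option<u32>` lane model (`None` for an inactive lane) are assumptions made for illustration only, and wrapping arithmetic is chosen just to keep the model panic-free.

```rust
// CPU model of one subgroup: one slot per lane, `None` marks an inactive lane.
// All names below are hypothetical helpers, not part of Naga or WGSL.

/// Models subgroupAdd: every active lane receives the sum over all active lanes.
fn subgroup_add(lanes: &[Option<u32>]) -> Vec<Option<u32>> {
    let total = lanes
        .iter()
        .flatten()
        .fold(0u32, |acc, v| acc.wrapping_add(*v));
    lanes.iter().map(|lane| lane.map(|_| total)).collect()
}

/// Models subgroupPrefixInclusiveAdd: each active lane receives the sum of
/// the values of all active lanes up to and including itself.
fn prefix_inclusive_add(lanes: &[Option<u32>]) -> Vec<Option<u32>> {
    let mut running = 0u32;
    lanes
        .iter()
        .map(|lane| {
            lane.map(|v| {
                running = running.wrapping_add(v);
                running
            })
        })
        .collect()
}

/// Models subgroupPrefixExclusiveAdd: each active lane receives the sum of
/// the values of all active lanes strictly before itself.
fn prefix_exclusive_add(lanes: &[Option<u32>]) -> Vec<Option<u32>> {
    let mut running = 0u32;
    lanes
        .iter()
        .map(|lane| {
            lane.map(|v| {
                let before = running;
                running = running.wrapping_add(v);
                before
            })
        })
        .collect()
}
```

With lanes `[1, 2, inactive, 3]`, the reduction gives 6 to every active lane, the inclusive scan gives 1, 3, 6, and the exclusive scan gives 0, 1, 3; the inactive lane is skipped in all three.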

BALLOT operations:

# Computes a result using a single bit from every active lane
subgroupBallot() -> vec4<u32>
subgroupBallot(bool) -> vec4<u32>
# Operations on scalars and vectors of f32, i32, u32:
# Reads a value from the first active lane into every other active lane
subgroupBroadcastFirst(value) -> value
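
As a rough illustration of how a ballot packs one bit per lane into a `vec4<u32>`, the following Rust sketch models the result as `[u32; 4]`. The bit layout (lane `i` in bit `i % 32` of word `i / 32`) follows the Vulkan convention and is an assumption here, as are the helper names; none of this is Naga's implementation.

```rust
// CPU model: `None` marks an inactive lane. Hypothetical helpers, not Naga code.

/// Models subgroupBallot(bool): sets bit `i` when lane `i` is active and its
/// predicate is true. Assumes lane `i` maps to bit `i % 32` of word `i / 32`.
fn subgroup_ballot(lanes: &[Option<bool>]) -> [u32; 4] {
    assert!(lanes.len() <= 128, "vec4<u32> holds at most 128 lane bits");
    let mut mask = [0u32; 4];
    for (lane_id, pred) in lanes.iter().enumerate() {
        if *pred == Some(true) {
            mask[lane_id / 32] |= 1u32 << (lane_id % 32);
        }
    }
    mask
}

/// Models subgroupBroadcastFirst: every active lane receives the value held
/// by the lowest-numbered active lane.
fn subgroup_broadcast_first(lanes: &[Option<u32>]) -> Vec<Option<u32>> {
    let first = lanes.iter().flatten().copied().next();
    lanes.iter().map(|lane| lane.and(first)).collect()
}
```

In this model, a subgroup where only lanes 0 and 33 vote true produces the mask `[1, 2, 0, 0]`.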

SHUFFLE operations:

# Operations on scalars and vectors of f32, i32, u32:
# Reads a value from the lane given by index into the current lane, index may vary per lane
subgroupBroadcast(value, index) -> value
# As above, but the index is computed from the current lane id XOR index_mask
subgroupShuffleXor(value, index_mask) -> value

SHUFFLE_RELATIVE operations:

# Reads a value from the lane with id = current lane id - `index_offset` (Up) or + `index_offset` (Down)
subgroupShuffleUp(value, index_offset) -> value
subgroupShuffleDown(value, index_offset) -> value
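
The lane arithmetic behind the XOR and relative shuffles can be sketched on the CPU as follows. This is an illustrative model only: the helper names are hypothetical, all lanes are assumed active, and the direction of Up and Down is assumed to follow the Vulkan convention. Reads that would fall outside the subgroup are undefined on real hardware, so the model returns `None` for them.

```rust
// CPU model with every lane active. Hypothetical helpers, not Naga code.

/// Models subgroupShuffleXor: lane `id` reads from lane `id ^ mask`.
fn shuffle_xor(values: &[u32], mask: usize) -> Vec<Option<u32>> {
    (0..values.len())
        .map(|id| values.get(id ^ mask).copied())
        .collect()
}

/// Models subgroupShuffleUp: lane `id` reads from lane `id - offset`.
fn shuffle_up(values: &[u32], offset: usize) -> Vec<Option<u32>> {
    (0..values.len())
        .map(|id| id.checked_sub(offset).map(|src| values[src]))
        .collect()
}

/// Models subgroupShuffleDown: lane `id` reads from lane `id + offset`.
fn shuffle_down(values: &[u32], offset: usize) -> Vec<Option<u32>> {
    (0..values.len())
        .map(|id| {
            id.checked_add(offset)
                .and_then(|src| values.get(src).copied())
        })
        .collect()
}
```

For values `[10, 20, 30, 40]`, an XOR mask of 1 swaps neighbouring pairs, an Up shuffle by 1 moves each value one lane higher (lane 0 has no source), and a Down shuffle by 1 moves each value one lane lower (the last lane has no source).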

New builtins:

# available in any stage, subject to device support
subgroup_invocation_id: u32
subgroup_size: u32
# available only in compute stage
num_subgroups: u32
subgroup_id: u32

related: gfx-rs/wgpu#4428

@exrook exrook mentioned this pull request Sep 30, 2023
3 tasks
@exrook exrook changed the title from "Implement subgroupBallot" to "Subgroup Operations" Oct 5, 2023
@Lichtso

Lichtso commented Oct 11, 2023

subgroupPrefixExclusiveAdd/subgroupPrefixExclusiveMul and subgroupPrefixInclusiveAdd/subgroupPrefixInclusiveMul should be differentiated.

subgroupBroadcast(index, value) should be subgroupBroadcast(value, index).

Regarding the built-ins, there should not only be subgroup_invocation_id and subgroup_size but also num_subgroups and subgroup_id.

Also, as broadcast is included, the shuffles should be added as well. They are the more generalized subgroup gather operations, whereas broadcast is a special case of a shuffle where all threads gather from the same source.
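
The relationship described here, broadcast being a shuffle in which every lane passes the same index, can be sketched on the CPU like this. The helper names and the all-lanes-active model are assumptions for illustration, not Naga or WGSL definitions.

```rust
// CPU model with every lane active. Hypothetical helpers, not Naga code.

/// General gather: lane `i` reads the value held by lane `indices[i]`.
/// Out-of-range sources yield `None` (undefined on real hardware).
fn subgroup_shuffle(values: &[u32], indices: &[usize]) -> Vec<Option<u32>> {
    indices
        .iter()
        .map(|&src| values.get(src).copied())
        .collect()
}

/// Broadcast expressed as a shuffle where all lanes use the same source index.
fn subgroup_broadcast(values: &[u32], index: usize) -> Vec<Option<u32>> {
    let uniform_indices = vec![index; values.len()];
    subgroup_shuffle(values, &uniform_indices)
}
```

Calling `subgroup_shuffle` with indices `[3, 2, 1, 0]` reverses the lane values, while `subgroup_broadcast(values, 2)` hands lane 2's value to every lane.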

@Lichtso

Lichtso commented Oct 11, 2023

@exrook I continued your work in this PR: #2557

@Lichtso Lichtso mentioned this pull request Oct 14, 2023
14 tasks
@exrook exrook force-pushed the wgsl_minimal_subgroup branch 3 times, most recently from 08dec85 to ef0bd1c Compare October 21, 2023 22:04
@exrook exrook marked this pull request as ready for review October 21, 2023 22:50
@exrook exrook requested a review from a team as a code owner October 21, 2023 22:50
The supported subgroup operations, and the stages they are supported in,
can be passed to the validator after creating it.

Operations are grouped to follow Vulkan:
- basic: elect, barrier
- vote: any, all
- arithmetic: reductions, scan
- ballot: ballot, broadcasts
- shuffle: shuffles
- shuffle relative: shuffle up, down
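
One way to picture the gating described in this commit message is a pair of settings, an operation set and a stage list, that a validator consults before accepting a subgroup call. The sketch below is a self-contained model with hypothetical type and method names (`OpSet`, `Stage`, `SubgroupSettings`, `allows`); it does not reproduce Naga's validator API.

```rust
// Hypothetical model of the validator settings; not Naga's actual types.

/// Bit set over the Vulkan-style operation groups listed above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct OpSet(u8);

#[allow(dead_code)]
impl OpSet {
    const BASIC: OpSet = OpSet(1 << 0);
    const VOTE: OpSet = OpSet(1 << 1);
    const ARITHMETIC: OpSet = OpSet(1 << 2);
    const BALLOT: OpSet = OpSet(1 << 3);
    const SHUFFLE: OpSet = OpSet(1 << 4);
    const SHUFFLE_RELATIVE: OpSet = OpSet(1 << 5);

    /// Combines two sets.
    const fn union(self, other: OpSet) -> OpSet {
        OpSet(self.0 | other.0)
    }

    /// True when every group in `other` is also in `self`.
    const fn contains(self, other: OpSet) -> bool {
        self.0 & other.0 == other.0
    }
}

#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Stage {
    Vertex,
    Fragment,
    Compute,
}

/// The two settings: which operation groups are valid, and in which stages.
struct SubgroupSettings {
    operations: OpSet,
    stages: Vec<Stage>,
}

impl SubgroupSettings {
    /// A subgroup call is accepted only if both its group and its stage are enabled.
    fn allows(&self, op: OpSet, stage: Stage) -> bool {
        self.operations.contains(op) && self.stages.contains(&stage)
    }
}
```

With `operations` set to `BASIC | VOTE` and `stages` set to compute only, a vote in a compute shader passes, while a shuffle in compute or a vote in a fragment shader is rejected.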
@jimblandy
Member

jimblandy commented Oct 24, 2023

Okay, so, a bunch of thoughts on this:

  • What fantastic work! This is definitely something we want to have.

  • My guess is that the WebGPU committee will take up subgroup operations in 2024. Naga will need to adjust these operations to match the spec, so users need to be aware that they are not stable.

  • That said, Naga needs to be pretty open-minded about getting ahead of the spec. Supporting the features people need enlarges Naga's audience, which ideally means that we have more people tending to common infrastructure as well, which is the positive FLOSS dynamic. A similar rationale applies to ray-tracing features, for example: by incorporating them, we make Naga useful to anyone doing ray-tracing, and hopefully get core contributions from them in turn.

  • We are right now in the midst of some pretty heavy work catching up with the WGSL spec. Teo just landed "Implement const-expressions (phase 2)" (#2309), and we're working on "[wgsl-in] Implement automatic type conversions (abstract types)" (wgpu#4400) and "Pipeline-overridable constants" (wgpu#4366). These are large changes that will drastically improve Naga's compliance with the spec, so they're important to users. This is a top priority for the Mozilla WebGPU team. I am concerned about adding other large pieces like this PR while we are in the midst of that work.

Would it make sense to land this work as a long-lived branch, maintained by interested parties, taking regular merges from master? That way, users who need subgroup operations can get them with a git/branch Cargo dependency, but we can continue to work on our priorities.

@teoxoy
Member

teoxoy commented Oct 24, 2023

A way to make this easier to review would be to gather as much relevant information as possible into an investigation with a subsequent proposal in the spec repo, as right now gpuweb/gpuweb#4306 and other issues seem to have scattered and missing info on the topic.

gfx-rs/wgpu#4190 (comment) & gfx-rs/wgpu#4190 (comment) are a good start!

@exrook
Contributor Author

exrook commented Oct 25, 2023

Thanks @teoxoy, @jimblandy for taking a look at this!

> Would it make sense to land this work as a long-lived branch, maintained by interested parties, taking regular merges from master? That way, users who need subgroup operations can get them with a git/branch Cargo dependency, but we can continue to work on our priorities.

👍
While ideally I'd like to have these changes in master to make them accessible to native users of wgpu, I think a long-lived branch would be the next best thing. I don't mind managing merges from master until there's time to review everything.

With gfx-rs/wgpu#4231 upcoming, it should also be relatively straightforward for interested users to point at a single wgpu/subgroup branch.

Additionally, if there's any way I can break up these changes to make them easier to review, I'm happy to consider doing that.

> A way to make this easier to review would be to gather as much relevant information as possible into an investigation with a subsequent proposal in the spec repo, as right now gpuweb/gpuweb#4306 and other issues seem to have scattered and missing info on the topic.
>
> gfx-rs/wgpu#4190 (comment) & gfx-rs/wgpu#4190 (comment) are a good start!

I'll see what I can do about that!

@exrook
Contributor Author

exrook commented Oct 25, 2023

Moved into gfx-rs/wgpu#4190

5 participants