
Lazy reductions should be data-parallel #158

Open

inducer opened this issue Aug 27, 2021 · 5 comments

inducer (Owner) commented Aug 27, 2021

Not currently a showstopper, but it comes up in mirgecom when logging stats about state and dependent variables.

cc @kaushikcfd @matthiasdiener @MTCam

kaushikcfd (Collaborator) commented:
Shouldn't this be the responsibility of the downstream array contexts? An example of this is what we do in meshmode:
https://github.com/inducer/meshmode/pull/248/files#diff-d5e55ef91478d86ac35923519f1d4556fa8502c33265a63f9cbf01579c26d461R528-R541

That is, compute the reductions eagerly via PyOpenCL. (Not ideal, but it could be one way for a downstream array context to handle it.)
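
A minimal sketch of that approach (not the linked meshmode code itself): freeze the lazy array and hand it to PyOpenCL's built-in reduction routines. The helper name `eager_nodal_max`, and the assumption that the array context exposes `freeze` and a `queue`, are illustrative only.

```python
import pyopencl.array as cl_array


def eager_nodal_max(actx, ary):
    # freeze() evaluates the lazy expression into a pyopencl array on the
    # device; cl_array.max then launches PyOpenCL's data-parallel reduction
    # kernel, and .get() copies the resulting scalar back to the host.
    frozen = actx.freeze(ary)
    return cl_array.max(frozen, queue=actx.queue).get()
```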

inducer (Owner, Author) commented Aug 27, 2021

Yeah, you're right. We'll need to know something about array axes in order to effectively do the transformation. Moving to grudge (which is where the reductions are being introduced).

inducer transferred this issue from inducer/arraycontext on Aug 27, 2021
inducer (Owner, Author) commented Aug 27, 2021

Oh, TIL about the eager reductions. So that means we can force our reductions to be parallel (via eager) if we pre-freeze/thaw all inputs?

@kaushikcfd Do you have a sense how often _can_be_eagerly_computed is actually true in practice?

This might do in the short term. In the long term, however, I'm fairly sure we'll want to be able to fuse all those reductions so that we only need to load all that vector data once, and at that point, properly transforming the reductions becomes unavoidable.
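
A rough illustration of the pre-freeze/thaw idea, under the assumption that the array context's `actx.np` namespace provides a `max` reduction; the array here is just a stand-in for lazily computed state data.

```python
import numpy as np
import pyopencl as cl
from arraycontext import PyOpenCLArrayContext

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
actx = PyOpenCLArrayContext(queue)

# Stand-in for a quantity that would normally come out of a lazy evaluation.
pressure = actx.from_numpy(np.random.rand(100_000))

# The freeze/thaw round trip forces evaluation of whatever (lazy) expression
# produced the array, so the reduction below runs as a single eager,
# data-parallel kernel instead of being folded into a larger lazy program.
pressure = actx.thaw(actx.freeze(pressure))
p_max = actx.to_numpy(actx.np.max(pressure))
print(p_max)
```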

kaushikcfd (Collaborator) commented:

> @kaushikcfd Do you have a sense how often _can_be_eagerly_computed is actually true in practice?

At least for our drivers, no reduction instructions go un-parallelized.

> This might do in the short term

Yep, for sure, this is a placeholder until we have a better approach. A common pattern in our drivers is to take the max/min of a single quantity (pressure, for example). There's no reason we should need two kernel launches for that.
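
A hedged sketch of that kind of fusion, written directly in loopy rather than as grudge's actual transformation: one kernel computes both the max and the min, so `p` is read from memory only once. The reduction is left sequential here; a real version would additionally split and tag the iname for parallel execution.

```python
import numpy as np
import pyopencl as cl
import loopy as lp

# One kernel, two reductions over the same data: the array p is traversed
# a single time instead of once per kernel launch.
knl = lp.make_kernel(
    "{[i]: 0 <= i < n}",
    """
    p_max = reduce(max, i, p[i])
    p_min = reduce(min, i, p[i])
    """,
    name="fused_minmax")

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
p = np.random.rand(10**6)

# Both written scalars come back from a single kernel invocation.
evt, (p_max, p_min) = knl(queue, p=p)
print(p_max, p_min)
```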

inducer (Owner, Author) commented Aug 27, 2021

> our drivers

Just to be clear: Which drivers do you mean by that?
