Lazy reductions should be data-parallel #158
Shouldn't this be a part of the downstream array contexts? One example is what we do in meshmode, i.e. compute the reductions eagerly via PyOpenCL. (Not the best, but it could be one way for a downstream array context to handle it.)
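A minimal sketch of that eager path, assuming a plain PyOpenCL setup (illustrative only, not the actual meshmode code):

```python
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

x = cl_array.to_device(queue, np.random.rand(10**6))

# pyopencl.array.max launches a data-parallel reduction kernel immediately,
# instead of recording the reduction in a lazy expression graph.
x_max = cl_array.max(x, queue=queue)  # device scalar
print(x_max.get())
```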
Yeah, you're right. We'll need to know something about array axes in order to do the transformation effectively. Moving to grudge (which is where the reductions are being introduced).
Oh, TIL about the eager reductions. So that means we can force our reductions to be parallel (via eager) if we pre-freeze/thaw all inputs? @kaushikcfd Do you have a sense how often reductions currently go un-parallelized?

This might do in the short term; in the long term, however, I'm fairly sure we want to be able to fuse all those reductions, so that we only need to load all that vector data once, and at that point properly transforming the reductions becomes unavoidable.
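For reference, the pre-freeze/thaw workaround could look roughly like this. This is a sketch assuming arraycontext-style freeze/thaw semantics and an eager PyOpenCL-backed context; the `eager_max` helper and the `actx.np.max` reduction as used here are assumptions for illustration, not confirmed API:

```python
def eager_max(lazy_actx, eager_actx, ary):
    # freeze() forces evaluation of the lazy expression graph; thawing the
    # result into an eager (PyOpenCL-backed) context means the reduction
    # below runs as an immediate, data-parallel kernel rather than being
    # recorded in the lazy graph.
    concrete = eager_actx.thaw(lazy_actx.freeze(ary))
    return eager_actx.np.max(concrete)
```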
At least for our drivers, no reduction instructions go un-parallelized.
Yep, for sure, this is a placeholder until we have a better approach. A common pattern seen in our drivers is to get the max/min of a single quantity (pressure, for example). There's no reason we should have two kernel launches for that.
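As an illustration of the kind of fusion meant here, a single loopy kernel could compute both reductions in one launch, reading the pressure array only once. This is a sketch (kernel name and setup are made up); as written the reductions are sequential, and making them data-parallel is exactly the transformation this issue is about:

```python
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
import loopy as lp

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# Both reductions live in one kernel: a single launch, and p is loaded
# from global memory only once.
knl = lp.make_kernel(
    "{[i]: 0 <= i < n}",
    """
    pmax = max(i, p[i])
    pmin = min(i, p[i])
    """,
    assumptions="n > 0",
    lang_version=(2018, 2))

p = cl_array.to_device(queue, np.random.rand(10**6))
evt, out = knl(queue, p=p)  # out holds the two device scalars
```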
Just to be clear: which drivers do you mean by that?
Not currently a showstopper, but it comes up in mirgecom when logging stats about state and dependent variables.
cc @kaushikcfd @matthiasdiener @MTCam