-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blaze slicing #1
Comments
Here is another possible example interface and test def slice_dask(outname, inname, index, shape, blockshape):
...
assert slice_dask('y', 'x', slice(0, 20), (100,), (25,)) == \
{('y', 0), (operator.getitem, ('x', 0), slice(0, 20))}
assert slice_dask('y', 'x', slice(20, 30), (100,), (25,)) == \
{('y', 0), (operator.getitem, ('x', 0), slice(20, 25)),
('y', 1), (operator.getitem, ('x', 1), slice(0, 5))} |
From private conversation: I might solve this problem by creating a small helper function with the following behavior: def f(n, blocksize, index):
...
assert f(100, 20, slice( 0, 10)) == {0: slice( 0, 10)}
assert f(100, 20, slice(20, 30)) == {1: slice( 0, 10)}
assert f(100, 20, slice(15, 30)) == {0: slice(15, 20),
1: slice( 0, 10)}
assert f(100, 20, slice(15, 85)) == {0: slice(15, 20),
1: slice( 0, 20),
2: slice( 0, 20),
3: slice( 0, 20),
4: slice( 0, 5)}
assert f(100, 20, 5) == {0: 5}
assert f(100, 20, 35) == {1: 15} One can then use this function in the following way. shape = (100, 100, 100)
blockshape = (20, 20, 20)
index = (5, slice(5, 35), slice(40, 90))
inname = 'x'
outname = 'y'
# maps = [f(d, bd, i) for b, bd, i in zip(shape, blockshape, index)]
maps = map(f, shape, blockshape, index) # fancy way of saying the above
# Now have something like
[{0: 5},
{0: slice(5, 20), 1: slice(0, 15)},
{2: slice(0, 20), 3: slice(0, 20), 4: slice(0, 10)}]
# Need to do get cartesian product of these three (using itertools.product or some sort
# of recursion
{(0, 0, 0): ((0, 0, 2), (5, slice(5, 20), slice(0, 20))),
(0, 0, 1): ((0, 0, 3), (5, slice(5, 20), slice(0, 20))),
(0, 0, 2): ((0, 0, 4), (5, slice(5, 20), slice(0, 10))),
(0, 1, 0): ((0, 1, 2), (5, slice(0, 15), slice(0, 20))),
(0, 1, 1): ((0, 1, 3), (5, slice(0, 15), slice(0, 20))),
(0, 1, 2): ((0, 1, 4), (5, slice(0, 15), slice(0, 10)))} And then add in the administrative details # add in administrative stuff just at the end
{('y', 0, 0, 0): (getitem, ('x', 0, 0, 2), (5, slice(5, 20), slice(0, 20))),
('y', 0, 0, 1): (getitem, ('x', 0, 0, 3), (5, slice(5, 20), slice(0, 20))),
('y', 0, 0, 2): (getitem, ('x', 0, 0, 4), (5, slice(5, 20), slice(0, 10))),
('y', 0, 1, 0): (getitem, ('x', 0, 1, 2), (5, slice(0, 15), slice(0, 20))),
('y', 0, 1, 1): (getitem, ('x', 0, 1, 3), (5, slice(0, 15), slice(0, 20))),
('y', 0, 1, 2): (getitem, ('x', 0, 1, 4), (5, slice(0, 15), slice(0, 10)))} |
OK, so after playing with this some more we might need to make the Array class and the interace to slicing more complex. dask Arrays are currently defined by
Slicing breaks the concept of a single regular blocksize. We now have a sequence of block-lengths along each dimension. Repeated slicing requires us to handle possibly irregularly spaced block-lengths. E.g. we might have an array
And so 1d slicing functions should presumably take not a blocksize, but a sequence of blocksizes
Additionally, we now need to know the new block-lengths, in this case
This could be computed from the original inputs or from the result of |
Added negative slicing capabilities to _slice_1d().
Add index=False description to read_parquet
* Add name kwarg to from_zarr (#1) * Add name kwarg to from_zarr * added default from_zarr naming convention per jrbourbeau * flake8 fix * Added test for from_zarr name hashing * added name test * update from_zarr docstring * grammer curl Co-Authored-By: mpeaton <mpeaton@users.noreply.github.com> * python-snappy
* Add name kwarg to from_zarr (dask#1) * Add name kwarg to from_zarr * added default from_zarr naming convention per jrbourbeau * flake8 fix * Added test for from_zarr name hashing * added name test * update from_zarr docstring * grammer curl Co-Authored-By: mpeaton <mpeaton@users.noreply.github.com> * python-snappy
* Add name kwarg to from_zarr * added default from_zarr naming convention per jrbourbeau * flake8 fix
We want to implement slicing on chunked dask graphs representing arrays. E.g. if I have a 100 by 100 array, split into 10 by 10 blocks and I compute
Then I want the upper left block completely, and half of the block above and below, and a quarter of the block diagonally lower-right. Note that doing this logic correctly depends not only on the dask, but also in the shape-metadata in the
Array
object.Slicing might become complex (there is a lot of potential book-keeping here) but a partial solution is probably good enough for now.
Example
Given a dask
Array
object like the followingand a blaze expression like the following
We'd like to be able to compute a new
dask.obj.Array
object with the following daskNote on code complexity
Dask is likely to be refactored. Things like the
Array
class are experimental and likely to change shape in the future. We want to avoid work when that refactor occurs. Because of this it's nice to keep as much gritty detail work like this independent from the current conventions for as long as possible. E.g. there is likely a std-lib only function that creates a dictionary from pure-python terms (tuples, dicts, names, slices) and then adask.obj.Array
-aware function. This is the approach behindarray.top
andobj.atop
The text was updated successfully, but these errors were encountered: