setting compression on individual columns of astropy table #156

Open
lgarrison opened this issue Jan 6, 2023 · 2 comments

@lgarrison
Member

Is there a syntax to set compression on individual columns of an astropy table? In the following example, using all_array_compression compresses the columns, but using AsdfFile.set_array_compression() does not.

import numpy as np
from astropy.table import Table
import asdf

t = Table(data=dict(col=np.ones(1)))

# Baseline: no compression requested
with asdf.AsdfFile(tree=dict(table=t)) as af:
    af.write_to('test.asdf')

# Compress every array in the file
with asdf.AsdfFile(tree=dict(table=t)) as af:
    af.write_to('test_compressed.asdf',
        all_array_compression='zlib',
    )

# Try to compress a single column only
with asdf.AsdfFile(tree=dict(table=t)) as af:
    af.set_array_compression(af['table']['col'], 'zlib')  # this has no effect
    af.write_to('test_compressed_col.asdf')

Comparing test.asdf with test_compressed_col.asdf shows that they have identical checksums (and there's no zlib tag at the beginning of the binary block), so the set_array_compression call had no effect.
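
One way to run that comparison is sketched below. It is only a rough check under stated assumptions: the helper names (sha256_of, mentions_zlib, compare_files) are made up for illustration, and searching the whole file for the bytes "zlib" is a crude stand-in for inspecting the block header, since the same bytes could also appear elsewhere in the file.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of the file's full contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def mentions_zlib(path):
    """Crude heuristic: True if the bytes b'zlib' occur anywhere in the file."""
    return b"zlib" in Path(path).read_bytes()


def compare_files(path_a, path_b):
    """Summarize whether two files are byte-identical and which mention zlib."""
    return {
        "identical": sha256_of(path_a) == sha256_of(path_b),
        "zlib_in_a": mentions_zlib(path_a),
        "zlib_in_b": mentions_zlib(path_b),
    }
```

Calling compare_files('test.asdf', 'test_compressed_col.asdf') on the files written by the example above would be the equivalent of the manual checksum comparison.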

I've tried variants of this like

set_array_compression(af['table']['col'].base, 'zlib')
set_array_compression(af['table']['col'].data, 'zlib')

but I couldn't get it to work.

I did dig around in the source code a bit, and it looked to me like it's trying to compare the ultimate ndarray base to check if two arrays are the same, but maybe a copy is being made somewhere that's thwarting this detection.
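
The base-comparison idea, and how a copy would defeat it, can be illustrated with plain numpy. This is not the asdf source, only a sketch of the general technique; ultimate_base is a hypothetical helper name.

```python
import numpy as np


def ultimate_base(arr):
    """Follow .base until reaching the array that owns its memory."""
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr


original = np.ones(4)
view = original[1:3]       # shares memory with `original`
copied = original.copy()   # owns fresh memory

# A view resolves back to the same owning array...
view_matches = ultimate_base(view) is original
# ...but a copy does not, so identity-based lookups would miss it.
copy_matches = ultimate_base(copied) is original
```

If a copy of the column is made anywhere between the call to set_array_compression and the write, a lookup of this kind would no longer find the array that the setting was attached to.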

@WilliamJamieson
Contributor

@braingram or @eslavich, what are your opinions?

@braingram
Contributor

Thanks for opening this issue!

Unfortunately, this looks to be unsupported without using the legacy extension API (specifically the 'reserve_blocks' hook).

The call to set_array_compression uses the id of the column array to define an internal block, which stores the zlib compression option. However, the call to write_to includes a call to block_manager.find_used_blocks, which looks at all internal blocks (like the one created by the call to set_array_compression) and throws out any that don't appear to be used.
https://github.com/asdf-format/asdf/blob/master/asdf/block.py#L553-L557
find_used_blocks relies on the reserve_blocks hook, which is currently only supported with legacy extensions (asdf-astropy uses the new-style converters), and looks at each node in the tree to see if that node has blocks that should be kept. Since no node claims the block created when set_array_compression was called (in this case the table node should claim it, but ASDF does not currently have a way to do this), the block is thrown out and the compression settings are lost.
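
The pruning behavior described above can be mimicked with a toy model. Everything here is hypothetical (ToyBlockManager, ToyTable, the reserve_hooks mapping); it does not reproduce asdf's actual classes or signatures, only the described logic: settings are keyed by id, and blocks that no node claims are discarded before writing.

```python
class ToyTable:
    """Stand-in for a table node holding named column buffers."""

    def __init__(self, columns):
        self.columns = columns


class ToyBlockManager:
    """Hypothetical model: per-array settings keyed by id(array)."""

    def __init__(self):
        self._blocks = {}

    def set_array_compression(self, array, compression):
        # Keep a reference to the array so its id stays valid.
        self._blocks[id(array)] = {"array": array, "compression": compression}

    def find_used_blocks(self, nodes, reserve_hooks):
        # Only nodes whose type has a hook can claim blocks.
        claimed = set()
        for node in nodes:
            hook = reserve_hooks.get(type(node))
            if hook is not None:
                claimed.update(id(a) for a in hook(node))
        # Unclaimed blocks are dropped, along with their settings.
        self._blocks = {k: v for k, v in self._blocks.items() if k in claimed}

    def get_compression(self, array):
        block = self._blocks.get(id(array))
        return block["compression"] if block else None


column = bytearray(8)
table = ToyTable({"col": column})

# No hook for the table type: the block is pruned and the setting is lost.
no_hook = ToyBlockManager()
no_hook.set_array_compression(column, "zlib")
no_hook.find_used_blocks([table], reserve_hooks={})
lost = no_hook.get_compression(column)

# With a hook that claims the table's columns, the setting survives.
with_hook = ToyBlockManager()
with_hook.set_array_compression(column, "zlib")
with_hook.find_used_blocks(
    [table], reserve_hooks={ToyTable: lambda t: t.columns.values()}
)
kept = with_hook.get_compression(column)
```

In this sketch the first manager corresponds to the situation in this issue, where nothing claims the column's block, and the second to what a reserve_blocks-style hook would provide.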

We (the asdf developers) are currently working on fleshing out the new-style extensions to support all the features of the legacy extension API/type system. This is a good example case that we should strive to support.
