Does the data block needs to be binary? #118

Open
LibrEars opened this issue Oct 4, 2022 · 6 comments

@LibrEars

LibrEars commented Oct 4, 2022

Hi all,

I would like to save my astropy QTables in the asdf format so that the meta dictionary and units are saved as well. The only thing that holds me back is compatibility with other users who might not want to use Python to evaluate the data.

Would it be possible to store the data (relatively small) after the yaml header in a human-readable format like ecsv does? Or is there a hard reason why it needs to be a binary block?

PS: I found the ASDF option to save inline arrays inside the yaml, but I think it is not accessible via QTable.write(), and it seems neither very human-readable nor easy to extract with GUI software.
PS2: When using ecsv with QTable.read(), the meta dictionary is not imported as a dictionary, and asdf seems more future-proof.

@LibrEars changed the title from "Why does the data block needs to be binary?" to "Does the data block needs to be binary?" on Oct 4, 2022
@WilliamJamieson
Contributor

@LibrEars,

Thanks for the feedback. Unfortunately, there is no elegant way to save arrays inside yaml, and I would strongly encourage you not to save arrays that are "too big" this way, as it can make reading the asdf file quite slow. However, the inline arrays have been designed so that they can be easily parsed by any yaml parser.

As for your main request, could you provide me with a minimal code example that produces a small (just a few rows) QTable of the kind you want to save (random data is fine)?

@perrygreenfield
Member

Could you also clarify what you mean by human readable? Just because a block can contain binary doesn't preclude the contents being simple text in principle (though the current python interface is very numpy-oriented). If your intent is that people can edit this block with a text editor, then yes, there are some binary words before the actual contents that may complicate that. On the other hand, are you asking for a way to write this human-readable content to the binary block and read it back? Along those lines, can you show how you would like this to be used in code (writing and reading)?
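
For illustration, the "binary words" ahead of the contents form a fixed block header. The sketch below assumes the layout described in the ASDF standard: the magic bytes `\xd3BLK`, a 2-byte header size, then flags, a 4-byte compression code, three 64-bit sizes and a 16-byte MD5 checksum, all big-endian. `build_block` and `parse_block` are hypothetical helper names, not part of the asdf library, and the block is synthetic rather than read from a real file.

```python
import hashlib
import struct

BLOCK_MAGIC = b"\xd3BLK"
# Assumed header fields (big-endian): flags, compression,
# allocated size, used size, data size, MD5 checksum.
HEADER_FORMAT = ">I4sQQQ16s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 48 bytes


def build_block(payload: bytes) -> bytes:
    """Wrap an uncompressed payload in a synthetic block header."""
    size = len(payload)
    header = struct.pack(
        HEADER_FORMAT, 0, b"\0\0\0\0", size, size, size,
        hashlib.md5(payload).digest(),
    )
    return BLOCK_MAGIC + struct.pack(">H", len(header)) + header + payload


def parse_block(raw: bytes) -> dict:
    """Split a block into its header fields and its payload bytes."""
    if raw[:4] != BLOCK_MAGIC:
        raise ValueError("not an ASDF block")
    (header_size,) = struct.unpack(">H", raw[4:6])
    flags, compression, allocated, used, data_size, checksum = struct.unpack(
        HEADER_FORMAT, raw[6:6 + HEADER_SIZE]
    )
    start = 6 + header_size
    return {
        "header_size": header_size,
        "flags": flags,
        "compression": compression,
        "allocated_size": allocated,
        "used_size": used,
        "data_size": data_size,
        "checksum": checksum,
        "payload": raw[start:start + used],
    }


# Twenty little-endian float64 values, like one column of the example table.
values = [i * 20 / 19 for i in range(20)]
block = build_block(struct.pack("<20d", *values))
info = parse_block(block)
decoded = struct.unpack("<20d", info["payload"])
```

Under these assumptions a 20-element float64 column occupies 6 + 48 + 160 = 214 bytes, which matches the spacing of the two block index offsets (1512 and 1726) in the file shown later in this thread.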

@LibrEars
Author

LibrEars commented Oct 5, 2022

Hi all,
thank you for the quick replies :). Here is some code as an explanation:

# Example question on human readable asdf: https://github.com/astropy/asdf-astropy/issues/118#issuecomment-1267339629

# %% Import modules
import time

import numpy as np
from astropy.table import QTable
import astropy.units as u

# %% Meta-data of the experiment
meta = {"Experimentalist":"LibrEars",
        "measurement_type": "flux_of_fluxgenerator",
        "nr":42, "pix":7, "voltage":-2,
        "time":time.asctime(time.localtime()),
        "temperature":37}


# %% Store data columns in an astropy QTable (from fluxgenerator measurements)
current = np.linspace(0,20, 20)
flux = np.ones(20)
data = QTable([current, flux], names=["Curren", "Fluxgenerated_flux"], units=[u.A, u.flx])

# attach meta-data to QTable
data.meta = meta

# %% Save 
data.write("Nr{}_fluxgenerator{}".format(42, ".asdf"))

# %% Later load data and meta-data again via python works fine
old_data = QTable.read("Nr42_fluxgenerator.asdf")

#%% Do some fancy matplotlib-plotting...

So my main purpose to use QTable and asdf at the moment is to store the measured data, units and experiment meta-data together to be able to do improved data-handling. This works as expected in python.

Now a non-python user finds the asdf file and would like to load it into any other GUI-based data-plotting program. But opening the file with a text-editor does not display the data block in a human readable way:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
........
........
data: !<tag:astropy.org:astropy/table/table-1.0.0>
  colnames: [Curren, Fluxgenerated_flux]
  columns:
  - !unit/quantity-1.1.0
    unit: !unit/unit-1.0.0 A
    value: !core/ndarray-1.0.0
      source: 0
      datatype: float64
      byteorder: little
      shape: [20]
  - !unit/quantity-1.1.0
    unit: !unit/unit-1.0.0 flx
    value: !core/ndarray-1.0.0
      source: 1
      datatype: float64
      byteorder: little
      shape: [20]
  meta: {Experimentalist: LibrEars, measurement_type: flux_of_fluxgenerator, nr: 42,
    pix: 7, temperature: 37, time: 'Wed Oct  5 09:59:08 2022', voltage: -2}
  qtable: true
...
\D3BLK\000\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\A0\00\00\00\00\00\00\00\A0\00\00\00\00\00\00\00\A0s\E5\82<5\AC@\DD\D0\C5@s9\B3d7\00\00\00\00\00\00\00\00y
\E55\94\D7\F0?y
\E55\94\D7\00@6\94\D7P^C	@y
\E55\94\D7�@\D7P^Cy
�@6\94\D7P^C�@\94\D7P^Cy�@y
\E55\94\D7 @(\AF\A1\BC\86\F2"@\D7P^Cy
%@\86\F2�\CAk('@6\94\D7P^C)@\E55\94\D7P^+@\94\D7P^Cy-@Cy
\E55\94/@y
\E55\94\D70@Q^Cy
\E51@(\AF\A1\BC\86\F22@\00\00\00\00\00\004@\D3BLK\000\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\A0\00\00\00\00\00\00\00\A0\00\00\00\00\00\00\00\A0O\D3\F7\C0\ABؔ\91\AD*Ӳ\B2{\A7\EB\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?\00\00\00\00\00\00\F0?#ASDF BLOCK INDEX
%YAML 1.1
---
- 1512
- 1726
...

So they might be confused about how to read the data. If, on the other hand, the file opened as follows, it would be compatible with everything that can read text and would still contain the schema, units and meta-data for advanced usage:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
........
........
  colnames: [Curren, Fluxgenerated_flux]
  columns:
  - !unit/quantity-1.1.0
    unit: !unit/unit-1.0.0 A
    value: !core/ndarray-1.0.0
      source: 0
      datatype: float64
      byteorder: little
      shape: [20]
  - !unit/quantity-1.1.0
    unit: !unit/unit-1.0.0 flx
    value: !core/ndarray-1.0.0
      source: 1
      datatype: float64
      byteorder: little
      shape: [20]
  meta: {Experimentalist: LibrEars, measurement_type: flux_of_fluxgenerator, nr: 42,
    pix: 7, temperature: 37, time: 'Wed Oct  5 09:59:08 2022', voltage: -2}
  qtable: true
...
Curren Fluxgenerated_flux
0.0 1.0
1.0526315789473684 1.0
2.1052631578947367 1.0
3.1578947368421053 1.0
4.2105263157894735 1.0
5.263157894736842 1.0
6.315789473684211 1.0
7.368421052631579 1.0
8.421052631578947 1.0
9.473684210526315 1.0
10.526315789473683 1.0
11.578947368421051 1.0
12.631578947368421 1.0
13.68421052631579 1.0
14.736842105263158 1.0
15.789473684210526 1.0
16.842105263157894 1.0
17.894736842105264 1.0
18.94736842105263 1.0
20.0 1.0
#ASDF BLOCK INDEX
%YAML 1.1
---
- 1512
- 1726
...

The use-case would be small measurements. Maybe a 'compressor' saving data in text instead of binary would be a solution (and some statement in the yaml-header about how to read / 'decompress' that block by asdf)?
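
A minimal sketch of what such a text "compressor" could look like, using only the Python standard library. It is not an existing asdf feature; `encode_text_block` and `decode_text_block` are hypothetical names. It assumes float columns of equal length and column names without whitespace, and relies on `repr` of a Python float round-tripping exactly through `float`.

```python
def encode_text_block(names, columns):
    """Serialize equal-length float columns as whitespace-separated rows."""
    lines = [" ".join(names)]
    for row in zip(*columns):
        # repr() keeps enough digits to restore each float64 exactly.
        lines.append(" ".join(repr(float(value)) for value in row))
    return ("\n".join(lines) + "\n").encode("ascii")


def decode_text_block(raw):
    """Inverse of encode_text_block: return (names, columns)."""
    lines = raw.decode("ascii").splitlines()
    names = lines[0].split()
    rows = [[float(token) for token in line.split()] for line in lines[1:]]
    if not rows:
        return names, [[] for _ in names]
    return names, [list(column) for column in zip(*rows)]


current = [i * 20 / 19 for i in range(20)]
flux = [1.0] * 20
raw = encode_text_block(["Curren", "Fluxgenerated_flux"], [current, flux])
names, columns = decode_text_block(raw)
```

The text form is larger than the binary block and slower to parse, which is acceptable only for the small measurements described above.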

@perrygreenfield
Member

You can supply a keyword argument to the write method as such:

data.write("Nr{}_fluxgenerator{}".format(42, ".asdf"), all_array_storage='inline')

Which will produce this form of the ASDF file:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 2.11.2.dev15+g6703d8f.d20220729}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 2.11.2.dev15+g6703d8f.d20220729}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://astropy.org/astropy/extensions/astropy-1.0.0
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.2.1}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.2.1}
data: !<tag:astropy.org:astropy/table/table-1.0.0>
  colnames: [Curren, Fluxgenerated_flux]
  columns:
  - !unit/quantity-1.1.0
    unit: !unit/unit-1.0.0 A
    value: !core/ndarray-1.0.0
      data: [0.0, 1.0526315789473684, 2.1052631578947367, 3.1578947368421053, 4.2105263157894735,
        5.263157894736842, 6.315789473684211, 7.368421052631579, 8.421052631578947,
        9.473684210526315, 10.526315789473683, 11.578947368421051, 12.631578947368421,
        13.68421052631579, 14.736842105263158, 15.789473684210526, 16.842105263157894,
        17.894736842105264, 18.94736842105263, 20.0]
      datatype: float64
      shape: [20]
  - !unit/quantity-1.1.0
    unit: !unit/unit-1.0.0 flx
    value: !core/ndarray-1.0.0
      data: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
        1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
      datatype: float64
      shape: [20]
  meta: {Experimentalist: LibrEars, measurement_type: flux_of_fluxgenerator, nr: 42,
    pix: 7, temperature: 37, time: 'Wed Oct  5 08:20:11 2022', voltage: -2}
  qtable: true
...

Would this suffice for your needs?

@LibrEars
Author

LibrEars commented Oct 5, 2022

Hi @perrygreenfield,

thank you for your suggestion. The `all_array_storage='inline'` keyword goes in the right direction. I did not find it in the astropy documentation (`QTable.write.help("asdf")`), so thank you for pointing it out.

For compatibility and accessibility of the data, I still feel that an array-like structure outside of the yaml would be more suitable, as most data programs can import row-like data.
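
As a rough illustration of the row-like layout meant here, the sketch below transposes column-wise data (as it appears in the inline yaml) into CSV rows with the unit in each header cell. It uses only the standard library; `table_to_csv` and the `name -> (unit, values)` mapping are assumptions made for this example, not an astropy or asdf interface.

```python
import csv
import io


def table_to_csv(columns):
    """Render a mapping of name -> (unit, values) as CSV text, one row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header cells carry the unit, e.g. "Curren [A]".
    writer.writerow([f"{name} [{unit}]" for name, (unit, _) in columns.items()])
    # zip(*...) turns the column lists into rows.
    writer.writerows(zip(*(values for _, values in columns.values())))
    return buffer.getvalue()


example = {
    "Curren": ("A", [0.0, 1.5]),
    "Fluxgenerated_flux": ("flx", [1.0, 1.0]),
}
csv_text = table_to_csv(example)
```

A file like this loses the meta dictionary, so it would complement the asdf file rather than replace it.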

@perrygreenfield
Member

I think something we will be looking at soon is a way to support things other than arrays in binary blocks. But we are currently focused on chunking support, so it will have to wait until after that (though it may inform some changes we need to make to support chunking with options for other kinds of content). Thanks for your feedback.

3 participants