ArtifactDB

Metadata and file store for arbitrary data objects

Storage for analysis-ready artifacts

About

This organization contains various repositories for the ArtifactDB project, a storage system for analysis-ready artifacts. It aims to provide easy access to datasets and analysis results across multiple programming frameworks such as R and Python. The ArtifactDB system was originally developed at @Genentech to store the outputs of various genomics analysis pipelines (plus the associated metadata); scientists can then pull these artifacts into their analysis environments for further processing.

For users

R

The alabaster suite implements functions to save and read Bioconductor objects via language-agnostic file formats. This is the workhorse of our R-based data serialization pipelines, managing the conversion of various objects into files for long-term storage.

The gypsum R package implements an interface to the gypsum REST API. This handles the upload of files and associated metadata to cloud storage for large-scale distribution.

Python

The dolomite suite implements functions to save and read Bioconductor objects via language-agnostic file formats. This is the Python equivalent of alabaster and is based heavily on the classes from the BiocPy project.
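
To make the idea of language-agnostic serialization concrete, here is a minimal standard-library sketch of a save/read round trip. It does not use the alabaster or dolomite APIs. The function names `save_table` and `read_table`, the `OBJECT` metadata file, and the on-disk layout are all assumptions chosen for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_table(columns: dict[str, list], path: Path) -> None:
    """Write a column-oriented table as a directory of plain files."""
    path.mkdir(parents=True, exist_ok=False)
    # Hypothetical metadata file: records the object type so that a reader
    # in any language can decide how to interpret the rest of the directory.
    (path / "OBJECT").write_text(json.dumps({"type": "table", "version": "1.0"}))
    names = list(columns)
    with open(path / "data.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(names)
        writer.writerows(zip(*(columns[name] for name in names)))


def read_table(path: Path) -> dict[str, list]:
    """Read a directory written by save_table; values come back as strings."""
    meta = json.loads((path / "OBJECT").read_text())
    if meta["type"] != "table":
        raise ValueError(f"unsupported object type: {meta['type']!r}")
    with open(path / "data.csv", newline="") as handle:
        rows = list(csv.reader(handle))
    header, body = rows[0], rows[1:]
    return {name: [row[i] for row in body] for i, name in enumerate(header)}


# Usage: save to a fresh directory, then load it back.
target = Path(tempfile.mkdtemp()) / "my_table"
save_table({"gene": ["A", "B"], "count": ["1", "2"]}, target)
restored = read_table(target)
```

Because the directory holds only JSON and CSV, an R reader could load the same files without any Python-specific tooling, which is the property the alabaster and dolomite suites are described as providing for Bioconductor objects.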

For developers

The gypsum worker implements a REST API for storing and serving artifacts via the Cloudflare stack. This uses R2 for storage and Workers to handle authenticated uploads via flexible permission schemes.

The Gobbler manages artifacts across users on a shared filesystem such as those used in HPC clusters. This is effectively an on-premises version of gypsum that is simpler and more efficient for local applications.

The takane library contains language-agnostic specifications for all Bioconductor object types. These are enforced by validator functions written in C++, which are used by both alabaster and dolomite to verify compliance.
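
One way such specification checks could be organized is a registry of validator functions keyed by object type. The sketch below is written in Python for brevity, whereas the real validators are in C++. The `OBJECT` metadata file, the `table` type, and the names `register` and `validate` are hypothetical and are not taken from takane.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical registry mapping an object type to its validator function.
VALIDATORS: dict[str, Callable[[Path], None]] = {}


def register(object_type: str):
    """Decorator that adds a validator to the registry under object_type."""
    def decorator(func: Callable[[Path], None]) -> Callable[[Path], None]:
        VALIDATORS[object_type] = func
        return func
    return decorator


@register("table")
def validate_table(path: Path) -> None:
    # Illustrative rule only: a 'table' directory must contain data.csv.
    if not (path / "data.csv").is_file():
        raise ValueError("'table' objects must contain a data.csv file")


def validate(path: Path) -> None:
    """Look up the declared type and run the matching validator."""
    meta_file = path / "OBJECT"
    if not meta_file.is_file():
        raise ValueError("missing OBJECT metadata file")
    object_type = json.loads(meta_file.read_text()).get("type")
    if object_type not in VALIDATORS:
        raise ValueError(f"no validator registered for type {object_type!r}")
    VALIDATORS[object_type](path)


# Usage: build a small compliant directory and validate it.
example = Path(tempfile.mkdtemp()) / "example"
example.mkdir()
(example / "OBJECT").write_text(json.dumps({"type": "table"}))
(example / "data.csv").write_text("gene\nA\n")
validate(example)  # passes silently; raises ValueError on a violation
```

Keeping the rules in one shared validator, rather than reimplementing them in each client, means that files written from either language are held to the same checks.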

Popular repositories

  1. uzuki (public archive)

    Safely saving R lists to JSON

    C++ 4

  2. alabaster.base (public)

    Base methods for the alabaster client framework

    C++ 2

  3. zircon-R (public)

    R interface for ArtifactDB APIs

    R 2

  4. dolomite-base (public)

    Save Bioconductor objects in Python.

    Python 2

  5. BiocObjectSchemas (public archive)

    JSON schemas for Bioconductor objects

    1 1

  6. alabaster.se (public)

    Save and load SummarizedExperiment objects to file

    R 1
