Euphrasiologist/nu_plugin_bio

Nushell bio

A bioinformatics plugin for nushell. This plugin parses most common bioinformatics formats into structured data so you can use them with nushell more effectively.

Quick setup

Go and get nushell (it's great), then come back! I'm assuming you already have the Rust toolchain installed.

# clone this repo
git clone https://github.com/Euphrasiologist/nu_plugin_bio
# change into the repo directory
cd nu_plugin_bio
# build
# it's quite a long compile time...
cargo build --release
# register the plugin (run this inside nushell, from the repo directory)
register ./target/release/nu_plugin_bio

# see the file formats currently supported below
# now you can just use open, and the file extension will be auto-detected.

# there are some test files in the tests/ dir.
open ./tests/test.fasta
    | get id

# if you want to add flags you have to explicitly use from <x>
# e.g. if you want descriptions in fasta files to be parsed.

open --raw ./tests/test.fasta 
    | from fasta -d
    | first

The backend is a wrapper around noodles, an excellent, all-Rust bioinformatics I/O library.
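To make "structured data" concrete, here is a minimal, standard-library-only sketch of turning FASTA text into records, with an optional description field in the spirit of `from fasta -d`. It is not the plugin's code and does not use the noodles API; `FastaRecord`, `parse_fasta`, and every field name other than `id` are assumptions made for illustration.

```rust
// Hypothetical record type. Only the `id` column is confirmed by the examples
// above; `description` and `sequence` are assumed names.
#[derive(Debug, PartialEq)]
struct FastaRecord {
    id: String,
    description: Option<String>,
    sequence: String,
}

// Parse FASTA text into records. When `parse_description` is false, anything
// after the first whitespace on a header line is discarded.
fn parse_fasta(input: &str, parse_description: bool) -> Vec<FastaRecord> {
    let mut records: Vec<FastaRecord> = Vec::new();
    for line in input.lines() {
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            // Split the header into the id and the remaining description text.
            let (id, rest) = match header.split_once(char::is_whitespace) {
                Some((id, rest)) => (id, rest.trim()),
                None => (header, ""),
            };
            let description = if parse_description && !rest.is_empty() {
                Some(rest.to_string())
            } else {
                None
            };
            records.push(FastaRecord {
                id: id.to_string(),
                description,
                sequence: String::new(),
            });
        } else if let Some(current) = records.last_mut() {
            // Sequence lines are appended to the most recent record;
            // lines before the first header are ignored.
            current.sequence.push_str(line.trim());
        }
    }
    records
}

fn main() {
    let text = ">seq1 first test sequence\nACGT\nTTGA\n>seq2\nGGCC\n";
    for record in parse_fasta(text, true) {
        println!("{:?}", record);
    }
}
```

Each record corresponds to one row of the table nushell would display, which is what makes commands such as `get id` possible.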

Aims

Aim to support the following:

  • BAM 1.6
  • BCF 2.2
    • bcf.gz
  • VCF 4.3
    • vcf.gz
  • BED (BED3 only right now)
  • CRAM 3.0
  • FASTA
    • fa.gz
  • FASTQ
    • fq.gz
  • GFF3
  • GTF 2.2
  • SAM 1.6
  • GFA 1.0
    • gfa.gz

Note that performance will not be optimal with the current state of nu_plugin: we cannot access nushell's engine state, so entire data structures have to be loaded into memory. Testing on large files still needs to be done.
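The memory trade-off can be sketched in plain Rust. The example below contrasts a streaming iterator, which holds one record at a time, with collecting everything into a `Vec`, which is roughly the situation described above. `FastaStream` is a hypothetical type written for this illustration; it is not part of the plugin, nu_plugin, or noodles.

```rust
use std::io::{BufRead, Cursor, Lines};

// Hypothetical streaming reader that yields (id, sequence) pairs lazily.
struct FastaStream<R: BufRead> {
    lines: Lines<R>,
    // Header of the next record, seen while finishing the previous one.
    pending_id: Option<String>,
}

impl<R: BufRead> FastaStream<R> {
    fn new(reader: R) -> Self {
        FastaStream { lines: reader.lines(), pending_id: None }
    }
}

// The id is taken to be the first whitespace-delimited token of the header.
fn header_id(header: &str) -> String {
    header.split_whitespace().next().unwrap_or("").to_string()
}

impl<R: BufRead> Iterator for FastaStream<R> {
    type Item = (String, String);

    fn next(&mut self) -> Option<Self::Item> {
        let mut id = self.pending_id.take();
        let mut sequence = String::new();
        // For brevity, an I/O error simply ends the stream.
        while let Some(Ok(line)) = self.lines.next() {
            match line.strip_prefix('>') {
                // First header of this record.
                Some(h) if id.is_none() => id = Some(header_id(h)),
                // Header of the following record: save it and stop here.
                Some(h) => {
                    self.pending_id = Some(header_id(h));
                    break;
                }
                // Sequence line; ignored if no header has been seen yet.
                None => {
                    if id.is_some() {
                        sequence.push_str(line.trim());
                    }
                }
            }
        }
        id.map(|id| (id, sequence))
    }
}

fn main() {
    let data = ">a\nAC\nGT\n>b\nTT\n";

    // Streaming: only one record is alive at a time.
    for (id, seq) in FastaStream::new(Cursor::new(data)) {
        println!("{}\t{}", id, seq.len());
    }

    // Eager: every record is held in memory at once.
    let all: Vec<(String, String)> = FastaStream::new(Cursor::new(data)).collect();
    println!("{} records in memory", all.len());
}
```

With the eager approach, peak memory grows with file size, which is why large inputs are the case that still needs testing.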

More?

If there's a bioinformatics format you want added, let me know, or open a PR.