GitHub - LCrossman/punchline: Punchline for pangenomes

Punchline Pfam domain searches

Definition:

punchline /ˈpʌn(t)ʃlʌɪn/ noun

the final phrase or sentence of a joke or story, providing the humour or some other crucial element.

Dependencies: HMMER3 installation instructions here: http://hmmer.org/documentation.html Python2.7 or Python3 Biopython - http://biopython.org/DIST/docs/install/Installation.html
R
R corrplot library install.packages("corrplot") from within R

First step:
Gather your files:
1. Download Pfam-A database from here: https://pfam.xfam.org and FTP tab at the top of the page Make a note of the version number in case you need to quote it later.
2. Collect all your proteins to search in one file as a protein fasta file - you will need to make sure each protein is named so that you know which strain it was from and all need a unique name such as H10407|H10407_0003. The strain name should be first, followed by a | character and the name of the coding sequence or locus tag. ** more details on how to do this if requested **
Second step:
Run Pfam - the fastest way to search a large number of protein sequences with a large number of Pfam motifs is to use hmmsearch.
hmmsearch --noali --notextw --cpu [insert number of cpus here] --domtblout your_outfile_name Pfam_A_hmm_profiles your_protein_fasta

Third step:
Run pfam_presence_absence.py
pfam_presence_absence.py hmmsearch_outfile Pfam_domain_names

This will create a large number of files, each with the top line of the species in question The Pfam_domain_names file is a list of all the Pfam domain shortnames you are searching with, as a text file with one name per line. An example file is provided in the examples folder and is specific for the 32.0 version of Pfam-A, released Sept. 2018.

Fourth step:
We join these files to an R matrix with:
join_and_create_dataset_punchline.R
Includes:
Kruskal-wallis significance testing using R

***What to do if the plot needs resizing ** coming soon!!

Fifth step:
Heatmap and correlation plot using Rscript
Rscript Heatmapper_punchline.R

Sixth step:
Phylogeny clustering
*still under construction
You will be able to build the input phylogeny file by: running plot_pfam_phylog.R Then run phylip neighbour on that file and afterwards run: plot_pfam_phylog.R If you get errors (usually about the strain names) you can try: validateforphylip.py

Look at the resulting tree with your favourite tree viewer program such as Figtree
Or try color_tree_labels.R in useful_scripts folder on github.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
examples		examples
.gitignore		.gitignore
AdjustFasta.py		AdjustFasta.py
FilterFasta.py		FilterFasta.py
LICENSE		LICENSE
README.md		README.md
heatmapper_punchline.R		heatmapper_punchline.R
join_and_create_dataset_punchline.R		join_and_create_dataset_punchline.R
pfam_presence_absence.py		pfam_presence_absence.py
punchline_README.txt		punchline_README.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

examples

examples

.gitignore

.gitignore

AdjustFasta.py

AdjustFasta.py

FilterFasta.py

FilterFasta.py

LICENSE

LICENSE

README.md

README.md

heatmapper_punchline.R

heatmapper_punchline.R

join_and_create_dataset_punchline.R

join_and_create_dataset_punchline.R

pfam_presence_absence.py

pfam_presence_absence.py

punchline_README.txt

punchline_README.txt

Repository files navigation

About

Releases

Packages

Languages

License

LCrossman/punchline

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Languages