polish: update README and part arguments
parent bbc3d7fe82
commit 811a1a6c0e
2 changed files with 68 additions and 36 deletions
README.md | 90
--- a/README.md
+++ b/README.md
@@ -10,39 +10,49 @@ Author: Guoyi Zhang
 
 ### External software
 
-- GNU Bash (provide cd)
-- GNU coreutils (provide cp mv mkdir mv)
-- GNU findutils (provide find)
 - fastp
 - spades.py (provided by spades)
 - diamond
+- bowtie2
+- samtools
+- bcftools
+- exonerate (optional, only for --codon)
 - java
 - macse (default recognized path: /usr/share/java/macse.jar)
-- GNU parallel
+- trimal
 
 ### Internal software
 
-- splitfasta (default recognized path: /usr/bin/splitfasta)
 - sortdiamond (default recognized path: /usr/bin/sortdiamond)
+- delstop (default recognized path: /usr/bin/delstop)
 
 ## Arguments
 
 ### Details
 
 ```
--c --contigs contings type: scaffolds or contigs
--g --genes gene file path
--f --functions functions type (optional): all clean
-assembly fasta map pre split merge align
+-c --config config file for software path (optional)
+-g --genes gene file path (optional, if -r is specified)
+-f --functions functions type (optional): all clean assembly
+map postmap varcall consen codon align trim
 -h --help show this information
 -l --list list file path
 -m --memory memory settings (optional, default 16 GB)
 -r --reference reference genome path
 -t --threads threads setting (optional, default 8 threads)
---macse Macse jarfile path
---sortdiamond sortdiamond file path
---splitfasta splitfasta file path
-for example: bash RGBEPP.sh -c scaffolds -f all -l list -g genes -r reference.aa.fasta
+--codon Only use the codon region (optional)
+--fastp Fastp path (optional)
+--spades Spades python path (optional)
+--diamond Diamond python path (optional)
+--sortdiamond SortDiamond python path (optional)
+--bowtie2 Bowtie2 path (optional)
+--samtools Samtools path (optional)
+--bcftools Bcftools path (optional)
+--exonerate Exonerate path (optional)
+--macse Macse jarfile path (optional)
+--delstop Delstop path (optional)
+--trimal Trimal path (optional)
+for example: ./RGBEPP -f all -l list -t 8 -r reference.fasta
 ```
 
 ### Directories Design
@@ -52,16 +62,17 @@ for example: bash RGBEPP.sh -c scaffolds -f all -l list -g genes -r reference.aa
 ├── 00_raw
 ├── 01_fastp
 ├── 02_spades
-├── 03_assemblied
-├── 04_diamond
-├── 05_pre
-├── 06_split
-├── 07_merge
+├── 03_bowtie2
+├── 04_bam
+├── 05_vcf
+├── 06_consen
+├── 07_macse
 ├── 08_macse
-├── genes
+├── 08_trimal
 ├── list
+├── gene
 ├── reference.aa.fasta
-└── RGBEPP.sh
+└── RGBEPP
 ```
 
 Each directory corresponds to each function.
@@ -88,23 +99,44 @@ grep '>' Reference.fasta | sed "s@>@@g" > genes
 
 ## Process
 
-### RGBEPP.sh functions
+### RGBEPP functions
 
+map postmap varcall consen codon align trim
+
+
 - Function clean: Quality control + trimming (fastp)
 - Function assembly: de novo assembly (spades)
-- Function fasta: gather all fasta files from assembly directories (RGBEPP.sh)
-- Function map: local nucleic acids alignment search against amino acids subject sequence (diamond)
-- Function pre: generate corresponding sequences based on blast-styled output (sortdiamond)
-- Function split: splitting fasta sequence to directories based on the reference genome (splitfasta)
-- Function merge: merge different taxa in the same reference exon gene to one fasta (RGBEPP.sh)
-- Function align: multiple sequence align based on Condon (macse)
+- Function map: local nucleic acids alignment search against amino acids subject sequence (diamond, sortdiamond), mapping raw reads to its scaffolds sequences (bowtie2)
+- Function postmap: Sorting and marking the read read alignment (samtools)
+- Function varcall: variant calling and filtering (bcftools)
+- Function consen: get consensus fasta file from vcf files (bcftools), then sort sequences based on gene name and taxa name (RGBEPP)
+- Function codon (optional): only extract the exon sequence (exonerate)
+- Function align: multiple sequence align based on condon (macse)
+- Function trim: trimming based on codon (trimal, delstop)
 
+### Arguments reuqirements for functions
+
+| Functions | -g/--gene | -l/--list | -r/--reference |
+| --------- | --------- | --------- | -------------- |
+| clean | | ✔ | |
+| assembly | | ✔ | |
+| map | | ✔ | ✔ |
+| postmap | | ✔ | |
+| varcall | | ✔ | |
+| consen | ✔ | ✔ | |
+| codon | ✔ | | ✔ |
+| align | ✔ | | |
+| trim | ✔ | | |
+
+
 ### Downstream process
 
 - concatenate sequences via SeqCombGo or catsequences or sequencematrix
 - coalescent / concatenated phylogeny
 
-# sortdiamond
+## Inner software
+
+### sortdiamond
 
 Usage: sortdiamond diamond_output.m8 generated.fasta sseq,qstart,qend,bitscore/evalue,qseq(optional, default 1,6,7,11,17, start from 0) bitscore/evalue(optional, default bitscore)
 
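The argument-requirements table added to the README above can be read as a lookup from function name to mandatory options. A minimal shell sketch of that lookup follows. It assumes a hypothetical helper named `required_args`, which is not part of RGBEPP itself; it only restates the table rows.

```shell
#!/bin/sh
# Hypothetical helper (not part of RGBEPP): prints the options that the
# README table marks as required for a given function name.
required_args() {
    case "$1" in
        clean|assembly|postmap|varcall) echo "-l" ;;
        map)                            echo "-l -r" ;;
        consen)                         echo "-g -l" ;;
        codon)                          echo "-g -r" ;;
        align|trim)                     echo "-g" ;;
        *) echo "unknown function: $1" >&2; return 1 ;;
    esac
}

# Example: look up the options the table lists for the map function.
required_args map
```

A wrapper script could call such a helper before launching a single step, so that a missing list or reference file is reported early rather than partway through the run.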
@@ -112,7 +144,7 @@ Default sseq is column 2, qstart is column 8, etc.
 
 Diamond default output format (--outfmt 6) does not contain qseq, you must custom the output format under output format 6.
 
-# splitfasta
+### splitfasta
 
 Usage: splitfasta sample.fasta
 
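The sortdiamond usage line in the README gives its default column indices as 1,6,7,11,17, counted from 0. As a rough illustration of what zero-based indexing means for a tab-separated table, the sketch below prints those five fields from one record. The helper name `pick_default_columns` and the 18-column record with placeholder values `c0`..`c17` are made up for this example. It does not reproduce sortdiamond's actual logic.

```shell
#!/bin/sh
# Hypothetical helper: print the fields at zero-based indices 1,6,7,11,17.
# awk numbers fields from 1, so zero-based index i is awk field $(i+1).
pick_default_columns() {
    awk -F '\t' -v OFS='\t' '{ print $2, $7, $8, $12, $18 }'
}

# Made-up record: each placeholder value names its own zero-based index.
printf 'c0\tc1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tc10\tc11\tc12\tc13\tc14\tc15\tc16\tc17\n' \
    | pick_default_columns
```

Because the record has 18 fields, index 17 is the last one. A table with only the standard 12 columns has no field at that position, which matches the README's note that the output format must be customized to include qseq.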
RGBEPP.d | 4
--- a/RGBEPP.d
+++ b/RGBEPP.d
@@ -18,8 +18,8 @@ void show_help(string pkgver) {
 Author: Guoyi Zhang
 -c\t--config\tconfig file for software path (optional)
 -g\t--genes\t\tgene file path (optional, if -r is specified)
--f\t--functions\tfunctions type (optional): all clean map
-\t \tpostmap varcall consen codon align trim
+-f\t--functions\tfunctions type (optional): all clean assembly
+\t \t map postmap varcall consen codon align trim
 -h\t--help\t\tshow this information
 -l\t--list\t\tlist file path
 -m\t--memory\tmemory settings (optional, default 16 GB)