add: more details on cpp binary

2024-07-05 17:17:31 +10:00
commit 69bf191a83
parent 50089a62d0
1 changed files with 24 additions and 10 deletions
--- a/README.md
+++ b/README.md
@@ -6,9 +6,9 @@ License: GPL-2.0-only

 Author: Guoyi Zhang

-# Requirements
+## Requirements

-## External software 
+### External software 

 - GNU Bash (provides cd)
 - GNU coreutils (provides cp mv mkdir)
@@ -20,14 +20,14 @@ Author: Guoyi Zhang
 - macse (default recognized path: /usr/share/java/macse.jar)
 - GNU parallel

-## Internal software
+### Internal software

 - splitfasta (default recognized path: /usr/bin/splitfasta)
 - sortdiamond (default recognized path: /usr/bin/sortdiamond)

-# Arguments
+## Arguments

-## Details
+### Details

 ```
 -c	--contigs	contig type: scaffolds or contigs
@@ -45,7 +45,7 @@ Author: Guoyi Zhang
 for example: bash RGBEPP.sh -c scaffolds -f all -l list -g genes -r reference.aa.fasta 
 ```

-## Directories Design
+### Directories Design

 ```
 .
@@ -68,7 +68,7 @@ Each directory corresponds to each function.

 `00_raw` should contain all raw fastq.gz data.

-## Text Files
+### Text Files

 `list` is a text file listing all samples. If your raw data files are named ${list_name}\_R1.fastq.gz and ${list_name}\_R2.fastq.gz, then ${list_name} is what you should put in the `list` file. An easy way to generate it on a Linux/Unix system is the following command

@@ -86,9 +86,9 @@ grep '>' Reference.fasta | sed "s@>@@g" > genes

 `reference.aa.fasta` can have any other name, but it must contain the reference genome as amino acid sequences in fasta format

-# Progress
+## Process

-## RGBEPP.sh functions
+### RGBEPP.sh functions

 - Function clean: Quality control + trimming (fastp)
 - Function assembly: de novo assembly (spades)
@@ -99,9 +99,23 @@ grep '>' Reference.fasta | sed "s@>@@g" > genes
 - Function merge: merge different taxa in the same reference exon gene to one fasta (RGBEPP.sh)
 - Function align: multiple sequence alignment based on codons (macse)

-## Downstream process
+### Downstream process

 - concatenate sequences via SeqCombGo or catsequences or sequencematrix
 - coalescent / concatenated phylogeny

+# sortdiamond
+
+Usage: sortdiamond diamond_output.m8 generated.fasta sseq,qstart,qend,bitscore/evalue,qseq(optional, default 1,6,7,11,17, start from 0) bitscore/evalue(optional, default bitscore)
+
+With the defaults (counted from 0), sseq is column 2, qstart is column 7, etc. when counted from 1.
+
+Diamond's default output format (--outfmt 6) does not contain qseq, so you must customize the fields listed after output format 6.
+
+# splitfasta
+
+Usage: splitfasta sample.fasta
+
+It always creates directories in the directory where you run splitfasta, and puts the split fasta files into them.
+
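
Note on the sortdiamond column argument: the defaults `1,6,7,11,17` are counted from 0, while tools such as `cut` and `awk` count fields from 1. The sketch below only illustrates that offset on a synthetic tab-separated record. It is not how sortdiamond itself is implemented, and the helper name `to_one_based` and the placeholder field values `f0` to `f17` are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: turn a 0-based, comma-separated column spec
# into the 1-based field numbers that cut and awk expect.
to_one_based() {
    printf '%s\n' "$1" | awk -F',' '{
        for (i = 1; i <= NF; i++) printf "%s%d", (i > 1 ? "," : ""), $i + 1
        printf "\n"
    }'
}

# One synthetic record with 18 tab-separated placeholder fields: f0 .. f17.
# A real file would come from diamond with a customized --outfmt 6 field list.
record=$(seq 0 17 | sed 's/^/f/' | paste -s -d '\t' -)

# Map the default spec and pull those fields out of the record.
fields=$(to_one_based "1,6,7,11,17")
picked=$(printf '%s\n' "$record" | cut -f "$fields")

echo "$fields"   # prints 2,7,8,12,18
echo "$picked"   # prints f1, f6, f7, f11 and f17 separated by tabs
```

Running `cut -f "$fields"` on a real diamond table is a quick way to check that the chosen columns hold sseq, qstart, qend, bitscore/evalue and qseq before passing the spec to sortdiamond.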