`list` is a text file containing the names of all samples, one per line. If your raw data files are named ${list_name}\_R1.fastq.gz and ${list_name}\_R2.fastq.gz, then ${list_name} is what you should put in the `list` file. An easy way to generate it on a Linux/Unix system is the following command:
```
cd 00_raw
# Strip the read suffix and keep each sample name only once
ls | sed "s@_R[12]\.fastq\.gz@@g" | sort -u > ../list
cd ..
```
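Before running the pipeline, it can help to confirm that every name in `list` has both read files. The following is a minimal sketch, not part of RGBEPP itself: the function name `check_pairs` is hypothetical, and it assumes the `_R1.fastq.gz` / `_R2.fastq.gz` naming described above.

```shell
# Hypothetical helper (not part of RGBEPP): report samples that lack an R1 or R2 file.
# Usage: check_pairs <raw_dir> <list_file>
# Prints one line per missing file and returns 1 if anything is missing, 0 otherwise.
check_pairs() {
    raw_dir=$1
    list_file=$2
    missing=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline
    while IFS= read -r name || [ -n "$name" ]; do
        # Skip blank lines
        if [ -z "$name" ]; then
            continue
        fi
        for mate in R1 R2; do
            if [ ! -f "${raw_dir}/${name}_${mate}.fastq.gz" ]; then
                echo "missing: ${name}_${mate}.fastq.gz"
                missing=1
            fi
        done
    done < "$list_file"
    return "$missing"
}
```

From the project directory this could be called as `check_pairs 00_raw list`. It prints nothing when every sample is complete.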
`genes` is a text file containing all gene names from the reference FASTA file. An easy way to generate it on a Linux/Unix system is the following command:
```
grep '>' Reference.fasta | sed "s@>@@g" > genes
```
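As an optional sanity check, the sketch below lists any gene name that appears more than once in `genes`. It assumes that repeated names in the reference are unintended, which may not hold for every dataset. The function name `dup_genes` is hypothetical and not part of RGBEPP.

```shell
# Hypothetical helper (not part of RGBEPP): print gene names that occur more than once.
# Usage: dup_genes <genes_file>
dup_genes() {
    # uniq -d prints one copy of each line that is repeated in sorted input
    sort "$1" | uniq -d
}
```

Running `dup_genes genes` prints nothing when all names are unique.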
`reference.aa.fasta` can be replaced by any other file name, but the file must contain the reference amino acid sequences in FASTA format.
# Progress
## RGBEPP.sh functions
- Function clean: Quality control + trimming (fastp)
- Function assembly: de novo assembly (spades)
- Function fasta: gather all fasta files from assembly directories (RGBEPP.sh)
- Function map: local alignment search of nucleic acid sequences against amino acid subject sequences (diamond)
- Function pre: generate corresponding sequences based on BLAST-style output (sortdiamond)
- Function split: split FASTA sequences into directories based on the reference genome (splitfasta)
- Function merge: merge sequences from different taxa for the same reference exon/gene into one FASTA file (RGBEPP.sh)
- Function align: codon-based multiple sequence alignment (macse)
## Downstream process
- Concatenate sequences via SeqCombGo, catsequences, or sequencematrix