prot2nuc                                    update 10 Aug 94

NAME
prot2nuc - reverse translates protein into nucleic acid

SYNOPSIS
prot2nuc [-ln -gn] < input > output

DESCRIPTION
prot2nuc reads an amino acid sequence from standard input and
writes the corresponding reverse-translated nucleic acid sequence
to standard output, using the standard IUPAC-IUB ambiguity codes.
The amino acid sequence may contain internal stop '*' characters;
that is, all legal amino acid characters will be processed.

-ln  print n amino acids/codons per line (default = 25).

-gn  number the amino acid sequence every n amino acids/codons
     (default = 5).

If l is not evenly divisible by g, the defaults are used.

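The divisibility rule for -l and -g can be sketched in Python as below. This is only an illustration of the rule as stated, not prot2nuc's own code: the function name resolve_layout is hypothetical, and both the rejection of non-positive values and the resetting of both options together are assumptions.

```python
DEFAULT_LINE = 25   # default for -l (amino acids/codons per line)
DEFAULT_GROUP = 5   # default for -g (numbering interval)


def resolve_layout(line_len=DEFAULT_LINE, group=DEFAULT_GROUP):
    """Return (l, g), falling back to the defaults when l is not a multiple of g."""
    # Assumption: non-positive values are treated like an invalid combination.
    if line_len <= 0 or group <= 0 or line_len % group != 0:
        return DEFAULT_LINE, DEFAULT_GROUP
    return line_len, group


print(resolve_layout(20, 5))   # valid: 20 is a multiple of 5
print(resolve_layout(22, 5))   # invalid: falls back to the defaults
```

Under these assumptions, -l20 -g5 is accepted as given, while -l22 -g5 falls back to 25 and 5.
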
input - If the first line of the file begins with '>' or ';',
input will be read as the standard .wrp (Pearson) format,
such as that produced by getob:

    >name
    sequence lines

Otherwise, it will be assumed that the file contains ONLY
sequence, and all legal amino acid characters will be
read as sequence.

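The input rule can be sketched in Python as follows. This is a rough illustration, not the program's actual reader: the names LEGAL and read_protein are hypothetical, the legal character set is assumed to be the amino acid symbols listed in the output header plus '*', and the handling of case, of ';' comment lines, and of non-sequence characters is an assumption.

```python
# Amino acid symbols from the output header table, plus '*' for stop.
LEGAL = set("ACDEFGHIKLMNPQRSTVWYBZX*")


def read_protein(lines):
    """Return (name, sequence) from an iterable of input lines."""
    lines = [line.rstrip("\n") for line in lines]
    name = ""
    if lines and lines[0][:1] in (">", ";"):
        # Pearson-style input: '>' lines carry the name, ';' lines are skipped.
        body = []
        for line in lines:
            if line.startswith(">"):
                name = name or line[1:].strip()
            elif not line.startswith(";"):
                body.append(line)
    else:
        # Raw input: every line is treated as sequence.
        body = lines
    # Assumption: case is ignored and characters outside LEGAL are dropped.
    sequence = "".join(ch for line in body for ch in line.upper() if ch in LEGAL)
    return name, sequence


print(read_protein([">pI39\n", "MEKKS\n", "LAAL*\n"]))
```
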
output - The output begins with a header listing both the
1- and 3-letter amino acid codes [J. Biol. Chem. 243, 3557-3559
(1968)] and the nucleic acid ambiguity codes [Cornish-Bowden
(1985) Nucl. Acids Res. 13:3021-3030]. The amino acid sequence
and its reverse translation are then printed on lines of
l amino acids/codons, numbered every g amino acids/codons.
Unambiguous nucleotides appear in uppercase, while ambiguous
nucleotides appear in lowercase. A sample output file appears below:

PROT2NUC Version 8/10/94

IUPAC-IUP AMINO ACID SYMBOLS
[J. Biol. Chem. 243, 3557-3559 (1968)]

Phe  F     Leu  L     Ile      I
Met  M     Val  V     Ser      S
Pro  P     Thr  T     Ala      A
Tyr  Y     His  H     Gln      Q
Asn  N     Lys  K     Asp      D
Glu  E     Cys  C     Trp      W
Arg  R     Gly  G     STOP     *
Asx  B     Glx  Z     UNKNOWN  X

IUPAC-IUB SYMBOLS FOR NUCLEOTIDE NOMENCLATURE
[Cornish-Bowden (1985) Nucl. Acids Res. 13: 3021-3030.]

Symbol  Meaning                     | Symbol  Meaning
------------------------------------+---------------------------------
G       Guanine                     | k       G or T
A       Adenine                     | s       G or C
C       Cytosine                    | w       A or T
T       Thymine                     | h       A or C or T
U       Uracil                      | b       G or T or C
r       Purine (A or G)             | v       G or C or A
y       Pyrimidine (C or T)         | d       G or T or A
m       A or C                      | n       G or A or T or C

pI39
            5             10             15             20
M  E  K  K  S  L  A  A  L  S  F  L  L  L  L  V  L  F  V  A
ATGGArAArAArTCnCTnGCnGCnCTnTCnTTyCTnCTnCTnCTnGTnCTnTTyGTnGCn
            AGyTTr      TTrAGy   TTrTTrTTrTTr   TTr

           25             30             35             40
Q  E  I  V  V  T  E  A  N  T  C  E  H  L  A  D  T  Y  R  G
CArGArAThGTnGTnACnGArGCnAAyACnTGyGArCAyCTnGCnGAyACnTAyCGnGGn
                                       TTr            AGr

           45             50             55             60
V  C  F  T  N  A  S  C  D  D  H  C  K  N  K  A  H  L  I  S
GTnTGyTTyACnAAyGCnTCnTGyGAyGAyCAyTGyAArAAyAArGCnCAyCTnAThTCn
                  AGy                              TTr   AGy

           65             70
G  T  C  H  D  W  K  C  F  C  T  Q  N  C
GGnACnTGyCAyGAyTGGAArTGyTTyTGyACnCArAAyTGy

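The two-line codon layout of the sample can be reproduced with a small lookup table, sketched here in Python. This is an illustration rather than prot2nuc's source: the names CODONS and reverse_translate are hypothetical, and the table covers only the 20 standard amino acids under the Universal Genetic Code, so '*', B, Z and X would raise a KeyError.

```python
# amino acid -> (primary codon, secondary codon or "")
# Uppercase letters are unambiguous, lowercase letters are ambiguity codes.
CODONS = {
    "A": ("GCn", ""),    "R": ("CGn", "AGr"), "N": ("AAy", ""),
    "D": ("GAy", ""),    "C": ("TGy", ""),    "Q": ("CAr", ""),
    "E": ("GAr", ""),    "G": ("GGn", ""),    "H": ("CAy", ""),
    "I": ("ATh", ""),    "L": ("CTn", "TTr"), "K": ("AAr", ""),
    "M": ("ATG", ""),    "F": ("TTy", ""),    "P": ("CCn", ""),
    "S": ("TCn", "AGy"), "T": ("ACn", ""),    "W": ("TGG", ""),
    "Y": ("TAy", ""),    "V": ("GTn", ""),
}


def reverse_translate(protein):
    """Return (primary_line, secondary_line) for a protein string."""
    primary = "".join(CODONS[aa][0] for aa in protein)
    # Pad missing secondary codons to three columns so both lines stay aligned.
    secondary = "".join(CODONS[aa][1].ljust(3) for aa in protein).rstrip()
    return primary, secondary


first, second = reverse_translate("MEKKSLAALSFLLLLVLFVA")
print(first)
print(second)
```

Only Leu, Ser and Arg need a second line, because each has six codons split across two codon families.
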
With the Universal Genetic Code, ambiguity symbols make it possible
to represent all possible codons for an amino acid using at most two
output lines. It is important to realize that the ambiguities on the
two lines cannot be combined. For example, CTn and TTr together
represent all codons for Leucine. However, combining them into a
single triplet, yTn, would be incorrect, because yTn also matches
TTT and TTC, which are codons for Phenylalanine, not Leucine.

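The Leucine example can be checked by expanding each ambiguous triplet into the concrete codons it matches. The Python sketch below uses the symbol meanings from the nucleotide table in the header; the names NUCLEOTIDES and expand are hypothetical and are not part of prot2nuc.

```python
from itertools import product

# Symbol meanings taken from the IUPAC-IUB nucleotide table in the header.
NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "r": "AG", "y": "CT", "m": "AC", "k": "GT", "s": "CG", "w": "AT",
    "h": "ACT", "b": "CGT", "v": "ACG", "d": "AGT", "n": "ACGT",
}


def expand(codon):
    """Return the set of unambiguous codons matched by an ambiguous triplet."""
    return {"".join(bases) for bases in product(*(NUCLEOTIDES[c] for c in codon))}


leucine = expand("CTn") | expand("TTr")
extra = expand("yTn") - leucine

print(sorted(leucine))  # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
print(sorted(extra))    # ['TTC', 'TTT']
```

The merged triplet yTn matches eight codons, two more than the six Leucine codons, and the two extras are the Phenylalanine codons.
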
FUTURE PLANS
1. It wouldn't be hard to have the output printed as nucleic acid
   sequences in Pearson format, so that the output could be read back
   into GDE. I don't know why you would want to do this, but it could
   be done.
2. Right now, only the Universal Genetic Code is used, but it should
   be possible to read in alternative genetic codes, have prot2nuc
   figure out the ambiguity rules (as is already done in ribosome), and
   print out the appropriate ambiguous codons.
3. It might be useful to have each possible codon printed out, rather
   than ambiguous codons. This would take up a lot more space and
   wouldn't be as pretty. If there's a lot of demand, I could do this.

AUTHOR
Dr. Brian Fristensky
Dept. of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
Phone: 204-474-6085
FAX: 204-261-5732
frist@cc.umanitoba.ca