107 lines
4.4 KiB
Text
107 lines
4.4 KiB
Text
reform update 3 Feb 94
|
|
|
|
NAME
|
|
reform - reformats multiply-aligned sequences for printing.
|
|
|
|
SYNOPSIS
|
|
reform [-gpcnm] [-fx] [-sn] [-ln] [file {ralign only}]
|
|
or
|
|
ralign file parameters | reform [-gpcn] [-sn] [-ln] file
|
|
|
|
DESCRIPTION
|
|
|
|
g Gaps are to be represented by dashes (-).
|
|
p Bases which agree with the consensus are
|
|
represented by periods (.).
|
|
c Positions at which all sequences agree are
|
|
capitalized in the consensus.
|
|
n Sequence data is nucleic acid. Protein default
|
|
fx Specify input file format, where x is
|
|
r:RALIGN (default) p:PEARSON i:MBCRR-MASE (Intelligenetics)
|
|
m Input file contains multiline format sequences already aligned,
|
|
as opposed to ralign output. This option is obsolete, and is
|
|
equivalent to -fp.
|
|
ln The output linelength is set to n.
|
|
Default is 70.
|
|
sn numbering starts with n (default=0)
|
|
|
|
file Sequence file as described in ralign docu-
|
|
mentation. reform needs to re-read the
|
|
sequence file read by ralign to get the
|
|
names of the sequences, which ralign ignores.
|
|
This filename is only included for ralign output.
|
|
If -m is set, file is ignored, and sequence names
|
|
must be read from the input.
|
|
|
|
Note that positions in the consensus at which no nucleotide is in the
|
|
majority are represented by n's (for nucleic acids) or x's (for proteins),
|
|
rather than periods, as in ralign.
|
|
|
|
Gaps in the input sequences may be represented by either blanks or dashes.
|
|
|
|
INPUT FILE FORMATS
|
|
|
|
(a) ralign (default, -fr)
|
|
As described in ralign documentation, the input file (which is assumed to
|
|
be ralign output) must have each sequence on a single long line. All
|
|
characters on a given line will be included in the alignment. All lines
|
|
must be exactly the same length. For example, if ralign had been read
|
|
sequence from a file called 'allcab.seq' and written output to 'allcab.ral',
|
|
the following command might be used:
|
|
|
|
reform allcab.seq <allcab.ralign >allcab.ref
|
|
|
|
(b) Pearson (-fp, -m)
|
|
Compatible with sequence files used by Pearson's fasta programs as shown:
|
|
>name1
|
|
sequence1
|
|
>name2
|
|
sequence2
|
|
...
|
|
>namen
|
|
sequencen
|
|
|
|
Sequences may run over many lines and line length does not have to be
|
|
uniform. However, both dashes ('-') and blanks (' ') will be read in
|
|
as gaps in the alignment. A right arrow (>) at the beginning of a line
|
|
indicates the name line at the beginning of a new sequence.
|
|
|
|
Any line beginning with a semicolon (';') will be considered a comment,
|
|
and will be ignored.
|
|
|
|
(c) MBCRR-MASE (Intelligenetics) (-fi)
|
|
Compatible with .mase files produced by MBCRR's mase and pima programs,
|
|
which use the Intelligenetics format as shown:
|
|
|
|
;one or more comment lines
|
|
name1
|
|
sequence1
|
|
;one or more comment lines
|
|
name2
|
|
sequence2
|
|
...
|
|
;one or more comment lines
|
|
namen
|
|
sequencen
|
|
|
|
Sequences may run over many lines and line length does not have to be
|
|
uniform. However, both dashes ('-') and blanks (' ') will be read in
|
|
as gaps in the alignment. Each sequence MUST begin with at least one
|
|
comment line. When a comment line is encountered, that signals the
|
|
beginning of a new sequence. The first line after the comment is read
|
|
as the name, and the sequence begins on the next line after that.
|
|
|
|
SEE ALSO ralign, mase
|
|
|
|
AUTHOR
|
|
Dr. Brian Fristensky
|
|
Dept. of Plant Science
|
|
University of Manitoba
|
|
Winnipeg, MB Canada R3T 2N2
|
|
Phone: 204-474-6085
|
|
FAX: 204-261-5732
|
|
frist@cc.umanitoba.ca
|
|
|
|
REFERENCE
|
|
Fristensky, B. (1993) Feature expressions: creating and manipulating
|
|
sequence datasets. Nucleic Acids Research 21:5997-6003.
|