reform update 3 Feb 94
NAME
reform - reformats multiply-aligned sequences for printing.
SYNOPSIS
reform [-gpcnm] [-fx] [-sn] [-ln] [file {ralign only}]
or
ralign file parameters | reform [-gpcn] [-sn] [-ln] file
DESCRIPTION
g Gaps are to be represented by dashes (-).
p Bases which agree with the consensus are
represented by periods (.).
c Positions at which all sequences agree are
capitalized in the consensus.
n Sequence data is nucleic acid. The default is protein.
fx Specify input file format, where x is
r:RALIGN (default) p:PEARSON i:MBCRR-MASE (Intelligenetics)
m Input file contains multiline format sequences already aligned,
as opposed to ralign output. This option is obsolete, and is
equivalent to -fp.
ln The output linelength is set to n.
Default is 70.
sn Numbering starts with n. Default is 0.
file Sequence file as described in ralign documentation.
reform needs to re-read the sequence file read by
ralign to get the names of the sequences, which
ralign ignores. This filename is given only when the
input is ralign output. If -m is set, file is ignored,
and sequence names are read from the input itself.
Note that positions in the consensus at which no single residue is in the
majority are represented by n's (for nucleic acids) or x's (for proteins),
rather than by periods, as they are in ralign.
Gaps in the input sequences may be represented by either blanks or dashes.
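The consensus rules and the -g, -p and -c options described above can be
sketched in Python as follows. This is only an illustration, not reform's
actual code. The function names are hypothetical, and two points are
assumptions the text does not settle: "majority" is taken to mean strictly
more than half of the sequences, and a column made up only of gaps is given
the n or x symbol.

```python
from collections import Counter

GAP_CHARS = {"-", " "}  # both blanks and dashes are read as gaps


def consensus(seqs, nucleic=False, capitalize=False):
    """Build a consensus string from equal-length aligned sequences."""
    unknown = "n" if nucleic else "x"
    out = []
    for column in zip(*seqs):
        residues = [c.lower() for c in column if c not in GAP_CHARS]
        if not residues:
            # Assumption: an all-gap column has no majority residue.
            out.append(unknown)
            continue
        symbol, count = Counter(residues).most_common(1)[0]
        # Assumption: majority = strictly more than half of all sequences.
        if 2 * count <= len(column):
            out.append(unknown)
        elif capitalize and count == len(column):
            out.append(symbol.upper())  # -c: every sequence agrees here
        else:
            out.append(symbol)
    return "".join(out)


def render(seq, cons, dashes=False, periods=False):
    """Rewrite one aligned sequence relative to the consensus."""
    out = []
    for c, k in zip(seq, cons):
        if c in GAP_CHARS:
            out.append("-" if dashes else " ")  # -g: gaps shown as dashes
        elif periods and c.lower() == k.lower():
            out.append(".")                     # -p: agrees with consensus
        else:
            out.append(c)
    return "".join(out)
```

For the three sequences acgt, acga and ac-c, treated as nucleic acid with
capitalization on, this sketch yields the consensus ACgn, and the third
sequence is rendered as ..-c when both dashes and periods are requested.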
INPUT FILE FORMATS
(a) ralign (default, -fr)
As described in the ralign documentation, the input file (which is assumed to
be ralign output) must have each sequence on a single long line. All
characters on a given line will be included in the alignment. All lines
must be exactly the same length. For example, if ralign had read
sequences from a file called 'allcab.seq' and written output to
'allcab.ralign', the following command might be used:
reform allcab.seq <allcab.ralign >allcab.ref
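The constraints on ralign-style input (one sequence per line, every
character significant, all lines the same length) can be checked with a
short Python sketch like the one below. The function name is hypothetical
and this is not how reform itself is implemented. Lines are deliberately
not stripped, since blanks count as gaps.

```python
def read_single_line_alignment(text):
    """Split ralign-style input into rows, one aligned sequence per line."""
    rows = text.split("\n")
    if rows and rows[-1] == "":
        rows.pop()  # drop the empty string left by a trailing newline
    # Every character counts, so compare raw lengths without stripping.
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError(f"lines differ in length: {sorted(lengths)}")
    return rows
```

A file in which one line is shorter than the others raises ValueError here,
which mirrors the requirement that all lines be exactly the same length.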
(b) Pearson (-fp, -m)
Compatible with sequence files used by Pearson's fasta programs as shown:
>name1
sequence1
>name2
sequence2
...
>namen
sequencen
Sequences may run over many lines, and line length does not have to be
uniform. Note that both dashes ('-') and blanks (' ') will be read in
as gaps in the alignment. A greater-than sign ('>') at the beginning of a
line marks the name line that starts a new sequence.
Any line beginning with a semicolon (';') will be considered a comment,
and will be ignored.
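The Pearson rules above can be sketched as a small Python reader. The
function name is hypothetical, and taking the whole remainder of the '>'
line as the sequence name is an assumption. Blanks inside sequence lines
are kept because they stand for gaps.

```python
def parse_pearson(text):
    """Return a list of (name, sequence) pairs from Pearson-format text."""
    records = []
    for line in text.splitlines():
        if line.startswith(";"):
            continue  # comment line, ignored
        if line.startswith(">"):
            # Assumption: the name is everything after the '>'.
            records.append([line[1:].strip(), ""])
        elif records:
            # Sequence may span many lines; blanks are kept as gaps.
            records[-1][1] += line
    return [(name, seq) for name, seq in records]
```

For example, an entry named s1 whose sequence is split over the two lines
"ac-g" and "tt a" is returned as the pair ("s1", "ac-gtt a").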
(c) MBCRR-MASE (Intelligenetics) (-fi)
Compatible with .mase files produced by MBCRR's mase and pima programs,
which use the Intelligenetics format as shown:
;one or more comment lines
name1
sequence1
;one or more comment lines
name2
sequence2
...
;one or more comment lines
namen
sequencen
Sequences may run over many lines, and line length does not have to be
uniform. Note that both dashes ('-') and blanks (' ') will be read in
as gaps in the alignment. Each sequence MUST begin with at least one
comment line. A comment line signals the beginning of a new sequence.
The first line after the comment lines is read as the name, and the
sequence begins on the line after that.
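The MBCRR-MASE reading rule (comment lines, then a name line, then sequence
lines) can be sketched in Python as below. The function name is
hypothetical. Skipping any lines that appear before the first comment is an
assumption, since the text only says that each sequence must begin with a
comment.

```python
def parse_mase(text):
    """Return a list of (name, sequence) pairs from MASE-format text."""
    records = []
    expect_name = False  # True once a comment has announced a new sequence
    for line in text.splitlines():
        if line.startswith(";"):
            expect_name = True  # one or more comment lines start an entry
            continue
        if expect_name:
            records.append([line.strip(), ""])  # first line after comments
            expect_name = False
        elif records:
            records[-1][1] += line  # blanks are kept: they are gaps
        # Assumption: lines before the first comment are skipped.
    return [(name, seq) for name, seq in records]
```

Several consecutive comment lines are treated as a single boundary, so the
first non-comment line after them is always taken as the name.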
SEE ALSO
ralign, mase
AUTHOR
Dr. Brian Fristensky
Dept. of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
Phone: 204-474-6085
FAX: 204-261-5732
frist@cc.umanitoba.ca
REFERENCE
Fristensky, B. (1993) Feature expressions: creating and manipulating
sequence datasets. Nucleic Acids Research 21:5997-6003.