gde_linux/CORE/xylem/reform.doc

 reform                                               update  3 Feb 94

 NAME
   reform - reformats multiply-aligned sequences for printing.

 SYNOPSIS
   reform [-gpcnm] [-fx] [-sn] [-ln]  [file {ralign only}]
                            or
   ralign file parameters | reform [-gpcn] [-sn] [-ln] file

 DESCRIPTION

       g    Gaps are to be represented by dashes (-).
       p    Bases which agree with the consensus are
            represented by periods (.).
       c    Positions at which all sequences agree are
            capitalized in the consensus.
       n    Sequence data is nucleic acid. The default is protein.
       fx   Specify input file format, where x is
            r:RALIGN (default) p:PEARSON i:MBCRR-MASE (Intelligenetics)
       m    Input file contains multiline-format sequences that are
            already aligned, as opposed to ralign output. This option
            is obsolete, and is equivalent to -fp.
       ln   The output line length is set to n.
            Default is 70.
       sn   Numbering starts with n (default=0).

     file   Sequence file as described in the ralign documentation.
            reform needs to re-read the sequence file read by
            ralign to get the names of the sequences, which
            ralign ignores. This filename is only given when the
            input is ralign output. If -m is set, file is ignored,
            and sequence names are read from the input.

     Note that positions in the consensus at which no residue is in the
     majority are represented by n's (for nucleic acids) or x's (for proteins),
     rather than by periods, as in ralign.
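
     The consensus rules above, together with the -c and -p options, can be
     sketched in Python as follows. This is an illustration only and not
     the algorithm used by reform. It assumes that "majority" means more
     than half of the sequences, that comparison ignores case, and that
     gap characters are left unmasked. The function names are hypothetical.

```python
from collections import Counter


def consensus(seqs, nucleic=False, capitalize=False):
    """Build a consensus string from equal-length aligned sequences.

    Assumption: a residue is "in the majority" when it occurs in more
    than half of the sequences at that position.
    """
    unknown = "n" if nucleic else "x"
    out = []
    for column in zip(*seqs):
        residues = [c.lower() for c in column]
        residue, count = Counter(residues).most_common(1)[0]
        if count == len(residues):
            # All sequences agree; -c capitalizes such positions.
            out.append(residue.upper() if capitalize else residue)
        elif count * 2 > len(residues):
            out.append(residue)
        else:
            out.append(unknown)
    return "".join(out)


def mask_agreeing(seq, cons):
    """Replace residues that agree with the consensus by periods (-p)."""
    return "".join(
        "." if s.lower() == c.lower() and s not in "- " else s
        for s, c in zip(seq, cons)
    )


seqs = ["acgt", "acga", "atgc"]
cons = consensus(seqs, nucleic=True, capitalize=True)
print(cons)
for s in seqs:
    print(mask_agreeing(s, cons))
```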

     Gaps in the input sequences may be represented by either blanks or dashes.

 INPUT FILE FORMATS

     (a) ralign (default, -fr)
     As described in the ralign documentation, the input file (which is
     assumed to be ralign output) must have each sequence on a single long
     line.  All characters on a given line will be included in the alignment.
     All lines must be exactly the same length. For example, if ralign had
     read sequences from a file called 'allcab.seq' and written output to
     'allcab.ral', the following command might be used:

     reform allcab.seq <allcab.ral >allcab.ref
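
     The requirement that every line be the same length can be checked
     before running reform. The Python sketch below is a hypothetical
     helper, not part of the package. It assumes that completely empty
     lines carry no sequence and can be skipped. Blanks inside a line are
     kept, since they count as gaps.

```python
def read_ralign_rows(text):
    """Return the sequence lines of ralign-style output, one per sequence.

    Raises ValueError if the lines are not all exactly the same length.
    Whitespace is preserved because blanks stand for gaps.
    """
    rows = [line for line in text.splitlines() if line]
    if not rows:
        raise ValueError("no sequence lines found")
    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(
                f"line {number} has length {len(row)}, expected {width}"
            )
    return rows


rows = read_ralign_rows("acg tac\nacgttac\n")
print(len(rows), "sequences of length", len(rows[0]))
```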

     (b) Pearson (-fp, -m)
     Compatible with the sequence files used by Pearson's fasta programs,
     as shown:

     >name1
     sequence1
     >name2
     sequence2
     ...
     >namen
     sequencen

     Sequences may run over many lines, and line length does not have to be
     uniform. Note, however, that both dashes ('-') and blanks (' ') will be
     read in as gaps in the alignment. A greater-than sign ('>') at the
     beginning of a line marks the name line that begins a new sequence.

     Any line beginning with a semicolon (';') will be considered a comment,
     and will be ignored.
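
     The reading rules for this format can be sketched in Python as shown
     below. This is a minimal illustration, not the parser used by reform.
     The name parse_pearson is hypothetical, the whole text after '>' is
     taken as the sequence name (an assumption), and blanks are converted
     to dashes so that both kinds of gap look the same afterwards.

```python
def parse_pearson(text):
    """Parse Pearson-style aligned sequences into (names, sequences).

    Lines starting with ';' are comments. Lines starting with '>' begin
    a new sequence. Blanks in sequence lines are treated as gaps.
    """
    names, parts = [], []
    for line in text.splitlines():
        if line.startswith(";"):
            continue  # comment line, ignored
        if line.startswith(">"):
            names.append(line[1:].strip())
            parts.append([])
        elif parts:
            # Sequence data may span many lines of varying length.
            parts[-1].append(line.replace(" ", "-"))
    return names, ["".join(p) for p in parts]


example = ">s1\n;a comment\nACG T\nAC\n>s2\nACGTTAC\n"
names, seqs = parse_pearson(example)
print(names, seqs)
```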

     (c) MBCRR-MASE (Intelligenetics) (-fi)
     Compatible with .mase files produced by MBCRR's mase and pima programs,
     which use the Intelligenetics format as shown:

     ;one or more comment lines
     name1
     sequence1
     ;one or more comment lines
     name2
     sequence2
     ...
     ;one or more comment lines
     namen
     sequencen

     Sequences may run over many lines, and line length does not have to be
     uniform. Note, however, that both dashes ('-') and blanks (' ') will be
     read in as gaps in the alignment. Each sequence MUST begin with at
     least one comment line. A comment line signals the beginning of a new
     sequence. The first line after the comment lines is read as the name,
     and the sequence begins on the line after that.
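
     The rule that a comment line starts a new sequence amounts to a small
     state machine, sketched in Python below. This is an illustration under
     the rules stated above, not the code used by reform or mase. The name
     parse_mase is hypothetical, and lines appearing before the first
     comment line are simply skipped (an assumption).

```python
def parse_mase(text):
    """Parse MASE/Intelligenetics-style text into (names, sequences).

    One or more ';' comment lines open each sequence. The first
    non-comment line after them is the name, and the following lines
    up to the next comment are sequence data.
    """
    names, parts = [], []
    expect_name = False
    for line in text.splitlines():
        if line.startswith(";"):
            expect_name = True  # further comment lines keep this state
        elif expect_name:
            names.append(line.strip())
            parts.append([])
            expect_name = False
        elif parts:
            # Blanks and dashes both stand for gaps.
            parts[-1].append(line.replace(" ", "-"))
    return names, ["".join(p) for p in parts]


example = ";c1\n;c2\nname1\nACG-T\nAC\n;c3\nname2\nACG TAC\n"
names, seqs = parse_mase(example)
print(names, seqs)
```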

 SEE ALSO
     ralign, mase

 AUTHOR
       Dr. Brian Fristensky
       Dept. of Plant Science
       University of Manitoba
       Winnipeg, MB  Canada  R3T 2N2
       Phone: 204-474-6085
       FAX: 204-261-5732
       frist@cc.umanitoba.ca

 REFERENCE
       Fristensky, B. (1993) Feature expressions: creating and manipulating
       sequence datasets. Nucleic Acids Research 21:5997-6003.