
     shuffle.doc                                           update 3 Feb 94

     SYNOPSIS
           shuffle -sn [-wn -on]

     DESCRIPTION
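          Randomly shuffles the residues of each input sequence, either
          within local windows or over the whole sequence, for use in
          assessing the statistical significance of sequence similarities.
          See Lipman DJ, Wilbur WJ, Smith TF and Waterman MS (1984) On the
          statistical significance of nucleic acid similarities. Nucl.
          Acids Res. 12:215-226.
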
          -sn    n is an integer between 0 and 32767, chosen arbitrarily,
                 that seeds the random number generator. A seed must be
                 provided for each run.

          -wn    n is an integer giving the width, in nucleotides, of the
                 window within which nucleotides are shuffled. If w is
                 negative, exceeds the length of a sequence, or is not
                 specified, the entire sequence is shuffled as a single
                 window.

          -on    n is an integer giving the number of nucleotides by which
                 adjacent windows overlap. It should never exceed the
                 window size. If not specified, o defaults to 0.
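          The option conventions above (the integer attached directly to
          the flag letter, o defaulting to 0, and missing or out-of-range
          window widths meaning the whole sequence) can be sketched as
          follows. This is a hypothetical Python illustration, not the
          program's own source: parse_options and effective_window are
          invented names, and treating w = 0 like a negative width is an
          assumption the text above does not cover.

```python
import re
from typing import List, Optional, Tuple


def parse_options(argv: List[str]) -> Tuple[int, Optional[int], int]:
    """Parse hypothetical shuffle-style arguments such as ["-s9873", "-w12", "-o3"].

    Returns (seed, window, overlap); window is None when -w is absent.
    """
    seed: Optional[int] = None
    window: Optional[int] = None  # None: shuffle each sequence as one window
    overlap = 0                   # o defaults to 0 when not specified
    for arg in argv:
        # The integer is attached to the flag letter, as in the SYNOPSIS.
        match = re.fullmatch(r"-([swo])(-?\d+)", arg)
        if match is None:
            raise ValueError(f"unrecognized option: {arg}")
        flag, value = match.group(1), int(match.group(2))
        if flag == "s":
            if not 0 <= value <= 32767:
                raise ValueError("seed must be between 0 and 32767")
            seed = value
        elif flag == "w":
            window = value
        else:
            overlap = value
    if seed is None:
        raise ValueError("a seed (-sn) must be provided for each run")
    if window is not None and 0 < window < overlap:
        raise ValueError("overlap should never exceed the window size")
    return seed, window, overlap


def effective_window(window: Optional[int], seq_len: int) -> int:
    """Window width actually applied to one sequence of length seq_len."""
    # Missing, negative, or oversized widths mean a single whole-sequence window.
    # Assumption: a width of 0 is treated the same way.
    if window is None or window <= 0 or window > seq_len:
        return seq_len
    return window
```

          For example, parse_options(["-s9873", "-w12", "-o3"]) returns
          (9873, 12, 3), the same parameters echoed in the header of the
          sample output in the EXAMPLE section.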

          If w is specified, each sequence is shuffled in windows of w
          nucleotides, with adjacent windows overlapping by o nucleotides,
          thus preserving the local base composition. If w is not
          specified, each sequence is shuffled globally, thus preserving
          the overall base composition but not local variations in
          composition.
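          A minimal sketch of windowed shuffling in Python follows. It
          assumes that windows start every w - o positions, that each
          window is shuffled in turn (so residues in an overlap may be
          moved twice), and that the last window is simply cut off at the
          end of the sequence; the text above does not specify these
          details, and local_shuffle is a hypothetical name. Because an
          overlap equal to the window width would never advance, the
          sketch requires o < w. Python's random module is not the
          program's generator, so a given seed will not reproduce the
          program's output.

```python
import random
from typing import Optional


def local_shuffle(seq: str, window: Optional[int], overlap: int,
                  rng: random.Random) -> str:
    """Shuffle seq within overlapping windows (hypothetical sketch)."""
    n = len(seq)
    # Missing, non-positive, or oversized widths: one window over the whole sequence.
    if window is None or window <= 0 or window > n:
        window = n
    residues = list(seq)
    if window == n:
        rng.shuffle(residues)  # global shuffle keeps only the overall composition
        return "".join(residues)
    step = window - overlap
    if overlap < 0 or step <= 0:
        raise ValueError("overlap must be non-negative and smaller than the window")
    start = 0
    while True:
        # Shuffle residues[start:start + window] in place; the last window may be short.
        chunk = residues[start:start + window]
        rng.shuffle(chunk)
        residues[start:start + window] = chunk
        if start + window >= n:
            break
        start += step
    return "".join(residues)
```

          Calling local_shuffle(seq, None, 0, random.Random(9873))
          performs the global shuffle described above, while a finite
          window keeps each stretch of w nucleotides drawn from the same
          neighbourhood as in the original sequence.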

          Any number of sequences may be processed from a single input
          file.  In Pearson-format files, each new sequence begins with a
          '>' comment line, indicating the name and a short description of
          the sequence.
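          Reading several sequences from one Pearson-format file amounts
          to starting a new record at every '>' line. The Python sketch
          below is a hypothetical reader (read_pearson is an invented
          name); it assumes that text before the first '>' line is
          ignored and that sequence lines are concatenated after
          stripping surrounding whitespace.

```python
import io
from typing import Iterable, Iterator, List, Optional, Tuple


def read_pearson(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (comment, sequence) pairs from Pearson-format text."""
    comment: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if comment is not None:
                yield comment, "".join(chunks)
            comment = line[1:]  # name and short description
            chunks = []
        elif comment is not None:
            chunks.append(line.strip())
    if comment is not None:
        yield comment, "".join(chunks)


example = io.StringIO(">SEQ1 first test\nACGT\nac\n>SEQ2\nTTGA\n")
records = list(read_pearson(example))
```

          Applied to the sample output in the EXAMPLE section, this
          reader would return the two leading comment lines as records
          with empty sequences, which a caller could skip.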

          No distinction is made between protein and nucleic acid sequences.
          That is, shuffle will read any of the following characters as
          sequence:

          T,U,C,A,G,N,R,Y,M,W,S,K,D,H,V,B,L,Z,F,P,E,I,Q,X,*,-

          where '*' is the result of translating a stop codon, and '-'
          is a gap generated during sequence alignment. Lowercase is
          also accepted.
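          The accepted character set can be expressed as a simple
          case-insensitive filter. In the hypothetical Python sketch
          below, characters outside the list (such as digits, spaces, or
          the letters J and O) are silently dropped; that is an
          assumption, since the text above does not say whether shuffle
          ignores or rejects them.

```python
# Characters read as sequence, per the list above; lowercase is also accepted.
SEQUENCE_CHARS = frozenset("TUCAGNRYMWSKDHVBLZFPEIQX*-")


def sequence_residues(text: str) -> str:
    """Keep only accepted sequence characters, preserving their original case."""
    return "".join(ch for ch in text if ch.upper() in SEQUENCE_CHARS)


print(sequence_residues("acg t\n12*-"))  # prints acgt*-
```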

     EXAMPLE
          A sample output file is shown below. Note that the first two
          lines of output are comment lines, listing the version of the
          program and the parameters used in the run.

          >SHUFFLE                   VERSION 11/ 8/93
          >RANDOM SEED:     9873          WINDOW:   12          OVERLAP:   3
          >BAZFAZ - Borborigmus azerbi F-actin-zeta gene
          ctgagtagctagtcctaaatagttagtccatagtactagtacgggtcgtt
          cacccttgggcagtg.....(etc.)

     AUTHOR
       Dr. Brian Fristensky
       Dept. of Plant Science
       University of Manitoba
       Winnipeg, MB  Canada  R3T 2N2
       Phone: 204-474-6085
       FAX: 204-261-5732
       frist@cc.umanitoba.ca

     REFERENCE
       Fristensky, B. (1993) Feature expressions: creating and manipulating
       sequence datasets. Nucleic Acids Research 21:5997-6003.