shuffle.doc update 3 Feb 94
SYNOPSIS
shuffle -sn [-wn -on]
DESCRIPTION
Shuffles sequences locally. See Lipman DJ, Wilbur WJ, Smith TF
and Waterman MS (1984) On the statistical significance of nucleic
acid similarities. Nucl. Acids Res. 12:215-226.
-sn n is the random number seed, an integer between 0 and 32767.
A seed must be provided for each run.
-wn n is an integer, indicating the width of the window for
random localization. If w exceeds the length of a sequence,
or is negative, the entire sequence is scrambled as a single
window. This is also the case if w is not specified.
-on n is an integer, indicating the number of nucleotides by
which adjacent windows overlap. It should never exceed
the window size. o defaults to 0 if not specified.
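The rules for w and o stated above can be summarized in a short Python
sketch. This is an illustration only, not the program's own code. The
names effective_window and check_seed are hypothetical. Raising an error
for an out-of-range seed or overlap is an assumption, as is treating
w = 0 like an unspecified window; the text above does not say how the
program reacts in those cases.

```python
def check_seed(n):
    """Accept a seed only if it lies in the documented range 0..32767."""
    if not 0 <= n <= 32767:
        # Assumption: the documentation gives the range but not the
        # program's reaction to a value outside it.
        raise ValueError("seed must be between 0 and 32767")
    return n


def effective_window(seq_len, w=None, o=0):
    """Return the (window, overlap) pair that applies to one sequence."""
    # w not specified, negative, or longer than the sequence: the whole
    # sequence is scrambled as a single window, so overlap is irrelevant.
    # Assumption: w = 0 is handled the same way.
    if w is None or w <= 0 or w > seq_len:
        return seq_len, 0
    # The overlap "should never exceed the window size".
    if not 0 <= o <= w:
        raise ValueError("overlap must be between 0 and the window size")
    return w, o
```

For a 50-nucleotide sequence, effective_window(50, 12, 3) yields (12, 3),
while effective_window(50, 60, 3) falls back to (50, 0).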
If w and o are specified, overlapping windows of w nucleotides
are shuffled, thus preserving the local characteristic base
composition. Windows overlap by o nucleotides.
If w and o are not specified, each sequence is shuffled globally,
thus preserving the overall base composition, but not the local
variations in composition.
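The difference between windowed and global shuffling can be shown with
a minimal Python sketch. It is not the program's algorithm, which this
document does not spell out. It assumes that windows start every w - o
positions and are shuffled in place from left to right, and it requires
o < w so that each window advances. The name shuffle_windows is
hypothetical, and Python's random module stands in for the program's own
generator, so output will not match shuffle for the same seed.

```python
import random


def shuffle_windows(seq, seed, w=None, o=0):
    """Shuffle seq within overlapping windows of width w, overlap o."""
    rng = random.Random(seed)  # stand-in generator, seeded for repeatability
    chars = list(seq)
    n = len(chars)

    # No usable window: scramble the entire sequence as a single window.
    if w is None or w <= 0 or w > n:
        rng.shuffle(chars)
        return "".join(chars)

    if not 0 <= o < w:
        # Assumption of this sketch: overlap must be smaller than w.
        raise ValueError("overlap must be at least 0 and less than w")

    step = w - o  # distance between the starts of adjacent windows
    start = 0
    while start < n:
        end = min(start + w, n)
        window = chars[start:end]
        rng.shuffle(window)          # permute only within this window
        chars[start:end] = window
        if end == n:                 # last window reached the sequence end
            break
        start += step
    return "".join(chars)
```

Either way the overall composition is unchanged, since characters are
only permuted. With a window, each character moves only a limited
distance, so a sequence such as aaaacccc shuffled with w = 4 and o = 0
comes back as aaaacccc, whereas a global shuffle may mix the two halves.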
Any number of sequences may be processed from a single input
file. In Pearson-format files, each new sequence begins with a
'>' comment line, indicating the name and a short description of
the sequence.
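As an illustration of the file layout just described, the following
Python sketch splits Pearson-format input into (header, sequence) pairs.
The name read_pearson is hypothetical, and joining the sequence lines
into one string is a choice made for the sketch, not a statement about
how shuffle stores sequences internally.

```python
def read_pearson(lines):
    """Yield (header, sequence) for each '>' record in an iterable of lines."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            # A new '>' line closes the previous record, if any.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            # Sequence lines belong to the most recent '>' line.
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)
```

Calling list(read_pearson(open(path))) on a file, where path is a
placeholder for the input file name, gives one pair per sequence in the
order they appear.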
No distinction is made between protein and nucleic acid sequences.
That is, shuffle will read any of the following characters as
sequence:
T,U,C,A,G,N,R,Y,M,W,S,K,D,H,V,B,L,Z,F,P,E,I,Q,X,*,-
where '*' is the result of translating a stop codon, and '-'
is a gap generated during sequence alignment. Lowercase is
also accepted.
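The accepted character set can be written down directly, for example to
check an input file before running shuffle. The Python sketch below is
a convenience only. The names ACCEPTED, is_sequence_char and
unrecognized are hypothetical, and this document does not say what
shuffle does with characters outside the list, so the sketch merely
reports them.

```python
# Characters listed above as sequence, in uppercase.
ACCEPTED = set("TUCAGNRYMWSKDHVBLZFPEIQX*-")


def is_sequence_char(ch):
    """True if ch is in the documented set; lowercase is also accepted."""
    return ch.upper() in ACCEPTED


def unrecognized(seq):
    """Return the characters of seq that are not in the documented set."""
    return [ch for ch in seq if not is_sequence_char(ch)]
```

For instance, unrecognized("acgt-N*") is empty, while a digit or the
letter J would be reported.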
EXAMPLE
A sample output file is shown below. Note that the first two
lines of output are comment lines, listing the version of the
program and the parameters used in the run.
>SHUFFLE VERSION 11/ 8/93
>RANDOM SEED: 9873 WINDOW: 12 OVERLAP: 3
>BAZFAZ - Borborigmus azerbi F-actin-zeta gene
ctgagtagctagtcctaaatagttagtccatagtactagtacgggtcgtt
cacccttgggcagtg.....(etc.)
AUTHOR
Dr. Brian Fristensky
Dept. of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
Phone: 204-474-6085
FAX: 204-261-5732
frist@cc.umanitoba.ca
REFERENCE
Fristensky, B. (1993) Feature expressions: creating and manipulating
sequence datasets. Nucleic Acids Research 21:5997-6003.