shuffle.doc update 3 Feb 94

SYNOPSIS
shuffle -sn [-wn -on]

DESCRIPTION
Shuffles sequences locally. See Lipman DJ, Wilbur WJ, Smith TF
and Waterman MS (1984) On the statistical significance of nucleic
acid similarities. Nucl. Acids Res. 12:215-226.

-sn  n is an integer between 0 and 32767, used as the random
     number seed. A seed must be provided for each run.

-wn  n is an integer giving the width of the window within which
     the sequence is shuffled. If w is negative or exceeds the
     length of a sequence, the entire sequence is scrambled as a
     single window. This is also the case if w is not specified.

-on  n is an integer giving the number of nucleotides by which
     adjacent windows overlap. It should never exceed the window
     size. o defaults to 0 if not specified.

If w and o are specified, overlapping windows of w nucleotides
are shuffled, thus preserving the local characteristic base
composition. Windows overlap by o nucleotides.
If w and o are not specified, each sequence is shuffled globally,
thus preserving the overall base composition, but not the local
variations in composition.

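The window and overlap rules can be illustrated with a short Python
sketch. It is not the program's own code. It assumes that successive
windows start w - o positions apart and are shuffled in place from left
to right, and that o must be strictly smaller than w; neither detail is
stated in this document. The function name shuffle_windows is
hypothetical, and Python's random generator will not reproduce the
program's output for a given seed.

```python
import random


def shuffle_windows(seq, seed, w=None, o=0):
    """Shuffle seq within windows of width w that overlap by o positions.

    Assumptions (not taken from the shuffle documentation): windows
    start w - o positions apart and are shuffled in place, left to right.
    """
    rng = random.Random(seed)  # Python's generator, not the program's
    chars = list(seq)
    n = len(chars)

    # No window, a negative window, or a window longer than the
    # sequence: scramble the whole sequence as a single window.
    if w is None or w < 0 or w > n:
        rng.shuffle(chars)
        return "".join(chars)

    # Assumed restriction so that every window advances.
    if w == 0 or not 0 <= o < w:
        raise ValueError("need w >= 1 and 0 <= o < w")

    step = w - o
    start = 0
    while start < n:
        end = min(start + w, n)
        window = chars[start:end]
        rng.shuffle(window)
        chars[start:end] = window
        if end == n:
            break
        start += step
    return "".join(chars)
```

A call such as shuffle_windows("ctgagtagctagtcc", seed=9873, w=12, o=3)
returns a string with the same overall composition as its input. With
o = 0, each block of w characters also keeps its own composition.
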
Any number of sequences may be processed from a single input
file. In Pearson-format files, each new sequence begins with a
'>' comment line, indicating the name and a short description of
the sequence.

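As an illustration of the Pearson-format layout just described, the
following Python sketch splits a file into records. It is an assumed
reading of the format, not the program's own parser: the function name
read_pearson is hypothetical, and ignoring any text before the first
'>' line is a choice made for the sketch.

```python
def read_pearson(lines):
    """Yield (header, sequence) pairs from Pearson-format text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A '>' line closes the previous record and opens a new one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are concatenated until the next '>' line.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)
```

Because every '>' line opens a record, two consecutive '>' lines
produce a record whose sequence is empty.
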
No distinction is made between protein and nucleic acid sequences.
That is, shuffle will read any of the following characters as
sequence:

T,U,C,A,G,N,R,Y,M,W,S,K,D,H,V,B,L,Z,F,P,E,I,Q,X,*,-

where '*' is the result of translating a stop codon, and '-'
is a gap generated during sequence alignment. Lowercase is
also accepted.

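The character list above can be expressed as a simple filter. This
Python sketch assumes that any character outside the list (digits,
blanks and so on) is skipped; the document does not say how shuffle
treats such characters. The names ACCEPTED and sequence_chars are
hypothetical.

```python
# The 26 characters listed in the documentation, in the same order.
ACCEPTED = frozenset("TUCAGNRYMWSKDHVBLZFPEIQX*-")


def sequence_chars(line):
    """Keep only characters that are read as sequence, in either case."""
    return "".join(c for c in line if c.upper() in ACCEPTED)
```

For example, sequence_chars("acgt 10") keeps only the four letters,
since the blank and the digits are not in the list.
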
EXAMPLE
A sample output file is shown below. Note that the first two
lines of output are comment lines, listing the version of the
program and the parameters used in the run.

>SHUFFLE VERSION 11/ 8/93
>RANDOM SEED: 9873 WINDOW: 12 OVERLAP: 3
>BAZFAZ - Borborigmus azerbi F-actin-zeta gene
ctgagtagctagtcctaaatagttagtccatagtactagtacgggtcgtt
cacccttgggcagtg.....(etc.)

AUTHOR
Dr. Brian Fristensky
Dept. of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
Phone: 204-474-6085
FAX: 204-261-5732
frist@cc.umanitoba.ca

REFERENCE
Fristensky, B. (1993) Feature expressions: creating and manipulating
sequence datasets. Nucleic Acids Research 21:5997-6003.