|
.NPA
|
||
|
.SP 1
|
||
|
.left margin1
|
||
|
@-1. TX 0 @General
|
||
|
.sp
|
||
|
@-2. T 0 @Screen control
|
||
|
.sp
|
||
|
@-2. X 0 @Screen
|
||
|
.sp
|
||
|
@-3. TX 0 @Dictionary analysis
|
||
|
.sp
|
||
|
@0. TX -1 @MEP
|
||
|
.left margin2
|
||
|
.para
|
||
|
This is a program for analysing families of nucleotide sequences in order
|
||
|
to find common motifs and potential binding sites.
|
||
|
The ideas in this program were described in Staden, R. "Methods
|
||
|
for discovering novel motifs in nucleic acid sequences".
|
||
|
Computer Applications in the Biosciences, 5, 293-298, (1989).
|
||
|
.PARA
|
||
|
The program can read
|
||
|
sequences stored in either of two formats: 1) all sequences aligned in a
|
||
|
single file; 2) all sequences in separate files and accessed through a file
|
||
|
of file names.
|
||
|
.PARA
|
||
|
The program contains functions that can answer several questions
|
||
|
about a set of sequences:
|
||
|
.SK1
|
||
|
.left margin2
|
||
|
Which words are most common?
|
||
|
.left margin2
|
||
|
Which words occur in the most sequences?
|
||
|
.left margin2
|
||
|
Which words contain the most information?
|
||
|
.left margin2
|
||
|
Which words occur in equivalent positions in the sequences?
|
||
|
.left margin2
|
||
|
Which words are inverted repeats?
|
||
|
.left margin2
|
||
|
Which words occur on both strands of the sequences?
|
||
|
.left margin2
|
||
|
Where are the inverted repeats?
|
||
|
.left margin2
|
||
|
Where are the fuzzy words?
|
||
|
.para
|
||
|
Most of the program is
|
||
|
concerned with analysing
|
||
|
what it terms "fuzzy
|
||
|
words" within the set of sequences. The analysis is explained
|
||
|
below. Note that the standard version of the program is limited
|
||
|
to words of maximum length 8 letters, and a maximum fuzziness
|
||
|
of 2.
|
||
|
.para
|
||
|
The following analyses (preceded by their option numbers) are included:
|
||
|
.lit
|
||
|
? = Help
|
||
|
! = Quit
|
||
|
3 = Read new sequences
|
||
|
4 = Redefine active region
|
||
|
5 = List the sequences
|
||
|
6 = List text file
|
||
|
7 = Direct output to disk
|
||
|
10 = Clear graphics
|
||
|
11 = Clear text
|
||
|
12 = Draw ruler
|
||
|
13 = Use cross hair
|
||
|
14 = Reset margins
|
||
|
15 = Label diagram
|
||
|
16 = Draw map
|
||
|
17 = Search for strings
|
||
|
18 = Set strand
|
||
|
19 = Set composition
|
||
|
20 = Set word length
|
||
|
21 = Set number of mismatches
|
||
|
22 = Show settings
|
||
|
23 = Make dictionary Dw
|
||
|
24 = Make dictionary Ds
|
||
|
25 = Make fuzzy dictionary Dm from Dw
|
||
|
26 = Make fuzzy dictionary Dm from Ds
|
||
|
27 = Make fuzzy dictionary Dh from Dm
|
||
|
28 = Examine fuzzy dictionary Dm
|
||
|
29 = Examine fuzzy dictionary Dh
|
||
|
30 = Examine words in Dm
|
||
|
31 = Examine words in Dh
|
||
|
32 = Save or restore a dictionary
|
||
|
33 = Find inverted repeats
|
||
|
.end lit
|
||
|
.para
|
||
|
Some of these methods produce graphical
|
||
|
results
|
||
|
and so the
|
||
|
program is generally used from a graphics terminal (a vdu on which lines
|
||
|
and points can be drawn as well as characters).
|
||
|
.para
|
||
|
.LEFT MARGIN2
|
||
|
The position of each plot is defined relative to a user's drawing
board which has size 1-10,000 in x and 1-10,000 in y.
Plots for each option are drawn in a window defined by x0,y0 and
xlength,ylength, where x0,y0 is the position of the bottom left hand
corner of the window, xlength is the width of the window and ylength
is the height of the window.
|
||
|
.lit
|
||
|
--------------------------------------------------------- 10,000
1                                                       1
1        --------------------------------------    ^    1
1        1                                    1    1    1
1        1                                    1    1    1
1        1                                    1 ylength 1
1        1                                    1    1    1
1        1                                    1    1    1
1        --------------------------------------    v    1
1   x0,y0^                                              1
1        <---------------xlength-------------->         1
--------------------------------------------------------- 1
1                                                  10,000
|
||
|
|
||
|
.end lit
|
||
|
All values are in drawing board units (i.e. 1-10,000, 1-10,000).
|
||
|
The default window positions are read from a file "MEPMARG" when the
|
||
|
program is started. Users can have their own file if required.
|
||
|
.para
|
||
|
The options for the program are accessed from 3 main menus: general, screen
|
||
|
control and dictionary analysis.
|
||
|
Both menus and options are selected by number.
|
||
|
.para
|
||
|
The most important and novel part of the program is its use of "fuzzy
|
||
|
dictionaries" and an information theory measure, to help show the most
|
||
|
interesting motifs.
|
||
|
|
||
|
Central to the method is the idea of a fuzzy dictionary of word
|
||
|
frequencies. A dictionary of word frequencies is an ordered list of
|
||
|
all the words in the sequences and a count of the number of times
|
||
|
that they occur. A fuzzy dictionary is an equivalent list but which
|
||
|
contains instead, for each word, a count of the number of times
|
||
|
similar words occur in the sequences. We term words that are
|
||
|
similar "relations". The fuzziness is defined by the number of
|
||
|
letters in a word that are allowed to be different. So if we had a
|
||
|
fuzziness of 1 we allow 1 letter to be different. For example, with
|
||
|
a fuzziness of 1, the entry in the fuzzy dictionary for the word
|
||
|
TTTTTT would contain a count of the number of times TTTTTT
occurred plus the number of times all words differing by exactly
one letter from TTTTTT occurred.
|
||
|
.para
|
||
|
Once the fuzzy dictionary has been created we can examine it in
|
||
|
several ways to find candidate control sequences. The simplest
|
||
|
question we can ask is which word in the dictionary is the most
|
||
|
common. Sometimes this simple criterion of "most common" may
|
||
|
be adequate to discover a new motif but in general we would not
|
||
|
expect it to be sufficient. For example some words will be common
|
||
|
simply because of a base composition bias in the sequences being
|
||
|
analysed. In addition a word can be the most frequent and yet not
|
||
|
be "well defined". This last point is best explained by an example.
|
||
|
.para
|
||
|
Suppose we were looking at two letter words and allowing one
|
||
|
mismatch, and that there were 10 occurrences of TT and 5 of AC.
|
||
|
We could align the 10 words that were one letter different from TT
|
||
|
and the 5 that were related to AC. Then we could count the
|
||
|
number of times each base occurred in each position for each of
|
||
|
these two sets of words. Suppose we got the two base frequency
|
||
|
tables shown below.
|
||
|
.lit
|
||
|
       TT              AC
   T   6  4        T   1  0
   C   1  3        C   0  4
   A   1  2        A   4  1
   G   2  1        G   0  0
|
||
|
|
||
|
.end lit
|
||
|
These tables show that although TT occurs (with one letter
|
||
|
mismatch) more often than AC, the ratio of base frequencies for
|
||
|
AC at 4/5, 4/5 is higher than those for TT at 6/10, 4/10. Hence we
|
||
|
would say that AC was better defined than TT.
|
||
|
Expressing this another way we would say that the definition of AC
|
||
|
contained more information than that for TT. The program
|
||
|
calculates the information content in a way that takes into account
|
||
|
both the sequence composition and the level of definition of the
|
||
|
motif.
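.para
The comparison above can be reproduced with a short sketch. It takes the
two base frequency tables from the example and, for each position,
computes the fraction of aligned words that carry the word's own base.
This is only the simple ratio quoted in the text, not the program's
information measure, and the function name is hypothetical.
.lit
```python
# Base frequency tables from the example above: rows are bases,
# columns are word positions (counts over the aligned relations).
TT_TABLE = {"T": [6, 4], "C": [1, 3], "A": [1, 2], "G": [2, 1]}
AC_TABLE = {"T": [1, 0], "C": [0, 4], "A": [4, 1], "G": [0, 0]}


def definition_ratios(word, table):
    """Fraction of aligned words carrying the word's own base at each position."""
    ratios = []
    for pos, base in enumerate(word):
        total = sum(counts[pos] for counts in table.values())
        ratios.append(table[base][pos] / total)
    return ratios


print(definition_ratios("TT", TT_TABLE))  # [0.6, 0.4]
print(definition_ratios("AC", AC_TABLE))  # [0.8, 0.8]
```
.end lit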
|
||
|
.para
|
||
|
Definitions
|
||
|
|
||
|
.para
|
||
|
Here we deal only with the dictionary analysis.
|
||
|
Suppose we are dealing with a set of
|
||
|
sequences and are examining them for words that are six
|
||
|
characters in length.
|
||
|
|
||
|
.para
|
||
|
Dictionary Dw contains a count of the number of times each word
|
||
|
occurs in the set of sequences. For example the entry for TTTTTT
|
||
|
contains a value equal to the number of times the word TTTTTT
|
||
|
occurs in the set of sequences.
|
||
|
|
||
|
.para
|
||
|
Dictionary Ds contains a count of the number of different sequences in
|
||
|
which each word occurs. For example if the entry for word TTTTTT
|
||
|
contains the value 10, it denotes that the word TTTTTT occurs in ten
|
||
|
different sequences. Unlike Dw it only counts words once for each
|
||
|
sequence. For example if we had a set of 100 sequences, the maximum
|
||
|
possible value that Ds could take is 100, and this would only happen if
|
||
|
a word occurred in every sequence. However for the same set of
|
||
|
sequences, Dw could contain values greater than 100, and this would
|
||
|
show that a word had occurred more than once in at least one
|
||
|
sequence.
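.para
As an illustration of the difference between Dw and Ds, the following
sketch builds both from a list of sequences. It assumes that words are
taken at every position (overlapping); the function names are
hypothetical and this is not the program's own code.
.lit
```python
from collections import Counter


def words_of(seq, length):
    """Yield every overlapping word of the given length in seq."""
    for i in range(len(seq) - length + 1):
        yield seq[i:i + length]


def make_dw(seqs, length):
    """Dw: total number of times each word occurs in the set of sequences."""
    dw = Counter()
    for seq in seqs:
        dw.update(words_of(seq, length))
    return dw


def make_ds(seqs, length):
    """Ds: number of different sequences in which each word occurs."""
    ds = Counter()
    for seq in seqs:
        ds.update(set(words_of(seq, length)))  # count each word once per sequence
    return ds


seqs = ["TTTT", "ATTA"]
print(make_dw(seqs, 2)["TT"])  # 4
print(make_ds(seqs, 2)["TT"])  # 2
```
.end lit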
|
||
|
|
||
|
.para
|
||
|
From either of the two dictionaries Dw or Ds we can calculate a fuzzy
|
||
|
dictionary Dm. For each word, the entry in the fuzzy dictionary Dm
|
||
|
contains the sum of the dictionary values (taken from either Dw or Ds)
|
||
|
for all words that differ from it by up to m letters. For example if m=2
|
||
|
the entry for TTTTTT contains the number of times that TTTTTT
|
||
|
occurs in the dictionary, plus the counts for all words that differ from
|
||
|
TTTTTT by 1 or 2 letters.
|
||
|
Obviously the interpretation of the values in Dm depends on which of
|
||
|
the two dictionaries Dw or Ds they were derived from. When derived
|
||
|
from Dw the entry for any word in Dm gives the total number of
|
||
|
times it, and its relations, occur in the set of sequences. When derived
|
||
|
from Ds the entry for any word in Dm gives the total number of
|
||
|
different sequences that contain a word and each of its relations.
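.para
A minimal sketch of the Dm calculation follows. It assumes an entry is
wanted for every possible word over the four bases, including words
that do not themselves occur, and it stores only non-zero entries. The
names are hypothetical and the brute-force loop is chosen for clarity
rather than speed.
.lit
```python
from itertools import product


def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def make_dm(base_dict, length, m, alphabet="TCAG"):
    """Fuzzy dictionary: for each word, sum the counts of all words within m mismatches."""
    dm = {}
    for letters in product(alphabet, repeat=length):
        word = "".join(letters)
        total = sum(count for other, count in base_dict.items()
                    if mismatches(word, other) <= m)
        if total:
            dm[word] = total
    return dm


dw = {"ATT": 1, "TAT": 1, "TTA": 1}
dm = make_dm(dw, 3, 1)
print(dm["TTT"])  # 3
print(dm["ATT"])  # 1
```
.end lit
In this small case TTT never occurs itself, yet it collects the counts
of its three one-letter relations.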
|
||
|
|
||
|
.para
|
||
|
Finally, from fuzzy dictionary Dm we can derive fuzzy dictionary Dh.
|
||
|
All entries in Dh are zero except for the word(s), within each set of
|
||
|
relations, that are most frequent. For example if TTTTTT occurred 20
|
||
|
times but had a relation that occurred more often, then the entry for
|
||
|
TTTTTT would be zero. However if TTTTTT did not have a more
|
||
|
frequently occurring relation, then the entry for TTTTTT would
|
||
|
contain the value 20.
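.para
The derivation of Dh can be sketched as below. It assumes that the
relations of a word are the words within m mismatches of it, and that
ties are kept (both words retain their counts). The function names are
hypothetical.
.lit
```python
def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def make_dh(dm, m):
    """Keep a word's Dm count only if no relation has a higher count; otherwise store zero."""
    dh = {}
    for word, count in dm.items():
        dominated = any(other_count > count and mismatches(word, other) <= m
                        for other, other_count in dm.items())
        dh[word] = 0 if dominated else count
    return dh


dm = {"TTT": 3, "ATT": 1, "TAT": 1, "TTA": 1, "GGG": 2}
print(make_dh(dm, 1))
```
.end lit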
|
||
|
|
||
|
.LEFT MARGIN1
|
||
|
@1. T 0 @Help
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
This option gives online help. The user should select option numbers and
|
||
|
the current documentation will be given. Note that option 0 gives an
|
||
|
introduction to the program, and that ? will get help from anywhere in
|
||
|
the
|
||
|
program.
|
||
|
The following analyses (preceded by their option numbers) are included:
|
||
|
.lit
|
||
|
? = Help
|
||
|
! = Quit
|
||
|
3 = Read new sequences
|
||
|
4 = Redefine active region
|
||
|
5 = List the sequences
|
||
|
6 = List text file
|
||
|
7 = Direct output to disk
|
||
|
10 = Clear graphics
|
||
|
11 = Clear text
|
||
|
12 = Draw ruler
|
||
|
13 = Use cross hair
|
||
|
14 = Reset margins
|
||
|
15 = Label diagram
|
||
|
16 = Draw map
|
||
|
17 = Search for strings
|
||
|
18 = Set strand
|
||
|
19 = Set composition
|
||
|
20 = Set word length
|
||
|
21 = Set number of mismatches
|
||
|
22 = Show settings
|
||
|
23 = Make dictionary Dw
|
||
|
24 = Make dictionary Ds
|
||
|
25 = Make fuzzy dictionary Dm from Dw
|
||
|
26 = Make fuzzy dictionary Dm from Ds
|
||
|
27 = Make fuzzy dictionary Dh from Dm
|
||
|
28 = Examine fuzzy dictionary Dm
|
||
|
29 = Examine fuzzy dictionary Dh
|
||
|
30 = Examine words in Dm
|
||
|
31 = Examine words in Dh
|
||
|
32 = Save or restore a dictionary
|
||
|
33 = Find inverted repeats
|
||
|
.end lit
|
||
|
.left margin1
|
||
|
@2. T 0 @Quit
|
||
|
.left margin2
|
||
|
.para
|
||
|
This function stops the program.
|
||
|
.left margin1
|
||
|
@3. TX 1 @Read a new sequence
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
The program can read
|
||
|
sequences stored in either of two formats: 1) all sequences aligned in a
|
||
|
single file; 2) all sequences in separate files and accessed through a file
|
||
|
of file names. Typical dialogue follows:
|
||
|
.lit
|
||
|
|
||
|
X 1 Read file of aligned sequences
|
||
|
2 Use file of file names
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? File of aligned sequences=F1
|
||
|
Number of files 88
|
||
|
|
||
|
.end lit
|
||
|
.left margin1
|
||
|
@4. TX 1 @Define active region
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
For its analytic functions
|
||
|
the program always works on a region of the sequence called the active
|
||
|
region. When new sequences are read into the program the active region is
|
||
|
automatically set to start at the beginning of the sequences and go
|
||
|
up to the end of the longest one.
|
||
|
.left margin1
|
||
|
@5. TX 1 @List a sequence
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
The sequences are listed in lines of 50 bases, with each sequence
numbered in the order in which it was read.
|
||
|
Output can be directed to a disk file by
|
||
|
first selecting disk output. Typical dialogue follows.
|
||
|
.lit
|
||
|
|
||
|
? Menu or option number=5
|
||
|
|
||
|
10 20 30 40 50
|
||
|
1 TAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCA
|
||
|
2 CAAATAATCAATGTGGACTTTTCTGCCGTGATTATAGACACTTTTGTTAC
|
||
|
3 TAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTATT
|
||
|
4 ACTAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTA
|
||
|
5 AGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGA
|
||
|
6 TAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGC
|
||
|
7 ACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCG
|
||
|
8 GGGGCAAGGAGGATGGAAAGAGGTTGCCGTATAAAGAAACTAGAGTCCGT
|
||
|
9 AGGGGGTGGAGGATTTAAGCCATCTCCTGATGACGCATAGTCAGCCCATC
|
||
|
10 AAAACGTCATCGCTTGCATTAGAAAGGTTTCTGGCCGACCTTATAACCAT
|
||
|
|
||
|
60
|
||
|
1 TACCCGTTTTT
|
||
|
2 GCGTTTTTGT
|
||
|
3 TCATACCATAAG
|
||
|
4 TTTCATACC
|
||
|
5 ATTGTGAGC
|
||
|
6 TTCCGGCTCG
|
||
|
7 GAAGAGAGT
|
||
|
8 TCAGGTGT
|
||
|
9 ATGAATG
|
||
|
10 TAATTACG
|
||
|
.end lit
|
||
|
.left margin1
|
||
|
@6. TX 1 @List a text file
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
Allows the user to have a text file displayed on the screen. It will appear
|
||
|
one page at a time.
|
||
|
.left margin1
|
||
|
@7. TX 1 @Direct output to disk
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
Used to direct output that would normally appear on the screen to a file.
|
||
|
.para
|
||
|
Select redirection of either text or graphics, and
|
||
|
supply the name of the file that the output should be written to.
|
||
|
.para
|
||
|
The results from the next options selected will not appear on the screen
|
||
|
but will be written to the file. When option 7 is selected again
|
||
|
the file will be
|
||
|
closed and output will again appear on the screen.
|
||
|
.left margin1
|
||
|
@10. TX 2 @Clear graphics
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
Clears the screen of both text and graphics.
|
||
|
.left margin1
|
||
|
@11. TX 2 @Clear text
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
Clears only text from the screen.
|
||
|
.left margin1
|
||
|
@12. TX 2 @Draw a ruler
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
This option
|
||
|
allows the user to draw a ruler or scale along the x axis of the screen to
|
||
|
help identify the coordinates of points of interest. The user can define
|
||
|
the position of the first base to be marked (for example if the active
region is 1501 to 8000, the user might wish to mark every 1000th base
starting at either 1501 or 2000; it depends whether the user wishes to treat
the active region as an independent unit with its own numbering starting at
its left edge, or as part of the whole sequence). The user can also define
|
||
|
the separation of the ticks on the scale and their height. If required the
|
||
|
labelling routine can be used to add numbers to the ticks.
|
||
|
.left margin1
|
||
|
@13. TX 2 @Use crosshair
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
This function puts
|
||
|
a steerable cross on the screen that can be used to find the
|
||
|
coordinates of points in the sequence. The user can move the cross
|
||
|
around using the directional keys; when he hits the space bar the
|
||
|
program will print out the coordinates of the cross in sequence units and
|
||
|
the option will be exited.
|
||
|
.para
|
||
|
If instead,
|
||
|
you hit a , the position will be displayed but the cross will remain on
|
||
|
the screen.
|
||
|
.para
|
||
|
If a letter s is hit the sequence around the cross hair is displayed and
|
||
|
the cross remains on the screen.
|
||
|
.left margin1
|
||
|
@14. TX 2 @Reposition plots
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
The position of each plot is defined relative to a user's drawing
board which has size 1-10,000 in x and 1-10,000 in y.
Plots for each option are drawn in a window defined by x0,y0 and
xlength,ylength, where x0,y0 is the position of the bottom left hand
corner of the window, xlength is the width of the window and ylength
is the height of the window.
|
||
|
.lit
|
||
|
--------------------------------------------------------- 10,000
1                                                       1
1        --------------------------------------    ^    1
1        1                                    1    1    1
1        1                                    1    1    1
1        1                                    1 ylength 1
1        1                                    1    1    1
1        1                                    1    1    1
1        --------------------------------------    v    1
1   x0,y0^                                              1
1        <---------------xlength-------------->         1
--------------------------------------------------------- 1
1                                                  10,000
|
||
|
|
||
|
.end lit
|
||
|
All values are in drawing board units (i.e. 1-10,000, 1-10,000).
|
||
|
The default window positions are read from a file "MEPMARG" when the
|
||
|
program is started. Users can have their own file if required.
|
||
|
As all the plots start
|
||
|
at the same position in x and have the same width, x0 and xlength are the
|
||
|
same for all options. Generally users will only want to change the start
|
||
|
level of the window y0 and its height ylength.
|
||
|
This option
|
||
|
allows users to change window positions whilst running the program.
|
||
|
The routine prompts first for the number of the option that the user
wishes to reposition; then for the y start and height; then for the x start and
|
||
|
length. Note that changes to the x values affect all options. If the user
|
||
|
types only carriage return for any value it will remain unchanged.
|
||
|
The cross-hair can be used to choose suitable heights.
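.para
The repositioning rule can be pictured with a small sketch. A window is
held as x0, y0, xlength and ylength in drawing board units, and a value
left as None (standing in for a bare carriage return) is unchanged. The
class, the function and the bounds check are assumptions made for
illustration and do not describe the program's internals or the MEPMARG
file format.
.lit
```python
from dataclasses import dataclass, replace

BOARD_MIN, BOARD_MAX = 1, 10_000  # drawing board units in both x and y


@dataclass(frozen=True)
class Window:
    x0: int
    y0: int
    xlength: int
    ylength: int


def reposition(window, y0=None, ylength=None, x0=None, xlength=None):
    """Return a new window; any value given as None keeps its old setting."""
    requested = {"x0": x0, "y0": y0, "xlength": xlength, "ylength": ylength}
    changes = {name: value for name, value in requested.items() if value is not None}
    new = replace(window, **changes)
    fits = (BOARD_MIN <= new.x0 and new.x0 + new.xlength <= BOARD_MAX
            and BOARD_MIN <= new.y0 and new.y0 + new.ylength <= BOARD_MAX)
    if not fits:
        raise ValueError("window does not fit on the drawing board")
    return new


w = Window(x0=1000, y0=2000, xlength=8000, ylength=1500)
print(reposition(w, y0=5000))
```
.end lit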
|
||
|
.LEFT MARGIN1
|
||
|
@15. TX 2 @Label a diagram
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
This routine allows users to label any diagrams they have produced. They
|
||
|
are asked to type in a label. When the user types carriage return to finish
|
||
|
typing the label the cross-hair appears on the screen. The user can
|
||
|
position it anywhere on the screen. If the user types R (for right justify)
|
||
|
the label will be
|
||
|
written on the diagram with its right end at the cross-hair position.
|
||
|
If the user types L (for left justify) the label will be written on the
|
||
|
diagram with its left end at the cross hair position.
|
||
|
The
|
||
|
cross-hair will then immediately reappear. The user may put the same
|
||
|
label
|
||
|
on another part of the diagram as before or if he hits the space bar he
|
||
|
will be asked if he wishes to type in another label.
|
||
|
.left margin1
|
||
|
@16. TX 2 @Display a map
|
||
|
.LEFT MARGIN2
|
||
|
.para
|
||
|
It is often convenient to plot a map alongside graphed analysis in order to
indicate features within the sequence. This function allows users to draw
maps using files arranged in the form of EMBL feature tables. Of course the
EMBL tables are usually only used for nucleic acid sequence annotation but,
as long as the features are written in the correct format, they can be
employed by this routine. The map is composed of a line representing the
sequence and then further lines denoting the endpoints of each feature the
user identifies. The user is asked to define the height at which the line
representing the sequence should be drawn; then for the feature height;
then for the features to plot.
|
||
|
.left margin1
|
||
|
@17. TX 1 @Search for strings
|
||
|
.left margin2
|
||
|
.para
|
||
|
Search for strings performs searches of all the sequences for selected words and
|
||
|
shows which sequences they are found in. The user types in a word and
|
||
|
defines the allowed number of mismatches. The results are listed or
|
||
|
plotted. If listed the display includes the sequence number, the position
|
||
|
in the sequence and the matching string.
|
||
|
The results are plotted in the
|
||
|
following way. The x axis of the plot represents the length of the aligned
|
||
|
sequences and the y direction is divided into sufficient strips to accommodate
|
||
|
each sequence. So if a match is found in the 3rd sequence at a position
|
||
|
equivalent to halfway along the longest of the sequences then a short
|
||
|
vertical line will be drawn at the midpoint of the 3rd strip. If the sequences
are aligned, the plot can show whether the motifs appear in related
positions; for an example see the original publication. Typical
|
||
|
dialogue follows.
|
||
|
.lit
|
||
|
|
||
|
? Menu or option number=17
|
||
|
X 1 Plot match positions
|
||
|
2 Plot histogram of matches
|
||
|
? 0,1,2 =
|
||
|
? Word to search for=TTGACA
|
||
|
? Minimum match (0-6) (6) =5
|
||
|
? (y/n) (y) Plot results N
|
||
|
2 35 TAGACA
|
||
|
5 14 TTTACA
|
||
|
6 37 TTTACA
|
||
|
11 14 TAGACA
|
||
|
14 14 TTGACA
|
||
|
17 14 GTGACA
|
||
|
17 22 TTAACA
|
||
|
20 1 TTGACA
|
||
|
.end lit
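.para
The listed form of the search can be sketched as follows. The sketch
assumes that "Minimum match" is the minimum number of letters that must
agree with the search word, which is consistent with the dialogue above,
and that positions are counted from 1. The function name is
hypothetical. The test sequence is sequence 5 from the listing under
option 5.
.lit
```python
def search_word(seqs, word, min_match):
    """Return (sequence number, position, matching string) for every hit."""
    hits = []
    n = len(word)
    for number, seq in enumerate(seqs, start=1):
        for i in range(len(seq) - n + 1):
            candidate = seq[i:i + n]
            matched = sum(a == b for a, b in zip(word, candidate))
            if matched >= min_match:
                hits.append((number, i + 1, candidate))
    return hits


seq5 = "AGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGA"
for number, position, match in search_word([seq5], "TTGACA", 5):
    print(number, position, match)
```
.end lit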
|
||
|
.left margin1
|
||
|
@18. TX 3 @Set strand
|
||
|
.left margin2
|
||
|
.para
|
||
|
Set strand allows the user to define which strand(s) of the sequences to
|
||
|
analyse: input strand, complement of input, or both.
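.para
A sketch of the three choices follows. It assumes that "complement of
input" means the reverse complement, so that words on the other strand
are read in the usual 5' to 3' direction; the help text does not state
this, and the names are hypothetical.
.lit
```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base and reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def strands(seq, choice):
    """Return the strand(s) to analyse for choice 'input', 'complement' or 'both'."""
    if choice == "input":
        return [seq]
    if choice == "complement":
        return [reverse_complement(seq)]
    if choice == "both":
        return [seq, reverse_complement(seq)]
    raise ValueError("choice must be 'input', 'complement' or 'both'")


print(strands("TTGACA", "both"))  # ['TTGACA', 'TGTCAA']
```
.end lit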
|
||
|
.left margin1
|
||
|
@19. TX 3 @Set composition
|
||
|
.left margin2
|
||
|
.para
|
||
|
Set composition gives the user three choices for setting the composition
|
||
|
of the sequences for use in the calculation of the information content of
|
||
|
words. The user can select the overall composition of the sequences as read,
|
||
|
an even composition, or can type in any other 4 values.
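.para
The three choices can be illustrated with the sketch below, which
expresses each composition as four fractions summing to 1. Whether the
program normalises typed-in values in this way is an assumption, and
the function names are hypothetical.
.lit
```python
from collections import Counter

BASES = "TCAG"


def observed_composition(seqs):
    """Overall base fractions of the sequences as read."""
    counts = Counter(base for seq in seqs for base in seq if base in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no bases to count")
    return {base: counts[base] / total for base in BASES}


def even_composition():
    """Each base equally likely."""
    return {base: 1 / len(BASES) for base in BASES}


def custom_composition(t, c, a, g):
    """Four user-supplied values, scaled so that they sum to 1."""
    total = t + c + a + g
    return dict(zip(BASES, (t / total, c / total, a / total, g / total)))


print(observed_composition(["TTCA", "GGGG"]))
```
.end lit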
|
||
|
.left margin1
|
||
|
@20. TX 3 @Set word length
|
||
|
.left margin2
|
||
|
.para
|
||
|
Set word length sets the length of word for which dictionaries will be made.
|
||
|
.left margin1
|
||
|
@21. TX 3 @Set number of mismatches
|
||
|
.left margin2
|
||
|
.para
|
||
|
Set number of mismatches sets the level of fuzziness for the creation of
|
||
|
dictionary Dm.
|
||
|
.left margin1
|
||
|
@22. TX 3 @Show settings
|
||
|
.left margin2
|
||
|
.para
|
||
|
Show settings shows the current settings for all parameters associated with
dictionary analysis. A typical display follows:
|
||
|
.lit
|
||
|
? Menu or option number=22
|
||
|
Current word length = 6
|
||
|
Number of mismatches = 1
|
||
|
Start position = 1
|
||
|
End position = 63
|
||
|
Input strand only
|
||
|
Observed composition
|
||
|
Dictionary Dw unmade
|
||
|
Dictionary Ds unmade
|
||
|
Dictionary Dm unmade
|
||
|
Dictionary Dh unmade
|
||
|
.end lit
|
||
|
.left margin1
|
||
|
@23. TX 3 @Make dictionary Dw
|
||
|
.left margin2
|
||
|
.para
|
||
|
Make dictionary Dw creates a dictionary that contains a count of the
|
||
|
frequency of occurrence of each word in the collected sequences.
|
||
|
.left margin1
|
||
|
@24. TX 3 @Make dictionary Ds
|
||
|
.left margin2
|
||
|
.para
|
||
|
Make dictionary Ds creates a dictionary that contains a count of the
|
||
|
number of different sequences that contain each word.
|
||
|
.left margin1
|
||
|
@25. TX 3 @Make dictionary Dm from Dw
|
||
|
.left margin2
|
||
|
.para
|
||
|
Make dictionary Dm from Dw creates a dictionary from dictionary Dw that
|
||
|
contains the frequency of occurrence of each word (say X) in Dw plus the
|
||
|
frequency of occurrence of each word in Dw that differs from X by up to m
|
||
|
letters. Dm is called a fuzzy dictionary as it contains the frequencies of
|
||
|
occurrence of all words plus the frequencies of all the words that are
|
||
|
similar to them.
|
||
|
.left margin1
|
||
|
@26. TX 3 @Make dictionary Dm from Ds
|
||
|
.left margin2
|
||
|
.para
|
||
|
Make dictionary Dm from Ds creates a dictionary from dictionary Ds that
|
||
|
contains the frequency of occurrence of each word (say X) in Ds plus the
|
||
|
frequency of occurrence of each word in Ds that differs from X by up to m
|
||
|
letters. Dm is called a fuzzy dictionary as it contains the frequencies of
|
||
|
occurrence of all words plus the frequencies of all the words that are
|
||
|
similar to them.
|
||
|
.left margin1
|
||
|
@27. TX 3 @Make dictionary Dh from Dm
|
||
|
.left margin2
|
||
|
.para
|
||
|
Make dictionary Dh from Dm creates a dictionary from dictionary Dm whose
entries are zero except for those words in any set of related words that
|
||
|
are most frequent. It finds the dominant words in each set of relations
|
||
|
and stores their counts.
|
||
|
.left margin1
|
||
|
@28. TX 3 @Examine fuzzy dictionary Dm
|
||
|
.left margin2
|
||
|
.para
|
||
|
Examine dictionary Dm allows users to analyse the contents of dictionary
|
||
|
Dm to find the most common words or those words that contain the most
|
||
|
information. The user supplies a frequency or information cutoff and chooses
|
||
|
to have the results sorted on either value. The program will find the top 100
|
||
|
words that achieve the cutoff values and present them to the user sorted
|
||
|
as selected. The information content will be calculated from either Dw or Ds
|
||
|
depending which was used to create Dm, and using the current composition
|
||
|
setting. Typical dialogue follows:
|
||
|
.lit
|
||
|
|
||
|
? Menu or option number=28
|
||
|
Looking for highest scoring words
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =60
|
||
|
? Minimum information (0.00-1.00) (0.00) =.62
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 9 Maximum information= 0.7385326
|
||
|
TTGACA 60 0.73850
|
||
|
AAAAAC 64 0.66460
|
||
|
AAAAAA 90 0.64880
|
||
|
GTTTTT 66 0.64300
|
||
|
TTTTTG 73 0.64070
|
||
|
TTTTGT 63 0.63820
|
||
|
TTTTTC 65 0.63810
|
||
|
AAAATA 63 0.62670
|
||
|
TATAAT 65 0.62510
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =60
|
||
|
? Minimum information (0.00-1.00) (0.00) =.62
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =2
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 9 Maximum information= 0.7385326
|
||
|
AAAAAA 90 0.64880
|
||
|
TTTTTG 73 0.64070
|
||
|
GTTTTT 66 0.64300
|
||
|
TTTTTC 65 0.63810
|
||
|
TATAAT 65 0.62510
|
||
|
AAAAAC 64 0.66460
|
||
|
TTTTGT 63 0.63820
|
||
|
AAAATA 63 0.62670
|
||
|
TTGACA 60 0.73850
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =!
|
||
|
|
||
|
.end lit
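.para
The selection step can be sketched as a filter followed by a sort. The
word scores and information values below are copied from the dialogue
above; the function name and the tie-breaking rule (ties broken on the
other value) are assumptions chosen because they reproduce the two
orderings shown.
.lit
```python
def select_words(entries, min_score=0, min_info=0.0, sort_on="information", limit=100):
    """Keep (word, score, information) entries that reach both cutoffs, sorted, at most limit."""
    kept = [e for e in entries if e[1] >= min_score and e[2] >= min_info]
    if sort_on == "information":
        kept.sort(key=lambda e: (e[2], e[1]), reverse=True)
    else:
        kept.sort(key=lambda e: (e[1], e[2]), reverse=True)
    return kept[:limit]


ENTRIES = [
    ("TTGACA", 60, 0.7385), ("AAAAAC", 64, 0.6646), ("AAAAAA", 90, 0.6488),
    ("GTTTTT", 66, 0.6430), ("TTTTTG", 73, 0.6407), ("TTTTGT", 63, 0.6382),
    ("TTTTTC", 65, 0.6381), ("AAAATA", 63, 0.6267), ("TATAAT", 65, 0.6251),
    ("TTTTTT", 115, 0.6063),
]

for word, score, info in select_words(ENTRIES, 60, 0.62, sort_on="score"):
    print(word, score, info)
```
.end lit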
|
||
|
.left margin1
|
||
|
@29. TX 3 @Examine fuzzy dictionary Dh
|
||
|
.left margin2
|
||
|
.para
|
||
|
Examine dictionary Dh allows users to analyse the contents of dictionary Dh
|
||
|
to find the most common words or those words that contain the most
|
||
|
information. The user supplies a frequency or information cutoff and chooses
|
||
|
to have the results sorted on either value. The program will find the top 100
|
||
|
words that achieve the cutoff values and present them to the user sorted as
|
||
|
selected. The information content will be calculated from either Dw or Ds
|
||
|
depending which was used to create Dh and using the current composition
|
||
|
setting. Typical dialogue follows:
|
||
|
.lit
|
||
|
|
||
|
? Menu or option number=29
|
||
|
Looking for highest scoring words
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =60
|
||
|
? Minimum information (0.00-1.00) (0.00) =.6
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 4 Maximum information= 0.7385326
|
||
|
TTGACA 60 0.73850
|
||
|
AAAAAA 90 0.64880
|
||
|
TATAAT 65 0.62510
|
||
|
TTTTTT 115 0.60630
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =50
|
||
|
? Minimum information (0.00-1.00) (0.00) =.5
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 8 Maximum information= 0.7385326
|
||
|
TTGACA 60 0.73850
|
||
|
TCTTGA 54 0.66080
|
||
|
AAAAAA 90 0.64880
|
||
|
TATAAT 65 0.62510
|
||
|
ACTTTA 57 0.61960
|
||
|
TTTTTT 115 0.60630
|
||
|
AGTATA 51 0.60540
|
||
|
TTATAA 55 0.59300
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =50
|
||
|
? Minimum information (0.00-1.00) (0.00) =
|
||
|
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 8 Maximum information= 0.7385326
|
||
|
TTGACA 60 0.73850
|
||
|
TCTTGA 54 0.66080
|
||
|
AAAAAA 90 0.64880
|
||
|
TATAAT 65 0.62510
|
||
|
ACTTTA 57 0.61960
|
||
|
TTTTTT 115 0.60630
|
||
|
AGTATA 51 0.60540
|
||
|
TTATAA 55 0.59300
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =!
|
||
|
|
||
|
.end lit
|
||
|
.left margin1
|
||
|
@30. TX 3 @Examine words in Dm
|
||
|
.left margin2
|
||
|
.para
|
||
|
Examine words in Dm allows users to analyse the contents of dictionary Dm at the
|
||
|
level of individual words to find their frequency, information content, and to
|
||
|
see their base frequency table. The user types in a word to examine and the
|
||
|
program displays the values and table. The information content will be
|
||
|
calculated from either Dw or Ds, depending on which was used to create Dm,
|
||
|
and using the current composition setting. Typical dialogue follows:
|
||
|
.lit
|
||
|
? Menu or option number=30
|
||
|
? Word to examine=TTGACA
|
||
|
TtgacA 60 0.7385326
|
||
|
56 56 6 7 5 11
|
||
|
4 3 2 1 52 1
|
||
|
1 4 2 53 3 48
|
||
|
3 1 54 3 4 4
|
||
|
TTGACA
|
||
|
? Word to examine=TATAAT
|
||
|
taTAat 65 0.6251902
|
||
|
56 3 53 4 4 60
|
||
|
6 1 5 5 5 3
|
||
|
3 60 5 57 57 4
|
||
|
4 5 6 3 3 2
|
||
|
TATAAT
|
||
|
? Word to examine=
|
||
|
|
||
|
.end lit
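.para
A base frequency table of this kind can be sketched as below. The
sketch assumes the table is built by aligning every word in Dw that
lies within m mismatches of the chosen word and weighting each by its
count. The row order T, C, A, G is inferred from the examples above,
and the function names are hypothetical.
.lit
```python
def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def base_frequency_table(dw, word, m, bases="TCAG"):
    """Count each base at each position over the word and its relations."""
    table = {base: [0] * len(word) for base in bases}
    for other, count in dw.items():
        if len(other) == len(word) and mismatches(word, other) <= m:
            for pos, base in enumerate(other):
                table[base][pos] += count
    return table


dw = {"AC": 3, "TC": 1, "AA": 1, "GG": 5}
for base, row in base_frequency_table(dw, "AC", 1).items():
    print(base, *row)
```
.end lit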
|
||
|
.left margin1
|
||
|
@31. TX 3 @Examine words in Dh
|
||
|
.left margin2
|
||
|
.para
|
||
|
Examine words in Dh allows users to analyse the contents of dictionary Dh at the
|
||
|
level of individual words to find their frequency, information content, and to
|
||
|
see their base frequency table. The user types in a word to examine and the
|
||
|
program displays the values and table. The information content will be
|
||
|
calculated from either Dw or Ds, depending on which was used to create Dm,
|
||
|
and using the current composition setting. Typical dialogue follows:
|
||
|
.lit
|
||
|
|
||
|
? Menu or option number=31
|
||
|
? Word to examine=TTGACA
|
||
|
TtgacA 60 0.7385326
|
||
|
56 56 6 7 5 11
|
||
|
4 3 2 1 52 1
|
||
|
1 4 2 53 3 48
|
||
|
3 1 54 3 4 4
|
||
|
TTGACA
|
||
|
? Word to examine=TATAAT
|
||
|
taTAat 65 0.6251902
|
||
|
56 3 53 4 4 60
|
||
|
6 1 5 5 5 3
|
||
|
3 60 5 57 57 4
|
||
|
4 5 6 3 3 2
|
||
|
TATAAT
|
||
|
? Word to examine=GGGGGG
|
||
|
gggggg 0 0.6199890
|
||
|
3 1 1 2 3 4
|
||
|
1 3 1 2 2 1
|
||
|
2 1 1 1 1 1
|
||
|
11 12 14 12 11 11
|
||
|
GGGGGG
|
||
|
? Word to examine=
|
||
|
|
||
|
.end lit
|
||
|
.left margin1
|
||
|
@32. TX 3 @Save or restore a dictionary
|
||
|
.left margin2
|
||
|
.para
|
||
|
Save or restore dictionary allows users to write or read any dictionary to
|
||
|
and from disk files. The user is asked to define the dictionary and file. The
|
||
|
function is useful if the machine being used is very slow at calculating
|
||
|
because the files can be handled quickly. However note that the files
|
||
|
cannot be processed by any other program.
|
||
|
.left margin1
|
||
|
@33. TX 1 @Find inverted repeats
|
||
|
.left margin2
|
||
|
.para
|
||
|
Find inverted repeats performs searches for simple inverted repeat sequences
|
||
|
in each sequence. They are defined by a range of loop sizes and a minimum
|
||
|
number of potential basepairs. The results can be plotted or listed. The x
|
||
|
axis of the plot represents the length of the aligned sequences and the y
|
||
|
direction is divided into sufficient strips to accommodate each sequence.
|
||
|
So if an inverted repeat is found in the 3rd sequence at a position equivalent
|
||
|
to halfway along the longest of the sequences then a short vertical line will
|
||
|
be drawn at the midpoint of the 3rd strip. Alternatively, if the results are
|
||
|
listed, the potential hairpin loops are drawn out, with the sequence number
|
||
|
and the position of the loop. Typical dialogue follows.
|
||
|
.lit
|
||
|
|
||
|
? Menu or option number=33
|
||
|
Define the range of loop sizes
|
||
|
? Minimum loop size (0-10) (3) =0
|
||
|
? Maximum loop size (1-20) (3) =
|
||
|
? Minimum number of basepairs (1-20) (6) =
|
||
|
? (y/n) (y) Plot results N
|
||
|
Searching
|
||
|
|
||
|
Sequence 3 34
|
||
|
C
|
||
|
G.T
|
||
|
T-A
|
||
|
A-T
|
||
|
T.G
|
||
|
T.G
|
||
|
G.T
|
||
|
ATCTTT TATTTCA
|
||
|
33
|
||
|
|
||
|
Sequence 5 35
|
||
|
T
|
||
|
G.T
|
||
|
T.G
|
||
|
A-T
|
||
|
T.G
|
||
|
G.T
|
||
|
C-G
|
||
|
T.G
|
||
|
TCCGGC AATTGTG
|
||
|
34
|
||
|
.end lit
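.para
The search can be sketched as follows. The sketch assumes that G-T
pairs count as potential basepairs (they appear as "." in the output
above), that a stem is reported by the position of its first base, and
that stems are tested at exactly the minimum number of basepairs rather
than extended. The names are hypothetical. The test sequence is
sequence 3 from the listing under option 5.
.lit
```python
# Watson-Crick pairs plus G-T, treated here as potential basepairs.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def find_inverted_repeats(seq, min_loop, max_loop, min_pairs):
    """Return (stem start, loop size) for each hairpin with min_pairs potential basepairs."""
    hits = []
    for start in range(len(seq)):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * min_pairs + loop  # one past the last base of the stem
            if end > len(seq):
                break
            if all((seq[start + k], seq[end - 1 - k]) in PAIRS for k in range(min_pairs)):
                hits.append((start + 1, loop))  # positions counted from 1
    return hits


seq3 = "TAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTATTTCATACCATAAG"
print((34, 1) in find_inverted_repeats(seq3, 0, 3, 6))  # True
```
.end lit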
|
||
|
.left margin1
|
||
|
@ End of help
|