793 lines
30 KiB
Text
793 lines
30 KiB
Text
|
|
||
|
@-1. TX 0 @General
|
||
|
|
||
|
@-2. T 0 @Screen control
|
||
|
|
||
|
@-2. X 0 @Screen
|
||
|
|
||
|
@-3. TX 0 @Dictionary analysis
|
||
|
|
||
|
@0. TX -1 @MEP
|
||
|
|
||
|
This is a program for analysing families of nucleotide
|
||
|
sequences in order to find common motifs and potential binding
|
||
|
sites. The ideas in this program were described in Staden, R.
|
||
|
"Methods for discovering novel motifs in nucleic acid sequences".
|
||
|
Computer Applications in the Biosciences, 5, 293-298, (1989).
|
||
|
|
||
|
The program can read sequences stored in either of two
|
||
|
formats: 1) all sequences aligned in a single file; 2) all sequences
|
||
|
in separate files and accessed through a file of file names.
|
||
|
|
||
|
The program contains functions that can answer several
|
||
|
questions about a set of sequences:
|
||
|
|
||
|
Which words are most common?
|
||
|
Which words occur in the most sequences?
|
||
|
Which words contain the most information?
|
||
|
Which words occur in equivalent positions in the sequences?
|
||
|
Which words are inverted repeats?
|
||
|
Which words occur on both strands of the sequences?
|
||
|
Where are the inverted repeats?
|
||
|
Where are the fuzzy words?
|
||
|
|
||
|
Most of the program is concerned with analysing what it terms
|
||
|
"fuzzy words" within the set of sequences. The analysis is explained
|
||
|
below. Note that the standard version of the programs is limited to
|
||
|
words of maximum length 8 letters, and a maximum fuzziness of 2.
|
||
|
|
||
|
The following analyses (preceded by their option numbers) are
|
||
|
included:
|
||
|
? = Help
|
||
|
! = Quit
|
||
|
3 = Read new sequences
|
||
|
4 = Redefine active region
|
||
|
5 = List the sequences
|
||
|
6 = List text file
|
||
|
7 = Direct output to disk
|
||
|
10 = Clear graphics
|
||
|
11 = Clear text
|
||
|
12 = Draw ruler
|
||
|
13 = Use cross hair
|
||
|
14 = Reset margins
|
||
|
15 = Label diagram
|
||
|
16 = Draw map
|
||
|
17 = Search for strings
|
||
|
18 = Set strand
|
||
|
19 = Set composition
|
||
|
20 = Set word length
|
||
|
21 = Set number of mismatches
|
||
|
22 = Show settings
|
||
|
23 = Make dictionary Dw
|
||
|
24 = Make dictionary Ds
|
||
|
25 = Make fuzzy dictionary Dm from Dw
|
||
|
26 = Make fuzzy dictionary Dm from Ds
|
||
|
27 = Make fuzzy dictionary Dh from Dm
|
||
|
28 = Examine fuzzy dictionary Dm
|
||
|
29 = Examine fuzzy dictionary Dh
|
||
|
30 = Examine words in Dm
|
||
|
31 = Examine words in Dh
|
||
|
32 = Save or restore a dictionary
|
||
|
33 = Find inverted repeats
|
||
|
|
||
|
Some of these methods produce graphical results and so the
|
||
|
program is generally used from a graphics terminal (a vdu on which
|
||
|
lines and points can be drawn as well as characters).
|
||
|
|
||
|
The positions of each of the plots is defined relative to a users
|
||
|
drawing board which has size 1-10,000 in x and 1-10,000 in y. Plots
|
||
|
for each option are drawn in a window defined by x0,y0 and
|
||
|
xlength,ylength. Where x0,y0 is the position of the bottom left hand
|
||
|
corner of the window, and xlength is the width of the window and
|
||
|
ylength the height of the window.
|
||
|
--------------------------------------------------------- 10,000
|
||
|
1 1
|
||
|
1 -------------------------------------- ^ 1
|
||
|
1 1 1 1 1
|
||
|
1 1 1 1 1
|
||
|
1 1 1 ylength 1
|
||
|
1 1 1 1 1
|
||
|
1 1 1 1 1
|
||
|
1 -------------------------------------- v 1
|
||
|
1 x0,y0^ 1
|
||
|
1 <---------------xlength--------------> 1
|
||
|
--------------------------------------------------------- 1
|
||
|
1 10,000
|
||
|
|
||
|
All values are in drawing board units (i.e. 1-10,000, 1-10,000).
|
||
|
The default window positions are read from a file "MEPMARG" when the
|
||
|
program is started. Users can have their own file if required.
|
||
|
|
||
|
The options for the program are accessed from 3 main menus:
|
||
|
general, screen control and dictionary analylsis. Both menus and
|
||
|
options are selected by number.
|
||
|
|
||
|
The most important and novel part of the program is its use of
|
||
|
"fuzzy dictionaries" and an information theory measure, to help show
|
||
|
the most interesting motifs. Central to the method is the idea of a
|
||
|
fuzzy dictionary of word frequencies. A dictionary of word
|
||
|
frequencies is an ordered list of all the words in the sequences and
|
||
|
a count of the number of times that they occur. A fuzzy dictionary
|
||
|
is an equivalent list but which contains instead, for each word, a
|
||
|
count of the number of times similar words occur in the sequences.
|
||
|
We term words that are similar "relations". The fuzziness is defined
|
||
|
by the number of letters in a word that are allowed to be different.
|
||
|
So if we had a fuzziness of 1 we allow 1 letter to be different. For
|
||
|
example, with a fuzziness of 1, the entry in the fuzzy dictionary
|
||
|
for the word TTTTTT would contain a count of the numbers of times
|
||
|
TTTTTT occured plus the number of times all words differing by
|
||
|
exactly one letter from TTTTTT occured.
|
||
|
|
||
|
Once the fuzzy dictionary has been created we can examine it
|
||
|
in several ways to find candidate control sequences. The simplest
|
||
|
question we can ask is which word in the dictionary is the most
|
||
|
common. Sometimes this simple criterion of "most common" may be
|
||
|
adequate to discover a new motif but in general we would not expect
|
||
|
it to be sufficient. For example some words will be common simply
|
||
|
because of a base composition bias in the sequences being analysed.
|
||
|
In addition a word can be the most frequent and yet not be "well
|
||
|
defined". This last point is best explained by an example.
|
||
|
|
||
|
Suppose we were looking at two letter words and allowing one
|
||
|
mismatch, and that there were 10 occurences of TT and 5 of AC. We
|
||
|
could align the 10 words that were one letter different from TT and
|
||
|
the 5 that were related to AC. Then we could count the number of
|
||
|
times each base occured in each position for each of these two sets
|
||
|
of words. Suppose we got the two base frequency tables shown below.
|
||
|
TT AC
|
||
|
T 6 4 T 1 0
|
||
|
C 1 3 C 0 4
|
||
|
A 1 2 A 4 1
|
||
|
G 2 1 G 0 0
|
||
|
|
||
|
These tables show that although TT occurs (with one letter mismatch)
|
||
|
more often than AC, the ratio of base frequencies for AC at 4/5, 4/5
|
||
|
is higher than those for TT at 6/10, 4/10. Hence we would say that
|
||
|
AC was better defined than TT. Expressing this another way we would
|
||
|
say that the definition of AC contained more information than that
|
||
|
for TT. The program calculates the information content in a way that
|
||
|
takes into account both the sequence composition and the level of
|
||
|
definition of the motif.
|
||
|
|
||
|
Definitions
|
||
|
|
||
|
Here we deal only with the dictionary analysis. Suppose we
|
||
|
are dealing with a set of sequences and are examining them for words
|
||
|
that are six characters in length.
|
||
|
|
||
|
Dictionary Dw contains a count of the number of times each
|
||
|
word occurs in the set of sequences. For example the entry for
|
||
|
TTTTTT contains a value equal to the number of times the word TTTTTT
|
||
|
occurs in the set of sequences.
|
||
|
|
||
|
Dictionary Ds contains a count of the number of different
|
||
|
sequences in which each word occurs. For example if the entry for
|
||
|
word TTTTTT contains the value 10, it denotes that the word TTTTTT
|
||
|
occurs in ten different sequences. Unlike Dw it only counts words
|
||
|
once for each sequence. For example if we had a set of 100
|
||
|
sequences, the maximum possible value that Ds could take is 100, and
|
||
|
this would only happen if a word occurred in every sequence. However
|
||
|
for the same set of sequences, Dw could contain values greater than
|
||
|
100, and this would show that a word had occurred more than once in
|
||
|
at least one sequence.
|
||
|
|
||
|
From either of the two dictionaries Dw or Ds we can calculate
|
||
|
a fuzzy dictionary Dm. For each word, the entry in the fuzzy
|
||
|
dictionary Dm contains the sum of the dictionary values (taken from
|
||
|
either Dw or Ds) for all words that differ from it by up to m
|
||
|
letters. For example if m=2 the entry for TTTTTT contains the number
|
||
|
of times that TTTTTT occurs in the dictionary, plus the counts for
|
||
|
all words that differ from TTTTTT by 1 or 2 letters. Obviously the
|
||
|
interpretation of the values in Dm depends on which of the two
|
||
|
dictionaries Dw or Ds they were derived from. When derived from Dw
|
||
|
the entry for any word in Dm gives the total number of times it, and
|
||
|
its relations, occur in the set of sequences. When derived from Ds
|
||
|
the entry for any word in Dm gives the total number of different
|
||
|
sequences that contain a word and each of its relations.
|
||
|
|
||
|
Finally, from fuzzy dictionary Dm we can derive fuzzy
|
||
|
dictionary Dh. All entries in Dh are zero except for the word(s),
|
||
|
within each set of relations, that are most frequent. For example if
|
||
|
TTTTTT occurred 20 times but had a relation that occurred more
|
||
|
often, then the entry for TTTTTT would be zero. However if TTTTTT
|
||
|
did not have a more frequently occurring relation, then the entry
|
||
|
for TTTTTT would contain the value 20.
|
||
|
@1. T 0 @Help
|
||
|
|
||
|
This option gives online help. The user should select option
|
||
|
numbers and the current documentation will be given. Note that
|
||
|
option 0 gives an introduction to the program, and that ? will get
|
||
|
help from anywhere in the program. The following analyses (preceded
|
||
|
by their option numbers) are included:
|
||
|
? = Help
|
||
|
! = Quit
|
||
|
3 = Read new sequences
|
||
|
4 = Redefine active region
|
||
|
5 = List the sequences
|
||
|
6 = List text file
|
||
|
7 = Direct output to disk
|
||
|
10 = Clear graphics
|
||
|
11 = Clear text
|
||
|
12 = Draw ruler
|
||
|
13 = Use cross hair
|
||
|
14 = Reset margins
|
||
|
15 = Label diagram
|
||
|
16 = Draw map
|
||
|
17 = Search for strings
|
||
|
18 = Set strand
|
||
|
19 = Set composition
|
||
|
20 = Set word length
|
||
|
21 = Set number of mismatches
|
||
|
22 = Show settings
|
||
|
23 = Make dictionary Dw
|
||
|
24 = Make dictionary Ds
|
||
|
25 = Make fuzzy dictionary Dm from Dw
|
||
|
26 = Make fuzzy dictionary Dm from Ds
|
||
|
27 = Make fuzzy dictionary Dh from Dm
|
||
|
28 = Examine fuzzy dictionary Dm
|
||
|
29 = Examine fuzzy dictionary Dh
|
||
|
30 = Examine words in Dm
|
||
|
31 = Examine words in Dh
|
||
|
32 = Save or restore a dictionary
|
||
|
33 = Find inverted repeats
|
||
|
@2. T 0 @Quit
|
||
|
|
||
|
This function stops the program.
|
||
|
@3. TX 1 @Read a new sequence
|
||
|
|
||
|
It can read sequences stored in either of two formats: 1) all
|
||
|
sequences aligned in a single file; 2) all sequences in separate
|
||
|
files and accessed through a file of file names. Typical dialogue
|
||
|
follows:
|
||
|
|
||
|
X 1 Read file of aligned sequences
|
||
|
2 Use file of file names
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? File of aligned sequences=F1
|
||
|
Number of files 88
|
||
|
|
||
|
@4. TX 1 @Define active region
|
||
|
|
||
|
For its analytic functions the program always works on a
|
||
|
region of the sequence called the active region. When new sequences
|
||
|
are read into the program the active region is automatically set to
|
||
|
start at the beginning of the sequences and go up to the end of the
|
||
|
longest one.
|
||
|
@5. TX 1 @List a sequence
|
||
|
|
||
|
The sequence can be listed with line lengths of 50 bases with
|
||
|
each sequence numbered in the order in which they were read. Output
|
||
|
can be directed to a disk file by first selecting disk output.
|
||
|
Typical dialogue follows.
|
||
|
|
||
|
? Menu or option number=5
|
||
|
|
||
|
10 20 30 40 50
|
||
|
1 TAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCA
|
||
|
2 CAAATAATCAATGTGGACTTTTCTGCCGTGATTATAGACACTTTTGTTAC
|
||
|
3 TAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTATT
|
||
|
4 ACTAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTA
|
||
|
5 AGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGA
|
||
|
6 TAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGC
|
||
|
7 ACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCG
|
||
|
8 GGGGCAAGGAGGATGGAAAGAGGTTGCCGTATAAAGAAACTAGAGTCCGT
|
||
|
9 AGGGGGTGGAGGATTTAAGCCATCTCCTGATGACGCATAGTCAGCCCATC
|
||
|
10 AAAACGTCATCGCTTGCATTAGAAAGGTTTCTGGCCGACCTTATAACCAT
|
||
|
|
||
|
60
|
||
|
1 TACCCGTTTTT
|
||
|
2 GCGTTTTTGT
|
||
|
3 TCATACCATAAG
|
||
|
4 TTTCATACC
|
||
|
5 ATTGTGAGC
|
||
|
6 TTCCGGCTCG
|
||
|
7 GAAGAGAGT
|
||
|
8 TCAGGTGT
|
||
|
9 ATGAATG
|
||
|
10 TAATTACG
|
||
|
@6. TX 1 @List a text file
|
||
|
|
||
|
Allows the user to have a text file displayed on the screen.
|
||
|
It will appear one page at a time.
|
||
|
@7. TX 1 @Direct output to disk
|
||
|
|
||
|
Used to direct output that would normally appear on the screen
|
||
|
to a file.
|
||
|
|
||
|
Select redirection of either text or graphics, and supply the
|
||
|
name of the file that the output should be written to.
|
||
|
|
||
|
The results from the next options selected will not appear on
|
||
|
the screen but will be written to the file. When option 7 is
|
||
|
selected again the file will be closed and output will again appear
|
||
|
on the screen.
|
||
|
@10. TX 2 @Clear graphics
|
||
|
|
||
|
Clears the screen of both text and graphics.
|
||
|
@11. TX 2 @Clear text
|
||
|
|
||
|
Clears only text from the screen.
|
||
|
@12. TX 2 @Draw a ruler
|
||
|
|
||
|
This option allows the user to draw a ruler or scale along the
|
||
|
x axis of the screen to help identify the coordinates of points of
|
||
|
interest. The user can define the position of the first amino acid
|
||
|
to be marked (for example if the active region is 1501 to 8000, the
|
||
|
user might wish to mark every 1000th amino acid starting at either
|
||
|
1501 or 2000 - it depends if the user wishes to treat the active
|
||
|
region as an independent unit with its own numbering starting at its
|
||
|
left edge, or as part of the whole sequence). The user can also
|
||
|
define the separation of the ticks on the scale and their height. If
|
||
|
required the labelling routine can be used to add numbers to the
|
||
|
ticks.
|
||
|
@13. TX 2 @Use crosshair
|
||
|
|
||
|
This function puts a steerable cross on the screen that can be
|
||
|
used to find the coordinates of points in the sequence. The user can
|
||
|
move the cross around using the directional keys; when he hits the
|
||
|
space bar the program will print out the coordinates of the cross in
|
||
|
sequence units and the option will be exited.
|
||
|
|
||
|
If instead, you hit a , the position will be displayed but the
|
||
|
cross will remain on the screen.
|
||
|
|
||
|
If a letter s is hit the sequence around the cross hair is
|
||
|
displayed and the cross remains on the screen.
|
||
|
@14. TX 2 @Reposition plots
|
||
|
|
||
|
The positions of each of the plots is defined relative to a
|
||
|
users drawing board which has size 1-10,000 in x and 1-10,000 in y.
|
||
|
Plots for each option are drawn in a window defined by x0,y0 and
|
||
|
xlength,ylength. Where x0,y0 is the position of the bottom left hand
|
||
|
corner of the window, and xlength is the width of the window and
|
||
|
ylength the height of the window.
|
||
|
--------------------------------------------------------- 10,000
|
||
|
1 1
|
||
|
1 -------------------------------------- ^ 1
|
||
|
1 1 1 1 1
|
||
|
1 1 1 1 1
|
||
|
1 1 1 ylength 1
|
||
|
1 1 1 1 1
|
||
|
1 1 1 1 1
|
||
|
1 -------------------------------------- v 1
|
||
|
1 x0,y0^ 1
|
||
|
1 <---------------xlength--------------> 1
|
||
|
--------------------------------------------------------- 1
|
||
|
1 10,000
|
||
|
|
||
|
All values are in drawing board units (i.e. 1-10,000, 1-10,000).
|
||
|
The default window positions are read from a file "MEPMARG" when the
|
||
|
program is started. Users can have their own file if required. As
|
||
|
all the plots start at the same position in x and have the same
|
||
|
width, x0 and xlength are the same for all options. Generally users
|
||
|
will only want to change the start level of the window y0 and its
|
||
|
height ylength. This option allows users to change window positions
|
||
|
whilst running the program. The routine prompts first for the
|
||
|
number of the option that the users wishes to reposition; then for
|
||
|
the y start and height; then for the x start and length. Note that
|
||
|
changes to the x values affect all options. If the user types only
|
||
|
carriage return for any value it will remain unchanged. The cross-
|
||
|
hair can be used to choose suitable heights.
|
||
|
@15. TX 2 @Label a diagram
|
||
|
|
||
|
This routine allows users to label any diagrams they have
|
||
|
produced. They are asked to type in a label. When the user types
|
||
|
carriage return to finish typing the label the cross-hair appears on
|
||
|
the screen. The user can position it anywhere on the screen. If the
|
||
|
user types R (for right justify) the label will be written on the
|
||
|
diagram with its right end at the cross-hair position. If the user
|
||
|
types L (for left justify) the label will be written on the diagram
|
||
|
with its left end at the cross hair position. The cross-hair will
|
||
|
then immediately reappear. The user may put the same label on
|
||
|
another part of the diagram as before or if he hits the space bar he
|
||
|
will be asked if he wishes to type in another label.
|
||
|
@16. TX 2 @Display a map
|
||
|
|
||
|
It is often convenient to plot a map alongside graphed
|
||
|
analysis in order to indicate features within the sequence. This
|
||
|
function allows users to draw maps using files arranged in the form
|
||
|
of EMBL feature tables. Of course the EMBL table are usually only
|
||
|
used for nucleic acid sequence annotation but, as long as the
|
||
|
features are written in the correct format, they can be employed by
|
||
|
this routine. The map is composed of a line representing the
|
||
|
sequence and then further lines denoting the endpoints of each
|
||
|
feature the user identifies. The user is asked to define height at
|
||
|
which the line representing the sequence should be drawn; then for
|
||
|
the feature height; then for the features to plot.
|
||
|
@17. TX 1 @Search for strings
|
||
|
|
||
|
Search for strings perfoms searches of all the sequences for
|
||
|
selected words and shows which sequences they are found in. The user
|
||
|
types in a word and defines the allowed number of mismatches. The
|
||
|
results are listed or plotted. If listed the display includes the
|
||
|
sequence number, the position in the sequence and the matching
|
||
|
string. The results are plotted in the following way. The x axis of
|
||
|
the plot represents the length of the aligned sequences and the y
|
||
|
direction is divided into sufficient strips to accommodate each
|
||
|
sequence. So if a match is found in the 3rd sequence at a position
|
||
|
equivalent to halfway along the longest of the sequences then a
|
||
|
short vertical line will be drawn at the midpoint of the 3rd strip.
|
||
|
If the sequences are aligned it can be useful if the motifs happen
|
||
|
to appear in related positions. For example see the original
|
||
|
publication. Typical dialogue follows.
|
||
|
|
||
|
? Menu or option number=17
|
||
|
X 1 Plot match positions
|
||
|
2 Plot histogram of matches
|
||
|
? 0,1,2 =
|
||
|
? Word to search for=TTGACA
|
||
|
? Minimum match (0-6) (6) =5
|
||
|
? (y/n) (y) Plot results N
|
||
|
2 35 TAGACA
|
||
|
5 14 TTTACA
|
||
|
6 37 TTTACA
|
||
|
11 14 TAGACA
|
||
|
14 14 TTGACA
|
||
|
17 14 GTGACA
|
||
|
17 22 TTAACA
|
||
|
20 1 TTGACA
|
||
|
@18. TX 3 @Set strand
|
||
|
|
||
|
Set strand allows the user to define which strand(s) of the
|
||
|
sequences to analyse: input stand, complement of input, or both.
|
||
|
@19. TX 3 @Set composition
|
||
|
|
||
|
Set composition gives the user three choices for setting the
|
||
|
composition of the sequences for use in the calculation of the
|
||
|
information content of words. The user can select the overall
|
||
|
composition of the sequences as read, an even composition, or can
|
||
|
type in any other 4 values.
|
||
|
@20. TX 3 @Set word length
|
||
|
|
||
|
Set word length sets the length of word for which dictionaries
|
||
|
will be made.
|
||
|
@21. TX 3 @Set number of mismatches
|
||
|
|
||
|
Set number of mismatches sets the level of fuzziness for the
|
||
|
creation of dictionary Dm.
|
||
|
@22. TX 3 @Show settings
|
||
|
|
||
|
Show settings show the current settings for all parameters
|
||
|
associated with dictionary analysis. A typical diaplsy follows:
|
||
|
? Menu or option number=22
|
||
|
Current word length = 6
|
||
|
Number of mismatches = 1
|
||
|
Start position = 1
|
||
|
End position = 63
|
||
|
Input strand only
|
||
|
Observed composition
|
||
|
Dictionary Dw unmade
|
||
|
Dictionary Ds unmade
|
||
|
Dictionary Dm unmade
|
||
|
Dictionary Dh unmade
|
||
|
@23. TX 3 @Make dictionary Dw
|
||
|
|
||
|
Make dictionary Dw creates a dictionary that contains a count
|
||
|
of the frequency of occurrence of each word in the collected
|
||
|
sequences.
|
||
|
@24. TX 3 @Make dictionary Ds
|
||
|
|
||
|
Make dictionary Ds creates a dictionary that contains a count
|
||
|
of the number of different sequences that contain each word.
|
||
|
@25. TX 3 @Make dictionary Dm from Dw
|
||
|
|
||
|
Make dictionary Dm from Dw creates a dictionary from
|
||
|
dictionary Dw that contains the frequency of occurrence of each word
|
||
|
(say X) in Dw plus the frequency of occurrence of each word in Dw
|
||
|
that differs from X by up to m letters. Dm is called a fuzzy
|
||
|
dictionary as it contains the frequencies of occurrence of all words
|
||
|
plus the frequencies of all the words that are similar to them.
|
||
|
@26. TX 3 @Make dictionary Dm from Ds
|
||
|
|
||
|
Make dictionary Dm from Ds creates a dictionary from
|
||
|
dictionary Ds that contains the frequency of occurrence of each word
|
||
|
(say X) in Ds plus the frequency of occurrence of each word in Ds
|
||
|
that differs from X by up to m letters. Dm is called a fuzzy
|
||
|
dictionary as it contains the frequencies of occurrence of all words
|
||
|
plus the frequencies of all the words that are similar to them.
|
||
|
@27. TX 3 @Make dictionary Dh from Dm
|
||
|
|
||
|
Make dictionary Dh creates a dictionary from dictionary Dm
|
||
|
and whose entries are zero except for those words in any set of
|
||
|
related words that are most frequent. It finds the dominant words in
|
||
|
each set of relations and stores their counts.
|
||
|
@28. TX 3 @Examine fuzzy dictionary Dm
|
||
|
|
||
|
Examine dictionary Dm allows users to analyse the contents of
|
||
|
dictionary Dm to find the most common words or those words that
|
||
|
contain the most information. The user supplies a frequency or
|
||
|
information cutoff and chooses to have the results sorted on either
|
||
|
value. The program will find the top 100 words that achieve the
|
||
|
cutoff values and present them to the user sorted as selected. The
|
||
|
information content will be calcutated from either Dw or Ds
|
||
|
depending which was used to create Dm, and using the current
|
||
|
composition setting. Typical dialogue follows:
|
||
|
|
||
|
? Menu or option number=28
|
||
|
Looking for highest scoring words
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =60
|
||
|
? Minimum information (0.00-1.00) (0.00) =.62
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 9 Maximum information= 0.7385326
|
||
|
TTGACA 60 0.73850
|
||
|
AAAAAC 64 0.66460
|
||
|
AAAAAA 90 0.64880
|
||
|
GTTTTT 66 0.64300
|
||
|
TTTTTG 73 0.64070
|
||
|
TTTTGT 63 0.63820
|
||
|
TTTTTC 65 0.63810
|
||
|
AAAATA 63 0.62670
|
||
|
TATAAT 65 0.62510
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =60
|
||
|
? Minimum information (0.00-1.00) (0.00) =.62
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =2
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 9 Maximum information= 0.7385326
|
||
|
AAAAAA 90 0.64880
|
||
|
TTTTTG 73 0.64070
|
||
|
GTTTTT 66 0.64300
|
||
|
TTTTTC 65 0.63810
|
||
|
TATAAT 65 0.62510
|
||
|
AAAAAC 64 0.66460
|
||
|
TTTTGT 63 0.63820
|
||
|
AAAATA 63 0.62670
|
||
|
TTGACA 60 0.73850
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =!
|
||
|
|
||
|
@29. TX 3 @Examine fuzzy dictionary Dh
|
||
|
|
||
|
Examine dictionary Dh allows users to analyse the contents of
|
||
|
dictionary Dh to find the most common words or those words that
|
||
|
contain the most information. The user supplies a frequency or
|
||
|
information cutoff and chooses to have the results sorted on either
|
||
|
value. The program will find the top 100 words that achieve the
|
||
|
cutoff values and present them to the user sorted as selected. The
|
||
|
information content will be calcutated from either Dw or Ds
|
||
|
depending which was used to create Dh and using the current
|
||
|
composition setting. Typical dialogue follows:
|
||
|
|
||
|
? Menu or option number=29
|
||
|
Looking for highest scoring words
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =60
|
||
|
? Minimum information (0.00-1.00) (0.00) =.6
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 4 Maximum information= 0.7385326
|
||
|
TTGACA 60 0.73850
|
||
|
AAAAAA 90 0.64880
|
||
|
TATAAT 65 0.62510
|
||
|
TTTTTT 115 0.60630
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =50
|
||
|
? Minimum information (0.00-1.00) (0.00) =.5
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 8 Maximum information= 0.7385326
|
||
|
TTGACA 60 0.73850
|
||
|
TCTTGA 54 0.66080
|
||
|
AAAAAA 90 0.64880
|
||
|
TATAAT 65 0.62510
|
||
|
ACTTTA 57 0.61960
|
||
|
TTTTTT 115 0.60630
|
||
|
AGTATA 51 0.60540
|
||
|
TTATAA 55 0.59300
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =50
|
||
|
? Minimum information (0.00-1.00) (0.00) =
|
||
|
|
||
|
X 1 Sort on information
|
||
|
2 Sort on word score
|
||
|
? 0,1,2 =
|
||
|
|
||
|
? Maximum number to list (0-100) (100) =
|
||
|
|
||
|
The words are
|
||
|
Total words= 8 Maximum information= 0.7385326
|
||
|
TTGACA 60 0.73850
|
||
|
TCTTGA 54 0.66080
|
||
|
AAAAAA 90 0.64880
|
||
|
TATAAT 65 0.62510
|
||
|
ACTTTA 57 0.61960
|
||
|
TTTTTT 115 0.60630
|
||
|
AGTATA 51 0.60540
|
||
|
TTATAA 55 0.59300
|
||
|
The highest word score = 115
|
||
|
? Minimum word score (0-115) (0) =!
|
||
|
|
||
|
@30. TX 3 @Examine words in Dm
|
||
|
|
||
|
Examine words in Dm allows users to analyse the contents of
|
||
|
dictonary Dm at the level of individual words to find their
|
||
|
frequency, information content, and to see their base frequency
|
||
|
table. The user types in a word to examine and the program displays
|
||
|
the values and table. The information content will be calcutated
|
||
|
from either Dw or Ds depending which was used to create Dm, and
|
||
|
using the current composition setting. Typical dialogue follows:
|
||
|
? Menu or option number=30
|
||
|
? Word to examine=TTGACA
|
||
|
TtgacA 60 0.7385326
|
||
|
56 56 6 7 5 11
|
||
|
4 3 2 1 52 1
|
||
|
1 4 2 53 3 48
|
||
|
3 1 54 3 4 4
|
||
|
TTGACA
|
||
|
? Word to examine=TATAAT
|
||
|
taTAat 65 0.6251902
|
||
|
56 3 53 4 4 60
|
||
|
6 1 5 5 5 3
|
||
|
3 60 5 57 57 4
|
||
|
4 5 6 3 3 2
|
||
|
TATAAT
|
||
|
? Word to examine=
|
||
|
|
||
|
@31. TX 3 @Examine words in Dh
|
||
|
|
||
|
Examine words in Dh allows users to analyse the contents of
|
||
|
dictonary Dh at the level of individual words to find their
|
||
|
frequency, information content, and to see their base frequency
|
||
|
table. The user types in a word to examine and the program displays
|
||
|
the values and table. The information content will be calcutated
|
||
|
from either Dw or Ds depending which was used to create Dm, and
|
||
|
using the current composition setting. Typical dialogue follows:
|
||
|
|
||
|
? Menu or option number=31
|
||
|
? Word to examine=TTGACA
|
||
|
TtgacA 60 0.7385326
|
||
|
56 56 6 7 5 11
|
||
|
4 3 2 1 52 1
|
||
|
1 4 2 53 3 48
|
||
|
3 1 54 3 4 4
|
||
|
TTGACA
|
||
|
? Word to examine=TATAAT
|
||
|
taTAat 65 0.6251902
|
||
|
56 3 53 4 4 60
|
||
|
6 1 5 5 5 3
|
||
|
3 60 5 57 57 4
|
||
|
4 5 6 3 3 2
|
||
|
TATAAT
|
||
|
? Word to examine=GGGGGG
|
||
|
gggggg 0 0.6199890
|
||
|
3 1 1 2 3 4
|
||
|
1 3 1 2 2 1
|
||
|
2 1 1 1 1 1
|
||
|
11 12 14 12 11 11
|
||
|
GGGGGG
|
||
|
? Word to examine=
|
||
|
|
||
|
@32. TX 3 @Save or restore a dictionary
|
||
|
|
||
|
Save or restore dictionary allows users to write or read any
|
||
|
dictionary to and from disk files. The user is asked te define the
|
||
|
dictionary and file. The function is useful if the machine being
|
||
|
used is very slow at calculating because the files can be handled
|
||
|
quickly. However note that the files cannot be processed by any
|
||
|
other program.
|
||
|
@33. TX 1 @Find inverted repeats
|
||
|
|
||
|
Find inverted repeats performs searches for simple inverted
|
||
|
repeat sequences in each sequence. They are defined by a range of
|
||
|
loop sizes and a minimum number of potential basepairs. The results
|
||
|
can be plotted or listed. The x axis of the plot represents the
|
||
|
length of the aligned sequences and the y direction is divided into
|
||
|
sufficient strips to accommodate each sequence. So if an inverted
|
||
|
repeat is found in the 3rd sequence at a position equivalent to
|
||
|
halfway along the longest of the sequences then a short vertical
|
||
|
line will be drawn at the midpoint of the 3rd strip. Alternatively,
|
||
|
if the results are listed, the potential hairpin loops are drawn
|
||
|
out, with the sequence number and the position of the loop. Typical
|
||
|
dialogue follows.
|
||
|
|
||
|
? Menu or option number=33
|
||
|
Define the range of loop sizes
|
||
|
? Minimum loop size (0-10) (3) =0
|
||
|
? Maximum loop size (1-20) (3) =
|
||
|
? Minimum number of basepairs (1-20) (6) =
|
||
|
? (y/n) (y) Plot results N
|
||
|
Searching
|
||
|
|
||
|
Sequence 3 34
|
||
|
C
|
||
|
G.T
|
||
|
T-A
|
||
|
A-T
|
||
|
T.G
|
||
|
T.G
|
||
|
G.T
|
||
|
ATCTTT TATTTCA
|
||
|
33
|
||
|
|
||
|
Sequence 5 35
|
||
|
T
|
||
|
G.T
|
||
|
T.G
|
||
|
A-T
|
||
|
T.G
|
||
|
G.T
|
||
|
C-G
|
||
|
T.G
|
||
|
TCCGGC AATTGTG
|
||
|
34
|
||
|
@ End of help
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|