staden-lg/help/MEP.RNO

.NPA
.SP 1
.left margin1
@-1. TX  0 @General
.sp
@-2. T   0 @Screen control
.sp
@-2. X   0 @Screen
.sp
@-3. TX  0 @Dictionary analysis
.sp
@0. TX  -1 @MEP
.left margin2
.para
This is a program  for analysing families of nucleotide sequences in order 
to find common motifs and potential binding sites.
The ideas in this program were described in Staden, R.  "Methods 
for discovering novel motifs in nucleic acid sequences". 
Computer Applications in the Biosciences, 5, 293-298, (1989).
.PARA
The program can read 
sequences stored in either of two formats: 1) all sequences aligned in a 
single file; 2) all sequences in separate files and accessed through a file 
of file names.
.PARA
The program contains functions that can answer several questions 
about a set of sequences:
.SK1
.left margin2
 Which words are most common?
.left margin2
 Which words occur in the most sequences?
.left margin2
 Which words contain the most information?
.left margin2
 Which words occur in equivalent positions in the sequences?
.left margin2
 Which words are inverted repeats?
.left margin2
 Which words occur on both strands of the sequences?
.left margin2
 Where are the inverted repeats?
.left margin2
 Where are the fuzzy words?
.para
 Most of the program is 
concerned with analysing 
what it terms "fuzzy 
words" within the set of sequences. The analysis is explained 
below. Note that the standard version of the program is limited 
to words of maximum length 8 letters, and a maximum fuzziness 
of 2.
.para
The following analyses (preceded by their option numbers) are included:
.lit
  ? = Help
  ! = Quit
  3 = Read new sequences
  4 = Redefine active region
  5 = List the sequences
  6 = List text file
  7 = Direct output to disk
 10 = Clear graphics
 11 = Clear text
 12 = Draw ruler
 13 = Use cross hair
 14 = Reset margins
 15 = Label diagram
 16 = Draw map
 17 = Search for strings
 18 = Set strand
 19 = Set composition
 20 = Set word length
 21 = Set number of mismatches
 22 = Show settings
 23 = Make dictionary Dw
 24 = Make dictionary Ds
 25 = Make fuzzy dictionary Dm from Dw
 26 = Make fuzzy dictionary Dm from Ds
 27 = Make fuzzy dictionary Dh from Dm
 28 = Examine fuzzy dictionary Dm
 29 = Examine fuzzy dictionary Dh
 30 = Examine words in Dm
 31 = Examine words in Dh
 32 = Save or restore a dictionary
 33 = Find inverted repeats
.end lit
.para
Some of these methods produce graphical results, and so the 
program is generally used from a graphics terminal (a VDU on which lines 
and points can be drawn as well as characters). 
.para
.LEFT MARGIN2
The position of each plot is defined relative to a user's drawing 
board, which has size 1-10,000 in x and 1-10,000 in y.
Plots for each option are drawn in a window defined by x0,y0 and 
xlength,ylength, where x0,y0 is the position of the bottom left hand 
corner of the window, xlength is the width of the window and ylength is 
its height.
.lit
   --------------------------------------------------------- 10,000
   1                                                       1
   1       --------------------------------------   ^      1
   1       1                                    1   1      1
   1       1                                    1   1      1
   1       1                                    1 ylength  1
   1       1                                    1   1      1
   1       1                                    1   1      1
   1       --------------------------------------   v      1
   1  x0,y0^                                               1
   1       <---------------xlength-------------->          1
   ---------------------------------------------------------      1
   1                                                   10,000

.end lit
All values are in drawing board units (i.e. 1-10,000, 1-10,000).
The default window positions are read from a file "MEPMARG" when the 
program is started. Users can have their own file if required.
.para
The options for the program are accessed from 3 main menus: general, screen 
control and dictionary analysis.
Both menus and options are selected by number.
.para
The most important and novel part of the program is its use of "fuzzy 
dictionaries" and an information theory measure, to help show the most 
interesting motifs.

  Central to the method is the idea of a fuzzy dictionary of word 
frequencies. A dictionary of word frequencies is an ordered list of 
all the words in the sequences and a count of the number of times 
that they occur. A fuzzy dictionary is an equivalent list but which 
contains instead, for each word, a count of the number of times 
similar words occur in the sequences. We term words that are 
similar "relations". The fuzziness is defined by the number of 
letters in a word that are allowed to be different. So if we had a 
fuzziness of 1 we allow 1 letter to be different. For example, with 
a fuzziness of 1, the entry in the fuzzy dictionary for the word 
TTTTTT would contain a count of the number of times TTTTTT 
occurred plus the number of times all words differing by exactly 
one letter from TTTTTT occurred.
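.para
As a minimal sketch of what "relations" means, the following Python 
fragment lists every word that differs from a given word in at most a 
chosen number of letters. The function name relations and the constant 
BASES are illustrative only and are not part of the program; the sketch 
assumes words contain only the letters T, C, A and G.
.lit

```python
from itertools import combinations, product

BASES = "TCAG"  # assumed alphabet; other symbols are not handled


def relations(word, fuzziness):
    """Return the set of words differing from `word` in 1..fuzziness letters."""
    found = set()
    for k in range(1, fuzziness + 1):
        for positions in combinations(range(len(word)), k):
            # At each chosen position try every base except the original one.
            choices = [[b for b in BASES if b != word[p]] for p in positions]
            for replacement in product(*choices):
                letters = list(word)
                for p, b in zip(positions, replacement):
                    letters[p] = b
                found.add("".join(letters))
    return found


# 6 positions x 3 alternative bases = 18 relations at a fuzziness of 1
print(len(relations("TTTTTT", 1)))
```

.end lit
With a fuzziness of 1 the entry for TTTTTT therefore sums the counts of 
TTTTTT itself and of its 18 relations.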
.para
   Once the fuzzy dictionary has been created we can examine it in 
several ways to find candidate control sequences. The simplest 
question we can ask is which word in the dictionary is the most 
common.  Sometimes this simple criterion of "most common" may 
be adequate to discover a new motif but in general we would not 
expect it to be sufficient. For example some words will be common 
simply because of a base composition bias in the sequences being 
analysed. In addition a word can be the most frequent and yet not 
be "well defined". This last point is best explained by an example.
.para
   Suppose we were looking at two letter words and allowing one 
mismatch, and that there were 10 occurrences of TT and 5 of AC. 
We could align the 10 words that were one letter different from TT 
and the 5 that were related to AC. Then we could count the 
number of times each base occurred in each position for each of 
these two sets of words. Suppose we got the two base frequency 
tables shown below.
.lit
   TT                  AC
       T 6 4               T 1 0
       C 1 3               C 0 4
       A 1 2               A 4 1
       G 2 1               G 0 0

.end lit
These tables show that although TT occurs (with one letter 
mismatch) more often than AC, the ratio of base frequencies for 
AC at 4/5, 4/5 is higher than those for TT at 6/10, 4/10. Hence we 
would say that AC was better defined than TT.
Expressing this another way we would say that the definition of AC 
contained more information than that for TT. The program 
calculates the information content in a way that takes into account 
both the sequence composition and the level of definition of the 
motif.
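.para
The comparison of the two tables can be reproduced with a short Python 
sketch. The first function computes the ratios quoted above. The second 
computes a generic Shannon measure (2 minus the entropy, in bits, summed 
over positions) that assumes an even base composition; it is only an 
illustration of "more information" and is NOT the formula used by the 
program, which is not given in this help text. All names are illustrative.
.lit

```python
from math import log2

# Base frequency tables from the text, one column per word position.
TT = {"T": [6, 4], "C": [1, 3], "A": [1, 2], "G": [2, 1]}
AC = {"T": [1, 0], "C": [0, 4], "A": [4, 1], "G": [0, 0]}


def columns(table):
    """Yield the four base counts for each word position in turn."""
    width = len(next(iter(table.values())))
    for i in range(width):
        yield [counts[i] for counts in table.values()]


def consensus_fractions(table):
    """Fraction of the most frequent base at each position."""
    return [max(col) / sum(col) for col in columns(table)]


def shannon_information(table):
    """Generic measure (an assumption): sum of (2 - entropy) bits per position."""
    total = 0.0
    for col in columns(table):
        n = sum(col)
        entropy = -sum((c / n) * log2(c / n) for c in col if c > 0)
        total += 2.0 - entropy
    return total


print(consensus_fractions(TT), consensus_fractions(AC))
print(shannon_information(AC) > shannon_information(TT))
```

.end lit
Under either view AC comes out as better defined than TT, although TT is 
the more frequent word.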
.para
Definitions

.para
Here we deal only with the dictionary analysis.
Suppose we are dealing with a set of 
sequences and are examining them for words that are six 
characters in length.

.para
Dictionary Dw contains a count of the number of times each word 
occurs in the set of sequences. For example the entry for TTTTTT 
contains a value equal to the number of times the word TTTTTT 
occurs in the set of sequences.

.para
Dictionary Ds contains a count of the number of different sequences in 
which each word occurs. For example if the entry for word TTTTTT 
contains the value 10, it denotes that the word TTTTTT occurs in ten 
different sequences. Unlike Dw it only counts words once for each 
sequence. For example if we had a set of 100 sequences, the maximum 
possible value that Ds could take is 100, and this would only happen if 
a word occurred in every sequence. However for the same set of 
sequences, Dw could contain values greater than 100, and this would 
show that a word had occurred more than once in at least one 
sequence.
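.para
The difference between Dw and Ds can be sketched in Python as follows. 
The function names are illustrative, and the sketch assumes that 
overlapping occurrences of a word are counted separately in Dw, which 
this help text does not state.
.lit

```python
from collections import Counter


def words(sequence, length):
    """Yield every word of the given length, stepping one base at a time."""
    return (sequence[i:i + length] for i in range(len(sequence) - length + 1))


def make_dw(sequences, length):
    """Count every occurrence of every word (assumes overlaps are counted)."""
    dw = Counter()
    for seq in sequences:
        dw.update(words(seq, length))
    return dw


def make_ds(sequences, length):
    """Count the number of different sequences containing each word."""
    ds = Counter()
    for seq in sequences:
        ds.update(set(words(seq, length)))  # each word at most once per sequence
    return ds


seqs = ["TTTTTTT", "ACTTTTTTG", "ACGTACGT"]
dw = make_dw(seqs, 6)
ds = make_ds(seqs, 6)
print(dw["TTTTTT"], ds["TTTTTT"])  # 3 2
```

.end lit
Here TTTTTT occurs twice in the first sequence and once in the second, 
so its Dw entry is 3 while its Ds entry is 2.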

.para
From either of the two dictionaries Dw or Ds we can calculate a fuzzy 
dictionary Dm. For each word, the entry in the fuzzy dictionary Dm 
contains the sum of the dictionary values (taken from either Dw or Ds) 
for all words that differ from it by up to m letters. For example if m=2 
the entry for TTTTTT contains the number of times that TTTTTT 
occurs in the dictionary, plus the counts for all words that differ from 
TTTTTT by 1 or 2 letters. 
Obviously the interpretation of the values in Dm depends on which of 
the two dictionaries Dw or Ds they were derived from. When derived 
from Dw the entry for any word in Dm gives the total number of 
times it, and its relations, occur in the set of sequences. When derived 
from Ds the entry for any word in Dm gives the total number of 
different sequences that contain a word and each of its relations.
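.para
A minimal Python sketch of this calculation is given below. It takes any 
dictionary of word counts (Dw or Ds) and, for each word, sums the counts 
of all words within m mismatches. The names are illustrative, and the 
sketch only fills in entries for words already present in the input 
dictionary; the program itself may well treat every possible word.
.lit

```python
def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def make_dm(dictionary, m, candidates=None):
    """Fuzzy dictionary: sum of counts of all words within m mismatches."""
    if candidates is None:
        candidates = list(dictionary)  # assumption: only words already seen
    return {
        word: sum(n for other, n in dictionary.items()
                  if mismatches(word, other) <= m)
        for word in candidates
    }


# Hypothetical counts, for illustration only.
dw = {"TTTTTT": 4, "TTTTTA": 2, "TTTTAA": 1, "ACGTAC": 5}
dm = make_dm(dw, 1)
print(dm["TTTTTT"])  # 4 + 2 = 6; TTTTAA differs by two letters
```

.end lit
Raising m to 2 would bring TTTTAA into the sum for TTTTTT, giving 7.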

.para
Finally, from fuzzy dictionary Dm we can derive fuzzy dictionary Dh. 
All entries in Dh are zero except for the word(s), within each set of 
relations, that are most frequent. For example if TTTTTT occurred 20 
times but had a relation that occurred more often, then the entry for 
TTTTTT would be zero. However if TTTTTT did not have a more 
frequently occurring relation, then the entry for TTTTTT would 
contain the value 20. 
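.para
One possible reading of this rule is sketched below in Python: a word 
keeps its Dm value unless some relation (a different word within m 
mismatches) has a strictly higher Dm value. This interpretation, and the 
function names, are assumptions made for illustration.
.lit

```python
def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def make_dh(dm, m):
    """Keep the Dm value of a word only if no relation has a higher value."""
    dh = {}
    for word, count in dm.items():
        beaten = any(
            other != word and mismatches(word, other) <= m and n > count
            for other, n in dm.items()
        )
        dh[word] = 0 if beaten else count
    return dh


# Hypothetical Dm values, for illustration only.
print(make_dh({"TTTTTT": 20, "TTTTTA": 25, "ACGTAC": 5}, 1))
print(make_dh({"TTTTTT": 20, "TTTTTA": 12}, 1))
```

.end lit
In the first call TTTTTT is set to zero because its relation TTTTTA 
scores 25; in the second it keeps the value 20.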

.LEFT MARGIN1
@1. T 0 @Help
.LEFT MARGIN2
.para
This option gives online help. The user should select option numbers and
the current documentation will be given. Note that option 0 gives an
introduction to the program, and that ? will get help from anywhere in 
the program.
The following analyses (preceded by their option numbers) are included:
.lit
  ? = Help
  ! = Quit
  3 = Read new sequences
  4 = Redefine active region
  5 = List the sequences
  6 = List text file
  7 = Direct output to disk
 10 = Clear graphics
 11 = Clear text
 12 = Draw ruler
 13 = Use cross hair
 14 = Reset margins
 15 = Label diagram
 16 = Draw map
 17 = Search for strings
 18 = Set strand
 19 = Set composition
 20 = Set word length
 21 = Set number of mismatches
 22 = Show settings
 23 = Make dictionary Dw
 24 = Make dictionary Ds
 25 = Make fuzzy dictionary Dm from Dw
 26 = Make fuzzy dictionary Dm from Ds
 27 = Make fuzzy dictionary Dh from Dm
 28 = Examine fuzzy dictionary Dm
 29 = Examine fuzzy dictionary Dh
 30 = Examine words in Dm
 31 = Examine words in Dh
 32 = Save or restore a dictionary
 33 = Find inverted repeats
.end lit
.left margin1
@2. T 0 @Quit
.left margin2
.para
This function stops the program.
.left margin1
@3. TX 1 @Read a new sequence
.LEFT MARGIN2
.para
The program can read 
sequences stored in either of two formats: 1) all sequences aligned in a 
single file; 2) all sequences in separate files and accessed through a file 
of file names. Typical dialogue follows:
.lit
 
X 1 Read file of aligned sequences
  2 Use file of file names
? 0,1,2 =
 
? File of aligned sequences=F1
Number of files           88

.end lit
.left margin1
@4. TX 1 @Define active region
.LEFT MARGIN2
.para
For its analytic functions 
the program always works on a region of the sequence called the active 
region. When new sequences are read into the program the active region is 
automatically set to start at the beginning of the sequences and go
up to the end of the longest one.
.left margin1
@5. TX 1 @List a sequence
.LEFT MARGIN2
.para
The sequences are listed with line lengths of 50 bases, with each sequence 
numbered in the order in which it was read.
Output can be directed to a disk file by 
first selecting disk output. Typical dialogue follows.
.lit

? Menu or option number=5

              10        20        30        40        50
   1  TAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCA
   2  CAAATAATCAATGTGGACTTTTCTGCCGTGATTATAGACACTTTTGTTAC
   3  TAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTATT
   4  ACTAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTA
   5  AGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGA
   6  TAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGC
   7  ACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCG
   8  GGGGCAAGGAGGATGGAAAGAGGTTGCCGTATAAAGAAACTAGAGTCCGT
   9  AGGGGGTGGAGGATTTAAGCCATCTCCTGATGACGCATAGTCAGCCCATC
  10  AAAACGTCATCGCTTGCATTAGAAAGGTTTCTGGCCGACCTTATAACCAT

              60
   1  TACCCGTTTTT
   2  GCGTTTTTGT
   3  TCATACCATAAG
   4  TTTCATACC
   5  ATTGTGAGC
   6  TTCCGGCTCG
   7  GAAGAGAGT
   8  TCAGGTGT
   9  ATGAATG
  10  TAATTACG
.end lit
.left margin1
@6. TX 1 @List a text file
.LEFT MARGIN2
.para
Allows the user to have a text file displayed on the screen. It will appear 
one page at a time.
.left margin1
@7. TX 1 @Direct output to disk
.LEFT MARGIN2
.para
Used to direct output that would normally appear on the screen to a file. 
.para
Select redirection of either text or graphics, and 
supply the name of the file that the output should be written to.
.para
 The results from the next options selected will not appear on the screen 
but will be written to the file. When option 7 is selected again
the file will be 
closed and output will again appear on the screen.
.left margin1
@10. TX 2 @Clear graphics
.LEFT MARGIN2
.para
 Clears the screen of both text and graphics.
.left margin1
@11. TX 2 @Clear text
.LEFT MARGIN2
.para
 Clears only text from the screen.
.left margin1
@12. TX 2 @Draw a ruler
.LEFT MARGIN2
.para
This option
allows the user to draw a ruler or scale along the x axis of the screen to 
help identify the coordinates of points of interest. The user can define 
the position of the first base to be marked (for example if the active 
region is 1501 to 8000, the user might wish to mark every 1000th base 
starting at either 1501 or 2000; it depends whether the user wishes to 
treat the active region as an independent unit with its own numbering 
starting at its left edge, or as part of the whole sequence). The user 
can also define the separation of the ticks on the scale and their 
height. If required the labelling routine can be used to add numbers to 
the ticks.
.left margin1
@13. TX 2 @Use crosshair
.LEFT MARGIN2
.para
This function puts
a steerable cross on the screen that can be used to find the 
coordinates of points in the sequence. The user can move the cross 
around using the directional keys; when the space bar is pressed the 
program will print out the coordinates of the cross in sequence units and 
the option will be exited.
.para
If a comma (,) is typed instead, the position will be displayed but the 
cross will remain on the screen.
.para
If a letter s is hit the sequence around the cross hair is displayed and 
the cross remains on the screen.
.left margin1
@14. TX 2 @Reposition plots
.LEFT MARGIN2
.para
The position of each plot is defined relative to a user's drawing 
board, which has size 1-10,000 in x and 1-10,000 in y.
Plots for each option are drawn in a window defined by x0,y0 and 
xlength,ylength, where x0,y0 is the position of the bottom left hand 
corner of the window, xlength is the width of the window and ylength is 
its height.
.lit
   --------------------------------------------------------- 10,000
   1                                                       1
   1       --------------------------------------   ^      1
   1       1                                    1   1      1
   1       1                                    1   1      1
   1       1                                    1 ylength  1
   1       1                                    1   1      1
   1       1                                    1   1      1
   1       --------------------------------------   v      1
   1  x0,y0^                                               1
   1       <---------------xlength-------------->          1
   ---------------------------------------------------------      1
   1                                                   10,000

.end lit
All values are in drawing board units (i.e. 1-10,000, 1-10,000).
The default window positions are read from a file "MEPMARG" when the 
program is started. Users can have their own file if required.
As all the plots start 
at the same position in x and have the same width, x0 and xlength are the 
same for all options. Generally users will only want to change the start 
level of the window y0 and its height ylength. 
This option 
allows users to change window positions whilst running the program.
The routine prompts first for the number of the option that the user 
wishes to reposition; then for the y start and height; then for the x 
start and length. Note that changes to the x values affect all options. 
If the user types only carriage return for any value it will remain 
unchanged. 
The cross-hair can be used to choose suitable heights.
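.para
The window arithmetic can be sketched in Python as follows. The function 
names and the example values are hypothetical (they are not the defaults 
held in "MEPMARG"), and the linear mapping of sequence positions onto the 
window width is an assumption about how plots are scaled.
.lit

```python
BOARD_MIN, BOARD_MAX = 1, 10000  # drawing board limits given in the text


def window_corners(x0, y0, xlength, ylength):
    """Return (left, bottom, right, top) after checking the board limits."""
    corners = (x0, y0, x0 + xlength, y0 + ylength)
    if min(corners) < BOARD_MIN or max(corners) > BOARD_MAX:
        raise ValueError("window does not fit on the drawing board")
    return corners


def to_board_x(position, region_start, region_end, x0, xlength):
    """Map a sequence position linearly onto the window width (assumed)."""
    span = max(region_end - region_start, 1)
    return x0 + (position - region_start) * xlength / span


# Hypothetical window: x0=1000, y0=2000, xlength=8000, ylength=1500
print(window_corners(1000, 2000, 8000, 1500))
# Position 32 of an active region 1 to 63 falls at the window midpoint.
print(to_board_x(32, 1, 63, 1000, 8000))  # 5000.0
```

.end lit
Because x0 and xlength are shared by all options, only y0 and ylength 
would normally differ from one window to the next.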
.LEFT MARGIN1
@15. TX 2 @Label a diagram
.LEFT MARGIN2
.para
This routine allows users to label any diagrams they have produced. They 
are asked to type in a label. When the user types carriage return to finish 
typing the label the cross-hair appears on the screen. The user can 
position it anywhere on the screen. If the user types R (for right justify)
 the label will be 
written on the diagram with its right end at the cross-hair position. 
If the user types L (for left justify) the label will be written on the 
diagram with its left end at the cross hair position.
The 
cross-hair will then immediately reappear. The user may put the same 
label on another part of the diagram as before, or press the space bar 
to be asked whether another label should be typed in.
.left margin1
@16. TX 2 @Display a map
.LEFT MARGIN2
.para
It is often convenient to plot a map alongside graphed analysis in order 
to indicate features within the sequence. This function allows users to 
draw maps using files arranged in the form of EMBL feature tables. Of 
course the EMBL tables are usually only used for nucleic acid sequence 
annotation but, as long as the features are written in the correct format, 
they can be employed by this routine. The map is composed of a line 
representing the sequence and then further lines denoting the endpoints of 
each feature the user identifies. The user is asked to define the height 
at which the line representing the sequence should be drawn; then for the 
feature height; then for the features to plot.
.left margin1
@17. TX 1 @Search for strings
.left margin2
.para
Search for strings
performs searches of all the sequences for selected words and 
shows which sequences they are found in. The user types in a word and 
defines the allowed number of mismatches. The results are listed or 
plotted. If listed the display includes the sequence number, the position 
in the sequence and the matching string.
The results are plotted in the 
following way. The x axis of the plot represents the length of the aligned
sequences and the y direction is divided into sufficient strips to accommodate
each sequence. So if a match is found in the 3rd sequence at a position
equivalent to halfway along the longest of the sequences then a short 
vertical line will be drawn at the midpoint of the 3rd strip. If the sequences
are aligned, the plot can show whether the motifs appear in 
related positions; for an example see the original publication. Typical 
dialogue follows.
.lit

? Menu or option number=17
X 1 Plot match positions
  2 Plot histogram of matches
? 0,1,2 =
? Word to search for=TTGACA
? Minimum match (0-6) (6) =5
? (y/n) (y) Plot results N
     2    35 TAGACA
     5    14 TTTACA
     6    37 TTTACA
    11    14 TAGACA
    14    14 TTGACA
    17    14 GTGACA
    17    22 TTAACA
    20     1 TTGACA
.end lit
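.para
A search of this kind can be sketched in Python as follows. Every window 
of the word's length is compared with the word, and windows reaching the 
minimum match are reported with the sequence number, the position 
(counted from 1) and the matching string. The function name and the 
three short test sequences are invented for illustration.
.lit

```python
def search(sequences, word, min_match):
    """Return (sequence number, position, string) for each sufficient match."""
    hits = []
    for number, seq in enumerate(sequences, start=1):
        for i in range(len(seq) - len(word) + 1):
            window = seq[i:i + len(word)]
            score = sum(a == b for a, b in zip(word, window))
            if score >= min_match:
                hits.append((number, i + 1, window))
    return hits


# Invented sequences, for illustration only.
seqs = ["GGTTGACAGG", "CCTAGACACC", "AAAAAAAAAA"]
for number, position, string in search(seqs, "TTGACA", 5):
    print(f"{number:6d}{position:6d} {string}")
```

.end lit
With a minimum match of 5 both TTGACA in the first sequence and TAGACA in 
the second are reported; with a minimum match of 6 only the exact 
occurrence remains.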
.left margin1
@18. TX 3 @Set strand
.left margin2
.para
Set strand allows the user to define which strand(s) of the sequences to 
analyse: input strand, complement of input, or both.
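.para
As a minimal sketch, the three choices could be expressed in Python as 
below. It assumes that "complement of input" means the reverse 
complement, that is the other strand read in its own 5' to 3' direction; 
the function names and the choice keywords are illustrative.
.lit

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Complement each base and reverse the order."""
    return sequence.translate(COMPLEMENT)[::-1]


def strands_to_analyse(sequence, choice):
    """Return the strand(s) selected by 'input', 'complement' or 'both'."""
    if choice == "input":
        return [sequence]
    if choice == "complement":
        return [reverse_complement(sequence)]
    if choice == "both":
        return [sequence, reverse_complement(sequence)]
    raise ValueError("choice must be 'input', 'complement' or 'both'")


print(strands_to_analyse("TTGACA", "both"))  # ['TTGACA', 'TGTCAA']
```

.end lit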
.left margin1
@19. TX 3 @Set composition
.left margin2
.para
Set composition gives the user three choices for setting the composition 
of the sequences for use in the calculation of the information content of
words. The user can select the overall composition of the sequences as read,
an even composition, or can type in any other 4 values.
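.para
The three choices can be sketched in Python as follows. The names are 
illustrative, and the rescaling of typed values so that they sum to 1 is 
an assumption; the program may expect the values in some other form.
.lit

```python
from collections import Counter

BASES = "TCAG"


def normalise(values):
    """Scale four base values so that they sum to 1."""
    total = sum(values[base] for base in BASES)
    if total <= 0:
        raise ValueError("composition values must sum to a positive number")
    return {base: values[base] / total for base in BASES}


def observed_composition(sequences):
    """Overall composition of the sequences as read (other symbols ignored)."""
    counts = Counter(b for seq in sequences for b in seq if b in BASES)
    return normalise(counts)


EVEN = {base: 0.25 for base in BASES}                       # even composition
typed = normalise({"T": 3, "C": 2, "A": 3, "G": 2})         # hypothetical input
print(observed_composition(["TTTT", "ACGT"]))
```

.end lit
For the two short sequences shown, T makes up 5 of the 8 bases and each 
of the others 1 of 8.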
.left margin1
@20. TX 3 @Set word length
.left margin2
.para
Set word length sets the length of word for which dictionaries will be made.
.left margin1
@21. TX 3 @Set number of mismatches
.left margin2
.para
Set number of mismatches sets the level of fuzziness for the creation of 
dictionary Dm. 
.left margin1
@22. TX 3 @Show settings
.left margin2
.para
Show settings shows the current settings for all parameters associated with 
dictionary analysis. A typical display follows:
.lit
 ? Menu or option number=22
 Current word length  =   6
 Number of mismatches =   1
 Start position       =     1
 End position         =    63
 Input strand only
 Observed composition
 Dictionary Dw unmade
 Dictionary Ds unmade
 Dictionary Dm unmade
 Dictionary Dh unmade
.end lit
.left margin1
@23. TX 3 @Make dictionary Dw
.left margin2
.para
Make dictionary Dw creates a dictionary that contains a count  of the
frequency of occurrence of each word in the collected sequences.
.left margin1
@24. TX 3 @Make dictionary Ds
.left margin2
.para
Make dictionary Ds creates a dictionary that contains a count of the
number of different sequences that contain each word.
.left margin1
@25. TX 3 @Make dictionary Dm from Dw
.left margin2
.para
Make dictionary Dm  from Dw creates a dictionary from dictionary Dw that
contains the frequency of occurrence of each word (say X) in Dw plus the
frequency of occurrence of each word in Dw that differs from X by up to m 
letters. Dm is called a fuzzy dictionary as it contains the frequencies of
occurrence of all words plus the frequencies of all the words that are 
similar to them.
.left margin1
@26. TX 3 @Make dictionary Dm from Ds
.left margin2
.para
Make dictionary Dm  from Ds creates a dictionary from dictionary Ds that
contains the frequency of occurrence of each word (say X) in Ds plus the
frequency of occurrence of each word in Ds that differs from X by up to m 
letters. Dm is called a fuzzy dictionary as it contains the frequencies of
occurrence of all words plus the frequencies of all the words that are 
similar to them.
.left margin1
@27. TX 3 @Make dictionary Dh from Dm
.left margin2
.para
Make dictionary Dh creates a dictionary from dictionary Dm whose
entries are zero except for those words in any set of related words that
are most frequent. It finds the dominant words in each set of relations 
and stores their counts.
.left margin1
@28. TX 3 @Examine fuzzy dictionary Dm
.left margin2
.para
Examine dictionary Dm allows users to analyse the contents of dictionary
Dm to find the most common words or those words that contain the most 
information. The user supplies a frequency or information cutoff and chooses
to have the results sorted on either value. The program will find the top 100
words that achieve the cutoff values and present them to the user sorted
as selected. The information content will be calculated from either Dw or Ds, 
depending on which was used to create Dm, and using the current composition 
setting. Typical dialogue follows:
.lit

? Menu or option number=28
Looking for highest scoring words
The highest word score =          115
? Minimum word score (0-115) (0) =60
? Minimum information (0.00-1.00) (0.00) =.62
X 1 Sort on information
  2 Sort on word score
? 0,1,2 =
 
? Maximum number to list (0-100) (100) =
 
The words are
 Total words=           9 Maximum information=  0.7385326
TTGACA      60   0.73850
AAAAAC      64   0.66460
AAAAAA      90   0.64880
GTTTTT      66   0.64300
TTTTTG      73   0.64070
TTTTGT      63   0.63820
TTTTTC      65   0.63810
AAAATA      63   0.62670
TATAAT      65   0.62510
The highest word score =          115
? Minimum word score (0-115) (0) =60
? Minimum information (0.00-1.00) (0.00) =.62
X 1 Sort on information
  2 Sort on word score
? 0,1,2 =2
? Maximum number to list (0-100) (100) =
 
The words are
 Total words=           9 Maximum information=  0.7385326
AAAAAA      90   0.64880
TTTTTG      73   0.64070
GTTTTT      66   0.64300
TTTTTC      65   0.63810
TATAAT      65   0.62510
AAAAAC      64   0.66460
TTTTGT      63   0.63820
AAAATA      63   0.62670
TTGACA      60   0.73850
The highest word score =          115
? Minimum word score (0-115) (0) =!

.end lit
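.para
The cutoff and sorting step can be sketched in Python using the nine 
words listed in the dialogue above. The function name is illustrative, 
and breaking ties on the other value is an assumption; it happens to 
reproduce the order of both listings above but is not documented here.
.lit

```python
# (word, word score, information) as listed in the dialogue above.
ENTRIES = [
    ("TTGACA", 60, 0.7385), ("AAAAAC", 64, 0.6646), ("AAAAAA", 90, 0.6488),
    ("GTTTTT", 66, 0.6430), ("TTTTTG", 73, 0.6407), ("TTTTGT", 63, 0.6382),
    ("TTTTTC", 65, 0.6381), ("AAAATA", 63, 0.6267), ("TATAAT", 65, 0.6251),
]


def examine(entries, min_score, min_info, sort_on, limit=100):
    """Apply both cutoffs, sort in descending order, and truncate the list."""
    kept = [e for e in entries if e[1] >= min_score and e[2] >= min_info]
    if sort_on == "information":
        kept.sort(key=lambda e: (-e[2], -e[1]))  # tie-break is an assumption
    else:
        kept.sort(key=lambda e: (-e[1], -e[2]))  # tie-break is an assumption
    return kept[:limit]


for word, score, info in examine(ENTRIES, 60, 0.62, "score"):
    print(f"{word}  {score:6d}   {info:.5f}")
```

.end lit
Sorting on information puts TTGACA first, whereas sorting on word score 
puts AAAAAA first and TTGACA last.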
.left margin1
@29. TX 3 @Examine fuzzy dictionary Dh
.left margin2
.para
Examine dictionary Dh allows users to analyse the contents of dictionary  Dh
to find the most common words or those words that contain the most 
information. The user supplies a frequency or information cutoff and chooses 
to have the results sorted on either value. The program will find the top 100
words that achieve the cutoff values and present them to the user sorted as
selected. The information content will be calculated from either Dw or Ds, 
depending on which was used to create Dh, and using the current composition 
setting. Typical dialogue follows:
.lit

? Menu or option number=29
Looking for highest scoring words
The highest word score =          115
? Minimum word score (0-115) (0) =60
? Minimum information (0.00-1.00) (0.00) =.6
X 1 Sort on information
  2 Sort on word score
? 0,1,2 =
 
? Maximum number to list (0-100) (100) =
 
The words are
 Total words=           4 Maximum information=  0.7385326
TTGACA      60   0.73850
AAAAAA      90   0.64880
TATAAT      65   0.62510
TTTTTT     115   0.60630
The highest word score =          115
? Minimum word score (0-115) (0) =50
? Minimum information (0.00-1.00) (0.00) =.5
X 1 Sort on information
  2 Sort on word score
? 0,1,2 =
 
? Maximum number to list (0-100) (100) =
 
The words are
 Total words=           8 Maximum information=  0.7385326
TTGACA      60   0.73850
TCTTGA      54   0.66080
AAAAAA      90   0.64880
TATAAT      65   0.62510
ACTTTA      57   0.61960
TTTTTT     115   0.60630
AGTATA      51   0.60540
TTATAA      55   0.59300
The highest word score =          115
? Minimum word score (0-115) (0) =50
? Minimum information (0.00-1.00) (0.00) =
 
X 1 Sort on information
  2 Sort on word score
? 0,1,2 =
 
? Maximum number to list (0-100) (100) =
 
The words are
 Total words=           8 Maximum information=  0.7385326
TTGACA      60   0.73850
TCTTGA      54   0.66080
AAAAAA      90   0.64880
TATAAT      65   0.62510
ACTTTA      57   0.61960
TTTTTT     115   0.60630
AGTATA      51   0.60540
TTATAA      55   0.59300
The highest word score =          115
? Minimum word score (0-115) (0) =!

.end lit
.left margin1
@30. TX 3 @Examine words in Dm
.left margin2
.para
Examine words in Dm allows users to analyse the contents of dictionary Dm at the
level of individual words to find their frequency, information content, and to
see their base frequency table. The user types in a word to examine and the
program displays the values and table. The information content will be 
calculated from either Dw or Ds, depending on which was used to create Dm,
and using the current composition setting. Typical dialogue follows:
.lit
? Menu or option number=30
? Word to examine=TTGACA
TtgacA            60  0.7385326
    56    56     6     7     5    11
     4     3     2     1    52     1
     1     4     2    53     3    48
     3     1    54     3     4     4
TTGACA
? Word to examine=TATAAT
taTAat            65  0.6251902
    56     3    53     4     4    60
     6     1     5     5     5     3
     3    60     5    57    57     4
     4     5     6     3     3     2
TATAAT
? Word to examine=

.end lit
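.para
A base frequency table of this shape can be built from a list of aligned 
occurrences of a word and its relations, as in the Python sketch below. 
The row order T, C, A, G is inferred from the tables shown in this help 
file; the function names and the four example occurrences are invented.
.lit

```python
BASES = "TCAG"  # row order inferred from the tables in this help file


def base_frequency_table(occurrences):
    """Count each base at each word position; rows are T, C, A, G."""
    width = len(occurrences[0])
    table = {base: [0] * width for base in BASES}
    for word in occurrences:
        for i, base in enumerate(word):
            if base in table:  # symbols other than T, C, A, G are skipped
                table[base][i] += 1
    return table


def format_table(table):
    """Lay the table out with one row per base and one column per position."""
    return "\n".join(
        " ".join(f"{count:5d}" for count in table[base]) for base in BASES
    )


# Invented occurrences of TTGACA and relations, for illustration only.
table = base_frequency_table(["TTGACA", "TAGACA", "TTTACA", "TTGACA"])
print(format_table(table))
```

.end lit
Each column of the table sums to the number of occurrences supplied, here 
four.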
.left margin1
@31. TX 3 @Examine words in Dh
.left margin2
.para
Examine words in Dh allows users to analyse the contents of dictionary Dh at the
level of individual words to find their frequency, information content, and to
see their base frequency table. The user types in a word to examine and the
program displays the values and table. The information content will be 
calculated from either Dw or Ds, depending on which was used to create Dm,
and using the current composition setting. Typical dialogue follows:
.lit

 ? Menu or option number=31
? Word to examine=TTGACA
TtgacA            60  0.7385326
    56    56     6     7     5    11
     4     3     2     1    52     1
     1     4     2    53     3    48
     3     1    54     3     4     4
TTGACA
? Word to examine=TATAAT
taTAat            65  0.6251902
    56     3    53     4     4    60
     6     1     5     5     5     3
     3    60     5    57    57     4
     4     5     6     3     3     2
TATAAT
? Word to examine=GGGGGG
gggggg             0  0.6199890
     3     1     1     2     3     4
     1     3     1     2     2     1
     2     1     1     1     1     1
    11    12    14    12    11    11
GGGGGG
? Word to examine=

.end lit
.left margin1
@32. TX 3 @Save or restore a dictionary
.left margin2
.para
Save or restore dictionary allows users to write or read any dictionary to 
and from disk files. The user is asked to define the dictionary and file. The
function is useful if the machine being used is very slow at calculating 
because the files can be handled quickly. However note that the files 
cannot be processed by any other program.
.left margin1
@33. TX 1 @Find inverted repeats
.left margin2
.para
Find inverted repeats performs searches for simple inverted repeat sequences 
in each sequence. They are defined by a range of loop sizes and a minimum 
number of potential basepairs. The results can be plotted or listed. The x 
axis of the plot represents the length of the aligned sequences and the y 
direction is divided into sufficient strips to accommodate each sequence. 
So if an inverted repeat is found in the 3rd sequence at a position equivalent
to halfway along the longest of the sequences then a short vertical line will 
be drawn at the midpoint of the 3rd strip. Alternatively, if the results are
listed, the potential hairpin loops are drawn out, with the sequence number 
and the position of the loop. Typical dialogue follows.
.lit

? Menu or option number=33
Define the range of loop sizes
? Minimum loop size (0-10) (3) =0
? Maximum loop size (1-20) (3) =
? Minimum number of basepairs (1-20) (6) =
? (y/n) (y) Plot results N
 Searching

Sequence     3    34
           C       
          G.T      
          T-A      
          A-T      
          T.G      
          T.G      
          G.T      
     ATCTTT TATTTCA
         33

Sequence     5    35
           T       
          G.T      
          T.G      
          A-T      
          T.G      
          G.T      
          C-G      
          T.G      
     TCCGGC AATTGTG
         34
.end lit
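.para
A simple search of this kind can be sketched in Python as follows. For 
each loop size in the range, and each possible loop position, the stem is 
extended outwards from the loop while the facing bases can pair. G-T 
pairs are allowed here because the listing above marks them, but the 
exact pairing rules of the program, the function name and the example 
sequence are assumptions made for illustration.
.lit

```python
# Allowed pairs, written as left base + right base (G-T allowed, assumed).
PAIRS = {"AT", "TA", "GC", "CG", "GT", "TG"}


def find_inverted_repeats(seq, min_loop, max_loop, min_pairs):
    """Return (start of left arm counted from 1, loop size, number of pairs)."""
    hits = []
    n = len(seq)
    for loop in range(min_loop, max_loop + 1):
        for left in range(n):        # index of the base just before the loop
            right = left + loop + 1  # index of the base just after the loop
            pairs = 0
            while (left - pairs >= 0 and right + pairs < n
                   and seq[left - pairs] + seq[right + pairs] in PAIRS):
                pairs += 1           # extend the stem outwards by one pair
            if pairs >= min_pairs:
                hits.append((left - pairs + 2, loop, pairs))
    return hits


# Invented sequence: arms GGACG and CGTCC around a loop of TTT.
print(find_inverted_repeats("AAGGACGTTTCGTCCAA", 3, 3, 5))  # [(3, 3, 5)]
```

.end lit
The single hit starts at base 3, has a loop of 3 bases and a stem of 5 
potential basepairs; asking for a minimum of 6 basepairs finds nothing.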
.left margin1
@ End of help