gde_linux/ZUKER/mfold.user
2022-03-07 20:43:05 +00:00

358 lines
13 KiB
Text
Executable file

Sub-optimal RNA Folding Program Users Manual
--------------------------------------------
Michael Zuker, Eric Nelson and John Jaeger
Start:
Initially, the following menu appears
Enter run type
0 Regular run (default)
1 Save run
2 Continuation run
In a regular run the program takes an RNA sequence as input, computes
the energy matrix for the molecule, and produces various foldings as
output. Since the computation of the energy matrix uses a great deal
of time and resourses, the matrix can be saved before any output is
generated (a save run) and later used to produce output (a
continuation run).
Regular or Continuation run -> step b
Step a:
At this point a prompt will appear asking for the name of the file
into which the save matrix can be stored.
-> step f
Step b:
The following menu will be displayed
Enter run mode
0 Sub-optimal plot (default)
1 N best
2 Multiple molecules
If the program is run in 'Sub-optimal plot' ("dot plot") mode, the
energy matrix will be displayed graphically after it is computed. In
'N-best' mode the program will generate the suboptimal foldings
within a certain percentage of the minimum energy. If
'Multiple molecules' ("multi") mode is chosen the program will
run the N-best mode with every complete sequence in a file. This last
option MUST be done in a regular run mode.
N best or multi mode -> step d
Step c:
A prompt for the minimum number of points 'in a row' that will appear
on the energy dot plot. Helices that are smaller than this number
will not appear on the dot plot.
-> step e
Step d:
Two prompts asking for values of 'N-best' parameters now appear: the
percentage above the optimal energy which foldings must be within, and
N.
Step e:
A prompt for the window parameter. The distance between any pair of
computed foldings must be more than window. A simpler distance
function is defined in:
1. Zuker M
On Finding All Suboptimal Foldings of an RNA Molecule.
Science, 244, 48-52, (1989)
2. Zuker M
The Use of Dynamic Programming Algorithms in RNA Secondary
Structure Prediction.
in "Mathematical Methods for DNA Sequences", M. S. Waterman ed.
CRC PRESS, INC., 159-184, (1989)
The new definition of distance requires that any two computed
foldings must contain more than 'window' base pairs that are in one
folding and not in the other.
Continuation run -> step h
Step f:
At this point a prompt for the name of a file containing one or more
sequences (in Stanford, Genbank, EMBL, PIR, or NRC format) will
appear. If the program is being run in 'multi' mode all of the
sequences in the file will be folded, otherwise the program will ask
for a selection from the file's contents (a portion of a sequence).
Sequence data must be in upper case. The program recognizes A, C, G,
and T or U. The characters B, Z, H, and V or W are recognized as A,
C, G, and T or U respectively; but they are flagged by the program as
being accessible to nuclease cleavage. A flagged base can pair only
if its 3' neighbor is single stranded.
Step g:
Six files containing energy information are needed to run the
program, and the names of these files are now requested. The
default energy files are organized as follows:
dangle.dat - single base stacking
loop.dat - hairpin, bulge and interior loops
stack.dat - base pair stacking energies
tstack.dat - stacking energies for terminal mismatched pairs in
interior and hairpin loops
tloop.dat - a list of distinguished tetra-loops and the bonus eneries
given to them. If you do not want to use this file, create
a dummy file containing a few blank lines and use it instead.
miscloop.dat - some miscellaneous energies (see files.list).
These files can be replaced by dangle.025, loop.025, stack.025 etc.
for folding at (for example) 25 deg.
-> step i
Step h:
For a continuation run, a file previously created by a save run needs
to be read in at this point. A prompt will appear asking for
identification of this file. After the file is read, the energy rules
and parameters used during the save run are output either to a file or
the screen.
Step i:
Three different types of folding output formats can be produced:
printer (which shows the secondary structure in a rough, but directly
readable format), ct file, and Region table (both ct files and region
tables can be used as input to certain other programs). Prompts will
appear asking which types of output need to be produced.
Step j:
Main menu (see apendix A)
Save run -> halt
N-best and multi mode -> produce folding output
Step k:
Enter Dotplot section (see appendix B)
Appendix A
Main Menu
The following menu will appear:
1 Energy Parameter 6 Single Prohibit
2 Single Force 7 Double Prohibit
3 Double Force 8 Begin Folding
4 Closed Excision 9 Show current
5 Open Excision 10 Clear current
Selections 2 through 7 provide a way for the user to directly alter
the possible secondary structure by forcing or prohibiting particular
base-pairs. Each time one of these parameters is chosen, it is added
to a list held in memory - selection 8 will print the list and 10 will
erase the list. If '8' is chosen from the menu the program will
continue past this section.
NB : Options 2 and 3 force base pairs to occur.
Base pairs are forced by giving them a bonus energy (EPARAM(9) in the
program code). These energies are subtracted during the traceback
algorithm so that the computed structures have the correct energies.
Unfortunately, there is no way to subtract the bonus energies from
the energy dot plots. Moreover, each forced base pair contain two
bonus energies because of the nature of the algorithm. For example,
suppose that an optimal folding of an RNA contains 3 forced base
pairs ( default bonus energy is 50.0 kcal per forced base pair ) and
that the correct folding energy is -180.0 kcal/mole. Internally, the
energy will be -180.0 - (3+1) x 50.0 = -380.0 kcal/mole. To find
foldings within 10% of the correct energy, one needs to compute
foldings to within 18.0 kcal of -180.0 - 3 x 50.0 = -330.0 kcal/mole.
This comes out to -312.0 kcal/mole. The ratio of -312.0 to -380.0 is
82%, so that one would request the 18% level of suboptimality! This
confustion only exists when base pairs are forced. Each closed
excision counts as one forced base pair.
Choosing '1' from the above menu will result in the following (when
the default 37 deg. energy files have been chosen) :
Energy Parameters (10ths kcal/mole)
1 Extra stack energy [ 0]
2 Extra bulge energy [ 0]
3 Extra loop energy (interior) [ 0]
4 Extra loop energy (hairpin) [ 0]
5 Extra loop energy (multi) [ 46]
6 Multi loop energy/single-stranded base [ 4]
7 Maximum size of interior loop [ 30]
8 Maximum lopsidedness of an interior loop [ 30]
9 Bonus Energy [ -500]
10 Multi loop energy/closing base-pair [ 1]
The energy parameters (along with the energy rules, which are read in
from files) decide what a given folding will look like. For example,
one could reduce the probability of a bulge loop by increasing
parameter 2.
Note that parameters 7 and 8 limit the maximum size and lopsidedness
of bulge and interior loops. The default values of 30 should be
sufficient for folding at 37 deg or less. If you wish to fold at high
temperatures, it would be wise to increase these parameters to 60 or
even 100. Note that this will increase folding times!
Appendix B
Dotplot on the IRIS
When the DOTPLOT is chosen in a regular or continuation run,
a non-resizable, non-movable window is created on which the
triangular energy dot plot is displayed, along with some other
useful information. All energy values are displayed in
kilocalories/mole, and the i,j basepair locations are displayed in
actual historical numbers (from the original sequence). Energy
increments are integers in 10ths of a kcal/mole.
POPUP MENUS
In this version of DOTPLOT, all interaction with the program
(except for point picking...see below) is done with popup menus. To cause
the popup menu to be displayed, press the right mouse button. To select an
item from the popup menu, drag the crosshairs over the item that you want
to select, and release the mouse button.
OPTIMAL SCORE
This number represents the lowest possible energy for an i,j
pair. This is the minimum RNA folding energy. If you are in
multicolor mode (see COLORS below), the points whose scores are
equal to the optimal will ALWAYS be displayed as black filled
rectangles.
ENERGY INCREMENT
This represents the highest possible deviation in energy (in
kcal) for which a point will be plotted. All base pairs that are in
foldings within this increment from the minimum folding energy will
be plotted. This increment can be changed by selecting "Enter new
increment" from the popup menu. A one-line window will be displayed
at the bottom of the screen prompting you for a new energy
increment, entered in 10ths of a kcal. After entering a valid
number and pressing <RETURN>, the screen will redraw with the new
energy increment. Note that points that have already been found in
previous computed structures (as well as points within WINDOW of
these base pairs) will NOT be replotted when the energy dot plot is
redrawn. This allows the user to select base pairing regions
different from those that have already been found.
POINT PICKING
One of the features of DOTPLOT is the ability to select a
base pair by picking a point using the crosshairs. To do this, just
click with the left or middle mouse button on the point that you
want. DOTPLOT will optimize this selection by looking at the eight
points surrounding the point picked, and use the point with minimum
energy, not necessarily the exact point picked. After you have
clicked on a point, the historical numbering will be displayed as an
(i,j) basepair.
COMPUTING THE STRUCTURE
After you have selected a valid i,j basepair, you can
compute the best folding containing that structure selected by
selecting "Compute structure for last i,j" from the popup menu.
After computing the structure, the program will automatically return
to DOTPLOT without you ever knowing that it had left. NOTE : If the
computed structure contains fewer than WINDOW new base pairs that
are insufficiently different from base pairs already computed, the
structre will not be outputted.
THE TEXTPORT
If you had selected the output of foldings to go to the screen, you
can use the textport to view them. Just select "Toggle textport on/off"
from the popup menu and the textport will appear. Although this is the
same window that you ran the DOTPLOT-calling program from, YOU CANNOT ENTER
SHELL COMMANDS IN THE TEXTPORT (it will not respond to text input...it is
simply a text output window).
COLORS
DOTPLOT has the ability to display the plot in up to seven colors
(including black). Select the number of colors desired by moving the
crosshairs over the "Colors ->" entry on the main popup menu and then
moving the cursor to the right. This will activate a "rollover" menu from
which you can the select the number of colors. DOTPLOT determines the
color of the point to be plotted (except for optimal points, which are
always black) by dividing the difference between the minimum energy
and the minimum energy plus the energy increment into n-1 regions,
where n is the number of colors. Black is reserved for optimal base
pairs only. Each region has an associated color, and a point falling
in that region will be plotted in that color. The order of the
colors in decreasing optimality is:
BLACK (optimal)
RED
GREEN
YELLOW
BLUE
MAGENTA
CYAN
P-NUM PLOT
DOTPLOT allows you to plot the number of base pairs that the
ith base can form ( P-num(i) ) versus i (historical numbering).
P-num(i) is the ordinate versus all i's in the segment (abscissa).
Select this from the popup menu and a red rubber-band window will
attach itself to the cross-hairs. Drag this window out to the
desired size, and the plot will be drawn for the already defined
energy increment. If the energy increment changes, the plot will be
redrawn. To get rid of the P-num plot, you can click on the
top-right "close box" on the window, select "close" from the
window-margin popup menu, or select "Toggle P-num plot" from the
main popup menu. Also, you may iconify the p-num plot window by
clicking on the top left "iconify" box.
NOTE: Although iconifying the p-num plot window will not affect the DOTPLOT
routine or its parent program, it MAY stop control of the program until the
window is un-iconified/closed (why this happens is yet unknown and may be
corrected in the future).
QUITTING
To exit the program running DOTPLOT (and the DOTPLOT routine
itself), just select "quit" from the popup menu.