Sub-optimal RNA Folding Program Users Manual
--------------------------------------------

Michael Zuker, Eric Nelson and John Jaeger


Start:

Initially, the following menu appears:

    Enter run type
        0  Regular run (default)
        1  Save run
        2  Continuation run

In a regular run the program takes an RNA sequence as input, computes
the energy matrix for the molecule, and produces various foldings as
output. Since the computation of the energy matrix uses a great deal
of time and resources, the matrix can be saved before any output is
generated (a save run) and later used to produce output (a
continuation run).

    Regular or Continuation run -> step b


Step a:

At this point a prompt will appear asking for the name of the file
in which the energy matrix is to be saved.

    -> step f

Step b:

The following menu will be displayed:

    Enter run mode
        0  Sub-optimal plot (default)
        1  N best
        2  Multiple molecules

If the program is run in 'Sub-optimal plot' ("dot plot") mode, the
energy matrix will be displayed graphically after it is computed. In
'N-best' mode the program will generate the suboptimal foldings
within a certain percentage of the minimum energy. If
'Multiple molecules' ("multi") mode is chosen, the program will
run the N-best mode on every complete sequence in a file. This last
option MUST be used in a regular run.

    N best or multi mode -> step d


Step c:

A prompt appears for the minimum number of points 'in a row' that a
helix must contain to appear on the energy dot plot. Helices that are
smaller than this number will not appear on the dot plot.

    -> step e


Step d:

Two prompts for the 'N-best' parameters now appear: the percentage
above the optimal energy within which foldings must lie, and N.


Step e:

A prompt appears for the window parameter. The distance between any
pair of computed foldings must be more than window. A simpler distance
function is defined in:

    1. Zuker M
       On Finding All Suboptimal Foldings of an RNA Molecule.
       Science, 244, 48-52, (1989)

    2. Zuker M
       The Use of Dynamic Programming Algorithms in RNA Secondary
       Structure Prediction.
       in "Mathematical Methods for DNA Sequences", M. S. Waterman ed.
       CRC PRESS, INC., 159-184, (1989)

The new definition of distance requires that any two computed
foldings must contain more than 'window' base pairs that are in one
folding and not in the other.

    Continuation run -> step h


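The distance test described in step e can be illustrated with a small
Python sketch. This is not the program's own code: it assumes a folding
is represented as a collection of (i, j) base pairs, and it reads "in
one folding and not in the other" as the symmetric difference of the
two sets of pairs. The function name is hypothetical.

```python
def sufficiently_different(folding_a, folding_b, window):
    """Return True if more than `window` base pairs occur in exactly
    one of the two foldings (assumed reading of the distance rule)."""
    pairs_a = set(folding_a)
    pairs_b = set(folding_b)
    # Symmetric difference: pairs present in one folding but not the other.
    return len(pairs_a ^ pairs_b) > window


# Illustrative data only: two foldings that share two base pairs.
first = [(1, 20), (2, 19), (3, 18), (4, 17)]
second = [(1, 20), (2, 19), (5, 30), (6, 29), (7, 28)]

# Five pairs differ, so the foldings are distinct for window = 3
# but not for window = 5.
print(sufficiently_different(first, second, window=3))
print(sufficiently_different(first, second, window=5))
```

Under this reading, a larger window forces the computed foldings to
differ in more base pairs, so fewer foldings are produced.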
Step f:

At this point a prompt for the name of a file containing one or more
sequences (in Stanford, GenBank, EMBL, PIR, or NRC format) will
appear. If the program is being run in 'multi' mode, all of the
sequences in the file will be folded; otherwise the program will ask
for a selection from the file's contents (a portion of a sequence).
Sequence data must be in upper case. The program recognizes A, C, G,
and T or U. The characters B, Z, H, and V or W are recognized as A,
C, G, and T or U respectively, but they are flagged by the program as
being accessible to nuclease cleavage. A flagged base can pair only
if its 3' neighbor is single stranded.


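The character conventions of step f can be summarized in a short Python
sketch. It is only an illustration, not the program's input routine.
The mapping of V to T and W to U is an assumed reading of
"respectively", and rejecting any other character (including lower
case) with an error is also an assumption, since the manual does not
say how such input is handled.

```python
# Flagged characters and the bases they stand for
# (V -> T and W -> U is an assumed reading of the manual).
FLAGGED = {"B": "A", "Z": "C", "H": "G", "V": "T", "W": "U"}
PLAIN = set("ACGTU")


def decode_sequence(seq):
    """Return a list of (base, flagged) tuples for an upper-case sequence.

    `flagged` is True for bases marked as accessible to nuclease cleavage.
    """
    decoded = []
    for position, char in enumerate(seq, start=1):
        if char in PLAIN:
            decoded.append((char, False))
        elif char in FLAGGED:
            decoded.append((FLAGGED[char], True))
        else:
            # Assumption: anything else is treated as an input error.
            raise ValueError(f"unrecognized character {char!r} at position {position}")
    return decoded


print(decode_sequence("GGBZCC"))
```

Here the third and fourth bases are read as a flagged A and a flagged C.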
Step g:

Six files containing energy information are needed to run the
program, and the names of these files are now requested. The
default energy files are organized as follows:

    dangle.dat   - single base stacking
    loop.dat     - hairpin, bulge and interior loops
    stack.dat    - base pair stacking energies
    tstack.dat   - stacking energies for terminal mismatched pairs in
                   interior and hairpin loops
    tloop.dat    - a list of distinguished tetra-loops and the bonus
                   energies given to them. If you do not want to use
                   this file, create a dummy file containing a few
                   blank lines and use it instead.
    miscloop.dat - some miscellaneous energies (see files.list).

These files can be replaced by dangle.025, loop.025, stack.025 etc.
for folding at (for example) 25 deg.

    -> step i


Step h:

For a continuation run, a file previously created by a save run needs
to be read in at this point. A prompt will appear asking for
the name of this file. After the file is read, the energy rules
and parameters used during the save run are output either to a file or
to the screen.

Step i:

Three different types of folding output can be produced:
printer (which shows the secondary structure in a rough, but directly
readable format), ct file, and region table (both ct files and region
tables can be used as input to certain other programs). Prompts will
appear asking which types of output are to be produced.


Step j:

Main menu (see Appendix A)

    Save run -> halt
    N-best and multi mode -> produce folding output


Step k:

Enter Dotplot section (see Appendix B)




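The order in which steps a through k are visited can be traced with a
small Python sketch. It follows the "->" jumps as written above and is
not taken from the program source. Two points are assumptions: a step
without an explicit jump falls through to the next step (so step d
leads to step e), and a save run still passes through step i on its way
to step j, where it halts. The function and argument names are
hypothetical.

```python
def prompt_sequence(run_type, run_mode="dotplot"):
    """Return the list of manual steps visited for a given run.

    run_type: "regular", "save" or "continuation"
    run_mode: "dotplot", "nbest" or "multi" (ignored for a save run)
    """
    if run_type == "save":
        # Start -> a -> f -> g -> i -> j (halt); no run mode is requested.
        return ["a", "f", "g", "i", "j"]
    if run_type not in ("regular", "continuation"):
        raise ValueError(f"unknown run type: {run_type}")
    if run_mode not in ("dotplot", "nbest", "multi"):
        raise ValueError(f"unknown run mode: {run_mode}")
    if run_mode == "multi" and run_type != "regular":
        raise ValueError("multi mode must be used in a regular run")

    steps = ["b"]
    # Dot plot mode asks for the helix length (c); the others ask for
    # the N-best parameters (d). Both continue to the window prompt (e).
    steps.append("c" if run_mode == "dotplot" else "d")
    steps.append("e")
    if run_type == "continuation":
        steps.append("h")          # read the saved energy matrix
    else:
        steps += ["f", "g"]        # sequence file, then energy files
    steps += ["i", "j"]            # output formats, then main menu
    if run_mode == "dotplot":
        steps.append("k")          # interactive dot plot
    return steps


print(prompt_sequence("regular", "dotplot"))
print(prompt_sequence("continuation", "nbest"))
```

A regular dot plot run therefore visits b, c, e, f, g, i, j and k,
while a continuation run in N-best mode visits b, d, e, h, i and j.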
Appendix A
Main Menu


The following menu will appear:


    1  Energy Parameter      6  Single Prohibit
    2  Single Force          7  Double Prohibit
    3  Double Force          8  Begin Folding
    4  Closed Excision       9  Show current
    5  Open Excision        10  Clear current

Selections 2 through 7 provide a way for the user to directly alter
the possible secondary structure by forcing or prohibiting particular
base-pairs. Each time one of these selections is chosen, it is added
to a list held in memory - selection 9 will print the list and 10 will
erase the list. If '8' is chosen from the menu the program will
continue past this section.

NB : Options 2 and 3 force base pairs to occur.
Base pairs are forced by giving them a bonus energy (EPARAM(9) in the
program code). These energies are subtracted during the traceback
algorithm so that the computed structures have the correct energies.
Unfortunately, there is no way to subtract the bonus energies from
the energy dot plots. Moreover, each forced base pair contains two
bonus energies because of the nature of the algorithm. For example,
suppose that an optimal folding of an RNA contains 3 forced base
pairs ( default bonus energy is 50.0 kcal per forced base pair ) and
that the correct folding energy is -180.0 kcal/mole. Internally, the
energy will be -180.0 - (3+1) x 50.0 = -380.0 kcal/mole. To find
foldings within 10% of the correct energy, one needs to compute
foldings to within 18.0 kcal of -180.0 - 3 x 50.0 = -330.0 kcal/mole.
This comes out to -312.0 kcal/mole. The ratio of -312.0 to -380.0 is
82%, so that one would request the 18% level of suboptimality! This
confusion only exists when base pairs are forced. Each closed
excision counts as one forced base pair.

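The arithmetic of the forced base pair example can be reproduced with a
short Python sketch. It simply restates the calculation given in the
text (the internal optimum uses one more bonus than the number of
forced pairs, the cutoff is taken from the energy with one bonus per
forced pair) and is not drawn from the program source. The function
name and argument names are hypothetical.

```python
def suboptimality_to_request(true_energy, forced_pairs, wanted_percent, bonus=50.0):
    """Return (internal optimum, cutoff energy, percent to request).

    All energies are in kcal/mole; `bonus` is the magnitude of the
    bonus energy per forced base pair (50.0 by default in the text).
    """
    # Energy as seen internally: (forced_pairs + 1) bonuses are included.
    internal_optimum = true_energy - (forced_pairs + 1) * bonus
    # Reference energy used in the text for the cutoff.
    shifted_optimum = true_energy - forced_pairs * bonus
    # Allowed deviation, taken as a percentage of the correct energy.
    margin = abs(true_energy) * wanted_percent / 100.0
    cutoff = shifted_optimum + margin
    # Percentage of the internal optimum that this cutoff corresponds to.
    request = (1.0 - cutoff / internal_optimum) * 100.0
    return internal_optimum, cutoff, request


internal, cutoff, request = suboptimality_to_request(-180.0, 3, 10.0)
print(internal, cutoff, round(request))   # -380.0 -312.0 18
```

With the numbers from the text, the sketch gives an internal optimum of
-380.0, a cutoff of -312.0 and a requested level of about 18%.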
Choosing '1' from the above menu will result in the following (when
the default 37 deg. energy files have been chosen):


    Energy Parameters (10ths kcal/mole)

     1  Extra stack energy                          [    0]
     2  Extra bulge energy                          [    0]
     3  Extra loop energy (interior)                [    0]
     4  Extra loop energy (hairpin)                 [    0]
     5  Extra loop energy (multi)                   [   46]
     6  Multi loop energy/single-stranded base      [    4]
     7  Maximum size of interior loop               [   30]
     8  Maximum lopsidedness of an interior loop    [   30]
     9  Bonus Energy                                [ -500]
    10  Multi loop energy/closing base-pair         [    1]


The energy parameters (along with the energy rules, which are read in
from files) determine what a given folding will look like. For example,
one could reduce the probability of a bulge loop by increasing
parameter 2.

Note that parameters 7 and 8 limit the maximum size and lopsidedness
of bulge and interior loops. The default values of 30 should be
sufficient for folding at 37 deg or less. If you wish to fold at high
temperatures, it would be wise to increase these parameters to 60 or
even 100. Note that this will increase folding times!






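Because the menu lists energies in 10ths of a kcal/mole, a conversion
is needed to compare them with energies quoted in kcal/mole elsewhere
in this manual. The Python sketch below holds the default values from
the table and converts them. Treating parameters 7 and 8 as sizes
rather than energies is a reading of their descriptions, and the names
used are hypothetical.

```python
# Default values as shown in the Energy Parameters menu.
DEFAULT_EPARAM = {
    1: 0,      # extra stack energy
    2: 0,      # extra bulge energy
    3: 0,      # extra loop energy (interior)
    4: 0,      # extra loop energy (hairpin)
    5: 46,     # extra loop energy (multi)
    6: 4,      # multi loop energy per single-stranded base
    7: 30,     # maximum size of interior loop
    8: 30,     # maximum lopsidedness of an interior loop
    9: -500,   # bonus energy
    10: 1,     # multi loop energy per closing base pair
}

# Assumption: 7 and 8 are loop sizes, so only the others are energies.
ENERGY_ENTRIES = (1, 2, 3, 4, 5, 6, 9, 10)


def in_kcal(index, params=DEFAULT_EPARAM):
    """Convert an energy parameter from 10ths of a kcal/mole to kcal/mole."""
    if index not in ENERGY_ENTRIES:
        raise ValueError(f"parameter {index} is not an energy")
    return params[index] / 10.0


print(in_kcal(9))   # -50.0
print(in_kcal(5))
```

The default bonus energy of -500 corresponds to the 50.0 kcal per
forced base pair quoted in the note on forced base pairs.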
Appendix B
Dotplot on the IRIS



When the DOTPLOT is chosen in a regular or continuation run,
a non-resizable, non-movable window is created on which the
triangular energy dot plot is displayed, along with some other
useful information. All energy values are displayed in
kilocalories/mole, and the i,j basepair locations are displayed in
actual historical numbers (from the original sequence). Energy
increments are integers in 10ths of a kcal/mole.

POPUP MENUS
In this version of DOTPLOT, all interaction with the program
(except for point picking...see below) is done with popup menus. To cause
the popup menu to be displayed, press the right mouse button. To select an
item from the popup menu, drag the crosshairs over the item that you want
to select, and release the mouse button.

OPTIMAL SCORE

This number represents the lowest possible energy for an i,j
pair. This is the minimum RNA folding energy. If you are in
multicolor mode (see COLORS below), the points whose scores are
equal to the optimal will ALWAYS be displayed as black filled
rectangles.

ENERGY INCREMENT

This represents the highest possible deviation in energy (in
kcal) for which a point will be plotted. All base pairs that are in
foldings within this increment from the minimum folding energy will
be plotted. This increment can be changed by selecting "Enter new
increment" from the popup menu. A one-line window will be displayed
at the bottom of the screen prompting you for a new energy
increment, entered in 10ths of a kcal. After you enter a valid
number and press <RETURN>, the screen will be redrawn with the new
energy increment. Note that points that have already been found in
previously computed structures (as well as points within WINDOW of
these base pairs) will NOT be replotted when the energy dot plot is
redrawn. This allows the user to select base pairing regions
different from those that have already been found.

POINT PICKING

One of the features of DOTPLOT is the ability to select a
base pair by picking a point using the crosshairs. To do this, just
click with the left or middle mouse button on the point that you
want. DOTPLOT will optimize this selection by looking at the eight
points surrounding the point picked, and will use the point with
minimum energy, not necessarily the exact point picked. After you have
clicked on a point, the historical numbering will be displayed as an
(i,j) basepair.

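The neighbourhood search used in point picking can be sketched in
Python as follows. This is only an illustration of the idea, not the
DOTPLOT routine: it assumes the plotted points are held in a dictionary
mapping (i, j) to an energy in 10ths of a kcal/mole, and that the
picked point itself competes with its eight neighbours. The function
name is hypothetical.

```python
def snap_to_best(energies, i, j):
    """Return the lowest-energy plotted point among (i, j) and its
    eight neighbours, or None if none of them is plotted."""
    candidates = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    plotted = [point for point in candidates if point in energies]
    if not plotted:
        return None
    return min(plotted, key=lambda point: energies[point])


# Illustrative data only: energies in 10ths of a kcal/mole.
energies = {(10, 40): -95, (11, 39): -120, (12, 38): -118}
print(snap_to_best(energies, 10, 40))   # (11, 39)
```

In this example the click lands on (10, 40), but the adjacent point
(11, 39) has the lower energy and is the one selected.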
COMPUTING THE STRUCTURE

After you have selected a valid i,j basepair, you can
compute the best folding containing that base pair by
selecting "Compute structure for last i,j" from the popup menu.
After computing the structure, the program will automatically return
to DOTPLOT without you ever knowing that it had left. NOTE : If the
computed structure contains fewer than WINDOW new base pairs that
are sufficiently different from base pairs already computed, the
structure will not be output.

THE TEXTPORT

If you have selected the output of foldings to go to the screen, you
can use the textport to view them. Just select "Toggle textport on/off"
from the popup menu and the textport will appear. Although this is the
same window that you ran the DOTPLOT-calling program from, YOU CANNOT ENTER
SHELL COMMANDS IN THE TEXTPORT (it will not respond to text input...it is
simply a text output window).

COLORS

DOTPLOT has the ability to display the plot in up to seven colors
(including black). Select the number of colors desired by moving the
crosshairs over the "Colors ->" entry on the main popup menu and then
moving the cursor to the right. This will activate a "rollover" menu from
which you can then select the number of colors. DOTPLOT determines the
color of the point to be plotted (except for optimal points, which are
always black) by dividing the difference between the minimum energy
and the minimum energy plus the energy increment into n-1 regions,
where n is the number of colors. Black is reserved for optimal base
pairs only. Each region has an associated color, and a point falling
in that region will be plotted in that color. The order of the
colors in decreasing optimality is:

    BLACK (optimal)
    RED
    GREEN
    YELLOW
    BLUE
    MAGENTA
    CYAN

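The color assignment described above can be sketched in Python. This is
a hedged illustration rather than the DOTPLOT code: the regions are
assumed to be of equal width with the upper edge of each region
included, energies are assumed to be integers in 10ths of a kcal/mole,
and only the multicolor case (2 to 7 colors) is handled. The function
name is hypothetical.

```python
COLORS = ["BLACK", "RED", "GREEN", "YELLOW", "BLUE", "MAGENTA", "CYAN"]


def color_for(energy, optimal, increment, n_colors):
    """Return the color name for a point, or None if it is not plotted.

    energy, optimal, increment: integers in 10ths of a kcal/mole.
    n_colors: number of colors in use, including black (2 to 7).
    """
    if not 2 <= n_colors <= len(COLORS):
        raise ValueError("n_colors must be between 2 and 7")
    excess = energy - optimal
    if excess < 0 or excess > increment:
        return None                 # outside the plotted energy range
    if excess == 0:
        return COLORS[0]            # black is reserved for optimal points
    regions = n_colors - 1
    # Integer form of ceil(excess * regions / increment) - 1, which
    # yields a region index between 0 and regions - 1.
    region = (excess * regions - 1) // increment
    return COLORS[1 + region]


# Four colors and an increment of 12.0 kcal/mole give three regions
# of 4.0 kcal/mole each.
for energy in (-1200, -1190, -1159, -1080, -1079):
    print(energy, color_for(energy, optimal=-1200, increment=120, n_colors=4))
```

With these assumptions the five sample points are plotted in black,
red, green and yellow respectively, and the last one is not plotted.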
P-NUM PLOT

DOTPLOT allows you to plot the number of base pairs that the
ith base can form ( P-num(i) ) versus i (historical numbering).
P-num(i) is the ordinate versus all i's in the segment (abscissa).
Select this from the popup menu and a red rubber-band window will
attach itself to the cross-hairs. Drag this window out to the
desired size, and the plot will be drawn for the already defined
energy increment. If the energy increment changes, the plot will be
redrawn. To get rid of the P-num plot, you can click on the
top-right "close box" on the window, select "close" from the
window-margin popup menu, or select "Toggle P-num plot" from the
main popup menu. Also, you may iconify the p-num plot window by
clicking on the top left "iconify" box.

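A possible way to compute P-num is sketched below in Python. It is an
interpretation, not the DOTPLOT routine: P-num(i) is taken to be the
number of base pairs within the current energy increment that involve
base i, whether as the first or the second member of the pair. The
dictionary layout and the function name are hypothetical.

```python
def p_num(energies, optimal, increment, first, last):
    """Count, for each base from `first` to `last`, the base pairs it
    can form within `increment` of the optimal energy.

    energies: dict mapping (i, j) to an energy in 10ths of a kcal/mole.
    """
    counts = {base: 0 for base in range(first, last + 1)}
    for (i, j), energy in energies.items():
        if energy - optimal <= increment:
            for base in (i, j):
                if base in counts:
                    counts[base] += 1
    return counts


# Illustrative data only: three candidate base pairs.
energies = {(1, 10): -100, (1, 9): -95, (2, 9): -80}
counts = p_num(energies, optimal=-100, increment=10, first=1, last=10)
print(counts[1], counts[9], counts[2])   # 2 1 0
```

Because the counts depend on the energy increment, they must be
recomputed whenever the increment changes, which matches the redrawing
behaviour described above.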
NOTE: Although iconifying the p-num plot window will not affect the DOTPLOT
routine or its parent program, it MAY stop control of the program until the
window is un-iconified/closed (why this happens is not yet known; it may be
corrected in the future).

QUITTING

To exit the program running DOTPLOT (and the DOTPLOT routine
itself), just select "quit" from the popup menu.