Sub-optimal RNA Folding Program Users Manual -------------------------------------------- Michael Zuker, Eric Nelson and John Jaeger Start: Initially, the following menu appears Enter run type 0 Regular run (default) 1 Save run 2 Continuation run In a regular run the program takes an RNA sequence as input, computes the energy matrix for the molecule, and produces various foldings as output. Since the computation of the energy matrix uses a great deal of time and resourses, the matrix can be saved before any output is generated (a save run) and later used to produce output (a continuation run). Regular or Continuation run -> step b Step a: At this point a prompt will appear asking for the name of the file into which the save matrix can be stored. -> step f Step b: The following menu will be displayed Enter run mode 0 Sub-optimal plot (default) 1 N best 2 Multiple molecules If the program is run in 'Sub-optimal plot' ("dot plot") mode, the energy matrix will be displayed graphically after it is computed. In 'N-best' mode the program will generate the suboptimal foldings within a certain percentage of the minimum energy. If 'Multiple molecules' ("multi") mode is chosen the program will run the N-best mode with every complete sequence in a file. This last option MUST be done in a regular run mode. N best or multi mode -> step d Step c: A prompt for the minimum number of points 'in a row' that will appear on the energy dot plot. Helices that are smaller than this number will not appear on the dot plot. -> step e Step d: Two prompts asking for values of 'N-best' parameters now appear: the percentage above the optimal energy which foldings must be within, and N. Step e: A prompt for the window parameter. The distance between any pair of computed foldings must be more than window. A simpler distance function is defined in: 1. Zuker M On Finding All Suboptimal Foldings of an RNA Molecule. Science, 244, 48-52, (1989) 2. Zuker M The Use of Dynamic Programming Algorithms in RNA Secondary Structure Prediction. in "Mathematical Methods for DNA Sequences", M. S. Waterman ed. CRC PRESS, INC., 159-184, (1989) The new definition of distance requires that any two computed foldings must contain more than 'window' base pairs that are in one folding and not in the other. Continuation run -> step h Step f: At this point a prompt for the name of a file containing one or more sequences (in Stanford, Genbank, EMBL, PIR, or NRC format) will appear. If the program is being run in 'multi' mode all of the sequences in the file will be folded, otherwise the program will ask for a selection from the file's contents (a portion of a sequence). Sequence data must be in upper case. The program recognizes A, C, G, and T or U. The characters B, Z, H, and V or W are recognized as A, C, G, and T or U respectively; but they are flagged by the program as being accessible to nuclease cleavage. A flagged base can pair only if its 3' neighbor is single stranded. Step g: Six files containing energy information are needed to run the program, and the names of these files are now requested. The default energy files are organized as follows: dangle.dat - single base stacking loop.dat - hairpin, bulge and interior loops stack.dat - base pair stacking energies tstack.dat - stacking energies for terminal mismatched pairs in interior and hairpin loops tloop.dat - a list of distinguished tetra-loops and the bonus eneries given to them. If you do not want to use this file, create a dummy file containing a few blank lines and use it instead. miscloop.dat - some miscellaneous energies (see files.list). These files can be replaced by dangle.025, loop.025, stack.025 etc. for folding at (for example) 25 deg. -> step i Step h: For a continuation run, a file previously created by a save run needs to be read in at this point. A prompt will appear asking for identification of this file. After the file is read, the energy rules and parameters used during the save run are output either to a file or the screen. Step i: Three different types of folding output formats can be produced: printer (which shows the secondary structure in a rough, but directly readable format), ct file, and Region table (both ct files and region tables can be used as input to certain other programs). Prompts will appear asking which types of output need to be produced. Step j: Main menu (see apendix A) Save run -> halt N-best and multi mode -> produce folding output Step k: Enter Dotplot section (see appendix B) Appendix A Main Menu The following menu will appear: 1 Energy Parameter 6 Single Prohibit 2 Single Force 7 Double Prohibit 3 Double Force 8 Begin Folding 4 Closed Excision 9 Show current 5 Open Excision 10 Clear current Selections 2 through 7 provide a way for the user to directly alter the possible secondary structure by forcing or prohibiting particular base-pairs. Each time one of these parameters is chosen, it is added to a list held in memory - selection 8 will print the list and 10 will erase the list. If '8' is chosen from the menu the program will continue past this section. NB : Options 2 and 3 force base pairs to occur. Base pairs are forced by giving them a bonus energy (EPARAM(9) in the program code). These energies are subtracted during the traceback algorithm so that the computed structures have the correct energies. Unfortunately, there is no way to subtract the bonus energies from the energy dot plots. Moreover, each forced base pair contain two bonus energies because of the nature of the algorithm. For example, suppose that an optimal folding of an RNA contains 3 forced base pairs ( default bonus energy is 50.0 kcal per forced base pair ) and that the correct folding energy is -180.0 kcal/mole. Internally, the energy will be -180.0 - (3+1) x 50.0 = -380.0 kcal/mole. To find foldings within 10% of the correct energy, one needs to compute foldings to within 18.0 kcal of -180.0 - 3 x 50.0 = -330.0 kcal/mole. This comes out to -312.0 kcal/mole. The ratio of -312.0 to -380.0 is 82%, so that one would request the 18% level of suboptimality! This confustion only exists when base pairs are forced. Each closed excision counts as one forced base pair. Choosing '1' from the above menu will result in the following (when the default 37 deg. energy files have been chosen) : Energy Parameters (10ths kcal/mole) 1 Extra stack energy [ 0] 2 Extra bulge energy [ 0] 3 Extra loop energy (interior) [ 0] 4 Extra loop energy (hairpin) [ 0] 5 Extra loop energy (multi) [ 46] 6 Multi loop energy/single-stranded base [ 4] 7 Maximum size of interior loop [ 30] 8 Maximum lopsidedness of an interior loop [ 30] 9 Bonus Energy [ -500] 10 Multi loop energy/closing base-pair [ 1] The energy parameters (along with the energy rules, which are read in from files) decide what a given folding will look like. For example, one could reduce the probability of a bulge loop by increasing parameter 2. Note that parameters 7 and 8 limit the maximum size and lopsidedness of bulge and interior loops. The default values of 30 should be sufficient for folding at 37 deg or less. If you wish to fold at high temperatures, it would be wise to increase these parameters to 60 or even 100. Note that this will increase folding times! Appendix B Dotplot on the IRIS When the DOTPLOT is chosen in a regular or continuation run, a non-resizable, non-movable window is created on which the triangular energy dot plot is displayed, along with some other useful information. All energy values are displayed in kilocalories/mole, and the i,j basepair locations are displayed in actual historical numbers (from the original sequence). Energy increments are integers in 10ths of a kcal/mole. POPUP MENUS In this version of DOTPLOT, all interaction with the program (except for point picking...see below) is done with popup menus. To cause the popup menu to be displayed, press the right mouse button. To select an item from the popup menu, drag the crosshairs over the item that you want to select, and release the mouse button. OPTIMAL SCORE This number represents the lowest possible energy for an i,j pair. This is the minimum RNA folding energy. If you are in multicolor mode (see COLORS below), the points whose scores are equal to the optimal will ALWAYS be displayed as black filled rectangles. ENERGY INCREMENT This represents the highest possible deviation in energy (in kcal) for which a point will be plotted. All base pairs that are in foldings within this increment from the minimum folding energy will be plotted. This increment can be changed by selecting "Enter new increment" from the popup menu. A one-line window will be displayed at the bottom of the screen prompting you for a new energy increment, entered in 10ths of a kcal. After entering a valid number and pressing , the screen will redraw with the new energy increment. Note that points that have already been found in previous computed structures (as well as points within WINDOW of these base pairs) will NOT be replotted when the energy dot plot is redrawn. This allows the user to select base pairing regions different from those that have already been found. POINT PICKING One of the features of DOTPLOT is the ability to select a base pair by picking a point using the crosshairs. To do this, just click with the left or middle mouse button on the point that you want. DOTPLOT will optimize this selection by looking at the eight points surrounding the point picked, and use the point with minimum energy, not necessarily the exact point picked. After you have clicked on a point, the historical numbering will be displayed as an (i,j) basepair. COMPUTING THE STRUCTURE After you have selected a valid i,j basepair, you can compute the best folding containing that structure selected by selecting "Compute structure for last i,j" from the popup menu. After computing the structure, the program will automatically return to DOTPLOT without you ever knowing that it had left. NOTE : If the computed structure contains fewer than WINDOW new base pairs that are insufficiently different from base pairs already computed, the structre will not be outputted. THE TEXTPORT If you had selected the output of foldings to go to the screen, you can use the textport to view them. Just select "Toggle textport on/off" from the popup menu and the textport will appear. Although this is the same window that you ran the DOTPLOT-calling program from, YOU CANNOT ENTER SHELL COMMANDS IN THE TEXTPORT (it will not respond to text input...it is simply a text output window). COLORS DOTPLOT has the ability to display the plot in up to seven colors (including black). Select the number of colors desired by moving the crosshairs over the "Colors ->" entry on the main popup menu and then moving the cursor to the right. This will activate a "rollover" menu from which you can the select the number of colors. DOTPLOT determines the color of the point to be plotted (except for optimal points, which are always black) by dividing the difference between the minimum energy and the minimum energy plus the energy increment into n-1 regions, where n is the number of colors. Black is reserved for optimal base pairs only. Each region has an associated color, and a point falling in that region will be plotted in that color. The order of the colors in decreasing optimality is: BLACK (optimal) RED GREEN YELLOW BLUE MAGENTA CYAN P-NUM PLOT DOTPLOT allows you to plot the number of base pairs that the ith base can form ( P-num(i) ) versus i (historical numbering). P-num(i) is the ordinate versus all i's in the segment (abscissa). Select this from the popup menu and a red rubber-band window will attach itself to the cross-hairs. Drag this window out to the desired size, and the plot will be drawn for the already defined energy increment. If the energy increment changes, the plot will be redrawn. To get rid of the P-num plot, you can click on the top-right "close box" on the window, select "close" from the window-margin popup menu, or select "Toggle P-num plot" from the main popup menu. Also, you may iconify the p-num plot window by clicking on the top left "iconify" box. NOTE: Although iconifying the p-num plot window will not affect the DOTPLOT routine or its parent program, it MAY stop control of the program until the window is un-iconified/closed (why this happens is yet unknown and may be corrected in the future). QUITTING To exit the program running DOTPLOT (and the DOTPLOT routine itself), just select "quit" from the popup menu.