.npa
.left margin2
.para
Introduction to the Staden sequence analysis package and its user interface
.PARA
The package contains the following programs:
.lit
GIP     Gel input program
SAP     Sequence assembly program
NIP     Nucleotide interpretation program
PIP     Protein interpretation program
SIP     Similarity investigation program
MEP     Motif exploration program
NIPL    Nucleotide interpretation program (library)
PIPL    Protein interpretation program (library)
SIPL    Similarity investigation program (library)
.end lit
.left margin2
GIP uses a digitiser for entry of DNA sequences from autoradiographs.
.left margin2
SAP handles everything relating to assembling gel readings in order to produce a consensus sequence. It can also deal with families of protein sequences.
.left margin2
NIP provides functions for analysing and interpreting individual nucleotide sequences.
.left margin2
PIP provides functions for analysing and interpreting individual protein sequences.
.left margin2
MEP analyses families of nucleotide sequences to help discover new motifs.
.left margin2
NIPL performs pattern searches on nucleotide sequence libraries.
.left margin2
PIPL performs pattern searches on protein sequence libraries.
.left margin2
SIP provides functions for comparing and aligning pairs of protein or nucleotide sequences.
.left margin2
SIPL searches nucleotide and protein sequence libraries for entries similar to probe sequences.
.left margin2
.sk1
.para
Documentation
.para
As is explained below, the programs SAP, NIP, PIP, SIP and MEP have online help, and the help files have the names: HELPSAP, HELPNIP, HELPPIP, HELPSIP, HELPMEP. These files can be displayed on the screen or printed using the appropriate commands. Currently the help for the other programs is also contained in these files. For example, help for NIPL is in HELPNIP. This file is called HELPSTADEN.
.para
Sequence formats
.para
The shotgun sequencing program SAP deals only with simple text files for gel readings, and is a self-contained system. However, as there is still no single agreed format for finished sequences or for libraries of sequences, the other programs in the package can read data that is stored in several ways.
.para
The analytical programs can read individual sequences stored in the following formats: Staden, EMBL, GenBank, PIR (also known as NBRF), and GCG, but for storing whole libraries we use only PIR format. In addition these programs can perform a number of simple operations using libraries stored in this format. They can extract entries by entry name, can search titles for keywords, can search the whole of the annotation files for keywords, and can extract annotations for any named entry. We reformat all sequence libraries into PIR format. Currently we have NBRF, EMBL, SWISSPROT and VECBASE libraries in PIR format.
.para
The library searching programs operate only on sequences stored in PIR format.
.para
The analytical programs will operate with uppercase or lowercase sequence characters. In addition T and U are equivalent. SAP uses uppercase letters for original gel readings and lowercase letters for characters that are corrected by the automatic editor. Programs NIP and PIP use IUB symbols for redundancy in back translations and for sequence searches. The symbols are shown below.
.LIT
NC-IUB SYMBOLS

A,C,G,T
R        (A,G)        'puRine'
Y        (T,C)        'pYrimidine'
W        (A,T)        'Weak'
S        (C,G)        'Strong'
M        (A,C)        'aMino'
K        (G,T)        'Keto'
H        (A,T,C)      'not G'
B        (G,C,T)      'not A'
V        (G,A,C)      'not T'
D        (G,A,T)      'not C'
N        (G,A,C,T)    'aNy'
.end lit
.PARA
The user interface
.PARA
The user interface is common to all programs. It consists of a set of menus and a uniform way of presenting choices and obtaining input from the user.
This section describes: the menu system; how options are selected and other choices made; how values are supplied to the program; how help is obtained; and how to escape from any part of a program. In addition it gives information about saving results in files and the use of graphics for presenting results.
.para
Menus
.para
Each program has several menus and numerous options. Each menu or option has a unique number that is used to identify it. Menu numbers are distinguished from option numbers by being preceded by the letter m (or M; the programs make no distinction between upper and lower case letters). With the exception of some parts of program SAP, the menus are not hierarchical; rather, the options they each contain are simply lists of related functions and their identifying numbers. Therefore options can be selected independently of the menu that is currently being shown on the screen, and the menus are simply memory aids. All options and menus are selected by typing their number when the program presents the prompt
.para
"? Menu or option number =".
.para
To select a menu type its number preceded by the letter M. To select an option type its number. If you type only "return" you will get menu m0, which is simply a list of menus. If you select an option you will return to the current menu after the function is completed.
.para
When you select an option, in many cases the program will immediately perform the operation selected without further dialogue. If you precede an option number by the letter d (e.g. D17), you will force the program to offer dialogue about the selected option before the function operates, hence allowing you to change the value of any of its parameters. If you precede an option number by the symbol ? (e.g. ?17), you will be given help on the option (here 17).
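.para
The rules for replies to the prompt "? Menu or option number =" can be summarised in a short sketch. It is written in Python purely as an illustration and is not taken from the programs themselves. The names Selection and parse_selection are hypothetical, the treatment of ! follows the section on quitting, and rejecting any other reply with an error is an assumption.
.lit
```python
from typing import NamedTuple, Optional


class Selection(NamedTuple):
    """Hypothetical result type: what the user asked for, and for which number."""
    kind: str               # "menu", "option", "dialogue", "help" or "quit"
    number: Optional[int]   # None only for "quit"


def parse_selection(reply: str) -> Selection:
    """Interpret one reply to the prompt '? Menu or option number ='."""
    text = reply.strip()
    if text == "":
        return Selection("menu", 0)      # bare "return" shows menu m0
    if text == "!":
        return Selection("quit", None)   # ! means quit
    # A leading m, d or ? (letters in either case) changes the meaning of the number.
    prefixes = {"m": "menu", "d": "dialogue", "?": "help"}
    kind = prefixes.get(text[0].lower(), "option")
    digits = text if kind == "option" else text[1:]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a menu or option number: {reply!r}")
    return Selection(kind, int(digits))
```
.end lit
.para
For example, under these assumptions parse_selection("D17") yields the kind "dialogue" with number 17, and parse_selection("m3") yields the kind "menu" with number 3.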
.para
Where possible, equivalent or identical options have been given the same numbers in all programs, and so users quickly learn the numbers for the functions they employ most often.
.para
Help
.para
As mentioned above, help about each option can be obtained by preceding the option number by the symbol ? when you are presented with the prompt "? Menu or option number =", but there are two further ways of obtaining help. Whenever the program asks a question you can respond by typing the symbol ? and you will receive information about the current option. In addition, option number 1 in all the programs will give help on all of a program's functions.
.para
Quitting
.para
To exit from any point in a program you type ! for quit. If a menu is on the screen this will stop the program, otherwise you will be returned to the last menu.
.Para
Other interactions
.para
Questions are presented in a few restricted ways. In all cases typing only "return" in response to a question means yes, and typing N or n means no.
.para
Obvious opposites such as "clear screen" and "keep picture" are presented with only the default shown. For example in this case the default is generally "keep picture" so the program will display:
.para
"(y/n) (y) Keep picture"
.para
and the picture will be retained if the user types anything other than N or n (in which case the screen will be cleared).
.para
Where there are choices that are not obvious opposites, or there are more than two choices, two further conventions are used: "radio buttons" and "check boxes".
.para
Radio buttons are used when only one of a number of choices can be made at any one time. The choices are presented arranged one above the other, each choice with a number for its selection, and the default choice marked with an X.
For example in the restriction enzyme search routine the following choices are offered:
.para
.lit
  Select output mode
  1 order results enzyme by enzyme
  2 order results by position
X 3 show only infrequent cutters
  4 show names above the sequence
? Selection (1-4) (3) =
.end lit
Any single option can be selected by typing the option number, and the default option (here shown as 3) is also obtained by typing only "return". Again help can be obtained by typing ? and you can quit by typing !.
.para
Check boxes are used when any number of a set of choices can be made (i.e. the choices are not exclusive). Choices are made by typing choice numbers. Each choice can be considered as a switch whose setting is reversed when it is selected. Choices that are currently switched on are marked with an X. The user quits from making selections by typing only "return". For example in the routine that plots base composition you can plot the frequencies of any combination of bases, e.g. only A, or A+T, or A+T+G etc. The following check box is offered to the user:
.lit
X 1 T
  2 C
X 3 A
  4 G
? Selection (1-4) () =
.END LIT
As shown this will plot the A+T composition. To switch off T you select 1, to switch on C you select 2, and so on. To quit, having set the bases required, you type only "return".
.para
Input of numerical values
.para
All input of integer or decimal numbers is presented in a standard way with the allowed range shown in brackets and the default value also in brackets. For example:
.para
? span (5-31) (11) =
.para
In this example you could type any number between 5 and 31, or "return" only, or ! or ? (see above). Any other input will cause the program to ask the question again. Typing only "return" gives the default value (here 11).
.para
Use of the bell
.para
The programs use the bell to indicate that a task is completed. This allows users to read textual results before they are scrolled up off the screen, or to look at a plot before it is scrolled over by the menus.
When the bell sounds, the programs will wait until "return" is typed. You can quit from these points by typing ! but no help is available.
.para
Printing and saving results in files
.para
A few of the functions in the programs automatically write their textual results to disk files, but for most functions you can choose whether results appear on the terminal screen or go to a file. This applies to both text and graphical results. For these functions the normal, or default, place for results to appear is on the screen, and users need to decide before the function is selected whether they want to redirect the results to a file. In all programs, option number 7, "Direct output to disk", gives control over whether results appear on the screen or go to a file. When a program is started results will be sent to the screen. If option 7 is selected users will be given the choice of redirecting either text or graphics to a file. The program will then ask users to supply a file name. From that point on all results will be sent to the file until option 7 is selected again, in which case the "redirection file" will be closed, and results will start to appear on the screen.
.para
If these files contain textual results they can be looked at from within the programs by using option 6, "List a text file". Once you leave the program you can use an appropriate system command to print the files. There is no function within the programs to direct files to a printer.
.para
The converse of the above is also possible. That is, it is possible to redirect results that would normally go to file, so that they appear instead on the screen. This is often useful as a way of checking results before saving them in a file. On a VAX using VMS you do this by typing TT: for the name of the file that the program would create. TT: is what VMS calls the screen.
.para
Use of graphics
.para
The analytical programs including NIP, PIP and SIP present the results of many of their analyses graphically.
The position at which the results for any function appear on the screen is defined relative to a notional user's "drawing board" of dimension 10,000 by 10,000. This drawing board fills the screen and results are drawn in windows defined using symbols x0,y0 and xlength,ylength, where x0,y0 is the position of the bottom left hand corner of the window, xlength is the width of the window and ylength is the height of the window.
.lit
---------------------------------------------------------  10,000
1                                                       1
1      --------------------------------------   ^       1
1      1                                    1   1       1
1      1                                    1   1       1
1      1                                    1 ylength   1
1      1                                    1   1       1
1      1                                    1   1       1
1      --------------------------------------   v       1
1 x0,y0^                                                1
1      <---------------xlength-------------->           1
---------------------------------------------------------
1                                                       1
                                                   10,000
.end lit
.para
The window positions for each option are read from a file when a program is started. If required, individual users could have their own set of plot positions, and the positions can also be redefined from within the programs using option number 14.
.para
For those analyses that draw continuous lines to represent results (for example a plot of base composition) the user is asked to supply the "Plot interval". All the analyses produce a value for every point along the sequence but often it is unnecessary to actually plot the values for all the points. The plot interval is simply the distance between the points shown on the screen. If the user selects a plot interval of 1, every point will be plotted; a plot interval of 3 will show every third point. It is a way of speeding up the plotting of results.
.para
Saving graphics
.para
Many terminals are not capable of dumping their screen contents to a file for subsequent printing. One convenient way of obtaining hard copy of graphical results is to use a microcomputer as a terminal. On the Macintosh we use the terminal emulator VersaTerm-PRO.
This allows graphics to be saved as Macintosh files that can be annotated and printed using MacDraw and other painting programs.
.para
Alternatively graphics can be redirected to a file and printed using a laser printer with Tektronix capability (see "Printing and saving results in files").
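.para
The drawing board coordinates and the plot interval described under "Use of graphics" can be illustrated with a small sketch. It is written in Python for illustration only and does not reproduce the code in the programs. The function names are hypothetical, and it assumes a display whose pixel rows are counted from the top of the screen, whereas the drawing board origin is at the bottom left.
.lit
```python
BOARD = 10_000  # the notional drawing board is 10,000 by 10,000 units


def board_to_pixels(x, y, screen_width, screen_height):
    """Map a drawing-board point (origin bottom left) to a pixel (column, row).

    Assumption: pixel rows are numbered from the top of the screen, so the
    y axis is flipped. Both extremes of the board map onto the screen edges.
    """
    col = round(x * (screen_width - 1) / BOARD)
    row = round((BOARD - y) * (screen_height - 1) / BOARD)
    return col, row


def window_corners(x0, y0, xlength, ylength):
    """Return the bottom-left and top-right corners of a window in board units."""
    return (x0, y0), (x0 + xlength, y0 + ylength)


def plot_points(values, interval):
    """Keep every interval-th value, paired with its 1-based sequence position."""
    return [(i + 1, v) for i, v in enumerate(values) if i % interval == 0]
```
.end lit
.para
With a plot interval of 1 plot_points returns every value, and with an interval of 3 it returns the values at positions 1, 4, 7 and so on. Which position the programs themselves start from is not stated here, so starting at position 1 is also an assumption.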