Introduction to the Staden sequence analysis package and its user interface

The package contains the following programs:

    GIP     Gel input program
    SAP     Sequence assembly program
    NIP     Nucleotide interpretation program
    PIP     Protein interpretation program
    SIP     Similarity investigation program
    MEP     Motif exploration program
    NIPL    Nucleotide interpretation program (library)
    PIPL    Protein interpretation program (library)
    SIPL    Similarity investigation program (library)

GIP uses a digitiser for entry of DNA sequences from autoradiographs.

SAP handles everything relating to assembling gel readings in order to produce a consensus sequence. It can also deal with families of protein sequences.

NIP provides functions for analysing and interpreting individual nucleotide sequences.

PIP provides functions for analysing and interpreting individual protein sequences.

MEP analyses families of nucleotide sequences to help discover new motifs.

NIPL performs pattern searches on nucleotide sequence libraries.

PIPL performs pattern searches on protein sequence libraries.

SIP provides functions for comparing and aligning pairs of protein or nucleotide sequences.

SIPL searches nucleotide and protein sequence libraries for entries similar to probe sequences.

Documentation

As is explained below, the programs SAP, NIP, PIP, SIP and MEP have online help. The help files are named HELPSAP, HELPNIP, HELPPIP, HELPSIP and HELPMEP. These files can be displayed on the screen or printed using the appropriate commands. Currently the help for the other programs is also contained in these files. For example, help for NIPL is in HELPNIP. This file is called HELPSTADEN.

Sequence formats

The shotgun sequencing program SAP deals only with simple text files for gel readings, and is a self-contained system. However, as there is still no single agreed format for finished sequences or for libraries of sequences, the other programs in the package can read data that is stored in several ways.
The analytical programs can read individual sequences stored in the following formats: Staden, EMBL, GenBank, PIR (also known as NBRF), and GCG, but for storing whole libraries we use only PIR format. In addition, these programs can perform a number of simple operations on libraries stored in PIR format. They can extract entries by entry name, search titles for keywords, search the whole of the annotation files for keywords, and extract annotations for any named entry.

We reformat all sequence libraries into PIR format. Currently we have NBRF, EMBL, SWISSPROT and VECBASE libraries in PIR format. The library searching programs operate only on sequences stored in PIR format.

The analytical programs will operate with uppercase or lowercase sequence characters. In addition, T and U are equivalent. SAP uses uppercase letters for original gel readings and lowercase letters for characters that are corrected by the automatic editor.

Programs NIP and PIP use IUB symbols for redundancy in back translations and for sequence searches. The symbols are shown below.

    NC-IUB SYMBOLS

    A,C,G,T
    R    (A,G)        'puRine'
    Y    (T,C)        'pYrimidine'
    W    (A,T)        'Weak'
    S    (C,G)        'Strong'
    M    (A,C)        'aMino'
    K    (G,T)        'Keto'
    H    (A,T,C)      'not G'
    B    (G,C,T)      'not A'
    V    (G,A,C)      'not T'
    D    (G,A,T)      'not C'
    N    (G,A,C,T)    'aNy'

The user interface

The user interface is common to all programs. It consists of a set of menus and a uniform way of presenting choices and obtaining input from the user. This section describes: the menu system; how options are selected and other choices made; how values are supplied to the program; how help is obtained; and how to escape from any part of a program. In addition it gives information about saving results in files and the use of graphics for presenting results.

Menus

Each program has several menus and numerous options. Each menu or option has a unique number that is used to identify it.
Menu numbers are distinguished from option numbers by being preceded by the letter m (or M; the programs make no distinction between upper and lower case letters). With the exception of some parts of program SAP, the menus are not hierarchical; rather, each menu is simply a list of related functions and their identifying numbers. Options can therefore be selected independently of the menu that is currently shown on the screen, and the menus are simply memory aids.

All options and menus are selected by typing their number when the programs present the prompt "? Menu or option number =". To select a menu, type its number preceded by the letter M. To select an option, type its number. If you type only "return" you will get menu m0, which is simply a list of menus. If you select an option you will return to the current menu after the function is completed.

When you select an option, in many cases the program will immediately perform the operation selected without further dialogue. If you precede an option number by the letter d (e.g. D17), you will force the program to offer dialogue about the selected option before the function operates, allowing you to change the value of any of its parameters. If you precede an option number by the symbol ? (e.g. ?17), you will be given help on the option (here 17).

Where possible, equivalent or identical options have been given the same numbers in all programs, so users quickly learn the numbers for the functions they employ most often.

Help

As mentioned above, help about each option can be obtained by preceding the option number by the symbol ? when you are presented with the prompt "? Menu or option number =", but there are two further ways of obtaining help. Whenever the program asks a question you can respond by typing the symbol ? and you will receive information about the current option. In addition, option number 1 in all the programs gives help on all of a program's functions.
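
The selection rules described above can be summarised in a short Python sketch. This is an illustration only, not code from the package: the names Selection and parse_selection are hypothetical, and the sketch assumes that a reply is a single optional prefix (M, D or ?) followed by a number.

```python
from typing import NamedTuple


class Selection(NamedTuple):
    kind: str    # "menu", "option", "dialogue" or "help"
    number: int


def parse_selection(reply: str) -> Selection:
    """Interpret one reply to the '? Menu or option number =' prompt."""
    text = reply.strip().upper()      # upper and lower case are equivalent
    if text == "":
        return Selection("menu", 0)   # bare return gives menu m0
    prefixes = {"M": "menu", "D": "dialogue", "?": "help"}
    kind = prefixes.get(text[0], "option")
    digits = text[1:] if text[0] in prefixes else text
    if not digits.isdecimal():
        raise ValueError(f"not a menu or option number: {reply!r}")
    return Selection(kind, int(digits))


print(parse_selection("d17"))   # Selection(kind='dialogue', number=17)
```

A bare number is treated as an option, so "17" and "D17" refer to the same function and differ only in whether dialogue is forced first.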

Quitting

To exit from any point in a program, type ! for quit. If a menu is on the screen this will stop the program; otherwise you will be returned to the last menu.

Other interactions

Questions are presented in a few restricted ways. In all cases typing only "return" in response to a question means yes, and typing N or n means no. Obvious opposites such as "clear screen" and "keep picture" are presented with only the default shown. For example, in this case the default is generally "keep picture", so the program will display:

    (y/n) (y) Keep picture

The picture is retained unless the user types N or n, in which case the screen is cleared.

Where there are choices that are not obvious opposites, or there are more than two choices, two further conventions are used: "radio buttons" and "check boxes".

Radio buttons are used when only one of a number of choices can be made at any one time. The choices are presented one above the other, each with a number for its selection, and the default choice is marked with an X. For example, in the restriction enzyme search routine the following choices are offered:

    Select output mode
      1 order results enzyme by enzyme
      2 order results by position
    X 3 show only infrequent cutters
      4 show names above the sequence
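
The yes/no and radio button conventions can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the package's own code: the function names is_yes and render_radio are hypothetical, and the exact column layout of the real display is assumed.

```python
def is_yes(reply: str) -> bool:
    """Bare return, or anything other than N or n, counts as yes."""
    return reply.strip().upper() != "N"


def render_radio(title: str, choices: list[str], default: int) -> str:
    """Return the choice list as text, marking the default with an X."""
    lines = [title]
    for number, label in enumerate(choices, start=1):
        mark = "X" if number == default else " "   # only one default
        lines.append(f"{mark} {number} {label}")
    return "\n".join(lines)


modes = [
    "order results enzyme by enzyme",
    "order results by position",
    "show only infrequent cutters",
    "show names above the sequence",
]
print(render_radio("Select output mode", modes, default=3))
```

The sketch covers only how the choices are displayed and how a yes/no reply is read; how a radio button choice is entered is not modelled here.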