.npa
.left margin2
.para
Introduction to the Staden sequence analysis package and its user interface
.PARA
The package contains the following programs:
.lit

GIP     Gel input program
SAP     Sequence assembly program
NIP     Nucleotide interpretation program
PIP     Protein interpretation program
SIP     Similarity investigation program
MEP     Motif exploration program
NIPL    Nucleotide interpretation program (library)
PIPL    Protein interpretation program (library)
SIPL    Similarity investigation program (library)

.end lit
.left margin2
GIP uses a digitiser for entry of DNA sequences from
autoradiographs.
.left margin2
SAP handles everything relating to assembling gel
readings in order to produce a consensus sequence. It can also deal with
families of protein sequences.
.left margin2
NIP provides functions for analysing and interpreting
individual nucleotide sequences.
.left margin2
PIP provides functions for analysing and interpreting
individual protein sequences.
.left margin2
MEP analyses families of nucleotide sequences to help discover new motifs.
.left margin2
NIPL performs pattern searches on nucleotide sequence libraries.
.left margin2
PIPL performs pattern searches on protein sequence libraries.
.left margin2
SIP provides functions for comparing and aligning
pairs of protein or nucleotide sequences.
.left margin2
SIPL searches nucleotide and protein sequence
libraries for entries similar to probe sequences.
.left margin2
.sk1
.para
Documentation
.para
As is explained below, the
programs SAP, NIP, PIP, SIP and MEP have online help,
and the help files have the names HELPSAP, HELPNIP, HELPPIP, HELPSIP and
HELPMEP. These
files can be displayed on the screen or printed using the appropriate
commands. Currently the help for the other programs is also contained in
these files. For example, help for NIPL is in HELPNIP. This introductory
file is called HELPSTADEN.
.para
Sequence formats
.para
The shotgun sequencing program SAP deals only with simple
text files for gel readings, and is a self-contained system.
However, as there is still no single agreed format
for finished sequences or for libraries of sequences,
the other programs in the package can read data that is stored in several ways.
.para
The analytical programs can read individual sequences stored in the following
formats:
Staden, EMBL, GenBank, PIR (also known as NBRF), and GCG, but for storing whole
libraries we use only PIR format. In addition,
these programs can perform a number of
simple operations using libraries stored in this format. They can extract
entries by entry name, search titles for keywords, search the whole
of the annotation files for keywords, and extract annotations for any
named entry.
We reformat all sequence libraries into PIR format. Currently we
have NBRF, EMBL, SWISSPROT and VECBASE libraries in PIR format.
.para
The library searching programs operate only
on sequences stored in PIR format.
.para
The analytical programs
will operate with uppercase or lowercase sequence
characters. In addition, T and U are equivalent. SAP uses uppercase letters
for original gel readings and lowercase letters for characters that are
corrected by the automatic editor.
Programs NIP and PIP use IUB symbols for redundancy in back translations
and for sequence searches.
The symbols are shown below.
.LIT

NC-IUB SYMBOLS

A,C,G,T
R        (A,G)       'puRine'
Y        (T,C)       'pYrimidine'
W        (A,T)       'Weak'
S        (C,G)       'Strong'
M        (A,C)       'aMino'
K        (G,T)       'Keto'
H        (A,T,C)     'not G'
B        (G,C,T)     'not A'
V        (G,A,C)     'not T'
D        (G,A,T)     'not C'
N        (G,A,C,T)   'aNy'

.end lit
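.para
The symbol table and the case and T/U conventions described above can be
sketched in a few lines of Python. This is an illustration only and is not
taken from the package: the names IUB, normalise, symbol_matches and
pattern_matches are hypothetical, and the sketch assumes a pattern is compared
with a sequence of the same length.
.lit
```python
# NC-IUB symbol table, transcribed from the list in this document.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "TC", "W": "AT", "S": "CG",
    "M": "AC", "K": "GT",
    "H": "ATC", "B": "GCT", "V": "GAC", "D": "GAT",
    "N": "GACT",
}


def normalise(base: str) -> str:
    """Fold case and treat U as T (both conventions are stated in the text)."""
    base = base.upper()
    return "T" if base == "U" else base


def symbol_matches(symbol: str, base: str) -> bool:
    """Return True if the IUB symbol allows the given definite base."""
    return normalise(base) in IUB[normalise(symbol)]


def pattern_matches(pattern: str, sequence: str) -> bool:
    """Hypothetical helper: compare a pattern with a sequence of equal length."""
    return len(pattern) == len(sequence) and all(
        symbol_matches(p, s) for p, s in zip(pattern, sequence)
    )
```
.end lit
.para
For example, under these assumptions pattern_matches("GANTC", "gaatc") is
true, because N allows any base and case is ignored.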
.PARA
The user interface
.PARA
The user interface is common to all programs.
It consists of a set of menus and a uniform way
of presenting choices and obtaining input
from the user. This section describes the
menu system; how options are selected and other choices made; how values
are supplied to the program; how help is obtained; and
how to escape from any part of a program. In addition, it gives information
about saving results in files and the use of graphics for presenting
results.
.para
Menus
.para
Each program has several menus and numerous options.
Each menu or option has a unique number that is used to
identify it. Menu numbers are distinguished from
option numbers by being preceded by the letter
m (or M; the programs make no distinction between
upper and lower case letters). With the exception of
some parts of program SAP, the menus are not hierarchical;
rather, the options they each contain are simply lists of
related functions and their identifying numbers.
Therefore options can be selected independently
of the menu that is currently being shown on the
screen, and the menus are simply memory aids.
All options and menus are selected by typing their
number when the programs present the prompt
.para
"? Menu or option number =".
.para
To select a menu, type its number preceded by
the letter M. To select an option, type its number.
If you type only "return" you will get menu m0,
which is simply a list of menus. If you select an
option you will return to the current menu after the function is completed.
.para
When you select an option, in many cases the
program will immediately perform the operation
selected without further dialogue. If you precede an option
number by the letter d (e.g. D17), you
will force the program to offer dialogue about the selected option
before the function operates,
hence allowing you to change the value of any of its parameters. If
you precede an option number by the symbol ? (e.g. ?17),
you will be given help on the option (here 17).
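.para
The replies accepted at the "? Menu or option number =" prompt can be
summarised by a small parser. The following Python is a sketch under stated
assumptions, not the package's own code: the names Command and parse_command
are hypothetical, and replies that fit none of the forms described above are
simply reported as unrecognised.
.lit
```python
from typing import NamedTuple, Optional


class Command(NamedTuple):
    action: str            # "menu", "option", "dialogue", "help" or "quit"
    number: Optional[int]  # None for "quit"


def parse_command(reply: str) -> Optional[Command]:
    """Interpret a reply to the menu prompt; None means unrecognised."""
    text = reply.strip().upper()   # upper and lower case are equivalent
    if text == "":
        return Command("menu", 0)  # "return" alone gives menu m0
    if text == "!":
        return Command("quit", None)
    prefixes = {"M": "menu", "D": "dialogue", "?": "help"}
    action = prefixes.get(text[0], "option")
    digits = text[1:] if text[0] in prefixes else text
    if not (digits.isascii() and digits.isdigit()):
        return None
    return Command(action, int(digits))
```
.end lit
.para
Under these assumptions, "m2" selects menu 2, "17" runs option 17, "d17"
forces dialogue for option 17 and "?17" asks for help on it.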
.para
Where possible, equivalent or identical options have been given the same
numbers in all programs, and so users quickly learn the numbers for
the functions they employ most often.
.para
Help
.para
As mentioned above, help about each option can be obtained by
preceding the option number by the symbol ? when you are presented
with the prompt "? Menu or option number =", but there are two further
ways of obtaining help. Whenever the program asks a question
you can respond by typing the symbol ? and you will receive information
about the current option. In addition, option number 1
in all the programs will give help on all of a program's functions.
.para
Quitting
.para
To exit from any point in a program, type ! for quit.
If a menu is on the screen this will stop the program; otherwise
you will be returned to the last menu.
.Para
Other interactions
.para
Questions are presented in a few restricted ways.
In all cases typing only "return" in response to a question means
yes, and typing N or n means no.
.para
Obvious opposites such as "clear screen" and "keep picture"
are presented with only the default shown. For example,
in this case the default is generally "keep picture", so the
program will display:
.para
"(y/n) (y) Keep picture"
.para
and the picture will be retained if the user types anything other than N or
n (in which case the screen will be cleared).
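.para
The yes/no convention can be written down as a single function. This is a
Python sketch only: the name read_yes_no is hypothetical, and treating ? and !
as help and quit here follows the general rules given in the "Help" and
"Quitting" sections rather than anything stated for this particular question.
.lit
```python
def read_yes_no(reply: str) -> str:
    """Classify a reply to a question such as "(y/n) (y) Keep picture"."""
    text = reply.strip()
    if text == "?":
        return "help"
    if text == "!":
        return "quit"
    # Only N or n means no; "return" alone or anything else means yes.
    return "no" if text.upper() == "N" else "yes"
```
.end lit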
.para
Where there are choices that are not obvious opposites, or
there are more than two choices, two further conventions are used:
"radio buttons" and "check boxes".
.para
Radio buttons are used when only one of a number of choices can be
made at any one time. The choices are presented arranged one above the
other, each choice with a number for its selection, and the default
choice marked with an X. For example, in the restriction
enzyme search routine the following choices are offered:
.para
.lit

Select output mode
  1 order results enzyme by enzyme
  2 order results by position
X 3 show only infrequent cutters
  4 show names above the sequence
? Selection (1-4) (3) =

.end lit
Any single option can be selected by typing the option number,
and the default option (here shown as 3) is also obtained by
typing only "return". Again, help can be obtained by typing ? and
you can quit by typing !.
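.para
The radio button convention amounts to a list with one marked default and a
prompt that accepts a number or "return". The Python below is a sketch, not
the package's code: render_radio and read_radio are hypothetical names, the
exact column layout is assumed, and handling of ? and ! is left out for
brevity.
.lit
```python
from typing import List, Optional


def render_radio(title: str, choices: List[str], default: int) -> str:
    """Lay out the choices with an X against the default, then the prompt."""
    lines = [title]
    for number, label in enumerate(choices, start=1):
        mark = "X" if number == default else " "
        lines.append(f"{mark} {number} {label}")
    lines.append(f"? Selection (1-{len(choices)}) ({default}) =")
    return "\n".join(lines)


def read_radio(reply: str, n_choices: int, default: int) -> Optional[int]:
    """Return the chosen number, the default for "return", or None if invalid."""
    text = reply.strip()
    if text == "":
        return default
    if text.isascii() and text.isdigit() and 1 <= int(text) <= n_choices:
        return int(text)
    return None
```
.end lit
.para
A caller would ask the question again whenever read_radio returns None.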
.para
Check boxes are used when any number of a set of choices can be
made (i.e. the choices are not exclusive). Choices are
made by typing choice numbers. Each choice can be considered
as a switch whose setting is reversed when it is selected. Choices that are
currently switched on are marked with an X.
The user quits from making selections by typing only
"return". For example, in the routine that plots base composition
you can plot the frequencies of any combination of bases, e.g. only
A, or A+T, or A+T+G etc.
The following check box is offered to the user:
.lit

X 1 T
  2 C
X 3 A
  4 G
? Selection (1-4) () =

.END LIT
As shown, this will plot the A+T composition. To switch off T
you select 1, to switch on C you select 2, and so on. To quit,
having set the bases required, you type only "return".
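.para
Because each check box behaves as a switch, a run of replies can be modelled
as a series of toggles ended by "return". The following Python is a sketch
with a hypothetical function name, apply_toggles. It assumes that replies
outside the allowed range are ignored, which the text does not specify.
.lit
```python
from typing import Iterable, Set


def apply_toggles(selected: Set[int], replies: Iterable[str], n_choices: int) -> Set[int]:
    """Reverse the setting of each choice typed, stopping at "return" alone."""
    current = set(selected)
    for reply in replies:
        text = reply.strip()
        if text == "":
            break  # "return" alone ends the selection
        if text.isascii() and text.isdigit() and 1 <= int(text) <= n_choices:
            current ^= {int(text)}  # on becomes off, off becomes on
    return current
```
.end lit
.para
Starting from the box shown above (1 and 3 switched on, i.e. T and A), the
replies "1", "2" and "return" leave 2 and 3 switched on, i.e. C and A.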
.para
Input of numerical values
.para
All input of integer or decimal numbers is presented in a
standard way with the allowed range shown in brackets and the default
value also in brackets. For example:
.para
? span (5-31) (11) =
.para
In this example you could type any number between 5 and 31,
or "return" only, or ! or ? (see above). Any other input will cause the
program to ask the question again. Typing only "return" gives the default
value (here 11).
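.para
The numeric prompt can be sketched in Python as a formatter and a validator.
The names format_prompt and read_integer are hypothetical, only integers are
handled, and the special replies ? and ! are assumed to be dealt with before
this function is called. A result of None stands for "ask the question again".
.lit
```python
from typing import Optional


def format_prompt(name: str, low: int, high: int, default: int) -> str:
    """Show the allowed range and the default, each in brackets."""
    return f"? {name} ({low}-{high}) ({default}) ="


def read_integer(reply: str, low: int, high: int, default: int) -> Optional[int]:
    """Return the value typed, the default for "return", or None if invalid."""
    text = reply.strip()
    if text == "":
        return default
    try:
        value = int(text)
    except ValueError:
        return None
    return value if low <= value <= high else None
```
.end lit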
.para
Use of the bell
.para
The programs use the bell to indicate that a task is completed.
This allows users to read textual results before they are scrolled up off
the screen, or to look at a plot before it is scrolled over by the menus.
When the bell sounds, the programs will wait
until "return" is typed. You can quit from these points by typing !, but
no help is available.
.para
Printing and saving results in files
.para
A few of the functions in the programs automatically write their textual
results
to disk files, but for most functions you can choose whether results
appear on the terminal screen or go to a file. This applies to both text
and graphical results.
For these functions
the normal, or default, place for results to
appear is on the screen, and users need to decide before the
function is selected if they want to redirect the results to a file.
In all programs, option number 7, "Direct output to disk", gives control
over whether results appear on the screen or go to a file. When a program
is started, results will be sent to the screen. If option 7 is selected,
users will be given the choice of redirecting either text or graphics to a
file. The program will then ask users to supply a file name. From that
point on all results will be sent to the file until option 7 is selected again,
in which case the "redirection file" will be closed, and results will once
more appear on the screen.
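.para
The behaviour of option 7 is that of a switch between the screen and a
redirection file. The Python class below is a simplified sketch, not the
package's implementation: OutputRouter and its methods are hypothetical
names, and it keeps a single stream rather than separate text and graphics
streams.
.lit
```python
import sys
from typing import Optional, TextIO


class OutputRouter:
    """Send results to the screen until redirected to a file, and back again."""

    def __init__(self, screen: TextIO = sys.stdout) -> None:
        self.screen = screen
        self.file: Optional[TextIO] = None  # None means "use the screen"

    def toggle(self, file_name: Optional[str] = None) -> None:
        """Mimic option 7: open a redirection file, or close the open one."""
        if self.file is None:
            if file_name is None:
                raise ValueError("a file name is needed to start redirection")
            self.file = open(file_name, "w")
        else:
            self.file.close()
            self.file = None

    def write(self, text: str) -> None:
        """Write to the redirection file if one is open, else to the screen."""
        target = self.file if self.file is not None else self.screen
        target.write(text)
```
.end lit
.para
As in the programs, the choice has to be made before a function writes its
results: anything written between two calls of toggle goes to the file.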
.para
If these files contain textual results they can be looked at
from within the programs
by using option 6, "List a text file". Once you leave the program
you can use an appropriate system command to print the files.
There is no function within the programs to direct files to a printer.
.para
The converse of the above is also possible. That
is, it is possible to redirect results that would normally go to a file,
so that they appear instead on the screen. This is often useful as a way
of checking results before saving them in a file. On a VAX using
VMS you do this by typing TT: for the name of the file that the
program would create. TT: is what VMS calls the screen.
.para
Use of graphics
.para
The analytical programs, including NIP, PIP and SIP, present the results of
many of their analyses graphically. The position at which the results for
any function appear on the screen is defined relative to a notional user's
"drawing board" of dimension 10,000 by 10,000. This drawing board fills the
screen and results are drawn in windows defined using symbols x0,y0 and
xlength,ylength,
where x0,y0 is the position of the bottom left hand corner of the window,
and xlength is the width of the window and ylength the
height of the window.
.lit

--------------------------------------------------------- 10,000
1                                                       1
1       --------------------------------------  ^       1
1       1                                    1  1       1
1       1                                    1  1       1
1       1                                    1 ylength  1
1       1                                    1  1       1
1       1                                    1  1       1
1       --------------------------------------  v       1
1       x0,y0^                                          1
1       <---------------xlength-------------->          1
--------------------------------------------------------- 1
1                                                  10,000

.end lit
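.para
To draw a window, drawing board units have to be scaled to the device in use.
The Python below is a sketch of one way to do this and is not taken from the
package. The function name window_to_pixels is hypothetical, and it assumes a
device whose pixel rows are counted from the top of the screen, so the
vertical axis is flipped.
.lit
```python
from typing import Tuple

BOARD_SIZE = 10_000  # the drawing board is 10,000 by 10,000 units


def window_to_pixels(
    x0: int, y0: int, xlength: int, ylength: int,
    screen_width: int, screen_height: int,
) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) in pixels, with row 0 at the top."""
    left = x0 * screen_width // BOARD_SIZE
    width = xlength * screen_width // BOARD_SIZE
    height = ylength * screen_height // BOARD_SIZE
    # x0,y0 is the bottom left corner, so the top edge lies at y0 + ylength.
    top = screen_height - (y0 + ylength) * screen_height // BOARD_SIZE
    return left, top, width, height
```
.end lit
.para
A window with x0,y0 = 0,0 and xlength,ylength = 10000,10000 covers the whole
screen whatever its size in pixels.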
.para
The window positions for each option are read from a file
when a program is started. If required, individual users can have their
own set of plot positions, and the positions
can also be redefined from within the
programs using option number 14.
.para
For those analyses that draw continuous lines to represent results
(for example a plot of base composition) the user is asked to supply the
"Plot interval". All the analyses produce a value for every point along the
sequence, but often it is unnecessary to actually plot the
values for all the points.
The plot interval is simply the distance between the points
shown on the screen. If the user selects a plot interval of 1, every point
will be plotted; a plot interval of 3 will show every third point. It is a
way of speeding up the display of results.
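.para
The effect of the plot interval can be shown with a short Python sketch. The
function name points_to_plot is hypothetical, and the sketch assumes that
plotting starts at the first position of the sequence, which the text does
not state.
.lit
```python
from typing import List, Sequence, Tuple


def points_to_plot(values: Sequence[float], plot_interval: int) -> List[Tuple[int, float]]:
    """Return (position, value) pairs to draw, with positions counted from 1."""
    if plot_interval < 1:
        raise ValueError("plot interval must be at least 1")
    # A value exists for every position; only every plot_interval-th is drawn.
    return [(i + 1, values[i]) for i in range(0, len(values), plot_interval)]
```
.end lit
.para
With seven values and a plot interval of 3, positions 1, 4 and 7 are drawn.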
.para
Saving graphics
.para
Many terminals are not capable of dumping their screen contents to a
file for subsequent printing. One convenient way of obtaining hard copy
of graphical results is to use a microcomputer as a terminal. On
the Macintosh we use the terminal emulator VersaTerm-PRO.
This allows graphics to be saved as
Macintosh files that can be annotated and printed using
MacDraw and other painting programs.
.para
Alternatively, graphics can be redirected to a file and printed using a
laser printer with Tektronix capability (see
"Printing and saving results in files").