General Information
(Not for the faint hearted)
30 September 1992
0. Introduction
---------------
This document contains information on the following subjects:
1. Installing the Staden Package on SPARCstations and DECstations
2. Installing the Staden Package on Other Machines
3. A Quick Guide to What's on the Release Tape
4. Overview of Data Flow During Sequence Assembly
5. Acknowledgements
1. Installing the Staden Package on SPARCstations and DECstations
-----------------------------------------------------------------
We are endeavouring to make the installation of the Staden Package as
quick and as easy as possible. In this current release we provide
statically linked sparc and mips executables as well as all sources.
To install the package:
1) Create a new directory for the software. You may have to log on as
superuser to do this.
% mkdir -p /home/BioSW/staden
2) Place the distribution tape in the drive and download the package:
-sun-
% tar xvf /dev/rst0
...system messages...
-dec-
% tar xvf /dev/rmt0h
...system messages...
3) Users of the C Shell should add the following to their .login
file:
setenv STADENROOT /home/BioSW/staden
source $STADENROOT/staden.login
Users of the Bourne shell should add the following to their .profile
file:
STADENROOT=/home/BioSW/staden
export STADENROOT
. $STADENROOT/staden.profile
4) When the user next logs on to the workstation, the required
initialisation will be performed automatically, and the programs in
the Staden package can be run. Refer to the help/*.MEM files for
information on the various programs (e.g. help on xdap is in
help/DAP.MEM).
2. Installing the Staden Package on Other Machines
--------------------------------------------------
This is a little more difficult as you will need to remake all the
executables. Your system configuration may also mean that some changes
will need to be made, though hopefully only to makefiles. We provide
a script to aid installation (we hope!), but you may prefer to make
all the components manually.
To remake the Staden package you will require the following:
1) A Fortran77 compiler
2) An ANSI C compiler
3) X11 Release 4, including the Athena Widget libraries.
Start by following steps 1 through 3 above, to unload the sources and
perform initialisations. Read the rest of this document and the other
help files. Look at the make files. Follow your nose!
If you have any problems or successes porting our software to other
platforms we would love to hear from you. We would also appreciate
receiving your general comments on the package.
Rodger Staden (principal author)
phone: +44 223 402389 email: rs@mrc-lmba.cam.ac.uk
post: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, U.K.
Simon Dear:
phone: +44 223 402266 email: sd@mrc-lmba.cam.ac.uk
post: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, U.K.
James Bonfield:
phone: +44 223 402499 email: jkb@mrc-lmba.cam.ac.uk
post: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, U.K.
3. A Quick Guide to What's on the Release Tape
----------------------------------------------
The directory structure on this tape is very important. Once set up, the Staden
package expects things to be in a predefined place. The root directory
of the structure is referred to by the environment variable
STADENROOT. Below this there should be at least the following:
1) bin/
All executable files and scripts should be in this directory.
$STADENROOT/bin is added to the search path by the script staden.login
(or staden.profile if you are using the Bourne Shell). Though you are
not forced to keep programs here, we find it is the simplest place to
keep them.
2) help/
All on-line help files are in this directory. Files of the form *.MEM
or *.mem are formatted ascii files and can be printed for personal
reference. The script staden.login sets up many environment variables
that refer to files in this directory, as well as modifying
XFILESEARCHPATH, which is used by X programs.
3) manl/
Local manual pages for ted and the Staden package are in this directory. The
environment variable MANPATH is modified in staden.login to search
here too.
4) staden.login and staden.profile
These two files are scripts to set up environment variables required
by the Staden package. C Shell users should source staden.login from
their .login file, and Bourne Shell users should "source" staden.profile
from their .profile file. See "Installing the Staden Package on
SPARCstations and DECstations", step 3.
5) tables/
Configuration files for the Staden package are in this directory.
Various environment variables are set in staden.login to refer to
files in this directory.
Also of use are the following:
doc/ - Miscellaneous documentation.
userdata/ - Sample databases
src/ - program sources
ReleaseNotes - Notes on this and future releases
Staden_install - Installation script
SequenceLibraries - Notes on the use and installation of sequence libraries
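The layout described above can be checked before anyone logs in and
finds that a program cannot locate its tables. What follows is only a
minimal sketch in Python, not part of the package: the function name
missing_entries is our own invention, and the fallback path is simply
the example location used in section 1.

```python
import os

# Entries this README says should exist directly below STADENROOT.
REQUIRED_DIRS = ("bin", "help", "manl", "tables")
REQUIRED_FILES = ("staden.login", "staden.profile")


def missing_entries(root):
    """Return the required entries that are absent below root.

    Hypothetical helper for illustration; directories are reported
    with a trailing slash.
    """
    missing = []
    for name in REQUIRED_DIRS:
        if not os.path.isdir(os.path.join(root, name)):
            missing.append(name + "/")
    for name in REQUIRED_FILES:
        if not os.path.isfile(os.path.join(root, name)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: fall back to the example path from section 1
    # when STADENROOT is not set in the environment.
    root = os.environ.get("STADENROOT", "/home/BioSW/staden")
    problems = missing_entries(root)
    if problems:
        print("Missing below %s: %s" % (root, ", ".join(problems)))
    else:
        print("%s has the expected layout" % root)
```

The optional entries (doc/, userdata/, src/ and so on) are deliberately
left out of the check, since the package does not depend on them at
run time.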
Program Sources
---------------
All the program sources are found in the directories in $STADENROOT/src:
0) Misc/
Sources for a library of useful routines used by the staden package.
** Should be made before the programs in staden/ **
1) staden/
Sources for the Staden suite: mep, xmep, nip, xnip, nipl, pip, xpip,
pipl, sap (now superseded by dap), xsap (now superseded by xdap), sip,
xsip, sipl, dap, xdap, splitp1, splitp2, splitp3, gip and convert_project.
2) ted/
Sources for the trace display and sequence editing program ted.
3) abi/
Sample scripts and programs for handling ABI 373A data files.
4) alf/
Sample scripts and programs for handling Pharmacia A.L.F. data files.
Each directory has appropriate makefiles and README files.
4. Overview of Data Flow During Sequence Assembly
-------------------------------------------------
During a sequence assembly project the data can enter the sequence
assembly program by various routes (see Figure 1 below).
       Fluorescent Based
      Sequencing Machine
         Chromatogram                      Autoradiogram
  ABI 373A      Pharmacia A.L.F.                 |
      |                 |                        |
      |                 |                        |
      |             alfsplit                     |
      |                 |                        |
      +--------+--------+                        |
               |                                 |
               |                                 |
              ted                              (gip)
               |                                 |
               +----------------+----------------+
                                |
                                |
                              xdap
Figure 1: Data Flow Through The Staden Suite
The Pharmacia A.L.F. data files in their original format consist of
one file for the (up to 10) samples that were on the gel. The program
alfsplit divides the file up so that each sample is in a file of
its own. From then on each gel reading can be handled individually.
Whether these files can be transferred back to the Compaq for
reprocessing is unknown.
All data from fluorescent based sequencing machines must pass through
the trace editing program ted. Ted allows vector sequence at the
5' end and unreliable data at the 3' end to be clipped. The sequence
can be edited if desired, though we should stress that this is NOT
RECOMMENDED when used in conjunction with xdap. Ted translates all
Pharmacia A.L.F. uncertainty codes to a hyphen ("-") and outputs the
clipped sequence, along with additional information on the position
and content of cutoffs, to a file.
People who want to use xdap with ABI and Pharmacia files, but who have
written their own trace clipping software, should be aware that xdap
requires information to be passed in the sequence file so that
traces can be displayed. You may want to modify your software to be
compatible with our file format. The file consists of five parts:
1) Cut off information (Optional).
Format is ";%6d%6d%6d%-4s%-16s", where
field 1 = total number of bases called
2 = number of bases in the clipped sequence at the 5' end
3 = number of bases in the sequence in this file
4 = type of trace file.
"ALF " - Pharmacia A.L.F.
"ABI " - ABI 373A
"SCF " - SCF
"PLN " - Text only
5 = name of trace file.
2) Content of the clipped sequence at the 5' end (Optional).
The sequence can extend over several lines. Each line must
begin with ";<" and should be less than 80 characters in
length.
3) Content of the clipped sequence at the 3' end (Optional).
The sequence can extend over several lines. Each line must
begin with ";>" and should be less than 80 characters in
length.
4) Initial tags for the sequence (Optional).
Format is: ";;%4s %6d %6d %s\n", where
field 1 = type of tag to be created (see $STADTABL/TAGDB)
2 = position of tag
3 = length of tag
4 = annotation for tag (optional)
This feature is only available in the program xbap, which
at the time of writing is not yet being distributed with
the package.
5) The sequence, which can extend over several lines. Each
line should be less than 80 characters in length.
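For those adapting their own clipping software, the five parts can be
written out roughly as follows. This is a sketch in Python rather than
code from the package: the names wrap and format_reading are
hypothetical, the 50 character line width is merely a choice that stays
under the 80 character limit, and we assume here that the total number
of bases called equals the 5' cutoff plus the kept sequence plus the 3'
cutoff. Python's % operator accepts the same conversion specifiers as
the printf formats quoted above.

```python
def wrap(text, prefix="", width=50):
    """Split text into lines of `width` characters, each led by prefix."""
    return [prefix + text[i:i + width] for i in range(0, len(text), width)]


def format_reading(sequence, left="", right="", trace_type="PLN",
                   trace_name="", tags=()):
    """Return one reading as text in the five-part layout.

    sequence   -- the clipped sequence that the assembler will use
    left/right -- the bases clipped from the 5' and 3' ends
    tags       -- iterable of (type, position, length, annotation)
    """
    # Assumption: total bases called = 5' cutoff + sequence + 3' cutoff.
    total = len(left) + len(sequence) + len(right)
    lines = [";%6d%6d%6d%-4s%-16s"
             % (total, len(left), len(sequence), trace_type, trace_name)]
    lines += wrap(left, ";<")    # part 2: 5' cutoff
    lines += wrap(right, ";>")   # part 3: 3' cutoff
    for tag_type, position, length, note in tags:
        lines.append(";;%4s %6d %6d %s" % (tag_type, position, length, note))
    lines += wrap(sequence)      # part 5: the sequence itself
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_reading("CAAGACATTT", left="AGCTT", right="-GATA",
                         trace_type="ABI", trace_name="demo.s1RES"),
          end="")
```

Note that the "%-16s" conversion pads the trace file name with trailing
blanks, so the first line of the output normally ends in spaces.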
Here is a sample file:
;   660    55   450ABI a21d12.s1RES
;<AGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCCGGTTCCTTCTGG
;<ATATC
;>-GATAAGCTGATTTG-TTT-CCATTATGGC-GGTTTGAGCCTC-G-GGTC
;>GACCACTCGGTGTGCCAGGAAGGGGTCTGAAATTGAATGGGTTATCACTA
;>GGCGACGTTT--TTTTCAAATTCCGGGCTAAATTTTACGGC-GGA-CGGT
;>TCCG-
;;COMM      1     10 M13mp18 subclone
CAAGACATTTTGAAATACTTGGAATACTGAATCCAAGATGTGGAACATTA
GACATATCCGTGTGCTCAACAATCGACATTTGATCCACTGATGAAAATGT
TCTTCGTTTAGAATTTCTCATAGCATCAGCCACTTTTGCATAATACTCGA
TTGAAGGTTCATGGAAAAAGCTGCGTAGAAGGCATGTCATTGTGCTTACG
AGCCATTTCGGATATCTTGTGAATTTAGCAGGAAGTTCTGTAACTGGTTG
GAATTCAAATATATCAGTTCTTCTTCCTGGATCTCGTCCTTTTTGCACTA
AAACCATTGCGATTGCATCCGGATTCTGAGTAAGAGCCACTACAGCTTTA
TGATACAGGCTCTTGTTATTCCTTTCGTGCTCGAATGGGAACTTTCCAGT
GGCACAAAAATATAGTGTACATCCCAGAGCCCATAGATCACATGTTCCGA
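Reading such a file back is mostly a matter of looking at the first
two characters of each line. The following Python sketch shows one way
it might be done; it is not how xdap itself reads the file, the names
parse_reading and lengths_consistent are our own, and it assumes that
the fixed-width columns of the first line are intact (six characters
for each count, four for the trace type).

```python
def parse_reading(text):
    """Split a reading in the layout described above into its parts."""
    reading = {"header": None, "left": "", "right": "",
               "tags": [], "sequence": ""}
    for line in text.splitlines():
        if line.startswith(";;"):
            # ";;%4s %6d %6d %s": type, position, length, annotation.
            fields = line[2:].split(None, 3)
            note = fields[3] if len(fields) > 3 else ""
            reading["tags"].append(
                (fields[0], int(fields[1]), int(fields[2]), note))
        elif line.startswith(";<"):
            reading["left"] += line[2:].strip()    # 5' cutoff
        elif line.startswith(";>"):
            reading["right"] += line[2:].strip()   # 3' cutoff
        elif line.startswith(";"):
            # ";%6d%6d%6d%-4s%-16s", sliced by column position.
            reading["header"] = {
                "total": int(line[1:7]),
                "left_len": int(line[7:13]),
                "seq_len": int(line[13:19]),
                "trace_type": line[19:23].strip(),
                "trace_name": line[23:39].strip(),
            }
        elif line.strip():
            reading["sequence"] += line.strip()
    return reading


def lengths_consistent(reading):
    """Check the counts on the first line against the bases found."""
    header = reading["header"]
    return (header is not None
            and header["left_len"] == len(reading["left"])
            and header["seq_len"] == len(reading["sequence"]))


if __name__ == "__main__":
    demo = "\n".join([
        ";%6d%6d%6d%-4s%-16s" % (12, 3, 5, "ABI", "demo.s1RES"),
        ";<AGC",
        ";>-GAT",
        ";;%4s %6d %6d %s" % ("COMM", 1, 5, "demo tag"),
        "CAAGA",
    ])
    parsed = parse_reading(demo)
    print(parsed["header"], lengths_consistent(parsed))
```

Because the cutoff information is optional, a file with no first line
of this kind leaves "header" as None, and the consistency check then
reports False rather than raising an error.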
5. Acknowledgements
-------------------
We would like to thank Applied Biosystems, Inc. and Pharmacia LKB
Biotechnology for their cooperation in agreeing to our routines
accessing the data files of their fluorescent sequencing machines.
373A sequence data file formats are the exclusive property of Applied
Biosystems, Inc.
ALF sequence data file formats are the exclusive property of Pharmacia
LKB Biotechnology, Inc.