commit f8b7bfff1b9e933830c089d9d7f24a7349326363 Author: starsareintherose Date: Sat Dec 4 05:07:58 2021 +0000 init diff --git a/README.txt b/README.txt new file mode 100644 index 0000000..f8c2986 --- /dev/null +++ b/README.txt @@ -0,0 +1,296 @@ + General Information + (Not for the faint hearted) + + 30 September 1992 + + +0. Introduction +--------------- + +This document contains information on the following subjects: + + 1. Installing the Staden Package on SPARCstations and DECstations + 2. Installing the Staden Package on Other Machines + 3. A Quick Guide to What's on the Release Tape + 4. Overview of Data Flow During Sequence Assembly + 5. Acknowledgements + + + +1. Installing the Staden Package on SPARCstations and DECstations +----------------------------------------------------------------- + +We are endeavouring to make the installation of the Staden Package as +quick and as easy as possible. In this current release we provide +statically linked sparc and mips executables as well as all sources. + +To install the package: + +1) Create a new directory for the software. You may have to log on as +superuser to do this. + + % mkdir -p /home/BioSW/staden + +2) Place the distribution tape in the drive and download the package: + + -sun- + % tar xvf /dev/rst0 + ...system messages... + + -dec- + % tar xvf /dev/rmt0h + ...system messages... + +3) Users of the C Shell should add the following to their .login +file: + + setenv STADENROOT /home/BioSW/staden + source $STADENROOT/staden.login + +Users of the Bourne shell should add the following to their .profile +file: + + STADENROOT=/home/BioSW/staden + export STADENROOT + . $STADENROOT/staden.profile + + +4) When the user next logs onto the work station the required +initialisation will automatically be performed, and the programs in +the Staden package can be run. Refer to the help/*.MEM files for +information on the various programs. (eg help on xdap is in +help/DAP.MEM) + + +2.
Installing the Staden Package on Other Machines +-------------------------------------------------- + +This is a little more difficult as you will need to remake all the +executables. Your system configuration may also mean that some changes +will need to be made, though hopefully only to makefiles. We provide +a script to aid installation (we hope!), but you may prefer to make +all the components manually. + +To remake the Staden package you will require the following: + 1) A Fortran77 compiler + 2) An ANSI C compiler + 3) X11 Release 4, including the Athena Widget libraries. + +Start by following steps 1 through 3 above, to unload the sources and +perform initialisations. Read the rest of this document and the other +help files. Look at the make files. Follow your nose! + +If you have any problems or successes porting our software to other +platforms we would love to hear from you. We would also appreciate +receiving your general comments on the package. + +Rodger Staden (principal author): + phone: +44 223 402389 email: rs@mrc-lmba.cam.ac.uk + post: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, U.K. +Simon Dear: + phone: +44 223 402266 email: sd@mrc-lmba.cam.ac.uk + post: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, U.K. +James Bonfield: + phone: +44 223 402499 email: jkb@mrc-lmba.cam.ac.uk + post: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, U.K. + + + +3. A Quick Guide to What's on the Release Tape +---------------------------------------------- + +The directory structure on this tape is very important. Once set up, the Staden +package expects things to be in a predefined place. The root directory +of the structure is referred to by the environment variable +STADENROOT. Below this there should be at least the following: + +1) bin/ +All executable files and scripts should be in this directory.
+$STADENROOT/bin is added to the search path by the script staden.login +(or staden.profile if you are using the Bourne Shell). Though you are +not forced to keep programs here, we find it is the simplest place to +keep them. + +2) help/ +All on-line help files are in this directory. Files of the form *.MEM +or *.mem are formatted ascii files and can be printed for personal +reference. The script staden.login sets up many environment variables +that refer to files in this directory, as well as modifying +XFILESEARCHPATH, which is used by X programs. + +3) manl/ +Local manual pages for ted and the staden package are in this directory. The +environment variable MANPATH is modified in staden.login to search +here too. + +4) staden.login and staden.profile +These two files are scripts to set up environment variables required +by the Staden package. C Shell users should source staden.login from +their .login file, and Bourne Shell users should "source" staden.profile +from their .profile file. See "Installing the Staden Package on +SPARCstations and DECstations", step 3. + +5) tables/ +Configuration files for the Staden package are in this directory. +Various environment variables are set in staden.login to refer to +files in this directory. + +Also of use are the following: + +doc/ - Miscellaneous documentation. +userdata/ - Sample databases +src/ - program sources +ReleaseNotes - Notes on this and future releases +Staden_install - Installation script +SequenceLibraries - Notes on the use and installation of sequence libraries + + +Program Sources +--------------- + +All the program sources are found in the directories in $STADENROOT/src: + +0) Misc/ +Sources for a library of useful routines used by the staden package.
+** Should be made before the programs in staden/ ** + +1) staden/ +Sources for the Staden suite: mep, xmep, nip, xnip, nipl, pip, xpip, +pipl, sap (now superseded by dap), xsap (now superseded by xdap), sip, +xsip, sipl, dap, xdap, splitp1, splitp2, splitp3, gip and convert_project. + +2) ted/ +Sources for the trace display and sequence editing program ted. + +3) abi/ +Sample scripts and programs for handling ABI 373A data files. + +4) alf/ +Sample scripts and programs for handling Pharmacia A.L.F. data files. + +Each directory has appropriate makefiles and README files. + + + +4. Overview of Data Flow During Sequence Assembly +------------------------------------------------- + +During a sequence assembly project the data can enter the sequence +assembly program from various routes (See Figure below). + + + + Fluorescent Based + Sequencing Machine + Chromatogram Autoradiogram + + ABI 373A Pharmacia A.L.F. | + | | | + | | | + | alfsplit | + | | | + +--------+--------+ | + | | + | | + ted (gip) + | | + +----------------+----------------+ + | + | + xdap + + + Figure 1: Data Flow Through The Staden Suite + + +The Pharmacia A.L.F. data files in their original format consist of +one file for the (up to 10) samples that were on the gel. The program +alfsplit divides the file up so that each sample is in a file of +its own. From then on each gel reading can be handled individually. +Whether these files can be transferred back to the Compaq for +reprocessing is unknown. + +All data from fluorescent based sequencing machines must pass through +the trace editing program ted. Ted allows data vector sequence at the +5' end and unreliable data at the 3' end to be clipped. The sequence +can be edited if desired, though we should stress that this is NOT +RECOMMENDED when used in conjunction with xdap. Ted translates all +Pharmacia A.L.F.
uncertainty codes to a hyphen ("-") and outputs the +clipped sequence, along with additional information on the position +and content of cutoffs, to a file. + +People wanting to use xdap with ABI and Pharmacia files, but who have +written their own trace clipping software should be aware that xdap +requires information to be passed in the sequence file so that +traces can be displayed. You may want to modify your software to be +compatible with our file format. The file consists of five parts: + + 1) Cut off information (Optional). + Format is ";%6d%6d%6d%-4s%-16s", where + field 1 = total number of bases called + 2 = number of bases in the clipped sequence at the 5' end + 3 = number of bases in the sequence in this file + 4 = type of trace file. + "ALF " - Pharmacia A.L.F. + "ABI " - ABI 373A + "SCF " - SCF + "PLN " - Text only + 5 = name of trace file. + + 2) Content of the clipped sequence at the 5' end (Optional). + The sequence can extend over several lines. Each line must + begin with ";<" and should be less than 80 characters in + length. + + 3) Content of the clipped sequence at the 3' end (Optional). + The sequence can extend over several lines. Each line must + begin with ";>" and should be less than 80 characters in + length. + + 4) Initial tags for the sequence (Optional) + Format is: ";;%4s %6d %6d %s\n", where + field 1 = type of tag to be created (see $STADTABL/TAGDB) + 2 = position of tag + 3 = length of tag + 4 = annotation for tag (optional) + This feature is only available in the program xbap, which + at the time of writing is not yet being distributed with + the package. + + 5) The sequence, which can extend over several lines. Each + line should be less than 80 characters in length.
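As a rough illustration of the layout just described, the following Python sketch splits such a file into its parts. It is not part of the package: the function name parse_reading_file and the dictionary keys are hypothetical, and the column positions are only what the format strings above imply (three 6-character integers, a 4-character trace type and a 16-character trace name after the leading ";").

```python
def parse_reading_file(text):
    """Split a reading file into header, clipped ends, tags and sequence.

    Hypothetical helper: field positions are assumed from the format
    strings ";%6d%6d%6d%-4s%-16s" and ";;%4s %6d %6d %s".
    """
    parts = {"header": None, "left": "", "right": "", "tags": [], "sequence": ""}
    for line in text.splitlines():
        if line.startswith(";;"):
            # Tag line: 4-character type, then position, length, annotation.
            tag_type = line[2:6]
            fields = line[6:].split(None, 2)
            annotation = fields[2] if len(fields) > 2 else ""
            parts["tags"].append(
                (tag_type, int(fields[0]), int(fields[1]), annotation))
        elif line.startswith(";<"):
            # Clipped sequence at the 5' end, possibly over several lines.
            parts["left"] += line[2:].strip()
        elif line.startswith(";>"):
            # Clipped sequence at the 3' end, possibly over several lines.
            parts["right"] += line[2:].strip()
        elif line.startswith(";"):
            # Cut off information in fixed-width columns.
            try:
                parts["header"] = {
                    "total_bases": int(line[1:7]),
                    "left_clipped": int(line[7:13]),
                    "sequence_length": int(line[13:19]),
                    "trace_type": line[19:23].strip(),
                    "trace_name": line[23:39].strip(),
                }
            except ValueError:
                pass  # not a cut off line; this sketch simply ignores it
        else:
            # Everything else is the sequence itself.
            parts["sequence"] += line.strip()
    return parts
```

Fixed-width slicing is used for the cut off line because the trace type can run straight on from the third number with no separating space.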
+ +Here is a sample file: + +; 660 55 450ABI a21d12.s1RES +;-GATAAGCTGATTTG-TTT-CCATTATGGC-GGTTTGAGCCTC-G-GGTC +;>GACCACTCGGTGTGCCAGGAAGGGGTCTGAAATTGAATGGGTTATCACTA +;>GGCGACGTTT--TTTTCAAATTCCGGGCTAAATTTTACGGC-GGA-CGGT +;>TCCG- +;;COMM 1 10 M13mp18 subclone +CAAGACATTTTGAAATACTTGGAATACTGAATCCAAGATGTGGAACATTA +GACATATCCGTGTGCTCAACAATCGACATTTGATCCACTGATGAAAATGT +TCTTCGTTTAGAATTTCTCATAGCATCAGCCACTTTTGCATAATACTCGA +TTGAAGGTTCATGGAAAAAGCTGCGTAGAAGGCATGTCATTGTGCTTACG +AGCCATTTCGGATATCTTGTGAATTTAGCAGGAAGTTCTGTAACTGGTTG +GAATTCAAATATATCAGTTCTTCTTCCTGGATCTCGTCCTTTTTGCACTA +AAACCATTGCGATTGCATCCGGATTCTGAGTAAGAGCCACTACAGCTTTA +TGATACAGGCTCTTGTTATTCCTTTCGTGCTCGAATGGGAACTTTCCAGT +GGCACAAAAATATAGTGTACATCCCAGAGCCCATAGATCACATGTTCCGA + + + +5. Acknowledgements + +We would like to thank Applied Biosystems, Inc. and Pharmacia LKB +Biotechnology for their cooperation in agreeing to our routines +accessing the data files of their fluorescent sequencing machines. + +373A sequence data file formats are the exclusive property of Applied +Biosystems, Inc. + +ALF sequence data file formats are the exclusive property of Pharmacia +LKB Biotechnology, Inc. + diff --git a/ReleaseNotes b/ReleaseNotes new file mode 100644 index 0000000..896cae1 --- /dev/null +++ b/ReleaseNotes @@ -0,0 +1,190 @@ + Release Notes for Staden Package 1992.3 + --------------------------------------- + + + Installation guide + ------------------ + +The file doc/install.PS contains installation instructions. + + + Manual for the Staden Package + ----------------------------- + +There is now a 135 page manual on the Staden Package. It is currently +being distributed as a Word4 document on a Macintosh floppy disk. + + + Feedback and bug reports + ------------------------ + +We welcome comments and suggestions on all aspects of the package and are +best contacted by email: rs@uk.ac.cam.mrc-lmb and sd@uk.ac.cam.mrc-lmb. +All abnormal terminations are bugs and we would like to be told of them +so they can be fixed.
We recommend that you request an update at least once +a year as the package is evolving very rapidly. + +Note: due to popular demand we have decided to release new routines earlier +than in the past, so please report bugs. The documentation for additions may +be sparser than before, or non-existent, but if there is something with which +you need help, email us. + + + Changes this release + -------------------- + + + The assembly programs bap and xbap have several new functions: + 1. Find single stranded regions and try to fill them with "hidden" + data from the adjacent readings. + 2. Find single stranded regions (includes ends of contigs) and + select primers and templates for double stranding them (joining + them). + 3. Pre assembly screening for readings to find those that align + best. Optionally the hidden data can also be included in the + comparison (part of assembly function). + 4. Find pairs of readings taken from opposite ends of the same + template (ie forward and reverse read pairs). List or plot their + positions. + 5. A new function to check that readings have been assembled into + the correct positions. It aligns the hidden (previously termed "unused") + parts of readings with the consensus they overlap to see how well + they align. Poor alignments are reported. + 6. During assembly each reading is now allowed to match up to 100 + different places. + + It might be guessed from the above that we are trying to improve our + ability to deal with the assembly of human data. Hence, also the next + addition. + + A new experimental program (rep) for screening readings for Alu + sequences prior to assembly. The Alu containing segments are tagged + so they can be seen in the contig editor. A library of Alu sequences + is included in /tables/alus. The program is quite slow as it compares + each reading in both orientations with all of the Alu sequences (126 + of them) in order to find the best match.
Only time and more data will + tell how sensitive it is, and whether the current default score of 0.6 + is "correct". BEWARE rep modifies the original reading files to include + the tag information. The only information is in /help/alu.help + + A new program for extracting sets of sequences and their annotations + from the sequence libraries (lip). The only information is in + /help/lip.help + + Changes to the xterm user interface. These routines have been completely + rewritten. One addition is that now ?? in response to a question will + allow the user to get help on any function in a program. help is also + improved in the x version. + + + Changes last release + -------------------- + + + DAP, XDAP have been replaced by BAP and XBAP (see below) + + A new function for examining repeats has been added to NIP + + A new repeat search has been added to SIP + + Some outputs have been changed to produce FASTA format files + instead of PIR. + + MEP now allows searches for motifs in which any 8 out of a string + of 20 can be switched on. + + The manual has been updated. + + Keyword and author searches on sequence libraries + + All programs that use the libraries can now perform author +and keyword searches on all libraries (only nip did so before). + + Postscript output + + All graphics can now be saved to disk in postscript form by +use of a sub-option in "Redirect output". + + + + Sequence assembly + +BAP, XBAP replace DAP and XDAP. A program to convert DAP databases to BAP +databases (convert) is included. BAP databases can contain up to 8000 readings +and a consensus of 500,000 bases. A minor edit and recompilation will allow +up to 99,999 readings. The space is used more efficiently now as the databases +grow as the number of readings increases. Reading names can be 16 characters +in length. In addition: + +1) Assembly is 4 times as fast as in the DAP.
+ +2) Find internal joins is 5 times as fast and now brings up the join editor +with the two contigs in the correct orientation and aligned. + +3) The assembly routines align pads better, plus a new automatic function can +also be used to align them prior to editing. + +4) The contig editor has been greatly speeded up and its functionality +has been enhanced. + +5) A routine for selecting oligos for primer walking is included. + +6) A new routine allows batches of readings to be removed from a database. + +7) We have also included routines for making SCF files, for getting the +sequence from SCF files, and one for marking the poor quality data in +readings. See the manual. + + Sequence library formats + + The standard sequence library indexing method is now that used on the +EMBL CD-ROM. The libraries (EMBL nucleotide and SWISSPROT protein) can be +left on the CD-ROM or copied to disk. We include in the package programs +for creating this type of index for EMBL updates, PIR in codata format, +NRL3D and GenBank. If the indexes are created all programs can read all +these libraries. Programs and scripts for this task are contained in the +directory indexseqlibs. + The keyword and author searches are particularly fast and the +keyword index is based on ALL text in the files - not just the keywords. + + Feature table formats + + The programs now use the new feature table format common to EMBL +and GenBank, but retain the old format for SWISSPROT which has not yet +changed. + + For details of the above see file SequenceLibraries. + + Pattern searches + + Pipl and Nipl now have the facility to find only the best scoring +match for each sequence. The prompt is "? report all matches", so typing +only return means all matches will be shown and typing n means only the +highest scoring will be reported. It is particularly useful when employed +to create alignments. The corresponding help file has not been updated. 
+Also to incorporate long unix file names the pattern files no longer include +the annotation "filename". + + + Nip + + Option 38 in nip "translate and list" has been removed as the +more flexible routines of option 39 incorporate all its functionality. Many +options that relate to feature tables have been modified but their help files +are not yet up to date. + + + Vep + + A program (vep) for automatic excising of vector (either +sequencing vector or cosmid vector) sequences from readings is now +included in the package. + + + + + Rodger Staden, Simon Dear, James Bonfield + + + + diff --git a/SequenceLibraries b/SequenceLibraries new file mode 100644 index 0000000..5f22ff9 --- /dev/null +++ b/SequenceLibraries @@ -0,0 +1,420 @@ + Notes on library handling + ------------------------- + +Contents of this document: + +I) Introduction +II) Details of file organisation and use +III) Options currently available +IV) Installation guide +V) New feature table handling routines +VI) Indexing the sequence libraries + + + Section I Introduction + ---------------------- + +Available sequence libraries + +There are a number of different sequence libraries for nucleotide and protein: +PIR, GenBank, EMBL, Swissprot, and the Japanese Databank. Even after all the +years of their existence they still use different formats for their data. This +provides tedious and unrewarding work for software developers. Recently EMBL +and GenBank agreed a new and common way of writing their feature tables, which +is a great help, although the rest of their format is different. Swissprot still +uses the old embl style feature table format and PIR yet another. + +All the libraries distribute their data on magnetic tapes and EMBL and GenBank +have started to distribute on cdrom. The EMBL cdrom also contains Swissprot. +The GenBank and EMBL cdroms use different formats and have different contents.
+The EMBL cdrom has useful indexes sorted alphabetically: those for entry name +and accession number, brief descriptions, keywords and freetext indexes are +already available and others are expected. These indexes point to the data for +each entry, and can be used to extract the data for any entry quickly. + +Moving to unix + +The VAX version of our package used PIR format which meant reformatting all +libraries other than PIR into that format. This required, at least +temporarily, having space for two copies of the libraries, and quite a lot of +cpu time. The software for doing this was provided by PIR, and is very VAX +specific and hence will not run under unix. For the unix version of our package +I have decided to use the EMBL cdrom format and its indexes as the primary +format. The current programs also support the use of PIR format libraries +without indexes - ie just the sequence and annotation files. + +Indexing GenBank, EMBL updates, PIR and NRL3D + +We include programs to create indexes for the above libraries. See below and +the README file in indexseqlibs. The programs can read all the above libraries +once the indexes are created. The indexing programs index the data in its +distributed form: WE DO NOT REFORMAT OR COPY THE LIBRARIES but simply create +indexes to the original files. Obviously this saves a lot of disk space, and +for those content to use only embl and swissprot from the cdrom, almost no disk +space is required. We haven't tried it yet, but for genbank on cdrom, the only +extra disk space required would be for the indexes. + + --------------------------------------------------------------------------- + + Section II Details of file organisation and use + ----------------------------------------------- + +The following strategy has been used to try to deal with alternate +and changing sequence library formats.
+ +1) libraries are described at several levels: + + a) the top level file is a list of available libraries which contains: + the library type, the name of the file containing the name of + each library's individual files, and the prompt to appear on + the user's screen: LTYPE LOGNAM PROMPT + + b) the file containing the names of the library's individual files + contains flags to define the file types: FTYPE LOGNAM + + c) the individual library files + + + +2) library types handled: + + a) EMBL/SWISSPROT in distributed format with cdrom index format + LTYPE = 'A' + b) GenBank in distributed format with cdrom index format LTYPE = 'C' + c) PIR/NRL3D in CODATA format with cdrom index format LTYPE = 'B' + d) PIR/NBRF .seq files can be read sequentially as "personal files + in PIR format" and do not appear in the list of available libraries. + e) FASTA format files can be read sequentially as "personal files + in FASTA format" and do not appear in the list of available + libraries. + +3) EMBL, SWISSPROT and other libraries for which EMBL-style indexes have been +created + + current file types: + + A division.lookup + B entryname.index + C accession.target + D accession.hits + E brief description + F freetext.target + G freetext.hits + H author.target + I author.hits + + + Library list +level 1 + | + | + ----------------------------------------------------------- + | | | + lib 1 file list lib 2 file list lib 3 file list +level 2 + | | + -------- --------- +level 3 + file 1 file 1 + file 2 file 2 + . . + file n file n + + --------------------------------------------------------------------------- + + +Example +------- + +Level 1 + + File name: sequence.libs + Environment variable: SEQUENCELIBRARIES + Contents: + +A EMBLFILES EMBL nucleotide library ! in cdrom format +C GENBFILES GenBank nucleotide library! +A SWISSFILES SWISSPROT protein library! in cdrom format +B PIRFILES PIR protein library! +B NRL3DFILES NRL3D protein library!
+ + Notes: + +The libraries have types A,B,C. The logical names are EMBLFILES and +SWISSFILES, etc and the prompts are 'EMBL nucleotide library' and +'SWISSPROT protein library', etc. Anything to the right of a ! is a comment. + +Level 2: the list of library files (using embl as an example) + + File name: embl.files + Environment variable: EMBLFILES + Contents: + +A EMBLDIVPATH/embl_div.lkp +B EMBLINDPATH/entrynam.idx +C EMBLINDPATH/acnum.trg +D EMBLINDPATH/acnum.hit +E EMBLINDPATH/brief.idx +F EMBLINDPATH/freetext.trg +G EMBLINDPATH/freetext.hit +H EMBLINDPATH/author.trg +I EMBLINDPATH/author.hit + + +Level 3: the sequence and annotation files (eg 15 for embl, 1 for swissprot). + + Paths and file names: + + EMBLPATH/bb.dat + EMBLPATH/fun.dat + EMBLPATH/inv.dat + EMBLPATH/mam.dat + EMBLPATH/org.dat + EMBLPATH/patent.dat + EMBLPATH/phg.dat + EMBLPATH/pln.dat + EMBLPATH/pri.dat + EMBLPATH/pro.dat + EMBLPATH/rod.dat + EMBLPATH/syn.dat + EMBLPATH/una.dat + EMBLPATH/vrl.dat + EMBLPATH/vrt.dat + +All files from the division lookup file down are exactly as they appear on the +cdrom. The division lookup file relates numbers stored in the indexes to +actual division (or data) files stored on the disk. We rewrite it so the +directory structure and file names can be chosen locally. Its format is +I6,1x,A. An example is given below.
+ + Division lookup file + + File name: STADTABL/embl_div.lkp + Environment variable path EMBLDIVPATH + Contents: + + 1 EMBLPATH/bb.dat + 2 EMBLPATH/fun.dat + 3 EMBLPATH/inv.dat + 4 EMBLPATH/mam.dat + 5 EMBLPATH/org.dat + 6 EMBLPATH/patent.dat + 7 EMBLPATH/phg.dat + 8 EMBLPATH/pln.dat + 9 EMBLPATH/pri.dat + 10 EMBLPATH/pro.dat + 11 EMBLPATH/rod.dat + 12 EMBLPATH/syn.dat + 13 EMBLPATH/una.dat + 14 EMBLPATH/vrl.dat + 15 EMBLPATH/vrt.dat + --------------------------------------------------------------------------- + + + Section III Options currently available + --------------------------------------- + +Facilities currently offered in nip,pip,sip,nipl,pipl,sipl: + + Get a sequence by knowing its entry name + Get a sequence's annotation by knowing its entry name + Get an entry name by knowing its accession number + Search the freetext index + Search the author index + +Facilities currently offered in nipl,pipl,sipl: + + Search whole library + Search only a list of entry names + Search all but a list of entry names + +Outline of each type of operation + +Looking for an entry by name: the programs will open the library description +file and read the names of its files and their file types. Then they will open +the entrynam.idx file, and find the sequence offset, annotation offset and +division number. Then open the division lookup file, find the file name for the +division required, open that file, seek to the required byte and get the data. + +Looking for an entry by accession number: the programs will open the library +description file and read the names of its files and their file types. Then +they open the acnum.trg and acnum.hit files. The acnum.trg file is read to find +the accession number and a pointer to the acnum.hit file and the number of +hits. That file is read and the corresponding entry names displayed. At +present no further action is performed, although I expect to list out the +titles for the entries found.
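The division lookup step used when fetching an entry by name can be sketched in Python as below. This is only an illustration under stated assumptions: the function name parse_division_lookup is hypothetical, and the column positions are taken from the I6,1x,A format given earlier (a 6-column integer, one skipped column, then the file name). How a prefix such as EMBLPATH is expanded to a real directory is not covered here.

```python
def parse_division_lookup(text):
    """Map division numbers to data file names (hypothetical helper).

    Assumes each record follows the Fortran format I6,1x,A.
    """
    divisions = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        number = int(line[0:6])      # I6: integer in columns 1-6
        filename = line[7:].strip()  # 1x skips column 7; A is the rest
        divisions[number] = filename
    return divisions
```

Given the division number read from entrynam.idx, a caller would look up the file name in the returned dictionary, open that file, and seek to the stored byte offset.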
+ +Searching the whole of a library: the programs will open the library +description file and read the names of its files and their file types. Then +they open the division lookup file, read the names and numbers of the sequence +files, open all of them, then open the entryname file. Then the library is +processed sequentially by reading the entry names, their sequence offsets and +division numbers from the entry names file, and then the sequence from the +appropriate data file. + +Searching the whole of a library using a list of entry names to include: the +programs will open the library description file and read the names of its files +and their file types. Then they open the division lookup file, read the names +and numbers of the sequence files, open all of them, then open the entryname +file. Then the library is processed by reading the list of entry names and +finding the names in the entry names file to get their sequence offsets and +division numbers, and then the sequence from the appropriate data file. It will +stop when it reaches the end of the list of entry names. The list of entry +names can be in any order. + +Searching the whole of a library using a list of entry names to exclude: the +programs will open the library description file and read the names of its files +and their file types. Then they open the division lookup file, read the names +and numbers of the sequence files, open all of them, then open the entryname +file. Then the library is processed sequentially by reading the list of entry +names, reading the next entry in the entry names file to make sure it does not +match, then getting the sequence offsets and division numbers, and then the +sequence from the appropriate data file. If the next name matches the name on +the list of entry names, it will be skipped, and the next name to exclude read. +If the list of excluded names is finished the rest of the library is searched +sequentially.
The list of entry names must be in the same order as those in the +library (ie sorted alphabetically). + +Searching a whole library using a PIR format file is performed by reading it +sequentially. If a list of entry names is used it must be in the same order as +the entries in the library file. + --------------------------------------------------------------------------- + + + + + Section IV Installation guide + ----------------------------- + +EMBL CDROM + + The data can be left on the cdrom or copied to hard disk. The files +staden.login and staden.profile source the files $STADTABL/libraries.config.csh +and $STADTABL/libraries.config.sh respectively. Refer to this file to see what +is required to install, add or move a sequence library that you want to be used +by the programs. + +Other libraries (PIR, Genbank, EMBL updates) + +Create the indexes then edit the files that tell the programs where the data is +stored. The files staden.login and staden.profile source the file +$STADTABL/libraries.config. Refer to this file to see what is required to +install, add or move a sequence library that you want to be used by the +programs. + + +------------------------------------------------------------------------------ + + + Section V New feature table handling facilities + ----------------------------------------------- + +As mentioned above EMBL and GenBank have recently introduced new feature tables +for annotating the sequences. They are a great improvement on the previous ones +and, among other things, now permit correct translation of spliced genes. +Various options within nip have been added or modified to take advantage of +these changes. The routine to translate DNA to protein and write the protein +to disk now gives correct results for spliced genes. The routine to translate +DNA to protein and display the two together now gives correct translations +except for the amino acids spanning intron/exon junctions.
The routine to plot +maps from feature tables can use the new style. The open reading frame finding +routine writes out its results in the new style. The routine that finds open +reading frames and writes their translations to disk also writes a title in the +form of a new style feature table entry. The feature table format output from +the pattern searches in nip also uses the new style. + + + +---------------------------------------------------------------------------- + + Section VI Indexing the sequence libraries + -------------------------------------------- + +We handle EMBL, SwissProt, and GenBank in their distributed format, plus +PIR and NRL3D in codata format. All programs and scripts are in directory +indexseqlibs. + +Currently we produce entryname index, accession number index, freetext index, +and brief index (brief index contains the entry name, the primary accession +number, the sequence length and an 80 character description). + +To produce any of the indexes requires the creation of several intermediate +files and the indexing programs are written so that the intermediate files +are the same for all libraries. This means that only the programs that read +the distributed form of each library need to be unique to that library, and +all the other processing programs can be used for all libraries. + + +However even though the indexes have the same format, programs (like nip) +that read the libraries need to treat each library separately because their +actual contents are written differently.
+ +Making the entry name index +--------------------------- + +Common program entryname2 + +EMBL emblentryname1 +SwissProt emblentryname1 + +GenBank genbentryname1 + +PIR pirentryname1 +NRL3D pirentryname1 + + +Making the accession number index +--------------------------------- + +Common programs access2 access3 access4 + +EMBL emblaccess1 +SwissProt emblaccess1 + +GenBank genbaccess1 + +PIR piraccess1 piraccess2 +NRL3D No accession numbers + +Making the brief index +---------------------- + +Common program title2 + +EMBL embltitle1 +SwissProt embltitle1 + +GenBank genbtitle1 + +PIR pirtitle1 pirtitle2 (pir3 has no accession numbers) +NRL3D pirtitle2 + +Scripts +------- + +emblentryname.script +emblaccession.script +embltitle.script + +swissentryname.script +swissaccession.script +swisstitle.script + +genbentryname.script +genbaccession.script +genbtitle.script + +pirentryname.script +piraccession.script +pirtitle.script + +nrl3dentryname.script +nrl3dtitle.script + + + + + + + + diff --git a/Staden_install-alpha b/Staden_install-alpha new file mode 100644 index 0000000..34a9b69 --- /dev/null +++ b/Staden_install-alpha @@ -0,0 +1,453 @@ +#! /bin/csh -f +# +# staden_install - version 2.4 +# +# This is a prototype installation program.
+# +# 9 March 1992 +# Modified for installation on Sun, Alliant, etc +# No longer install 2rs +# +# 20 November 1992 +# Now includes convert, cop, frog, getMCH and scf +# +# 25 November 1992 +# SGI supported +# +# 19 May 1993 +# DEC Alpha, Solaris supported +# +# Written by sd@uk.ac.cam.mrc-lmb +# + +# prelim +set prog = $0 ; set prog = $prog:t + +# Machines supported: al sun dec sgi alpha solaris +#set MACHINE = `echo $prog | sed 's/.*-//'` +set MACHINE = alpha + +# For local (MRC-LMB) setup only +#set LOCAL = `echo $prog | awk '/local/{print "YES";exit;}{print "NO";}'` +set LOCAL = NO + + +echo "" +echo -n "Staden Package installation procedure - " +switch (${MACHINE}) + case "al": + echo "Alliant FX/2800 Concentrix version" + set MAKE = "make -sk" + breaksw + case "sun": + echo "SunOS version" + set MAKE = "make -sk" + breaksw + case "dec": + echo "DEC Ultrix (mips) version" + set MAKE = "gmake -sk" + breaksw + case "sgi": + echo "Silicon Graphics Iris version" + set MAKE = "gmake -sk" + breaksw + case "alpha": + echo "DEC Alpha OSF/1 version" + set MAKE = "gmake -sk" + breaksw + case "solaris": + echo "Solaris version" + set MAKE = "make -sk" + breaksw + default: + echo "Panic. Unknown version" + exit 1 +endsw +echo "" +echo "* starting initialization...please wait." +echo "" + +# Binary fork of source directory +if ($LOCAL == "YES") then + set DIR_BINARIES = ${MACHINE}-binaries + set DIR_PROGS = ${MACHINE}-bin +else + set DIR_BINARIES = . + set DIR_PROGS = bin + set MAKE = "$MAKE -f makefile-${MACHINE}" +endif + +init: +# Set useful shell variables +set YES="YES"; +set NO="NO" + +# set/unset some .cshrc envs. 
+unset noclobber
+set noglob
+
+# set interrupt trap
+onintr end_failure
+
+# Make dir command
+set MKDIR = "mkdir"
+
+# Copy command
+set CP = "cp -p"
+
+# Install command
+#set INSTALL = "install"
+#set INSTALL = "mv"
+set INSTALL = "cp"
+
+# Set up default responses
+set DEF_STADEN_ROOT = `pwd`
+
+set DEF_REQ_NONX = "$YES"
+set DEF_REQ_X = "$YES"
+set DEF_REQ_TED = "$YES"
+set DEF_REQ_MISC = "$YES"
+
+# directories
+set DIR_SRC = $DEF_STADEN_ROOT/src
+set DIR_BIN = $DEF_STADEN_ROOT/$DIR_PROGS
+set DIR_MISC = $DIR_SRC/Misc
+set DIR_STADEN = $DIR_SRC/staden
+set DIR_TED = $DIR_SRC/ted
+set DIR_ABI = $DIR_SRC/abi
+set DIR_ALF = $DIR_SRC/alf
+set DIR_BAP = $DIR_SRC/bap
+set DIR_OSP = $DIR_SRC/bap/osp-bits
+set DIR_CONVERT = $DIR_SRC/convert
+set DIR_COP = $DIR_SRC/cop
+set DIR_FROG = $DIR_SRC/frog
+set DIR_GETMCH = $DIR_SRC/getMCH
+set DIR_SCF = $DIR_SRC/scf
+
+
+main:
+
+
+preamble:
+    echo ""
+    echo ""
+    echo "* Please answer the following questions."
+    echo " Default answers to questions are given in square brackets."
+    echo " If you require help at any stage respond with a ? to the question."
+    echo ""
+
+ask_staden_root:
+    set ANS_STADEN_ROOT = $DEF_STADEN_ROOT
+
+ask_require_nonx_progs:
+    echo -n "Compile all the non-X programs in the Staden Package [$DEF_REQ_NONX]? "
+    set ANS_REQ_NONX = $<
+    if ("$ANS_REQ_NONX" == "?") then
+        echo "* If you do not have X windows on your system you will require"
+        echo " these. However, you will require Tektronix terminal emulation."
+        echo " If you do not require all of the non-X programs, you should abort"
+        echo " and manually make the ones you require."
+        echo ""
+        goto ask_require_nonx_progs
+    else if ("$ANS_REQ_NONX" != "") then
+        if ("$ANS_REQ_NONX" =~ [yY]*) then
+            set ANS_REQ_NONX=$YES
+        else if ("$ANS_REQ_NONX" =~ [nN]*) then
+            set ANS_REQ_NONX=$NO
+        else
+            goto ask_require_nonx_progs
+        endif
+    else
+        set ANS_REQ_NONX=$DEF_REQ_NONX
+    endif
+
+ask_require_x_progs:
+    echo -n "Compile all the X programs in the Staden Package [$DEF_REQ_X]? "
+    set ANS_REQ_X = $<
+    if ("$ANS_REQ_X" == "?") then
+        echo "* These are the programs that require X windows."
+        echo " If you do not require all of the X programs, you should abort"
+        echo " and manually make the ones you require."
+
+        echo ""
+        goto ask_require_x_progs
+    else if ("$ANS_REQ_X" != "") then
+        if ("$ANS_REQ_X" =~ [yY]*) then
+            set ANS_REQ_X=$YES
+        else if ("$ANS_REQ_X" =~ [nN]*) then
+            set ANS_REQ_X=$NO
+        else
+            goto ask_require_x_progs
+        endif
+    else
+        set ANS_REQ_X=$DEF_REQ_X
+    endif
+
+
+ask_require_ted:
+    echo -n "Compile the trace editing program ted [$DEF_REQ_TED]? "
+    set ANS_REQ_TED = $<
+    if ("$ANS_REQ_TED" == "?") then
+        echo "* This is the trace editor program. It allows you to look at"
+        echo " traces obtained from automated fluorescent sequencing machines."
+        echo ""
+        goto ask_require_ted
+    else if ("$ANS_REQ_TED" != "") then
+        if ("$ANS_REQ_TED" =~ [yY]*) then
+            set ANS_REQ_TED=$YES
+        else if ("$ANS_REQ_TED" =~ [nN]*) then
+            set ANS_REQ_TED=$NO
+        else
+            goto ask_require_ted
+        endif
+    else
+        set ANS_REQ_TED=$DEF_REQ_TED
+    endif
+
+
+
+ask_require_misc:
+    echo -n "Compile other programs [$DEF_REQ_MISC]? 
"
+    set ANS_REQ_MISC = $<
+    if ("$ANS_REQ_MISC" == "?") then
+        echo "* Other programs include:"
+        echo " alfsplit"
+        echo " getABISampleName"
+        echo ""
+        goto ask_require_misc
+    else if ("$ANS_REQ_MISC" != "") then
+        if ("$ANS_REQ_MISC" =~ [yY]*) then
+            set ANS_REQ_MISC=$YES
+        else if ("$ANS_REQ_MISC" =~ [nN]*) then
+            set ANS_REQ_MISC=$NO
+        else
+            goto ask_require_misc
+        endif
+    else
+        set ANS_REQ_MISC=$DEF_REQ_MISC
+    endif
+
+
+
+time_taken_warning:
+    echo ""
+    echo "The installation procedure is now ready to start."
+    echo ""
+    echo "**** Warning:"
+    echo " The installation will take considerable time to complete. If you"
+    echo " are installing the whole Staden Package from scratch it could"
+    echo " take as long as an hour for all executables to be compiled and"
+    echo " installed."
+    echo ""
+
+ask_goahead:
+    echo -n "Proceed with the installation [YES]? "
+    set ANSWER=$<
+    if ("$ANSWER" == "?") then
+        echo "* Final confirmation to proceed with the installation. Answer"
+        echo " YES to proceed; otherwise, answer NO to abort the installation."
+        echo ""
+        goto ask_goahead
+    else if ("$ANSWER" != "") then
+        if ("$ANSWER" =~ [nN]*) then
+            goto chickens_exit
+        else if ("$ANSWER" !~ [yY]*) then
+            goto ask_goahead
+        endif
+    endif
+
+installation_proper:
+
+# make binaries directory if it doesn't exist
+
+    if (! 
-d $DIR_BIN) then + $MKDIR $DIR_BIN + endif + + if ("$ANS_REQ_MISC" == "$YES" || "$ANS_REQ_X" == "$YES" || "$ANS_REQ_NONX" == "$YES" ) then + echo "" + echo "+ Compiling miscellaneous library" + + pushd $DIR_MISC > /dev/null + + cd $DIR_BINARIES + $MAKE all + + popd > /dev/null + + endif + + if ("$ANS_REQ_NONX" == "$YES") then + echo "" + echo "+ Installing non X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE nprogs lprogs + $INSTALL mep $DIR_BIN + $INSTALL nip $DIR_BIN + $INSTALL pip $DIR_BIN + $INSTALL sap $DIR_BIN + $INSTALL sapf $DIR_BIN + $INSTALL sip $DIR_BIN + $INSTALL splitp1 $DIR_BIN + $INSTALL splitp2 $DIR_BIN + $INSTALL splitp3 $DIR_BIN + $INSTALL sethelp $DIR_BIN + $INSTALL gip $DIR_BIN + $INSTALL nipl $DIR_BIN + $INSTALL pipl $DIR_BIN + $INSTALL sipl $DIR_BIN + $INSTALL dap $DIR_BIN + $INSTALL nipf $DIR_BIN + $INSTALL vep $DIR_BIN + $INSTALL rep $DIR_BIN + $INSTALL lip $DIR_BIN + #$INSTALL convert_project $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE bap + $INSTALL bap $DIR_BIN + popd > /dev/null + + endif + + if ("$ANS_REQ_TED" == "$YES") then + echo "" + echo "+ Installing Trace editor" + + pushd $DIR_TED > /dev/null + cd $DIR_BINARIES + $MAKE ted + $INSTALL ted $DIR_BIN + popd > /dev/null + endif + + if ("$ANS_REQ_X" == "$YES") then + echo "" + echo "+ Installing X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE xprogs + $INSTALL xmep $DIR_BIN + $INSTALL xnip $DIR_BIN + $INSTALL xpip $DIR_BIN + $INSTALL xsap $DIR_BIN + $INSTALL xsip $DIR_BIN + $INSTALL xdap $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE xbap + $INSTALL xbap $DIR_BIN + popd > /dev/null + + + endif + + if ("$ANS_REQ_MISC" == "$YES") then + echo "" + echo "+ Installing miscellaneous programs" + + pushd $DIR_ABI 
> /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL getABISampleName $DIR_BIN + popd > /dev/null + + pushd $DIR_ALF > /dev/null + cd $DIR_BINARIES + $MAKE alfsplit + $INSTALL alfsplit $DIR_BIN + popd > /dev/null + + pushd $DIR_CONVERT > /dev/null + cd $DIR_BINARIES + $MAKE convert + $INSTALL convert $DIR_BIN + popd > /dev/null + + pushd $DIR_COP > /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL cop $DIR_BIN + $INSTALL cop-bap $DIR_BIN + popd > /dev/null + + pushd $DIR_FROG > /dev/null + cd $DIR_BINARIES + $MAKE frog + $INSTALL frog $DIR_BIN + popd > /dev/null + + pushd $DIR_GETMCH > /dev/null + cd $DIR_BINARIES + $MAKE trace2seq + $INSTALL trace2seq $DIR_BIN + popd > /dev/null + + pushd $DIR_SCF > /dev/null + cd $DIR_BINARIES + $MAKE makeSCF + $INSTALL makeSCF $DIR_BIN + popd > /dev/null + + + + endif + + +installation_done: + echo "" + echo "+ Installation completed" + echo "" + + echo " Some further initialisation is required in order to use the" + echo " package. csh users should insert the following in their .login" + echo " files:" + echo " " + echo " setenv STADENROOT $ANS_STADEN_ROOT" + echo ' source $STADENROOT/staden.login' + echo " " + echo " Users of the Bourne shell, sh, should insert the following in" + echo " their .profile:" + echo " " + echo " STADENROOT=$ANS_STADEN_ROOT" + echo " export STADENROOT" + echo ' . $STADENROOT/staden.profile' + echo " " + echo " These initialisations will alter the shell's search path so that" + echo " it can find the programs in the STADEN Package" + echo " " + +normal_exit: + exit 0 + +chickens_exit: + echo "" + echo "+ Installation cancelled" + echo "" + + exit 0 + +end_failure: + unset noglob + echo "" + echo "Aborted STADEN Package installation on `date`" + echo "" + exit 1 + diff --git a/Staden_install-dec b/Staden_install-dec new file mode 100644 index 0000000..9b06240 --- /dev/null +++ b/Staden_install-dec @@ -0,0 +1,453 @@ +#! 
/bin/csh -f +# +# staden_install - version 2.4 +# +# This is a prototype installation program. +# +# 9 March 1992 +# Modified for installation on Sun, Alliant, etc +# No longer install 2rs +# +# 20 November 1992 +# Now includes convert, cop, frog, getMCH and scf +# +# 25 November 1992 +# SGI supported +# +# 19 May 1993 +# DEC Alpha, Solaris supported +# +# Written by sd@uk.ac.cam.mrc-lmb +# + +# prelim +set prog = $0 ; set prog = $prog:t + +# Machines supported: al sun dec sgi alpha solaris +#set MACHINE = `echo $prog | sed 's/.*-//'` +set MACHINE = dec + +# For local (MRC-LMB) setup only +#set LOCAL = `echo $prog | awk '/local/{print "YES";exit;}{print "NO";}'` +set LOCAL = NO + + +echo "" +echo -n "Staden Package installation procedure - " +switch (${MACHINE}) + case "al": + echo "Alliant FX/2800 Concentrix version" + set MAKE = "make -sk" + breaksw + case "sun": + echo "SunOS version" + set MAKE = "make -sk" + breaksw + case "dec": + echo "DEC Ultrix (mips) version" + set MAKE = "gmake -sk" + breaksw + case "sgi": + echo "Silicon Graphics Iris version" + set MAKE = "gmake -sk" + breaksw + case "alpha": + echo "DEC Alpha OSF/1 version" + set MAKE = "gmake -sk" + breaksw + case "solaris": + echo "Solaris version" + set MAKE = "make -sk" + breaksw + default: + echo "Panic. Unknown version" + exit 1 +endsw +echo "" +echo "* starting initialization...please wait." +echo "" + +# Binary fork of source directory +if ($LOCAL == "YES") then + set DIR_BINARIES = ${MACHINE}-binaries + set DIR_PROGS = ${MACHINE}-bin +else + set DIR_BINARIES = . + set DIR_PROGS = bin + set MAKE = "$MAKE -f makefile-${MACHINE}" +endif + +init: +# Set useful shell variables +set YES="YES"; +set NO="NO" + +# set/unset some .cshrc envs. 
+unset noclobber
+set noglob
+
+# set interrupt trap
+onintr end_failure
+
+# Make dir command
+set MKDIR = "mkdir"
+
+# Copy command
+set CP = "cp -p"
+
+# Install command
+#set INSTALL = "install"
+#set INSTALL = "mv"
+set INSTALL = "cp"
+
+# Set up default responses
+set DEF_STADEN_ROOT = `pwd`
+
+set DEF_REQ_NONX = "$YES"
+set DEF_REQ_X = "$YES"
+set DEF_REQ_TED = "$YES"
+set DEF_REQ_MISC = "$YES"
+
+# directories
+set DIR_SRC = $DEF_STADEN_ROOT/src
+set DIR_BIN = $DEF_STADEN_ROOT/$DIR_PROGS
+set DIR_MISC = $DIR_SRC/Misc
+set DIR_STADEN = $DIR_SRC/staden
+set DIR_TED = $DIR_SRC/ted
+set DIR_ABI = $DIR_SRC/abi
+set DIR_ALF = $DIR_SRC/alf
+set DIR_BAP = $DIR_SRC/bap
+set DIR_OSP = $DIR_SRC/bap/osp-bits
+set DIR_CONVERT = $DIR_SRC/convert
+set DIR_COP = $DIR_SRC/cop
+set DIR_FROG = $DIR_SRC/frog
+set DIR_GETMCH = $DIR_SRC/getMCH
+set DIR_SCF = $DIR_SRC/scf
+
+
+main:
+
+
+preamble:
+    echo ""
+    echo ""
+    echo "* Please answer the following questions."
+    echo " Default answers to questions are given in square brackets."
+    echo " If you require help at any stage respond with a ? to the question."
+    echo ""
+
+ask_staden_root:
+    set ANS_STADEN_ROOT = $DEF_STADEN_ROOT
+
+ask_require_nonx_progs:
+    echo -n "Compile all the non-X programs in the Staden Package [$DEF_REQ_NONX]? "
+    set ANS_REQ_NONX = $<
+    if ("$ANS_REQ_NONX" == "?") then
+        echo "* If you do not have X windows on your system you will require"
+        echo " these. However, you will require Tektronix terminal emulation."
+        echo " If you do not require all of the non-X programs, you should abort"
+        echo " and manually make the ones you require."
+        echo ""
+        goto ask_require_nonx_progs
+    else if ("$ANS_REQ_NONX" != "") then
+        if ("$ANS_REQ_NONX" =~ [yY]*) then
+            set ANS_REQ_NONX=$YES
+        else if ("$ANS_REQ_NONX" =~ [nN]*) then
+            set ANS_REQ_NONX=$NO
+        else
+            goto ask_require_nonx_progs
+        endif
+    else
+        set ANS_REQ_NONX=$DEF_REQ_NONX
+    endif
+
+ask_require_x_progs:
+    echo -n "Compile all the X programs in the Staden Package [$DEF_REQ_X]? "
+    set ANS_REQ_X = $<
+    if ("$ANS_REQ_X" == "?") then
+        echo "* These are the programs that require X windows."
+        echo " If you do not require all of the X programs, you should abort"
+        echo " and manually make the ones you require."
+
+        echo ""
+        goto ask_require_x_progs
+    else if ("$ANS_REQ_X" != "") then
+        if ("$ANS_REQ_X" =~ [yY]*) then
+            set ANS_REQ_X=$YES
+        else if ("$ANS_REQ_X" =~ [nN]*) then
+            set ANS_REQ_X=$NO
+        else
+            goto ask_require_x_progs
+        endif
+    else
+        set ANS_REQ_X=$DEF_REQ_X
+    endif
+
+
+ask_require_ted:
+    echo -n "Compile the trace editing program ted [$DEF_REQ_TED]? "
+    set ANS_REQ_TED = $<
+    if ("$ANS_REQ_TED" == "?") then
+        echo "* This is the trace editor program. It allows you to look at"
+        echo " traces obtained from automated fluorescent sequencing machines."
+        echo ""
+        goto ask_require_ted
+    else if ("$ANS_REQ_TED" != "") then
+        if ("$ANS_REQ_TED" =~ [yY]*) then
+            set ANS_REQ_TED=$YES
+        else if ("$ANS_REQ_TED" =~ [nN]*) then
+            set ANS_REQ_TED=$NO
+        else
+            goto ask_require_ted
+        endif
+    else
+        set ANS_REQ_TED=$DEF_REQ_TED
+    endif
+
+
+
+ask_require_misc:
+    echo -n "Compile other programs [$DEF_REQ_MISC]? 
"
+    set ANS_REQ_MISC = $<
+    if ("$ANS_REQ_MISC" == "?") then
+        echo "* Other programs include:"
+        echo " alfsplit"
+        echo " getABISampleName"
+        echo ""
+        goto ask_require_misc
+    else if ("$ANS_REQ_MISC" != "") then
+        if ("$ANS_REQ_MISC" =~ [yY]*) then
+            set ANS_REQ_MISC=$YES
+        else if ("$ANS_REQ_MISC" =~ [nN]*) then
+            set ANS_REQ_MISC=$NO
+        else
+            goto ask_require_misc
+        endif
+    else
+        set ANS_REQ_MISC=$DEF_REQ_MISC
+    endif
+
+
+
+time_taken_warning:
+    echo ""
+    echo "The installation procedure is now ready to start."
+    echo ""
+    echo "**** Warning:"
+    echo " The installation will take considerable time to complete. If you"
+    echo " are installing the whole Staden Package from scratch it could"
+    echo " take as long as an hour for all executables to be compiled and"
+    echo " installed."
+    echo ""
+
+ask_goahead:
+    echo -n "Proceed with the installation [YES]? "
+    set ANSWER=$<
+    if ("$ANSWER" == "?") then
+        echo "* Final confirmation to proceed with the installation. Answer"
+        echo " YES to proceed; otherwise, answer NO to abort the installation."
+        echo ""
+        goto ask_goahead
+    else if ("$ANSWER" != "") then
+        if ("$ANSWER" =~ [nN]*) then
+            goto chickens_exit
+        else if ("$ANSWER" !~ [yY]*) then
+            goto ask_goahead
+        endif
+    endif
+
+installation_proper:
+
+# make binaries directory if it doesn't exist
+
+    if (! 
-d $DIR_BIN) then + $MKDIR $DIR_BIN + endif + + if ("$ANS_REQ_MISC" == "$YES" || "$ANS_REQ_X" == "$YES" || "$ANS_REQ_NONX" == "$YES" ) then + echo "" + echo "+ Compiling miscellaneous library" + + pushd $DIR_MISC > /dev/null + + cd $DIR_BINARIES + $MAKE all + + popd > /dev/null + + endif + + if ("$ANS_REQ_NONX" == "$YES") then + echo "" + echo "+ Installing non X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE nprogs lprogs + $INSTALL mep $DIR_BIN + $INSTALL nip $DIR_BIN + $INSTALL pip $DIR_BIN + $INSTALL sap $DIR_BIN + $INSTALL sapf $DIR_BIN + $INSTALL sip $DIR_BIN + $INSTALL splitp1 $DIR_BIN + $INSTALL splitp2 $DIR_BIN + $INSTALL splitp3 $DIR_BIN + $INSTALL sethelp $DIR_BIN + $INSTALL gip $DIR_BIN + $INSTALL nipl $DIR_BIN + $INSTALL pipl $DIR_BIN + $INSTALL sipl $DIR_BIN + $INSTALL dap $DIR_BIN + $INSTALL nipf $DIR_BIN + $INSTALL vep $DIR_BIN + $INSTALL rep $DIR_BIN + $INSTALL lip $DIR_BIN + #$INSTALL convert_project $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE bap + $INSTALL bap $DIR_BIN + popd > /dev/null + + endif + + if ("$ANS_REQ_TED" == "$YES") then + echo "" + echo "+ Installing Trace editor" + + pushd $DIR_TED > /dev/null + cd $DIR_BINARIES + $MAKE ted + $INSTALL ted $DIR_BIN + popd > /dev/null + endif + + if ("$ANS_REQ_X" == "$YES") then + echo "" + echo "+ Installing X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE xprogs + $INSTALL xmep $DIR_BIN + $INSTALL xnip $DIR_BIN + $INSTALL xpip $DIR_BIN + $INSTALL xsap $DIR_BIN + $INSTALL xsip $DIR_BIN + $INSTALL xdap $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE xbap + $INSTALL xbap $DIR_BIN + popd > /dev/null + + + endif + + if ("$ANS_REQ_MISC" == "$YES") then + echo "" + echo "+ Installing miscellaneous programs" + + pushd $DIR_ABI 
> /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL getABISampleName $DIR_BIN + popd > /dev/null + + pushd $DIR_ALF > /dev/null + cd $DIR_BINARIES + $MAKE alfsplit + $INSTALL alfsplit $DIR_BIN + popd > /dev/null + + pushd $DIR_CONVERT > /dev/null + cd $DIR_BINARIES + $MAKE convert + $INSTALL convert $DIR_BIN + popd > /dev/null + + pushd $DIR_COP > /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL cop $DIR_BIN + $INSTALL cop-bap $DIR_BIN + popd > /dev/null + + pushd $DIR_FROG > /dev/null + cd $DIR_BINARIES + $MAKE frog + $INSTALL frog $DIR_BIN + popd > /dev/null + + pushd $DIR_GETMCH > /dev/null + cd $DIR_BINARIES + $MAKE trace2seq + $INSTALL trace2seq $DIR_BIN + popd > /dev/null + + pushd $DIR_SCF > /dev/null + cd $DIR_BINARIES + $MAKE makeSCF + $INSTALL makeSCF $DIR_BIN + popd > /dev/null + + + + endif + + +installation_done: + echo "" + echo "+ Installation completed" + echo "" + + echo " Some further initialisation is required in order to use the" + echo " package. csh users should insert the following in their .login" + echo " files:" + echo " " + echo " setenv STADENROOT $ANS_STADEN_ROOT" + echo ' source $STADENROOT/staden.login' + echo " " + echo " Users of the Bourne shell, sh, should insert the following in" + echo " their .profile:" + echo " " + echo " STADENROOT=$ANS_STADEN_ROOT" + echo " export STADENROOT" + echo ' . $STADENROOT/staden.profile' + echo " " + echo " These initialisations will alter the shell's search path so that" + echo " it can find the programs in the STADEN Package" + echo " " + +normal_exit: + exit 0 + +chickens_exit: + echo "" + echo "+ Installation cancelled" + echo "" + + exit 0 + +end_failure: + unset noglob + echo "" + echo "Aborted STADEN Package installation on `date`" + echo "" + exit 1 + diff --git a/Staden_install-sgi b/Staden_install-sgi new file mode 100644 index 0000000..7ce5b1e --- /dev/null +++ b/Staden_install-sgi @@ -0,0 +1,453 @@ +#! 
/bin/csh -f +# +# staden_install - version 2.4 +# +# This is a prototype installation program. +# +# 9 March 1992 +# Modified for installation on Sun, Alliant, etc +# No longer install 2rs +# +# 20 November 1992 +# Now includes convert, cop, frog, getMCH and scf +# +# 25 November 1992 +# SGI supported +# +# 19 May 1993 +# DEC Alpha, Solaris supported +# +# Written by sd@uk.ac.cam.mrc-lmb +# + +# prelim +set prog = $0 ; set prog = $prog:t + +# Machines supported: al sun dec sgi alpha solaris +#set MACHINE = `echo $prog | sed 's/.*-//'` +set MACHINE = sgi + +# For local (MRC-LMB) setup only +#set LOCAL = `echo $prog | awk '/local/{print "YES";exit;}{print "NO";}'` +set LOCAL = NO + + +echo "" +echo -n "Staden Package installation procedure - " +switch (${MACHINE}) + case "al": + echo "Alliant FX/2800 Concentrix version" + set MAKE = "make -sk" + breaksw + case "sun": + echo "SunOS version" + set MAKE = "make -sk" + breaksw + case "dec": + echo "DEC Ultrix (mips) version" + set MAKE = "gmake -sk" + breaksw + case "sgi": + echo "Silicon Graphics Iris version" + set MAKE = "gmake -sk" + breaksw + case "alpha": + echo "DEC Alpha OSF/1 version" + set MAKE = "gmake -sk" + breaksw + case "solaris": + echo "Solaris version" + set MAKE = "make -sk" + breaksw + default: + echo "Panic. Unknown version" + exit 1 +endsw +echo "" +echo "* starting initialization...please wait." +echo "" + +# Binary fork of source directory +if ($LOCAL == "YES") then + set DIR_BINARIES = ${MACHINE}-binaries + set DIR_PROGS = ${MACHINE}-bin +else + set DIR_BINARIES = . + set DIR_PROGS = bin + set MAKE = "$MAKE -f makefile-${MACHINE}" +endif + +init: +# Set useful shell variables +set YES="YES"; +set NO="NO" + +# set/unset some .cshrc envs. 
+unset noclobber
+set noglob
+
+# set interrupt trap
+onintr end_failure
+
+# Make dir command
+set MKDIR = "mkdir"
+
+# Copy command
+set CP = "cp -p"
+
+# Install command
+#set INSTALL = "install"
+#set INSTALL = "mv"
+set INSTALL = "cp"
+
+# Set up default responses
+set DEF_STADEN_ROOT = `pwd`
+
+set DEF_REQ_NONX = "$YES"
+set DEF_REQ_X = "$YES"
+set DEF_REQ_TED = "$YES"
+set DEF_REQ_MISC = "$YES"
+
+# directories
+set DIR_SRC = $DEF_STADEN_ROOT/src
+set DIR_BIN = $DEF_STADEN_ROOT/$DIR_PROGS
+set DIR_MISC = $DIR_SRC/Misc
+set DIR_STADEN = $DIR_SRC/staden
+set DIR_TED = $DIR_SRC/ted
+set DIR_ABI = $DIR_SRC/abi
+set DIR_ALF = $DIR_SRC/alf
+set DIR_BAP = $DIR_SRC/bap
+set DIR_OSP = $DIR_SRC/bap/osp-bits
+set DIR_CONVERT = $DIR_SRC/convert
+set DIR_COP = $DIR_SRC/cop
+set DIR_FROG = $DIR_SRC/frog
+set DIR_GETMCH = $DIR_SRC/getMCH
+set DIR_SCF = $DIR_SRC/scf
+
+
+main:
+
+
+preamble:
+    echo ""
+    echo ""
+    echo "* Please answer the following questions."
+    echo " Default answers to questions are given in square brackets."
+    echo " If you require help at any stage respond with a ? to the question."
+    echo ""
+
+ask_staden_root:
+    set ANS_STADEN_ROOT = $DEF_STADEN_ROOT
+
+ask_require_nonx_progs:
+    echo -n "Compile all the non-X programs in the Staden Package [$DEF_REQ_NONX]? "
+    set ANS_REQ_NONX = $<
+    if ("$ANS_REQ_NONX" == "?") then
+        echo "* If you do not have X windows on your system you will require"
+        echo " these. However, you will require Tektronix terminal emulation."
+        echo " If you do not require all of the non-X programs, you should abort"
+        echo " and manually make the ones you require."
+        echo ""
+        goto ask_require_nonx_progs
+    else if ("$ANS_REQ_NONX" != "") then
+        if ("$ANS_REQ_NONX" =~ [yY]*) then
+            set ANS_REQ_NONX=$YES
+        else if ("$ANS_REQ_NONX" =~ [nN]*) then
+            set ANS_REQ_NONX=$NO
+        else
+            goto ask_require_nonx_progs
+        endif
+    else
+        set ANS_REQ_NONX=$DEF_REQ_NONX
+    endif
+
+ask_require_x_progs:
+    echo -n "Compile all the X programs in the Staden Package [$DEF_REQ_X]? "
+    set ANS_REQ_X = $<
+    if ("$ANS_REQ_X" == "?") then
+        echo "* These are the programs that require X windows."
+        echo " If you do not require all of the X programs, you should abort"
+        echo " and manually make the ones you require."
+
+        echo ""
+        goto ask_require_x_progs
+    else if ("$ANS_REQ_X" != "") then
+        if ("$ANS_REQ_X" =~ [yY]*) then
+            set ANS_REQ_X=$YES
+        else if ("$ANS_REQ_X" =~ [nN]*) then
+            set ANS_REQ_X=$NO
+        else
+            goto ask_require_x_progs
+        endif
+    else
+        set ANS_REQ_X=$DEF_REQ_X
+    endif
+
+
+ask_require_ted:
+    echo -n "Compile the trace editing program ted [$DEF_REQ_TED]? "
+    set ANS_REQ_TED = $<
+    if ("$ANS_REQ_TED" == "?") then
+        echo "* This is the trace editor program. It allows you to look at"
+        echo " traces obtained from automated fluorescent sequencing machines."
+        echo ""
+        goto ask_require_ted
+    else if ("$ANS_REQ_TED" != "") then
+        if ("$ANS_REQ_TED" =~ [yY]*) then
+            set ANS_REQ_TED=$YES
+        else if ("$ANS_REQ_TED" =~ [nN]*) then
+            set ANS_REQ_TED=$NO
+        else
+            goto ask_require_ted
+        endif
+    else
+        set ANS_REQ_TED=$DEF_REQ_TED
+    endif
+
+
+
+ask_require_misc:
+    echo -n "Compile other programs [$DEF_REQ_MISC]? 
"
+    set ANS_REQ_MISC = $<
+    if ("$ANS_REQ_MISC" == "?") then
+        echo "* Other programs include:"
+        echo " alfsplit"
+        echo " getABISampleName"
+        echo ""
+        goto ask_require_misc
+    else if ("$ANS_REQ_MISC" != "") then
+        if ("$ANS_REQ_MISC" =~ [yY]*) then
+            set ANS_REQ_MISC=$YES
+        else if ("$ANS_REQ_MISC" =~ [nN]*) then
+            set ANS_REQ_MISC=$NO
+        else
+            goto ask_require_misc
+        endif
+    else
+        set ANS_REQ_MISC=$DEF_REQ_MISC
+    endif
+
+
+
+time_taken_warning:
+    echo ""
+    echo "The installation procedure is now ready to start."
+    echo ""
+    echo "**** Warning:"
+    echo " The installation will take considerable time to complete. If you"
+    echo " are installing the whole Staden Package from scratch it could"
+    echo " take as long as an hour for all executables to be compiled and"
+    echo " installed."
+    echo ""
+
+ask_goahead:
+    echo -n "Proceed with the installation [YES]? "
+    set ANSWER=$<
+    if ("$ANSWER" == "?") then
+        echo "* Final confirmation to proceed with the installation. Answer"
+        echo " YES to proceed; otherwise, answer NO to abort the installation."
+        echo ""
+        goto ask_goahead
+    else if ("$ANSWER" != "") then
+        if ("$ANSWER" =~ [nN]*) then
+            goto chickens_exit
+        else if ("$ANSWER" !~ [yY]*) then
+            goto ask_goahead
+        endif
+    endif
+
+installation_proper:
+
+# make binaries directory if it doesn't exist
+
+    if (! 
-d $DIR_BIN) then + $MKDIR $DIR_BIN + endif + + if ("$ANS_REQ_MISC" == "$YES" || "$ANS_REQ_X" == "$YES" || "$ANS_REQ_NONX" == "$YES" ) then + echo "" + echo "+ Compiling miscellaneous library" + + pushd $DIR_MISC > /dev/null + + cd $DIR_BINARIES + $MAKE all + + popd > /dev/null + + endif + + if ("$ANS_REQ_NONX" == "$YES") then + echo "" + echo "+ Installing non X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE nprogs lprogs + $INSTALL mep $DIR_BIN + $INSTALL nip $DIR_BIN + $INSTALL pip $DIR_BIN + $INSTALL sap $DIR_BIN + $INSTALL sapf $DIR_BIN + $INSTALL sip $DIR_BIN + $INSTALL splitp1 $DIR_BIN + $INSTALL splitp2 $DIR_BIN + $INSTALL splitp3 $DIR_BIN + $INSTALL sethelp $DIR_BIN + $INSTALL gip $DIR_BIN + $INSTALL nipl $DIR_BIN + $INSTALL pipl $DIR_BIN + $INSTALL sipl $DIR_BIN + $INSTALL dap $DIR_BIN + $INSTALL nipf $DIR_BIN + $INSTALL vep $DIR_BIN + $INSTALL rep $DIR_BIN + $INSTALL lip $DIR_BIN + #$INSTALL convert_project $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE bap + $INSTALL bap $DIR_BIN + popd > /dev/null + + endif + + if ("$ANS_REQ_TED" == "$YES") then + echo "" + echo "+ Installing Trace editor" + + pushd $DIR_TED > /dev/null + cd $DIR_BINARIES + $MAKE ted + $INSTALL ted $DIR_BIN + popd > /dev/null + endif + + if ("$ANS_REQ_X" == "$YES") then + echo "" + echo "+ Installing X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE xprogs + $INSTALL xmep $DIR_BIN + $INSTALL xnip $DIR_BIN + $INSTALL xpip $DIR_BIN + $INSTALL xsap $DIR_BIN + $INSTALL xsip $DIR_BIN + $INSTALL xdap $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE xbap + $INSTALL xbap $DIR_BIN + popd > /dev/null + + + endif + + if ("$ANS_REQ_MISC" == "$YES") then + echo "" + echo "+ Installing miscellaneous programs" + + pushd $DIR_ABI 
> /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL getABISampleName $DIR_BIN + popd > /dev/null + + pushd $DIR_ALF > /dev/null + cd $DIR_BINARIES + $MAKE alfsplit + $INSTALL alfsplit $DIR_BIN + popd > /dev/null + + pushd $DIR_CONVERT > /dev/null + cd $DIR_BINARIES + $MAKE convert + $INSTALL convert $DIR_BIN + popd > /dev/null + + pushd $DIR_COP > /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL cop $DIR_BIN + $INSTALL cop-bap $DIR_BIN + popd > /dev/null + + pushd $DIR_FROG > /dev/null + cd $DIR_BINARIES + $MAKE frog + $INSTALL frog $DIR_BIN + popd > /dev/null + + pushd $DIR_GETMCH > /dev/null + cd $DIR_BINARIES + $MAKE trace2seq + $INSTALL trace2seq $DIR_BIN + popd > /dev/null + + pushd $DIR_SCF > /dev/null + cd $DIR_BINARIES + $MAKE makeSCF + $INSTALL makeSCF $DIR_BIN + popd > /dev/null + + + + endif + + +installation_done: + echo "" + echo "+ Installation completed" + echo "" + + echo " Some further initialisation is required in order to use the" + echo " package. csh users should insert the following in their .login" + echo " files:" + echo " " + echo " setenv STADENROOT $ANS_STADEN_ROOT" + echo ' source $STADENROOT/staden.login' + echo " " + echo " Users of the Bourne shell, sh, should insert the following in" + echo " their .profile:" + echo " " + echo " STADENROOT=$ANS_STADEN_ROOT" + echo " export STADENROOT" + echo ' . $STADENROOT/staden.profile' + echo " " + echo " These initialisations will alter the shell's search path so that" + echo " it can find the programs in the STADEN Package" + echo " " + +normal_exit: + exit 0 + +chickens_exit: + echo "" + echo "+ Installation cancelled" + echo "" + + exit 0 + +end_failure: + unset noglob + echo "" + echo "Aborted STADEN Package installation on `date`" + echo "" + exit 1 + diff --git a/Staden_install-solaris b/Staden_install-solaris new file mode 100644 index 0000000..93c2feb --- /dev/null +++ b/Staden_install-solaris @@ -0,0 +1,453 @@ +#! 
/bin/csh -f +# +# staden_install - version 2.4 +# +# This is a prototype installation program. +# +# 9 March 1992 +# Modified for installation on Sun, Alliant, etc +# No longer install 2rs +# +# 20 November 1992 +# Now includes convert, cop, frog, getMCH and scf +# +# 25 November 1992 +# SGI supported +# +# 19 May 1993 +# DEC Alpha, Solaris supported +# +# Written by sd@uk.ac.cam.mrc-lmb +# + +# prelim +set prog = $0 ; set prog = $prog:t + +# Machines supported: al sun dec sgi alpha solaris +#set MACHINE = `echo $prog | sed 's/.*-//'` +set MACHINE = solaris + +# For local (MRC-LMB) setup only +#set LOCAL = `echo $prog | awk '/local/{print "YES";exit;}{print "NO";}'` +set LOCAL = NO + + +echo "" +echo -n "Staden Package installation procedure - " +switch (${MACHINE}) + case "al": + echo "Alliant FX/2800 Concentrix version" + set MAKE = "make -sk" + breaksw + case "sun": + echo "SunOS version" + set MAKE = "make -sk" + breaksw + case "dec": + echo "DEC Ultrix (mips) version" + set MAKE = "gmake -sk" + breaksw + case "sgi": + echo "Silicon Graphics Iris version" + set MAKE = "gmake -sk" + breaksw + case "alpha": + echo "DEC Alpha OSF/1 version" + set MAKE = "gmake -sk" + breaksw + case "solaris": + echo "Solaris version" + set MAKE = "make -sk" + breaksw + default: + echo "Panic. Unknown version" + exit 1 +endsw +echo "" +echo "* starting initialization...please wait." +echo "" + +# Binary fork of source directory +if ($LOCAL == "YES") then + set DIR_BINARIES = ${MACHINE}-binaries + set DIR_PROGS = ${MACHINE}-bin +else + set DIR_BINARIES = . + set DIR_PROGS = bin + set MAKE = "$MAKE -f makefile-${MACHINE}" +endif + +init: +# Set useful shell variables +set YES="YES"; +set NO="NO" + +# set/unset some .cshrc envs. 
+unset noclobber +set noglob + +# set interrupt trap +onintr end_failure + +# Make dir command +set MKDIR = "mkdir" + +# Copy command +set CP = "cp -p" + +# Install command +#set INSTALL = "install" +#set INSTALL = "mv" +set INSTALL = "cp" + +# Set up default responses +set DEF_STADEN_ROOT = `pwd` + +set DEF_REQ_NONX = "$YES" +set DEF_REQ_X = "$YES" +set DEF_REQ_TED = "$YES" +set DEF_REQ_MISC = "$YES" + +# directories +set DIR_SRC = $DEF_STADEN_ROOT/src +set DIR_BIN = $DEF_STADEN_ROOT/$DIR_PROGS +set DIR_MISC = $DIR_SRC/Misc +set DIR_STADEN = $DIR_SRC/staden +set DIR_TED = $DIR_SRC/ted +set DIR_ABI = $DIR_SRC/abi +set DIR_ALF = $DIR_SRC/alf +set DIR_BAP = $DIR_SRC/bap +set DIR_OSP = $DIR_SRC/bap/osp-bits +set DIR_CONVERT = $DIR_SRC/convert +set DIR_COP = $DIR_SRC/cop +set DIR_FROG = $DIR_SRC/frog +set DIR_GETMCH = $DIR_SRC/getMCH +set DIR_SCF = $DIR_SRC/scf + + +main: + + +preamble: + echo "" + echo "" + echo "* Please answer the following questions." + echo " Default answers to questions are given in square brackets." + echo " If you require help at any stage respond with a ? to the question." + echo "" + +ask_staden_root: + set ANS_STADEN_ROOT = $DEF_STADEN_ROOT + +ask_require_nonx_progs: + echo -n "Compile all the non-X programs in the Staden Package [$DEF_REQ_NONX]? " + set ANS_REQ_NONX = $< + if ("$ANS_REQ_NONX" == "?") then + echo "* If you do not have X windows on your system you will require" + echo " these. However, you will require Tektronix terminal emulation." + echo " If you do not require all of the non-X programs, you should abort" + echo " and manually make the ones you require."
+ echo "" + goto ask_require_nonx_progs + else if ("$ANS_REQ_NONX" != "") then + if ("$ANS_REQ_NONX" =~ [yY]*) then + set ANS_REQ_NONX=$YES + else if ("$ANS_REQ_NONX" =~ [nN]*) then + set ANS_REQ_NONX=$NO + else + goto ask_require_nonx_progs + endif + else + set ANS_REQ_NONX=$DEF_REQ_NONX + endif + +ask_require_x_progs: + echo -n "Compile all the X programs in the Staden Package [$DEF_REQ_X]? " + set ANS_REQ_X = $< + if ("$ANS_REQ_X" == "?") then + echo "* These are the programs that require X windows." + echo " If you do not require all of the X programs, you should abort" + echo " and manually make the ones you require." + + echo "" + goto ask_require_x_progs + else if ("$ANS_REQ_X" != "") then + if ("$ANS_REQ_X" =~ [yY]*) then + set ANS_REQ_X=$YES + else if ("$ANS_REQ_X" =~ [nN]*) then + set ANS_REQ_X=$NO + else + goto ask_require_x_progs + endif + else + set ANS_REQ_X=$DEF_REQ_X + endif + + +ask_require_ted: + echo -n "Compile the trace editing program ted [$DEF_REQ_TED]? " + set ANS_REQ_TED = $< + if ("$ANS_REQ_TED" == "?") then + echo "* This is the trace editor program. It allows you to look at" + echo " traces obtained from automated fluorescent sequencing machines." + echo "" + goto ask_require_ted + else if ("$ANS_REQ_TED" != "") then + if ("$ANS_REQ_TED" =~ [yY]*) then + set ANS_REQ_TED=$YES + else if ("$ANS_REQ_TED" =~ [nN]*) then + set ANS_REQ_TED=$NO + else + goto ask_require_ted + endif + else + set ANS_REQ_TED=$DEF_REQ_TED + endif + + + +ask_require_misc: + echo -n "Compile other programs [$DEF_REQ_MISC]?
" + set ANS_REQ_MISC = $< + if ("$ANS_REQ_MISC" == "?") then + echo "* Other programs include:" + echo " alfsplit" + echo " getABISampleName" + echo "" + goto ask_require_misc + else if ("$ANS_REQ_MISC" != "") then + if ("$ANS_REQ_MISC" =~ [yY]*) then + set ANS_REQ_MISC=$YES + else if ("$ANS_REQ_MISC" =~ [nN]*) then + set ANS_REQ_MISC=$NO + else + goto ask_require_misc + endif + else + set ANS_REQ_MISC=$DEF_REQ_MISC + endif + + + +time_taken_warning: + echo "" + echo "The installation procedure is now ready to start." + echo "" + echo "**** Warning:" + echo " The installation will take considerable time to complete. If you" + echo " are installing the whole Staden Package from scratch it could" + echo " take as long as an hour for all exectuables to be compiled and" + echo " installed." + echo "" + +ask_goahead: + echo -n "Proceed with the installation [YES]? " + set ANSWER=$< + if ("$ANSWER" == "?") then + echo "* Final confirmation to proceed with the installation. Answer" + echo " YES to proceed; otherwise, answer NO to abort the installation." + echo "" + goto ask_goahead + else if ("$ANSWER" != "") then + if ("$ANSWER" =~ [nN]*) then + goto chickens_exit + else if ("$ANSWER" !~ [yY]*) then + goto ask_goahead + endif + endif + +installation_proper: + +# make binaries directory if it doesn't exist + + if (! 
-d $DIR_BIN) then + $MKDIR $DIR_BIN + endif + + if ("$ANS_REQ_MISC" == "$YES" || "$ANS_REQ_X" == "$YES" || "$ANS_REQ_NONX" == "$YES" ) then + echo "" + echo "+ Compiling miscellaneous library" + + pushd $DIR_MISC > /dev/null + + cd $DIR_BINARIES + $MAKE all + + popd > /dev/null + + endif + + if ("$ANS_REQ_NONX" == "$YES") then + echo "" + echo "+ Installing non X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE nprogs lprogs + $INSTALL mep $DIR_BIN + $INSTALL nip $DIR_BIN + $INSTALL pip $DIR_BIN + $INSTALL sap $DIR_BIN + $INSTALL sapf $DIR_BIN + $INSTALL sip $DIR_BIN + $INSTALL splitp1 $DIR_BIN + $INSTALL splitp2 $DIR_BIN + $INSTALL splitp3 $DIR_BIN + $INSTALL sethelp $DIR_BIN + $INSTALL gip $DIR_BIN + $INSTALL nipl $DIR_BIN + $INSTALL pipl $DIR_BIN + $INSTALL sipl $DIR_BIN + $INSTALL dap $DIR_BIN + $INSTALL nipf $DIR_BIN + $INSTALL vep $DIR_BIN + $INSTALL rep $DIR_BIN + $INSTALL lip $DIR_BIN + #$INSTALL convert_project $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE bap + $INSTALL bap $DIR_BIN + popd > /dev/null + + endif + + if ("$ANS_REQ_TED" == "$YES") then + echo "" + echo "+ Installing Trace editor" + + pushd $DIR_TED > /dev/null + cd $DIR_BINARIES + $MAKE ted + $INSTALL ted $DIR_BIN + popd > /dev/null + endif + + if ("$ANS_REQ_X" == "$YES") then + echo "" + echo "+ Installing X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE xprogs + $INSTALL xmep $DIR_BIN + $INSTALL xnip $DIR_BIN + $INSTALL xpip $DIR_BIN + $INSTALL xsap $DIR_BIN + $INSTALL xsip $DIR_BIN + $INSTALL xdap $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE xbap + $INSTALL xbap $DIR_BIN + popd > /dev/null + + + endif + + if ("$ANS_REQ_MISC" == "$YES") then + echo "" + echo "+ Installing miscellaneous programs" + + pushd $DIR_ABI 
> /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL getABISampleName $DIR_BIN + popd > /dev/null + + pushd $DIR_ALF > /dev/null + cd $DIR_BINARIES + $MAKE alfsplit + $INSTALL alfsplit $DIR_BIN + popd > /dev/null + + pushd $DIR_CONVERT > /dev/null + cd $DIR_BINARIES + $MAKE convert + $INSTALL convert $DIR_BIN + popd > /dev/null + + pushd $DIR_COP > /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL cop $DIR_BIN + $INSTALL cop-bap $DIR_BIN + popd > /dev/null + + pushd $DIR_FROG > /dev/null + cd $DIR_BINARIES + $MAKE frog + $INSTALL frog $DIR_BIN + popd > /dev/null + + pushd $DIR_GETMCH > /dev/null + cd $DIR_BINARIES + $MAKE trace2seq + $INSTALL trace2seq $DIR_BIN + popd > /dev/null + + pushd $DIR_SCF > /dev/null + cd $DIR_BINARIES + $MAKE makeSCF + $INSTALL makeSCF $DIR_BIN + popd > /dev/null + + + + endif + + +installation_done: + echo "" + echo "+ Installation completed" + echo "" + + echo " Some further initialisation is required in order to use the" + echo " package. csh users should insert the following in their .login" + echo " files:" + echo " " + echo " setenv STADENROOT $ANS_STADEN_ROOT" + echo ' source $STADENROOT/staden.login' + echo " " + echo " Users of the Bourne shell, sh, should insert the following in" + echo " their .profile:" + echo " " + echo " STADENROOT=$ANS_STADEN_ROOT" + echo " export STADENROOT" + echo ' . $STADENROOT/staden.profile' + echo " " + echo " These initialisations will alter the shell's search path so that" + echo " it can find the programs in the STADEN Package" + echo " " + +normal_exit: + exit 0 + +chickens_exit: + echo "" + echo "+ Installation cancelled" + echo "" + + exit 0 + +end_failure: + unset noglob + echo "" + echo "Aborted STADEN Package installation on `date`" + echo "" + exit 1 + diff --git a/Staden_install-sun b/Staden_install-sun new file mode 100644 index 0000000..0d6711e --- /dev/null +++ b/Staden_install-sun @@ -0,0 +1,453 @@ +#! 
/bin/csh -f +# +# staden_install - version 2.4 +# +# This is a prototype installation program. +# +# 9 March 1992 +# Modified for installation on Sun, Alliant, etc +# No longer install 2rs +# +# 20 November 1992 +# Now includes convert, cop, frog, getMCH and scf +# +# 25 November 1992 +# SGI supported +# +# 19 May 1993 +# DEC Alpha, Solaris supported +# +# Written by sd@uk.ac.cam.mrc-lmb +# + +# prelim +set prog = $0 ; set prog = $prog:t + +# Machines supported: al sun dec sgi alpha solaris +#set MACHINE = `echo $prog | sed 's/.*-//'` +set MACHINE = sun + +# For local (MRC-LMB) setup only +#set LOCAL = `echo $prog | awk '/local/{print "YES";exit;}{print "NO";}'` +set LOCAL = NO + + +echo "" +echo -n "Staden Package installation procedure - " +switch (${MACHINE}) + case "al": + echo "Alliant FX/2800 Concentrix version" + set MAKE = "make -sk" + breaksw + case "sun": + echo "SunOS version" + set MAKE = "make -sk" + breaksw + case "dec": + echo "DEC Ultrix (mips) version" + set MAKE = "gmake -sk" + breaksw + case "sgi": + echo "Silicon Graphics Iris version" + set MAKE = "gmake -sk" + breaksw + case "alpha": + echo "DEC Alpha OSF/1 version" + set MAKE = "gmake -sk" + breaksw + case "solaris": + echo "Solaris version" + set MAKE = "make -sk" + breaksw + default: + echo "Panic. Unknown version" + exit 1 +endsw +echo "" +echo "* starting initialization...please wait." +echo "" + +# Binary fork of source directory +if ($LOCAL == "YES") then + set DIR_BINARIES = ${MACHINE}-binaries + set DIR_PROGS = ${MACHINE}-bin +else + set DIR_BINARIES = . + set DIR_PROGS = bin + set MAKE = "$MAKE -f makefile-${MACHINE}" +endif + +init: +# Set useful shell variables +set YES="YES"; +set NO="NO" + +# set/unset some .cshrc envs. 
+unset noclobber +set noglob + +# set interrupt trap +onintr end_failure + +# Make dir command +set MKDIR = "mkdir" + +# Copy command +set CP = "cp -p" + +# Install command +#set INSTALL = "install" +#set INSTALL = "mv" +set INSTALL = "cp" + +# Set up default responses +set DEF_STADEN_ROOT = `pwd` + +set DEF_REQ_NONX = "$YES" +set DEF_REQ_X = "$YES" +set DEF_REQ_TED = "$YES" +set DEF_REQ_MISC = "$YES" + +# directories +set DIR_SRC = $DEF_STADEN_ROOT/src +set DIR_BIN = $DEF_STADEN_ROOT/$DIR_PROGS +set DIR_MISC = $DIR_SRC/Misc +set DIR_STADEN = $DIR_SRC/staden +set DIR_TED = $DIR_SRC/ted +set DIR_ABI = $DIR_SRC/abi +set DIR_ALF = $DIR_SRC/alf +set DIR_BAP = $DIR_SRC/bap +set DIR_OSP = $DIR_SRC/bap/osp-bits +set DIR_CONVERT = $DIR_SRC/convert +set DIR_COP = $DIR_SRC/cop +set DIR_FROG = $DIR_SRC/frog +set DIR_GETMCH = $DIR_SRC/getMCH +set DIR_SCF = $DIR_SRC/scf + + +main: + + +preamble: + echo "" + echo "" + echo "* Please answer the following questions." + echo " Default answers to questions are given in square brackets." + echo " If you require help at any stage respond with a ? to the question." + echo "" + +ask_staden_root: + set ANS_STADEN_ROOT = $DEF_STADEN_ROOT + +ask_require_nonx_progs: + echo -n "Compile all the non-X programs in the Staden Package [$DEF_REQ_NONX]? " + set ANS_REQ_NONX = $< + if ("$ANS_REQ_NONX" == "?") then + echo "* If you do not have X windows on your system you will require" + echo " these. However, you will require Tektronix terminal emulation." + echo " If you do not require all of the non-X programs, you should abort" + echo " and manually make the ones you require."
+ echo "" + goto ask_require_nonx_progs + else if ("$ANS_REQ_NONX" != "") then + if ("$ANS_REQ_NONX" =~ [yY]*) then + set ANS_REQ_NONX=$YES + else if ("$ANS_REQ_NONX" =~ [nN]*) then + set ANS_REQ_NONX=$NO + else + goto ask_require_nonx_progs + endif + else + set ANS_REQ_NONX=$DEF_REQ_NONX + endif + +ask_require_x_progs: + echo -n "Compile all the X programs in the Staden Package [$DEF_REQ_X]? " + set ANS_REQ_X = $< + if ("$ANS_REQ_X" == "?") then + echo "* These are the programs that require X windows." + echo " If you do not require all of the X programs, you should abort" + echo " and manually make the ones you require." + + echo "" + goto ask_require_x_progs + else if ("$ANS_REQ_X" != "") then + if ("$ANS_REQ_X" =~ [yY]*) then + set ANS_REQ_X=$YES + else if ("$ANS_REQ_X" =~ [nN]*) then + set ANS_REQ_X=$NO + else + goto ask_require_x_progs + endif + else + set ANS_REQ_X=$DEF_REQ_X + endif + + +ask_require_ted: + echo -n "Compile the trace editing program ted [$DEF_REQ_TED]? " + set ANS_REQ_TED = $< + if ("$ANS_REQ_TED" == "?") then + echo "* This is the trace editor program. It allows you to look at" + echo " traces obtained from automated fluorescent sequencing machines." + echo "" + goto ask_require_ted + else if ("$ANS_REQ_TED" != "") then + if ("$ANS_REQ_TED" =~ [yY]*) then + set ANS_REQ_TED=$YES + else if ("$ANS_REQ_TED" =~ [nN]*) then + set ANS_REQ_TED=$NO + else + goto ask_require_ted + endif + else + set ANS_REQ_TED=$DEF_REQ_TED + endif + + + +ask_require_misc: + echo -n "Compile other programs [$DEF_REQ_MISC]?
" + set ANS_REQ_MISC = $< + if ("$ANS_REQ_MISC" == "?") then + echo "* Other programs include:" + echo " alfsplit" + echo " getABISampleName" + echo "" + goto ask_require_misc + else if ("$ANS_REQ_MISC" != "") then + if ("$ANS_REQ_MISC" =~ [yY]*) then + set ANS_REQ_MISC=$YES + else if ("$ANS_REQ_MISC" =~ [nN]*) then + set ANS_REQ_MISC=$NO + else + goto ask_require_misc + endif + else + set ANS_REQ_MISC=$DEF_REQ_MISC + endif + + + +time_taken_warning: + echo "" + echo "The installation procedure is now ready to start." + echo "" + echo "**** Warning:" + echo " The installation will take considerable time to complete. If you" + echo " are installing the whole Staden Package from scratch it could" + echo " take as long as an hour for all exectuables to be compiled and" + echo " installed." + echo "" + +ask_goahead: + echo -n "Proceed with the installation [YES]? " + set ANSWER=$< + if ("$ANSWER" == "?") then + echo "* Final confirmation to proceed with the installation. Answer" + echo " YES to proceed; otherwise, answer NO to abort the installation." + echo "" + goto ask_goahead + else if ("$ANSWER" != "") then + if ("$ANSWER" =~ [nN]*) then + goto chickens_exit + else if ("$ANSWER" !~ [yY]*) then + goto ask_goahead + endif + endif + +installation_proper: + +# make binaries directory if it doesn't exist + + if (! 
-d $DIR_BIN) then + $MKDIR $DIR_BIN + endif + + if ("$ANS_REQ_MISC" == "$YES" || "$ANS_REQ_X" == "$YES" || "$ANS_REQ_NONX" == "$YES" ) then + echo "" + echo "+ Compiling miscellaneous library" + + pushd $DIR_MISC > /dev/null + + cd $DIR_BINARIES + $MAKE all + + popd > /dev/null + + endif + + if ("$ANS_REQ_NONX" == "$YES") then + echo "" + echo "+ Installing non X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE nprogs lprogs + $INSTALL mep $DIR_BIN + $INSTALL nip $DIR_BIN + $INSTALL pip $DIR_BIN + $INSTALL sap $DIR_BIN + $INSTALL sapf $DIR_BIN + $INSTALL sip $DIR_BIN + $INSTALL splitp1 $DIR_BIN + $INSTALL splitp2 $DIR_BIN + $INSTALL splitp3 $DIR_BIN + $INSTALL sethelp $DIR_BIN + $INSTALL gip $DIR_BIN + $INSTALL nipl $DIR_BIN + $INSTALL pipl $DIR_BIN + $INSTALL sipl $DIR_BIN + $INSTALL dap $DIR_BIN + $INSTALL nipf $DIR_BIN + $INSTALL vep $DIR_BIN + $INSTALL rep $DIR_BIN + $INSTALL lip $DIR_BIN + #$INSTALL convert_project $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE bap + $INSTALL bap $DIR_BIN + popd > /dev/null + + endif + + if ("$ANS_REQ_TED" == "$YES") then + echo "" + echo "+ Installing Trace editor" + + pushd $DIR_TED > /dev/null + cd $DIR_BINARIES + $MAKE ted + $INSTALL ted $DIR_BIN + popd > /dev/null + endif + + if ("$ANS_REQ_X" == "$YES") then + echo "" + echo "+ Installing X programs" + + pushd $DIR_STADEN > /dev/null + cd $DIR_BINARIES + $MAKE xprogs + $INSTALL xmep $DIR_BIN + $INSTALL xnip $DIR_BIN + $INSTALL xpip $DIR_BIN + $INSTALL xsap $DIR_BIN + $INSTALL xsip $DIR_BIN + $INSTALL xdap $DIR_BIN + popd > /dev/null + + pushd $DIR_OSP > /dev/null + cd $DIR_BINARIES + $MAKE + popd > /dev/null + + pushd $DIR_BAP > /dev/null + cd $DIR_BINARIES + $MAKE xbap + $INSTALL xbap $DIR_BIN + popd > /dev/null + + + endif + + if ("$ANS_REQ_MISC" == "$YES") then + echo "" + echo "+ Installing miscellaneous programs" + + pushd $DIR_ABI 
> /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL getABISampleName $DIR_BIN + popd > /dev/null + + pushd $DIR_ALF > /dev/null + cd $DIR_BINARIES + $MAKE alfsplit + $INSTALL alfsplit $DIR_BIN + popd > /dev/null + + pushd $DIR_CONVERT > /dev/null + cd $DIR_BINARIES + $MAKE convert + $INSTALL convert $DIR_BIN + popd > /dev/null + + pushd $DIR_COP > /dev/null + cd $DIR_BINARIES + $MAKE all + $INSTALL cop $DIR_BIN + $INSTALL cop-bap $DIR_BIN + popd > /dev/null + + pushd $DIR_FROG > /dev/null + cd $DIR_BINARIES + $MAKE frog + $INSTALL frog $DIR_BIN + popd > /dev/null + + pushd $DIR_GETMCH > /dev/null + cd $DIR_BINARIES + $MAKE trace2seq + $INSTALL trace2seq $DIR_BIN + popd > /dev/null + + pushd $DIR_SCF > /dev/null + cd $DIR_BINARIES + $MAKE makeSCF + $INSTALL makeSCF $DIR_BIN + popd > /dev/null + + + + endif + + +installation_done: + echo "" + echo "+ Installation completed" + echo "" + + echo " Some further initialisation is required in order to use the" + echo " package. csh users should insert the following in their .login" + echo " files:" + echo " " + echo " setenv STADENROOT $ANS_STADEN_ROOT" + echo ' source $STADENROOT/staden.login' + echo " " + echo " Users of the Bourne shell, sh, should insert the following in" + echo " their .profile:" + echo " " + echo " STADENROOT=$ANS_STADEN_ROOT" + echo " export STADENROOT" + echo ' . $STADENROOT/staden.profile' + echo " " + echo " These initialisations will alter the shell's search path so that" + echo " it can find the programs in the STADEN Package" + echo " " + +normal_exit: + exit 0 + +chickens_exit: + echo "" + echo "+ Installation cancelled" + echo "" + + exit 0 + +end_failure: + unset noglob + echo "" + echo "Aborted STADEN Package installation on `date`" + echo "" + exit 1 + diff --git a/Version-1993.0.7 b/Version-1993.0.7 new file mode 100644 index 0000000..1689cc5 --- /dev/null +++ b/Version-1993.0.7 @@ -0,0 +1,91 @@ +Wed Jul 7 + *Version-1993.0.7* + New xbap and ted. 
+ Can use Ctrl as well as Meta to shift cutoffs in contig editor. + Code to read in ABI traces now robust to ABI problem files, where + called base order is not base position order. + +Thu Jul 1 + *Version-1993.0.6* + New xbap and bap, to fix bugs. + Break Contig was sometimes not recalculating consensus length correctly. + Contig Edit was truncating reading name lengths at 10 characters. + +Thu Jun 16 + *Version-1993.0.5* + New xbap and bap executables. RS changed assembly in bap so that + when entry is not permitted the program asks for the percentage + mismatch - this allows display of alignments for all levels of + mismatch. + +Mon Jun 14 14:54:43 BST 1993 + *Version-1993.0.4* + Bug in xdap. It was compiled with xbap's edUtils.h by mistake. + +Fri Jun 11 17:50:13 BST 1993 + *Version-1993.0.3* + Bugs in bap/xbap fixed. New executables included. + +Thu Jun 3 13:53:38 BST 1993 + *Version-1993.0.2* + Bugs in bap/xbap fixed. New executables included. + +Thu May 20 14:45:38 BST 1993 + *Version-1993.0.1* + Changes to makefiles and Staden_install + +Fri Mar 5 11:27:22 GMT 1993 + *Version-1993.0* + Now for DEC Alpha and Solaris + bap/xbap now includes double stranding and auto-creation of oligos + +Tue Jan 26 11:54:36 GMT 1993 + *Version-1992.3.1* + Bug fixes + 1. indexseqlibs/genbentryname1.c + 2. convert bugs + new programs + +Mon Nov 23 13:50:39 WET 1992 + *Version-1992.3* + Includes bap/xbap and utility programs + + +Wed Sep 30 11:18:09 BST 1992 + *Version-1992.2.1* + Source changes since last release + bug fixes to postscript output, sequence library programs + New sun and dec executables + + +Thu Aug 27 15:27:05 BST 1992 + + *Version-1992.2* + + +Mon Jul 27 13:01:37 WET 1992 + + *Version-1992.1.3* + Miscellaneous bug fixes and enhancements + New sun and dec executables + + +Tue Jun 16 16:07:41 BST 1992 + + *Version-1992.1.2* + Sun sparc executables now linked with cc and not gcc.
+ New makefile-sun files + New sources for hitNtrg.c and freetext4.c (indexseqlibs), and + tagU2.c (staden) + + +Wed May 27 17:12:36 BST 1992 + + *Version-1992.1.1* + Inclusion of vep (vector excision program), plus minor changes and bug fixes + + +Tue May 26 11:10:28 WET 1992 + + *Version-1992.1* + This version includes the port to DEC Ultrix (mips) + diff --git a/bin/alfsplit b/bin/alfsplit new file mode 100644 index 0000000..97f5008 Binary files /dev/null and b/bin/alfsplit differ diff --git a/bin/bap b/bin/bap new file mode 100644 index 0000000..2f16f89 Binary files /dev/null and b/bin/bap differ diff --git a/bin/convert b/bin/convert new file mode 100644 index 0000000..2f59c33 Binary files /dev/null and b/bin/convert differ diff --git a/bin/cop b/bin/cop new file mode 100644 index 0000000..7b2b403 Binary files /dev/null and b/bin/cop differ diff --git a/bin/cop-bap b/bin/cop-bap new file mode 100644 index 0000000..48ea21a Binary files /dev/null and b/bin/cop-bap differ diff --git a/bin/dap b/bin/dap new file mode 100644 index 0000000..476dd0a Binary files /dev/null and b/bin/dap differ diff --git a/bin/frog b/bin/frog new file mode 100644 index 0000000..53485da Binary files /dev/null and b/bin/frog differ diff --git a/bin/getABISampleName b/bin/getABISampleName new file mode 100644 index 0000000..17ae99d Binary files /dev/null and b/bin/getABISampleName differ diff --git a/bin/gip b/bin/gip new file mode 100644 index 0000000..6b69ebb Binary files /dev/null and b/bin/gip differ diff --git a/bin/lip b/bin/lip new file mode 100644 index 0000000..92f266c Binary files /dev/null and b/bin/lip differ diff --git a/bin/makeSCF b/bin/makeSCF new file mode 100644 index 0000000..b5c1610 Binary files /dev/null and b/bin/makeSCF differ diff --git a/bin/mep b/bin/mep new file mode 100644 index 0000000..0d3ce7b Binary files /dev/null and b/bin/mep differ diff --git a/bin/nip b/bin/nip new file mode 100644 index 0000000..c053e7e Binary files /dev/null and b/bin/nip differ diff 
--git a/bin/nipf b/bin/nipf new file mode 100644 index 0000000..6fcc502 Binary files /dev/null and b/bin/nipf differ diff --git a/bin/nipl b/bin/nipl new file mode 100644 index 0000000..d91fc3d Binary files /dev/null and b/bin/nipl differ diff --git a/bin/pip b/bin/pip new file mode 100644 index 0000000..5bb7464 Binary files /dev/null and b/bin/pip differ diff --git a/bin/pipl b/bin/pipl new file mode 100644 index 0000000..c4ab009 Binary files /dev/null and b/bin/pipl differ diff --git a/bin/rep b/bin/rep new file mode 100644 index 0000000..0c3775b Binary files /dev/null and b/bin/rep differ diff --git a/bin/sap b/bin/sap new file mode 100644 index 0000000..09515fa Binary files /dev/null and b/bin/sap differ diff --git a/bin/sapf b/bin/sapf new file mode 100644 index 0000000..cd8b574 Binary files /dev/null and b/bin/sapf differ diff --git a/bin/sethelp b/bin/sethelp new file mode 100644 index 0000000..858dcdd Binary files /dev/null and b/bin/sethelp differ diff --git a/bin/sip b/bin/sip new file mode 100644 index 0000000..4730591 Binary files /dev/null and b/bin/sip differ diff --git a/bin/sipl b/bin/sipl new file mode 100644 index 0000000..f700267 Binary files /dev/null and b/bin/sipl differ diff --git a/bin/splitp1 b/bin/splitp1 new file mode 100644 index 0000000..359ef70 Binary files /dev/null and b/bin/splitp1 differ diff --git a/bin/splitp2 b/bin/splitp2 new file mode 100644 index 0000000..f7c6df5 Binary files /dev/null and b/bin/splitp2 differ diff --git a/bin/splitp3 b/bin/splitp3 new file mode 100644 index 0000000..36b4baf Binary files /dev/null and b/bin/splitp3 differ diff --git a/bin/ted b/bin/ted new file mode 100644 index 0000000..dae6b70 Binary files /dev/null and b/bin/ted differ diff --git a/bin/trace2seq b/bin/trace2seq new file mode 100644 index 0000000..b4cb9d0 Binary files /dev/null and b/bin/trace2seq differ diff --git a/bin/vep b/bin/vep new file mode 100644 index 0000000..f2e3c81 Binary files /dev/null and b/bin/vep differ diff --git 
a/bin/xbap b/bin/xbap new file mode 100644 index 0000000..70dd0c1 Binary files /dev/null and b/bin/xbap differ diff --git a/bin/xbap.1 b/bin/xbap.1 new file mode 100644 index 0000000..70dd0c1 Binary files /dev/null and b/bin/xbap.1 differ diff --git a/bin/xdap b/bin/xdap new file mode 100644 index 0000000..28ba2bf Binary files /dev/null and b/bin/xdap differ diff --git a/bin/xmep b/bin/xmep new file mode 100644 index 0000000..b58f4f9 Binary files /dev/null and b/bin/xmep differ diff --git a/bin/xnip b/bin/xnip new file mode 100644 index 0000000..a5e9550 Binary files /dev/null and b/bin/xnip differ diff --git a/bin/xpip b/bin/xpip new file mode 100644 index 0000000..761f4ce Binary files /dev/null and b/bin/xpip differ diff --git a/bin/xsap b/bin/xsap new file mode 100644 index 0000000..0aa8f9b Binary files /dev/null and b/bin/xsap differ diff --git a/bin/xsip b/bin/xsip new file mode 100644 index 0000000..c230988 Binary files /dev/null and b/bin/xsip differ diff --git a/doc/Converting_Sap_Databases b/doc/Converting_Sap_Databases new file mode 100644 index 0000000..bfeecea --- /dev/null +++ b/doc/Converting_Sap_Databases @@ -0,0 +1,32 @@ +Converting Sap Databases To Be Used With XDAP SD 10 July 1991 +======================================================================= + +The sequence assembly programmes dap and xdap are based on the programs +sap and xsap, with major modifications. For a concise summary of the +new features I refer you to Rodger and my paper, "A sequence assembly +and editing program for efficient management of large projects" +(Nucleic Acids Research, in press) + +The need for storing extra information in project databases has +resulted in the creation of two files. For users who wish to use old +(sap) databases with xdap, additional files must be created to use all +the new features. The program 'convert_project' does this. It is +interactive, and asks you for names of relevant files, version numbers +etc.
Here is a sample program dialogue: + + + % convert_project + Database conversion program + Converts *.RD? file to *.TG? and *.CC? files + + Project name ? test + Version ? 0 + Conversion completed. + + +Further, please ensure that the file TAGDB is in your project +directory. Copies can be found in $STADTABL. Alternatively ensure that +the environment variable TAGDB is set to $STADTABL/TAGDB + + setenv TAGDB $STADTABL/TAGDB + diff --git a/doc/README b/doc/README new file mode 100644 index 0000000..26e473d --- /dev/null +++ b/doc/README @@ -0,0 +1,30 @@ +Processing and printing LaTeX sources +------------------------------------- + +Given a source file src.tex, run LaTeX to generate the bibliographic +references: + + latex src + +Now run BibTeX to search the bibliography for them: + + bibtex src + +Now run LaTeX twice, first to pick up the references, second to bind +forward references: + + latex src + latex src + +This will have generated a src.dvi output file. Now we convert this +to PostScript: + + dvi2ps src.dvi >src.ps + +Now we can print this out: + + lpr src.ps + +Most of the above is only necessary if you are building something from +scratch, but it's best to go through it anyway until you fully +understand how LaTeX works. diff --git a/doc/gip-menu.PS b/doc/gip-menu.PS new file mode 100644 index 0000000..17d7616 --- /dev/null +++ b/doc/gip-menu.PS @@ -0,0 +1,131 @@ +%! +/cm {28.2 mul} def +/BOXSIZE 2 cm def + +/boxcen +{ +% move to centre of box +BOXSIZE mul 2 div BOXSIZE 2 div rmoveto +exch +% move back by correct amount to ensure letter is in centre of box +dup stringwidth +pop 2 div neg % halve & neg x offset +% y offset appears to be zero!
- so use constant 'square' char (eg X) +(X) stringwidth pop 2 div neg +} def + +/letter +{ +dup BOXSIZE mul 0 rlineto +0 BOXSIZE rlineto +dup BOXSIZE mul neg 0 rlineto +0 BOXSIZE neg rlineto +closepath +gsave +dup boxcen rmoveto +show +stroke +grestore +BOXSIZE mul 0 rmoveto +} def + +/nextline {0 BOXSIZE neg rmoveto} def + +/line +{ +gsave +1 letter +1 letter +1 letter +1 letter +grestore +nextline +} def + +/Times-Roman findfont 50 scalefont setfont +newpath +5 setlinewidth +200 650 translate +0 0 moveto +%2 setlinecap + +gsave +(A) (G) (C) (T) line +(3) (4) (1) (2) line +(B) (H) (D) (V) line +(M) (N) (K) (L) line +(-) (X) (Y) (R) line +(8) (7) (6) (5) line +/Times-Roman findfont 25 scalefont setfont +gsave +(DELETE) 2 letter +(RESET) 2 letter +grestore +nextline +/Times-Roman findfont 35 scalefont setfont +gsave +(STOP) 4 letter +grestore +nextline +gsave +(START) 4 letter +grestore +nextline +gsave +(CONFIRM) 4 letter +grestore +nextline +% yukky from here on +gsave +0 BOXSIZE rmoveto +1 cm 0 rlineto stroke +grestore +(ORIGIN) dup 4 boxcen rmoveto show pop +(ORIGIN) stringwidth neg exch neg exch rmoveto +(X) stringwidth exch 2 div rmoveto +-5 0 rmoveto +2 setlinewidth +-45 21 rlineto +6 0 rlineto +-6 0 rmoveto +0 -6 rlineto +stroke +grestore +2 setlinewidth +0 BOXSIZE 1.4 mul rmoveto +6 6 rlineto +-6 -6 rmoveto +6 -6 rlineto +-6 6 rmoveto +80 0 rlineto +5 -6 rmoveto +/Times-Roman findfont 30 scalefont setfont +(8 cm) show +5 6 rmoveto +76 0 rlineto +-6 6 rlineto +6 -6 rmoveto +-6 -6 rlineto +stroke +0 0 moveto +BOXSIZE .4 mul neg BOXSIZE rmoveto +currentpoint translate +newpath +0 0 moveto +90 rotate +-6 6 rlineto +6 -6 rmoveto +-6 -6 rlineto +6 6 rmoveto +-244 0 rlineto +-84 0 rmoveto +0 -6 rmoveto +(20 cm) show +0 6 rmoveto +-84 0 rmoveto +-227 0 rlineto +6 6 rlineto +-6 -6 rmoveto +6 -6 rlineto +stroke +showpage diff --git a/doc/install.PS b/doc/install.PS new file mode 100644 index 0000000..0785781 --- /dev/null +++ b/doc/install.PS @@ -0,0 +1,2426 @@ +%! 
for use by dvi2ps Version 2.00 +% $Header: tex.ps,v 2.0 88/06/07 15:12:32 peterd Rel2 $ +% a start (Ha!) at a TeX mode for PostScript. +% The following defines procedures assumed and used by program "dvi2ps" +% and must be downloaded or sent as a header file for all TeX jobs. + +% By: Neal Holtz, Carleton University, Ottawa, Canada +% +% +% June, 1985 +% Last Modified: Aug 25/85 +% oystr 12-Feb-1986 +% Changed @dc macro to check for badly formed bits in character +% definitions. Can get a <> bit map if a character is not actually +% in the font file. This is absolutely guaranteed to drive the +% printer nuts - it will appear that you can no longer define a +% new font, although the built-ins will still be there. +% mackay 4-Jan-1988 +% Changed size of character array to reflect gf usage (256 characters) + +% To convert this file into a downloaded file instead of a header +% file, uncomment all of the lines beginning with %-% + +%-%0000000 % Server loop exit password +%-%serverdict begin exitserver +%-% systemdict /statusdict known +%-% {statusdict begin 9 0 3 setsccinteractive /waittimeout 300 def end} +%-% if + +/TeXDict 200 dict def % define a working dictionary +TeXDict begin % start using it. + + % units are in "dots" (300/inch) +/Resolution 300 def +/Inch {Resolution mul} def % converts inches to internal units + +/Mtrx 6 array def + +%%%%%%%%%%%%%%%%%%%%% Page setup (user) options %%%%%%%%%%%%%%%%%%%%%%%% + +% dvi2ps will output coordinates in the TeX system ([0,0] 1" down and in +% from top left, with y +ive downward). The default PostScript system +% is [0,0] at bottom left, y +ive up. The Many Matrix Machinations in +% the following code are an attempt to reconcile that. The intent is to +% specify the scaling as 1 and have only translations in the matrix to +% properly position the text.
Caution: the default device matrices are +% *not* the same in all PostScript devices; that should not matter in most +% of the code below (except for landscape mode -- in that, rotations of +% -90 degrees resulted in the rotation matrix [ e 1 ] +% [ 1 e ] +% where the "e"s were almost exactly but not quite unlike zeros. + +/@letter + { letter initmatrix + 72 Resolution div dup neg scale % set scaling to 1. + 310 -3005 translate % move origin to top (these are not exactly 1" + Mtrx currentmatrix pop % and -10" because margins aren't set exactly right) + } def + % note mode is like letter, except it uses less VM +/@note + { note initmatrix + 72 Resolution div dup neg scale % set scaling to 1. + 310 -3005 translate % move origin to top + Mtrx currentmatrix pop + } def + +/@landscape + { letter initmatrix + 72 Resolution div dup neg scale % set scaling to 1. +% -90 rotate % it would be nice to be able to do this + Mtrx currentmatrix 0 0.0 put % but instead we have to do things like this because what + Mtrx 1 -1.0 put % should be zero terms aren't (and text comes out wobbly) + Mtrx 2 1.0 put % Fie! This likely will not work on QMS printers + Mtrx 3 0.0 put % (nor on others where the device matrix is not like + Mtrx setmatrix % it is on the LaserWriter). + 300 310 translate % move origin to top + Mtrx currentmatrix pop + } def + +/@legal + { legal initmatrix + 72 Resolution div dup neg scale % set scaling to 1.
+ 295 -3880 translate % move origin to top + Mtrx currentmatrix pop + } def + +/@manualfeed + { statusdict /manualfeed true put + statusdict /manualfeedtimeout 300 put % 5 minutes + } def + % n @copies - set number of copies +/@copies + { /#copies exch def + } def + +%%%%%%%%%%%%%%%%%%%% Procedure Definitions %%%%%%%%%%%%%%%%%%%%%%%%%% + +/@newfont % id @newfont - -- initialize a new font dictionary + { /newname exch def + pop + newname 7 dict def % allocate new font dictionary + newname load begin + /FontType 3 def + /FontMatrix [1 0 0 -1 0 0] def + /FontBBox [0 0 1 1] def +% mackay 4-Jan-1987 changed size of array from 128 to 256 for gf fonts + /BitMaps 256 array def + /BuildChar {CharBuilder} def + /Encoding 256 array def + 0 1 255 {Encoding exch /.undef put} for + end + newname newname load definefont pop + } def + + +% the following is the only character builder we need. it looks up the +% char data in the BitMaps array, and paints the character if possible. +% char data -- a bitmap descriptor -- is an array of length 6, of +% which the various slots are: + +/ch-image {ch-data 0 get} def % the hex string image +/ch-width {ch-data 1 get} def % the number of pixels across +/ch-height {ch-data 2 get} def % the number of pixels tall +/ch-xoff {ch-data 3 get} def % number of pixels to left of origin +/ch-yoff {ch-data 4 get} def % number of pixels below origin +/ch-tfmw {ch-data 5 get} def % spacing to next character + +/CharBuilder % fontdict ch CharBuilder - -- image one character + { /ch-code exch def % save the char code + /font-dict exch def % and the font dict.
+ /ch-data font-dict /BitMaps get ch-code get def % get the bitmap descriptor for char + ch-data null eq not + { ch-tfmw 0 ch-xoff neg ch-yoff neg ch-width ch-xoff sub ch-height ch-yoff sub + setcachedevice + ch-width ch-height true [1 0 0 1 ch-xoff ch-yoff] + {ch-image} imagemask + } + if + } def + + +/@sf % fontdict @sf - -- make that the current font + { setfont() pop + } def + + % in the following, the font-cacheing mechanism requires that + % a name unique in the particular font be generated + +/@dc % char-data ch @dc - -- define a new character bitmap in current font + { /ch-code exch def +% ++oystr 12-Feb-86++ + dup 0 get + length 2 lt + { pop [ <00> 1 1 0 0 8.00 ] } % replace <> with null + if +% --oystr 12-Feb-86-- + /ch-data exch def + currentfont /BitMaps get ch-code ch-data put + currentfont /Encoding get ch-code + dup ( ) cvs cvn % generate a unique name simply from the character code + put + } def + +/@bop0 % n @bop0 - -- begin the char def section of a new page + { + } def + +/@bop1 % n @bop1 - -- begin a brand new page + { pop + erasepage initgraphics + Mtrx setmatrix + /SaveImage save def() pop + } def + +%-- tjh sept. 87: if this page has a mac drawing on it, we have to +%-- use showpage in the md dictionary. 
+/@eop % - @eop - -- end a page + { + userdict /md known { + userdict /md get type /dicttype eq { + md /MacDrwgs known { + md begin showpage end + }{ + showpage + } ifelse + }{ + showpage + } ifelse + }{ + showpage + } ifelse + SaveImage restore() pop + } def + +/@start % - @start - -- start everything + { @letter % (there is not much to do) + } def + +/@end % - @end - -- done the whole shebang + { end + } def + +/p % x y p - -- move to position + { moveto + } def + +/r % x r - -- move right + { 0 rmoveto + } def + +/s % string s - -- show the string + { show + } def + +/c % ch c - -- show the character (code given) + { c-string exch 0 exch put + c-string show + } def + +/c-string ( ) def + +/ru % dx dy ru - -- set a rule (rectangle) + { /dy exch neg def % because dy is height up from bottom + /dx exch def + /x currentpoint /y exch def def % remember current point + newpath x y moveto + dx 0 rlineto + 0 dy rlineto + dx neg 0 rlineto + closepath fill + x y moveto + } def + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% the \special command junk +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% The structure of the PostScript produced by dvi2ps for \special is: +% @beginspecial +% - any number of @hsize, @hoffset, @hscale, etc., commands +% @setspecial +% - the user's file of PostScript commands +% @endspecial + +% The @beginspecial command recognizes whether the Macintosh Laserprep +% has been loaded or not, and redefines some Mac commands if so. +% The @setspecial handles the user's shifting, scaling, clipping commands + +%-- tjh sept. 87: made changes to allow postscript and macdrawing +%-- to be inserted with version 65 of the md dictionary. Many bugs +%-- were fixed: +%-- vo changed to vof, name conflict with md +%-- vs changed to vsz, name conflict with md +%-- substantially changed @setspecial and @MacSetUp +%-- Also, made changes to allow users to specify offsets +%-- and clip rectangles in inches.
+ +% The following are user settable options from the \special command. + +/@SpecialDefaults + { /hs 8.5 72 mul def + /vsz 11 72 mul def + /ho 0 def + /vof 0 def + /hsc 1 def + /vsc 1 def + /CLIP false def + } def + +% d @hsize - specify a horizontal clipping dimension +% these 2 are executed before the MacDraw initializations +/@hsize {72 mul /hs exch def /CLIP true def} def +/@vsize {72 mul /vsz exch def /CLIP true def} def + +% d @hoffset - specify a shift for the drwgs +/@hoffset {72 mul /ho exch def} def +/@voffset {72 mul /vof exch def} def + +% s @hscale - set scale factor +/@hscale {/hsc exch def} def +/@vscale {/vsc exch def} def + +/@setclipper + { hsc vsc scale + CLIP + { newpath 0 0 moveto hs 0 rlineto 0 vsz rlineto hs neg 0 rlineto closepath clip } + if + } def + +% this will be invoked as the result of a \special command (for the +% inclusion of PostScript graphics). The basic idea is to change all +% scaling and graphics back to defaults, but to shift the origin +% to the current position on the page. Due to TeXnical difficulties, +% we only set the y-origin. The x-origin is set at the left edge of +% the page. + +/@beginspecial + { gsave /SpecialSave save def + % the following magic incantation establishes the current point as + % the user's origin, and reverts back to default scalings, rotations + currentpoint transform initgraphics itransform translate + @SpecialDefaults % setup default offsets, scales, sizes + @MacSetUp % fix up Mac stuff + } def + + +%-- tjh: assume this is raw postscript, but save some state in case it's not. +/@setspecial + { + /specmtrx matrix currentmatrix def + ho vof translate @setclipper + } def + + +/@endspecial + { SpecialSave restore + grestore + } def + + +% - @MacSetUp - turn-off/fix-up all the MacDraw stuff that might hurt us + % we depend on 'psu' being the first procedure executed + % by a Mac document. We redefine 'psu' to adjust page + % translations, and to do all the other fixups required.
+ % This stuff will not harm other included PS files +/@MacSetUp + { userdict /md known % if md is defined + { userdict /md get type /dicttype eq % and if it is a dictionary + { + md begin % then redefine some stuff + /psu % redefine psu to set origins, etc. + /psu load + % this procedure contains almost all the fixup code + { +% /letter {} def % it is bad manners to execute the real +% /note {} def % versions of these (clears page image, etc.) +% /legal {} def + /MacDrwgs true def + specmtrx setmatrix % restore pre-@setspecial state. + initclip % ditto + % change smalls to prevent page clearing. + /smalls [ lnop lnop lnop lnop lnop lnop lnop lnop lnop ] def + 0 0 0 0 ppr astore pop % prevents origin translation. + % redefine cp, do the showpage later, see @eop + /cp { + pop + pop + pm restore + } def % no printing of pages + } + concatprocs + def + /od + % redefine od to translate and scale. + % redefine load to set clipping region. + /od load + { + ho vof translate + hsc vsc scale + CLIP { + /nc + /nc load + { newpath 0 0 moveto hs 0 rlineto 0 vsz rlineto + hs neg 0 rlineto closepath clip } + concatprocs + def + } if + } + concatprocs + def + end } + if } + if + } def + +% p1 p2 concatprocs p - concatenate procedures +/concatprocs + { /p2 exch cvlit def + /p1 exch cvlit def + /p p1 length p2 length add array def + p 0 p1 putinterval + p p1 length p2 putinterval + p cvx + } def + +end % revert to previous dictionary +TeXDict begin @start +%%Title: install.dvi +%%Creator: dvi2ps +%%EndProlog +1 @bop0 +[ 300 ] /cmr17.300 @newfont +cmr17.300 @sf +[ 24 49 -3 0 23.499] 73 @dc +[ 40 31 -2 0 36.644] 110 @dc +[<80FE00C301C0CC0060F00030F00038E00018E0001CC0001CC0001C80001C80003C80003C0000F80001F8003FF003FFE00FFF + C01FFF003FF0007E0000F80000F00010E00010E00010E00010E000306000303000701800F00E033001FC10> 24 31 -2 0 25.776] 115 @dc +[<001F000078C000E04001E02001C02003C01003C01003C01003C01003C01003C01003C01003C01003C00003C00003C00003C0
0003C00003C00003C00003C00003C00003C00003C00003C00003C00003C00003C00003C000FFFFE01FFFE00FC00007C00003 + C00001C00001C00000C00000C00000C000004000004000004000004000004000> 24 44 -1 0 25.402] 116 @dc +[<03FC03E00F0307F03E008F087C005E0478003E04F8003E04F8003E04F8001E04F8001E04F8001E047C001E003C001E003E00 + 1E001F001E000F801E0003E01E0000FC1E00000FFE0000001E0000001E0000001E0008001E003E001E003E001E003E001C00 + 3C003C0010003800100070000C00E0000303C00000FE0000> 32 31 -3 0 32.896] 97 @dc +[ 16 50 -2 0 17.907] 108 @dc +[ 16 48 -2 0 17.907] 105 @dc +[<003FE00001C01C00070007001C0001C0380000E07000007070000070E0000038E0000038E0000038E0000038E00000387000 + 0070300000F0180001E00E000FC003FFFF8007FFFF000FFFF8000E0000001C00000018000000180000001800000018000000 + 18000000087F000009C1C0000780E000070070000F0078001E003C001E003C003E003E003E003E003E003E003E003E003E00 + 3E003E003E001E003C001E003C000F007800070070080380E81C01C1C41C007F0308000000F0> 32 47 -2 15 32.896] 103 @dc +[ 40 50 -2 0 36.644] 104 @dc +[<001FC00000F0300001C00C00078002000F0002000E0001001E0000803C0000803C0000007C00000078000000F8000000F800 + 0000F8000000F8000000F8000000F8000000FFFFFF80F8000780F80007807800078078000F807C000F003C000F001C000F00 + 1E001E000E001E0007003C000380380000E0E000003F8000> 32 31 -2 0 29.149] 101 @dc +[<800FF000807FFC00C1F01E00C7000700EC000380F80001C0F00000E0E00000E0E00000F0C0000070C0000078800000788000 + 0078800000788000007880000078000000F8000000F8000000F0000001F0000003F0000007E000001FE00000FFC0000FFF80 + 00FFFF0003FFFC0007FFF8000FFF80001FF800003FC000003F0000007E0000007C000000F8000000F8000020F0000020F000 + 0020F0000020F0000060F000006070000060700000E0780000E0380001E03C0003E01E0006E00F001C6007C0786001FFE020 + 007F8020> 32 51 -4 1 36.644] 83 @dc +[<003F81FF0000E061FF00038011F000070009E0000E0005E0001E0003E0001C0001E0003C0001E0003C0001E000780001E000 + 780001E000F80001E000F80001E000F80001E000F80001E000F80001E000F80001E000F80001E000F80001E000F80001E000 + 
780001E0007C0001E0003C0001E0003C0001E0001E0001E0000E0003E0000F0003E000078005E00001C019E00000F061E000 + 001F81E000000001E000000001E000000001E000000001E000000001E000000001E000000001E000000001E000000001E000 + 000001E000000001E000000001E000000001E000000001E000000001E000000003E00000003FE00000003FE000000001E000> 40 50 -3 0 36.644] 100 @dc +[ 40 49 -4 0 45.061] 80 @dc +[<003F800000E0600003801800070004000F0002001E0002001E0001003C0001007C0000007C00000078000000F8000000F800 + 0000F8000000F8000000F8000000F8000000F8000000F8000000F8000000780000007C0008007C003E003C003E001C003E00 + 1E001E000F000400070004000380180000E06000003F8000> 32 31 -3 0 29.149] 99 @dc +[ 40 50 -2 0 34.770] 107 @dc +[ 300 ] /cmr12.300 @newfont +cmr12.300 @sf +[<81FC00C60700C80180F000C0E000C0C00060C000608000708000708000708000700000700000F00000F00001E00007E0003F + C003FF800FFF001FFE003FF0007F0000780000F00000F00000E00020E00020E00020E00060E000606000607000E03001E018 + 02600C0C6003F020> 24 36 -3 1 27.097] 83 @dc +[ 16 34 -1 0 13.548] 105 @dc +[ 40 21 -1 0 40.645] 109 @dc +[<01FC000707000E03801C01C03800E07800F0700070F00078F00078F00078F00078F00078F00078F000787000707000703800 + E01800C00C018007070001FC00> 24 21 -1 0 24.387] 111 @dc +[ 32 21 -1 0 27.097] 110 @dc +[ 32 34 -2 0 37.249] 68 @dc +[<00FC000703000E00801C0040380020780020700000F00000F00000F00000F00000F00000FFFFE0F000E07000E07801E03801 + C01C01C00C038007070001FC00> 24 21 -1 0 21.677] 101 @dc +[<0FC1E03C2390781708F00F08F00708F00708F007087007007807003C07001E070007C70000FF000007000007000007001807 + 003C0E003C0C001838000FE000> 24 21 -2 0 24.387] 97 @dc +[ 24 21 -1 0 18.968] 114 @dc +[ 24 33 -2 0 24.387] 50 @dc +[ 16 33 -4 0 24.387] 49 @dc +[ 48 34 -2 0 44.692] 77 @dc +[<3C0000430000F18000F08000F0400000400000200000200000200000100000100000380000380000380000740000740000E2 + 0000E20000E20001C10001C1000380800380800380800700400700400E00200E00200E00301E0078FFC1FE> 24 31 -1 10 25.742] 121 @dc 
+[<0FC000103000201800700C007806007807003003000003800003800001C00001C00001C003E1E00619E00C05E01805E03803 + E07003E07001E0F001E0F001E0F001E0F001E0F001E0F001C0F001C0F001C07003807003803803801807000C0600060C0001 + F000> 24 34 -2 1 24.387] 57 @dc +[<03F0000C1C00100F002007804007804003C0F003C0F803E0F803E07003E02003E00003E00003C00003C0000780000780000F + 00001C0003F000003800000E00000F000007000007800007803807C07807C07803C07807C04007C02007801007000C1E0003 + F800> 24 34 -2 1 24.387] 51 @dc +[ 432 ] /cmbx10.432 @newfont +cmbx10.432 @sf +[<7FFFFE7FFFFE7FFFFE00FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE + 0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE0000FE00F8FE00FF + FE00FFFE0007FE00007E00001E00000E00> 24 39 -5 0 34.370] 49 @dc +[ 24 41 -2 0 26.068] 73 @dc +[ 40 27 -3 0 38.189] 110 @dc +[<001F8000FFC001F86003F87003F03807F03807F03807F03807F03807F03807F03807F00007F00007F00007F00007F00007F0 + 0007F00007F00007F00007F00007F00007F00007F000FFFFF0FFFFF01FFFF007F00003F00003F00001F00000F00000F00000 + F000007000007000007000007000> 24 38 -1 0 26.732] 116 @dc +[ 32 27 -2 0 28.310] 114 @dc +[<003FE00001FFFC0007F07F000FC01F801F800FC03F800FE03F800FE07F0007F07F0007F0FF0007F8FF0007F8FF0007F8FF00 + 07F8FF0007F8FF0007F8FF0007F8FF0007F87F0007F07F0007F07F0007F03F0007E03F800FE01F800FC00FC01F8003F07E00 + 01FFFC00003FE000> 32 27 -2 0 34.370] 111 @dc +[<003FC3FF8000FFF3FF8003F03BFF8007C00FF8000F8007F8001F8003F8003F8003F8007F0003F8007F0003F8007F0003F800 + FF0003F800FF0003F800FF0003F800FF0003F800FF0003F800FF0003F800FF0003F8007F0003F8007F0003F8007F0003F800 + 3F8003F8001F8003F8000FC007F80007E00FF80003F03FF80000FFFBF800001FE3F800000003F800000003F800000003F800 + 000003F800000003F800000003F800000003F800000003F800000003F800000003F800000003F800000003F80000003FF800 + 00003FF80000003FF800> 40 42 -2 0 38.189] 100 @dc +[<003FC3FF8001FFF3FF8003F03BFF8007E00FF80007E007F8000FE007F8000FE003F8000FE003F8000FE003F8000FE003F800 + 
0FE003F8000FE003F8000FE003F8000FE003F8000FE003F8000FE003F8000FE003F8000FE003F8000FE003F8000FE003F800 + 0FE003F8000FE003F8000FE003F8000FE003F800FFE03FF800FFE03FF800FFE03FF800> 40 27 -3 0 38.189] 117 @dc +[<001FE00000FFFC0003F01E0007E007000FC003801F8001C03F8001C07F8000007F0000007F000000FF000000FF000000FF00 + 0000FF000000FF000000FF000000FF0000007F0000007F0000007F800E003F801F001F803F800FC03F8007E03F8003F01F00 + 00FFFE00001FF800> 32 27 -2 0 30.551] 99 @dc +[ 16 43 -3 0 19.094] 105 @dc +[ 329 ] /cmr10.329 @newfont +cmr10.329 @sf +[<001F800000F0F00001C0380007801E000F000F000E0007001E0007803C0003C03C0003C07C0003E07C0003E0780001E0F800 + 01F0F80001F0F80001F0F80001F0F80001F0F80001F0F80001F0F80001F0F80001F0780001E0780001E07C0003E03C0003C0 + 3C0003C01E0007800E0007000F000F0007801E0001C0380000F0F000001F8000> 32 33 -3 1 35.353] 79 @dc +[ 24 20 -1 0 25.252] 110 @dc +[<01E0031006100E080E080E080E080E080E000E000E000E000E000E000E000E000E000E000E00FFF83E000E000E0006000600 + 020002000200> 16 28 -1 0 17.676] 116 @dc +[ 24 32 -1 0 25.252] 104 @dc +[<01F8000706000C0100180080380080700000700000F00000F00000F00000FFFF80F00380F003807003807007003807003807 + 001C0E000E1C0003F000> 24 20 -1 0 20.202] 101 @dc +[<0F83C0386720781E10F01E10F00E10F00E10F00E10780E00380E001E0E00078E0000FE00000E00000E00000E00300E00781C + 007818003030001FE000> 24 20 -2 0 22.727] 97 @dc +[<03F0000E0C001C0200380100380100700000700000F00000F00000F00000F00000F00000F00000700000700000380C00381E + 001C1E000E0C0003F800> 24 20 -2 0 20.202] 99 @dc +[<01F800070E001C03803801C03801C07000E07000E0F000F0F000F0F000F0F000F0F000F0F000F07000E07000E03801C03801 + C01C0380070E0001F800> 24 20 -1 0 22.727] 111 @dc +[ 40 20 -1 0 37.878] 109 @dc +[ 24 29 -1 9 25.252] 112 @dc +[<3C0000620000F10000F08000F0800000400000400000400000200000200000700000700000700000E80000E80001EC0001C4 + 0001C4000382000382000382000701000701000E00800E00800E00801C00C01E01E0FF83F8> 24 29 -1 9 23.989] 121 @dc +[ 16 31 0 0 12.626] 105 @dc 
+[<03FC001C03803000C0600060C00030C00030C00030C000306000703001E00FFFC01FFF803FFE003000003000002000002000 + 0033E0001E38001C1C00380E00780F00780F00780F00780F00780F00380E001C1C300E3C3003E3300000E0> 24 31 -1 10 22.727] 103 @dc +[<01F1FC030DC00603C00E03C00E01C00E01C00E01C00E01C00E01C00E01C00E01C00E01C00E01C00E01C00E01C00E01C00E01 + C00E01C0FE1FC00E01C0> 24 20 -1 0 25.252] 117 @dc +[<004008000060180000E01C0000E01C0000F03C0001D03A0001D0320003C8730003887100038861000704E0800704C0800707 + C0800E03C0400E0380400E0380401C0380201C0300603C078070FF9FE1FC> 32 20 -1 0 32.828] 119 @dc +[ 16 32 0 0 12.626] 108 @dc +[<7FC3FE0700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700 + E00700E00700E0FFFFE00700000700000700000700000700000700000701E00701E00381E001C0C000E0C0003F00> 24 32 0 0 25.252] 12 @dc +[<03E3F80E1B801C0780380780380380700380700380F00380F00380F00380F00380F00380F003807003807003803803803803 + 801C0780061B8003E380000380000380000380000380000380000380000380000380000380000380003F80000380> 24 32 -2 0 25.252] 100 @dc +[ 24 20 0 0 23.989] 120 @dc +[<083E000CC3000D01C00F00E00E00E00E00700E00700E00780E00780E00780E00780E00780E00780E00700E00700E00E00F00 + E00F01C00EC3800E3E000E00000E00000E00000E00000E00000E00000E00000E00000E00000E0000FE00000E0000> 24 32 -1 0 25.252] 98 @dc +[<8F80D060E030C018C01880188018803800700FF03FE07F807800E000C010C010C010403030701F90> 16 20 -2 0 17.929] 115 @dc +[<7FF0000700000700000700000700000700000700000700000700000700000700000700000700000700000700000700000700 + 00070000070000FFF000070000070000070000070000070000070000070000070600038F00018F0000C600007C00> 24 32 0 0 13.889] 102 @dc +[ 16 20 -1 0 17.803] 114 @dc +[<81F800CE0C00F00600E00300C00380C001808001C08001C08001C08001C00001C00003C00003C0000780001F8003FF000FFE + 001FFC003FF0007F0000780000F00000F00000E00080E00080E00080E001806001806001803003801007800C198007E080> 24 33 -3 1 25.252] 83 @dc 
+[<00FFE0000E00000E00000E00000E00000E00000E00000E00FFFFF0C00E00400E00200E00200E00100E00080E00080E00040E + 00020E00020E00010E00008E00008E00004E00002E00002E00001E00000E00000E00000600000600> 24 30 -1 0 22.727] 52 @dc +[<70F8F8F870> 8 5 -4 0 12.626] 46 @dc +[<40201010080804040474FCFCF870> 8 14 -4 9 12.626] 44 @dc +[ 24 30 -2 0 22.727] 50 @dc +[ 32 31 -2 0 34.721] 68 @dc +[ 32 31 -2 0 30.934] 69 @dc +[<000FC0000070380001C0040003800200070001000E0000801E0000801C0000403C0000407C0000407C00004078000000F800 + 0000F8000000F8000000F8000000F8000000F8000000F8000000F8000000F8000000780000407C0000407C0000403C0000C0 + 1C0000C01E0000C00E0001C0070003C0038005C001C009C0007030C0000FC040> 32 33 -3 1 32.828] 67 @dc +[<000FC000003820000070180000E0080001C0040001C002000380020003800200078001000780010007800100078001000780 + 0100078001000780010007800100078001000780010007800100078001000780010007800100078001000780010007800100 + 07800100078001000780010007800100078003800FC007C0FFFC3FF8> 32 32 -2 1 34.090] 85 @dc +[ 32 31 -2 0 29.671] 70 @dc +[ 24 45 -3 11 22.727] 47 @dc +[ 16 30 -4 0 22.727] 49 @dc +[<000FE0000078182000E00460038002E0070001E00F0001E01E0001E01E0001E03C0001E03C0001E07C0001E0780001E0F800 + 03E0F8007FFCF8000000F8000000F8000000F8000000F8000000F8000000F8000000780000207C0000203C0000203C000060 + 1E0000601E0000600F0000E0070001E0038002E000E004E000781860000FE020> 32 33 -3 1 35.668] 71 @dc +[ 16 31 -1 0 16.414] 73 @dc +[ 32 32 -1 0 34.090] 65 @dc +[ 24 32 -1 0 23.989] 107 @dc +[ 32 31 -2 0 30.934] 80 @dc +[<70F8F8F8700000000000000000000070F8F8F870> 8 20 -4 0 12.626] 58 @dc +[ 329 ] /cmbx10.329 @newfont +cmbx10.329 @sf +[ 40 20 -3 0 43.559] 109 @dc +[<00FF8007FFE00F80701E00183E00187C00007C0000FC0000FC0000FC0000FFFFF8FFFFF8FC00F87C00F87C00F03E00F01E01 + E00F83C007FF8001FE00> 24 20 -1 0 23.958] 101 @dc +[ 32 29 -2 9 29.040] 112 @dc +[<0FE07E3FF8FE7E0DE0FC05E0F803E0F803E0F803E07C03E03C03E01F03E007FBE0007FE00003E00C03E03F03E03F03E03F07 + C03F0F801FFF0007FC00> 24 20 -1 0 25.410] 97 @dc +[ 32 20 -3 0 
29.040] 110 @dc +[<03F8FF000FFEFF001F07F8003E01F8007E00F8007C00F8007C00F800FC00F800FC00F800FC00F800FC00F800FC00F800FC00 + F8007C00F8007C00F8007E00F8003E01F8001F83F8000FFEF80001F8F8000000F8000000F8000000F8000000F8000000F800 + 0000F8000000F8000000F8000000F8000000F8000007F8000007F800> 32 32 -2 0 29.040] 100 @dc +[ 32 20 -1 0 27.588] 120 @dc +cmr10.329 @sf +[ 40 31 -2 0 41.666] 77 @dc +cmbx10.329 @sf +[ 16 33 -2 0 14.520] 105 @dc +cmr10.329 @sf +[ 32 31 -2 0 34.090] 78 @dc +cmbx10.329 @sf +[ 16 32 -2 0 14.520] 108 @dc +cmr10.329 @sf +[<0020004000800100020006000C000C00180018003000300030007000600060006000E000E000E000E000E000E000E000E000 + E000E000E000E0006000600060007000300030003000180018000C000C00060002000100008000400020> 16 46 -3 12 17.676] 40 @dc +[<800040002000100008000C00060006000300030001800180018001C000C000C000C000E000E000E000E000E000E000E000E0 + 00E000E000E000E000C000C000C001C001800180018003000300060006000C0008001000200040008000> 16 46 -3 12 17.676] 41 @dc +cmbx10.329 @sf +[ 24 20 -2 0 20.618] 115 @dc +cmr10.329 @sf +[<00200000700000700000700000E80000E80001EC0001C40001C4000382000382000382000701000701000E00800E00800E00 + 801C00C01E01E0FF83F8> 24 20 -1 0 23.989] 118 @dc +[ 16 2 -1 -9 15.151] 45 @dc +[<003FF800038000038000038000038000038000038000038000038003E3800E13801C0B80380780380380780380700380F003 + 80F00380F00380F00380F00380F003807003807803803803803C07801C058006198003E080> 24 29 -2 9 23.989] 113 @dc +[<07FFFE00001F8000000F0000000F0000000F0000000F0000000F0000000F0000000F0000000F0000000F0000000F0000000F + 0000000F0000000F0000000F0000000F0000000F0000000F0000000F0000000F0000800F0010800F0010800F0010800F0010 + C00F0030400F0020400F0020600F0060780F01E07FFFFFE0> 32 31 -2 0 32.828] 84 @dc +cmbx10.329 @sf +[<181F80001C7FE0001EC1F8001F807C001F007C001F003E001F003E001F003F001F003F001F003F001F003F001F003F001F00 + 3F001F003E001F003E001F007E001F807C001FE0F8001F7FF0001F1FC0001F0000001F0000001F0000001F0000001F000000 + 1F0000001F0000001F0000001F0000001F000000FF000000FF000000> 32 
32 -2 0 29.040] 98 @dc +[ 329 ] /cmti10.329 @newfont +cmti10.329 @sf +[<1F000031C00060E000607000E03800E03C00E01C00E01E00E01E00E01E00700F00700F00700F00700F00380F00380F003C0E + 003A0E001D0C001CF0001C00001C00000E00000E00000E00000E00000700000700000700000700003F8000078000> 24 32 -5 0 20.908] 98 @dc +[<0F0700308C80705C40703C40F01C40F01C40F00E20F00E00F00E00F00E007807007807007807003807003C03801C03800E03 + 800707800389C000F180> 24 20 -4 0 23.232] 97 @dc +[ 24 29 0 9 23.232] 112 @dc +cmr10.329 @sf +[ 24 31 -2 0 28.408] 76 @dc +cmbx10.329 @sf +[ 24 20 -2 0 21.527] 114 @dc +cmr10.329 @sf +[<000003E0FFFC0F100FC01E0807803E0407807E0407807C0407807C0007807C0007807C000780780007807800078078000780 + 70000780F0000780E0000781C00007FF80000780F0000780780007803C0007801E0007801E0007801F0007801F0007801F00 + 07801F0007801E0007801E0007803C00078078000F80F000FFFF8000> 32 32 -2 1 33.459] 82 @dc +cmbx10.329 @sf +[<01F003F807CC0F860F860F860F860F860F800F800F800F800F800F800F800F800F800F80FFFCFFFC3F800F80078003800380 + 0380018001800180> 16 29 -1 0 20.328] 116 @dc +cmr10.329 @sf +[ 32 31 -1 0 34.090] 88 @dc +[<7FE7FE0700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700 + E00700E00700E0FFFFE00700E00700E00700E00700E00700E00700E00700E00700E00381E001C1E000E0E0003FE0> 24 32 0 0 25.252] 13 @dc +1 @bop1 +cmr17.300 @sf +511 489 p (Installing) s +22 r (the) s +21 r (Staden) s +22 r 80 c +-1 r (ac) s +-2 r 107 c +-4 r (age) s +cmr12.300 @sf +810 616 p (Simon) s +17 r (Dear) s +800 718 p (21) s +16 r (Ma) s +0 r 121 c +15 r (1993) s +cmbx10.432 @sf +224 911 p 49 c +69 r (In) s +-1 r (tro) s +1 r (duction) s +cmr10.329 @sf +224 1012 p (On) s +18 r (the) s +17 r (accompan) s +0 r (ying) s +17 r (tap) s +1 r 101 c +18 r 121 c +-1 r (ou) s +17 r (will) s +18 r (\014nd) s +17 r (executables) s +18 r (for) s +17 r (one) s +18 r (of) s +18 r (SunOS) s +224 1069 p (4.x,) s +14 r (Sun) s +13 r (Solaris) s +14 r (2.x,) s +14 r (DEC) s +13 r (Ultrix,) s +14 r (DEC) s +13 r (OSF/1) s 
+14 r (and) s +13 r (Silicon) s +14 r (Graphics) s +13 r (SGI) s +224 1125 p (op) s +1 r (erating) s +20 r (systems.) s +33 r (Also) s +19 r (there) s +19 r (are) s +20 r (sources) s +19 r (for) s +20 r (all) s +19 r (the) s +19 r (programs) s +20 r (in) s +19 r (the) s +224 1181 p (Staden) s +15 r (pac) s +0 r 107 c +-2 r (age.) s +19 r (Programs) s +15 r (in) s +15 r (the) s +15 r (pac) s +0 r 107 c +-3 r (age) s +15 r (are:) s +cmbx10.329 @sf +224 1275 p (mep) s +18 r (and) s +17 r (xmep) s +cmr10.329 @sf +23 r (Motif) s +15 r (exploration) s +15 r (program.) s +cmbx10.329 @sf +224 1369 p (nip) s +18 r (and) s +17 r (xnip) s +cmr10.329 @sf +23 r (Nucleotide) s +15 r (in) s +0 r (terpretation) s +14 r (program.) s +cmbx10.329 @sf +224 1463 p (nipl) s +cmr10.329 @sf +23 r (Nucleotide) s +19 r (in) s +0 r (terpretation) s +19 r (program) s +19 r (\(library\).) s +34 r (Searc) s +0 r (hes) s +18 r 110 c +0 r (ucleotide) s +338 1519 p (libraries) s +15 r (for) s +15 r (patterns) s +15 r (of) s +15 r (motifs.) s +cmbx10.329 @sf +224 1613 p (pip) s +18 r (and) s +17 r (xpip) s +cmr10.329 @sf +23 r (Protein) s +15 r (in) s +0 r (terpretation) s +14 r (program.) s +cmbx10.329 @sf +224 1707 p (pipl) s +cmr10.329 @sf +23 r (Protein) s +11 r (in) s +0 r (terpretation) s +10 r (program) s +11 r (\(library\).) s +19 r (Searc) s +0 r (hes) s +10 r (protein) s +12 r (libraries) s +338 1763 p (for) s +15 r (patterns) s +15 r (of) s +15 r (motifs.) s +cmbx10.329 @sf +224 1857 p (sip) s +18 r (and) s +17 r (xsip) s +cmr10.329 @sf +23 r (Similarit) s +-1 r 121 c +15 r (in) s +-1 r 118 c +-1 r (estigation) s +14 r (program.) s +cmbx10.329 @sf +224 1951 p (sipl) s +cmr10.329 @sf +23 r (Similarit) s +0 r 121 c +14 r (in) s +0 r 118 c +-1 r (estigation) s +14 r (program) s +16 r (\(library\).) 
s +21 r (Compares) s +16 r 97 c +16 r (prob) s +1 r 101 c +15 r (pro-) s +338 2008 p (tein) s +15 r (or) s +15 r 110 c +0 r (ucleic) s +14 r (acid) s +15 r (sequence) s +15 r (against) s +15 r 97 c +16 r (library) s +15 r (of) s +15 r (sequences.) s +cmbx10.329 @sf +224 2101 p (sap) s +18 r (and) s +17 r (xsap) s +cmr10.329 @sf +23 r (The) s +15 r (original) s +15 r (sequence) s +15 r (assem) s +0 r (bly) s +14 r (program.) s +cmbx10.329 @sf +224 2195 p (bap) s +18 r (and) s +17 r (xbap) s +cmr10.329 @sf +23 r (Our) s +13 r (latest,) s +13 r (most) s +13 r (adv) s +-1 r (anced) s +12 r (sequence) s +13 r (assem) s +0 r (bly) s +12 r (program.) s +cmbx10.329 @sf +224 2289 p (dap) s +18 r (and) s +17 r (xdap) s +cmr10.329 @sf +23 r (An) s +15 r (obsolete) s +15 r (assem) s +0 r (bly) s +14 r (program,) s +15 r (sup) s +1 r (erceded) s +15 r 98 c +0 r 121 c +cmti10.329 @sf +14 r 98 c +-1 r (ap) s +cmr10.329 @sf +0 r 46 c +cmbx10.329 @sf +224 2383 p (lip) s +cmr10.329 @sf +23 r (Library) s +15 r (in) s +0 r (terface) s +14 r (program.) s +cmbx10.329 @sf +224 2477 p (rep) s +cmr10.329 @sf +23 r (Rep) s +1 r (eat) s +15 r (examination) s +15 r (program.) s +cmbx10.329 @sf +224 2570 p (ted) s +cmr10.329 @sf +23 r 88 c +14 r (windo) s +0 r (ws) s +14 r (utilit) s +-1 r 121 c +14 r (for) s +14 r (displa) s +0 r (ying) s +13 r (and) s +15 r (editing) s +14 r (\015uorescen) s +0 r 116 c +13 r (sequencing) s +338 2627 p (mac) s +0 r (hine) s +14 r (traces.) 
s +925 2776 p 49 c +@eop +2 @bop0 +cmbx10.329 @sf +[ 24 29 -4 0 26.136] 49 @dc +[<2000700018000C000E0006000600030003003B007F00FF00FF00FE007C003800> 16 16 -4 9 14.520] 44 @dc +[ 24 29 -3 0 26.136] 50 @dc +[<03FC001FFF803C0FC07807E0FC03F0FE03F0FE03F8FE03F87C03F83803F80003F80003F00003E00007C0000F8001FC0001FC + 00001F00000F80000FC01E0FC03F07E03F07E03F07E03F07E01E0FC00E0F8007FF0001FC00> 24 29 -2 0 26.136] 51 @dc +[ 32 32 -3 0 29.040] 104 @dc +cmr10.329 @sf +[ 32 31 -2 0 32.196] 66 @dc +cmbx10.329 @sf +[<01FF000FFFE03F01F878003C78003CF0001EF0001EF0001E70003E3C007C1FFFFC07FFF80FFFF01FFF801C00001800001800 + 0009FC000FFF000F07801E03C03E03E03E03E03E03E03E03E03E03E01E03DE0F079E07FFFE01FC3C> 24 30 -1 10 26.136] 103 @dc +[<01FC0007FF001F81C03F00C03E00607E00007C0000FC0000FC0000FC0000FC0000FC0000FC00007C03007C0FC03E0FC03E0F + C01F0FC007FF8001FE00> 24 20 -2 0 23.232] 99 @dc +[<01FF0007FFC01F83F03E00F83E00F87C007C7C007CFC007EFC007EFC007EFC007EFC007EFC007E7C007C7C007C3E00F83E00 + F81F83F007FFC001FF00> 24 20 -1 0 26.136] 111 @dc +[<001C0000001C0000003E0000003E0000007F0000007F000000FF800000F9800001F9C00001F0C00001F0C00003E0600003E0 + 600007C0300007C030000F8018000F8018001F001C00FFE07F80FFE07F80> 32 20 -1 0 27.588] 118 @dc +cmti10.329 @sf +[<78780084C600E58100F38100F3808063808001C04001C00001C00001C00000E00000E00000E00040E0002070C02071E01071 + E01068E00CC440038380> 24 20 -3 0 21.085] 120 @dc +[<0F0700308C80705C40703C40F01C40F01C40F00E20F00E00F00E00F00E007807007807007807003807003C03801C03800E03 + 800707800389C000F1C00001C00001C00000E00000E00000E00000E00000700000700000700000700003F8000078> 24 32 -4 0 23.232] 100 @dc +cmbx10.329 @sf +[ 16 4 -1 -8 17.424] 45 @dc +[<0007FF000007FF000000F8000000F8000000F8000000F8000000F8000000F8000000F80003F8F8000FFEF8001F87F8003F01 + F8007E00F8007E00F8007C00F800FC00F800FC00F800FC00F800FC00F800FC00F800FC00F8007C00F8007E00F8003E01F800 + 3F01F8001F87780007FE380001F81800> 32 29 -2 9 27.588] 113 @dc +[ 40 31 -2 0 39.519] 65 @dc +[ 32 31 -2 0 37.183] 66 @dc +[ 24 
31 -2 0 19.823] 73 @dc +[<81FF00E7FFC0FE01E0F80070E00078E00038C0003CC0003CC0003C00003C00007C0000FC0007F800FFF807FFF00FFFF01FFF + E03FFF807FFE007FC000FC0000F80000F00018F00018F000387000387000783800F81E03F80FFF3803FC08> 24 31 -3 0 29.040] 83 @dc +[ 40 31 -2 0 40.908] 78 @dc +[ 32 32 -2 0 27.588] 107 @dc +[<0007FC00003FFF8000FE01C003F0007007E000380FC000181F80000C3F00000C3F0000067F0000067E0000067E000000FE00 + 0000FE000000FE000000FE000000FE000000FE000000FE0000007E0000067E0000067F0000063F00000E3F00000E1F80001E + 0FC0001E07E0003E03F000FE00FE03DE003FFF0E0007FC02> 32 31 -3 0 37.751] 67 @dc +[ 32 31 -2 0 32.890] 70 @dc +[<3FFC003FFC0007C00007C00007C00007C00007C00007C00007C00007C00007C00007C00007C00007C00007C00007C00007C0 + 0007C000FFFC00FFFC0007C00007C00007C00007C00007C00007C3C007C7E003C7E003E7E001F3E000FFC0001F80> 24 32 -1 0 15.972] 102 @dc +[<0000E000000000E000000000E000000000E000000000E000000000E000000000E000000000E000000000E000000000E00000 + 0000E000000000E000000000E000000000E000000000E000000000E00000FFFFFFFFC0FFFFFFFFC0FFFFFFFFC00000E00000 + 0000E000000000E000000000E000000000E000000000E000000000E000000000E000000000E000000000E000000000E00000 + 0000E000000000E000000000E000000000E000000000E00000> 40 35 -3 6 40.655] 43 @dc +[<03F8FF0007FCFF000F06F8001F01F8001F01F8001F00F8001F00F8001F00F8001F00F8001F00F8001F00F8001F00F8001F00 + F8001F00F8001F00F8001F00F8001F00F8001F00F800FF07F800FF07F800> 32 20 -3 0 29.040] 117 @dc +[<0018007000E001C00380038007000E000E001E001C003C003C007800780078007800F800F000F000F000F000F000F000F000 + F000F000F80078007800780078003C003C001C001E000E000E0007000380038001C000E000700018> 16 45 -3 11 20.328] 40 @dc +cmti10.329 @sf +[<1F8000206000401000E00800F00C00F00C00700E00000E00003E0003FC0007F8000FF0000F80000C00000C06000C07000C03 + 0006010003020000FC00> 24 20 -3 0 18.585] 115 @dc +[<0FFE0000E00000E0000070000070000070000070000038000038000F380030B800705C00703C00F01C00F01C00F00E00F00E + 00F00E00F00E007807007807007807003807003C03801C03800E03800705800388C000F040> 24 
29 -4 9 20.908] 113 @dc +[<07C3800C26401C1E20180E20180E201C0E201C07101C07001C07001C07000E03800E03800E03808703804701C04301C04381 + C02301C03300E00E00C0> 24 20 -4 0 24.393] 117 @dc +[<1C003300310070803080388038401C001C001C000E000E000E008700470043004380230033000E0000000000000000000000 + 0000000001C001E001E000C0> 16 31 -4 0 13.939] 105 @dc +[<3000007000003800003800003800003800001C00001C00001C00001C00000E00000E00000E00008E00004703004707804787 + 804783802661001C1E00> 24 20 -4 0 19.166] 114 @dc +[<07C000183800380400700200700100700000F00000F00000F00000F000007C00007BF000780C003802003C01001C01000E01 + 0007010001C200007C00> 24 20 -4 0 20.908] 101 @dc +[<38006400E200E200E200E200710070007000700038003800380038001C001C001C001C000E000E000E000E00070007000700 + 070003800380038003801FC003C0> 16 32 -4 0 11.616] 108 @dc +[<080000100000100000200000600000600000400000C00000C00000C00000C00000C00000C00000C00000C00000C00000C000 + 00C00000C00000C00000E00000E000006000006000007000007000003000003800003800001800001C00000C00000E000006 + 000007000003000001800001800000C000006000002000001000000800000400000200000100> 24 46 -7 12 18.585] 40 @dc +[<03C0000E30001C08001C04001C04001C02001C02001C01001C01001C01000E00800E00800E00808700804700C04301C04383 + C02307C03307800E0380> 24 20 -4 0 20.908] 118 @dc +[ 16 30 -5 0 23.232] 49 @dc +[ 8 5 -5 0 13.939] 46 @dc +[<006000007000007000003800003800003800003800001C00001C00801FC0607C003F8E00080E00040E00060E000307000107 + 0000870000C700006300003000003000001800001800000C00000E000006000007000007000003000003800003800001C000 + 01C00001C00000E00000E00000E0000060> 24 39 -2 9 23.232] 52 @dc +[<8000006000003000001800000C000006000003000001000001800000C00000E000006000007000003000003800001800001C + 00000C00000C00000E0000060000060000070000070000030000030000030000038000038000038000018000018000018000 + 0180000180000180000180000100000300000300000300000200000600000400000800001000> 24 46 0 12 18.585] 41 @dc +cmbx10.329 @sf +[ 16 45 -3 11 20.328] 41 @dc +cmbx10.432 @sf +[ 32 39 -3 0 
34.370] 50 @dc +[ 56 41 -3 0 51.555] 82 @dc +[<001FF00000FFFE0003F81F0007E003800FC001C01F8000E03F8000E07F0000007F0000007F000000FF000000FF000000FF00 + 0000FFFFFFE0FFFFFFE0FF0007E0FF0007E07F0007E07F0007C07F000FC03F800FC01F800F800F801F8007C01F0003F07E00 + 01FFF800003FE000> 32 27 -2 0 31.506] 101 @dc +[<00003FFF8000003FFF8000003FFF80000003F800000003F800000003F800000003F800000003F800000003F800000003F800 + 000003F800000003F800003FC3F80000FFF3F80003F07BF80007E01FF8000FC007F8001F8007F8003F8003F8007F8003F800 + 7F0003F8007F0003F800FF0003F800FF0003F800FF0003F800FF0003F800FF0003F800FF0003F800FF0003F8007F0003F800 + 7F8003F8007F8003F8003F8003F8001FC007F8000FC007F80007E00DF80003F838F80000FFF07800001FC03800> 40 39 -2 12 36.280] 113 @dc +[ 56 27 -3 0 57.283] 109 @dc +[ 24 27 -2 0 27.114] 115 @dc +cmr10.329 @sf +[<007FFE00000007C000000003C000000003C000000003C000000003C000000003C000000003C000000003C000000003C00000 + 0003C000000003C000000003C000000007C000000007A00000000FB00000001F100000001E080000003E080000003C040000 + 007C04000000F802000000F003000001F001000001E000800003E000800007C000400007800040000F800060001F8000F800 + FFF003FF00> 40 31 -1 0 34.090] 89 @dc +[<7FE3FF0007007000070070000700700007007000070070000700700007007000070070000700700007007000070070000700 + 7000070070000700700007007000070070000700700007007000FFFFFF800700700007007000070070000700700007007000 + 07007000070070000300F0300380F87801C0787800F06E30001F83E0> 32 32 0 0 26.515] 11 @dc +[<0000078000000FC000001FE000001FE000003FF0000038700000383000003010001FB01000F0F01001E0380007A03E000F20 + 4F000E2047001E1087803C0F03C03C0003C07C0003E0780001E0780001E0F80001F0F80001F0F80001F0F80001F0F80001F0 + F80001F0F80001F0F80001F0F80001F0780001E07C0003E07C0003E03C0003C03C0003C01E0007800E0007000F000F000780 + 1E0001C0380000F0F000001F8000> 32 41 -3 9 35.353] 81 @dc +[<03E0000C3800100E00200600400700400380E00380F003C0F003C07003C00003C00003C00003C00003800003801007801007 + 
00180E00161C0011F0001000001000001000001000001000001000001FE0001FF8001FFC001FFE00180300> 24 31 -2 1 22.727] 53 @dc +[<03F0000E1C001C0E00180600380700780780700380700380700380F003C0F003C0F003C0F003C0F003C0F003C0F003C0F003 + C0F003C0F003C0F003C0F003C0F003C07003807003807003807003803807001806001C0E000E1C0003F000> 24 31 -2 1 22.727] 48 @dc +[ 32 31 -2 0 35.353] 75 @dc +[<0FC000107000201800700C00780E0078060030070000070000038000038000038003E3C00E13C0180BC03807C07007C07007 + C0F003C0F003C0F003C0F003C0F003C0F00380F003807003807007003807003806001C0C000E180003F000> 24 31 -2 1 22.727] 57 @dc +[<03F0001C3C00200E00400F00400780F00780F807C0F807C0F807C02007C00007C0000780000780000F00000E00003C0003F0 + 00003800001C00000E00000F00000F00000F80380F80780780780780780F80200F00100E000C1C0003F000> 24 31 -2 1 22.727] 51 @dc +[<01F000061C000C0E001807003807003803807003807003C07003C0F003C0F003C0F003C0F003C0F003C0F80380F80380F807 + 00F40600F21C00F1F0007000007000007800003800003803001C07800C07800E0380070100018200007C00> 24 31 -2 1 22.727] 54 @dc +[<03000007800007800007800007800007800007800007800003800003800003800003800001800001C00000C00000C0000040 + 000040000020000020000010000008000008008004008002008002004001007FFF807FFF807FFFC0400000> 24 31 -3 1 22.727] 55 @dc +[ 329 ] /cmsy10.329 @newfont +cmsy10.329 @sf +[<03C0000FF0001FF8003FFC007FFE007FFE00FFFF00FFFF00FFFF00FFFF00FFFF00FFFF007FFE007FFE003FFC001FF8000FF0 + 0003C000> 24 18 -3 -2 22.727] 15 @dc +cmbx10.432 @sf +[<00FF800007FFF0001FFFFC003F01FE007C007F007E007F80FF007FC0FF003FC0FF003FE0FF003FE07E003FE03C003FE00000 + 3FE000003FE000003FC000003FC000007F8000007F0000007E000001FC0000FFF00000FFC0000007F0000001F8000001FC00 + 0000FE000000FF000000FF000F007F801F807F803F807F803F807F803F807F803F80FF001F00FF000F81FE0007FFFC0003FF + F000007F8000> 32 39 -3 0 34.370] 51 @dc +[<01FC03FC0FFF0FFC3F839FFC7F00DF807E007F80FE003F80FE003F80FE003F80FE003F807F003F803F003F803F803F800FE0 + 
3F8007FC3F8000FFFF80000FFF8000003F8000003F8000003F8007003F800F803F801FC03F001FC07E001FC07E000F81F800 + 07FFF00001FF8000> 32 27 -2 0 33.415] 97 @dc +[ 16 42 -3 0 19.094] 108 @dc +2 @bop1 +cmbx10.329 @sf +224 307 p (splitp1,) s +18 r (splitp2) s +17 r (and) s +17 r (splitp3) s +cmr10.329 @sf +23 r (Refer) s +15 r (to) s +15 r (help/SPLITP) s +-2 r (.MEM.) s +cmbx10.329 @sf +224 401 p (sethelp) s +cmr10.329 @sf +23 r (Builds) s +15 r (online) s +15 r (help) s +15 r (\014les.) s +cmbx10.329 @sf +224 494 p (gip) s +cmr10.329 @sf +23 r (Gel) s +15 r (input) s +15 r (program.) s +cmbx10.329 @sf +224 588 p (con) s +0 r 118 c +-2 r (ert) s +cmr10.329 @sf +22 r (Con) s +0 r 118 c +-2 r (erts) s +15 r 98 c +1 r (et) s +0 r 119 c +-2 r (een) s +cmti10.329 @sf +14 r (xdap) s +cmr10.329 @sf +19 r (and) s +cmti10.329 @sf +15 r (xb) s +-2 r (ap) s +cmr10.329 @sf +17 r (databases.) s +cmbx10.329 @sf +224 682 p (cop) s +18 r (and) s +17 r (cop-bap) s +cmr10.329 @sf +23 r (Chec) s +-1 r (ks) s +12 r (completed) s +cmti10.329 @sf +13 r (xdap) s +cmr10.329 @sf +16 r (and) s +cmti10.329 @sf +13 r (xb) s +-1 r (ap) s +cmr10.329 @sf +14 r (databases) s +13 r (for) s +13 r (edit-) s +338 738 p (ing) s +15 r (errors.) s +cmbx10.329 @sf +224 832 p (trace2seq) s +cmr10.329 @sf +23 r (Extracts) s +15 r (sequence) s +15 r (from) s +15 r (trace) s +15 r (\014les.) s +cmbx10.329 @sf +224 925 p (getABISampleName) s +cmr10.329 @sf +23 r (Extracts) s +15 r (sample) s +15 r (names) s +15 r (from) s +15 r (ABI) s +16 r (trace) s +15 r (\014les.) s +cmbx10.329 @sf +224 1019 p (mak) s +0 r (eSCF) s +cmr10.329 @sf +21 r (Con) s +0 r 118 c +-1 r (erts) s +14 r (existing) s +15 r (trace) s +15 r (\014les) s +16 r (to) s +15 r (the) s +15 r (compact) s +15 r (SCF) s +15 r (format.) s +cmbx10.329 @sf +224 1113 p (alfsplit) s +cmr10.329 @sf +23 r (Splits) s +16 r (the) s +17 r (Pharmacia) s +16 r (A.L.F.) 
s +16 r (gel) s +17 r (\014le) s +16 r (in) s +0 r (to) s +15 r 109 c +0 r (ultiple) s +15 r (\014les,) s +17 r (one) s +16 r (for) s +338 1169 p (eac) s +0 r 104 c +14 r (sample.) s +cmbx10.329 @sf +224 1263 p (frog) s +cmr10.329 @sf +23 r (Relab) s +1 r (els) s +15 r (lanes) s +15 r (in) s +16 r (ABI) s +15 r (trace) s +15 r (\014les.) s +cmbx10.329 @sf +224 1356 p 43 c +18 r 110 c +-1 r (umerous) s +17 r (scripts) s +17 r (\(including) s +cmti10.329 @sf +17 r (squirr) s +-1 r (el) s +15 r (\(v1.4\)) s +cmbx10.329 @sf +2 r 41 c +cmbx10.432 @sf +224 1499 p 50 c +69 r (Requiremen) s +-1 r (ts) s +cmr10.329 @sf +224 1601 p 89 c +-3 r (ou) s +14 r (will) s +15 r (need) s +14 r 97 c +15 r (tap) s +1 r 101 c +15 r (driv) s +0 r 101 c +13 r (to) s +15 r (read) s +15 r (the) s +14 r (soft) s +0 r 119 c +-1 r (are) s +13 r (o\013) s +15 r (the) s +15 r (distribution) s +14 r (tap) s +2 r 101 c +224 1657 p (\(QIC-150,) s +14 r (TK50,) s +13 r (or) s +13 r (Exab) s +0 r (yte\).) s +18 r 89 c +-2 r (ou) s +12 r (will) s +13 r (also) s +13 r (need) s +13 r 97 c +13 r (large) s +13 r (amoun) s +-1 r 116 c +12 r (of) s +13 r (disk) s +224 1714 p (storage) s +16 r (to) s +16 r (accommo) s +2 r (date) s +16 r (the) s +16 r (whole) s +16 r (pac) s +-1 r 107 c +-2 r (age.) 
s +22 r 70 c +-3 r (or) s +15 r (release) s +16 r 118 c +0 r (ersion-1993.0,) s +224 1770 p (requiremen) s +0 r (ts) s +20 r 119 c +0 r (ere) s +21 r (31Mb) s +21 r (\(SunOS) s +22 r (4.x\),) s +23 r (36Mb) s +22 r (\(Sun) s +21 r (Solaris) s +22 r (2.x\)) s +21 r (30Mb) s +224 1827 p (\(DEC) s +15 r (Ultrix\)) s +15 r (37Mb) s +16 r (\(DEC) s +15 r (OSF/1\)) s +15 r (and) s +15 r (27Mb) s +15 r (\(Silicon) s +15 r (Graphics) s +16 r (SGI.\)) s +295 1883 p 84 c +-3 r 111 c +14 r (compile) s +15 r (the) s +15 r (Staden) s +16 r (pac) s +-1 r 107 c +-2 r (age) s +14 r 121 c +0 r (ou) s +14 r (will) s +15 r (require:) s +cmsy10.329 @sf +292 1976 p 15 c +cmr10.329 @sf +23 r (An) s +15 r (ANSI) s +15 r 67 c +16 r (compiler.) s +cmsy10.329 @sf +292 2070 p 15 c +cmr10.329 @sf +23 r 65 c +15 r 70 c +0 r (OR) s +-4 r (TRAN-77) s +14 r (compiler.) s +cmsy10.329 @sf +292 2164 p 15 c +cmr10.329 @sf +23 r (X11) s +15 r (\(Release) s +15 r 52 c +16 r (or) s +15 r (5\).) s +cmsy10.329 @sf +292 2257 p 15 c +cmr10.329 @sf +23 r (GNU) s +15 r (mak) s +0 r 101 c +14 r (\(except) s +15 r (with) s +15 r (SunOS) s +16 r (and) s +15 r (Solaris) s +15 r (2.x.\)) s +cmbx10.432 @sf +224 2400 p 51 c +69 r (Installation) s +cmr10.329 @sf +224 2502 p 84 c +-3 r 111 c +15 r (install) s +15 r (the) s +15 r (pac) s +0 r 107 c +-3 r (age,) s +280 2595 p (1.) s +22 r (Create) s +22 r 97 c +22 r (directory) s +21 r (for) s +22 r (where) s +21 r 121 c +0 r (ou) s +21 r 119 c +-1 r (ould) s +21 r (lik) s +0 r 101 c +20 r (the) s +22 r (soft) s +-1 r 119 c +-1 r (are) s +21 r (to) s +21 r 98 c +2 r 101 c +338 2652 p (placed.) s +20 r 89 c +-3 r (ou) s +14 r (ma) s +0 r 121 c +14 r (ha) s +0 r 118 c +-1 r 101 c +14 r (to) s +15 r 98 c +1 r 101 c +15 r (sup) s +2 r (eruser) s +15 r (to) s +15 r (do) s +15 r (this.) 
s +925 2776 p 50 c +@eop +3 @bop0 +[ 329 ] /cmtt10.329 @newfont +cmtt10.329 @sf +[<7F1F1F00FFBFBF807F1F1F001C1C1C001C1C1C001C1C1C001C1C1C001C1C1C001C1C1C001C1C1C001C1C1C001C1C1C001C1C + 1C001C1C1C001E1E1C001E1E1C001F1F1C007FFFF800FFFBF8007CE0E000> 32 20 1 0 23.863] 109 @dc +[ 24 28 -1 0 23.863] 107 @dc +[<03E3F00FFBF81FFFF03C1F80380F80700780700780E00380E00380E00380E00380E00380E00380700380700780380F803C1F + 801FFF800FFB8003E380000380000380000380000380000380001F80003F80001F80> 24 28 -2 0 23.863] 100 @dc +[ 24 29 -4 0 23.863] 105 @dc +[<7FFE00FFFF007FFE0003800003800003800003800003800003800003800003800003800003C00003C00003E00003F03003F8 + 787FBFF8FF9FF07F87E0> 24 20 -1 0 23.863] 114 @dc +[<600000F00000F00000F800007800007C00003C00003C00003E00001E00001F00000F00000F00000F800007800007C00003C0 + 0003C00003E00001E00001F00000F00000F800007800007800007C00003C00003E00001E00001E00001F00000F00000F8000 + 0780000780000300> 24 36 -3 4 23.863] 47 @dc +[<7FC3FCFFE7FE7FC3FC0E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00F00E00F80E00FC1 + C00FFFC00EFF800E3E000E00000E00000E00000E00000E00007E0000FE00007E0000> 24 28 0 0 23.863] 104 @dc +[<01F0000FFE001FFF003E0F803C07807803C07001C0F001E0E000E0E000E0E000E0E000E0E000E07001C07001C03803803E0F + 801FFF000FFE0001F000> 24 20 -2 0 23.863] 111 @dc +[<01FC0007FF001FFF803E03C03801C07001C0700000E00000FFFFC0FFFFC0FFFFC0E001C0E001C07003807003803807803E0F + 001FFE0007FC0001F000> 24 20 -3 0 23.863] 101 @dc +[ 24 28 -2 0 23.863] 83 @dc +[<003E0000FF8001FFC001C1C00380E00380E00380E00380400380000380000380000380000380000380000380000380000380 + 00FFFFC0FFFFC07FFFC0038000038000038000038000018000> 24 25 -1 0 23.863] 116 @dc +[<07E1F01FFBF03FFFF0781F00F00F00E00700E00700E007007807007F07001FFF0007FF0000FF00000700000700300E00781E + 007FFC003FF8001FE000> 24 20 -3 0 23.863] 97 @dc +[<7FC3FCFFE7FE7FC3FC0E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00F00E00F80E00FC1 + C07FFFC0FEFF807E3E00> 24 20 0 0 23.863] 110 @dc 
+[<01FC0007FF001FFF803E03C03801C07001C0700000E00000E00000E00000E00000E00000E000007000007000003803003E07 + 801FFF8007FF0001FE00> 24 20 -3 0 23.863] 99 @dc +[<7F8FF0FF8FF87F8FF00F0780070700038E00039E0001DC0000F80000F00000700000F80001F80001DC00039E00078E000707 + 007F8FF07F9FF07F8FF0> 24 20 -1 0 23.863] 120 @dc +[<00700000F80000F80001DC0001DC0001DC00038E00038E00038E00038E000707000707000707000E03800E03800E03801E03 + C07F8FF0FF8FF87F8FF0> 24 20 -1 0 23.863] 118 @dc +[<7FFF007FFF007FFF0001C00001C00001C00001C00001C00001C00001C00001C00001C00001C00001C00001C00001C00001C0 + 00FFFFC0FFFFC07FFFC001C00001C00001C00001C0C000E1E000FFE0007FC0001F80> 24 28 -1 0 23.863] 102 @dc +[ 24 20 -3 0 23.863] 115 @dc +[<01F00007FC000FFE001F1F001C07003803807803C07001C07001C0F001E0E000E0E000E0E000E0E000E0E000E0E000E0E000 + E0E000E0E000E07001C07001C07803C03803801C07001F1F000FFE0007FC0001F000> 24 28 -2 0 23.863] 48 @dc +cmbx10.329 @sf +[<03FFFFC003FFFFC00007E0000007E0000007E0000007E0000007E0000007E0000007E0000007E0000007E0000007E0000007 + E0000007E0000007E0000007E0000007E0000007E0000007E000C007E006C007E006C007E006C007E006E007E00E6007E00C + 6007E00C7007E01C7C07E07C7FFFFFFC7FFFFFFC> 32 30 -2 0 36.362] 84 @dc +[ 40 31 -2 0 40.087] 68 @dc +[ 32 31 -2 0 34.342] 69 @dc +[ 40 31 -2 0 39.203] 82 @dc +[<001FF8000000FFFF000001F81F800007E007E0000FC003F0001F8001F8003F8001FC003F0000FC007F0000FE007F0000FE00 + 7E00007E00FE00007F00FE00007F00FE00007F00FE00007F00FE00007F00FE00007F00FE00007F00FE00007F00FE00007F00 + 7E00007E007E00007E007F0000FE003F0000FC001F0000F8001F8001F8000FC003F00007E007E00001F81F800000FFFF0000 + 001FF80000> 40 31 -3 0 39.266] 79 @dc +cmti10.329 @sf +[<1E003100708070807040704038203800380038001C001C001C001C000E000E000E000E000700FFF007000700038003800380 + 038001C00180> 16 28 -4 0 15.101] 116 @dc +[<3001C07003303803103807083803083803881C03841C01C01C01C01C01C00E00E00E00E00E00E08E00E04700704700704780 + 604740602630C01C0F80> 24 20 -4 0 25.555] 110 @dc 
+[<07C000187000301800700E00700F00F00700F00780F003C0F003C0F003C07801E07801E07801E03C01E01C01E01E01C00E01 + C003018001C300007C00> 24 20 -4 0 23.232] 111 @dc +[<3F800060E000F07000783800301C00001C00001C00000E00000E0003CE000C2E001C17001C0F003C07003C07003C03803C03 + 803C03803C03801E01C01E01C01E01C00E01C00F00E00700E00380E001C1E000E270003C60> 24 29 -2 9 20.908] 103 @dc +cmtt10.329 @sf +[<07FF0007FF0007FF000070000070000070000070000070000070000070000070000070000070000070000070000070000070 + 00007000007000007000007000E07038E07038E07038E07038FFFFF8FFFFF87FFFF8> 24 28 -1 0 23.863] 84 @dc +[<7F07F0FF8FF87F07F01C01C01C01C00E03800E03800FFF800FFF800FFF80070700070700070700070700030600038E00038E + 00038E00038E00018C0001DC0001DC0001DC0000D80000D80000F80000F800007000> 24 28 -1 0 23.863] 65 @dc +[<7FF800FFFE007FFF001C0F801C03C01C01C01C01E01C00E01C00E01C00F01C00701C00701C00701C00701C00701C00701C00 + 701C00701C00F01C00E01C00E01C01E01C03C01C03C01C0F807FFF00FFFE007FF800> 24 28 -1 0 23.863] 68 @dc +[ 24 28 -1 0 23.863] 69 @dc +[<7F03C0FF87C07F07C01C0DC01C0DC01C0DC01C1DC01C19C01C19C01C39C01C39C01C39C01C31C01C71C01C71C01C61C01CE1 + C01CE1C01CE1C01CC1C01CC1C01DC1C01D81C01D81C01D81C07F07F0FF0FF87E07F0> 24 28 -1 0 23.863] 78 @dc +[<7F00F0FF81F87F01F81C039C1C039C1C039C1C03801C03801C03801C03801C03801C07001C0F001FFE001FFE001FFF001C0F + 801C03801C03C01C01C01C01C01C01C01C03C01C03801C0F807FFF00FFFE007FF800> 24 28 -1 0 23.863] 82 @dc +[<0FF8003FFE007FFF00780F00700700F00780E00380E00380E00380E00380E00380E00380E00380E00380E00380E00380E003 + 80E00380E00380E00380E00380E00380F00780700700780F007FFF003FFE000FF800> 24 28 -3 0 23.863] 79 @dc +[<01FCFC03FFFE07FFFC0F03E00E01E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00E00 + E07E07E0FE0FE07E07E0> 24 20 0 0 23.863] 117 @dc +[<00C00001C00001C00001C00007E0001FF8003FFE0079DE0071C700E1C700E1C380F1C380F1C38061C38001C70001CF0001DE + 0003FC000FF8001FE0003DC00079C000F1C000E1C780E1C780E1C780E1C38071C7007DCF003FFE000FFC0003F00001C00001 + 
C00001C00000C000> 24 36 -3 4 23.863] 36 @dc +[<3078FCFC7830> 8 6 -9 0 23.863] 46 @dc +[<7FFFC0FFFFE07FFFC000E00000E00000E00000E00000E00000E00000E00000E00000E00000E00000E00000E00000E00000E0 + 0000E00000E00000E00000E00000E00000E00000E00000E0007FE000FFE0007FE000> 24 28 -2 0 23.863] 108 @dc +[<01FC000FFF801FFFC07E03F07800F0E00038E00038E00038E000387000707801F03FFFE01FFFC01FFE001C000038000039E0 + 001FF8001FFC001E1E001C0E003807003807003807003807003807001C0E001E1E300FFFF807FFF801E1F0> 24 31 -1 11 23.863] 103 @dc +cmti10.329 @sf +[<7FE0FFE07FF0> 16 3 -3 -8 16.262] 45 @dc +[<3C00000062000000F3000000798000003180000001C0000001C0000000C0000000E0000000E0038000E0064000E00E200070 + 0E2000700E2000700E2000700710007007000038070000380700003803800038038000380380001C0380001C01C0001C01C0 + 001C01C0001C01C0000E00E000FFFFE0000E0000000E0000000E000000070000000700000007000000070030000300780003 + 8078000180380000E01000003FE0> 32 41 2 9 25.555] 12 @dc +cmtt10.329 @sf +[<7FFFC0FFFFE0FFFFE0FFFFE0000000000000000000000000FFFFE0FFFFE0FFFFE07FFFC0> 24 12 -2 -8 23.863] 61 @dc +[<7FC000FFE0007FC0000E00000E00000E00000E00000E00000E00000E00000E3E000EFF800FFFC00FC1E00F80E00F00700F00 + 700E00380E00380E00380E00380E00380E00380E00700F00700F80E00FC1E07FFFC0FEFF807E3E00> 24 30 0 10 23.863] 112 @dc +cmr10.329 @sf +[<40201010080804040474FCFCF870> 8 14 -4 -18 12.626] 39 @dc +[<000400020000000C00030000000E00070000000E00070000001E00078000001F000F8000001F000F8000001F000F8000003C + 801E4000003C801E4000003C801E40000078C03E20000078403C20000078403C200000F0403C100000F02078100000F02078 + 100001F02078080001E010F0080001E010F0080003E010F00C0003C009E0040003C009E0040003C009E00400078007C00200 + 078007C00200078007C002000F0007C001000F00078001000F00078003801F800FC007C0FFF07FF81FF0> 48 32 -1 1 46.716] 87 @dc +cmbx10.432 @sf +[<007FFFF8007FFFF8007FFFF80000FE000000FE000000FE000000FE000000FE000000FE000000FE000000FE00FFFFFFF8FFFF + FFF8FFFFFFF8E0007E0070007E0038007E001C007E000E007E000E007E0007007E0003807E0001C07E0000E07E0000E07E00 + 
00707E0000387E00001C7E00000E7E00000E7E0000077E000003FE000001FE000000FE000000FE0000007E0000003E000000 + 1E0000000E00> 32 39 -2 0 34.370] 52 @dc +[<00001FF800000001FFFF00000007FFFFC000000FF007E000003FC000F000007F00003800007E00001C0000FE00001C0001FE + 00000E0001FC00000E0003FC0000070003FC0000070003FC0000070003FC0000070003FC0000070003FC0000070003FC0000 + 070003FC0000070003FC0000070003FC0000070003FC0000070003FC0000070003FC0000070003FC0000070003FC00000700 + 03FC0000070003FC0000070003FC0000070003FC0000070003FC0000070003FC0000070003FC0000070003FC0000070003FC + 0000070003FC0000070003FC0000070003FC0000070003FC00000700FFFFF001FFFCFFFFF001FFFCFFFFF001FFFC> 48 41 -3 0 52.883] 85 @dc +[ 40 39 -2 12 38.189] 112 @dc +[ 40 41 -3 0 46.989] 80 @dc +[<7FFF80007FFF80007FFF800007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0 + 000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F00000FFFFC000 + FFFFC000FFFFC00007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F03E0007F07F0003F07F0003F8 + 7F0001F87F0000FE3E00003FFC000007F000> 32 42 -2 0 21.004] 102 @dc +cmti10.329 @sf +[<81F80000C6060000E80380007000C0006000E000600060006000700020003000200038002000380000003800000038000000 + 7800000078000001F800001FF000007FF00001FFE00001FF800003F8000003C0000003C00000038000000380010003800100 + 038001000180010001C0018000C003800060038000300580001C18C00007E040> 32 33 -3 1 25.555] 83 @dc +cmbx10.329 @sf +[ 48 31 -2 0 49.620] 77 @dc +cmti10.329 @sf +[<300300380070070066003803806200380380E100380380610038038071001C01C070801C01C038001C01C038001C01C03800 + 0E00E01C000E00E01C000E00E01C008E00E01C004700700E004700700E004780680E004740640C002630C318001C0F80F000> 40 20 -4 0 37.171] 109 @dc +3 @bop1 +cmtt10.329 @sf +475 307 p (mkdir) s +24 r (/home/Staden) s +cmr10.329 @sf +280 417 p (2.) s +22 r (Change) s +16 r (to) s +15 r (this) s +15 r (directory) s +-3 r 46 c +cmtt10.329 @sf +475 527 p (cd) s +24 r (/home/Staden) s +cmr10.329 @sf +280 638 p (3.) 
s +22 r (Place) s +16 r (the) s +15 r (tap) s +1 r 101 c +15 r (in) s +0 r (to) s +14 r (the) s +15 r (tap) s +1 r 101 c +16 r (unit.) s +280 731 p (4.) s +22 r (Extract) s +17 r (the) s +17 r (soft) s +0 r 119 c +-1 r (are) s +15 r (o\013) s +17 r (the) s +17 r (distribution) s +17 r (tap) s +1 r 101 c +17 r (\(NOTE:) s +17 r (the) s +17 r (device) s +338 787 p (name) s +15 r (ma) s +0 r 121 c +14 r 98 c +1 r 101 c +15 r (di\013eren) s +0 r 116 c +14 r (on) s +15 r 121 c +0 r (our) s +14 r (mac) s +0 r (hine\):) s +cmtt10.329 @sf +475 897 p (tar) s +24 r (xvf) s +24 r (/dev/rst0) s +cmr10.329 @sf +280 1007 p (5.) s +22 r 67 c +11 r (shell) s +10 r (users) s +10 r (should) s +10 r (set) s +10 r (the) s +10 r (en) s +0 r (vironmen) s +-1 r 116 c +9 r 118 c +-2 r (ariable) s +cmbx10.329 @sf +9 r (ST) s +-3 r (ADENR) s +-2 r (OOT) s +cmr10.329 @sf +338 1064 p (to) s +17 r 98 c +1 r 101 c +18 r (the) s +17 r (directory) s +17 r (where) s +18 r (the) s +17 r (pac) s +0 r 107 c +-3 r (age) s +16 r (is) s +18 r (installed) s +17 r (and) s +17 r (source) s +17 r (the) s +338 1120 p (\014le) s +cmti10.329 @sf +16 r (staden.lo) s +-1 r (gin) s +cmr10.329 @sf +15 r (found) s +16 r (there.) 
s +24 r (This) s +16 r (is) s +17 r 98 c +1 r (est) s +16 r (done) s +16 r 98 c +0 r 121 c +16 r (adding) s +16 r (lines) s +16 r (to) s +338 1177 p (their) s +cmti10.329 @sf +15 r (.lo) s +-1 r (gin) s +cmr10.329 @sf +14 r (\014le:) s +cmtt10.329 @sf +433 1287 p (setenv) s +24 r (STADENROOT) s +24 r (/home/Staden) s +433 1343 p (source) s +24 r ($STADENROOT/staden.login) s +cmr10.329 @sf +338 1453 p (Users) s +14 r (of) s +14 r (the) s +13 r (Bourne) s +14 r (shell,) s +14 r (sh,) s +15 r (should) s +13 r (similarly) s +14 r (add) s +14 r (lines) s +14 r (their) s +cmti10.329 @sf +14 r (.pr) s +-1 r (o-) s +338 1510 p (\014le) s +cmr10.329 @sf +15 r (\014le:) s +cmtt10.329 @sf +433 1620 p (STADENROOT=/home/Staden) s +433 1676 p (export) s +24 r (STADENROOT) s +433 1733 p 46 c +24 r ($STADENROOT/staden.profile) s +cmr10.329 @sf +338 1843 p (The) s +22 r (startup) s +22 r (routines) s +23 r (set) s +22 r (en) s +0 r (vironmen) s +-1 r 116 c +21 r 118 c +-2 r (ariables) s +22 r (and) s +22 r (mo) s +1 r (dify) s +23 r (the) s +338 1899 p (shell's) s +16 r (searc) s +0 r 104 c +15 r (path) s +16 r (so) s +16 r (that) s +16 r (it) s +16 r (can) s +16 r (\014nd) s +17 r (the) s +16 r (programs) s +16 r (in) s +16 r (the) s +16 r (Staden) s +338 1956 p 80 c +0 r (ac) s +-2 r 107 c +-2 r (age.) s +20 r (When) s +16 r (users) s +15 r (next) s +16 r (log) s +15 r (on) s +16 r (to) s +15 r (the) s +16 r (system,) s +15 r (they) s +16 r (will) s +15 r 98 c +2 r 101 c +15 r (able) s +338 2012 p (to) s +15 r (use) s +15 r (the) s +15 r (programs.) s +cmbx10.432 @sf +224 2155 p 52 c +69 r (Installation) s +23 r (on) s +23 r (Unsupp) s +2 r (orted) s +23 r (Platforms) s +cmr10.329 @sf +224 2256 p (Install) s +12 r (the) s +12 r (soft) s +0 r 119 c +-1 r (are) s +11 r (as) s +12 r 121 c +0 r (ou) s +11 r 119 c +0 r (ould) s +11 r (for) s +12 r 97 c +12 r (supp) s +1 r (orted) s +12 r (mac) s +0 r (hine.) 
s +18 r 89 c +-3 r (ou) s +11 r (will) s +13 r (need) s +224 2313 p (to) s +15 r (remak) s +-1 r 101 c +14 r (all) s +15 r (executables.) s +20 r (The) s +14 r (script) s +cmti10.329 @sf +15 r (Staden) s +3 r 14 2 ru +13 r (instal) s +3 r 108 c +cmr10.329 @sf +14 r (can) s +15 r 98 c +1 r 101 c +15 r (used) s +14 r (to) s +15 r (help) s +224 2369 p (recompile) s +14 r (the) s +14 r (pac) s +-1 r 107 c +-2 r (age.) s +19 r 65 c +13 r (large) s +14 r 110 c +0 r (um) s +-1 r 98 c +0 r (er) s +14 r (of) s +13 r (assumptions) s +14 r (ha) s +0 r 118 c +-1 r 101 c +12 r 98 c +2 r (een) s +13 r (made,) s +224 2426 p (and) s +15 r 121 c +0 r (ou) s +14 r (ma) s +0 r 121 c +14 r (need) s +15 r (to) s +15 r 99 c +0 r (hange) s +14 r (the) s +15 r (mak) s +0 r (e\014les) s +14 r (to) s +15 r (suit) s +16 r 121 c +-1 r (our) s +15 r (system.) s +295 2482 p (The) s +15 r (sources) s +15 r (ha) s +-1 r 118 c +-1 r 101 c +14 r 98 c +1 r (een) s +15 r (organised) s +15 r (in) s +0 r (to) s +14 r (sub) s +1 r (directories) s +15 r (of) s +15 r (the) s +15 r (directory) s +cmbx10.329 @sf +224 2539 p (src) s +cmr10.329 @sf +0 r 46 c +22 r (In) s +cmbx10.329 @sf +16 r (Misc) s +cmr10.329 @sf +15 r (are) s +16 r (routines) s +16 r (common) s +15 r (to) s +16 r (man) s +0 r 121 c +15 r (programs.) s +21 r (They) s +16 r (should) s +16 r 98 c +1 r 101 c +224 2595 p (made) s +19 r (\014rst.) 
s +31 r (In) s +cmbx10.329 @sf +19 r (staden) s +cmr10.329 @sf +19 r (are) s +19 r (all) s +19 r (the) s +18 r (programs) s +19 r (of) s +19 r (the) s +19 r (Staden) s +19 r (suite) s +18 r 40 c +cmti10.329 @sf +0 r (mep) s +cmr10.329 @sf +0 r 44 c +cmti10.329 @sf +224 2652 p (nip) s +cmr10.329 @sf +0 r 44 c +cmti10.329 @sf +18 r (pip) s +cmr10.329 @sf +0 r 44 c +cmti10.329 @sf +18 r (sap) s +cmr10.329 @sf +0 r 44 c +cmti10.329 @sf +17 r (sip) s +cmr10.329 @sf +0 r 44 c +cmti10.329 @sf +18 r (dap) s +cmr10.329 @sf +0 r 44 c +cmti10.329 @sf +18 r (gip) s +cmr10.329 @sf +0 r 44 c +cmti10.329 @sf +17 r (vep) s +cmr10.329 @sf +0 r 44 c +cmti10.329 @sf +18 r (lip) s +cmr10.329 @sf +17 r (and) s +cmti10.329 @sf +17 r 114 c +-1 r (ep) s +cmr10.329 @sf +0 r 41 c +16 r (with) s +17 r (the) s +18 r (exception) s +17 r (of) s +cmti10.329 @sf +17 r 98 c +-1 r (ap) s +cmr10.329 @sf +0 r 46 c +925 2776 p 51 c +@eop +4 @bop0 +cmbx10.329 @sf +[ 24 45 -3 11 26.136] 47 @dc +cmr10.329 @sf +[<3F006180F0C0F060607000700070007000700070007000700070007000700070007000700070007000700070007000700070 + 007000F007F0007000000000000000000000000000E001F001F001F000E0> 16 40 2 9 13.889] 106 @dc +cmbx10.432 @sf +[<00FF800003FFF0000FFFF8001F01FE003C007F0078003F8078003F80FC001FC0FE001FC0FE001FE0FE001FE0FE001FE07C00 + 1FE018001FE000001FE000001FE000001FC000001FC000001F800C003F000E003E000F80FC000FFFF8000E7FC0000E000000 + 0E0000000E0000000E0000000E0000000E0000000FFE00000FFFC0000FFFE0000FFFF0000FFFF8000FFFFC000FFFFE000F80 + 3F000C000300> 32 39 -3 0 34.370] 53 @dc +[<0000FFE00000000FFFFE0000003FC07F800000FF001FE00001FC0007F00003F80003F80007F00001FC000FF00001FE001FE0 + 0000FF001FE00000FF003FC000007F803FC000007F807FC000007FC07FC000007FC07F8000003FC0FF8000003FE0FF800000 + 3FE0FF8000003FE0FF8000003FE0FF8000003FE0FF8000003FE0FF8000003FE0FF8000003FE0FF8000003FE0FF8000003FE0 + 7F8000003FC07F8000003FC07F8000003FC07FC000007FC03FC000007F803FC000007F801FE00000FF001FE00000FF000FF0 + 
0001FE0007F00001FC0003F80003F80001FC0007F000007F001FC000003FC07F80000007FFFC00000000FFE00000> 48 41 -4 0 51.638] 79 @dc +[ 40 42 -3 0 38.189] 104 @dc +[ 32 41 -4 0 38.189] 83 @dc +[<00078003C00000078003C000000FC007E000000FC007E000000FC007E000001FE00FF000001FE00FF000003FF01FF800003F + F01FB800003FF01FB800007F783F3C00007F383F1C0000FF383F1E0000FE1C7E0E0000FE1C7E0E0001FE1EFC0F0001FC0EFC + 070001FC0EFC070003F807F8038003F807F8038007F807F803C007F003F001C007F003F001C00FE007E000E0FFFE7FFC0FFE + FFFE7FFC0FFEFFFE7FFC0FFE> 48 27 -1 0 49.646] 119 @dc +[<0001C000000003E000000003E000000007F000000007F00000000FF80000000FF80000000FF80000001FDC0000001FDC0000 + 003FDE0000003F8E0000007F8F0000007F070000007F07000000FE03800000FE03800001FC01C00001FC01C00003FC01E000 + 03F800E00007F800F00007F000700007F0007000FFFE03FF80FFFE03FF80FFFE03FF80> 40 27 -1 0 36.280] 118 @dc +cmbx10.329 @sf +[ 40 31 -2 0 40.908] 72 @dc +[<0007FC0600003FFF8E0000FE01FE0003F000FE0007E0007E000FC0007E001F80007E003F00007E003F00007E007F00007E00 + 7E00007E007E00007E00FE003FFFE0FE003FFFE0FE00000000FE00000000FE00000000FE00000000FE000000007E00000600 + 7E000006007F000006003F00000E003F00000E001F80001E000FC0001E0007E0003E0003F000FE0000FE03DE00003FFF0E00 + 0007FC0200> 40 31 -3 0 41.097] 71 @dc +[<0030018000007803C000007803C000007803C00000FC07E00000FC07E00001F60FB00001F60F300001F60F300003E31E1800 + 03E31E180007C1BE0C0007C1BC0C0007C1BC0C000F80F806000F80F806001F00F803001F00F00300FFE7FE1FE0FFE7FE1FE0> 40 20 -1 0 37.751] 119 @dc +[<387CFEFEFE7C38> 8 7 -4 0 14.520] 46 @dc +[<00FFFE00FFFE0007C00007C00007C00007C00007C00007C0FFFFFEFFFFFEE003C07003C03803C01803C00C03C00E03C00703 + C00383C00183C000C3C00063C00073C0003BC0001FC0000FC00007C00007C00003C00001C0> 24 29 -1 0 26.136] 52 @dc +[ 32 31 -2 0 31.438] 76 @dc +[<0003FC0000001FFF8000007E03C00000F800600001F000300001F000180003E000180007E0000C0007E0000C0007E0000C00 + 07E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C00 + 
07E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C00FFFF01FFE0 + FFFF01FFE0> 40 31 -2 0 40.213] 85 @dc +[<001FFFE000001FFFE0000000FC00000000FC00000000FC00000000FC00000000FC00000000FC00000000FC00000000FC0000 + 0000FC00000000FC00000000FC00000001FC00000001FE00000003FF00000007F30000000FE18000000FE1C000001FC0C000 + 003F806000003F807000007F00300000FE00180001FC001C0001FC000C0003F800060007F000070007F0000380FFFE003FF8 + FFFE003FF8> 40 31 -1 0 39.519] 89 @dc +[ 40 31 -2 0 39.519] 88 @dc +[ 32 31 -2 0 35.731] 80 @dc +[<00018000300000000380003800000003C0007800000007C0007C00000007C0007C00000007E000FC0000000FE000FE000000 + 0FF001FE0000000FF001FE0000001FB001FB0000001F9803F30000003F9803F38000003F1803F18000003F0C07E18000007E + 0C07E0C000007E0E0FE0C000007E060FC0C00000FC060FC0600000FC031F80600001FC031F80700001F8031F80300001F801 + BF00300003F001BF00180003F001FF00180003F000FE00180007E000FE000C0007E000FE000C000FE000FC000E000FC000FC + 000E00FFFE0FFFC0FFE0FFFE0FFFC0FFE0> 56 31 -1 0 54.039] 87 @dc +[ 40 31 -2 0 40.971] 75 @dc +cmbx10.432 @sf +[<003FC00000FFF00003FFFC0007E07E000FC03F001F803F801F801FC03F001FC03F001FE07F001FE07F001FE07F001FE07F00 + 1FE0FF001FE0FF001FE0FF001FE0FF001FC0FF801FC0FF801F80FFC03F00FFC03E00FF707C00FF3FF800FF0FC000FF000000 + 7F0000007F0000007F0000003F8000003F801F001F803F801FC03F800FC03F8007E03F8003F01F8001FC0F0000FFFE00003F + FC000007F000> 32 39 -3 0 34.370] 54 @dc +[<00000E00000700000000001F00000F80000000001F00000F80000000001F80001F80000000003F80001FC0000000003F8000 + 1FC0000000003FC0003FC0000000007FC0003FE0000000007FC0003FE000000000FFE0007FF000000000FFE0007FF0000000 + 00FFF000FFF000000001FE7000FF3800000001FE7000FF3800000001FE7801FF3800000003FC3801FE1C00000003FC3801FE + 1C00000007FC1C03FC1E00000007F81C03FC0E00000007F81E07FC0E0000000FF00E07F8070000000FF00E07F8070000001F + F00F0FF0078000001FE0070FF0038000001FE0070FF0038000003FE0039FE003C000003FC0039FE001C000003FC003FFE001 + 
C000007F8001FFC000E000007F8001FFC000E00000FF8000FF8000F00000FF0000FF8000700000FF0000FF8000700001FF00 + 00FF0000780001FE0000FF0000380001FE0001FF0000380003FC0001FE00001C0003FC0001FE00001C00FFFFE07FFFF007FF + F0FFFFE07FFFF007FFF0FFFFE07FFFF007FFF0> 72 41 -1 0 71.065] 87 @dc +[ 48 41 -2 0 51.970] 65 @dc +[ 40 41 -3 0 45.163] 69 @dc +[ 40 41 -3 0 43.253] 70 @dc +[<1C003E007F00FF80FF80FF807F003E001C00> 16 9 -5 0 19.094] 46 @dc +cmti10.329 @sf +[<00FF80000300F8000C000F00100001C01000000020000000403E0F004061988040C0F84041C0382081C03C2081C01C108180 + 1C1041C01C1041C00E0841C00E0841C00E0841C00E0820E0070820E00704107007041030070408180408040C080804061808 + 0201E008010000100080002000600020001800C0000603000001FC00> 32 32 -6 0 34.847] 64 @dc +[<07C000183800300400700200700100F00000F00000F00000F00000F000007800007800007800003C02001C07001E07800E07 + 8003008001C100007E00> 24 20 -4 0 20.908] 99 @dc +[<601E00E0310070310070708070708070708038384038700038700038E0001FC0001E00001D00001C80000E40C00E21E00E11 + E00E08E00704200703C007000007000003800003800003800003800001C00001C00001C00001C0000FE00001E000> 24 32 -3 0 20.908] 107 @dc +cmr10.329 @sf +[<0F800030E000407000407800F03800F83C00F83C00F83C00203C00003C00003C00003C00003C00003C00003C00003C00003C + 00003C00003C00003C00003C00003C00003C00003C00003C00003C00003C00003C00003C00003C00007C000FFFC0> 24 32 -2 1 23.358] 74 @dc +cmti10.329 @sf +[<3C0000630000F1800079C00030E00000E00000E000007000007000007000007000003800003800003800003800001C00001C + 00001C00001C00000E00000E00000E00020E00010700010700008700008700004600003C0000000000000000000000000000 + 00000000000000000003800003C00003C0000180> 24 40 2 9 13.939] 106 @dc +cmr10.329 @sf +[ 32 31 -2 0 34.090] 72 @dc +4 @bop1 +cmr10.329 @sf +224 307 p (Co) s +1 r (de) s +14 r (for) s +14 r (our) s +14 r (latest) s +14 r (sequence) s +14 r (assem) s +0 r (bly) s +13 r (program) s +cmti10.329 @sf +14 r 98 c +-2 r (ap) s +cmr10.329 @sf +13 r (is) s +14 r (in) s +14 r (directories) s +cmbx10.329 @sf +14 r (bap) s 
+cmr10.329 @sf +224 364 p (and) s +cmbx10.329 @sf +16 r (bap/osp-bits) s +cmr10.329 @sf +0 r 46 c +24 r (Mak) s +0 r 101 c +15 r (the) s +16 r (ob) s +3 r (jects) s +16 r (in) s +cmbx10.329 @sf +16 r (staden) s +cmr10.329 @sf +17 r (\014rst,) s +16 r (then) s +17 r (the) s +16 r (ones) s +16 r (in) s +cmbx10.329 @sf +224 420 p (bap/osp-bits) s +cmr10.329 @sf +0 r 44 c +17 r (and) s +16 r (\014nally) s +16 r (the) s +16 r (ones) s +16 r (in) s +cmbx10.329 @sf +16 r (bap) s +cmr10.329 @sf +0 r 46 c +24 r (In) s +cmbx10.329 @sf +16 r (ted) s +cmr10.329 @sf +16 r (is) s +16 r (the) s +16 r (trace) s +16 r (editing) s +224 477 p (program.) s +cmbx10.432 @sf +224 620 p 53 c +69 r (Other) s +23 r (Soft) s +-1 r 119 c +-2 r (are) s +22 r (Pro) s +-1 r (vided) s +cmr10.329 @sf +224 721 p (Other) s +13 r (soft) s +0 r 119 c +-1 r (are) s +11 r (and) s +13 r (scripts) s +13 r (can) s +13 r 98 c +1 r 101 c +13 r (found) s +13 r (in) s +13 r (the) s +cmbx10.329 @sf +13 r (alf) s +cmr10.329 @sf +5 r 44 c +cmbx10.329 @sf +13 r (abi) s +cmr10.329 @sf +0 r 44 c +cmbx10.329 @sf +13 r (cop) s +cmr10.329 @sf +0 r 44 c +cmbx10.329 @sf +14 r (getMCH) s +cmr10.329 @sf +0 r 44 c +cmbx10.329 @sf +224 778 p (scf) s +cmr10.329 @sf +5 r 44 c +cmbx10.329 @sf +17 r (frog) s +cmr10.329 @sf +18 r (and) s +cmbx10.329 @sf +17 r (scripts) s +cmr10.329 @sf +17 r (directories.) s +25 r (Eac) s +0 r 104 c +15 r (directory) s +17 r (con) s +0 r (tains) s +16 r (do) s +1 r (cumen) s +0 r (ta-) s +224 834 p (tion) s +15 r (describing) s +15 r (the) s +16 r (programs) s +15 r (con) s +-1 r (tained.) s +295 891 p (Since) s +18 r (release) s +19 r 118 c +0 r (ersion-1993.0) s +18 r 119 c +-1 r 101 c +18 r (ha) s +0 r 118 c +-2 r 101 c +18 r (distributed) s +19 r (the) s +cmti10.329 @sf +19 r (squirr) s +-2 r (el) s +19 r (\(v1.4\)) s +cmr10.329 @sf +224 947 p (pac) s +0 r 107 c +-3 r (age.) 
s +19 r (Please) s +13 r (read) s +12 r (the) s +13 r (disclaimer) s +13 r (that) s +13 r (accompanies) s +13 r (this) s +13 r (soft) s +-1 r 119 c +-1 r (are.) s +18 r (Ad-) s +224 1003 p (ditional) s +12 r (sources) s +12 r (and) s +12 r (scripts) s +12 r (can) s +13 r 98 c +1 r 101 c +12 r (found) s +12 r (in) s +cmbx10.329 @sf +12 r (expGetSeq) s +cmr10.329 @sf +0 r 44 c +cmbx10.329 @sf +13 r 118 c +-1 r (ep) s +1 r 101 c +cmr10.329 @sf +0 r 44 c +cmbx10.329 @sf +12 r (newted) s +cmr10.329 @sf +224 1060 p (and) s +cmbx10.329 @sf +15 r (squirrel-1.4) s +cmr10.329 @sf +15 r (directories.) s +295 1116 p (Man) s +-1 r 121 c +19 r (scripts) s +19 r (\(including) s +cmti10.329 @sf +20 r (squirr) s +-2 r (el) s +cmr10.329 @sf +0 r 41 c +19 r (and) s +19 r (\014lters) s +19 r 119 c +0 r (ere) s +18 r (dev) s +0 r (elop) s +0 r (ed) s +20 r (at) s +19 r (the) s +224 1173 p (MR) s +0 r (C-LMB) s +15 r (for) s +cmbx10.329 @sf +16 r (INTERNAL) s +19 r (USE) s +19 r (ONL) s +-4 r 89 c +cmr10.329 @sf +0 r 46 c +16 r 87 c +-3 r 101 c +15 r (are) s +17 r 97 c +-1 r 119 c +-1 r (are) s +15 r (that) s +17 r 112 c +1 r (eople) s +224 1229 p (elsewhere) s +21 r (will) s +21 r 119 c +0 r (an) s +-1 r 116 c +20 r (to) s +21 r (dev) s +-1 r (elop) s +20 r (similar) s +21 r (soft) s +0 r 119 c +-1 r (are.) s +37 r 87 c +-3 r 101 c +20 r (include) s +21 r (them) s +21 r (in) s +224 1286 p (the) s +14 r (Staden) s +14 r 80 c +-1 r (ac) s +-1 r 107 c +-2 r (age) s +12 r (merely) s +14 r (as) s +cmbx10.329 @sf +14 r (EXAMPLES) s +cmr10.329 @sf +14 r (of) s +14 r (what) s +13 r (has) s +14 r 98 c +1 r (een) s +14 r (ac) s +0 r (hiev) s +-1 r (ed) s +224 1342 p (elsewhere.) s +cmbx10.329 @sf +31 r (THESE) s +21 r (SCRIPTS) s +21 r (WILL) s +21 r (NOT) s +22 r 87 c +-1 r (ORK) s +21 r (ON) s +21 r (YOUR) s +224 1399 p (SYSTEM) s +18 r (WITHOUT) s +17 r (MODIFICA) s +-3 r (TION.) s +cmbx10.432 @sf +224 1542 p 54 c +69 r (When) s +23 r (All) s +23 r (Else) s +23 r 70 c +-5 r (ails...) 
s +cmr10.329 @sf +224 1643 p (If) s +22 r 121 c +0 r (ou) s +21 r (ha) s +-1 r 118 c +-1 r 101 c +21 r (an) s +0 r 121 c +21 r (problems) s +21 r (please) s +22 r (con) s +0 r (tact) s +21 r (the) s +22 r (authors,) s +23 r (Ro) s +2 r (dger) s +15 r (Staden) s +224 1700 p 40 c +cmti10.329 @sf +0 r (rs@mr) s +-1 r (c-lmb) s +-2 r (a.c) s +-3 r (am.ac.uk) s +4 r 41 c +cmr10.329 @sf +0 r 44 c +11 r (Simon) s +15 r (Dear) s +10 r 40 c +cmti10.329 @sf +0 r (sd@mr) s +-1 r (c-lmb) s +-2 r (a.c) s +-3 r (am.ac.uk) s +4 r 41 c +cmr10.329 @sf +10 r (and) s +11 r (James) s +15 r (Bon\014eld) s +224 1756 p 40 c +cmti10.329 @sf +0 r (jkb@mr) s +-1 r (c-lmb) s +-2 r (a.c) s +-3 r (am.ac.uk) s +4 r 41 c +cmr10.329 @sf +0 r 44 c +17 r 98 c +-1 r 121 c +15 r (email) s +17 r (or) s +16 r 98 c +0 r 121 c +15 r (writing) s +16 r (to) s +16 r (us) s +16 r (at:) s +23 r (MR) s +0 r 67 c +15 r (Lab-) s +224 1813 p (oratory) s +18 r (of) s +19 r (Molecular) s +18 r (Biology) s +-3 r 44 c +18 r (Hills) s +19 r (Road,) s +19 r (Cam) s +0 r (bridge,) s +18 r (CB2) s +15 r (2QH) s +0 r 44 c +19 r (U.K.) s +224 1869 p 87 c +-3 r 101 c +15 r (also) s +15 r 119 c +-1 r (elcome) s +15 r (general) s +15 r (commen) s +-1 r (ts) s +15 r (on) s +15 r (the) s +15 r (pac) s +0 r 107 c +-3 r (age.) s +925 2776 p 52 c +@eop +@end diff --git a/doc/install.tex b/doc/install.tex new file mode 100644 index 0000000..37515cc --- /dev/null +++ b/doc/install.tex @@ -0,0 +1,172 @@ +\documentstyle[a4,11pt]{article} + +\title{Installing the Staden Package} +\author{Simon Dear} +\date{21 May 1993} + + + +\begin{document} +\maketitle + + + +\section{Introduction} + +On the accompanying tape you will find executables for +one of SunOS 4.x, Sun +Solaris 2.x, DEC Ultrix, DEC OSF/1 and Silicon Graphics SGI operating systems. +Also there are sources for all the programs in the Staden package. +Programs in the package are: +\begin{description} + +\item[mep and xmep] Motif exploration program. 
+\item[nip and xnip] Nucleotide interpretation program. +\item[nipl] Nucleotide interpretation program (library). +Searches nucleotide libraries for patterns of motifs. +\item[pip and xpip] Protein interpretation program. +\item[pipl] Protein interpretation program (library). +Searches protein libraries for patterns of motifs. +\item[sip and xsip] Similarity investigation program. +\item[sipl] Similarity investigation program (library). +Compares a probe protein or nucleic acid sequence against +a library of sequences. +\item[sap and xsap] The original sequence assembly program. +\item[bap and xbap] Our latest, most advanced sequence assembly program. +\item[dap and xdap] An obsolete assembly program, superseded by {\em bap}. +\item[lip] Library interface program. +\item[rep] Repeat examination program. +\item[ted] X windows utility for displaying and editing +fluorescent sequencing machine traces. +\item[splitp1, splitp2 and splitp3] Refer to help/SPLITP.MEM. +\item[sethelp] Builds online help files. +\item[gip] Gel input program. +\item[convert] Converts between {\em xdap\/} and {\em xbap\/} databases. +\item[cop and cop-bap] Checks completed {\em xdap\/} and {\em xbap\/} +databases for editing errors. +\item[trace2seq] Extracts sequence from trace files. +\item[getABISampleName] Extracts sample names from ABI trace files. +\item[makeSCF] Converts existing trace files to the compact +SCF format. +\item[alfsplit] Splits the Pharmacia A.L.F. gel +file into multiple files, one for each sample. +\item[frog] Relabels lanes in ABI trace files. +\item[+ numerous scripts (including {\em squirrel (v1.4)\/})] + +\end{description} + + +\section{Requirements} + +You will need a tape drive to read the software off the distribution +tape (QIC-150, TK50, or Exabyte). You will also need a large amount of +disk storage to accommodate the whole package.
For release +version-1993.0, requirements were +31Mb (SunOS 4.x), +36Mb (Sun Solaris 2.x), +30Mb (DEC Ultrix), +37Mb (DEC OSF/1) +and +27Mb (Silicon Graphics SGI). + + +To compile the Staden package you will require: +\begin{itemize} +\item An ANSI C compiler. +\item A FORTRAN-77 compiler. +\item X11 (Release 4 or 5). +\item GNU make (except with SunOS and Solaris 2.x). +\end{itemize} + +\section{Installation} + +To install the package, +\begin{enumerate} +\item Create a directory where you would like the software to be +placed. You may have to be superuser to do this. + \begin{verbatim} mkdir /home/Staden\end{verbatim} +\item Change to this directory. + \begin{verbatim} cd /home/Staden\end{verbatim} +\item Place the tape into the tape unit. +\item Extract the software off the distribution tape (NOTE: the device name may be +different on your machine): + \begin{verbatim} tar xvf /dev/rst0\end{verbatim} +\item C shell users should set the environment variable {\bf STADENROOT} +to be the directory where the package is installed and source the file +{\em staden.login} found there. This is best done by adding lines to their +{\em .login} file: +\begin{verbatim} + setenv STADENROOT /home/Staden + source $STADENROOT/staden.login +\end{verbatim} +Users of the Bourne shell, sh, should similarly add lines to their {\em .profile} file: +\begin{verbatim} + STADENROOT=/home/Staden + export STADENROOT + . $STADENROOT/staden.profile +\end{verbatim} + +The startup routines set environment variables and modify the shell's +search path so that it can find the programs in the Staden Package. +When users next log on to the system, they will be able to use the +programs. + +\end{enumerate} + + +\section {Installation on Unsupported Platforms} + +Install the software as you would for a supported machine. You will +need to remake all executables. The script {\em Staden\_install} can +be used to help recompile the package.
A large number of +assumptions have been made, and you may need to change the makefiles +to suit your system. + +The sources have been organised into subdirectories of the directory +{\bf src}. In {\bf Misc} are routines common to many programs. They +should be made first. In {\bf staden} are all the programs of the +Staden suite ({\em mep}, {\em nip}, {\em pip}, {\em sap}, {\em sip}, +{\em dap}, {\em gip}, {\em vep}, {\em lip} and {\em rep}) with the +exception of {\em bap}. Code for our latest sequence assembly program +{\em bap} is in directories {\bf bap} and {\bf bap/osp-bits}. Make +the objects in {\bf staden} first, then the ones in {\bf +bap/osp-bits}, and finally the ones in {\bf bap}. In {\bf ted} is the +trace editing program. + + +\section {Other Software Provided} + +Other software and scripts can be found in the {\bf alf\/}, {\bf +abi\/}, {\bf cop\/}, {\bf getMCH\/}, {\bf scf\/}, {\bf frog\/} and {\bf +scripts} +directories. +Each directory contains documentation describing the programs +contained. + +Since release version-1993.0 we have distributed the {\em squirrel (v1.4)} +package. Please read the disclaimer that accompanies this software. +Additional sources and scripts can be found in {\bf expGetSeq}, {\bf vepe}, +{\bf newted} and {\bf squirrel-1.4} directories. + +Many scripts (including {\em squirrel}) and filters were developed at the MRC-LMB for +{\bf INTERNAL USE ONLY}. +We are aware that people elsewhere will want to develop +similar software. +We include them in the Staden Package merely as {\bf EXAMPLES} of +what has been achieved elsewhere. 
+{\bf THESE SCRIPTS WILL NOT WORK ON YOUR SYSTEM WITHOUT MODIFICATION.} + +\section {When All Else Fails...} +If you have any problems please contact the authors, +\mbox{Rodger Staden} +\mbox{(\em rs@mrc-lmba.cam.ac.uk\/)}, +\mbox{Simon Dear} +\mbox{(\em sd@mrc-lmba.cam.ac.uk\/)} +and +\mbox{James Bonfield} +\mbox{(\em jkb@mrc-lmba.cam.ac.uk\/)}, +by email or by writing to us at: +MRC Laboratory of Molecular Biology, Hills Road, Cambridge, \mbox{CB2 2QH}, U.K. +We also welcome general comments on the package. + +\end{document} diff --git a/doc/manual.rtf b/doc/manual.rtf new file mode 100644 index 0000000..cbc023c --- /dev/null +++ b/doc/manual.rtf @@ -0,0 +1,5154 @@ +{\rtf1\mac\deff2 {\fonttbl{\f0\fswiss Chicago;}{\f2\froman New York;}{\f3\fswiss Geneva;}{\f4\fmodern Monaco;}{\f5\fscript Venice;}{\f6\fdecor London;}{\f7\fdecor Athens;}{\f8\fdecor San Francisco;}{\f11\fnil Cairo;}{\f12\fnil Los Angeles;} +{\f13\fnil Zapf Dingbats;}{\f14\fnil Bookman;}{\f15\fnil N Helvetica Narrow;}{\f16\fnil Palatino;}{\f18\fnil Zapf Chancery;}{\f20\froman Times;}{\f21\fswiss Helvetica;}{\f22\fmodern Courier;}{\f23\ftech Symbol;}{\f24\fnil Mobile;}{\f33\fnil Avant Garde;} +{\f34\fnil New Century Schlbk;}}{\colortbl\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;}{\stylesheet{\s243\qc\sa60\sl280 +\f20 \sbasedon222\snext0 footer;}{\s244\sl220\tqc\tx4320\tqr\tx8640 \f4\fs16 \sbasedon0\snext0 header;}{\sl220 \f4\fs16 \sbasedon222\snext0 Normal,Screen Font;}{\s2\qc\sa200\sl480 \b\f20\fs36 \sbasedon222\snext2 Chapter Heading;}{\s3\sb200\sa120\sl360 +\b\f20\fs32 \sbasedon222\snext0 Main Subheading;}{\s4\qj\sa120\sl280 \f20 \sbasedon222\snext4 Body text;}{\s5\sb400\sa60\sl320\tx560 \b\f20\fs28 \sbasedon222\snext5 Subheading;}{\s6\sb240\sa60\sl280\tx560\tx860 \b\f20 \sbasedon5\snext6 SubSub heading;}{ +\s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \sbasedon4\snext7 
Indent Body;}{\s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 \sbasedon222\snext8 Figure legends;}{\s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 \sbasedon6\snext9 SubSubSub heading;}} +\paperw11880\paperh16820\margl1440\margr1440\widowctrl\ftnbj\ftnrestart \sectd \linemod0\linex0\cols1\endnhere \pard\plain \s2\qc\sa600\sl480 \b\f20\fs36 {\i\fs48 Contents\par +}\pard\plain \s7\qj\fi-560\li560\sa120\sl400\tx560\tqr\tldot\tx8980 \f20 1\tab Preface\tab 1\par +2\tab Introduction\tab 3\par +3\tab Sequence input, editing and sequence library use\tab 17\par +4\tab Managing sequencing projects\tab 26\par +5\tab Analysing sequences to find genes\tab 51\par +6\tab Searching for motifs in nucleic acid sequences\tab 60\par +7\tab Using patterns to analyse nucleic acid sequences\tab 69\par +8\tab Searching for restriction sites\tab 77\par +9\tab Statistical and structural analysis of nucleotide sequences\tab 83\par +10\tab Translating and listing nucleic acid sequences\tab 93\par +11\tab Statistical and structural analysis of protein sequences\tab 99\par +12\tab Searching for motifs in protein sequences\tab 104\par +13\tab Using patterns to analyse protein sequences\tab 112\par +\pard \s7\qj\fi-560\li560\sa120\sl400\tx560\tqr\tldot\tx8980 14\tab Comparing sequences\tab 123\par +\pard\plain \s2\qc\sa200\sl480\tqr\tldot\tx8980 \b\f20\fs36 \sect \sectd \pgnrestart\linemod0\linex0\cols1\endnhere {\footer \pard\plain \s243\qc\sa60\sl280 \f20 \chpgn \par +}\pard\plain \s2\qc\sa600\sl480 \b\f20\fs36 1. Preface (November, 1992)\par +\pard\plain \s4\qj\sa120\sl280 \f20 This second edition of the manual contains only minor revisions. The changes are mostly to do with managing sequencing pro +jects which is the subject on which we are currently concentrating our efforts. We have replaced our previous Developing Assembly Program DAP with another developing assembly program BAP that can assemble Bigger projects. 
Although this new program can hand +le 8000 readings as opposed to the miserly 1000 of the previous version, it actually uses its space more efficiently over the course of a project. It contains a mechanism for preventing simultaneous use (and hence corruption) of databases. In addition it is + approximately four times faster during assembly and five times faster when looking for "internal joins". It now contains a routine for selecting primers and templates during the "walking" stage of a project. The "find internal joins" function now calls u +p the contig joining editor with the two contigs aligned in the window and the editor has also been speeded up. Numerous other changes have also been made but we still regard BAP as temporary, and are actively working on its replacement which we believe wi +ll overcome the limitations that BAP's aged structure has imposed on it. We have also included routines for converting ABI 373A and Pharmacia A.L.F. data to our new trace file format, for automatically marking poor quality regions of readings from these mac +hines and for converting DAP databases to BAP databases.\par +\pard \s4\qj\sa120\sl280 Other changes include providing a PostScript option for saving graphics output, and facilities for using the author and freetext indexes of the sequence libraries. The sequence library indexes are v +ery useful and allow rapid searching. The freetext index is derived from ALL the text in the annotations - not just the keywords. We have also added a new repeat examining routine in NIP and a new repeat listing option in SIP.\par +\pard \s4\qj\sa120\sl280 \par +\pard\plain \s2\qc\sa600\sl480 \b\f20\fs36 1.1 Preface to first edition \par +\pard\plain \s4\qj\sa120\sl280 \f20 +It could be said that this manual is long overdue, for, apart from the extensive online help available from within the programs, it is the first printed guide to using a package that has been around for longer than I care to remember.
On the other hand, to + misquote a cliche much used by reviewers, it could be said that this manual fills a much needed gap, in that I believe the best way to learn about computer programs is to use them. Those who are prepared to experiment and play with programs will discover +far more than any manual of reasonable size can hope to convey. However the manual serves to give users an overview of what is available and a starting point for their exploration of the programs.\par +\pard \s4\qj\sa120\sl280 One of my objectives was to be able to distribute the manua +l on floppy disk so that each site using the programs could print as many copies as they need. We had to balance the quality of the graphics and the sophistication of the layout, against the ease of producing updates and the availability of software, and d +ecided to use the WORD4 program running on the Apple Macintosh. The graphics figures reproduced in the manual are far below the quality seen on the terminal screen, and in some cases should be viewed as merely schematic.\par +\pard \s4\qj\sa120\sl280 Most of the chapters are self-contained but users are strongly advised to read sections 3 to 7 in chapter 1, as to do so will save a lot of time.\par +\pard \s4\qj\sa120\sl280 In future editions we will add chapters on other programs in the package and expand the Notes sections to give more information about the theory and algorithms used. We welcome comments and suggestions for improvements.\par +\pard \s4\qj\sa120\sl280 I thank Brian Pashley for transforming my original documents into, what I hope will be, a useful manual.\par +\pard\plain \s3\sb200\sa120\sl360 \b\f20\fs32 Rodger Staden, March 1992.\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 2.
Introduction\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Materials\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 2.1\tab Versions\par +2.2\tab Terminals\par +2.3\tab Digitizers\par +2.4\tab Sequencing machines\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab User interfaces\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 3.1\tab The xterm and VAX interface\par +3.2 \tab The X interface\par +3.3\tab Use of the bell\par +3.4\tab Printing and saving results in files\par +3.5\tab Use of feature tables\par +3.6\tab Use of graphics\par +3.7\tab The active region\par +3.8\tab Files of file names\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Character sets\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 4.1\tab Character sets for finished sequences\par +4.2\tab Symbols used in gel readings\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Sequence formats\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 5.1\tab Personal sequence files\par +5.2\tab Sequence libraries\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Conventions used in text\par +7.\tab Notes\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 +In this chapter we give an overview of the chapters on the "Staden Package" of programs. Here we describe the equipment required and outline the scope of the package and the user interfaces. 
In the next chapter we cover character sets, sequence formats and + sequence library access.\par +\pard \s4\qj\sa120\sl280 The main programs in the package are as follows\:\par +\pard\plain \s7\qj\sa120\sl280\tx1120 \f20 GIP\tab Gel input program\par +\pard \s7\qj\sa120\sl280\tx1120\tx1580 SAP\tab Sequence assembly program\par +\pard \s7\qj\sa120\sl280\tx1120 BAP\tab Sequence assembly program\par +NIP\tab Nucleotide interpretation program\par +PIP\tab Protein interpretation program\par +SIP\tab Similarity investigation program\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx1120 MEP\tab Motif exploration program\par +NIPL\tab Nucleotide interpretation program (library)\par +PIPL\tab Protein interpretation program (library)\par +SIPL\tab Similarity investigation program (library)\par +XBAP\tab Sequence assembly program\par +XNIP\tab Nucleotide interpretation program\par +XPIP\tab Protein interpretation program\par +XSIP\tab Similarity investigation program\par +XMEP\tab Motif exploration program\par +\pard\plain \s4\qj\sa120\sl280 \f20 GIP uses a digitiser for entry of DNA sequences from autoradiographs. SAP, BAP and XBAP handle everything relating to assembling and edi +ting gel readings. NIP provides functions for analysing and interpreting individual nucleotide sequences. PIP provides functions for analysing and interpreting individual protein sequences. MEP analyses families of nucleotide sequences to help discover n +ew motifs. NIPL performs pattern searches on nucleotide sequence libraries. PIPL performs pattern searches on protein sequence libraries. SIP provides functions for comparing and aligning pairs of protein or nucleotide sequences. SIPL searches nucleotide a +nd protein sequence libraries for entries similar to probe sequences. The programs whose names begin with a letter X are X11 (see below) versions of the programs.
For example XNIP is an X11 version of NIP.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Materials\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Versions.\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The programs run on Apple Macintosh computers, on VAX computers using the VMS operating system, and on SUN workstations (which use the UNIX operating system.) The SUN version should run, with only minor changes, on other machines running UNIX and currently + we are aware of versi +ons running on DEC ULTRIX, Silicon Graphics, Alliant FX2800 and Convex machines. Currently the Macintosh version is "frozen" in its April 1990 state, the VAX version is "frozen" in its April 1991 state and all development is being done on the SUN version. +\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.1\tab VAX version.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The VAX version will run on any VAX using the VMS operating system. A FORTRAN compiler is required.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.2\tab UNIX version.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The UNIX version is being used here on SPARCstations and DECstation 5000/240s with at least 8 megabytes of memory, 20 +0 megabyte internal disk drives and 700 megabyte external disks. Colour monitors such as the GX are preferable for running the programs which display traces from fluorescent sequencing machines, but monochrome displays are adequate for all other programs. +We also use tape desktop backup packs for archiving, and a cdrom drive for handling the sequence libraries.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.3\tab Other UNIX versions.\par +\pard\plain \s4\qj\sa120\sl280 \f20 Users of UNIX machines other than SUN SPARCstations, DECstation 5000/240 and SGI Indigo R3000 will require a FORTRAN comp +iler and ANSI C. 
When operated directly on the workstation screen all UNIX versions require X11 release 4.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.4\tab The Macintosh version\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The Macintosh version of the package requires a machine with at least 1 megabyte of memory and a 20 megabyte hard disk. It only operates on monochrome screens or colour screens set to black/white mode. The package contains only programs SAP, GIP, NIP, PIP + and SIP. All further information about this version of the package is contained in the notes.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Terminals.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The program +s can also be operated via a serial port using Tektronix terminals, PC's running MS-Kermit, or Apple Macintoshes running Versaterm Pro. The UNIX versions can also be run from X terminals or microcomputers running X emulators.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Digitizers.\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The gel reading input program uses a sonic digitizer called a GRAPHBAR GP7 made by Science Accessories Corp., 200 Watson Blvd., Stratford, CT 06497, USA. When ordering specify that the device should be set to use metric units.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Sequencing machines.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The programs can handle data produced by the Applied Biosystems Inc. 373A and Pharmacia A.L.F. fluorescent sequencing machines.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab User Interfaces\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The programs have two user interfaces. The first runs under the terminal emulator xterm and the second runs directly under X. On the VAX, at present only the xterm interface is available, but on UNIX systems either interface can be used.
The xterm version +of the package will operate on the workstation screen, X terminals, Tektronix terminals, PC's or Macintoshes (see above). When run + on the workstation screen the programs have separate text and graphics windows, each of which can be moved, resized and iconized, and the text window can be scrolled in both directions. The versions that run directly under X can only be used on the works +tation screen, X terminals or using an X emulator. They produce separate text and graphics windows, an independent, constantly available help window and a separate dialogue window. All input is controlled by mouse selection and dialogue boxes.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.1\tab The xterm and VAX interface\par +\pard\plain \s4\qj\sa120\sl280 \f20 The user interface is common to all programs. It consists of a set of menus and a uniform way of presenting choices and obtaining input from the user. This section describes\: + the menu system; how options are selected and other choices made; how values are +supplied to the program; how help is obtained, and how to escape from any part of a program. In addition it gives information about saving results in files and the use of graphics for presenting results.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.1.\tab Menus and option selection\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Each program has several menus and numerous options. Each menu or option has a unique number that is used to identify it. Menu numbers are distinguished from option numbers by being preceded by the letter m (or M, all programs make no distinction between u +pper and lower case letters). With the exception of some parts of program SAP, the menus are not hierarchical, rather the options they each contain are simply lists of related functions and their identifying numbers.
Therefore options can be selected inde +pendently of the menu that is currently being shown on the screen, and the menus are simply memory aids. All options and menus are selected by typing their option number when the programs present the prompt \par +\pard \s4\qc\sb120\sa180\sl280 "? Menu or option number =" \par +\pard \s4\qj\sa120\sl280 +To select a menu type its number preceded by the letter M. To select an option type its number. If users type only "return" they will get menu m0 which is simply a list of menus. If users select an option they will return to the current menu after the func +tion is completed. Where possible, equivalent or identical options have been given the same numbers in all programs, and so users quickly learn the numbers for the functions they employ most often.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.2\tab Execution and dialogue\par +\pard\plain \s4\qj\sa120\sl280 \f20 +All inputs requested by the program (apart from file names) have default values. In addition most of the analytical functions have a default path through which they will pass, so when users select an option, in many cases the program will immediately perfo +rm the operation selected without further dialo +gue. However if users precede an option number by the letter d (e.g. D17), they will force the program to offer dialogue about the selected option before the function operates, hence allowing them to change the value of any of its parameters. In addition, +alternative suboptions will be made available.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.3\tab Help\par +\pard\plain \s4\qj\sa120\sl280 \f20 Help about each option can be obtained by preceding the option number by the symbol ? when users are presented with the prompt "? Menu or option number", (e.g. ?17 gives help on the option 17), but +there are two further ways of obtaining help. Whenever the program asks a question users can respond by typing the symbol ?
and they will receive information about the current option. In addition, option number 1 in all the programs will give help on all o +f a program's functions. \par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.4.\tab Quitting \par +\pard\plain \s4\qj\sa120\sl280 \f20 To exit from any point in a program users type ! for quit. If a menu is on the screen this will stop the program, otherwise they will be returned to the last menu. \par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.5.\tab Making selections\par +\pard\plain \s4\qj\sa120\sl280 \f20 Questions and choices are dealt with in three ways. Where there are choices that are not obvious opposites, or there are more than two choices, "radio buttons" and "check boxes" are used.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\pagebb\tx1140 \b\f20 3.1.5.1.\tab Choosing between opposites.\par +\pard\plain \s4\qj\sa120\sl280 \f20 Obvious opposites such as "clear screen" and "keep picture" are presented with only the default shown. For example in this case the default is generally "keep picture" so the program will display\: \par +\pard\plain \li1720\sa200\sl220 \f4\fs16 "Keep picture (y/n) (y) =" \par +\pard\plain \s4\qj\sa120\sl280 \f20 and the picture will be retained if the user types Y or y or only return. If the user types N or n the picture will be cleared. Anything other than these or ? or ! will cause the question to be asked again.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.5.2. \tab Choosing one from many.\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Radio buttons are used when only one of a number of choices can be made at any one time. The choices are presented arranged one above the other, each choice with a number for its selection, and the default choice marked with an X.
For example when the user + is reading a new sequence file the following choices of format are offered.\par +\pard\plain \li1720\sb300\sl220\tx2460\tx3400 \f4\fs16 Select sequence file format\par +\pard \li1720\sl220\tx2460\tx3400 \tab 1\tab Staden\par +\tab 2\tab EMBL\par +X\tab 3\tab GenBank\par +\tab 4\tab PIR\par +\tab 5\tab GCG\par + 6 FASTA\par +\pard \li1720\sa300\sl220\tx2460\tx3400 ? Selection (1-5) (3) =\par +\pard\plain \s4\qj\sb60\sa120\sl280 \f20 Any single option can be selected by typing the option number, and the default option, (here shown as 3), is also obtained by typing only "return". Again help can be obtained by typing ? and quit by typing !. +\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.5.3.\tab Choosing at least one from many.\par +\pard\plain \s4\qj\sa120\sl280 \f20 Check boxes are used when any number of a set of choices can be made (i.e. the choices are not exclusive). Choices are made by typing choice numbers. Each choice c +an be considered as a switch whose setting is reversed when it is selected. Choices that are currently switched on are marked with an X. The user quits from making selections by typing only "return". For example in the routine that plots base composition u +sers can elect to plot the frequencies of any combination of bases, e.g. only A, or A+T, or A+T+G etc. The following check box is offered to the user\: \par +\pard\plain \li1720\sb300\sl220\tx2420\tx3400 \f4\fs16 X\tab 1\tab T\par +\pard \li1720\sl220\tx2420\tx3400 \tab 2\tab C\par +X\tab 3\tab A\par +\tab 4\tab G\par +\pard \li1720\sa300\sl220\tx2420\tx3400 ? Selection (1-4) ( ) =\par +\pard\plain \s4\qj\sb60\sa120\sl280 \f20 As shown this will plot the A+T composition. To switch off T select 1, to switch on C select 2, etc, to quit, having set the bases required type only "return". 
\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 \page 3.1.6.\tab Input of numerical values \par +\pard\plain \s4\qj\sa120\sl280 \f20 All input of integer or decimal numbers is presented in a standard way with the allowed range shown in brackets and the default value also in brackets. For example\: \par +\pard\plain \li1700\sb160\sa300\sl220 \f4\fs16 ? Window (5-31) (11) = \par +\pard\plain \s4\qj\sa120\sl280 \f20 In this example users could type any number between 5 and 31, or "return" only, or ! or ? (see above). Any other input will cause the program to ask the question again. Typing only "r +eturn" gives the default value (here 11). \par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.7.\tab Input of character strings\par +\pard\plain \s4\qj\sa120\sl280 \f20 Character strings are requested using informative prompts of the form\:\par +\pard\plain \li1720\sb160\sa300\sl220 \f4\fs16 ? Search string =\par +\pard\plain \s4\qj\sa120\sl280 \f20 Or where possible the prompt will be preceded by a default value as in\:\par +\pard\plain \li1720\sb160\sl220 \f4\fs16 Default search string = atatatata\par +\pard \li1720\sa300\sl220 ? Search string =\par +\pard\plain \s4\qj\sa120\sl280 \f20 Question mark (?) or ! will get help or quit. Where appropriate, for example when a whole list of strings have been defined one after the other, typing return only will be a signal to the program that input is complete. +\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.2.\tab The X interface\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This interface deals with all the types of interactions described above but options are selected using pulldown menus and all inputs are via appropriately styled dialogue boxes and buttons. Default values are accepted by clicking on an "OK" button, or typi +ng return on the keyboard. Values are changed by overtyping the defaults. Quit is available from each dialogue via a "CANCEL" button. 
Help is constantly available via a "HELP" button in the main dialogue window. Details such as requestin
+g dialogue when an option is selected are dealt with using a button labelled "execute with dialogue" which toggles to "execute".\par 
+\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.3.\tab Use of the bell \par 
+\pard\plain \s4\qj\sa120\sl280 \f20 The programs use the bell to indicate that a task is completed. When the bell sounds, the programs will wait until return is typed. Users can quit from these points by typing ! but no help is available.\par 
+\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.4.\tab Printing and saving results in files \par 
+\pard\plain \s4\qj\sa120\sl280 \f20 A few of the functions in the programs automatically write their textual results to disk files, but for most functi
+ons users can choose whether results appear on the terminal screen or go to a file. For these functions the normal, or default, place for results to appear is on the screen, and users need to decide before the function is selected if they want to redirect 
+the results to a file. In all programs the option "Redirect output" gives control over whether results appear on the screen or go to a file. When a program is started results will be sent to the screen. If the option "Redirect output" is selected users wil
+l be given the choice of redirecting either text or graphics to a file or of creating a postscript file for the graphics. The program will then ask users to supply a file name. If users elect to redirect output, from that point on, all results will be sent
+ to the file until the option is selected again, in which case the "redirection file" will be closed, and results will again appear on the screen. If these files contain textual results they can be looked at from within the programs by using option "List 
+a text file". Once the program is left users can employ an appropriate system command to print the files. 
There is no function within the programs to direct files to a printer. If users elect to create a postscript file for the graphics the graphics will a
+lso appear on the screen. If they redirect graphics the graphics commands (in Tektronix codes) will only go to the file and will not appear on the screen.\par 
+\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 3.5.\tab Use of feature tables\par 
+\pard\plain \s4\qj\sa120\sl280 \f20 One particular use of redirection should be noted. The programs can use EMB
+L/GenBank feature tables as input for directing translation of DNA to protein, etc, but the tables must be stored in separate text files, and cannot be read directly from the sequence libraries. The only routines that can read the sequence libraries are th
+ose available under "Read a sequence". So to create a text file containing the feature table for a particular library entry users must redirect text output to disk, and then use the "Read a sequence" option to display the appropriate feature table. The feature ta
+ble will be written to the file, and then the file can be used for controlling translation etc. Note however that the redirection mechanism is a general function and it therefore does not add the required header and tail to saved files. To make the files u
+seable as feature tables they need, as a minimum, a line at the top with the word FEATURES starting in column 1, and two empty lines at the end of the file.\par 
+\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.6.\tab Use of graphics \par 
+\pard\plain \s4\qj\sa120\sl280 \f20 The analytical programs including NIP, PIP and SIP present the results of many of their analyses graphically.\par 
+\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.1.\tab The drawing board and plot positions\par 
+\pard\plain \s4\qj\sa120\sl280 \f20 
+The position at which the results for any function appear on the screen is defined relative to a notional user's "drawing board" of dimension 10,000 by 10,000. 
This drawing board fills the screen and results are drawn in windows defined using symbols x0,y0 +and xlength,ylength, where x0,y0 is the position of the bottom left hand corner of the window, and xlength is the width of the window and ylength the height of the window. The win +dow positions for each option are read from a file when a program is started. If required individual users can have their own set of plot positions, and also the positions can be redefined from within the programs using the option "Reposition plots". +\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.2.\tab The plot interval\par +\pard\plain \s4\qj\sa120\sl280 \f20 +For those analyses that draw continuous lines to represent results (for example a plot of base composition) the user is asked to supply the "Plot interval". All the analyses produce a value for every point along the sequence but often i +t is unnecessary to actually plot the values for all the points. The plot interval is simply the distance between the points shown on the screen. If the user selects a plot interval of 1, every point will be plotted; a plot interval of 3 will show every th +ird point. \par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.3.\tab The window length\par +\pard\plain \s4\qj\sa120\sl280 \f20 The word "window" is used in a further way by the programs. Most of the functions that analyse the content of a sequence (the simplest such routine plots the base composition) perform their calculations over a segment o +f the sequence of a certain length, display the result, then move on by 1 position, and recalculate. The fixed size of segment over which a calculation is performed is called a "window" and the segment size is the "window length". Many analytical functions + request "? Window length =", or more frequently "? Odd window length =". 
An odd number is used so that when a result is displayed for a particular window position it is derived from an equal number of points either side of the window's midpoint.\par 
+\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.4.\tab Use of the cross hair\par 
+\pard\plain \s4\qj\sa120\sl280 \f20 
+All programs that produce graphical output provide a function for using a cross hair to examine the plots. After the cross hair function is selected the cross will appear in the graphics window and can be steered around using the mouse or directional keys.
+ Special keyboard characters hit while the function is in operation produce the following results. For all programs the letter s (for sequence) will show the local sequence around the cross hair position. For the sequence comparison pro
+grams that show a dot matrix the two sequences will be displayed above one another. For the sequencing project management programs all the aligned sequences in the contig will be displayed. For the sequence comparison programs the letter m (for matrix) wil
+l show a matrix in which all identical characters for a window around the cross hair are marked. The punctuation symbol , will show the local position in sequence units, but leave the cross hair on the screen, whereas the space bar and any other non-specia
+l character will show the local position and exit the cross hair function. Further special characters are defined in the chapter on managing sequencing projects.\par 
+\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.5\tab Drawing scales on plots\par 
+\pard\plain \s4\qj\sa120\sl280 \f20 All the programs have a function "Draw a ruler" which will allow users to add scales to the axes of graphical plots. 
The scale can be positioned anywhere on the plot.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.6\tab Saving graphics\par +\pard\plain \s4\qj\sa120\sl280 \f20 The best way of saving the graphics is to use the "Redirect output" function to open a postscript file which will then contain a co +py of all plots that appear on the screen. This of course requires the file to be opened before the plots are drawn. Many terminals are not capable of dumping their screen contents to a file for subsequent printing. One convenient way of obtaining hard cop +y of graphical results is to use a micro computer as a terminal. On the Macintosh we use the terminal emulator versa termPro. This allows graphics to be saved as Macintosh files that can be annotated and printed using Macdraw and other painting programs. A +lternatively graphics can be redirected to a file and printed using a laser printer with tektronix capability (see "Printing and saving results in files"). \par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.7.\tab The active region\par +\pard\plain \s4\qj\sa120\sl280 \f20 +All the analytical programs use an "active region" for most of their functions. This is simply the current section of the sequence over which the analysis will be applied. When a sequence is first read in the active region will be set to its whole length, +but the user can restrict the scope of analytical functions by use of an opt +ion called "Define active region". However some functions such as "List the sequence" are always given access to the whole sequence and will allow the user to define a limited range after they have been selected.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.8.\tab Files of file names\par +\pard\plain \s4\qj\sa120\sl280 \f20 +A useful device that is employed by many of the programs is that of "files of file names". 
If a program needs to perform the same operation in turn on each of 20 files, the user should not have to type in 20 file names. Instead the user types in the name o +f a single file which contains the names of the other 20 files. This single file is a file of file names. They are used, for example, to process batches of gel readings, or to compare a sequence against a library of motifs.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab Character Sets\par +\pard\plain \s4\qj\sa120\sl280 \f20 There are two types of character sets employed by the programs\: those for finished sequences and those used during sequencing projects.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 4.1\tab Character sets for finished sequences\par +\pard\plain \s4\qj\sa120\sl280 \f20 The analytical programs will operate with uppercase or lowercase sequence characters. For nucleic acids T and +U are equivalent. For proteins the standard 1 letter codes are used. The analytical programs also use IUB symbols for redundancy in back translations and for sequence searches. 
The symbols are shown in table 2.1.\par 
+\pard \s4\qj\li2260\ri2220\sb300\sa120\sl280\box\brsp100\brdrth \tx3420\tx4800 A,C,G,T\par 
+\pard \s4\qj\li2260\ri2220\sa120\sl280\box\brsp100\brdrth \tx3420\tx4800 R\tab (A,G)\tab 'puRine'\par 
+Y\tab (T,C)\tab 'pYrimidine'\par 
+W\tab (A,T)\tab 'Weak'\par 
+S\tab (C,G)\tab 'Strong'\par 
+M\tab (A,C)\tab 'aMino'\par 
+K\tab (G,T)\tab 'Keto'\par 
+H\tab (A,T,C)\tab 'not G'\par 
+B\tab (G,C,T)\tab 'not A'\par 
+V\tab (G,A,C)\tab 'not T'\par 
+D\tab (G,A,T)\tab 'not C'\par 
+\pard \s4\qj\li2260\ri2220\sa120\sl280\keepn\box\brsp100\brdrth \tx3420\tx4800 N\tab (G,A,C,T)\tab 'aNy'\par 
+\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Table 2.1\tab The NC-IUB characters used by the analytical programs\par 
+\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 4.2\tab Symbols used in gel readings\par 
+\pard\plain \s4\qj\sa120\sl280 \f20 Th
+e information stored about a sequence reading has to show the original sequence, recording any doubts about its interpretation, and also, where possible, allow the changes made during editing to be indicated. Lowercase characters are used by the sequence p
+roject management programs for recording readings, and uppercase symbols are used when changes are made during editing. Alternatively the reverse convention can be used. Any other characters in a sequence are treated as dash (-) characters. The symbols are
+ shown in table 2.2.\par 
+\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 5.\tab Sequence Formats\par 
+\pard\plain \s4\qj\sa120\sl280 \f20 
+The data formats for the programs that deal with sequencing projects are described in the chapter on managing sequencing projects. All analytical programs can read sequences stored in several formats. 
We distinguish between two sources of input namely\: + "sequence libraries" and "personal files".\par +\pard \s4\qj\sa120\sl280 \par +\pard \s4\qj\li1120\ri1200\sa120\sl280\box\brsp100\brdrth \tqc\tx2800 {\b Symbol \tab Meaning}\par +\pard \s4\qj\li1120\ri1200\sl280\box\brsp100\brdrth \tx1400\tqc\tx2800\tqc\tx4240\tqc\tx5640\tx6820 \tab c\tab Definitely\tab c\par +\tab t\tab "\tab t\par +\tab a\tab "\tab a\par +\tab g\tab "\tab g\par +\tab 1\tab Probably\tab c\par +\tab 2\tab "\tab t\par +\tab 3\tab "\tab a\par +\tab 4\tab "\tab g\par +\tab d\tab "\tab c\tab Possibly\tab cc\par +\tab v\tab "\tab t\tab "\tab tt\par +\tab b\tab "\tab a\tab "\tab aa\par +\tab h\tab "\tab g\tab "\tab gg\par +\tab k\tab "\tab c\tab "\tab c-\par +\tab l\tab "\tab t\tab "\tab t-\par +\tab m\tab "\tab a\tab "\tab a-\par +\tab n\tab "\tab g\tab "\tab g-\par +\tab r\tab a or g\par +\tab y\tab c or t\par +\tab 5\tab a or c\par +\tab 6\tab g or t\par +\tab 7\tab a or t\par +\tab 8\tab g or c\par +\tab -\tab a or g or c or t\par +\pard \s4\qj\li1120\ri1200\sl280\box\brsp100\brdrth \tx1400\tqc\tx3780\tqc\tx4240\tqc\tx5640\tx6820 \tab A\tab a set by auto edit or corrected by user\par +\tab C\tab c set by auto edit or corrected by user\par +\tab G\tab g set by auto edit or corrected by user\par +\tab T\tab t set by auto edit or corrected by user\par +\pard \s4\qj\li1120\ri1200\sl280\box\brsp100\brdrth \tx1400\tqc\tx4020\tqc\tx5640\tx6820 \tab *\tab padding character placed by auto assembler\par +\pard \s4\qj\li1120\ri1200\sl280\keepn\box\brsp100\brdrth \tx1400\tqc\tx2800\tqc\tx4240\tqc\tx5640\tx6820 else = -\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa300\sl240\tx1140 \f21\fs20 Table 2.2\tab The symbols used to record gel readings\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 5.1\tab Personal sequence files\par +\pard\plain \s4\qj\sa120\sl280 \f20 The programs can read sequences from files in PIR, EMBL, GenBank, GCG, FASTA and Staden formats. 
Staden format + means text files with records of up to 80 characters; all spaces are removed; lines with ";" in the first position are treated as comments and will be displayed when the file is read but not included in the sequence; if the first line of data contains a 2 +0 character header of the form <---abcdefghij-----> it too will not be included in the processed sequence. This last facility allows the programs to read consensus sequences created by the sequence project management programs. Files in PIR format can conta +in any number of entries (which the user selects by entry name), but all other formats are expected to contain only one sequence. If they contain more only the first will be read.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 5.2\tab Sequence libraries\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Users may not appreciate the fact that because the sequence libraries are so large, programs need to use indexes to provide rapid retrieval of individual entries. An index is a list of entry names and pairs of offsets. For each entry name the offsets defin +e the position at which its sequence and annotation s +tart in the large file. The index, which is in any case relatively small, is arranged so that it can be searched quickly - for example the EMBL cdrom index is sorted alphabetically. When the user supplies an entry name the program rapidly finds it in the i +ndex file and then uses the associated offsets to locate the entry in the larger sequence files.\par +\pard \s4\qj\sa120\sl280 The sequence libraries are stored in different ways on the VAX and the SUN. 
On the VAX we adopted the widely used PIR format and indexing method and on the SUN we use the EMBL cdrom format and indexes.\par 
+\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 5.2.1\tab Sequence libraries on the VAX\par 
+\pard\plain \s4\qj\sa120\sl280 \f20 
+On the VAX all libraries are stored in PIR format, and except for the facility to select entries by accession number, the same functions are provided as those on the SUN. Note that this means that most libraries need reformatting after they have been read
+ from the distribution media. Because, for each entry, the sequence and its annotation are stored separately, the reformatting process consumes significant computer resources. T
+hese reformatting programs are available from PIR and we give no further information here. The programs that search whole libraries of sequences also expect the libraries to be in PIR format.\par 
+\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 5.2.2.\tab Sequence libraries for the UNIX version\par 
+\pard\plain \s4\qj\sa120\sl280 \f20 
+For the UNIX version of the programs we use the EMBL cdrom as the primary source of sequence data and have chosen their indexing method for all libraries. These indexes leave the sequence libraries in their distribution format and simply provide offsets t
+o the original fi
+les. The cdrom provides the EMBL nucleic acid sequence library and the SWISSPROT protein sequence library. Currently it also includes indexes for entry names, accession numbers, authors and freetext and has an additional "title" file which, for each entry,
+ consists of entry name, entry length and an 80 character description of the entry. These indexes allow rapid retrieval of entries by name or accession number, and the author and freetext indexes can be searched very rapidly. The files can be left on the 
+cdrom or transferred to a hard disk. 
The programs that search whole libraries of sequences expect the libraries to be in cdrom format or PIR format.\par 
+\pard \s4\qj\sa120\sl280 
+We have written our own programs for producing EMBL cdrom type indexes for other sequence libraries. These allow us to use the PIR protein libraries in CODATA format and between release updates of the EMBL nucleotide library. Others may wish to use them to
+ produce indexes for libraries such as GenBank. In addition to our own programs the scripts that produce the indexes also use the UNIX sort program. We give no further details here but the programs are described in Staden and Dear, 1992.\par 
+\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 5.2.2.1\tab Library description files.\par 
+\pard\plain \s4\qj\sa120\sl280 \f20 
+The following information is only relevant to those installing the sequence libraries on a SUN. To make the sequence library handling as flexible as possible we use several levels of files. As stated above, at present we only deal with the EMBL and SWISSPRO
+T libraries as distributed on cdrom and the PIR protein library in CODATA format. By including a "library type" flag in the library description file we also leave open the possibility of using alternative formats. \par 
+\pard \s4\qj\sa120\sl280 We describe the libraries at 3 levels\: 
+ 1) a list of libraries and their types, which points to 2) the files which name the libraries' individual files and their file types, then, finally 3) the libraries' individual files. The files used are described below.\par 
+\pard\plain \s7\qj\fi-1120\li1120\sa120\sl280\pagebb\tx1120 \f20 Level 1)\tab The top level file is a list of available libraries which contains\: the library type, the name of the file containing th
+e names of each library's individual files, and the prompt to appear on the user's screen. 
\par +\pard\plain \s4\qj\sa120\sl280 \f20 Example\: \par +\pard \s4\qj\li1100\sa120\sl280 File name\: SEQUENCELIBRARIES\par +File contents\:\par +\pard\plain \li1120\sl220 \f4\fs16 A\tab EMBLLIBDESCRP EMBL nucleotide library ! in cdrom format\par +A\tab SWISSLIBDESCRP SWISSPROT protein library! in cdrom format\par +\pard \li1120\sa300\sl220 B\tab PIRLIBDESCRP PIR protein library! in CODATA format\par +\pard\plain \s4\qj\sa180\sl280 \f20 The first two libraries are of type A. The logical names are EMBLLIBDESCRP and SWISSLIBDESCRP, and the prompts are "EMBL nucleotide library" and "SWISSPROT protein library". The third library is o +f type B with logical name PIRLIBDESCRP. Space is used as a delimiter and anything to the right of a ! is a comment.\par +\pard\plain \s7\qj\fi-1100\li1100\sa120\sl280\tx1120 \f20 Level 2)\tab The file containing the names of the libraries individual files contains flags to define the file types and the path or logical names of the files. Current file types are\: \par +\pard\plain \fi100\li980\sl220 \f4\fs16 A\tab Division_lookup\par +B\tab Entryname_index\par +C\tab Accession_target\par +D\tab Accession_hits\par +E\tab Brief_directory.\par +F\tab Freetext_target\par +G\tab Freetext_hits\par +H\tab Author_target\par +I\tab Author_hits\par +\pard\plain \s4\qj\sa120 \f20 Example\par +\pard \s4\qj\li1120\sa120 File name\: EMBLLIBDESCRP\par +File contents\:\par +\pard\plain \fi100\li980\sl220 \f4\fs16 A\tab STADTABL/EMBLdiv.lkp\par +B\tab /cdrom/indices/embl/entrynam.idx\par +C\tab /cdrom/indices/embl/acnum.trg\par +D\tab /cdrom/indices/embl/acnum.hit\par +E\tab /cdrom/indices/embl/brief.idx\par +F\tab /cdrom/indices/embl/freetext.trg\par +G\tab /cdrom/indices/embl/freetext.hit\par +H\tab /cdrom/indices/embl/author.trg\par +I\tab /cdrom/indices/embl/author.hit\par +\pard \li1120\sa300\sl220 \par +\pard\plain \s7\qj\fi-1120\li1120\sa120\sl280\tx1120 \f20 Level 3)\tab +The individual library files. 
The contents of all files below Division_lookup are exactly as they appear on the cdrom. The Division_lookup file is rewritten so the directory structure and file names can be chosen locally. Its format is I6,1x,A. \par +\pard\plain \s4\qj\sb300\sa180\sl280 \f20 The files which define all the programs and standard data files used by the package\: + staden.login and staden.profile, define the file SEQUENCELIBRARIES which contains the list of available libraries. As should be clear from the description above the three +levels need to be created (actually modified from the contents of the distribution tape) and all names can be changed locally, or set to be the same as those on the cdrom.\par +\pard\plain \s7\qj\fi-1120\li1120\sa120\sl280\tx1120 \f20 \par +\pard\plain \s4\qj\sa120\sl280 \f20 Example of Division_lookup file \par +\pard \s4\qj\li1120\sa120\sl280 File name\: STADTABL/EMBLdiv.lkp\par +Contents\:\par +\pard\plain \li1120\sl220 \f4\fs16 1\tab /cdrom/embl/fun.dat\par +2\tab /cdrom/embl/inv.dat\par +3\tab /cdrom/embl/mam.dat\par +4\tab /cdrom/embl/org.dat\par +5\tab /cdrom/embl/phg.dat\par +6\tab /cdrom/embl/pln.dat\par +7\tab /cdrom/embl/pri.dat\par +8\tab /cdrom/embl/pro.dat\par +9\tab /cdrom/embl/rod.dat\par +10\tab /cdrom/embl/syn.dat\par +11\tab /cdrom/embl/una.dat\par +12\tab /cdrom/embl/vrl.dat\par +13\tab /cdrom/embl/vrt.dat\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 6.\tab Conventions Used In Text\par +\pard\plain \s4\qj\sa120\sl280 \f20 Obviously the programs can perform many more operations than there is space to describe but, in the selection of uses shown, we have tried to give some feel for the programs' sco +pe. For this reason, and the need to conform as closely as possible to the format of the book, we have chosen specific paths through the programs, rather than attempt to describe all routes. 
For some sections, such as that on the facilities available for e +diting contigs, this has not been possible and we have instead described how the major commands are used. It should also be noted that the user interactions described in the methods sections are those that would be required if the options were selected in +the "Execute with dialogue" mode. In practice many of the options would normally be used without any dialogue being required.\par +\pard \s4\qj\sa120\sl280 +In the section on the user interface we outlined the different modes of obtaining input from users. Throughout the specific chapters we have adopted the following conventions to indicate which mode of input is being employed. When a program requests numeri +cal or string input we have used the term "Define", as in Define "Minimum search score". When a program requests that a choice is +made between several options, as in the case of radio buttons or check boxes, we have used the term "Select". When a program offers a choice between two options in the form of a yes or no answer, as in "Hide translation", we use the terms "Accept" or "Reje +ct". When the digitizer program uses the stylus for input we have used the term "Hit".\par +\pard \s4\qj\sa120\sl280 Because it is difficult to produce figures including pull down menus and dialogue boxes, almost all examples containing user input are taken from the xterm interface. Ho +wever the actual wording of the prompts is the same for both interfaces.\par +\pard \s4\qj\sa120\sl280 +The programs contain routines for drawing scales on plots and for simple annotation, but in general such embellishment is not done automatically by the programs. This is because the programs are designed so that many plots can be superimposed, and it is be +tter for the user to explicitly decide to add scales and annotation. More elaborate annotation can be added by saving the graphics output to files which can be handled by, say Macinto +sh, painting and drawing programs. 
None of the examples of graphical results shown in the following chapters have added scales\: all are exactly as drawn by the programs.\par +\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \par +\par +\par +\par +7.\tab NOTES\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 7.1\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Although all the programs in the Macintosh version of the package work, the conversion to this machine was never finished. The package does not provide access to the sequence libraries, handling only simple text files containing sequences, or those generat +ed by the assembly program SAP. The user interface, although using pu +ll down menus and dialogue boxes for all interactions, is not as "Mac like" as many would expect. However many people find this version very useful, and for others, the digitizer program alone makes the package worth having. Data input from a digitizer is +a task suited to a machine like the Macintosh, and the data files can be transferred to a larger machine for assembly and other analysis. With the exception of sequence library access, all the options available in the 1990 VAX version are contained in the +package (See Staden, 1990). We give no further details specific to the Macintosh version.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 8.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1990. An improved sequence handling package that runs on the Apple Macintosh. Comput. {\i Applic. Biosc}. {\b 4}, 387-393.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Staden, R. and Dear, S. 1992. Indexing the sequence libraries\: Software providing a common indexing system for all the standard sequence libraries. {\i DNA Sequence} {\b 3}, 99-105.\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 3. 
Sequence Input, Editing and Sequence Library Use\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 1.1\tab Introduction to sequence input\par +1.2 \tab Introduction to keyboard input\par +1.3\tab Introduction to input from digitizer\par +1.4\tab Introduction to editing single sequences\par +1.5\tab Introduction to using the sequence libraries\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Sequence input from keyboard\par +2.2\tab Sequence input from digitizer\par +2.3\tab Sequence input from the Pharmacia A.L.F.\par +2.4\tab Sequence input from the ABI 373A.\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.5\tab Editing a nucleic acid sequence using restriction sites and a translation and base numbering as landmarks.\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.6\tab Searching the freetext and author indexes of a sequence library\par +2.7\tab Using accession numbers to retrieve data from a sequence library\par +2.8\tab Displaying the annotations for an entry in a sequence library\par +2.9\tab Reading a sequence from sequence library\par +2.10\tab Worked example of sequence library access\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +4.\tab References\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we describe sequence input and editing and the use of sequence libraries.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.1\tab Introduction to sequence input and editing\par +\pard\plain \s4\qj\sa120\sl280 \f20 The package contains facilities for input of sequence data from the keyboard, sonic digitizer +s, and ABI 373A and Pharmacia A.L.F fluorescent sequencing machines. 
Editing of single sequences can be performed using system editors such as EDT on the VAX and EMACS on the SUN. Editing of sequence alignments is discussed in the chapter on managing sequ +encing projects.\par +\pard\plain \s6\sa60\sl280\pagebb\tx560\tx860 \b\f20 1.2\tab Introduction to keyboard input\par +\pard\plain \s4\qj\sa120\sl280 \f20 The program SAP contains an option to enter sequence at the keyboard. It also creates a file of file names and will list the sequences. Users may choose any 4 keys to represent the characters A, C, G and +T. For example 4 adjacent keys in the same order as the lanes on a gel could be used. The program translates these symbols to A, C, G and T, and any other characters are left unchanged. No line of input should be longer than 80 characters. Terminate input +with the symbol @.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.3\tab Introduction to input from digitizer\par +\pard\plain \s4\qj\sa120\sl280 \f20 Digitisers provide a convenient way of entering sequences from films into a computer. The digitiser, which is connected directly to the computer, operates on a light box, and is controlled by a pr +ogram named GIP (1). The film to be read is taped firmly to the surface of the light box, and the user defines the lane order and the centres of the four lanes to be read. These positions are defined at the point where reading will commence and the program + adjusts their values as the film is read. The user reads the sequence and transfers it to the computer by hitting the centres of the bands progressing up the film. Any number of sets of lanes and films can be read in a single run of the program. Each sequ +ence is stored in a separate file and a file of file names is also written. The program also uses a menu, which is a series of reserved areas of the light box surface, for entering commands and uncertainty codes. When the pen is pressed in these areas the +program responds accordingly. 
Each time the pen tip is depressed in the digitizing area the program sounds the bell on the terminal to indicate to the user that a point has been recorded. As the sequence is read the program displays it on the screen. +\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.4\tab Introduction to editing single sequences\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The editing method used by the programs is designed to give users access to an editor with which they are familiar - i.e. the one on their machine, say EDT on a VAX or EMACS on a UNIX system, and yet to allow them to edit a sequence which contains all the +landmarks they need in order to know where they are. Users can create a file containing a simple listing of the sequence (single stranded) with numbering, using "list the sequence", and then edit it with their syste +m editor, using the numbering to know where they are within the sequence. When the edits are complete they exit from the editor and the program "analyses" the edited file to extract only the sequence characters. Similarly a file containing a three phase tr +anslation, or a file containing a sequence plus its three phase translation, plus its restriction sites marked above the sequence (see figure 3.1), can be edited. In order to be able to "analyse" such complicated listings and correctly extract the sequenc +e the following simple rule is used\: + all lines in the file that contain a character that is not A,C,T,G or U are deleted. It is obviously important to be aware of this rule and its implications. For protein sequences only a simple listing, i.e. the sequence plus numbering, can be used.\par +\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 1.5\tab Introduction to using the sequence libraries\par +\pard\plain \s4\qj\sa120\sl280 \f20 The installation of the sequence libraries is described in the introductory chapter. 
Direct access to the libraries is provided by all programs that need such a facility\: it is + not performed by separate programs. The facilities currently offered in NIP, PIP, SIP, NIPL, PIPL, and SIPL include the following\:\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Get a sequence by knowing its entry name\par +\tab Get a sequence's annotation by knowing its entry name\par +\tab Get an entry name by knowing its accession number\par +\pard\plain \li1120\ri1240\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 HapII\par +\pard \li1120\ri1240\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth HpaII\par + MspI MseI\par +. .HincII\par +. .HindII\par +. .HpaI DsaV\par +. .. EcoRII\par +. .. TspAI\par +. .. . ApyI\par +. .. . BstNI\par +. .. . MvaI\par +. .. . ScrFI MaeIII\par +. .. . . . BsrI MseI\par +ccggttagactgttaacaacaaccaggttttctactgatataactggttacatttaacgc\par + 10 20 30 40 50 60\par + P V R L L T T T R F S T D I T G Y I * R\par + R L D C * Q Q P G F L L I * L V T F N A\par +\pard \li1120\ri1240\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth G * T V N N N Q V F Y * Y N W L H L T P\par +\pard\plain \s8\qj\fi-1140\li1140\sb80\sa120\sl240\tx1140 \f21\fs20 Figure 3.1\tab The first page width of a sequence display that can be edited by the program.\par +\pard\plain \s7\qj\fi-560\li560\sb360\sa120\sl280\tx560 \f20 \tab Search the author index for author names\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab Search the freetext index for keywords\par +\pard\plain \s4\qj\sa120\sl280 \f20 The facilities currently offered in NIPL, PIPL and SIPL include\:\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Search whole library\par +\tab Search only a list of entry names\par +\tab Search all but a list of entry names\par +\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Sequence input from keyboard\par 
+\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Type in gel readings".\par +2.\tab Accept "Use special keys for A,C,T,G".\par +3.\tab Define the keys in turn.\par +4.\tab Define "File of file names". This is a file of file names, so that the readings can be processed as a batch.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Enter the sequence by typing it in using the selected keys. Finish by typing an @ symbol.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "File name for this gel reading". This is the name for the sequence just entered.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Accept "Type in another reading". This cycles round to step 5. If rejected the next step follows.\par +8.\tab Accept "List gel readings". The batch of readings entered will each be listed, one after the other, headed by their file names, on the screen.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Sequence input from digitizer\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Tape the autoradiograph down securely on the light box.\par +2.\tab Start the program (GIP).\par +3.\tab Define "File of file names".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Using the digitizer pen hit the digitizer menu ORIGIN, program menu ORIGIN, program menu START.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab After the bell has sounded the program will give the default lane order. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab If correct hit CONFIRM, otherwise hit RESET. To reset the lane order hit the A,C,G,T boxes in the menu in left to right order.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Hit START, then hit in left to right order, at a height level with the first band to be read, the start positions for the next four lanes. 
The progr +am will report the mean lane separations and ask for confirmation that they are correct.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Hit START.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Hit the bands on the film in sequence order. If necessary use the uncertainty codes in the program menu. Continue until the sequence is finished.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Hit STOP.\par +10.\tab Define "Name for this reading".\par +11.\tab Accept "Read another sequence". Otherwise the program will stop.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Sequence input from the Pharmacia A.L.F.\par +\pard\plain \s4\qj\sa120\sl280 \f20 After processing and base calling on the PC the data for all 10 clones is contained in a single f +ile, and the user names each using local conventions. Then this single file is transferred to the SUN using PC-NFS. This program allows SUN directories to be mounted as if they were DOS disks and data can be transferred by use of the DOS copy command. On th +e SUN, to prepare for processing by program XBAP the 10 clones are split into 10 separate files each with the names given on the PC. In addition a file of file names is written. Then the reads for the individual clones need to be examined to clip off the v +ector sequence and the poor data at the 5' end. See note 2.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Sequence input from the ABI 373A.\par +\pard\plain \s4\qj\sa120\sl280 \f20 After processing and base calling on the Macintosh the data for each clone is contained in 2 files\: + one is simply the sequence but the main file contains the raw data, trace data and sequence. For our processing we do not use the sequence file as we can ex +tract all we need from the main file. The user names each file using local conventions and then the folder is transferred to the SUN using TOPS. 
This program +allows SUN directories to be mounted as if they were on the Macintosh and data can be transferred by simply dragging folders on the Macintosh screen. On the SUN, to prepare for processing by program XBAP, a file of file names is written and the reads for t +he individual clones are examined to clip off the vector sequence and the poor data at the 5' end. See note 2.\par +\pard\plain \s6\fi-560\li560\sb240\sa120\sl280\tx560\tx980 \b\f20 2.5\tab Editing a nucleic acid sequence using restriction sites and a translation and base numbering as landmarks.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select NIP.\par +2.\tab Read in the sequence to be edited.\par +3.\tab Direct output to disk, say creating file edit.seq.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Use the restriction enzyme site search routine (see the relevant chapter) to create a file showing "Names above the sequence", as in figure 3.1.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Close the redirection file.\par +6.\tab Select "Edit the sequence". \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Name of file to edit". This is the file containing the sequence listing, say edit.seq. The system editor will start up.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Edit the sequence.\par +9.\tab Exit from the editor.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Accept "Make edited sequence active". The edited sequence will replace the original sequence. \par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Searching the freetext (or author) index of a sequence library\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Sequence library". 
The alternative is "Personal file", and if taken would be followed by questions about which of the formats "Staden, EMBL, GenBank, PIR, GCG or FASTA" it was stored in.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select, say, "EMBL nucleotide library".\par +4.\tab Select "Search text index for keywords".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Keywords". Type up to 5 keywords separated by spaces - i.e. a space is the delimiting character (see note below about author searches).\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab +The search will start and for each match the program will display the contents of the matching line, which includes the entry name, primary accession number, length and an 80 character description. After every 20 matches the program will ring the bel +l and the user can escape by typing "!".\par +\tab The commands for searching the author index are effectively the same. Note that for authors it is useful to be able to link words together for names s +uch as De Gaule or von Meyenberg. The symbol underscore (_) can be used for this purpose - e.g. De_Gaule or von_meyenberg. The same facility is available for the keyword searches.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.7\tab Using accession numbers to retrieve data from a sequence library\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par +2.\tab Select "Sequence library".\par +3.\tab Select, say, "EMBL nucleotide library".\par +4.\tab Select "Get entry names from accession numbers".\par +5.\tab Define "Accession number". \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab The program will display the entry names corresponding to the accession number. 
The last entry name found will become the default entry name.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.8\tab Displaying the annotations for an entry in a sequence library\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par +2.\tab Select "Sequence library".\par +3.\tab Select, say, "EMBL nucleotide library".\par +4.\tab Select "Get annotations".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Entry name". The program will display the annotation for the entry. After every 20 lines the program will ring the bell and the user can escape by typing "!".\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.9\tab Reading a sequence from a sequence library\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par +2.\tab Select "Sequence library".\par +3.\tab Select, say, "EMBL nucleotide library".\par +4.\tab Select "Get a sequence".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Entry name". The program will make the sequence the active sequence and display its base composition.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.10\tab Worked example of sequence library access\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The worked example in figure 3.2 shows a search of the text index for the keywords p53 and mouse, followed by a search of the author index for the names sanger and coulson, followed by a search on accession number v00636, followed by "Get annotatio +ns" for entry lambda, and finally "Get a sequence" for entry lambda. \par +\pard\plain \sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 {\f22\fs18 Select sequence source\par +}\pard \sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 X 1 Personal file\par + 2 Sequence library\par + ? 
Selection (1-2) (1) =2\par + Select a library\par + X 1 EMBL 29 nucleotide library Dec 91\par + 2 SWISSPROT 20 protein library Nov 91\par + 3 PIR 31 protein library Dec 91\par + 4 NRL3D 58 From Brookhaven protein library Dec 91\par + 5 GenBank example\par + ? Selection (1-5) (1) =\par +Library is in EMBL format with indexes\par + Select a task\par + X 1 Get a sequence\par + 2 Get annotations\par + 3 Get entry names from accession numbers\par + 4 Search author index\par + 5 Search text index for keywords\par + ? Selection (1-5) (1) =5\par + Search for keywords\par + ? Keywords=p53 mouse\par +P53 hits 73\par +MOUSE hits 10140\par +\'00\par + MMANT01 X00875 536 Murine gene fragment for cellular tumour antigen\par + MMANT02 X00876 83 Murine gene fragment for cellular tumour antigen\par + MMANT03 X00877 21 Murine gene fragment for cellular tumour antigen\par + MMANT04 X00878 261 Murine gene fragment for cellular tumour antigen\par + MMANT05 X00879 184 Murine gene fragment for cellular tumour antigen\par + MMANT06 X00880 113 Murine gene fragment for cellular tumour antigen\par + MMANT07 X00881 110 Murine gene fragment for cellular tumour antigen\par + MMANT08 X00882 137 Murine gene fragment for cellular tumour antigen\par +}\pard \sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 MMANT09 X00883 74 Murine gene fragment for cellular tumour antigen\par + MMANT10 X00884 107 Murine gene for cellular tumour antigen p53 (exon\par + MMANT11 X00885 562 Murine p53 gene 3' region with exon 11\par + MMANTP53 M26862 536 Mouse tumor antigen p53 gene, 5' end.\par + MMLYN M64608 2044 Mouse lyn protein mRNA, complete cds.\par + MMP53 X00741 1377 Mouse mRNA for transformation associated protein\par + MMP53A M13872 1285 Mouse p53 mRNA, complete cds, clone pcD53.\par + MMP53B M13873 1241 Mouse p53 mRNA, complete cds, clone p53-m11.\par + MMP53C M13874 1322 Mouse p53 mRNA, complete cds, clone p53-m8.\par + MMP53G1 X01235 554 Mouse genomic DNA for 5' region of cellular tumou\par + 
MMP53IN4 X60470 729 M.musculus p53 gene for p53 protein, intron 4\par +\'00\par + MMP53P X01236 2132 Mouse pseudogene for cellular tumour antigen p53\par + MMP53R X01237 1773 Mouse mRNA for cellular tumour antigen p53\par + MMRSB2P5 M64597 196 Mouse B2 repeat in the 3' flank of protein 53 (p5\par + MMSFFV1 X64656 165 M.musculus Friend spleen focus forming virus (SFF\par + MMSFFV2 X64657 142 M.musculus Friend spleen focus forming virus (SFF\par + 24 different entries found\par +\'00\par + Select a task\par + X 1 Get a sequence\par + 2 Get annotations\par + 3 Get entry names from accession numbers\par + 4 Search author index\par + 5 Search text index for keywords\par + ? Selection (1-5) (1) =4\par + Search for keywords\par + ? Keywords=coulson sanger\par +COULSON hits 935\par +SANGER hits 15\par +\'00\par + LAMBDA V00636 48502 Genome of the bacteriophage lambda (Styloviridae)\par + MIBTXX V00654 16338 Complete bovine mitochondrial genome.\par + MIHSCG J01415 16569 Human mitochondrion, complete genome.\par + MIHSM1 M10546 2771 Human mitochondrial DNA, fragment M1, encoding tr\par + MIHSXX V00662 16569 H.sapiens mitochondrial genome\par + MIPX1C01 M10860 130 Bacteriophage phi-X174, nucleotides 3920-4049.\par + MIPX1C02 M10861 115 Bacteriophage phi-X174, nucleotides 3480-3595.\par + MIPX1C03 M10862 121 Bacteriophage phi-X174, nucleotides 4260-4380.\par + MIPX1CTI M10849 130 Bacteriophage phi-X174, nucleotides 3389-3520.\par + PHIX174 V01128 5386 Bacteriophage phi-X174 (cs70 mutation) complete g\par + R17CPRAA M24826 61 Bacteriophage R17 coat protein RNA fragment.\par + 11 different entries found\par +\'00\par + Select a task\par + X 1 Get a sequence\par + 2 Get annotations\par + 3 Get entry names from accession numbers\par + 4 Search author index\par + 5 Search text index for keywords\par + ? Selection (1-5) (1) =3\par + ? 
Accession number=v00636\par +Entry name LAMBDA\par + Select a task\par + X 1 Get a sequence\par + 2 Get annotations\par + 3 Get entry names from accession numbers\par + 4 Search author index\par + 5 Search text index for keywords\par + ? Selection (1-5) (1) =2\par + Default Entry name=LAMBDA\par + ? Entry name=\par +ID LAMBDA standard; DNA; PHG; 48502 BP.\par +}\pard \sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 XX\par +AC V00636; J02459; M17233; X00906;\par +XX\par +DT 09-JUN-1982 (Rel. 01, Created)\par +DT 03-JUL-1991 (Rel. 28, Last updated, Version 3)\par +XX\par +DE Genome of the bacteriophage lambda (Styloviridae).\par +XX\par +KW circular; coat protein; DNA binding protein; genome;\par +KW origin of replication.\par +XX\par +OS Bacteriophage lambda\par +OC Viridae; ds-DNA nonenveloped viruses; Siphoviridae.\par +XX\par +RN [1]\par +RP 1-48502\par +RA Sanger F., Coulson A.R., Hong G.F., Hill D.F., Petersen G.B.;\par +RT "Nucleotide sequence of bacteriophage lambda DNA";\par +RL J. Mol. Biol. 162\:729-773(1982).\par +XX\par +\'00\par + Select a task\par + X 1 Get a sequence\par + 2 Get annotations\par + 3 Get entry names from accession numbers\par + 4 Search author index\par + 5 Search text index for keywords\par + ? Selection (1-5) (1) =\par + Default Entry name=LAMBDA\par + ? Entry name=\par +DE Genome of the bacteriophage lambda (Styloviridae).\par + Sequence length 48502\par + Sequence composition\par + T C A G -\par + 11988. 11360. 12336. 12818. 
0.\par +}\pard \sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 24.7% 23.4% 25.4% 26.4% 0.0%\par +}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 3.2\tab A worked example of sequence library use.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab +The program menu for GIP is simply a set of boxes drawn on the digitizing surface that each contain a command or uncertainty code. Right handed users will find it is best to position the menu to the right of the digitizing +area, but in practice as long as its top edge is parallel to the digitizer box, it can be put anywhere in the active region. As well as the codes a,c,g,t,1,2,3,4,b,d,h,v,r,y,x,-,5,6,7,8 the following commands are included in the menu\: + DELETE removes the la +st character from the sequence; RESET allows the lane centres to be redefined; START means begin the next stage of the procedure; STOP means stop the current stage in the procedure; CONFIRM means confirm that the last command or set of coordinates is corr +ect. \par +\tab +The digitizing device also has a menu of its own. This lies in a two inch wide strip immediately in front of the digitizing box. Pen positions within this two inch strip are interpreted as commands to the digitizer and are not sent to the GIP program. In + general the only time users will need to use the device menu is when they tell GIP where the program menu lies in the digitizing area. This is done by first hitting ORIGIN in the device menu and then hitting the bottom left hand corner of the progra +m menu. 
If the bell does not sound after hitting START try hitting METRIC in the device menu (the program uses metric units, and some digitizers are set to default to use inches; hitting metric switches between the two).\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab +The user should try to hit the bands as near as possible to the centre of the lanes because the program tracks the lanes up the film using the pen positions. If the lane centres get too close the program stops responding to the pen positions of bands and +hence does not ring the bell. If t +his occurs users must hit the reset box in the menu and the program will request them to redefine the lane centres at the current reading position. Then they can continue reading. As a further safeguard the program will only respond to pen positions either + in the menu or very close to the current reading position.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Details about preparing the data from fluorescent sequencing machines for processing by XBAP are contained in the notes for the chapter on managing sequencing projects. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab All of the operation +s described for the EMBL nucleotide library can be performed in exactly the same way for GenBank and the SWISSPROT and PIR protein libraries. For keyword searching the freetext index is most useful because it contains all words in feature tables, definiti +on lines, title lines, keywords and comment lines. The searches are very fast. The search will find all words that start with the given keywords\: + e.g. keyword sugar will match with sugar, sugaractivating, sugars, etc. When several keywords are used together, only entries indexed on all the words will be reported. On the VAX, EMBL, GenBank, SWISSPROT and PIR can all be processed. \par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1984. 
A computer program to enter DNA gel reading data into a computer. {\i Nucl. Acids Res.} {\b 12}, 499-503.\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 4. Managing Sequencing Projects\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Starting a project database\par +2.2\tab Screening against restriction enzyme recognition sequences\par +2.3 \tab Screening against vector sequences\par +2.4 \tab Entering readings into the project database (assembly)\par +2.5\tab Searching for internal joins\par +2.6\tab Editing in XBAP\par +2.7\tab Joining contigs interactively in XBAP\par +2.8\tab Selecting primers and templates\par +2.9\tab Examining the quality of a consensus\par +2.10\tab Using graphical displays to examine contigs\par +2.11\tab Disassembling contigs\par +2.12\tab Shuffling pads\par +2.13\tab Displaying a contig\par +2.14\tab Highlighting differences between readings and the consensus\par +2.15\tab Screen editing contigs in SAP\par +2.16\tab Automatic editing in SAP\par +2.17\tab Using the original editor in SAP\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +4.\tab References\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Data input, assembly, checking and editing are the major tasks of sequence project management. Data input is described in a previous chapter and here we cover everything else. The programs can deal with data derived from autoradiographs and from automated +gel reading machines such as the Applied Biosystems 373A and the Pharmacia A.L.F., and film readers such as the Amersham scanner.\par +\pard \s4\qj\sa120\sl280 We describe two alternative programs for managing sequencing projects. 
They contain the same assembly and vector screen +ing routines but they differ in their editing methods. One program SAP (see references 1 and 2) can be operated from simple terminals and emulators but the other XBAP (3) requires an X terminal or emulator. XBAP contains a superior editor plus the facility + to annotate sequences and display the coloured traces for data derived from fluorescent sequencing machines. Those using autoradiographs will find that SAP is adequate but XBAP is essential for users of fluorescent sequencing machines. Readers should note + that several of the methods for displaying contigs described below are probably of value only to those unable to use the screen based contig editor in XBAP.\par +\pard \s4\qj\sa120\sl280 +Fluorescent sequencing machines provide machine readable data. This means, given appropriate software, that while making editing decisions the user can see, displayed on the screen, the coloured traces used to derive the sequence. However data from these +machines requires some extra processing. First the machines tend to produce long sequences with po +or quality at their 3' ends and so we have to decide how much of the data to use. Secondly the sequencing machine does not recognise the primer region (as the user would) so we need to have some way of removing it from the data. The poor quality data from +both ends of the sequence and the vector sequences are identified non-interactively by programs clip-seqs and vep. Alternatively these tasks can be performed interactively using program TED (4). We term the data from the 3' end of a reading that is not emp +loyed in the assembly process "unused" sequence. Note that we do not lose this data but simply ignore it until such time as it can be useful for locating joins between contigs, or for double stranding regions of the sequence.\par +\pard \s4\qj\sa120\sl280 +The method described here uses a database to store all the data for each sequencing project. 
The individual sequence readings derived from autoradiographs or from sequencing machines are initially stored in separate files but the program copies them into t +he database during the assembly process. For normal operation the program handles batches of readings - say 24 from a film or machine run. Batch processing is achieved by use of files of file names. \par +\pard \s4\qj\sa120\sl280 Depending on the strategy employed and the stage of the project the following operations may be performed.\par +\pard\plain \s7\qj\fi-560\li560\sb100\sa120\sl280\tx560 \f20 1)\tab Start a project database.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2)\tab Select primers and templates.\par +3)\tab Obtain readings.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4)\tab Put individual readings into the computer and write a file of file names. For data derived from fluorescent sequencing machines choose which data from + the 3' end of the reading should not be used for the assembly process.\par +5)\tab Screen the batch against any vectors that may be present, excising any vector sequence found and passing to the next step, the names of those readings that contain some non-vector sequence.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6)\tab Screen the batch against any restriction sites whose presence would indicate a problem, passing those that do not match on to the next step.\par +7)\tab Compare each reading in the batch with the current contents of the project database adding them to the contigs they overlap, joining contigs or starting new contigs.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8)\tab +Check the number of contigs and the quality of the consensus sequence and plan further experiments. Try to join contigs by searching for overlaps between their ends. 
(This is particularly useful for those using data from fluorescent sequencing machines, + where although the 3' end of the sequence is not good enough for automatic assembly, it can be valuable for finding overlaps between contigs.)\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9)\tab Edit the contigs to resolve disagreements.\par +10)\tab Produce a consensus sequence.\par +11)\tab Analyse the consensus sequence, possibly discovering further errors.\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Subsets of these operations will be cycled through repeatedly. A pure shotgun strategy would continue using steps 3-7; a pure primer walking strategy would also include step 2. A number of the steps require almost no user intervention; however, checking qua +lity and final editing decisions are still interactive procedures. The program contains several options, such as displays of the overlapping reading +s in a contig, to help indicate not only the poorly determined regions, but also which clones could be resequenced to resolve ambiguities, or those which can usefully be extended or sequenced in the reverse direction, to cover difficult regions. It is bes +t to use a command procedure or script for handling steps 5-7.\par +\pard \s4\qj\sa120\sl280 For our projects we have a script which users employ by typing "assemble filename", where filename is the file of file names for the current batch of readings. This script calls all the necessa +ry options in SAP or BAP (see notes) in order to make a backup of the database, screen against any vectors, assemble readings and print a report. In the text below we describe how these operations are performed interactively. \par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Starting a project database\par +\pard\plain \s4\qj\sa120\sl280 \f20 The assembled data for each project is stored in a database. 
At the beginning of a project it is necessary to create an empty database using program SAP or XBAP.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Open database"\par +2.\tab Select "Start new database"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define the database name. Database names can have from one to 12 letters and must not include full stop (.). \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Database is for DNA"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab +Define "Database size". This is an initial size and if necessary can be increased later using "Copy database". Roughly speaking it is the number of readings expected to be needed to complete the project. Currently BAP limits the maximum to 8000 and SAP + has a limit of 1000.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Maximum reading length". This is the length of the longest reading that will be added to the database. The minimum is 512 bases, and the maximum 4096.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The program should confirm that "copy 0" of the database has been started. See Note 14 for important information.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Screening against restriction enzyme recognition sequences\par +\pard\plain \s4\qj\sa120\sl280 \f20 +For some strategies it is necessary to compare readings against any restriction enzyme recognition sequences that may have been used during cloning and which should not be present in the data. The function operates on single readings or processes batches a +ccessed through files of file names. The algorithm looks for exact matches to recognition sequences. The recognition sequences should be stored in a simple text file with one recognition sequence per record.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Accept "Use file of filenames".\par +2.\tab Define "File of gel reading names". 
The input file of file names.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +Define "File for names of sequences that pass". A file of file names for those readings that do not contain the recognition sequences. After the run it will contain the names of all the files in the batch that do not match any + of the restriction enzyme recognition sequences. Hence it can be used for further processing of the batch.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "File name of recognition sequences". The name of the file of recognition sequences.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Screening against vector sequences\par +\pard\plain \s4\qj\sa120\sl280 \f20 +For most strategies it is necessary to compare readings against any vector sequences that may have been picked up during cloning. The package contains two routines for screening against vectors. The original function simply reports any matches between the +readings and t +he vector sequences and only passes on those that do not match. This function should now only be used to screen for any other sequences that should be excluded from the database, because the newer one (program name VEP for vector excising program) is capab +le of both finding the vector sequences and editing them out automatically. \par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.3.1\tab Clipping off vector sequences\par +\pard\plain \s4\qj\sa120\sl280 \f20 There are two types of vector that may need to be screened out of gel readings\: the sequencing vector and, for cases where, say, whole cosmids +have been shotgunned, the cloning vector. The two tasks are different. When screening out the sequencing vector we may expect to find data to exclude, both from the primer region and from the other side of the cloning site (when, for example, the insert i +s short). When screening out cosmid vector we may find that either the 5' end, or the 3' end, or the whole of the sequence is vector. 
Also for the cosmid search we need to compare both strands of the sequence. The program (VEP) works slightly differently f +or each of the two cases. Having read the vector sequence from a file the program asks for the "Position of the cloning site". A value of zero signifies that the search will be for the cosmid vector. A nonzero value signifies that the search is for the seq +uencing vector, and so in this case the program then asks for the "Relative position of the primer site". A negative relative position signifies that a reverse primer is being used; otherwise a forward primer is assumed.\par +\pard \s4\qj\sa120\sl280 The program screens a batch of read +ings using a file of file names and creates a new file of file names which contains the names of all those sequences that include some nonvector sequence. For each sequence that contains some vector it writes out a new copy of the file in which the vector +portion is identified.\par +\pard \s4\qj\sa120\sl280 +The search, which uses a hashing algorithm, is very rapid. Users specify a "Word length", the "Number of diagonals to combine" and a "Minimum score". The word length is the minimum number of consecutive bases that will count as a mat +ch. The algorithm treats the problem like a dot matrix comparison and finds the diagonal with the highest score. Then it adds the scores of the adjacent diagonals, up to the "Number of diagonals to combine". If the combined score is at least "Minimum score" the sequen +ce is marked to indicate that it contains vector. The score represents the proportion of a diagonal that contains matching words, so the maximum score for any diagonal is 1.0.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Define "Input file of file names". This is the file containing the names of all the readings to be screened.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "File name of vector sequence". 
\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +Define "Position of cloning site". This is the base number, relative to the beginning of the vector sequence, that is on the 3' side of the insert site. For example, for m13mp18 the SmaI site is at 6249. A zero value signifies that the search is for cosm +id vector.\par +4.\tab Define "Relative position of 3' end of primer site". This is the position, relative to the cloning site, of the first base that could be included in the sequence. For m13mp18, the 17mer Sequencing Primer and the SmaI site, the position is 41. +\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Word length". Only words of this length will be counted as matches.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Number of diagonals to combine". The scores for this number of diagonals around the highest scoring diagonal will be combined to give the total score.\par +7.\tab Define "Cutoff score". For a match, at least this proportion of the total length of the summed diagonals must contain identical words. \par +8.\tab Define "Output file of passed file names". The name of the file to contain the names of the readings to pass on to the assembly program.\par +\pard\plain \s4\qj\sa120\sl280 \f20 Processing then commences and finishes with a summary stating the number of files processed, the number that are completely vector, the number that are partly vector and the number free of vector.\par +\pard\plain \s9\fi-560\li860\sb160\sa60\sl280\tx1140 \b\f20 2.3.2\tab Screening for "vectors"\par +\pard\plain \s4\qj\sa120\sl280 \f20 This function is contained in both SAP and XBAP and operates on single readings or processes batches accessed through files of file names. The algorithm looks for exact matches of length "minimum match length" and disp +lays the overlapping sequences.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Accept "Use file of filenames".\par +2.\tab Define "File of gel reading names". 
The input file of file names.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +Define "File for names of sequences that pass". A file of file names for those readings that do not contain the vector sequence. After the run it will contain the names of all the files in the batch that do not match the vector sequence. Hence it can be + used for further processing of the batch.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "File name of vector sequence". The name of the file containing the vector sequence.\par +\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Entering readings into the project database (Assembly)\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Readings are entered into the database using the auto assemble function. This function compares each reading and its complement with a consensus of all the readings already stored in the database. If it finds any overlaps it aligns the overlapping sequence +s by inserting padding characters, and then adds the new reading to the database. Readings that overlap are added to existing contigs and readings that do not overlap any data in +the database start new contigs. If a new reading overlaps two contigs they are joined. Any readings that appear to overlap but which cannot be aligned sufficiently well are not entered and have their names written to a file of failed gel reading names. Not +e that it is possible that a reading may align well with two contigs (indicating a possible join) but that after it has been added to one of the contigs, the two contigs do not align sufficiently well. In this case, although the reading has been entered in +to the database its name will also be added to the file of failed readings. Alignments using more than the maximum number of padding characters, or exceeding the maximum mismatch, may be displayed, but the readings will not be entered into the database. 
It + is advisable to set the consensus cutoff to 51% before running the assembly routine as this will improve the alignments. A typical run of the assembly routine is shown in figure 4.1.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Accept "Permit entry"\par +2.\tab Accept "Use file of file names"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "File of gel reading names". The name of the input file of file names, probably passed on from "Screen against vector".\par +4.\tab Define "File for names of failures". A file to contain the names of the readings that the program fails to enter, or for which joins are not made.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Perform normal shotgun assembly"\par +6.\tab Accept "Permit joins"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Minimum initial match". Only possible overlaps containing exact matches of at least this number of consecutive identical characters will be considered for alignment.\par +8.\tab Define "Maximum number of pads per reading" This is the maximum number of padding characters permitted in any new reading during the alignment procedure\par +9.\tab Define "Maximum number of pads per reading in contig" This is the maximum number of padding characters permitted in the contig in order to align any new reading.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Maximum percent mismatch after alignment"\par +\pard\plain \li560\ri500\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Automatic sequence assembler\par +\pard \li560\ri500\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Database is logically consistent\par +? (y/n) (y) Permit entry\par +? (y/n) (y) Use file of file names\par +? File of gel reading names=demo.nam\par +? 
File for names of failures=demo.fail\par +Select entry mode\par +X 1 Perform normal shotgun assembly\par + 2 Put all sequences in one contig\par + 3 Put all sequences in new contigs\par +? Selection (1-3) (1) =\par +? (y/n) (y) Permit joins\par +? Minimum initial match (12-4097) (15) =\par +? Maximum pads per gel (0-25) (8) =\par +? Maximum pads per gel in contig (0-25) (8) =\par +? Maximum percent mismatch after alignment (0.00-15.00) (8.00) =\par +\par +Results skipped to save space\par +\par +>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\par +Processing 4 in batch\par +Gel reading name=hinw.009 \par +Gel reading length= 292\par +Working\par +Contig 1 position 263 matches strand 1 at position 14\par +Contig 2 position 1 matches strand 1 at position 156\par +\pard \li560\ri500\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Total matches found 2\par +Trying to align with contig 1\par +Padding in contig= 1 and in gel= 0\par +Percentage mismatch after alignment = 2.9\par +Best alignment found\par + 251 261 271 281\par + aattacagcg tt,cctattg acgggcgcat ccac\par + ********** ** ** **** ********** ****\par + aattacagcg ttcccvattg acgggcgcat ccac\par + 1 11 21 31\par +Trying to align with contig 2\par +Padding in contig= 0 and in gel= 2\par +Percentage mismatch after alignment = 1.4\par +Best alignment found\par + 1 11 21 31 41 51\par + tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par + ********** ********** ********** ********** ********** **********\par + tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par + 156 166 176 186 196 206\par + 61 71 81 91 101 111\par + tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccggcagc gcccacactg\par + ********** ********** ********** ********** ***** ** * **********\par + tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccg,ca,c gcccacactg\par + 216 226 236 246 256 266\par + 121 131\par + ctcagacgac ggtcgctgc\par + ********** *********\par + ctcagacgac ggtcgctgc\par + 
276 286\par +Overlap between contigs 2 and 1\par +Length of overlap between the contigs= -122\par +Entering the new gel reading into contig 1\par +This gel reading has been given the number 4\par +Working\par +Trying to align the two contigs\par +Padding in contig= 2 and in gel= 0\par +Percentage mismatch after alignment = 1.5\par +Best alignment found\par + 406 416 426 436 446 456\par + tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par + ********** ********** ********** ********** ********** **********\par + tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par + 1 11 21 31 41 51\par + 466 476 486 496 506 516\par + tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccg,ca,c gcccacactg\par + ********** ********** ********** ********** ***** ** * **********\par + tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccggcagc gcccacactg\par + 61 71 81 91 101 111\par + 526 536\par + ctcagacgac ggtcgct\par + ********** *******\par + ctcagacgac ggtcgct\par + 121 131\par +Editing contig 1\par +\pard \li560\ri500\sa100\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Completing the join between contigs 1 and 2\par + (Results for other readings skipped to save space)\par +\pard \li560\ri500\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Batch finished\par + 9 sequences processed\par + 9 sequences entered into database\par +\pard \li560\ri500\sa100\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth 2 joins made\par +\pard\plain \s8\qj\fi-1140\li1140\sb60\sa120\sl240\tx1140 \f21\fs20 Figure 4.1\tab Part of a typical run of "Auto assemble".\par +\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Searching for internal joins \par +\pard\plain \s4\qj\sa120\sl280 \f20 +The purpose of this function is to use data already in the database to find possible joins between contigs. 
Although most joins will be made automatically during assembly, some may have been missed because of poor alignments. The function is particularly us +eful for sequences from fluorescent sequencing machines because it may be possible to find potential joins within the unused data from the 3' ends of readings. For each potential + join found, when the X version is used, the contig joining editor is automatically called up with the two contigs aligned in the edit windows.\par +\pard \s4\qj\sa120\sl280 +The program strategy is as follows. Take the first contig and calculate its consensus. If unused data is being employed, examine all readings that are in the complementary orientation, and sufficiently near to the contig's left end, to see if they have suff +iciently good unused sequence which, if present, would protrude from the left end of the contig. If found, add th +e longest such sequence to the left end of the consensus. Do the same for the right end by examining readings that are in their original orientation. Repeat the consensus calculations and extensions for all contigs, hence producing an extended consensus for + the whole database. If unused data is not being employed, simply calculate the consensus for the whole database. Now look for possible joins by processing the extended consensus in the following way. Take the last, say 500, bases (termed the "probe length" + by the program) of the rightmost consensus and compare it in both orientations with the extended consensus of all the other contigs. Display any sufficiently good alignments. Repeat with the left end of the rightmost contig. Do the same for the ends of all t +he contigs, always comparing only with the contigs to their left, so that the same matches do not appear twice. 
\par +\pard \s4\qj\sa120\sl280 Good unused data is defined by sliding a window of "Window size for good data scan" bases outwards along the sequence and stopping when more + than "Maximum number of dashes in scan window" dashes appear in the window. Note that it is advisable to have some sort of cutoff because if we simply take all the data it might be of such poor quality that we won't find any good matches. An initial run employing + no unused data is also recommended. Sufficiently good alignments are defined by criteria equivalent to those used in auto assemble; however, here we only display alignments that pass all tests.\par +\pard \s4\qj\sa120\sl280 All numbering is relative to base number one in the contig\: ma +tches to the left (i.e. in the unused data) have negative positions, matches off the right end of the contig (i.e. in the unused data) have positions greater than the contig length. The convention for reporting the orientations of overlaps is as follows\: + i +f neither contig needs to be complemented the positions are as shown. If the program says "contig x in the - sense" then the positions shown assume contig x has been complemented. For example, in the results given in figure 4.2 the positions for the first o +verlap are as reported, but those for the second assume that the contig in the minus sense (i.e. 443) has been complemented.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find internal joins".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Minimum initial match". Only matches containing at least this number of consecutive identical characters will be found.\par +3.\tab Define "Maximum pads per sequence". Only alignments containing less than or equal to this number of padding characters in each sequence will be found.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Maximum percent mismatch after alignment". Only alignments with at lea +st this level of similarity will be found. 
Particularly when poor data from the 3' ends of sequences derived from fluorescent sequencing machines is used, it is important to allow for a high degree of mismatch - say around 75%.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Probe length". This is the size of sequence from each end of each contig, that is compared with the total length of all other contigs.\par +6.\tab Accept "Employ unused data". This means, where available, add the unused data from the 3' ends of sequences, to the ends of the contigs.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab +Define "Window size for good data scan". To decide how much of the unused data should be added to the end of a contig the program scans outwards, counting the numbers of dashes (-) over a window of the size defined here.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Number of dashes in scan window". If the program finds this many dashes in the scan window it will add no more of the unused data to the end of the contig.\par +\pard\plain \qj\li680\ri780\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Possible join between contig 445 in the + sense and contig 405\par +\pard \li680\ri780\sl220\box\brsp100\brdrth Percentage mismatch after alignment = 4.9\par + 412 422 432 442 452 462\par +405 TTTCCCGACT GGAAAGCGGG CAGTGAGCGC AACGCAATTA ATGTGAG,TT AGCTCACTCA\par + ********* * ******** ***** *** ********** ********** **********\par +445 -TTCCCGACT G,AAAGCGGG TAGTGA,CGC AACGCAATTA ATGTGAG-TT AGCTCACTCA\par + -127 -117 -107 -97 -87 -77\par + 472 482 492 502 512\par +405 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT\par + ********** ********** ********** ********** **\par +445 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT\par + -67 -57 -47 -37 -27\par +Possible join between contig 443 in the - sense and contig 423\par +Percentage mismatch after alignment = 10.4\par + 64 74 84 94 104 114\par +423 ATCGAAGAAA GAAAAGGAGG AGAAGATGAT TTTAAAAATG AAACG-CGAT GTCAGATGGG\par + **** ***** 
********** ********** ****** ** ***** **** *********\par +443 ATCG,AGAAA GAAAAGGAGG AGAAGATGAT TTTAAA,,TG AAACGACGAT GTCAGATGG,\par + 3610 3620 3630 3640 3650 3660\par + 124 134 144 154 164\par +423 TTG-ATGAAG TAGAAGTAGG AG-AGGTGGA AGAGAAGAGA GTGGGA\par + *** ****** ********** ** ******* *** ***** ** **\par +443 TTGGATGAAG TAGAAGTAGG AGGAGGTGGA ,GAG,AGAGA GTTGG-\par +\pard \li680\ri780\sl220\keepn\box\brsp100\brdrth 3670 3680 3690 3700 3710\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.2\tab Typical output from "Find internal joins".\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Editing in XBAP\par +\pard\plain \s4\qj\sa120\sl280 \f20 The XBAP editor is mouse-driven and can insert, delete and change readings in contigs. It has facilities to display the traces for data from fluorescent sequenci +ng machines and for annotation of readings. In addition it allows the poor quality data from the ends of readings to be viewed and, if required, added to the sequences. \par +\pard \s4\qj\sa120\sl280 +A typical view of the editor is shown in figure 4.3. This includes the edit window showing an 80-character section of a contig (positions 3899 to 3978). Each reading is numbered and named in the left hand panel, minus signs indicating those in their revers +e orientation. Underneath is their consensus. Some of the sequence letters are lighter + than the majority, showing that they are "unused". One segment (3933 to 3949) is shaded, which signifies that it has been annotated. The editing cursor is at position 3921. Above this window are the main buttons the user employs to direct the editing proces +s. Below the edit window is a panel showing the traces for readings 37 and 123. Notice they are centred on the cursor position. Here the traces are shown in four different line styles, but on a colour screen they each have different colours. At the bottom +of the figure is the search window. 
These features are described in the relevant sections below.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.1\tab Scrolling through the contig\par +\pard\plain \s4\qj\sa120\sl280 \f20 The editor allows scrolling from one end of a contig to the other using the scroll bar and scroll buttons and also the arrow keys.\par +\pard \s4\qj\sa120\sl280 Action of mouse button presses when the mouse pointer is in the scroll bar\:\par +\pard \s4\qj\li1720\sa120\sl280\tx4520 Middle Mouse Button\tab Set editor position\par +Left Mouse Button\tab Scroll forward one screenful\par +Right Mouse Button\tab Scroll backwards one screenful\par +\pard\plain \li80\ri20\sl220\keepn\box\brsp20\brdrth \f4\fs16 {{\pict\macpict\picw444\pich344 +82daffffffff015701bb1101a0008201000affffffff015701bb0900000000000000003100000000015601ba98007e00000000030703e900000000030703e900000000015601ba000102830002830002830007000286aa01a00007000186550140000700028600012000070001860001400007000286000120000b02013ff8 +8a00030ffe40000d0402200807c18c0003089220000f06012c28040110808e0003089240000f06022648040100808e0003089220001007012348040f31e3968f00030f924000100702220807911084598f000308122000100701258804111084508f00030812400010070224c804111084508f00030ff22000100701286804 +111094508f000308024000100702200807cf3863908f0003080220000b02013ff88a00030ffe4000070002860001200007000186000140000700028600012000070001865501400007000286aa01a00002830002830002830002830026e500001ff9ff0087f8ff01f87ff5ff01fe1fefff0087fcff01fe1ff2ff01f87ff2ff +00e0fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd00380200003cfa 
+000203fc03fa0008630c1800018180001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd006502000066fa0002030003fa000ac30c38000380c0001f807ffbff05841f8000003cfd0002087f87fbff07e01fe7fffffe1f81fcff03f0ffff87fdff0dc3f0ffff84186000060002106180fd +00051f800000600ffe0002084180fb00021fe018fc000020fd006b020000c3fe0008c01800000300030603fd000a01830c7800078060001ff3faff058418c000000cfd0002087f33fbff07e7ffe7fffffe1f9cfcff03fcffff33fdff0d99e67fff84186000060002107180fd000518c000006003fe0002084180fb00021800 +18fc000020fd0072020000c0fe0008c01800000300030603fd000a01830cd8000d8060001ff3fcff07f9ff84186000000cfd0002087e79fbff08e7ffe7cfe7fe1f9e7ffdff1ffcfffe79fff9ffff99e67fff8418600006000210718000006000186000006003fe0002084180fb00041800183018fe000020fd0072020000c0 +fe0000c0fe00040300030003fd000a0301989800098030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08e7ffe7cfe7fe1f9e7ffdff1ffcfffe7ffff9ffff9fe7ffff8418600006000210798000006000186000006003fe0002084180fb00041800183018fe000020fd00731d0000c00f0dc3f0781f4003003b1e0f +c0f0de000301981800018030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08e7ffe7ffe7fe1f9e7ffdff1ffcfffe7ffff9ffff9fe7ffff8418600006000210798000006000186000006003fe0002084180fb00041800180018fe000020fd007d790000c0198e60c01831c003f06706030198730003019818000180 +30001ff3e47c0f8790e07f841861e1b80c0fc1f078087f3f9e647e1e43ffe7fe270f81fe1f9e7879e787c0fcfffe7f9e607e1f9fe7f03f841866e0761e02106d878619f8001866f0786e0301e16c0841801e0fc61878001801d8f07e0786f020fd007d790000c030cc30c01831800300c30603030c60000300f01800018030 +001ff3e339e733c679ff8418c331cc0c186318cc087f879e633ccf19ffe07cc7cfe7fe1f9cf339e7339e7cfffe7f9e79fccf9fe7e79f84186730ce3302106d8cc330600018c398cc73030331fe08418033186618cc001f833830180cc39820fd007d790000c030cc30c01831800300c30603030c60000300f0180001803000 
+1ff3e799fe79cff9ff841f8619860c00660186087ff39e6799e73fffe7f9e7cfe7fe1f81e79cce79fe7cfffe7f9e79f9e60781e7ff8418661986618210679861e060001f83018661830619b6084180618063318600180618301818630020fd007d790000c030cc30c01831800300c30603030c60000180f01800018060001f +f3e79c0e01cff9ff841987f9860c0fe601fe087ff99e6798073fffe7f9e7cfe7fe1f99e01cce01c07cfffe7f9e79f9e79fe7f03f8418661986618210679fe0c0600018030186618307f9b60841807f8fe331fe00180618301818630020fd007d790000c330cc30c0181f000300c30603030c60000180601800018060001ff3 +e79fe67fcff9ff8418c601860c18660180087ff99e6799ff3fffe7f9e7cfe7fe1f9ce7fe1e7f9e7cfffe7f9e79f9e79fe7ff9f8418661986618210639800c060001803018661830601b6084180601861e18000180618301818630020fd007d79000066198c30cc18300003006706033198600000c06018070180c0001ff3e7 +9fe67fcff9ff8418c601860c18660180087e799e6799ff3fffe7f9e7cfe7fe1f9ce7fe1e7f9e7cfffe799e79f9e79fe7ff9f8418661986618210639801e060001803018661830601b6084180601861e18000180618301818630020fd007d7900003c0f0c3078ff1f8003fc3b3fc1e0f06000006060ff070ff180001ff3e799 +e739cff99f84186319cc0c186318c6087f33cc633ce73fffe7fcc7cfe67e1f9e739f3f399e7cffff33cc799ccf9fe7e79f840cc618ce330210618c63306600180300cc73030319b6084180319860c0c60018033830198cc30020fd0068f9000130c0ef005d1f80679c0f83cffc3f841861f1b87f8fa1f07c087f87e2647e0f +3fffe01e2601f0fe1f9e783f3f83c1601fff87e27c3e1f9fe7f03f84078618761e02106187c6183c00180300786e1fe1f1b60841fe1f0fa0c07c001fe1d9fe0f07830020fd0032f9000130c0ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc00010210f200010840f2000020fd0032f9000130c0 +ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc00010210f200010840f2000020fd0032f900011f80ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc00010210f200010840f2000020fd002de500001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc 
+00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2 +000020fd0026e500001ff9ff0087f8ff01f87ff5ff01fe1fefff0087fcff01fe1ff2ff01f87ff2ff00e0fd000283000283000283000283000283000283000283000283000901001f88ff00feff001a010010fc000006fe00010180fe000060fc00000c9d000002ff001f010010fc000006fe00010180fe000060fc00000cc2 +000102a8de000002ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de000002ff002316001000004010000600004001800200006000100400000cc200010554de000002ff00231600100000c03000060000c001800300 +006000180600000cc2000102a8de000002ff0023160010000180600006000180018001800060000c0300000cc200010554de000002ff0023160010000300c00006000300018000c0006000060180000cc2000102a8de000002ff0023160010000601800006000600018000600060000300c0000cc200010554de000002ff00 +23160010000c03000006000c0001800030006000018060000cc2000102a8de000002ff00231600100018060000060018000180001800600000c030000cc200010554de000002ff0023160010000c03000006000c0001800030006000018060000cc2000102a8de000002ff0023160010000601800006000600018000600060 +000300c0000cc200010554de000002ff0023160010000300c00006000300018000c0006000060180000cc2000102a8de000002ff0023160010000180600006000180018001800060000c0300000cc200010554de000002ff00231600100000c03000060000c001800300006000180600000cc2000102a8de000002ff002316 +001000004010000600004001800200006000100400000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de000002ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de0000 
+02ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de000002ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001a010010fc000006fe00010180fe000060fc00000c9d000002ff0009 +01001f88ff00feff000901001f88ff00feff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff004a010010ed00030c0300c0fa00040781e0300cf90004781e +0780c0fa00040781e0780cf90004781e0040c0fa00040781e1fe0cf90004781e0780c0fa00040781e1fe0cf90002781e02ff004a010010ed00030c0781e0fa00040cc330701ef90004cc330cc1e0fa00040cc330cc1ef90004cc3300c1e0fa00040cc331801ef90004cc330cc1e0fa00040cc330061ef90002cc3302ff004e +010010ed00030c0cc330fa0004186618f033fa0005018661986330fa00041866198633fa000501866181c330fa00041866198033fa0005018661986330fa00041866180633fa000301866182ff004e010010ed00030c0cc330fa0004186619b033fa0005018661986330fa00041866198633fa000501866183c330fa000418 +66198033fa0005018661980330fa00041866180c33fa000301866182ff004a010010ed00030c186618f900046619306180fa00040661806618f900046618066180fa0004066186c618f900046619806180fa00040661980618f9000466180c6180fa0002066182ff004a010010ed00030c186618f90004c618306180fa0004 +0c61806618f90004c6180c6180fa00040c618cc618f90004c619b86180fa00040c619b8618f90004c618186180fa00020c6182ff004e010010ed00030c186618fa0005038338306180fa0004383380c618fa0005038338386180fa0004383398c618fa0005038339cc6180fa000438339cc618fa0005038338186180fa0002 +383382ff004a010010ed00030c186618f90004c1d8306180fa00040c1d838618f90004c1d80c6180fa00040c1d98c618f90004c1d8066180fa00040c1d986618f90004c1d8306180fa00020c1d82ff004a010010ed00030c186618f900046018306180fa00040601860618f900046018066180fa000406019fe618f9000460 
+18066180fa00040601986618f900046018306180fa0002060182ff004e010010ed00030c0cc330fa00041860183033fa00050186018c0330fa00041860198633fa000501860180c330fa00041860180633fa0005018601986330fa00041860186033fa000301860182ff004e010010ed00030c0cc330fa00041866183033fa +0005018661980330fa00041866198633fa000501866180c330fa00041866198633fa0005018661986330fa00041866186033fa000301866182ff004a010010ed00030c0781e0fa00040cc330301ef90004cc331801e0fa00040cc330cc1ef90004cc3300c1e0fa00040cc330cc1ef90004cc330cc1e0fa00040cc330c01ef9 +0002cc3302ff004a010010ed00030c0300c0fa00040781e1fe0cf90004781e1fe0c0fa00040781e0780cf90004781e00c0c0fa00040781e0780cf90004781e0780c0fa00040781e0c00cf90002781e02ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d0100 +10ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff0076010010fc00041e0003c078f700650c0781e0301e0307f9fe0c0780c0300c0787f9fe0c0781e1fe1e1fe1e0000c0300c0781e0787f8781e0300c1fe1e0300c0781e0780c0301e1fe7f8 +301e0300c0787f8781e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff0076010010fc0004330000c0ccf700650c0cc33078330780c0301e0cc1e0781e0cc0c0301e0cc3303033030330001e0781e0cc330cc0c0cc330781e030330781e0cc330cc1e078330300c078330781e0cc0c0c +c3303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff0076010010fc0004618000c186f700650c186618cc618cc0c03033186330cc331860c03033186618306183061800330cc33186619860c186618cc33030618cc3318661986330cc618300c0cc618cc331860c18661830618cc618cc +33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff0076010010fc0004618000c186f700650c180600cc600cc0c03033180330cc331800c030331806003060030600cc330cc33180601800c180600cc33030600cc3318060180330cc600300c0cc600cc331800c18060030600cc600cc33030600cc330303 
+30300c0cc0c180330cc6018033180600cc3302ff0076010010fc0004018000c006f700650c18060186601860c0306198061986619800c030619806003060030600cc6198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c18061 +9866018061980601866182ff007b010010fc0009018000c006000fc1e076fc00650c18060186601860c0306198061986619800c030619806003060030600786198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c18061986601 +8061980601866182ff007d010010fe000b01fe030000c00c00186330cefc00650c18067986679860c03061980619866199e0c030619806003067830601fe61986619806799e0c19e6018661830679866199e6018061986678300c18667986619800c180678306018667986618306798661830618300c1860c1806198667980 +6199e601866182ff007b010010fc00090e0000c0380018061986fc00650c180619fe619fe0c0307f9807f9fe7f9860c0307f9806003061830600787f9fe7f980619860c186601fe7f830619fe7f986601807f9fe618300c1fe619fe7f9800c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f9866 +01fe7f82ff007b010010fc0009180000c060000fc7f986fc00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c1806198661980619866018661 +82ff007b010010fc0009300000c0c00000660186fc00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601866182ff00 +7b010010fc0009600000c1800000660186fc00650c18661986619860c0306198661986619860c030619866183061830618006198661986619860c186619866183061986619866198661986618300c18661986619860c186618306198661986618306198661830618300c1860c186619866198661986619866182ff007b0100 
+10fc0009600000c1800e186318cefc00650c0cc33986339860c030618cc61986618ce0c030618cc33030338303300061986618cc338ce0c0ce331866183033986618ce330cc61986338300c18633986618cc0c0cc338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff007b010010fc00 +097f8007f9fe0e0fc1f076fc00650c0781e9861e9860c03061878619866187a0c030618781e0301e8301e00061986618781e87a0c07a1e186618301e9866187a1e078619861e8300c1861e986618780c0781e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff0011010010f3000006fc +00000c9d000002ff0011010010f3000006fc00000c9d000002ff0011010010f3000006fc00000c9d000002ff0011010010f3000006fc00000c9d000002ff000d010010ed00000c9d000002ff0013010010ed00000cd7000002ecaadd000002ff0013010010ed00000cd7000001ec55dd000002ff007e010010fd000c787f80 +0000041e0001e0000003fe00650c0781e0301e0307f9fe0c0780c0300c0787f9fe0c0781e1fe1e1fe1e0000c0300c0781e0787f8781e0300c3febeabaaeafabeafaaeababebfeffababeabaaeafa7f8781e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff007e010010fd000ccc0180 +00000c33000330000007fe00650c0cc33078330780c0301e0cc1e0781e0cc0c0301e0cc3303033030330001e0781e0cc330cc0c0cc330781e1757757d5f5dd775dd5f57d775755d57d7757d5f5dd0c0cc3303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff007f010010fe000d018601 +8000001c6180061800000ffe00650c186618cc618cc0c03033186330cc331860c03033186618306183061800330cc33186619860c186618cc332baebaeebbbaeebbaebbaeeebabaaeaeeebaeebbbae0c18661830618cc618cc33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff007f010010fe00020186 +03fe00073c6000061800001bfe00650c180600cc600cc0c03033180330cc331800c030331806003060030600cc330cc33180601800c180600cc33175755dd775d5755d5775dd755755d5dd755dd775d50c18060030600cc600cc33030600cc33030330300c0cc0c180330cc6018033180600cc3302ff007e010010fd000106 
+03fe00076c60000618000013fe00650c18060186601860c0306198061986619800c030619806003060030600cc6198661980601800c1806018661abaeabaeebbaaeabaaebbaeeaabaaebaeeabaeebbaa0c180600306018660186618306018661830618300c1860c180619866018061980601866182ff007e010010fd000c0c +060003f0cc6e07c618003f03fe00650c18060186601860c0306198061986619800c030619806003060030600786198661980601800c1806018661975755d775dd5755d575dd7755755d5d7755d775dd50c180600306018660186618306018661830618300c1860c180619866018061980601866182ff007f120010000007f8 +38060006198c730c6338006183fe00650c18067986679860c03061980619866199e0c030619806003067830601fe61986619806799e0c19e6018661abaefbaeebbbeeabaaebbaeefabaaebaeefbaeebbaa0c180678306018667986618306798661830618300c1860c18061986679806199e601866182ff007e010010fd000c +0c0c0000198c619801d8006003fe00650c180619fe619fe0c0307f9807f9fe7f9860c0307f9806003061830600787f9fe7f980619860c186601fe7f97575dff7fdd7755d57fdff75d755d5ff75dff7fdd50c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe7f82ff007e010010fd000c +060c0003f9fe61980018003f03fe00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c1866018661abaebbaeebbaeeabaaebbaeebabaaebaeebbaeebbaa0c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007f010010fe000d +0186180006180c61980018000183fe00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c186601866197575dd775dd7755d575dd775d755d5d775dd775dd50c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007f010010fe00 +0d0186180006180c61980618000183fe00650c18661986619860c0306198661986619860c030619866183061830618006198661986619860c1866198661abaebbaeebbaeebbaeebbaeebabaaebaeebbaeebbae0c186618306198661986618306198661830618300c1860c186619866198661986619866182ff007e010010fd 
+000ccc300006180c330c6330386183fe00650c0cc33986339860c030618cc61986618ce0c030618cc33030338303300061986618cc338ce0c0ce331866197577dd775ddf775dd75dd777d755d5d777dd775ddd0c0cc338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff007e010010fd +007578300003e80c1e07c1e0383f1fe000000c0781e9861e9860c03061878619866187a0c030618781e0301e8301e00061986618781e87a0c07a1e18661ababebaeebafabeafaebbaebeabaaebaebebaeebafa0c0781e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff0013010010ed +00000cd7000001ec55dd000002ff0013010010ed00000cd7000002ecaadd000002ff0013010010ed00000cd7000001ec55dd000002ff0013010010ed00000cd7000002ecaadd000002ff0013010010ed00000cd7000001ec55dd000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff007f010010 +fe000dc0781e000001fe0c000010000003fe00650c02808028080aa2a8200a02000020080282a8aa080280a0aa001fe1e0300c0300c0781e0787f8781e0300c1fe1e0300c0781e0780c0301e1fe7f8301e0300c0787f8781e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff007f1200 +10000001c0cc33000001801c000030000007fe00650c04414044140100405011050000501404404010140441101000030330781e0781e0cc330cc0c0cc330781e030330781e0cc330cc1e078330300c078330781e0cc0c0cc3303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff007f12 +0010000003c18661800001803c00007000000ffe00650c08222082220200808820888000882208208020220822082000030618cc330cc33186619860c186618cc33030618cc3318661986330cc618300c0cc618cc331860c18661830618cc618cc33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff007f +120010000006c18661800001806c0000f000001bfe00650c10011100110100404440044000441110004010111004001000030600cc330cc33180601800c180600cc33030600cc3318060180330cc600300c0cc600cc331800c18060030600cc600cc33030600cc33030330300c0cc0c180330cc6018033180600cc3302ff00 
+7f120010000004c00601800001804c0001b0000013fe00650c08020880208200808220082000822088008020208802002000030601866198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c180619866018061980601866182ff +007f010010fe000dc006030003f1b80c07c330003f03fe00650c10041100410100410440104001044110004010411004001000030601866198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c180619866018061980601866182 +ff007f120010001fe0c00c0e000619cc0c0c6630006183fe00650c08a2088a2082008082200822a8822088a0802020880200202a8306018661986619806799e0c19e6018661830679866199e6018061986678300c18667986619800c180678306018667986618306798661830618300c1860c18061986679806199e6018661 +82ff007f010010fe000dc03803000018060c180630006003fe00650c10455104550100415440154001545510404010551004001000030601fe7f9fe7f980619860c186601fe7f830619fe7f986601807f9fe618300c1fe619fe7f9800c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe +7f82ff007f010010fe000dc060018003f8060c1807f8003f03fe00650c08220882208200808220082000822088208020208802002000030601866198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601 +866182ff007f010010fe000dc0c061800618060c180030000183fe00650c10441104410100410440104001044110404010411004001000030601866198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c1806198661980619866 +01866182ff007f010010fe000dc18061800619860c180030000183fe00650c08220882208200808220882000822088208020208822082000030619866198661986619860c186619866183061986619866198661986618300c18661986619860c186618306198661986618306198661830618300c1860c18661986619866198 
+6619866182ff007f010010fe000dc18033000618cc0c0c6030386183fe00650c044410444101004104111040010441044040104104411010000303318661986618cc338ce0c0ce331866183033986618ce330cc61986338300c18633986618cc0c0cc338303318633986618303398661830618300c1860c0cc61986338cc61 +8ce331866182ff007f7b0010000007f9fe1e0003e8787f87c030383f1fe000000c02a2082a20820080820a082000822082a08020208280a020000301e18661986618781e87a0c07a1e186618301e9866187a1e078619861e8300c1861e986618780c0781e8301e1861e986618301e98661830618300c1860c078619861e878 +6187a1e1866182ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff0040010010fe000bc0307f800001fe0c0000c030fe +0002c0000cbf002201e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff0040160010000001c07801800001801c0001c070000001c0000cbf002203303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff0040160010000003c0cc01800001803c0003 +c0f0000003c0000cbf0022061830618cc618cc33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff0040160010000006c0cc03000001806c0006c1b0000006c0000cbf0022060030600cc600cc33030600cc33030330300c0cc0c180330cc6018033180600cc3302ff0040160010000004c1860300000180 +4c0004c130000004c0000cbf00220600306018660186618306018661830618300c1860c180619866018061980601866182ff0040010010fe0011c186060003f1b80c0fc0c030000fc0c0000cbf00220600306018660186618306018661830618300c1860c180619866018061980601866182ff0040010010fe0011c1860600 +0619cc0c1860c030001860c0000cbf00220678306018667986618306798661830618300c1860c18061986679806199e601866182ff0040010010fe0011c1860c000018060c0060c030001800c0000cbf0022061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe7f82ff0040010010fe0011c1 
+860c0003f8060c0fe0c030000fc0c0000cbf00220618306018661986618306198661830618300c1860c180619866198061986601866182ff0040010010fe0011c0cc18000618060c1860c030000060c0000cbf00220618306018661986618306198661830618300c1860c180619866198061986601866182ff0040010010fe +0011c0cc18000619860c1860c030000060c0000cbf00220618306198661986618306198661830618300c1860c186619866198661986619866182ff0040010010fe0011c07830000618cc0c1860c0300e1860c0000cbf00220338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff004016 +0010000007f830300003e8787f8fa7f9fe0e0fc7f8000cbf002201e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c +9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff007e010010fd0001781efe0007041e03c1e0000003fe00650c0200a0aa2a8200a0282a8282a82808020080280a0282a8280a020080aa0a020080280a028080200a0aa2a8200a020080282a8280a0aa0a0200a020080aa0a020080002a820 +2a8aa080aa0a020080280a0200a028080200a028000280a0280a0202a8aa0a02ff007e010010fd0001cc33fe00070c33066330000007fe00650c050110100405011044040440404414050140441104404044110501401011050140441104414050110100405011050140440404411010110501105014010110501400004050 +0401014010110501404411050110441405011044000441104411050040101102ff007f010010fe000d0186618000001c6186661800000ffe001b0c088208200808820882080820808222088220822088208082208882fe20298882208220882220882082008088208882208208082208202088820888220202088822000080 +88080202fe20198882208220888208822208820882000822088220888080202082ff007f010010fe0002018060fe00073c6006061800001bfe00650c04440010040444010004100041001104411100401000410040044110104004411100401001104440010040444004411100041004001040044400441101040044110000 
+40440401011010400441110040044401001104440100001004010040044040104002ff007f010010fe0002018060fe00076c60060018000013fe00650c082200200808220080080800808020882208802008008080200822082020082208802008020882200200808220082208800808020020200822008220820200822080 +0080820802020820200822088020082200802088220080000802008020082080202002ff007f010010fe000d01b86e0003f0cc6e060030003f03fe00650c1044001004104401000410004100411044110040100041004010441010401044110040100411044001004104401044110004100400104010440104410104010441 +000041040401041010401044110040104401004110440100001004010040104040104002ff007f120010000007f9cc730006198c730600e0006183fe00650c0822282008082200800808a0808020882208802288a0808a20082208202288220880200802088222820080822288220880080802282020082228822082022882 +208aa080820802020820200822088a2008222880208822008a2a8802288a20082080202282ff007f010010fe000d0186618000198c619f8030006003fe00650c154410100415440100041040410055154551004110404104401545501041154551004010055154410100415441154551000410041010401544115455010411 +5455000041540401055010401545510440154411005515440104001004110440154040104102ff007f010010fe000d0186618003f9fe61860018003f03fe00650c0822082008082200800808208080208822088020882080822008220820208822088020080208822082008082208822088008080208202008220882208202 +088220800080820802020820200822088220082208802088220082000802088220082080202082ff007f010010fe000d0186618006180c61860618000183fe00650c10441010041044010004104041004110441100411040410440104410104110441100401004110441010041044110441100041004101040104411044101 +04110441000041040401041010401044110440104411004110440104001004110440104040104102ff007f010010fe000d0186618006180c61860618000183fe00650c082208200808220882080820808220882208822088208082208822082020882208822088220882208200808220882208820808220820208822088220 
+8202088220800080820802020820208822088220882208822088220882000822088220882080202082ff007e010010fd000ccc330006180c33060330386183fe00650c104110100410411044040440404441104410441104404044111044101011104410441104441104110100410411104410440404411010111041110441 +0101110441000041040401041010111044104411104110444110411044000441104411104040101102ff007e010010fd0075781e0003e80c1e0601e0383f1fe000000c0820a820080820a0280802a0802820882208280a82a0802a0a082208200a882208280a028208820a820080820a88220828080280a8200a0820a88220 +8200a882208000808208020208200a0822082a0a0820a828208820a02a000280a82a0a082080200a82ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff0012010010ed00000ce6000103ffba +000002ff0012010010ed00000ce6000103ffba000002ff007b010010fa007201e078618787f9861e1861e0000c0781e0301e0307f9fe0c0780c0300c0787f9fe0c0781e1fe1e1fe1e3ff0c0300c0781e0787f8781e0300c1fe1e0300c0781e0780c0301e1fe7f8301e0300c0787f8781e1fe1e0301e0300c1fe1e0300c1fe0 +c1fe7f8307f8780c0301e0780c0781e0300c02ff007b010010fa00720330cc718cc601c633186330000c0cc33078330780c0301e0cc1e0781e0cc0c0301e0cc3303033030333ff1e0781e0cc330cc0c0cc330781e030330781e0cc330cc1e078330300c078330781e0cc0c0cc3303033078330781e030330781e0301e0300c +0780c0cc1e078330cc1e0cc330781e02ff007b010010fa007206198671986601c661986618000c186618cc618cc0c03033186330cc331860c03033186618306183061bff330cc33186619860c186618cc33030618cc3318661986330cc618300c0cc618cc331860c18661830618cc618cc33030618cc33030330300c0cc0c1 +86330cc6198633186618cc3302ff007b010010fa007206018679980601e660186600000c180600cc600cc0c03033180330cc331800c030331806003060030603ff330cc33180601800c180600cc33030600cc3318060180330cc600300c0cc600cc331800c18060030600cc600cc33030600cc33030330300c0cc0c180330c 
+c6018033180600cc3302ff007b010010fa007206018679980601e660186600000c18060186601860c0306198061986619800c030619806003060030603ff6198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c1806198660180 +61980601866182ff007b010010fa00720601866d8c0601b630186300000c18060186601860c0306198061986619800c030619806003060030603ff6198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c1806198660180619806 +01866182ff007b010010fa00720601866d8787e1b61e1861e0000c18067986679860c03061980619866199e0c0306198060030678306020161986619806799e0c19e6018661830679866199e6018061986678300c18667986619800c180678306018667986618306798661830618300c1860c18061986679806199e6018661 +82ff007b010010fa00720601866780c6019e03186030000c180619fe619fe0c0307f9807f9fe7f9860c0307f9806003061830603ff7f9fe7f980619860c186601fe7f830619fe7f986601807f9fe618300c1fe619fe7f9800c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe7f82ff00 +7b010010fa0072060186678066019e01986018000c18061986619860c0306198061986619860c030619806003061830603ff6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007b0100 +10fa0072060186638066018e01986018000c18061986619860c0306198061986619860c030619806003061830603ff6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007b010010fa00 +72061986639866018e61986618000c18661986619860c0306198661986619860c03061986618306183061bff6198661986619860c186619866183061986619866198661986618300c18661986619860c186618306198661986618306198661830618300c1860c186619866198661986619866182ff007b010010fa00720330 
+cc618cc60186330cc330000c0cc33986339860c030618cc61986618ce0c030618cc3303033830333ff61986618cc338ce0c0ce331866183033986618ce330cc61986338300c18633986618cc0c0cc338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff007b010010fa007201e0786187 +87f9861e0781e0000c0781e9861e9860c03061878619866187a0c030618781e0301e8301e3ff61986618781e87a0c07a1e186618301e9866187a1e078619861e8300c1861e986618780c0781e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff0012010010ed00000ce6000103ffba00 +0002ff0012010010ed00000ce6000103ffba000002ff0012010010ed00000ce6000103ffba000002ff0012010010ed00000ce6000103ffba000002ff0012010010ed00000ce6000103ffba000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed +00000c9d000002ff000901001f88ff00feff0006fe008955fe0006fe0089aafe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000dfe000187ff8e000201ffc2fe000ffe0003440100f8900002011241fe000ffe000385850020900002011242fe000ffe000344c90020900002011241fe +0013fe000784690022c71c71c094000201f242fe0013fe00074441002320a28a20940002010241fe0013fe000784b1002207a0f980940002010242fe0013fe00074499002208a0804094000201fe41fe0013fe0007850d002208a28a20940002010042fe0013fe000744010022079c71c0940002010041fe000dfe000187ff +8e000201ffc2fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe0006fe0089aafe0006fe008955fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe0012fe000083fdff00f0fdff00fc95000002fe0013fe000042fd000110 +80fe00000495000001fe0013fe000082fd00011080fe00000495000002fe0013fe000042fd00011080fe00000495000001fe0013fe000082fd00011080fe00000495000002fe0014fe000a420000180010860301800495000001fe0014fe000a82000018c010860301800495000002fe0014fe000042fe0006c01086000180 
+0495000001fe0014fe000a820fb338f01087c70f9e0495000002fe0014fe000a4219b318c010866319b30495000001fe0014fe000a8219b318c010866319bf0495000002fe0014fe000a4219b318c010866319b00495000001fe0014fe000a820fb318cc10866319b30495000002fe0014fe000a42019f7e7810866fcf9e04 +95000001fe0014fe000682018000001080fe00000495000002fe0014fe000642018000001080fe00000495000001fe0013fe000082fd00011080fe00000495000002fe0013fe000042fd00011080fe00000495000001fe0012fe000083fdff00f0fdff00fc95000002fe000afe0000408b000001fe000afe0000808b000002 +fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000efe000080f10000019cff0082fe000efe000040f10000019c000081fe0014fe000080f1000001d000fdaa00a8d20000 +82fe0016fe000040f1000001d1000001fd550050d2000081fe0014fe000080f1000001d000fdaa00a8d2000082fe0016fe000040f1000001d1000001fd550050d2000081fe0014fe000080f1000001d000fdaa00a8d2000082fe0016fe000040f1000001d1000001fd550050d2000081fe0021fe000080fe0009079f80000c +f003c0000cfe000001d000fdaa00a8d2000082fe0023fe000040fe00090cc180001d980660001cfe000001d1000001fd550050d2000081fe0020fe000080fd0008c180003d800660002cfe000001d000fdaa00a8d2000082fe0022fe000040fd0008c3003c6d81e6600f0cfe000001d1000001fd550050d2000081fe0021fe +000d80000007e3830006cdf333e0198cfe000001d000fdaa00a8d2000082fe0023fe000d40000007e0c6003ecd9b00600c0cfe000001d1000001fd550050d2000081fe001afe000080fd0008c60066fd9b0060030cfe0000019c000082fe001bfe000040fe00090ccc00660d9b3663198cfe0000019cff0081fe001bfe0000 +80fe0009078c003e0cf1e3c78f3ffe0000019cff0082fe0012fe000040f7000003fc0000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0070fe000080f1000301f00002fe0006e0000001000002fe00 
+1e8000003800003e0001f00000010000e000000e000007c00007000001f0001cfe000610000001000002fe00247000000e00000380007c00007000001c000004000020000003e00007000004000020000007fe00011c82fe0070fe000040f1002f014000050000011000000280000500000140000044000008000040000002 +800110000011000001000008800000400022fe000628000002800005fe00158800001100000440001000008800002200000a000050fe001080000880000a0000500000088000002281fe0070fe000080f1000601400008800001fe001504400008800002200000400000080000400000044001fe000610000001000008fe00 +02400020fe001f4400000440000880000080000010000004000010000080000020000011000088fe000b800008000011000088000008fe00012082fe0075fe0001401ffaff00f8fa000601400008800001fe001504400008800002200000400000080000400000044001fe000610000001000008fe0002400020fe001f4400 +000440000880000080000010000004000010000080000020000011000088fe000b800008000011000088000008fe00012081fe0075fe00018010fa000008fa000601400008800001fe001504400008800002200000400000080000400000044001fe000610000001000008fe0002400020fe001f4400000440000880000080 +000010000004000010000080000020000011000088fe000b800008000011000088000008fe00012082fe0075fe00014010fa000008fa00060140000f800001fe001507c0000f800003e000004c000008000040000007c001fe000c10000001000009800000400020fe001f7c000007c0000f80000080000013000004c00010 +00009800002000001f0000f8fe001080000980001f0000f80000098000002081fe0075fe00018010fa000008fa000601400008800001fe001504400008800002200000440000080000400000044001fe000c10000001000008800000400020fe001f4400000440000880000080000011000004400010000088000020000011 +000088fe00108000088000110000880000088000002082fe0075fe00014010fa000008fa002f014000088000011000000440000880000220000044000008000040000004400110000011000001000008800000400022fe001f4400000440000880000088000011000004400010000088000022000011000088fe0010800008 
+8000110000880000088000002281fe0079fe0005801078000380fe000008fa002901400008800000e0000004400008800002200000380000080000400000044000e000000e000001000007fe000240001cfe001f440000044000088000007000000e00000380001000007000001c000011000088fe000b8000070000110000 +88000007fe00011c82fe0017fe00054010cc000180fe000008fa0000019c000081fe0017fe00058010c0000180fe000008fa0000019c000082fe0017fe00094010c0f1e18780337c08fa0000019c000081fe0017fe000980107998318cc0336608fa0000019cff0082fe0017fe000940100d81f18fc0336608fa0000019cff +0081fe0017fe000980100d83318c00336608fa0000019c000082fe0017fe00094010cd9b318cc0337c08fa0000019c000081fe0017fe0009801078f1f7e7801f6008fa0000019c000082fe0014fe00014010fb00016008fa0000019c000081fe0014fe00018010fb00016008fa0000019c000082fe0013fe00014010fa0000 +08fa0000019c000081fe0013fe00018010fa000008fa0000019c000082fe0013fe0001401ffaff00f8fa0000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0013fe0001801ff8ff00e0fc0000019c00 +0082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0019fe00074010780003800003fe000020fc0000019c000081fe0019fe00078010cc0001800003fe000020fc +0000019c000082fe0019fe00074010c00001800003fe000020fc0000019c000081fe0019fe000b8010c0f1e187801f3ccdf020fc0000019c000082fe0019fe000b40107998318cc03366cd9820fc0000019c000081fe0019fe000b80100d81f18fc03366cd9820fc0000019c000082fe0019fe000b40100d83318c003366fd +9820fc0000019c000081fe0019fe000b8010cd9b318cc03366fd9820fc0000019c000082fe0019fe000b401078f1f7e7801f3c499820fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00 
+014010f8000020fc0000019c000081fe0013fe0001801ff8ff00e0fc0000019c000082fe000efe000040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0012fe000080f1000001dc000006c2000082fe0018fe0001401ffdff0080f7000001dc00010180c3000081fe0013 +fe00018010fd000080f70000019c000082fe0017fe00014010fd000080f7000001dc000040c2000081fe001bfe00018010fd000080f7000001dc000040e800000cdc000082fe001bfe00014010fd000080f7000001dc000040e7000080dd000081fe001dfe000680100000c00080f7000001dc00018020e8000080dd000082 +fe001cfe000640100000c60080f7000001db000010e9000020dc000081fe001cfe000680100000060080f7000001db000010e9000040dc000082fe001cfe000640107d99c78080f7000001db000010e8000010dd000081fe0018fe00068010cd98c60080f7000001c1000010dd000082fe001cfe00064010cd98c60080f700 +0001dd000002e7000080dc000081fe001cfe00068010cd98c60080f7000001dd000002e7000080dc000082fe001cfe000640107d98c66080f7000001dd000002e6000004dd000081fe001efe000680100cfbf3c080f7000001dd0002020004e8000004dd000082fe001cfe000240100cfe000080f7000001db000004ea0000 +01db000081fe001cfe000280100cfe000080f7000001db000004ea000001db000082fe001bfe00014010fd000080f7000001db000004e8000001dd000081fe001ffe00018010fd000080f7000001c1000001fd000040f3000061f1000082fe0026fe0001401ffdff0080f7000001dd000008e8000002fb000030f300018080 +f40002014081fe0022fe000080f1000001dd000008e8000002fb000008f40002010080f40002041082fe001afe000040f1000001dd000008e5000080ee000040f2000081fe001afe000080f1000001dd0002080001e7000040e00002100482fe001afe000040f1000001db000001ea000004fc000002e1000081fe001cfe00 +0080f1000001db000001ea000004fc000002e30002400282fe001efe000040f1000001db000001e700042000000202f4000010f0000081fe001efe000080f1000001c000041000000401f40002100020f40002800282fe0024fe000040f1000001ee00004cf1000020e8000008fb000001f40002200020f2000081fe0028fe 
+000080f100010180ef000080f1000020e8000008fb000001f40002200010f5000301000182fe001cfe000040f100010140de000020e500010332ef000010f2000081fe0020fe000080f1000001ee000001f1000320000080e7000001e2000302000182fe0020fe000040f1000001ef0002020080ef000080eb000010fc0000 +08e1000081fe0021fe000080f1000001ef000002ed000080eb000010fc000008e4000304000082fe0021fe000040f1000001e2000040fa000080e6000380100080f5000040f0000081fe002cfe000080f100010110ee000020f800010110f9000003e7000340100080f50002800008f5000304000082fe002bfe000040f100 +010108f00002040020f2000040fd000020ed000020fa000080f50002800008f2000081fe0031fe000080f100010108f0000008f600010404fd000040fd000010ed000020fa000040f50002800004f5000308000082fe001ffe000040f100010104de000040fe000010e7000020f0000004f2000081fe0027fe000080f10000 +01ed000008f800010801fd0005800000401002e8000010e3000310000082fe0022fe000040f1000001ef0002100008ef0002400002ed000040fc000020e1000081fe002bfe000080f1000001ef000020f60002100040fb000040f70000fcf6000040fc000020e4000310000082fe0029fe000040f1000001e10002100028fd +00014040f900010102f1000308400020f6000002ef000081fe0037fe000080f100010102ee000002f8000020fe000002fc0002400080fb00010202f1000308400020f6000302000002f5000320000082fe0039fe000040f100010102f000044000020070f800040800800080fc0002800040fd00010201f6000040fa000020 +f6000304000002f2000081fe003ffe000080f100010102f0000040fe000080fa000620000202010080fa000020fd0002040080f7000080fa000020f60005040000020006f7000340000082fe0032fe000040f100010102ea000001f90002880001fd00048000000810fd0002040040f2000004f0000301002040f5000081fe +003bfe000080f1000001ed00050100010000c0fd0005400000200081fe0005208000200808fd0002080020f2000002ee00012020f8000340000082fe002ffe000040f1000001ef00078000010000c06020f400042000001010fc0002100010f7000080fc000080e1000081fe0040fe000080f1000001f500000efc000080fd 
+00012180fc000080fd000040fe000020fe000010fc0002100010f8000001fb000080fb000014eb000380000082fe003ffe000040f1000001f5000311000003fc000004f0000011f80002200008f2000302800008fd00034000001cfe000008fd00018004fe000101e0fa000081fe0057fe000080f10002010080fe00000afb +00042080000c80fe00010108fa000001fc000020fe000301000008fb0002200004f2000b018000080060000002000022fe000010fe0007808002003c000218fd000380000082fe0053fe000040f10002010080fd000080fc0005204000306001fe000088f4000002fb0002080002fd0002400004f8000002f9000008fd0007 +8000004100004010fe000080fe000342000406fe000080fe000081fe005bfe000080f100070100800e00002020fc0005404000401001fe000010fe000004fe000002fc000024f9000002fd0002400002f8000002f9000c0802060000010000808000b010fe000080fe00074100080100000c41fe000082fe0051fe000040f1 +00070100400980008010fc00044020004008f9000002f8000004fe000002fe00018002fd0002800001f2000001fe000a0201000200000100400108fd00078200010080801001fa000081fe005afe000080f100040100001040f900048010008004fd000040fe000002fe000002fc000014fe0005140000048001fe000001fe +0000c0f3000001fc0008800000400100200208fd0002020000fe80052000c0000009fe000082fe0057fe000040f10012010000202002000800000c0000800801000402fe000040fe000002f400040800000480fd000001fe00013ffcfa000004fc000001fb00070400000200200204f900078040400020004004fe000081fe +0056fe000080f100040100004010fc0008130001000402000202f6000004fc000008fe000308000001fc000002fd000003fa000004fc000002fe000008fd0005200400100404fa0008010040800020008002fe000082fe004efe000040f10011010000400804000400002080010002040002fd000040f0000008f9000002fc +0000c0f5000f02800004100020080000040010080220fe0008080000810021000010fb000081fe005cfe000080f100040100208008fc00074040020001180002fd000090fa000008fc000004fe000308000001fe0002540004fc000030f5000f02800002000010000020080010080120fe000b48000042001e000008000003 
+fe000082fe005dfe000040f10012010020800408000100018040020000e0000104fe000090fb000007fb000010fb000601000041000008fc000008fc00014008f9000002fe0008200000080008100140fe000340000002fd000308010001fe000081fe005afe000080f100040100110004fd000302002004fd000401040000 +01fc0003400018c8fc000012f8000340010008fc000008fd0002011010f900010220fd0006101000081000c0fe000340000004fd000304020002fe000082fe005cfe000040f1000c01001100021000008004002004fd000001f8000340002020fc000020fe000910000002000024000010fc000004fd00010404fa000e4000 +00400004800000100004600040fe000350000024fd000002fb000081fe0061fe000080f100040100020001fd000318001008fd000001fd000004fd000320004030fc000d2102000014000004800020004020fc000502000e000010f9000040fd000906000008200003800040fe000310000018fd00070200000480000082fe +005bfe000040f1000c01000200009000004020000808fc000090fe000004fd000320018010fb000c08800004000004800010000020fc00070100118000400120fc000008f8000040fd000020fb000008fd00070104000040000081fe005afe000080f10005010004000080fe000320000410fc000090f800020e0010fb0006 +a0200004000004fd00011040fb000680204001000020fc000008fe000680000800000880fd000020fb000008fd000301080008fe000082fe005afe000040f1000c010004000060000020c0000220fc000340000004fb0002f00008f8000022fc000320000480fb000640c02004000080fc000610200001000002fe000080fe +00010120fe000340000018fd000001fb000081fe005cfe000080f1000c01000c000020000003000001c0fc00044000000802fd000303000028fa0002100040fe00006afd000080fb0002230020f8000910100001000011000005fd00010210fe000360000014fc000685000810000082fe0059fe000040f100080100080000 +60000014fd000018fd00046000001002fd000304000008fc000080fa0004c000880001fa00061c001010000040f9000001fc000002fd00010410fe0003202a8020fc000690000010000081fe005cfe000080f1000801000c00001000000cfd00018180fe000c6000001000000140000c000088fc0002800008fb0003820800 
+01f8000010fe000040f9000601000020000004fd00010408fe000310800020fc0002b04010fe000082fe005cfe000040f1000801000c000090000011fe000001fc000020fc000604100008000004fd000001fd000980000021002804000280f900040820000040fb0002280002fe0002800008fc000008fe000390002042fc +000040fc000081fe005cfe000080f10008010010000008000010fc000010fe00001efd001a8000040014000104000003c002000004009f800020100004000420f9000004f700008afd000380400012fc000004fe000382000841fc000660102004000082fe005dfe000040f1000901001000010800002040fe00050a080000 +0110fd000b401001002400000200000420fc00046140004410fe00010408f900040280000120fc000380000002fc000060fc000002fb000080fc000660000004000081fe0066fe000080f10008010020000008000040fe0008080001000001081802fe000a40005040001b0200000810fe00050200813c0080fd000008f800 +0302000001fe000c7800010100800a800100000181fd00012001fe000308000280fc0006e0044000007082fe0067fe000040f1001501004000010400008010000010208080000008061d40fd0008048001808100001008fd00040180838010fd00011002f9000003fe001d1006008407860402002440010010020001e00000 +4000c000012000010080fd00011001fe00018c81fe0064fe000080f100070100410000040001fa001d014000050000100019000001000804110000100400004001010000604004fe000010f700208000080081808380055002401e60000010020080015000800020000104000280bcfd000690000002010282fe006bfe0000 +40f1001501208100020200020004014000404004101904400404fd00170240101008800020026001b0000100001d00040001002001fc00220400000440001e090001070011fc0000e1200200000c0000040001000010000087fe0cfc00070208000001020281fe006efe000080f1002a01cd00800001000403870410400000 +18040212101000000200000c11804000400061c180060c0082000017fe00040100400004fd000051fe00152000a1820002008006038010912807c0003000200008fe000628000a0e01f040fd00010308fe0002040182fe0070fe000040f1002b0102c8a0040080040401d386810010100086020540010001001ff001850000 
+40018020800803000c000060c0fe0021818001c300000755004000101009c044000480405800610590c01c3004c000001102fe00088680a4320000042020fe00060401e000040c81fe0070fe000080f100660102860800004008000004058006004181004101f0000fe881e00005500001a006000040100099d00040003001 +038082000200800091800010804009938030000b5520600018830022600c0700003040c080000205ea084000001810183d403404021800283082fe0070fe000040f10066013c28c20887a0300000cc05048184079c0100860e00b03e1200001820000011e80018202001c0200029000f00bc78020004100803401f87870780 +9f86078d103000518000060c00119003fde000c10021ea0008086c018000002005441038003a0c0700100081fe000efe000080f10000019cff0082fe000efe000040f10000019cff0081fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b00 +0002fe000afe0000408b000001fe000efe000080f10000019cff0082fe000efe000040f10000019c000081fe0014fe000080f1000001e000fbaa00a0c4000082fe0016fe000040f1000001e1000001fb550040c4000081fe0014fe000080f1000001e000fbaa00a0c4000082fe0016fe000040f1000001e1000001fb550040 +c4000081fe0014fe000080f1000001e000fbaa00a0c4000082fe0016fe000040f1000001e1000001fb550040c4000081fe0021fe000d80000001878f0000fc600060000cfe000001e000fbaa00a0c4000082fe0023fe000d400000038cd98000c0e000e0001cfe000001e1000001fb550040c4000081fe0021fe000d800000 +058cc18000c16001e0002cfe000001e000fbaa00a0c4000082fe0023fe000d4000000180c1803cf861e3600f0cfe000001e1000001fb550040c4000081fe0021fe000d800003f183870006cc633660198cfe000001e000fbaa00a0c4000082fe0023fe000d400003f18601803e0c6306600c0cfe000001e1000001fb550040 +c4000081fe001bfe000d800000018c0180660c6307e0030cfe0000019c000082fe001bfe000d400000018c198066cc633063198cfe0000019cff0081fe001bfe000d80000007efcf003e79f9e0678f3ffe0000019cff0082fe0012fe000040f7000003fc0000019c000081fe000efe000080f10000019c000082fe000efe00 
+0040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe006ffe000080f1000d01007c00001f0000200001c00010fe0017040000100000e0001f000007c002000001c00000e000007cfd001d3e00003800000400008000400080007000001c0000070003e0001c00001cfe0018 +40004000007c000e0000200000080001c000000e00000e0082fe006ffe000040f1000d0100100000040000500002200028fe00170a0000280001100004000001000500000220000110000010fd001d0800004400000a00014000a0014000880000220000088000800022000022fe0018a000a0000010001100005000001400 +02200000110000110081fe006ffe000080f1000d0100100000040000880002000044fe00131100004400010000040000010008800002000001fe000010fd003008000040000011000220011002200080000020000008000080002000002000000110011000001000100000880000220002fe0005100000100082fe0074fe00 +01401ffaff00f8fa000d0100100000040000880002000044fe00131100004400010000040000010008800002000001fe000010fd003008000040000011000220011002200080000020000008000080002000002000000110011000001000100000880000220002fe0005100000100081fe0075fe00018010fa000008fa0024 +0100100000040000880002000044001f001100004400010000040000010008800002000001fe0035100000f80008000040000011000220011002200080000020000008000080002000002000000110011000001000100000880000220002fe0005100000100082fe0074fe00014010fa000008fa000d0100100000040000f8 +000200007cfe00131f00007c0001300004000001000f800002000001fe000010fd00390800004000001f0003e001f003e000800000260000098000800026000020000001f001f000001000130000f800003e0002600000100000100081fe0074fe00018010fa000008fa000d0100100000040000880002000044fe00131100 +004400011000040000010008800002000001fe000010fd003908000040000011000220011002200080000022000008800080002200002000000110011000001000110000880000220002200000100000100082fe0074fe00014010fa000008fa000d0100100000040000880002200044fe0017110000440001100004000001 
+000880000220000110000010fd003908000044000011000220011002200088000022000008800080002200002200000110011000001000110000880000220002200000110000110081fe0078fe0005801078000380fe000008fa000d0100100000040000880001c00044fe0017110000440000e000040000010008800001c0 +0000e0000010fd00390800003800001100022001100220007000001c000007000080001c00001c000001100110000010000e0000880000220001c000000e00000e0082fe0017fe00054010cc000180fe000008fa0000019c000081fe0017fe00058010c0000180fe000008fa0000019c000082fe0017fe00094010c0f1e187 +80337c08fa0000019c000081fe0017fe000980107998318cc0336608fa0000019cff0082fe0017fe000940100d81f18fc0336608fa0000019cff0081fe0017fe000980100d83318c00336608fa0000019c000082fe0017fe00094010cd9b318cc0337c08fa0000019c000081fe0017fe0009801078f1f7e7801f6008fa0000 +019c000082fe0014fe00014010fb00016008fa0000019c000081fe0014fe00018010fb00016008fa0000019c000082fe0013fe00014010fa000008fa0000019c000081fe0013fe00018010fa000008fa0000019c000082fe0013fe0001401ffaff00f8fa0000019c000081fe000efe000080f10000019c000082fe000efe00 +0040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0013fe0001801ff8ff00e0fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe000180 +10f8000020fc0000019c000082fe0019fe00074010780003800003fe000020fc0000019c000081fe0019fe00078010cc0001800003fe000020fc0000019c000082fe0019fe00074010c00001800003fe000020fc0000019c000081fe0019fe000b8010c0f1e187801f3ccdf020fc0000019c000082fe0019fe000b40107998 +318cc03366cd9820fc0000019c000081fe0019fe000b80100d81f18fc03366cd9820fc0000019c000082fe0019fe000b40100d83318c003366fd9820fc0000019c000081fe0019fe000b8010cd9b318cc03366fd9820fc0000019c000082fe0019fe000b401078f1f7e7801f3c499820fc0000019c000081fe0013fe000180 
+10f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0017fe00018010f8000020fc000001c3000006db000082fe0013fe00014010f8000020fc0000019c000081fe001bfe0001801ff8ff00e0fc000001c3000010e0000040fd000082fe0017fe000040f1000001c300012040e1000090fd00 +0081fe0016fe000080f1000001c2000020e1000004fd000082fe0012fe000040f1000001a2000002fc000081fe0016fe000080f1000001c3000040e0000002fd000082fe001cfe0001401ffdff0080f7000001c300018010e2000004fc000081fe001ffe00018010fd000080f7000001da000080ea000008e1000001fd0000 +82fe001bfe00014010fd000080f7000001db000001c9000004fc000081fe001ffe00018010fd000080f7000001db000001ea000080e0000001fd000082fe0022fe00014010fd000080f7000001db00010208ec0002010004e2000008fc000081fe0020fe000680100000c00080f7000001da000004ea000002e0000080fe00 +0082fe001cfe000640100000c60080f7000001da000004ca000008fc000081fe0020fe000680100000060080f7000001da000004ec000002de000080fe000082fe001efe000640107d99c78080f7000001c40002020001e2000010fc000081fe0020fe00068010cd98c60080f7000001db000008e9000001e0000080fe0000 +82fe001cfe00064010cd98c60080f7000001db000010c9000010fc000081fe0020fe00068010cd98c60080f7000001db000010eb000002de000040fe000082fe0024fe000640107d98c66080f7000001db00011001ec000304000080e3000020fc000081fe0024fe000680100cfbf3c080f7000001e0000080fc000001e900 +0040e1000040fe000082fe0021fe000240100cfe000080f7000001e100010220fc000001ca000020fc000081fe0020fe000280100cfe000080f7000001da000001ec000004de000020fe000082fe0023fe00014010fd000080f7000001e100010408e6000304000020e3000020fc000081fe001ffe00018010fd000080f700 +0001db000040e8000020e1000020fe000082fe0020fe0001401ffdff0080f7000001e100010804fd000040c9000040fc000081fe001efe000080f1000001db000080eb000008ed000080f3000020fe000082fe002cfe000040f1000001fd000060e600011002fd0002800080ed000308000010f1000001f4000040fc000081 
+fe002dfe000080f1000001fd000018e40002800280fe000080ea00040800000380f500010208f3000010fe000082fe002bfe000040f1000001eb000080f800042000000420fe000080e7000004f400010408f5000040fc000081fe002bfe000080f1000001fe000008ef000040f600014010fd000040ed000010ed000004f3 +000010fe000082fe0029fe000040f1000001fe000010e500042000100010e9000310000002f0000004f5000080fc000081fe0026fe000080f1000001fe000020f000010408f6000305400001e7000001e1000010fe000082fe002cfe000040f1000001fe00014001f100010404f8000020fe00011002e6000320000010e700 +0080fc000081fe0029fe000080f1000001fc000080e2000002ea000010fe000310001010f5000020f2000008fe000082fe002ffe000040f1000001fc000080e7000040fe00040802000020ed000020fc00012010f5000020f5000001fb000081fe002bfe000080f1000001fc000080f200011002f0000020e9000302002010 +f500014002f3000008fe000082fe002efe000040f1000001ec00012002f8000040fe000004fe000020e90002020020f400014002f6000001fb000081fe0021fe000080f1000301000004dc000020ed000020ed000001f3000004fe000082fe0029fe000040f1000301000008e4000080fe000004fd000003ee000020ed0000 +01f6000001fb000081fe0027fe000080f1000301000030ef00014001f3000004fe00010820ea000080e3000004fe000082fe0032fe000040f1000201001cfe000020f20002400080fa000001fd00010204fe00011010ea0002800004e8000001fb000081fe002dfe000080f10002010010fe000020e2000008fb000010f100 +0040fc00018004f6000001f1000004fe000082fe0033fe000040f1000001fc000010e8000001fd0008020800001000040028f1000040fc00018004f6000002f4000002fb000081fe0031fe000080f1000001fc000010f20002800080f100041040040040ec0002410004f6000302000080f4000004fe000082fe0031fe0000 +40f1000001ed000301000040fa000001fd000002fe00011080e9000041f4000302000080f7000002fb000081fe001efe000080f1000001d9000010ed000080ec000080f4000002fe000082fe002efe000040f10002010080e4000002fd000001fc0002010001fc000004f7000080ec000040f7000002fb000081fe0032fe00 
+0080f100010101ee000301000040f40008100000010001000080fd00000bf2000020f0000003f5000002fe000082fe003afe000040f100010101fd000008f3000302000020fa000002fd000901100000010000020080fd00011080f30002100002e8000002fb000081fe003dfe000080f100010101fd000004ef000002f500 +0010fc0002020080fd00012040f8000080fd0002020002f6000008fe00011020f600040200080082fe0042fe000040f1000001fc000004ef000001fb000004fc0006a0000004000084fb00012020f9000001fc0002020002f6000008fe00012010f9000004fd0002200081fe0042fe000080f1000001fc000004f700000efe +00070200001000c003c0f5000306000084fb00014010f30002120001fd000008fb000310000020f400040100020082fe0046fe000040f1000001f7000040fc000009fe0007040000104000c002fe000008fc000340000006f800018010f3000014fb000002fb000310000020f7000004fd0002400081fe0042fe000080f100 +0001f7000010fc000310800070fd000340030001f5000004fb00042a00008008f9000001f5000020fc000060fe0002204004f600040100010082fe004efe000040f100010104f900010104fc000320400088fd000080fe000380000008fc000040fc0008400020008000010004f9000001f5000001fc000090fe0002204004 +f9000008fd0002800081fe0049fe000080f100010108f20008202000840800000880fe000040f9000f4000000400002000200000800100020ef4000008fe00030c000040fc000090f8000103e0fa000380010082fe0057fe000040f100010108fd000002fe00010201fc00074020010208000008fa000008fc000f60000004 +000020001001000002000231f400040800008040fe000080fe00010108f800010c10fe000008fe000301000081fe0051fe000080f100010108fd000002fa000670000040100202f0000040fc000a200010000040020001c0c0fb000002fc00070800008040c00080fd0002020840fe00018001fe0001700cfa000340008082 +fe005efe000040f1000001fc000001fe000a0400400188000080080401f6000010fc000ca0000002000060000002000004fe000040fb000002fc000310000080fd000040fe0010020440000001000100c000800200026008fe000302000081fe0056fe000080f1000001fc000001fb000b020600008008040110000008f100 
+030a000050fd00011004fe000020fc000050fb000614000080000001fe00100180040440000010000001300080020004fc000340004082fe005cfe000040f1000001f8000e080020020100010004080090000004fd000308000010fc000310000009fc000304000008fe000020f5001c140000010008000020000240040240 +0000100000020803000100000410fe000304000081fe0053fe000080f1000001f5000b0400c0010004100080000004fd000008f6000001fa00010808fe000010fd0002010804f9000301000402fe00030c200802fe000a1200008204040001000002fd000310002082fe0057fe000040f100010140f9000e10000808002002 +0002200080000008fd000304000020fc000008fd00077810000804000010fe000010fb000004f40005200030100802fe000b120000840218000080100010fb000081fe0057fe000080f100010140f6000b100010020001c0006000000afd0001041cfb000001fe000910000008000400000220fe000008fd00010202fb0000 +02fc00070400001fc0081002fb00060801e000008010fa00012082fe005afe000040f100010180fc00098000001000041000080cfd000340000002fc0002620040fd0005020400001004fe00040408000020fe000008f5000d0200004400020000102000081001fb000010fd000340000090fb000081fe005dfe000080f100 +010180fc000040fc000320000410fd000040f900018180fc000002fd000008fe000402000001c0fe000004fd0002020108fc000d4000004400010400004000042001fe000304000030fd000340000080fc00011082fe005efe000040f1000001fb000941c000200002200003e0fd000040f90002804080fd00010204fe0007 +3000040000100001fd000304000001fe000008fc000340000040fd000c08800002200180000004000020fd000340400020fb000081fe005afe000080f1000001fb00014220fd000040fa000060fe000080fe0002010020f8000320000004fd00010e40fe000704000006c0040080fc0014810000400000080000800001c002 +80000008000020fd00014040fa00010982fe005bfe000040f1000001fb000604180040000140fa0004a000002080fe0002030011fb000302000020fc0002200010fd00050201f0082004fa0008810000080000800009fd00070285000008000040fd000320000020fc00010181fe0052fe000080f1000001fb00010804fd00 
+0080fd000608000010000020fd0002030008f000012020fe00070202081010000020f90005080000900001fc0006d0400008000090fd000320000020fc00010682fe0058fe000040f100010180fc000608040080000080fd000602000010000040fd0002050006fb000001fc0005020001200040fd0007010206e010080020 +f4000006fd0007014000000c000090fd000310800040fc00010281fe005dfe000080f100010180fc00011003fe000001fc00072000011000004040fe0002048002fc000004fe000940000001000080004010fd00068c010008000010fb000080fd0002100004fd0002044020fe000001fc000011f900010282fe005cfe0000 +40f100010140fc000610008080000140fd000301000108fe000040fe0002040006fc00040800800040fc0002c00080fc00049000000810f900074000100000200006fd00011040fd000003fc000310000048fc00010481fe005afe000080f100010140fc0005300080000002fc000340000008fa0002080001fc000008f800 +0340010008fd000660000004000044fd000004fe0005100000200008fd00070820100010000408fd000310000008fc00010582fe005efe000040f100010140fc0008300041000002200001fe0002400008fc0004e000080009fe0004c000100040fc000380004002f8000304100080fd000008fe000010fd000009fd000748 +20000010000408fd00030a000080fc00010881fe005ffe000080f100010140fc000e480020000004000007400080000404fe000a2000940010000080000130fc000080fe0005800000020004fa000302000002fd000910202000100000400010fd000610100800020008fc00000af900010882fe0060fe000040f100010120 +fc001c4000120000042000040f80002004040002001001080010202040000208fe00021000e0fc00018004f800010220fb000a1008200040000010002088fe0006901000000a0010fc000308000104fd0002c01081fe0062fe000080f100010120fc000a8000100000080000081819fe000e0200020000024a002020004000 +0208fd00010110fb0002040002fa000301000601fc00098200007000008800c022fd000610040022002004fd000304000002fe000301201082fe0064fe000040f100010120fc000680000e00000810fe00120600100002000c005004040020208020000404fe0002080210fe000340011008f90002028140fa0003f0000048 
+fe000d0700008000020008000022002002fd000a0c000200f8000002182081fe006afe000080f100010110fd002301000001c000100000208803000010020301f8080484804092001000040400800000050cfe000620001010000080fc0011082080200080003800030910008600010038fd00070200080400808040fc000a +0a00080104000004044082fe006afe000040f100010110fe002b3d020300083000100800010000800410010e8e07040802004228000800080200800004060200e00000020990f9001820088040180000c6010204480185000005c0000040001c0008fd0002800020fe000a0200a00104000008038081fe006ffe000080f100 +240108000400c702064000080020000040040410000001003001810a02408800000800100103fe00210802013a001c0fdc300000400a000a003800024200e72001010206020002848002fefd000a0610000402024021000188fe000a0102000683000008000082fe006ffe000040f10023010400114110cc0100110f002002 +01980000080280008810044010010080000006001001fe002101100bae848073f4a0680078000080388040c0003c0100800100c40c012004024007fd000b1008000004cc008002000302fe000a23a8800800800010000181fe0070fe000080f10048011500400200701118420a08c000060002080801aa8160680040e400a1 +20000001006003c00000cc240088022088091c800800602000c070002001900607f057552410010408022018fe000c0e00114000020084008c0009e0fe000a688371c800800321c00082fe0070fe000040f10055010251001dd90441858d919f8001098000a007980026a38008302800ce800fcc03c0cc8043fc1e00e8a070 +010301980301f00018802000080b99980ff800150800505000ca12a01620009000101e140000050090004afe000d81e000805cec31207f1070200081fe000efe000080f10000019cff0082fe000efe000040f10000019cff0081fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe +0000408b000001fe000afe0000808b000002fe0006fe008955fe0006fe0089aafe000283000283000283000afe000002b5aa00a8d4000afe000005b5550050d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b500 
+0010d4000afe000002b5000008d4000afe000004b5000010d40015fe00020200f0fe00053000cc600060c0000008d40015fe0002040198fe00053000cc600060c0000010d40015fe0002020180fe00053000cc000060c0000008d40017fe000d040181e3cf8f3e00cce3e3e79980c2000010d40017fe000d0200f3306cd9b3 +00cc63366cd980c2000008d40017fe000d04001bf3ec183300fc63366cd980c2000010d40017fe000d02001b066c183300fc63366cdf80c2000008d40017fe000d04019b366c19b300cc63366cdf80c2000010d40016fe000c0200f1e3ec0f330085fb33e789c1000008d4000afe000004b5000010d4000afe000002b50000 +08d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d40014fe000004f600007ff9ff00c3f9 +ffd2000010d40014fe000002f6000040f9000043f9ffd2000008d40014fe000004f6000040f9000043f9ffd2000010d40014fe000002f6000040f9000043f9ffd2000008d4001cfe000304001f0cfe00010180fe000040f9000043f9ffd2000010d40025fe00080200198c0000030180fe000343000018fe0003180043f8fc +ff019fffd2000008d40026fe000604001980000003fe00050c0043000018fe0004180043f27ffdff019fffd2000010d40025fe000f0200199c7c78f3c3879f1e0043000018fe0003180043f3fcff019fffd2000008d40027fe001d0400198c66cd9b018cd98c0043e3c799b33cf8f9e043f3e183330c1c187fd2000010d400 +27fe001d0200198c60fd83018cd9800043306cdb3306cd9b3043e1cc9933e4c9933fd2000008d40027fe001d0400198c60c183018cd980004333ec1e333ec1998043f3cc9f3304f999ffd2000010d40027fe001d0200198c60cd9b318cd98c0043366c1e3f66c1986043f3cc9f0264f99e7fd2000008d40027fe001d04001f +3f6078f1e7e7999e0043366cdb3f66c19b3043f3cc9f0264f9933fd2000010d40021fe000002f800130c0043e3e799923ec0f9e043f3e19fb704fc187fd2000008d40014fe000004f6000040f9000043f9ffd2000010d40014fe000002f6000040f9000043f9ffd2000008d40014fe000004f6000040f9000043f9ffd20000 
+10d40014fe000002f6000040f9000043f9ffd2000008d40014fe000004f600007ff9ff00c3f9ffd2000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004 +b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4002afe000002f600007ffaff00e1f6ff01f87ffaff00e1f8ff01fe1ffaff01f87ffbff00f0faff01e008d4002bfe000004f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00 +011080fb00012010d4002bfe000002f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012008d4002bfe000004f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012010d40033fe000202000ffe000203000cfe000040fa000021f600010840fa0000 +21f8ff01fe10fa00010840fb00011080fb00012008d4004ffe000804001980000003000cfe001943e0000600180000210f8000063000000cc000000843f000003ffe000221f87ffdff05fe7ffffe1078fb00110843e000181c00001083c0001c1800002010d40050fe0002020018fe001f03000c000c004330000630180000 +210cc000063000000cc000000840c000000cfe000c21f33fffff3ffcfe7ffffe10ccfb001108433000180c0000108660000c18c0002008d40050fe00100400181e3cf8f3e00f999e004330000030fe0004210cc00006fe00090ec000000840c000000cfe000621f33fffff3ffcfeff02fe10c0fb001108433000180c000010 +8660000c00c0002010d40053fe004d02000f3306cd9b300cd98c004333c78e3c3879f0210ccf1e3e71f1d00ecf363c0840c3c7400c66f8f021f320c1c30f0c3c7860fe10c0f1f6679f1e3c084337c79f0c3cd810866ccf0c38f1982008d40053fe004d040001bf3ec183300cd9800043e66cc63018cd98210f998366319b30 +0fc1bf660840c06cc00c66cd9821f0264c993fe4fe73267e10799b366cd9b3660843e66cd98c66fc10866cc18c18c1982010d40053fe004d020001b066c183300cd98000430666063018cd98210f1f9f66319b300dcfbf7e0840c3ecc00c66cdf821f3264c993f04fe73267e100dfb366fd9b07e0843060cd98c7efc10866c 
+cf8c18c1982008d40053fe004d040019b366c19b300ccf8c00430661863018cd98210d9833663199e00dd9b3600840c667800c66cd8021f3264c993e64fe73267e100d83366c19b0600843060cd98c60cc10876cd98c18c1982010d40053fe004d02000f1e3ec0f3300f819e0043066cc63318cd98210cd9b366319b000cd9 +b3660840c66c000c3ef99821f3264c993264ce73267e10cd99f66cd9b3660843060cd98c66cc1086ecd98c18ccf82008d4004efe000004f90044198c004303c79f9e7e7998210c4f1f3efd99e00ccfb33c0840c3e7800c06c0f021f3264cc387061818667e1078f033e7999e3c084306079f3f3ccc1083c7cfbf7e78182010 +d4003dfe000002f900030f000040fa000021fc00010330fd00090840000cc00066c00021f8ff04fe10000030fd00010840fb0002108060fe000301982008d40038fe000004f6000040fa000021fc000101e0fd00090840000780003cc00021f8ff04fe10000030fd00010840fb00011080fc0002f02010d4002bfe000002f6 +000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012008d4002bfe000004f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012010d4002afe000002f600007ffaff00e1f6ff01f87ffaff00e1f8ff01fe1ffaff01f87ffbff00f0faff01e008d4000afe +000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b50000 +08d40015fe000004f600007ff6ff01fe1ff5ffda000010d40017fe000002f6000040f600010210f6000001da000008d40017fe000004f6000040f600010210f6000001da000010d40017fe000002f6000040f600010210f6000001da000008d4001efe000004fd000301980380fe000040f600010210f6000001da000010d4 +001ffe000002fd000301980180fe000040f600020210fcf7000001da000008d40020fe000004fd000701980180000c0040f60002021030f800010c01da000010d40022fe000002fc000691e18ccf1e0040f60005021030000003fb00010c01da000008d40025fe000004fc000690318cd98c0040f6000d0210319be3c7801e 
+3cd9b1e7cf01da000010d40025fe000002fc0006f1f18cdf800040f6000d0210319b3663003366fdfb366c01da000008d40025fe000004fc000663318cd8000040f6000d0210319b37e0003066fdfbf66c01da000010d40025fe000002fc000663318cd98c0040f6000d0210319b3600003066cd9b066c01da000008d40025 +fe000004fc000661f7e7cf1e0040f6000d021030fbe663003366cd9b366cc1da000010d40021fe000002f800020c0040f6000d0210301b03c7801e3ccd99e66781da000008d4001bfe000004f6000058f600050210019b0003fa000001da000010d40019fe000002f600007cf60003021000f3f8000001da000008d40017fe +000004f6000066f600010210f6000001da000010d40017fe000002f6000040f600010210f6000001da000008d40015fe000004f600007ff6ff01fe1ff5ffda000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe0000 +04b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d40012fe00010201fbff03e1fffffec0000008d40012fe00010401fb000321000002c0000010d40012fe00010201fb000321000002c00000 +08d40012fe00010401fb000321000002c0000010d40012fe00010201fb000321000002c0000008d40014fe00010401fd0005018021001802c0000010d40014fe00010201fd0005018021001802c0000008d40014fe00010401fd0005018021001802c0000010d40015fe000b0201078f1e7c79f021079982c0000008d40015 +fe000b04010cd98366cd98210cdb02c0000010d40015fe000b0201061f9f60c198210cde02c0000008d40015fe000b040101983360c198210cde02c0000010d40015fe000b02010cd9b360cd98210cdb02c0000008d40015fe000b0401078f1f60799821079982c0000010d40012fe00010201fb000321000002c0000008d4 +0012fe00010401fb000321000002c0000010d40012fe00010201fb000321000002c0000008d40012fe00010401fb000321000002c0000010d40012fe00010201fbff03e1fffffec0000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5 +000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000005b5550050d4000afe000002b5aa00a8d400028300028300028300028300028300028300028300028300028300028300028300a00083ff}}\par 
+\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.3\tab A typical display from the contig editor in XBAP\par +\pard\plain \s4\qj\sb160\sa120\sl280 \f20 The four scroll buttons operate as follows\:\par +\pard \s4\qj\li1720\sa120\sl280\tx4520 "<<"\tab Scroll left half a screenful\par +"<"\tab Scroll left one character\par +">"\tab Scroll right one character\par +">>"\tab Scroll right half a screenful\par +\pard \s4\qj\sa120\sl280 +The Editor cursor can be positioned anywhere in the edit window by moving the mouse pointer over the character of interest, then pressing the left mouse button. The Editor cursor can also be moved by using the direction arrow keys.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.2\tab Editing operations \par +\pard\plain \s4\qj\sa120\sl280 \f20 The editor operates in two main edit modes - Replace + and Insert. Replace allows a character to be replaced by another. Insert allows characters to be inserted into a reading. Characters are entered by typing them from the keyboard. Only valid characters are permitted. Characters can be deleted by positionin +g the cursor one character to their right, then pressing the delete key. Normally Insert and Delete apply to the consensus line of the contig only. This restraint can be overridden by using the "Super Edit" mode of operation, though it should be employed w +ith caution as misuse may corrupt alignments.\par + +Edits can also be performed on the consensus, though they are restricted to insertion and deletion of padding characters ("*"). These edits also have special meanings. A deletion will delete all characters at the position to the left of the cursor in the c +ontig, and move the relative positions of all sequences starting to the right of the cursor position left one character. 
An insertion will insert the character typed ("*") into all gel reading sequences at the +cursor's position in the contig, and move the relative positions of all sequences starting to the right of the cursor position right one character.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.3\tab Use of buttons \par +\pard\plain \s4\qj\sa120\sl280 \f20 The effect of the last edit can be undone by pressing the "Undo" button at the top of the editor window. Pressing it n times will undo the last n edits.\par +\pard \s4\qj\sa120\sl280 The cursor will automatically be positioned at the next problem when the "Find Next Problem" button is selected. The next problem is where the consensus shows either a disagreement ("-") or a pad ("*") character.\par +\pard \s4\qj\sa120\sl280 The edits to the contig can be saved by pressing the "Leave Editor" button and replying "Yes" to the prompt to "Save changes?".\par +As no changes are made to the working copy of the database until this point, it is possible to abort the editor if the edit session ends up in an unsatisfactory state.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.4\tab Displaying traces for readings from fluorescent sequencing machines\par +\pard\plain \s4\qj\sa120\sl280 \f20 The original trace data from which the gel reading sequences were derived can be seen by double clicking (two quick clic +ks) with the middle mouse button on the area of interest. The trace will be displayed with the point clicked at the centre of the trace viewport. All traces that are displayed are maintained in one window, which will display a maximum of four traces. When +four traces are already being displayed and a new one is requested, the one at the top of the window is removed and the new one is added to the bottom. Traces can be removed individually by using the "quit" button in the panel next to the trace. 
\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.5\tab Extending reads with the unused data\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Sequence data from fluorescent sequencing machines is normally clipped to remove the primer region and the poor quality data from the 3' end is marked to be ignored during assembly. Only the sequence used during assembly is made visible in the XBAP editor. + However the unused data is copied into the database and can be viewed from within the editor. Also the position of this "cutoff" can be altered. To display the unused sequences, press the "Display Cutoff" button at the to +p of the editor window. The cutoff sequence appears in grey. This sequence can be incorporated into the editable sequence, by moving the cutoff position. This is done by positioning the cursor at the end of the sequence, and using Meta-Left-Arrow and Meta- +Right-Arrow to adjust the point of cutoff. The Meta key is a diamond on the Sun keyboard.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.6\tab Using the pop-up menu\par +\pard\plain \s4\qj\sa120\sl280 \f20 A pop-up menu is revealed by depressing the "Control" key on the keyboard and at the same time pressing the left mouse button.\par +\pard \s4\qj\sa120\sl280 The menu has the following functions\:\par +\pard\plain \li1880\sl220 \f4\fs16 Find Next Problem\par +Highlight Disagreements\par +Save Contig\par +Create Tag\par +Edit Tag\par +Delete Tag\par +Search\par +Select Oligo\par +\pard\plain \s4\qj\sa120\sl280 \f20 \par +\pard \s4\qj\sa120\sl280 "Find Next Problem" and "Save Contig" are described above. 
Operations on tags are described in the section on annotation below, and then searching is outlined.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.7\tab Annotating readings\par +\pard\plain \s4\qj\sa120\sl280 \f20 Parts of a sequence can be annotated to record the positions of primers used for walking, or to mark sites, such as compressions, that have caused problems during sequencing. The annotations ar +e termed "tags". Each tag has a type such as "primer", a position, a length and a comment. Each type has an associated colour that will be shown on the display. First the segment to tag is selected, then it is annotated. The consensus sequence cannot be a +nnotated.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.8\tab Creating a new annotation\par +\pard\plain \s4\qj\sa120\sl280 \f20 Use the left mouse button to position the start of the selection. While this button is being held down, move the mouse to the other end of the segment. The selection can be extended further using the right mouse bu +tton. To create the annotation, invoke the pop-up menu, and select the "Create Tag" function. A small "tag editor" will appear which allows users to select the type of the annotation from a pull-down menu, and specify a comment if desired. To select a new +type pull down the Type menu, and select the entry desired. To enter a comment, simply type into the text window in the tag editor. The annotation is created when the "Leave" button on the tag editor is pressed, and is displayed in the colour defined in th +e tag database file (TAGDB).\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.9\tab Editing an existing annotation\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Position the cursor with the left mouse button on the tag, and select the "Edit Tag" off the pop-up menu. This invokes the tag editor, and changes to the type and comment of the annotation can be made. The tag is updated when the "Leave" button is pressed. 
+\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1180 \b\f20 2.6.10\tab Deleting an annotation\par +\pard\plain \s4\qj\sa120\sl280 \f20 To delete an existing annotation, position the cursor with the left mouse button on the tag, and select the "Delete Tag" off the pop-up menu.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1160 \b\f20 2.6.11\tab Searching\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Selecting "Search" brings up a window which can remain present during normal editor operation. The window allows the user to select the direction of search, the type of search and a value to search on. The value is entered into a value text window, then pr +essing the "search" button performs the search. If successful, the cursor is positioned accordingly. An audible tone indicates failure. Pressing the "ok" button removes the search window. The search window is automatically removed when the contig editor is + exited. There are seven different search modes.\par +\pard\plain \s9\fi-560\li1120\sb180\sa60\sl280\tx1700 \b\f20 2.6.11.1\tab Search by position\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This positions the cursor at the numeric position specified in the value text window. Eg a value of "1234" causes the cursor to be placed at base number 1234 in the contig. Positioning within a reading is achieved by prefixing the number with the "@" char +acter, eg "@123" positions the cursor at base 123 of the sequence in which the cursor lies. Relative positions can be specified by prefixing the number with a plus or minus charac +ter. Eg "+1234" will advance the cursor 1234 bases. If possible, the cursor is positioned within the same sequence. The direction buttons have no effect on the operation of "search by position".\par +\pard\plain \s9\fi-560\li1120\sb180\sa60\sl280\tx1720 \b\f20 2.6.11.2\tab Search by reading name\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This positions the cursor at the left end of the gel reading specified in the value text window. 
If the value is prefixed with a slash it is assumed to be a gel reading name. Otherwise it is assumed to be a gel reading number. Eg "123" positions the cursor + at the left end of gel readi +ng number 123. "/a16a12.s1" positions at the start of reading a16a12.s1. If the value is "/a16" the cursor is positioned at the first reading which starts with "a16". The direction buttons have no effect on the operation of "search by reading name". +\par +\pard\plain \s9\fi-560\li1120\sb180\sa60\sl280\tx1700 \b\f20 2.6.11.3\tab Search by tag type\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This positions the cursor at the start of the next tag which has the same type as specified by the type value menu. To change the type, select from the menu that pops up when the mouse is clicked on the button labeled "Type\:". Th +e search can be performed either forwards or backwards from the current cursor position. To find all tags, use "search by annotation", with a null text value string.\par +\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.4\tab Search by annotation\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This positions the cursor at the start of the next tag which has a comment containing the string specified in the value text window. The search performed is a regular expression search, and certain characters have special meanings. Be careful when your val +ue string contains ".", "*", "[", "^" or "$". The search can be performed either forwards or backwards from the current cursor position.\par +\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.5\tab Search by sequence\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This positions the cursor at the start of the next piece of sequence that matches the value specified in the text value window. The search is for an exact match, which means that the case of the value string is important. The search is performed on the gel + readings themselves, rather than the consensus sequence. 
The search can be performed either forwards or backwards from the current cursor position.\par +\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.6\tab Search by problem\par +\pard\plain \s4\qj\sa120\sl280 \f20 This positions the cursor at the next place in the consensus sequence which is not "A", "C", "G" or "T". The search can be performed either forwards or backwards from the current cursor position.\par +\pard \s4\qj\sa120\sl280 \par +\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.7\tab Search by quality\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This positions the cursor at the next place in the consensus sequence where the consensus for each strand is not "A", "C", "G" or "T" or where the two strands disagree. The search can be performed either forwards or backwards from the current cursor posit +ion.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 \par +2.7\tab Joining contigs interactively using XBAP\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The operation of the join editor in XBAP is very similar to that of the editor for single contigs described above. It allows the user to align the ends of the two contigs by editing each contig separately. First specify which two contigs are to be joined. The program + checks that the two contig numbers are different (it will not allow circles to be formed!). The Join Editor consists of two Contig Editors in between which is sandwiched a disagreement box. This disagreement box + uses exclamation marks to denote mismatches between the two consensuses. A typical example is shown in figure 4.4. Here we see in the top window the right end of one contig and in the bottom window the left end of another. The left end of the overlap is c +orrectly aligned, as indicated by an absence of exclamation marks, but the top contig has an extra character at position 558 which is spoiling the alignment over the next segment.
Notice that the "lock" button is highlighted, denoting that the user has aske +d for the two contigs to scroll together.\par +\pard \s4\qj\sa120\sl280 The best strategy for joining is to align the leftmost character of the right contig with its counterpart in the left contig. Then press the \'d2Lock\'d3 + button before editing the contigs to make them align for the whole overlap. The overlap must be of at least +one character. Use the scroll bar and the scroll buttons ("<<", "<", ">", and ">>") to adjust the relative positions of the two contigs. The join position can be fixed by pressing the "lock" button at the top + of the Join Editor. Locking allows the two contigs to be scrolled as one when using the scroll bar and buttons, the left ends always in the same position relative to each other. Once locked, it is best to proceed to the right along the contigs, inserting +padding characters ("*") into the consensuses to minimise the disagreements. It is important that the user aligns the two contigs throughout the whole region of overlap before completing the join because it is only at this stage that the two contigs can be + edited independently. If a join is completed leaving a region of mismatch, the consensus will consist of dashes and the assembly function will fail to find overlaps in the bad section. Misaligned sections can be corrected using the "super edit" mode of the + editor. The join can be completed by pressing the "Leave Editor" button.
The percentage mismatch is displayed, and users are required to confirm that they want to perform the join.\par +\pard\plain \li100\ri80\sb100\sl220\keepn\box\brsp20\brdrth \f4\fs16 {{\pict\macpict\picw441\pich144 +4685ffffffff008f01b81101a0008201000affffffff008f01b80900000000000000003100000000008e01b798007c00000000014003db00000000014003db00000000008e01b7000102850002850026e600001ff9ff0087f8ff01f87ff5ff01fe1fefff0087fcff01fe1ffcff01f87ff2ff00e0f40026e600001ff9ff0084 +f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f4003701003cfa000203fc03fa0008630c18000181 +80001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f4005b010066fa0002030003fa000ac30c38000380c0001f807ffbff05841f8000003cfd0002087f87fbff07e01fe7fffffe107efc00030f000078fd00133c0f0000841860000600021f9ffffffe7ff84180fb00021fe018fc000020 +f400610100c3fe0008c01800000300030603fd000a01830c7800078060001ff3faff058418c000000cfd0002087f33fbff07e7ffe7fffffe1063fc0003030000ccfd001366198000841860000600021f9ffffffe7ff84180fb0002180018fc000020f400670100c0fe0008c01800000300030603fd000a01830cd8000d8060 +001ff3fcff07f9ff84186000000cfd0002087e79fbff08e7ffe7cfe7fe106180fd001b030001860006000066198000841860000600021f9ffffffe7ff84180fb00041800183018fe000020f400670100c0fe0000c0fe00040300030003fd000a0301989800098030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08 +e7ffe7cfe7fe106180fd001b030001800006000060180000841860000600021f9ffffffe7ff84180fb00041800183018fe000020f400681c00c00f0dc3f0781f4003003b1e0fc0f0de000301981800018030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08e7ffe7ffe7fe106180fd001b03000180000600006018 
+0000841860000600021f9ffffffe7ff84180fb00041800180018fe000020f400726e00c0198e60c01831c003f0670603019873000301981800018030001ff3e47c0f8790e07f841861e1b80c0fc1f078087f3f9e647e1e43ffe7fe270f81fe1061878618783f03000180619f81e060180fc0841866e0761e021f9ff87e0e73 +f841801e0fc61878001801d8f07e0786f020f400726e00c030cc30c01831800300c30603030c60000300f01800018030001ff3e339e733c679ff8418c331cc0c186318cc087f879e633ccf19ffe07cc7cfe7fe10630cc618cc6183000180618603306018186084186730ce33021f9ff33ce667f8418033186618cc001f8338 +30180cc39820f400726e00c030cc30c01831800300c30603030c60000300f01800018030001ff3e799fe79cff9ff841f8619860c00660186087ff39e6799e73fffe7f9e7cfe7fe107e18633186018300018061860619f87e1800841866198661821f9fe799fe4ff84180618063318600180618301818630020f400726e00c0 +30cc30c01831800300c30603030c60000180f01800018060001ff3e79c0e01cff9ff841987f9860c0fe601fe087ff99e6798073fffe7f9e7cfe7fe10661fe331fe3f830001806186061860180fc0841866198661821f9fe799fe1ff841807f8fe331fe00180618301818630020f400726e00c330cc30c0181f000300c30603 +030c60000180601800018060001ff3e79fe67fcff9ff8418c601860c18660180087ff99e6799ff3fffe7f9e7cfe7fe10631801e18061830001806186061860180060841866198661821f9fe799fe0ff84180601861e18000180618301818630020f400726e0066198c30cc18300003006706033198600000c06018070180c0 +001ff3e79fe67fcff9ff8418c601860c18660180087e799e6799ff3fffe7f9e7cfe7fe10631801e18061830001866186061860180060841866198661821f9fe799fe47f84180601861e18000180618301818630020f400726e003c0f0c3078ff1f8003fc3b3fc1e0f06000006060ff070ff180001ff3e799e739cff99f8418 +6319cc0c186318c6087f33cc633ce73fffe7fcc7cfe67e10618c60c0c661830000cc3386633060181860840cc618ce33021f9ff33ce663f84180319860c0c60018033830198cc30020f4005efa000130c0ef00531f80679c0f83cffc3f841861f1b87f8fa1f07c087f87e2647e0f3fffe01e2601f0fe106187c0c07c3e9fe0 
+00781d83c1e060180fc084078618761e021f80787e0e71f841fe1f0fa0c07c001fe1d9fe0f07830020f40032fa000130c0ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef000084fc0001021ffcff01f840f2000020f40032fa000130c0ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef00 +0084fc0001021ffcff01f840f2000020f40032fa00011f80ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef000084fc0001021ffcff01f840f2000020f4002de600001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001 +087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0087f8ff01f87ff5ff01fe1fefff00 +87fcff01fe1ffcff01f87ff2ff00e0f40002850002850002850002850002850002850002850002850007001f88ff01fe00180010fc000006fe00010180fe000060fc00000c9d00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000c +c9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de0001020024151000004010000600004001800200006000100400000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de0001020024151000018060 +0006000180018001800060000c0300000cc9000001fa550040de00010200241510000300c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de00010200241510000c03000006000c000180003000600001806000 +0cc9000002faaa00a0de000102002415100018060000060018000180001800600000c030000cc9000001fa550040de00010200241510000c03000006000c0001800030006000018060000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de000102002415 
+10000300c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000180600006000180018001800060000c0300000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de00010200241510000040100006000040018002000060 +00100400000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00 +010180fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200180010fc000006fe00010180fe000060fc00000c9d0001020007001f88ff01fe0007001f88 +ff01fe000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200420010ed00000cfb000307f8780cf800037f9fe0c0f9000307f8780cf800037f8780c0f9000301e0300cf800031e0300c0f900 +0301e0780cf800071e0780c000000200420010ed00000cfb00030600cc1ef80003600061e0f900030600cc1ef80003600cc1e0f900030330781ef80003330701e0f900030330cc1ef80007330cc1e000000200420010ed00000cfb000306018633f8000360006330f9000306018633f8000360186330f900030618cc33f800 +03618f0330f9000306198633f800076198633000000200420010ed00000cfb000306018033f800036000c330f9000306018633f8000360186330f900030600cc33f80003601b0330f9000306018633f800076018633000000200460010ed00000cfb00040601806180f900036000c618f900040601866180f9000360186618 +f900040601866180f9000360130618f900040600066180f900076000661800000200460010ed00000cfb000406e1b86180f900036e018618f9000406e0cc6180f900036e186618f9000406e1866180f900036e030618f9000406e0066180f900076e00c61800000200460010ed00000cfb00040731cc6180f9000373018618 
+f900040730786180f90003730ce618f900040731866180f9000373030618f9000407300c6180f900077303861800000200440010ed00000cfa000319866180f9000301830618f8000318cc6180f9000301876618f900040619866180f9000361830618f900040618386180f900076180c61800000200440010ed00000cfa00 +0319866180f9000301830618f8000319866180f9000301806618f900040619866180f9000361830618f900040618606180f900076180661800000200400010ed00000cfa0002198633f8000301860330f80002198633f8000301806330f900030618cc33f8000361830330f900030618c033f8000761986330000002004200 +10ed00000cfb000306198633f8000361860330f9000306198633f8000361986330f900030618cc33f8000361830330f9000306198033f800076198633000000200420010ed00000cfb00030330cc1ef80003330c01e0f900030330cc1ef80003330cc1e0f900030330781ef80003330301e0f900030331801ef80007330cc1 +e000000200420010ed00000cfb000301e0780cf800031e0c00c0f9000301e0780cf800031e0780c0f9000301e0300cf800031e1fe0c0f9000301e1fe0cf800071e0780c0000002000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c +9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102004b0010fe000dc0781e000001fe1e0001e0000003fe002f0c1fe0c0781e000001fe7f9fe7f8780c0787f9fe7f8301e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8780cc000102004b1110000001c0cc330000018033000330000007fe00 +050c0301e0cc33fe0026300c0300c0cc1e0cc0c0300c078330781e0303307833078330300c0cc1e0300c0301e0780c0cc0cc000102004b1110000003c18661800001806180061800000ffe002f0c0303318661800000300c0300c186331860c0300c0cc618cc33030618cc618cc618300c186330300c030330cc0c1860cc00 +0102004a1110000006c18660000001806000061800001bfe00050c0303318060fe0025300c0300c180331800c0300c0cc600cc33030600cc600cc600300c180330300c030330cc0c18cb000102004a1110000004c186600000018060000618000013fe00050c0306198060fe0025300c0300c180619800c0300c1866018661 
+8306018660186600300c180618300c030619860c18cb000102004a0010fe000dc0cc6e0003f1b86e0fc618003f03fe00050c0306198060fe0025300c0300c180619800c0300c18660186618306018660186600300c180618300c030619860c18cb000102004b0010fe000dc07873000619cc73186338006183fe002f0c0306 +199e679fe7f8300c0300c1806199e0c0300c18667986618306018667986678300c180618300c030619860c19e0cc000102004b0010fe000dc0cc6180001806618061d8006003fe002f0c0307f98661800000300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f8300c0307f9fe0c1860cc000102004b +0010fe000dc186618003f806618fe018003f03fe002f0c0306198661800000300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860cc000102004b0010fe000dc186618006180661986018000183fe002f0c0306198661800000300c0300c180619860c0300c1866198661830601 +8661986618300c180618300c030619860c1860cc000102004b0010fe000dc186618006198661986618000183fe002f0c0306198661800000300c0300c186619860c0300c18661986618306198661986618300c186618300c030619860c1860cc000102004b0010fe000dc0cc33000618cc33186330386183fe002f0c030618 +ce33800000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc618300c030619860c0ce0cc000102004b4410000007f8781e0003e8781e0fa1e0383f1fe000000c0306187a1e800000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c078618300c030619860c07a0cc000102000b00 +10ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200630010fe000dc0041e000001fe0c03c7f8000003fe00470c1fe0c0780c030001fe7f9fe7f8780c0787f9fe7f83 +01e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e03e400010200640d10000001c00c33000001801c0666fe000007fe00480c0301e0cc1e078000300c0300c0cc1e0cc0c0300c078330781e0303307833078330300c0cc1e0300c0301e0780c0cc0c0 
+3033078330300c030330cc1e078330cc330780c0cc330780e500010200640d10000003c01c61800001803c0666fe00000ffe00480c03033186330cc000300c0300c186331860c0300c0cc618cc33030618cc618cc618300c186330300c030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc0e500 +010200640d10000006c03c60000001806c0606fe00001bfe00480c03033180330cc330300c0300c180331800c0300c0cc600cc33030600cc600cc600300c180330300c030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc0e500010200640d10000004c06c60000001804c0606fe000013fe0048 +0c0306198061986330300c0300c180619800c0300c18660186618306018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601860e500010200640010fe000dc0cc6e0003f1b80c0606e0003f03fe00480c03061980619861e0300c0300c180619800c0300c1866018661 +8306018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601860e500010200640010fe000dc18c73000619cc0c060730006183fe00480c0306199e619867f8300c0300c1806199e0c0300c18667986618306018667986678300c180618300c030619860c19e0c0306018 +6678300c0306019e6198660180601860c180601860e500010200640010fe000dc18c61800018060c1f8018006003fe00480c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f8300c0307f9fe0c1860c030601fe618300c030601867f9fe60180601fe0c180601fe0e500010200 +640010fe000dc1fe618003f8060c060018003f03fe00480c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601860e500010200640010fe000dc00c61800618060c060018000183fe00480c0306 +198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601860e500010200640010fe000dc00c61800619860c060618000183fe00480c0306198661986000300c0300c186619860c0300c1866198661830619 
+8661986618300c186618300c030619860c1860c03061986618300c030619866198661986619860c186619860e500010200640010fe000dc00c33000618cc0c060330386183fe00480c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc618300c030619860c0ce0c0303318633830 +0c030330ce61986330cc331860c0cc331860e500010200645d10000007f80c1e0003e8787f8601e0383f1fe000000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c078618300c030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e1860e5000102000b0010 +ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102007d1110000001e03001000001fe1e0067f8000003fe00660c1fe0c0780c030001fe7f9fe7f8780c0787f9fe7f8301 +e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e0300c0780c1fe1e0307f9fe000780c1fe1e1fe1e0787f8781e0780c0787f82007d0d1000000330700300000180330066fe000007fe00660c0301e0cc1e078000300c0300c0cc1e0cc0c0300c078330 +781e0303307833078330300c0cc1e0300c0301e0780c0cc0c03033078330300c030330cc1e078330cc330780c0cc330781e0cc1e030330780c030000cc1e03033030330cc0c0cc330cc1e0cc0c02007d0d1000000618f00700000180618066fe00000ffe00660c03033186330cc000300c0300c186331860c0300c0cc618cc +33030618cc618cc618300c186330300c030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc3318633030618cc0c030001863303061830619860c18661986331860c02007d0d1000000619b00f00000180600066fe00001bfe00660c03033180330cc330300c0300c180331800c0300c0cc600cc33 +030600cc600cc600300c180330300c030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc3318033030600cc0c030001803303060030601800c18060180331800c02007d0010fe000919301b00000180600066fe000013fe00660c0306198061986330300c0300c180619800c0300c186601866183 
+06018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d0010fe000d1830330003f1b86e0766e0003f03fe00660c03061980619861e0300c0300c180619800c0300c18660186618306 +018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d0010fe000d303063000619cc730ce730006183fe00660c0306199e619867f8300c0300c1806199e0c0300c1866798661830601 +8667986678300c180618300c030619860c19e0c03060186678300c0306019e6198660180601860c180601866198061830601860c0307f9806183060030601800c18060180619800c02007d0010fe000de030630000180661986018006003fe00660c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe +619fe618300c1807f8300c0307f9fe0c1860c030601fe618300c030601867f9fe60180601fe0c180601fe7f9807f830601fe0c030001807f83060030601800c180601807f9800c02007d111000000180307f8003f80661986018003f03fe00660c0306198661986330300c0300c180619860c0300c18661986618306018661 +986618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d11100000030030030006180661986018000183fe00660c0306198661986330300c0300c180619860c0300c1866198661830601866198 +6618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d11100000060030030006198661986618000183fe00660c0306198661986000300c0300c186619860c0300c186619866183061986619866 +18300c186618300c030619860c1860c03061986618300c030619866198661986619860c186619866198661830619860c030001866183061830619860c18661986619860c02007d1110000006003003000618cc330ce330386183fe00660c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338 
+300c0cc618300c030619860c0ce0c03033186338300c030330ce61986330cc331860c0cc33186618cc61830331860c030000cc6183033030330cc0c0cc330cc618cc0c02007d7b10000007f9fe030003e8781e0761e0383f1fe000000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e830 +0c078618300c030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e18661878618301e1860c03000078618301e0301e0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102 +000b0010ed00000c9d000102000b0010ed00000c9d000102007d0010fe000dc0787f800001fe0c180010000003fe00660c1fe0c0780c030001fe7f9fe7f8780c0787f9fe7f8301e0300c1fe1e0301e0301e1fe7f8780c1fe001fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e0300c0780c1fe1e0 +307f9fe7f8780c1fe1e1fe1e0787f8781e0780c0787f82007d1110000001c0cc01800001801c180030000007fe00660c0301e0cc1e078000300c0300c0cc1e0cc0c0300c078330781e0303307833078330300c0cc1e030000301e0780c0cc0c03033078330300c030330cc1e078330cc330780c0cc330781e0cc1e03033078 +0c0300c0cc1e03033030330cc0c0cc330cc1e0cc0c02007d1110000003c18601800001803c18007000000ffe00660c03033186330cc000300c0300c186331860c0300c0cc618cc33030618cc618cc618300c1863303000030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc3318633030618cc0c +0300c1863303061830619860c18661986331860c02007d1110000006c18003000001806c1800f000001bfe00660c03033180330cc330300c0300c180331800c0300c0cc600cc33030600cc600cc600300c1803303000030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc3318033030600cc0c03 +00c1803303060030601800c18060180331800c02007d1110000004c18003000001804c1801b0000013fe00660c0306198061986330300c0300c180619800c0300c18660186618306018660186600300c1806183000030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300 
+c1806183060030601800c18060180619800c02007d0010fe000dc1b8060003f1b80c1b8330003f03fe00660c03061980619861e0300c0300c180619800c0300c18660186618306018660186600300c1806183000030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300c1 +806183060030601800c18060180619800c02007d1110001fe0c1cc06000619cc0c1cc630006183fe00660c0306199e619867f8300c0300c1806199e0c0300c18667986618306018667986678300c180618307f830619860c19e0c03060186678300c0306019e6198660180601860c180601866198061830601860c0300c180 +6183060030601800c18060180619800c02007d0010fe000dc1860c000018060c186630006003fe00660c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f830000307f9fe0c1860c030601fe618300c030601867f9fe60180601fe0c180601fe7f9807f830601fe0c0300c1807f +83060030601800c180601807f9800c02007d0010fe000dc1860c0003f8060c1867f8003f03fe00660c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c1806183000030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c0300c1806183 +060030601800c18060180619800c02007d0010fe000dc18618000618060c186030000183fe00660c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c1806183000030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c0300c180618306 +0030601800c18060180619800c02007d0010fe000dc18618000619860c186030000183fe00660c0306198661986000300c0300c186619860c0300c18661986618306198661986618300c1866183000030619860c1860c03061986618300c030619866198661986619860c186619866198661830619860c0300c18661830618 +30619860c18661986619860c02007d0010fe000dc0cc30000618cc0c186030386183fe00660c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc6183000030619860c0ce0c03033186338300c030330ce61986330cc331860c0cc33186618cc61830331860c0300c0cc6183033030 
+330cc0c0cc330cc618cc0c02007d7b10000007f878300003e8787f986030383f1fe000000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c0786183000030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e18661878618301e1860c0300c078618301e0301e +0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200790010fa007301e078618787f9861e1861e0000c1fe0c0780c030001fe7f9f +e7f8780c0787f9fe7f8301e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e0300c0780c1fe1e0307f9fe7f8780c1fe1e1fe1e0787f8781e0780c0787f8200790010fa00730330cc718cc601c633186330000c0301e0cc1e078000300c0300c0cc1e0c +c0c0300c078330781e0303307833078330300c0cc1e0300c0301e0780c0cc0c03033078330300c030330cc1e078330cc330780c0cc330781e0cc1e030330780c0300c0cc1e03033030330cc0c0cc330cc1e0cc0c0200790010fa007306198671986601c661986618000c03033186330cc000300c0300c186331860c0300c0c +c618cc33030618cc618cc618300c186330300c030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc3318633030618cc0c0300c1863303061830619860c18661986331860c0200790010fa007306018679980601e660186600000c03033180330cc330300c0300c180331800c0300c0cc600cc3303 +0600cc600cc600300c180330300c030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc3318033030600cc0c0300c1803303060030601800c18060180331800c0200790010fa007306018679980601e660186600000c0306198061986330300c0300c180619800c0300c1866018661830601866018 +6600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa00730601866d8c0601b630186300000c03061980619861e0300c0300c180619800c0300c18660186618306018660186600300c18 
+0618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa00730601866d8787e1b61e1861e0000c0306199e619867f8300c0300c1806199e0c0300c18667986618306018667986678300c180618300c03 +0619860c19e0c03060186678300c0306019e6198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa00730601866780c6019e03186030000c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f8300c0307f9fe0c18 +60c030601fe618300c030601867f9fe60180601fe0c180601fe7f9807f830601fe0c0300c1807f83060030601800c180601807f9800c0200790010fa0073060186678066019e01986018000c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c0306018 +6618300c030601866198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa0073060186638066018e01986018000c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c03060186618300c03 +0601866198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa0073061986639866018e61986618000c0306198661986000300c0300c186619860c0300c18661986618306198661986618300c186618300c030619860c1860c03061986618300c030619866198 +661986619860c186619866198661830619860c0300c1866183061830619860c18661986619860c0200790010fa00730330cc618cc60186330cc330000c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc618300c030619860c0ce0c03033186338300c030330ce61986330cc3318 +60c0cc33186618cc61830331860c0300c0cc6183033030330cc0c0cc330cc618cc0c0200790010fa007301e078618787f9861e0781e0000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c078618300c030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e18 
+661878618301e1860c0300c078618301e0301e0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200 +0b0010ed00000c9d0001020007001f88ff01fe0007001f88ff01fe000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c +0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe +000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0 +fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c03 +0000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0 +f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c 
+0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300 +c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f0000102000b0010ed00000c9d00010200590010ed +00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c000 +00030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f0000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed0000 +0c9d000102000b0010ed00000c9d000102000b0010ed00000c9d0001020007001f88ff01fe0007001f88ff01fe00180010fc000006fe00010180fe000060fc00000c9d00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc90000 +01fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de0001020024151000004010000600004001800200006000100400000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de0001020024151000018060000600 +0180018001800060000c0300000cc9000001fa550040de00010200241510000300c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de00010200241510000c03000006000c0001800030006000018060000cc900 
+0002faaa00a0de000102002415100018060000060018000180001800600000c030000cc9000001fa550040de00010200241510000c03000006000c0001800030006000018060000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de000102002415100003 +00c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000180600006000180018001800060000c0300000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de00010200241510000040100006000040018002000060001004 +00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180 +fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200180010fc000006fe00010180fe000060fc00000c9d0001020007001f88ff01fe0007001f88ff01fe +000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200340010ed00000cf700020300c0f70001780cf700020780c0f70001040cf700021fe0c0f70001780cf700021fe0c0f70003780c020033 +0010ed00000cf700020701e0f70001cc1ef700020cc1e0f700010c1ef700021801e0f70001cc1ef6000161e0f70003cc1e0200360010ed00000cf700020f0330f80002018633f70002186330f700011c33f70002180330f80002018633f600016330f800040186330200360010ed00000cf700021b0330f80002018633f700 +02186330f700013c33f70002180330f80002018033f60001c330f800040186330200370010ed00000cf70002130618f70002066180f700016618f700026c6180f80002180618f8000301806180f70001c618f800040186618200370010ed00000cf70002030618f70002066180f70001c618f70002cc6180f800021b8618f8 
+000301b86180f80002018618f70003cc618200390010ed00000cf70002030618f700020c6180f80002038618f80003018c6180f800021cc618f8000301cc6180f80002018618f7000378618200370010ed00000cf70002030618f70002386180f70001c618f80003018c6180f700016618f8000301866180f80002030618f7 +0003cc618200380010ed00000cf70002030618f70002606180f700016618f8000301fe6180f700016618f8000301866180f80002030618f800040186618200350010ed00000cf70002030330f70001c033f70002186330f700010c33f600016330f80002018633f70002060330f800040186330200370010ed00000cf70002 +030330f80002018033f70002186330f700010c33f70002186330f80002018633f70002060330f800040186330200350010ed00000cf700020301e0f8000201801ef700020cc1e0f700010c1ef700020cc1e0f70001cc1ef700020c01e0f70003cc1e0200350010ed00000cf700021fe0c0f8000201fe0cf700020780c0f700 +010c0cf700020780c0f70001780cf700020c00c0f70003780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200100010ed00000cad0001ffc0f300010200100010ed00000cad0001ffc0f3000102007d0010fe0002 +01fe01fe0007041e0001e0000003fe00660c1fe0c0780c0307f9fe7f9fe1e0301e1fe7f9fe0c0780c0307f8780c0780c0787f9fe1e0307f9fe7f8300c1fe1e1fe7f8780c0787f9fe7f8781e0300c0781e0780c1fe1e0780c0301e0307f8780c1fe7f9fe00078f3dfe1e1fe1e0787f8781e0780c0787f82007d0010fe000201 +8003fe00070c33000330000007fe00660c0301e0cc1e0780c0300c03033078330300c0301e0cc1e0780c0cc1e0cc1e0cc0c030330780c0300c0781e030330300c0cc1e0cc0c0300c0cc330781e0cc330cc1e030330cc1e078330780c0cc1e0300c030000cce1c3033030330cc0c0cc330cc1e0cc0c02007d0010fe00020180 +07fe00071c6180061800000ffe00660c03033186330cc0c0300c030618cc618300c03033186330cc0c18633186331860c030618cc0c0300c0cc33030618300c186331860c0300c186618cc33186619863303061986330cc618cc0c186330300c03000186ccc3061830619860c18661986331860c02007d0010fe000201800f 
+fe00073c6000061800001bfe00660c03033180330cc0c0300c030600cc600300c03033180330cc0c18033180331800c030600cc0c0300c0cc33030600300c180331800c0300c180600cc33180601803303060180330cc600cc0c180330300c03000180ccc3060030601800c18060180331800c02007d0010fe000201801bfe +00076c60000018000013fe00660c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c18661830600300c180619800c0300c180601866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007d0010fe000d01b8330003 +f0cc6e078030003f03fe00660c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c18661830600300c180619800c0300c180601866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007d0010fe000d01cc63000619 +8c730cc0e0006183fe00660c0306199e619860c0300c03060186678300c0306199e619860c1806199e6199e0c030601860c0300c18661830678300c1806199e0c0300c180679866198060180618306018061986601860c180618300c0307f9809e43060030601800c18060180619800c02007c0010fd000c06630000198c61 +986030006003fe00660c0307f9867f9fe0c0300c030601fe618300c0307f9867f9fe0c1807f9867f9860c030601fe0c0300c1fe7f830618300c1807f9860c0300c180619fe7f980601807f830601807f9fe601fe0c1807f8300c030001808043060030601800c180601807f9800c02007c0010fd000c067f8003f9fe619fe0 +18003f03fe00660c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c180619866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007c0010fd000c06030006180c6198061800 +0183fe00660c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c180619866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007d0010fe000d0186030006180c619806180001 
+83fe00660c03061986619860c0300c03061986618300c03061986619860c18661986619860c030619860c0300c18661830618300c186619860c0300c186619866198661986618306198661986619860c186618300c030001869e43061830619860c18661986619860c02007c0010fd000ccc030006180c330c6330386183fe +00660c030618ce619860c0300c03033186338300c030618ce619860c0cc618ce618ce0c030331860c0300c18661830338300c0cc618ce0c0300c0cc33986618cc330cc61830330cc61986331860c0cc618300c030000cc9e43033030330cc0c0cc330cc618cc0c02007c0010fd007678030003e80c1e07c1e0383f1fe00000 +0c0306187a619860c0300c0301e1861e8300c0306187a619860c0786187a6187a0c0301e1860c0300c186618301e8300c0786187a0c0300c0781e986618781e078618301e078619861e1860c078618300c030000789e4301e0301e0780c0781e078618780c0200100010ed00000cad0001ffc0f300010200100010ed00000c +ad0001ffc0f300010200100010ed00000cad0001ffc0f300010200100010ed00000cad0001ffc0f300010200100010ed00000cad0001ffc0f3000102000b0010ed00000c9d000102000b0010ed00000c9d00010200790010fa007301e078618787f9861e1861e0000c1fe0c0780c0307f9fe7f9fe1e0301e1fe7f9fe0c0780 +c0307f8780c0780c0787f9fe1e0307f9fe7f8300c1fe1e1fe7f8780c0787f9fe7f8781e0300c0781e0780c1fe1e0780c0301e0307f8780c1fe7f9fe000780c1fe1e1fe1e0787f8781e0780c0787f8200790010fa00730330cc718cc601c633186330000c0301e0cc1e0780c0300c03033078330300c0301e0cc1e0780c0cc1 +e0cc1e0cc0c030330780c0300c0781e030330300c0cc1e0cc0c0300c0cc330781e0cc330cc1e030330cc1e078330780c0cc1e0300c030000cc1e03033030330cc0c0cc330cc1e0cc0c0200790010fa007306198671986601c661986618000c03033186330cc0c0300c030618cc618300c03033186330cc0c18633186331860 +c030618cc0c0300c0cc33030618300c186331860c0300c186618cc33186619863303061986330cc618cc0c186330300c030001863303061830619860c18661986331860c0200790010fa007306018679980601e660186600000c03033180330cc0c0300c030600cc600300c03033180330cc0c18033180331800c030600cc0 
+c0300c0cc33030600300c180331800c0300c180600cc33180601803303060180330cc600cc0c180330300c030001803303060030601800c18060180331800c0200790010fa007306018679980601e660186600000c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c1866 +1830600300c180619800c0300c180601866198060180618306018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa00730601866d8c0601b630186300000c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c18661830600300 +c180619800c0300c180601866198060180618306018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa00730601866d8787e1b61e1861e0000c0306199e619860c0300c03060186678300c0306199e619860c1806199e6199e0c030601860c0300c18661830678300c1806199e0 +c0300c180679866198060180618306018061986601860c180618300c0307f9806183060030601800c18060180619800c0200790010fa00730601866780c6019e03186030000c0307f9867f9fe0c0300c030601fe618300c0307f9867f9fe0c1807f9867f9860c030601fe0c0300c1fe7f830618300c1807f9860c0300c1806 +19fe7f980601807f830601807f9fe601fe0c1807f8300c030001807f83060030601800c180601807f9800c0200790010fa0073060186678066019e01986018000c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c18061986619806 +0180618306018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa0073060186638066018e01986018000c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c180619866198060180618306 +018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa0073061986639866018e61986618000c03061986619860c0300c03061986618300c03061986619860c18661986619860c030619860c0300c18661830618300c186619860c0300c1866198661986619866183061986619866 
+19860c186618300c030001866183061830619860c18661986619860c0200790010fa00730330cc618cc60186330cc330000c030618ce619860c0300c03033186338300c030618ce619860c0cc618ce618ce0c030331860c0300c18661830338300c0cc618ce0c0300c0cc33986618cc330cc61830330cc61986331860c0cc6 +18300c030000cc6183033030330cc0c0cc330cc618cc0c0200790010fa007301e078618787f9861e0781e0000c0306187a619860c0300c0301e1861e8300c0306187a619860c0786187a6187a0c0301e1860c0300c186618301e8300c0786187a0c0300c0781e986618781e078618301e078619861e1860c078618300c0300 +0078618301e0301e0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102 +0007001f88ff01fe00028500028500a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.4\tab A typical display from the join editor in XBAP.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.8\tab Selecting primers and templates\par +\par +\pard\plain \qj \f4\fs16 {\plain \f20 1. Select "Edit contig". The primer and template selection function is available from the popup menu of the contig editor.\par +}\pard \qj {\plain \f20 \par +}\pard \qj {\plain \f20 2. Open the oligo selection window, by selecting "Select Oligo" from the contig editor popup menu.\par +}\pard \qj {\plain \f20 \par +}\pard \qj {\plain \f20 3. Position the cursor to where you want the oligo to be chosen. While the oligo selection window is visible, you will still have complete control over positioning and editing within the contig editor.\par +}\pard \qj {\plain \f20 \par +}\pard \qj {\plain \f20 4. Indicate the strand for which you require an oligo. This is done by toggling the direction arrow ("----->" or "<------"), if necessary.\par +}\pard \qj {\plain \f20 \par +}\pard \qj {\plain \f20 +5. Press the "Find Oligos" button to find all suitable oligos (See "Oligo selection" in Note 17.) 
Information for the closest oligo to the cursor position is given in the output text window. In the contig editor the position of the oligo is marked by a + temporary tag on the consensus. The window is recentered if the oligo is off the screen. Selecting "Display Selection Information" will print a short report on the numbers of oligos considered and rejected during oligo selection. \par +}\pard \qj {\plain \f20 \par +}\pard \qj {\plain \f20 6. If this oligo is not suitable (it may have been previously chosen, and found to be unsuitable by experimentation, say), the next closest oligo can be viewed by pressing "Select Next". \par +}\pard \qj {\plain \f20 \par +}\pard \qj {\plain \f20 +7. Suitable templates are automatically identified for the currently displayed oligo (See "Template selection" in Note 18.) By default, the template is that closest to the oligo site. If the choice is not suitable (it may be known to be a poor quality +template, say) another can be chosen from the "Choose Template for this Oligo" menu. Templates that do not appear on the menu can be specified by selecting "other". However, the template must be on the correct strand and be upstream of the oligo. \par +}\pard \qj {\plain \f20 \par +}\pard \qj {\plain \f20 +8. A tag can be created for the current oligo by pressing the button "Create a tag for this oligo". The annotation for this tag holds the name of the template and the oligo primer sequence. There are fields to allow the user to specify their own primer + name ("serial#") and comments ("flags") for this tag. An example of oligo tag annotation\: \par +}\pard \qj {\plain \f20 \par + serial#= \par + template=a16a9.s1 \par + sequence=CGTTATGACCTATATTTTGTATG \par + flags=\par +\par +}\pard \qj {\plain \f20 9. The oligo selection window is closed when "Create a tag for this oligo" or "Quit" is selected. 
\par +}\pard \qj {\plain \f20 \par +}\pard\plain \s6\qj\sa60\tx560\tx860 \b\f20 \par +\pard \s6\sa60\sl280\tx560\tx860 2.9\tab Examining the "quality" of a contig\par +\pard\plain \s4\qj\sa120\sl280 \f20 This function reports on the proportion of the consensus that is "well determined" and will display a sequence of symbols that indicate the quality +of the consensus at each position or produce a graphical display. Each strand of the contig is analysed separately using the consensus algorithm, and a position is declared "well determined" if it is assigned one of the symbols a,c,g,t. The current consen +sus calculation cutoff score is used.\par +\pard \s4\qj\sa120\sl280 A summary showing the percentage of the consensus that falls into each category of quality is shown. The analysis divides the data into five categories, assigning each a code as shown in figure 4.5. Code 0 means well +determined on both strands and they agree, 1 means well determined on the plus strand only, 2 means well determined on the minus strand only, 3 means not well determined on either strand and 4 means well determined on both strands but they disagree. If +the user chooses to have the data displayed graphically the following scheme is used. A rectangular box is drawn so that the x coordinate represents the length of the contig. The box is notionally divided vertically into 5 possible levels which are given t +he y values\: + -2,-1,0,1,2. The quality codes assigned to each base position are plotted as rectangles. Each rectangle represents a region in which the quality codes are identical, so a single base having a different code from its immediate neighbours will a +ppear as a very narrow rectangle. Obviously a single line at the midheight shows a perfect sequence. 
In figure 4.6 we show the result for the section of contig shown in figure 4.8.\par +\pard \s4\qj\sa120\sl280 \par +\par +\par +\par +\par +\pard \s4\qj\li1580\ri1760\sb160\sl280\box\brsp100\brdrth \tqc\tx2000\tqc\tx3960\tqc\tx6360 \tab {\b Strands\tab Quality\tab Y coordinates\par +}\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx2000\tqc\tx3960\tqc\tx6200 {\b \tab OK\tab code\par +}\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx1780\tqc\tx2120\tx2380\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab +\tab -\tab and the same\tab 0\tab 0\tab to\tab 0\par +\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx1780\tqc\tx2120\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab +\tab \tab 1\tab 0\tab to\tab 1\par +\tab -\tab \tab 2\tab -1\tab to\tab 0\par +\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx2120\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab neither\tab 3\tab -1\tab to\tab 1\par +\pard \s4\qj\li1580\ri1760\sa60\sl280\keepn\box\brsp100\brdrth \tqc\tx1780\tqc\tx2120\tx2400\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab +\tab -\tab but different \tab 4\tab -2\tab to\tab 2\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.5\tab The codes and coordinates used by the "Quality plot".
\par +\par +\pard\plain \li1500\ri1660\sb400\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 94.67 % OK on both strands and they agree(0)\par +\pard \li1500\ri1660\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth 0.67 % OK on plus strand only(1)\par + 2.00 % OK on minus strand only(2)\par + 2.67 % Bad on both strands(3)\par + 0.00 % OK on both strands but they disagree(4)\par +\pard \li1500\ri1660\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth {\fs22 \par +}\pard \li1500\ri1660\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth 3310 3320 3330 3340 3350\par +0000000000 0000000000 0000000000 0000000000 0000000000\par +\par + 3360 3370 3380 3390 3400\par +0020000000 0000000032 0000032000 0000000000 0300000030\par +\par + 3410 3420 3430 3440 3450\par +\pard \li1500\ri1660\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth 0000000000 0010000000 0000000000 0000000000 0000000000\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.6\tab Listed output from "Examine Quality" showing the results for the section of contig displayed in figure 4.8.\par +\pard \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.10\tab Using graphical displays to examine contigs\par +\pard\plain \s4\qj\sa120\sl280 \f20 The programs contain three graphical displays to aid the examination of contigs. The first simply gives an overview of all the contigs in the database and provides, with the use of a +crosshair, a mechanism for the other two displays to select contigs. One of these displays produces a schematic representation of each of the readings in a contig. The lines in the display show the relative positions of each reading and also their sense. T +he plot is divided vertically into two sections by a line that is identified by an asterisk drawn at each end.
All lines that lie above this line represent readings that are in their original sense; all lines below show readings that are in the complementa +ry sense. The final graphical display is of the "quality" of the data as described above.\par +\pard \s4\qj\sa120\sl280 +When these graphical displays are visible users may employ a crosshair, moved by mouse or keyboard commands, to examine the data in more detail. The crosshair is positioned and when keyboard characters S, Q, N or Z are typed the program will show the local + aligned sequences in a text window, produce the quality plot, give the names of the nearest readings or zoom into the display. \par +\pard \s4\qj\sa120\sl280 A typical display of all three plots +is shown in figure 4.7. The top rectangle shows a separate line for each of the project's contigs. The righthand one is bisected by a vertical line indicating that it has been selected by the user. The next rectangle below is divided by a horizontal line ma +rked at each end by an asterisk. Each of the other horizontal lines in the box represents one of the selected contig's gel readings. Those above the dividing line are in their original orientation; those below have been complemented. The box below is also d +ivided by a horizontal line and shows the "quality" for each base in the contig. Rectangular areas marked above the central line show sections that only have a good consensus on the minus strand, and rectangles below show good sections from the other stran +d. Places where the vertical lines reach the top and bottom of the box show disagreements between the two strands.
Places with only the midline have a good consensus on both strands.\par +\pard\plain \li80\sl220\keepn\tx720 \f4\fs16 {{\pict\macpict\picw441\pich231 +237effffffff00e601b81101a0008201000affffffff00e601b8090000000000000000310000000000e501b79800780000000001f103bb0000000001f103bb0000000000e501b70001028900028900028900028900028900090100158e550054ff000901000f8eff00f0ff000d0100089b000008f5000010ff000d0100089b +000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b00 +0008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b0000 +08f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff005d14000801ffc0003ff803ff801ffc00007fff003fff80fb00003ffdff0ef000007fffe0 +003ffc03ffff00001ffdff00c0fd00003ffdff03f000007ff8ff04f003fffffefe00003ffcff00f0fc00000ffdff00fcfd000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff +000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff00 +0d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d 
+0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff005813000fff007fffe00ffe00fff007ffffc001ffe000faff00e0fd000e1fffffc0003fffe007fe0001fffff0fd00007ffdff00e0fd00031fffffc0f800041ffe000003feff00e0fc00001ffcff00f8fd000007f0ff00 +f0ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010 +ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff +000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000901000f8eff00f8ff0009010008 +8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009 +0100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0018010008ba00007ffaff038000003ffaff00 +f8e9000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00 
+090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001d010008c40000 +1ffbff00e0f7000007faff00fcef000007f9ff00f0ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009010008 +8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001d +010008e2000307fffffcd600000ffaff00feee000007faff01fe10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010 +ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e +000010ff00090100088e000010ff001f010008fc000001fbff00c0c000007ffaff00c0ee000007fbff02fe0010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00 +090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e0000 
+10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001a02000801f9ff00f8c000faff02c00000faff00f0eb000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e0000 +10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100188e000030ff00090100188e000030ff00090100188e000030ff00090100188e00023000000b020718 +e09000030e31c0000b020718e09000030e31c0000b020789e09000030f13c0000a0100ff8f000101feff00090100188e000030ff000901003f8eff02f800000b0201db8090000303b700000b020799e09000030f33c0000b020718e09000030e31c0000b020618609000030c30c000090100188e000030ff000901003c8e00 +0078ff00090100188e000030ff00090100188e000030ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100 +088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001d01000ffaff00c0d5000007f0ff00e0f3000001f0ff00e0f6000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00 +090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e0000 +10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001c02000803fbff00f8c400007ff0ff0200000ff9ff0080f1000010ff00090100088e000010ff00090100088e000010ff00090100088e 
+000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff000901 +00088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001f010008f800f1ff00c0d500007ffbff00c0ed00003ffbff00c0f8000010ff00090100088e000010ff00090100088e0000 +10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009010008 +8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001f010008e900007ff3ffe2000007fbff00fcea00007ffaff00f0fc000010ff +00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e00 +0010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0019010008dc000001f2ff00feef +000003faff0080df000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009010008 
+8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0019 +010008db00007ffbff00f0de000003faff00c0e8000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff000901 +00088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff +000901000f8eff00f0ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001 +f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce +000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f800 +0008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff00190100 
+08f8000008ce000001f3000010fc000004e1000010ff004203000fffb0fa00034800027fefff03fc02065fe7ff03f3800001fe00133fffffc201880000105177000001042408006002fe0001425ff4ff00fcfb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe001320 +00244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410 +ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe0013200024420188 +0000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff00520300 +0873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177 +000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa00 +0348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe001320002442018800001051770000010424 
+08006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240 +ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe +00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402 +065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe +000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc0002 +6c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe0003010000 +40fe0003e0000004fb0002400410ff000901000f8eff00f0ff005703000873b8fc00052800c8000240ef00030402067cfc00026c00c0ef000313800001fe00136000244201880800105177000001043408086002fe000163f0fe000301000040fe0006e0000004000010fe0002400410ff004c03000852b8fc00022800c8e9 
+000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe00 +00e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc0012400024000188 +08001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc0002 +2800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe00030100 +0040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc00124000 +2400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852 +b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe 
+000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc +001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c +03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd00 +0163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004a03000852b8fc00022fffc8e9000020fc00026c0040ef00 +010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc00001ffcff00f0ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010 +ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1 +000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc00 
+0004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f30000 +10fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff000901000f8eff00f0ff00028900028900028900028900028900028900a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.7\tab A typical graphical display from XBAP or SAP.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \par +2.11\tab Disassembling contigs\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Sometimes it is necessary to drastically alter contigs. We may need to break a contig in two, remove a single reading, remove a whole set of consecutive readings from a contig, or remove a set of readings from the database independent of which contigs they + are in. \par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.11.1\tab Removing a single reading\par +\pard\plain \s4\qj\sa120\sl280 \f20 This function is found in the "Alter relationships" menu. The user types in the number of the reading to be removed. If the reading is required to hold the contig together - i.e. is the only one cove +ring a particular region - the program will create an extra contig consisting of the data to the right of the removed reading. The original contig will be shortened accordingly.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.11.2\tab Removing a set of readings\par +\pard\plain \s4\qj\sa120\sl280 \f20 This function is called "Disassemble readings" and can remove any group of readings from a database. It works in two modes\: + 1. A set of adjacent readings in a contig can be removed by the user naming the two end ones (the left one first); 2. A set of readings from any number of contigs can be remove +d by the user giving the name of a file that contains their names. 
In both modes the program cleans up the database by moving data to fill up any holes made in the files.\par + + For both modes of operation the program requests a file of file names. If the user creates their own file (i.e. mode 2) each reading name must be on a separate line of the file. For mode 1 the user names the leftmost then the rightmost reading for removal. + They MUST be in left to right order. They and all intervening readings will be removed. For both modes, if necessary, new contigs will be created. \par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.11.3\tab Breaking a contig\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This function is found in the "Alter relationships" menu. It can be used to break a contig at the beginning of a particular reading so that the identified reading becomes the left end of a new contig. The user types in the number of the reading that will b +ecome the left end.\par +\pard \s4\qj\sa120\sl280 \par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.12\tab Shuffling pads\par +\pard\plain \s4\qj\sa120\sl280 \f20 One weakness of the assembly routine is that padding characters introduced to line up the readings are not always aligned with the pads in other sequences\: + a single problem such as a compression can give rise to pads apparently randomly arranged in the different readings covering the region. This function attempts to shuffle the pads around so that they align with one another, h +ence simplifying editing. No information is lost in the process\: only the positions of padding characters are changed. The function is best used prior to editing.\par +\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.13\tab Displaying a contig\par +\pard\plain \s4\qj\sa120\sl280 \f20 The "Display a contig" option shows the aligned readings for any par +t of a contig. Users select "Display a contig", then select the contig. The number, name and strandedness of each reading are shown and the consensus is written below.
A typical example, showing part of a contig from positions 3301 to 3450, is seen in figu +re 4.8. Overlapping this region are readings 3, 40, 8, 37, 35 and 2, with archive names L3.SEQ, A21A7.S1 and so on. Readings 3, 8, 35 and 2 are in reverse orientation as indicated by the minus signs. There are a few padding characters in the working versio +ns, but the consensus (shown below each page width) has a definite assignment for every position except 3376. \par +\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.14\tab Highlighting differences between readings and the consensus\par +\pard\plain \s4\qj\sa120\sl280 \f20 +During the latter stages of a project this option is used to highlight disagreements between individual gel readings and their consensus sequences. Typical output is seen in figure 4.9, which shows the result for the section of contig shown in figure 4. +8. Characters that agree with the consensus are shown as + symbols for the plus +strand and - for the minus strand. Characters that disagree with the consensus are left unchanged and so stand out clearly.
Note that a similar display is now more conveniently available within the contig editor.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Set the consensus cutoff score.\par +2.\tab Redirect output to disk.\par +3.\tab Display the contig.\par +4.\tab Close the redirection file.\par +5.\tab Select "Highlight disagreements".\par +6.\tab Define the name of the redirection file.\par +7.\tab Define an output file name.\par +8.\tab Select a symbol for good plus strand data.\par +9.\tab Select a symbol for good minus strand data.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \page \par +\pard\plain \li760\ri760\sl220\box\brsp100\brdrth \tqr\tx8240 \f4\fs16 10.\tab Print the file.{\plain \f20 \par +}\pard \li760\ri760\sl220\box\brsp100\brdrth \tqr\tx8240 \tab 3310 3320 3330 3340 3350\par +\pard \li760\ri760\sl220\box\brsp100\brdrth -3\tab L3.SEQ \tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par +40\tab A21A7.S1\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par +-8\tab A16A2.S1\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par +37\tab A21A2.S1\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par +\tab CONSENSUS\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par +\par +\tab 3360 3370 3380 3390 3400\par +-3\tab L3.SEQ\tab gatctgaccaagcgacag*tttaaa*gtgctgcttgccatt*ctgcgt*a\par +40\tab A21A7.S1\tab gatctgaccaagcgacag*gttaaagttgctgctt\par +-8\tab A16A2.S1\tab gatctgaccaagcgacag*tttaaa*gtgctgcttgccatt*ctgcgt*a\par +37\tab A21A2.S1\tab ga-ctgaccaagcgacag*tttaaa*gtgctgcttgccatt*ctgcgt*a\par +35\tab A16D12.S1\tab gttttaaa-gtgctgcttgccatttctgcgtaa\par +-2\tab L2.SEQ\tab t*ctgcgt*a\par +\tab CONSENSUS\tab gatctgaccaagcgacag*tttaaa-gtgctgcttgccatt*ctgcgt*a\par +\par +\tab 3410 3420 3430 3440 3450\par +-3\tab L3.SEQ\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par +-8\tab A16A2.S1\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par +37\tab A21A2.S1\tab aaacctatgggtgggaataaaccaatggacagaatcaccgattctcaact\par 
+35\tab A16D12.S1\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par +-2\tab L2.SEQ\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par +\pard \li760\ri760\sl220\box\brsp100\brdrth \tab CONSENSUS\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.8\tab Typical output from "Display contig".\par +\pard\plain \li840\ri940\sb320\sl220\box\brsp100\brdrth \f4\fs16 3310 3320 3330 3340 3350\par +\pard \li840\ri940\sl220\box\brsp100\brdrth -3 L3.SEQ --------------------------------------------------\par + 40 A21A7.S1 ++++++++++++++++++++++++++++++++++++++++++++++++++\par + -8 A16A2.S1 --------------------------------------------------\par + 37 A21A2.S1 ++++++++++++++++++++++++++++++++++++++++++++++++++\par + atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par +\par + 3360 3370 3380 3390 3400\par + -3 L3.SEQ -------------------------*------------------------\par + 40 A21A7.S1 +++++++++++++++++++g+++++gt++++++++\par + -8 A16A2.S1 -------------------------*------------------------\par + 37 A21A2.S1 ++-++++++++++++++++++++++*++++++++++++++++++++++++\par +-35 A16D12.S1 -t----------------------t------a-\par + -2 L2.SEQ ----------\par + gatctgaccaagcgacag*tttaaa-gtgctgcttgccatt*ctgcgt*a\par +\par + 3410 3420 3430 3440 3450\par + -3 L3.SEQ --------------------------------------------------\par + -8 A16A2.S1 --------------------------------------------------\par + 37 A21A2.S1 ++++++++++++g+++++++++++++++++++++++++++++++++++++\par +-35 A16D12.S1 --------------------------------------------------\par + -2 L2.SEQ --------------------------------------------------\par +\pard \li840\ri940\sl220\keepn\box\brsp100\brdrth aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 \par +\pard \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 Figure 4.9\tab Typical output from "Highlight disagreements", showing the results for 
the section of contig displayed in figure 4.8.\par +\pard \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.15\tab Screen editing contigs in SAP\par +\pard\plain \s4\qj\sa120\sl280 \f20 When using SAP the best way for users to edit a whole contig interactively is to use their preferred external editor on the standard display of a contig. When the screen edit function is selected SAP writ +es a text file containing a display of the contig and passes it to an external editor - say EDT on the VAX or emacs on a UNIX system. The user modifies the file using the editor and when the editor is exited SAP moves the changed contig back into the proje +ct database.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Screen edit".\par +2.\tab Select the contig to edit.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define a temporary file for use by the editor. After a slight pause the editor will start and the first page of the contig will appear on the screen.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Edit the contig using the editor's standard commands.\par +5.\tab Exit from the editor.\par +6.\tab Accept "Put contig back into the database".\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.16\tab Automatic editing of contigs in SAP\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This function automatically changes characters in gel readings to make them agree with the consensus sequence. At first sight this may seem like an unethical procedure but as is explained in the notes it is quite legitimate and saves a great deal of time. +In figure 4.10 we show the effect of using autoedit on the section of contig displayed in figure 4.8. All changed characte +rs (for example position 3369, reading A21A7.S1) are denoted by uppercase letters. Note that apart from position 3375 which has an unresolved consensus all other changes have been made.
These edits were made using a combined consensus for both strands, but + the standard version of the program treats each strand separately and will only make a change if the consensus for the two strands agree.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Redirect output to disk.\par +2.\tab Select "Display contig".\par +3.\tab Identify the contig to edit/display.\par +4.\tab Close the redirection file.\par +5.\tab Print the file containing the displayed contig.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Check the contig and the original films and annotate the printout to indicate the required edits.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Set the cutoff for the consensus calculation.\par +8.\tab Select "Auto edit".\par +9.\tab Identify the contig and the section to edit. \par +10.\tab The program will display a summary of changes made.\par +11.\tab Display the contig and compare it with the annotated printout.\par +12.\tab Use another editing method to finish the editing.\par +\pard\plain \li820\ri960\sl220\pagebb\box\brsp100\brdrth \f4\fs16 3310 3320 3330 3340 3350\par +\pard \li820\ri960\sl220\box\brsp100\brdrth -3 L3.SEQ atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par + 40 A21A7.S1 atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par + -8 A16A2.S1 atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par + 37 A21A2.S1 atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par + CONSENSUS atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par +\par + 3360 3370 3380 3390 3400\par + -3 L3.SEQ gatctgaccaagcgacagtttaaa*gtgctgcttgccattctgcgtaaaa\par + 40 A21A7.S1 gatctgaccaagcgacagTttaaagGtgctg\par + -8 A16A2.S1 gatctgaccaagcgacagtttaaa*gtgctgcttgccattctgcgtaaaa\par + 37 A21A2.S1 gaTctgaccaagcgacagtttaaa*gtgctgcttgccattctgcgtaaaa\par +-35 A16D12.S1 gtttaaa-gtgctgcttgccattctgcgtaaaa\par + -2 L2.SEQ tctgcgtaaaa\par + CONSENSUS gatctgaccaagcgacagtttaaa-gtgctgcttgccattctgcgtaaaa\par +\par + 3410 3420 3430 3440 3450\par 
+ -3 L3.SEQ cctatgggtggaataaaccaatggacagaatcaccgattctcaacttag\par + -8 A16A2.S1 cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par + 37 A21A2.S1 cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par +-35 A16D12.S1 cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par + -2 L2.SEQ cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par +\pard \li820\ri960\sl220\keepn\box\brsp100\brdrth CONSENSUS cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc{\fs22 \par +}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.10\tab The result of applying the "Auto editor" to the section of contig displayed in figure 4.8.\par +\pard\plain \s6\sb400\sa60\sl280\tx560\tx860 \b\f20 2.17\tab Using the original editor in SAP\par +\pard\plain \s4\qj\sa120\sl280 \f20 This simple editor can insert, delete + and change gel reading sequences by performing one selected operation at a time. It is used during the interactive entry of new readings and interactive joining of contigs. The commands request the position at which the edit is required and the number of +characters to insert, delete or change.\par +\pard\plain \s5\sb400\sa160\sl320\tx560 \b\f20\fs28 3. NOTES\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab +As each reading is entered into a project database it is given a unique number. The first is numbered 1, the second 2 and so on. Their original file names (known as "archives" because they are kept outsid +e the database and never edited) are also copied into the database. During assembly contigs are constantly being changed and reordered so the program identifies them by the numbers or names of the readings they contain. Whenever the program asks users to i +dentify a contig or reading they can type its number or its archive name. If they type its archive name they must precede the name by a slash "/" symbol to denote that it is a name rather than a number.
For example if the archive name is fred.gel with numb +er 99, users should type /fred.gel or 99 when asked to identify the contig. Generally, when it asks for the reading to be identified, the program will offer the user a default name, and if the user types only return, that contig will be accessed. When a da +tabase is opened the default contig will be the longest one, but if another is accessed, it will subsequently become the current default. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab An XBAP database is made from five separate files\: the "archive names" file *.ARN, the "relationships" file *.RLN, + the "sequences" file *.SQN, the "tag" file *.TGN, and the "comments" file *.CCN. If the database is called FRED then version 0 of database FRED comprises files FRED.AR0, FRED.RL0, FRED.SQ0, FRED.TG0 and FRED.CC0. The version is the last symbol in the file + names. If the "copy database" option is used it will ask the user to define a new "version". The normal strategy is to use version 0 for all work and to use other versions as backups. Program SAP uses databases formed from only the first three of these f +iles. Normally the program is used to handle DNA sequences but many of the functions also work on protein sequences. The choice of sequence type is made when the database is started.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab The vector sequence should be stored in a simple text file with up to 80 characters of data per line. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab +Almost all readings are assembled automatically in their first pass through the assembly routine. Those that are not can be dealt with in two ways. Either they can be put through assembly again as single named rea +dings (Users should type n when asked "Use file of file names"), with the parameters set to allow the reading in. 
Or they can be entered through the assembly routine using the "Put all readings in new contigs" mode, and then joined to the contig they overl +ap using the Contig Joining Editor. If it is found that readings are not being assembled in their first pass through the assembler, then it is likely that the contigs require some editing to improve the consensus. Also it may be that poor quality data is b +eing used, possibly by users overinterpreting films or traces. In the long term it can be more efficient to stop reading early and save time on editing. For those using fluorescent sequencing machines the unused data can be incorporated after assembly. +\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Obviously we cannot use a script to operate a program that expects to be controlled by mouse clicks! The program BAP is an xterm version of XBAP which can be used from a script.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab There is a remote possibility of a join being missed by the "Find internal + joins" routine. If a small contig is wholly contained within a larger one, such that its ends are further than ("Probe length" - "Minimum initial match length") from the ends of the larger contig, and the consensus for the small contig lies to the left of + the consensus for the large contig, the overlap will not be discovered. (See the search strategy.)\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab For those using fluorescent sequencing machines and XBAP the combination of the contig editor and the graphical displays of consensus "quality" will probably + be sufficient for checking and editing contigs as everything can be done at the computer screen.
For those using autoradiographs the facility to produce printouts of "display" and "highlight disagreements" options for use while checking films, and the aut +oedit command are most appropriate.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab +In general the quality of a reading deteriorates along the length of the gel and so it is also possible to use a length cutoff for the quality calculation. Only the data from the first section of each reading will be included in the calculation. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab +There are some limitations on the changes that can be made to the contigs when using the SAP screen editor. Alignments must be maintained during editing. Whole lines of sequence should not be deleted or added unless the order of the gel readings in the +contig is preserved. Each line in the contig display consists of gel reading numbers, their names and 50 character sections of sequence. Insertions are limited in the following way. No line of sequence can be extended rightwa +rds more than 5 characters beyond the end of a full length line (a full length line is 50 characters long). Only one character can be added to the left end of full length lines, but sections of sequence beginning further into a line can be extended leftwar +ds up to an equivalent position. Do not delete any non-sequence lines in the file. Before returning the contig to the database the program checks that the rules have been obeyed. If an error is found the number of the erroneous line in the file is displaye +d and the contig will not be changed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab +The following is a justification for using the auto edit function. 
The general strategy employed when collecting shotgun sequence data is to keep sequencing until the redundancy in the contigs is fairly high, and then to get a printout of a contig, che +ck problems against the films, note corrections on the printout, and make the changes using an interactive editor. In general the consensus is correct except for places where padding characters have been used to accomm +odate a single gel with an extra character, or where the consensus is dash. The important point for the auto editor is that most edits simply make the gel readings conform to the consensus, or remove columns of pads. The auto editor does the following. 1) + calculates a consensus for the contig (or part of a contig) to be edited, and then uses this consensus to direct the editing of the contig in 3 stages. 2) stage 1\: + find and correct all places where, if the order of two adjacent characters is swapped, they will both agree with the consensus (given that they did not match the consensus before). These corrections are termed "transpositions". 3) stage 2\: + find and correct all places where there is a definite consensus but the gel reading has a different character. These corrections are termed "changes". 4) stage 3\: + delete all positions in which the consensus is a padding character. These corrections are termed "deletions". All changed characters are shown in uppercase letters so it will be obvious which characters + have been assigned by the program (except for deletions). The number of each type of correction will be displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 11.\tab +The "calculate consensus" function, the "display contig" routine, the contig editor and the "show quality" option use the rules outlined here to calculate a consensus from aligned gel readings. The consensus sequence can contain any of 6 possible symbols\: + a,c,g,t,* and -.
The last symbol is assigned if none of the others makes up a sufficient proportion of the aligned characters at any pos +ition in the contig. The following calculation is used to decide which symbol to place in the consensus at each position. Each uncertainty code contributes a score to one of a,c,g,t,* and also to the total at each point. Symbols like r and y which don't co +rrespond to a single base type contribute only to the total at each point. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab Definite assignments i.e. A,C,G,T,a,c,g,t,b,d,h,v,k,l,m,n,a,c,g,t,* = 1; probable assignments i.e. 1,2,3,4 = 0.75; other uncertainty codes including r,y,5,6,7,8,- = 0.1. A cutoff scor +e between 51 and 100% is set by the user. (When the program starts this is set to 75%.) At each position in the contig we calculate the total score for each of the 5 symbols a,c,g,t and * (denote these by Xi, where i=a,c,g,t or *), and also the sum of the +se totals (denote this by S). Then if 100 Xi / S > the cutoff for any i, symbol i is placed in the consensus; otherwise - is assigned. For the "examine quality" algorithm each strand is treated separately but the calculation is the same. \par +12.\tab Databases can + become corrupted if the machine crashes so the programs contain a function "Check database for logical consistency" which checks to see if all the relational data is internally consistent. Some routines automatically perform this check before they start. + Users are advised to make frequent copies of their databases using the "Copy database" option. Note that if BAP is used in "execute with dialogue" mode the "Check logical consistency" function also creates a consensus for the whole database and scans it t +o find any regions which contain 15 dashes in 20 characters.
Such a finding would indicate problems with the database.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\pagebb\tx560 13.\tab +We have covered many of the most important or complicated operations performed by SAP and XBAP, but several others have not been mentioned. These include those for creation of consensus sequence files for processing by other programs, and complementing +contigs, both of which are trivial. There is also a set of routines for fixing corrupted databases.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 14\tab The VAX version of SAP will only a +llow one person to access a sequencing database at a time - producing an "unable to open database" error message if a second person tries. On UNIX machines there is no such check in program SAP so users need to make sure that simultaneous use does not occu +r. Otherwise the data will be corrupted. Program BAP prevents more than one person from using a database at any time. It does so using the following mechanism. When a user requests to open a particular copy (say 0) of a database (say DB) the program checks + for the existence of a file named DB_BUSY0 in the current directory. In normal circumstances, if the file exists, it indicates that somebody else is currently using the database and the program displays the message "Sorry database busy" and does not open +the files. If the file does not exist the program creates it and opens the database. When a user stops using the database (usually by quitting the program) the "busy file" is deleted, hence allowing others to use the database. If the program terminates abn +ormally the busy file will not be deleted and so the database will not be usable until the busy file is explicitly deleted using the rm command.
Obviously it is dangerous to delete the file before checking if another user is using the database.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 15\tab +After a run of the assembly routine, reading names can appear in the file of failed reading names for the following reasons. 1. The reading file was not found; 2. the reading file was too short (less than the minimum match length); 3. the reading appear +ed to matc +h somewhere but failed to align sufficiently well (too many padding characters or too high a percentage mismatch); 4. a reading of the same name was already present in the database; 5. the reading was entered but also appeared to match another contig and t +he join was not made. This can occur for two reasons\: a. because the overlap between the two contigs was too large, or b. because after the reading is entered into one contig a new consensus is calculated and compared to the other contig\: + it may then not match as well as it did originally, and the join will not be made.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16\tab +We have recently devised our own file format (called SCF) for storing traces, sequences and confidence values for data produced by automated sequence readers (Dear and Staden, 1992). For ABI data these typically reduce the storage required to 30% of the + original. Data from the ABI 373A and the Pharmacia A.L.F. can be converted to this form using the program makeSCF. Note that A.L.F. files must first be processed by program alfsplit which s +plits the original data into one file per reading. Sequences can be extracted from SCF files in a form suitable for assembly by use of the program trace2seq. To locate and mark regions of a sequence from an automated sequence reader that are of too low a q +uality to be used for assembly we use the script clip-seqs. This script takes as input a file of reading file names. 
For each reading it renames the original file "original-filename~" and writes a new file called "original-filename" in which the poor quali +ty regions are marked.\par +\pard\plain \qj \f4\fs16 {\plain \f20 \par +}\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 17\tab The oligo selection engine is the one used in the program OSP. It is described in some detail in\: + Hillier, L., and Green, P. (1991). The parameters controlling the selection of oligos can be changed in the "Oligo Selection Parameters" window. The weigh +ts controlling the scoring of selected oligos can be changed in the "Oligo Selection Weights" window. By default, the oligos are selected from a window that extends 40 bases either side of the cursor. The size and location of this +window relative to the cursor position can be changed in the "Parameters" window. In XBAP oligos are ranked according to their proximity to the cursor position, rather than by their scores. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 18\tab For simplicity, each reading is considered to represent a template. In practice, many readings can be made from the same template. Suitable templates that are identified are those that\: + 1. are in the appropriate sense, 2. have 5' ends that start upstream of the oligo, and 3. are sufficiently close to the o +ligo to be useful. This last criterion relates to the insert size for the subclones used for sequencing and the average reading length. A template is considered useful if a full reading can be made from it, taking into account both of these factors. The d +efault insert size is 1000 bases, and the default average reading length is 400 bases. These values can be changed in the "Parameters" window. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1982.
Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. {\i Nucl. Acids Res}. {\b 10 }(15)\:4731-4751.\par +2.\tab Staden, R. 1990. An improved sequence handling package that runs on the Apple Macintosh. {\i Comput. Applic. Biosci}. {\b 6}, 387-393.\par +3.\tab Dear, S. and Staden, R. 1991. A sequence assembly and editing program for efficient management of large projects. {\i Nucl. Acids Res}. {\b 19}, 3907-3911.\par +4.\tab Hillier, L., and Green, P. 1991. "OSP\: an oligonucleotide selection program," PCR Methods and Applications, {\b 1}\:124-128. \par +5.\tab Dear, S. and Staden, R. 1992. A standard file format for data from DNA sequencing instruments. DNA Sequence, {\b 3}, 107-110.\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 5. Analysing Sequences to Find Genes\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 2.1\tab The uneven positional base frequencies method.\par +2.2\tab The positional base preferences method\par +2.3\tab The codon usage method\par +2.4\tab Searching for open reading frames\par +2.5\tab Searching for tRNA genes\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 We outline three methods for finding protein genes and one for locating tRNA genes, plus routines for finding open reading frames and displaying the p +ositions of stop codons. All the methods are contained in the program NIP. The correct interpretation of the analyses presented requires a good understanding of the underlying ideas used by the methods. Despite this, we concentrate here on the use of the te +chniques and refer the reader to earlier publications (1-5) for more background information. 
\par +\pard \s4\qj\sa120\sl280 The assumption made by the methods for finding protein genes is that protein coding regions, when analysed in terms of three-letter nonoverlapping "words", will look +different to noncoding regions analysed in the same way. Suppose we analyse a sequence in one reading frame and count its codons. Then we define the "positional base composition" as the frequency at which each of the four base types occupies each of the th +ree positions in codons. In coding regions the positional base frequencies will be less random than they are in noncoding regions. This is the basis of method 1, + the "Uneven positional base frequencies method". If this reading frame is coding for a protein, + the positional base composition will tend towards a particular bias which is common to the majority of genes. This is the basis of method 2, the "Positional base preferences method". If the sequence has a very biased base composition then in protein genes +this may affect the choice of amino acids, and will affect the use of bases in the third positions of codons. This bias is also utilised by the positional base preferences method. Finally, if the reading frame is coding for a protein, its use of codons is al +so likely to be nonrandom and this is the basis of method 3, the "Codon usage method".\par +\pard \s4\qj\sa120\sl280 +All the methods perform their analyses over segments of the sequence of size "window", and then move the window on by three bases and repeat the calculation. The "Uneven positional base frequencies" method only produces a single value for each segment and +hence cannot distinguish between frames or strands - it only measures the probability that a region is coding and nothing more. The other two methods produce different va +lues for each of the three potential reading frames and hence can help to decide which is coding. Their results are plotted in three separate boxes arranged one above the other. 
For these we also indicate which of the three reading frames is the highest sc +oring at each position along the sequence. This is done by plotting a single dot at the mid-height of the box that contains the highest score, so that if one frame is the highest scoring for many consecutive positions, the dots will produce a solid line at + the mid-height of its box. We also mark the positions of stop codons. These are represented by short vertical lines and are positioned so that they bisect the mid-height of each box. Start codons are marked at the base of the box for each reading frame. +\par +\pard \s4\qj\sa120\sl280 The search for tRNA genes involves looking for segments that could fold into the cloverleaf structure and which have the expected conserved bases in the appropriate positions.\par +\pard \s4\qj\sa120\sl280 Notice that we have not mentioned searches for relevant "signals" like promoters +or splice junctions, which are also useful for finding genes. These searches are described in the chapter on searching for motifs. In the current chapter the only "signal" we include is the stop codon. However, as all results are presented graphically, it is +easy for users to overlay the displays of signal searches with those presented here and so effectively combine them.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab The uneven positional base frequencies method.\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method produces a single value for each segment of the sequence, and wou +ld give the same result if applied to each reading frame or to the complementary strand. The results are plotted in a box that is cut by a horizontal line. This line is labelled 76% and we expect 76% of noncoding sequences to score below this line and 76% +of coding sequences to score above it. 
Of the methods described this one makes the fewest assumptions and so is a good unbiased indicator of the probability that a sequence is coding.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Uneven positional base frequencies".\par +2.\tab Define "Odd window length". \par +3.\tab Define "Plot interval".\par +\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 5.1. In the example shown the 5' end of the sequence codes for several proteins and the 3' end codes for ribosomal RNAs.\par +\pard\plain \li100\sb300\sl160\keepn \f4\fs16 {\plain \fs16 {\pict\macpict\picw436\pich41 +1103ffffffff002801b31101a00082a0008c01000affffffff002801b3070000000022000100010000a000a0a100a400020de801000a00000000000000000700010001220027000100da23000021000101b22300002300262300002100270001230000a000a301000affffffff002801b32300da21000101b2230026210027 +0001a000a12000170001001701b2220025000100df2300032300062301002300fb2300fd2300022300fe2302032300ff2300002300fe2300fd2301002300032300002300fd2302022300042300002300052300002301fd2300002300032300002300012302fd2300fd2300002300fd2300ff2301fe23000023000023000223 +00062302012300fc2300032300012300002301052300062300fa2300f82302ff2300fb2300002300002300002301002300002300002300002300002302002300002300032300052300012301092300ff2300042300022300002302042300fa2300fc2300fe2300022301002300fd2300002300002302032300002300fe2300 +ff2300012300002300022301012300fc2300062300012300032302ff2300002300032300fe2300022301082300f92300fd2300032302022300032300fb2300fa2300002302ff2300fe2300002300fc2300fe2301f92300fb230000230000230000230200230000230000230000230000230100230000230000230000230003 +2302fd2300002300002300002301002300002300002300002300002302002300002300002300002300002301002300002300002300032300032302082300fa23000723000223000423010b2300f82300fd2300fa2300fc2302fe2300fd2300fc2300fe23010023000023000023000023000023020023000023000023000023 
+00002301002300002300002300002300002300022300fe2302002300002300002300002301002300002300002300002300002302022300002300fe2300002300052301002300fb2300002300062300fc2302032300012300002300fc2300032301002300012300ff2300012300032302022300fb2300fd2300ff2302fe2300 +032300ff2300042300ff2301fb2300002300002300002300002302002300002300022300032300fe2301002300052300012300032300ff2302042300052300042300002300082301fb2300fc2300fb2300fa2302ff2300fd2300002300fa2300012301ff2300042300ff2300032300012302fc2300012300032300fd230006 +2301032300032300032300022300062300fd2302fb2300032300f92300002300002301fb2300f72300022300002300002302002300fe2300ff2300fe2300002301002300052300fe2300ff2300002302fe2300002300002300022300072301fa2300032300ff2300042302ff2300fa23000023000323000323010023000623 +00fd2300032300fb2302032300ff2300012300fd2300052302042300fa2300fd2300ff2300002301f82300002300032300fd2300002302002300002300002300002301032300092300062300022300032302fd2300fa2300fe2300fd2300ff2301fd2300fb2300062300022300fe2302fc2300012300062300fc2300032301 +032300fe2300002300022300032302002300032300002300032300fe2300022301fe2300052300032300fe2300022302fa2300032300fa2300fe2300002301062300032300ff2300fe2302fd2300ff2300f22300022300fb2301022300fe230000230000230000230200230000230000230000230000230100230000230000 +2300002300002302002300002300002300002300022301062300012300032300002302fd2300022300062300042300fd2301002300fd2300032300fc2300fd2302fb2300002300fa2300022300002302fe2300002300002300002300002301002300002300002300002300052302042300022300002300fe23000323010323 +00002300002300022302fd2300002300fd2300fe2300ff2301032300062300012300032300f72300002300fa2302062300032300fd2300fd2301ff2300fe2300002300fc2300fe2302002300002300002300002300002301002300022300032300012300002302ff2300042300002300082300042301032300052300fa2300 
+012300ff2302fe2300fc2300042300032301002300fd2300fa2300062300002302fd2300ff2300032300fa2300fe2301092300fc2300fe2300032300002302052300fa2300fe2300fc2300032301fd2300002300002300fa2300fe2302032300fd2300052300032301032300f82300032300fc2300072302062300fa230005 +2300002300032302fd2300fe2300f92300fe2300ff2301012300ff2300fb2300032300002300052302012300ff2300fd2300002300032301002300012300fa2300022300fb23020023000023000023000023000023010023000023000023000023000023020023000023000023000023000023010023000023000023000323 +02032300022300012300062300fd2301032300f92300012300ff2300032302fb2300ff2300012300fd2300032301fd2300022300002300032300042302fd2300062300032300fd2300002301ff2300042300032300002302ff2300f823000b2300fb2300f92301002300002300fe2300fd2300fd2302ff2300fe2300002300 +002300002301002300002300022300002300fe2302002300002300002300062300022302012300ff2300fd2300012300022300002301fd2300012300022300fe2300ff2302fd2300fe230000230000230000230100230000230000230000230000230200230000230000230000230100230000230000230000230000230200 +2300002300002300002300002301002300002300002300002300002302002300002300002300002300002301002300002300002300002300002302002300002300002300022301fe2300002300002300002300002302002300002300002300002300002301002300022300032300062300fd23020423000323000323000023 +00fc2301fe2300022300fb2300fc2300fe2302fd2300002300002300002301002300002300002300002300002302002300002300002300002300002300002300002301002300002300002300002300002302002300002300002300002302002300002300002300002300002301002300002300002300002300002302002300 +002300002300002300002301002300002300002300032300032302fa2300032300ff2300fe2300022301092300012300062300ff2302002300012300062300022300fb2301002300002300fd2300032300fd2302fd2300032300052300062300fd2301fe23000a2300fb2300032300fd230203230003230000230000230000 
+2301fc2300fb2300fd2300f82302002300022300002300012300032301022300032300012300022300002302002300f82300082300fa2300012301002300052300002300fe2300002300042302fe2300fe2300ff2300032300032301002300fb2300072300002300fd23020423000023000023000023000023010023000023 +00002300002300002302002300002300002300fd2300ff2302042300002300002300fd2301022300fd2300fe2300fa2300032302002300002300002300fe2300022301002300032300032300f823000a2302012300fd2300002300032300002301ff2300fe2300002300002300032302002300002300fd2300032301002300 +fa2300032300002300ff2302032300012300f82300002300002301022300032300032300002300002302002300002300fa2300fe2300052301fd2300fd2300032300002300022300002302002300012300032300fd2300022301012300fd2300ff2300032300fb2302fe2300022300002300052300f8230101230000230002 +2300002300022302002300fb2300fd2300012301ff2300012300032300042300fb2302032300fb2300032300042300fc2302fa2300032300002300002300072301f92300022300012300fa23000a2302042300002300ff2300fd2300002301f52300012300fd23000223020023000023000123000223000323010523000023 +00032300012300002302002300002300002300f52300002301ff2300032300052300fe2300fe2302ff2300fd2300002300062300062301002300ff2300002300fd2302032300f92300ff2300032300fd2300002300fd2301042300ff2300fb2300002300022302002300fe2300032300ff2300012301022300fd2300032300 +032302fe2300072300fd2300012300002301002300fd2300062300002300fd2302fb2300002300ff2300fd2300002301042300fd2300ff2300012300022302002300fb2300fd2300052300042301fc2300012300ff2300012302fd2300032300fa2300062300022302002300fe2300022300fb2300022301fe230003230005 +2300fd2300032302022300fb2300fd2300002300042301fc2300012300082300032300002302002300002300ff2300002301012300002300002300fd2300fd2302002300022300fe2300052300f62301ff2300fe2300ff2300062300032300fe2300022302fb2300fd2300062300042301fe23000323000323000023000023 
+02ff2300fb2300002300032300022301f92300fa2300fc2300fb2300002302ff2300072300002300ff2300002301fe2300062300022300092300fd2302002300022300fe2300032301022300002300012300002300fc2302fe2300062300fc2300042300002301002300002300002300002300ff2302fe2300032300fc2300 +012300032301002300002300ff2300002300002302fe2300002300ff2300042302fc2300042300002300ff2300fd2301012300ff2300042300002300002302002300002300002300002300002301002300002300002300fd2300002300022302012300fd2300022300fb2300fb2301032300fc23000b2300fd230000230204 +2300ff2300fe2300022300fd2301fe2300002300fd2300002300fd2302fe2300032300fc2300032300fe2301ff2300042300fc2300012302002300002300022300062300fd2301032300052300fd2300012300ff2302012300fd2300fe2300022300062301fc2300032300fb2300022300fc2302042300002300fb2300fa23 +00002301fb23000623000a2300012302022300fe2300032300fa2300022301fe2300fe2300ff2300fb23000a2302002300012300022300012300002302002300fd2300fd2300032300fa2301012300ff2300012300fd2300052302002300fb2300052300fb2300ff2300fe2301022300032300032300fe2300022302002300 +002300002300032300002301fd2300fe2300f72300f923000023020123000623000b2300002301002300fe2300002300fd2300022302fe23000b2300fa2300022300032301fd2300f92300072300042300002302002300ff2300012300fd2300fa2301002300012300042300032300fb2302fe2300fd2300032300022301fd +2300052300042300ff2300012302002300002300002300fc2300042301fd2300022300fd2300fe2300fe2302fd2300032300002300fa2300052301032300002300002300fa2300012302032300052300022300002301fd2300fe2300fe2300002300022302fa2300fe2300fa2300032300022300fe2300032302fa23000623 +00ff2300032301fa2300032300032300012300ff2302fe2300052300052300f92300022301fb2300062300ff2300012300002302052300fb2300fc2300002300fb2301032300062300072300012300fa2302fd2300082300f82300fd2300002301012300072300fc2300042302fe2300fa2300f82300022300042301022300 
+fe2300022300002300fe2302002300fd2300002300f92300fd2301092300002300092300012300022302002300032300002300032300002301002300f52300fa2300032302fd2300002300fc23000923000b2301f92300002300fd2300fa2300fd2302062300052300012300ff2300032300fb2301072300042300002300ff +2300fe2302022300012300002300002300002302002300002300fc2300fea0008da00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.1.\tab Example output from the uneven positional base frequencies method. The 5' end codes for proteins and the 3' end contains ribosomal RNA genes.\par +\pard\plain \s6\sb360\sa60\sl280\tx560\tx860 \b\f20 2.2\tab The positional base preferences method\par +\pard\plain \s4\qj\sa120\sl260 \f20 As a result of the genetic code and the relative frequencies with which amino acids are used in proteins, DNA sequences codi +ng for proteins have a particular bias in their positional base frequencies. This method scans DNA sequences and measures the closeness of each reading frame to this bias in their positional base frequencies. The closeness to the expected bias is expressed + as a \: +"score". By default the program will use a "global" set of expected values for the positional base frequencies which are derived from average amino acid compositions in known proteins. Alternatively users may create their own set of expected values +by analysing known genes from the same genome. In addition users can combine the "global" values for the first two positions in codons with third position values derived from other genes of the same genome.\par +\pard \s4\qj\sa80\sl260 +In order to use a nonglobal standard, a codon table in the format described in the chapter on statistical analysis of nucleic acid sequences, can be created using the method "Creating a codon usage file". Alternatively a section of the sequence being analy +sed can be scanned to produce an internal standard. 
The method is particularly useful for selecting which reading frame is coding.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.2.1\tab Using the global standard\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Positional base preferences method".\par +2.\tab Select "Standard source" as "Global".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Window length". The default length of 67 should be used for most cases. Shorter windows give noisier plots and the longer the window the more chance there is of missing a short exon.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Plot interval".\par +\pard\plain \s4\qj\sa120\sl260 \f20 The plot will appear as in figure 5.2. This shows a 10,000 base section of sequence tha +t codes for several proteins in each of the three reading frames. See the introduction for an explanation of the plotting scheme used.\par +\pard\plain \s8\qj\fi-1140\li1140\sb300\sa120\sl240\keepn\tx1140 \f21\fs20 {{\pict\macpict\picw447\pich225 +0d7effffffff00e001be1101a0008201000affffffff00e001be090000000000000000310000000000df01bd98002400000000008d012000000000008d011f0000000000df01bd000102dd0006007fdfff00fc140040ed000e01f000e1ffffebffff87ff83d40004140040ed000e0110009200002a00008800425c00041400 +40ed000e2908009200002c00007800442a0004140040ed000e5a08008c0000140000400044220004170040f3000008fc00068608010c000010fd000324220004170078f3000008fc0002860501fe000010fd000328020004130040f3000008fc0002800701f9000328020004130040f3000014fc0002800101f90003300100 +04150040f30008140000100000800087f9000310010004150040f300081400001800010000a4f9000310010004130040f300081400002400010000e4f700018004130040f30008240000240001000018f700018004130040f30008220000240001000018f700018004130040f30008220000220002000018f7000180041300 
+40f30008220000420002000008f700018004140040fa000002fb0005210000420002f400018004140040fa000002fb000541000042001cf400014004160040fa000003fd0007440041040042001cf400014004170040fb00011003fd0007cc00410c00410024f4000140041d1476befc5eafdbeff59adfb1e0d6ddbbc5ad0f +e1bd24f600031000f7bc1d1476befc5eafdbeff59edfb1e0d6fdbbc5ff0fe1bd20f600031000f7bc1d1476befc5eafdbeff59affb1e0d6ffbbc5ff0fe1bd40f600031000f7bc1b1476befc5eafdbeff59fffb1e0d7ffbbffffdfffbdbff4ff01f7bc1a014008fd000e2288080a0000010380800299008180f4000120041901 +400efd000d2288100a00000102810000690081f30001200419016016fd000d5588100200000102810000650081f30001101419016012fd000d5588100100000100410000050081f30001101419016022fd000d9508100100000100410000050081f3000110141a026021b0fe000d8d08100100000600410000030081f30001 +102c1a136041c80000030808200100000600410000020041f300010c2c1a1350410e2800030806200100000a00210000020041f300010c6c1a135081015900020006200100000800210000020049f300010aec190e5081008700040005a0014200080022fd000036f300010304180e4880000700040005a000c200080022fd +000012f2000004140e48800004c0240001a000c600080022ed000004140045fe000aa03400004000a600080012ed000004140045fe000aa048000040002602c8001aed000004140045fe000a1048000040002a03480014ed00000413007dfe00011080fd00041a04480014ed000004120043fe000011fc00041904280010ed +000004100042fe000015fc0002016428eb000004100042fe00000dfc0002016430eb0000040f0042fe00000afc00010198ea0000042523400a00000a44013c4001109a0034842208e0400200808100020806088001c094080800042501400afe001e44013c4001109a0034842208e0400200808100020806088001c0940808 +000406007fdfff00fc0a0040fb00000ce60000040a0040fb00000ae60000040a0040fb000012e60000040a0040fb000011e60000040b0040fc00010191e60000040b0078fc000101a1e60000040b0040fc00010941e6000004100040fc0002094080fc000010ed000004100040fc00020a0080fc000010ed000004100040fc 
+00020e0080fc000018ed000004100040fc00020e00c0fc00001ced000004100040fc00020e0040fc000024ed000004130040fc00020a0040fc000324000002f0000004130040fc0002080040fc000322000006f0000004130040fc0002100020fc000322000006f0000004130040fc0002100020fc000322000005f0000004 +130040fc0002100020fc000322000009f0000004140040fd000318100020fc000322000009f0000004170040fd000318100020fc0006220000090000c0f300000425235dea924fb4a5900076f67fdddb6f23effd311f5fe9f8769dc2bbc579fa7e5fd7e7f7fd7c25235dea924fb4a6600076f67fdddb6f63effd311f5fedf8 +769dc2bbc579fa7e5fd7e7f7fd7c25235debd24fb4a6600076f67fdddb6f63effd209f5fedf8769dc2bbc579fa7e5fd7e7f7fd7c25045debfa4fb4feff1bf6f67fdddb6f7feffd3f9f5feff8769dc2bbc579fa7e5fd7e7f7ff7c1b1440010800004020001000003000004100002080021af4000102041e1740020400004000 +001000002a00004100002080021a000004f7000102041e1760020700004000000800002e000040800020800419400004f7000165042017600205000080000008200022002040c0004080840140000af90003100055a4201760040100008000000a680041805280c0024040c400a4000bf9000310004d6420045004008001fe +000f0eac0041825280c00340413800a6000bf900032800806c20045008008001fe000f079200418355802803804100002a0011f900032800803c21045008008001fe001002120081834d803402004100001a0030c0fa00032400801421044808008002fe001002118180458d003404004200001a0040c0fa00034400801421 +044808008002fc000e41005c8000030400420000110080a0fc000540004400800421044808004002fc000e6100548000029400240000010080a0fc0005a0004400800421044410004004fc0002220064fe000894002c000001010020fc0005a0004201000423044410004004fc0002220020fe001358001000000101002400 +0010000120008201000420044210002004fc000012fc000068fd000e0101002c00001000011000820100041f044210003014fc000012fc000060fc000d82003c00001800021000820200041f047a20000818fc00001efc000060fc000d82000200081c02021000820200041f044220000838fc000010fc000020fc000d8200 
+0200142c020410110204000417044240000828f0000d44000200146205040a1101080004170442400008e8f0000d4400020022a315040a1101880004160342400005ef000d48000100e2a335040d2a01700004250642d00307440c06fe001910a040025000c00000040800340401018100c8a4456a21741304250643d00304 +440c06fe001910a040025000c00000040800340401fe010088dc45ac2074130406007fdfff00fc0a0043fe000008e30000040a0043fe000008e30000040a0043fe000008e30000040e044280000414fc000010e90000040e044280000494fc000018e900000410047a80000776fc0002188020eb00000410044280000756fc +0002148030eb00000410044480000402fc0002278030eb00000410044440000802fc0002278048eb00000410044440000801fc0002264048eb00000410044440000801fc0002224048eb00000410044440000801fc0002224048eb00000410044440000801fc0002204048eb00000410044820001001fc0002205848eb0000 +0410044820002001fc0002405948eb0000041105482000400080fd0002402588eb0000041105482000400080fd0002c02588eb0000041105481000800080fd0002800588eb0000041205501000800040fe000301000684eb0000042523701fefe001cb3d2bffeb00020629f73b0ef1c60fef7ddff6f7dfe5f75e54fbacfd37 +34fc2523701fefe001cb3d2bffeb00020629f73b0ef1c61fef7ddff6f7dfe5f75e54fbacfd3734fc2523701fefe001cb3d2bffeb00020229f73b4ef1c62fef7ddff6f7dfe5f75e54fbacfd3734fc25237fffefffffcb3d2bffebfffffe29f73b4ef1c67fef7ddff6f7dfe5f75e54fbacfd373dfc1a05501001000040fe0003 +01000002fe000360000044f3000109b41a05600801000020fe000301000002fe000360000042f300010ab41a05600805000020fe000302000001fe000360000042f300010e141a05600806000020fe000302000001fe000390000042f3000116041a0560080800002cfe000302000001fe000398000082f3000110041a0540 +0908000014fe000012fe000681000088400082f3000110041a05400d08000014fe00001afe000682c00084c00081f3000110041a05400308000002fe00001afe000682400104c00081f3000110041a094002f00000020000082afe000682420103200101f3000120041a0940009000000300003426fe00069c250100200101 
+f3000120041b09400080000001000054a4fe000664248100200201f400022020041f0040fc0003c0004364fe000c60288200204201000004000002fa00023040041e0040fc0002a00043fd000c4018820020a40100000c000102fa00023040041e0040fc0002a00040fd000c4010620020a40100000a000102fa0002314004 +1c0040fc0002200080fb000a6200132801000012000142fa00023140041f0078fc0002100080fb000a24001318010000121002cdfd00050800004a80041f0040fc0002100080fb000a14001b10010000111802adfd00050800004a80041f0040fc0002080080fb000a14001c0000880021e802b5fd0005140000ca80041f00 +40fc00010801fa00141400040000740020280c35803400001400014400041f0040fc00010c01fa00141800040000440040240c04803600001400010400042523400000aa0a020ec280020c801021001c809050009204c405501c846aee0573625284900c2523400000aa0a020ffe80020c801021001c8090500091ffc407f0 +1cffebfffff3fffe84900c06007fdfff00fc02dd00a00083ff}}\par +\pard \s8\qj\fi-1140\li1140\sa120\sl240\tx1140 Figure 5.2\tab Example output from the positional base preferences method. Most of the sequence is coding for proteins.\par +\pard\plain \s9\fi-560\li860\sb400\sa60\sl280\tx1140 \b\f20 2.2.2\tab Using a nonglobal standard\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Make an appropriate codon usage file as described in the chapter on statistical analysis of nucleotide sequences.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Positional base preferences method".\par +3.\tab Select "Standard source" as "Codon usage table".\par +4.\tab Define "File name of standard". The file will be read and displayed on the screen.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab +Select "Normalisation" as "Combine with global standard". This alternative means we will use the values for the first two positions of codons combined with the third position values from our codon table. Otherwise ("Use observed frequencies") will use a +ll three positions from our codon table. 
The positional base frequencies to be used will be displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Use 1.0 for positional weights". The alternative allows users to +give greater or lesser emphasis to any of the three positions by defining weights for each. The program displays the "Expected scores per codon in each frame".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Window length". Windows shorter than the default of 67 may be useful if the bias is sufficiently strong. Look at the "Expected scores per codon in each frame" to help decide.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Plot interval".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Accept "Plot relative scores". This means that for each frame we plot its score divided by the sum of the scores for all three frames. It produces + smoother plots than the alternative "Plot absolute scores", which simply plots the scores for each frame. The minimum and maximum expected scores for the given standard and window length are displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Accept "Leave scaling values unchanged". The expected scores just displayed will be used to scale the plots. If required the user can change the scaling values at this point.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 The plot will now appear as in figure 5.2. Typical dialogue is shown in figure 5.3.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab The codon usage method\par +\pard\plain \s4\qj\sa120\sl280 \f20 The codon usage meth +od scans along a sequence and measures the closeness of each reading frame's codon composition to an expected set of codons. Of the methods described it is the most sensitive, but consequently has to make the strongest assumption, namely that we know the ap +proximate codon usage for the genes being searched for. 
The codon usage will depend on the codon preferences and the amino acid composition of the protein product. For this reason the program contains three methods of "normalisation". The table of codon us +age may be used as read ("Observed frequencies"); the table may be transformed to reflect an average amino acid composition ("Normalise to average amino acid composition"); or the table may be transformed to have no amino acid bias ("Normalise to no amino acid bia +s"). The table can be read from a file produced by "Creating a codon usage file" as described in the chapter on statistical analysis of nucleic acid sequences, or the user can define a region of the current sequence to serve as an "internal standard". In t +he latter case the program will calculate the codon usage for the defined region.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Codon usage method".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Reject "Define internal standard". If an internal standard is used the program will ask for the end points of the segments over which to calculate the codon usage.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "File name of standard". The file will be read and displayed on the screen.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Normalisation" as "Average amino acid composition". The program will display the expected values for each reading frame for the window lengths 21, 31 and 41 codons. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Window length".\par +6.\tab Select "Plot interval".\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The plot will appear as in figure 5.4. This shows a 10,000 base section of sequence that codes for several proteins in each of the three reading frames. 
See the introduction for an explanation of the plotting scheme used.\par +\pard\plain \li1840\ri1980\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Positional base preferences method to find protein genes\par +\pard \li1840\ri1980\sl220\box\brsp100\brdrth Select standard source\par +X 1 Use global standard\par + 2 Use internal standard\par + 3 Use codon usage table\par +? Selection (1-3) (1) =3\par +? File name of standard=atpase.cods\par + ===========================================\par + F TTT 21. S TCT 33. Y TAT 15. C TGT 5.\par + F TTC 55. S TCC 40. Y TAC 40. C TGC 4.\par + L TTA 8. S TCA 7. * TAA 8. * TGA 0.\par + L TTG 19. S TCG 12. * TAG 1. W TGG 17.\par + ===========================================\par + L CTT 22. P CCT 17. H CAT 6. R CGT 73.\par + L CTC 21. P CCC 4. H CAC 30. R CGC 23.\par + L CTA 1. P CCA 10. Q CAA 19. R CGA 5.\par + L CTG 168. P CCG 48. Q CAG 80. R CGG 3.\par + ===========================================\par + I ATT 47. T ACT 14. N AAT 17. S AGT 8.\par + I ATC 98. T ACC 54. N AAC 52. S AGC 26.\par + I ATA 6. T ACA 7. K AAA 85. R AGA 0.\par + M ATG 75. T ACG 13. K AAG 28. R AGG 0.\par + ===========================================\par + V GTT 67. A GCT 56. D GAT 41. G GGT 90.\par + V GTC 29. A GCC 53. D GAC 66. G GGC 66.\par + V GTA 49. A GCA 59. E GAA 101. G GGA 5.\par + V GTG 57. A GCG 64. E GAG 41. G GGG 8.\par + ===========================================\par +Select normalisation\par +X 1 Use observed frequencies\par + 2 Combine with global standard\par +? Selection (1-2) (1) =2\par + T C A G Range\par + 1 0.177 0.211 0.277 0.336 0.159\par + 2 0.271 0.238 0.310 0.182 0.128\par + 3 0.242 0.301 0.168 0.289 0.132\par +? Use 1.0 for positional weights (y/n) (y) =\par + Expected scores per codon in each frame\par + 0.785 0.736 0.736\par +? odd span length (31-101) (67) =\par +? plot interval (1-11) (5) =\par +? 
Plot relative scores (y/n) (y) =\par +\par + Minimum maximum range\par + 0.3219 0.3519 0.0214\par +\pard \li1840\ri1980\sl220\keepn\box\brsp100\brdrth ? Leave scaling values unchanged (y/n) (y) =\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.3\tab +Typical dialogue from the "Positional base preferences method" using a nonglobal standard in the form of a codon table to specify the values for the third positions in codons.\par +\pard\plain \s6\sb400\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Searching for open reading frames\par +\pard\plain \s4\qj\sa120\sl280 \f20 This routine finds all open reading frames of some minimum length and writes its results in the form of an EMBL feature table. \par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find open reading frames".\par +\pard\plain \sl220\keepn \f4\fs16 {{\pict\macpict\picw442\pich218 +0f42ffffffff00d901b91101a0008201000affffffff00d901b9090000000000000000310000000000d801b898002400000000008d012000000000008d011f0000000000d801b8000102dd0006007fdfff00fc1e0040fb000ef0fe00f26100dc0e004000180ffa40fe00020ffdc0fa0000041f0070fc000f01110101159180 +a412004000280906c0fe0002100240fa0000041f0040fc000f011101010d92808232004008280802a0fe0002100240fa0000041f0040fc000f02110101080a81027200c008288801a0fe0002100040fa0000041f0070fc000f02110081080a89037108d49425900120fe0002200040fa0000041f0040fc000f02090082000a +8900a118d59445700120fe0002200040fa0000041f0070fc0015040a00c200048900a114dd9446700120000003c00020fa0000041f0040fc0015040a002200048a002124db5446100020000002000020fa0000041f0040fc0015040a002e00044a000123526282100020000002000020fa0000041f0070fc0015040a001000 +004a000121226280000020000002000020fa000004220040fc000b040a001000005a0001212223fe000920000002000020000008fd000004210040fc00010404fd00055c0000a02003fe000920000002000020000018fd000004210070fc00010404fd0005540000a02003fe000910000002000020000018fd000004200040 
+fc000004fc0005740000a02001fe000910000002000020000018fd000004210070fc000008fc0005500000c02001fe000e100000040000200000180020000004210040fc000008fc0005100000c00001fe000e1000000400001000001400500000041e0070fc000008f90002c00001fe000e10000004000010000014005000 +00041c0040fc000018f90000c0fc000e1000001400001000001400480000041c0040fc000018f9000040fc000e1000001c00001000002400480000041e067c66de6dd21858f9000040fc000e1e7ff6fc00003dbebfe797cf9ddefc1e066c66de6dd21850f9000040fc000e1e7ff6e400003dbebfe797cf9ddefc1a067c66de +6dd2185ff3ff02fe7ff6feff08fdbebfff97ff9ddefc1a066c66de6dd21850f3000e1e7ff6e400003dbebfe797cf9ddefc180040fc000010f3000e100000200000094002a40584001004180070fc000010f3000e080000400000094002a40584003004180040fc000010f3000e080000400000094007424804002804180040 +fc000010f3000e0800004000000a400543c804002804180070fc000020f3000e0800004000000a4004033804002804180060fc000020f3000e0800004000000a3004023804002804180070fc000020f3000e04000040000006300c023004002804180060fc000020f3000e040000400000063008003002002804190060fd00 +010220f3000e040000400000063008001002004804190078fd00010220f3000e040000400000060808001002004804190058fd00010220f3000e040000400000060808000002004804190078fd00010220f3000e040000800000020808000002004804190048fd00010320f3000302000080fe000708080000020048041900 +44fd00010520f3000302000080fe00070810000002004804190074fd00010520f3000302000080fe000708100000020044041a0644040000014520f3000302000080fe000704100000020084041a067406008001c520f3000302000080fe000704100000020084041a06440a0080012540f3000302000180fe000704100000 +010084041a06440a0080022540f3000302000180fe000704100000010084041a06720901400224c0f3000301800280fe00070220000001008704252362796940a3daec02e005042000000800400000a70019e403041201200220210005b90484252373f9df7fffdaec02e005042000000800400000a70019ffff0412012003 
+e0210005ff04fc06007fdfff00fc180643803f0e1e00c0f2000171eefd000101e0fe0002ff00041906728041111200c0f20002891280fe00010120fe0002810004190642804090a200c0f200028a1280fe00010120fe00028100041906424040a0a10140f20002860180fe00010120fe00028100041a06724080e0618140f3 +000301060180fe00010220fe00028080041a0642408000018120f3000301040180fe00010210fe00028080041a0672298000004920f3000301000080fe000702100000010080041a014419fe00014920f3000301000080fe000702100000010080041a014416fe00012920f3000301000080fe000702100000010080041a01 +7406fe00013620f3000e0100004000000202100000010080041a014406fe00011620f3000e020000400000020210000001004004190074fd00011620f3000e020000400000060210000001004004190044fd00010620f3000e020000400000060210000001004004180044fc000020f3000e02000040000006021000000100 +4004180074fc000020f3000e020000400000060208020002002004180044fc000020f3000e0200004000000a0408020002002004180074fc000020f3000e0200004000000a0408030002002004180048fc000010f3000e0400004000000a0408030002002004180048fc000010f3000e040000400000090408030002002004 +230078fc001d3c7a36ac17fffffdf7dddefebfffb1fc0000768bba9b5c0e85c31a003cfc230068fc001d3c7a36ac17fffffdf7dddefebfffb1fc0000768bba9b5c0e85c39c003cfc230068fc001d3c7a36ac17fffffdf7dddefebfffb1f40000768bba9b5c0ec5c39c003cfc23007ffcff0efc7a36ac17fffffdf7dddefebf +ffb1feff0bf68bba9f5ffec7c39ffffcfc180048fc000010f3000e100000200000111404c482880010041c0078fc000010f9000040fc000e100000200000111404c484880010041c0068fc000010f9000040fc000e100000200000111404c484880010041c0070fc000010f90000c0fc000e10000020000011340524848800 +10041e0070fc000008f90002c00001fe000e10000020000010b4052454480008041e0070fc000008f90002c00001fe000e10000020000020f405285850000804210070fc000008fc0005400000c00001fe000e10000020000020e805285850000804210050fc000008fc0005640000a00003fe000e10000020000020880628 
+7850000804220040fc00010804fd0005640000a00003fe000e100000200000208806287850000804220070fc00010404fd0005640001200003fe000e200000200000208806283050000804230040fc000b040a001000005a0001210203fe000e2000003c0000200800283050000804230070fc000b040a003000005a000123 +0203fe000e200000240000200800280050000804230040fc001d040a002800005a0001230203001000200000240000200800180060000804230040fc001d040a004c00009a0001250302861000200000020000200000180020000404230070fc001d040a0044000099008114830286300120000003c0004000001000200004 +04230040fc001d020900820000890081148302853001200000022000400000100020000404230070fc00180211008200048900c108850445480120000002200040000010fe00010404230040fc000f02110102000a8900c208850429480120fe0005100240000010fe00010404200040fc000f01110102018a808122088484 +294802a0fe0002100240fb00010404200070fc000f011101010291808122004484288906a0fe0002100540fb000104042523400184262c0000949223065500813a00449418898ac68212084805400420800000106d6c2523700184262c0000f4ee22fe7500ff3e007c7c1887fac68212084ff8800420800000106ffc06007f +dfff00fc070040e0000104fc070070e000010704070040e000010404070040e000010404070070e000010404070040e000010404070070e000010404070040e0000104040b0040e6000008fc000108040b0070e6000008fc000108040b0040e6000008fc000108040b0070e6000008fc000108040b0040e6000008fc000108 +040b0050e6000008fc000108040c0070e600013404fd000108040d0070e6000734040028000008040d0070e6000774040038000008040d0070e6000754070048000008040d0068e60007540700480000080425107fdcef8d2bebf7efdfffc720ffcda7fdfbfeef0fff303dfbe9fc7fdff44eddf6976ef80425107fdcef8d2b +ebf7efdfffc720ffcda7fdfbfeef0fff303dfbe9fc7fdfb44efdf6976ef00425107fdcef8d2bebf7efdfffc720ffcda7fdfbfeef0fff303dfbe9fc7fdfb44cfdf6976ef00425107fdcef8d2bebf7efdfffc720ffcda7fdfbfeef0fff303dfbedfc7fdffc4ffdfe976efffc140048ed00030800002cfe000784082084840010 
+04180078fc000008f300030c00003cfe00078408208486001004180048fc000008f300030c000034fe0007840821034a001004180044fc000008f3000312000034fe0007840811034a002004180074fc000008f3000e12000024000001040811024a0020041c0044fc000018fc000010f9000e12000024000001020811003a +002004200074fc000018fc000010fe000020fd000e120000240000010210110029002004200044fc000018fc000010fe000020fd000e120000220000010210110001004004210044fd00010418fc000010fe000020fd000e12000022000019021012000100400422017404fe00011424fc000010fe000020fd000e12000022 +000016021012000100400423014406fe00011424fc000018fe00012020fe000e11000042000016021012000100400423017406fe00013a24fc00002cfe0013306080000021000042000016021014000100800425014419fe00072a2400000600002cfe00135250c0000021000042000026021014000100800425014429fe00 +1e4a2400000600042c0020015a50c000002100004200002202100c000100800425237229800000492400000600042a002001de914040002100004200002202100c000100800425234240800000412400000600062a002002d695404000210401420000220210080000808004252372404060604126000006080a6a03300ad6 +8f40c000210601820000220220080000808004252342404090a081460400090c0a4a02b016c18820c001208a01820000200120080000810004251d428040912080c20c00090c1a4102b010418820a001a08a12810000200120fe0002810004251d728040911080c10a00091419812470204100212002c08912811000400120 +fe00028100042523529a212a1190d95e0dcb3aa381ddf873c10835a20ac0972e8338a04801202028048108a42523739a3f2e1e90d9ffffbbfb6381ddfff3c1081fbffec0f7ec83ffffc801e0202804ff08a406007fdfff00fc02dd00a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.4\tab Example output from the codon usage method. Most of the sequence is coding for proteins.\par +\pard\plain \s7\qj\fi-560\li560\sb400\sa120\sl280\tx560 \f20 2.\tab Define "Minimum open frame in amino acids".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Strands". The alternatives are\: + strand only, - strand only, or both strands. 
Typical output is shown in figure 5.5.\par +\pard\plain \li2120\ri2240\sb400\sl220\box\brsp100\brdrth \f4\fs16 FT CDS 525..965 \par +\pard \li2120\ri2240\sl220\box\brsp100\brdrth FT CDS 956..1789 \par +FT CDS 2128..2607 \par +FT CDS 2604..3155 \par +FT CDS 3159..4709 \par +FT CDS 4733..5623 \par +FT CDS 5539..7032 \par +FT CDS 7044..7454 \par +FT CDS 7797..8134 \par +FT CDS complement(2227..2634)\par +FT CDS complement(2250..3023)\par +FT CDS complement(3027..3899)\par +FT CDS complement(3903..4760)\par +FT CDS complement(4327..4626)\par +FT CDS complement(4646..5332)\par +FT CDS complement(5345..5647)\par +FT CDS complement(5635..6012)\par +FT CDS complement(6016..6441)\par +FT CDS complement(6445..7083)\par +FT CDS complement(7035..7445)\par +\pard \qj\li2120\ri2240\sl220\keepn\box\brsp100\brdrth FT CDS complement(7406..7777)\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.5\tab Typical output from "Find open reading frames"\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Searching for tRNA genes\par +\pard\plain \s4\qj\sa120\sl280 \f20 tRNA genes have two classes of feature that can be used to locate them in genomic sequences\: + their ability to fold into the cloverleaf secondary structure, and the presence of specific "conserved" bases at particular positions relative to this structure. The level of congruence with the canonical structure is quite variable\: + some tRNA genes contain intervening sequences and others, particularly those from organelles, have few of the conserved bases. The program searches for potential cloverleaf forming str +uctures and optionally for the presence of conserved bases. The user can define the range of loop sizes, the minimum numbers of potential base pairs, a range of intron sizes, and which, if any, of the conserved bases should be present. The results are presente +d either textually or graphically.
\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "tRNA search".\par +2.\tab Define "Maximum tRNA length".\par +3.\tab Define "Aminoacyl stem score". See note 8.\par +4.\tab Define "Tu stem score".\par +5.\tab Define "Anticodon stem score".\par +6.\tab Define "D stem score".\par +7.\tab Define "Minimum base pairing total".\par +8.\tab Define "Minimum intron length".\par +9.\tab Define "Maximum intron length".\par +10.\tab Define "Minimum length for TU loop".\par +11.\tab Define "Maximum length for TU loop".\par +12.\tab Accept "Skip search for conserved bases". See notes section.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 13.\tab Reject "Plot results". +This gives listed output in which the potential cloverleafs are displayed. The alternative plotted output simply draws a vertical line to represent the score for the potential gene, at the position where it was found. Typical dialogue and the beginning of s +ome listed output are shown in figure 5.6.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab +In general, for finding protein genes, we recommend the use of all the methods. The "Uneven positional base frequencies" method can show which regions are likely to be coding but not which strand or fram +e. The "Positional base preferences" method can show the correct frame and also help to find which regions are coding. The "Codon usage" method has the greatest resolution, having been used successfully with windows of 11 codons, and can help to find small ex +ons and to pinpoint exon/intron boundaries.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab +When the "Uneven positional base frequencies" calculation was applied to all the sequences in the 1984 version of the EMBL library, 14% of noncoding segments failed to reach the value represented by the base of + the box, whereas all coding segments did.
The top value of the box was not reached by any noncoding segments but was exceeded by 16% of coding sequences. 76% of noncoding segments failed to reach the line labelled 76% but 76% of coding segments fell above + it. We would not expect this result to change significantly if it were to be recalculated on the current libraries.\par +3.\tab When the "Positional base preferences" method, using "global" values, was applied to all the {\i E. coli} genes in the 1984 version of the EMBL library it chose the correct reading frame for 91% of coding segments. {\i E. coli} + sequences were used for technical rather than scientific reasons and we have no reason to believe that other organisms should give significantly different results. This result used only the values for the first two positions in codons and so for genes wit +h a strongly biased base composition we would expect even better discrimination.\par +\pard\plain \li1180\ri1440\sb100\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 tRNA search\par +\pard \li1180\ri1440\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth ? Maximum trna length (70-130) (92) =\par +? Aminoacyl stem score (0-14) (11) =\par +? Tu stem score (0-10) (8) =\par +? Anticodon stem score (0-10) (8) =\par +? D stem score (0-8) (3) =\par +? Minimum base pairing total (30-44) (30) =\par +? Minimum intron length (0-30) (0) =\par +? Maximum intron length (0-30) (0) =\par +? Minimum length for TU loop (4-12) (6) =\par +? Maximum length for TU loop (6-12) (9) =\par +? Skip search for conserved bases (y/n) (y) =n\par +Give a score for each base, then a minimum total at the end\par +? Base 8, T is 100% conserved. Score (0-100) (0) =\par +? Base 10, G is 95% conserved. Score (0-100) (0) =\par +? Base 11, Y is 96% conserved. Score (0-100) (0) =\par +? Base 14, A is 100% conserved. Score (0-100) (0) =\par +? Base 15, R is 100% conserved. Score (0-100) (0) =\par +? Base 21, A is 97% conserved.
Score (0-100) (0) =\par +? Base 32, Y is 100% conserved. Score (0-100) (0) =\par +? Base 33, T is 98% conserved. Score (0-100) (0) =\par +? Base 37, A is 91% conserved. Score (0-100) (0) =\par +? Base 48, Y is 100% conserved. Score (0-100) (0) =\par +? Base 53, G is 100% conserved. Score (0-100) (0) =\par +? Base 54, T is 95% conserved. Score (0-100) (0) =\par +? Base 55, T is 97% conserved. Score (0-100) (0) =\par +? Base 56, C is 100% conserved. Score (0-100) (0) =\par +? Base 57, R is 100% conserved. Score (0-100) (0) =\par +? Base 58, A is 100% conserved. Score (0-100) (0) =\par +? Base 60, Y is 92% conserved. Score (0-100) (0) =\par +? Base 61, C is 100% conserved. Score (0-100) (0) =\par +? Minimum total conserved base score (0-0) (0) =\par +? Plot results (y/n) (y) =n\par + 264\par + t\par + t-a\par + c-g\par + a-t\par + t+g\par +\pard \li1180\ri1440\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth a-t\par + a a\par + a-t gta\par + c aacgc\par + a t !!!! c\par + cgt gtgcg a\par + !!! t cga\par + a gca c\par + g t g\par + c aa t\par + a-t a\par + t-a t a\par + t-a\par + t-a\par + g t\par + c g\par +\pard \li1180\ri1440\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth caa\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.6\tab Typical dialogue and textual output from "Find tRNA genes".\par +\pard\plain \s7\qj\fi-560\li560\sa80\sl280\tx560 \f20 4.\tab If the codon table used by the "Codon usage" me +thod is normalised to have average amino acid composition it retains its codon preference bias for each amino acid type but now the amino acid composition is the average of all proteins. In general this is optimal\: + we have the expected codon preference bia +s plus an expected amino acid bias. 
If we normalise to no amino acid bias we are safeguarding ourselves against missing a protein of anomalous composition but at the expense of not employing all of the useful information for distinguishing coding from nonc +oding. \par +\pard \s7\qj\fi-560\li560\sa80\sl280\tx560 5.\tab +The program also contains a graphical version of Fickett's method (6), except here we use a window to analyse each segment of the sequence rather than giving a single value for each open reading frame. The tables used are those from the original publicat +ion.\par +\pard \s7\qj\fi-560\li560\sa80\sl280\tx560 6.\tab If the results from the "Find open reading frames" option are directed to disk (see the introductory chapter), the file can be used by the routines that use feature tables as input.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab The program also contains several routines for plotting the positions of stop and start codons for either strand of the sequence. One form of the output is included in figures 5.2 and 5.4.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab The tRNA gene search uses a simple scoring system for base pairing\: + A-T and G-C base pairs each score 2 and G-T scores 1. The use of a "Minimum base pairing total" allows low cutoffs to be set for each individual stem while still requiring some reasonable overall +level of stability. In this way a low score for one stem can be compensated by a high score in another.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Th +e cloverleaf is composed of four base-paired stems and four loops. Three of the stems are of fixed length but the fourth, the dhu stem which usually has four base pairs, sometimes has only three. All of the loops can vary in size.
The following relationshi +ps between the stems in the cloverleaf are assumed in the program\: + (a) there are no bases between one end of the aminoacyl stem and the adjoining tuc stem; (b) there are two bases between the aminoacyl stem and the dhu stem; (c) there is one base between t +he dhu stem and the anticodon stem; (d) there are at least three bases between the anticodon stem and the tuc stem.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. and McLachlan, A.D. 1982. Codon preference and its use in identifying protein coding regions in long DNA sequences. {\i Nucl. Acids Res.} {\b 10}\:151-156.\par +2.\tab Staden, R. 1984. Measurements of the effects that coding for a protein has on a DNA sequence and their use for finding genes. {\i Nucl. Acids Res.} {\b 12}\:551-567.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1985. Computer methods to locate genes and signals in nucleic acid sequences. (in) {\i Genetic Engineering, Principles and Methods}, Setlow J.K., Hollaender A., (eds.), {\b 7}\: +67-114, (Plenum Press, New York).\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Staden, R. 1990. Finding Protein Coding Regions in Genomic Sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:163-180 (Academic Press, New York).\par +5.\tab Staden, R. 1980. A computer program to search for tRNA genes. {\i Nucl. Acids Res.} {\b 8}\:817-825.\par +6.\tab Fickett, J.W. 1982. Recognition of protein coding regions in DNA sequences. {\i Nucl. Acids Res.} {\b 10}\:5303-5318.\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 6.
Searching for Motifs in Nucleic Acid Sequences\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Searching for percentage matches to consensus sequences\par +2.2\tab Searching for consensus sequences using a score matrix\par +2.3\tab Using weight matrices for searching nucleotide sequences\par +2.4\tab Using "hardwired" motif searches.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +4.\tab References\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 The program NIP contains several ways of defining and searching for motifs (1-4), and also contains a number of "hardwired" motifs that are already +defined and can be selected as separate searches. We describe searches for percentage matches to consensus sequences, the use of score matrices and the creation and use of nucleotide and dinucleotide weight matrices (see note 7). In addition we give detail +s of the "hardwired" motifs available from the program. In another chapter we have covered searches for exact matches to consensus sequences by describing how to find restriction enzyme recognition sequences. When searching for exact matches, percentage ma +tches or using a score matrix the search string or consensus sequence may include IUB redundancy codes. All of the searches produce both listed and graphical output. The listed output displays the matching sequence and its position and the graphical output + draws a box to represent the length of the sequence, and plots vertical lines within the box at the positions of matches. 
The heights of the lines are proportional to the match score (see figure 6.1).\par +\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw447\pich44 +032fffffffff002b01be1101a0008201000affffffff002b01be0900000000000000003100000000002a01bd98002400000000001d012000000000001d011f00000000002a01bd000102dd0006007fdfff00fc060040df000004060040df000004060040df0000041002400088f7000020f1000001fd0000041002400088f7 +000020f1000001fd0000041002400088f7000020f1000001fd00000421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe0005014200 +05c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020 +012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc0003021004 +60fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482 +b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501 +420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00406007fdfff00fc02dd00a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.1\tab Typical graphical output from a motif sea +rch. 
It shows a rectangular box in which each match is identified by a vertical line whose height gives the match score and whose x coordinate indicates the position in the sequence.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Searching for percentage matches to consensus sequences\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find percentage matches".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par +3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par +4.\tab Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default.\par +5.\tab Accept "This sense". The alternative directs the program to search for the complement of the string.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Percent match". The search is performed, the results are presented graphically (see figure 6.1), the number of matches displayed, and the scores and positions of the top 10 matches displayed. +\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define the number of matches to "Display". For the number of mat +ches chosen the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round to step 3. See figure 6.2.\par +\pard\plain \li220\ri280\sb400\sl220\box\brsp100\brdrth \f4\fs16 Find percentage matches\par +\pard \li220\ri280\sl220\box\brsp100\brdrth ? Type in string (y/n) (y) =\par + ? Keep picture (y/n) (y) =\par + ? String=AAAATTTT\par +STRING=AAAATTTT\par +? This sense (y/n) (y) =\par + ? 
Percent match (1.00-100.00) (70.00) =\par +\par +Total scoring positions above 70.000 percent = 41\par +Scores 7 7 7 7 6 6 6 6 6 6\par +Positions 428 534 2994 7026 130 191 192 372 427 429\par +? Display (0-41) (0) =4\par +\par + 428\par + aaaatatt\par + ***** **\par + AAAATTTT\par + 1\par +\par + 534\par + aaagtttt\par + *** ****\par + AAAATTTT\par + 1\par + 2994\par + aaaatttc\par + *******\par + AAAATTTT\par + 1\par +\par + 7026\par + aaaacttt\par + **** ***\par + AAAATTTT\par +\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth 1\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.2\tab Worked example for the percentage match search\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.2\tab Searching for consensus sequences using a score matrix\par +\pard\plain \s4\qj\sa120\sl280 \f20 +A score matrix gives a score for the alignment of each possible pair of sequence symbols. The matrix used by this program includes all the IUB redundancy codes and gives scores that represent the level of redundancy. The matrix is shown in figure 6.3. +\par +\pard\plain \s7\qj\fi-560\li560\sb200\sa120\sl280\tx560 \f20 1.\tab Select "Find matches using a score matrix".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par +3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par +4.\tab Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default.\par +5.\tab Accept "This sense". The alternative directs the program to search for the complement of the string. The program displays the maximum possible score for the string.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Score". 
The search is performed, the results are presented graphically (see figure 6.1), the number of matches displayed, and the scores and positions of the top 10 matches displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab +Define the number of matches to "Display". For the number of matches chosen the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round + to step 3. The dialogue shown in figure 6.2 is almost exactly the same as that for "Searching for consensus sequences using a score matrix".\par +\pard\plain \li1580\ri1560\sb300\sl220\box\brsp100\brdrth \f4\fs16 T C A G - R Y W S M K H B V D N ?\par +\pard \li1580\ri1560\sl220\box\brsp100\brdrth T 36 0 0 0 9 0 18 18 0 0 18 12 12 0 12 9 0\par +C 0 36 0 0 9 0 18 0 18 18 0 12 12 12 0 9 0\par +A 0 0 36 0 9 18 0 18 0 18 0 12 0 12 12 9 0\par +G 0 0 0 36 9 18 0 0 18 0 18 0 12 12 12 9 0\par +- 9 9 9 9 36 18 18 18 18 18 18 27 27 27 27 36 0\par +R 0 0 18 18 18 36 0 9 9 9 9 6 6 12 12 18 0\par +Y 18 18 0 0 18 0 36 9 9 9 9 12 12 6 6 18 0\par +W 18 0 18 0 18 9 9 36 0 9 9 12 6 6 12 18 0\par +S 0 18 0 18 18 9 9 0 36 9 9 6 12 12 6 18 0\par +M 0 18 18 0 18 9 9 9 9 36 0 12 6 12 6 18 0\par +K 18 0 0 18 18 9 9 9 9 0 36 6 12 6 12 18 0\par +H 12 12 12 0 27 6 12 12 6 12 6 36 8 8 8 27 0\par +B 12 12 0 12 27 6 12 6 12 6 12 8 36 8 8 27 0\par +V 0 12 12 12 27 12 6 6 12 12 6 8 8 36 8 27 0\par +D 12 0 12 12 27 12 6 12 6 6 12 8 8 8 36 27 0\par +N 9 9 9 9 36 18 18 18 18 18 18 27 27 27 27 36 0\par +\pard \li1580\ri1560\sl220\keepn\box\brsp100\brdrth ? 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.3\tab The DNA score matrix using IUB symbols\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Using weight matrices for searching nucleotide sequences\par +\pard\plain \s4\qj\sa120\sl280 \f20 A we +ight matrix is the most sensitive way of defining a motif. It is a table of values that gives scores for each base type in each position along a motif. For a motif of length 8 bases the weight matrix would be a table 8 positions long and 4 deep. The simple +st way of choosing the values for the table is to take an alignment of all known examples of the motif and to count the frequency of occurrence of each base type at each position. These frequencies can be used as the table of weights. When the table is use +d to search a new sequence the program calculates a score for each position along the sequence by adding or multiplying (see note 6) the relevant values in the table. All positions that exceed some cutoff score are reported as matching the original set of +motifs.\par +\pard \s4\qj\sa120\sl280 +How can we select a suitable cutoff score? The simplest way is to apply the weight matrix to all the known occurrences of the motif - i.e. the set of sequence segments used to create the table - and to see what scores they achieve. The cutoff can b +e selected accordingly. For convenience the weight matrix is stored as a file along with its cutoff score, a title that is displayed when the file is read, and a few other values needed by the program. A routine for creating weight matrix files from sets of +aligned sequences is included in the program.
When a search using the weight matrix is performed the program will either list the matching sequence segments or plot their positions as for the other motif search methods.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.3.1\tab Creating a weight matrix file from a set of aligned sequences\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par +2.\tab Select "Make weight matrix".\par +\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 3.\tab +Define "Name of aligned sequences file". We assume the file of aligned sequences has already been created (see note 3). The program reads and displays the contents of the file, numbering each sequence as it goes. Then it displays the length of the longes +t sequence.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Sum logs of weights". The alternative is to sum the weights when calculating scores (see note 4). \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Accept "Use all motif + positions". The alternative allows the user to define a "mask" which identifies positions within the motif that should be ignored when the matrix is created (see note 5). The program now calculates the weights and applies them in turn to each of the seque +nces in the file. The number and score for each sequence are displayed, followed by the top, bottom and mean scores and the standard deviation. In addition the mean plus and minus 3 standard deviations are displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Cutoff score". The default is the mean minus 3 standard deviations, but users may, for example, decide to use the lowest score obtained by the sequences in the file.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Top score for scaling plots". This parameter is used by the graphics output routine when scaling the plots.
Its value will influence the height of lines plotted to represent matches.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Position to identify". When a search is performed it is not always appropriate to report the position of a match relative to the leftmost base in the motif. For example wh +en performing a splice junction search we may want to know the position of the G in the conserved GT, rather than the position of the first base in the matrix. The "Position to identify" allows the user to define which base is marked. The bases in the tabl +e are numbered 1, 2, 3 and so on.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define a "Title". This is a title that will be displayed when the matrix file is read prior to performing a search. It is limited to 60 characters.\par +10.\tab Define "Name for new weight matrix file". Give a name for the weight matrix file. Typical dialogue is shown in figure 6.4.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 \page 2.3.2\tab Searching using a weight matrix\par +\pard\plain \s4\qj\sa120\sl280 \f20 Once a weight matrix has been stored in a file it can be used to search any sequence. Results can be displayed graphically or the matching sequence segments can be listed out with their scores.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par +2.\tab Select "Use weight matrix".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Motif weight matrix file". Give the name of the file containing the weight matrix. The program reads the file and displays its title.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define +"Cutoff score". The default will be the value set when the weight matrix file was created. If the score is negative the program will calculate sums of logs of frequencies, otherwise it will add frequencies.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Accept "Plot results".
Alternatively they will be listed.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The results will appear as in figure 6.5\par +\pard\plain \li1440\ri1500\sb200\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Motif search using weight matrix\par +\pard \li1440\ri1500\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select operation\par +X 1 Use weight matrix\par + 2 Make weight matrix\par + 3 Rescale weight matrix\par +? Selection (1-3) (1) =2\par +? Name of aligned sequences file=heatshock.seq\par + 1 ATAAAGAATATTCTAGAA\par + 2 CTCGAGAAATTTCTCTGG 144\par + 3 TTCTCGTTGCTTCGAGAG 36\par + 4 GCCTCGAATGTTCGCGAA 15\par + 5 GACTGGAATGTTCTGACC 45 DROSOPHILA HSP68\par + 6 ATCTCGAATTTTCCCCTC 12\par + 7 ATCCAGAAGCCTCYAGAA 35 DROSOPHILA HSP83\par + 8 CTCTAGAAGTTTCTAGAG 25\par + 9 TTCTAGAGACTTCCAGTT 15\par + 10 CCCCAGAAACTTCCACGG 147 DROSOPHILA HSP22\par + 11 GCGAAGAAAATTCGAGAG 46\par + 12 TGCCGGTATTTTCTAGAT 26\par + 13 CCCGAGAAGTTTCGTGTC 97 DROSOPHILA HSP23\par + 14 TTCCGGACTCTTCTAGAA 13 DROSOPHILA HSP26\par + 15 CTCGAGAAAGCTCGCGAA 204 XENOPUS HSP70\par + 16 CTCGCGAATCTTCCGCGA 194\par + 17 CTCGCGAAAGTTCTTCGG 139\par + 18 CTCGGGAAACTTCGGGTC 72\par + 19 TGCCAGAAGTTGCTAGCA 124 XENOPUS HSP30\par + 20 CTCGGGAACGTCCCAGAA 14\par + 21 ATCCCGAAACTTCTAGTT 129 SOYBEAN HSP17\par + 22 GTCCAGAATGTTTCTGAA 98\par + 23 TTTCAGAAAATTCTAGTT 78\par + 24 CCCAAGGACTTTCTCGAA 28\par + 25 TTTTAGAATGTTCTAGAA 179 DICTYOSTELIUM DIRS-1\par + 26 TTCTAGAACATTCGAAGA 169\par +Length of motif 18\par +? Sum logs of weights (y/n) (y) =\par + ? 
Use all motif positions (y/n) (y) =\par + Applying matrix to input sequences\par + 1 -15.609 ATAAAGAATATTCTAGAA\par + 2 -15.965 CTCGAGAAATTTCTCTGG\par + 3 -18.186 TTCTCGTTGCTTCGAGAG\par + 4 -15.331 GCCTCGAATGTTCGCGAA\par + 5 -20.897 GACTGGAATGTTCTGACC\par + 6 -17.347 ATCTCGAATTTTCCCCTC\par + 7 -16.271 ATCCAGAAGCCTCYAGAA\par + 8 -12.227 CTCTAGAAGTTTCTAGAG\par + 9 -15.933 TTCTAGAGACTTCCAGTT\par + 10 -15.604 CCCCAGAAACTTCCACGG\par + 11 -17.866 GCGAAGAAAATTCGAGAG\par + 12 -17.159 TGCCGGTATTTTCTAGAT\par + 13 -16.399 CCCGAGAAGTTTCGTGTC\par + 14 -14.646 TTCCGGACTCTTCTAGAA\par + 15 -14.801 CTCGAGAAAGCTCGCGAA\par + 16 -16.163 CTCGCGAATCTTCCGCGA\par + 17 -16.280 CTCGCGAAAGTTCTTCGG\par + 18 -15.598 CTCGGGAAACTTCGGGTC\par + 19 -17.721 TGCCAGAAGTTGCTAGCA\par + 20 -16.257 CTCGGGAACGTCCCAGAA\par + 21 -14.243 ATCCCGAAACTTCTAGTT\par + 22 -16.456 GTCCAGAATGTTTCTGAA\par + 23 -15.453 TTTCAGAAAATTCTAGTT\par + 24 -17.443 CCCAAGGACTTTCTCGAA\par + 25 -13.335 TTTTAGAATGTTCTAGAA\par + 26 -15.914 TTCTAGAACATTCGAAGA\par +Top score -12.227 Bottom score -20.897\par +Mean -16.119 Standard deviation 1.636\par +Mean minus 3.sd -21.028 Mean plus 3.sd -11.210\par +? Cutoff score (-999.00-9999.00) (-21.03) =\par +? Top score for scaling plots (-21.03-999.00) (-11.21) =\par +? Position to identify (0-18) (1) =\par +? Title=Heatshock weights 24-10-91\par +\pard \li1440\ri1500\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth ? Name for new weight matrix file=heatshock.wts\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.4\tab An example run of creating a weight matrix\par +\pard\plain \li1400\ri1500\sb300\sl220\box\brsp100\brdrth \f4\fs16 Motif search using weight matrix\par +\pard \li1400\ri1500\sl220\box\brsp100\brdrth Select operation\par +X 1 Use weight matrix\par + 2 Make weight matrix\par + 3 Rescale weight matrix\par +? Selection (1-3) (1) =\par +? Motif weight matrix file=heatshock.wts\par + Heatshock weights 24-10-91\par +? 
Cutoff score (-9999.00-9999.00) (-21.03) =\par +? Plot results (y/n) (y) =\par +\par + 619 -20.84 gctcggaagcttctgctc\par + 818 -20.74 ttggcgaagctttcaaag\par + 1190 -21.02 gccaggtaagtttcagac\par + 1601 -20.91 tttgcgactgttcggtaa\par + 2387 -20.24 cgctcgcagattctggac\par + 2534 -20.87 gccgagaagatcatcgaa\par + 2890 -16.38 ctcccggatgttctggag\par + 2989 -19.54 ctcgcgaaaatttctgct\par + 3451 -20.76 atcctggaagttccggtt\par + 6020 -20.73 tctcaggaactgctggaa\par + 6335 -20.51 gctgagaaattccgtgac\par + 7107 -20.31 ctctggtctggtcgagaa\par + 7117 -19.61 gtcgagaaaatccaggta\par +\pard \li1400\ri1500\sl220\keepn\box\brsp100\brdrth 7892 -20.18 cttccgaaagtgctgcat\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.5\tab Example run of a search using a weight matrix to produce text output.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Using "hardwired" motif searches.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The program contains predefined motif definitions for the following\:\par +\pard \s4\qj\li1120\sa120\sl280 {\i E. coli} promoters\par +prokaryotic ribosome binding sites\par +mRNA splice junctions\par +eukaryotic ribosome binding sites\par +polyadenylation sites\par +\pard \s4\qj\sb240\sa120\sl280 All except the po +lyadenylation site, which is simply defined as an exact match to the string AATAAA, are represented as weight matrices. Each search is performed simply by the user selecting the appropriate option from the menu and each plots its results in its own graphic +s window. The ribosome binding site searches are reading frame specific and so they normally plot their results to fit nicely with the output from the "gene search by content" methods described in the chapter on finding genes. Likewise the splice junction +searches produce separate output for each of the three reading frames. 
Below, as an example of using the hardwired motifs, we show how to perform such a search.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.4.1\tab Searching for splice junctions\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Splice search using weight matrix". The program automatically reads in weight matrices that define the donor and acceptor sites and displays their titles.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Donor cutoff score". The default is stored in the file.\par +3.\tab Define "Acceptor cutoff score". The default is stored in the file.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4. \tab Accept "Plot results". The alternative lists the results giving the position, score, matching sequence and reading frame. A typical plotted result appears in figure 6.6.\par +\pard\plain \qj\ri-100\sb240\sl480\keepn \f4\fs16 {{\pict\macpict\picw454\pich123 +04be00000000007b01c6001102ff0c00fffe0000002d8f9e002d8f9e00000000004e011f000000000001000a00000000004e011f0098802400000000004e011f0000000000000000002d8f9e002d8f9e000000010001000100000000000000000000000000439867000000010000ffffffffffff0001000000000000000000 +00004e011f00000000004e011f000002dd0006007fdfff00fc060040df000004060040df0000040a0040e9000020f80000040a0040e9000020f80000040c0040e9000020fa00022000040c0040e9000020fa0002200004110040eb0005200020000080fd0002200004170040fd000001f200071000200020000090fd000220 +0004170040fd000001f200071000600020080090fd0002200004170040fd000001f200071000600020080090fd0002200004180040fe00011001f2000712006000200c0090fd000224000406007fdfff00fc060040df0000040a0040ee000008f30000040a0040ee000008f30000040a0040ee000008f30000040a0040ee00 +0008f30000040a0040ee000008f30000040a0040ee000008f300000c0a0040ee000008f300000c0a0040ee000008f300000c0e0040ee000008fe000010f700000c180040f6000001fc0002010008fe000010fc000008fd00000c2002400004fd0005400010000001fc000601000808001010fc000008fe0001800c06007fdf 
+ff00fc060040df000004060040df0000040a0040fc000004e50000040a0040fc000004e50000040c0040fc000004e700020104040c0041fc000004e70002010404100041fc000004fe000008eb0002010404100041fc000004fe000008eb0002010404150041fc000004fe00010814f3000010fb00020104041a014180fd00 +0004fe0005081400400040f7000010fb00020904041b02498008fe000904400200081400400040f7000050fb000209040406007fdfff00fc060040df000004060040df000004060040df000004060040df000004060040df0000040a0040fe000010e30000040e0040fe000010f4000001f10000040e0040fe000010f40000 +01f10000040e0040fe000010f4000001f10000040e0040fe000018f4000001f1000004180040fe000018f60002080001fb00040800008001fc0000041d04400000081afd000005fc000308080001fb00040800008001fc00000406007fdfff00fc060040df000004060040df0000040a0040f8000008e90000040a0040f800 +0008e90000040a0040f8000008e90000040a0040f8000008e90000040e0040f8000008ee000004fd000004140040fa0002400008f6000002fa000004fd000004180040fe000040fe0002400008f6000002fa000004fd000004190040fe000040fe0002400008f600010a02fb000004fd000004220048fe000a402000004000 +4801000001fe0006408000000a0202fc000004fd00000406007fdfff00fc060040df000004060040df000004060040df000004060040df000004090040e2000340000004090040e20003400000040c0340000002e50003400000040c0340000002e50003400000040e0340000002e70005080040020004120340000002eb00 +0001fe0005080040020004120340080002eb000001fe00050800400200041b044008020280f6000040fd000008fd000001fe000508004002000406007fdfff00fc02dd000000ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.6\tab +Typical graphical output from using the hardwired splice junction search. The results are presented in a reading frame specific way so it shows, in the bottom three boxes, results for donor sites and in the top three boxes those for acceptor sit +es. In both cases the vertical ordering of the boxes is frame 0 at the bottom, frame 1 in +the middle and frame 2 at the top. 
For example there is a very strong peak corresponding to an acceptor in frame 1 that can be seen just over halfway along the sequence.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab +For this program a motif is a short segment of sequence of fixed length. More complex structures termed "patterns", which we define as sets of motifs separated by varying gaps, are covered in another chapter. The current chapter should be read before the + chapter on patterns. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab It is debatable whether the gain in sensitivity that is afforded by the use of a score matrix is of value for searching nucleotide sequences; however, it is very important for protein sequences.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +The files of aligned sequences used to make weight matrices have the following format. Each sequence should be on a separate line. The sequence should start in column 2 and is terminated by a new line or a space. Anything after the space is treated as +a comment. The files can be created by previous searches or using an editor.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab The frequencies in the weight +matrix can be used in two ways to calculate scores for sequences. Some users prefer to add the frequencies to give a total score, and others to multiply them by summing their logs. If we regard the frequencies as probabilities then multiplication seems the + correct procedure. The user chooses which method will be employed when the weight matrix is created; however, the choice can be overridden when the matrix is used.
If multiplication is selected then all results will be presented as sums of logs.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Masking th +e weight matrix is particularly useful in cases where a limited number of examples of a motif are available, or when the motif may have several components. In the first case the limited number of examples may make the matrix unrepresentative of the motif b +ecause the bases in the unconserved positions may bias the results of searches. When a large number of examples is available to create the matrix, the unconserved positions should tend towards equal base composition and hence have no influence on the overa +ll score. We stated that a motif might have several components\: for example a motif might have both structural and specificity components. We may want to separate out the two parts and masking provides such a facility.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab +The weight matrix handling routine contains a further option "Rescale weight matrix". If the user has edited a weight matrix to change the frequency values this provides a way of selecting a new cutoff score. It allows users to read in a set of aligned + sequences and a weight matrix and to apply the matrix to the set of sequences to see the range of scores achieved. A new weight matrix file containing the selected cutoff score is written to disk.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab The program also contains a set of routines identical to those used to create and search for nucleotide weight matrices, but which deal instead with dinucleotide weight matrices. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab The reader is reminded that most options in the program, if selected when in "execute without dialogue" mode, will automatically use a set of defaults and produce a +result with little or no user input.
Most motif searches require far less user input than that shown above, where we have tried to show the scope of the methods.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab +Although the program contains hardwired motifs we expect most sites that use the programs to accumulate their own libraries of motifs and patterns, which users can employ by simply knowing the names of the corresponding files.\par +\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \page 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1984. Computer methods to locate signals in nucleic acid sequences. {\i Nucl. Acids Res}. {\b 12}\:521-538.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Staden, R. 1985. Computer methods to locate genes and signals in nucleic acid sequences. (in) {\i Genetic Engineering, Principle and Methods, }Setlow J.K., Hollaender A., (eds.), {\b 7}\: +67-114, (Plenum Press, New York).\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4 (1)}\:53-60.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 7. 
Using Patterns to Analyse Nucleic Acid Sequences\par +\pard\plain \s5\sb200\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Creating a pattern file containing an exact match motif and weight matrix motif.\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.2\tab Searching a sequence using a pattern file\par +2.3\tab Comparing a sequence against a library of patterns\par +2.4\tab Searching sequence libraries for patterns\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +4.\tab References\par +\pard\plain \s5\sb200\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 Here we describe one of the most powerful facilities provided by the program NIP\: the ability to define and search for complex patterns of motifs (1-3). +In another chapter we give details of searching for individual motifs but here we show how to create patterns and libraries of patterns and to use them to search single sequences and sequence libraries. Once a pattern has been defined and stored in a file +it can be used to search any sequence. In addition if users want to routinely screen sequences against libraries of patterns this can be achieved by use of files of file names. The program can produce several alternative forms of output. It will display the s +egment of sequence matching each individual motif in the pattern, display all the sequence between and including the two outermost motifs, produce a description of the match in the form of an EMBL feature table, or draw a simple graphical plot.\par +\pard \s4\qj\sa120\sl280 At the end of the chapter we describe how a related program NIPL is used to search libraries of sequences to find patterns.
NIPL is capable of producing alignments of sequence families.\par +\pard \s4\qj\sa120\sl280 Patterns are defined as sets of motifs with variable spacing. Each motif in a pat +tern can be defined using any of several methods, and their positions relative to one another are defined in terms of minimum and maximum separations. In addition, by the use of logical operators, each motif can be declared to be essential (the AND operator) +, optional (the OR operator), or forbidden (the NOT operator). The following methods (termed "classes" by the program) for defining motifs are provided\: + 1) exact match to a short sequence; 2) percentage match to a short sequence; 3) match to a short sequen +ce using a score matrix and cutoff score; 4) match to a weight matrix; 5) match to the complement of a weight matrix; 6) inverted repeat or stem-loop; 7) exact match to a short sequence with a defined step; 8) direct repeat. Classes 1, 2, 3 and 7 permit t +he use of IUB redundancy codes.\par +\pard \s4\qj\sa120\sl280 The motifs in a pattern are numbered sequentially and motif spacing is defined in the following way. When a new motif is added to a pattern the user specifies the "Reference motif" by its number and then a "Relative start po +sition". The "Relative start position" is defined by taking the first base of the "Reference motif" as position 1, the next as 2, and so on. Then the user defines the allowed variation in the spacing by specifying the "Number of extra positions". Notice th +at the position of a motif can be defined relative to any other motif, and that a negative "Relative start position" declares the motif to be to the left of its "Reference motif".\par +\pard \s4\qj\sa120\sl280 The probability of finding each individual motif in the current sequence, th +e product of the probabilities for all the motifs in a pattern (the "Probability of finding pattern"), and the "Expected number of matches" are calculated and displayed by the program.
In addition to the cutoffs used for the individual motifs, users can apply two + pattern cutoffs\: "Maximum pattern probability" and "Minimum pattern score".\par +Below we describe\: how to create a pattern; how to use a pattern file to search a sequence; how to use a "File of pattern file names" to search a sequence for a whole library of +patterns. To describe how to create a pattern file we first show all the steps to make one containing two motifs, and then, to save space, the parts specific to the individual motif types are sketched in the notes section.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2. Methods\par +\pard\plain \s6\fi-560\li560\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Creating a pattern file containing an exact match motif and weight matrix motif.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher".\par +2.\tab Select "Pattern definition mode" as "Use keyboard".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Results display mode" as "Motif by motif". The alternatives are listed in the introduction.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Motif definition mode" as "Exact match".\par +5.\tab Define "Motif name". Each motif can be given an 8 character name.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "String". Type in the sequence of the motif. The program will display the probability of finding the motif.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Select "Motif definition mode" as "Weight matrix".\par +8.\tab Define "Motif name".\par +9.\tab Select "Logical operator" as "AND". The alternatives are "OR" and "NOT".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Select "Number of reference motif". At this stage the only choice is 1 and this is the default.\par +11.\tab Define "Relative start position". The base position relative to the "Reference motif". 
See the introduction.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 12.\tab Define "Number of extra positions".\par +13.\tab Define "Weight matrix file name". Type the name of the file containing the weight matrix.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The program now cycles round to step 7 and all subsequent passes round the loop to add further motifs to the pattern would differ only in the details for the different motif "classes".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 14.\tab Select "Pattern complete"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 15.\tab Accept "Save pattern in a file". The alternative does not save the pattern and so it can only be used once on the current sequence.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab Define "Pattern definition file". Give a name for the new file.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 17. \tab +Define "Pattern title". All patterns can have a 60 character title that can be displayed when the pattern file is read and the sequence searched. The program will now display a detailed textual description of the pattern, the "Probability of finding +the pattern" and the "Expected number of matches".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 18.\tab Define "Maximum pattern probability". Yes, maximum\: any match with a greater probability of being found will be rejected. If no value is specified the search will be quicker (see notes).\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 19.\tab +Define "Minimum pattern score". A minimum pattern score only makes sense if all the motifs in the pattern are defined with compatible scoring methods. For example percentage matches and weight matrices using sums of logs are incompatible. Searching wil +l now commence and any matches will be displayed using the chosen method.
A worked example of creating such a pattern and performing a search is shown in figure 7.1, and the actual pattern file is shown in figure 7.2.\par +\pard\plain \li1360\ri1300\sb200\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Pattern searcher\par +\pard \li1360\ri1300\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select pattern definition mode\par +X 1 Use keyboard \par + 2 Use pattern file \par + 3 Use file of pattern file names\par +? Selection (1-3) (1) =\par +Select results display mode\par +X 1 Motif by motif \par + 2 Inclusive \par + 3 Graphical \par + 4 EMBL feature table \par +? Selection (1-4) (1) =\par +Select motif definition mode\par +X 1 Exact match \par + 2 Percentage match \par + 3 Cut-off score and score matrix \par + 4 Cut-off score and weight matrix\par + 5 Complement of weight matrix \par + 6 Inverted repeat or stem-loop \par + 7 Exact match, defined step \par + 8 Direct repeat \par + 9 Pattern complete \par +? Selection (1-9) (1) =\par +? Motif name=T run\par +? String=TTTTT\par +Probability of score 5.0000 = 0.870E-03\par +Select motif definition mode\par +X 1 Exact match \par + 2 Percentage match \par + 3 Cut-off score and score matrix \par + 4 Cut-off score and weight matrix\par + 5 Complement of weight matrix \par + 6 Inverted repeat or stem-loop \par + 7 Exact match, defined step \par + 8 Direct repeat \par + 9 Pattern complete \par +? Selection (1-9) (1) =4\par +? Motif name=heat\par +Select logical operator\par +X 1 And \par + 2 Or \par + 3 Not \par +? Selection (1-3) (1) =\par +? Number of reference motif (1-1) (1) =\par +? Relative start position (-1000-1000) (6) =10\par +? Number of extra positions (0-1000) (0) =20\par +? 
Weight matrix file name=heatshock.wts\par + Heatshock weights 18-12-90 \par +Probability of score -21.0280 = 0.117E-02\par +Select motif definition mode\par + 1 Exact match \par + 2 Percentage match \par + 3 Cut-off score and score matrix \par +X 4 Cut-off score and weight matrix\par + 5 Complement of weight matrix \par + 6 Inverted repeat or stem-loop \par + 7 Exact match, defined step \par + 8 Direct repeat \par + 9 Pattern complete \par +? Selection (1-9) (4) =9\par +? Save pattern in a file (y/n) (y) =\par +? Pattern definition file=_paper.pat\par +? Pattern title=demo pattern\par +Pattern description\par +\par +demo pattern \par +Motif 1 named T run is of class 1\par +Which is an exact match to the string\par +TTTTT\par +Motif 2 named heat is of class 4\par +Which is a match to a weight matrix with score -21.028\par +and the 5 prime base can take positions 10 to 30\par +relative to the 5 prime end of motif 1\par +It is anded with the previous motif.\par +Probability of finding pattern = 0.1015E-05\par +Expected number of matches = 0.1734E+00\par +? Maximum pattern probability (0.00-1.00) (1.00) =\par +? 
Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par +Working\par +Match\par + 505 T run \par + ttttt\par + 528 heat \par + ttaaagaaagttttatac\par +Total matches found 1\par +\pard \li1360\ri1300\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Minimum and maximum observed scores -15.34 -15.34\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 7.1\tab Worked example of creating a simple pattern and performing a search.\par +\pard\plain \li2380\ri2520\sb300\sl220\box\brsp100\brdrth \f4\fs16 demo pattern \par +\pard \li2380\ri2520\sl220\box\brsp100\brdrth A1 T run Class \par + TTTTT\par + @ End of string\par + A4 heat Class \par + 1 Relative motif\par + 10 Relative start position\par + 20 Number of extra positions\par +\pard \li2380\ri2520\sl220\keepn\box\brsp100\brdrth heatshock.wts\par +\pard\plain \s8\qj\fi-1140\li1140\sb80\sa120\sl240\tx1140 \f21\fs20 Figure 7.2\tab The pattern file created by the work shown in figure 7.1.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.2\tab Searching a sequence using a pattern file\par +\pard\plain \s7\qj\fi-560\li560\sb160\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Pattern definition mode" as "Use pattern file".\par +3.\tab Select "Results display mode" as "Inclusive"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Pattern definition file". Type the name of the file containing the pattern. The pr +ogram will read the file then display its title, a detailed textual description of the pattern, the "Probability of finding the pattern", and the "Expected number of matches".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Maximum pattern probability". \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Minimum pattern score". Searching will now commence and any matches displayed using the chosen method. 
A worked example, using the pattern file created in figure 7.1 is shown in figure 7.3.\par +\pard\plain \li1300\ri1320\sb300\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Pattern searcher\par +\pard \li1300\ri1320\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select pattern definition mode\par +X 1 Use keyboard \par + 2 Use pattern file \par + 3 Use file of pattern file names\par +? Selection (1-3) (1) =2\par +? Pattern definition file=_paper.pat\par +Select results display mode\par +X 1 Motif by motif \par + 2 Inclusive \par + 3 Graphical \par + 4 EMBL feature table \par +? Selection (1-4) (1) =2\par +Probability of score 5.0000 = 0.870E-03\par + Heatshock weights 18-12-90 \par +Probability of score -21.0280 = 0.117E-02\par +\par +Pattern description\par +\par + demo pattern \par +Motif 1 named T run is of class 1\par +Which is an exact match to the string\par +TTTTT\par +Motif 2 named heat is of class 4\par +Which is a match to a weight matrix with score -21.028\par +and the 5 prime base can take positions 10 to 30\par +relative to the 5 prime end of motif 1\par +It is anded with the previous motif.\par +Probability of finding pattern = 0.1015E-05\par +Expected number of matches = 0.1734E+00\par +? Maximum pattern probability (0.00-1.00) (1.00) =\par +? 
Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par +Working\par + 505 T run \par + tttttgatgcttgactctaagccttaaagaaagttttatac\par +Total matches found 1\par +\pard \li1300\ri1320\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Minimum and maximum observed scores -15.34 -15.34\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 7.3\tab Worked example of using a pattern file as input.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.3\tab Comparing a sequence against a library of patterns\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This mode of operation allows a sequence to be searched, in turn, for any number of patterns each stored in a separate pattern file. The names of the files containing the individual patterns must be stored in a simple text file. This file is called "a file + of pattern file names" and its name is the only user input required to define the search.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par +2.\tab Select "Pattern definition mode" as "Use file of pattern file names".\par +3.\tab Select "Results display mode" as "Inclusive"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab +Define "File of pattern file names". Type the name of the file containing the list of pattern file names. The program will read the file and then, in turn, all the pattern files it names. Each of these patterns will be compared against the current seque +nce but only those that give matches will produce any output. The pattern title and each match will be displayed.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Searching sequence libraries for patterns\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The program NIPL can be used to search sequence libraries for patterns. 
Its use is similar to the pattern search routine described above, except that it does not have the facility for creating pattern files, so they must be created beforehand using NIP. In + addition to its obvious application of finding new occurrences of patterns or checking on their frequency it is a usef +ul way of obtaining sequence alignments. It can restrict its search to a list of named entries or can search all but those on a list of entries. It can restrict its output to showing the highest scoring match in each sequence, but by default it will show a +ll matches.\par +\pard \s4\qj\sa120\sl280 +Of its modes of output, two require further description. The first "Padded sections" creates a new file for each match. The file will contain the sequence between and including the two outermost motifs in the pattern. It will be gapped to the f +urthest extent defined by the pattern, which means that if all the files were subsequently written one above the other all the motifs in the pattern would be exactly aligned, with the sections between them containing the requisite numbers of padding charac +ters. The second such mode of output is called "Complete padded sequences". Here the user must know the maximum distance between the leftmost motif and the start of all the sequences that match. A trial run in which only the positions of matches are report +ed is usually required. The user gives this maximum distance to the program. The program then writes a new file containing the full length of all matching sequences, again maximally gapped (including their left ends) so that they would all align if written + above one another. For both of these modes of output the files created are named "entryname" where "entryname" is the name given to the sequence in the sequence library. These modes are best used with the option "Report all matches" rejected, so that only + the best match for each sequence is reported. 
The sequences can be lined up using the sequence assembly program SAP.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select NIPL.\par +2.\tab Define "Name for results file."\par +3.\tab Select a library.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab +Select "Search whole library". The alternatives are "Search only a list of entries" and "Search all but a list of entries". The files containing the list of entries should contain one entry name per line, left justified.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Results display mode" as "Inclusive". The alternatives include "Motif by motif", "Scores only", "Complete padded sequences" and "Padded sections".\par +6.\tab Accept "Report all matches". The alternative only shows the best match for each sequence.\par +7.\tab Define "Pattern definition file". The name of the file containing the pattern created using NIP. \par +\tab The program displays a textual description of the pattern and the expected number of matches per 1000 residues assuming an average nucleic acid composition.\par +8.\tab Define "Maximum pattern probability". The program will run much more quickly if none is given.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define "Minimum pattern score".\par +\pard\plain \s4\qj\sa120\sl280 \f20 The search will start.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab +The "exact match" motif class requires a consensus sequence. The "percentage match" motif class requires a consensus sequence and a cutoff score. The "score matrix" motif class requires a consensus sequence and a cutoff score. The "weight matrix" searc +h and the "complement of a weight matrix" only require the name of the file containing the matrix. The "inverted repeat" or "stem-loop" requires a stem length, minimum and maximum loop sizes, + and a cutoff score using scores A-T = G-C = 2, G-T = 1. 
Note that if the user defines an inverted repeat as a "Reference motif" the "Relative position" can be defined from either its 5' or its 3' end. The "direct repeat" motif class requires a repeat length +, the minimum and maximum gap between the two occurrences of the repeat, and a minimum score.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab The motif class "Exact match, defined step" is rarely used. A typical use might be to find a start codon followed, for some minimum distance, by no stop codons + in the same reading frame. The step would have the value 3 to keep the reading frame the same as that of the start codon, and the stop codon searches would be included using the NOT operator.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +The details of the probability calculations are outside the scope of this article. They are quite rapid and are essential both for assessing the statistical significance of any matches found and for allowing meaningful cutoffs to be applied to patterns. +Obviously, in general, cutoff scores are inappropriate for patterns containing a mixture of motif classes.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab +The program calculates the "Probability of finding the pattern" and the "Expected number of matches". The first figure is actually the product of the individual motif probabilities but the latter figure is more useful because it takes into account the a +llowed variation in spacing between motifs and the length of the current sequence. In both cases the composition of the current sequence is also used so that different probabilities would be calculated for other sequences.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab +The pattern definition system is very flexible. Assume that a laboratory has a large library of patterns stored in its computer. Different groups or users may want to screen their sequences against different subsets of a pattern library. 
Each group ther +efore uses its own "File of pattern file names" which contains only the names of the pattern files that are relevant to their sequences. Of course a pattern may contain only one motif. Hence a library of patterns can include both simple and comp +lex patterns. In the same way a laboratory may have a large library of weight matrices defining different motifs and different users may want to combine them in different ways to produce their own patterns. \par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4(1)}\:53-60.\par +2.\tab Staden, R. 1989. Methods for calculating the probabilities of finding patterns in sequences. {\i CABIOS} {\b 5(2)}\:89-96.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 8. 
Searching for Restriction Sites\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Search for restriction sites and list them enzyme by enzyme\par +2.2\tab Search for restriction sites and list them by position\par +2.3\tab Search for restriction sites and list their names above the sequence\par +2.4\tab Search for restriction sites and plot their positions\par +2.5\tab Find restriction enzymes that cut infrequently\par +2.6\tab Producing a back translation from a protein sequence\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The program NIP contains a routine for finding and displaying the positions of the cut sites of restriction enzyme recognition sequences. Linear or circular sequences can be searched and the results can be listed in various forms or displayed graphically. +The recognition sequences to be searched for can be typed on the keyboard or read from files. The format of these files is given in note 1. At the end of the chapter we also describe how to pro +duce back translations of protein sequences so that these routines can be used to search them for restriction sites.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Search for restriction enzyme sites and list them enzyme by enzyme\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Input source" as "All enzymes file". A number of standard files are available and users may also have their own.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Accept "Search for all names". 
\par +4.\tab Select "Order results enzyme by enzyme".\par +5.\tab Accept "List matches".\par +6.\tab Accept "The sequence is linear". The alternative is circular.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Accept "Search for definite matches". The alternative is to search for possible matches in a sequence containing IUB redundancy codes.\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The results will then appear in the form shown in figure 8.1. Each match is numbered and its enzyme name given, followed by the matching sequence with the cut site indicated by a ' symbol. The position of the cut site is given, followed by the length of the +potential fragment ending at that site, followed by a list of fragment sizes sorted by length.\par +\pard\plain \li1160\ri1380\sl220\box\brsp100\brdrth \f4\fs16 Matches found= 3\par +\pard \li1160\ri1380\sl220\box\brsp100\brdrth Name Sequence Position Fragment length\par + 1 AccII cg'cg 313 312 51\par + 2 AccII cg'cg 364 51 188\par + 3 AccII cg'cg 552 188 312\par + 449 449\par +Matches found= 6\par + Name Sequence Position Fragment length\par + 1 AciI cc'gc 503 502 12\par + 2 AciI gc'gg 553 50 12\par + 3 AciI gc'gg 714 161 50\par + 4 AciI gc'gg 872 158 105\par + 5 AciI gc'gg 884 12 158\par + 6 AciI cc'gc 896 12 161\par + 105 502\par +Matches found= 3\par + Name Sequence Position Fragment length\par + 1 AcyI gg'cgtc 698 697 5\par + 2 AcyI gg'cgtc 765 67 67\par +\pard \li1160\ri1380\sl220\keepn\box\brsp100\brdrth 3 AcyI ga'cgcc 996 231 231\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 8.1\tab Typical output from "List enzyme by enzyme".\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Search for restriction enzyme sites and list them by position\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par +2.\tab Select "Input source" as "All enzymes file". \par +3.\tab Accept "Search for all names". 
\par +4.\tab Select "Order results by position".\par +5.\tab Accept "List matches". \par +6.\tab Accept "The sequence is linear".\par +7.\tab Accept "Search for definite matches". \par +\pard\plain \s4\qj\sa120\sl280 \f20 The results will then appear in the form shown in figure 8.2. Each match is numbered and its enzyme name given, followed b +y the matching sequence with the cut site indicated by a ' symbol. The position of the cut site is given, followed by the length of the potential fragment ending at that site.\par +\pard\plain \s6\fi-540\li560\sb240\sa60\sl280\tx560 \b\f20 2.3\tab Search for restriction enzyme sites and list their names above the sequence\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par +2.\tab Select "Input source" as "All enzymes file". \par +3.\tab Accept "Search for all names". \par +4.\tab Select "Show names above the sequence".\par +5.\tab Reject "Hide translation".\par +6.\tab Accept "Use 1 letter codes".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Line length". This is the number of bases that will appear on each line of output. It must be a multiple of 30. 
\par +\pard\plain \li1640\ri1720\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Name Sequence Position Fragment length\par +\pard \li1640\ri1720\sl220\box\brsp100\brdrth 1 HapII c'cgg 2 1\par + 2 HpaII c'cgg 2 0\par + 3 MspI c'cgg 2 0\par + 4 MseI t'taa 14 12\par + 5 HincII gtt'aac 15 1\par + 6 HindII gtt'aac 15 0\par + 7 HpaI gtt'aac 15 0\par + 8 DsaV 'ccagg 23 8\par + 9 EcoRII 'ccagg 23 0\par +10 TspAI 'ccagg 23 0\par +11 ApyI cc'agg 25 2\par +12 BstNI cc'agg 25 0\par +13 MvaI cc'agg 25 0\par +14 ScrFI cc'agg 25 0\par +15 MaeIII 'gttac 47 22\par +16 BsrI actggt' 49 2\par +17 MseI t'taa 55 6\par +18 MaeII a'cgt 63 8\par +19 SfaNI gcatcaacaa'gata 86 23\par +\pard \li1640\ri1720\sl220\keepn\box\brsp100\brdrth 20 MaeII a'cgt 91 5\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 8.2\tab Typical output from "List by position".\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 8.\tab Accept "The sequence is linear".\par +9.\tab Accept "Search for definite matches". \par +\pard\plain \s4\qj\sa120\sl280 \f20 The results will then appear in the form shown in figure 8.3. The sequence is listed with a three-phase translation underneath and every tenth base numbered. Above the sequence the positions of the cut sites of res +triction enzymes are marked.\par +\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Search for restriction enzyme sites and plot their positions \par +\pard\plain \s7\qj\fi-560\li560\sa80\sl260\tx560 \f20 1.\tab Select "Search".\par +2.\tab Select "Input source" as "All enzymes file". \par +3.\tab Accept "Search for all names". \par +4.\tab Select "Order results by position".\par +5.\tab Reject "List matches". \par +6.\tab Accept "The sequence is linear".\par +7.\tab Accept "Search for definite matches".\par +\pard\plain \s4\qj\sa80\sl260 \f20 The results will then appear in the form shown in figure 8.4. 
Each enzyme that has a match is named at the left edge of the display and its cut sites are marked by short +vertical lines. If the display window fills up the bell will ring. Users may then take a screen dump before typing return. The program then displays the message " ? Restart plotting from bottom of frame". To do so type return. To quit type !.\par +\pard\plain \li1200\ri1240\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Search for restriction enzyme sites\par +\pard \li1200\ri1240\sl220\box\brsp100\brdrth Select operation\par +X 1 Search\par + 2 List enzyme file\par + 3 Clear text\par + 4 Clear graphics\par +? Selection (1-4) (1) =\par +Select input source\par + 1 All enzymes file\par +X 2 Six cutter file\par + 3 Four cutter file\par + 4 Personal file\par + 5 Keyboard\par +? Selection (1-5) (2) =1\par +? Search for all names (y/n) (y) =\par + Select results display mode\par +X 1 Order results enzyme by enzyme\par + 2 Order results by position\par + 3 Show only infrequent cutters\par + 4 Show names above the sequence\par +? Selection (1-4) (1) =4\par +? Hide translation (y/n) (y) =n\par + ? Use 1 letter codes (y/n) (y) =\par + ? Line length (30-90) (60) =\par +? The sequence is linear (y/n) (y) =\par + ? Search for definite matches (y/n) (y) =\par +\par + HapII\par + HpaII\par + MspI MseI\par + . .HincII\par + . .HindII\par + . .HpaI DsaV\par + . .. EcoRII\par + . .. TspAI\par + . .. . ApyI\par + . .. . BstNI\par + . .. . MvaI\par + . .. . ScrFI MaeIII\par + . .. . . . 
BsrI MseI\par +ccggttagactgttaacaacaaccaggttttctactgatataactggttacatttaacgc\par + 10 20 30 40 50 60\par + P V R L L T T T R F S T D I T G Y I * R\par + R L D C * Q Q P G F L L I * L V T F N A\par +\pard \li1200\ri1240\sl220\keepn\box\brsp100\brdrth G * T V N N N Q V F Y * Y N W L H L T P\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 8.3\tab Typical dialogue and output for a "Names above the sequence" search.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Finding restriction enzymes that cut infrequently\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par +2.\tab Select "Input source" as "All enzymes file". \par +3.\tab Accept "Search for all names". \par +4.\tab Select "Show only infrequent cutters".\par +5.\tab Define "Maximum number of cuts".\par +6.\tab Accept "The sequence is linear".\par +\pard\plain \li160\ri200\sl220\keepn\box\brsp100\brdrth \f4\fs16 {{\pict\macpict\picw430\pich254 +0b99ffffffff00fd01ad1101a0008201000affffffff00fd01ad090000000000000000310000002400fa01ac9800240000000000b7011f0000000000b7011f0000002400fa01ac000102dd001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001 +f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000210000006007fdfff00fc06f5000020ea0006f5000020ea0006f5000020ea0006f5000020ea0006f5000020ea0006f5000020ea0006007f +dfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc040020de00040020de00040020de00040020de00040020de00040020de0006007fdfff00fc1402 
+000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000210000006007fdfff00fc06fb000004e40006fb000004e40006fb00 +0004e40006fb000004e40006fb000004e40006fb000004e40006007fdfff00fc0af8000080fe000080eb000af8000080fe000080eb000af8000080fe000080eb000af8000080fe000080eb000af8000080fe000080eb0006007fdfff00fc040020de00040020de00040020de00040020de00040020de00040020de0006007f +dfff00fc0602000040e0000602000040e0000602000040e0000602000040e0000602000040e00006007fdfff00fc0602000040e0000602000040e0000602000040e0000602000040e0000602000040e00006007fdfff00fc0602000040e0000602000040e0000602000040e0000602000040e0000602000040e00006020000 +40e00006007fdfff00fc06eb000040f40006eb000040f40006eb000040f40006eb000040f40006eb000040f40006007fdfff00fc06eb000040f40006eb000040f40006eb000040f40006eb000040f40006eb000040f40006eb000040f40006007fdfff00fc06eb000010f40006eb000010f40006eb000010f40006eb000010 +f40006eb000010f40006007fdfff00fc040020de00040020de00040020de00040020de00040020de00040020de0006007fdfff00fc06eb000008f40006eb000008f40006eb000008f40006eb000008f40006eb000008f40006007fdfff00fc06fe000020e10006fe000020e10006fe000020e10006fe000020e10006fe0000 +20e10006fe000020e10006007fdfff00fc06fe000020e10006fe000020e10006fe000020e10006fe000020e10006fe000020e10006007fdfff00fc06fe000020e10006fe000020e10006fe000020e10006fe000020e10006fe000020e10006007fdfff00fc06eb000008f40006eb000008f40006eb000008f40006eb000008 +f40006eb000008f40006eb000008f40006007fdfff00fc06eb000010f40006eb000010f40006eb000010f40006eb000010f40006eb000010f40006007fdfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fe000008e10006fe000008e10006fe000008e1 
+0006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fa000080e50006fa000080e50006fa000080e50006fa000080e50006fa000080e50006007fdfff00fc06eb000008f40006eb000008f40006eb000008f40006eb000008f40006eb000008f40006eb000008f40006007fdfff00fc06fe000008e100 +06fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc02dd00a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00020000000e00 +252c000800140554696d65730300140d00092e0004000001002b010b055472753949a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a000c0000001800252a0a055366614e49a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a0014000000 +2000252a08055363724649a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a001c0000002800252a08044d766149a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00260000003200252a0a044d737049a00097a10096000c010000000200 +000000000000a1009a0008fffd00000011000001000a002e0000003a00252a08044d736549a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00370000004300252a09064d6165494949a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00 +400000004c00252a09054d61654949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00490000005500252a09054d70614949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00510000005d00252a08044d706149a00097a10096000c01 +0000000200000000000000a1009a0008fffc00000011000001000a00590000006500252a080648696e644949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00630000006f00252a0a0648696e634949a00097a10096000c010000000200000000000000a1009a0008fffc000000 
+11000001000a006b0000007700252a080648696e503149a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00750000008100252a0a0548696e3649a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a007d0000008900252a080448686149a0 +0097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00870000009300252a0a054861704949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a008f0000009b00252a08054861654949a00097a10096000c010000000200000000000000a1009a00 +08fffd00000011000001000a0098000000a400252a090645636f524949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00a1000000ad00252a090745636c31333649a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00a9000000b50025 +2a080444736156a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00b2000000be00252a090444646549a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00ba000000c600252a080443666f49a00097a10096000c01000000020000000000 +0000a1009a0008fffc00000011000001000a00c3000000cf00252a09054273744f49a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00cc000000d800252a09054273744e49a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00d4000000 +e000252a080442737249a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00de000000ea00252a0a084273703134334949a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00e6000000f200252a08054273694c49a00097a10096000c0100 +00000200000000000000a1009a0008fffd00000011000001000a00f0000000fc00252a0a0441707949a00097a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 8.4\tab Typical output from "Plot positions".\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 7.\tab Accept "Search for definite matches". 
\par +\pard\plain \s4\qj\sa120\sl280 \f20 The names and numbers of cut sites of all enzymes whose number of cuts is less than or equal to the "Maximum number of cuts" will then be displayed.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Producing a back translation from a protein sequence \par +\pard\plain \s4\qj\sa120\sl280 \f20 +The routine for producing back translations is contained in the program PIP. It back translates protein sequences into DNA using the standard genetic code. The translation can use either the IUB symbols or a set of codon preferences. If a set of codon pre +ferences is used they must conform to the format of codon tables pr +oduced by the nucleotide interpretation program, and the back translation will contain the favoured codons. If, for any amino acid, there is no favoured codon, the IUB symbols will be employed. The program will plot the redundancy along the sequence and hen +ce can be used to find the best sequences to use as primers. The DNA sequence can be saved to a file and analysed using the nucleotide analysis program. \par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Back translate".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "No codon preference". The alternative will cause the program to ask for "File name of codon table", which should be in the same format as those created by the nucleotide interpretation program. +\par +3.\tab Reject "Plot redundancy". The alternative will ask for a window length to use for the plot. The window length is in codons. A plot will appear in which the best primers are sited at the peaks and the worst at the troughs.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Save DNA to disk".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "File name for DNA sequence". 
This file can later be read into program NIP and all the searches described above employed.\par +\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \page 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab +The file containing the definitions of the restriction enzyme names and their recognition sequences uses the standard IUB redundancy symbols and has the following format. Each name is followed by a /, then each of its recognition sequences is followed +by a /. The last recognition sequence for each enzyme is followed by //. The cut sites should be indicated by a '. If the cut site is not contained in the recognition sequence, the recognition sequence should be extended by sufficient N symbo +ls. For example the two lines from the standard file shown below define the enzymes Alw21I and Alw26I. These files are kindly updated each month by Dr. Rich Roberts.\par +\pard \s7\qj\li1720\sa120\sl280\tx1720 Alw21I/GWGCW'C//\par +Alw26I/GTCTCN'NNNN/'NNNNNGATCC//\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab +To search for a subset of the restriction enzymes in a file the user should reject "Search for all names" and the program will ask for the names of the enzymes wanted and extract their recognition sequences from the file. Alternatively, if a user was al +ways using the same subset, then a file containing only those enzymes could be created by editing the standard file. This file would then be selected as "Personal file" for "Input source".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +The routine also allows names and recognition sequences to be entered on the keyboard. This is selected as "Keyboard" for "Input source", and the program will prompt for names and their recognition sequences. In this way the routine can be used to search + for exact matches to any short sequence. 
Again IUB redundancy codes can be used.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab When back translating +from proteins it is often useful to produce a back translation using both a table of codon preferences and one using the IUB symbols. This is because the restriction enzyme search program can distinguish between definite and possible cuts in the sequence. +Those matches that the program terms "definite matches" are ones in which the specification of the recognition sequence corresponds exactly to that of the back translation. The program will also find what it terms "possible matches" which are ones that dep +end on the particular codons chosen for each amino acid. These are sites at which recognition sequences could be engineered to produce a cut in the DNA without changing the amino acid, but which are not necessarily found in the original sequence. \par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 9. Statistical and Structural Analysis of Nucleotide Sequences\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Calculating the base composition\par +2.2\tab Calculating the dinucleotide composition\par +2.3\tab Calculating the codon composition\par +2.4 \tab Creating a codon usage file\par +2.5\tab Plotting the base composition\par +2.6 \tab Searching for anomalous compositions\par +2.7\tab Search for anomalous word usage\par +2.8\tab Calculate codon constraint\par +2.9 \tab Searching for stem-loops\par +2.10\tab Searching for long range inverted repeats\par +2.11\tab Searching for long range repeats\par +2.12\tab Searching for repeated words\par +2.13\tab Searching for possible Z DNA\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +4.\tab References\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain 
\s4\qj\sa120\sl280 \f20 In this chapter we deal with performing simple statistical and structural analyses of nucleotide sequences and also describe some more unusual test +s. We cover base, dinucleotide and codon compositions, potential amino acid compositions, and the relative frequencies of each base in each position of codons. We describe how to produce plots to show regions of unusual composition and to measure the codon + bias for a gene. In addition we describe a set of functions for finding "structures" in nucleotide sequences, including short range inverted repeats or stem-loops, long range inverted repeats, long range direct repeats, and Z DNA. All the methods are cont +ained in the program NIP.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Calculating the base composition\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Select "Calculate base composition". The composition of the active region is shown.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.2\tab Calculating the dinucleotide composition\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab +Select "Calculate dinucleotide composition". The dinucleotide composition of the active region and an expected dinucleotide composition are shown. The expected composition is calculated from the base composition assuming a random order of bases in the sequ +ence. 
See figure 9.1.\par +\pard\plain \li1180\ri1440\sb200\sl220\box\brsp100\brdrth \f4\fs16 T C A G\par +\pard \li1180\ri1440\sl220\box\brsp100\brdrth Obs Expected Obs Expected Obs Expected Obs Expected\par +T 5.86 5.97 6.18 5.99 4.24 5.91 8.14 6.56\par +C 6.10 5.99 5.14 6.02 5.91 5.93 7.38 6.59\par +A 5.57 5.91 5.64 5.93 7.91 5.84 5.05 6.49\par +\pard \li1180\ri1440\sl220\keepn\box\brsp100\brdrth G 6.90 6.56 7.56 6.59 6.11 6.49 6.30 7.22\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa200\sl240\tx1140 \f21\fs20 Figure 9.1\tab The dinucleotide composition display\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Calculating the codon composition\par +\pard\plain \s4\qj\sa120\sl280 \f20 This function calculates codon counts, amino acid composition, protein molecular weight, hydrophobicity and base compos +itions. Users select the segments of the sequence to be analysed. The segments can be defined on the keyboard or from an EMBL/GenBank feature table.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate codon composition".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Show observed counts". The alternative displays its codon tables so that the total for each amino acid sums to 100. This makes it easier to see any bias present in the codon usage.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Accept "Define segments using keyboard". The alternative is to use a feature table.\par +4.\tab Define "From". The start of the segment to be analysed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab +Define "To". The end of the segment to be analysed. The results will be displayed as in figure 9.2 and then the program will again ask "From". The user should define a zero value for "From" when all segments of interest have been analysed. The program w +ill then display a cumulative total for all the values it calculates.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The counts are broken down into several figures. 
Apart from the codon counts we see the base composition by position in codon expressed as a percentage of each base's own + frequency; base composition by position in codon expressed as a percentage of the overall base composition of the segment; base composition expected for the observed amino acid composition if there were no codon preference; percentage deviations of the ob +served amino acid composition from an average amino acid composition (1); the molecular weight and hydrophobicity (2) of the putative amino acid sequence.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Creating a codon usage file\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method writes a file of codon usage in the form of a codon tab +le (see figure 9.2). Such tables can be used by several other methods contained within the programs. If required the user can start with an existing file and add to it.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate a codon table and write it to disk".\par +2.\tab Accept "Start with empty table".\par +\pard\plain \li440\ri500\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Calculate base, codon and amino acid compositions\par +\pard \li440\ri500\sl220\box\brsp100\brdrth ? Show observed counts (y/n) (y) =\par + ? Define segments using keyboard (y/n) (y) =\par +\par +? From (0-8134) (0) =1\par +? To (1-8134) (8134) =1000\par +? + strand (y/n) (y) =\par + ===========================================\par + F TTT 5. S TCT 7. Y TAT 4. C TGT 2.\par + F TTC 17. S TCC 3. Y TAC 5. C TGC 3.\par + L TTA 3. S TCA 4. * TAA 3. * TGA 1.\par + L TTG 4. S TCG 3. * TAG 0. W TGG 7.\par + ===========================================\par + L CTT 3. P CCT 6. H CAT 6. R CGT 3.\par + L CTC 1. P CCC 1. H CAC 4. R CGC 2.\par + L CTA 0. P CCA 4. Q CAA 3. R CGA 1.\par + L CTG 36. P CCG 6. Q CAG 5. R CGG 4.\par + ===========================================\par + I ATT 12. T ACT 3. N AAT 6. S AGT 0.\par + I ATC 13. T ACC 5. N AAC 7. 
S AGC 7.\par + I ATA 1. T ACA 2. K AAA 9. R AGA 0.\par + M ATG 9. T ACG 7. K AAG 3. R AGG 1.\par + ===========================================\par + V GTT 6. A GCT 5. D GAT 7. G GGT 9.\par + V GTC 3. A GCC 6. D GAC 6. G GGC 9.\par + V GTA 7. A GCA 2. E GAA 5. G GGA 5.\par + V GTG 9. A GCG 7. E GAG 3. G GGG 3.\par + ===========================================\par + Total codons= 333.\par + T C A G\par +1 25.00 34.27 40.28 35.94\par +2 45.42 28.63 36.02 22.27\par +3 29.58 37.10 23.70 41.80\par + ----- ----- ----- -----\par += 100% 100% 100% 100%\par +1 21.32 25.53 25.53 27.63 = 100%\par +2 38.74 21.32 22.82 17.12 = 100%\par +3 25.23 27.63 15.02 32.13 = 100%\par +% 28.43 24.82 21.12 25.63 Observed, overall totals\par +% 29.65 23.25 23.95 23.15 Expected, even codons per acid\par + A C D E F G H I K L\par + 20. 5. 13. 8. 22. 26. 10. 26. 12. 47.\par +O-E % -27. -11. -25. -61. 71. 10. 38. 52. -36. 59.\par + M N P Q R S T V W Y\par + 9. 13. 17. 8. 11. 24. 17. 25. 7. 9.\par +O-E % 14. -10. 1. -39. -41. 6. -11. 15. 64. -15.\par +\pard \li440\ri500\sl220\keepn\box\brsp100\brdrth Total acids= 329. Molecular weight= 36493. Hydrophobicity= 64.7\par +\pard\plain \s8\qj\fi-1140\li1140\sb80\sa280\sl240\tx1140 \f21\fs20 Figure 9.2\tab A worked example of calculating codon, base and amino acid compositions.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 3.\tab Accept "Show observed counts". The alternative is to have the counts for each amino acid type sum to 100.\par +4.\tab Accept "Define segments using keyboard". The alternative is to use an EMBL/GenBank feature table.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "From". The start of the segment to count over.\par +6.\tab Define "To". The end of the segment.\par +7.\tab Accept "+ strand". Alternatively the minus strand.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The table will appear on the screen and the program will cycle round to step 5. 
When all segments have been defined, a zero v +alue for "From" will instruct the program to display on the screen a table which is the sum of all the individual tables.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Name for codon table file". Give the name of the file in which to save the final table. \par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Plotting the base composition\par +\pard\plain \s4\qj\sa120\sl280 \f20 This function plots the base composition for each "window length" of the sequence. The frequency of any combination of bases can be plotted.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot base composition".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select which combination of bases to plot. The default is A+T, but any single base or combination of bases can be used.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +Select "Odd window length". This is the size of window over which each count is made; it is "odd" so that the plotted point exactly corresponds to the centre of each window. The count is made over the window and then the window is moved on by 1 base, an +d the count repeated.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Plot interval". Especially when using long windows it is unnecessary to plot the results for every point along the sequence. A plot interval of 5 means that the value for every fif +th point will be plotted. 
The plot will appear in the form shown in figure 9.3.\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa340\sl240\tx1140 \f21\fs20 Figure 9.3\tab A typical base composition plot. 
This is an A+T plot for bacteriophage Lambda; it shows that one half is A+T rich and the other G+C rich.\par +\pard\plain \s6\sb240\sa100\sl280\tx560\tx860 \b\f20 2.6\tab Searching for anomalous compositions\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This "search" is performed by comparing a standard composition against each segment of the sequence and plotting the difference. The difference between the observed and expected composition at each point is expressed as the chi-squared value. + Any one of the base, dinucleotide or trinucleotide compositions can be used as the standard. No expected level of divergence is assumed, so the program always scales the results so that the plots fill the allotted space on the screen. At the end the observed + range is displayed.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot dinucleotide composition differences as chi squared". Alternatively, select the base or trinucleotide composition.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Start". Define the position of the first base to be used in the standard.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "End". Define the last base of the standard. The default standard region is the whole sequence.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Odd window length". 
\par +5.\tab Define "Plot interval".\par +\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 9.4.\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 9.4\tab An anomalous composition plot. 
This shows an immunoglobulin switch region and the plateau corresponds to a segment composed entirely of A and G bases.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.7\tab Search for anomalous word usage\par +\pard\plain \s4\qj\sa120\sl280 \f20 This function is designed to examine the abundances of short words in a nucleotide sequence to see if particular ones are either under or over repre +sented (3). It compares the observed and expected frequencies and plots them for each segment of the sequence. There has been some work on the relative abundances of CG dinucleotides in eukaryotic sequences (e.g. reference 4) and this routine can be used t +o examine such biases or any others that might be of interest.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot observed-expected word usage".\par +2.\tab Define "String". That is the word to search for. The default is CG.\par +3.\tab Define "Odd window length".\par +4.\tab Define "Plot interval".\par +5.\tab Define "Maximum plot value". 
Define the maximum expected value for the plot.\par +6.\tab Define "Minimum plot value".\par +\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 9.5.\par +\pard\plain \ri-60\sb200\sl220\keepn \f4\fs16 {\plain \fs16 {\pict\macpict\picw453\pich122 +0800ffffffff007901c41101a00082a0008c01000affffffff007901c4070000000022000100010000a000a0a100a400020de801000a000000000000000007000100012200770001008a23000021000101c32300002300762300002100770001230000a000a301000affffffff007901c423008a21000101c3230076210077 +0001a000a120003b0001003b01c322003b00011a082300022302002300fe2302022300fe2301fe2300022300fe2302022300fe2301022300002302002300002300fe2302022300002301022300002302fc2300042300fe2301fe2300ff2302012300022301002300fd2300002302002300002302002300002301fe23000223 +00002302fe2300022301fd2300032300002302fd2300002302fe2300002300002301ff2300002302fe2300002301002300002300022302012300022301002300012302002300002300022302002300002301002300002302002300002301002300012300022302fe2300022302fe2300002301002300022300002302fe2300 +022301002300002300fe2302022300002301002300002300022302002300012302002300022301002300002300012302002300022301fe2300002302022300002300002302002300002301002300032302002300002300ff2301012300002302ff2300002301002300002300012302ff2300002302012300002301022300fe +2300002302022300fe2300002301fd2300002302022300002302012300002300ff2301fc2300002302002300ff2301012300002300002302022300fd2302002300fe2301ff2300032302fe2300002300022301002300012302002300002301ff2300002300002302002300012302ff2300012301ff2300fe23000223020123 +00002301002300022300002302fe2300022302fd2300fe2300002301022300002302002300002301002300fe2300022302002300012301002300022302fe2300002300002302002300002301002300022302002300002300002301002300002302002300002302002300032300fb2301002300022302002300022301012300 
+002300022302002300002300002301012300022302012300ff2302002300002300fe2301002300002302002300002301002300022300002302002300fe2302022300012301ff2300002302fe2300ff2300002301fe2300022302fe2300002301ff2300002300002302002300fe230200230000230100230000230002230201 +2300ff2300012301022300002302fe2300002302002300002300002301022300fe2302fd2300002301002300002300022302002300fe2301022300012302022300002300fe2302022300002301002300fe2302ff2300002300012301ff2300042302002300002302002300ff2300012301fd2300002302022300002301fe23 +00002300002302002300002300002302002300002301002300002302002300ff2300002301fe2300022302fe2300022301fe2300fe2300002302002300022302fe2300022301fe2300ff2302fe2300022300002301002300002302002300012302022300002300fe2301022300fe2302fd2300022301002300fe2300002302 +022300002300002301012300ff2302fe2300032302002300ff2300002301012300ff2302fe2300002301002300002300002302ff2300012302002300ff2301002300012300002302fd2300022301002300fe2302032300002300002301022300fe2302022300012302ff2300002300012301ff230000230200230000230000 +2301002300012302ff2300002300012302002300ff2301002300002302012300002300022301002300002302002300002301022300002302012300002300002302ff2300012301022300fe23020023000023000023010023000223020023000123020223000123000023010223000023020023000023010023000023000223 +02012300002300022301002300002302002300012302002300002300ff2301002300002302002300002301012300ff2300002302002300002302012300002301002300002300002302002300002301ff2300002302002300012300002301ff2300fe2302032300ff2302002300002300002301002300002302002300fe2300 +002301022300012302002300002300ff2302002300002301002300002302002300002300012301002300002302002300002302022300fd2301fe2300022300012302ff2300012301002300ff2302002300012300fd2301022300fe2302002300002302022300002300002301fe2300002302fd2300fe2300002301022300fe 
+2302002300002300022302002300032301ff2300002302002300012300ff2301012300002302032300002301022300022300fb23020123000023020023000023010023000223000023020023000023010223000023020023000023000123020023000023010223000023020023000023000023010023000023000023020023 +00012301002300002302ff2300012300002302022300002301fe2300022302002300012300002301042300002302002300012302002300002301002300002300002302002300002301002300ff2302fe2300fe2300002301ff2300002302032300fe2302022300002300fe2301ff2300002302002300002300fe2301002300 +002302002300fd2300ff2302002300012301fb2300002302002300fd2300ff2301fe2300032302ff2300002301fe2300fd2300012302ff2300012302fd2300ff2301002300002300fe2302ff2300fe2301002300002302002300fe2300002302002300002301002300ff2302002300fe2300ff2301fe2300fd2300002302fe +2300ff2302fe230000a0008da00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 9.5\tab +A plot of anomalous word usage. This shows a plot of CG usage for the Human CMV immediate-early region. The frequency of CG is much lower than would be expected from the composition.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.8\tab Calculate codon constraint\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method measures the level of constraint imposed on a sequence by coding for a protein. The codon constraint is the difference between the observe +d codon improbability and the mean improbability for a sequence of the same composition. That is it is a measure of the codon bias and the program performs the calculation over windows of length 99 codons. See reference 5. The user can select segments to a +nalyse either by defining them on the keyboard or by using an EMBL/GenBank feature table. The result for each selected segment, which is simply a single number, is displayed.\par +\pard\plain \s7\qj\fi-560\li560\sa80\sl280\tx560 \f20 1.\tab Select "Calculate codon constraint".\par +2.\tab Accept "Define segments using keyboard".\par +3.\tab Define "From". 
The start of the segment.\par +4.\tab Define "To". The end of the segment.\par +5.\tab Accept "+ strand".\par +\pard\plain \s4\qj\sa120\sl280 \f20 The result will be displayed, and the program will ask for the next segment to be defined. \par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.9\tab Searching for stem-loop structures\par +\pard\plain \s4\qj\sa120\sl280 \f20 This routine finds simple putative stem-loop structures having a minimum number of base pairs in their stems. Results can be listed or plotted.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search for hairpin loops".\par +2.\tab Define "Minimum loop size".\par +3.\tab Define "Maximum loop size".\par +4.\tab Define "Minimum number of base pairs"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Reject "Plot results". The alternative writes out the stem-loops as shown in figure 9.6. The plotted output marks the position of each stem, the height of the mark showing the length of the stem.\par +\pard\plain \li3480\ri3940\sb200\sl220\box\brsp100\brdrth \f4\fs16 g\par +\pard \li3480\ri3940\sl220\box\brsp100\brdrth g.t\par + t.g\par + c-g\par + a-t\par + t.g\par + t.g\par + g-c\par + t.g\par + g.t\par + g.t\par + t.g\par + t.g\par + g-c\par + t.g\par +tggcga gttttaa\par +\pard \li3480\ri3940\sl220\keepn\box\brsp100\brdrth 843\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 9.6\tab A typical textual display from the routine for finding simple hairpin loops.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.10\tab Searching for long range inverted repeats\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method finds inverted repeats. It allows for no mismatches, insertions or deletions within the matching segments.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find long range inverted repeats".\par +2.\tab Accept "Plot results". 
The alternatve lists out all the matching segments.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Start". The beginning of the region to analyse. In general the whole sequence will be analysed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "End".\par +5.\tab Define "Minimum inverted repeat". The length of the minimum match.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The results will now be plotted in an unusual way as shown in figure 9.7 in which the positions of matching segments are joined by rectangular lines.\par +\pard\plain \li100\sb200\sl220\keepn\box\brsp20\brdrth \f4\fs16 {{\pict\macpict\picw445\pich118 +0448ffffffff007501bc1101a0008201000affffffff007501bc0900000000000000003100000000007401bb98001e00000000003d00f000000000003d00ec00000000007401bb000102e3000701001fe6ff00c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c0070100 +18e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c00a00 +7ff1ff00c0f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00e007ff5ff00e0fe000040f60000c00f017818fb0000 +01f4ff00f0fb0000c019017818fb000501c1800000e0fe000040fe00017030fb0000c019017818fb000501c1800000e0fe000040fe00017030fb0000c019017818fb000501c1800000e0fe000040fe00017030fb0000c01502781807f7ff00e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040 +fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1 
+800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01102781804fc000001f5ff01f030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c180 +0000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01c1678 +1804000007ffffe1c1800000f0006000400008007030fb0000c01c167ffffc000007ffffe1c1800000f0006000400008007030fb0000c01c16781804000007ffffe1c1800000f0006000400008007030fb0000c01c16781804000007ffffe1c1800000f0006000400008007030fb0000c01c1678180407fe07ffffe1c18000 +00f0006000400008007030fb0000c01c1678180407fe07ffffe1c1800000f0006000400008007030fb0000c01c1678180407fe07ffffe1c1800000f0006000400008007030fb0000c002e300a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl20\tx1140 \f21\fs20 Figure 9.7\tab +A plot of direct or inverted repeats. Each matching segment is joined by a rectangular line. Here we show the direct repeats of at least 25 bases in a mouse immunoglobulin switch region.\par +\pard\plain \s6\sb120\sa40\sl280\tx560\tx860 \b\f20 2.11\tab Searching for long range repeats\par +\pard\plain \s4\qj\sa120\sl260 \f20 This method finds direct repeats. It allows for no mismatches, insertions or deletions within the matching segments.\par +\pard\plain \s7\qj\fi-560\li560\sa80\sl260\tx560 \f20 1.\tab Select "Find long range repeats".\par +2.\tab Accept "Plot results". The alternatve lists out all the matching segments.\par +\pard \s7\qj\fi-560\li560\sa80\sl260\tx560 3.\tab Define "Start". The beginning of the region to analyse. In general the whole sequence will be analysed.\par +\pard \s7\qj\fi-560\li560\sa80\sl260\tx560 4.\tab Define "End".\par +5.\tab Define "Minimum repeat". 
The length of the minimum match.\par +\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 The results will now be plotted in an unusual way as shown in figure 9.7 in which the positions of matching segments are joined by rectangular lines.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.12\tab Searching for repeated words\par +\pard\plain \s7\qj\sa120\sl260\tx540 \f20 \tab This function can be used to examine the frequencies of repeated words within a sequence. It finds all words that occ +ur more than once. A "word" is a particular sequence of bases so we are dealing only with exact repeats. The user selects a minimum word length and the program finds all words of that length that occur more than once. Then it "follows" each repeated word u +ntil it becomes unique. For each word length it can report the number of different repeated words, the number of occurrences of each word, and their actual sequences and positions.\par +\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 1.\tab Select "Examine repeats".\par +\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 2.\tab Define "Minimum word length". The maximum expected and observed word lengths are displayed.\par +3.\tab Define "Minimum word length for display of repeated word frequencies". The number of different repeated words of each length is listed.\par +\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 4.\tab Define "Minimum frequency for display of repeated words". \par +\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 5.\tab Define "Minimum word length for display of repeated words". All words occurring this number of times and of this given word length will be displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 \par +\pard\plain \sl220\box\brsp100\brdrth \f4\fs16 {\f22\fs18 Expected length of longest repeat 12\par +}\pard \sl220\box\brsp100\brdrth {\f22\fs18 ? Minumim word length (1-6) (6) = \par +Working\par + Memory used in bytes 75164. Length of longest repeat 13\par + ? 
Show repeat frequencies for words of at least length (6-13) (13) = 10\par + For length 10 the number of different repeated words is 86\par + For length 11 the number of different repeated words is 21\par + For length 12 the number of different repeated words is 5\par + For length 13 the number of different repeated words is 2\par + ? Show repeats for words of length (6-13) (13) = 10\par + ? Show repeats for words occuring with frequency (2-9999) (2) = 3\par + aaggcatcat\par + occurs at 276\par + occurs at 969\par + occurs at 6938\par + gtctggcggc\par + occurs at 1891\par + occurs at 4714\par + occurs at 7250\par + ? Show repeats for words of length (6-13) (13) = 12\par + ? Show repeats for words occuring with frequency (2-9999) (2) = \par + gttactggtggt\par + occurs at 641\par + occurs at 851\par + aaaggcatcatg\par + occurs at 968\par + occurs at 6937\par + aaggcatcatgg\par + occurs at 969\par + occurs at 6938\par + ttactggtggtg\par + occurs at 642\par + occurs at 852\par + ctgctgggccgt\par + occurs at 3477\par + occurs at 6424\par +}\pard \sl220\box\brsp100\brdrth {\f22\fs18 ? Show repeats for words of length (6-13) (13) =!\par +}\pard \sl220 {\f22\fs18 \par +}{\f22\fs20 Figure 9.8 Typical output from "Examine repeats".\par +}\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 \par +2.13\tab Searching for possible Z DNA\par +\pard\plain \s4\qj\sa60\sl260 \f20 +The program contains three algorithms for searching for sequences with the potential for forming Z DNA. In varying ways they look for segments of alternating purines and pyrimidines and they all plot their results. A typical result is shown in figure 9.9. 
+\par +\pard\plain \sl220\keepn \f4\fs16 {{\pict\macpict\picw447\pich119 +0512ffffffff007601be1101a0008201000affffffff007601be0900000000000000003100000000007501bd98002400000000004e012000000000004e011f00000000007501bd000102dd0006007fdfff00fc060040df000004060040df000004060040df000004060040df000004060040df000004060040df0000040600 +40df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040 +df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df0000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001 +fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080 +f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041702400040f6000001fb000380000004fb000440002000041702400040f6000001fb000380000004fb000440002000041702400040f6000001fb000380000004fb000440002000041702400040f6000001fb00 +0380000004fb000440002000041802400040f600010180fc0003c0000004fb000440002000041802400040f600010180fc0003c0000004fb000440002000041802400040f600010180fc0003c0000004fb0004400020000421044000400020fc0005020002000181fc0006c0004004000440fe000440003084142104400040 +0020fc0005020002000181fc0006c0004004000440fe0004400030841421044000400020fc0005020002000181fc0006c0004004000440fe0004400030841421044000400020fc0005020002000181fc0006c0004004000440fe0004400030841422044000c00030fc0005020003000281fd0007014000c006000440fe0004 
+600051843c22044000c00030fc0005020003000281fd0007014000c006000440fe0004600051843c22044000c00030fc0005020003000281fd0007014000c006000440fe0004600051843c23044000c01430fc001903004300028181020042014000c0060146600040006404d3563c23044000c01430fc0019030043000281 +81020042014000c0060146600060006404d3563c23044000c01c30fc001903004300028181020042014000c00601e6600060006406d3563c23044000c01c30fc001903004300028181020042014000c00601e6600050006406d3563c23044000c01e28fc0019030062800282818600a3014000c00601e6a00088006406d55e +3c23044000c01628fc0019030062800282818600a3014000c0060156a00088006405d55e3c23044000c01628fc0019030062800282818600a3014000c0060156a00084006405d55e3c20045ffffff7effaff03fefffefefeff02bfff7ffdff095fbfff87fffffddd7ffc060050df000004060060df000004060060df000004 +060060df000004060040df00000406007fdfff00fc02dd00a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 9.9\tab A plot of predictions for potential Z DNA containing some high peaks produced by regions of alternating purines and pyrimidines.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Whenever the program reads a sequence file it always displays the base composition to provide the user with a check on the correctness of the file.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab +The search for anomalous words function operates in the following way. Users select a "word" - say CG and a window length. The program examines each successive window length along the sequence, with each window overlapping the previous one by windowleng +th-1 bases. For each window position the program calculates the base composition and the number of +occurrences of the chosen word. From the base composition it calculates an expected number of occurrences of the chosen word by simply multiplying the relevent frequencies and assuming random ordering. 
It plots observed - expected, hence showing regions tha +t are enriched or depleted in the chosen word.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +The codon constraint calculation offers a measure of the codon bias that is independent of any preset tables of expected codon usage. Although some users may find the underlying mathematics difficult to understand, +the values obtained provide an interesting measure. It was shown (5) for a set of {\i E. coli} genes that their values of codon constraint correlated with their levels of expression. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab The algorithm for finding possible stem loops counts A-T, G-C and G-T pairs as matching but will only find stems with no mismatches or loopouts.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab The long range inverted and direct repeat routines are fast but only find exact matches. More flexible and exhaustive methods are described in the chapter on sequence comparisons.\par +6.\tab It is also possible to use the pattern searching routines to define and search for inverted and direct repeats. They are particularly useful for finding specific structures - for example tRNA folds.\par +\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 7.\tab +It is possible that the "Examine repeats" algorithm may run out of memory, particularly if a short minimum word length is chosen or the sequence is very long or very repetitive. If this occurs, the maximum word length reported may not be the longest in t +he sequence\: the memory will have been consumed before it was found.\par +\pard\plain \s5\sb320\sa60\sl320\tx560 \b\f20\fs28 \page 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab McCaldon,P. and Argos,P. 1988 Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences. {\i Proteins} {\b 4}, 99-122.\par +2.\tab Sweet,R.M. and Eisenberg,D. 1983. 
Correlation of sequence hydrophobicity measures similarity in three-dimensional protein structure. {\i J. Mol. Biol}. {\b 171}\:479-488.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Honess,R.W., Gompels,U.A., Barrell,B.G., Craxton,M., Cameron,K.R., Staden,R., Chang,Y.-N and Hayward,G.S. 1989 Deviations from expected frequencies +of CpG dinucleotides in herpesvirus DNAs may be diagnostic of differences in the states of their latent genomes. {\i J. Gen. Virol}, {\b 70}, 837-855.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Bird,A.P. 1980 DNA methylation and the frequency of CpG in animal DNA. {\i Nucl. Acids Res}. {\b 8}, 1499-1504.\par +5.\tab McLachlan, A.D., Staden, R., and Boswell, D.R. 1984. A method for measuring the non-random bias of a codon usage table. {\i Nucl. Acids Res}. {\b 12}\:9567-9575.\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 10. Translating and Listing Nucleic Acid Sequences\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Listing the sequence with all six reading frames translated\par +2.2\tab Listing the sequence with its open reading frames translated\par +2.3\tab Listing the sequence with defined segments translated\par +2.4\tab Listing the sequence with translated segments defined from a feature table\par +2.5\tab Producing a file of protein sequences for all open reading frames.\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.6\tab Producing a file of protein sequences for segments defined from a feature table\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we deal with producing simple listings from nucleotide seque +nces. All functions are contained in the program NIP. 
We can list the sequence alone, in single or double stranded format, or with translations to protein. The translations can be of all six phases, all open reading frames, or of specified segments. The p +ositions of these segments can be defined on the keyboard or read from an EMBL/GenBank feature table. Translations can use the one letter or three letter codes. In addition we can produce files containing only the protein translations, which are suitabl +e for processing by other programs. Again the positions of the translated segments can be defined on the keyboard, read from a feature table, or be all open reading frames. For the user, producing all these results is very simple, so we only give examples +of "methods" and show what the results look like. All outputs that list the sequence can be produced from the menu option named "Translate and list".\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Listing the sequence with all six reading frames translated\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par +2.\tab Accept "Show translation".\par +3.\tab Select "The segments to translate will be "All six frames"".\par +4.\tab Accept "Use 1 letter codes".\par +5.\tab Define "Start". Where to list from.\par +6.\tab Define "End". Where to list to.\par +7.\tab Define "Line length". The number of characters in each line of output.\par +8.\tab Reject "Number ends of lines". Rejecting this option writes the positions underneath each line.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The listing will then appear. 
Given the choices taken it will look the same as figure 10.1.\par +\pard\plain \li1240\ri1280\sb200\sl220\box\brsp100\brdrth \f4\fs16 Q D Y I G H H L N N L Q L D L R T F S L\par +\pard \li1240\ri1280\sl220\box\brsp100\brdrth R I T * D T T * I T F S W T C V H S R W\par + G L H R T P P E * P S A G P A Y I L A\par +caggattacataggacaccacctgaataaccttcagctggacctgcgtacattctcgctg\par + 1010 1020 1030 1040 1050 1060\par +gtcctaatgtatcctgtggtggacttattggaagtcgacctggacgcatgtaagagcgac\par + L I V Y S V V Q I V K L Q V Q T C E R Q\par + P N C L V G G S Y G E A P G A Y M R A P\par + S * M P C W R F L R * S S R R V N E S\par +\par + V D P Q N P P A T F W T I N I D S M F F\par + W I H K T P Q P P S G Q S I L T P C S S\par +G G S T K P P S H L L D N Q Y * L H V L\par +gtggatccacaaaaccccccagccaccttctggacaatcaatattgactccatgttcttc\par + 1070 1080 1090 1100 1110 1120\par +cacctaggtgttttggggggtcggtggaagacctgttagttataactgaggtacaagaag\par + H I W L V G W G G E P C D I N V G H E E\par + P D V F G G L W R R S L * Y Q S W T R R\par +T S G C F G G A V K Q V I L I S E M N K\par +\par + S V V L G L L F L V L F R S V A K K A T\par + R W C W V C C S W F Y S V A * P K R R P\par +L G G A G S V V P G F I P * R S Q K G D\par +tcggtggtgctgggtctgttgttcctggttttattccgtagcgtagccaaaaaggcgacc\par + 1130 1140 1150 1160 1170 1180\par +agccaccacgacccagacaacaaggaccaaaataaggcatcgcatcggtttttccgctgg\par + R H H Q T Q Q E Q N * E T A Y G F L R G\par + P P A P D T T G P K I G Y R L W F P S W\par +E T T S P R N N R T K N R L T A L F A V\par +\par + S G V P G K F Q T A I E L V I G F V N G\par + A V C Q V S F R P R L S W * S A L L M V\par +Q R C A R * V S D R D * A G D R L C * W\par +agcggtgtgccaggtaagtttcagaccgcgattgagctggtgatcggctttgttaatggt\par + 1190 1200 1210 1220 1230 1240\par +tcgccacacggtccattcaaagtctggcgctaactcgaccactagccgaaacaattacca\par + A T H W T L K L G R N L Q H D A K N I T\par + R H A L Y T E S R S Q A P S R S Q * H Y\par +\pard \li1240\ri1280\sl220\keepn\box\brsp100\brdrth L P T G P L N * V A I S S T I 
P K T L P\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 10.1\tab A six phase translation using the 1 letter codes\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Listing the sequence with its open reading frames translated\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par +2.\tab Accept "Show translation".\par +3.\tab Select "The segments to translate will be "Open reading frames"".\par +4.\tab Define "Minimum open frame in amino acids".\par +5.\tab Accept "Use 1 letter codes".\par +6.\tab Define "Start". Where to list from.\par +7.\tab Define "End". Where to list to.\par +8.\tab Define "Line length". The number of characters in each line of output.\par +9.\tab Select "Both strands"\par +10.\tab Accept "Number ends of lines".\par +\pard\plain \s4\qj\sa120\sl280 \f20 A typical result is shown in figure 10.2.\par +\pard\plain \li720\ri680\sb200\sl220\box\brsp100\brdrth \tx7780 \f4\fs16 Q D Y I G H H L N N L Q L D L R T F S L\par +\pard \li720\ri680\sl220\box\brsp100\brdrth \tx7780 caggattacataggacaccacctgaataaccttcagctggacctgcgtacattctcgctg\tab 1060\par + . \: . \: . \: . \: . \: . \:\par +gtcctaatgtatcctgtggtggacttattggaagtcgacctggacgcatgtaagagcgac\par + L I V Y S V V Q I V K L Q V Q T C E R Q\par + * S S R R V N E S\par +\par + V D P Q N P P A T F W T I N I D S M F F\par +gtggatccacaaaaccccccagccaccttctggacaatcaatattgactccatgttcttc\tab 1120\par + . \: . \: . \: . \: . \: . \:\par +cacctaggtgttttggggggtcggtggaagacctgttagttataactgaggtacaagaag\par + H I W L V G W G G E P C D I N V G H E E\par +T S G C F G G A V K Q V I L I S E M N K\par +\par + S V V L G L L F L V L F R S V A K K A T\par +tcggtggtgctgggtctgttgttcctggttttattccgtagcgtagccaaaaaggcgacc\tab 1180\par + . \: . \: . \: . \: . \: . 
\:\par +agccaccacgacccagacaacaaggaccaaaataaggcatcgcatcggtttttccgctgg\par + R H H Q T Q Q E Q N * E T A Y G F L R G\par +E T T S P R N N R T K N R L T A L F A V\par +\par + S G V P G K F Q T A I E L V I G F V N G\par +agcggtgtgccaggtaagtttcagaccgcgattgagctggtgatcggctttgttaatggt\tab 1240\par + . \: . \: . \: . \: . \: . \:\par +tcgccacacggtccattcaaagtctggcgctaactcgaccactagccgaaacaattacca\par + A T H W T L K L G R N L Q H D A K N I T\par +L P T G P L N * V A I S S T I P K T L P\par +\par + S V K D M Y H G K S K L I A P L A L T I\par +agcgtgaaagacatgtaccatggcaaaagcaagctgattgctccgctggccctgacgatc\tab 1300\par + . \: . \: . \: . \: . \: . \:\par +tcgcactttctgtacatggtaccgttttcgttcgactaacgaggcgaccgggactgctag\par + A H F V H V M A F A L Q N S R Q G Q R D\par +\pard \li720\ri680\sl220\keepn\box\brsp100\brdrth \tx7780 L T F S M Y W P L L L S I A G S A R V I\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa180\sl240\tx1140 \f21\fs20 Figure 10.2\tab A listing showing the translation of open reading frames from both strands of a sequence from position 1001 to 1300\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Listing the sequence with defined segments translated\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par +2.\tab Accept "Show translation".\par +3.\tab Select "The segments to translate will be "Typed on the keyboard"".\par +4.\tab Accept "Use 1 letter codes".\par +5.\tab Define "Start". Where to list from.\par +6.\tab Define "End". Where to list to.\par +7.\tab Define "Line length". The number of characters in each line of output.\par +8.\tab Select "Both strands".\par +9.\tab Accept "Number ends of lines".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Translate from". Define the start of the next segment to translate - say the next exon.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 11.\tab Define "Translate to". 
Define the end of the next segment to translate.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 12.\tab Select "Strand". As both strands have been selected above, the program will allow either to be translated for each defined segment.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The program will now cycle through steps 10, 11 and 12 until a zero value is defined for "Translate from", at which point the listing will appear. Given the choices made, it will look the same as figure 10.2. +\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Listing the sequence with translated segments defined from a feature table\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par +2.\tab Accept "Show translation".\par +3.\tab Select "The segments to translate will be "Read from a feature table"".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Feature table file name". Type the name of the file containing the appropriate feature table in EMBL/GenBank format.\par +5.\tab Define "Operator". This defines which feature table operators should be employed when selecting the segments to translate.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Use 1 letter codes".\par +7.\tab Define "Start". Where to list from.\par +8.\tab Define "End". Where to list to.\par +9.\tab Define "Line length". 
The number of characters in each line of output.\par +10.\tab Select "Both strands".\par +11.\tab Accept "Number ends of lines".\par +\pard\plain \s4\qj\sa120\sl280 \f20 The program will now read the feature table file and translate the segments defined using the selected operator(s), and the listing will appear as in figure 10.2.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Producing a file of protein sequences for all open reading frames.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and write protein sequences to disk".\par +2.\tab Reject "Translate selected regions". The alternative is "Open reading frames".\par +3.\tab Define "Minimum open frame in amino acids".\par +4.\tab Select "Both strands".\par +5.\tab Define "File name for translation".\par +\pard\plain \s4\qj\sa120\sl280 \f20 +A typical results file is shown in figure 10.3. It shows that the file is written in FASTA format, i.e. an entry name line starting with a > symbol (here the first entry name is 188, the start of the DNA segment), followed by a title (here in EMBL feature +table format giving the start and end of the DNA that produced the protein), followed by the sequence terminated by an *.\par +\pard \s4\qj\sa120\sl280 \par +\pard\plain \sl220 \f4\fs16 {\f22\fs18 \par +}\pard \li1260\ri1360\sl220\box\brsp100\brdrth {\f22\fs18 >188 188..733\par +}\pard \li1260\ri1360\sl220\box\brsp100\brdrth {\f22\fs18 TMEVNKKQLADIFGASIRTIQNWQEQGMPVLRGGGKGNEVLYDSAAVIKWYAERDAEIEN\par + EKLRREVEELRQASEADLQPGTIEYERHRLTRAQADAQELKNARDSAEVVETAFCTFVLS\par + RIAGEIASILDGLPLSVQRRFPELENRHVDFLKRDIIKAMNKAAALDELIPGLLSEYIEQ\par + SG*\par +>711 711..2633\par + VNISNSQVNRLRHFVRAGLRSLFRPEPQTAVEWADANYYLPKESAYQEGRWETLPFQRAI\par + MNAMGSDYIREVNVVKSARVGYSKMLLGVYAYFIEHKQRNTLIWLPTDGDAENFMKTHVE\par + PTIRDIPSLLALAPWYGKKHRDNTLTMKRFTNGRGFWCLGGKAAKNYREKSVDVAGYDEL\par + AAFDDDIEQEGSPTFLGDKRIEGSVWPKSIRGSTPKVRGTCQIERAASESPHFMRFHVAC\par + 
PHCGEEQYLKFGDKETPFGLKWTPDDPSSVFYLCEHNACVIRQQELDFTDARYICEKTGI\par + WTRDGILWFSSSGEEIEPPDSVTFHIWTAYSPFTTWVQIVKDWMKTKGDTGKRKTFVNTT\par + LGETWEAKIGERPDAEVMAERKEHYSAPVPDRVAYLTAGIDSQLDRYEMRVWGWGPGEES\par + WLIDRQIIMGRHDDEQTLLRVDEAINKTYTRRNGAEMSISRICWDTGGIDPTIVYERSKK\par + HGLFRVIPIKGASVYGKPVASMPRKRNKNGVYLTEIGTDTAKEQIYNRFTLTPEGDEPLP\par + GAVHFPNNPDIFDLTEAQQLTAEEQVEKWVDGRKKILWDSKKRRNEALDCFVYALAALRI\par + SISRWQLDLSALLASLQEEDGAATNKKTLADYARALSGEDE*\par +>74 complement(74..727)\par + LFDIFTQQPRYQFIQRGCFVHGFDDIPFQEINMSVFQFRKTPLHRQGEPVENTGNFTCDP\par + RQHESTECGFHHFSGVSGILQFLCVGLRTRKSMAFVLNSSWLEICLAGLPQFFNLPAQLF\par + VLNFSIPFGIPFYDGGRVIKHLITLATASQNGHSLFLPVLNGTDTRTENVSQLLFVDFHC\par + SFHGQKQRKETTEAKKPRFQHLSFPFFSEGILNKNIKL*\par +>313 complement(313..732) \par + PDCSIYSLSNPGISSSSAAALFMALMISRFRKSTCRFSSSGKRRCTDRGSPSRILAISPA\par + IRDSTKVQNAVSTTSAESLAFFSSCASACARVSRWRSYSIVPGWRSASLACRSSSTSRRS\par +}\pard \li1260\ri1360\sl220\box\brsp100\brdrth {\f22\fs18 FSFSISASLSAYHFMTAAES*\par +}\pard \li1260\ri1360\sl220 {\f22\fs18 \par +}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 10.3\tab The contents of a file containing the protein sequences of the open reading frames found by the program\par +\pard\plain \s6\fi-560\li560\sb240\sa60\sl280\tx560 \b\f20 2.6\tab Producing a file of protein sequences for segments defined from a feature table\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and write protein sequences to disk".\par +2.\tab Accept "Translate selected regions".\par +3.\tab Reject "Define segments using keyboard". The alternative is to use a feature table.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Feature table file name". Type the name of the file containing the appropriate feature table in EMBL/GenBank format.\par +5.\tab Define "Operator". 
This defines which feature table operators should be employed when selecting the segments to translate.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "File name for translation"\par +\pard\plain \s4\qj\sa120\sl280 \f20 The program will now read the feature table file and translate the segments defined using the selected operator(s). The results will be stored as in figure 10.3.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab To produce a listing without translation the "Translate and list" function can be used with the "Show translation" option rejected. Alternatively the function "List the sequence" can be used. +\par +2.\tab Some users may be confused by the fact that the program asks "Where to list from, and to" and also "Define segments to translate". This allows for 5' and 3' untranslated regions to be included in the listing.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +The feature table file employed by the programs is a simple text file containing the data for the current sequence. Because of the multiplicity of different sequence library formats we have not provided the facility of reading such data directly from li +braries. The feature tables for individual library entries must be extracted (see the introductory chapter) or files can be created for new sequences.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab +The current feature tables use "operators" such as "join" or "order" to specify which segments should be translated together to make a complete protein sequence. The program allows users to select which ones to employ, the default being "Use all operato +rs".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab The program contains a function "Set genetic code" which allows users to choose from a menu of codes or to define their own by specifying amino acid and codon pairs. This sets the code for all functions. 
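\par
\pard\plain \s4\qj\sa120\sl280 \f20 The translation step of section 2.5 and the user-defined genetic code of note 5 can be sketched in Python. This is only an illustration under stated assumptions, not the program's own code: the function names are hypothetical, only the forward strand is handled, and the entry layout (name, then start..end title, then sequence ending in *) simply imitates figure 10.3.\par

```python
from itertools import product

BASES = "tcag"
# Standard genetic code, codons ordered t, c, a, g; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def standard_code():
    """Return the standard genetic code as a codon -> one letter code dict."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    return dict(zip(codons, AMINO_ACIDS))


def translate(dna, code):
    """Translate dna codon by codon, up to and including the first stop."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = code.get(dna[i:i + 3].lower(), "X")  # X = unknown codon
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


def fasta_entry(dna, start, end, code, width=60):
    """Format the translation of dna[start..end] (1-based, inclusive)."""
    protein = translate(dna[start - 1:end], code)
    lines = [">%d %d..%d" % (start, start, end)]
    lines += [protein[i:i + width] for i in range(0, len(protein), width)]
    return lines


code = standard_code()
# A user-defined amino acid and codon pair: read tga as W instead of stop.
custom = dict(code, tga="W")
entry = fasta_entry("ccatggcaaaatgattt", 3, 14, code)
for line in entry:
    print(line)
```

Passing `custom` instead of `code` changes the translation for every call that uses it, in the same way that "Set genetic code" is described as setting the code for all functions.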
+\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 11. Statistical and Structural Analysis of Protein Sequences\par +\pard\plain \s3\sb200\sa120\sl360 \b\f20\fs32 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Plotting hydrophobicity\par +2.2\tab Plotting charge\par +2.3\tab Plotting hydrophobic moment and hydrophobicity\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700\tx1980 2.4\tab Drawing helical wheels\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.5\tab Producing a Robson secondary structure prediction\par +2.6\tab Calculating the amino acid composition and molecular weight\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +4.\tab References\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we describe the use of routines for plotting hydrophobicity, charge and hydrophobic moments, drawing helical wheels and predicting second +ary structure. All of these routines are straightforward to use and are contained in the program PIP.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Plotting hydrophobicity\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method uses the values of Kyte and Doolittle (1).\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot hydrophobicity".\par +2.\tab Define "Window length".\par +3.\tab Define "Plot interval".\par +\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 11.1.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Plotting charge\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot charge".\par +2.\tab Define "Window length".\par +3. 
\tab Define "Plot interval".\par +\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear and will be similar to that shown in figure 11.1.\par +\pard\plain \sl220\keepn \f4\fs16 {{\pict\macpict\picw448\pich81 +0396ffffffff005001bf1101a0008201000affffffff005001bf0900000000000000003100000000004f01be9800240000000000350120000000000035011f00000000004f01be000102dd0006007fdfff00fc060040df000004060040df000004060040df000004060040df000004060078df000004060040df0000040600 +40df000004060040df000004060040df00000407017840e0000004070140b0e000000407014108e000000407014104e000000407014204e00000040b017a02fc000020e60000040c014202fd00010250e60000040f014402fd00010590e900031000000418014401fd00010490f800010380f8000020fe0003700000041c01 +4801fd00010808f800010480fd000010fd000060fe0003880000041d017801fd00010808f800010440fe00010428fd000090fe00038804000424074801000002000804fe00010110fd00010440fe00010a28fe000801108004010806000424075000800005001004fe000101a8fd00010840fe000d09440000020111800b01 +04090004240c500080000480100200800002a4fd00010840fe000d10c40020030a0a4009010709002425236000800004e0100201400002440020000210400004401002005004960c400882010881e42523780080000810100202300002430050000d1020001ba0200100900490004810820090412425234000800008102001 +02080002008088001120200020104000811004600037f0c20090222406007fdfff00fc2523400040002008e00104032c040022020020c0180080038000420808000020000c00600a14241440004000200500010400c204001201002080080080fe0002420410fd000404004004142113780030004005000084000104001401 +0040000401fd0002220410fb00024004141f13400008004005000098000088000c010040000401fd0002240410f900000c1f134000080040020000e00000900000010040000407fd00021c0220f90000041c05400009008002fc0008600000010e80000208fc000103e0f900000416044000068080fb000040fe0004918000 
+0210f2000004110378000441f6000490800003f0f20000040d0340000022f60000a0ee0000040d0340000022f6000060ee000004090340000014e2000004090340000008e2000004060078df000004060040df000004060040df000004060040df000004060040df000004060078df000004060040df000004060040df0000 +04060040df000004060040df00000406007fdfff00fc02dd00a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.1\tab A hydrophobicity plot using the values of Kyte and Doolittle.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Plotting hydrophobic moment and hydrophobicity\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method plots the hydrophobic moment and the hydrophobicity as defined by Eisenberg {\i et al} (2).\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot hydrophobic moment".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Angle". This is the angle between the residues when the helix is viewed end on. The default value of 100 degrees is that found in alpha helices.\par +3.\tab Define "Window length". The default of 18, if used in conjunction with the default "Angle", is equivalent to 5 turns of the helix.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Plot interval".\par +\pard\plain \s4\qj\sa120\sl280\tx560 \f20 +The plot will appear as in figure 11.2. with the hydrophobicity shown above the hydrophobic moment. The scale for the hydrophobicity runs from -1.0 to 1.5 and for the hydrophobic moment from 0.0 to 1.5. 
The program plots the mean values for each window pos +ition with the value at position x representing the segment from x-window length+1 to x.\par +\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw447\pich160 +0659ffffffff009f01be1101a0008201000affffffff009f01be0900000000000000003100000000009e01bd9800240000000000670120000000000067011f00000000009e01bd000102dd0006007fdfff00fc060040df000004060070df000004060040df000004060070df000004060040df000004060070df0000040600 +40df000004060070df000004060040df00000406007edf000004060040df000004060070df000004060040df000004060070df000004060040df000004060078df000004060074df0000040a0072fc000008e50000040a0061fc000038e50000040e007ffc000044f2000020f500000413016080fd000084f2000050f90000 +08fe00000414017080fe00010104f2000048f9000016fe00000419016040fe00010202f2000088fe000001fe000502110008000419017030fe00010401f2000084fe000902818000052088140004200c40080001000401e00800000380f900012102fe00090242600004e096120004240c70080002800800101400000c40fd +000008fe000e320380000004741000040061210004240c40080004800800082200003040fd000034fe000e4a004000001c0c0c00080001210004250c70040004500800042200004020fe00131c440000e04c0020000020000b0e080000c08004250c400400086f10000241f8038010fe001323820001104000100000400000 +91100000808004241e7e02000800f00001800404001100001c4002000e0880000c400080000060e0fd0000041f0340020008fd0012800208000ef000244002001004800003b00080f90000041e0370010010fc000b023800000fc0228001001003fe0002080080f90000041c0340010010fc000101c0fe00053042800100a0 +fd00010801f8000004170370008020f70005084280008160fd00010401f8000004160340008040f700040481000041fc000107c6f8000004150370004440f700040301000021fb000028f8000004110340004a80f300001afb000010f80000041002700031f2000006fb000010f8000004060040df00000406007edf000004 
+060040df000004060070df000004060040df000004060070df000004060040df000004060070df000004060040df000004060070df000004060040df00000406007fdfff00fc060040df000004060040df000004060070df000004060040df000004060040df000004060040df000004060070df000004060040df00000406 +0040df000004060070df000004060040df000004060040df000004060070df000004060040df000004060040df000004060040df00000406007edf000004060040df000004060040df000004060070df000004060040df000004060040df000004060070df000004060040df000004060040df000004060040df0000040600 +70df0000040a0040f6000010eb0000040a0040f600002ceb0000040a0070f6000024eb0000040a0040f6000042eb0000040c0040f80002800082eb00000411007efe000040fd000303400082eb000004110040fe0000a0fd000304400101eb000004130040fe0000a0fd0005082002010001ed0000041e044000000110fd00 +070820040100030010f9000040fe000103e0fd0000042204700000011cfd000730100800c0048128fd000480000004c0fe00010410fd00000423044000000204fd000720081000200482c4fe00050380000006a0fe00012808fd00000424044000000402fe0008204008100020088404fe0005024000000920fe00015808fd +00000424047000001c02fe0008504004100020084404fe0005044000004920fe00018004fd00000425104000002002700000484004200010084802fe000f84400000a820040001000400e00000042523400000200190000048800420001010480200000144420000b0100a018a000403100080042523700001200109000084 +80044000087030020000024826000110105206740002241003000425234001c2c0000a8008848002800008801002000804482508071010b208000003d812040004251b40022440000a601703000280000500000104340428291408000d0108fe0004080d080004241b60041800000411a00200030000070000012a44043019 +240800030108fd0003088800041e077f88100000040a60f9000701d982880000c408fe000090fc00025000041b014050fd00000cf7000601037800008210fe000070fc00025000040e014030ed000103f0f8000220000406007fdfff00fc02dd00a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.2\tab A hydrophobic moment (below) and hydrophobicity plot. 
The hydrophobicity plot displays the mean va +lues on a scale of -1.5 to 1.0 and the hydrophobic moment on a scale of 0.0 to 1.5.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Drawing helical wheels\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method draws helical wheels for any segment of the sequence (3). In addition it displays the hydrophobic moment for the segment (2).\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Draw helix wheel".\par +2.\tab Define "Angle". The default angle of 100 degrees is that found in alpha helices.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Window length". The default of 18, if used in conjunction with the default "Angle", is equivalent to 5 turns of the helix.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Step +". To produce a display for a sequence position N residues from the current one, type N, and the display will appear in place of the previous one. The default value of N is 1, so by repeatedly hitting carriage return the user can step, residue by residue, thro +ugh the sequence.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The display for the current position in the sequence will appear as in figure 11.3 and the bell will ring. The program now allows the user to "step" through the sequence, displaying the helix wheel for each position. 
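\par
\pard\plain \s4\qj\sa120\sl280 \f20 The geometry behind the wheel can be sketched in Python: each residue is placed "Angle" degrees further round the wheel than the previous one, and the hydrophobic moment is the length of the vector sum of the per-residue hydrophobicities. This is a minimal sketch under stated assumptions, not the code of PIP: the function names are hypothetical, the Kyte and Doolittle values (1) stand in for the scale of Eisenberg et al (2), which is not reproduced here, and the total rather than the mean over the window is returned.\par

```python
import math

# Kyte and Doolittle hydropathy values, used here only as a stand-in scale.
KD = dict(A=1.8, R=-4.5, N=-3.5, D=-3.5, C=2.5, Q=-3.5, E=-3.5, G=-0.4,
          H=-3.2, I=4.5, L=3.8, K=-3.9, M=1.9, F=2.8, P=-1.6, S=-0.8,
          T=-0.7, W=-0.9, Y=-1.3, V=4.2)


def wheel_angles(window_length, angle=100.0):
    """Angular position in degrees of each residue when viewed end on."""
    return [(i * angle) % 360.0 for i in range(window_length)]


def hydrophobic_moment(segment, scale, angle=100.0):
    """Length of the vector sum of the hydrophobicities round the wheel."""
    delta = math.radians(angle)
    x = sum(scale[r] * math.cos(i * delta) for i, r in enumerate(segment))
    y = sum(scale[r] * math.sin(i * delta) for i, r in enumerate(segment))
    return math.hypot(x, y)


def mean_hydrophobicity(segment, scale):
    """Mean hydrophobicity of the residues in the segment."""
    return sum(scale[r] for r in segment) / len(segment)


# Hypothetical 18 residue window: 18 residues at 100 degrees is 5 full turns.
segment = "LKKLLKKLLKKLLKKLLK"
print(len(segment) * 100.0 / 360.0)
print(wheel_angles(5))
print(round(hydrophobic_moment(segment, KD), 2))
print(round(mean_hydrophobicity(segment, KD), 2))
```

Dividing the moment by the window length gives a per-residue mean, which is the form the plotting option of section 2.3 is described as using.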
+\par +\pard\plain \li900\ri960\sb500\sl220\keepn\box\brsp120\brdrth \f4\fs16 {{\pict\macpict\picw355\pich329 +0c64ffffffff014801621101a00082a0008c01000affffffff0148016209000000000000000031010f01050121011338a10096000c010000000200000000000000a1009a0008fffd00000004000001000a01100106011e01112c000c00150948656c76657469636103001504010d000c2e00040000010028011a01070144a0 +0097a0008da0008c01000affffffff0148016231012600ba013800c838a10096000c010000000200000000000000a1009a0008fffc00000004000001000a012700bb013500c628013100bc014ca00097a0008da0008c01000affffffff0148016231011d0087012e009538a10096000c010000000200000000000000a1009a +0008fffc00000004000001000a011d0088012b009328012700890146a00097a0008da0008c01000affffffff014801623100df004600f1005438a10096000c010000000200000000000000a1009a0008fffd00000004000001000a00e0004700ee00532800ea00480156a00097a0008da0008c01000affffffff0148016231 +0097003900a8004738a10096000c010000000200000000000000a1009a0008fffc00000004000001000a0097003a00a500452800a1003b0159a00097a0008da0008c01000affffffff0148016231006b004d007c005b38a10096000c010000000200000000000000a1009a0008fffc00000004000001000a006b004e007900 +59280075004f014ca00097a0008da0008c01000affffffff01480162310032008a0044009838a10096000c010000000200000000000000a1009a0008fffc00000004000001000a0033008b0041009628003d008c014ba00097a0008da0008c01000affffffff0148016231002b00ba003d00c838a10096000c010000000200 +000000000000a1009a0008fffd00000004000001000a002c00bb003a00c628003600bc0144a00097a0008da0008c01000affffffff0148016231003300f1004500ff38a10096000c010000000200000000000000a1009a0008fffd00000004000001000a003400f2004200fd2b37080148a00097a0008da0008c01000affff +ffff0148016231005101190063012738a10096000c010000000200000000000000a1009a0008fffd00000004000001000a0052011a006001252b281e0145a00097a0008da0008c01000affffffff014801623100b9014400cb015238a10096000c010000000200000000000000a1009a0008fffc00000004000001000a00b9 
+014500c701512b2b67014ba00097a0008da0008c01000affffffff01480162310098014400aa015238a10096000c010000000200000000000000a1009a0008fffc00000004000001000a0099014500a701512800a30146014ba00097a0008da0008c01000affffffff0148016231003e00ba004f00c838a10096000c010000 +000200000000000000a1009a0008fffc00000004000001000a003f00bb004d00c728004900bc0131a00097a0008da0008c01000affffffff014801623100b9013100ca013f38a10096000c010000000200000000000000a1009a0008fffd00000005000001000a00ba013200c8013e2b777b0132a00097a0008da0008c0100 +0affffffff014801623101080090011a009e38a10096000c010000000200000000000000a1009a0008fffc00000004000001000a010900910117009c28011300920133a00097a0008da0008c01000affffffff01480162310075005b0087006938a10096000c010000000200000000000000a1009a0008fffd000000050000 +01000a0076005c00840068280080005d0134a00097a0008da0008c01000affffffff0148016231005c0109006e011738a10096000c010000000200000000000000a1009a0008fffc00000005000001000a005d010a006b0116280067010b0135a00097a0008da0008c01000affffffff014801623100f900fe010b010c38a1 +0096000c010000000200000000000000a1009a0008fffd00000004000001000a00fa00ff0108010b28010401000136a00097a0008da0008c01000affffffff014801623100d5005700e7006538a10096000c010000000200000000000000a1009a0008fffd00000004000001000a00d6005800e400632800e000590137a000 +97a0008da0008c01000affffffff014801623100480093005a00a138a10096000c010000000200000000000000a1009a0008fffc00000005000001000a00490094005700a028005300950138a00097a0008da0008c01000affffffff01480162310098013200a9014038a10096000c010000000200000000000000a1009a00 +08fffc00000004000001000a0099013300a7013e2b9f500139a00097a0008da0008c01000affffffff0148016231010f00b7011c00d038a10096000c010000000200000000000000a1009a0008fffd00000009000001000a011000b8011e00cd28011a00b9023130a00097a0008da0008c01000affffffff01480162310097 
+004a00a6006338a10096000c010000000200000000000000a1009a0008fffd00000009000001000a0098004b00a600602800a2004c023131a00097a0008da0008c01000affffffff0148016231004600e3005700f838a10096000c010000000200000000000000a1009a0008fffc00000008000001000a004700e4005500f6 +28005100e5023132a00097a0008da0008c01000affffffff014801623100e2011700f3012c38a10096000c010000000200000000000000a1009a0008fffc00000007000001000a00e3011800f101292b349c023133a00097a0008da10096000c010000000200000000000000a1009a0008fffd0000003a000001000a000000 +00000e007728000a00010d444b464c4544564b4b4c594853a00097a10096000c010000000200000000000000a1009a0008000400000007000001000a00180002003400132b0218044d20200d2a0e0148a00097a10096000c030000000200000000000000a1009a0008000b00000004000001000a0018000d00420031280022 +001a05372e38310d2800300016062d322e39370d2b070e03313532a00097a0008c01000affffffff0148016231003300890045009738a10096000c010000000200000000000000a1009a0008fffd00000005000001000a0034008a00420096296e014ba00097a0008da0008c01000affffffff014801623100f30123010401 +3138a10096000c010000000200000000000000a1009a0008fffd00000005000001000a00f40124010201302b9ac00153a00097a0008d01000affffffff0148016207000000002200bc01210000a000a0a100a4000209fd01000a0000000000000000070001000109ffffffffffffffff22005900bf62632300002100fc009f +23000023cc8723000021006d00fe2300002100ee00fe2300002100d8006b23000023338723000021009f0120230000239f6323000023a29c23000021005f00e2230000233278230000a000a301000affffffff0148016222005900bf62632100fc009f23cc8721006d00fe2100ee00fe2100d8006b23338721009f0120239f +6323a29c21005f00e2233278a000a1a10096000c030000000200000000000000a1009a0008fffc00000003000001000a002000f9003101020d000e28002c00fa012ba00097a10096000c030000000200000000000000a1009a0008fffc00000003000001000a002100820032008b28002d0083012ba00097a10096000c0300 
+00000200000000000000a1009a0008fffc00000003000001000a0096015800a701612bd675012ba00097a10096000c030000000200000000000000a1009a0008fffc00000003000001000a00b7015700c801602800c30158012ba00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a00 +4401250055012f280050012a012da00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a001900b7002a00c128002500bc012da00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a011d0107012e0111280129010c012da00097a10096000c030000 +000200000000000000a1009a0008fffc0000ffff000001000a013600b6014700c028014200bc012ea00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a012a007c013b00862801360082012ea00097a10096000c030000000200000000000000a1009a0008fffc0000fffe000001000a +00e4003100f5003b2800f00037012ea00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a0092002400a3002e28009e002a012ea00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a005a003e006b00472800660043012ea00097a00083ff}} +\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 11.3\tab A typica +l helix wheel display using a window of only 13 residues. The display includes a schematic of the helix showing the links between residues, with each vertex numbered according to position; the residue type at each vertex; a symbol denoting a classification + as hydrophobic (.), positively charged (+), negatively charged (-), or otherwise (). The residue number of the first sequence element in the current window is displayed at the top left corner along with the sequence. Below this is the total hydrophobicity + and hydrophobic moment according to Eisenberg {\i et al }(2).\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Producing a Robson secondary structure prediction\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method uses the method of Garnier {\i et al} (4) to predict the positions of alpha helices, beta sheets, turns and random coil. 
The results can be either plotted or listed.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Robson secondary structure prediction".\par +\pard \s7\qj\fi-560\li560\ri-100\sa120\sl280\tx560 \page 2.\tab Accept "Plot results". The alternative produces a listing like that shown in figure 11.4.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 11.5, and the program also prints a count of the number of positions at which each of the 4 structure types is the highest scoring.\par +\pard\plain \li1500\ri1460\sb200\sl220\box\brsp100\brdrth \tqr\tx3220\tqr\tx4700\tqr\tx6140\tqr\tx7420 \f4\fs16 350 P\tab 274\tab -178\tab -84\tab -77\par +\pard \li1500\ri1460\sl220\box\brsp100\brdrth \tqr\tx3220\tqr\tx4700\tqr\tx6140\tqr\tx7420 351 L\tab 16\tab -192\tab -21\tab -38\par +352 K\tab 371\tab -223\tab -75\tab -68\par +353 L\tab 365\tab -152\tab -101\tab -65\par +354 S\tab 331\tab -82\tab -84\tab -63\par +355 K\tab 311\tab -43\tab -110\tab -88\par +356 A\tab 280\tab -23\tab -110\tab -80\par +357 V\tab 234\tab -12\tab -135\tab -75\par +358 H\tab 177\tab -10\tab -143\tab -92\par +359 K\tab 153\tab 2\tab -180\tab -138\par +360 A\tab 158\tab 52\tab -175\tab -130\par +361 V\tab 144\tab 78\tab -187\tab -115\par +362 L\tab 132\tab 58\tab -186\tab -80\par +363 T\tab 124\tab 63\tab -142\tab -78\par +364 I\tab 144\tab 32\tab -111\tab -43\par +365 D\tab 120\tab -49\tab -29\tab 5\par +366 E\tab 103\tab -80\tab 13\tab 43\par +367 K\tab 111\tab -113\tab 23\tab 42\par +368 G\tab 132\tab -127\tab -13\tab 64\par +369 T\tab 172\tab -132\tab -42\tab 52\par +\pard \li1500\ri1460\sl220\keepn\box\brsp100\brdrth \tqr\tx3220\tqr\tx4700\tqr\tx6140\tqr\tx7420 370 E\tab 216\tab -170\tab -122\tab -4{\b \par +}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa200\sl240\tx1140 \f21\fs20 Figure 11.4\tab A listing of the Robson secondary structure prediction. 
It includes the sequence position, the residue type and the values for the four structure classes.\par +\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw446\pich256 +0d0fffffffff00ff01bd1101a0008201000affffffff00ff01bd090000000000000000310000000000fe01bc9800240000000000a601200000000000a6011f0000000000fe01bc000102dd0006007fdfff00fc060040df000004060040df000004060040df000004060041df00000407014280e0000004060042df0000040b +0042fd00010140e500000410014380fe000101a0fd0000c0ea000004110640000008800120fe000101a0ea000004160c40000019900124000010022380fc00000cf1000004200c40000066e80216000070022240fc000012fe00014001fc000060fd00010404240e400000a68804190000908214400006fe000612200000a0 +0180fe00010190fd00010c04252340001100080409020111421820000b02800012500000a181900000040f10001006001204252340003f0008040903c20d3c0020001103c00022900000a1827900038e080800380a90120425234061410004340104a205200010021084200022910000a272279ea495300800440ba8218425 +234292e03b1c5f070a12f300e0d0059e9830002751860122f20493de5161f800440bb82d042508429400000648010814fe00171004a0b814002009b621221400600860c008008508a44104250843140000018000880cfe000d0f0460c01ad9200a4952220c0020fe000604808590424004230842140000018000880cfe000b +0d046080032ac006014a2204fc000605e883904380041f014018fc000050fc000084fe000604400401861c04fc0006051500500200041e014008fc000030fc000048fe0005040000018018fb0006060500600000041a0040fb000020fc000070fb0002010018fb0006020600200000040e0040f5000050f2000002fc000004 +0a0040f5000040ec000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df00000406007fdfff00fc060040df000004060040df000004060040df00000407014380e0000004060041df000004060041df0000040a0041f7000020ea000004100041fe0002 +8000c0fd000060ea000004190640000008e00120fe00010161fb000008fd000001f60000041b064000000da00120fe000202e280fc000008fe00018001f60000041c0640600053200110fe000202e280fc00000cfe0002c00180f70000042306405010a3100212fe000202a280fe0002c00014fe0004c001800002fc000304 
+00100425064090110310020efe001904948000000140002400000121018000c200018000000420120425234090190210040e0000c0049c8000000220002480000121814000a38c014000000c202e8425234090190210140d020330280880000c222000248100022281430123920f6000000ab02d04252340891d02182c0503 +0234d80040001a54202025830022224124821291083000300ab44d04252342f766eef868e50484150323c0002a55d0303be2e033e34124c21279381000500efc560425234287e60018800504c4170000440722881049c39491322231242a1a01e00880580adc440425234306a00015000108241600003c05418810c943140b +4c1211343c0c00000980480a8a440425234106000016000108141400001218c1800d0902140780140d38240c000009a0841b0a8004252340040000060000881c080000111000000f0e02140780140608000c00000950852b0a8004252340040000040000880008000001900000070002080500180600000400000908872b0b +000420014004fc0002900008fe000050fb000f040018060000040000050481230300041b0040fb000060fc000050fb000304000802fc0006060700a30000041b0040fb000020fc000060fb000304000802fc0006020100e2000004140040f5000060f9000008fb0006020000220000040a0040e5000002fc0000040a0040e5 +000002fc000004060040df000004060040df000004060040df000004060040df00000406007fdfff00fc060040df000004060040df000004060040df0000040b014380f8000008ea0000040a0042f700000cea0000040e0043f700000cf1000008fb0000040e0042f700000af100000cfb0000040f014380f8000012f10000 +14fb0000041c014004fd000008fd000012fd000004fc00010204fd000014fb0000041e01410afd000014fd000012fd00000afc0001060cfe00010224fc000101041f014292fd000014fd000012fd00000bfc00010a0cfe0001c522fd000203010421014291fd000024fd000621200000020904fd00060a12000001a522fd00 +02028284240642910000a00024fd00182150000007110f000007000a12000002252200010000028204252342508000d006443804000820900000091109c0000900091200300218e2180380000282042523426043f0918582280c000e20900030089109200008801112002802180128048000028404252343bfc32912899a68 
+1200113e3000380e9f8a60001c802f1e002803f7ff2804802002858425234000440f0e480184122011201180c80890902000204e21120044040001281c406164840425234000440208500184225320c009410808a09010002052e0a110441c0001241040b2a44404252340002402085001044254a00009220810a060100040 +2380a12d841000012620210a34440425234000240000300002829c60000924041040600800802080c127022000014220210e08440425234000180000200002818000000614041000000881000080812002200001424011000848042402400018fd000c0300800000061c04a000000981fe000c01200240000082401b000028 +04200040fb000002fc00061004c000000a42fe000c012002400000014004000030041f0040fb000002fb0005028000000a64fe00070120014000000180fe000130041b0040f40005038000000614fe000301200180fe000080fe00011004160040f4000002fe00010614fe000301400080f9000004110040ef000008fe0003 +01400080f90000040a0040ea0000c0f70000040a0040ea000080f7000004060040df000004060040df000004060040df00000406007fdfff00fc0a0040e40000c0fd0000040a0040e40000a0fd00000413014280f4000080fe000006f7000090fd00000417014280f600021000c0fe000009f80006011001c00000041c0143 +80f600026800a0fe00010880fe000001fd00060110032000000421014280fa000002fe0002480110fe00011080fe00010280fe00060108042000000421014280fa000005fe0002c40110fe00011040fe00010640fe000602080420000004250f4000060001c000380405300001040110fe00011040fe000b3420e000000204 +0820008004250f40000500023000280a04a80003020210fe00101040080000482090000004060820018004252340000900021000440a0868000c0102080038001020140000c0211000000405901002400425234000110002080082120864001401020801e4002021e21c0100111000000801d0100240042523400011000404 +00826108040010008208030300202222240100121000000800201004200425234000108004040103a09002001000e208040381c012022201000a080000080000100420042523400020600404010000900200200024078c00418014022202000c08010008000008c82104252341c06020080401000060020020001804f00022 
001c0243e200000801c008000008a81704252342a0ff100ffe01ffff0ffe105fffebf811fda3bff3ff789ffffff802201ffffffdeff5042506401b000e880202fe0002012840fe000610003200000180fe00090b06201000000518148423064004000bc80202fe0002012480fc00001cfe000080fe00090a84383000000300 +0c841e0640040002480104fd0001a680fc000008fa00090a8c2ce00000020000041e0640000002280104fd0001e380fc000008fa0002045005fe000302000004170040fe0002100104fd00018080f40002042003fb000004140040fe0002100104fd000080f1000003fb0000040a0040fc0000fce50000040a0040fc000020 +e5000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df00000406007fdfff00fc060040df000004060043df00000407014280e000000407014280e000000407014280e000000407014380e000000425134000203b1807070200f000e0c0000e00300006 +40fe000cf000000e1001f800000b980c04060040df000004060040df000004060040df000004060040df000004060040df0000042406407000eee000e0fe0011032380000001c00038800001e10100000208fd000304201204060040df000004060040df000004060040df000004060040df000004060040df000004060040 +df0000042202439f80fe000018fd00111e200010061f0260000c000e1e000001f7fefc00010184060040df000004060040df000004060040df000004060040df000004060040df000004252340005f0007fc01ffff0ffc001fffe3f801fd81bff3fe3801fffff800000ffffff8cfe004060040df000004060040df00000406 +0040df000004060040df000004060040df000004060040df00000406007fdfff00fc02dd00a00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.5\tab A secondary structure plot using the method of Robson. The likelihoods that each 17 residue segment of the sequence forms one of the four structure classes\: + helix (H), extended (E) normally termed sheet, turn (T) and coil (C), are each plotted out across the screen in four strips. Below this +is a "decision" strip (D) in which a single dot is plotted for the highest scoring structure class at each point. 
Here we see a sequence that is predicted to be predominantly helical.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Calculating the composition and molecular weight of a sequence.\par +\pard\plain \s4\qj\sa120\sl280 \f20 +Select "Count amino acid composition". The composition and molecular weight are displayed as in figure 11.6. Each column contains the one letter code for the amino acid, the number of occurrences of that amino acid in the sequence, that number expressed + as a percentage, and its contribution to the molecular weight.\par +\pard\plain \li220\ri280\sb200\sl220\box\brsp100\brdrth \f4\fs16 Sequence composition\par +\pard \li220\ri280\sl220\box\brsp100\brdrth A C S T P A G N D E Q B Z H\par +N 0. 14. 19. 12. 30. 26. 3. 10. 11. 4. 0. 0. 0.\par +% 0.0 5.3 7.3 4.6 11.5 9.9 1.1 3.8 4.2 1.5 0.0 0.0 0.0\par +W 0. 1219. 1921. 1165. 2132. 1483. 342. 1151. 1420. 513. 0. 0. 0.\par +\par +A R K M I L V F Y W - X ? \par +N 7. 7. 10. 15. 39. 23. 13. 11. 8. 0. 0. 0. 0.\par +% 2.7 2.7 3.8 5.7 14.9 8.8 5.0 4.2 3.1 0.0 0.0 0.0 0.0\par +W 1093. 897. 1312. 1697. 4413. 2280. 1913. 1795. 1490. 0. 0. 0. 0.\par +\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth Total molecular weight= 28256.254\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.6\tab A typical molecular weight and composition display. For each residue type it shows the number of occurrences, the percentage and the contribution to the molecular weight.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab The methods described in the chapters on motif and pattern searching can also be used to search for specific + structures.
For example, a sequence can be searched for all the structures contained in the PROSITE motif library.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab It is often convenient to produce displays in which several of the plots described above appear together on the screen.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Kyte, J. and Doolittle, R.F. 1982. A simple method for displaying the hydropathic character of a protein. {\i J. Mol. Biol.} {\b 157}\:105-132.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. 1984. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. {\i J. Mol. Biol.} {\b 179}\:125-142.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Schiffer, M. and Edmundson, A.B. 1967. Use of helical wheels to represent the structures of proteins and to identify the segments with helical potential. {\i Biophys. J.} {\b 7}\:121-135.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Garnier, J., Osguthorpe, D.J., and Robson, B. 1978. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. {\i J. Mol. Biol.} {\b 120}\:97-120.\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 12.
Searching for Motifs in Protein Sequences\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Searching for exact matches.\par +2.2\tab Searching for percentage matches to consensus sequences\par +2.3\tab Searching for consensus sequences using a score matrix\par +2.4\tab Using weight matrices for searching protein sequences\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +4.\tab References\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The program PIP contains several ways of defining and searching for motifs (1,2). We describe searches for exact matches and percentage matches, the use of score matrices and the creation and use of weight matrices. All of the searches produce +both listed and graphical output.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Searching for exact matches.\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The routine for finding and displaying the positions of exact matches to sequences can display its results in various forms. It is equivalent to the restriction enzyme search routine in the nucleotide analysis programs. The sequences to be searched for ca +n be typed on the keyboard or read from files. The format of these files is given in the notes. Here we give only a single example of the use of the routine which shows how to produce a plot of the positions of all amino acid types in a sequence.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab +Select "Input source" as "All acids file". A number of standard files are available and users may also have their own. 
The one selected simply contains the one letter codes for all the standard amino acids.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Accept "Search for all names". The alternative allows users to select a subset of the entries in the file by name.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Order results name by name".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Reject "List matches". If results are listed the output gives the name and position of each match and also the separations between matches.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The results will then appear in the form shown in figure 12.1. \par +\pard\plain \li80\ri80\sl220\keepn\box\brsp40\brdrth \f4\fs16 {{\pict\macpict\picw441\pich182 +14a4ffffffff00b501b81101a0008201000affffffff00b501b8090000000000000000310000000000b201b798002a000000000083014400000000008301440000000000b201b7000102d70020f90002020080fd000a8000008000010204200201fb000620000200401004fe0020f90002020080fd000a8000008000010204 +200201fb000620000200401004fe00220050fa0002020080fd000a8000008000010204200201fb000620000200401004fe00220050fa0002020080fd000a8000008000010204200201fb000620000200401004fe00220070fa0002020080fd000a8000008000010204200201fb000620000200401004fe00220020fa000202 +0080fd000a8000008000010204200201fb000620000200401004fe00220020fa0002020080fd000a8000008000010204200201fb000620000200401004fe0009fd000007deff01c00005d900014000070050da00014000070050da00014000070050da00014000070070da000140000b0070fe000007deff01c00025fb0018 +c340000e04308008944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e04308008944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e04308008944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e043080 
+08944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e04308008944000004080c01000400404880128004afe000340481008fd000b0020fe000007deff01c00026fc00018004fe000a0100400001000a00020120fc0008224412200041000820fe000308010000280070fd00018004fe +000a0100400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100 +400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100400001000a00020120fc0008224412200041000820fe00010801ff0009fd000007deff01c00028fd0005014041802010fe00fe101900040500180001080080010084001804028000500500000480002a0020fe000501 +4041802010fe00fe101900040500180001080080010084001804028000500500000480002a0050fe0005014041802010fe00fe101900040500180001080080010084001804028000500500000480002a0060fe0005014041802010fe00fe101900040500180001080080010084001804028000500500000480002a0010fe00 +05014041802010fe00fe101900040500180001080080010084001804028000500500000480000b0070fe000007deff01c00026fc0014040010042000890000400310080040004180112058fe0008010401100800008004fd00280060fd0014040010042000890000400310080040004180112058fe00080104011008000080 +04fd00280050fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd00280070fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd00280050fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd0028 +0050fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd0009fd000007deff01c00027fd0004040a000080fc00092a010808100001000090fe000e040100008021048630000050008000290020fe0004040a000080fc00092a010808100001000090fe000e04010000802104863000 
+0050008000290050fe0004040a000080fc00092a010808100001000090fe000e040100008021048630000050008000290050fe0004040a000080fc00092a010808100001000090fe000e040100008021048630000050008000290050fe0004040a000080fc00092a010808100001000090fe000e0401000080210486300000 +500080000b0070fe000007deff01c000230020fa00070800801009010408fc000920000090000020200120fe000301000001fd00230060fa00070800801009010408fc000920000090000020200120fe000301000001fd00230050fa00070800801009010408fc000920000090000020200120fe000301000001fd00230070 +fa00070800801009010408fc000920000090000020200120fe000301000001fd00230040fa00070800801009010408fc000920000090000020200120fe000301000001fd00230040fa00070800801009010408fc000920000090000020200120fe000301000001fd0009fd000007deff01c00021fd00080100880004800000 +40fd0002100101fc0005020000101440fa000022fe00230050fe0008010088000480000040fd0002100101fc0005020000101440fa000022fe00230070fe0008010088000480000040fd0002100101fc0005020000101440fa000022fe00230070fe0008010088000480000040fd0002100101fc0005020000101440fa0000 +22fe00230050fe0008010088000480000040fd0002100101fc0005020000101440fa000022fe000b0050fe000007deff01c0001ffd000604000001400108fd000010fc000028f8000020fe000380000080fe00210050fe000604000001400108fd000010fc000028f8000020fe000380000080fe00210070fe000604000001 +400108fd000010fc000028f8000020fe000380000080fe00210050fe000604000001400108fd000010fc000028f8000020fe000380000080fe00210050fe000604000001400108fd000010fc000028f8000020fe000380000080fe000b0050fe000007deff01c00029fd00230220000410c020462080000081000024028812 +06016000a0005000084842100c48208028ff0029fd00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800002b0040fe00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800002b0040fe00250220000410c020462080000081000024 
+02881206016000a0005000084842100c4820802800002b0040fe00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800002b0040fe00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800000b0070fe000007deff01c00026fc000008 +fd000c06000200008004010840000001fe000016fd000a5800044c00040000620000280050fd000008fd000c06000200008004010840000001fe000016fd000a5800044c00040000620000280060fd000008fd000c06000200008004010840000001fe000016fd000a5800044c00040000620000280060fd000008fd000c06 +000200008004010840000001fe000016fd000a5800044c00040000620000280060fd000008fd000c06000200008004010840000001fe000016fd000a5800044c000400006200000b0070fe000007deff01c00027fc0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd +0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd00 +12540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd0012540430210000800802800860b2a20100001808fe0004100a821022fd000302002090ff0009fd000007deff01c0001bfb00011008fc000040fc000008fd000001f9000001fe000002fd001d0050fc00011008fc00 +0040fc000008fd000001f9000001fe000002fd001d0050fc00011008fc000040fc000008fd000001f9000001fe000002fd001d0070fc00011008fc000040fc000008fd000001f9000001fe000002fd001d0050fc00011008fc000040fc000008fd000001f9000001fe000002fd000b0050fe000007deff01c00027fb002304 +488809088d15210106240210080004400048001502010223060000800082000c500000290020fc002304488809088d15210106240210080004400048001502010223060000800082000c500000290050fc002304488809088d15210106240210080004400048001502010223060000800082000c500000290070fc00230448 
+8809088d15210106240210080004400048001502010223060000800082000c500000290050fc002304488809088d15210106240210080004400048001502010223060000800082000c500000290070fc002104488809088d15210106240210080004400048001502010223060000800082000c50ff0009fd000007deff01c0 +0020fc000001fa000020fa0014010000120020004048000003000004010000020000220070fd000001fa000020fa0014010000120020004048000003000004010000020000220040fd000001fa000020fa0014010000120020004048000003000004010000020000220060fd000001fa000020fa0014010000120020004048 +000003000004010000020000220040fd000001fa000020fa00140100001200200040480000030000040100000200000b0040fe000007deff01c00028fc0024a02800404010200008400080000010080240002021880800200000100020010880418000002a0070fd0024a02800404010200008400080000010080240002021 +880800200000100020010880418000002a0040fd0024a02800404010200008400080000010080240002021880800200000100020010880418000002a0060fd0024a02800404010200008400080000010080240002021880800200000100020010880418000002a0040fd0024a0280040401020000840008000001008024000 +2021880800200000100020010880418000000b0070fe000007deff01c00024fa000a820010400004c008201044fd0006a6000400000102fd000662000020040204ff0024fa000a820010400004c008201044fd0006a6000400000102fd0008620000200402040000260060fb000a820010400004c008201044fd0006a60004 +00000102fd0008620000200402040000260050fb000a820010400004c008201044fd0006a6000400000102fd0008620000200402040000260050fb000a820010400004c008201044fd0006a6000400000102fd0008620000200402040000260050fb000a820010400004c008201044fd0006a6000400000102fd0008620000 +2004020400000b0070fe000007deff01c0000dfa000301000002fb000001e9000f0020fb000301000002fb000001e9000f0050fb000301000002fb000001e9000f0040fb000301000002fb000001e9000f0040fb000301000002fb000001e9000b0070fe000007deff01c00028fc0024022004030450001016004001c02806 
+369020a0101a404280048180c49001222052000100002a0020fd0024022004030450001016004001c02806369020a0101a404280048180c49001222052000100002a0050fd0024022004030450001016004001c02806369020a0101a404280048180c49001222052000100002a0070fd0024022004030450001016004001c0 +2806369020a0101a404280048180c49001222052000100002a0050fd0024022004030450001016004001c02806369020a0101a404280048180c49001222052000100000b0050fe000007deff01c00002d700a0008c310002000100b5001038a10096000c010000000200000000000000a1009a0008fffd0000000300000100 +0a00050002000f000a2c000c00150948656c76657469636103001504010d00082e0004000001002b030c0159a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a000e00020018000a2a090157a00097a10096000c010000000200000000000000a1009a0008fffd0000000300000100 +0a00150002001f000a2a070156a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a001f00020029000a2a0a0154a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a002700020031000a2a080153a00097a10096000c01000000020000000000 +0000a1009a0008fffe00000003000001000a00300002003a000a2a090152a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a003900020043000a2a090151a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00420002004c000a2a090150a0 +0097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a004a00020054000a2a08014da00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a00530002005d000a2a090148a00097a10096000c010000000200000000000000a1009a0008fffe0000000300 +0001000a005c00020066000a2a09014ca00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00640002006e000a2a08014ba00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a006e00020078000a2a0a0149a00097a10096000c01000000020000 
+0000000000a1009a0008fffe00000003000001000a007600020080000a2a080148a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00800002008a000a2a0a0147a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a008800020092000a2a08 +0146a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00900002009a000a2a080145a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a0099000200a3000a2a090144a00097a10096000c010000000200000000000000a1009a0008fffe0000 +0003000001000a00a2000200ac000a2a090143a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00aa000200b4000a2a080141a00097a0008da00083ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb40\sa120\sl240\tx1140 \f21\fs20 Figure 12.1\tab Typical graphical output from "Search for exact matches" in which the position of each matching string (here individual amino acid types) is marked.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Searching for percentage matches to sequences\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find percentage matches".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par +3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par +4.\tab Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Percent match". The search is performed, the results are presented graphically, the number of matches displayed, and the scores and positions of the top 10 matches displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define the number of matches to "Display". 
For the number of matches chose +n the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round to step 3.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Searching for sequences using a score matrix\par +\pard\plain \s4\qj\sa120\sl280 \f20 +A score matrix gives a score for the alignment of each possible pair of sequence symbols. This method is more sensitive than the simple percentage match search. The default matrix MDM78 used by this program is shown in figure 12.2.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find matches using a score matrix".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par +3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab +Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default. The program displays the minimum and maximum possible scores for the string.\par +5.\tab Define "Score". The search is performed, the results are presented graphically, the number of matches displayed, and the scores and positions of the top 10 matches displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab +Define the number of matches to "Display". For the number of matches chosen the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round + to step 3. An example run is shown in figure 12.3.\par +\pard\plain \li220\ri280\sb200\sl220\box\brsp100\brdrth \f4\fs16 C S T P A G N D E Q B Z H R K M I L V F Y W - X ? 
\par +\pard \li220\ri280\sl220\box\brsp100\brdrth C 22 10 8 7 8 7 6 5 5 5 5 5 7 6 5 5 8 4 8 6 10 2 10 10 10 10\par +S 10 12 11 11 11 11 11 10 10 9 10 10 9 10 10 8 9 7 9 7 7 8 10 10 10 10\par +T 8 11 13 10 11 10 10 10 10 9 10 10 9 9 10 9 10 8 10 7 7 5 10 10 10 10\par +P 7 11 10 16 11 9 9 9 9 10 9 10 10 10 9 8 8 7 9 5 5 4 10 10 10 10\par +A 8 11 11 11 12 11 10 10 10 10 10 10 9 8 9 9 9 8 10 6 7 4 10 10 10 10\par +G 7 11 10 9 11 15 10 11 10 9 10 10 8 7 8 7 7 6 9 5 5 3 10 10 10 10\par +N 6 11 10 9 10 10 12 12 11 11 12 11 12 10 11 8 8 7 8 6 8 6 10 10 10 10\par +D 5 10 10 9 10 11 12 14 13 12 13 12 11 9 10 7 8 6 8 4 6 3 10 10 10 10\par +E 5 10 10 9 10 10 11 13 14 12 12 13 11 9 10 8 8 7 8 5 6 3 10 10 10 10\par +Q 5 9 9 10 10 9 11 12 12 14 11 13 13 11 11 9 8 8 8 5 6 5 10 10 10 10\par +B 5 10 10 9 10 10 12 13 12 11 13 11 11 10 10 8 8 6 8 5 7 4 10 10 10 10\par +Z 5 10 10 10 10 10 11 12 13 13 11 14 12 10 10 8 8 8 8 5 6 4 10 10 10 10\par +H 7 9 9 10 9 8 12 11 11 13 11 12 16 12 10 8 8 8 8 8 10 7 10 10 10 10\par +R 6 10 9 10 8 7 10 9 9 11 10 10 12 16 13 10 8 7 8 6 6 12 10 10 10 10\par +K 5 10 10 9 9 8 11 10 10 11 10 10 10 13 15 10 8 7 8 5 6 7 10 10 10 10\par +M 5 8 9 8 9 7 8 7 8 9 8 8 8 10 10 16 12 14 12 10 8 6 10 10 10 10\par +I 8 9 10 8 9 7 8 8 8 8 8 8 8 8 8 12 15 12 14 11 9 5 10 10 10 10\par +L 4 7 8 7 8 6 7 6 7 8 6 8 8 7 7 14 12 16 12 12 9 8 10 10 10 10\par +V 8 9 10 9 10 9 8 8 8 8 8 8 8 8 8 12 14 12 14 9 8 4 10 10 10 10\par +F 6 7 7 5 6 5 6 4 5 5 5 5 8 6 5 10 11 12 9 19 17 10 10 10 10 10\par +Y 10 7 7 5 7 5 8 6 6 6 7 6 10 6 6 8 9 9 8 17 20 10 10 10 10 10\par +W 2 8 5 4 4 3 6 3 3 5 4 4 7 12 7 6 5 8 4 10 10 27 10 10 10 10\par +- 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par +X 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par +? 
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par +\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa300\sl240\tx1140 \f21\fs20 Figure 12.2\tab The amino acid score matrix MDM78.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Using weight matrices for searching protein sequences\par +\pard\plain \s4\qj\sa120\sl280 \f20 +A weight matrix is the most sensitive way of defining a motif. It is a table of values that gives scores for each amino acid type in each position along a motif. For a motif of length 8 amino acids the weight matrix would be a table 8 positions long and, allowing + for 26 amino acid symbols, 26 deep. The simplest way of choosing the values for the table is to take an alignment of all known +examples of the motif and to count the frequency of occurrence of each amino acid type at each position. These frequencies can be used as the table of weights. When the table is used to search a new sequence the program calculates a score for each position + along the sequence by adding or multiplying (see note 6) the relevant values in the table. All positions that exceed some cutoff score are reported as matching the original set of motifs.\par +\pard \s4\qj\sa120\sl280 How can we select a suitable cutoff score? The simplest way is to apply + the weight matrix to all the known occurrences of the motif - i.e. the set of sequence segments used to create the table - and to see what scores they achieve. The cutoff can be selected accordingly. For convenience the weight matrix is stored as a file + along with its cutoff score, a title that is displayed when the file is read, and a few other values needed by the program. A routine for creating weight matrix files from sets of aligned sequences is included in the program.
When a search using the weight + matrix is performed the program will either list the matching sequence segments or plot their positions as for the other motif search methods.\par +\pard\plain \li2000\ri2260\sb200\sl220\box\brsp100\brdrth \f4\fs16 Find matches using a score matrix\par +\pard \li2000\ri2260\sl220\box\brsp100\brdrth ? Keep picture (y/n) (y) =\par + ? String=ALPHA\par +Minimum score= 23 Maximum score= 72\par +? Score (23-72) (72) =60\par +\par +For score 60 the number of matches= 5\par +Scores 62 62 62 61 61\par +Positions 120 217 420 54 326\par +? Display (0-5) (0) =\par +\par + 120\par + PLDHD\par + * *\par + ALPHA\par + 1\par +\par + 217\par + ALANT\par + **\par + ALPHA\par + 1\par +\par + 420\par + QLDHG\par + * *\par + ALPHA\par + 1\par +\par + 54\par + SLPGN\par + **\par + ALPHA\par + 1\par +\par + 326\par + ALPII\par + ***\par + ALPHA\par + 1\par +? Keep picture (y/n) (y) =\par + Default String=ALPHA\par +\pard \li2000\ri2260\sl220\keepn\box\brsp100\brdrth ? String=!\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa420\sl240\tx1140 \f21\fs20 Figure 12.3\tab An example of the listed output from "Search using a score matrix".\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.4.1\tab Creating a weight matrix file from a set of aligned sequences\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par +2.\tab Select "Make weight matrix".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +Define "Name of aligned sequences file". We assume the file of aligned sequences has already been created (see note 5). The program reads and displays the contents of the file numbering each sequence as it goes. Then it displays the length of the longes +t sequence.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Sum logs of weights". The alternative is to sum the weights when calculating scores (see note 6). 
\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Accept "Use all motif positions". The alternative allows the user to define a "mask" which identifies + positions within the motif that should be ignored when the matrix is created (see note 7). The program now calculates the weights and applies them in turn to each of the sequences in the file. The number and score for each sequence are displayed, +followed by the top, bottom and mean scores and the standard deviation. In addition the mean plus and minus 3 standard deviations are displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Cutoff score". The default is the mean minus 3 standard deviations, but users may, for example, decide to use the lowest score obtained by the sequences in the file.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Top score for scaling plots". This parameter is used by the graphics output routine when scaling the plots. Its value will influence the height of lines plotted to represent matches.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab +Define "Position to identify". When a search is performed it is not always appropriate to report the position of a match relative to the leftmost amino acid in the motif. For example, when performing a helix-turn-helix motif search we may want to know + the position of the well conserved glycine rather than the position of the first amino acid in the matrix. The "Position to identify" allows the user to define which amino acid is marked. The amino acids in the table are numbered 1, 2, 3 and so on.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define a "Title". This is a title that will be displayed when the matrix file is read prior to performing a search. It is limited to 60 characters.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Name for new weight matrix file".
Give a name for the weight matrix file.\par +\pard\plain \s4\qj\sa120\sl280 \f20 See the example run in figure 12.4.\par +\pard\plain \li1240\ri1180\sb300\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Motif search using weight matrix\par +\pard \li1240\ri1180\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select operation\par +X 1 Use weight matrix\par + 2 Make weight matrix\par + 3 Rescale weight matrix\par +? Selection (1-3) (1) =2\par +? Name of aligned sequences file=atpbinding.seq\par + 1 GETLGIVGESGSG\par + 2 GESLGVVGESGGGKSTFAR OppF\par + 3 GDVISIDGSSGSGKSTFLR HisP\par + 4 GEFVVFVGPSGGGKSTLLR MalK E. coli\par + 5 NQVTAFIGPSGGGKSTLLR PstB\par + 6 GRVMALVGENGAGKSTMMK RbsA(N)\par + 7 GEVIGIVGRSGSGKSTLTK HlyB\par + 8 GECFGLLGPNGAGKSTITR NodI R. leguminosarum\par + 9 GEMAFLTGHSGAGKSTLLK FtsE E. coli\par + 10 GQRELIIGDRQTGKTALAI ATPase\par + 11 GGKVGLFGGAGVGKTVNMM ATPase\par + 12 GRIVEIYGPESSGKTTLTL RecA\par + 13 RSNLLVLAGAGSGKTRVLV UvrD\par + 14 GGKIGLFGGAGVGKTVGIM ATPase Bovine\par + 15 SKIIFVVGGPGSGKGTQCE Adenylate Kinase Rabbit\par + 16 NQSILITGESGAGKTVNTK Myosin Rabbit\par + 17 HVNVGTIGHVDHGKTTLTA EF-Tu E. coli\par + 18 YRNIGISAHIDAGKTTERI EF-G E. coli\par + 19 EYKLVVVGARGVGKSALTI v-ras (HARVEY)\par + 20 EYKLVVVGASGVGKSALTI v-ras (KIRSTEN)\par + 21 EYKLVVVGAVGVGKSALTI pEJ BLADDER CARCINOMA TRANSFORMING\par + 22 EYKLVVVGAGGVGKSALTI pEJ BLADDER CARCINOMA CELLULAR\par +Length of motif 19\par +? Sum logs of weights (y/n) (y) =\par + ? 
Use all motif positions (y/n) (y) =\par +Applying weights to input sequences\par + 1 -36.651 GETLGIVGESGSGKSQSLR\par + 2 -35.780 GESLGVVGESGGGKSTFAR\par + 3 -38.180 GDVISIDGSSGSGKSTFLR\par + 4 -35.403 GEFVVFVGPSGGGKSTLLR\par + 5 -39.039 NQVTAFIGPSGGGKSTLLR\par + 6 -40.653 GRVMALVGENGAGKSTMMK\par + 7 -34.017 GEVIGIVGRSGSGKSTLTK\par + 8 -37.454 GECFGLLGPNGAGKSTITR\par + 9 -36.474 GEMAFLTGHSGAGKSTLLK\par + 10 -43.431 GQRELIIGDRQTGKTALAI\par + 11 -40.210 GGKVGLFGGAGVGKTVNMM\par + 12 -40.720 GRIVEIYGPESSGKTTLTL\par + 13 -45.143 RSNLLVLAGAGSGKTRVLV\par + 14 -40.684 GGKIGLFGGAGVGKTVGIM\par + 15 -45.197 SKIIFVVGGPGSGKGTQCE\par + 16 -39.098 NQSILITGESGAGKTVNTK\par + 17 -43.832 HVNVGTIGHVDHGKTTLTA\par + 18 -44.817 YRNIGISAHIDAGKTTERI\par + 19 -36.305 EYKLVVVGARGVGKSALTI\par + 20 -35.101 EYKLVVVGASGVGKSALTI\par + 21 -36.305 EYKLVVVGAVGVGKSALTI\par + 22 -36.711 EYKLVVVGAGGVGKSALTI\par +Top score -34.017 Bottom score -45.197\par +Mean -39.146 Standard deviation 3.441\par +Mean minus 3.sd -49.470 Mean plus 3.sd -28.822\par +? Cutoff score (-999.00-9999.00) (-49.47) =\par +? Top score for scaling plots (-49.47-999.00) (-28.82) =\par +? Position to identify (0-19) (1) =13\par +? Title=ATP binding motif\par +\pard \li1240\ri1180\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth ? Name for new weight matrix file=atpbinding.wts\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa320\sl240\tx1140 \f21\fs20 Figure 12.4\tab An example run of the creation of a weight matrix from a set of aligned sequences.\par +\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.4.2\tab Searching using a weight matrix\par +\pard\plain \s4\qj\sa120\sl280 \f20 Once a weight matrix has been stored in a file it can be used to search any sequence. 
Results can be displayed graphically or the matching sequence segments can be listed out with their scores.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par +2.\tab Select "Use weight matrix".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Motif weight matrix file". Give the name of the file containing the weight matrix. The program reads the file and displays its title.\par +4.\tab Accept "Use frequencies as weights". The alternative will use the weight matrix file as a definition of a "Membership of set" motif (see note 10).\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab +Define "Cutoff score". The default will be the value set when the weight matrix file was created. If the score is negative the program will calculate sums of logs of frequencies, otherwise it will add frequencies.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Plot results". Alternatively they will be listed.\par +The results will appear.\par +\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \page 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab The files containing the definitions of peptides that can be searched for by the exact match search routine have the following format. Each name is followed by a /, then + each of its peptide sequences is followed by a /. The last peptide sequence for each name is followed by //. For example, a file might contain the following.\par +\pard \s7\qj\li1720\sb200\sa120\sl280\tx1880 Acidic/D/E//\par +\pard \s7\qj\li1720\sa120\sl280\tx1880 Basic/R/K/H//\par +Glyco/N-S/N-T//\par +\pard \s7\qj\fi-560\li560\sb200\sa120\sl280\tx560 \tab Users could then search for these named sets of sequences.
Note that the symbol - matches any amino acid.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab To search for a subset of the names in a file employed by exact match routine the user should reject "Search for all names" and the program will ask for the names wanted and extract their sequences +from the file. Alternatively, if a user was always using the same subset, then a file containing only those names could be created. This file would then be selected as "Personal file" for "Input source".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +The exact match routine also allows names and their sequences to be entered on the keyboard. This is selected as "Keyboard" for "Input source", and the program will prompt for names and their sequences. In this way the routine can be used to search for +exact matches to any short sequence. \par +4.\tab For this pr +ogram a motif is a short segment of sequence of fixed length. More complex structures termed "patterns" which we define as sets of motifs separated by varying gaps, are covered in another chapter. The current chapter should be read before the chapter on pa +tterns. \par +5.\tab The files of aligned sequences used to make weight matrices have the following format. Each sequence should be on a separate line. The sequence should start in column 2 and is terminated by a new line or a space. Anything after the space is tr +eated as a comment. The files can be created by previous searches or using an editor.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab +The frequencies in the weight matrix can be used in two ways to calculate scores for sequences. Some users prefer to add the frequencies to give a total score, and others to multiply them by summing their logs. If we regard the frequencies as probabilit +ies then multiplication seems the correct procedure. The user chooses which method will be used when the weight matrix is created, however the choice can be overridden wh +en the matrix is used. 
If multiplication is selected then all results will be presented as sums of logs.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab +Masking the weight matrix is particularly useful in cases where a limited number of examples of a motif are available, or when the motif may have several components. In the first case the limited number of examples may make the matrix unrepresentative o +f the motif because the amino acids in the unconserved positions may bias the results of searches. We stated that a motif might have several components\: +for example it might have both structural and specificity components. We may want to separate out the two parts and again masking provides such a facility.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab +The weight matrix handling routine contains a further option "Rescale weight matrix". If the user has edited a weight matrix to change the frequency values this provides a way of selecting a new cutoff score. It allows users to read in a set of aligned + sequences and a weight matrix and to apply the matrix to the set of sequences to see the range of scores achieved. A new weight matrix file containing the selected cutoff score is written to disk.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab +The program contains no hardwired motifs as we expect most sites that use the programs to accumulate their own libraries of motifs and patterns, and to use the PROSITE library, both of which users can employ by simply knowing the names of the correspond +ing files.\par +10.\tab The weight matrix search can also be used as a "Membership of a set" search. This means that at each position in the motif, any amino acid type tha +t is non-zero in the weight matrix is counted as a match and scores a value of 1. See the chapter on searching protein sequences for patterns.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1988.
Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4(1)}\:53-60.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 13. Using Patterns to Analyse Protein Sequences\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 1.1\tab Introduction to the PROSITE motif library\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Creating a pattern file containing a weight matrix motif and a membership of a set motif.\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.2\tab Searching a sequence using a pattern file\par +2.3\tab Comparing a sequence against a library of patterns including PROSITE\par +2.4\tab Searching libraries for patterns\par +2.5\tab Preparing the PROSITE motif library for use by the programs\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +4.\tab References\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 Here we describe one of the most powerful facilities provided by the program PIP\: the ability to d +efine and search sequences or libraries of sequences for complex patterns of motifs. In another chapter we give details of searching for individual motifs but here we show how to create individual patterns and libraries of patterns and to use them to searc +h sequences. Once a pattern has been defined and stored in a file it can be used to search any sequence. In addition if users want to routinely screen sequences against libraries of patterns this can be achieved by use of files of file names. For example, the
For example, the + program can use the PROSITE protein motif library. The program can produce several alternative forms of output. It will display the segment of sequence matching each individual motif in the pattern, display all the sequence between and including the two o +utermost motifs, produce a description of the match in the form of a SWISSPROT feature table, or draw a simple graphical plot.\par +\pard \s4\qj\sa120\sl280 Towards the end of the chapter we describe how a related program PIPL is used to search libraries of sequences to find patterns. This program can produce alignments of sequence families.\par +\pard \s4\qj\sa120\sl280 +Patterns are defined as sets of motifs with variable spacing. Each motif in a pattern can be defined using any of several methods, and their positions relative to one other are defined in terms of minimum and maximum separations. In addition, by the use of + logical operators, each motif can be declared to be essential (the AND operator), optional (the OR operator), or forbidden (the NOT operator). The following methods (termed "classes" by the program) for defining motifs are provided\: + 1) exact match to a short sequence; 2) percentage match to a short sequence; 3) match to a short sequence using a score matrix and cutoff score; 4) match to a weight matrix; 5) direct repeat; 6) membership of a set. \par +\pard \s4\qj\sa120\sl280 +The motifs in a pattern are numbered sequentially and motif spacing is defined in the following way. When a new motif is added to a pattern the user specifies the "Reference motif" by its number and then a "Relative start position". The "Relative start pos +iti +on" is defined by taking the first base of the "Reference motif" as position 1, the next as 2, and so on. Then the user defines the allowed variation in the spacing by specifying the "Number of extra positions". 
Notice that the position of a motif can be d +efined relative to any other motif, and that a negative "Relative start position" declares the motif to be to the left of its "Reference motif".\par +\pard \s4\qj\sa120\sl280 The probability of finding each individual motif in the current sequence, the product of the probabilities for +all the motifs in a pattern "Probability of finding pattern", and the "Expected number of matches" is calculated and displayed by the program. In addition to the cutoffs used for the individual motifs, users can apply two pattern cutoffs\: + "Maximum pattern probability" and "Minimum pattern score".\par +\pard \s4\qj\sa120\sl280 Below we describe\: how to create a pattern; how to use a pattern file to search a sequence; how to use a "File of pattern file names" to search a sequence for a whole library of patterns; how to use a pattern file + to search a whole library of sequences; how to reformat the PROSITE motif library into a form compatible with these search programs. To describe how to create a pattern file we first show all the steps to make one containing two motifs, and then, to save +space, the parts specific to the individual motif types are sketched in the notes section.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.1\tab Introduction to the PROSITE motif library\par +\pard\plain \s4\qj\sa120\sl280 \f20 A library of protein motifs (in our terminology, because they include variable gaps, many would be called patterns) has + recently become available from Amos Bairoch, Departement de Biochimie Medicale, University of Geneva. Currently it contains over 500 patterns/motifs and arrives on tape or cdrom in two files\: + a .DAT file and a .DOC file. There is also a user documentation file PROSITE.USR. Here we outline the library structure and what is required to prepare the PROSITE library for use by our programs. A typical entry in the .DAT file is shown in figure 13.1. 
+\par +\pard \s4\qj\sa120\sl280 Each entry has an accession number (in figure 13.1 PS00197), a pattern definition (in figure 13.1 C-x(1,2)-[STA]-x(2)-C-[STA]-\{P\}-C) and a documentation file cross reference (in figure 13.1 PDOC00175). This pattern means\: + C, gap of 1 or 2, any of STA, gap of 2, C, any of STA, not P, C.\par +\pard \s4\qj\sa120\sl280 +We need to convert all of these patterns into our pattern definitions (as membership of a set, with the appropriate gap ranges) and write each into a separate pattern file with corresponding "membership of a set" weight matrices. After the conversion each +pattern file is named accession_number.pat (here PS00197.PAT). The corresponding matrix files are accession_number.wtsa, accession_number.wtsb, etc for however many are needed (here PS00197.WTSA and PS00197.WTSB)\: + two are needed because of the variable gap.\par + +In addition we can optionally split the .DAT and .DOC files into separate files, one for each entry, with names accession_number.dat and accession_number.doc. Also we create an index for the library which gives a one line description of each pattern, and en +ds with the pattern file and do +cumentation file numbers. The start of the file is shown in figure 13.2. So, referring to figure 13.2, the name of the pattern file for Glycosaminoglycan attachment site is PS00002.PAT, and for the documentation file PDOC00002.DOC.\par +\pard \s4\qj\sa120\sl280 +Finally we create a file of file names for all the patterns in the library. If this file of file names is PROSITE.NAM then to use the complete PROSITE library from program PIP, users select "pattern searcher" and choose the option "use file of pattern file + names", and give the file name PROSITE.NAM.
For any matches found, the accession number and pattern title will be displayed.\par +\pard\plain \li360\ri440\sl220\pagebb\box\brsp40\brdrth \f4\fs16 ID 2FE2S_FERREDOXIN; PATTERN.\par +\pard \li360\ri440\sl220\box\brsp40\brdrth AC PS00197;\par +DT APR-1990 (CREATED); APR-1990 (DATA UPDATE); APR-1990 (INFO UPDATE).\par +DE 2Fe-2S ferredoxins, iron-sulfur binding region signature.\par +PA C-x(1,2)-[STA]-x(2)-C-[STA]-\{P\}-C.\par +NR /RELEASE=14,15409;\par +NR /TOTAL=69(69); /POSITIVE=63(63); /UNKNOWN=0(0); /FALSE_POS=6(6);\par +NR /FALSE_NEG=5(5);\par +CC /TAXO-RANGE=A?EP?; /MAX-REPEAT=1;\par +CC /SITE=1,iron_sulfur; /SITE=5,iron_sulfur; /SITE=8,iron_sulfur;\par +DR P15788, FER$APHHA , T; P00250, FER$APHSA , T; P00223, FER$ARCLA , T;\par +DR P00227, FER$BRANA , T; P07838, FER$BRYMA , T; P13106, FER$BUMFI , T;\par +DR P00247, FER$CHLFR , T; P07839, FER$CHLRE , T; P00222, FER$COLES , T;\par +DO PDOC00175;\par +\pard \li360\ri440\sl220\keepn\box\brsp40\brdrth //\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 13.1\tab A typical entry from the PROSITE library\par +\pard\plain \li440\ri480\sb300\sl220\box\brsp100\brdrth \f4\fs16 N-glycosylation site. 00001,00001\par +\pard \li440\ri480\sl220\box\brsp100\brdrth Glycosaminoglycan attachment site. 00002,00002\par +Tyrosine sulfatation site. 00003,00003\par +\pard \li440\ri480\sl220\keepn\box\brsp100\brdrth cAMP- and cGMP-dependent protein kinase phosphorylation site. 00004,00004\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 13.2\tab The start of the index created by the conversion program\par +\pard\plain \s4\qj\sa120\sl280 \f20 +In order to make the PROSITE library useable by the search programs it is only necessary to run a program named SPLITP3. Two other programs, SPLITP1 and SPLITP2, only make the original files marginally easier to manage and produce an index. SPLITP1 split +s the PROSITE.DAT file to create a separate file for each entry.
Each file is automatically named PSentry_number.DAT. In addition it creates an index for the library (see above).\par +\pard \s4\qj\sa120\sl280 SPLITP2 performs the same operation for the PROSITE.DOC file, except that no index is created. Files are named PSentry_number.DOC.\par +\pard \s4\qj\sa120\sl280 +SPLITP3 creates a separate pattern file and weight matrix files for each PROSITE entry from the file PROSITE.DAT. Pattern files are named PSentry_number.PAT, weight matrix files PSentry_number.WTSA, PSentry_number.WTSB, etc. The pattern title is the one li +ne description of the motif. SPLITP3 also creates a file of file names. Notice that it will ask for a path name so that the path can be included in the file of file names. This is the path to the directory in which the pattern files are stored\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\fi-560\li560\sb240\sa60\sl280\tx560\tx920 \b\f20 2.1\tab Creating a pattern file containing a weight matrix motif and a membership of a set motif.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par +2.\tab Select "Pattern definition mode" as "Use keyboard".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Results display mode" as "Inclusive". The alternatives are listed in the introduction.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Motif definition mode" as "Weight matrix"\par +5.\tab Define "Motif name". Each motif can be given an 8 character name\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Weight matrix file name". Type in the name of the file containing the weight matrix. The program will display the probability of finding the motif.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Select "Motif definition mode" as "Membership of a set".\par +8.\tab Define "Motif name".\par +9.\tab Select "Logical operator" as "AND". 
The alternatives are "OR" and "NOT".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Select "Number of reference motif". At this stage the only choice is 1 and this is the default.\par +11.\tab Define "Relative start position". The base position relative to the "Reference motif". See the introduction.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 12.\tab Define "Number of extra positions".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 13.\tab Select input mode as "Keyboard". The alternative is an existing file in the form of a weight matrix.\par +14.\tab Define "String". Type in the sets of allowed residue types using the one letter code. See note 1.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 15.\tab Define the "Minimum matches". This is the number of positions within the motif that must match. The default is that +all positions must match but users may want to allow some flexibility by giving a lower score.\par +\tab The program now cycles round to step 7 and all subsequent passes round the loop to add further motifs to the pattern would differ only in the details for the different motif "classes".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab Select "Pattern complete"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 17.\tab Accept "Save pattern in a file". The alternative does not save the pattern and so it can only be used once on the current sequence.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 18.\tab Define "Pattern definition file". Give a name for the new file.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 19.\tab Define "Pattern title". All patterns can have a 60 character title that can be displayed when the pattern file is read and the sequence searched.\par +20.\tab Define "Weight matrix file name". The membership of a set motifs are stored in the form of weight matrices, and so the program needs the user to define a file name.\par +21.\tab Define "Title". Type in a title for the weight matrix like file.
The title will be displayed when the file is read.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The program will now display a detailed textual description of the pattern, the "Probability of finding the pattern" and the "Expected number of matches" (see figure 13.3).\par +22.\tab Define "Maximum pattern probability". Yes maximum\: any match with a greater probability of being found will be rejected. If no value is specified the search will be quicker (see notes).\par +\pard\plain \li1240\ri1360\sl220\pagebb\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Pattern searcher\par +\pard \li1240\ri1360\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select pattern definition mode\par +X 1 Use keyboard\par + 2 Use pattern file\par + 3 Use file of pattern file names\par +? Selection (1-3) (1) =1\par +Select results display mode\par +X 1 Motif by motif\par + 2 Inclusive\par + 3 Graphical\par + 4 SWISSPROT feature table\par +? Selection (1-4) (1) =2\par +Select motif definition mode\par +X 1 Exact match\par + 2 Percentage match\par + 3 Cut-off score and score matrix\par + 4 Cut-off score and weight matrix\par + 5 Direct repeat\par + 6 Membership of set\par + 7 Pattern complete\par +? Selection (1-7) (1) =4\par +? Motif name=atp\par +? Weight matrix file name=atpbinding.wts\par + ATP binding\par +Probability of score -47.8010 = 0.302E-04\par +Select motif definition mode\par + 1 Exact match\par + 2 Percentage match\par + 3 Cut-off score and score matrix\par +X 4 Cut-off score and weight matrix\par + 5 Direct repeat\par + 6 Membership of set\par + 7 Pattern complete\par +? Selection (1-7) (4) =6\par +? Motif name=hydro\par +Select logical operator\par +X 1 And\par + 2 Or\par + 3 Not\par +? Selection (1-3) (1) =\par +? Number of reference motif (1-1) (1) =\par +? Relative start position (-1000-1000) (20) =22\par +? Number of extra positions (0-1000) (0) =5\par +Select input mode\par +X 1 Keyboard\par + 2 File\par +? 
Selection (1-2) (1) =\par +Separate sets with commas\par +? String=ivl,ivl,,,rkhde\par +? Minimum matches (1.00-5.00) (3.00) =\par +Probability of score 3.000 = 0.145E-01\par +Select motif definition mode\par + 1 Exact match\par + 2 Percentage match\par + 3 Cut-off score and score matrix\par + 4 Cut-off score and weight matrix\par + 5 Direct repeat\par +X 6 Membership of set\par + 7 Pattern complete\par +? Selection (1-7) (6) =7\par +? Save pattern in a file (y/n) (y) =\par +? Pattern definition file=_paper.pat\par +? Pattern title=atpbinding plus\par +? Weight matrix file name=_hydro.wts\par +Weight matrix needs a title\par +? Title=hydrophobic and + spot\par +Pattern description\par +atpbinding plus\par +Motif 1 named atp is of class 4\par +Which is a match to a weight matrix with score -47.801\par +Motif 2 named hydro is of class 6\par +Which is membership of a set with score 3.000\par +It is anded with the previous motif.\par +Probability of finding pattern = 0.4368E-06\par +Expected number of matches = 0.1350E-02\par +? Maximum pattern probability (0.00-1.00) (1.00) =\par +? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par +{\f22\fs18 162\par +} GQRELIIGDRQTGKTALAIDAIINQR\par +Total matches found 1\par +\pard \li1240\ri1360\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Minimum and maximum observed scores -38.35 -38.35\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa300\sl240\tx1140 \f21\fs20 Figure 13.3\tab The creation and use of a pattern containing a weight matrix motif and a membership of a set motif.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 23.\tab +Define "Minimum pattern score". A minimum pattern score only makes sense if all the motifs in the pattern are defined with compatible scoring methods. For example membership of a set motifs and weight matrices using sums of logs are incompatible. Searc +hing will now commence and any matches displayed using the chosen method. 
In figure 13.3 we show a typical run i +n which a pattern containing a weight matrix and a membership of a set motif is created and stored on disk. Figure 13.4 shows the contents of the pattern file. \par +\pard\plain \li2260\ri2380\sb200\sl220\box\brsp100\brdrth \f4\fs16 atpbinding plus \par +\pard \li2260\ri2380\sl220\box\brsp100\brdrth A4 atp Class \par +atpbinding.wts \par + A6 hydro Class \par + 1 Relative motif\par + 22 Relative start position\par + 5 Number of extra positions\par +\pard \li2260\ri2380\sl220\keepn\box\brsp100\brdrth _hydro.wts \par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa40\sl240\tx1140 \f21\fs20 Figure 13.4\tab The pattern file created in the worked example shown in figure 13.3.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Searching a sequence using a pattern file\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par +2.\tab Select "Pattern definition mode" as "Use pattern file".\par +3.\tab Select "Results display mode" as "Inclusive"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab +Define "Pattern definition file". Type the name of the file containing the pattern. The program will read the file then display its title, a detailed textual description of the pattern, the "Probability of finding the pattern", and the "Expected number +of matches".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Maximum pattern probability". \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab D +efine "Minimum pattern score". Searching will now commence and any matches displayed using the chosen method. 
Figure 13.5 shows a typical run using a pattern file and output in the form of a SWISSPROT feature table.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Comparing a sequence against a library of patterns including PROSITE\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This mode of operation allows a sequence to be searched, in turn, for any number of patterns each stored in a separate pattern file. The names of the files containing the individual patterns must be stored in a simple text +file. This file is called "a file of pattern file names" and its name is the only user input required to define the search. The file of file names could contain references to entries in the PROSITE motif library and also include the names of other patterns +.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par +2.\tab Select "Pattern definition mode" as "Use file of pattern file names".\par +3.\tab Select "Results display mode" as "Inclusive"\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "File of pattern file names". Type the name of the file containing the list of pattern file na +mes. The program will read the file and then, in turn, all the pattern files it names. Each of these patterns will be compared against the current sequence but only those that give matches will produce any output. The pattern title and each match will be d +isplayed.\par +\pard\plain \li1240\ri1360\sb320\sl220\box\brsp40\brdrth \f4\fs16 Pattern searcher\par +\pard \li1240\ri1360\sl220\box\brsp40\brdrth Select pattern definition mode\par +X 1 Use keyboard\par + 2 Use pattern file\par + 3 Use file of pattern file names\par +? Selection (1-3) (1) =2\par +? Pattern definition file=_paper.pat\par +Select results display mode\par +X 1 Motif by motif\par + 2 Inclusive\par + 3 Graphical\par + 4 SWISSPROT feature table\par +? 
Selection (1-4) (1) =4\par + ATP binding sequences\par +Probability of score -47.8010 = 0.302E-04\par + hydrophobic and + spot\par +Probability of score 3.0000 = 0.145E-01\par +\par +Pattern description\par +\par + atpbinding plus\par +Motif 1 named atp is of class 4\par +Which is a match to a weight matrix with score -47.801\par +Motif 2 named hydro is of class 6\par +Which is membership of a set with score 3.000\par +It is anded with the previous motif.\par +Probability of finding pattern = 0.4368E-06\par +Expected number of matches = 0.1350E-02\par +? Maximum pattern probability (0.00-1.00) (1.00) =\par +? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par +\par +FT atp 162 187 Program\par +\par +Total matches found 1\par +\pard \li1240\ri1360\sl220\keepn\box\brsp40\brdrth Minimum and maximum observed scores -38.35 -38.35\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 13.5\tab Worked example of using a pattern file to search a sequence, and writing the results in the form of a SWISSPROT feature table.\par +\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.4\tab Searching libraries for patterns\par +\pard\plain \s4\qj\sa120\sl280 \f20 The program PIPL can be used to search whole sequence + libraries for patterns. Its use is similar to the pattern search routine described above, except that it does not have the facility for creating pattern files, so they must be created beforehand using PIP. In addition to its obvious application of finding + new occurrences of patterns or checking on their frequency it is a useful way of obtaining sequence alignments. It can restrict its search to a list of named entries or can search all but those on a list of entries. It can restrict its output to showing t +he highest scoring match in each sequence, but by default it will show all matches.\par +\pard \s4\qj\sa120\sl280 +Of its modes of output two require further description. 
The first "Padded sections" creates a new file for each match. The file will contain the sequence between and including the two outermost motifs in the pattern. It will be gapped to the furthest exten +t defined by the pattern, which means that if all the files were subsequently written one above the other all the motifs in the pattern would be exactly aligned, with the s +ections between them containing the requisite numbers of padding characters. The second such mode of output is called "Complete padded sequences". Here the user must know the maximum distance between the leftmost motif and the start of all the sequences th +at match. A trial run in which only the positions of matches are reported is usually required. The user gives this maximum distance to the program. The program then writes a new file containing the full length of all matching sequences, again maximally gap +ped (including their left ends) so that they would all align if written above one another. For both of these modes of output the files created are named "entryname" where "entryname" is the name given to the sequence in the sequence library. These modes ar +e best used with the option "Report all matches" rejected, so that only the best match for each sequence is reported. The sequences can be lined up using the sequence assembly program SAP.\par +\pard \s4\qj\sa120\sl280 The searches, which have recently been recoded, are very rapid. For + example a search of the current SWISSPROT library for a pattern defining the globin family as 6 weight matrices with widely varying gaps, finds only globins and takes less than 4 minutes using a single processor on an Alliant FX2800. This time includes re +ading in the whole library as stored in EMBL CDROM format.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select PIPL.\par +2.\tab Define "Name for results file."\par +3.\tab Select a library.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Search whole library". 
The alternatives are "Search only a list of entries" and "Search all but a list of entries" +. The files containing the list of entries should contain one entry name per line, left justified.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Results display mode" as "Inclusive". The alternatives include "Motif by motif", "Scores only", "Complete padded sequences" and "Padded sections".\par +6.\tab Accept "Report all matches". The alternative only shows the best match for each sequence.\par +7.\tab Define "Pattern definition file". The name of the file containing the pattern created using PIP. \par +\tab The program displays a textual description of the pattern and the expected number of matches per 1000 residues assuming an average amino acid composition.\par +8.\tab Define "Maximum pattern probability". The program will run much more quickly if none is given.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define "Minimum pattern score".\par +\pard\plain \s4\qj\sa120\sl280 \f20 The search will start.\par +A typical run is shown in figure 13.6\par +\pard\plain \li1120\ri1280\sb200\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 PIPL (Protein interpretation program (library)) V4.1 Jul 1991\par +\pard \li1120\ri1280\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Author\: Rodger Staden\par +Searches protein libraries for patterns of motifs\par +\par +? Name for results file=globin.res\par +Select a library\par + 1 EMBL nucleotide library \par +X 2 SWISSPROT protein library \par + 3 Personal file in PIR format \par + 4 Personal file in FASTA format \par +? Selection (1-4) (2) =\par +Library is in EMBL format with indexes\par +Select a task\par +X 1 Search whole library \par + 2 Search only a list of entries \par + 3 Search all but a list of entries \par +? 
Selection (1-3) (1) =\par +Select results display mode\par +X 1 Motif by motif \par + 2 Inclusive \par + 3 Scores only \par + 4 Complete padded sequences\par + 5 Padded sections \par +? Selection (1-5) (1) =5\par +? (y/n) (y) Report all matches n\par +? Pattern definition file=globin.pat\par + globin 1 \par +Probability of score -34.5300 = 0.197E-02\par + globin 2 \par +Probability of score -44.6000 = 0.409E-02\par + globin 3 \par +Probability of score -75.1000 = 0.293E-01\par + globin 4 \par +Probability of score -36.1000 = 0.147E-01\par + globin 5 \par +Probability of score -73.7000 = 0.375E-01\par + globin 6 \par +Probability of score -55.9000 = 0.483E-01\par +\par +Pattern description\par + Globin pattern file \par +Motif 1 named g1 is of class 4\par +Which is a match to a weight matrix with score -34.530\par +Motif 2 named g2 is of class 4\par +Which is a match to a weight matrix with score -44.600\par +and the N-terminal residue can take positions 17 to 22\par +relative to the N-terminal end of motif 1\par +It is anded with the previous motif.\par +Motif 3 named g3 is of class 4\par +Which is a match to a weight matrix with score -75.100\par +and the N-terminal residue can take positions 27 to 35\par +relative to the N-terminal end of motif 2\par +It is anded with the previous motif.\par +Motif 4 named g4 is of class 4\par +Which is a match to a weight matrix with score -36.100\par +and the N-terminal residue can take positions 29 to 53\par +relative to the N-terminal end of motif 3\par +It is anded with the previous motif.\par +Motif 5 named g5 is of class 4\par +Which is a match to a weight matrix with score -73.700\par +and the N-terminal residue can take positions 12 to 16\par +relative to the N-terminal end of motif 4\par +It is anded with the previous motif.\par +Motif 6 named g6 is of class 4\par +Which is a match to a weight matrix with score -55.900\par +and the N-terminal residue can take positions 29 to 33\par +relative to the N-terminal end of 
motif 5\par +It is anded with the previous motif.\par +Probability of finding pattern = 0.6273E-11\par +Expected number of matches per 1000 residues = 0.2119E-03\par +? Maximum pattern probability (0.00-1.00) (1.00) =\par +\pard \li1120\ri1280\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth ? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 13.6\tab A typical run of PIPL using a pattern of 6 weight matrices to search the SWISSPROT library.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Preparing the PROSITE motif library for use by the programs\par +\pard\plain \s4\qj\sa120\sl280 \f20 Only the program SPLITP3 is essential for preparing the PROSITE library for use by our programs. \par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select SPLITP3\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Prosite library file". Type the name of the file containing the prosite library (usually PROSITE.DAT).\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +Define "Name for file of pattern file names". This is the file of file names that users will employ to search the whole library. It will be convenient for them if an environment variable is defined for this file name.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Path name of motif directory". This is the full path name, including the final /, to the directory in which the converted library will be stored.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab +The "exact match" motif class requires a consensus sequence. The "percentage match" motif class requires a consensus sequence and a cutoff score. The "score matrix" motif class uses the MDM78 matrix and requires a consensus sequence and a cutoff score. 
+ The "weight matrix" search only requires the name of the file containing the matrix. The "direct repeat" motif class requires a repeat length, the minimum and maximum gap between the t +wo occurrences of the repeat, and a minimum score. The "membership of a set" motif class defines sets of residue types that are allowed at each position in the motif. When they are first entered into the pattern they are normally typed on the keyboard, but + when they are stored in a file, they are written in the same format as a weight matrix. To enter them on the keyboard use the following format. Type the one letter codes for the set of residue types allowed at each position terminated by a comma (,). For +positions where any residue type is allowed simply type an extra comma. For example VLI,FY,,,DE means any of Valine, Leucine or Isoleucine in the first position, either Phenylalanine or Tyrosine in the next position, anything in the next two positions, and + Aspartic acid or Glutamic acid in the next. When the pattern is stored on the disk the program will request a name for the file and a title for the motif.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab The details of the probability calculations are outside the scope of this article. They are quite +rapid and are essential both for assessing the statistical significance of any matches found and for allowing meaningful cutoffs to be applied to patterns. Obviously, in general, cutoff scores are inappropriate for patterns containing a mixture of motif cl +asses.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +The program calculates the "Probability of finding the pattern" and the "Expected number of matches". The first figure is actually the product of the individual motif probabilities but the latter figure is more useful because it takes into accoun +t the allowed variation in spacing between motifs and the length of the current sequence.
In both cases the composition of the current sequence is also used so that different probabilities would be calculated for other sequences.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab +The pattern definition system is very flexible. Assume that a laboratory has a large library of patterns stored in its computer. Different groups or users may want to screen their sequences against different subsets of a pattern library. Each group ther +efore uses its own "File o +f pattern file names" which contains only the names of the pattern files that are relevant to their sequences. Of course a pattern may contain only one motif. Hence a library of patterns can include both simple and complex patterns. In the same way a labor +atory may have a large library of weight matrices defining different motifs and different users may want to combine them in different ways to produce their own patterns.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab Also, of course, a library does not have to be used solely for performing mass screenings\: + each individual entry can be used as a single pattern by giving the name of its pattern file, e.g. pathname/PS00002.PAT.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab +Note that 5 of the PROSITE motifs contain the symbols > or <, which mean that the motifs must appear exactly at the N or C termini of the sequences. Currently our methods have no mechanism for such definitions and, for example, KDEL motifs will be perm +itted to occur anywhere throughout a sequence.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4(1)}\:53-60.\par +2.\tab Staden, R. 1989. Methods for calculating the probabilities of finding patterns in sequences. {\i CABIOS} {\b 5(2)}\:89-96.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1990.
Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par +\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 14. Comparing Sequences\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par +2.\tab Methods\par +\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Producing a dot matrix plot (or list) of exact matches\par +2.2\tab Producing a dot matrix plot using the proportional algorithm\par +2.3\tab Producing a dot matrix plot using the quick scan algorithm\par +2.4\tab Producing a list of all matching segments using the proportional algorithm\par +2.5\tab Calculating the expected scores for the proportional algorithm\par +2.6\tab Calculating the observed scores for the proportional algorithm\par +2.7\tab Producing an optimal alignment\par +2.8\tab Comparing a sequence against a library of sequences\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par +4.\tab References\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par +\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we describe methods for comparing and aligning pairs of nucleic acid or protein +sequences. The program described (SIP), the original version of which was first described in 1982 (1), is based around several methods for producing "dot matrix" plots and includes routines for assessing the statistical significance of the plots, plus a d +ynamic programming algorithm for finding optimal alignments. At the end of the chapter we describe a program SIPL that is used for comparing a single sequence against a whole library of sequences.\par +\pard \s4\qj\sa120\sl280 We assume the reader is familiar with the general principl +e of dot matrix diagrams. 
The program uses a number of different algorithms to calculate the score for each point in a dot matrix and the user defines a minimum score so that only those points in the diagram for which the score is at least this value will +be marked with a dot. The first scoring method finds uninterrupted sections of perfect identity, i.e. those that contain no mismatches, insertions or deletions. Generally this method, termed "the identities algorithm", is of limited value, but runs very qui +ckly. \par +\pard \s4\qj\sa120\sl280 +The second method looks for sections where a proportion of the characters in the sequence are similar, again allowing no insertions or deletions. For a thorough analysis this method, termed "the proportional algorithm", is the best. The original method o +f this type was first described by McLachlan (2) and involves calculating a score for each position in the matrix by summing points found when looking forwards and backwards along a diagonal line of a given length (the window). The algorithm does no +t simply look for identity but uses a score matrix that contains scores for every possible pair of characters. For comparing amino acid sequences we usually use the score matrix MDM78 (3), which is shown in figure 14.1. It is also possible to use other ma +trices, including an identity matrix for proteins. For nucleic acids we usually use an identity matrix.\par +\pard\plain \li220\ri280\sl220\box\brsp100\brdrth \f4\fs16 C S T P A G N D E Q B Z H R K M I L V F Y W - X ? 
\par +\pard \li220\ri280\sl220\box\brsp100\brdrth C 22 10 8 7 8 7 6 5 5 5 5 5 7 6 5 5 8 4 8 6 10 2 10 10 10 10\par +S 10 12 11 11 11 11 11 10 10 9 10 10 9 10 10 8 9 7 9 7 7 8 10 10 10 10\par +T 8 11 13 10 11 10 10 10 10 9 10 10 9 9 10 9 10 8 10 7 7 5 10 10 10 10\par +P 7 11 10 16 11 9 9 9 9 10 9 10 10 10 9 8 8 7 9 5 5 4 10 10 10 10\par +A 8 11 11 11 12 11 10 10 10 10 10 10 9 8 9 9 9 8 10 6 7 4 10 10 10 10\par +G 7 11 10 9 11 15 10 11 10 9 10 10 8 7 8 7 7 6 9 5 5 3 10 10 10 10\par +N 6 11 10 9 10 10 12 12 11 11 12 11 12 10 11 8 8 7 8 6 8 6 10 10 10 10\par +D 5 10 10 9 10 11 12 14 13 12 13 12 11 9 10 7 8 6 8 4 6 3 10 10 10 10\par +E 5 10 10 9 10 10 11 13 14 12 12 13 11 9 10 8 8 7 8 5 6 3 10 10 10 10\par +Q 5 9 9 10 10 9 11 12 12 14 11 13 13 11 11 9 8 8 8 5 6 5 10 10 10 10\par +B 5 10 10 9 10 10 12 13 12 11 13 11 11 10 10 8 8 6 8 5 7 4 10 10 10 10\par +Z 5 10 10 10 10 10 11 12 13 13 11 14 12 10 10 8 8 8 8 5 6 4 10 10 10 10\par +H 7 9 9 10 9 8 12 11 11 13 11 12 16 12 10 8 8 8 8 8 10 7 10 10 10 10\par +R 6 10 9 10 8 7 10 9 9 11 10 10 12 16 13 10 8 7 8 6 6 12 10 10 10 10\par +K 5 10 10 9 9 8 11 10 10 11 10 10 10 13 15 10 8 7 8 5 6 7 10 10 10 10\par +M 5 8 9 8 9 7 8 7 8 9 8 8 8 10 10 16 12 14 12 10 8 6 10 10 10 10\par +I 8 9 10 8 9 7 8 8 8 8 8 8 8 8 8 12 15 12 14 11 9 5 10 10 10 10\par +L 4 7 8 7 8 6 7 6 7 8 6 8 8 7 7 14 12 16 12 12 9 8 10 10 10 10\par +V 8 9 10 9 10 9 8 8 8 8 8 8 8 8 8 12 14 12 14 9 8 4 10 10 10 10\par +F 6 7 7 5 6 5 6 4 5 5 5 5 8 6 5 10 11 12 9 19 17 10 10 10 10 10\par +Y 10 7 7 5 7 5 8 6 6 6 7 6 10 6 6 8 9 9 8 17 20 10 10 10 10 10\par +W 2 8 5 4 4 3 6 3 3 5 4 4 7 12 7 6 5 8 4 10 10 27 10 10 10 10\par +- 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par +X 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par +? 
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par +\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 14.1\tab The amino acid score matrix MDM78.\par +\pard\plain \s4\qj\sa120\sl280 \f20 +For the proportional method, plotting dots at the centres of windows that reach the cutoff leads to a persistence effect that, to some extent, can be mitigated by a variation on the method. If, for example, all the high scoring amino acids are clustered at +the left end of a particular diagonal segment, dots will continue to be plotted to their right until the window score drops below the cutoff. Instead of plotting a single point for each window that reaches the cutoff score, the variant method plots p +oints for all the identities that lie in windows that reach the cutoff. Obviously the persistence effect can be more pronounced for long windows and low cutoff scores, but note that the variant method will plot nothing if there are no identities present, a +nd so similar regions could be missed! A further variant, useful for comparing a sequence against itself, ignores the main diagonal.\par +\pard \s4\qj\sa120\sl280 The third comparison method, called "quick scan", is really a combination of the first two, and is similar to the FASTP prog +ram of Lipman and Pearson (4), but produces a dot matrix diagram. The algorithm is as follows. The dot matrix positions are found for all words of some minimum length (obviously length 1 is most sensitive) that are common to both sequences. Imagine a diago +nal line running from corner to corner of the diagram, at right angles to the diagonals in the dot matrix. The scores for the common words (according to the current score matrix, e.g. MDM78) are accumulated at the appropriate positions on that imaginary l +ine, hence producing a histogram.
The histogram is analysed to find its mean and standard deviation. The diagonals that lie above some cutoff score (defined in standard deviation units), are rescanned using the proportional algorithm, and a diagram produce +d. The method is very fast, and is also employed by the library comparison program (see below).\par +\pard \s4\qj\sa120\sl280 \par +\pard \s4\qj\sa120\sl280 The dynamic programming alignment algorithm contained in the program is based on that of Myers and Miller (5). It guarantees to produce alignments with the opt +imum score given a score matrix, a gap start penalty, and a gap extension penalty. It is very useful to have the dot matrix methods and the alignment routine together in the same program because it allows users to produce a dot matrix diagram to help selec +t which regions of the sequence they wish to align. Selection is made by use of the crosshair. The crosshair is positioned first at the bottom left hand end of the segment to be aligned and then at the top right of the segment. When the alignment routine i +s selected the segment will be aligned. The alignment can replace the original segment of the sequence. By repeated plotting of dot matrices, followed by alignment, very long sequences can easily be aligned. \par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Producing a dot matrix plot (or list) of exact matches\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method is relatively fast and can be useful for very similar sequences. It marks the position of every exact match of some minimum length with a dot or lists out the matching segments.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Apply identities algorithm".\par +2.\tab Define "Identity score". \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab +Select "Plot or List". 
The plot will appear as in figure 14.2, which shows a comparison of two protein sequences using a score of 2. Listed output displays the matching segments and defines their positions. \par +\pard\plain \li1700\sb300\sl220\keepn \f4\fs16 {{\pict\macpict\picw283\pich299 +112800000000012b011b001102ff0c00fffe0000003cb4bc003cb4bc0000000000fc00ef000000000001000a0000000000fc00ef0098801e0000000000fc00ef0000000000000000003cb4bc003cb4bc00000001000100010000000000000000000000000048c23f000000010000ffffffffffff0001000000000000000000 +0000fc00ef0000000000fc00ef000002e30006003fe5ff00f80f0020f6000020f8000020fc000104080d0020fa000302000004f0000048060020e50000081b0020fe00042000000802fb0002100002fe000040fd00031000000817012008fc000004fd000020fe00011001f9000301000008100020fb000001f600018080fa +00011008150020fe00014002f5000308000004fd000304000008060020e5000008110020f2000312010008fe000020fc0000080e0020f3000020fe000040f8000008060020e50000080d0320000020f3000001f7000008130020fb000040fa000080f90005200000100008110320000080f700042800000440f70000080b04 +2002000001ea00010808110020f7000080f90002012048fc000102080f0020fc000001f60000a0f8000102080a0020fb000020ec0000481c05200010000001fe000010fd000008fc0002010004fe0003800000080c012001f800010202f10000080b0020ea000002fe00010408140320000020f6000302000004fb000008fe +00000816022010c0f9000020fc00040200100008fc0002200008130020fc000002fe000080fa000010f800010808150320000010fe000002f9000048fd000020fa000008160020fd0005100000040080fa000008fa0003040080080c0020f20002040080f7000008140020fe00010104f600010440fb000001fe0000081200 +20f700041000808010f8000010fe0000080a0020ef000008f80000080a0020fa000010ed000008110020f3000040fb000710100000080000080a0020ee000010f90000080c012080fa00010802ef000008110020f6000040f8000780008000080000080e0020f5000002fe000010f60000080e0020f5000002fe000030f600 
+0008180620200000100020fd00041081808010f8000010fe000008110020f6000020f800072000000400000408060020e50000080c0020f60002200004f300000814012410f60002200040fb000010fe000308000008100022f9000008f40006400400100010080a0020fd000080ea0000080c0020fd00010888ec00010108 +110320800004fe00040400001810f0000008130020f900010404fe000001fe000001f7000008060020e5000008100022f9000008f4000640060010000408080020e700020200080e0320800008fc00010802ef0000080a0020f5000008f2000008150020fd000308000040f6000080fd000408000001280d02200020fd0001 +0108ed000008150020fd000001fe000010fd000008f6000380000008160020f8000340000042fe00011002fe000002fb0000080a0020f3000080f40000080a0020f6000010f1000008080020e70002020008190020fe00014002fe0002100008f900044004000010fd000008140620200000100020fc000081f5000004fe00 +00080c0020fb000080ee0002040008100020fb000080f200010240fe00010108150020fa00018001fc0002040080fe000008fa0000080a0030f7000002f0000008100020fe000080fe0002010001ef000088100020fb000080f20006024000000401080c0020fa000021ef0002200008160020fe00042000000802fb000010 +fc0000c0fa0000080d0020f7000320000001f3000008150020fc000088fd000080f90002012008fc000102081a0320000020fa00070400400002000004fd0002080008fe0000080d0320800004fe000004ec000008130020fa00018001fc000004fd000020f9000008140020f30002400080fe00010810fe00030800000814 +0620200000100020fc000081f5000004fe0000080e0020f5000080f8000001fc000028120020f8000008fe000004f9000002fc000008100020fb000080f200010240fe000101080d012008ec0002080002fe000008060020e5000008100020f5000080fa0002800001fc000028060020e50000080a0020ee000080f9000008 +0a0020f0000004f7000008120020fe000002f70000c0fa000006fc00000811042000020002f1000080fd000320000008160020fd000004fc000020fc0002020004fa0002200008060020e5000008180020fe0004200000080afb000030fe0002400040fa00000816042000800040fe000040f5000010fe000020fe0000080a 
+0020ee000080f90000080f0020f00002080080fc000304000008140020fd00051000000400c0fa000018f800018008120030f7000002fb000080fe000080fb000008060020e50000080f05208010000008f3000080f9000008160020fe000080fe000080f9000020fa000401000200080e0020f5000008f6000004fe000008 +11072000200008000001fb000040f30000080d0320000080f9000020f10000081c042000040001fc000320000804fe000004fd000702400001404000080a0020fc000010eb0000080e0020fa000010f7000004f80000080d0020fa000004f0000380004008160020fe00046002000802fb000010fc000044fa000008120020 +fb000010fc000010fc000010f80000080a0020f5000001f2000008150020fd000411c8000020fe000302000020f5000008150020fc000080f90002010040fb00050400008000081e0320000002fe00071000002000081082fe00040210000002fd000280000812052010c0000004f5000340100008fa0000080b012802eb00 +0004fd0000081b042000040001fc000320000804fc000a40020002400000404000080a0020fc000010eb000008090320000008e8000008120020f700042000000108fa000080fc000008060020e5000008100020fa000021f7000008fa0002200008180020fc000080f90005010040000004fe0005040400800008110020f2 +000304000020fc00040420000008140020fe000080fc000002fe000010f50002100008140920020000011140000020fb000020f600010808140020fe000080fc000002fe000010f500021010080a0020f4000010f3000008060020e5000008120020fe000001f5000004fa000001fe0000081002200004fe000080f6000004 +f7000008140020fd0005100000040080fa000008f8000180081b0320000002fb00042000081082fe000006fe000002fd00028000081c0020fc000040fd000610008080100010fe000004fe000010fe0000080b0020f200018080f6000008090320000010e8000008150320000010fe000002f9000048fd000020fa0000080a +0020f2000040f500000807012004e6000008140020f90002100008f9000040fe000010fd000008060020e5000008140320000020f6000302000004fb00000cfe0000080e022010c0f10002100008fa0000080e0020fd000040ef000008fd0000081402200020fe00040201000010f9000001f800000810012020f800010180 
+fb000002f8000008140020fe000080fe0002010001f8000080f9000088060020e5000008090320000080e80000080a0020f2000002f5000008070020e6000120080b042002001001ea00010808120020f9000008fb000020fb000081fc000008190020fe000080fe000080f9000020fd000304000001fe0000081103200000 +80f7000028fe000040f700000811072000200040020001f4000004fa000008100020ef000640020000080002fe0000081a042000800040fe000040fa000004fd000010fe000020fe0000080d0020fa000004f0000380004008120020f8000001fc000001fc000010fb000008130020fb000001fd000010fb000080f9000110 +08110320000004fe00040400001810f00000080d0320800004fe000004ec0000080e02200010f00002010004fb000008140022f900000cfd000002f9000640040010000008060020e5000008060020e5000008160020fc000020fe000040fb00040810000040fa000008130020fc000002fe000080fa000010f80001080811 +072000200040020001f4000004fa000008160020fe0002200008fc000008fb0002400202fa0000080c0020ed00010208fc000140081402200004fe000080f4000004fd000004fe0000080f0020fe00014002f2000004fa000008180020fd00014002fe000080fa000010fc000008fe00010808130020fc000001f60004a000 +000208fc000142080c0020eb0002080002fe000008060020e50000080c0020f4000040f50002040008180020fe0002020002fe000010fd000040fa000004fc0000081002200020fd000001f6000001f8000008120021fd000040fc000080f5000008fd000008190020fd000610000004008080fb00040800012008fc000182 +081605200000010004fb000020fd000340008018f9000008150020fc000080f90002010040fb0005040000800008100022f9000008f40006500400100200080a0030f7000001f0000008110020f3000040fb000010fe000308000008130022f9000308000020f7000640040010000008060020e50000080f0020fc00010402 +f000044000080008160020f5000010fe000020fe00080100200002010000080a0020f1000020f6000008110320000804fe00040400001810f0000008060020e50000081a0020fe000080fe000080fe000020fd000020fa000001fe000008130320000080f90002200028fe000040f70000080f0028f90002220008f3000304 
+021008110320000080f7000028fe000040f7000008140020fd00018001f60000a0fe000040fc000102080c0020fa000021ef0002200008110020fd000302000010fd000001f2000008140620000080000088fa000028fe000040f70000080a0020e9000004fe00000812012020f7000080fb000302000040fb0000080b0520 +8010000008ea000008120020fd000411c8000020fb000020f5000008120020fc000010fb000080f8000001fc000028140020fc0005020002008004fb000010f800010848160020fe00044002000010fd000001fa000004fa0000080e0020fd000004f4000004f8000008120020fd0002040001f700014080f9000110081100 +20fc000304000040f9000001f70000081605200000010004fb000030fd000340000010f9000008120020fe000002f7000040fa000004fc000008140020fd000040f7000304002020fb0003200000081302200004fe00018080f200010240fe00010108150020fd000001fe000010fd000008f60003800000080a0020f20000 +80f50000081102200044fe00018220f5000040f90000080c02200040ed000004fc0000080d0020f0000008fa0003040000080e0020fe000008f8000040f3000008160320810004fe000004fe00010202f8000004fb0000080a0020f6000040f100000813012001f800018002fa0002012008fc000102080c0020ed00010208 +fc000140080e0020fc000088f0000040fd0000080a0020e9000008fe0000080e0020fa000008f1000008fe0000081605200000010004fb000020fd000340400010f90000080a0020e9000004fe000008160020fe000080fe000080f90000a0fa000001fe000008090320040010e8000008060020e50000080a0020fa000008 +ed0000081605200000010004fb000020fd000340400010f90000081c0320000002fb00042800081086fe000002fe00010202fe00028000080a0020f3000080f4000008120020fc000040f7000010fe000004fa0000080e0020fd000004f4000004f80000081b0020fe000620000008020008fe00010410fc0002400002fc00 +00080a0020f2000008f5000008060020e5000008140020fb000008f9000304020020fb0003300000081408202000001000200021fe000081f30002200008190020fe00042000080802fe000308000010fc000042fa0000080f0020f90002220008f30003040000081002200020fd000001f6000001f80000080f012001fb00 +0008fe000002f1000008190020fc000002fe000080fa000010fe000080fe0003080008081b0020fd0008100000040080000080fd000008fd000001fd0001802806003fe5ff00f80000ff}}\par +\pard\plain 
\s8\qj\fi-1140\li1140\sb80\sa400\sl240\tx1140 \f21\fs20 Figure 14.2\tab A dot-matrix for two related protein sequences using the "Identities algorithm" and a score of 2. Notice that the similarity is not apparent. \par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Producing a dot matrix plot using the proportional algorithm\par +\pard\plain \s4\qj\sa120\sl280 \f20 This method gives the most thorough analysis.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Apply proportional algorithm".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Odd window length". The size of window over which the scores for each point are summed.\par +3.\tab Define "Proportional score". All points achieving at least this score will be marked with a dot in the diagram.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 14.3.\par +\pard\plain \qj\li1700\sb300\sl480\keepn \f4\fs16 {{\pict\macpict\picw283\pich301 +08a200000000012d011b001102ff0c00fffe0000003c32b0003c32b00000000000fc00ed000000000001000a0000000000fc00ed0098801e0000000000fc00ed0000000000000000003c32b0003c32b000000001000100010000000000000000000000000048ae57000000010000ffffffffffff0001000000000000000000 +0000fc00ed0000000000fc00ed000002e30006007fe5ff00f0060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100b0040f200010180f60000100a0040f2000003f50000100a0040f2000006f50000100a0040f2000004f50000100d0340000020f5000008f5000010090340000020e800 +0010060040e5000010090340000080e80000100802400001e70000100802400003e7000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000 
+10060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f2000060f50000100a0040f2000040f5000010060040e50000100a0040ea000040fd0000100c0040ec0002040080fd000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040 +e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040eb000080fc0000100a0040ec000001fb0000100a0040ec000002fb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000 +10060040e50000100a0040f9000010ee0000100a0040f9000030ee0000100a0040f9000060ee0000100a0040f90000c0ee0000100e0040f9000080fc000020f40000100a0040eb000040fc000010060040e5000010060040e5000010060040e5000010060040e50000100a0040ee000004f9000010060040e5000010060040 +e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f0000002f7000010060040e5000010 +060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010070040e600018010060040e5000010060040e50000100b0040fd000101 +80eb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f4000002f30000100a0040f4000006f30000100a0040f400000cf30000100a0040f4000008f30000100a0040f4000010f30000100a0040f4 +000030f30000100a0040f4000060f30000100a0040f4000040f3000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010090040e8000301000010060040e5 
+000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f7000004f00000100a0040f7000004f0000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5 +000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f9000008ee000010060040e5000010060040e50000100a0040f9000020ee0000100a0040f9 +000040ee0000100a0040f9000080ee0000100a0040fa000001ed0000100a0040fa000002ed0000100a0040fa000004ed0000100a0040fa000004ed0000100a0040fa000008ed000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500 +0010060040e5000010060040e5000010060040e5000010060040e50000100a0040fb000040ec0000100a0040fb000080ec0000100a0040fc000001eb0000100a0040fc000002eb0000100a0040fc000006eb0000100a0040fc000004eb0000100a0040fc000008eb0000100a0040fc000010eb0000100a0040fc000020eb00 +00100a0040fc000060eb0000100a0040fc000080eb0000100b0040fd00010180eb0000100a0040fd000001ea0000100a0040fd000002ea0000100a0040fd000004ea0000100a0040fd000008ea0000100a0040fd000008ea0000100a0040fd000010ea0000100a0040fd000020ea0000100a0040fd000040ea000010060040 +e50000100a0040fe000001e9000010060040e50000100a0040fe000002e90000100a0040fe000004e90000100a0040fe000008e9000010060040e50000100e0040fe000010f0000040fb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500001006 +0040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100d0040fd000303000020ed000010060040e5000010060040e5000010060040e50000100a0040fc00000ceb0000100a0040fc00 +0008eb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500001006007fe5ff00f00000ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 14.3\tab +A 
dot-matrix for the two related protein sequences shown in figure 14.2, but here using the "Proportional algorithm" with a window of 21 and a score of 240. Notice that the similarity is now apparent. \par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Producing a dot matrix plot using the quick scan algorithm\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This method is very fast. Using the current score matrix it accumulates the scores for all the exact matches that lie on each diagonal. The mean diagonal score and its standard deviation are calculated, and those diagonals that have scores more than a chose +n number of standard deviations above the mean are rescanned using the proportional algorithm and the points above the proportional algorithm's cutoff are plotted.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Apply quick scan algorithm".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Identity score". The minimum number of consecutive identical sequence symbols that count as a match.\par +3.\tab Define "Odd window length". The size of window over which the scores for each point are summed when the proportional algorithm is applied to the best diagonals.\par +4.\tab Define "Proportional score". For the best diagonals all points achieving at least this score will be marked with a dot in the diagram.\par +5.\tab Define "Number of s.d. above mean".
Diagonals with scores above the minimum number of standard deviations are rescanned using the proportional algorithm.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 14.4.\par +\pard\plain \qj\li1720\sb300\sl480\keepn \f4\fs16 {{\pict\macpict\picw283\pich301 +07fa00000000012d011b001102ff0c00fffe0000003c32b0003c32b00000000000fc00ed000000000001000a0000000000fc00ed0098801e0000000000fc00ed0000000000000000003c32b0003c32b0000000010001000100000000000000000000000000491cbd000000010000ffffffffffff0001000000000000000000 +0000fc00ed0000000000fc00ed000002e30006007fe5ff00f0060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5 +000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500 +0010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000 +10060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010 +060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500001006 
+0040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100600 +40e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040 +e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5 +000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500 +0010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f9000008ee000010060040e5000010060040e50000100a0040f9000020ee0000100a0040f9000040ee0000100a0040f9000080ee0000100a0040fa +000001ed0000100a0040fa000002ed0000100a0040fa000004ed0000100a0040fa000004ed0000100a0040fa000008ed000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000 +10060040e50000100a0040fb000040ec0000100a0040fb000080ec0000100a0040fc000001eb0000100a0040fc000002eb0000100a0040fc000006eb0000100a0040fc000004eb0000100a0040fc000008eb0000100a0040fc000010eb0000100a0040fc000020eb0000100a0040fc000060eb0000100a0040fc000080eb00 
+00100b0040fd00010180eb0000100a0040fd000001ea0000100a0040fd000002ea0000100a0040fd000004ea0000100a0040fd000008ea0000100a0040fd000008ea0000100a0040fd000010ea0000100a0040fd000020ea0000100a0040fd000040ea000010060040e50000100a0040fe000001e9000010060040e5000010 +0a0040fe000002e90000100a0040fe000004e90000100a0040fe000008e9000010060040e50000100a0040fe000010e9000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000 +10060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010 +06007fe5ff00f00000ff}}\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 14.4\tab +A dot-matrix for the two related protein sequences shown in figures 14.2 and 14.3, but here using the "Quick scan algorithm" with an identity score of 1 and a window of 21 and a score of 240 for the proportional algorithm. Notice that the simil +arity is now apparent but the absence of background "noise" is misleading.\par +\pard\plain \s6\fi-540\li560\sb240\sa60\sl280\tx860 \b\f20 2.4\tab Producing a list of all matching segments using the proportional algorithm\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "List matching segments".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Odd window length". The size of window over which the scores for each point are summed.\par +3.\tab Define "Proportional score". All segments achieving at least this score will be listed out with the two sequences written one above the other. 
See figure 14.5.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Calculating the expected scores for the proportional algorithm\par +\pard\plain \s4\qj\sa120\sl280 \f20 This function calculates the probability of achieving each possible score using the proportional algorithm. Hence it provides a method of setting + cutoff scores and assessing the statistical significance of the scores found. The algorithm calculates the "Double matching probability" described by McLachlan (2), which is defined as the probability of finding the scores in two infinitely long sequences +of the same composition as the pair being compared. It is very much faster than the alternative of repeatedly scrambling and recomparing the sequences. The program offers three ways for the user to see the results of the calculation\: + the user can type a \par +\pard\plain \li2320\ri2720\sl220\box\brsp100\brdrth \f4\fs16 List matching segments\par +\pard \li2320\ri2720\sl220\box\brsp100\brdrth ? Odd window length (1-401) (11) =\par +? Proportional score (1-567) (252) =\par +Working\par + 62\par +GLRRGLDVKDLEHPIEVPVGK\par +DLAEGMKVKCTGRILEVPVGR\par + 81\par + 63\par +LRRGLDVKDLEHPIEVPVGKA\par +LAEGMKVKCTGRILEVPVGRG\par + 82\par + 65\par +RGLDVKDLEHPIEVPVGKATL\par +EGMKVKCTGRILEVPVGRGLL\par + 84\par + 66\par +GLDVKDLEHPIEVPVGKATLG\par +GMKVKCTGRILEVPVGRGLLG\par + 85\par + 67\par +LDVKDLEHPIEVPVGKATLGR\par +MKVKCTGRILEVPVGRGLLGR\par +\pard \li2320\ri2720\sl220\keepn\box\brsp100\brdrth 86\par +\pard\plain \s8\qj\fi-1140\li1140\sb60\sa400\sl240\tx1140 \f21\fs20 Figure 14.5\tab A typical run of "List matching segments".\par +\pard\plain \s4\qj\sa120\sl280 \f20 score and the program will display its probability; the user can type a probability and the program will display the corresponding score; alternatively, the program will list the full range of scores and probabilities. 
+\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate expected scores".\par +2.\tab Define "Odd window length".\par +\tab The calculation takes a noticeable time.\par +3.\tab Select "List scores and probabilities".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Number of steps between scores". This allows, say, every fifth score to be listed if the user defines the number of steps to be 5. The list will appear as in figure 14.6.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Calculating the observed scores for the proportional algorithm\par +\pard\plain \s4\qj\sa120\sl280 \f20 +This function applies the proportional algorithm, but instead of producing a dot matrix it accumulates the scores and their frequencies of occurrence. It provides a method of setting cutoff scores and assessing the statistical significance of the scores fo +und. The program offers three ways for the user to see the results of the calculation\: the user can type a score and the program will display its frequency; the user can type a frequency and the progra +m will display the corresponding score, alternatively the program will list the full range of scores and frequencies. The frequencies are expressed as percentages.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate observed scores".\par +2.\tab Define "Odd window length".\par +\tab The calculation takes a noticeable time.\par +\pard\plain \li1320\ri1300\sl220\box\brsp100\brdrth \f4\fs16 Calculate expected proportional scores\par +\pard \li1320\ri1300\sl220\box\brsp100\brdrth ? Odd window length (1-401) (21) =\par +Working\par +Average score= 196.99062\par +Select probability display mode\par + 1 Show probability for a score\par +X 2 Show score for a probability\par + 3 List scores and probabilities\par +? Selection (1-3) (2) =3\par +? 
Number of steps between scores (1-10) (5) =\par +\par + 5 0.10000E+01 200 0.40004E+00 395 0.00000E+00\par + 10 0.10000E+01 205 0.24037E+00 400 0.00000E+00\par + 15 0.10000E+01 210 0.12555E+00 405 0.00000E+00\par + 20 0.10000E+01 215 0.56905E-01 410 0.00000E+00\par + 25 0.10000E+01 220 0.22402E-01 415 0.00000E+00\par + 30 0.10000E+01 225 0.76821E-02 420 0.00000E+00\par + 35 0.10000E+01 230 0.23031E-02 425 0.00000E+00\par + 40 0.10000E+01 235 0.60614E-03 430 0.00000E+00\par + 45 0.10000E+01 240 0.14064E-03 435 0.00000E+00\par + 50 0.10000E+01 245 0.28888E-04 440 0.00000E+00\par + 55 0.10000E+01 250 0.52741E-05 445 0.00000E+00\par + 60 0.10000E+01 255 0.85917E-06 450 0.00000E+00\par + 65 0.10000E+01 260 0.12534E-06 455 0.00000E+00\par + 70 0.10000E+01 265 0.16433E-07 460 0.00000E+00\par + 75 0.10000E+01 270 0.19425E-08 465 0.00000E+00\par + 80 0.10000E+01 275 0.20772E-09 470 0.00000E+00\par + 85 0.10000E+01 280 0.20155E-10 475 0.00000E+00\par + 90 0.10000E+01 285 0.17801E-11 480 0.00000E+00\par + 95 0.10000E+01 290 0.14353E-12 485 0.00000E+00\par + 100 0.10000E+01 295 0.10599E-13 490 0.00000E+00\par + 105 0.10000E+01 300 0.71886E-15 495 0.00000E+00\par + 110 0.10000E+01 305 0.44920E-16 500 0.00000E+00\par + 115 0.10000E+01 310 0.25938E-17 505 0.00000E+00\par +\pard \li1320\ri1300\sl220\keepn\box\brsp100\brdrth 120 0.10000E+01 315 0.13881E-18 510 0.00000E+00\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa500\sl240\tx1140 \f21\fs20 Figure 14.6\tab A typical run of "Calculate expected proportional scores." The scores are listed in three columns alongside their probabilities. e.g. score 250 has a probability 0.527x10 +{\up6 -5}{\plain \b\f20 .}{\up6 \par +}\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 3.\tab Select "List scores and percentages".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Number of steps between scores". This allows, say, every fifth score to be listed if the user defines the number of steps to be 5. 
The list will appear as in figure 14.7.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.7\tab Producing an optimal alignment\par +\pard\plain \s7\qj\sa120\sl280\tx0 \f20 This function produces an optimal alignment for any segments of the two sequences +using the algorithm of Myers and Miller (5). It guarantees to produce alignments with the optimum score, given a score matrix, a "gap start penalty" and a "gap extension penalty". That is starting a gap costs a fixed penalty F and each residue added to the + gap costs a further penalty E, so for \par +\pard\plain \li1980\ri2060\sb400\sl220\box\brsp100\brdrth \f4\fs16 Calculate observed proportional scores\par +\pard \li1980\ri2060\sl220\box\brsp100\brdrth ? Odd window length (1-401) (21) =\par +Working\par +Maximum observed score is 285\par +Select score display mode\par +X 1 Show percentage reaching a score\par + 2 Show score for a percentage\par + 3 List scores and percentages\par +? Selection (1-3) (1) =3\par + ? Number of steps between scores (1-10) (5) =\par + 156 236949 0.99998E+02\par + 161 236938 0.99993E+02\par + 166 236792 0.99932E+02\par + 171 235882 0.99548E+02\par + 176 232582 0.98155E+02\par + 181 222875 0.94058E+02\par + 186 203232 0.85769E+02\par + 191 171507 0.72380E+02\par + 196 131216 0.55376E+02\par + 201 89194 0.37642E+02\par + 206 52791 0.22279E+02\par + 211 27315 0.11528E+02\par + 216 12117 0.51137E+01\par + 221 4890 0.20637E+01\par + 226 1774 0.74867E+00\par + 231 656 0.27685E+00\par + 236 263 0.11099E+00\par + 241 111 0.46845E-01\par + 246 66 0.27854E-01\par + 251 36 0.15193E-01\par + 256 23 0.97065E-02\par + 261 16 0.67524E-02\par + 266 15 0.63303E-02\par + 271 10 0.42202E-02\par + 276 6 0.25321E-02\par +\pard \li1980\ri2060\sl220\box\brsp100\brdrth 281 2 0.84405E-03\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 14.7\tab +A typical run of "Calculate observed scores." 
The scores are followed by their observed number of occurrences expressed both absolutely and as a percentage of the total number of points.\par +\pard\plain \s4\qj\sa120\sl280 \f20 +a gap of length K residues the penalty is F + KE. Gaps at the ends of sequences incur no penalty. The size of the segments of sequence that can be aligned at once is limited to 5000 characters. The user can select the start and end of the segments by use of +the crosshair simply by clicking on any dot matrix plot. After the alignment has been produced the user can elect to have it replace the original sequence segments. By alternate use of dot matrix plotting and alignment, very long sequences can be aligned. +\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Align sequences". The crosshair will appear in the graphics window. \par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Position the crosshair on the bottom left of the segment to be aligned and hit the space bar on the keyboard. The bell will ring.\par +3.\tab Position the crosshair on the top right of the segment to be aligned and hit the space bar on the keyboard. The bell will ring.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Penalty for starting each gap".\par +5.\tab Define "Penalty for each residue in gap".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab A noticeable time will elapse before the alignment is displayed on the screen. A typical alignment is shown in figure 14.8.\par +6.\tab Reject "Keep alignment". 
If the alignment is "kept" the padded sequences from the alignment will replace the original sequences in the active region.\par +\pard\plain \li480\ri540\sl220\box\brdrth \f4\fs16 Align the sequences\par +\pard \li480\ri540\sl220\box\brdrth Aligning region 1 to 461\par + with region 1 to 514\tab \tab Working\par + V 1 11 21 31 41 51\par + MA--TGKIVQ VIGA------ VVDVEFPQDA VPRVYDALEV QNG------N ERLVL-----\par + * * * ** * * * * *\par + MQLNSTEISE LIKQRIAQFN VVSEAHNEGT IVSVSDGVIR IHGLADCMQG EMISLPGNRY\par + H 1 11 21 31 41 51\par + V 61 71 81 91 101 111\par + EVQQQLGGGI VRTIAMGSSD GLRRGLDVKD LEHPIEVPVG KATLGRIMNV LGEPVDMKGE\par + * * ** * * ** ***** *** * ** * * **\par + AIALNLERDS VGAVVMGPYA DLAEGMKVKC TGRILEVPVG RGLLGRVVNT LGAPIDGKGP\par + H 61 71 81 91 101 111\par + V 121 131 141 151 161 171\par + IGEEERWAIH RAAPSYEELS NSQELLETGI KVIDLMCPFA KGGKVGLFGG AGVGKTVNMM\par + * ** * ** * * * * * * ***\par + LDHDGFSAVE AIAPGVIERQ SVDQPVQTGY KAVDSMIPIG RGQRELIIGD RQTGKTALAI\par + H 121 131 141 151 161 171\par + V 181 191 201 211 221 231\par + ELIRNIAIEH SGYS-VFAGV GERTREGNDF YHEMTDSNVI DKVSLVYGQM NEPPGNRLRV\par + * * ** * * *\par + DAI--INQRD SGIKCIYVAI GQKASTISNV VRKLEEHGAL ANTIVVVATA SESAALQYLA\par + H 181 191 201 211 221 231\par + V 241 251 261 271 281 291\par + ALTGLTMAEK FRDEGRDVLL FVDNIYRYTL AGTEVSALLG RMPSAVGYQP TLAEEMGVLQ\par + * * *** * * * * * * ** * * *\par + RMPVALMGEY FRDRGEDALI IYDDLSKQAV AYRQISLLLR RPPGREAFPG DVFYLHSRLL\par + H 241 251 261 271 281 291\par + V 301 311 321 331 341 351\par + ERITST---- ---------- -KTGSITSVQ AVYVPADDLT DPSPATTFAH LDATVVLSRQ\par + ** **** * * * * * *\par + ERAARVNAEY VEAFTKGEVK GKTGSLTALP IIETQAGDVS AFVPTNVISI TDGQIFLETN\par + H 301 311 321 331 341 351\par + V 361 371 381 391 401 411\par + IASLGIYPAV DPLDSTSRQL DPLVVGQEHY DTAR----GV QSILQRYQEL KDIIAILGMD\par + ** *** * * ** * * * * * **\par + LFNAGIRPAV NPGISVSR-- ---VGGAAQT KIMKKLSGGI RTALAQYREL AAFSQFAS--\par + H 361 371 381 391 401 411\par + V 421 431 441 451 461 471\par + 
ELSEEDKLVV ARARKIQRFL SQ----PFFV AE----VFTG SPGKYVSLKD --TIRGFKGI\par + * * * * * * * * * * * *\par + DLDDATRKQL DHGQKVTELL KQKQYAPMSV AQQSLVLFAA ERG-YLADVE LSKIGSFEAA\par + H 421 431 441 451 461 471\par + V 481 491 501 511 521\par + MEG--EYDHL P-EQAFYMVG SIEEAVE--- --------KA KKL*\par + ** * * * * *\par + LLAYVDRDHA PLMQEINQTG GYNDEIEGKL KGILDSFKAT QSW*\par + H 481 491 501 511 521\par +Conservation 22.5%\par +\pard \li480\ri540\sl220\keepn\box\brdrth Number of padding characters inserted 63 and 10\par +\pard\plain \s8\qj\fi-1140\li1140\sb60\sa300\sl240\tx1140 \f21\fs20 Figure 14.8\tab A typical output from "Align the sequences". The horizontal and vertical sequences are labelled H and V.\par +\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.8\tab Comparing a sequence against a library of sequences\par +\pard\plain \s4\qj\sa120\sl280 \f20 +The program SIPL is used for comparing a probe sequence against a whole library of sequences. The searches are very fast and use the "Quick scan" algorithm described above to produce a list of matching sequences sorted in score order, and optionally, this +is followed by the production of optimal alignments using the Myers and Miller (5) algorithm. The program will search the whole of a library or restrict its search using a list of entry names. The list of + entry names can be used either as a list of sequences to search or conversely as a list of sequences to exclude from a search.\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select SIPL.\par +2.\tab Select "Personal file".\par +3.\tab Select "Format".\par +4.\tab Define "Name of sequence file". The name of the file containing the probe sequence.\par +5.\tab Define "Name of results file".\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Display alignments". The alternative will stop after producing a list of the best matching sequences.\par +7.\tab Define "Minimum library sequence length". 
This permits the search to skip sequences that are too short to be of interest.\par +8.\tab Define "Maximum number of scores to list". The maximum number of sequences that will be included in the results file.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab +Define "Identity score". This is the minimum number of consecutive sequence characters that will be counted as a match. Only matches of at least this length will be included in the overall score. For proteins maximum sensitivity is gained using a value +of 1, but for nucleic acids values of 4 or 6 are necessary to achieve reasonable speed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Number of sd above mean". This means the number of standard deviations above the mean that a diagonal must score in order for it to be scanned using the proportional algorithm.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 11.\tab Define "Odd window length". This is the window size for the rescanning of high scoring diagonals using the proportional algorithm.\par +12.\tab Define "Proportional score". The score used by the proportional algorithm. It depends on the window length and the score matrix.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 13.\tab Define "Minimum global score". This is the total score achieved using the proportional algorithm when all the diagonals scoring the defined number of standard deviations above the mean, are rescanned. +\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 14.\tab Define "Penalty for starting a gap". This is for the alignment algorithm.\par +15.\tab Define "Penalty for each residue in gap". See above.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab Select a library to search. The default library will reflect the composition of the probe sequence. That is, a probe sequence that is less than 85% acgt will be guessed to be a protein.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 17.\tab Select "Search whole library". 
The alternatives allow the search to be restricted using a list of entry names.\par +\pard\plain \s4\qj\sa120\sl280 \f20 The search will start. A large number of parameters are required but for normal use the default value can be taken for them all. A worked example is shown in figure 14.9.\par +\pard\plain \li220\ri240\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 SIPL (Similarity investigation program (Library)) V3.0 June 1991\par +\pard \li220\ri240\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Author\: Rodger Staden\par + Compares a probe protein or nucleic acid\par + sequence against a library of sequences\par +\par +Select probe sequence\par + Select sequence source\par + X 1 Personal file \par + 2 Sequence library\par + ? Selection (1-2) (1) =2\par + Select a library\par + 1 EMBL nucleotide library \par + X 2 SWISSPROT protein library\par + 3 PIR protein library \par + ? Selection (1-3) (2) =\par +Library is in EMBL format with indexes\par + Select a task\par + X 1 Get a sequence \par + 2 Get annotations \par + 3 Get entry names from accession numbers \par + 4 Search titles for keywords \par + 5 Search keyword index for keywords \par + ? Selection (1-5) (1) =\par + ? Entry name=bacr$halha\par +DE BACTERIORHODOPSIN PRECURSOR (BR) (GENE NAME\: BOP). \par + Sequence length= 262\par + Sequence composition\par + A C S T P A G N D E Q B Z H\par + N 0. 14. 19. 12. 30. 26. 3. 10. 11. 4. 0. 0. 0.\par + % 0.0 5.3 7.3 4.6 11.5 9.9 1.1 3.8 4.2 1.5 0.0 0.0 0.0\par + W 0. 1219. 1921. 1165. 2132. 1483. 342. 1151. 1420. 513. 0. 0. 0.\par +\par +A R K M I L V F Y W - X ? \par +N 7. 7. 10. 15. 39. 23. 13. 11. 8. 0. 0. 0. 0.\par +% 2.7 2.7 3.8 5.7 14.9 8.8 5.0 4.2 3.1 0.0 0.0 0.0 0.0\par +W 1093. 897. 1312. 1697. 4413. 2280. 1913. 1795. 1490. 0. 0. 0. 0.\par +Total molecular weight= 28256.254\par +? Results file=sipl.res\par +? Display alignments (y/n) (y) =\par +? Minimum library sequence length (10-20000) (209) =\par +? 
Maximum number of scores to list (1-10000) (20) =10\par +? Identity score (1-3) (1) =\par +? Number of sd above mean (0.00-10.00) (3.00) =\par +? Odd window length (1-31) (11) =\par +? Proportional score (1-297) (132) =\par +? Minimum global score (1-69168) (1729) =\par +? Penalty for starting a gap (1-100) (10) =\par +? Penalty for each residue in gap (1-100) (10) =\par +Select a library\par + 1 EMBL nucleotide library \par +X 2 SWISSPROT protein library\par + 3 PIR protein library \par + 4 Personal file in PIR format \par +? Selection (1-4) (2) =\par +Library is in EMBL format with indexes\par +Select a task\par +X 1 Search whole library \par + 2 Search only a list of entries \par + 3 Search all but a list of entries \par +? Selection (1-3) (1) =3\par +? File of entry names=skip.nam\par + 21794 entries processed, 25 above cutoff, sorting now\par +Entries exceeding sd cutoff= 4439\par +Mean number of diagonals above span cutoff 1.32012\par +List in score order\par + 31007 BACA$HALSA DE ARCHAERHODOPSIN PRECURSOR (AR). \par + 12177 BACH$NATPH DE HALORHODOPSIN PRECURSOR (HR) (GENE NAME\: HOP). \par + 10999 BACH$HALSP DE HALORHODOPSIN PRECURSOR (HR) (GENE NAME\: HOP). \par + 3999 HYAC$ECOLI DE HYPOTHETICAL 27.6 KD PROTEIN IN HYAB 3'REGION (GENE NAM\par + 2670 OPS4$DROME DE OPSIN RH4 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NA\par + 2573 PYR1$MESAU DE CAD PROTEIN (CONTAINS\: GLUTAMINE-DEPENDENT CARBAMOYL-PH\par + 2328 PFLA$ECOLI DE PYRUVATE FORMATE-LYASE ACTIVATING ENZYME. \par + 2194 DCOP$CANAL DE OROTIDINE 5'-PHOSPHATE DECARBOXYLASE (EC 4.1.1.23) (OMP\par + 2145 BCM1$HUMAN DE LYMPHOCYTE ACTIVATION MARKER BLAST-1 PRECURSOR (BCM1 SU\par + 2103 LAG3$HUMAN DE LAG-3 PROTEIN PRECURSOR (FDC PROTEIN) (GENE NAME\: LAG3 \par + BACA$HALSA DE ARCHAERHODOPSIN PRECURSOR (AR). 
\par + V 1 11 21 31 41 51\par + MLELLPTAVE GVSQAQITGR PEWIWLALGT ALMGLGTLYF LVKGMGVSDP DAKKFYAITT\par + * ** ** ** ** ** ** ** *** ** * * * ** \par + M-DPIALTAA VGADLLGDGR PETLWLGIGT LLMLIGTFYF IVKGWGVTDK EAREYYSITI\par + H 1 11 21 31 41 51\par + V 61 71 81 91 101 111\par + LVPAIAFTMY LSMLLGYGLT MVPFGGEQNP IYWARYADWL FTTPLLLLDL ALLVDADQGT\par + *** ** * *** * *** * * * ** ******* ********** *** * \par + LVPGIASAAY LSMFFGIGLT EVQVGSEMLD IYYARYADWL FTTPLLLLDL ALLAKVDRVS\par + H 61 71 81 91 101 111\par + V 121 131 141 151 161 171\par + ILALVGADGI MIGTGLVGAL TKVYSYRFVW WAISTAAMLY ILYVLFFGFT SKAESMRPEV\par + * *** * ** ******* * * * ** * ** * * ***\par + IGTLVGVDAL MIVTGLVGAL SHTPLARYTW WLFSTICMIV VLYFLATSLR AAAKERGPEV\par + H 121 131 141 151 161 171\par + V 181 191 201 211 221 231\par + ASTFKVLRNV TVVLWSAYPV VWLIGSEGAG IVPLNIETLL FMVLDVSAKV GFGLILLRSR\par + **** * *** *** * ** **** * * ***** ****** *** *** ******\par + ASTFNTLTAL VLVLWTAYPI LWIIGTEGAG VVGLGIETLL FMVLDVTAKV GFGFILLRSR\par + H 181 191 201 211 221 231\par + V 241 251 261\par + AIFGEAEAPE PSAGDGAAAT SD\par + ** * **** **** * *\par + AILGDTEAPE PSAG-AEASA AD\par + H 241 251 261\par +Conservation 56.1%\par +\pard \li220\ri240\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Number of padding characters inserted 0 and 2\par +\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 14.9\tab A run of SIPL using an entry from a sequence library and a file of entries to be excluded from the search.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab +The variants on the proportional algorithm are selected by setting parameters using a special menu. This includes the facility to switch off the main diagonal for all options, which is useful when comparing a sequence against itself.\par +2.\tab For nucleotide sequences the program also has a function to complement a sequence. 
If the sequence on one axis is the complement of that on the other, the plots will show possible base pairing.\par +3.\tab When the cross hair is being employed, in addition to the standard special keys, the letter m will produce a display showing all the identical sequence characters around the cross hair position. The display is in the form of a matrix.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab +Users should not be misled by the "Quick scan" algorithm. Its function is to perform rapid comparisons. The plots it produces may look quite striking because they will contain almost no background, however such plots tell nothing about the significance +of the similarities displayed.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab By using the "Reposition plots" function users can display several dot matrix plots on the screen at the same time. In this way plots from several pairs of sequence comparisons can be viewed together. +\par +6.\tab The library search program SIPL is of limited use for searching the nucleic acid libraries because it does not deal properly with sequences longer than 20,000 characters, but simply truncates them.\par +\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par +\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1. Staden, R. 1982. An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences. {\i Nucl. Acids Res}. {\b 10(9)}\:2951-2961.\par +2. McLachlan, A.D. 1971. Test for comparing related amino acid sequences. {\i J. Mol. Biol.} {\b 61}\:409-424.\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3. Schwartz, R.M. and Dayhoff, M.O. 1978. Matrices for detecting distant relationships. (in) {\i Atlas of Protein Sequence and Structure,} {\b 5 suppl. 3}\:353-358, Nat. Biomed. Res. Found., Washington D.C. +\par +\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4. Lipman, D.J. and Pearson, W.R. 1985. Rapid and sensitive protein similarity searches. 
{\i Science} {\b 227}\:1435-1441.\par +5.\tab Myers, E.W. and Miller, W. 1988. Optimal alignments in linear space. {\i Comput. Applic. Biosci}., {\b 4}, 11-17.\par +} diff --git a/doc/ted.PS b/doc/ted.PS new file mode 100644 index 0000000..88ef25a --- /dev/null +++ b/doc/ted.PS @@ -0,0 +1,3033 @@ +%! for use by dvi2ps Version 2.00 +% $Header: tex.ps,v 2.0 88/06/07 15:12:32 peterd Rel2 $ +% a start (Ha!) at a TeX mode for PostScript. +% The following defines procedures assumed and used by program "dvi2ps" +% and must be downloaded or sent as a header file for all TeX jobs. + +% By: Neal Holtz, Carleton University, Ottawa, Canada +% +% +% June, 1985 +% Last Modified: Aug 25/85 +% oystr 12-Feb-1986 +% Changed @dc macro to check for a badly formed bits in character +% definitions. Can get a <> bit map if a character is not actually +% in the font file. This is absolutely guaranteed to drive the +% printer nuts - it will appear that you can no longer define a +% new font, although the built-ins will still be there. +% mackay 4-Jan-1988 +% Changed size of character array to reflect gf usage (256 characters) + +% To convert this file into a downloaded file instead of a header +% file, uncomment all of the lines beginning with %-% + +%-%0000000 % Server loop exit password +%-%serverdict begin exitserver +%-% systemdict /statusdict known +%-% {statusdict begin 9 0 3 setsccinteractive /waittimeout 300 def end} +%-% if + +/TeXDict 200 dict def % define a working dictionary +TeXDict begin % start using it. + + % units are in "dots" (300/inch) +/Resolution 300 def +/Inch {Resolution mul} def % converts inches to internal units + +/Mtrx 6 array def + +%%%%%%%%%%%%%%%%%%%%% Page setup (user) options %%%%%%%%%%%%%%%%%%%%%%%% + +% dvi2ps will output coordinates in the TeX system ([0,0] 1" down and in +% from top left, with y +ive downward). The default PostScript system +% is [0,0] at bottom left, y +ive up. 
The Many Matrix Machinations in +% the following code are an attempt to reconcile that. The intent is to +% specify the scaling as 1 and have only translations in the matrix to +% properly position the text. Caution: the default device matrices are +% *not* the same in all PostScript devices; that should not matter in most +% of the code below (except for landscape mode -- in that, rotations of +% -90 degrees resulted in the rotation matrix [ e 1 ] +% [ 1 e ] +% where the "e"s were almost exactly but not quite unlike zeros). + +/@letter + { letter initmatrix + 72 Resolution div dup neg scale % set scaling to 1. + 310 -3005 translate % move origin to top (these are not exactly 1" + Mtrx currentmatrix pop % and -10" because margins aren't set exactly right) + } def + % note mode is like letter, except it uses less VM +/@note + { note initmatrix + 72 Resolution div dup neg scale % set scaling to 1. + 310 -3005 translate % move origin to top + Mtrx currentmatrix pop + } def + +/@landscape + { letter initmatrix + 72 Resolution div dup neg scale % set scaling to 1. +% -90 rotate % it would be nice to be able to do this + Mtrx currentmatrix 0 0.0 put % but instead we have to do things like this because what + Mtrx 1 -1.0 put % should be zero terms aren't (and text comes out wobbly) + Mtrx 2 1.0 put % Fie! This likely will not work on QMS printers + Mtrx 3 0.0 put % (nor on others where the device matrix is not + Mtrx setmatrix % like it is on the LaserWriter). + 300 310 translate % move origin to top + Mtrx currentmatrix pop + } def + +/@legal + { legal initmatrix + 72 Resolution div dup neg scale % set scaling to 1.
+ 295 -3880 translate % move origin to top + Mtrx currentmatrix pop + } def + +/@manualfeed + { statusdict /manualfeed true put + statusdict /manualfeedtimeout 300 put % 5 minutes + } def + % n @copies - set number of copies +/@copies + { /#copies exch def + } def + +%%%%%%%%%%%%%%%%%%%% Procedure Definitions %%%%%%%%%%%%%%%%%%%%%%%%%% + +/@newfont % id @newfont - -- initialize a new font dictionary + { /newname exch def + pop + newname 7 dict def % allocate new font dictionary + newname load begin + /FontType 3 def + /FontMatrix [1 0 0 -1 0 0] def + /FontBBox [0 0 1 1] def +% mackay 4-Jan-1987 changed size of array from 128 to 256 for gf fonts + /BitMaps 256 array def + /BuildChar {CharBuilder} def + /Encoding 256 array def + 0 1 255 {Encoding exch /.undef put} for + end + newname newname load definefont pop + } def + + +% the following is the only character builder we need. it looks up the +% char data in the BitMaps array, and paints the character if possible. +% char data -- a bitmap descriptor -- is an array of length 6, of +% which the various slots are: + +/ch-image {ch-data 0 get} def % the hex string image +/ch-width {ch-data 1 get} def % the number of pixels across +/ch-height {ch-data 2 get} def % the number of pixels tall +/ch-xoff {ch-data 3 get} def % number of pixels to left of origin +/ch-yoff {ch-data 4 get} def % number of pixels below origin +/ch-tfmw {ch-data 5 get} def % spacing to next character + +/CharBuilder % fontdict ch CharBuilder - -- image one character + { /ch-code exch def % save the char code + /font-dict exch def % and the font dict.
+ /ch-data font-dict /BitMaps get ch-code get def % get the bitmap descriptor for char + ch-data null eq not + { ch-tfmw 0 ch-xoff neg ch-yoff neg ch-width ch-xoff sub ch-height ch-yoff sub + setcachedevice + ch-width ch-height true [1 0 0 1 ch-xoff ch-yoff] + {ch-image} imagemask + } + if + } def + + +/@sf % fontdict @sf - -- make that the current font + { setfont() pop + } def + + % in the following, the font-cacheing mechanism requires that + % a name unique in the particular font be generated + +/@dc % char-data ch @dc - -- define a new character bitmap in current font + { /ch-code exch def +% ++oystr 12-Feb-86++ + dup 0 get + length 2 lt + { pop [ <00> 1 1 0 0 8.00 ] } % replace <> with null + if +% --oystr 12-Feb-86-- + /ch-data exch def + currentfont /BitMaps get ch-code ch-data put + currentfont /Encoding get ch-code + dup ( ) cvs cvn % generate a unique name simply from the character code + put + } def + +/@bop0 % n @bop0 - -- begin the char def section of a new page + { + } def + +/@bop1 % n @bop1 - -- begin a brand new page + { pop + erasepage initgraphics + Mtrx setmatrix + /SaveImage save def() pop + } def + +%-- tjh sept. 87: if this page has a mac drawing on it, we have to +%-- use showpage in the md dictionary. 
+/@eop % - @eop - -- end a page + { + userdict /md known { + userdict /md get type /dicttype eq { + md /MacDrwgs known { + md begin showpage end + }{ + showpage + } ifelse + }{ + showpage + } ifelse + }{ + showpage + } ifelse + SaveImage restore() pop + } def + +/@start % - @start - -- start everything + { @letter % (there is not much to do) + } def + +/@end % - @end - -- done the whole shebang + { end + } def + +/p % x y p - -- move to position + { moveto + } def + +/r % x r - -- move right + { 0 rmoveto + } def + +/s % string s - -- show the string + { show + } def + +/c % ch c - -- show the character (code given) + { c-string exch 0 exch put + c-string show + } def + +/c-string ( ) def + +/ru % dx dy ru - -- set a rule (rectangle) + { /dy exch neg def % because dy is height up from bottom + /dx exch def + /x currentpoint /y exch def def % remember current point + newpath x y moveto + dx 0 rlineto + 0 dy rlineto + dx neg 0 rlineto + closepath fill + x y moveto + } def + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% the \special command junk +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% The structure of the PostScript produced by dvi2ps for \special is: +% @beginspecial +% - any number of @hsize, @hoffset, @hscale, etc., commands +% @setspecial +% - the user's file of PostScript commands +% @endspecial + +% The @beginspecial command recognizes whether the Macintosh Laserprep +% has been loaded or not, and redefines some Mac commands if so. +% The @setspecial handles the user's shifting, scaling, clipping commands + +%-- tjh sept. 87: made changes to allow postscript and macdrawing +%-- to be inserted with version 65 of the md dictionary. Many bugs +%-- were fixed: +%-- vo changed to vof, name conflict with md +%-- vs changed to vsz, name conflict with md +%-- substantially changed @setspecial and @MacSetUp +%-- Also, made changes to allow users to specify offsets +%-- and clip rectangles in inches.
+ +% The following are user settable options from the \special command. + +/@SpecialDefaults + { /hs 8.5 72 mul def + /vsz 11 72 mul def + /ho 0 def + /vof 0 def + /hsc 1 def + /vsc 1 def + /CLIP false def + } def + +% d @hsize - specify a horizontal clipping dimension +% these 2 are executed before the MacDraw initializations +/@hsize {72 mul /hs exch def /CLIP true def} def +/@vsize {72 mul /vsz exch def /CLIP true def} def + +% d @hoffset - specify a shift for the drwgs +/@hoffset {72 mul /ho exch def} def +/@voffset {72 mul /vof exch def} def + +% s @hscale - set scale factor +/@hscale {/hsc exch def} def +/@vscale {/vsc exch def} def + +/@setclipper + { hsc vsc scale + CLIP + { newpath 0 0 moveto hs 0 rlineto 0 vsz rlineto hs neg 0 rlineto closepath clip } + if + } def + +% this will be invoked as the result of a \special command (for the +% inclusion of PostScript graphics). The basic idea is to change all +% scaling and graphics back to defaults, but to shift the origin +% to the current position on the page. Due to TeXnical difficulties, +% we only set the y-origin. The x-origin is set at the left edge of +% the page. + +/@beginspecial + { gsave /SpecialSave save def + % the following magic incantation establishes the current point as + % the users origin, and reverts back to default scalings, rotations + currentpoint transform initgraphics itransform translate + @SpecialDefaults % setup default offsets, scales, sizes + @MacSetUp % fix up Mac stuff + } def + + +%-- tjh: assume this is raw postscript, but save some state in case its not. +/@setspecial + { + /specmtrx matrix currentmatrix def + ho vof translate @setclipper + } def + + +/@endspecial + { SpecialSave restore + grestore + } def + + +% - @MacSetUp - turn-off/fix-up all the MacDraw stuff that might hurt us + % we depend on 'psu' being the first procedure executed + % by a Mac document. We redefine 'psu' to adjust page + % translations, and to do all other the fixups required. 
+ % This stuff will not harm other included PS files +/@MacSetUp + { userdict /md known % if md is defined + { userdict /md get type /dicttype eq % and if it is a dictionary + { + md begin % then redefine some stuff + /psu % redfine psu to set origins, etc. + /psu load + % this procedure contains almost all the fixup code + { +% /letter {} def % it is bad manners to execute the real +% /note {} def % versions of these (clears page image, etc.) +% /legal {} def + /MacDrwgs true def + specmtrx setmatrix % restore pre-@setspecial state. + initclip % ditto + % change smalls to prevent page clearing. + /smalls [ lnop lnop lnop lnop lnop lnop lnop lnop lnop ] def + 0 0 0 0 ppr astore pop % prevents origin translation. + % redifine cp, do the showpage later, see @eop + /cp { + pop + pop + pm restore + } def % no printing of pages + } + concatprocs + def + /od + % redefine od to translate and scale. + % redfine load to set clipping region. + /od load + { + ho vof translate + hsc vsc scale + CLIP { + /nc + /nc load + { newpath 0 0 moveto hs 0 rlineto 0 vsz rlineto + hs neg 0 rlineto closepath clip } + concatprocs + def + } if + } + concatprocs + def + end } + if } + if + } def + +% p1 p2 concatprocs p - concatenate procedures +/concatprocs + { /p2 exch cvlit def + /p1 exch cvlit def + /p p1 length p2 length add array def + p 0 p1 putinterval + p p1 length p2 putinterval + p cvx + } def + +end % revert to previous dictionary +TeXDict begin @start +%%Title: ted.dvi +%%Creator: dvi2ps +%%EndProlog +5 @bop0 +[ 300 ] /cmr12.300 @newfont +cmr12.300 @sf +[ 24 33 -2 0 24.387] 50 @dc +[<70F8F8F870> 8 5 -4 0 13.548] 46 @dc +[ 32 34 -2 0 37.249] 68 @dc +[<00FC000703000E00801C0040380020780020700000F00000F00000F00000F00000F00000FFFFE0F000E07000E07801E03801 + C01C01C00C038007070001FC00> 24 21 -1 0 21.677] 101 @dc +[<0FC1E03C2390781708F00F08F00708F00708F007087007007807003C07001E070007C70000FF000007000007000007001807 + 003C0E003C0C001838000FE000> 24 21 -2 0 24.387] 97 @dc +[ 24 21 -1 0 
18.968] 114 @dc +[<4020101008080404040474FCFCF870> 8 15 -4 10 13.548] 44 @dc +[<81FC00C60700C80180F000C0E000C0C00060C000608000708000708000708000700000700000F00000F00001E00007E0003F + C003FF800FFF001FFE003FF0007F0000780000F00000F00000E00020E00020E00020E00060E000606000607000E03001E018 + 02600C0C6003F020> 24 36 -3 1 27.097] 83 @dc +[ 32 21 -1 0 27.097] 110 @dc +[<01F0FE070CF00C02E01801E03800E07800E07000E0F000E0F000E0F000E0F000E0F000E0F000E0F000E07000E07800E03800 + E01C01E00C02E00704E001F8E00000E00000E00000E00000E00000E00000E00000E00000E00000E00000E00000E00001E000 + 0FE00000E0> 24 35 -2 0 27.097] 100 @dc +[<01F0030807080E040E040E040E040E040E040E000E000E000E000E000E000E000E000E000E000E00FFF83E001E000E000600 + 060006000200020002000200> 16 31 -1 0 18.968] 116 @dc +[<0000007C00FFFC01E2000FC003C100078007C08007800FC08007800F808007800F800007800F800007800F800007800F0000 + 07800F000007800F000007800F000007800E000007801E000007801C00000780380000078070000007FFE0000007803C0000 + 07800E00000780078000078007C000078003C000078003E000078003E000078003E000078003E000078003E000078003C000 + 078007C000078007800007800E00000F803C0000FFFFE00000> 40 35 -2 1 35.894] 82 @dc +[<00200040008001000300060004000C000C00180018003000300030007000600060006000E000E000E000E000E000E000E000 + E000E000E000E000E000E000E0006000600060007000300030003000180018000C000C000400060003000100008000400020> 16 50 -4 13 18.968] 40 @dc +[ 16 33 -4 0 24.387] 49 @dc +[<0FC000103000201800700C007806007807003003000003800003800001C00001C00001C003E1E00619E00C05E01805E03803 + E07003E07001E0F001E0F001E0F001E0F001E0F001E0F001C0F001C0F001C07003807003803803801807000C0600060C0001 + F000> 24 34 -2 1 24.387] 57 @dc +[<800040002000100018000C000400060006000300030001800180018001C000C000C000C000E000E000E000E000E000E000E0 + 00E000E000E000E000E000E000E000C000C000C001C0018001800180030003000600060004000C0018001000200040008000> 16 50 -3 13 18.968] 41 @dc +[ 40 34 -2 0 36.563] 78 @dc 
+[<00FC7F0003827800060170000E00F0000E00F0000E0070000E0070000E0070000E0070000E0070000E0070000E0070000E00 + 70000E0070000E0070000E0070000E0070000E0070001E00F000FE07F0000E007000> 32 21 -1 0 27.097] 117 @dc +[<01F8000706000C01001C0080380040780040700000F00000F00000F00000F00000F00000F00000F000007000007800003803 + 001C07800C078007030001FE00> 24 21 -2 0 21.677] 99 @dc +[ 40 35 -2 0 36.563] 65 @dc +[ 16 34 -1 0 13.548] 105 @dc +[<8FC0D030E018C008C00C800C800C801C003C01F80FF03FE07F80F000E008C008C008C018601830780F88> 16 21 -2 0 19.239] 115 @dc +[ 24 31 -1 10 27.097] 112 @dc +[<03F0000C1C00100F002007804007804003C0F003C0F803E0F803E07003E02003E00003E00003C00003C0000780000780000F + 00001C0003F000003800000E00000F000007000007800007803807C07807C07803C07807C04007C02007801007000C1E0003 + F800> 24 34 -2 1 24.387] 51 @dc +[ 40 34 -2 0 36.563] 72 @dc +[ 16 35 -1 0 13.548] 108 @dc +[ 32 34 -2 0 30.475] 76 @dc +[<0007F00000003C0C080000E003180001C000B800038000B80007000078000F000078001E000078001E000078003C00007800 + 3C000078007C000078007800007800780000F800F8001FFF00F800000000F800000000F800000000F800000000F800000000 + F800000000F800000000780000080078000008007C000008003C000018003C000018001E000018001E000038000F00003800 + 0700007800038000F80001C001B80000E0021800003C0C18000007F00800> 40 36 -3 1 38.270] 71 @dc +[ 32 34 -2 0 33.185] 80 @dc +[<083F000C41C00C80600F00700E00380E003C0E001C0E001E0E001E0E001E0E001E0E001E0E001E0E001E0E001C0E003C0E00 + 380F00300E80600E61C00E1F000E00000E00000E00000E00000E00000E00000E00000E00000E00000E00000E00001E0000FE + 00000E0000> 24 35 -1 0 27.097] 98 @dc +[ 40 21 -1 0 40.645] 109 @dc +[ 300 ] /cmbx12.300 @newfont +cmbx12.300 @sf +[ 32 34 -2 0 35.226] 70 @dc +[ 16 36 -2 0 15.566] 105 @dc +[<01FF00000FFFE0003F01F8007C007C0078003C00F0001E00F0001E00F0001E00F0001E0070003E003800FC001FFFFC000FFF + F8001FFFF0003FFF800038000000300000003000000013FC00001FFF00001F0F80003E07C0003C03C0007C03E0007C03E000 + 7C03E0007C03E0007C03E0003C03C0003E07CF001F0F8F000FFF7F0003FC1E00> 32 33 -2 11 
28.019] 103 @dc +[<01FC3FC007FF3FC00F81BE001F00FE001F007E001F003E001F003E001F003E001F003E001F003E001F003E001F003E001F00 + 3E001F003E001F003E001F003E001F003E001F003E001F003E001F003E00FF01FE00FF01FE00> 32 22 -2 0 31.133] 117 @dc +[ 24 22 -2 0 22.888] 114 @dc +[<00FF0003FFC00FC0701F00303E00187E00007C00007C0000FC0000FC0000FC0000FFFFF8FFFFF8FC00F8FC00F87C00F87C00 + F03E01F01E01E00F87C007FF8000FE00> 24 22 -2 0 25.569] 101 @dc +[<7FFFE07FFFE001F80001F80001F80001F80001F80001F80001F80001F80001F80001F80001F80001F80001F80001F80001F8 + 0001F80001F80001F80001F80001F80001F80001F80001F80001F80001F800FFF800FFF80001F800007800001800> 24 32 -4 0 28.019] 49 @dc +[ 16 35 -2 0 15.566] 108 @dc +[ 32 22 -2 0 31.133] 110 @dc +[<01FC3FC007FF3FC00F83BE001E00FE003E007E007C003E007C003E00FC003E00FC003E00FC003E00FC003E00FC003E00FC00 + 3E00FC003E00FC003E007C003E007E003E003E003E001F007E000F81FE0007FFBE0001FC3E0000003E0000003E0000003E00 + 00003E0000003E0000003E0000003E0000003E0000003E0000003E0000003E000001FE000001FE00> 32 35 -2 0 31.133] 100 @dc +[<387CFEFEFE7C38> 8 7 -4 0 15.566] 46 @dc +cmr12.300 @sf +[ 32 34 -2 0 31.830] 70 @dc +[<03FE000E03803800E0600030600030C00018C00018C000184000186000303800F00FFFE00FFFC01FFE001800001800001000 + 0010000019F0000F1C000E0E001C07001C07003C07803C07803C07803C07801C07001C07000E0E18071E1801F198000070> 24 33 -1 11 24.387] 103 @dc +[ 32 35 -1 0 27.097] 104 @dc +[<01FC000707000E03801C01C03800E07800F0700070F00078F00078F00078F00078F00078F00078F000787000707000703800 + E01800C00C018007070001FC00> 24 21 -1 0 24.387] 111 @dc +[<00600600000060060000006006000000F00F000000F00F000000F00D000001C81C800001C81C800001C81880000384384000 + 038438400003843040000702702000070270200007026020000E01E010000E01E010000E01C018001C01C018001E01E03C00 + FF8FF8FF00> 40 21 -1 0 35.225] 119 @dc +[<381C7C3EFC7EFC7EB85C8040804080408040402040202010201010080804> 16 15 -6 -20 24.387] 92 @dc +[<4020201010081008080408040402040204020402743AFC7EFC7EF87C7038> 16 15 -2 -20 24.387] 34 @dc 
+[<7FF8000780000700000700000700000700000700000700000700000700000700000700000700000700000700000700000700 + 00070000070000070000FFF800070000070000070000070000070000070000070000070000070000030F00038F00018F0000 + C600003C00> 24 35 0 0 14.903] 102 @dc +[<03FFFF00000FC000000780000007800000078000000780000007800000078000000780000007800000078000000780000007 + 8000000780000007800000078000000780000007800000078000000780000007800000078000000780008007800480078004 + 8007800480078004C007800C40078008400780084007800860078018780780787FFFFFF8> 32 34 -2 0 35.225] 84 @dc +[ 16 2 -1 -10 16.258] 45 @dc +[<3C0000430000F18000F08000F0400000400000200000200000200000100000100000380000380000380000740000740000E2 + 0000E20000E20001C10001C1000380800380800380800700400700400E00200E00200E00301E0078FFC1FE> 24 31 -1 10 25.742] 121 @dc +[ 24 21 -1 0 21.677] 122 @dc +[<00100000380000380000380000740000740000E20000E20000E20001C10001C1000380800380800380800700400700400E00 + 200E00200E00301E0078FFC1FE> 24 21 -1 0 25.742] 118 @dc +[<000FFE0000E00000E00000E00000E00000E00000E00000E00000E00000E001F0E0070CE00C02E01C01E03801E07800E07000 + E0F000E0F000E0F000E0F000E0F000E0F000E0F000E07800E07800E03801E01C01600E026007046001F820> 24 31 -2 10 25.742] 113 @dc +[<4020101008080404040474FCFCF870> 8 15 -4 -20 13.548] 39 @dc +[<7FE3FF8007007800070070000700700007007000070070000700700007007000070070000700700007007000070070000700 + 700007007000070070000700700007007000070070000700700007007000FFFFFFC007007000070070000700700007007000 + 07007000070070000700700007007000070070000380F0780180F87800C07C7800706E30001F83E0> 32 35 0 0 28.451] 11 @dc +[<7FE1FF8007003800070038000700380007003800070038000700380007003800070038000700380007003800070038000700 + 380007003800070038000700380007003800070038000700380007007800FFFFF80007000000070000000700000007000000 + 0700000007000000070000000700300007007800038078000180380000C0100000702000001FC000> 32 35 0 0 27.097] 12 @dc 
+[<3E006180F180F0C060E000E000E000E000E000E000E000E000E000E000E000E000E000E000E000E000E000E000E000E000E0 + 00E000E000E001E00FE001E00000000000000000000000000000000001C003E003E003E001C0> 16 44 2 10 14.903] 106 @dc +[ 32 34 -2 0 34.539] 66 @dc +[<03F0000C1C001006002007004003804003C08001C0E001C0F001E0F001E07001E00001E00001E00001E00001E00001C00001 + C0100380180380140700130E0010F80010000010000010000010000010000010000013E0001FF8001FFE001FFF001E070010 + 0080> 24 34 -2 1 24.387] 53 @dc +5 @bop1 +cmr12.300 @sf +237 307 p (2.) s +22 r (Dear,) s +16 r (S.) s +16 r (and) s +17 r (Staden,) s +16 r (R.) s +16 r (\(1991\)) s +16 r (Nuc.) s +22 r (Acids) s +16 r (Res.,) s +17 r (in) s +16 r (press.) s +237 367 p (3.) s +22 r (Hillier,) s +16 r (L.) s +16 r (and) s +17 r (Green,) s +16 r 80 c +-3 r 46 c +15 r (\(1991\)) s +16 r (submitted.) s +cmbx12.300 @sf +237 428 p (Figure) s +19 r 49 c +18 r (legend.) s +cmr12.300 @sf +237 488 p (Figure) s +17 r 49 c +17 r (sho) s +0 r (ws) s +15 r 97 c +17 r (\\screen) s +17 r (dump") s +17 r (of) s +17 r (the) s +17 r (ted) s +17 r (graphical) s +17 r (in) s +-1 r (terface.) s +23 r (The) s +17 r (dis-) s +164 548 p (pla) s +0 r 121 c +22 r (consists) s +24 r (of) s +23 r (the) s +24 r (con) s +-1 r (trol) s +23 r (panel) s +23 r (and) s +24 r (the) s +23 r (sync) s +0 r (hronized) s +22 r (view) s +24 r (of) s +23 r (the) s +24 r (base) s +164 608 p 112 c +1 r (osition) s +19 r (information,) s +19 r (original) s +18 r (and) s +19 r (edited) s +18 r (sequence) s +19 r (data,) s +19 r (and) s +18 r (graphical) s +19 r (rep-) s +164 668 p (resen) s +0 r (tation) s +15 r (of) s +16 r (the) s +16 r (trace) s +16 r (\(with) s +16 r (eac) s +0 r 104 c +15 r 110 c +-1 r (ucleotide's) s +15 r (trace) s +16 r 98 c +2 r (eing) s +16 r (represen) s +-1 r (ted) s +15 r 98 c +0 r 121 c +15 r 97 c +164 729 p (di\013eren) s +0 r 116 c +16 r (color\).) 
s +24 r (The) s +17 r (con) s +0 r (trol) s +16 r (panel) s +17 r (allo) s +0 r (ws) s +16 r (the) s +18 r (user) s +17 r (to) s +17 r (read) s +17 r (in) s +17 r (new) s +17 r (trace) s +18 r (\014les) s +164 789 p (\(in) s +17 r (either) s +16 r 98 c +1 r (ottom) s +17 r (or) s +16 r (top) s +17 r (strand) s +17 r (orien) s +-1 r (tation\)) s +16 r (as) s +16 r 119 c +0 r (ell) s +15 r (as) s +17 r (to) s +17 r (searc) s +-1 r 104 c +16 r (for) s +16 r 97 c +17 r (string) s +164 849 p (of) s +15 r 110 c +0 r (ucleotides) s +15 r (or) s +15 r 97 c +15 r (certain) s +16 r (base) s +15 r 112 c +2 r (osition.) s +21 r (Scroll) s +16 r (bars) s +15 r (allo) s +0 r 119 c +14 r (the) s +16 r (user) s +15 r (to) s +15 r (adjust) s +164 909 p (the) s +15 r (magni\014cation) s +15 r (of) s +15 r (or) s +14 r (scroll) s +15 r (through) s +15 r (the) s +15 r (sequence) s +15 r (and) s +15 r (trace) s +15 r (data.) s +21 r (The) s +15 r (user) s +164 969 p (ma) s +0 r 121 c +15 r (also) s +16 r 99 c +-1 r (ho) s +1 r (ose) s +16 r (to) s +16 r 99 c +-1 r (hange) s +15 r (the) s +16 r 118 c +0 r (ertical) s +15 r (magni\014cation) s +16 r (of) s +16 r (the) s +16 r (trace) s +16 r (data.) s +22 r 70 c +-3 r (ur-) s +164 1029 p (ther,) s +17 r (sequence) s +17 r (on) s +17 r (the) s +18 r (head) s +17 r (\(v) s +-1 r (ector\)) s +16 r (or) s +17 r (tail) s +18 r (\(uncertain) s +17 r (data\)) s +17 r (of) s +17 r (the) s +17 r (sequence) s +164 1090 p (ma) s +0 r 121 c +19 r 98 c +1 r 101 c +21 r (\\cuto\013) s +3 r 34 c +20 r (using) s +21 r (the) s +20 r (adjust) s +20 r (left) s +21 r (and) s +20 r (righ) s +0 r 116 c +19 r (cuto\013) s +20 r (buttons.) 
s +34 r (Bases) s +20 r (can) s +164 1150 p 98 c +1 r 101 c +17 r (inserted,) s +16 r (deleted,) s +17 r (or) s +16 r (replaced) s +17 r (as) s +16 r (with) s +17 r (an) s +0 r 121 c +15 r (ordinary) s +17 r 119 c +-1 r (ord-pro) s +1 r (cessor) s +16 r (in) s +17 r (the) s +164 1210 p (sequence) s +17 r (data) s +16 r (windo) s +0 r (w.) s +22 r (Finally) s +-3 r 44 c +16 r (the) s +17 r (sequence) s +16 r (ma) s +0 r 121 c +16 r 98 c +1 r 101 c +17 r (written) s +16 r (to) s +17 r (an) s +17 r (ascii) s +17 r (\014le) s +164 1270 p (using) s +16 r (the) s +16 r (output) s +17 r (button) s +16 r (on) s +16 r (the) s +17 r (con) s +-1 r (trol) s +15 r (panel.) s +961 2599 p 53 c +@eop +4 @bop0 +cmbx12.300 @sf +[ 40 34 -2 0 42.317] 65 @dc +[ 40 34 -2 0 38.281] 80 @dc +[ 32 34 -2 0 33.669] 76 @dc +[ 24 34 -1 0 20.870] 73 @dc +[<0003FE0000001FFFC00000FF00F00001F800380003F0000C0007C00006000F800003001F800003003F000003803F00000180 + 7F000001807E000001807E00000000FE00000000FE00000000FE00000000FE00000000FE00000000FE00000000FE00000000 + FE000000007E000001807E000001807F000001803F000003803F000003801F800007800F8000078007C0000F8003E0001F80 + 01F8003F8000FF01E380001FFF81800003FE0080> 40 34 -3 0 40.472] 67 @dc +[<03FFFFF80003FFFFF8000003F800000003F800000003F800000003F800000003F800000003F800000003F800000003F80000 + 0003F800000003F800000003F800000003F800000003F800000003F800000003F800000003F800000003F800000003F80000 + 0003F800000003F80000C003F800C0C003F800C0C003F800C0C003F800C0E003F801C0E003F801C06003F801807003F80380 + 7803F807807E03F80F807FFFFFFF807FFFFFFF80> 40 34 -2 0 38.973] 84 @dc +[<0007FC0000003FFF800000FC07E00003F001F80007E000FC000FC0007E001F80003F003F80003F803F00001F807F00001FC0 + 7F00001FC07E00000FC0FE00000FE0FE00000FE0FE00000FE0FE00000FE0FE00000FE0FE00000FE0FE00000FE0FE00000FE0 + FE00000FE07E00000FC07E00000FC07F00001FC03F00001F803F00001F801F80003F001F80003F000FC0007E0007E000FC00 + 03F001F80000FC07E000003FFF80000007FC0000> 40 34 -3 0 42.086] 79 @dc +[ 40 34 -2 0 43.816] 78 
@dc +[<80FF8000C7FFE000FF00F800FC003C00F0003C00E0001E00E0001E00C0001F00C0001F00C0001F0000003F0000003F000000 + 7F000003FF00003FFE0003FFFE000FFFFC001FFFF8003FFFF0007FFFC0007FFF0000FFE00000FF000000FC000000FC000C00 + F8000C00F8000C0078001C0078001C007C003C003C007C001F03FC0007FF8C0001FC0400> 32 34 -3 0 31.133] 83 @dc +[ 40 34 -2 0 42.951] 68 @dc +[<0001FF0000000FFFE000003F80F800007E001C0000FC000E0001F800060003F800030003F000030007F000018007F0000180 + 07F000018007F000018007F000018007F000018007F000018007F000018007F000018007F000018007F000018007F0000180 + 07F000018007F000018007F000018007F000018007F000018007F000018007F000018007F000018007F000018007F0000180 + 07F000018007F0000180FFFF803FFCFFFF803FFC> 40 34 -2 0 43.067] 85 @dc +cmr12.300 @sf +[ 16 34 -2 0 17.595] 73 @dc +[<0007E00000381C0000E0020001C0010003800080070000400E0000401E0000201C0000203C0000103C0000107C0000107800 + 001078000000F8000000F8000000F8000000F8000000F8000000F8000000F8000000F800000078000010780000107C000010 + 3C0000303C0000301C0000301E0000700E000070070000F0038000F001C0017000E00630003818300007E010> 32 36 -3 1 35.225] 67 @dc +[ 24 35 -1 0 25.742] 107 @dc +[<0003F00000001C0800000030060000006001000000E000800001C000800003C000400003C000400003800040000780002000 + 0780002000078000200007800020000780002000078000200007800020000780002000078000200007800020000780002000 + 0780002000078000200007800020000780002000078000200007800020000780002000078000200007800020000780002000 + 0780002000078000200007800070000FC000F800FFFC07FF00> 40 35 -2 1 36.563] 85 @dc +[ 24 21 -1 0 25.742] 120 @dc +[ 8 49 -5 12 13.548] 91 @dc +[ 8 49 -1 12 13.548] 93 @dc +cmbx12.300 @sf +[<0000380000000038000000007C000000007C00000000FE00000000FE00000000FE00000001FF00000001FF00000003FD8000 + 0003F980000007F9C0000007F0C0000007F0C000000FF06000000FE06000001FE03000001FC03000003FC03800003F801800 + 003F801800007F800C00007F000C0000FF00060000FE00060001FE00070001FC00030001FC00030003F800018003F8000180 + 07F80000C007F00000C0FFFF800FFEFFFF800FFE> 40 34 -1 0 42.317] 86 @dc +[ 
40 34 -2 0 39.838] 66 @dc +[<001FFFF000001FFFF0000000FE00000000FE00000000FE00000000FE00000000FE00000000FE00000000FE00000000FE0000 + 0000FE00000000FE00000000FE00000000FE00000001FE00000001FF00000003FF80000003FD80000007F8C000000FF0E000 + 000FF06000001FE07000003FC03000003FC01800007F801C0000FF000C0000FF00060001FE00070001FE00030003FC000180 + 07F80001C007F80000C0FFFF800FFEFFFF800FFE> 40 34 -1 0 42.317] 89 @dc +cmr12.300 @sf +[ 48 34 -2 0 44.692] 77 @dc +[ 40 34 -2 0 37.918] 75 @dc +[<000001E0000003F8000007F8000007FC000007FC00000F0E00000E0600000C0200000C02000FEC02007C3C0200E80E0003C8 + 1780078813C00F0801E00E0420E01E0380F03C0000783C0000787C00007C7C00007C7800003CF800003EF800003EF800003E + F800003EF800003EF800003EF800003EF800003EF800003E7800003C7800003C7C00007C7C00007C3C0000783E0000F81E00 + 00F00E0000E00F0001E0078003C003C0078000E00E0000783C00000FE000> 32 45 -3 10 37.935] 81 @dc +[ 40 34 -1 0 36.563] 88 @dc +[<000FE00000783C0000E00E0003C00780078003C00F0001E00F0001E01E0000F03E0000F83C0000787C00007C7C00007C7800 + 003CF800003EF800003EF800003EF800003EF800003EF800003EF800003EF800003EF800003E7800003C7800003C7C00007C + 7C00007C3C0000783C0000781E0000F00E0000E00F0001E0078003C003C0078000E00E0000783C00000FE000> 32 36 -3 1 37.935] 79 @dc +cmbx12.300 @sf +[ 40 34 -2 0 43.874] 75 @dc +[<0000E0000E00000000E0000E00000000F0001E00000001F0001F00000001F0001F00000003F8003F80000003F8003F800000 + 03FC007F80000007FC007FC0000007FC007FC000000FF600FFE000000FE600FE6000000FE600FE6000001FE301FC3000001F + C301FC3000001FC383FC3000003F8183F81800003F8183F81800007F80C7F81C00007F00C7F00C00007F00C7F00C0000FF00 + 6FE0060000FE006FE0060000FE007FE0060001FC003FC0030001FC003FC0030003FC003F80038003F8003F80018003F8007F + 80018007F0007F0000C007F0007F0000C00FF000FF0000E0FFFF0FFFF01FFEFFFF0FFFF01FFE> 56 34 -1 0 57.883] 87 @dc +[ 40 34 -2 0 36.782] 69 @dc +[<0003FF00C0001FFFC3C0007F80E7C001FC003FC003F0001FC007E0001FC00FC0001FC01F80001FC03F00001FC03F00001FC0 + 
7F00001FC07F00001FC07E000FFFFCFE000FFFFCFE00000000FE00000000FE00000000FE00000000FE00000000FE00000000 + FE000000007E000000C07E000000C07F000000C03F000001C03F000001C01F800003C00FC00003C007C00007C003F0000FC0 + 01F8003FC0007F00F1C0001FFFC0C00003FE0040> 40 34 -3 0 44.047] 71 @dc +[ 56 34 -2 0 53.156] 77 @dc +cmr12.300 @sf +[<70F8F8F870000000000000000000000070F8F8F870> 8 21 -4 0 13.548] 58 @dc +[<07C000187000203800401C00F01E00F80E00F80F00F80F00700F00000F00000F00000F00000F00000F00000F00000F00000F + 00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F0000 + 1F0003FFF0> 24 35 -2 1 25.056] 74 @dc +[<00020000800000030001800000070001C00000070001C00000070001C000000F8003E000000F8003E000000F8003E000001E + 40079000001E40079000001E40079000003C200F0800003C200F0800003C200F0800007C101E04000078101E04000078101E + 040000F8183E060000F0083C020000F0083C020000F0083C020001E00478010001E00478010001E00478010003C002F00080 + 03C002F0008003C002F00080078001E00040078001E00040078001E000400F0003C000200F0003C000200F0003C000701F80 + 07E000F8FFF03FFC03FE> 48 35 -1 1 50.111] 87 @dc +[<01F000071C000C06001C07003803803803807803C07001C07001C07001C0F001E0F001E0F001E0F001E0F001E0F001E0F001 + E0F001E0F001E0F001E0F001E0F001E0F001E0F001E07001C07001C07001C07001C03803803803801803000C0600071C0001 + F000> 24 34 -2 1 24.387] 48 @dc +[<01F000070C000C06001C03001803803801C03801C07001E07001E07001E0F001E0F001E0F001E0F001E0F001E0F001C0F801 + C0F80380F40300F40600F30C00F0F8007000007000007800003800003800001801801C03C00E03C00601C003008001C10000 + 7E00> 24 34 -2 1 24.387] 54 @dc +cmbx12.300 @sf +[ 40 34 -2 0 41.798] 82 @dc +cmr12.300 @sf +[<01FFF0001F00000E00000E00000E00000E00000E00000E00000E00FFFFF8800E00400E00200E00200E00100E00100E00080E + 00040E00040E00020E00020E00010E00008E00008E00004E00004E00002E00001E00001E00000E00000E00000600000200> 24 33 -1 0 24.387] 52 @dc +4 @bop1 +cmbx12.300 @sf +164 307 p (APPLICA) s +-4 r (TIONS) s +18 r (AND) s +19 r (CONCLUSIONS) s +cmr12.300 @sf +164 400 p 
(In) s +18 r (the) s +18 r (C.) s +19 r (elegans) s +18 r (genome) s +18 r (sequencing) s +19 r (pro) s +2 r (ject,) s +19 r (data) s +18 r (from) s +18 r (the) s +19 r (ABI) s +18 r (or) s +18 r (A.L.F.) s +164 460 p (sequencing) s +20 r (mac) s +0 r (hines') s +19 r (computers) s +20 r (are) s +20 r (transferred) s +20 r (to) s +20 r (Sun) s +21 r 119 c +-1 r (orkstations.) s +33 r (The) s +164 520 p (user) s +18 r (in) s +0 r 118 c +-2 r (ok) s +-1 r (es) s +17 r 97 c +18 r (Unix) s +18 r (shell) s +18 r (script) s +18 r (that) s +18 r (calls) s +18 r (ted) s +18 r (systematically) s +18 r (on) s +18 r (eac) s +-1 r 104 c +17 r (of) s +18 r (the) s +164 580 p (new) s +16 r (set) s +16 r (of) s +17 r (trace) s +16 r (\014les) s +16 r (creating) s +16 r 97 c +16 r (set) s +17 r (of) s +16 r (sequence) s +16 r (\014les.) s +22 r (The) s +16 r (sequence) s +16 r (\014les) s +16 r (that) s +164 640 p (are) s +20 r (deemed) s +20 r (to) s +20 r 98 c +1 r 101 c +20 r (of) s +20 r (acceptable) s +20 r (qualit) s +0 r 121 c +19 r (are) s +20 r (then) s +20 r (en) s +-1 r (tered) s +19 r (in) s +0 r (to) s +19 r (the) s +20 r (sequence) s +164 700 p (assem) s +0 r (bly) s +16 r (program) s +17 r (xdap) s +18 r ([2]) s +17 r (where) s +18 r (the) s +17 r (sequences) s +17 r (are) s +18 r (assem) s +0 r (bled) s +16 r (in) s +0 r (to) s +16 r (con) s +0 r (tigs.) 
s +164 761 p 80 c +0 r (ortions) s +14 r (of) s +15 r (the) s +16 r (ted) s +15 r (trace-editor) s +16 r (ha) s +-1 r 118 c +-1 r 101 c +14 r 98 c +2 r (een) s +15 r (incorp) s +2 r (orated) s +15 r (in) s +0 r (to) s +14 r (the) s +15 r (xdap) s +16 r (\\trace) s +164 821 p (manager,") s +19 r (whic) s +0 r 104 c +18 r (is) s +18 r (used) s +19 r (in) s +19 r (conjunction) s +18 r (with) s +19 r (the) s +19 r (con) s +-1 r (tig) s +18 r (editor) s +19 r (to) s +19 r (view) s +18 r (sets) s +164 881 p (of) s +16 r (aligned) s +16 r (traces) s +17 r (at) s +16 r (sites) s +16 r (of) s +17 r (discrepancies) s +16 r (in) s +16 r (the) s +16 r (aligned) s +17 r (sequences.) s +237 941 p 84 c +-3 r (ed) s +16 r (is) s +17 r (also) s +16 r (used) s +17 r (at) s +17 r (the) s +17 r (stage) s +16 r (of) s +17 r 99 c +0 r (ho) s +0 r (osing) s +17 r (oligo) s +17 r (primers) s +16 r (for) s +17 r (the) s +17 r (\\w) s +-1 r (alking") s +164 1001 p (stage) s +21 r (of) s +20 r (the) s +21 r (sequencing) s +20 r (pro) s +3 r (ject.) s +35 r (It) s +20 r (can) s +21 r 98 c +1 r 101 c +21 r (in) s +-1 r 118 c +-1 r (ok) s +-1 r (ed) s +19 r (directly) s +21 r (from) s +20 r (the) s +21 r (oligo) s +164 1062 p (selection) s +23 r (program,) s +24 r (osp) s +23 r ([3],) s +25 r (to) s +23 r (allo) s +0 r 119 c +21 r (examination) s +23 r (of) s +23 r (the) s +23 r (trace) s +23 r (data) s +23 r (in) s +23 r (the) s +164 1122 p (region) s +16 r (of) s +16 r (the) s +17 r (primers) s +16 r (so) s +16 r (that) s +17 r (in) s +-1 r (tegrit) s +-1 r 121 c +15 r (of) s +16 r (the) s +17 r (sequence) s +16 r (data) s +16 r (can) s +16 r 98 c +2 r 101 c +16 r 118 c +0 r (eri\014ed.) 
s +237 1182 p (Curren) s +0 r (tly) s +-4 r 44 c +20 r (no) s +20 r (other) s +20 r (programs) s +20 r (are) s +20 r (kno) s +-1 r (wn) s +19 r (to) s +20 r 98 c +2 r 101 c +20 r 97 c +-1 r 118 c +-2 r (ailable) s +19 r (whic) s +-1 r 104 c +19 r (supp) s +2 r (ort) s +164 1242 p (editing) s +18 r (of) s +18 r (the) s +18 r (ABI) s +18 r (trace) s +18 r (data.) s +26 r 70 c +-3 r (urther,) s +18 r (the) s +18 r (mo) s +1 r (dular) s +18 r (design) s +18 r (of) s +18 r (the) s +18 r (program) s +164 1302 p (should) s +18 r (allo) s +0 r 119 c +17 r (supp) s +2 r (ort) s +18 r (for) s +18 r (new) s +18 r 116 c +0 r (yp) s +0 r (es) s +19 r (of) s +18 r (sequencing) s +18 r (mac) s +0 r (hines,) s +18 r (with) s +18 r (new) s +18 r (data) s +164 1363 p (formats,) s +16 r (to) s +16 r 98 c +2 r 101 c +16 r (implemen) s +0 r (ted) s +15 r (in) s +16 r 97 c +17 r (straigh) s +-1 r (tforw) s +-1 r (ard) s +15 r (fashion.) s +cmbx12.300 @sf +164 1492 p 65 c +-5 r 86 c +-6 r (AILABILITY) s +cmr12.300 @sf +164 1585 p 84 c +-3 r (ed) s +18 r (is) s +18 r (freely) s +19 r 97 c +0 r 118 c +-3 r (ailable) s +18 r (from) s +19 r (the) s +18 r (authors) s +19 r (or) s +19 r (from) s +19 r (Ro) s +1 r (dger) s +19 r (Staden) s +18 r (and) s +19 r (Simon) s +164 1645 p (Dear) s +19 r (\(MR) s +-1 r 67 c +18 r (Lab) s +1 r (oratory) s +19 r (of) s +19 r (Molecular) s +18 r (Biology) s +-3 r 44 c +18 r (Hills) s +19 r (Road,) s +19 r (Cam) s +0 r (bridge,) s +18 r (UK,) s +164 1705 p (CB2) s +16 r (2QH\)) s +16 r (for) s +17 r (use) s +16 r (on) s +16 r (Sun) s +17 r 119 c +-1 r (orkstations) s +15 r (running) s +17 r (X-windo) s +-1 r (ws) s +16 r (\(or) s +16 r (Op) s +1 r (enLo) s +2 r (ok\).) s +cmbx12.300 @sf +164 1835 p 65 c +-1 r (CKNO) s +-1 r (WLEDGMENTS) s +cmr12.300 @sf +164 1927 p (The) s +19 r (authors) s +19 r 119 c +0 r (ould) s +18 r (lik) s +0 r 101 c +18 r (to) s +19 r (thank) s +19 r (all) s +20 r (mem) s +-1 r 98 c +1 r (ers) s +19 r (of) s +19 r (the) s +19 r (C.) 
s +20 r (elegans) s +19 r (sequencing) s +164 1988 p (pro) s +3 r (ject) s +16 r (with) s +17 r (sp) s +2 r (ecial) s +16 r (thanks) s +17 r (to) s +17 r (the) s +16 r (follo) s +0 r (wing) s +16 r 112 c +1 r (eople:) s +23 r (John) s +17 r (Sulston,) s +16 r (Bob) s +17 r 87 c +-3 r (a-) s +164 2048 p (terston,) s +16 r (Phil) s +15 r (Green,) s +16 r (Ric) s +0 r 107 c +15 r (Wilson,) s +15 r (Ric) s +0 r (hard) s +15 r (Durbin,) s +16 r (Simon) s +15 r (Dear,) s +16 r (and) s +16 r (Ro) s +1 r (dger) s +164 2108 p (Staden) s +13 r (for) s +12 r (their) s +13 r (helpful) s +13 r (suggestions) s +12 r (for) s +13 r (impro) s +0 r 118 c +-2 r (emen) s +-1 r (ts) s +12 r (in) s +12 r (the) s +13 r (ted) s +13 r (in) s +0 r (terface) s +11 r (and) s +164 2168 p (for) s +18 r (their) s +19 r (parts) s +18 r (in) s +18 r (the) s +19 r (dev) s +-1 r (elopmen) s +-1 r 116 c +17 r (of) s +19 r (ted.) s +28 r (This) s +18 r 119 c +0 r (ork) s +17 r 119 c +0 r (as) s +17 r (supp) s +1 r (orted) s +19 r 98 c +-1 r 121 c +18 r (the) s +164 2228 p (Medical) s +16 r (Researc) s +0 r 104 c +15 r (Council) s +16 r (and) s +17 r (NIH) s +16 r (gran) s +0 r 116 c +15 r (R01-HG00136.) s +cmbx12.300 @sf +164 2358 p (REFERENCES) s +cmr12.300 @sf +164 2451 p (1.) s +22 r 87 c +-3 r (aterston,) s +15 r (R.,) s +16 r (Sulston,) s +16 r (J.,) s +17 r (et) s +16 r (al.) s +22 r (\(1991\),) s +16 r (in) s +16 r (preparation.) 
s +961 2599 p 52 c +@eop +3 @bop0 +cmr12.300 @sf +[<4040201010100808080878F8F8F870000000000000000000000070F8F8F870> 8 31 -4 10 13.548] 59 @dc +[ 32 34 -2 0 33.185] 69 @dc +[<7FF1FFCFFE07001C00E007001C00E007001C00E007001C00E007001C00E007001C00E007001C00E007001C00E007001C00E0 + 07001C00E007001C00E007001C00E007001C00E007001C00E007001C00E007001C00E007001C00E007001C00E007001C01E0 + FFFFFFFFE007001C000007001C000007001C000007001C000007001C000007001C000007001C000007001C00C007003C01E0 + 03803E01E001801E00E000E00B0040007031C080000FC07F00> 40 35 0 0 40.645] 14 @dc +3 @bop1 +cmr12.300 @sf +164 307 p (program) s +15 r (has) s +15 r 98 c +1 r (een) s +15 r (in) s +0 r 118 c +-2 r (ok) s +-1 r (ed.) s +20 r (Other) s +15 r (parameters) s +15 r (whic) s +0 r 104 c +14 r (the) s +15 r (user) s +15 r (ma) s +-1 r 121 c +14 r (sp) s +2 r (ecify) s +15 r (on) s +164 367 p (the) s +15 r (command) s +15 r (line) s +15 r (include:) s +21 r (the) s +15 r (output) s +15 r (\014le) s +16 r (name;) s +15 r 97 c +15 r (base) s +15 r 112 c +2 r (osition) s +15 r (or) s +15 r (sequence) s +164 428 p (string) s +17 r (on) s +18 r (whic) s +0 r 104 c +16 r (the) s +17 r (trace) s +18 r (is) s +17 r (to) s +18 r 98 c +1 r 101 c +18 r (cen) s +-1 r (tered;) s +17 r 97 c +18 r (default) s +17 r (trace) s +18 r (magni\014cation;) s +18 r 97 c +164 488 p (5') s +17 r 118 c +-1 r (ector) s +16 r (sequence) s +16 r (for) s +17 r (automated) s +17 r (elimination) s +16 r (of) s +17 r (the) s +16 r (sequence) s +17 r (head) s +17 r (\(v) s +-1 r (ector\);) s +164 548 p (top) s +16 r (or) s +17 r 98 c +1 r (ottom) s +17 r (strand) s +16 r (orien) s +0 r (tation;) s +15 r (or) s +17 r (an) s +-1 r 121 c +16 r (of) s +16 r (the) s +16 r (usual) s +17 r (X-windo) s +-1 r 119 c +16 r (parameters) s +164 608 p (\(e.g.) s +22 r (displa) s +-1 r 121 c +-4 r 44 c +15 r (geometry) s +-3 r (...\).) 
s +237 668 p (The) s +21 r (graphics) s +21 r (displa) s +-1 r 121 c +20 r (\(Figure) s +21 r (1\)) s +21 r (consists) s +21 r (of) s +21 r (the) s +21 r (con) s +-1 r (trol) s +20 r (panel,) s +22 r (the) s +21 r (base) s +164 729 p 112 c +1 r (osition) s +12 r (information,) s +12 r (the) s +11 r (original) s +12 r (and) s +11 r (edited) s +12 r (sequence) s +11 r (data,) s +12 r (and) s +12 r (the) s +11 r (graphical) s +164 789 p (represen) s +0 r (tation) s +16 r (of) s +18 r (the) s +18 r (trace.) s +27 r (The) s +17 r (user) s +18 r (ma) s +0 r 121 c +17 r 98 c +1 r (egin) s +18 r 98 c +0 r 121 c +16 r (using) s +18 r (the) s +18 r (con) s +0 r (trol) s +17 r (panel) s +164 849 p (INPUT) s +11 r (button) s +12 r (to) s +11 r (input) s +12 r 97 c +11 r (new) s +11 r (trace) s +12 r (\014le) s +11 r (at) s +12 r (whic) s +-1 r 104 c +11 r (time) s +11 r (the) s +11 r (user) s +12 r (selects) s +11 r (whether) s +164 909 p (to) s +21 r (view) s +21 r (the) s +21 r (sequence) s +21 r (and) s +21 r (trace) s +21 r (in) s +21 r (top) s +21 r (or) s +21 r 98 c +1 r (ottom) s +21 r (strand) s +21 r (orien) s +-1 r (tation.) s +35 r (The) s +164 969 p (trace) s +18 r (\014le) s +18 r (is) s +18 r (displa) s +0 r 121 c +-2 r (ed) s +17 r (and,) s +19 r (if) s +18 r 97 c +18 r (5') s +18 r 118 c +0 r (ector) s +17 r (sequence) s +18 r (has) s +18 r 98 c +1 r (een) s +18 r (sp) s +2 r (eci\014ed) s +18 r (on) s +18 r (the) s +164 1029 p (command) s +17 r (line,) s +18 r (the) s +17 r (program) s +18 r (attempts) s +17 r (to) s +17 r (select) s +18 r 97 c +17 r (cuto\013) s +17 r 112 c +2 r (oin) s +-1 r 116 c +17 r (corresp) s +1 r (onding) s +164 1090 p (to) s +16 r (the) s +16 r 118 c +-1 r (ector) s +15 r (sequence) s +16 r (at) s +16 r (the) s +15 r (\\head") s +16 r (of) s +16 r (the) s +16 r (trace) s +15 r (\014le.) 
s +22 r (The) s +16 r (bases) s +15 r 98 c +2 r (ey) s +-1 r (ond) s +15 r (the) s +164 1150 p (\\cuto\013) s +3 r 34 c +17 r 112 c +1 r (oin) s +0 r 116 c +15 r (are) s +16 r (displa) s +-1 r 121 c +-1 r (ed) s +15 r (on) s +16 r 97 c +16 r (shaded) s +16 r (bac) s +0 r (kground.) s +21 r (The) s +16 r (user) s +16 r (ma) s +-1 r 121 c +16 r (mo) s +1 r (dify) s +164 1210 p (the) s +18 r (cuto\013) s +18 r 112 c +1 r (osition) s +18 r 98 c +0 r 121 c +17 r (clic) s +-1 r (king) s +17 r (on) s +18 r (the) s +18 r (\\Adj) s +18 r (left) s +18 r (cut") s +18 r (button) s +17 r (and) s +18 r (clic) s +0 r (king) s +17 r (on) s +164 1270 p (the) s +19 r 112 c +2 r (osition) s +19 r (of) s +20 r (the) s +19 r (desired) s +19 r (cuto\013.) s +31 r (Similarly) s +-3 r 44 c +20 r (the) s +19 r (user) s +19 r (ma) s +0 r 121 c +18 r (adjust) s +20 r (the) s +19 r (righ) s +0 r 116 c +164 1330 p (cuto\013) s +17 r (of) s +16 r (the) s +17 r (sequence) s +17 r (\(c) s +-1 r (hosen) s +16 r 98 c +0 r 121 c +15 r (starting) s +17 r (at) s +16 r (the) s +17 r (5') s +17 r (end) s +16 r (of) s +17 r (the) s +17 r (sequence) s +16 r (and) s +164 1391 p (lo) s +1 r (oking) s +20 r (for) s +19 r (the) s +19 r (\014rst) s +20 r 111 c +1 r (ccurrence) s +20 r (when) s +19 r 50 c +19 r (out) s +20 r (of) s +19 r 53 c +19 r (bases) s +20 r (are) s +19 r ('N'\)) s +19 r 98 c +0 r 121 c +18 r (scrolling) s +164 1451 p (along) s +22 r (the) s +21 r (sequence) s +22 r (to) s +22 r (that) s +21 r 112 c +2 r (oin) s +-1 r (t,) s +22 r (clic) s +0 r (king) s +20 r (on) s +22 r (the) s +22 r (\\Adj) s +21 r (righ) s +0 r 116 c +21 r (cut") s +21 r (button,) s +164 1511 p (and) s +16 r (clic) s +0 r (king) s +15 r (on) s +17 r (the) s +16 r (appropriate) s +17 r (base.) 
s +22 r (Automation) s +16 r (of) s +17 r (the) s +16 r (\\cuto\013) s +4 r 34 c +16 r (pro) s +1 r (cess) s +17 r (is) s +164 1571 p (optional;) s +16 r (the) s +16 r (user) s +17 r (ma) s +-1 r 121 c +16 r (compile) s +16 r (the) s +16 r (program) s +16 r (with) s +17 r (that) s +16 r (feature) s +16 r (turned) s +16 r (\\o\013.") s +237 1631 p (Clic) s +0 r (king) s +13 r (on) s +15 r (the) s +14 r (\\Edit) s +15 r (seq") s +15 r (button) s +14 r (allo) s +0 r (ws) s +13 r (the) s +15 r (user) s +15 r (to) s +14 r (en) s +0 r (ter) s +13 r (the) s +15 r (edit) s +14 r (mo) s +2 r (de.) s +164 1692 p (The) s +14 r (\\Searc) s +0 r (h") s +14 r (button) s +14 r (can) s +15 r 98 c +1 r 101 c +15 r (used) s +14 r (to) s +15 r (skip) s +14 r (from) s +15 r (\\problem") s +14 r (to) s +15 r (\\problem") s +14 r (\(i.e.,) s +164 1752 p (am) s +0 r (biguit) s +-2 r 121 c +17 r (to) s +17 r (am) s +-1 r (biguit) s +-1 r (y\)) s +16 r (or) s +18 r (to) s +17 r (lo) s +1 r (ok) s +18 r (for) s +17 r (runs) s +17 r (of) s +17 r (iden) s +0 r (tical) s +16 r (bases) s +18 r (\(e.g.,) s +17 r (TTTT\)) s +164 1812 p (whic) s +0 r 104 c +15 r (are) s +16 r (often) s +16 r (mis-called) s +17 r 98 c +-1 r 121 c +16 r (the) s +16 r (mac) s +-1 r (hine) s +16 r (soft) s +-1 r 119 c +-1 r (are.) s +237 1872 p (Bases) s +20 r (can) s +20 r 98 c +1 r 101 c +20 r (inserted,) s +21 r (deleted,) s +21 r (or) s +20 r (replaced) s +20 r (as) s +20 r (with) s +20 r (an) s +0 r 121 c +19 r (ordinary) s +20 r 119 c +-1 r (ord-) s +164 1932 p (pro) s +1 r (cessor.) 
s +26 r (In) s +18 r (di\016cult-to-read) s +17 r (areas,) s +18 r (the) s +18 r (trace) s +18 r (ma) s +-1 r 121 c +17 r 98 c +1 r 101 c +18 r 118 c +-1 r (ertically) s +17 r (or) s +18 r (horizon-) s +164 1992 p (tally) s +23 r (scaled) s +22 r 98 c +0 r 121 c +22 r (dragging) s +23 r (or) s +22 r (clic) s +0 r (king) s +22 r (on) s +23 r (the) s +22 r (magni\014cation) s +23 r (scroll) s +23 r (bar) s +23 r (or) s +22 r 98 c +0 r 121 c +164 2053 p (clic) s +0 r (king) s +17 r (on) s +18 r (the) s +18 r 118 c +0 r (ertical) s +17 r (scaling) s +18 r (buttons) s +18 r (\(\\Scale) s +18 r (do) s +0 r (wn",) s +17 r (\\Scale) s +18 r (up"\),) s +19 r (resp) s +1 r (ec-) s +164 2113 p (tiv) s +0 r (ely) s +-4 r 46 c +19 r (Finally) s +-3 r 44 c +12 r (the) s +12 r (edited) s +12 r (sequence) s +12 r (is) s +12 r (sa) s +0 r 118 c +-2 r (ed) s +11 r (to) s +12 r (an) s +12 r (ascii) s +12 r (\014le) s +12 r (using) s +13 r (the) s +12 r (\\Output") s +164 2173 p (button.) s +21 r 65 c +16 r (history) s +16 r (of) s +16 r (the) s +15 r (editing) s +16 r (session) s +16 r (can) s +15 r (also) s +16 r 98 c +1 r 101 c +16 r (sa) s +0 r 118 c +-2 r (ed) s +15 r (along) s +16 r (with) s +16 r (the) s +15 r (se-) s +164 2233 p (quence.) s +30 r (The) s +19 r (\\Quit") s +18 r (button) s +19 r (is) s +19 r (used) s +19 r (to) s +19 r (exit) s +19 r (the) s +19 r (program.) s +30 r (When) s +19 r (rein) s +-1 r 118 c +-1 r (oking) s +164 2293 p (ted) s +16 r (on) s +16 r (an) s +15 r (edited) s +16 r (trace) s +16 r (\014le) s +16 r (the) s +15 r (edited) s +16 r (base) s +16 r (sequence,) s +16 r (rather) s +16 r (than) s +15 r (the) s +16 r (original) s +164 2354 p (sequence,) s +18 r (is) s +17 r (sho) s +0 r (wn) s +16 r (in) s +18 r (the) s +17 r (edited) s +18 r (base) s +17 r (windo) s +0 r (w.) 
s +24 r (The) s +18 r (user) s +17 r (ma) s +0 r 121 c +17 r (in) s +-1 r 118 c +-1 r (ok) s +-2 r 101 c +17 r (ted) s +17 r 98 c +0 r 121 c +164 2414 p (calling) s +16 r (in) s +16 r (an) s +0 r 121 c +15 r (one) s +17 r (of) s +16 r (the) s +16 r (previous) s +16 r (editing) s +17 r (sessions.) s +961 2599 p 51 c +@eop +2 @bop0 +cmbx12.300 @sf +[ 40 34 -2 0 43.816] 72 @dc +[ 300 ] /cmti12.300 @newfont +cmti12.300 @sf +[<00FE0000000381C0000006003000001C000800001800040000380002000070000100007000008000F000008000E000004000 + E000004000E000002000E000002000E000000000F000000000F000000000F000000000F000000000F0000000007800000000 + 780000000078000000003C000008003C000004001E000004000E000004000F000004000700000E000380000E0001C0000E00 + 00E0000E000070001F000038002700000E006300000380810000007F0080> 40 36 -7 1 34.869] 67 @dc +[<07C000187000301800700C00700E00700700F00780F00380F003C0F003C07801E07801E07801E03801E03C01E01C01E00E01 + C00701C003818001C300007E00> 24 21 -5 0 24.906] 111 @dc +[<3003001E00700700310038038030803803807080380380704038038038401C01C038201C01C01C001C01C01C001C01C01C00 + 0E00E00E000E00E00E000E00E00E000E00E00E00870070070087007007008780780700878078070047606606002610C10C00 + 1C0F80F800> 40 21 -5 0 39.850] 109 @dc +[ 24 31 -1 10 24.906] 112 @dc +[<07C3C00C26201C1E201C0E10180E101C0E101C07081C07001C07001C07000E03800E03800E03800703808701C08701C08381 + C04381C04380E02300E01E0060> 24 21 -5 0 26.152] 117 @dc +[<1E003100708070407020702038103800380038001C001C001C001C000E000E000E000E0007000700FFF80700038003800380 + 038001C001C001C001C000C0> 16 31 -4 0 16.189] 116 @dc +[<1C00320071007080708070803840380038001C001C001C000E000E008700870087004300430023001C000000000000000000 + 000000000000000001C001C001E000C0> 16 33 -5 0 14.944] 105 @dc +[<3003C0700620380610380E10380E083807081C07041C03801C03801C03800E01C00E01C00E01C00E01C08700E08700E08780 + E08780E04740C02631C01C0F00> 24 21 -5 0 27.397] 110 @dc 
+[<3F800060E000F03000F01800701C00000E00000E00000E0000070000070001E700061700060B800E07800E03801E03801E01 + C01E01C01E01C01E01C00F00E00F00E00F00E00700E007807003807001C07001C07000E0B80030B8001F18> 24 31 -2 10 22.416] 103 @dc +[ 40 34 -3 0 36.783] 68 @dc +[<0F80306070186004E002E002E000E000E000E000F000F000FFE0F018780438023C021C020E02038400F8> 16 21 -6 0 22.416] 101 @dc +[<1FC000203000400800E00400F00600F00600700700000700000F00003E0003FE0007FC000FF0000F00000C00000C03000C03 + 8004018002008001830000FC00> 24 21 -3 0 19.925] 115 @dc +[<0F0780308C40305C40703C20701C20F01C20F00E10F00E00F00E00F00E007807007807007807003807003C03801C03800E03 + 800E03800705C00185C000F8C0> 24 21 -5 0 24.906] 97 @dc +[<0F0780308C40305C40703C20701C20F01C20F00E10F00E00F00E00F00E007807007807007807003807003C03801C03800E03 + 800E03800705C00185C000F9C00001C00000E00000E00000E00000E000007000007000007000007000003800003800003800 + 03F800003C> 24 35 -5 0 24.906] 100 @dc +[ 24 34 -3 0 18.772] 73 @dc +[<38006400E200E200E200E200710070007000700038003800380038001C001C001C001C000E000E000E000E00070007000700 + 0700038003800380038001C001C001C01FC001E0> 16 35 -4 0 12.453] 108 @dc +[ 8 5 -6 0 14.944] 46 @dc +[<00FE0000000381C1000006002300001C0013800018000F800038000780007000078000700003C000F00003C000E00003C000 + E00003C000E00001E000E00001E000E00001E000F0003FFC00F000000000F000000000F000000000F0000000007800000000 + 780000000078000000003C000008003C000004001E000004000E000004000F000004000700000E000380000E0001C0000E00 + 00E0000E000070001F000038002700000E006300000380810000007F0080> 40 36 -7 1 37.694] 71 @dc +[<3000007000003800003800003800003800001C00001C00001C00001C00000E00000E00000E00000E00008700008701808703 + C08783C04741C02620801C1F00> 24 21 -5 0 20.548] 114 @dc +[<6003C0E00620700610700E10700E087007083807043803803803803803801C01C01C01C01C01C01C01C00E00E00E00E00F00 + E00F00E007C0C0072180071F0007000003800003800003800003800001C00001C00001C00001C00000E00000E00000E0000F + E00000F000> 24 35 -3 0 24.906] 104 @dc 
+[<0FC000183000300C00700200700100F00100F00000F00000F00000F000007800007800007800003800003C00001C07800E07 + 8007038003018001C100007E00> 24 21 -5 0 22.416] 99 @dc +[<3C00000062000000F3000000F18000007180000001C0000001C0000000C0000000E0000000E0000000E0000000E000000070 + 0000007000000070000000700000007000000038000000380000003800000038000000380000001C0000001C0000001C0000 + 001C0000001C0000000E0000000E0000000E000000FFF000000E000000070000000700000007000000070000000700000003 + 800000038000000380000001860000018F000000CF000000470000003E00> 32 45 2 10 14.944] 102 @dc +cmr12.300 @sf +[<7FF3FF8007003800070038000700380007003800070038000700380007003800070038000700380007003800070038000700 + 380007003800070038000700380007003800070038000700380007003800FFFFF80007003800070038000700380007003800 + 0700380007003800070038000700380007003800038078000180780000C0780000703800001FD800> 32 35 0 0 27.097] 13 @dc +[<01800003C00003C00003C00003C00003C00003C00003C00003C00001C00001C00001C00001C00000C00000C00000E0000060 + 00006000006000002000003000001000000800000800000400800200800200800100C001004000807FFFC07FFFC07FFFE060 + 0000400000> 24 35 -3 1 24.387] 55 @dc +2 @bop1 +cmr12.300 @sf +164 307 p (in) s +0 r 118 c +-3 r (aluable.) s +35 r 84 c +-3 r (ed) s +20 r (\(a) s +21 r 84 c +-3 r (race-EDitor\)) s +21 r 119 c +-1 r (as) s +20 r (dev) s +0 r (elop) s +0 r (ed) s +21 r (to) s +22 r (\014ll) s +21 r (this) s +21 r (role) s +21 r (in) s +21 r (the) s +21 r (C.) s +164 367 p (elegans) s +16 r (genome) s +16 r (sequencing) s +17 r (pro) s +2 r (ject) s +17 r ([1].) s +cmbx12.300 @sf +164 497 p (METHODS) s +cmti12.300 @sf +164 590 p (Computing) s +15 r (Design) s +14 r (and) s +15 r (Implementation.) 
s +cmr12.300 @sf +21 r (When) s +13 r (designing) s +13 r (ted,) s +14 r 119 c +0 r 101 c +12 r (had) s +13 r 97 c +14 r 110 c +-1 r (um-) s +164 650 p 98 c +1 r (er) s +11 r (of) s +11 r (sp) s +2 r (eci\014c) s +11 r (computing) s +11 r (goals) s +11 r (in) s +11 r (mind) s +11 r (including) s +11 r 112 c +1 r (ortabilit) s +0 r 121 c +10 r (and) s +11 r (adaptabilit) s +0 r 121 c +-4 r 46 c +164 710 p 70 c +-3 r (or) s +14 r 112 c +1 r (ortabilit) s +0 r 121 c +-4 r 44 c +14 r 119 c +-1 r 101 c +14 r 99 c +0 r (hose) s +14 r (to) s +15 r (write) s +15 r (ted) s +14 r (in) s +15 r (ANSI) s +15 r 67 c +15 r (using) s +15 r (the) s +15 r 88 c +14 r (windo) s +0 r (wing) s +14 r (sys-) s +164 770 p (tem) s +17 r (and) s +17 r (the) s +17 r (Xa) s +-1 r 119 c +16 r (to) s +1 r (olkit.) s +24 r 88 c +17 r (pro) s +-1 r (vides) s +16 r (basic) s +17 r (capabilities) s +17 r (for) s +17 r (the) s +17 r (creation) s +17 r (and) s +164 830 p (use) s +18 r (of) s +18 r (windo) s +0 r (ws,) s +17 r (and) s +18 r (the) s +18 r (to) s +1 r (olkit) s +18 r (con) s +0 r (tains) s +17 r 97 c +18 r 110 c +0 r (um) s +-2 r 98 c +1 r (er) s +18 r (of) s +17 r (pre-pac) s +0 r 107 c +-3 r (aged) s +17 r (comp) s +2 r (o-) s +164 891 p (nen) s +0 r (ts,) s +19 r (suc) s +-1 r 104 c +18 r (as) s +20 r (the) s +19 r (\\sliders") s +19 r (used) s +19 r (for) s +19 r (scrolling.) s +31 r 88 c +19 r (also) s +19 r (allo) s +0 r (ws) s +18 r (site,) s +20 r (user) s +19 r (and) s +164 951 p 112 c +1 r (er-run) s +19 r (defaults) s +18 r (to) s +19 r 98 c +1 r 101 c +19 r (set.) 
s +28 r (Adaptabilit) s +0 r 121 c +17 r (is) s +19 r (also) s +18 r (an) s +18 r (imp) s +2 r (ortan) s +-1 r 116 c +18 r (goal) s +18 r (since) s +19 r 119 c +0 r 101 c +164 1011 p (are) s +18 r (pro) s +0 r (viding) s +17 r 97 c +18 r (new) s +18 r (function) s +18 r (to) s +18 r (researc) s +-1 r 104 c +17 r (groups) s +18 r (who) s +18 r (are) s +18 r (constan) s +0 r (tly) s +17 r (adding) s +164 1071 p (new) s +16 r (requiremen) s +0 r (ts.) s +237 1131 p (St) s +0 r (ylistically) s +-4 r 44 c +21 r 119 c +0 r 101 c +20 r (ha) s +0 r 118 c +-2 r 101 c +20 r (follo) s +0 r 119 c +-1 r (ed) s +20 r (an) s +21 r (\\Abstract) s +21 r (Data) s +21 r 84 c +0 r (yp) s +0 r (e") s +22 r (discipline.) s +36 r (In) s +164 1192 p (this) s +20 r (discipline,) s +20 r 97 c +20 r (program) s +20 r (is) s +19 r (split) s +20 r (in) s +0 r (to) s +18 r 97 c +20 r 110 c +0 r (um) s +-2 r 98 c +1 r (er) s +19 r (of) s +20 r (mo) s +1 r (dules) s +20 r (whic) s +0 r 104 c +18 r (pro) s +0 r (vide) s +164 1252 p (separate,) s +15 r 119 c +0 r (ell-de\014ned) s +14 r (functions.) s +22 r 87 c +-3 r 101 c +14 r (separate) s +15 r (the) s +15 r (in) s +0 r (terface) s +14 r (of) s +16 r 97 c +15 r (mo) s +1 r (dule) s +15 r (from) s +164 1312 p (its) s +15 r (implemen) s +0 r (tation.) s +20 r 70 c +-3 r (or) s +15 r (example,) s +15 r 97 c +16 r (uni\014ed) s +15 r (in) s +0 r (ternal) s +15 r (sequence) s +15 r (format) s +15 r (is) s +16 r (used.) s +164 1372 p (This) s +19 r (can) s +20 r (store) s +19 r 97 c +19 r 118 c +-1 r (arying) s +18 r (amoun) s +0 r 116 c +18 r (of) s +19 r (information.) s +31 r (Ho) s +0 r 119 c +-2 r (ev) s +-1 r (er,) s +19 r (there) s +19 r (is) s +20 r 97 c +19 r (clear) s +164 1432 p (and) s +17 r (simple) s +16 r (in) s +0 r (terface) s +15 r 98 c +0 r 121 c +16 r (whic) s +-1 r 104 c +16 r (the) s +16 r (rest) s +17 r (of) s +16 r (the) s +17 r (program) s +17 r (accesses) s +16 r (this) s +17 r (mo) s +1 r (dule.) 
s +164 1492 p (Suc) s +0 r 104 c +22 r 97 c +23 r (st) s +-1 r (yle) s +23 r (is) s +23 r (not) s +23 r 119 c +-1 r (ell) s +23 r (supp) s +1 r (orted) s +23 r 98 c +0 r 121 c +22 r (C,) s +23 r (but) s +23 r (its) s +23 r (adoption) s +23 r (has) s +23 r 98 c +2 r (een) s +23 r 118 c +0 r (ery) s +164 1553 p (successful.) s +21 r (The) s +15 r (addition) s +15 r (of) s +14 r (new) s +15 r (sequencing) s +15 r (mac) s +-1 r (hines,) s +14 r (and) s +15 r (th) s +0 r (us) s +14 r (new) s +14 r (external) s +164 1613 p (data) s +18 r (formats,) s +17 r (ma) s +0 r 121 c +17 r (cause) s +17 r (some) s +18 r 99 c +-1 r (hanges) s +17 r (in) s +18 r (the) s +17 r (in) s +0 r (ternal) s +16 r (represen) s +0 r (tation) s +17 r (of) s +17 r (the) s +164 1673 p (sequence) s +16 r (but) s +16 r (should) s +17 r (not) s +16 r (a\013ect) s +16 r (the) s +17 r (rest) s +16 r (of) s +16 r (the) s +16 r (program.) s +237 1733 p 84 c +-3 r (ed) s +17 r (accepts) s +17 r 97 c +18 r (large) s +18 r 110 c +-1 r (um) s +-1 r 98 c +0 r (er) s +18 r (of) s +18 r (optional) s +17 r (command) s +18 r (line) s +18 r (argumen) s +-1 r (ts,) s +17 r (man) s +0 r 121 c +164 1793 p (of) s +18 r (whic) s +0 r 104 c +17 r (can) s +18 r (also) s +18 r 98 c +1 r 101 c +18 r (sp) s +2 r (eci\014ed) s +18 r (as) s +18 r (system) s +18 r (defaults.) s +27 r (This) s +18 r (supp) s +1 r (orts) s +19 r 97 c +18 r (mo) s +1 r (de) s +18 r (of) s +164 1854 p 119 c +0 r (orking) s +20 r (whereb) s +-1 r 121 c +20 r (ted) s +22 r (is) s +21 r (in) s +-1 r 118 c +-1 r (ok) s +-1 r (ed) s +20 r (not) s +21 r (directly) s +21 r 98 c +-1 r 121 c +21 r (the) s +21 r (user) s +21 r (but) s +21 r (instead) s +21 r 98 c +-1 r 121 c +21 r 97 c +164 1914 p (script) s +21 r (or) s +21 r (another) s +20 r (application) s +21 r (whic) s +0 r 104 c +20 r (supplies) s +20 r (argumen) s +0 r (ts) s +20 r (appropriate) s +21 r (to) s +20 r (the) s +164 1974 p (editing) s +16 r (task.) 
s +cmti12.300 @sf +237 2034 p (Gr) s +-1 r (aphic) s +-3 r (al) s +22 r (Interfac) s +-2 r (e.) s +cmr12.300 @sf +37 r 84 c +-3 r (ed) s +21 r (curren) s +0 r (tly) s +21 r (accepts) s +21 r (data) s +22 r (from) s +22 r 116 c +0 r 119 c +-2 r 111 c +21 r (\015uorescence) s +164 2094 p (based) s +18 r (sequencing) s +19 r (mac) s +-1 r (hines,) s +18 r (the) s +19 r (Pharmacia) s +18 r (A.L.F.) s +19 r (and) s +18 r (the) s +19 r (ABI) s +18 r (373A.) s +18 r (The) s +164 2155 p (sequencing) s +12 r (mac) s +0 r (hine) s +12 r (data) s +12 r (consists) s +13 r (of) s +12 r (four) s +13 r (traces) s +12 r (of) s +13 r (\015uorescence) s +12 r (lev) s +0 r (els) s +12 r (together) s +164 2215 p (with) s +14 r (the) s +14 r (mac) s +0 r (hine's) s +13 r (in) s +-1 r (terpretation,) s +14 r (whic) s +0 r 104 c +13 r (is) s +14 r 97 c +14 r (sequence) s +14 r (of) s +14 r (bases.) s +21 r 84 c +-3 r (ed) s +13 r (displa) s +-1 r (ys) s +164 2275 p (the) s +16 r (traces) s +15 r (and) s +16 r (the) s +16 r (mac) s +-1 r (hine-generated) s +15 r (base) s +16 r (list.) s +21 r 65 c +16 r (second,) s +16 r (initially) s +15 r (iden) s +0 r (tical,) s +164 2335 p (list) s +16 r (of) s +16 r (bases) s +17 r (is) s +16 r (pro) s +0 r (vided) s +15 r (for) s +16 r (correction) s +16 r 98 c +0 r 121 c +15 r (the) s +17 r (user.) s +237 2395 p 84 c +-3 r (ed) s +13 r (has) s +13 r (an) s +14 r 88 c +13 r (windo) s +0 r (ws) s +12 r (based) s +14 r (graphical) s +14 r (in) s +-1 r (terface.) 
s +20 r (The) s +13 r (trace) s +14 r (\014le) s +14 r (can) s +13 r (either) s +164 2455 p 98 c +1 r 101 c +12 r (input) s +12 r (from) s +11 r (the) s +12 r (command) s +12 r (line) s +11 r (or) s +12 r 98 c +0 r 121 c +10 r (clic) s +0 r (king) s +11 r (on) s +11 r (the) s +12 r (INPUT) s +11 r (button) s +12 r (after) s +12 r (the) s +961 2599 p 50 c +@eop +1 @bop0 +[ 622 ] /cmr10.622 @newfont +cmr10.622 @sf +[ 64 61 -3 0 64.569] 65 @dc +[<0003F000000FF800001F0C00003E0600007C030000FC030000F8030000F8018001F8018001F8018001F8018001F8018001F8 + 018001F8018001F8018001F8018001F8018001F8000001F8000001F8000001F8000001F8000001F8000001F8000001F80000 + 01F8000001F8000001F8000001F8000001F8000001F8000001F8000001F8000001F8000001F80000FFFFFF00FFFFFF001FFF + FF0007F8000003F8000001F8000000F8000000F8000000780000007800000038000000380000003800000038000000180000 + 00180000001800000018000000180000> 32 54 -2 1 33.480] 116 @dc +[ 32 38 -2 0 33.719] 114 @dc +[<007F800F8003FFE01FE00FE0703FF01F80187E383F000C7C187F0006F80C7E0006F80CFE0003F80CFE0003F80CFE0001F80C + FE0001F80CFE0001F80C7E0001F80C7F0001F8007F0001F8003F8001F8001F8001F8000FC001F80007F001F80003F801F800 + 00FF01F800003FF9F8000007FFF80000001FF800000001F800000001F800000001F800000001F8000F0001F8001F8001F800 + 3FC001F0003FC003F0003FC003F0003FC003E0001F8007C0001E000F80000E001F000007C07E000001FFF80000003FC00000> 40 40 -4 1 43.046] 97 @dc +[<0003FC0000001FFF0000007E03C00000F800E00001F000300003E000180007C0000C000FC00006001F800006001F80000300 + 3F000003003F000003007F000000007F000000007E00000000FE00000000FE00000000FE00000000FE00000000FE00000000 + FE00000000FE00000000FE00000000FE000000007E000000007E000000007F000000007F000000003F000078003F0000FC00 + 1F8001FE000F8001FE000FC001FE0007C001FE0003E000FC0001F0003C0000F8003800003E01F000001FFFC0000003FE0000> 40 40 -3 1 38.263] 99 @dc +[<0001FE0000000FFF8000003F01E000007800700001F000180003E0000C0007C00006000FC00003000F800003001F80000180 + 
3F000001803F000001807F000000007F000000007E000000007E00000000FE00000000FE00000000FE00000000FE00000000 + FE00000000FFFFFFFF80FFFFFFFF80FE00001F80FE00001F807E00001F807E00001F807F00001F003F00001F003F00001F00 + 1F00003F001F80003E000F80003E0007C0007C0003E000780001E000F00000F801E000007E07C000001FFF00000003FC0000> 40 40 -2 1 38.263] 101 @dc +[<0003FC07E000001FFF07FFC0003E0387FFC000F800E7FFC001F00077F80003E0001FE00007C0001FE0000F80000FE0001F80 + 000FE0001F000007E0003F000007E0003F000007E0007F000007E0007E000007E0007E000007E000FE000007E000FE000007 + E000FE000007E000FE000007E000FE000007E000FE000007E000FE000007E000FE000007E000FE000007E0007E000007E000 + 7E000007E0007F000007E0003F000007E0003F000007E0001F800007E0000F800007E0000FC0000FE00007C0000FE00003E0 + 001FE00001F00037E00000FC0067E000003F03C7E000000FFF87E0000001FC07E00000000007E00000000007E00000000007 + E00000000007E00000000007E00000000007E00000000007E00000000007E00000000007E00000000007E00000000007E000 + 00000007E00000000007E00000000007E00000000007E00000000007E00000000007E0000000001FE000000003FFE0000000 + 03FFE000000003FFE00000000007E000> 48 61 -3 1 47.829] 100 @dc +[ 24 58 -1 0 23.914] 105 @dc +[ 32 40 -3 1 33.958] 115 @dc +[ 48 55 -2 17 47.829] 112 @dc +[ 24 60 -1 0 23.914] 108 @dc +[<07C0000000001FF000000000383800000000700C000000007C0600000000FE0700000000FE0300000000FE0180000000FE01 + 800000007C00C00000001000C00000000000C000000000006000000000006000000000003000000000003000000000003000 + 000000001800000000001800000000003C00000000003C00000000003C00000000007E00000000007E0000000000FF000000 + 0000FF0000000000FF0000000001F98000000001F98000000003F9C000000003F0C000000003F0C000000007E06000000007 + E0600000000FE0700000000FC0300000000FC0300000001F80180000001F80180000003F801C0000003F000C0000003F000C + 0000007E00060000007E0006000000FE0007000000FC0003000000FC0003000001F80001800001F80001800003F80001C000 + 03F80003E00007FC0007F800FFFF801FFF00FFFF801FFF00FFFF801FFF00> 48 55 -2 18 45.437] 121 @dc +[ 48 38 -2 0 47.829] 110 @dc 
+[<0007FF8000003FFFF00001FC00FE0003E0001F000F800007C01F000003E03E000001F07C000000F87C000000F8F80000007C + F80000007CF80000007CF80000007CF80000007C780000007C7C000000F83C000000F83E000001F81F000007F00780003FE0 + 03FFFFFFC000FFFFFF8001FFFFFF0003FFFFFC0007FFFFC00007C00000000F000000000F000000000E000000000E00000000 + 0E0000000006000000000607F00000073FFE0000037C1F000001F007800001E003C00003C001E00007C001F0000FC001F800 + 0F8000F8001F8000FC001F8000FC001F8000FC001F8000FC001F8000FC001F8000FC001F8000FC000F8000F8000FC001F800 + 07C001F00003C001E00801E003E01C00F007F03E007C1F1E3E003FFE0FFC0007F001F8> 40 57 -2 19 43.046] 103 @dc +[<0001FE0000000FFFC000003F03F00000F8007C0001F0003E0003E0001F0007C0000F800F800007C01F800007E01F000003E0 + 3F000003F03F000003F07F000003F87E000001F87E000001F8FE000001FCFE000001FCFE000001FCFE000001FCFE000001FC + FE000001FCFE000001FCFE000001FCFE000001FC7E000001F87E000001F87E000001F83F000003F03F000003F03F000003F0 + 1F000003E00F800007C00F800007C007C0000F8003E0001F0001F0003E0000F8007C00003F03F000000FFFC0000001FE0000> 40 40 -2 1 43.046] 111 @dc +[ 72 38 -2 0 71.743] 109 @dc +[<7FFFFC007FFFFC007FFFFC0001FE000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC + 000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC0000 + 00FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC0000FFFFFC00FFFFFC00FFFFFC0000FC + 000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC000000FC0000 + 00FC0000007C01E0007E03F0007E07F8003E07F8001F07F8000F83F8000781F00003E0E00001FFC000003F00> 32 61 -1 0 26.306] 102 @dc +[<7FFFF8FFFFF07FFFF8FFFFF07FFFF8FFFFF001FE0003FC0000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC + 0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001 + F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F800 + 
00FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC + 0001F800FFFFFFFFF800FFFFFFFFF800FFFFFFFFF80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001 + F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F80000FC0001F800 + 00FC0001F800007E0001F800007E0003F800003F0007F800001F0007F800000F8007F8000007C007F8000003F003F8000000 + FC01F80000003FFFB800000007FE0000> 48 61 -1 0 47.829] 13 @dc +[<0001FE03F000000FFF83FFE0001F81E3FFE0003E0073FFE0007C001BFC0000FC001BF00000F8000FF00000F8000FF00001F8 + 0007F00001F80007F00001F80007F00001F80003F00001F80003F00001F80003F00001F80003F00001F80003F00001F80003 + F00001F80003F00001F80003F00001F80003F00001F80003F00001F80003F00001F80003F00001F80003F00001F80003F000 + 01F80003F00001F80003F00001F80003F00001F80003F00001F80003F00001F80003F00001F80003F00001F80003F00001F8 + 0003F00007F8000FF000FFF801FFF000FFF801FFF000FFF801FFF00001F80003F000> 48 39 -2 1 47.829] 117 @dc +[<00000FE0000001803FFC000001C0F03F000001E1C00FC00001E30003E00001F60001F00001FC0000F80001FC0000FC0001F8 + 00007C0001F800007E0001F800003F0001F800003F0001F800003F8001F800001F8001F800001F8001F800001FC001F80000 + 1FC001F800001FC001F800001FC001F800001FC001F800001FC001F800001FC001F800001FC001F800001FC001F800001F80 + 01F800001F8001F800003F8001F800003F0001F800003F0001F800003E0001F800007E0001FC00007C0001FC0000F80001FE + 0001F00001FB0003E00001F9C007C00001F8F01F000001F83FFE000001F80FF0000001F80000000001F80000000001F80000 + 000001F80000000001F80000000001F80000000001F80000000001F80000000001F80000000001F80000000001F800000000 + 01F80000000001F80000000001F80000000001F80000000001F80000000001F80000000007F800000000FFF800000000FFF8 + 00000000FFF80000000001F800000000> 48 61 -2 1 47.829] 98 @dc +[<000007FFFF80000007FFFF80000007FFFF800000001FE0000000000FC0000000000FC0000000000FC0000000000FC0000000 + 000FC0000000000FC0000000000FC0000000000FC0000000000FC0000000000FC0000000000FC0000000000FC0000003F80F + 
C000001FFE0FC000007E070FC00000F801CFC00001F000CFC00003E0006FC00007C0003FC0000FC0001FC0001F80001FC000 + 1F80000FC0003F00000FC0003F00000FC0007F00000FC0007F00000FC0007E00000FC000FE00000FC000FE00000FC000FE00 + 000FC000FE00000FC000FE00000FC000FE00000FC000FE00000FC000FE00000FC000FE00000FC0007E00000FC0007F00000F + C0007F00000FC0003F00000FC0003F80000FC0001F80001FC0001F80001FC0000FC0001FC00007E00037C00003E00067C000 + 01F00063C00000FC00C3C000003F0381C000000FFF01C0000001FC00C000> 48 55 -3 17 45.437] 113 @dc +[ 48 60 -2 0 47.829] 104 @dc +[ 432 ] /cmr10.432 @newfont +cmr10.432 @sf +[<00FFFFFE0000FFFFFE000000FE000000007C000000007C000000007C000000007C000000007C000000007C000000007C0000 + 00007C000000007C000000007C000000007C000000007C000000007C000000007C000000007C000000007C000000007C0000 + 00007C000000007C000000007C000000007C000000007C000000007C000000007C000080007C000480007C000480007C0004 + 80007C000480007C0004C0007C000CC0007C000C40007C000840007C000860007C001870007C00387C007C00F87FFFFFFFF8 + 7FFFFFFFF8> 40 41 -2 0 43.171] 84 @dc +[ 16 41 -1 0 16.604] 105 @dc +[ 56 26 -1 0 49.812] 109 @dc +[<007F000001C1C000070070000E0038001E003C003C001E003C001E0078000F0078000F00F8000F80F8000F80F8000F80F800 + 0F80F8000F80F8000F80F8000F80F8000F8078000F0078000F003C001E003C001E001C001C000E0038000700700001C1C000 + 007F0000> 32 26 -2 0 29.887] 111 @dc +[<003E0000E10001C08003C0800780400780400780400780400780400780400780400780000780000780000780000780000780 + 00078000078000078000078000078000078000078000FFFF801FFF800F800007800003800003800001800001800001800000 + 8000008000008000008000> 24 37 -1 0 23.246] 116 @dc +[ 32 42 -1 0 33.208] 104 @dc +[<1F00000060800000F0400000F8200000F8100000F81000007008000000080000000400000004000000040000000200000002 + 0000000700000007000000070000000F8000000F8000001E4000001E4000003E6000003C2000003C20000078100000781000 + 00F8180000F0080000F0080001E0040001E0040003E0020003C0020003C0020007800100078003800F8003C0FFF00FF8FFF0 + 0FF8> 32 38 -1 12 31.548] 121 @dc 
+[<0000FF8000000007FFE01000001FC0383000003E000C700000F80002F00001F00002F00003E00001F00007C00001F0000F80 + 0001F0000F800001F0001F000001F0003F000001F0003E000001F0003E000001F0007E000001F0007E000003F0007C0000FF + FF00FC0000FFFF00FC0000000000FC0000000000FC0000000000FC0000000000FC0000000000FC0000000000FC0000000000 + FC00000000007C00000000007E00000010007E00000010003E00000010003E00000030003F00000030001F00000030000F80 + 000070000F800000700007C00000F00003E00001F00001F00003F00000F80006F000003E000C7000001FC07830000007FFE0 + 30000000FF001000> 48 43 -3 1 46.906] 71 @dc +[ 16 42 -1 0 16.604] 108 @dc +[<007F0001C0C00700200E00101E00083C00043C00047C0000780000F80000F80000F80000F80000F80000FFFFFCF8003CF800 + 3C78003C78003C3C00383C00781C00700E00F00700E003C380007E00> 24 26 -2 0 26.566] 101 @dc +[<83F800C40700F80180F001C0E000C0C000E0C000E0C000E08001E08001E00007C0003FC003FF800FFF003FFE007FF0007E00 + 00F80000F00040E00040E000406000C06000C03001C01C06C007F840> 24 26 -2 0 23.578] 115 @dc +[ 32 26 -1 0 33.208] 110 @dc +[ 40 41 -2 0 37.359] 76 @dc +[<07F80F001F063FC03C013C407C00F820F800F820F8007820F8007820F8007820780078207C0078003E0078001F0078000F80 + 780003E07800007FF800000078000000780000007800080078001C0078003E0078003E00F0003C00E0001001C0000E078000 + 01FC0000> 32 26 -2 0 29.887] 97 @dc +[ 48 41 -2 0 45.661] 68 @dc +[ 48 41 -2 0 44.831] 72 @dc +[ 24 26 -1 0 23.412] 114 @dc +[<0001FF0000000F01E000003C0078000078003C0000F0001E0003E0000F8007C00007C007800003C00F800003E01F000001F0 + 1F000001F03F000001F83E000000F87E000000FC7E000000FC7E000000FC7C0000007CFC0000007EFC0000007EFC0000007E + FC0000007EFC0000007EFC0000007EFC0000007EFC0000007EFC0000007E7C0000007C7C0000007C7E000000FC7E000000FC + 3E000000F83E000000F81F000001F01F000001F00F800003E007800003C003C000078001E0000F0000E0000E000078003C00 + 003C007800000F01E0000001FF0000> 40 43 -3 1 46.491] 79 @dc +[<007F0001C0C00780200F00101E00083C00043C00047C0000780000F80000F80000F80000F80000F80000F80000F80000F800 + 
007800007C00103C00383C007C1E007C0F003C07800801C070007F80> 24 26 -2 0 26.566] 99 @dc +[<0407E00006181C0007200E000740078007C003C0078001C0078001E0078001F0078000F0078000F0078000F8078000F80780 + 00F8078000F8078000F8078000F8078000F8078000F0078000F0078001E0078001E007C003C007A0038007B00700078C1C00 + 0783F00007800000078000000780000007800000078000000780000007800000078000000780000007800000078000000780 + 00000F800000FF800000FF80000007800000> 32 42 -1 0 33.208] 98 @dc +[<03F800000FFE00001C0780001801C0003C00E0003E0070003E0078001C003C0000003C0000001E0000001E0000001E000000 + 0F0000000F0000000F00007E0F8003810F800700CF800E004F801C002F803C001F803C001F8078001F8078000F80F8000F80 + F8000F80F8000F80F8000F80F8000F00F8000F00F8000F00F8000F0078001E0078001E003C001C001C003C001E0038000F00 + 700007C1E00001FFC000007F0000> 32 41 -2 1 29.887] 57 @dc +[<4020101008040404020202027AFEFEFCFC78> 8 18 -5 12 16.604] 44 @dc +[<7FFFE07FFFE001F80000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F0 + 0000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000F00000 + F000F0F000FEF0000FF00001F000007000001000> 24 40 -5 0 29.887] 49 @dc +cmti12.300 @sf +[ 32 34 -9 0 34.869] 84 @dc +1 @bop1 +cmr10.622 @sf +222 508 p 65 c +28 r (trace) s +29 r (displa) s +-1 r 121 c +27 r (and) s +29 r (editing) s +29 r (program) s +29 r (for) s +214 612 p (data) s +29 r (from) s +29 r (\015uorescence) s +28 r (based) s +29 r (sequencing) s +802 716 p (mac) s +-1 r (hines) s +cmr10.432 @sf +464 864 p (Timoth) s +-1 r 121 c +19 r (Gleeson) s +157 r (LaDeana) s +19 r (Hillier) s +765 981 p (Octob) s +2 r (er) s +20 r (9,) s +19 r (1991) s +cmbx12.300 @sf +164 1381 p (ABSTRA) s +-1 r (CT) s +cmr12.300 @sf +164 1473 p (\\T) s +-3 r (ed") s +16 r 40 c +cmti12.300 @sf +0 r 84 c +cmr12.300 @sf +0 r (race) s +cmti12.300 @sf +18 r 101 c +-1 r 100 c +cmr12.300 @sf +0 r (itor\)) s +16 r (is) s +18 r 97 c +17 r (graphical) s +18 r (editor) s +17 r (for) s +18 r (sequence) s +18 
r (and) s +17 r (trace) s +18 r (data) s +17 r (from) s +164 1534 p (automated) s +17 r (\015uorescence) s +17 r (sequencing) s +17 r (mac) s +0 r (hines.) s +23 r (It) s +17 r (pro) s +0 r (vides) s +16 r (facilities) s +17 r (for) s +17 r (view-) s +164 1594 p (ing) s +13 r (sequence) s +13 r (and) s +13 r (trace) s +12 r (data) s +13 r (\(in) s +13 r (top) s +13 r (or) s +13 r 98 c +1 r (ottom) s +13 r (strand) s +13 r (orien) s +0 r (tation\),) s +12 r (for) s +13 r (editing) s +164 1654 p (the) s +18 r (base) s +18 r (sequence,) s +19 r (for) s +18 r (automated) s +19 r (or) s +18 r (man) s +0 r (ual) s +17 r (trimming) s +18 r (of) s +18 r (the) s +19 r (head) s +18 r (\(v) s +-1 r (ector\)) s +164 1714 p (and) s +14 r (tail) s +15 r (\(uncertain) s +14 r (data\)) s +14 r (from) s +15 r (the) s +14 r (sequence,) s +15 r (for) s +14 r 118 c +0 r (ertical) s +13 r (and) s +14 r (horizon) s +0 r (tal) s +13 r (trace) s +164 1774 p (scaling,) s +13 r (for) s +13 r 107 c +0 r (eeping) s +12 r 97 c +13 r (history) s +13 r (of) s +12 r (sequence) s +13 r (editing,) s +14 r (and) s +13 r (for) s +12 r (output) s +13 r (of) s +13 r (the) s +13 r (edited) s +164 1835 p (sequence.) s +20 r 84 c +-3 r (ed) s +12 r (has) s +12 r 98 c +1 r (een) s +12 r (used) s +13 r (extensiv) s +-1 r (ely) s +12 r (in) s +12 r (the) s +12 r (C.) s +13 r (elegans) s +12 r (genome) s +12 r (sequencing) s +164 1895 p (pro) s +3 r (ject,) s +20 r 98 c +2 r (oth) s +19 r (as) s +20 r 97 c +20 r (stand-alone) s +19 r (program) s +20 r (and) s +20 r (in) s +-1 r (tegrated) s +19 r (in) s +0 r (to) s +18 r (the) s +20 r (Staden) s +20 r (se-) s +164 1955 p (quence) s +11 r (assem) s +0 r (bly) s +11 r (pac) s +-1 r 107 c +-2 r (age,) s +11 r (and) s +12 r (has) s +11 r (greatly) s +12 r (aided) s +11 r (in) s +12 r (the) s +11 r (e\016ciency) s +12 r (and) s +11 r (accuracy) s +164 2015 p (of) s +16 r (sequence) s +17 r (editing.) 
s +21 r (It) s +16 r (runs) s +17 r (in) s +16 r (the) s +16 r 88 c +17 r (windo) s +-1 r (ws) s +15 r (en) s +0 r (vironmen) s +-1 r 116 c +15 r (on) s +16 r (Sun) s +17 r 119 c +-1 r (orksta-) s +164 2075 p (tions) s +14 r (and) s +14 r (is) s +14 r 97 c +-1 r 118 c +-3 r (ailable) s +13 r (from) s +14 r (the) s +14 r (authors.) s +21 r 84 c +-3 r (ed) s +13 r (curren) s +-1 r (tly) s +13 r (supp) s +2 r (orts) s +13 r (sequence) s +14 r (and) s +164 2136 p (trace) s +16 r (data) s +16 r (from) s +17 r (the) s +16 r (ABI) s +16 r (373A) s +17 r (and) s +16 r (Pharmacia) s +16 r (A.L.F.) s +16 r (sequencers.) s +cmbx12.300 @sf +164 2261 p (INTR) s +-1 r (ODUCTION) s +cmr12.300 @sf +164 2354 p (Time) s +23 r (in) s +0 r 118 c +-1 r (olv) s +-2 r (ed) s +22 r (in) s +24 r (sequence) s +23 r (editing) s +23 r (is) s +24 r (extensiv) s +-1 r (e,) s +24 r (and) s +24 r (an) s +-1 r (ything) s +22 r (easing) s +24 r (that) s +164 2414 p (burden) s +21 r (will) s +21 r (impro) s +0 r 118 c +-1 r 101 c +20 r (the) s +21 r (e\016ciency) s +21 r (of) s +21 r (an) s +0 r 121 c +20 r (ma) s +3 r (jor) s +21 r (sequencing) s +21 r (pro) s +3 r (ject.) s +37 r (Ha) s +-1 r (v-) s +164 2474 p (ing) s +19 r (sequence) s +19 r (and) s +19 r (trace) s +19 r (data) s +20 r 97 c +-1 r 118 c +-2 r (ailable) s +18 r (online) s +19 r (in) s +19 r (easily-) s +19 r (manipulable) s +19 r (form) s +19 r (is) s +961 2599 p 49 c +@eop +@end diff --git a/doc/ted.tex b/doc/ted.tex new file mode 100644 index 0000000..0a0b291 --- /dev/null +++ b/doc/ted.tex @@ -0,0 +1,213 @@ +\documentstyle[12pt]{article} + +\title{A trace display and editing program for data from fluorescence based +sequencing machines} +\author{Timothy Gleeson \and LaDeana Hillier} + +\begin{document} +\maketitle +\section*{} +\subsection*{} +\subsubsection*{ABSTRACT} + +``Ted'' ({\em T}race {\em ed}itor) +is a graphical editor for sequence and trace data from automated +fluorescence sequencing machines. 
It provides facilities +for viewing sequence and trace data (in top or bottom strand +orientation), for editing the base sequence, for +automated or manual trimming of the head (vector) and tail +(uncertain data) from the sequence, for vertical and horizontal trace +scaling, for keeping a history of sequence editing, and for output of +the edited sequence. Ted has been used extensively in the C. +elegans genome sequencing project, +both as a stand-alone program and integrated into +the Staden sequence assembly package, and has +greatly aided in the efficiency +and accuracy of sequence editing. It runs in the X +windows environment on Sun workstations and is available from the +authors. Ted currently supports sequence and trace data from the ABI +373A and Pharmacia A.L.F. sequencers. + +\subsubsection*{INTRODUCTION} + Time involved in sequence editing is extensive, and anything easing +that burden will improve the efficiency of any major sequencing +project. Having sequence and trace data available online in easily +manipulable form is invaluable. Ted (a Trace-EDitor) was developed to +fill this role in the C. elegans genome +sequencing project [1]. + +\subsubsection*{METHODS} + +{\em Computing Design and Implementation.} +When designing ted, we had a number of specific computing goals +in mind including portability and adaptability. For portability, we +chose to write ted in ANSI C using the X windowing system and the +Xaw toolkit. X provides basic capabilities for the creation and use +of windows, and the toolkit contains a number of pre-packaged +components, such as the ``sliders'' used for scrolling. X also allows +site, user and per-run defaults to be set. Adaptability is also an +important goal since we are providing a new function to +research groups who are constantly adding new requirements. + + Stylistically, we have followed an ``Abstract Data Type'' +discipline.
In this discipline, a program is split into a number of +modules which provide separate, well-defined functions. We +separate the interface of a module from its implementation. For +example, a unified internal sequence format is used. This can store +a varying amount of information. However, there is a clear and +simple interface by which the rest of the program accesses this +module. Such a style is not well supported by C, but its adoption has +been very successful. The addition of new sequencing machines, and +thus new external data formats, may cause some changes in the +internal representation of the sequence but should not affect +the rest of the program. + + Ted accepts a large number of optional command line arguments, +many of which can also be specified as system defaults. This +supports a mode of working whereby ted is invoked not directly by the +user but instead by a script or another application which supplies +arguments appropriate to the editing task. + + +{\em Graphical Interface.} +Ted currently accepts data from two fluorescence based sequencing +machines, the Pharmacia A.L.F. and the ABI 373A. +The sequencing machine data consists of +four traces of fluorescence levels together with the machine's +interpretation, which is a sequence of bases. +Ted displays +the traces and the machine-generated base list. +A second, initially identical, list of bases is provided for correction +by the user. + + Ted has an X windows based +graphical interface. The trace file +can either be input from the command line or by +clicking on the INPUT button after the program has been invoked. +Other parameters which the user may specify on the +command line include: the output +file name; a base position or sequence string on which the trace is +to be centered; a default trace magnification; a 5' vector sequence +for automated elimination of the sequence head (vector); top or +bottom strand orientation; or any of the usual X-window parameters (e.g. 
+display, geometry...). + + The graphics display (Figure 1) consists of the control +panel, the base position information, the original and edited sequence +data, and the graphical representation of the trace. The user may +begin by using the control panel INPUT button to input a new trace +file at which time the user selects whether to view the sequence +and trace in top or bottom strand orientation. +The trace file is displayed and, if a 5' vector sequence has been +specified on the command line, the program attempts to select a +cutoff point corresponding to the vector sequence at the ``head'' of the +trace file. The bases beyond the ``cutoff'' point are +displayed on a shaded background. The user may modify the cutoff +position by clicking on the ``Adj left cut'' button and clicking on the +position of the desired cutoff. Similarly, the user may adjust the +right cutoff of the sequence (chosen by starting at the 5' end of the +sequence and looking for the first occurrence when 2 out of 5 bases +are 'N') by scrolling along the sequence to that point, clicking on the +``Adj right cut'' button, and clicking on the appropriate base. +Automation of the ``cutoff'' process is optional; the user may compile +the program with that feature turned ``off.'' + + Clicking on the ``Edit seq'' button allows the user to enter the edit +mode. The ``Search'' button can be used to skip from ``problem'' to +``problem'' (i.e., ambiguity to ambiguity) or to look for runs of +identical bases (e.g., TTTT) which are often mis-called by +the machine software. + + Bases can be inserted, deleted, or replaced as with +any ordinary word-processor. In difficult-to-read areas, +the trace may be vertically or horizontally scaled by dragging or +clicking on the magnification scroll bar or by clicking on the +vertical scaling buttons (``Scale down'', ``Scale up''), respectively. +Finally, the edited sequence is saved to an ascii file using the +``Output'' button. 
A history of the editing session can also be saved +along with the sequence. +The ``Quit'' button is used +to exit the program. When reinvoking ted on an edited trace file the +edited base sequence, rather than the original sequence, is shown in +the edited base window. The user may invoke ted by calling in any one +of the previous editing sessions. + + +\subsubsection*{APPLICATIONS AND CONCLUSIONS} + + In the C. elegans genome sequencing project, data from the ABI or +A.L.F. sequencing machines' computers are transferred to Sun +workstations. +The user invokes a Unix shell script that calls ted systematically +on each of the new set of trace files creating a set of sequence files. +The sequence files that are deemed to be of acceptable quality +are then entered into the sequence +assembly program xdap [2] where the sequences are assembled into +contigs. Portions of the ted trace-editor have been incorporated +into the xdap ``trace manager,'' which is used in +conjunction with the contig editor to view sets of aligned traces +at sites of discrepancies in the aligned sequences. + + Ted is also used at the stage of choosing oligo primers for the +``walking'' stage of the sequencing project. It can be invoked directly +from the oligo selection program, osp [3], to allow examination +of the trace data in the region of the primers so that +integrity of the sequence data can be verified. + + Currently, no other programs are known to be available +which support editing of the ABI trace data. +Further, the modular design of the program should allow +support for new types of sequencing machines, with new data +formats, to be implemented in a straightforward fashion. + + +\subsubsection*{AVAILABILITY} + Ted is freely available from the authors or from Rodger Staden and +Simon Dear (MRC Laboratory of Molecular Biology, Hills Road, Cambridge, +UK, CB2 2QH) for use on Sun workstations running X-windows (or OpenLook). 
+ + +\subsubsection*{ACKNOWLEDGMENTS} + The authors would like to thank all members of the C. elegans +sequencing project with special thanks to the following people: +John Sulston, Bob Waterston, +Phil Green, Rick Wilson, Richard Durbin, Simon Dear, and Rodger Staden +for their helpful suggestions for improvements in the ted interface +and for their parts in the development of ted. This work was +supported by the Medical Research Council and NIH grant R01-HG00136. + +\subsubsection*{REFERENCES} + +1. Waterston, R., Sulston, J., et al. (1991), in preparation. + +2. Dear, S. and Staden, R. (1991) Nuc. Acids Res., in press. + +3. Hillier, L. and Green, P. (1991) submitted. + + +{\bf Figure 1 legend.} + +Figure 1 shows a ``screen dump'' of the ted graphical interface. +The display consists of +the control panel and the synchronized view of the base position +information, original and edited sequence data, +and graphical representation of the trace (with each nucleotide's trace + being represented +by a different color). The control +panel allows the user to read in new trace files (in either +bottom or top strand orientation) +as well as to search for a string of nucleotides or a certain base position. +Scroll bars allow the user to adjust the magnification of or scroll through +the sequence and trace data. The user may also choose to change the vertical +magnification of the trace data. Further, sequence on the head (vector) +or tail (uncertain data) of the sequence may be ``cutoff'' +using the adjust left and right cutoff buttons. Bases can be inserted, +deleted, or replaced as with +any ordinary word-processor in the sequence data window. Finally, the +sequence may be written to an ascii file using the output button on +the control panel. + +\end{document} + + + diff --git a/help/BAP.RNO b/help/BAP.RNO new file mode 100644 index 0000000..731c8c9 --- /dev/null +++ b/help/BAP.RNO @@ -0,0 +1,2722 @@ +.npa +.left margin1 +@-1. TX 0 @General +.sp +@-2. 
T 0 @Screen control +.sp +@-2. X 0 @Screen +.sp +@-3. TX 0 @Modification +.sp +@0. TX -1 @BAP +.left margin2 +.PARA +This is an interactive program whose primary use is +for managing shotgun sequencing projects, but it can also be used for +handling alignments of other sequences, including those of proteins. +Currently the maximum 'gel reading' length is set to 4096 characters. +Almost all of the information below describes the use of the program for +shotgun projects, but those using the programs for handling other +sequence +alignments should interpret it accordingly. +The data for such a project is stored in a special type of database. The +program + contains the tools that are required to screen gel readings +against vector sequences and restriction sites, and to assemble +new gel +readings into the database (automatically comparing and aligning +them). In addition it contains editors and functions to examine the quality +of the aligned sequences. +.para + There are three main menus: "general", "screen" and "modification", +and some functions have submenus. 
+.left margin2 +.lit + The general menu contains the following options: + + Open a database + Display a contig + List a text file + Direct output + Calculate a consensus + Screen against restriction enzymes + Screen against vector + Check logical consistency + Copy database + Show relationships + set parameters + Highlight disagreements + Examine quality + Check Assembly + Find read pairs + +The graphics menu contains: + + Clear graphics + Clear text + Draw ruler + Use cross hair + Change margins + Label diagram + Plot map + Plot single contig + Plot all contigs + + +The modification menu contains: + + Edit contig + Auto assemble + Join contigs + Complement a contig + Alter relationships + Extract gel readings + Find internal joins + Disassemble readings + Shuffle pads + Auto-select oligos + Double strand + +The alter relationships menu contains: + + Cancel + Line change + Check logical consistency + Remove contig + Shift + Move gel reading + Rename gel reading + Break a contig + Remove a gel reading + Alter raw data parameters + +.END LIT +.SK1 +.para +Overview of the methodology +.para +The shotgun sequencing strategy +.para + In the shotgun sequencing procedure +the sequence to be determined is randomly broken into fragments of +about +1000 nucleotides in length. These fragments are cloned and then +selected randomly and their + + sequences determined. The relationship between any pair of + + fragments is not known beforehand +but is found by comparing their sequences. + + If the sequence of one is found to be wholly or partially contained + + within that of another for sufficient length to distinguish an + + overlap from a repeat then those two fragments can be joined. +The + + process of select, sequence and compare is continued until the +whole + + of the DNA to be sequenced is in one continuous well +determined + + piece.
+ +.para + Definition of a contig + +.para + A CONTIG is a set of gel readings that are related to one + another by overlap of their sequences. All gel readings belong to + a contig and each contig contains at least one gel + reading. The gel readings in a contig can be summed to produce +a continuous consensus sequence and the length of this sequence is +the length of the contig. The rules used to perform this summation are + given under "the consensus algorithm". + At any stage + of a sequencing project the data will comprise a number of +contigs; +when a project is + + complete there should be only one contig and its consensus will be + the finished sequence. Note that since being introduced and +defined as above the word "contig" has been taken up by those involved in +genomic mapping. In that context the consensus with a precise length is, +of course, not +defined. + +.SK1 +.LEFT MARGIN2 +Introduction to the computer method +.LEFT margin2 +.PARA +It is useful to consider the objectives of a sequencing project before +outlining how we use the computer to help achieve them. The aim of a +shotgun sequencing project is to +produce an accurate consensus sequence from many overlapping gel +readings. +It is necessary to know, particularly at the later +stages of the project, how accurate the +consensus sequence is. This enables us to know which regions of the + sequence require further work and also to know when the project is +finished. +To show the quality of the consensus, the programs described here +produce displays like that shown below.
+.sk1 +.lit + + 10 20 30 40 50 + -6 HINW.010 GCGACGGTCTCGGCACAAAGCCGCTGCGGCGCACCTACCCTTCTCTTATA + CONSENSUS GCGACGGTCTCGGCACAAAGCCGCTGCGGCGCACCTACCCTTCTCTTATA + + 60 70 80 90 100 + -6 HINW.010 CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCGCGGACACGTC + -3 HINW.007 GGCACA*GTC + CONSENSUS CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCG-G-ACA-GTC + + 110 120 130 140 150 + -6 HINW.010 GATTAGGAGACGAACTGGGGCG3CGCC*GCTGCTGTGGCAGCGACCGTCG + -3 HINW.007 GATTAG4AGACGAACTGGGGCGACGCCCG*TGCTGTGGCAGCGACCGTCG + -5 HINW.009 GGCAGCGACCGTCG + 17 HINW.999 AGCGACCGTCG + CONSENSUS GATTAGGAGACGAACTGGGGCGACGCC-G-TGCTGTGGCAGCGACCGTCG + + 160 170 180 190 200 + -6 HINW.010 TCT*GAGCAGTGTGGGCGCTG*CCGGGCTCGGAGGGCATGAAGTAGAGC* + -3 HINW.007 TCT*GAGCAGTGTGGGCGCTGC*CGGGCTCGGAGGGCATGAAGTAGAGC* + -5 HINW.009 TCT*GAGCAGTGTGGGCG*T*G*CGGGCTCGGAGGGCATGAAGTAGAGC* + 17 HINW.999 TCTCGAGCAGTGTGGGCGCTG**CGGGCTCGGAGGGCATGAAGTAGAGCG + 12 HINW.017 GTAGAGC* + CONSENSUS TCT*GAGCAGTGTGGGCGCTG-*CGGGCTCGGAGGGCATGAAGTAGAGC* +.END LIT +.para + This is an example showing the left end of a contig from + position 1 to 200. Overlapping this region are gel readings +numbered 6, 3, 5, 17 and 12; +6, 3 and 5 +are in reverse orientation to their original reading (denoted by a minus +sign). Each gel reading also has a name (eg HINW.010). It can be seen that +in a number of places the sequences contain characters other than A,C,G +and +T. Some of these extra characters have been used by the sequencer to +indicate regions of uncertainty in the initial interpretation of the gel +reading, but the asterisks (*) have been inserted by the automatic +assembly function in order to align the sequences. Underneath each 50 +character block of gel reading sequences is the consensus derived from +the +sequences aligned above (the line labelled CONSENSUS). For most of its +length the consensus has a definite nucleotide assignment but in a few +positions there is insufficient agreement between the gel readings and +so a dash (-) appears in the sequence. 
This display contains all the +evidence needed to assess the quality of the consensus: the number of +times +the sequence has been determined on each strand of the DNA, and the +individual nucleotide assignments given for each gel reading. +.para +So the aim is to produce the consensus sequence and, equally important, +a display of the experimental results from which it was derived. +.para +In order to achieve this the following operations need to be performed: +.left margin2 +1) Put individual gel readings into the computer. +This might involve the manual interpretation of autoradiographs +or the transfer and processing of machine-readable files from fluorescent +sequencing machines. +.left margin2 +2) Check each gel reading to make sure it is not simply part of one of the +vectors used to clone the sequence. +.left margin2 +3) Check each gel reading to make sure that those fragments that span +the +ligation point used prior to sonication are not assembled as single +sequences. +.left margin2 +4) Compare all the remaining gel readings with one another to assemble +them +to produce the consensus sequence. +.left margin2 +5) Check the quality of the consensus and edit the sequences. +.left margin2 +6) When all the consensus is sufficiently well determined, produce a copy +of +it for processing by other analysis programs. +.para +It is very unlikely that this procedure will only be passed through once. +Usually steps 1 to 5 are cycled through repeatedly, with step 4 just +adding +new sequences to those already assembled. Generally step 6 is also used +in +order to analyse imperfect sequence to check if it is the one the project +intended to sequence, or to look for interesting features. Analysis of +the consensus, such as +searches for protein coding regions, +can also help to find errors in the sequence.
The display of the +overlapping gel readings shown above can be used to indicate, not only +the +poorly determined regions, but also which clones should be resequenced +to +resolve ambiguities, or those which can usefully be extended or +sequenced +in the reverse direction, to cover +difficult regions. + +.PARA +The original +individual gel readings for a sequencing project are each stored in +separate files. As the gel readings are entered into the computer +(usually in batches, say 10 +from a film), the file names they are given are stored in +a further file, called a file of file names. Files of file names +enable gel readings to be processed in batches. +.para +For each sequencing project +we start a project database. This database has a structure specifically +designed for +dealing with shotgun sequence data. +In order to arrive at the final consensus sequence many operations will +be +performed on the sequence data. Individual fragments must be +sequenced and +compared in both senses (i.e. both orientations) with all the other +sequences. When an overlap between a new gel reading and a contig is +found +they must be aligned and the new gel reading added to the contig. If a +new +gel reading overlaps two contigs they must be aligned and joined. Before +the two contigs are joined one of them may need to be turned around +(reversed and complemented) so they are both in the same orientation. +.para +Clearly, keeping track of all these manipulations is quite complicated, +and to be able to perform the operations +quickly requires careful choice of data +structure and algorithms. For these reasons it is not practicable to store +the gel readings aligned as shown in the display above. Rather, it is more +convenient to store the sequences unassembled, and to record sufficient +information for programs to assemble them during processing. The +data used to assemble the sequences is called relational information.
+.left margin2 +.PARA + The database comprises five files and they are described under the +section entitled "open database". +.PARA +Before entry into the project database +each new gel reading must be compared to look for overlaps +with all the data already contained +within the database. This last point is +important: all searching for overlaps is between individual new gel +readings and the data already in the database. There is no searching for +overlaps between sequences within the database; overlaps must be found +before new gel readings are entered into the database. +.para +Below I give an introduction to how the sequences are processed by +being +passed from one function to the next. +.para +This program is used to start a +database for the project and +then the following procedure is used. +.para +Data in the form of individual gel readings are entered into the computer + +and stored in separate files (possibly using the digitizer + +program GIP). Batches +of these gel readings +are passed to the screening functions in this program to search for overlaps + +with vector sequences (see VEP and "screen against vector") or for matches to + +restriction enzyme sites that should not be + +present ("screen against enzymes"). +Each run of these screening functions passes on only those gel + +readings that do not contain unwanted sequences. Sequences are passed + +via +files of file names and eventually are processed by the automatic +assembly function ("auto assemble"). This function compares each gel +reading with a consensus of all the previous gel readings +stored in the database. +If it finds any +overlaps + it aligns the overlapping sequences by inserting padding characters, +and then adds the new gel reading to the database. +Gels that overlap are added to existing contigs and gels that do not +overlap any data in the database start +new contigs. If a new gel overlaps two contigs they are joined.
+Any gel readings that appear to overlap but which +cannot be aligned sufficiently well are not entered and have +their names written to a file of failed gel reading names. +.PARA +Generally data is entered +into the database in batches as just described. The program + is also used to examine + +the data in the database, to enter gel readings that the automatic + +assembly function cannot align ("auto assemble"), + + and to make final edits. Edits to whole contigs + +can be made using a + mouse-driven editor ("edit contig"). + +.PARA +Editing the sequences is obviously an essential part of managing a + +sequencing project. +Editing is required when new + +sequences are added, when contigs are joined, and when sequences are + +corrected. +A basic part of the strategy + +used here is that new + +gel readings should be correctly aligned throughout their whole length + +when +they are entered into the database, and that when contigs are joined they + +are edited so that they are well aligned in the region of overlap. + + Alignment can be achieved by + +adding padding characters to the sequences, and this is the way "auto + +assemble" +operates when adding new sequences to the database. + +.para +In order to search +for overlaps that may have been missed or may be hidden in the "unused data" +the function "find internal joins" can be used. + +.para +Generally the users need not concern themselves with how the relational +information is used by the program, but it is necessary to know +how contigs are identified. Because contigs are constantly being changed and +reordered the program identifies them by the numbers of the gel readings +they contain. Whenever users need to identify a contig they need only +know +the number or name of one of the gel readings it contains. Whenever the +program asks users to identify a contig or gel reading they can type its +number or its archive name. 
If they type its archive name they must precede +the name by a slash "/" symbol to denote that it is a name rather than a +number. E.g., if the archive +name is fred.gel with number 99, users should +type /fred.gel or 99 when asked to identify the contig. Generally, + when it asks for the gel reading to be identified, +the program will offer the user a default name, + and if the user types only return, that +contig will be accessed. When a database is opened the default contig will +be the longest one, but if another is accessed, it will subsequently become +the current default. +.para +Further information is located in the following places. +The database files are described under "open database". The format +for +vector and consensus sequences is given under "calculate a consensus", as are +the +uncertainty codes used in gel readings. +.left margin2 +.para + The digitizer program +is used for the initial input of gel readings +and for writing a file of file names. The program +uses a digitizer for data entry. +A digitizer is + a two dimensional surface such as a light box +arranged so that if a special pen is pressed onto it, the pen's +coordinates are recorded by a computer. +These coordinates + can be interpreted by a program. +.para + In order to read an autoradiograph placed on the light box +the user need only define the bottom of +the four sequencing lanes and the bases + to which they correspond and then use the pen to point to each + successive band progressing up the gel. The program examines +the + coordinates of each pen position to see in which of the four +lanes + it lies and assigns the corresponding base to be stored in the + computer. Each time the pen tip is depressed to point to a position + on the surface of the digitizer the program sounds the bell on the + terminal to indicate to the user that a point has been recorded. As + the sequence is read the program displays it on the screen. + +.left margin1 +@17.
TX 1 @Screen against enzymes +.left margin2 +.PARA +Used to compare gel readings against any restriction enzyme recognition + +sequences that may have been used during cloning and which should not + +be present in the data. Works on single gel readings or processes batches + +accessed through files of file names. The algorithm looks for exact + +matches to recognition sequences stored in a file. + +.para +The file containing the recognition sequences must be identified. The +user +must choose between employing a file of file names, or typing in the + + +names of individual gel reading files. If a file of file names is used the + + +program will also create a new file of file names. When the option has + +finished operating this new file will contain the names of all those gel + +readings that did not match any of the recognition sequences. Hence it + can +be used for further processing of the batch. The recognition sequences + +should be stored in a simple text file with one recognition sequence per + +line. +.left margin1 +@18. TX 1 @Screen against vector +.left margin2 +.PARA +Used to compare gel readings against any vector sequences that may have + +been picked up during cloning and which have not been removed by vep. +It works on single gel readings or processes + +batches accessed through files of file names. The algorithm looks for +exact +matches of length "minimum match length" and displays the overlapping + +sequences. +.para +The file containing the vector sequence must be identified. The user must + +choose between employing a file of file names, or typing in the names of + +individual gel reading files. If a file of file names is used the program +will +also create a new file of file names. When the option has finished + +operating this new file will contain the names of all those gel readings + +that did not match the vector sequence. Hence it can be used for further + +processing of the batch.
The vector sequence should be stored in a simple + +text file with up to 80 characters of data per line. More than one vector + +can be stored in a single file. If so each should be preceded by a 20 + +character title of the form <---m13mp8.0001----> where the < and > + signs +and the number like .0001 are obligatory. The number must be preceded + +by a dot (.) and be 4 digits long. The total sequence in the file must be < + +500,001 characters in length. + +.left margin1 +@20. TX 3 @Auto assemble +.left margin2 +.PARA +Compares gel readings against the current contents of the database and + +produces alignments. In its normal mode of operation +("entry permitted"), the function +will automatically enter the gel readings into the database. +.para +New assembly suboption. +However +if entry is not permitted the reads won't be entered but the program +will produce alignments and (optionally) save each reading name and its best +alignment score (percentage mismatch) in a file. When used in +this mode, the program will include in the alignment the poor quality data +for each reading. These files of names can then be sorted into score order +and then used for assembly, hence forcing the readings that align best to +be entered into the database first. +End of new suboption. +.para +The routine works on + +single gel readings or processes batches of gel readings accessed through + +files of file names. It is the only way to enter data into the database. + +.para +The function will check the database for logical consistency and will + only +proceed if it is OK. Choose if gel readings should be entered into the + +database, or if they should only be compared. Choose between using a file + +of file names or typing file names on the keyboard. If so selected, supply + +the file of file names. Also supply a file of file names to contain the names of + +all the gel readings that fail to get entered. +Select the entry mode. 
Normal assembly is appropriate for all but special +cases, as is "permit joins". Uses for the other modes are not documented +here. +Define a minimum initial + +match length. +Define the maximum number + +of padding characters allowed to be used in each gel reading to help + +achieve alignment, and the same for the number allowed in the contig for + +each gel reading. Finally define the maximum percentage mismatch to +be allowed for any gel reading to be entered into the database. If + +for any gel reading, either of these last three values is exceeded the gel + +reading will not be entered into the database. + +.para +In operation the function takes a batch of gel readings (probably passed + + on as a file of file names from one of the screening routines) and +enters them into a + database for a sequencing project. It takes each gel reading + in turn, + compares it with the current consensus for the database, it then + produces an alignment for any regions of the consensus it + overlaps; if this alignment is sufficiently good it then edits + both the new gel reading and the sequences it overlaps and adds +the + new gel reading to the database. The program then updates the +consensus + accordingly and carries on to the next gel reading. +.para + All alignments are displayed and any gel readings +that do match but that + + cannot be aligned sufficiently well have their names written to a + file of failed gel reading names. The function works without any + + user intervention and can process any number of gel readings in a + single run. Those gel readings that fail can be recompared using + + the same function (to find the current overlap position) and the + +user can enter them into the database + + using the "put all readings in new contigs" +assembly option and then joined using "join contigs". +.para +Typical dialogue and output from the function is shown below. (Note that +output for gel readings 2 - 9 has been deleted to save space). 
+.lit +Automatic sequence assembler +Database is logically consistent +? (y/n) (y) Permit entry +? (y/n) (y) Use file of file names +? File of gel reading names=demo.nam +? File for names of failures=demo.fail +Select entry mode +X 1 Perform normal shotgun assembly + 2 Put all sequences in one contig + 3 Put all sequences in new contigs +? Selection (1-3) (1) = +? (y/n) (y) Permit joins +? Minimum initial match (12-4097) (15) = +? Maximum pads per gel (0-25) (8) = +? Maximum pads per gel in contig (0-25) (8) = +? Maximum percent mismatch after alignment (0.00-15.00) (8.00) = + >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> + Processing 1 in batch + Gel reading name=HINW.004 + Gel reading length= 283 + Searching for overlaps + Strand 1 + Strand 2 + No matches found + Total matches found 1 + Padding in contig= 0 and in gel= 1 + Percentage mismatch after alignment = 1.8 + Best alignment found + 1 11 21 31 41 51 + TTTTCCAGCG TGCGTCTGAC GCTGTCTTGC TTAATGATCT CCATCGTGTG CCTAGGTCTG + ********** ********** ********** ********** ********** ********** + TTTTCCAGCG TGCGTCTGAC GCTGTCTTGC TTAATGATCT CCATCGTGTG CCTAGGTCTG + 1 11 21 31 41 51 + 61 71 81 91 101 111 + TTGCGTTGGG CCGAGCCCAA CTTTCCCAAA AACGTATGGA TCTTACTGAC GTACA-GTTG + ********** ********** ********** ********** ********** ***** **** + TTGCGTTGGG CCGAGCCCAA CTTTCCCAAA AACGTATGGA TCTTACTGAC GTACACGTTG + 61 71 81 91 101 111 + 121 131 141 151 161 171 + CTTACCAGCG TGGCTGTCAC GGCGTCAGGC TTCCACTTTA GTCATCGTTC AGTCATTTAT + ********** ********** ********** ********** ********** ********** + CTTACCAGCG TGGCTGTCAC GGCGTCAGGC TTCCACTTTA GTCATCGTTC AGTCATTTAT + 121 131 141 151 161 171 + 181 191 201 211 221 231 + GCCATGGTGG CCACAGTGAC G-TATTTTGT TTCCTCACGC TCGCTACGTA TCTGTTTGCC + ********** ********** * ******** ********** ********** ********** + GCCATGGTGG CCACAGTGAC GCTATTTTGT TTCCTCACGC TCGCTACGTA TCTGTTTGCC + 181 191 201 211 221 231 + 241 251 261 271 281 + CGCG--GTGG AATTACAGCG TTCCCTATTG 
ACGGGCGCAT CCAC + **** **** ********** ** * ***** ********** **** + CGCGACGTGG AATTACAGCG TT,CDTATTG ACGGGCGCAT CCAC + 241 251 261 271 281 + Batch finished + 9 sequences processed + 0 sequences entered into database + 0 joins made + +.end lit + +.para +Note that "auto assemble" cannot align protein sequences. +.left margin1 +@28. TX 1 @Highlight disagreements +.left margin2 +.para +Used in the later stages of a project +to highlight disagreements between individual gel readings +and their consensus sequences. This display is also available in the +contig editor. +Characters that agree with the + +consensus are shown as : symbols for the plus strand and . for the minus + +strand. Characters that disagree with the consensus are left unchanged + +and so stand out clearly. The results of this analysis are written to a +file. + +.para +Before selecting this option, create a file of the display of the contig to +be +"highlighted". The option will ask for the name of this file. Select + symbols +to denote "agreeing" characters on each strand; the defaults are : and ., + +but any others can be used. Supply the name of a file in which to put + +the output. +.para +The display file needed as input for this option is created by selecting + +"Redirect output", followed immediately by "display contig", and then +"Redirect output" again. The + +cutoff score used in the consensus calculation can be set by option "set + +display parameters". Note that for the highlight function +there is a limit of 50 for the number of gel +readings that are aligned at any position - ie the contig must be less +than 51 gel readings deep at its thickest point. I hope that those performing +shotgun sequencing never reach this limit, but those using the program for +comparing sequence families might. +.para +Typical output from this function is shown below.
+.lit + + 210 220 230 240 250 + 1 HINW.004 :C::::::::::::::::::::::::::::::::::::::::::AC:::: + 7 HINW.018 :*::::::::::::::::::::::::::::::::::::::::::CA:::: + -4 HINW.017 ...............AC.... + G-TATTTTGTTTCCTCACGCTCGCTACGTATCTGTTTGCCCGCG--GTGG + + 260 270 280 290 300 + 1 HINW.004 ::::::::::::*:D::::::::::::::::::: + 7 HINW.018 ::::::::::::::::::::CA:::::T:*:::*::::::::::::CA: + -4 HINW.017 ..............................................A... + 3 HINW.009 :::::::::::::::V::::::::::::::::::::::::::::*AV::: + -6 HINW.028 ......................A... + AATTACAGCGTTCCCTATTGACGGGCGCATCCACGCTGATTCTCTT-CTG + +.end lit +.left margin1 +@32. TX 3 @Extract gel readings +.left margin2 +.para +Used to make copies of the aligned gel readings in a database, +to write them into separate files, and to write a + +corresponding file of file names. It operates in two modes: either all gel + +readings are extracted, or only those at the ends of contigs. + +.para +Choose which mode of operation is required and supply a file of file + +names. +.para +The gel readings are given their original + +names. +.para +If the option is used to extract all the gel readings from a database, a + +subsequent run of "auto assemble" can reconstitute a database which has + +been corrupted. This rarely occurs and is usually necessitated by a + +user employing "alter relationships" incorrectly without first having + +made a copy. +.left margin1 +@1. TX 0 @Help +.left margin2 +.PARA +Help is available on the following topics : + +.LEFT MARGIN1 +@2. TX 0 @Quit +.LEFT MARGIN2 +.PARA +This command stops the program and is the only safe way to terminate a + +run +of the program that has altered the contents of the database in any way. + +.left margin1 +@3. TX 1 @Open a database +.LEFT MARGIN2 +.PARA +Opens existing databases or allows new ones to be started. The function + is +automatically called into operation +when the program is started but can also be selected + +from the general menu. 
+.para +Choose to open an existing database or start a new one, or if ! is typed +when the program is first started, enter the program without opening a +database. Supply a project + +database name, and if it already exists, the "version". If starting a new + +database define the database size and if it is for DNA or protein sequences. +The database size is an initial size for the database. It can be increased +later during the project. It is the sum of the number of gel +readings plus the number of contigs. The current maximum size is 8000. +.para +Database names can have from one to 12 letters and must not include full + +stop (.). The database is made from five separate files. If the database + is +called FRED then version 0 of database FRED comprises files FRED.AR0, + +FRED.RL0, FRED.SQ0, FRED.TG0 and FRED.CC0. The version is the last symbol in the file names. + +Only this program + can read these files. If the "copy database" option is used it + +will ask the user to define a new "version". +.para +For normal use the maximum gel reading length is set to 512 characters, + +but when a database is started the user may choose lengths of either + + 512, +1024, 1536..., 4096. Normally the program is used to handle DNA + +sequences but many of the functions also work on protein sequences. The + +choice of sequence type is made when the database is started. + +.para +The contigs are not stored on the disk as the user sees them displayed on + +the screen. Each gel reading is stored with sufficient information about + +how it overlaps other gel readings so that the program can work out how + +to +present them aligned on the screen. We refer to this extra data as "the +relationships" and it is explained below. + +The database comprises 5 separate files. + +.left margin2 + 1. a working version of each gel reading. 
This is the version of + the gel reading +that is in the database and initially it is an exact copy of + the original sequence (known as the archive) + but it is edited and manipulated to align it + with other gel readings. + +.left margin2 + 2. the file of relationships. This file contains all of the + + information that is required to assemble the working versions +into + + contigs during processing; any manipulations on the data use this + + file and it is automatically updated at any time that the + + relationships are changed. The information in this file is as + + follows: +.left margin2 + (A) Facts about each gel reading and its relationship to +others +("gel + + descriptor lines"): + +.left margin2 + (a) the number of the gel +reading (each gel reading is given a number as it is + + entered into the database) + +.left margin2 + (b) the length of the sequence from this gel reading + +.left margin2 + (c) the position of the left end of this gel +reading relative to the left + + end of the contig of which it is a member + +.left margin2 + (d) the number of the next gel +reading to the left of this gel reading + +.left margin2 + (e) the number of the next gel reading to the right + +.left margin2 + (f) the relative strandedness of this gel +reading , ie whether it is in + + the same sense or the complementary sense as its archive. + +.left margin2 + (B) Facts about each contig ("contig descriptor lines"): + +.left margin2 + (a) the length of this contig + +.left margin2 + (b) the number of the leftmost gel +reading of this contig + +.left margin2 + (c) the number of the rightmost gel reading of this contig. + +.left margin2 + (C) General facts: + +.left margin2 + (a) the number of gel readings in the database + +.left margin2 + (b) the number of contigs in the database. + +.left margin2 + 3. the file of archive names. This is simply a list of the names + + of each of the archive files in the database. + +.left margin2 + 4. the file of tags (annotation). 
+This consists of linked lists of tag information for each sequence in the +database. +Tags are created by the user as annotation, or by xdap as records of edits or +for storing cutoff information. +As the number of tags can grow without limit, so can this file. +For each gel there is a header record, which contains the record number of +the start of the linked list for that gel. On line IDBSIZ there is a record +containing information about the file such as its present length and whether there +are any free "tag" slots to be reused in the file. + + 5. the file of comments (annotation). +This consists of linked lists of comment fragments. +Comments are created by the user as a message attached to annotation, +or by the system to store cutoff information. +Comments are character strings of any length. +Comments longer than 40 characters are broken up into fragments, each 40 +characters long, and are chained together in a linked list. +As the number of comments can grow without limit, so can this file. + +.para + Structure of the database files +.para + 1. The file of relationships +.para + The file contains IDBSIZ lines of data: + the general data are stored on line IDBSIZ; data about gel +readings are + stored from line 1 downwards; data about contigs are stored from + line IDBSIZ-1 upwards. A database of 500 lines containing 25 gel +readings and 4 contigs would have a file + of relationships as shown below.
+.lit + + + --------------------------------------------- + 0 Info about the database size + 1 Gel descriptor record + 2 " " " + 3 " " " + 4 " " " + 5 " " " + ' ' ' ' + ' ' ' ' + 25 " " " + 26 Empty record + ' ' ' + + ' ' ' + 495 ' ' + 496 Contig descriptor record + 497 " " " + 498 " " " + 499 " " " + 500 Number of gel readings=25, Number of contigs=4 + --------------------------------------------- + + The arrangement of the data in the file of relationships + +.end lit +As each new gel reading is added into the database a new line is added + to the end of the list of gel descriptor + lines. If this new gel reading does not + overlap with any gel readings + already in the database a new contig line is + added to the top of the list of contig lines. If it overlaps with + one contig then no new contig line need be added but if it overlaps + with two contigs then these two contigs must be joined and the + number of contig lines will be reduced by one. Then the list of +contig + lines is compressed to leave the empty line at the top of the list. + Initially the two types of line will move towards one another but + eventually, as contigs are joined, the contig descriptor lines will + move in the same direction as the gel descriptor + lines. At the end of a + project there should be only one contig line. The database is thus + capable of handling a project of 998 gels. +.para + 2. Structure of the working versions file +.para + The working versions of gel readings are stored in a file of + NGELS lines each containing MAXGEL characters. Gel reading +number 1 is stored on line + 1, gel reading number 2 on line 2 and so on. NGELS is the +current number of readings and MAXGEL the maximum reading length. +.para + 3. Structure of the archive names file +.para + This file has NGELS lines of 16 characters. + +.para + 4. Structure of the tag file +.para +This file initially starts with IDBSIZ lines, and is expanded as new tags are +created. 
+Information about the length of the file, and which tag records are reusable +is stored on line IDBSIZ. +A database of 500 lines would have a file of tags as shown below. +.lit + + --------------------------------------------- + 1 Tag descriptor record + 2 " " " + 3 " " " + 4 " " " + 5 " " " + ' ' ' ' + ' ' ' ' + 497 " " " + 498 " " " + 499 " " " + 500 Length of file=N, Free list=0 + 501 Tag record + 502 " " + 503 " " + ' ' ' + ' ' ' + N-2 " " + N-1 " " + N Tag record + --------------------------------------------- + + The arrangement of the data in the tag file + +.end lit +As each new tag is added to the database, a check is made in the +file descriptor record at line IDBSIZ. If the list of reusable records is empty (0), +the file is extended by one line. Otherwise the new tag is assigned to +the record at the head of the free list. +When tags are deleted, they are added to the free list in the file descriptor +record. +.para + 5. Structure of the comment file +.para +This file initially starts with 1 line, and is expanded as new annotation is +created. +Information about the length of the file, and which comment records are reusable +is stored on the first line. +.lit + + --------------------------------------------- + 1 Length of file=N, Free list=0 + 2 Comment fragment + 3 " " + 4 " " + ' ' ' + ' ' ' + N-2 " " + N-1 " " + N Comment fragment + --------------------------------------------- + + The arrangement of the data in the comment file + +.end lit +As each new comment is added to the database, a check is made in the file +descriptor record at line 1. If the list of reusable records is empty (0), +the file is extended to hold the new comment. Otherwise the new comment is +assigned to records starting with the head of the free list. +When comments are deleted, the discarded records are added to the free list in +the file descriptor record. +.para + There are various checks within the programs to + protect users from themselves:- +.left margin2 + 1.
All user input is checked for errors - e.g. reference to + non-existent gel +readings or contigs, incorrect positions in the + contig or gel readings. +.left margin2 + 2. Before entering a gel reading the system checks to see if a + file of the same name has already been entered. +.left margin2 + 3. Join will not allow the circularising of a contig. + +.left margin2 +5. Users may escape from any point in the program. +.left margin2 +6. Help is available from all points in the program. +.SK2 +.LEFT MARGIN2 +IT IS ESSENTIAL THAT USERS DO NOT KILL THE PROGRAM WHILE IT IS +DOING +ANYTHING THAT INVOLVES CHANGING THE CONTENTS OF THE +DATABASE. I.E DURING AUTO ASSEMBLE, +COMPLETE JOIN, COMPLEMENT CONTIG, SAVE EDIT CONTIG. + +This could +corrupt the database so badly that it is impossible to fix. The program +should always be left using the QUIT option. + +.left margin1 +@4. TX 3 @Edit contig +.LEFT MARGIN2 +.PARA +The Contig Editor is a mouse-driven editor that can insert, +delete and change gel reading sequences. +.para +The Contig Editor allows scrolling from one end of a contig to the other +using the scroll bar and scroll buttons. Action of mouse button presses +when the mouse pointer is in the scroll bar: +.sk1 +.lit + Middle Mouse Button Set editor position + Left Mouse Button Scroll forward one screenful + Right Mouse Button Scroll backwards one screenful +.end lit +.sk1 +The four scroll buttons operate as follows: +.sk1 +.lit + "<<" Scroll left half a screenful + "<" Scroll left one character + ">" Scroll right one character + ">>" Scroll right half a screenful +.end lit +.para +The Editor cursor can be positioned anywhere in the edit window by +moving the mouse pointer over the character of interest, then pressing the +left mouse button. The Editor cursor can also be moved by using the +direction arrow keys. +.para +The editor operates in two main edit modes - Replace and Insert. Replace allows +a character to be replaced by another. 
Insert allows characters to be +inserted into a gel reading sequence. Characters are entered by typing +them from the keyboard. Only valid characters are permitted. +Characters can be deleted by positioning the cursor one character to the right, +then pressing the delete key. +Normally Insert and Delete apply to the consensus line of the contig ONLY. +This restriction can be overridden by using the "Super Edit" mode of +operation, THOUGH IT IS NOT RECOMMENDED. +.para +Edits can also be performed on the consensus, though they are +restricted to insertion and deletion of padding characters ("*"). +These edits also have special meanings. +A deletion will delete ALL characters at the position to the left +of the cursor in the contig, and move the relative positions of all +sequences starting to the right of the cursor position left one +character. +An insertion will insert the character typed ("*") into ALL gel +reading sequences at the cursor's position in the contig, and move the +relative positions of all sequences starting to the right of the cursor +position right one character. +.para +The effect of the last edit can be undone by pressing the "Undo" button +at the top of the editor window. +.para +The cursor will automatically be positioned at the next problem when the +"Find Next Problem" button is selected. The next problem is where the +consensus shows either an ambiguity ("-") or a pad ("*") character. +.para +The edits to the contig can be saved by pressing the "Leave Editor" +button and replying "Yes" to the prompt to "Save changes?". As no changes +are made to the working copy of your database until this point, it +is possible to abort the editor if +the edit session ends up in an unsatisfactory state (ie if you've +stuffed it up!).
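As a rough illustration of the "Find Next Problem" rule described above (stop where the consensus shows "-" or "*"), the following Python sketch scans a consensus string. It is not taken from the package source: the function name find_next_problem, the 0-based indexing, and the choice to start just to the right of the cursor without wrapping round are all assumptions made for this example.

```python
def find_next_problem(consensus, cursor):
    """Return the index of the next problem to the right of the cursor.

    A "problem" is a consensus character that is an ambiguity ("-")
    or a pad ("*"). Indices are 0-based here (an assumption; the
    editor itself displays 1-based contig positions). Returns None
    when no further problem exists.
    """
    for i in range(cursor + 1, len(consensus)):
        if consensus[i] in "-*":
            return i
    return None


# Hypothetical consensus fragment; a cursor of -1 means "before the first base".
fragment = "GCC-G-TGC"
first = find_next_problem(fragment, -1)
second = find_next_problem(fragment, first)
```

Calling the function repeatedly with the previous result as the new cursor walks through the problems one at a time, much as repeated presses of the button would.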
+.left margin1 +.sk3 +Displaying Traces +.left margin2 +.para +The original data from which the gel reading sequences were derived can +be seen by double clicking (two quick clicks) with the middle mouse button +on the area of interest. The trace will be displayed with the point +clicked at the centre of the trace viewport. +.para +All traces that are displayed are maintained in one window, called the Trace +Manager. The Trace Manager displays at most four traces. When four +traces are already being managed and a new one is requested, the one at the top +of the Trace Manager is removed and the new one is added to the bottom. +Traces can be removed individually by using the "quit" button in the panel next +to the trace. +.left margin1 +.sk3 +Extending Reads Using Cutoff Information +.left margin2 +.para +Sequence data read in from the trace files of automated fluorescent sequencing machines +and processed through the program ted +will have the discarded sequence (vector at start and poor read at +end) available to the contig editor. To display the cutoff +information, press the "Display Cutoff" button at the top of the +editor window. +The cutoff sequence appears in grey. This sequence can be incorporated +into the editable sequence by moving the cutoff position. This is +done by positioning the cursor at the end of the gel sequence, and +using Meta-Left-Arrow and Meta-Right-Arrow to adjust the point of cutoff. +The Meta key is a diamond on the Sun keyboard. +.left margin1 +.sk3 +Pop-up menu +.left margin2 +.para +A pop-up menu is revealed by depressing the "Control" key on the keyboard +and at the same time pressing the left mouse button.
The menu has the following +functions: +.lit + + Search + Highlight Disagreements + Save Contig + Create Tag + Edit Tag + Delete Tag + Select Oligo + +.end lit +.left margin2 +"Highlight Disagreements" simply toggles between the normal display showing +the current base assignments and one in which only those assignments that +differ from the consensus are shown. + +.left margin2 +"Save Contig" is described above. +Searching and operations on tags are described below. +.left margin2 +.sk3 +Searching +.left margin2 +.para +Selecting "Search" brings up a +window which can remain present during normal editor operation. The +window allows the user to select the direction of search, the type of +search and a value to search on. The value is entered into the value +text window. Pressing the "search" button then +performs the search. If successful, the cursor is positioned and +centred accordingly. An audible tone indicates failure. Pressing the +"ok" button removes the search window. The search window is +automatically removed when the contig editor is exited. +.sk1 +There are seven different search modes: +.sk1 +1. Search by position +.sk1 +This positions the cursor at the numeric position specified in the +value text window. Eg a value of "1234" causes the cursor to be placed +at base number 1234 in the contig. Positioning within a gel reading is +achieved by prefixing the number with the "@" character, eg "@123" +positions the cursor at base 123 of the sequence in which the cursor +lies. Relative positions can be specified by prefixing the number with +a plus or minus character. Eg "+1234" will advance the cursor 1234 +bases. If possible, the cursor is positioned within the same sequence. +The direction buttons have no effect on the operation of "search +by position". +.sk1 +2. Search by reading name +.sk1 +This positions the cursor at the left end of the gel reading specified +in the value text window.
If the value is prefixed with a slash it is +assumed to be a gel reading name. Otherwise it is assumed to be a gel +reading number. Eg "123" positions the cursor at the left end of gel +reading number 123. "/a16a12.s1" positions at the start of reading +a16a12.s1. If the value was "/a16" the cursor is positioned at the +first reading which starts with "a16". The direction buttons have no +effect on the operation of "search by reading name". +.sk1 +3. Search by tag type. +.sk1 +This positions the cursor at the start of the next tag which has +the same type as specified by the type value menu. To change the type, +select from the menu that pops up when the mouse is clicked on the +button labeled "Type:". The search can be performed either forwards +or backwards from the current cursor position. To find all tags, use +"search by annotation", with a null text value string. +.sk1 +4. Search by annotation. +.sk1 +This positions the cursor at the start of the next tag which has a +comment containing the string specified in the value text window. The +search performed is a regular expression search, and certain +characters have special meaning. Be careful when your value string +contains ".", "*", "[", "^" or "$". The search can be performed either +forwards or backwards from the current cursor position. +.sk1 +5. Search by sequence. +.sk1 +This positions the cursor at the start of the next piece of sequence +that matches the value specified in the text value window. The search +is for an exact match, which means the case of the value string is +important. The search is performed on the gel readings themselves, +rather than the consensus sequence. The search can be performed either +forwards or backwards from the current cursor position. +.sk1 +6. Search by problem. +.sk1 +This positions the cursor at the next place in the consensus sequence +which is not an "A", "C", "G" or "T". The search can be performed +either forwards or backwards from the current cursor position. +.sk1 +7.
Search by quality +.sk1 +This positions the cursor at the next place in the consensus sequence +where the consensus calculations for the two strands disagree. When only +sequences on one strand are present, the search will stop at every +base. The search can be performed either forwards or backwards from the +current cursor position. +.left margin1 +.sk3 +Annotation +.left margin2 +.para +Parts of a sequence can be annotated, to record the positions of primers used +for walking, or to mark sites, such as compressions that have caused problems +during sequencing. +The consensus sequence CANNOT be annotated. +.para +To annotate a piece of sequence first select the part of sequence +using the mouse buttons. Use the left mouse button to position the start of the +selection, and while this button is being held down, move the mouse to extend. +The selection can be extended further using the right mouse button. +.para +To create annotation, invoke the pop-up menu, and select the "Create Tag" +function. A small "tag editor" will appear which +allows you to select the type of the +annotation from a pull-down menu, and specify a comment if desired. +To select a new type pull down the Type menu, and select the entry desired. +To enter a comment, simply type into the text window in the tag editor. +The annotation is created when the "Leave" button on the tag editor is pressed, +and is displayed in the colour defined in the tag database file (TAGDB). +.para +To edit existing annotation, +position the cursor with the left mouse button +on the tag, and select +"Edit Tag" +from the pop-up menu. +This invokes the tag editor, and changes to the type and comment of the +annotation can be made. The tag is updated when the "Leave" button is pressed. +.para +To delete an existing annotation, +position the cursor with the left mouse button +on the tag, and select +"Delete Tag" +from the pop-up menu.
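The value syntax accepted by "search by position" earlier in this section (a plain number, an "@" prefix, or a leading plus or minus) can be sketched in Python as follows. This is only an illustration, not the editor's own code: the function name resolve_position and its arguments are hypothetical, positions are taken to be 1-based, and reading_start is assumed to be the contig position of base 1 of the reading under the cursor, with strand ignored.

```python
def resolve_position(value, contig_pos, reading_start):
    """Translate a "search by position" value into a contig position.

    value         -- text typed into the value window, e.g. "1234"
    contig_pos    -- current cursor position in the contig (1-based)
    reading_start -- contig position of base 1 of the reading under
                     the cursor (an assumption of this sketch)
    """
    if value.startswith("@"):
        # "@123": base 123 of the reading in which the cursor lies
        return reading_start + int(value[1:]) - 1
    if value[0] in "+-":
        # "+1234" or "-1234": move relative to the current cursor
        return contig_pos + int(value)
    # "1234": absolute base number in the contig
    return int(value)


# Hypothetical cursor at contig base 10, inside a reading starting at base 100.
absolute = resolve_position("1234", 10, 100)
in_reading = resolve_position("@123", 10, 100)
relative = resolve_position("+1234", 10, 100)
```

A real editor would also need to reject malformed values and clamp the result to the ends of the contig; both are left out here to keep the sketch short.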
+.left margin1 +.sk3 +NOTE: +.left margin2 +.para +As the Contig Editor is a very powerful tool, it is possible that the alignment +of the gel reading sequences has unexpectedly been disrupted. +This can easily happen to parts of the contig that lie to the right +of the screen if excessive use has been made of the "Super Edit" facility. +Until familiar with "Super Edit" it would benefit the sequencer to quickly +scan through the contig after editing to check that bad alignments have not +been created. +.sp +.left margin2 +Selecting Oligos +---------------- +.sk1 +.left margin2 +1. Open the oligo selection window, by selecting "Select Oligo" from +the contig editor popup menu. + +.left margin2 +2. Position the cursor to where you want the oligo to be chosen. While +the oligo selection window is visible, you will still have complete +control over positioning and editing within the contig editor. + +.left margin2 +3. Indicate the strand for which you require an oligo. This is done by +toggling the direction arrow ("----->" or "<------"), if necessary. + +.left margin2 +3. Press the "Find Oligos" button to find all suitable oligos (See +"Oligo selection" below.) Information for the closest oligo to the +cursor position is given in the output text window. In the contig +editor the position of the oligo is marked by a temporary tag on the +consensus. The window is recentered if the oligo is off the screen. +Selecting "Display Selection Information" will print a short report on +the numbers of oligos considered and rejected during oligo selection. + +.left margin2 +4. If this oligo is not suitable (it may have been previously chosen, +and found to be unsuitable by experimentation, say), the next closest +oligo can be viewed by pressing "Select Next". + +.left margin2 +5. Suitable templates are automatically identified for the currently +displayed oligo (See "Template selection" below.) By default, the +template is that closest to the oligo site. 
If the choice is not +suitable (it may be known to be a poor quality template, say) another +can be chosen from the "Choose Template for this Oligo" menu. +Templates that do not appear on the menu can be specified by selecting +"other". However, the template must be on the correct strand and be +upstream of the oligo. + +.left margin2 +6. A tag can be created for the current oligo by pressing the button +"Create a tag for this oligo". The annotation for this tag holds the +name of the template and the oligo primer sequence. There are fields +to allow the user to specify their own primer name ("serial#") and +comments ("flags") for this tag. An example of oligo tag annotation: +.lit + serial#= + template=a16a9.s1 + sequence=CGTTATGACCTATATTTTGTATG + flags= + +.end lit +.left margin2 +7. The oligo selection window is closed when "Create a tag for this +oligo" or "Quit" is selected. + + +.left margin2 +Oligo selection: +.left margin2 +---------------- + +.left margin2 +The oligo selection engine is the one used in the program OSP. It is +described in some detail in: + +.left margin2 + Hillier, L., and Green, P. (1991). "OSP: an oligonucleotide + selection program," PCR Methods and Applications, 1:124-128. + +.left margin2 +The parameters controlling the selection of oligos can be changed in +the "Oligo Selection Parameters" window. The weights controlling the +scoring of selected oligos can be changed in the "Oligo Selection +Weights" window. + +.left margin2 +By default, the oligos are selected from a window that extends 40 +bases either side of the cursor. The size and location of this window +relative to the cursor position can be changed in the "Parameters" +window. + +.left margin2 +In xbap oligos are ranked according to their proximity to the cursor +position, rather than by their scores. + + +.left margin2 +Template selection: +.left margin2 +------------------- + +.left margin2 +For simplicity, each reading is considered to represent a template. 
In +practice, many readings can be made of the same template. Suitable +templates that are identified are those that: +.lit + + 1. are in the appropriate sense, + 2. have 5' ends that start upstream of the oligo, +and 3. are sufficiently close to the oligo to be useful. + +.end lit +.left margin2 + +This last criterion relates to the insert size for the subclones used +for sequencing and the average reading length. A template is +considered useful if a full reading can be made from it, taking into +account both of these factors. The default insert size is 1000 bases, +and the default average reading length is 400 bases. These values can +be changed in the "Parameters" window. + +.left margin1 +@5. TX 1 @Display a contig +.LEFT MARGIN2 +.para +Used to show the aligned gel readings for any part of a contig. The + +number, name and strandedness of each gel reading are shown and the + +consensus is written below. +.para +If required identify the contig, and then the start and end points of the + +region to display. +.para +The display can be directed to a disk file using "direct output to disk". + +.para + Below is an example showing the left end of a contig from + position 1 to 200. Overlapping this region are gels 6, 3, 5, 17 and 12; +6, 3 and 5 +are in reverse orientation to their archives (denoted by a minus sign). + There are a few uncertainty codes and a few padding + characters in the working versions, but the consensus (shown +below + each page width) has a definite assignment for almost every +position.
+.lit + + 10 20 30 40 50 + -6 HINW.010 GCGACGGTCTCGGCACAAAGCCGCTGCGGCGCACCTACCCTTCTCTTATA + CONSENSUS GCGACGGTCTCGGCACAAAGCCGCTGCGGCGCACCTACCCTTCTCTTATA + + 60 70 80 90 100 + -6 HINW.010 CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCGCGGACACGTC + -3 HINW.007 GGCACA*GTC + CONSENSUS CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCG-G-ACA-GTC + + 110 120 130 140 150 + -6 HINW.010 GATTAGGAGACGAACTGGGGCG3CGCC*GCTGCTGTGGCAGCGACCGTCG + -3 HINW.007 GATTAG4AGACGAACTGGGGCGACGCCCG*TGCTGTGGCAGCGACCGTCG + -5 HINW.009 GGCAGCGACCGTCG + 17 HINW.999 AGCGACCGTCG + CONSENSUS GATTAGGAGACGAACTGGGGCGACGCC-G-TGCTGTGGCAGCGACCGTCG + + 160 170 180 190 200 + -6 HINW.010 TCT*GAGCAGTGTGGGCGCTG*CCGGGCTCGGAGGGCATGAAGTAGAGC* + -3 HINW.007 TCT*GAGCAGTGTGGGCGCTGC*CGGGCTCGGAGGGCATGAAGTAGAGC* + -5 HINW.009 TCT*GAGCAGTGTGGGCG*T*G*CGGGCTCGGAGGGCATGAAGTAGAGC* + 17 HINW.999 TCTCGAGCAGTGTGGGCGCTG**CGGGCTCGGAGGGCATGAAGTAGAGCG + 12 HINW.017 GTAGAGC* + CONSENSUS TCT*GAGCAGTGTGGGCGCTG-*CGGGCTCGGAGGGCATGAAGTAGAGC* +.END LIT +.left margin1 +@6. TX 1 @List a text file +.LEFT MARGIN2 +.PARA +This option allows users to list text files on the screen. It can be used +to read a file containing notes, for checking files written to disk etc. The +user is asked to type the name of the file to list. +.left margin1 +@8. TX 1 @Calculate a consensus +.LEFT MARGIN2 +.para + Calculates a consensus sequence either for the whole database or + +for selected contigs. The consensus is written to a file named by the + user. +.left margin2 +Supply a file name, choose between whole database or selected contigs. +.para + Symbols for uncertainty in gel readings +.para +In order to record uncertainties when reading gels the codes shown + +below can be used. Use of these codes permits us to extract the + +maximum amount of data from each gel and yet record any doubts by + +choice of code. The program can deal with all of these codes and any + +other characters in a sequence are treated as dash (-) characters. 
+ + +.lit + + SYMBOL MEANING + + 1 PROBABLY C + 2 " T + 3 " A + 4 " G + D " C POSSIBLY CC + V " T " TT + B " A " AA + H " G " GG + K " C " C- + L " T " T- + M " A " A- + N " G " G- + R A OR G + Y C OR T + 5 A OR C + 6 G OR T + 7 A OR T + 8 G OR C + - A OR G OR C OR T + a A + c C + g G + t T + * padding character placed by auto assembler + else = - + +.end lit + +.LEFT MARGIN2 + The DNA consensus algorithm +.para +The "calculate consensus" function, the "display contig" routine and the + +"show quality" option use the rules outlined here to calculate a + +consensus from aligned gel readings. Note that "display contig" +calculates +a consensus for each page width it displays (it does not use the + +consensus sequence file calculated by the consensus function). + +.LEFT MARGIN2 +.para +We have 6 possible symbols in the consensus sequence: A,C,G,T,* and -. The +last symbol is assigned if none of the others makes up a sufficient +proportion of the aligned characters at any position in the contig. The +following calculation is used to decide which symbol to place in the +consensus at each position. +.para +Each uncertainty code contributes a score +to one of A,C,G,T,* and also to the total at each point. Symbols like R +and Y which don't correspond to a single base type contribute only to the +total at each point. The scores are shown below. +.lit + definite assignments ie A,C,G,T,B,D,H,V,K,L,M,N,a,c,g,t,* =1 + + probable assignments ie 1,2,3,4 = 0.75 + + other uncertainty codes including R,Y,5,6,7,8,- = 0.1 +.end lit +.para +A cutoff score of 51% to 100% is supplied by the user. (When the program +starts this is set to 75%. See "set display parameters"). +At each position in the contig we calculate the total score for each of +the 5 symbols +A,C,G,T and * (denote these by Xi, where i=A,C,G,T or *), +and also the sum of these totals +(denote this by S). Then if 100 Xi / S > the cutoff for any i, symbol i is +placed in the consensus; otherwise - is assigned.
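The calculation above can be illustrated with a minimal Python sketch. It is not the package's own code (the Staden programs are written in Fortran and C). The names BASE_OF, PROBABLE and consensus_symbol are hypothetical, and the sketch assumes that S includes the 0.1 scores of codes such as R and Y, and that any unlisted character is scored like a dash.

```python
# Hypothetical sketch of the consensus rule described in the text.
# Map each reading symbol to the consensus symbol it supports.
BASE_OF = {
    "A": "A", "C": "C", "G": "G", "T": "T", "*": "*",
    "a": "A", "c": "C", "g": "G", "t": "T",
    "D": "C", "V": "T", "B": "A", "H": "G",
    "K": "C", "L": "T", "M": "A", "N": "G",
    "1": "C", "2": "T", "3": "A", "4": "G",
}
PROBABLE = set("1234")  # "probable" codes score 0.75 instead of 1


def consensus_symbol(column, cutoff=75.0):
    """Return the consensus symbol for one column of aligned characters."""
    totals = dict.fromkeys("ACGT*", 0.0)  # the Xi of the text
    s = 0.0                               # the S of the text
    for ch in column:
        base = BASE_OF.get(ch)
        if base is None:
            # R, Y, 5, 6, 7, 8, - and (by assumption) anything else:
            # these add 0.1 to the total only.
            s += 0.1
        else:
            score = 0.75 if ch in PROBABLE else 1.0
            totals[base] += score
            s += score
    if s == 0.0:
        return "-"
    for base, x in totals.items():
        if 100.0 * x / s > cutoff:
            return base
    return "-"
```

With the default cutoff of 75%, a column holding C and G gives "-", because neither symbol exceeds the cutoff, while a column holding G and 4 gives "G".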
+.para +Notice that S does not equal the number of times the sequence has been +determined, but is the score total, and hence we are less likely to put a - +in the consensus. For the "examine quality" algorithm each strand is +treated separately but the calculation is the same. (It was originally +different). +.para +Format of the consensus sequence (and vector sequences). +.para +A consensus sequence file may contain the consensus for several contigs + +and so we identify each of them by preceding them by a 20 character + +title. The title is of the form <---LAMBDA.0076----> (where LAMBDA is + +the project name and gel reading number + + + 76 is the leftmost gel +reading to contribute to this consensus sequence). + + + The angle brackets <> and the 4 digit number preceded by a . + +are important to some processing programs. +.left margin1 +@25. TX 1 @Show relationships +.LEFT MARGIN2 +.para + Used to show the relationships of the gel readings in the database in + +three ways - +.LEFT MARGIN2 + (a) All contig descriptor lines followed by all gel descriptor + lines. +.LEFT MARGIN2 + (b) All contigs one after the other sorted, i.e. for each + contig show its contig descriptor line followed by all its + gel descriptor lines sorted on position from left to right +.LEFT MARGIN2 + (c) Selected contigs: show the contig line and, in order, + those gel readings that cover a user-defined region. + Note that this output can be directed to a disk file by + prior selection of "redirect output". +.LEFT MARGIN2 +.para + Below is an example showing a contig from position + 1 to 689. The left gel reading is number 6 and has archive +name HINW.010, the +rightmost gel reading is number 2 and has archive name HINW.004.
+On each gel descriptor line is shown: + the name of the archive version, the gel number, the position of the + left end of the gel reading relative to the left end of the contig, the + length of the gel +reading (if this is negative it means that the gel reading is in + the opposite orientation to its archive), the number of the gel +reading to + the left and the number of the gel reading to the right. +.lit + + + CONTIG LINES + CONTIG LINE LENGTH ENDS + LEFT RIGHT + 48 689 6 2 + GEL LINES + NAME NUMBER POSITION LENGTH NEIGHBOURS + LEFT RIGHT + HINW.010 6 1 -279 0 3 + HINW.007 3 91 -265 6 5 + HINW.009 5 137 -299 3 17 + HINW.999 17 140 273 5 12 + HINW.017 12 193 265 17 18 + HINW.031 18 385 -245 12 2 + HINW.004 2 401 -289 18 0 + +.end lit +.left margin1 +@23. TX 3 @Complement a contig +.LEFT MARGIN2 +.PARA + This function will complement and reverse all of the gel +readings in a + contig. It automatically reverses and complements each gel + reading sequence, reorders left and right neighbours, recalculates +relative + positions and changes each strandedness. +.PARA + The only user input required is to identify the contig to + complement by the number or name of a gel reading it contains. +DO NOT KILL THE +PROGRAM DURING THIS STEP! +.left margin1 +@22. TX 3 @ Join contigs +.LEFT MARGIN2 +.PARA +This function joins contigs interactively using a mouse driven editor. +The operation of this editor is very similar to the Contig Editor +described in "Edit". + +.para +It allows the +user to align the ends of the two contigs by editing each +contig separately. It is important that the alignment achieved is +correct because once the join is completed the alignment is fixed. +The program needs to know which two contigs to join. +.para +First specify which two contigs are to be joined. +The user should identify the two +contigs. +The program checks that the two contig numbers are different (it will not +allow circles to be formed!) 
+.para +The Join Editor consists of two Contig Editors in between which is sandwiched +a disagreement box. This disagreement box shows exclamation marks to +denote mismatches between the two consensuses. +.para +For example, the display will look something like this: +.lit + + 1460 1470 1480 1490 1500 + 56 HINW.100 TCT*GAGCAGTGTGGGCGCTG*CCGG + 33 HINW.300 TCT*GAGCAGTGTGGGCGCTGC*CGGGCTCGGAGGG + -25 HINW.090 TCT*GAGCAGTGTGGGCG*T*G*CGGGCTCGGAGGG + 19 HINW.123 TCTCGAGCAGTGTGGGCGCTG**CGGGCTCGGAGGGCATGAAGTAGAGCG + CONSENSUS TCTCGAGCAGTGTGGGCGCTG-CCGGGCTCGGAGGGCATGAAGTAGAGCG + MISMATCH ! !!!!!! + 10 20 30 40 50 + -6 HINW.010 TCTCGAGCAGTGTGGGCGCTGCCCGGGCTCGGAGGGCATGAAGTTAGAGC + -3 HINW.007 TGGGCGCTGCCCGGGCTCGGAGGGCATGAAGT*AGAGC + -5 HINW.009 GCTCGGAGGGCATGAAGT*AGAGC + CONSENSUS TCTCGAGCAGTGTGGGCGCTGCCCGGGCTCGGAGGGCATGAAGTTAGAGC + +.END LIT +.para +The overlap must be of at least one character. +Use the scroll bar and the scroll buttons (`<<',`<',`>',and`>>') +for positioning the relative positions of the two contigs. +.para +The join position can be fixed in position +by pressing the `lock' button at the top of the Join Editor. +Locking allows the two contigs to be scrolled as one when using the scroll bar +and buttons, the left ends always in the same position relative to each +other. +.para +Once locked, it is best to proceed to the right along the contigs, inserting +padding characters (`*') into the consensuses to minimise the +disagreements. +.para +It is essential that the user aligns the two contigs throughout the whole +region of overlap before completing the join because it is only at this +stage that the two contigs can be edited independently. Once the join is +completed the alignment can only be altered using the routines supplied +by "alter relationships". +.para +The join can be completed by pressing the `Leave Editor' button. The +percentage mismatch is displayed, and the user is required to confirm that +they want to perform the join. +.left margin1 +@24. 
TX 1 @ Copy the database +.LEFT MARGIN2 +.PARA +Used to make a copy of the database. If required the database size can be + +altered using this option. The "version" of a database is encoded as the + +last letter in the names of the five files that contain the database. + +.para +Supply a "version" number (the default is version 1), and if required + +select a new size for the database. The size of a database is the number + of +lines of information it can hold. It needs a line for each gel reading and + +another for each contig. +.left margin1 +@19. TX 1 @ Check database +.LEFT MARGIN2 +.para +Used to perform a check on the logical consistency of the + database. No user intervention is required. If selected "with +dialogue" the program also checks for any sections of the consensus that +contain 15 dashes in 20 characters. +.para + The following relationships are checked: +.LEFT MARGIN2 + 1. If gel reading A thinks gel reading B is its left + neighbour + +does B think A is + its right neighbour? + The error message is +.left margin2 +"Hand holding problem for gel reading A" +.left margin2 +followed by the + gel descriptor lines for gel readings A and B. +.LEFT MARGIN2 + 2. Are there any contig lines with no left or right +end gel readings? + The error message is +.left margin2 +"Bad contig line number A" +.LEFT MARGIN2 + 3. Do the gel readings that are described as left ends on +contig + lines agree that they are left ends? + The error message is +.left margin2 +"The end gel readings of contig A have outward neighbours" +.LEFT MARGIN2 + 4. Are there gel readings that are in more than one contig? + The error message is +.left margin2 +" Gel number A is used N times" +.LEFT MARGIN2 + 5. Are there gel readings that are not in any contig? + The error message is +.left margin2 +" Gel number A is not used" +.LEFT MARGIN2 + 6. Do the relative positions of gel readings agree with +their + position as defined by left and right neighbourliness? 
+ The error message is +.left margin2 +" Gel number A with position X is left neighbour of gel number B with +position Y" +.LEFT MARGIN2 + 7. Are there any loops in contigs? If so no further + checking is done. + The error message is +.left margin2 +" Loop in contig n no further checking done, but gel reading numbers follow" +.left margin2 + The + program then prints the gel reading numbers in the looped +contig up +to + the start of the loop. +.LEFT MARGIN2 +8. Are there any contigs of length <1? The error message is +.left margin2 +" The contig on line +number x has zero length" +.LEFT MARGIN2 +9. Are there any gel readings (used in only one contig) that have zero + +length? The error +message is +.left margin2 +" Gel number N has zero length" +.left margin2 +Note that "auto assemble" also uses this logical consistency check and + will +only tolerate a "Gel number N + is not used" error. Any other error will cause it to + +give up. + +.left margin1 +@29. TX 1 @ Examine quality +.LEFT MARGIN2 +.para +Analyses the quality of the data in a contig. It reports on the proportion + +of the consensus that is "well determined" and will display a sequence of + +symbols that indicate the quality of the consensus at each position. + +.para +Identify the contig to analyse, and the section of interest. The current + +consensus calculation cutoff score will be used to decide if each position +is +"well determined". In general the quality of a reading deteriorates along +the length of the gel and so it is also possible to use a length cutoff for +the quality calculation. Only the data from the first section of each reading +will be included in the quality calculation. The length is altered under +"set parameters" and is initially set to the maximum reading length. +A summary showing the percentage of the consensus +that falls into each category of quality is shown. Choose whether or not to +have the quality codes for each position of the consensus displayed. 
+They can be displayed as either graphics or text. +.para +The quality of the data depends on the number of times it has been + +sequenced and the particular uncertainty codes used in each gel + +reading. This function divides the data into five categories, assigning + +each +a symbol or code: +.LEFT MARGIN2 + 1. Well determined on both strands and they agree. code=0 +.LEFT MARGIN2 + 2. Well determined on the plus strand only. code=1 +.LEFT MARGIN2 + 3. Well determined on the minus strand only. code=2 +.LEFT MARGIN2 + 4. Not well determined on either strand. code=3 +.LEFT MARGIN2 + 5. Well determined on both strands but they disagree. code=4 +.LEFT MARGIN2 + A position is "well determined" if it is assigned one of the symbols +A,C,G,T by the algorithm described in the section "calculate a +consensus". +The calculation is performed +separately for each strand. +.para +If the user chooses to have the data displayed graphically the following +scheme is used. A rectangular box is drawn so that the x coordinate +represents the length of the contig. The box is notionally +divided vertically into +5 possible levels which are given the y values: -2,-1,0,1,2. +The quality codes attributed to each base position are plotted as +rectangles. +Each rectangle represents a region in +which the quality codes are identical, so a single base having a different +code from its immediate neighbours will appear as a very narrow rectangle. +.lit + + Rectangle bottom and top y values + + Quality 0 rectangle from 0 to 0 + Quality 1 rectangle from 0 to 1 + Quality 2 rectangle from 0 to -1 + Quality 3 rectangle from -1 to 1 + Quality 4 rectangle from -2 to 2 +.end lit +.para +Obviously a single line at the midheight shows a perfect sequence. +.para +Typical dialogue is shown below. +.lit + + 41.47% OK on both strands and they agree(0) + 55.48% OK on plus strand only(1) + 2.08% OK on minus strand only(2) + 0.97% Bad on both strands(3) + 0.00% OK on both strands but they disagree(4) + ?
(y/n) (y) Show sequence of codes + + 10 20 30 40 50 + 1111111111 1111111111 1111111111 1111111111 1111111111 + + 60 70 80 90 100 + 1111111111 1111111111 1111111111 3111111111 1111111111 + + 110 120 130 140 150 + 1111111111 1111131111 1111111111 1111111111 1111111111 + + 160 170 180 190 200 + 1111111111 1111111111 1111111111 1111111111 1111111133 + + 210 220 230 240 250 + 1311111111 1111111111 1111111110 0000000000 0000220000 + + 260 270 280 290 300 + 0000000000 0020000000 2200000202 0002000000 0000222200 + +.end lit +.left margin1 +@26. TX 3 @ Alter relationships +.LEFT MARGIN2 +.para +Used to make what are normally illegal changes to the database. That is + +the normal checks are not done and any item in the database can be +changed independently of all others. Users need to know what they are + +doing because it is very easy to make a horrible mess. Always start by + +making a copy! +.para +By using the options here users can +move one section of a contig relative to another, break contigs, remove +contigs, remove gel readings, etc. To give flexibility most + of the commands do only one thing. This means that several commands +may +have to be executed to complete any change. +.para +The following options are offered: +.lit + + Cancel + Line change + Check logical consistency + Remove contig + Shift + Move gel reading + Rename gel reading + Break a contig + Remove a gel reading + Alter raw data parameters + +.end lit +.left margin2 +1. QUIT returns to the main options of BAP. +.left margin2 + +3. Line change +.left margin2 + allows the user to change the contents of any line in the + +file of relationships. The line is selected by number, the + program prints the current line and prompts for the new line. + +.left margin2 +4. Check logical consistency +.left margin2 +5. Remove a contig +.left margin2 +This function removes a contig and all its gel readings. The user specifies +any reading in the contig. +.left margin2 +6. 
Shift +.left margin2 + allows the user to change all the relative positions of a + set of neighbouring gel +readings by some fixed value, i.e. it will + shift related gel readings + either left or right. It can therefore + be used to change the alignment of the gel +readings in a contig. +It prompts for the number of the first gel +reading to + shift and then for the distance to move them (Note a + negative value will move the gel readings + left and a positive value + right). It then chains rightwards (ie follows right + neighbours) and shifts each gel +reading, in turn, up to the end + of the contig. (This means that only those gel readings + from the first + to shift to the rightmost are moved). It updates the length of + the contig accordingly. + +.left margin2 +7. Move gel reading +.left margin2 + is a function to renumber a gel reading. It moves all the information + about a gel +reading on to another line. The user must specify the +number + of the gel reading +to move and the number of the line to place it. It + takes care of all the relationships. Of course gel +readings must not be + moved to lines occupied by other gel +readings! + +.left margin2 +8. Rename gel reading +.left margin2 + is a function that is used to rename the archive names of + gel +readings in the database; it only changes the name in the .ARN + file of the database. + +.sk1 +.LEFT MARGIN2 +9. Break contig +.LEFT MARGIN2 +.PARA +Occasionally it is necessary to break a contig into two parts and this can be +achieved using this option. The program needs only the number of a gel +reading. This is the gel reading that will become a left end after the +break. That +is, the break is made between this gel +reading and its left neighbour. A new contig +line is created so ensure that there is sufficient space in the database. +.left margin2 +10. Removing gel readings from contigs +.left margin2 +.PARA +Gel +readings can be removed from contigs. 
If they are essential for holding the +contig together (ie are the only gel reading covering a particular region), +the program will create a new contig. +.sk1 +.LEFT MARGIN2 +11. Alter raw data parameters +.LEFT MARGIN2 +.PARA +Allows the user to edit the individual raw data parameters, such as +the left and right cutoff lengths and the name of the machine readable trace +file. +The user must specify the gel line to modify, and provide new values for +the length of the raw sequence including cutoff lengths, the left cutoff position, the length of the original working sequence, the machine type, and the name +of the raw data file, where these values change. +.left margin1 +@27. TX 1 @ Set display parameters +.LEFT MARGIN2 +.para +Used to redefine the parameters that control the cutoff employed by the + +consensus calculation and quality examiner, the maximum length of each +reading to include in the quality calculation, the line length used by + +the display function, the text window length used by the graphics +options, and the graphics window length used by the graphics options. +.para +The default cutoff score is 75%. The default line length is 50 characters. +For protein sequences the cutoff is always 100%. +.para +The text window used by the graphics options controls the amount of +sequence listed at the crosshair position. The graphics window controls the +"zoom" function. Both these windows are defined as the number of bases that +should be shown, to both left and right of the crosshair. +.left margin1 +@30. TX 3 @ Shuffle pads +.left margin2 +.para +One weakness of the alignment strategy used is that padding +characters are not always aligned by the assembly routine. This function +attempts to align padding characters using a very simple strategy. It +does not solve all pad alignment problems but is a useful first step during +cleaning-up operations. +.LEFT MARGIN1 +@10. TX 2 @Clear graphics +.LEFT MARGIN2 +.para + Clears graphics from the screen.
+.left margin1 +@11. TX 2 @Clear text +.LEFT MARGIN1 +.para + Clears text from the screen. +.left margin1 +@12. TX 2 @Draw a ruler. +.LEFT MARGIN2 +.para +This option +allows the user to draw a ruler or scale along the x axis of the screen to +help identify the coordinates of points of interest. The user can define +the position of the first base to be marked (for example if the active +region is 1501 to 8000, the user might wish to mark every 1000th base +starting at either 1501 or 2000 - it depends if the user wishes to treat +the active region as an independent unit with its own numbering starting +at +its left edge, or as part of the whole sequence). The user can also define +the separation of the ticks on the scale and their height. If required the +labelling routine can be used to add numbers to the ticks. +.left margin1 +@14. TX 2 @Reposition plots +.LEFT MARGIN2 +.para +The position of each of the plots is defined relative to a user's drawing +board which has size 1-10,000 in x and 1-10,000 in y. +Plots for +each option are drawn in a window defined by x0,y0 and xlength,ylength, +where x0,y0 is the position of the bottom left hand corner of the window, + xlength is the width of the window and ylength the +height of the window. +.lit + --------------------------------------------------------- 10,000 + 1 1 + 1 -------------------------------------- ^ 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 1 1 ylength 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 -------------------------------------- v 1 + 1 x0,y0^ 1 + 1 <---------------xlength--------------> 1 + --------------------------------------------------------- 1 + 1 10,000 + +.end lit +All values are in drawing board units (i.e. 1-10,000, 1-10,000). +The default window positions are read from a file "ANALMARG" when the +program is started. Users can have their own file if required. +As all the plots start +at the same position in x and have the same width, x0 and xlength are the +same for all options.
Generally users will only want to change the start +level of the window y0 and its height ylength. + This option +allows users to change window positions whilst running the program. +The routine prompts first for the number of the option that the user +wishes +to reposition; then for the y start and height; then for the x start and +length. Note that changes to the x values affect all options. If the user +types only carriage return for any value it will remain unchanged. +Note that, unlike all the other programs, the boxes used to contain +analytical results (eg plot quality) should not be made to overlap one +another, as the function of the crosshair routine depends on which box the +crosshair is in! +.LEFT MARGIN1 +@15. TX 2 @Label a diagram +.LEFT MARGIN2 +.para +This routine allows users to label any diagrams they have produced. They +are asked to type in a label. When the user types carriage return to finish +typing the label the cross-hair appears on the screen. The user can +position it anywhere on the screen. If the user types R (for right justify) +the label will be +written on the diagram with its right end at the cross-hair position. +If the user types L (for left justify) the label will be written on the +diagram with its left end at the cross-hair position. +The +cross-hair will then immediately reappear. The user may put the same +label +on another part of the diagram as before or, on hitting the space bar, +will be asked whether to type in another label. +.para +Typical dialogue follows. +.lit +? Menu or option number=15 +Type label then drive cross hair to left or right end +of label position then hit "L" to write label left +justified or "R" to write label right justified or +the space bar to quit + + +? Label=delta gene + + missing graphics + +? Label= + +.end lit +.left margin1 +@16. TX 2 @Display a map +.LEFT MARGIN2 +.para +This is disabled! +.left margin1 +@7.
TX 1 @Redirect output +.LEFT MARGIN2 +.para +Used to direct output that would normally appear on the screen to a file and +to create postscript output. +.para +Select redirection of either text or graphics, and +supply the name of the file that the output should be written to. +.para + The results from the next options selected will not appear on the screen +but will be written to the file. When option 7 is selected again +the file will be +closed and output will again appear on the screen. +.left margin1 +@13. TX 2 @Use crosshair +.left margin2 +.para +This option puts a steerable cross on the screen which the user +drives around +by using the arrow keys (or mouse). When the crosshair is +visible a number of options are available if the user types one of a +set of special keyboard characters. Any other characters will cause +an exit from the crosshair option. The special keys are: +.lit + + I = Identify the nearest gel reading + Z = Zoom in + Q = plot Quality + S = display the aligned Sequences at the crosshair position + N = list the Names and Numbers of the sequences at the crosshair +.end lit +.para +In order for any of these special keys to operate, the crosshair +must be in an appropriate display box, and the precise function of +the keys will also depend on which box the crosshair is in. +.para + If the +crosshair is in the "plot all contigs" box, Z will cause a new box to +appear showing all the readings for the nearest contig; Q will give +the same as Z but will also produce an extra box showing the +"quality" plot. +.para + If Z is hit in the "plot single contig" box, the contig will be zoomed +to the current graphics window size. The zoom will be roughly +centred on the crosshair position. Because of this it is possible to +step along a contig by repeatedly zooming with the crosshair near +to one end of the single contig display box. If I is hit the crosshair +must be close to a gel reading line. 
If Q is hit, the quality plot will +be produced for the region shown in the plot single contig box. In +all cases when the "plot all contigs" box is shown, a vertical line will +bisect the line that represents the relevant contig, at the current +position. +.para +If the crosshair is in the plot quality box only the character "s" will operate +as a special symbol. +.para +The number of bases shown in the N and S options is controlled by +the current graphics text window size, and the size of the zoom +window by the current graphics window size. Both are set by the +parameter setting function of the general menu. +.left margin1 +@33. TX 2 @Plot single contig +.left margin2 +.para +This option produces a schematic of a selected region of a single +contig by drawing a horizontal line to represent each of its gel +readings. The lines show the relative positions of each reading and +also their sense. The plot is divided vertically into two sections by +a line that is identified by an asterisk drawn at each end. All lines +that lie above this line represent readings that are in their original +sense, all lines below show readings that are in the +complementary sense to their original. By use of the crosshair +function the plot can be stepped through and examined in more +detail. See help on crosshair. +.left margin1 +@34. TX 2 @Plot all contigs +.left margin2 +.para +This option produces a schematic of all the contigs in a database. It +does this by drawing a horizontal line to represent each of them. +In order to show the ends of each contig it draws the lines for +contigs at alternate heights: the first at height one, the +second at height two, the third at height one, etc. The order of the +contigs in the display is the same as their order in the database. +By use of the crosshair function the plot can be stepped +through and examined in more detail. See help on crosshair. +.left margin1 +@31.
TX 3 @ Disassemble readings +.left margin2 +.para +This function is used to remove a list of readings from a database, or +to create a new contig from a single reading moved from an existing contig. +This latter mode is useful for repositioning a reading in a repeat: +once separated it can be placed in the join editor and scrolled by the +other copies. +Removal of sets of readings works in two modes: +1. A set of adjacent readings in a +contig can be removed by the user naming the two end ones; or 2. A batch +of readings from any number of contigs can be defined by the user naming +a file containing a list of reading names. The program cleans up the +database by moving data to fill up any holes made in the files. +.para +For both modes of operation the program will ask for a file of file names. +If users create their own file (ie mode 2) each reading NAME must be on +a separate line. For mode 1 the user types the NAMES of the leftmost +and rightmost readings to be removed. They and all intervening readings +will be removed. Note that the routine operates on reading names - not +numbers. For both modes, if necessary, new contigs will be created. +.left margin1 +@35. TX 1 3 @Find internal joins +.left margin2 +.para +The purpose of this function is to use data already in the database to +find possible joins between contigs. +Joins may have been missed due to poor data or may have not been made +due to repeated sequences. Where appropriate, it may be +possible to find potential +joins by using the "unused data" derived from sequencing machines. +.left margin2 +For all overlaps found when the X version is used, + the contig editor (in join mode) will be +called up with the two contigs aligned. +.left margin2 +The database is checked for logical consistency. Supply a minimum initial +match length, a minimum alignment block, the maximum pads per sequence, +the maximum percent mismatch after alignment, the probe length. 
Choose +whether clipped data is to be used; if so, define the window size for finding good +data and the number of dashes allowed in the window. Processing will commence. +Most of these values are used in an identical way in the autoassemble +function. The others are defined below. +.left margin2 +The program strategy +.left margin2 +Take the first contig and calculate its consensus. If clipped data is being +used examine all readings that +are in the complementary orientation, and sufficiently near to the contig's left +end, to see if they have good clipped sequence which, if present, would +protrude +from the left end of the contig. If found add the longest such sequence to the +left end of the consensus. Do the same for the right end by examining +readings that are in their +original orientation. If any are found add the longest extension to the +right end of +the consensus. Repeat the consensus calculations and extensions +for all contigs hence producing an extended consensus. If clipped data is not +being used simply calculate the consensus for the whole database. Now +look for possible joins by processing the extended consensus in the following +way. Take the last, say 100, bases (termed the "probe length" by the program) +of the rightmost consensus, compare it in both +orientations with the extended consensus of all the other contigs. Display +any sufficiently good alignments. Repeat with the left end of the rightmost +contig. Do the same for the ends of all the extended contigs, always only +comparing with the contigs to their left, so that the same matches do not +appear twice. +.left margin2 +Good clipped data is defined by sliding a window of "Window size for good data +scan" bases outwards +along the sequence and stopping when "Maximum number of dashes in scan window" + or more dashes appear in the window.
+Note that +it is advisable to have some sort of cutoff because if we simply take all the +data it might be so full of rubbish that we won't find any good matches. For +the same reason it is worth trying the procedure with different cutoffs. An +initial run using no clipped data is also recommended. +Sufficiently good +alignments are defined by criteria equivalent to those used in autoassemble, +however here we only display alignments that pass all tests. +.left margin2 +Bugs +.left margin2 +If a small contig is wholly contained within a larger one, such that its +ends are further than ("Probe length" - "Minimum initial match length") +from the ends of the larger contig, and the consensus for the small +contig lies to the left +of the consensus for the large contig, the overlap will not be discovered. (See +the search strategy). +.left margin2 + All numbering is +relative to base number one in the contig: matches to the left (i.e. in +the clipped data) have negative +positions, matches off the right end of the contig (i.e. in the clipped +data) have positions +greater than the contig length. +The convention for reporting the positions of overlaps is as follows: if neither +contig needs to be complemented the positions are as shown. If the program says +"contig x in the - sense" then the positions shown assume contig x has been +complemented. For example in the results given below the positions for the +first overlap are as reported, but those for the second assume that the contig +in the minus sense (i.e. 443) has been complemented.
+.lit + + + Possible join between contig 445 in the + sense and contig 405 + Percentage mismatch after alignment = 4.9 + 412 422 432 442 452 462 + 405 TTTCCCGACT GGAAAGCGGG CAGTGAGCGC AACGCAATTA ATGTGAG,TT AGCTCACTCA + ********* * ******** ***** *** ********** ********** ********** + 445 -TTCCCGACT G,AAAGCGGG TAGTGA,CGC AACGCAATTA ATGTGAG-TT AGCTCACTCA + -127 -117 -107 -97 -87 -77 + 472 482 492 502 512 + 405 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT + ********** ********** ********** ********** ** + 445 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT + -67 -57 -47 -37 -27 + Possible join between contig 443 in the - sense and contig 423 + Percentage mismatch after alignment = 10.4 + 64 74 84 94 104 114 + 423 ATCGAAGAAA GAAAAGGAGG AGAAGATGAT TTTAAAAATG AAACG-CGAT GTCAGATGGG + **** ***** ********** ********** ****** ** ***** **** ********* + 443 ATCG,AGAAA GAAAAGGAGG AGAAGATGAT TTTAAA,,TG AAACGACGAT GTCAGATGG, + 3610 3620 3630 3640 3650 3660 + 124 134 144 154 164 + 423 TTG-ATGAAG TAGAAGTAGG AG-AGGTGGA AGAGAAGAGA GTGGGA + *** ****** ********** ** ******* *** ***** ** ** + 443 TTGGATGAAG TAGAAGTAGG AGGAGGTGGA ,GAG,AGAGA GTTGG- + 3670 3680 3690 3700 3710 + + +.end lit +.left margin1 +@36. TX 3 @Double strand +.left margin2 +.para +PLEASE MAKE A COPY OF THE DATABASE BEFORE USING THIS OPTION AS IT HAS +CURRENTLY HAD VERY LITTLE TESTING. +.para +Uses the cutoff data to change single stranded sections of a contig into +double stranded sections. Data is used carefully to try and minimise the +number of data disagreements created. However it must be noted that an overall +slight degradation in quality will still occur. +.para +When using this option you will be prompted for a contig and a region within +that contig. The default region is the entire contig. The option will then +search through the region for areas of good data on one strand and cutoff data +on the opposite strand, extending the cutoff data. 
The criteria for evaluating +the amount of cutoff data to be used are based upon a maximum number of +mismatches and a score (derived by accumulating points for mismatches (-8), +matches (+1) and insertions (-5) over the length of an alignment). The defaults +are: +.lit + +maximum mismatches : 6 + +score for mismatch : -8 +score for correct match : +1 +score for insertion : -5 +.end lit +.para +Note that with successive calls to this option it is possible to double strand +more and more data. Naturally however the quality of the data generated will +diminish each time. +.left margin1 +@37. TX 3 @Auto-select oligos +.left margin2 +.para +PLEASE MAKE A COPY OF THE DATABASE BEFORE USING THIS OPTION AS IT HAS +CURRENTLY HAD VERY LITTLE TESTING. +.para +Generates a file (default "primers") of suggested primers to use for covering +a single stranded section or for walking off the end of a contig. The file +generated contains the gel reading name, the primer sequence, its offset in +the contig and the orientation. An example file would be: +.lit + +c81d12.s1 TTGTCTGTAAGCGGATG (@ 6449 ) + +c98a10.s1 ATTATCACTTTACGGGTC (@ 6959 ) + +c81c1.s1 CAAGAAGGCGATAGAAG (@ 7643 ) + +c76a10.s1 CCTCATCCTGTCTCTTG (@ 8441 ) + +c81g4.s1 ATGAAACCTGGGCGTTG (@ 16156 ) + +c91e6.s1 GTTTTCAGATGTCGGAG (@ 18249 ) + +c81e12.s1 GCTACCGTAAAACACTTC (@ 18737 ) + +c93h11.s1 GCTGCTTTTTGTTTTATCC (@ 19158 ) + +c81h6.s1 CTTCCACTTCTTTCTTATC (@ 21210 ) + +c86a12.s1 CGAATGATAAAGACAAATCAG (@ 22122 ) + +c98b1.s1 GCCACTTTATCCGAGAC (@ 3048 ) - +c97c5.s1 GTGTTTTGGGTATATTGTG (@ 3371 ) - +c83d2.s1 CTACACAGAATGAACCC (@ 3768 ) - +c78h10.s1 GGCGGTGAAGATTGAAG (@ 4200 ) - +c98h9.s2dt CTCGTTTAAATTTCAAACTTCC (@ 7419 ) - +c95a9.s1 ATTGGAAGGAAGGAGGG (@ 22996 ) - +c82b4.s1 TGTAGCCGAAATCTTCC (@ 23369 ) - +.end lit +.para +This is best employed after having previously used the 'Double strand' option.
+When selecting the option you will be asked for the contig, a region within +this contig and the file to write the list of primers to. For each primer +suggested a tag is automatically created containing details of the gel reading +name and the sequence. Preferably the tag will be created on the gel reading +from which the primer was selected. However this is not always possible so +failing that the tag will be on another sequence overlapping the primer +position. +.para +When invoked with the dialogue option you will be asked a couple more +questions relating to the position and size of the consensus checked for +suitable oligos. You will be prompted for the start and end of a region +(default 40-140) at a relative position to the left of our initial region. +.para +For example: +.lit + +? Menu or option number=d37 + Auto-select oligos + Default Contig identfier=/e97f2.s1 + ? Contig identfier= + ? Start position in contig (1-20942) (1) =10000 + ? End position in contig (10000-20942) (20942) =11000 + Default Name of file for primers=primers + ? Name of file for primers= + ? Start of oligo choice region (1-1024) (40) =50 + ? End of oligo choice region (50-1024) (150) =150 + +.end lit +.para +This implies that we are going to look for oligos to use as primers covering +the region 10000 to 11000. For each single stranded section in this region we +search for the oligos at between 50 and 150 to the left. So if we had a single +stranded section from 10121 to 10295 we would search for oligos in the region +9971 to 10071. +.left margin1 +@38. TX 1 @Check assembly +.left margin2 +.para +This new function is used for checking the positioning of assembled readings. +It is useful for checking sequences that contain repeats +of length similar to that of a single gel reading. It takes the poor +quality data for each reading and compares it to the segment of the consensus +to which it should align. 
+If the extension of the +read does not match the consensus then the read (or its neighbours) has +probably been assembled into the wrong place. +The program displays the bad alignments. +The quality of an alignment is defined by the percentage mismatch. +Naturally the user should select a value that takes into account +the poor quality of the data being aligned. +.para +When the routine is used from the X version the +user is offered the editor to examine poor alignments. + If alignments are reported as poor, but on inspection are OK, the user +can set a tag so that the poor quality data is ignored on subsequent passes +through the routine. Note however such data will then also be ignored by +the automatic double stranding routine! +.para + The user defines the percentage mismatch; the window size and number of +dashes allowed in the window used for selecting the amount of the poor data +to be employed; can choose to save the names of the poorly aligned reads +in a file; can select an individual contig or scan the whole database. +The file containing the names of the poorly aligned reads can be used by +the disassembly routine to remove them from the database, and then can be used +to reassemble them. Note that the routine complements each contig twice +during processing. + +.left margin1 +@39. TX 1 @Find read pairs +.left margin2 +.para +This new function is used to check the positions of readings taken from each +end of the same template. For each forward read it searches for a corresponding +reverse reading. The search can be over the whole database or over a single contig. +The results can be presented graphically for single contig searches and the crosshair +function can be used to identify the readings displayed. +.para +Note that at present the function only knows that two reads are from the same template +by comparing reading names. 
For our local projects we use the following naming +convention: forward reads are named abcdefgh.s1 and reverse reads abcdefgh.r1. The +program expects this naming convention and so if it finds reads fred.s1 and fred.r1 it +assumes they are the forward and reverse reads for template fred. In the future we +will make the routine more general! +.para +If a single contig is selected and the output is listed the program displays two +lines for each pair: the first line shows the reading name, its position and length, +and the distance between the extremities of the two reads; the second line shows the +other read name, its position and length. If there are pairs that are in separate contigs +or are facing away from one another they are listed after the pairs that face inwards. +Is this true? +.para +If the results are plotted the full length of the template is drawn with arrows +indicating the direction of reads and the extent of each reading. Those reads that have +their partner in another contig are marked by asterisks. +.para +Typical dialogue is shown below. +.lit + + ? Select contigs (y/n) (y) = + Default Contig identifier=/i55d8.s1 + ? Contig identifier= + ? Start position in contig (1-15227) (1) = + ? End position in contig (1-15227) (15227) = + ?
Plot results (y/n) (y) = n + 852 k23a1.r1 249 238 1615 + 806 k23a1.s1 1529 -335 + 238 i68e6.s1 422 193 1632 + 868 i68e6.r1 1756 -298 + 576 k17a2.s1 2370 213 1676 + 885 k17a2.r1 3790 -256 + 84 k27g6.s1 3456 291 1777 + 867 k27g6.r1 4905 -328 + 453 k01g10.s1 5805 142 1251 + 881 k01g10.r1 6909 -147 + 781 i98b8.r1 6754 338 1079 + 10 i98b8.s1 7653 -180 + 883 k02d11.r1 7327 276 1597 + 283 k02d11.s1 8726 -198 + 269 i68f9.s1 8191 169 1055 + 777 i68f9.r1 8891 -355 + 710 i91c6.s1 8245 95 1516 + 780 i91c6.r1 9403 -358 + 596 k27d12.s1 136 329 -329 + 219 k27d12.r1 1 -116 + 159 k27d11.r1 1830 -263 -263 + 317 k27d11.s1 2902 343 + 886 k17g11.r1 7107 -123 -123 + 647 k17g11.s1 1867 265 + 851 i69g10.r1 8045 -137 -137 + 277 i69g10.s1 4658 174 +.end lit +.para +If contigs are not selected the pairs are sorted on their separations. +.lit + + ? Select contigs (y/n) (y) = n + i68f2.s1 27 1781 1777 + i68f2.r1 776 111 1777 + k17f6.s1 601 60 1706 + k17f6.r1 856 1405 1706 + k17a2.s1 576 2370 1676 + k17a2.r1 885 3790 1676 + k27g3.s1 177 14985 1664 + k27g3.r1 889 13564 1664 +. +. + k27b12.s1 764 1 1086 + k27b12.r1 857 932 1086 + i98b8.s1 10 7653 1079 + i98b8.r1 781 6754 1079 + k16a3.s1 748 1276 1070 + k16a3.r1 784 472 1070 + k17b7.r1 786 14937 18942* + k17b7.s1 787 3601 18942* + k27d12.r1 219 1 15208* + k27d12.s1 596 136 15208* + k01g2.s1 502 87 14754* + k01g2.r1 782 9224 14754* + +.end lit + +.left margin1 +@ end of help diff --git a/help/DAP.RNO b/help/DAP.RNO new file mode 100644 index 0000000..7bcfa2d --- /dev/null +++ b/help/DAP.RNO @@ -0,0 +1,2724 @@ +.npa +.left margin1 +@-1. TX 0 @General +.sp +@-2. T 0 @Screen control +.sp +@-2. X 0 @Screen +.sp +@-3. TX 0 @Modification +.sp +@0. TX -1 @SAP +.left margin2 +.PARA +This is help information for the X Windows version of SAP. +Currently it is being brought up to date with the new features in XDAP. +The accuracy of this help should therefore not be assumed. 
+.PARA +This is an interactive program whose primary use is +for managing shotgun sequencing projects, but it can also be used for +handling alignments of other sequences, including those of proteins. +Currently the maximum 'gel reading' length is set to 4096 characters. +Almost all of the information below describes the use of the program for +shotgun projects, but those using the programs for handling other +sequence +alignments should interpret it accordingly. +The data for such a project is stored in a special type of database. The +program + contains the tools that are required to type in gel readings, +screen them against vector sequences and restriction sites; +enter new gel +readings into the database (automatically comparing and aligning +them). In addition it contains editors and functions to examine the quality +of the aligned sequences. +.para + There are three main menus: "general", "screen" and "modification", +and some functions have submenus. +.left margin2 +.lit + The general menu contains the following options: + + Open a database + Display a contig + List a text file + Direct output + Calculate a consensus + Screen against restriction enzymes + Screen against vector + Check database + Copy database + Show relationships + set parameters + Highlight disagreements + Examine quality + Find internal joins + +The graphics menu contains: + + Clear graphics + Clear text + Draw ruler + Use cross hair + Change margins + Label diagram + Plot map + Plot single contig + Plot all contigs + + +The modification menu contains: + + Edit contig + Auto assemble + Join contigs + Complement a contig + Alter relationships + Extract gel readings + + +The alter relationships menu contains: + + Cancel + Line change + Edit single gel reading + Delete contig + Shift + Move gel reading + Rename gel reading + Break contig + Alter raw data parameters + +.END LIT +.SK1 +.para +Overview of the methodology +.para +The shotgun sequencing strategy +.para + In the shotgun sequencing 
procedure +the sequence to be determined is randomly broken into fragments of +about +400 nucleotides in length. These fragments are cloned and then +selected randomly and their + + sequences determined. The relationship between any pair of + + fragments is not known beforehand +but is found by comparing their sequences. + + If the sequence of one is found to be wholly or partially contained + + within that of another for sufficient length to distinguish an + + overlap from a repeat then those two fragments can be joined. +The + + process of select, sequence and compare is continued until the +whole + + of the DNA to be sequenced is in one continuous well +determined + + piece. + +.para + Definition of a contig + +.para + A CONTIG is a set of gel readings that are related to one + another by overlap of their sequences. All gel readings belong to + a contig and each contig contains at least one gel + reading. The gel readings in a contig can be summed to produce +a continuous consensus sequence and the length of this sequence is +the length of the contig. The rules used to perform this summation are + given under "the consensus algorithm". + At any stage + of a sequencing project the data will comprise a number of +contigs; +when a project is + + complete there should be only one contig and its consensus will be + the finished sequence. Note that since being introduced and +defined as above the word "contig" has been taken up by those involved in +genomic mapping. In that context the consensus with a precise length is not +defined. + +.SK1 +.LEFT MARGIN2 +Introduction to the computer method +.LEFT margin2 +.PARA +It is useful to consider the objectives of a sequencing project before +outlining how we use the computer to help achieve them. The aim of a +shotgun sequencing project is to +produce an accurate consensus sequence from many overlapping gel +readings.
+It is necessary to know, particularly at the latter +stages of the project, how accurate the +consensus sequence is. This enables us to know which regions of the + sequence require further work and also to know when the project is +finished. +To show the quality of the consensus, the programs described here +produce displays like that shown below. +.sk1 +.lit + + 10 20 30 40 50 + -6 HINW.010 GCGACGGTCTCGGCACAAAGCCGCTGCGGCGCACCTACCCTTCTCTTATA + CONSENSUS GCGACGGTCTCGGCACAAAGCCGCTGCGGCGCACCTACCCTTCTCTTATA + + 60 70 80 90 100 + -6 HINW.010 CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCGCGGACACGTC + -3 HINW.007 GGCACA*GTC + CONSENSUS CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCG-G-ACA-GTC + + 110 120 130 140 150 + -6 HINW.010 GATTAGGAGACGAACTGGGGCG3CGCC*GCTGCTGTGGCAGCGACCGTCG + -3 HINW.007 GATTAG4AGACGAACTGGGGCGACGCCCG*TGCTGTGGCAGCGACCGTCG + -5 HINW.009 GGCAGCGACCGTCG + 17 HINW.999 AGCGACCGTCG + CONSENSUS GATTAGGAGACGAACTGGGGCGACGCC-G-TGCTGTGGCAGCGACCGTCG + + 160 170 180 190 200 + -6 HINW.010 TCT*GAGCAGTGTGGGCGCTG*CCGGGCTCGGAGGGCATGAAGTAGAGC* + -3 HINW.007 TCT*GAGCAGTGTGGGCGCTGC*CGGGCTCGGAGGGCATGAAGTAGAGC* + -5 HINW.009 TCT*GAGCAGTGTGGGCG*T*G*CGGGCTCGGAGGGCATGAAGTAGAGC* + 17 HINW.999 TCTCGAGCAGTGTGGGCGCTG**CGGGCTCGGAGGGCATGAAGTAGAGCG + 12 HINW.017 GTAGAGC* + CONSENSUS TCT*GAGCAGTGTGGGCGCTG-*CGGGCTCGGAGGGCATGAAGTAGAGC* +.END LIT +.para + This is an example showing the left end of a contig from + position 1 to 200. Overlapping this region are gel readings +numbered 6, 3, 5, 17 and 12; +6, 3 and 5 +are in reverse orientation to their original reading (denoted by a minus +sign). Each gel reading also has a name (eg HINW.010). It can be seen that +in a number of places the sequences contain characters other than A,C,G +and +T. Some of these extra characters have been used by the sequencer to +indicate regions of uncertainty in the initial interpretation of the gel +reading, but the asterisks (*) have been inserted by the automatic +assembly function in order to align the sequences. 
Underneath each 50 +character block of gel reading sequences is the consensus derived from +the +sequences aligned above (the line labelled CONSENSUS). For most of its +length the consensus has a definite nucleotide assignment but in a few +positions there is insufficient agreement between the gel readings and +so a dash (-) appears in the sequence. This display contains all the +evidence needed to assess the quality of the consensus: the number of +times +the sequence has been determined on each strand of the DNA, and the +individual nucleotide assignments given for each gel reading. +.para +So the aim is to produce the consensus sequence and, equally important, +a display of the experimental results from which it was derived. +.para +In order to achieve this the following operations need to be performed: +.left margin2 +1) Put individual gel readings into the computer. +This might involve the manual interpretation of autoradiographs +or the transfer and processing of machine-readable files from fluorescent +sequencing machines. +.left margin2 +2) Check each gel reading to make sure it is not simply part of one of the +vectors used to clone the sequence. +.left margin2 +3) Check each gel reading to make sure that those fragments that span +the +ligation point used prior to sonication are not assembled as single +sequences. +.left margin2 +4) Compare all the remaining gel readings with one another to assemble +them +to produce the consensus sequence. +.left margin2 +5) Check the quality of the consensus and edit the sequences. +.left margin2 +6) When all the consensus is sufficiently well determined, produce a copy +of +it for processing by other analysis programs. +.para +It is very unlikely that this procedure will only be passed through once. +Usually steps 1 to 5 are cycled through repeatedly, with step 4 just +adding +new sequences to those already assembled.
Generally step 6 is also used +in +order to analyse imperfect sequence to check if it is the one the project +intended to sequence, or to look for interesting features. Analysis of +the consensus, such as +searches for protein coding regions, +can also help to find errors in the sequence. The display of the +overlapping gel readings shown above can be used to indicate, not only +the +poorly determined regions, but also which clones should be resequenced +to +resolve ambiguities, or those which can usefully be extended or +sequenced +in the reverse direction, to cover +difficult regions. + +.PARA +The original +individual gel readings for a sequencing project are each stored in +separate files. As the gel readings are entered into the computer +(usually in batches, say 10 +from a film), the file names they are given are stored in +a further file, called a file of file names. Files of file names +enable gel readings to be processed in batches. +.para +For each sequencing project +we start a project database. This database has a structure specifically +designed for +dealing with shotgun sequence data. +In order to arrive at the final consensus sequence many operations will +be +performed on the sequence data. Individual fragments must be +sequenced and +compared in both senses (i.e. both orientations) with all the other +sequences. When an overlap between a new gel reading and a contig is +found +they must be aligned and the new gel reading added to the contig. If a +new +gel reading overlaps two contigs they must be aligned and joined. Before +the two contigs are joined one of them may need to be turned around +(reversed and complemented) so they are both in the same orientation. +.para +Clearly, keeping track of all these manipulations is quite complicated, +and to be able to perform the operations +quickly requires careful choice of data +structure and algorithms.
For these reasons it is not practicable to store +the gel readings aligned as shown in the display above. Rather, it is more +convenient to store the sequences unassembled, and to record sufficient +information for programs to assemble them during processing. The +data used to assemble the sequences is called relational information. +.left margin2 +.PARA + The database comprises five files and they are described under the +section entitled "open database". +.PARA +Before entry into the project database +each new gel reading must be compared to look for overlaps +with all the data already contained +within the database. This last point is +important: all searching for overlaps is between individual new gel +readings and the data already in the database. There is no searching for +overlaps between sequences within the database; overlaps must be found +before new gel readings are entered into the database. +.para +Below I give an introduction to how the sequences are processed by +being +passed from one function to the next. +.para +This program is used to start a +database for the project and +then the following procedure is used. +.para +Data in the form of individual gel readings are entered into the computer + +and stored in separate files using either this program or the digitizer + +program. Batches +of these gel readings +are passed to the screening functions in this program to search for overlaps + +with vector sequences ("screen against vector") or for matches to + +restriction enzyme sites that should not be + +present ("screen against enzymes"). +Each run of these screening functions passes on only those gel + +readings that do not contain unwanted sequences. Sequences are passed + +via +files of file names and eventually are processed by the automatic +assembly function ("auto assemble"). This function compares each gel +reading with a consensus of all the previous gel readings +stored in the database.
+If it finds any +overlaps + it aligns the overlapping sequences by inserting padding characters, +and then adds the new gel reading to the database. +Gels that overlap are added to existing contigs and gels that do not +overlap any data in the database start +new contigs. If a new gel overlaps two contigs they are joined. +Any gel readings that appear to overlap but which +cannot be aligned sufficiently well are not entered and have +their names written to a file of failed gel reading names. +.PARA +Generally data is entered +into the database in batches as just described. The program + is also used to examine + +the data in the database, to enter gel readings that the automatic + +assembly function cannot align ("auto assemble"), + + and to make final edits. Edits to whole contigs + +can be made in several ways. +A mouse-driven editor ("edit contig") is used to perform all edits manually. +Disagreements between gel readings + +in contigs and their consensus + +sequences can be highlighted by use of the function "highlight + +disagreements". +.PARA +Editing the sequences is obviously an essential part of managing a + +sequencing project. +Editing is required when new + +sequences are added, when contigs are joined, and when sequences are + +corrected. +A basic part of the strategy + +used here is that new + +gel readings should be correctly aligned throughout their whole length + +when +they are entered into the database, and that when contigs are joined they + +are edited so that they are well aligned in the region of overlap. + + Alignment can be achieved by + +adding padding characters to the sequences, and this is the way "auto + +assemble" +operates when adding new sequences to the database. + +.para +In order to search +for overlaps that may have been missed due to errors in + +the gel readings, the function "extract gel readings" can be used to take + +copies of the gel + +readings at the ends of contigs, and write them out as separate files. 
+ +These can then be compared with the database consensus using the "auto + +assemble" function in a mode that forbids entry of data into the +database, +and any gel reading matching two contigs will indicate a join that has + +been +missed. The joins can then be made interactively using "join contigs". + +Missed matches can be + +found at this stage because the errors in the sequences may have been + +corrected by new data. + +.para +Generally the users need not concern themselves with how the relational +information is used by the program, but it is necessary to know +how contigs are identified. Because contigs are constantly being changed and +reordered the program identifies them by the numbers of the gel readings +they contain. Whenever users need to identify a contig they need only +know +the number or name of one of the gel readings it contains. Whenever the +program asks users to identify a contig or gel reading they can type its +number or its archive name. If they type its archive name they must precede +the name by a slash "/" symbol to denote that it is a name rather than a +number. E.g if the archive +name is fred.gel with number 99, users should +type /fred.gel or 99 when asked to identify the contig. Generally, + when it asks for the gel reading to be identified, +the program will offer the user a default name, + and if the user types only return, that +contig will be accessed. When a database is opened the default contig will +be the longest one, but if another is accessed, it will subsequently become +the current default. +.para +Further information is located in the following places. +The database files are described under "open database". The format +for +vector and consensus sequences is given under "calculate a consensus", as are +the +uncertainty codes used in gel readings. 
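The rule given above for identifying a contig or gel reading (a bare number, or an archive name preceded by a slash, with an empty reply taking the offered default) can be illustrated by a short sketch. This is not code from the package: the function name parse_identifier, the tagged-tuple return value, and the error messages are all assumptions made for illustration.

```python
from typing import Optional, Tuple, Union

# Hypothetical representation: ("number", 99) or ("name", "fred.gel").
Identifier = Tuple[str, Union[int, str]]


def parse_identifier(reply: str, default: Optional[Identifier] = None) -> Identifier:
    """Interpret a reply typed at an 'identify the contig' prompt.

    Illustrative only: a leading '/' marks an archive name, digits alone
    are a gel reading number, and an empty reply selects the default.
    """
    text = reply.strip()
    if not text:
        # The user typed only return, so the offered default is used.
        if default is None:
            raise ValueError("no identifier typed and no default available")
        return default
    if text.startswith("/"):
        name = text[1:]
        if not name:
            raise ValueError("a '/' must be followed by an archive name")
        return ("name", name)
    if text.isascii() and text.isdigit() and int(text) > 0:
        return ("number", int(text))
    raise ValueError("type a gel reading number, or '/' followed by a name")


if __name__ == "__main__":
    print(parse_identifier("/fred.gel"))
    print(parse_identifier("99"))
    print(parse_identifier("", default=("name", "i55d8.s1")))
```

Returning a tagged pair keeps the two kinds of identifier distinct, so a reading whose archive name happens to consist of digits is never confused with a gel reading number.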
+.left margin2 +.para +Two programs, +other than this one, are relevant to sequencing: the digitizer +program and the trace editor program. Both are outlined briefly below. +.para + The digitizer program +is used for the initial input of gel readings +and for writing a file of file names. The program +uses a digitizer for data entry. +A digitizer is + a two dimensional surface such as a light box +on which, if a special pen is pressed onto it, the pen's +coordinates are recorded by a computer. +These coordinates + can be interpreted by a program. +.para + In order to read an autoradiograph placed on the light box +the user need only define the bottom of +the four sequencing lanes and the bases + to which they correspond and then use the pen to point to each + successive band progressing up the gel. The program examines +the + coordinates of each pen position to see in which of the four +lanes + it lies and assigns the corresponding base to be stored in the + computer. Each time the pen tip is depressed to point to a position + on the surface of the digitizer the program sounds the bell on the + terminal to indicate to the user that a point has been recorded. As + the sequence is read the program displays it on the screen. +.para + The trace editor program +is used for the initial processing of data obtained from +fluorescent sequencing machines. It allows the user to visually +select left and right cutoff positions to denote the start and end of good +data. Users may also edit the sequence at this point. +Output from ted is a sequence file in Staden format with headers that +describe to xdap the cutoff information. + +.left margin1 +@17. TX 1 @Screen against enzymes +.left margin2 +.PARA +Used to compare gel readings against any restriction enzyme recognition + +sequences that may have been used during cloning and which should not + +be present in the data. Works on single gel readings or processes batches + +accessed through files of file names.
The algorithm looks for exact + +matches to recognition sequences stored in a file. + +.para +The file containing the recognition sequences must be identified. The +user +must choose between employing a file of file names, or typing in the + + +names of individual gel reading files. If a file of file names is used the + + +program will also create a new file of file names. When the option has + +finished operating this new file will contain the names of all those gel + +readings that did not match any of the recognition sequences. Hence it + can +be used for further processing of the batch. The recognition sequences + +should be stored in a simple text file with one recognition sequence per + +line. +.left margin1 +@18. TX 1 @Screen against vector +.left margin2 +.PARA +Used to compare gel readings against any vector sequences that may have + +been picked up during cloning. Works on single gel readings or processes + +batches accessed through files of file names. The algorithm looks for +exact +matches of length "minimum match length" and displays the overlapping + +sequences. +.para +The file containing the vector sequence must be identified. The user must + +choose between employing a file of file names, or typing in the names of + +individual gel reading files. If a file of file names is used the program +will +also create a new file of file names. When the option has finished + +operating this new file will contain the names of all those gel readings + +that did not match the vector sequence. Hence it can be used for further + +processing of the batch. The vector sequence should be stored in a simple + +text file with up to 80 characters of data per line. More than one vector + +can be stored in a single file. If so each should be preceded by a 20 + +character title of the form <---m13mp8.001-----> where the < and > + signs +and the number like .001 are obligatory. The number must be preceded + +by a dot (.) and be 3 digits long. 
The total sequence in the file must be < + +50,001 characters in length. + +.left margin1 +@20. TX 3 @Auto assemble +.left margin2 +.PARA +Compares gel readings against the current contents of the database and + +produces alignments. In its normal mode of operation +("entry permitted"), the function +will automatically enter the gel readings into the database, but if entry +is not permitted it will only produce alignments. It works on + +single gel readings or processes batches of gel readings accessed through + +files of file names. It is the usual way to enter data into the database. + +.para +The function will check the database for logical consistency and will + only +proceed if it is OK. Choose if gel readings should be entered into the + +database, or if they should only be compared. Choose between using a file + +of file names or typing file names on the keyboard. If so selected, supply + +the file of file names. Also supply a file of file names to contain the names of + +all the gel readings that fail to get entered. +Select the entry mode. Normal assembly is appropriate for all but special +cases, as is "permit joins". Uses for the other modes are not documented +here. +Define a minimum initial + +match length. Define a minimum alignment block (the default value is + +taken in all but exceptional circumstances). Define the maximum number + +of padding characters allowed to be used in each gel reading to help + +achieve alignment, and the same for the number allowed in the contig for + +each gel reading. Finally define the maximum percentage mismatch to +be allowed for any gel reading to be entered into the database. If + +for any gel reading, either of these last three values is exceeded the gel + +reading will not be entered into the database. + +.para +In operation the function takes a batch of gel readings (probably passed + + on as a file of file names from one of the screening routines) and +enters them into a + database for a sequencing project. 
It takes each gel reading + in turn, + compares it with the current consensus for the database, it then + produces an alignment for any regions of the consensus it + overlaps; if this alignment is sufficiently good it then edits + both the new gel reading and the sequences it overlaps and adds +the + new gel reading to the database. The program then updates the +consensus + accordingly and carries on to the next gel reading. +.para + All alignments are displayed and any gel readings +that do match but that + + cannot be aligned sufficiently well have their names written to a + file of failed gel reading names. The function works without any + + user intervention and can process any number of gel readings in a + single run. Those gel readings that fail can be recompared using + + the same function (to find the current overlap position) and the + +user can enter them into the database + + manually using the "enter new gel reading" option. +.para +Typical dialogue and output from the function is shown below. (Note that +output for gel readings 2 - 9 has been deleted to save space). +.lit +Automatic sequence assembler +Database is logically consistent +? (y/n) (y) Permit entry +? (y/n) (y) Use file of file names +? File of gel reading names=demo.nam +? File for names of failures=demo.fail +Select entry mode +X 1 Perform normal shotgun assembly + 2 Put all sequences in one contig + 3 Put all sequences in new contigs +? Selection (1-3) (1) = +? (y/n) (y) Permit joins +? Minimum initial match (12-4097) (15) = +? Minimum alignment block (2-5) (3) = +? Maximum pads per gel (0-25) (8) = +? Maximum pads per gel in contig (0-25) (8) = +? 
Maximum percent mismatch after alignment (0.00-15.00) (8.00) = + >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> + Processing 1 in batch + Gel reading name=HINW.004 + Gel reading length= 283 + Searching for overlaps + Strand 1 + Strand 2 + No matches found + Total matches found 1 + Padding in contig= 0 and in gel= 1 + Percentage mismatch after alignment = 1.8 + Best alignment found + 1 11 21 31 41 51 + TTTTCCAGCG TGCGTCTGAC GCTGTCTTGC TTAATGATCT CCATCGTGTG CCTAGGTCTG + ********** ********** ********** ********** ********** ********** + TTTTCCAGCG TGCGTCTGAC GCTGTCTTGC TTAATGATCT CCATCGTGTG CCTAGGTCTG + 1 11 21 31 41 51 + 61 71 81 91 101 111 + TTGCGTTGGG CCGAGCCCAA CTTTCCCAAA AACGTATGGA TCTTACTGAC GTACA-GTTG + ********** ********** ********** ********** ********** ***** **** + TTGCGTTGGG CCGAGCCCAA CTTTCCCAAA AACGTATGGA TCTTACTGAC GTACACGTTG + 61 71 81 91 101 111 + 121 131 141 151 161 171 + CTTACCAGCG TGGCTGTCAC GGCGTCAGGC TTCCACTTTA GTCATCGTTC AGTCATTTAT + ********** ********** ********** ********** ********** ********** + CTTACCAGCG TGGCTGTCAC GGCGTCAGGC TTCCACTTTA GTCATCGTTC AGTCATTTAT + 121 131 141 151 161 171 + 181 191 201 211 221 231 + GCCATGGTGG CCACAGTGAC G-TATTTTGT TTCCTCACGC TCGCTACGTA TCTGTTTGCC + ********** ********** * ******** ********** ********** ********** + GCCATGGTGG CCACAGTGAC GCTATTTTGT TTCCTCACGC TCGCTACGTA TCTGTTTGCC + 181 191 201 211 221 231 + 241 251 261 271 281 + CGCG--GTGG AATTACAGCG TTCCCTATTG ACGGGCGCAT CCAC + **** **** ********** ** * ***** ********** **** + CGCGACGTGG AATTACAGCG TT,CDTATTG ACGGGCGCAT CCAC + 241 251 261 271 281 + Batch finished + 9 sequences processed + 0 sequences entered into database + 0 joins made + +.end lit + +.para +Note that "auto assemble" cannot align protein sequences. +.left margin1 +@28. TX 1 @Highlight disagreements +.left margin2 +.para +Used in the latter stages of a project +to highlight disagreements between individual gel readings +and their consensus sequences. 
Characters that agree with the + +consensus are shown as : symbols for the plus strand and . for the minus + +strand. Characters that disagree with the consensus are left unchanged + +and so stand out clearly. The results of this analysis are written to a +file. + +.para +Before selecting this option create a file of the display of the contig to +be +"highlighted". The option will ask for the name of this file. Select + symbols +to denote "agreeing" characters on each strand, the defaults are : and ., + +but any others can be used. Supply the name of a file in which to put + +the output. +.para +The display file needed as input for this option is created by selecting + +"Redirect output", followed immediately by "display contig", and then +"Redirect output" again. The + +cutoff score used in the consensus calculation can be set by option "set + +display parameters". Note that for the highlight function +there is a limit of 50 for the number of gel +readings that are aligned at any position - ie the contig must be less +than 51 gel readings deep at its thickest point. I hope that those performing +shotgun sequencing never reach this limit, but those using the program for +comparing sequence families might. +.para +Typical output from this function is shown below. +.lit + + 210 220 230 240 250 + 1 HINW.004 :C::::::::::::::::::::::::::::::::::::::::::AC:::: + 7 HINW.018 :*::::::::::::::::::::::::::::::::::::::::::CA:::: + -4 HINW.017 ...............AC.... + G-TATTTTGTTTCCTCACGCTCGCTACGTATCTGTTTGCCCGCG--GTGG + + 260 270 280 290 300 + 1 HINW.004 ::::::::::::*:D::::::::::::::::::: + 7 HINW.018 ::::::::::::::::::::CA:::::T:*:::*::::::::::::CA: + -4 HINW.017 ..............................................A... + 3 HINW.009 :::::::::::::::V::::::::::::::::::::::::::::*AV::: + -6 HINW.028 ......................A... + AATTACAGCGTTCCCTATTGACGGGCGCATCCACGCTGATTCTCTT-CTG + +.end lit +.left margin1 +@32. 
TX 3 @Extract gel readings +.left margin2 +.para +Used to make copies of the aligned gel readings in a database, +to write them into separate files, and to write a + +corresponding file of file names. It operates in two modes: either all gel + +readings are extracted, or only those at the ends of contigs. + +.para +Choose which mode of operation is required and supply a file of file + +names. +.para +The gel readings are given their original + +names. +If used to extract the gel readings from the ends of contigs the function + is +useful for checking for missed contig joins: the file of file names can be + +used with the auto assemble function to recompare these gel readings, + +and each should only overlap one contig. Any that overlap two contigs + +will identify possible joins. +.para +If the option is used to extract all the gel readings from a database, a + +subsequent run of "auto assemble" can reconstitute a database which has + +been corrupted. This rarely occurs and is usually necessitated by a + +user employing "alter relationships" incorrectly without first having + +made a copy. +.left margin1 +@1. TX 0 @Help +.left margin2 +.PARA +Help is available on the following topics : + +.LEFT MARGIN1 +@2. TX 0 @Quit +.LEFT MARGIN2 +.PARA +This command stops the program and is the only safe way to terminate a + +run +of the program that has altered the contents of the database in any way. + +.left margin1 +@3. TX 1 @Open a database +.LEFT MARGIN2 +.PARA +Opens existing databases or allows new ones to be started. The function + is +automatically called into operation +when the program is started but can also be selected + +from the general menu. +.para +Choose to open an existing database or start a new one, or if ! is typed +when the program is first started, enter the program without opening a +database. Supply a project + +database name, and if it already exists, the "version". 
If starting a new + +database define the database size and if it is for DNA or protein sequences. +The database size is an initial size for the database. It can be increased +later during the project. It is the sum of the number of gel +readings plus the number of contigs. +.para +Database names can have from one to 12 letters and must not include full + +stop (.). The database is made from five separate files. If the database + is +called FRED then version 0 of database FRED comprises files FRED.AR0, + +FRED.RL0, FRED.SQ0, FRED.TG0 and FRED.CC0. The version is the last symbol in the file names. + +Only this program + can read these files. If the "copy database" option is used it + +will ask the user to define a new "version". +.para +For normal use the maximum gel reading length is set to 512 characters, + +but when a database is started the user may choose lengths of either + + 512, +1024, 1536..., 4096. Normally the program is used to handle DNA + +sequences but many of the functions also work on protein sequences. The + +choice of sequence type is made when the database is started. + +.para +The contigs are not stored on the disk as the user sees them displayed on + +the screen. Each gel reading is stored with sufficient information about + +how it overlaps other gel readings so that the program can work out how + +to +present them aligned on the screen. We refer to this extra data as "the +relationships" and it is explained below. + +The database comprises 5 separate files. + +.left margin2 + 1. a working version of each gel reading. This is the version of + the gel reading +that is in the database and initially it is an exact copy of + the original sequence (known as the archive) + but it is edited and manipulated to align it + with other gel readings. + +.left margin2 + 2. the file of relationships. 
This file contains all of the + + information that is required to assemble the working versions +into + + contigs during processing; any manipulations on the data use this + + file and it is automatically updated at any time that the + + relationships are changed. The information in this file is as + + follows: +.left margin2 + (A) Facts about each gel reading and its relationship to +others +("gel + + descriptor lines"): + +.left margin2 + (a) the number of the gel +reading (each gel reading is given a number as it is + + entered into the database) + +.left margin2 + (b) the length of the sequence from this gel reading + +.left margin2 + (c) the position of the left end of this gel +reading relative to the left + + end of the contig of which it is a member + +.left margin2 + (d) the number of the next gel +reading to the left of this gel reading + +.left margin2 + (e) the number of the next gel reading to the right + +.left margin2 + (f) the relative strandedness of this gel +reading , ie whether it is in + + the same sense or the complementary sense as its archive. + +.left margin2 + (B) Facts about each contig ("contig descriptor lines"): + +.left margin2 + (a) the length of this contig + +.left margin2 + (b) the number of the leftmost gel +reading of this contig + +.left margin2 + (c) the number of the rightmost gel reading of this contig. + +.left margin2 + (C) General facts: + +.left margin2 + (a) the number of gel readings in the database + +.left margin2 + (b) the number of contigs in the database. + +.left margin2 + 3. the file of archive names. This is simply a list of the names + + of each of the archive files in the database but on line number + + 1000 we also store the size of the database. ie the number of lines + + of information allowed in the database files. 
This file always has + + 1000 lines but the length of the file of relationships and the file + + of working versions can be set by the user when creating a +database + + or when copying from one to another. +.left margin2 + 4. the file of tags (annotation). +This consists of linked lists of tag information for each sequence in the +database. +Tags are created by the user as annotation, or by xdap as records of edits or +for storing cutoff information. +As the number of tags can grow without limit, so can this file. +For each gel there is a header record, which contains the record number of +the start of the linked list for that gel. On line IDBSIZ there is a record +containing information about the file such as its present length and whether there +are any free "tag" slots to be reused in the file. + + 5. the file of comments (annotation). +This consists of linked lists of comment fragments. +Comments are created by the user as a message attached to annotation, +or by the system to store cutoff information. +Comments are character strings of any length. +Comments longer than 40 characters are broken up into fragments, each 40 +characters long, and are chained together in a linked list. +As the number of comments can grow without limit, so can this file. + +.para + Structure of the database files +.para + 1. The file of relationships +.para + The file contains IDBSIZ lines of data: + the general data are stored on line IDBSIZ; data about gel +readings are + stored from line 1 downwards; data about contigs are stored from + line IDBSIZ-1 upwards. A database of 500 lines containing 25 gel +readings and 4 contigs would have a file + of relationships as shown below.
+.lit + + + --------------------------------------------- + 1 Gel descriptor record + 2 " " " + 3 " " " + 4 " " " + 5 " " " + ' ' ' ' + ' ' ' ' + 25 " " " + 26 Empty record + ' ' ' + + ' ' ' + 495 ' ' + 496 Contig descriptor record + 497 " " " + 498 " " " + 499 " " " + 500 Number of gel readings=25, Number of contigs=4 + --------------------------------------------- + + The arrangement of the data in the file of relationships + +.end lit +As each new gel reading is added into the database a new line is added + to the end of the list of gel descriptor + lines. If this new gel reading does not + overlap with any gel readings + already in the database a new contig line is + added to the top of the list of contig lines. If it overlaps with + one contig then no new contig line need be added but if it overlaps + with two contigs then these two contigs must be joined and the + number of contig lines will be reduced by one. Then the list of +contig + lines is compressed to leave the empty line at the top of the list. + Initially the two types of line will move towards one another but + eventually, as contigs are joined, the contig descriptor lines will + move in the same direction as the gel descriptor + lines. At the end of a + project there should be only one contig line. The database is thus + capable of handling a project of 998 gels. +.para + 2. Structure of the working versions file +.para + The working versions of gel readings are stored in a file of + IDBSIZ lines each containing 512 characters. Gel reading +number 1 is stored on line + 1, gel reading number 2 on line 2 and so on. +.para + 3. Structure of the archive names file +.para + This file, unlike the others, always has 1000 lines each 10 + characters in length. Its length is fixed because line 1000 is used + to store IDBSIZ the database size and the programs need a definite + location from which to read this number. +.para + 4. 
Structure of the tag file +.para +This file initially starts with IDBSIZ lines, and is expanded as new tags are +created. +Information about the length of the file, and which tag records are reusable, +is stored on line IDBSIZ. +A database of 500 lines would have a file of tags as shown below. +.lit + + --------------------------------------------- + 1 Tag descriptor record + 2 " " " + 3 " " " + 4 " " " + 5 " " " + ' ' ' ' + ' ' ' ' + 497 " " " + 498 " " " + 499 " " " + 500 Length of file=N, Free list=0 + 501 Tag record + 502 " " + 503 " " + ' ' ' + ' ' ' + N-2 " " + N-1 " " + N Tag record + --------------------------------------------- + + The arrangement of the data in the file of tags + +.end lit +As each new tag is added to the database, a check is made in the +file descriptor record at line IDBSIZ. If the list of reusable records is 0, +the file is extended by one line. Otherwise the new tag is assigned to +the record at the head of the free list. +When tags are deleted, they are added to the free list in the file descriptor +record. +.para + 5. Structure of the comment file +.para +This file initially starts with 1 line, and is expanded as new annotation is +created. +Information about the length of the file, and which comment records are reusable, +is stored on the first line. +.lit + + --------------------------------------------- + 1 Length of file=N, Free list=0 + 2 Comment fragment + 3 " " + 4 " " + ' ' ' + ' ' ' + N-2 " " + N-1 " " + N Comment fragment + --------------------------------------------- + + The arrangement of the data in the file of comments + +.end lit +As each new comment is added to the database, a check is made in the file +descriptor record at line 1. If the list of reusable records is 0, +the file is extended to hold the new comment. Otherwise the new comment is +assigned to records starting with the head of the free list.
+When comments are deleted, the discarded records are added to the free list in +the file descriptor record. +.para + There are various checks within the programs to + protect users from themselves:- +.left margin2 + 1. All user input is checked for errors - e.g. reference to + non-existent gel +readings or contigs, incorrect positions in the + contig or gel readings. +.left margin2 + 2. Before entering a gel reading the system checks to see if a + file of the same name has already been entered. +.left margin2 + 3. Join will not allow the circularising of a contig. +.left margin2 + 4. Both enter and join functions restrict the region + that the user is allowed to edit (using edit contig) to the + region of overlap. +.left margin2 +5. Users may escape from any point in the program. +.left margin2 +6. Help is available from all points in the program. +.SK2 +.LEFT MARGIN2 +IT IS ESSENTIAL THAT USERS DO NOT KILL THE PROGRAM WHILE IT IS +DOING +ANYTHING THAT INVOLVES CHANGING THE CONTENTS OF THE +DATABASE. I.E DURING AUTO ASSEMBLE, +COMPLETE ENTRY, COMPLETE JOIN, COMPLEMENT CONTIG, EDIT CONTIG, AND SCREEN +EDIT. +This could +corrupt the database so badly that it is impossible to fix. The program +should always be left using the QUIT option. + +.left margin1 +@4. TX 3 @Edit contig +.LEFT MARGIN2 +.PARA +The Contig Editor is a mouse-driven editor that can insert, +delete and change gel reading sequences. +.para +The Contig Editor allows scrolling from one end of a contig to the other +using the scroll bar and scroll buttons. 
Action of mouse button presses +when the mouse pointer is in the scroll bar: +.sk1 +.lit + Middle Mouse Button Set editor position + Left Mouse Button Scroll forward one screenful + Right Mouse Button Scroll backwards one screenful +.end lit +.sk1 +The four scroll buttons operate as follows: +.sk1 +.lit + "<<" Scroll left half a screenful + "<" Scroll left one character + ">" Scroll right one character + ">>" Scroll right half a screenful +.end lit +.para +The Editor cursor can be positioned anywhere in the edit window by +moving the mouse pointer over the character of interest, then pressing the +left mouse button. The Editor cursor can also be moved by using the +direction arrow keys. +.para +The editor operates in two main edit modes - Replace and Insert. Replace allows +a character to be replaced by another. Insert allows characters to be +inserted into a gel reading sequence. Characters are entered by typing +them from the keyboard. Only valid characters are permitted. +Characters can be deleted by positioning the cursor one character to the right, +then pressing the delete key. +Normally Insert and Delete apply to the consensus line of the contig ONLY. +This restraint can be overridden by using the "Super Edit" mode of +operation, THOUGH IT IS NOT RECOMMENDED. +.para +Edits can also be performed on the consensus, though they are +restricted to insertion and deletion of padding characters ("*"). +These edits also have special meanings. +A deletion will delete ALL characters at the position to the left +of the cursor in the contig, and move the relative positions of all +sequences starting to the right of the cursor position left one +character. +An insertion will insert the character typed ("*") into ALL gel +reading sequences at the cursors position in the contig, and move the +relative positions of all sequences starting to the right of the cursor +position right one character. 
+.para +The effect of the last edit can be undone by pressing the "Undo" button +at the top of the editor window. +.para +The cursor will automatically be positioned at the next problem when the +"Find Next Problem" button is selected. The next problem is where the +consensus shows either an ambiguity ("-") or a pad ("*") character. +.para +The edits to the contig can be saved by pressing the "Leave Editor" +button and replying "Yes" to the prompt to "Save changes?". As no changes +are made to the working copy of your database until this point it +is possible to abort the editor if +the edit session ends up in an unsatisfactory state (ie if you've +stuffed it up!). +.left margin1 +.sk3 +Displaying Traces +.left margin2 +.para +The original data from which the gel reading sequences were derived can +be seen by double clicking (two quick clicks) with the middle mouse button +on the area of interest. The trace will be displayed with the point +clicked at the centre of the trace viewport. +.para +All traces that are displayed are maintained in one window, called the Trace +Manager. The Trace Manager displays at most four traces. When four +traces are already being managed and a new one is requested, the one at the top +of the Trace Manager is removed and the new one is added to the bottom. +Traces can be removed individually by using the "quit" button in the panel next +to the trace. +.left margin1 +.sk3 +Extending Reads Using Cutoff Information +.left margin2 +.para +Sequence data read in from automated fluorescent sequencing machine +trace files that have been processed through the program ted +will have the discarded sequence (vector at start and poor read at +end) available to the contig editor. To display the cutoff +information, press the "Display Cutoff" button at the top of the +editor window. +The cutoff sequence appears in grey. This sequence can be incorporated +into the editable sequence by moving the cutoff position.
This is +done by positioning the cursor at the end of the gel sequence, and +using Meta-Left-Arrow and Meta-Right-Arrow to adjust the point of cutoff. +The Meta key is a diamond on the Sun keyboard. +.left margin1 +.sk3 +Pop-up menu +.left margin2 +.para +A pop-up menu is revealed by depressing the "Control" key on the keyboard +and at the same time pressing the left mouse button. The menu has the following +functions: +.lit + + Search + Save Contig + Create Tag + Edit Tag + Delete Tag + +.end lit +"Save Contig" is described above. +Searching and operations on tags are described below. +.left margin1 +.sk3 +Searching +.left margin2 +.para +Selecting "Search" brings up a +window which can remain present during normal editor operation. The +window allows the user to select the direction of search, the type of +search and a value to search on. The value is entered into the value +text window. Pressing the "search" button then +performs the search. If successful, the cursor is positioned and +centred accordingly. An audible tone indicates failure. Pressing the +"ok" button removes the search window. The search window is +automatically removed when the contig editor is exited. +.sk1 +There are seven different search modes: +.sk1 +1. Search by position +.sk1 +This positions the cursor at the numeric position specified in the +value text window. Eg a value of "1234" causes the cursor to be placed +at base number 1234 in the contig. Positioning within a gel reading is +achieved by prefixing the number with the "@" character, eg "@123" +positions the cursor at base 123 of the sequence in which the cursor +lies. Relative positions can be specified by prefixing the number with +a plus or minus character. Eg "+1234" will advance the cursor 1234 +bases. If possible, the cursor is positioned within the same sequence. +The direction buttons have no effect on the operation of "search +by position". +.sk1 +2.
Search by reading name +.sk1 +This positions the cursor at the left end of the gel reading specified +in the value text window. If the value is prefixed with a slash it is +assumed to be a gel reading name. Otherwise it is assumed to be a gel +reading number. Eg "123" positions the cursor at the left end of gel +reading number 123. "/a16a12.s1" positions at the start of reading +a16a12.s1. If the value is "/a16" the cursor is positioned at the +first reading which starts with "a16". The direction buttons have no +effect on the operation of "search by reading name". +.sk1 +3. Search by tag type. +.sk1 +This positions the cursor at the start of the next tag which has the +same type as specified by the type value menu. To change the type, +select from the menu that pops up when the mouse is clicked on the +button labeled "Type:". The search can be performed either forwards +or backwards from the current cursor position. To find all tags, use +"search by annotation", with a null text value string. +.sk1 +4. Search by annotation. +.sk1 +This positions the cursor at the start of the next tag which has a +comment containing the string specified in the value text window. The +search performed is a regular expression search, and certain +characters have special meaning. Be careful when your value string +contains ".", "*", "[", "^" or "$". The search can be performed either +forwards or backwards from the current cursor position. +.sk1 +5. Search by sequence. +.sk1 +This positions the cursor at the start of the next piece of sequence +that matches the value specified in the text value window. The search +is for an exact match, which means the case of the value string is +important. The search is performed on the gel readings themselves, +rather than the consensus sequence. The search can be performed either +forwards or backwards from the current cursor position. +.sk1 +6. Search by problem.
+.sk1 +This positions the cursor at the next place in the consensus sequence +which is not an "A", "C", "G" or "T". The search can be performed +either forwards or backwards from the current cursor position. +.sk1 +7. Search by quality +.sk1 +This positions the cursor at the next place in the consensus sequence +where the consensus calculations for the two strands disagree. When +sequences on only one strand are present, the search will stop at every +base. The search can be performed either forwards or backwards from the +current cursor position. +.left margin1 +.sk3 +Annotation +.left margin2 +.para +Parts of a sequence can be annotated, to record the positions of primers used +for walking, or to mark sites, such as compressions, that have caused problems +during sequencing. +The consensus sequence CANNOT be annotated. +.para +To annotate a piece of sequence first select the part of the sequence +using the mouse buttons. Use the left mouse button to position the start of the +selection, and while this button is being held down, move the mouse to extend. +The selection can be extended further using the right mouse button. +.para +To create annotation, invoke the pop-up menu, and select the "Create Tag" +function. A small "tag editor" will appear which +allows you to select the type of the +annotation from a pull-down menu, and specify a comment if desired. +To select a new type pull down the Type menu, and select the entry desired. +To enter a comment, simply type into the text window in the tag editor. +The annotation is created when the "Leave" button on the tag editor is pressed, +and is displayed in the colour defined in the tag database file (TAGDB). +.para +To edit existing annotation, +position the cursor with the left mouse button +on the tag, and select +"Edit Tag" +from the pop-up menu. +This invokes the tag editor, and changes to the type and comment of the +annotation can be made. The tag is updated when the "Leave" button is pressed.
+.para +To delete an existing annotation, +position the cursor with the left mouse button +on the tag, and select the +"Delete Tag" +off the pop-up menu. +.left margin1 +.sk3 +NOTE: +.left margin2 +.para +As the Contig Editor is a very powerful tool, it is possible that the alignment +of the gel reading sequences has unexpectedly been disrupted. +This can easily happen to parts of the contig that lie to the right +of the screen if excessive use has been made of the "Super Edit" facility. +Until familiar with "Super Edit" it would benefit the sequencer to quickly +scan through the contig after editing to check that bad alignments have not +been created. +.left margin1 +@9. T 3 @Screen edit +.LEFT MARGIN2 +.para +THIS OPTION IS NO LONGER AVAILABLE IN XDAP. USE EDIT CONTIG +.para +Gives access to the system editor on the machine (for example EDT on a VAX) +and allows users to edit contigs. The contigs are presented as for +"display contig" and the program will +reconstitute the contig's sequences and relationships when the editor is +exited. +.para +To screen edit a contig set the line length to 50 characters, +select the contig to edit, and supply the name of a temporary file in which +the editing will be performed. +After a short pause the system +editor will present the first page of the file. Edit the file obeying the +rules given below. Exit from the editor and affirm the intention of +returning the contig to the database. The program will put the contig +back into the database. +.para +Rules for screen editing +.para +There are some limitations on the changes that can be made to the contigs +when using the screen editor. Users are unlikely to want to break the +rules +in order to achieve changes to contigs, but nevertheless the +constraints need to be defined and they are given below. +.para +Alignments must be maintained during editing. +Whole lines of sequence should not be deleted or added unless the +order +of the gel readings in the contig is preserved. 
+Each line in the +contig display consists of gel reading numbers, their names and 50 +character sections of sequence. Insertions are limited in the following +way. +No line of sequence can be extended rightwards more than 10 characters +beyond the end of a full length line (a full length line is 50 characters +long). Only one character can be added to the left end of full length +lines, but sections of sequence beginning further into a line + can be extended leftwards up to an equivalent position. Do not delete any +non-sequence lines in the file. +.para +Before returning the contig to the database the program checks that the +rules have been obeyed. If an error is found the number of the erroneous +line in the +file is displayed and the contig will not be changed. +.left margin1 +@5. TX 1 @Display a contig +.LEFT MARGIN2 +.para +Used to show the aligned gel readings for any part of a contig. The + +number, name and strandedness of each gel reading are shown and the + +consensus is written below. +.para +If required identify the contig, and then the start and end points of the + +region to display. +.para +The display can be directed to a disk file using "direct output to disk". + +These files are required by options: "screen edit" and "highlight + +disagreements", and printed copies of them +are very useful for marking corrections prior to + +using the editors. +.para + Below is an example showing the left end of a contig from + position 1 to 200. Overlapping this region are gels 6, 3, 5, 17 and 12; +6, 3 and 5 +are in reverse orientation to their archives (denoted by a minus sign). + There are a few uncertainty codes and a few padding + characters in the working versions, but the consensus (shown +below + each page width) has a definite assignment for almost every +position.
+.lit + + 10 20 30 40 50 + -6 HINW.010 GCGACGGTCTCGGCACAAAGCCGCTGCGGCGCACCTACCCTTCTCTTATA + CONSENSUS GCGACGGTCTCGGCACAAAGCCGCTGCGGCGCACCTACCCTTCTCTTATA + + 60 70 80 90 100 + -6 HINW.010 CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCGCGGACACGTC + -3 HINW.007 GGCACA*GTC + CONSENSUS CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCG-G-ACA-GTC + + 110 120 130 140 150 + -6 HINW.010 GATTAGGAGACGAACTGGGGCG3CGCC*GCTGCTGTGGCAGCGACCGTCG + -3 HINW.007 GATTAG4AGACGAACTGGGGCGACGCCCG*TGCTGTGGCAGCGACCGTCG + -5 HINW.009 GGCAGCGACCGTCG + 17 HINW.999 AGCGACCGTCG + CONSENSUS GATTAGGAGACGAACTGGGGCGACGCC-G-TGCTGTGGCAGCGACCGTCG + + 160 170 180 190 200 + -6 HINW.010 TCT*GAGCAGTGTGGGCGCTG*CCGGGCTCGGAGGGCATGAAGTAGAGC* + -3 HINW.007 TCT*GAGCAGTGTGGGCGCTGC*CGGGCTCGGAGGGCATGAAGTAGAGC* + -5 HINW.009 TCT*GAGCAGTGTGGGCG*T*G*CGGGCTCGGAGGGCATGAAGTAGAGC* + 17 HINW.999 TCTCGAGCAGTGTGGGCGCTG**CGGGCTCGGAGGGCATGAAGTAGAGCG + 12 HINW.017 GTAGAGC* + CONSENSUS TCT*GAGCAGTGTGGGCGCTG-*CGGGCTCGGAGGGCATGAAGTAGAGC* +.END LIT +.left margin1 +@6. TX 1 @List a text file +.LEFT MARGIN2 +.PARA +This option allows users to list text files on the screen. It can be used +to read a file containing notes, for checking files written to disk etc. The +user is asked to type the name of the file to list. +.left margin1 +@8. TX 1 @Calculate a consensus +.LEFT MARGIN2 +.para + Calculates a consensus sequence either for the whole database or + +for selected contigs. The consensus is written to a file named by the + user. +.left margin2 +Supply a file name, choose between whole database or selected contigs. +.para + Symbols for uncertainty in gel readings +.para +In order to record uncertainties when reading gels the codes shown + +below can be used. Use of these codes permits us to extract the + +maximum amount of data from each gel and yet record any doubts by + +choice of code. The program can deal with all of these codes and any + +other characters in a sequence are treated as dash (-) characters. 
+ + +.lit + + SYMBOL MEANING + + 1 PROBABLY C + 2 " T + 3 " A + 4 " G + D " C POSSIBLY CC + V " T " TT + B " A " AA + H " G " GG + K " C " C- + L " T " T- + M " A " A- + N " G " G- + R A OR G + Y C OR T + 5 A OR C + 6 G OR T + 7 A OR T + 8 G OR C + - A OR G OR C OR T + a A set by auto edit + c C set by auto edit + g G set by auto edit + t T set by auto edit + * padding character placed by auto assembler + else = - + +.end lit + +.LEFT MARGIN2 + The DNA consensus algorithm +.para +The "calculate consensus" function, the "display contig" routine and the + +"show quality" option use the rules outlined here to calculate a + +consensus from aligned gel readings. Note that "display contig" +calculates +a consensus for each page width it displays (it does not use the + +consensus sequence file calculated by the consensus function). + +.LEFT MARGIN2 +.para +We have 6 possible symbols in the consensus sequence: A,C,G,T,* and -. The +last symbol is assigned if none of the others makes up a sufficient +proportion of the aligned characters at any position in the contig. The +following calculation is used to decide which symbol to place in the +consensus at each position. +.para +Each uncertainty code contributes a score +to one of A,C,G,T,* and also to the total at each point. Symbols like R +and Y which don't correspond to a single base type contribute only to the +total at each point. The scores are shown below. +.lit + definite assignments ie A,C,G,T,B,D,H,V,K,L,M,N,a,c,g,t,* =1 + + probable assignments ie 1,2,3,4 = 0.75 + + other uncertainty codes including R,Y,5,6,7,8,- = 0.1 +.end lit +.para +A cutoff score of 51% to 100% is supplied by the user. (When the program +starts this is set to 75%. See "set display parameters"). +At each position in the contig we calculate the total score for each of +the 5 symbols +A,C,G,T and * (denote these by Xi, where i=A,C,G,T or *), +and also the sum of these totals +(denote this by S).
Then if 100 Xi / S > the cutoff for any i, symbol i is +placed in the consensus; otherwise - is assigned. +.para +Notice that S does not equal the number of times the sequence has been +determined, but is the score total, and hence we are less likely to put a - +in the consensus. For the "examine quality" algorithm each strand is +treated separately but the calculation is the same. (It was originally +different). +.para +Format of the consensus sequence (and vector sequences). +.para +A consensus sequence file may contain the consensus for several contigs + +and so we identify each of them by preceding it by a 20 character + +title. The title is of the form <---LAMBDA.076-----> (where LAMBDA is + +the project name and gel reading number + + + 76 is the leftmost gel +reading to contribute to this consensus sequence). + + + The angle brackets <> and the three digit number preceded by a . + +are important to some processing programs. +.left margin1 +@25. TX 1 @Show relationships +.LEFT MARGIN2 +.para + Used to show the relationships of the gel readings in the database in + +three ways - +.LEFT MARGIN2 + (a) All contig descriptor lines followed by all gel descriptor + lines. +.LEFT MARGIN2 + (b) All contigs one after the other sorted, i.e. for each + contig show its contig descriptor line followed by all its + gel descriptor lines sorted on position from left to right. +.LEFT MARGIN2 + (c) Selected contigs: show the contig line and, in order, + those gel readings that cover a user-defined region. + Note that this output can be directed to a disk file by + prior selection of "disk output". +.LEFT MARGIN2 +.para + Below is an example showing a contig from position + 1 to 689. The left gel reading is number 6 and has archive +name HINW.010, the +rightmost gel reading is number 2 and has archive name HINW.004.
+On each gel descriptor line is shown: + the name of the archive version, the gel number, the position of the + left end of the gel reading relative to the left end of the contig, the + length of the gel +reading (if this is negative it means that the gel reading is in + the opposite orientation to its archive), the number of the gel +reading to + the left and the number of the gel reading to the right. +.lit + + + CONTIG LINES + CONTIG LINE LENGTH ENDS + LEFT RIGHT + 48 689 6 2 + GEL LINES + NAME NUMBER POSITION LENGTH NEIGHBOURS + LEFT RIGHT + HINW.010 6 1 -279 0 3 + HINW.007 3 91 -265 6 5 + HINW.009 5 137 -299 3 17 + HINW.999 17 140 273 5 12 + HINW.017 12 193 265 17 18 + HINW.031 18 385 -245 12 2 + HINW.004 2 401 -289 18 0 + +.end lit +.left margin1 +@21. TX 3 @Enter new gel reading +.LEFT MARGIN2 +.para +THIS OPTION IS NO LONGER AVAILABLE IN XDAP. USE AUTO ASSEMBLE +.para +Used to enter new gel readings into the +database. The new gel reading must have previously been compared with +the +contents of the database by use of "auto assemble" in order to ascertain +if it overlaps any previously entered data. +.para +The user is expected to know: if +the gel reading overlaps; if so which contig it overlaps; if so where it +overlaps. The program takes the user through a series of questions to +establish the nature of the overlap and then displays the overlap. The +user +is then offered a number of options, including editors for the new gel +reading and the contig, to enable the correct alignment of the gel reading +throughout its whole length. +.left margin2 + +Supply the name of the gel reading file. +If the gel +reading has been entered before the program will not permit + +entry. +The program gives the gel reading a unique number and asks if the + +sequence overlaps any data already in the database (reported by "auto + +assemble"). + +If it does not, entry is complete.
+If it does overlap the + +dialogue +continues with the program asking if the gel reading overlaps "in the + +normal sense"; if not, it will automatically complement the sequence. + +Then supply the number of the contig the gel reading overlaps (as + +reported by "auto assemble"). +.para +Overlaps are divided into two types: those for which the new gel reading + +protrudes from the left end of the contig it overlaps, and those for which + +it does not. The program asks about this with the question "Left end of +gel +reading is inside contig". If this is true the program will go on to ask for + +the position in the contig of the left end of the new gel reading. If it is + not +true the program will ask for the position in the new gel reading of the + +left end of the contig. +.para +Once this is completed the program will display the first 50 bases of + +the overlap. +The gel readings in the contig and their consensus are displayed with the + +new gel reading underneath. The mismatches are shown by *'s on the +next +line down. +For example: +.lit + + + 60 70 80 90 100 + -6 HINW.010 CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCGCGGACACGTC + -3 HINW.007 GGCACA*GTC + CONSENSUS CACAAGCGAGCGAGTGGGGCACGGTGACGTGGTCACGCCG-G-ACACGTC + NEWGEL CACAAGCGAGCGAGAGGGGCACCGTGACGTGGTCACGCCGGGGACACGTC + MISMATCH * * * + 10 20 30 40 50 + +.end lit +.para +The program then needs to know if the position of the left end of the +overlap is correct. + +If it is, the user should type return; if not, the user should type 1 and the program will ask for +the +new position and display it. + +.LEFT MARGIN2 +The program now offers a number of options to allow the + user to align the new gel reading +correctly over its whole length with + the data already in the contig.
It is important that + sufficient edits are made to the new gel reading +or the sequences in the + contig at this stage to get the alignment correct, because once + entry is completed, the alignment is fixed and cannot easily be + changed (see "alter relationships"). + Alignment can be achieved +by making + insertions or deletions but deletion of data requires the + original gels to be checked. For this reason at entry we + usually make only insertions to achieve alignment. We use X or + asterisks (*) as padding characters to achieve alignment and + so can, if required, + distinguish padding characters from characters assigned from + reading gels. +.LEFT MARGIN2 +.para +The options available are: +.lit + ? = HELP + ! = Give up + 3 = Complete entry + 4 = Edit contig + 5 = Display overlap + 6 = Edit new gel reading + +.end lit + +.sk1 +.para +1. HELP gives this information. +.para +2. Give up allows users to change their minds about entering the new gel +reading. The program will ask the user to +confirm this choice. +.para +3. Complete entry is the command to add the new gel reading to the +contig. The +program updates the relationships accordingly. The user is asked to +confirm +this command. +.para +4. Edit contig gives the user access to a simple editor that allows +insertions, deletions and changes to be made to the contig. The editor +maintains alignments by making the same number of insertions or +deletions +in all sequences covering the edit position. +The program + protects the user by allowing edits only within + the region of overlap. +.para +5. Display allows display of the region of overlap only. This + is defined by the relative positions in the contig. The + default is the whole of the region of overlap. +.para +6. Edit new gel reading allows the new gel reading to be edited using a +simple editor. +.left margin1 +@23. TX 3 @ Complement a contig +.LEFT MARGIN2 +.PARA + This function will complement and reverse all of the gel +readings in a + contig. 
It automatically reverses and complements each gel + reading sequence, reorders left and right neighbours, recalculates +relative + positions and changes each strandedness. +.PARA + The only user input required is to identify the contig to + complement by the number or name of a gel reading it contains. +DO NOT KILL THE +PROGRAM DURING THIS STEP! +.left margin1 +@22. TX 3 @ Join contigs +.LEFT MARGIN2 +.PARA +This function joins contigs interactively using a mouse driven editor. +The operation of this editor is very similar to the Contig Editor +described in "@4 Edit". + +.para +It allows the +user to align the ends of the two contigs by editing each +contig separately. It is important that the alignment achieved is +correct because once the join is completed the alignment is fixed. +The program needs to know which two contigs to join. +.para +First specify which two contigs are to be joined. +The user should identify the two +contigs. First the left contig and then the right. +The program checks that the two contig numbers are different (it will not +allow circles to be formed!) +.para +The Join Editor consists of two Contig Editors in between which is sandwiched +a disagreement box. This disagreement box shows exclamation marks to +denote mismatches between the two consensuses. +.para +For example, the display will look something like this: +.lit + + 1460 1470 1480 1490 1500 + 56 HINW.100 TCT*GAGCAGTGTGGGCGCTG*CCGG + 33 HINW.300 TCT*GAGCAGTGTGGGCGCTGC*CGGGCTCGGAGGG + -25 HINW.090 TCT*GAGCAGTGTGGGCG*T*G*CGGGCTCGGAGGG + 19 HINW.123 TCTCGAGCAGTGTGGGCGCTG**CGGGCTCGGAGGGCATGAAGTAGAGCG + CONSENSUS TCTCGAGCAGTGTGGGCGCTG-CCGGGCTCGGAGGGCATGAAGTAGAGCG + MISMATCH ! !!!!!! 
+ 10 20 30 40 50 + -6 HINW.010 TCTCGAGCAGTGTGGGCGCTGCCCGGGCTCGGAGGGCATGAAGTTAGAGC + -3 HINW.007 TGGGCGCTGCCCGGGCTCGGAGGGCATGAAGT*AGAGC + -5 HINW.009 GCTCGGAGGGCATGAAGT*AGAGC + CONSENSUS TCTCGAGCAGTGTGGGCGCTGCCCGGGCTCGGAGGGCATGAAGTTAGAGC + +.END LIT +.para +.para +The best strategy for joining is to +identify the exact position of overlap. This is defined as +the position in the left contig that the leftmost character of the right +contig overlaps. +The overlap must be of at least one character. +Use the scroll bar and the scroll buttons (`<<',`<',`>',and`>>') +for positioning the relative positions of the two contigs. +.para +The join position can be fixed in position +by pressing the `lock' button at the top of the Join Editor. +Locking allows the two contigs to be scrolled as one when using the scroll bar +and buttons, the left ends always in the same position relative to each +other. +.para +Once locked, it is best to proceed to the right along the contigs, inserting +padding characters (`*') into the consensuses to minimise the +disagreements. +.para +It is essential that the user aligns the two contigs throughout the whole +region of overlap before completing the join because it is only at this +stage that the two contigs can be edited independently. Once the join is +completed the alignment can only be altered using the routines supplied +by "alter relationships". +.para +The join can be completed by pressing the `Leave Editor' button. The +percentage mismatch is displayed, and the user is required to confirm that +they want to perform the join. +.left margin1 +@24. TX 1 @ Copy the database +.LEFT MARGIN2 +.PARA +Used to make a copy of the database. If required the database size can be + +altered using this option. The "version" of a database is encoded as the + +last letter in the names of the five files that contain the database. + +.para +Supply a "version" number (the default is version 1), and if required + +select a new size for the database. 
The size of a database is the number + of +lines of information it can hold. It needs a line for each gel reading and + +another for each contig. +.left margin1 +@19. TX 1 @ Check database +.LEFT MARGIN2 +.para +Used to perform a check on the logical consistency of the + database. No user intervention is required. +.para + The following relationships are checked: +.LEFT MARGIN2 + 1. If gel reading A thinks gel reading B is its left + neighbour + +does B think A is + its right neighbour? + The error message is +.left margin2 +"Hand holding problem for gel reading A" +.left margin2 +followed by the + gel descriptor lines for gel readings A and B. +.LEFT MARGIN2 + 2. Are there any contig lines with no left or right +end gel readings? + The error message is +.left margin2 +"Bad contig line number A" +.LEFT MARGIN2 + 3. Do the gel readings that are described as left ends on +contig + lines agree that they are left ends? + The error message is +.left margin2 +"The end gel readings of contig A have outward neighbours" +.LEFT MARGIN2 + 4. Are there gel readings that are in more than one contig? + The error message is +.left margin2 +" Gel number A is used N times" +.LEFT MARGIN2 + 5. Are there gel readings that are not in any contig? + The error message is +.left margin2 +" Gel number A is not used" +.LEFT MARGIN2 + 6. Do the relative positions of gel readings agree with +their + position as defined by left and right neighbourliness? + The error message is +.left margin2 +" Gel number A with position X is left neighbour of gel number B with +position Y" +.LEFT MARGIN2 + 7. Are there any loops in contigs? If so no further + checking is done. + The error message is +.left margin2 +" Loop in contig n no further checking done, but gel reading numbers follow" +.left margin2 + The + program then prints the gel reading numbers in the looped +contig up +to + the start of the loop. +.LEFT MARGIN2 +8. Are there any contigs of length <1? 
The error message is +.left margin2 +" The contig on line +number x has zero length" +.LEFT MARGIN2 +9. Are there any gel readings (used in only one contig) that have zero + +length? The error +message is +.left margin2 +" Gel number N has zero length" +.left margin2 +Note that "auto assemble" also uses this logical consistency check and + will +only tolerate a "Gel number N + is not used" error. Any other error will cause it to + +give up. + +.left margin1 +@29. TX 1 @ Examine quality +.LEFT MARGIN2 +.para +Analyses the quality of the data in a contig. It reports on the proportion + +of the consensus that is "well determined" and will display a sequence of + +symbols that indicate the quality of the consensus at each position. + +.para +Identify the contig to analyse, and the section of interest. The current + +consensus calculation cutoff score will be used to decide if each position +is +"well determined". In general the quality of a reading deteriorates along +the length of the gel and so it is also possible to use a length cutoff for +the quality calculation. Only the data from the first section of each reading +will be included in the quality calculation. The length is altered under +"set display parameters" and is initially set to the maximum reading length. +A summary showing the percentage of the consensus +that falls into each category of quality is shown. Choose whether or not to +have the quality codes for each position of the consensus displayed. +They can be displayed as either graphics or text. +.para +The quality of the data depends on the number of times it has been + +sequenced and the particular uncertainty codes used in each gel + +reading. This function divides the data into five categories, assigning + +each +a symbol or code: +.LEFT MARGIN2 + 1. Well determined on both strands and they agree. code=0 +.LEFT MARGIN2 + 2. Well determined on the plus strand only. code=1 +.LEFT MARGIN2 + 3. Well determined on the minus strand only.
code=2 +.LEFT MARGIN2 + 4. Not well determined on either strand. code=3 +.LEFT MARGIN2 + 5. Well determined on both strands but they disagree. code=4 +.LEFT MARGIN2 + A position is "well determined" if it is assigned one of the symbols +A,C,G,T by the algorithm described in the section "calculate a +consensus". +The calculation is performed +separately for each strand. +.para +If the user chooses to have the data displayed graphically the following +scheme is used. A rectangular box is drawn so that the x coordinate +represents the length of the contig. The box is notionally +divided vertically into +5 possible levels which are given the y values: -2,-1,0,1,2. +The quality codes attributed to each base position are plotted as +rectangles. +Each rectangle represents a region in +which the quality codes are identical, so a single base having a different +code from its immediate neighbours will appear as a very narrow rectangle. +.lit + + Rectangle bottom and top y values + + Quality 0 rectangle from 0 to 0 + Quality 1 rectangle from 0 to 1 + Quality 2 rectangle from 0 to -1 + Quality 3 rectangle from -1 to 1 + Quality 4 rectangle from -2 to 2 +.end lit +.para +Obviously a single line at the midheight shows a perfect sequence. +.para +Typical dialogue is shown below. +.lit + + 41.47% OK on both strands and they agree(0) + 55.48% OK on plus strand only(1) + 2.08% OK on minus strand only(2) + 0.97% Bad on both strands(3) + 0.00% OK on both strands but they disagree(4) + ?
(y/n) (y) Show sequence of codes + + 10 20 30 40 50 + 1111111111 1111111111 1111111111 1111111111 1111111111 + + 60 70 80 90 100 + 1111111111 1111111111 1111111111 3111111111 1111111111 + + 110 120 130 140 150 + 1111111111 1111131111 1111111111 1111111111 1111111111 + + 160 170 180 190 200 + 1111111111 1111111111 1111111111 1111111111 1111111133 + + 210 220 230 240 250 + 1311111111 1111111111 1111111110 0000000000 0000220000 + + 260 270 280 290 300 + 0000000000 0020000000 2200000202 0002000000 0000222200 + +.end lit +.left margin1 +@26. TX 3 @ Alter relationships +.LEFT MARGIN2 +.para +Used to make what are normally illegal changes to the database. That is + +the normal checks are not done and any item in the database can be +changed independently of all others. Users need to know what they are + +doing because it is very easy to make a horrible mess. Always start by + +making a copy! +.para +By using the options here users can edit individual gel readings in contigs, +move one section of a contig relative to another, break contigs, remove +contigs, remove gel readings, etc. To give flexibility most + of the commands do only one thing. This means that several commands +may +have to be executed to complete any change. At the end of this help +section +there are notes on removing gel readings from the database. +.para +The following options are offered: +.lit + + Cancel + Line change + Edit single gel reading + Delete contig + Shift + Move gel reading + Rename gel reading + Break a contig + Alter raw data parameters + +.end lit +.left margin2 +1. QUIT returns to the main options of SAP. +.left margin2 + +2. Line change +.left margin2 + allows the user to change the contents of any line in the + +file of relationships. The line is selected by number, the + + program prints the current line and prompts for the new line. + +.left margin2 +3. Edit +.left margin2 +allows the user to edit an individual gel reading + independently of any others it may be related to. 
The edit +positions are relative to + the contig. The effect of this editing on the length of the + gel reading is taken care of but, if it changes the length of + a contig, + or its relationship to others, this must be accounted for (if + necessary) by use of the "line change" function. + +.left margin2 +4. Delete contig +.left margin2 +is a function that deletes a contig line by moving down + all the contig lines above by one position. It prompts only + for the line to delete. It does not delete any of the gel +readings + or gel reading +lines for the deleted contig but it does reduce the + number of contigs on line IDBSIZ by 1. + +.left margin2 +5. Shift +.left margin2 + allows the user to change all the relative positions of a + set of neighbouring gel +readings by some fixed value, i.e. it will + shift related gel readings + either left or right. It can therefore + be used to change the alignment of the gel +readings in a contig + or as part of the process of breaking a contig into two parts + (see below). It prompts for the number of the first gel +reading to + shift and then for the distance to move them (Note a + negative value will move the gel readings + left and a positive value + right). It then chains rightwards (ie follows right + neighbours) and shifts each gel +reading, in turn, up to the end + of the contig. (This means that only those gel readings + from the first + to shift to the rightmost are moved). It updates the length of + the contig accordingly. + +.left margin2 +6. Move gel reading +.left margin2 + is a function to renumber a gel reading. It moves all the information + about a gel +reading on to another line. The user must specify the +number + of the gel reading +to move and the number of the line to place it. It + takes care of all the relationships. Of course gel +readings must not be + moved to lines occupied by other gel +readings! It can be used as part + of the process of removing a gel +reading from the database (see below). 
+ +.left margin2 +7. Rename gel reading +.left margin2 + is a function that is used to change the archive names of + gel +readings in the database; it only changes the name in the .ARN + file of the database. + +.sk1 +.LEFT MARGIN2 +8. Break contig +.LEFT MARGIN2 +.PARA +Occasionally it is necessary to break a contig into two parts and this can be +achieved using this option. The program needs only the number of a gel +reading. This is the gel reading that will become a left end after the +break. That +is, the break is made between this gel +reading and its left neighbour. A new contig +line is created, so ensure that there is sufficient space in the database. +.left margin2 +Removing gel readings from contigs +.left margin2 +.PARA +Gel +readings can be removed from contigs if they are not essential for holding the +contig together (ie are not the only gel reading covering a particular region). +Suppose the gel reading to remove is gel number +b with left neighbour a and right +neighbour c. +Using "line change" change the right neighbour of a to c, and the left +neighbour of c to a. To tidy things up: suppose there are x gel +readings in the +database; then, using "move gel reading" move gel x to line b; then, using +"line change" +decrease the number of gel +readings in the database (stored in the last line) by 1. +.sk1 +.LEFT MARGIN2 +9. Alter raw data parameters +.LEFT MARGIN2 +.PARA +Allows the user to edit the individual raw data parameters, such as +the left and right cutoff lengths and the name of the machine readable trace +file. +The user must specify the gel line to modify, and provide new values for +the length of the raw sequence including cutoff lengths, the left cutoff position, the length of the original working sequence, the machine type, and the name +of the raw data file, where these values change. +.left margin1 +@27.
TX 1 @ Set display parameters +.LEFT MARGIN2 +.para +Used to redefine the parameters that control the cutoff employed by the + +consensus calculation and quality examiner, the maximum length of each +reading to include in the quality calculation, the line length used by + +the display function, the text window length used by the graphics +options, and the graphics window length used by the graphics options. +.para +The default cutoff score is 75%. The default line length is 50 characters. +For protein sequences the cutoff is always 100%. +.para +The text window used by the graphics options controls the amount of +sequence listed at the crosshair position. The graphics window controls the +"zoom" function. Both these windows are defined as the number of bases that +should be shown, to both left and right of the crosshair. +.left margin1 +@30. TX 3 @ Auto edit a contig +.left margin2 +.para +This function automatically changes characters in gel readings to make + +them agree with the consensus sequence. If employed as is intended, use + +of this function is not a criminal activity but a method that saves a large + +amount of work. All characters changed by the auto editor will appear in + +the gel readings as lowercase letters. The current consensus calculation +cutoff score is used. +.para +Identify the contig and the section to edit. The program will display a + +summary of changes made. Note that it is important to understand both + +what the auto editor does and the order in which it does it. Before + +employing the auto editor users should note all the corrections that they +require, so that after it has been used the corrections can be checked. + +.para + The +general strategy employed when collecting shotgun sequence data is to let +the contigs get fairly deep, to get a printout of a contig, +check problems against the +films, note corrections on the printout, and +make the changes using an interactive editor. 
+In general the consensus is correct except for places where padding +characters have been used to accommodate a single gel with an extra +character, or where the consensus is dash. The important point for the +auto +editor is that +most edits simply make the +gel readings conform to the consensus, or remove columns of pads. +.para +The new editor does the following. +.para +1) calculates a consensus for the contig (or part of a contig) to be +edited, and then uses this consensus to direct the editing of the contig +in 3 stages +.para +2) stage 1: find and correct all places where, if the order of two adjacent +characters is swapped, they will both agree with the consensus (given +that +they did not match the consensus before). These corrections are termed +"transpositions" +.para +3) stage 2: find and correct all places where there is a definite consensus +but the gel reading has a different character. These corrections are +termed +"changes". +.para +4) stage 3: delete all positions in which padding is the consensus. These +corrections are termed "deletions". +.para +All changed characters are shown in lowercase letters so it will be +obvious which +characters have been assigned by the program (except for deletions). The +number of each type of correction will be displayed. + +.LEFT MARGIN1 +@10. TX 2 @Clear graphics +.LEFT MARGIN2 +.para + Clears graphics from the screen. +.left margin1 +@11. TX 2 @Clear text +.LEFT MARGIN1 +.para + Clears text from the screen. +.left margin1 +@12. TX 2 @Draw a ruler. +.LEFT MARGIN2 +.para +This option +allows the user to draw a ruler or scale along the x axis of the screen to +help identify the coordinates of points of interest. 
The user can define +the position of the first base to be marked (for example if the active +region is 1501 to 8000, the user might wish to mark every 1000th base +starting at either 1501 or 2000 - it depends if the user wishes to treat +the active region as an independent unit with its own numbering starting +at +its left edge, or as part of the whole sequence). The user can also define +the separation of the ticks on the scale and their height. If required the +labelling routine can be used to add numbers to the ticks. +.left margin1 +@14. TX 2 @Reposition plots +.LEFT MARGIN2 +.para +The position of each plot is defined relative to a user's drawing +board which has size 1-10,000 in x and 1-10,000 in y. +Plots for +each option are drawn in a window defined by x0,y0 and xlength,ylength, +where x0,y0 is the position of the bottom left hand corner of the window, + xlength is the width of the window and ylength the +height of the window. +.lit + --------------------------------------------------------- 10,000 + 1 1 + 1 -------------------------------------- ^ 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 1 1 ylength 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 -------------------------------------- v 1 + 1 x0,y0^ 1 + 1 <---------------xlength--------------> 1 + --------------------------------------------------------- 1 + 1 10,000 + +.end lit +All values are in drawing board units (i.e. 1-10,000, 1-10,000). +The default window positions are read from a file "ANALMARG" when the +program is started. Users can have their own file if required. +As all the plots start +at the same position in x and have the same width, x0 and xlength are the +same for all options. Generally users will only want to change the start +level of the window y0 and its height ylength. + This option +allows users to change window positions whilst running the program.
+The routine prompts first for the number of the option that the user +wishes +to reposition; then for the y start and height; then for the x start and +length. Note that changes to the x values affect all options. If the user +types only carriage return for any value it will remain unchanged. +Note that, unlike in all the other programs, the boxes used to contain +analytical results (eg plot quality) should not be made to overlap one +another, as the function of the crosshair routine depends on which box the +crosshair is in! +.LEFT MARGIN1 +@15. TX 2 @Label a diagram +.LEFT MARGIN2 +.para +This routine allows users to label any diagrams they have produced. They +are asked to type in a label. When the user types carriage return to finish +typing the label the cross-hair appears on the screen. The user can +position it anywhere on the screen. If the user types R (for right justify) +the label will be +written on the diagram with its right end at the cross-hair position. +If the user types L (for left justify) the label will be written on the +diagram with its left end at the cross-hair position. +The +cross-hair will then immediately reappear. The user may put the same +label +on another part of the diagram as before or, if the space bar is hit, the user +will be asked whether to type in another label. +.para +Typical dialogue follows. +.lit +? Menu or option number=15 +Type label then drive cross hair to left or right end +of label position then hit "L" to write label left +justified or "R" to write label right justified or +the space bar to quit + + +? Label=delta gene + + missing graphics + +? Label= + +.end lit +.left margin1 +@16. TX 2 @Display a map +.LEFT MARGIN2 +.para +This draws a map +of any sequence features selected by the user. +These features may be protein coding regions (CDS), tRNA genes (TRNA), +promoter positions (PRM), etc. Users may define their own feature table +key +names.
For example I find it convenient to split CDS lines into CDS1, +CDS2 +and CDS3 each of which contains only those sequences that code in the +reading frames 1, 2 or 3. Then I can plot them at different heights on +the screen ( suitable heights can be determined by using the cross-hair). +The coordinates must be stored in a file in the format of an EMBL feature +table. +.para +Typical dialogue follows. +.lit +? Menu or option number=16 + Display a map using an EMBL feature table file +? map file name=hsegl1.ft +? feature code(e.g. CDS) =CDS +X 1 + strand + 2 - strand + 3 both strands +? 0,1,2,3 = +? level (0-9480) (256) =4000 + + missing graphics + +? feature code(e.g. CDS) = + +.end lit +.left margin1 +@7. TX 1 @Redirect output +.LEFT MARGIN2 +.para +Used to direct output that would normally appear on the screen to a file. +.para +Select redirection of either text or graphics, and +supply the name of the file that the output should be written to. +.para + The results from the next options selected will not appear on the screen +but will be written to the file. When option 7 is selected again +the file will be +closed and output will again appear on the screen. +.left margin1 +@13. TX 2 @Use crosshair +.left margin2 +.para +This option puts a steerable cross on the screen which the user +drives around +by using the arrow keys (or mouse). When the crosshair is +visible a number of options are available if the user types one of a +set of special keyboard characters. Any other characters will cause +an exit from the crosshair option. The special keys are: +.lit + + I = Identify the nearest gel reading + Z = Zoom in + Q = plot Quality + S = display the aligned Sequences at the crosshair position + N = list the Names and Numbers of the sequences at the crosshair +.end lit +.para +In order for any of these special keys to operate, the crosshair +must be in an appropriate display box, and the precise function of +the keys will also depend on which box the crosshair is in. 
+.para + If the +crosshair is in the "plot all contigs" box, Z will cause a new box to +appear showing all the readings for the nearest contig; Q will give +the same as Z but will also produce an extra box showing the +"quality" plot. +.para + If Z is hit in the "plot single contig" box, the contig will be zoomed +to the current graphics window size. The zoom will be roughly +centred on the crosshair position. Because of this it is possible to +step along a contig by repeatedly zooming with the crosshair near +to one end of the single contig display box. If I is hit the crosshair +must be close to a gel reading line. If Q is hit, the quality plot will +be produced for the region shown in the plot single contig box. In +all cases when the "plot all contigs" box is shown, a vertical line will +bisect the line that represents the relevant contig, at the current +position. +.para +If the crosshair is in the plot quality box only the character "s" will operate +as a special symbol. +.para +The number of bases shown in the N and S options is controlled by +the current graphics text window size, and the size of the zoom +window by the current graphics window size. Both are set by the +parameter setting function of the general menu. +.left margin1 +@33. TX 2 @Plot single contig +.left margin2 +.para +This option produces a schematic of a selected region of a single +contig by drawing a horizontal line to represent each of its gel +readings. The lines show the relative positions of each reading and +also their sense. The plot is divided vertically into two sections by +a line that is identified by an asterisk drawn at each end. All lines +that lie above this line represent readings that are in their original +sense, all lines below show readings that are in the +complementary sense to their original. By use of the crosshair +function the plot can be stepped through and examined in more +detail. See help on crosshair. +.left margin1 +@34.
TX 2 @Plot all contigs +.left margin2 +.para +This option produces a schematic of all the contigs in a database. It +does this by drawing a horizontal line to represent each of them. +In order to show the ends of each contig it draws the lines for +contigs at alternate heights: the first at height one, the +second at height two, the third at height one, etc. The order of the +contigs in the display is the same as their order in the database. +By use of the crosshair function the plot can be stepped +through and examined in more detail. See help on crosshair. +.left margin1 +@31. TX 3 @ Type in gel readings +.left margin2 +.para +THIS OPTION IS NO LONGER AVAILABLE IN XDAP. +.para +This option allows gel readings to be typed in at the keyboard. It creates +a separate file for each gel reading and a file of file names for the +batch. The sequences from each batch may be listed when they have all been +entered. Users may choose to employ special keys to identify the 4 bases +A,C,G and T. By default these special keys are N M , . but any other four +characters may be used. If special keys are used the characters are +automatically translated to A C G T before being stored on the disk. + +.left margin1 +@35. TX 1 3 @Find internal joins +.left margin2 +.para +The purpose of this function is to use data already in the database to +find possible joins between contigs. +Joins may have been missed due to poor data or may have not been made +due to repeated sequences. Where appropriate, it may be +possible to find potential +joins by using the data clipped off readings prior to their entry into the +database. +.left margin2 +The database is checked for logical consistency. Supply a minimum initial +match length, a minimum alignment block, the maximum pads per sequence, +the maximum percent mismatch after alignment, the probe length. Choose +if clipped data is to be used, if so define the window size for finding good +data and the number of dashes allowed in the window. 
Processing will commence. +Most of these values are used in an identical way in the autoassemble +function. The others are defined below. +.left margin2 +The program strategy +.left margin2 +Take the first contig and calculate its consensus. If clipped data is being +used examine all readings that +are in the complementary orientation, and sufficiently near to the contig's left +end, to see if they have good clipped sequence which, if present, would +protrude +from the left end of the contig. If found add the longest such sequence to the +left end of the consensus. Do the same for the right end by examining +readings that are in their +original orientation. If any are found add the longest extension to the +right end of +the consensus. Repeat the consensus calculations and extensions +for all contigs hence producing an extended consensus. If clipped data is not +being used simply calculate the consensus for the whole database. Now +look for possible joins by processing the extended consensus in the following +way. Take the last, say 100, bases (termed the "probe length" by the program) +of the rightmost consensus, compare it in both +orientations with the extended consensus of all the other contigs. Display +any sufficiently good alignments. Repeat with the left end of the rightmost +contig. Do the same for the ends of all the extended contigs, always only +comparing with the contigs to their left, so that the same matches do not +appear twice. +.left margin2 +Good clipped data is defined by sliding a window of "Window size for good data +scan" bases outwards +along the sequence and stopping when "Maximum number of dashes in scan window" + or more dashes appear in the window. +Note that +it is advisable to have some sort of cutoff because if we simply take all the +data it might be so full of rubbish that we won't find any good matches. For +the same reason it is worth trying the procedure with different cutoffs. An +initial run using no clipped data is also recommended.
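The window scan used to define good clipped data can be sketched as
follows. This is an illustration in Python, not code from the package:
the function name, the argument names and the choice to keep only the
bases that precede the first failing window are assumptions. The clipped
sequence is taken to be ordered outwards from the point of clipping.

```python
def good_clipped_length(clipped, window_size, max_dashes):
    """Return how many leading bases of `clipped` count as good data.

    `clipped` is the clipped sequence ordered outwards from the clip point.
    `window_size` plays the role of "Window size for good data scan" and
    `max_dashes` the role of "Maximum number of dashes in scan window".
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    last_start = len(clipped) - window_size
    # Slide the window one base at a time; a sequence shorter than the
    # window is examined once as a single short window.
    for start in range(max(last_start, 0) + 1):
        window = clipped[start:start + window_size]
        if window.count("-") >= max_dashes:
            # Assumption: keep only the bases before the failing window.
            return start
    return len(clipped)


if __name__ == "__main__":
    # Hypothetical clipped data with dashes marking uncertain bases.
    print(good_clipped_length("ACGTACGT--A-CC", window_size=5, max_dashes=2))
```

Running the scan with several values of the two cutoffs, as recommended
above, only requires calling the function again with different arguments.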
+Sufficiently good +alignments are defined by criteria equivalent to those used in autoassemble, +however here we only display alignments that pass all tests. +.left margin2 +Bugs +.left margin2 +If a small contig is wholly contained within a larger one, such that its +ends are further than ("Probe length" - "Minimum initial match length") +from the ends of the larger contig, and the consensus for the small +contig lies to the left +of the consensus for the large contig, the overlap will not be discovered. (See +the search strategy). +.left margin2 + All numbering is +relative to base number one in the contig: matches to the left (i.e. in +the clipped data) have negative +positions, matches off the right end of the contig (i.e. in the clipped +data) have positions +greater than that of the contig length. +The convention for reporting the positions of overlaps is as follows: if neither +contig needs to be complemented the positions are as shown. If the program says +"contig x in the - sense" then the positions shown assume contig x has been +complemented. For example in the results given below the positions for the +first overlap are as reported, but those for the second assume that the contig +in the minus sense (i.e. 443) has been complemented.
+.lit + + + Possible join between contig 445 in the + sense and contig 405 + Percentage mismatch after alignment = 4.9 + 412 422 432 442 452 462 + 405 TTTCCCGACT GGAAAGCGGG CAGTGAGCGC AACGCAATTA ATGTGAG,TT AGCTCACTCA + ********* * ******** ***** *** ********** ********** ********** + 445 -TTCCCGACT G,AAAGCGGG TAGTGA,CGC AACGCAATTA ATGTGAG-TT AGCTCACTCA + -127 -117 -107 -97 -87 -77 + 472 482 492 502 512 + 405 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT + ********** ********** ********** ********** ** + 445 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT + -67 -57 -47 -37 -27 + Possible join between contig 443 in the - sense and contig 423 + Percentage mismatch after alignment = 10.4 + 64 74 84 94 104 114 + 423 ATCGAAGAAA GAAAAGGAGG AGAAGATGAT TTTAAAAATG AAACG-CGAT GTCAGATGGG + **** ***** ********** ********** ****** ** ***** **** ********* + 443 ATCG,AGAAA GAAAAGGAGG AGAAGATGAT TTTAAA,,TG AAACGACGAT GTCAGATGG, + 3610 3620 3630 3640 3650 3660 + 124 134 144 154 164 + 423 TTG-ATGAAG TAGAAGTAGG AG-AGGTGGA AGAGAAGAGA GTGGGA + *** ****** ********** ** ******* *** ***** ** ** + 443 TTGGATGAAG TAGAAGTAGG AGGAGGTGGA ,GAG,AGAGA GTTGG- + 3670 3680 3690 3700 3710 + + +.end lit +.left margin1 +@ end of help diff --git a/help/GIP.RNO b/help/GIP.RNO new file mode 100644 index 0000000..d72dd68 --- /dev/null +++ b/help/GIP.RNO @@ -0,0 +1,205 @@ +.NPA +.left margin1 +.CENTER +GIP +.LEFT MARGIN1 +.PARA +A digitizer is + a two dimensional surface +which is such that if a special pen is pressed onto it, the pens +coordinates can be recorded by a computer. +These coordinates + can be interpreted by a program. +.para +The digitizing device we use works by the pen emitting a high frequency +sound which is picked up by two microphones positioned at the rear of the +working area. The pen position is determined by triangulation and the +digitizing device sends the coordinates to the computer. 
As no special +surface is required the device can conveniently be positioned on a light +box giving the sequencer an unobscured view of the autoradiographs. +.LEFT MARGIN1 +The digitizer + is called a GRAPHBAR MODEL GP7 made by + Science Accessories Corp, + 970 Kings Highway West, + Southport, + Connecticut 06490, + USA. + +.para + The program uses a menu to allow the user to select commands or + to enter the uncertainty codes for areas of the gel that are + difficult to interpret. A menu is simply a series of boxes drawn on + the digitizing surface that each contain a command or + uncertainty code. When the user puts the pen down in these special + regions the program interprets the coordinates as commands and acts + appropriately. A copy of the menu should have been sent to you. +It should be stuck down on the surface of the +light box in the digitizing area. For convenience it is best to position it +to the right of the digitizing area, but in practice as long as +its top +edge is parallel to the digitizer box, it can be put anywhere in the active +region. +.sk1 +.left margin1 + Entering gel readings using a digitizer +.left margin1 +.para +The autoradiograph should be stuck down on the light box with the lanes +running, as near is as +possible, at right angles to the digitizer. To read +an autoradiograph placed on the light box +the user need only define the positions of +the four sequencing lanes and the bases + to which they correspond and then use the pen to point to each + successive band progressing up the gel. The program examines the + coordinates of each pen position to see in which of the four +lanes + it lies and assigns the corresponding base to be stored in the + computer. 
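The lane test described above can be pictured with a small sketch. It is
written in Python for illustration only and is not taken from GIP: the
function name, the nearest-centre rule and the example coordinates are
assumptions. The T C A G order is the default lane order that the
program reports.

```python
def assign_base(pen_x, lane_centres, lane_order="TCAG"):
    """Return the base of the lane whose centre is nearest to pen_x.

    lane_centres holds the x coordinate of each lane centre, left to
    right, and lane_order gives the base read from each lane.
    """
    if len(lane_centres) != len(lane_order):
        raise ValueError("one lane centre is needed for each base")
    # Assumption: a pen position belongs to the closest lane centre.
    nearest = min(range(len(lane_centres)),
                  key=lambda i: abs(lane_centres[i] - pen_x))
    return lane_order[nearest]


if __name__ == "__main__":
    centres = [10.0, 20.0, 30.0, 40.0]     # hypothetical lane centres
    pen_positions = [12.0, 31.5, 38.0, 9.0]  # hypothetical pen x values
    print("".join(assign_base(x, centres) for x in pen_positions))
```

A real reader would also update the lane centres as it moves up the
film; the sketch keeps them fixed to show only the assignment step.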
Each time the pen tip is depressed to point to a position + on the surface of the digitizer the program sounds the bell on the + terminal (a different sound for each of the four bases on the +microcomputer version of the program) + to indicate to the user that a point has been recorded. As + the sequence is read the program displays it on the screen. + + +.para + The program uses a menu +to allow the user to select commands or + to enter the uncertainty codes for areas of the gel that are + difficult to interpret. A menu is simply a series of boxes drawn on + the digitizing surface that each contain a command or + uncertainty code. When the user puts the pen down in these special + regions the program interprets the coordinates as commands and acts + appropriately. As well as the uncertainty codes + A,C,G,T,1,2,3,4,B,D,H,V,R,Y,X,-,5,6,7,8 the following commands are + included in the menu: DELETE removes the last character from +the sequence; + RESET allows the lane centres to be redefined; +START means begin the next + stage of the procedure; STOP means stop the current stage in the + procedure; CONFIRM means confirm that the last command or set of + coordinates is correct. +.para +The digitizing device also has a menu of its own. This lies in a two inch wide +strip immediately in front of the digitizing box. Pen positions within this +two inch strip are interpreted as commands to the digitizer and are not +sent to the GIP program. In general the only time users will need to use +the device menu is when they tell GIP where the program menu lies in the +digitizing area. This is done by first hitting ORIGIN in the device menu +and then hitting the bottom left hand corner of the program menu. The +program menu can hence be positioned anywhere in the active region but +should be arranged parallel to the digitizer.
+.para +The user should try to hit the bands as near as possible to the centre of +the lanes because the program tracks the lanes up the film using the pen +positions. By using this tracking strategy the user only has to define the +centres of the bottom of the lanes before starting to read the film. The +program can correctly follow quite curved lanes and constantly checks that +its lane centre coordinates look sensible. If the lane centres appear to be +getting too close the program stops responding to the pen positions of +bands and hence does not ring the bell. If this occurs users must hit the +reset box in the menu and the program will request them to redefine the +lane centres at the current reading position. Then they can continue +reading. As a further safeguard the program will only respond to pen +positions either in the menu or very close to the current reading position. +.sk1 +.left margin1 + Running the gel reading program +.left margin1 +The autoradiograph should be firmly stuck down on the light box and the +program started by typing GIP. It will ask the first question. +.left margin2 +" ? FILE OF FILE NAMES=" +.left margin2 +Type the name for the file of file names and then follow the instructions. +.left margin2 +" HIT DIGITIZER MENU ORIGIN" +.left margin2 +" THEN PROGRAM MENU ORIGIN" +.left margin2 +" THEN HIT START IN PROGRAM MENU" +.left margin2 +If the bell does not sound after you hit start try hitting metric in the +device menu (the program uses metric units, and some digitizers are set to +default to use inches; hitting metric switches between the two). +.left margin2 +After the bell has sounded the program will give the default lane order. +.left margin2 +" LANE ORDER IS T C A G" +.left margin2 +" IF CORRECT HIT CONFIRM, ELSE HIT RESET" +.left margin2 +If the lane order, reading from left to right is correct hit confirm in the +program menu. 
If you are using a different order hit reset and you will be +asked to define the lane order from left to right using the program menu +(as follows). +.left margin2 +" DEFINE LANE ORDER (LEFT TO RIGHT) USING MENU" +.left margin2 +Hit the boxes in the menu that contain the symbols A,C,G,T in the +left-right order of the lanes. The program will respond with the lane order +as above and ask for confirmation. When this is received, the next task is +to define the start positions of the next four lanes. +.left margin2 +" HIT START, THEN HIT (LEFT TO RIGHT)" +.left margin2 +" THE START POSITIONS FOR THE NEXT FOUR LANES" +.left margin2 +Hit the centres of the four lanes at a height level with the first band +that is going to be read. The program will report the mean lane separation +and ask for confirmation that it is correct. +.left margin2 +" MEAN LANE SEPARATION IS XX" +.left margin2 +" HIT CONFIRM TO CONTINUE" +.left margin2 +Users will become familiar with the values from their films and will spot +any unusual numbers. +Asking for confirmation allows users to try again if they have made a +mistake, but generally the lane separation values can be ignored. +Hit confirm, and the program will give the message +.left margin2 +" HIT START WHEN READY TO BEGIN READING" +.left margin2 +Hit start and the program will give the message +.left margin2 +" HIT BANDS, UNCERTAINTY CODES, RESET OR STOP" +.left margin2 +Hit the bands, interpreting the sequence progressing +up the film. If necessary use the uncertainty codes. If the pen stops +responding hit reset and follow the instructions as above. When the +sequence becomes unreadable hit stop and the program will ask for a file +name for the gel reading just read. +.left margin2 +" ? FILE NAME FOR THIS GEL READING=" +.left margin2 +Type the file name observing the rules about legal gel reading names. +The program will ask if you wish +to read another sequence.
+.left margin2 +" TO ENTER ANOTHER GEL READING TYPE 1" +.left margin2 +To enter another, type 1 and you will be back to the step of defining the +lane order. Typing anything else will stop the program. +.left margin1 +.sk1 +Running the microcomputer version of the gel reading program +.left margin1 +The microcomputer version of GIP is slightly different and is called +GIPB. The BBC micro +does not have the capacity to process the gel readings beyond the reading +stage. +This means that users of this program +need to transfer their gel readings from the micro to another machine +using a terminal emulator. Transferring many files is tedious and so the +microcomputer version of the gel reading program stores all the gel +readings for each run of the program in a single file. This special +file contains both sequences and file names and can be moved in a single +transfer to another machine. Once on the other machine the single file must +be split into separate gel reading files and a file of file names. This is +done using the program BSPLIT. As far as using the microcomputer version +of GIP is concerned, the only difference is that the first file name the program +requests is not a file of file names, but a name for the single file to +contain all the gel readings and their names. diff --git a/help/MEP.RNO b/help/MEP.RNO new file mode 100644 index 0000000..b3d575e --- /dev/null +++ b/help/MEP.RNO @@ -0,0 +1,859 @@ +.NPA +.SP 1 +.left margin1 +@-1. TX 0 @General +.sp +@-2. T 0 @Screen control +.sp +@-2. X 0 @Screen +.sp +@-3. TX 0 @Dictionary analysis +.sp +@0. TX -1 @MEP +.left margin2 +.para +This is a program for analysing families of nucleotide sequences in order +to find common motifs and potential binding sites. +The ideas in this program were described in Staden, R. "Methods +for discovering novel motifs in nucleic acid sequences". +Computer Applications in the Biosciences, 5, 293-298, (1989).
+.PARA +The program can read +sequences stored in either of two formats: 1) all sequences aligned in a +single file; 2) all sequences in separate files and accessed through a file +of file names. +.PARA +The program contains functions that can answer several questions +about a set of sequences: +.SK1 +.left margin2 + Which words are most common? +.left margin2 + Which words occur in the most sequences? +.left margin2 + Which words contain the most information? +.left margin2 + Which words occur in equivalent positions in the sequences? +.left margin2 + Which words are inverted repeats? +.left margin2 + Which words occur on both strands of the sequences? +.left margin2 + Where are the inverted repeats? +.left margin2 + Where are the fuzzy words? +.para + Most of the program is +concerned with analysing +what it terms "fuzzy +words" within the set of sequences. The analysis is explained +below. Note that the standard version of the programs is limited +to words of maximum length 8 letters, and a maximum fuzziness +of 2. +.para +The following analyses (preceded by their option numbers) are included: +.lit + ? = Help + ! 
= Quit + 3 = Read new sequences + 4 = Redefine active region + 5 = List the sequences + 6 = List text file + 7 = Direct output to disk + 10 = Clear graphics + 11 = Clear text + 12 = Draw ruler + 13 = Use cross hair + 14 = Reset margins + 15 = Label diagram + 16 = Draw map + 17 = Search for strings + 18 = Set strand + 19 = Set composition + 20 = Set word length + 21 = Set number of mismatches + 22 = Show settings + 23 = Make dictionary Dw + 24 = Make dictionary Ds + 25 = Make fuzzy dictionary Dm from Dw + 26 = Make fuzzy dictionary Dm from Ds + 27 = Make fuzzy dictionary Dh from Dm + 28 = Examine fuzzy dictionary Dm + 29 = Examine fuzzy dictionary Dh + 30 = Examine words in Dm + 31 = Examine words in Dh + 32 = Save or restore a dictionary + 33 = Find inverted repeats +.end lit +.para +Some of these methods produce graphical + results +and so the +program is generally used from a graphics terminal (a vdu on which lines +and points can be drawn as well as characters). +.para +.LEFT MARGIN2 +The positions of each of the plots is defined relative to a users drawing +board which has size 1-10,000 in x and 1-10,000 in y. +Plots for +each option are drawn in a window defined by x0,y0 and xlength,ylength. +Where x0,y0 is the position of the bottom left hand corner of the window, + and xlength is the width of the window and ylength the +height of the window. +.lit + --------------------------------------------------------- 10,000 + 1 1 + 1 -------------------------------------- ^ 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 1 1 ylength 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 -------------------------------------- v 1 + 1 x0,y0^ 1 + 1 <---------------xlength--------------> 1 + --------------------------------------------------------- 1 + 1 10,000 + +.end lit +All values are in drawing board units (i.e. 1-10,000, 1-10,000). +The default window positions are read from a file "MEPMARG" when the +program is started. Users can have their own file if required. 
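The drawing board arithmetic described above can be sketched in a few
lines of Python. This is an illustration only, not code from the
package: the function names are hypothetical, and the linear scaling of
sequence positions across the window is an assumption, since the help
text does not state how positions are scaled.

```python
BOARD_MIN = 1        # drawing board units run from 1 ...
BOARD_MAX = 10_000   # ... to 10,000 in both x and y


def window_fits(x0, y0, xlength, ylength):
    """Check that a window lies wholly on the drawing board."""
    return (xlength > 0 and ylength > 0
            and x0 >= BOARD_MIN and y0 >= BOARD_MIN
            and x0 + xlength <= BOARD_MAX
            and y0 + ylength <= BOARD_MAX)


def to_board_x(position, region_start, region_end, x0, xlength):
    """Map a sequence position in the active region to a board x value.

    Assumption: the active region is stretched linearly across the
    window, so region_start lands on x0 and region_end on x0 + xlength.
    """
    if region_end <= region_start:
        raise ValueError("active region must contain at least two positions")
    fraction = (position - region_start) / (region_end - region_start)
    return x0 + fraction * xlength


if __name__ == "__main__":
    # Hypothetical window and active region.
    print(window_fits(1000, 2000, 8000, 3000))
    print(to_board_x(1501, 1501, 8000, x0=1000, xlength=8000))
```

Because x0 and xlength are shared by every option, one such mapping
would serve all plots; only y0 and ylength differ between windows.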
+.para +The options for the program are accessed from 3 main menus: general, screen +control and dictionary analysis. +Both menus and options are selected by number. +.para +The most important and novel part of the program is its use of "fuzzy +dictionaries" and an information theory measure, to help show the most +interesting motifs. + + Central to the method is the idea of a fuzzy dictionary of word +frequencies. A dictionary of word frequencies is an ordered list of +all the words in the sequences and a count of the number of times +that they occur. A fuzzy dictionary is an equivalent list but which +contains instead, for each word, a count of the number of times +similar words occur in the sequences. We term words that are +similar "relations". The fuzziness is defined by the number of +letters in a word that are allowed to be different. So if we had a +fuzziness of 1 we allow 1 letter to be different. For example, with +a fuzziness of 1, the entry in the fuzzy dictionary for the word +TTTTTT would contain a count of the number of times TTTTTT +occurred plus the number of times all words differing by exactly +one letter from TTTTTT occurred. +.para + Once the fuzzy dictionary has been created we can examine it in +several ways to find candidate control sequences. The simplest +question we can ask is which word in the dictionary is the most +common. Sometimes this simple criterion of "most common" may +be adequate to discover a new motif but in general we would not +expect it to be sufficient. For example some words will be common +simply because of a base composition bias in the sequences being +analysed. In addition a word can be the most frequent and yet not +be "well defined". This last point is best explained by an example. +.para + Suppose we were looking at two letter words and allowing one +mismatch, and that there were 10 occurrences of TT and 5 of AC. +We could align the 10 words that were one letter different from TT +and the 5 that were related to AC.
Then we could count the +number of times each base occurred in each position for each of +these two sets of words. Suppose we got the two base frequency +tables shown below. +.lit + TT AC + T 6 4 T 1 0 + C 1 3 C 0 4 + A 1 2 A 4 1 + G 2 1 G 0 0 + +.end lit +These tables show that although TT occurs (with one letter +mismatch) more often than AC, the ratios of base frequencies for +AC, at 4/5 and 4/5, are higher than those for TT, at 6/10 and 4/10. Hence we +would say that AC was better defined than TT. +Expressing this another way we would say that the definition of AC +contained more information than that for TT. The program +calculates the information content in a way that takes into account +both the sequence composition and the level of definition of the +motif. +.para +Definitions + +.para +Here we deal only with the dictionary analysis. +Suppose we are dealing with a set of +sequences and are examining them for words that are six +characters in length. + +.para +Dictionary Dw contains a count of the number of times each word +occurs in the set of sequences. For example the entry for TTTTTT +contains a value equal to the number of times the word TTTTTT +occurs in the set of sequences. + +.para +Dictionary Ds contains a count of the number of different sequences in +which each word occurs. For example if the entry for word TTTTTT +contains the value 10, it denotes that the word TTTTTT occurs in ten +different sequences. Unlike Dw it only counts words once for each +sequence. For example if we had a set of 100 sequences, the maximum +possible value that Ds could take is 100, and this would only happen if +a word occurred in every sequence. However for the same set of +sequences, Dw could contain values greater than 100, and this would +show that a word had occurred more than once in at least one +sequence. + +.para +From either of the two dictionaries Dw or Ds we can calculate a fuzzy +dictionary Dm.
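The dictionaries can be sketched in Python as follows. This is a
minimal illustration and not the program's own code: the function names
are hypothetical, overlapping words are counted at every position, and
the fuzzy dictionary is built by brute force over all words of the
chosen length, which is only practical for short words.

```python
from collections import Counter
from itertools import product


def make_dw(sequences, word_length):
    """Dw: number of times each word occurs in the whole set."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - word_length + 1):
            counts[seq[i:i + word_length]] += 1
    return counts


def make_ds(sequences, word_length):
    """Ds: number of different sequences in which each word occurs."""
    counts = Counter()
    for seq in sequences:
        # A set counts each word at most once per sequence.
        counts.update({seq[i:i + word_length]
                       for i in range(len(seq) - word_length + 1)})
    return counts


def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def make_fuzzy(dictionary, max_mismatches, alphabet="ACGT"):
    """Fuzzy dictionary: sum the entries of a word and its relations."""
    if not dictionary:
        return {}
    word_length = len(next(iter(dictionary)))
    fuzzy = {}
    for letters in product(alphabet, repeat=word_length):
        word = "".join(letters)
        total = sum(count for other, count in dictionary.items()
                    if mismatches(word, other) <= max_mismatches)
        if total:
            fuzzy[word] = total
    return fuzzy


if __name__ == "__main__":
    seqs = ["TTAT", "TTTT", "ACGT"]  # made-up example sequences
    dw = make_dw(seqs, 2)
    print(dw["TT"], make_ds(seqs, 2)["TT"], make_fuzzy(dw, 1)["TT"])
```

Passing Dw or Ds to make_fuzzy gives the two readings of the fuzzy
counts: total occurrences, or numbers of sequences.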
For each word, the entry in the fuzzy dictionary Dm +contains the sum of the dictionary values (taken from either Dw or Ds) +for all words that differ from it by up to m letters. For example if m=2 +the entry for TTTTTT contains the number of times that TTTTTT +occurs in the dictionary, plus the counts for all words that differ from +TTTTTT by 1 or 2 letters. +Obviously the interpretation of the values in Dm depends on which of +the two dictionaries Dw or Ds they were derived from. When derived +from Dw the entry for any word in Dm gives the total number of +times it, and its relations, occur in the set of sequences. When derived +from Ds the entry for any word in Dm gives the total number of +different sequences that contain a word and each of its relations. + +.para +Finally, from fuzzy dictionary Dm we can derive fuzzy dictionary Dh. +All entries in Dh are zero except for the word(s), within each set of +relations, that are most frequent. For example if TTTTTT occurred 20 +times but had a relation that occurred more often, then the entry for +TTTTTT would be zero. However if TTTTTT did not have a more +frequently occurring relation, then the entry for TTTTTT would +contain the value 20. + +.LEFT MARGIN1 +@1. T 0 @Help +.LEFT MARGIN2 +.para +This option gives online help. The user should select option numbers and +the current documentation will be given. Note that option 0 gives an +introduction to the program, and that ? will get help from anywhere in +the +program. +The following analyses (preceded by their option numbers) are included: +.lit + ? = Help + ! 
= Quit + 3 = Read new sequences + 4 = Redefine active region + 5 = List the sequences + 6 = List text file + 7 = Direct output to disk + 10 = Clear graphics + 11 = Clear text + 12 = Draw ruler + 13 = Use cross hair + 14 = Reset margins + 15 = Label diagram + 16 = Draw map + 17 = Search for strings + 18 = Set strand + 19 = Set composition + 20 = Set word length + 21 = Set number of mismatches + 22 = Show settings + 23 = Make dictionary Dw + 24 = Make dictionary Ds + 25 = Make fuzzy dictionary Dm from Dw + 26 = Make fuzzy dictionary Dm from Ds + 27 = Make fuzzy dictionary Dh from Dm + 28 = Examine fuzzy dictionary Dm + 29 = Examine fuzzy dictionary Dh + 30 = Examine words in Dm + 31 = Examine words in Dh + 32 = Save or restore a dictionary + 33 = Find inverted repeats +.end lit +.left margin1 +@2. T 0 @Quit +.left margin2 +.para +This function stops the program. +.left margin1 +@3. TX 1 @Read a new sequence +.LEFT MARGIN2 +.para +It can read +sequences stored in either of two formats: 1) all sequences aligned in a +single file; 2) all sequences in separate files and accessed through a file +of file names. Typical dialogue follows: +.lit + +X 1 Read file of aligned sequences + 2 Use file of file names +? 0,1,2 = + +? File of aligned sequences=F1 +Number of files 88 + +.end lit +.left margin1 +@4. TX 1 @Define active region +.LEFT MARGIN2 +.para +For its analytic functions +the program always works on a region of the sequence called the active +region. When new sequences are read into the program the active region is +automatically set to start at the beginning of the sequences and go +up to the end of the longest one. +.left margin1 +@5. TX 1 @List a sequence +.LEFT MARGIN2 +.para +The sequence can be listed with line lengths of 50 bases with each sequence +numbered in the order in which they were read. +Output can be directed to a disk file by +first selecting disk output. Typical dialogue follows. +.lit + +? 
Menu or option number=5 + + 10 20 30 40 50 + 1 TAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCA + 2 CAAATAATCAATGTGGACTTTTCTGCCGTGATTATAGACACTTTTGTTAC + 3 TAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTATT + 4 ACTAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTA + 5 AGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGA + 6 TAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGC + 7 ACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCG + 8 GGGGCAAGGAGGATGGAAAGAGGTTGCCGTATAAAGAAACTAGAGTCCGT + 9 AGGGGGTGGAGGATTTAAGCCATCTCCTGATGACGCATAGTCAGCCCATC + 10 AAAACGTCATCGCTTGCATTAGAAAGGTTTCTGGCCGACCTTATAACCAT + + 60 + 1 TACCCGTTTTT + 2 GCGTTTTTGT + 3 TCATACCATAAG + 4 TTTCATACC + 5 ATTGTGAGC + 6 TTCCGGCTCG + 7 GAAGAGAGT + 8 TCAGGTGT + 9 ATGAATG + 10 TAATTACG +.end lit +.left margin1 +@6. TX 1 @List a text file +.LEFT MARGIN2 +.para +Allows the user to have a text file displayed on the screen. It will appear +one page at a time. +.left margin1 +@7. TX 1 @Direct output to disk +.LEFT MARGIN2 +.para +Used to direct output that would normally appear on the screen to a file. +.para +Select redirection of either text or graphics, and +supply the name of the file that the output should be written to. +.para + The results from the next options selected will not appear on the screen +but will be written to the file. When option 7 is selected again +the file will be +closed and output will again appear on the screen. +.left margin1 +@10. TX 2 @Clear graphics +.LEFT MARGIN2 +.para + Clears the screen of both text and graphics. +.left margin1 +@11. TX 2 @Clear text +.LEFT MARGIN2 +.para + Clears only text from the screen. +.left margin1 +@12. TX 2 @Draw a ruler +.LEFT MARGIN2 +.para +This option +allows the user to draw a ruler or scale along the x axis of the screen to +help identify the coordinates of points of interest. 
The user can define +the position of the first amino acid to be marked (for example if the +active +region is 1501 to 8000, the user might wish to mark every 1000th amino +acid +starting at either 1501 or 2000 - it depends whether the user wishes to treat +the active region as an independent unit with its own numbering starting +at +its left edge, or as part of the whole sequence). The user can also define +the separation of the ticks on the scale and their height. If required the +labelling routine can be used to add numbers to the ticks. +.left margin1 +@13. TX 2 @Use crosshair +.LEFT MARGIN2 +.para +This function puts +a steerable cross on the screen that can be used to find the +coordinates of points in the sequence. The user can move the cross +around using the directional keys; when the space bar is hit the +program will print out the coordinates of the cross in sequence units and +the option will be exited. +.para +If instead, +you hit a , the position will be displayed but the cross will remain on +the screen. +.para +If a letter s is hit the sequence around the cross hair is displayed and +the cross remains on the screen. +.left margin1 +@14. TX 2 @Reposition plots +.LEFT MARGIN2 +.para +The position of each of the plots is defined relative to a user's drawing +board which has size 1-10,000 in x and 1-10,000 in y. +Plots for +each option are drawn in a window defined by x0,y0 and xlength,ylength, +where x0,y0 is the position of the bottom left hand corner of the window, + xlength is the width of the window and ylength the +height of the window.
+.lit + --------------------------------------------------------- 10,000 + 1 1 + 1 -------------------------------------- ^ 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 1 1 ylength 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 -------------------------------------- v 1 + 1 x0,y0^ 1 + 1 <---------------xlength--------------> 1 + --------------------------------------------------------- 1 + 1 10,000 + +.end lit +All values are in drawing board units (i.e. 1-10,000, 1-10,000). +The default window positions are read from a file "MEPMARG" when the +program is started. Users can have their own file if required. +As all the plots start +at the same position in x and have the same width, x0 and xlength are the +same for all options. Generally users will only want to change the start +level of the window y0 and its height ylength. + This option +allows users to change window positions whilst running the program. +The routine prompts first for the number of the option that the users +wishes +to reposition; then for the y start and height; then for the x start and +length. Note that changes to the x values affect all options. If the user +types only carriage return for any value it will remain unchanged. +The cross-hair can be used to choose suitable heights. +.LEFT MARGIN1 +@15. TX 2 @Label a diagram +.LEFT MARGIN2 +.para +This routine allows users to label any diagrams they have produced. They +are asked to type in a label. When the user types carriage return to finish +typing the label the cross-hair appears on the screen. The user can +position it anywhere on the screen. If the user types R (for right justify) + the label will be +written on the diagram with its right end at the cross-hair position. +If the user types L (for left justify) the label will be written on the +diagram with its left end at the cross hair position. +The +cross-hair will then immediately reappear. 
The user may put the same +label +on another part of the diagram as before or, on hitting the space bar, +will be asked whether to type in another label. +.left margin1 +@16. TX 2 @Display a map +.LEFT MARGIN2 +.para +It is often convenient to plot a map alongside graphed analysis in order +to +indicate features within the sequence. This function allows users to +draw +maps using files arranged in the form of EMBL feature tables. Of course +the +EMBL tables are usually only used for nucleic acid sequence annotation +but, +as long as the features are written in the correct format, they can be +employed by this routine. The map is composed of a line representing the +sequence and then further lines denoting the endpoints of each feature +the +user identifies. The user is asked to define the height at which the line +representing the sequence should be drawn; then for the feature height; +then for the features to plot. +.left margin1 +@17. TX 1 @Search for strings +.left margin2 +.para +Search for strings +performs searches of all the sequences for selected words and +shows which sequences they are found in. The user types in a word and +defines the allowed number of mismatches. The results are listed or +plotted. If listed the display includes the sequence number, the position +in the sequence and the matching string. +The results are plotted in the +following way. The x axis of the plot represents the length of the aligned +sequences and the y direction is divided into sufficient strips to accommodate +each sequence. So if a match is found in the 3rd sequence at a position +equivalent to halfway along the longest of the sequences then a short +vertical line will be drawn at the midpoint of the 3rd strip. If the sequences +are aligned it can be useful if the motifs happen to appear in +related positions. For example see the original publication. Typical +dialogue follows. +.lit + +? Menu or option number=17 +X 1 Plot match positions + 2 Plot histogram of matches +? 
0,1,2 = +? Word to search for=TTGACA +? Minimum match (0-6) (6) =5 +? (y/n) (y) Plot results N + 2 35 TAGACA + 5 14 TTTACA + 6 37 TTTACA + 11 14 TAGACA + 14 14 TTGACA + 17 14 GTGACA + 17 22 TTAACA + 20 1 TTGACA +.end lit +.left margin1 +@18. TX 3 @Set strand +.left margin2 +.para +Set strand allows the user to define which strand(s) of the sequences to +analyse: input strand, complement of input, or both. +.left margin1 +@19. TX 3 @Set composition +.left margin2 +.para +Set composition gives the user three choices for setting the composition +of the sequences for use in the calculation of the information content of +words. The user can select the overall composition of the sequences as read, +an even composition, or can type in any other 4 values. +.left margin1 +@20. TX 3 @Set word length +.left margin2 +.para +Set word length sets the length of word for which dictionaries will be made. +.left margin1 +@21. TX 3 @Set number of mismatches +.left margin2 +.para +Set number of mismatches sets the level of fuzziness for the creation of +dictionary Dm. +.left margin1 +@22. TX 3 @Show settings +.left margin2 +.para +Show settings shows the current settings for all parameters associated with +dictionary analysis. A typical display follows: +.lit + ? Menu or option number=22 + Current word length = 6 + Number of mismatches = 1 + Start position = 1 + End position = 63 + Input strand only + Observed composition + Dictionary Dw unmade + Dictionary Ds unmade + Dictionary Dm unmade + Dictionary Dh unmade +.end lit +.left margin1 +@23. TX 3 @Make dictionary Dw +.left margin2 +.para +Make dictionary Dw creates a dictionary that contains a count of the +frequency of occurrence of each word in the collected sequences. +.left margin1 +@24. TX 3 @Make dictionary Ds +.left margin2 +.para +Make dictionary Ds creates a dictionary that contains a count of the +number of different sequences that contain each word. +.left margin1 +@25.
TX 3 @Make dictionary Dm from Dw +.left margin2 +.para +Make dictionary Dm from Dw creates a dictionary from dictionary Dw that +contains the frequency of occurrence of each word (say X) in Dw plus the +frequency of occurrence of each word in Dw that differs from X by up to m +letters. Dm is called a fuzzy dictionary as it contains the frequencies of +occurrence of all words plus the frequencies of all the words that are +similar to them. +.left margin1 +@26. TX 3 @Make dictionary Dm from Ds +.left margin2 +.para +Make dictionary Dm from Ds creates a dictionary from dictionary Ds that +contains the frequency of occurrence of each word (say X) in Ds plus the +frequency of occurrence of each word in Ds that differs from X by up to m +letters. Dm is called a fuzzy dictionary as it contains the frequencies of +occurrence of all words plus the frequencies of all the words that are +similar to them. +.left margin1 +@27. TX 3 @Make dictionary Dh from Dm +.left margin2 +.para +Make dictionary Dh creates a dictionary from dictionary Dm whose +entries are zero except for those words in any set of related words that +are most frequent. It finds the dominant words in each set of relations +and stores their counts. +.left margin1 +@28. TX 3 @Examine fuzzy dictionary Dm +.left margin2 +.para +Examine dictionary Dm allows users to analyse the contents of dictionary +Dm to find the most common words or those words that contain the most +information. The user supplies a frequency or information cutoff and chooses +to have the results sorted on either value. The program will find the top 100 +words that achieve the cutoff values and present them to the user sorted +as selected. The information content will be calculated from either Dw or Ds +depending on which was used to create Dm, and using the current composition +setting. Typical dialogue follows: +.lit + +? Menu or option number=28 +Looking for highest scoring words +The highest word score = 115 +? 
Minimum word score (0-115) (0) =60 +? Minimum information (0.00-1.00) (0.00) =.62 +X 1 Sort on information + 2 Sort on word score +? 0,1,2 = + +? Maximum number to list (0-100) (100) = + +The words are + Total words= 9 Maximum information= 0.7385326 +TTGACA 60 0.73850 +AAAAAC 64 0.66460 +AAAAAA 90 0.64880 +GTTTTT 66 0.64300 +TTTTTG 73 0.64070 +TTTTGT 63 0.63820 +TTTTTC 65 0.63810 +AAAATA 63 0.62670 +TATAAT 65 0.62510 +The highest word score = 115 +? Minimum word score (0-115) (0) =60 +? Minimum information (0.00-1.00) (0.00) =.62 +X 1 Sort on information + 2 Sort on word score +? 0,1,2 =2 +? Maximum number to list (0-100) (100) = + +The words are + Total words= 9 Maximum information= 0.7385326 +AAAAAA 90 0.64880 +TTTTTG 73 0.64070 +GTTTTT 66 0.64300 +TTTTTC 65 0.63810 +TATAAT 65 0.62510 +AAAAAC 64 0.66460 +TTTTGT 63 0.63820 +AAAATA 63 0.62670 +TTGACA 60 0.73850 +The highest word score = 115 +? Minimum word score (0-115) (0) =! + +.end lit +.left margin1 +@29. TX 3 @Examine fuzzy dictionary Dh +.left margin2 +.para +Examine dictionary Dh allows users to analyse the contents of dictionary Dh +to find the most common words or those words that contain the most +information. The user supplies a frequency or information cutoff and chooses +to have the results sorted on either value. The program will find the top 100 +words that achieve the cutoff values and present them to the user sorted as +selected. The information content will be calculated from either Dw or Ds +depending on which was used to create Dh and using the current composition +setting. Typical dialogue follows: +.lit + +? Menu or option number=29 +Looking for highest scoring words +The highest word score = 115 +? Minimum word score (0-115) (0) =60 +? Minimum information (0.00-1.00) (0.00) =.6 +X 1 Sort on information + 2 Sort on word score +? 0,1,2 = + +? 
Maximum number to list (0-100) (100) = + +The words are + Total words= 4 Maximum information= 0.7385326 +TTGACA 60 0.73850 +AAAAAA 90 0.64880 +TATAAT 65 0.62510 +TTTTTT 115 0.60630 +The highest word score = 115 +? Minimum word score (0-115) (0) =50 +? Minimum information (0.00-1.00) (0.00) =.5 +X 1 Sort on information + 2 Sort on word score +? 0,1,2 = + +? Maximum number to list (0-100) (100) = + +The words are + Total words= 8 Maximum information= 0.7385326 +TTGACA 60 0.73850 +TCTTGA 54 0.66080 +AAAAAA 90 0.64880 +TATAAT 65 0.62510 +ACTTTA 57 0.61960 +TTTTTT 115 0.60630 +AGTATA 51 0.60540 +TTATAA 55 0.59300 +The highest word score = 115 +? Minimum word score (0-115) (0) =50 +? Minimum information (0.00-1.00) (0.00) = + +X 1 Sort on information + 2 Sort on word score +? 0,1,2 = + +? Maximum number to list (0-100) (100) = + +The words are + Total words= 8 Maximum information= 0.7385326 +TTGACA 60 0.73850 +TCTTGA 54 0.66080 +AAAAAA 90 0.64880 +TATAAT 65 0.62510 +ACTTTA 57 0.61960 +TTTTTT 115 0.60630 +AGTATA 51 0.60540 +TTATAA 55 0.59300 +The highest word score = 115 +? Minimum word score (0-115) (0) =! + +.end lit +.left margin1 +@30. TX 3 @Examine words in Dm +.left margin2 +.para +Examine words in Dm allows users to analyse the contents of dictionary Dm at the +level of individual words to find their frequency, information content, and to +see their base frequency table. The user types in a word to examine and the +program displays the values and table. The information content will be +calculated from either Dw or Ds depending on which was used to create Dm, +and using the current composition setting. Typical dialogue follows: +.lit +? Menu or option number=30 +? Word to examine=TTGACA +TtgacA 60 0.7385326 + 56 56 6 7 5 11 + 4 3 2 1 52 1 + 1 4 2 53 3 48 + 3 1 54 3 4 4 +TTGACA +? Word to examine=TATAAT +taTAat 65 0.6251902 + 56 3 53 4 4 60 + 6 1 5 5 5 3 + 3 60 5 57 57 4 + 4 5 6 3 3 2 +TATAAT +? Word to examine= + +.end lit +.left margin1 +@31.
TX 3 @Examine words in Dh +.left margin2 +.para +Examine words in Dh allows users to analyse the contents of dictionary Dh at the +level of individual words to find their frequency, information content, and to +see their base frequency table. The user types in a word to examine and the +program displays the values and table. The information content will be +calculated from either Dw or Ds depending on which was used to create Dm, +and using the current composition setting. Typical dialogue follows: +.lit + + ? Menu or option number=31 +? Word to examine=TTGACA +TtgacA 60 0.7385326 + 56 56 6 7 5 11 + 4 3 2 1 52 1 + 1 4 2 53 3 48 + 3 1 54 3 4 4 +TTGACA +? Word to examine=TATAAT +taTAat 65 0.6251902 + 56 3 53 4 4 60 + 6 1 5 5 5 3 + 3 60 5 57 57 4 + 4 5 6 3 3 2 +TATAAT +? Word to examine=GGGGGG +gggggg 0 0.6199890 + 3 1 1 2 3 4 + 1 3 1 2 2 1 + 2 1 1 1 1 1 + 11 12 14 12 11 11 +GGGGGG +? Word to examine= + +.end lit +.left margin1 +@32. TX 3 @Save or restore a dictionary +.left margin2 +.para +Save or restore dictionary allows users to write or read any dictionary to +and from disk files. The user is asked to define the dictionary and file. The +function is useful if the machine being used is very slow at calculating +because the files can be handled quickly. However note that the files +cannot be processed by any other program. +.left margin1 +@33. TX 1 @Find inverted repeats +.left margin2 +.para +Find inverted repeats performs searches for simple inverted repeat sequences +in each sequence. They are defined by a range of loop sizes and a minimum +number of potential basepairs. The results can be plotted or listed. The x +axis of the plot represents the length of the aligned sequences and the y +direction is divided into sufficient strips to accommodate each sequence. +So if an inverted repeat is found in the 3rd sequence at a position equivalent +to halfway along the longest of the sequences then a short vertical line will +be drawn at the midpoint of the 3rd strip.
Alternatively, if the results are +listed, the potential hairpin loops are drawn out, with the sequence number +and the position of the loop. Typical dialogue follows. +.lit + +? Menu or option number=33 +Define the range of loop sizes +? Minimum loop size (0-10) (3) =0 +? Maximum loop size (1-20) (3) = +? Minimum number of basepairs (1-20) (6) = +? (y/n) (y) Plot results N + Searching + +Sequence 3 34 + C + G.T + T-A + A-T + T.G + T.G + G.T + ATCTTT TATTTCA + 33 + +Sequence 5 35 + T + G.T + T.G + A-T + T.G + G.T + C-G + T.G + TCCGGC AATTGTG + 34 +.end lit +.left margin1 +@ End of help diff --git a/help/NIP.RNO b/help/NIP.RNO new file mode 100644 index 0000000..6db18e6 --- /dev/null +++ b/help/NIP.RNO @@ -0,0 +1,5116 @@ +.NPA +.SP 1 +.left margin1 +@-1. TX 0 @General +.sp +@-2. T 0 @Screen control +.sp +@-2. X 0 @Screen +.sp +@-3. T 0 @Statistical analysis of content +.sp +@-3. X 0 @Statistics +.sp +@-4. T 0 @Structures and repeats +.sp +@-4. X 0 @Structures +.sp +@-5. TX 0 @Translation and codons +.sp +@-6. TX 0 @Gene search by content +.sp +@-7. TX 0 @General signals +.sp +@-8. TX 0 @Specific signals +.sp +@0. TX -1 @NIP +.PARA +.para +This is a program for analysing individual nucleotide sequences. It can +read sequences stored in many of the most commonly used formats, and +performs all of the usual simple analyses. However the main purpose of +the program is to provide methods for finding the function of each +section of a sequence. In general no single method can give an +unequivocal interpretation of a sequence so we need to use many +techniques together and to combine their results. For this reason the +program presents many of its results graphically. +.para +General information is contained in the user interface. Online +documentation for any function follows a consistent pattern: summary, +list of inputs, list of outputs, details, example. +.LEFT MARGIN1 +@1. TX 0 @ Help +.LEFT MARGIN2 +.para +This option gives online help.
The user should select option numbers and +the current documentation will be given. Note that option 0 gives an +introduction to the program, and that ? will get help from anywhere in +the +program. +The following functions are included: +.left margin1 +@2. TX 0 @ Quit +.left margin2 +.para +This function stops the program. +.left margin1 +@3. TX 1 @ Read a new sequence +.LEFT MARGIN2 +.para +This option allows users to read in new sequences, browse through annotations, + or search sequence +libraries for keywords. Sequences can be read from "personal" +sequence files or from sequence libraries. These are referred to as the +sequence "source". Personal files can be stored in several formats: +Staden, PIR, EMBL, GENBANK and GCG. +At LMB we use "Staden" format for sequencing and all +the +libraries are stored in their original formats. Note, however, that libraries +such as EMBL or GenBank that are divided into several files (eg GenBank has +13 separate files) are indexed as a whole. This means that users do not need +to know which file contains an entry, only which library. +When the user selects to read in a sequence the program first asks for the +sequence "source". +.para +If the user selects "personal" the program will ask for +the format (Staden, PIR, EMBL, GENBANK or GCG), and then for the name of +the file. For PIR format the user will also be required to know the entry +name of the sequence as the file can contain several. For the other formats +only a single entry is expected. The file will be read, its length and +composition will be displayed and the option left. +.para +If the user selects "library" as the sequence source the program will display a +list of available libraries. The programs are capable of handling all current +libraries but which ones are available will vary from site to site. At LMB we +have several libraries and also weekly updates of data gathered between releases. 
+The program will ask users to select a library and then give a list of options: +.lit + + X 1 Get a sequence + 2 Get annotations + 3 Get entrynames from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + +.end lit +If get a sequence or get annotations is selected users will be asked to +type the entry name. The option will be left when a sequence is selected or +! is typed. The composition and length will be displayed. +.para +The text index contains all words from feature tables, reference titles, +definition lines, keywords lists and comments, so the text index search +is most useful. It is also the fastest. Up to 5 words can be searched for +at once. The words should be typed separated by spaces, for example +.lit + ? Keywords=P53 mouse murine tumo + +.end lit +will search for all entries that contain words starting with p53, mouse, +murine and tumo. Only the unique entries that contain ALL words will be +listed. Before listing the matching entries +the program will show the number of 'hits' for each word and ring the bell. +Escape is possible at this point, or after each screenfull of entries. +In addition to the entry names the text search displays the primary accession +number, the sequence length and up to 80 characters of description. +(The search of 'titles' is now redundant because the full text index +contains all the title words and the search is much faster. It will probably +be removed from the program.) +All searches are independent of case. Where +possible the program will offer default entry names. +.para +Typical dialogue follows. +.lit +Select sequence source +X 1 Personal file + 2 Sequence library +? Selection (1-2) (1) = +Select sequence file format +X 1 Staden + 2 EMBL + 3 GenBank + 4 PIR + 5 GCG +? Selection (1-5) (1) = +? Sequence file name=M13MP7.SEQ + Contig title removed +Sequence length= 7238 + Sequence composition + T C A G - + 2405. 1539. 1765. 1527. 2. + 33.2% 21.3% 24.4% 21.1% 0.0% + . + . + . 
+ + + Select sequence source + X 1 Personal file + 2 Sequence library + ? Selection (1-2) (1) =2 + Select a library + X 1 EMBL 29 nucleotide library Dec 91 + 2 SWISSPROT 20 protein library Nov 91 + 3 PIR 31 protein library Dec 91 + 4 NRL3D 58 From Brookhaven protein library Dec 91 + 5 GenBank + ? Selection (1-5) (1) = +Library is in EMBL format with indexes + Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) =5 + Search for keywords + ? Keywords=P53 mouse +P53 hits 68 +MOUSE hits 8180 + + MMANT01 X00875 536 Murine gene fragment for cellular tumour antigen + MMANT02 X00876 83 Murine gene fragment for cellular tumour antigen + MMANT03 X00877 21 Murine gene fragment for cellular tumour antigen + MMANT04 X00878 261 Murine gene fragment for cellular tumour antigen + MMANT05 X00879 184 Murine gene fragment for cellular tumour antigen + MMANT06 X00880 113 Murine gene fragment for cellular tumour antigen + MMANT07 X00881 110 Murine gene fragment for cellular tumour antigen + MMANT08 X00882 137 Murine gene fragment for cellular tumour antigen + MMANT09 X00883 74 Murine gene fragment for cellular tumour antigen + MMANT10 X00884 107 Murine gene for cellular tumour antigen p53 (exon + MMANT11 X00885 562 Murine p53 gene 3' region with exon 11 + MMANTP53 M26862 536 Mouse tumor antigen p53 gene, 5' end. + MMLYN M64608 2044 Mouse lyn protein mRNA, complete cds. + MMP53 X00741 1377 Mouse mRNA for transformation associated protein + MMP53A M13872 1285 Mouse p53 mRNA, complete cds, clone pcD53. + MMP53B M13873 1241 Mouse p53 mRNA, complete cds, clone p53-m11. + MMP53C M13874 1322 Mouse p53 mRNA, complete cds, clone p53-m8. 
+ MMP53G1 X01235 554 Mouse genomic DNA for 5' region of cellular tumou + MMP53IN4 X60470 729 M.musculus p53 gene for p53 protein, intron 4 + MMP53P X01236 2132 Mouse pseudogene for cellular tumour antigen p53 + MMP53R X01237 1773 Mouse mRNA for cellular tumour antigen p53 + MMRSB2P5 M64597 196 Mouse B2 repeat in the 3' flank of protein 53 (p5 + 22 different entries found + + Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) =4 + Search for keywords + ? Keywords=alpha + Searching for alpha + AAGHA 623 a.anguilla mrna for glycoprotein hormone alpha subunit precu + AAMALI 3338 a.aegypti mali gene encoding alpha 1-4 glucosidase, complete + AAMALIA 1659 a.aegypti maltase-like i (mali) gene encoding alpha-1,4-gluc + AAMALIB 1832 a.aegypti maltase-like i (mali) mrna encoding alpha-1,4-gluc + ACA13GT 371 alouatta caraya alpha-1,3gt gene, 3' flank. + ADHBADA1 102 duck alpha-d-globin gene, exon 1. + ADHBADA2 1145 duck alpha-a-globin gene and 5' flank + ADHBADWP 513 duck (white pekin) alpha ii (minor) globin mrna, complete co + AEACOXABC 5279 a.eutrophus protein x (acox), acetoin:dcpip oxidoreductase-a + AGA13GT 371 ateles geoffroyi alpha-1,3gt gene, 3' flank. + AGAAAGFP 282 c.tetragonoloba alpha-amylase/alpha-galactosidase fusion pro + AGAABL 138 b.subtilis alpha-amylase signal peptide gene e.coli beta-lac + AGAFAMYA 57 synthetic b.stearothermophilus alpha amylase/s.cerevisiae ma + AGAFAMYB 57 synthetic b.stearothermophilus alpha amylase/s.cerevisiae ma + AGAFAMYC 57 synthetic b.stearothermophilus alpha amylase/s.cerevisiae ma + AGAFCOXA 98 synthetic alpha-factor/cox iv fusion gene signal peptide. + AGAGABA 7876 synthetic gossypium hirsutum (cotton) alpha globulin a and b + AGAMYLS 120 synthetic alpha-amylase gene, 5' end. + AGANPS 95 synthetic gene (jcnf-1) encoding alpha-factor pro-region/han +! 
+ Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) =3 + ? Accession number=v00636 +Entry name LAMBDA + Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) =2 + Default Entry name=LAMBDA + ? Entry name= +ID LAMBDA standard; DNA; PHG; 48502 BP. +XX +AC V00636; J02459; M17233; X00906; +XX +DT 03-JUL-1991 (Rel. 28, Last updated, Version 3) +DT 09-JUN-1982 (Rel. 1, Created) +XX +DE Genome of the bacteriophage lambda (Styloviridae). +XX +KW circular; coat protein; DNA binding protein; genome; +KW origin of replication. +XX +OS Bacteriophage lambda +OC Viridae; ds-DNA nonenveloped viruses; Siphoviridae. +XX +RN [1] +RP 1-48502 +RA Sanger F., Coulson A.R., Hong G.F., Hill D.F., Petersen G.B.; +RT "Nucleotide sequence of bacteriophage lambda DNA"; +RL J. Mol. Biol. 162:729-773(1982). +XX +! + Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) = + Default Entry name=LAMBDA + ? Entry name= +DE Genome of the bacteriophage lambda (Styloviridae). + Sequence length 48502 + Sequence composition + T C A G - + 11988. 11360. 12336. 12818. 0. + 24.7% 23.4% 25.4% 26.4% 0.0% + +.end lit +.left margin1 +@4. TX 1 @ Define active region +.LEFT MARGIN2 +.para +For its analytic functions +the program always works on a region of the sequence called the "active +region". This function allows the start and end points of the active region +to be reset. +.para +Define the required start and end points. +.para +When a new sequence is read into the program the active region is +automatically set to start at the beginning of the sequence and extend to +the +maximum the program can +handle. 
On most machines this will be to the end of the sequence. The +positions are shown on the screen. + Note that for +convenience, in the +listing and translation functions, the user is given access to regions +outside the active region. +.left margin1 +@5. TX 1 @ List a sequence +.LEFT MARGIN2 +.para +The sequence can be listed single or double stranded with line lengths +from +10 to 120 in multiples of 10. +.para +Define the region to list, the line length required and choose between a +single or double stranded display. +The output looks like: +.lit + + GTTAATGTAG CTTAATAACA AAGCAAAGCA CTGAAAATGC TTAGATGGAT + CAATTACATC GAATTATTGT TTCGTTTCGT GACTTTTACG AATCTACCTA + 10 20 30 40 50 + + AATTGTATCC CATAAACACA AAGGTTTGGT CCTGGCCTTA TAATTAATTA + TTAACATAGG GTATTTGTGT TTCCAAACCA GGACCGGAAT ATTAATTAAT + 60 70 80 90 100 + + GAGGTAAAAT TACACATGCA AACCTCCATA GACCGGTGTA AAATCCCTTA + CTCCATTTTA ATGTGTACGT TTGGAGGTAT CTGGCCACAT TTTAGGGAAT + 110 120 130 140 150 + + AACATTTACT TAAAATTTAA GGAGAGGGTA TCAAGCACAT TAAAATAGCT + TTGTAAATGA ATTTTAAATT CCTCTCCCAT AGTTCGTGTA ATTTTATCGA + 160 170 180 190 200 + +.end lit +.left margin1 +@6. TX 1 @ List a text file. +.LEFT MARGIN2 +.para +Allows the user to have a text file displayed on the screen. It will appear +one page at a time. +.para +Supply the name of the file to be displayed. +.left margin1 +@7. TX 1 @ Direct output to disk +.LEFT MARGIN2 +.para +Used to direct output that would normally appear on the screen to a file. +.para +Select redirection of either text or graphics, and +supply the name of the file that the output should be written to. +.para + The results from the next options selected will not appear on the screen +but will be written to the file. When option 7 is selected again +the file will be +closed and output will again appear on the screen. +.left margin1 +@8. TX 1 @ Write active region to disk +.LEFT MARGIN2 +.para +Used to write the current active section of sequence to a disk file in +"Staden format". 
+.para +Supply a file name and an optional title. +.para +The program has the capability of reading sequences stored in several +formats and so, in conjunction with this option, can be used to reformat +them. +.left margin1 +@9. TX 1 @ Edit the sequence +.LEFT MARGIN2 +.para +Used to edit sequences or any other files by giving access to the +computer's system editor. For editing sequences the input file should +have already been created using one of the listing functions such as "list +sequence", "list translation" or "list restriction sites above the +sequence". +.para +Supply the name of the file to edit. Wait while the system editor is made +ready (can take a while on a VAX). Use the editor. Exit from the editor. If a +sequence has been edited, and you want to process it, affirm that the +sequence should be "made active". The edited sequence will replace the +original sequence. +.para +This editing method is designed to give users access to an editor with +which they are familiar - i.e. the one on their machine, and yet to allow +them to edit a sequence which contains all the landmarks they need in +order to know where they are. Users can create files containing simple +listings (single stranded) with numbering, using "list the sequence", and +then edit them with their system editor, using the numbering to know +where they are within the sequence. When the edits are complete they +exit from the editor and the program "analyses" the edited file to extract +only the sequence characters. Similarly a file containing a three phase +translation can be edited, or a file containing a sequence plus its three +phase translation, plus its restriction sites marked above the sequence. +In order to be able to "analyse" such complicated listings and correctly +extract the sequence the following simple rule is used: all lines in the +file that contain a character that is not A,C,T,G or U are deleted. It is +obviously important to be aware of this rule and its implications.
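The extraction rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the code nip itself uses: the names SEQUENCE_CHARS and extract_sequence are hypothetical, and the sketch assumes that blanks within a line are ignored and that the sequence is in upper case.

```python
# Hypothetical sketch of the rule "all lines that contain a character
# that is not A,C,T,G or U are deleted". Assumptions: blanks are ignored
# and sequence characters are upper case.
SEQUENCE_CHARS = set("ACGTU")


def extract_sequence(listing_text):
    """Return the sequence recovered from an edited listing."""
    kept = []
    for line in listing_text.splitlines():
        symbols = "".join(line.split())  # drop blanks inside the line
        # Keep the line only if every remaining character is a base.
        if symbols and all(ch in SEQUENCE_CHARS for ch in symbols):
            kept.append(symbols)
    return "".join(kept)


listing = "\n".join([
    "        10         20",   # numbering line: contains digits, deleted
    "GTTAATGTAG CTTAATAACA",   # sequence line: kept
    " V  N  V  A  *  *  Q ",   # translation line: contains V, N, *, Q, deleted
    "AAGCAAAGCA",              # sequence line: kept
])
print(extract_sequence(listing))
```

One implication of the rule is visible in the sketch: a translation line that happened to contain only the letters A, C, G or T (alanine, cysteine, glycine, threonine) would be kept as if it were sequence, so such a line needs checking by hand.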
+.left margin1 +@10. TX 2 @ Clear graphics +.LEFT MARGIN1 +.para + Clears graphics from the screen. +.left margin1 +@11. TX 2 @ Clear text +.LEFT MARGIN1 +.para + Clears text from the screen. +.left margin1 +@12. TX 2 @ Draw a ruler +.LEFT MARGIN2 +.para +This option +allows the user to draw a ruler or scale along the x axis of the screen to +help identify the coordinates of points of interest. The user can define +the position of the first base to be marked (for example if the active +region is 1501 to 8000, the user might wish to mark every 1000th base +starting at either 1501 or 2000 - it depends whether the user wishes to treat +the active region as an independent unit with its own numbering starting +at +its left edge, or as part of the whole sequence). The user can also define +the separation of the ticks on the scale and their height. If required the +labelling routine can be used to add numbers to the ticks. +.left margin1 +@13. TX 2 @ Use crosshair +.LEFT MARGIN2 +.para +This function puts +a steerable cross on the screen that can be used to find the +coordinates of points in the sequence. The user can move the cross +around using the directional keys; when the space bar is hit the +program will print out the coordinates of the cross in sequence units and +the option will be exited. +.PARA +If instead, +you hit a , the position will be displayed but the cross will remain on +the screen. +.PARA +If a letter s is hit the program will display the sequence around the +crosshair +position, and leave the cross on the screen. +.left margin1 +@14. TX 2 @ Reposition plots +.LEFT MARGIN2 +.para +The position of each of the plots is defined relative to a user's drawing +board which has size 1-10,000 in x and 1-10,000 in y. +Plots for +each option are drawn in a window defined by x0,y0 and xlength,ylength, +where x0,y0 is the position of the bottom left hand corner of the window, + xlength is the width of the window and ylength the +height of the window.
+.lit + --------------------------------------------------------- 10,000 + 1 1 + 1 -------------------------------------- ^ 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 1 1 ylength 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 -------------------------------------- v 1 + 1 x0,y0^ 1 + 1 <---------------xlength--------------> 1 + --------------------------------------------------------- 1 + 1 10,000 + +.end lit +All values are in drawing board units (i.e. 1-10,000, 1-10,000). +The default window positions are read from a file "NIPMARG" when the +program is started. Users can have their own file if required. +As all the plots start +at the same position in x and have the same width, x0 and xlength are the +same for all options. Generally users will only want to change the start +level of the window y0 and its height ylength. + This option +allows users to change window positions whilst running the program. +The routine prompts first for the number of the option that the users +wishes +to reposition; then for the y start and height; then for the x start and +length. Note that changes to the x values affect all options. If the user +types only carriage return for any value it will remain unchanged. +The cross-hair can be used to choose suitable heights. +.LEFT MARGIN1 +@15. TX 2 @ Label a diagram +.LEFT MARGIN2 +.para +This routine allows users to label any diagrams they have produced. They +are asked to type in a label. When the user types carriage return to finish +typing the label the cross-hair appears on the screen. The user can +position it anywhere on the screen. If the user types R (for right justify) +the label will be +written on the diagram with its right end at the cross-hair position. +If the user types L (for left justify) the label will be written on the +diagram with its left end at the cross hair position. +The +cross-hair will then immediately reappear. 
The user may put the same +label +on another part of the diagram as before or if he hits the space bar he +will be asked if he wishes to type in another label. +.para +Typical dialogue follows. +.lit +? Menu or option number=15 +Type label then drive cross hair to left or right end +of label position then hit "L" to write label left +justified or "R" to write label right justified or +the space bar to quit + + +? Label=delta gene + + missing graphics + +? Label= + +.end lit +.left margin1 +@16. TX 2 @Display a map +.LEFT MARGIN2 +.para +This draws a map +of any sequence features selected by the user. +These features may be protein coding regions (CDS), tRNA genes (TRNA), +promoter positions (PRM), etc. Users may define their own feature table +key +names. For example I find it convenient to split CDS lines into CDS1, +CDS2 +and CDS3 each of which contains only those sequences that code in the +reading frames 1, 2 or 3. Then I can plot them at different heights on +the screen ( suitable heights can be determined by using the cross-hair). +.para +The coordinates must be stored in a file in the format of an EMBL or GenBank +feature table. Note that this means that the file must include either EMBL +or GenBank headers, and a suitable "tail". The simplest header is the word +FEATURES starting in column 1 of the first line of the file. The simplest +tail is 2 empty lines at the end of the file. These lines are not included +when nip writes out results in feature table format. +.para +Typical dialogue follows. +.lit +? Menu or option number=16 + Display a map using an EMBL feature table file +? map file name=hsegl1.ft +? feature code(e.g. CDS) =CDS +X 1 + strand + 2 - strand + 3 both strands +? 0,1,2,3 = +? level (0-9480) (256) =4000 + + missing graphics + +? feature code(e.g. CDS) = + +.end lit +.left margin1 +@17. 
TX 1 @ Search for restriction enzymes +.LEFT MARGIN2 +.para +This routine is used to search for short sequences, like restriction +enzyme +recognition sequences, +and can either list the results or present them graphically. Listings can +take several forms and can include the sequence and its translation. +Examples are given below. The program will also display the names of +enzymes that cut the sequence infrequently. Users can select from sets +of enzymes stored in files or can enter them from the keyboard. +.para +The short +sequences (strings) and their names need to be arranged in a particular +way. See below. Select to search, list an enzyme file or clear the screen. +Choose either a file of enzymes or to enter their recognition sequences at the +keyboard. Choose to search for all the enzymes in the list or to select +from the list. Select a mode of output. Define the sequence as circular or +linear. Select to search for "definite" or "possible" matches. The search +starts, and after the results have been displayed, further searches can be +performed. +.para +When the enzymes and their recognition sequences are stored in a file +they must be defined in the following way. We +call the recognition sequences "strings". +The format is as follows: each string or set of strings must be +preceded by a name, each string must be preceded and +terminated with a slash (/), and +each set of strings by 2 slashes. +For example +AATII/GACGT'C// defines the name AATII, its recognition sequence +GACGTC +and its cut site with the ' symbol; ACCI/GT'MKAC// defines the name +ACCI +and its recognition sequence includes IUB symbols for incompletely +defined +symbols in nucleic acid sequences; +BBVI/GCAGCNNNNNNNN'/'NNNNNNNNNNNNGCTGC// +defines the name BBVI and this time two recognition sequences and cut +sites +are specified in order to correctly show the cutting position relative to +the recognition sequence. 
If no cut site is included the first base of the +recognition sequence is displayed as being on the 3' side of the +recognition sequence. +.para +These collections of strings and their +names can be read from disk or entered from the keyboard. +When names and strings are entered from the keyboard the program will ask +for the name and then the string(s). If more than one string is typed per +name they must be separated by slash (/) characters. See the "Typical +dialogue" below. + Three files +containing restriction enzyme recognition sequences are currently +available. The "all enzymes" file contains the Rich Roberts REBASE +restriction enzyme database, which is updated monthly. +.para +The user can select strings +by name from these collections. If so the program will prompt for the +names, one at a time. The user can continue to select names until a blank +name is entered (by the user typing only return). +.para + Listed output can be displayed in several ways: it +can be ordered enzyme by enzyme, or on cut positions, or with enzyme +names +written above a listing of the sequence. This last listing can also include +a three phase translation of the sequence. In addition the program will +display only infrequent cutters (the user defines the minimum number of +cuts), or can plot the positions of matches. 
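The enzyme file format described above (a name, then one or more strings separated by slashes, closed by two slashes) can be read with a few lines of code. This is only a sketch under stated assumptions: parse_enzyme_entry is a hypothetical name, and an entry with no ' symbol is given a cut offset of 0, which mirrors the way the sample listings show such strings (for example 'TTTTTT).

```python
# Hypothetical parser for one entry such as AATII/GACGT'C//
def parse_enzyme_entry(entry):
    """Return (name, [(recognition_sequence, cut_offset), ...])."""
    entry = entry.strip()
    if not entry.endswith("//"):
        raise ValueError("entry must end with two slashes")
    name, *strings = entry[:-2].split("/")
    if not name or not strings or any(not s for s in strings):
        raise ValueError("entry needs a name and at least one string")
    sites = []
    for s in strings:
        cut = s.find("'")  # offset of the cut mark, -1 if absent
        # Assumption: no cut mark is treated as a cut before the first base.
        sites.append((s.replace("'", ""), cut if cut >= 0 else 0))
    return name, sites

print(parse_enzyme_entry("AATII/GACGT'C//"))
print(parse_enzyme_entry("BINI/GGATCNNNN'/'NNNNNGATCC//"))
```

The cut offset here counts the bases to the left of the ' symbol, so GACGT'C gives an offset of 5.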
+.para +Listings sorted "enzyme by enzyme" have the following form: +.lit + + Matches found= 1 + Name Sequence Position Fragment lengths + 1 AATII GACGT'C 112 111 111 + 912 912 + Matches found= 2 + Name Sequence Position Fragment lengths + 1 ACCI GT'CGAC 112 111 111 + 2 ACCI GT'AGAC 420 308 308 + 604 604 + Matches found= 2 + Name Sequence Position Fragment lengths + 1 AHAII GA'CGTC 109 108 90 + 2 AHAII GG'CGTC 199 90 108 + 825 825 + Matches found= 2 + Name Sequence Position Fragment lengths + 1 AVAII G'GACC 84 83 51 + 2 AVAII G'GTCC 973 889 83 + 51 889 + Matches found= 1 + Name Sequence Position Fragment lengths + 1 BALI TGG'CCA 258 257 257 + 766 766 + Matches found= 1 + Name Sequence Position Fragment lengths + 1 BAMHI G'GATCC 92 91 91 + + ...... etc + +Listings sorted on cut position have the following form: + + Searching + Name Sequence Position Fragment lengths + 1 ECORI G'AATTC 2 1 + 2 BANI G'GTGCC 26 24 + 3 BSP1286 GTGCC'C 31 5 + 4 BBVI 'TACTGCGCCGCAGCTGC 38 7 + 5 NSPBII CAG'CTG 51 13 + 6 PVUII CAG'CTG 51 0 + 7 BBVI GCAGCTGCTGGTG' 60 9 + 8 HINCII GTC'AAC 80 20 + 9 AVAII G'GACC 84 4 + 10 BINI 'CCAGGGATCC 87 3 + 11 BSTNI CC'AGG 89 2 + 12 BAMHI G'GATCC 92 3 + 13 XHOII G'GATCC 92 0 + 14 NSPBII CCG'CTG 98 6 + 15 BINI GGATCCGCT' 100 2 + 16 AHAII GA'CGTC 109 9 + 17 SALI G'TCGAC 111 2 + 18 AATII GACGT'C 112 1 + 19 ACCI GT'CGAC 112 0 + 20 HINCII GTC'GAC 113 1 + 21 BBVI GCAGCGACTGATT' 166 53 + 22 BINI 'ACTCAGATCC 178 12 + 23 XHOII A'GATCC 183 5 + 24 HGAI 'GGCGGCGGAGGCGTC 188 5 + + .....etc + +Lists of infrequent cutters have the following form: + + 0 AFLII + 0 AFLIII + 0 APAI + 0 APALI + 0 ASUII + 0 AVAI + 0 AVRII + 0 BCLI + 0 BGLI + 0 BGLII + 0 BSMI + 0 BSPMII + 0 BSTEII + ...... etc + + Listings showing names above the sequence, and a translation have the +following form: + + + ECORI BANI BSP1286 + . . . BBVI NSPBII + . . . . 
PVUII BBVI +GAATTCGGTTTGGGCTTGGTGTGAGGTGCCCAGAGATTACTGCGCCGCAGCTGCTG +GTGC + 10 20 30 40 50 60 + E F G L G L V * G A Q R L L R R S C W C + N S V W A W C E V P R D Y C A A A A G A + I R F G L G V R C P E I T A P Q L L V L + + HINCII + . AVAII + . . BINI + . . . BSTNI + . . . . BAMHI + . . . . XHOII NSPBII + . . . . . . BINI AHAII + . . . . . . . . SALI + . . . . . . . . .AATII + . . . . . . . . .ACCI + . . . . . . . . ..HINCII +TGGCGGTGCGGAGGTCGTCAACGGACCCAGGGATCCGCTGGACGAGGACGTCGACG +ACGA + 70 80 90 100 110 120 + W R C G G R Q R T Q G S A G R G R R R R + G G A E V V N G P R D P L D E D V D D E + A V R R S S T D P G I R W T R T S T T R + + BBVI BINI +GGAGGAGGTGGATAGCGCATTGCTGGTGGCTGGCAGCGACTGATTTGAGTTCTGAC +CACT + 130 140 150 160 170 180 + G G G G * R I A G G W Q R L I * V L T T + E E V D S A L L V A G S D * F E F * P L + R R W I A H C W W L A A T D L S S D H S + + XHOII + . HGAI AHAII PFIMI + . . . . BBVI +CAGATCCGGCGGCGGAGGCGTCGAGGCTCCCGAAACTCCCAGTGGCTGGCCTGCTA +GATT + 190 200 210 220 230 240 + Q I R R R R R R G S R N S Q W L A C * I + R S G G G G V E A P E T P S G W P A R F + D P A A E A S R L P K L P V A G L L D S + + .........etc + +.end lit +.para +The terms "possible" and "definite" matches are important only for back +translations of protein into DNA, and which include IUB redundancy codes. +Those matches that the program terms "definite matches" and are ones in +which the specification of the recognition sequence corresponds +exactly to that of the back translation, and consequently are definitely in +the DNA sequence. The program will also find what it +terms 'possible matches' which are ones that depend on the particular +codons +chosen for each amino acid. +These are sites at which recognition +sequences could be engineered to produce a cut in the DNA +without changing the amino +acid, but which are not +necessarily found in the original sequence. 
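The distinction between "definite" and "possible" matches can be sketched as a comparison of the sets of bases that each IUB symbol allows. This is one plausible reading of the description above, not the program's own algorithm: a position is taken as definite when every base allowed by the sequence symbol is also allowed by the recognition symbol, and as possible when the two sets merely overlap. The names IUB and classify are hypothetical.

```python
# Sets of bases allowed by each NC-IUB symbol (U is treated as T).
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

def classify(site, window):
    """Return 'definite', 'possible' or 'none' for one alignment."""
    if len(site) != len(window):
        raise ValueError("site and window must have equal length")
    definite = True
    for s, w in zip(site, window):
        allowed_site, allowed_seq = set(IUB[s]), set(IUB[w])
        if not (allowed_site & allowed_seq):
            return "none"            # no base satisfies both symbols
        if not allowed_seq <= allowed_site:
            definite = False         # depends on which base is chosen
    return "definite" if definite else "possible"

print(classify("GGATCC", "GGATCC"))   # fully specified on both sides
print(classify("GGATCC", "GGNTCY"))   # depends on the codons chosen
```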
+.para +The routine will handle both linear and circular sequences, and +so finds cutsites spanning the "ends" of circular sequences. + The program will only find cutsites spanning the +ends of sequences if the sequence is declared as circular. +This includes sites for +recognition sequences containing leading or trailing N symbols, in which +the actual recognition sequence does not span the join. For example if the +recognition sequence was 'NNNNACGT and the first 4 characters in the +sequence were ACGT, then the match would only be found if the sequence +was +declared as circular. If the sequence is linear then the first fragment +starts at base number 1, and the last ends at the last base. If the +sequence is circular then the length of the first fragment is the +clockwise +distance from the last cut to the first. +.para +Graphical output marks the position of each string by a +short vertical line and gives the name of the enzyme at the left end of +the +line. If the top of the screen is reached the program gives the user the +oportunity to take a hard copy and then will clear the screen and restart +plotting results at the original start position. +.para +Below is an edited piece of dialogue from use of the search option: +.lit +? Menu or option number=17 + +Search for restriction enzyme sites +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 = 2 + + 1 All enzymes +X 2 Six cutters + 3 Four cutters + 4 Personal file + 5 Keyboard +? 0,1,2,3,4,5 = + +AATII/GACGT'C// +ACCI/GT'MKAC// +AFLII/C'TTAAG// +AFLIII/A'CRYGT// +AHAII/GR'CGYC// +APAI/GGGCC'C// +APALI/G'TGCAC// +ASUII/TT'CGAA// +AVAI/C'YCGRG// +AVAII/G'GWCC// +AVRII/C'CTAGG// +BALI/TGG'CCA// +BAMHI/G'GATCC// +BANI/G'GYRCC// +BANII/GRGCY'C// +BBVI/GCAGCNNNNNNNN'/'NNNNNNNNNNNNGCTGC// +BCLI/T'GATCA// +BGLI/GCCNNNN'NGGC// +BGLII/A'GATCT// +BINI/GGATCNNNN'/'NNNNNGATCC// +BSMI/GAATGCN'/NG'CATTC// +BSP1286/GDGCH'C// + +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 
0,1,2,3,4 = + 1 All enzymes +X 2 Six cutters + 3 Four cutters + 4 Personal file + 5 Keyboard +? 0,1,2,3,4,5 = +? (y/n) (y) Search for all names +X 1 Order results enzyme by enzyme + 2 Order results by position + 3 Show only infrequent cutters + 4 Show names above the sequence +? 0,1,2,3,4 = +? (y/n) (y) List matches +? (y/n) (y) The sequence is linear +? (y/n) (y) Search for definite matches + + Searching + Matches found= 1 + Name Sequence Position Fragment lengths + 1 AATII GACGT'C 112 111 111 + 912 912 + Matches found= 2 + Name Sequence Position Fragment lengths + 1 ACCI GT'CGAC 112 111 111 + 2 ACCI GT'AGAC 420 308 308 + 604 604 + Matches found= 2 + Name Sequence Position Fragment lengths + 1 AHAII GA'CGTC 109 108 90 + 2 AHAII GG'CGTC 199 90 108 + 825 825 + Matches found= 2 + Name Sequence Position Fragment lengths + 1 AVAII G'GACC 84 83 51 + 2 AVAII G'GTCC 973 889 83 + 51 889 + Matches found= 1 + Name Sequence Position Fragment lengths + 1 BALI TGG'CCA 258 257 257 + 766 766 + Matches found= 1 + Name Sequence Position Fragment lengths + 1 BAMHI G'GATCC 92 91 91 + 932 932 + Matches found= 1 + Name Sequence Position Fragment lengths + 1 BANI G'GTGCC 26 25 25 + 998 998 + Matches found= 1 + Name Sequence Position Fragment lengths + 1 BANII GAGCC'C 490 489 489 + 534 534 + Matches found= 11 + Name Sequence Position Fragment lengths + 1 BBVI 'TACTGCGCCGCAGCTGC 38 37 3 + 2 BBVI GCAGCTGCTGGTG' 60 22 22 + 3 BBVI GCAGCGACTGATT' 166 106 28 + 4 BBVI 'CCTGCTAGATTCGCTGC 230 64 37 + 5 BBVI GCAGCGGTACGTA' 452 222 50 + 6 BBVI 'CTCGCCAACGTTGCTGC 502 50 55 + 7 BBVI GCAGCCTTCAACT' 606 104 64 + 8 BBVI 'GAGGTATTCCTGGCTGC 634 28 97 + 9 BBVI 'CTGGCCGCCGCCGCTGC 869 235 104 + 10 BBVI 'GCCGCCGCCGCTGCTGC 872 3 106 + 11 BBVI GCAGCGATGAGGA' 927 55 222 + + ....etc + + X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 = + + 1 All enzymes +X 2 Six cutters + 3 Four cutters + 4 Personal file + 5 Keyboard +? 0,1,2,3,4,5 = + +? 
(y/n) (y) Search for all names + +X 1 Order results enzyme by enzyme + 2 Order results by position + 3 Show only infrequent cutters + 4 Show names above the sequence +? 0,1,2,3,4 = 2 + +? (y/n) (y) List matches +? (y/n) (y) The sequence is linear +? (y/n) (y) Search for definite matches + + Searching + Name Sequence Position Fragment lengths + 1 ECORI G'AATTC 2 1 + 2 BANI G'GTGCC 26 24 + 3 BSP1286 GTGCC'C 31 5 + 4 BBVI 'TACTGCGCCGCAGCTGC 38 7 + 5 NSPBII CAG'CTG 51 13 + 6 PVUII CAG'CTG 51 0 + 7 BBVI GCAGCTGCTGGTG' 60 9 + 8 HINCII GTC'AAC 80 20 + 9 AVAII G'GACC 84 4 + 10 BINI 'CCAGGGATCC 87 3 + 11 BSTNI CC'AGG 89 2 + 12 BAMHI G'GATCC 92 3 + 13 XHOII G'GATCC 92 0 + 14 NSPBII CCG'CTG 98 6 + 15 BINI GGATCCGCT' 100 2 + 16 AHAII GA'CGTC 109 9 + 17 SALI G'TCGAC 111 2 + 18 AATII GACGT'C 112 1 + 19 ACCI GT'CGAC 112 0 + 20 HINCII GTC'GAC 113 1 + + .....etc + +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 = + + 1 All enzymes +X 2 Six cutters + 3 Four cutters + 4 Personal file + 5 Keyboard +? 0,1,2,3,4,5 = + +? (y/n) (y) Search for all names + + 1 Order results enzyme by enzyme +X 2 Order results by position + 3 Show only infrequent cutters + 4 Show names above the sequence +? 0,1,2,3,4 =3 +? Maximum number of cuts (0-100) (0) = +? (y/n) (y) The sequence is linear +? (y/n) (y) Search for definite matches + + Searching + 0 AFLII + 0 AFLIII + 0 APAI + 0 APALI + 0 ASUII + 0 AVAI + 0 AVRII + 0 BCLI + 0 BGLI + 0 BGLII + 0 BSMI + 0 BSPMII + 0 BSTEII + 0 CLAI + 0 DRAI + 0 DRAII + 0 ECOB + 0 ECOK + 0 ECORV + 0 ESPI + + ......etc + +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 = + + 1 All enzymes +X 2 Six cutters + 3 Four cutters + 4 Personal file + 5 Keyboard +? 0,1,2,3,4,5 = + +? (y/n) (y) Search for all names + + 1 Order results enzyme by enzyme + 2 Order results by position +X 3 Show only infrequent cutters + 4 Show names above the sequence +? 0,1,2,3,4 =4 +? (y/n) (y) Hide translation n +? 
(y/n) (y) Use 1 letter codes +? Line length (30-90) (60) = +? (y/n) (y) The sequence is linear +? (y/n) (y) Search for definite matches + + Searching + ECORI BANI BSP1286 + . . . BBVI NSPBII + . . . . PVUII BBVI +GAATTCGGTTTGGGCTTGGTGTGAGGTGCCCAGAGATTACTGCGCCGCAGCTGCTG +GTGC + 10 20 30 40 50 60 + E F G L G L V * G A Q R L L R R S C W C + N S V W A W C E V P R D Y C A A A A G A + I R F G L G V R C P E I T A P Q L L V L + + HINCII + . AVAII + . . BINI + . . . BSTNI + . . . . BAMHI + . . . . XHOII NSPBII + . . . . . . BINI AHAII + . . . . . . . . SALI + . . . . . . . . .AATII + . . . . . . . . .ACCI + . . . . . . . . ..HINCII +TGGCGGTGCGGAGGTCGTCAACGGACCCAGGGATCCGCTGGACGAGGACGTCGACG +ACGA + 70 80 90 100 110 120 + W R C G G R Q R T Q G S A G R G R R R R + G G A E V V N G P R D P L D E D V D D E + A V R R S S T D P G I R W T R T S T T R + + BBVI BINI +GGAGGAGGTGGATAGCGCATTGCTGGTGGCTGGCAGCGACTGATTTGAGTTCTGAC +CACT + 130 140 150 160 170 180 + G G G G * R I A G G W Q R L I * V L T T + E E V D S A L L V A G S D * F E F * P L + R R W I A H C W W L A A T D L S S D H S + + .......etc + +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 = + + 1 All enzymes +X 2 Six cutters + 3 Four cutters + 4 Personal file + 5 Keyboard +? 0,1,2,3,4,5 =5 +Define search strings by typing a string name +followed by the string(s) +? Name=FRED +? String(s)=AAAAAA/TTTTTT +? Name=MARY +? String(s)=CCCC/GGGG/GCGCT +? Name= +? (y/n) (y) Search for all names +X 1 Order results enzyme by enzyme + 2 Order results by position + 3 Show only infrequent cutters + 4 Show names above the sequence +? 0,1,2,3,4 = +? (y/n) (y) List matches +? (y/n) (y) The sequence is linear +? 
(y/n) (y) Search for definite matches + Searching + Matches found= 9 + Name Sequence Position Fragment lengths + 1 FRED 'TTTTTT 1557 1556 1 + 2 FRED 'TTTTTT 1558 1 1 + 3 FRED 'TTTTTT 1559 1 1 + 4 FRED 'TTTTTT 1560 1 22 + 5 FRED 'AAAAAA 1582 22 529 + 6 FRED 'AAAAAA 3160 1578 1019 + 7 FRED 'AAAAAA 4204 1044 1044 + 8 FRED 'AAAAAA 5691 1487 1487 + 9 FRED 'AAAAAA 6710 1019 1556 + 529 1578 + Matches found= 36 + Name Sequence Position Fragment lengths + 1 MARY 'CCCC 47 46 1 + 2 MARY 'GGGG 486 439 1 + 3 MARY 'GGGG 487 1 1 + 4 MARY 'CCCC 557 70 1 + 5 MARY 'CCCC 558 1 1 + 6 MARY 'GCGCT 1177 619 1 + + ... etc + +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 = + 1 All enzymes +X 2 Six cutters + 3 Four cutters + 4 Personal file + 5 Keyboard +? 0,1,2,3,4,5 =5 +Define search strings by typing a string name +followed by the string(s) +? Name=JANE +? String(s)=A'TTTT/CC'GGG +? Name= +? (y/n) (y) Search for all names +X 1 Order results enzyme by enzyme + 2 Order results by position + 3 Show only infrequent cutters + 4 Show names above the sequence +? 0,1,2,3,4 = +? (y/n) (y) List matches +? (y/n) (y) The sequence is linear +? (y/n) (y) Search for definite matches + Searching + Matches found= 30 + Name Sequence Position Fragment lengths + 1 JANE A'TTTT 437 436 6 + 2 JANE A'TTTT 546 109 33 + 3 JANE A'TTTT 597 51 43 + 4 JANE A'TTTT 777 180 51 + 5 JANE A'TTTT 1274 497 60 + 6 JANE A'TTTT 1571 297 62 + 7 JANE CC'GGG 1926 355 75 + 8 JANE A'TTTT 2403 477 81 + 9 JANE A'TTTT 2586 183 82 + 10 JANE A'TTTT 2731 145 101 + 11 JANE A'TTTT 2812 81 103 + + ... etc + + +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 =! +.end lit + +.left margin1 +@18. TX 1 7 @ Compare a short sequence +.LEFT MARGIN2 +.para +This routine slides a short sequence along the current sequence and finds +all positions at which a given percentage of the bases match. +Output is in both graphical and listed forms. 
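The sliding comparison can be sketched as follows. The function name percent_matches is hypothetical, positions are assumed to be 1-based and to refer to the left end of the probe, and only exact character matches are counted (no IUB symbols). The threshold is applied strictly here; the sample dialogue further on lists scores of 6 out of 9 at a 70 percent cutoff, so the program evidently rounds its threshold differently.

```python
def percent_matches(sequence, probe, min_percent):
    """Return (position, matching_bases) for every window that reaches
    min_percent identity with the probe."""
    hits = []
    n = len(probe)
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        matches = sum(1 for a, b in zip(window, probe) if a == b)
        if 100.0 * matches / n >= min_percent:
            hits.append((start + 1, matches))  # 1-based position
    return hits

# The embedded window ACATTTCGC matches AAATTTCCC at 7 of 9 bases.
print(percent_matches("GGACATTTCGCGG", "AAATTTCCC", 70.0))
```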
+.para +If users call for dialogue when the routine is selected they will be given +the choice of keyboard or file input. Define the string, select the "sense" +to use and the percentage match. Matches will be plotted out and then the +user can select to have them listed. Then the routine cycles around. +.para + The routine slides the search string +along the sequence and marks the positions at which a minimum +percentage score is reached. The graphical output draws a vertical line at +the match position; the height of the line represents the percentage +score, +so that if the line reaches the top of the box the score is 100%. +The NC-IUB symbols may be used in the search sequence to encode +uncertain +characters. Any other symbols will not match. +.LIT + + + NC-IUB SYMBOLS + + A,C,G,T + R (A,G) 'puRine' + Y (T,C) 'pYrimidine' + W (A,T) 'Weak' + S (C,G) 'Strong' + M (A,C) 'aMino' + K (G,T) 'Keto' + H (A,T,C) 'not G' + B (G,C,T) 'not A' + V (G,A,C) 'not T' + D (G,A,T) 'not C' + N (G,A,C,T) 'aNy' + + Typical dialogue is shown below. + + +? Menu or option number=18 + Find percentage matches +? (y/n) (y) Keep picture +? String=AAATTTCCC +STRING=AAATTTCCC +? (y/n) (y) This sense +? Percent match (1.00-100.00) (70.00) = + + Missing graphics display here + +Total scoring positions above 70.000 percent = 7 +Scores 7 6 6 6 6 6 6 +Positions 365 212 213 292 311 358 627 +? Display (0-7) (0) =3 + + 365 + ACATTTCGC + * ***** * + AAATTTCCC + 1 + + 212 + GAAACTCCC + ** **** + AAATTTCCC + 1 + + 213 + AAACTCCCA + *** * ** + AAATTTCCC + 1 +? (y/n) (y) Keep picture +Default String=AAATTTCCC +? String= +STRING=AAATTTCCC +? (y/n) (y) This sense n +STRING=GGGAAATTT +? Percent match (1.00-100.00) (70.00) = + + Missing graphics display here + +Total scoring positions above 70.000 percent = 7 +Scores 6 6 6 6 6 6 6 +Positions 269 270 271 288 354 624 853 +? 
Display (0-7) (0) =3 + + 269 + GAGGGATTT + * * **** + GGGAAATTT + 1 + + 270 + AGGGATTTT + ** * *** + GGGAAATTT + 1 + + 271 + GGGATTTTC + **** ** + GGGAAATTT + 1 +? (y/n) (y) Keep picture ! + +.end lit +.left margin1 +@19. TX 7 @ Compare a short sequence using a score matrix +.LEFT MARGIN2 +.para +This routine slides a short sequence along the current sequence and finds +all positions at which a given level of similarity (a cutoff score) is +reached. The score is defined by use of a score matrix. Output is in both +graphical and listed forms. +.para +If users call for dialogue when the routine is selected they will be given +the choice of keyboard or file input. Define the string, select the "sense" +to use and the cutoff score. Matches will be plotted out and then the user +can select to have them listed. Then the routine cycles around. +.para + The routine slides the search string +along the sequence and marks the positions at which a the cutoff score +is achieved. The graphical output draws a vertical line at +the match position; the height of the line represents the score, +so that if the line reaches the top of the box the score is the maximum +possible. +The NC-IUB symbols may be used in the search sequence to encode +uncertain +characters. +.para + The score matrix reflects the level of +redundancy in the probe sequence and hence will put more emphasis on +those +characters that are better defined. The score matrix is: +.lit + DNA SCORE MATRIX USING IUB SYMBOLS + + T C A G - R Y W S M K H B V D N ? 
+ + T 36 0 0 0 9 0 18 18 0 0 18 12 12 0 12 9 0 + C 0 36 0 0 9 0 18 0 18 18 0 12 12 12 0 9 0 + A 0 0 36 0 9 18 0 18 0 18 0 12 0 12 12 9 0 + G 0 0 0 36 9 18 0 0 18 0 18 0 12 12 12 9 0 + - 9 9 9 9 36 18 18 18 18 18 18 27 27 27 27 36 0 + R 0 0 18 18 18 36 0 9 9 9 9 6 6 12 12 18 0 + Y 18 18 0 0 18 0 36 9 9 9 9 12 12 6 6 18 0 + W 18 0 18 0 18 9 9 36 0 9 9 12 6 6 12 18 0 + S 0 18 0 18 18 9 9 0 36 9 9 6 12 12 6 18 0 + M 0 18 18 0 18 9 9 9 9 36 0 12 6 12 6 18 0 + K 18 0 0 18 18 9 9 9 9 0 36 6 12 6 12 18 0 + H 12 12 12 0 27 6 12 12 6 12 6 36 8 8 8 27 0 + B 12 12 0 12 27 6 12 6 12 6 12 8 36 8 8 27 0 + V 0 12 12 12 27 12 6 6 12 12 6 8 8 36 8 27 0 + D 12 0 12 12 27 12 6 12 6 6 12 8 8 8 36 27 0 + N 9 9 9 9 36 18 18 18 18 18 18 27 27 27 27 36 0 + ? 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + + ? is any unrecognised character. + + Typical dialogue is shown below. + +? Menu or option number=19 + Find matches using a score matrix +? (y/n) (y) Keep picture +? String=AAATTTCCC +STRING=AAATTTCCC +? (y/n) (y) This sense +Minimum score= 0 Maximum score= 324 +? Score (0-324) (280) =250 + + Missing graphics display here + +For score 250 the number of matches= 1 +Scores 252 +Positions 365 +? Display (0-1) (0) =1 + + 365 + ACATTTCGC + * ***** * + AAATTTCCC + 1 +? (y/n) (y) Keep picture +Default String=AAATTTCCC +? String= +STRING=AAATTTCCC +? (y/n) (y) This sense n +STRING=GGGAAATTT +Minimum score= 0 Maximum score= 324 +? Score (0-324) (222) = 200 + + Missing graphics display here + +For score 200 the number of matches= 7 +Scores 216 216 216 216 216 216 216 +Positions 269 270 271 288 354 624 853 +? Display (0-7) (0) =3 + + 269 + GAGGGATTT + * * **** + GGGAAATTT + 1 + + 270 + AGGGATTTT + ** * *** + GGGAAATTT + 1 + + 271 + GGGATTTTC + **** ** + GGGAAATTT + 1 +? (y/n) (y) Keep picture ! + +.end lit +.left margin1 +@20. TX 7 @ Search for a motif using a weight matrix +.LEFT MARGIN2 +.para +This function performs searches for short sequence +motifs using an appropriate weight matrix. 
In addition it can be used to +create or modify weight matrices. In order to perform a search the only +input +required is the name of the file containing the weight matrix. +The results can be presented graphically or listed. The graphical +presentation will draw a line at the position of any matches found; the +height of the line is proportional to the score. +.para +For a search, select "use weight matrix", supply the name of the file +containing the weight matrix, and choose between having results plotted +or listed. If dialogue is requested when the function is selected users can +alter the cutoff score employed. +.para +To create a weight matrix several steps are involved. A file containing an +alignment of known motifs is required. (This file must be created before +the current option is selected. The format is as follows: each sequence is +written on a separate line with at least one space at the beginning; each +sequence is terminated by a space character, and can be followed by a +name. The sequences must be aligned.) Supply the name of the file of +aligned sequences. The program reads and displays the sequences. Choose +between "summing logs of weights" or summing weights (i.e. whether to +multiply or add weights). If logs are used all scores will be negative. +Choose whether all positions in the set of aligned sequences should be used or +if a mask should be applied. If so selected, define a mask as a string of +symbols, in which symbol - means ignore and any other symbol means +use. E.g. xx-x--abc means use all positions except 3,5 and 6. +.para +The program will calculate weights as the frequencies of each base at +each unmasked position in the set of aligned sequences. These weights +are then applied to the set of aligned sequences to give a range of +"observed" scores. The mean and standard deviation of these scores are +displayed.
The user is asked to supply several values to be used when the +weight matrix is applied to other sequences: a cutoff score (by default, +the mean minus 3 standard deviations), a top score for scaling graphical +results (by default, the mean plus 3 standard deviations), and a position +to identify (this means that if a particular base within the motif is used +as a "landmark", such as the A of the AG in splice acceptor sites, then its +position will be marked in plots). All these values are stored along with +the weight matrix. Finally supply the name of a file to contain the weight +matrix. +.para +Weight matrices can be "rescaled" using a set of aligned sequences in +much the same way as a matrix is created. The purpose is to redefine +the cutoff scores, and rescaling does not alter any other values in the +weight matrix file. +.para +The methods have changed considerably but were first outlined in +Staden, R. Nucl. Acids Res. 12 505-519 1984, and +Staden, R. Genetic +engineering: principles and methods vol 7, Edited by J.K. Setlow and A. +Hollaender, Plenum publishing corp., 1985. +.para + The methods have always had to deal with the problem of zeroes in the +matrices. The current versions +employ "Laplace's Law of Succession" in which 1 is +added to each term. +.para +It is now possible to apply a mask to a set of aligned sequences in +order to give weight to selected positions only. +Sequences have superimposed functions: some parts may be of general +structural +importance and give rise to an overall framework, and other parts give +specificity and hence are not common; we may want to use a set of +aligned +sequences to define a motif, but want to use only the framework +positions. + Alternatively we may want to pick out +only those parts of a set of aligned sequences that give a particular +property, and to ignore other similarities that are due to some other +property +and which could obscure the pattern +we are interested in.
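The steps described so far (base counts per unmasked position, 1 added to each term, scores formed by summing weights or their logs, and default cutoffs at the mean plus or minus 3 standard deviations) can be sketched as below. The function names are hypothetical and the exact weights the program stores are an assumption: here each weight is (count + 1) divided by (number of sequences + 4). Positions beyond the end of a short mask are assumed to be used. The population form of the standard deviation is used because it reproduces the figures in the sample dialogue.

```python
import math
from statistics import mean, pstdev

BASES = "TCAG"

def build_counts(aligned, mask=None):
    """Per-position base counts with 1 added to each term; None where
    the mask says ignore ('-')."""
    counts = []
    for i in range(len(aligned[0])):
        used = mask is None or i >= len(mask) or mask[i] != "-"
        if used:
            column = [seq[i] for seq in aligned]
            counts.append({b: column.count(b) + 1 for b in BASES})
        else:
            counts.append(None)
    return counts

def score(window, counts, n_seqs, use_logs=True):
    """Sum the weights (or their logs) over the unmasked positions."""
    total = 0.0
    for base, column in zip(window, counts):
        if column is None:
            continue
        freq = column.get(base, 1) / (n_seqs + len(BASES))
        total += math.log(freq) if use_logs else freq
    return total

def default_cutoffs(scores):
    """Return (mean - 3 sd, mean + 3 sd) for the observed scores."""
    m, sd = mean(scores), pstdev(scores)
    return m - 3 * sd, m + 3 * sd

motifs = ["AC", "AC", "AG"]
table = build_counts(motifs, mask="x-")
print(score("AC", table, len(motifs), use_logs=False))
```

Because every frequency is below 1, each log term is negative, which is why all scores are negative when logs are summed.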
The ability to define a mask allows certain +positions +to be used in the motif and others to be ignored, and yet still permits the +use of a set of aligned sequences to calculate weights. +.para +Typical dialogue is shown below. +.lit + +? Menu or option number=20 +X 1 Use weight matrix + 2 Make weight matrix + 3 Rescale weight matrix +? 0,1,2,3 =2 +? Name of aligned sequences file=[RS.MOTIFS]GCN4.SEQ + + 1 AGCGTGACTCTTCCCGGAA HIS1 + 2 GAGGTGACTCACTTGGAAG HIS1 + 3 CGGATGACTCTTTTTTTTT HIS3 + 4 ACAGTGACTCACGTTTTTT HIS4 + 5 GTCGTGACTCATATGCTTT ARG3 + 6 TGAATGACTCACTTTTTGG ARG4 + 7 TTCTTGACTCGTCTTTTCT CPA1 + 8 CGAATGACTCTTATTGATG CPA2 + 9 AGAATGACTAATTTTACTA TRP5 + 10 TCGTTGACTCATTCTAATC TRP3 + 11 TTGCTGACTCATTACGATT TRP2 + 12 GAGATGACTCTTTTTCTTT IV1 + 13 GCGATGATTCATTTCTCTG IV2 + 14 TAGATGACTCAGTTTAGTC LEU1 + 15 TAAGTGACTCAGTTCTTTC LEU4 + 16 ATGATGACTCTTAAGCATG ILS1 +Length of motif 19 +? (y/n) (y) Sum logs of weights + +? (y/n) (y) Use all motif positions n +x means use, - means ignore +e.g. xx-x---x-x means use positions 1,2,4,8,10 +? Mask=----XXXXXXXX + Applying weights to input sequences + 1 -27.979 AGCGTGACTCTTCCCGGAA + 2 -24.543 GAGGTGACTCACTTGGAAG + 3 -20.890 CGGATGACTCTTTTTTTTT + 4 -23.087 ACAGTGACTCACGTTTTTT + 5 -22.771 GTCGTGACTCATATGCTTT + 6 -23.408 TGAATGACTCACTTTTTGG + 7 -25.159 TTCTTGACTCGTCTTTTCT + 8 -22.679 CGAATGACTCTTATTGATG + 9 -24.751 AGAATGACTAATTTTACTA + 10 -23.157 TCGTTGACTCATTCTAATC + 11 -23.067 TTGCTGACTCATTACGATT + 12 -21.449 GAGATGACTCTTTTTCTTT + 13 -24.191 GCGATGATTCATTTCTCTG + 14 -23.770 TAGATGACTCAGTTTAGTC + 15 -22.923 TAAGTGACTCAGTTCTTTC + 16 -25.285 ATGATGACTCTTAAGCATG +Top score -20.890 Bottom score -27.979 +Mean -23.694 Standard deviation 1.613 +Mean minus 3.sd -28.534 Mean plus 3.sd -18.854 +? Cutoff score (-999.00-9999.00) (-28.53) = +? Top score for scaling plots (-28.53-999.00) (-18.85) = +? Position to identify (0-19) (1) = +? Title=GCN4 SEQUENCES +? Name for new weight matrix file=1.WTS + + +? 
Menu or option number=20 +X 1 Use weight matrix + 2 Make weight matrix + 3 Rescale weight matrix +? 0,1,2,3 =3 +? Name of existing weight matrix file=1.WTS + GCN4 SEQUENCES +? Name of aligned sequences file=[RS.MOTIFS]GCN4.SEQ +Length of motif 19 +? (y/n) (y) Sum logs of weights n +? (y/n) (y) Use all motif positions + + Applying weights to input sequences + 1 128.000 AGCGTGACTCTTCCCGGAA + 2 148.000 GAGGTGACTCACTTGGAAG + 3 172.000 CGGATGACTCTTTTTTTTT + 4 160.000 ACAGTGACTCACGTTTTTT + 5 161.000 GTCGTGACTCATATGCTTT + 6 157.000 TGAATGACTCACTTTTTGG + 7 149.000 TTCTTGACTCGTCTTTTCT + 8 160.000 CGAATGACTCTTATTGATG + 9 151.000 AGAATGACTAATTTTACTA + 10 159.000 TCGTTGACTCATTCTAATC + 11 158.000 TTGCTGACTCATTACGATT + 12 169.000 GAGATGACTCTTTTTCTTT + 13 152.000 GCGATGATTCATTTCTCTG + 14 157.000 TAGATGACTCAGTTTAGTC + 15 160.000 TAAGTGACTCAGTTCTTTC + 16 143.000 ATGATGACTCTTAAGCATG +Top score 172.000 Bottom score 128.000 +Mean 155.250 Standard deviation 10.034 +Mean minus 3.sd 125.147 Mean plus 3.sd 185.353 +? Cutoff score (-999.00-9999.00) (125.15) = +? Top score for scaling plots (125.15-999.00) (185.35) = +? Position to identify (0-19) (1) = +? Title=GCN4 SEQUENCES +? Name for new weight matrix file=2.WTS + + +? Menu or option number=20 +X 1 Use weight matrix + 2 Make weight matrix + 3 Rescale weight matrix +? 0,1,2,3 = +? Motif weight matrix file=1.WTS + GCN4 SEQUENCES +? 
(y/n) (y) Plot results n + + 153 -22.61 GCAGCGACTGATTTGAGTT + 169 -28.53 GTTCTGACCACTCAGATCC + 172 -27.27 CTGACCACTCAGATCCGGC + 219 -27.35 CCAGTGGCTGGCCTGCTAG + 268 -27.82 CGAGGGATTTTCGATCTTG + 274 -26.99 ATTTTCGATCTTGTGGATG + 283 -25.79 CTTGTGGATGATTTTCACG + 287 -27.50 TGGATGATTTTCACGTGCG + 298 -28.17 CACGTGCGCCGTCATATTG + 332 -28.27 TCTTTGAAGCAGAAGGGAC + 351 -28.27 AGGGGTACACTTTCACATT + 357 -25.05 ACACTTTCACATTTCGCTT + 364 -28.51 CACATTTCGCTTATGGGAG + 400 -23.77 GAAGTTACTAATGTGCGTG + 451 -26.22 ATGCTCGCCCTCTTTGGTG + 476 -28.00 TCCCTCACTGAGCCCTCCG + 480 -28.33 TCACTGAGCCCTCCGCCTC + 517 -23.46 GCTAAGATTCAGCTTGGTT + 556 -27.27 TCCAGCACTCAGGTTCGGC + 602 -27.01 AACTTGAATCCATCGTTGC + 648 -28.45 TGCTAAACACAGCCGGTTT + 679 -28.18 CTGTTTGCCCAGTTTGGGC + 691 -28.51 TTTGGGCCGCTTCTGGACG + 713 -27.67 GGCTTGACCGTGGCTGTGG + 803 -25.47 ATGCTGACCATGCTTTTCA + 848 -28.11 ATAATGTTAAGTTTGATTC + 857 -25.97 AGTTTGATTCCGCTGGCCG + 879 -27.85 CCGCTGCTGCTGTTTCCAC + 917 -27.77 GCGATGAGGAAGGCTTGTT + 931 -27.81 TTGTTGGCGCGCCTGCTCG + 952 -23.52 GAGGTGACTACCATCCGTG + 977 -28.40 TGCGTGGGTGAGCTGTTGT + + + + +? Menu or option number=6 +Page through text files +? Name of file to read=1.WTS + GCN4 SEQUENCES + 19 1 -28.534 -18.854 + P 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 + N 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 +16 + T 0 0 0 0 16 0 0 1 16 0 5 11 10 12 9 6 7 12 6 + C 0 0 0 0 0 0 0 15 0 15 0 3 2 2 4 3 2 1 3 + A 0 0 0 0 0 0 16 0 0 1 10 0 3 2 0 3 5 2 2 + G 0 0 0 0 0 16 0 0 0 0 1 2 1 0 3 4 2 1 5 +End of file + +.end lit + + +.left margin1 +@21. TX 3 @ Count base composition +.LEFT MARGIN2 +.para +This routine +calculates the base composition of the +active region of the sequence as both totals and percentages. +.left margin1 +@22. TX 3 @ Count dinucleotide frequencies +.LEFT MARGIN2 +.para +This routine simply counts dinucleotide frequencies for the currently +active region of the sequence. It also calculates an expected distribution +based on the base composition. 
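The calculation described above can be sketched in Python. This is a minimal illustration and not the program's own code: the function name `dinucleotide_table` is hypothetical, and it assumes that the expected value for a dinucleotide is the product of the frequencies of its two bases, expressed as a percentage (an assumption, though it agrees with the expected columns in the example output for this option).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"  # same row and column order as the program's table


def dinucleotide_table(seq):
    """Return {(first, second): (observed %, expected %)} for a DNA string.

    Assumption: expected % = 100 * freq(first) * freq(second).
    """
    seq = seq.upper()
    # Overlapping dinucleotides made only of T, C, A and G.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in BASES and p[1] in BASES]
    pair_counts = Counter(pairs)
    bases = [b for b in seq if b in BASES]
    base_counts = Counter(bases)
    n_pairs = len(pairs)
    n_bases = len(bases)
    table = {}
    for x, y in product(BASES, repeat=2):
        observed = 100.0 * pair_counts[x + y] / n_pairs if n_pairs else 0.0
        expected = (
            100.0 * base_counts[x] * base_counts[y] / n_bases ** 2
            if n_bases
            else 0.0
        )
        table[(x, y)] = (observed, expected)
    return table
```

Each value pair corresponds to one obs/expected cell of the program's table, with the first base giving the row and the second the column.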
+The output looks like: +.LIT + T C A G + obs expected obs expected obs expected obs expected + + T 8.44 8.25 6.67 7.01 10.35 9.92 3.27 3.54 + C 7.49 7.01 6.76 5.95 8.39 8.43 1.76 3.01 + A 10.13 9.92 7.78 8.43 11.74 11.93 4.89 4.26 + G 2.67 3.54 3.19 3.01 4.06 4.26 2.42 1.52 + +.END LIT +.left margin1 +@23. TX 3 5 @ Count codons and amino acids +.LEFT MARGIN2 +.para +This function +counts codons, amino acid composition, protein molecular weights, and +base +composition. Users select the segments of the sequence that the program +should analyse. +.para +Choose between being shown observed counts or counts normalised so +that the totals for each amino acid sum to 100. Select to define +segments using either the keyboard or an EMBL feature table. +Define the segments to count over. Select strand for each segment. Stop +selecting segments by typing a zero for "Count from ()". The results are +displayed a screenful at a time, and the bell is sounded to show there is +more to come. A zero start position, or the end of an EMBL feature table, +signals +the routine to print out totals for all values. + +.para +The counts are broken down into several figures. + Base +composition by position in codon expressed as a percentage of each bases +own frequency; base composition by position in codon expressed as a +percentage of the overall base composition of the section; base +composition +expected for this amino acid composition if there was no codon +preference; +percentage deviations of the observed amino acid composition from an +average amino acid composition. +.para +The output looks like: +.LIT + + =========================================== + F TTT 1. S TCT 2. Y TAT 2. C TGT 1. + F TTC 1. S TCC 1. Y TAC 3. C TGC 2. + L TTA 7. S TCA 4. * TAA 9. * TGA 1. + L TTG 2. S TCG 1. * TAG 2. W TGG 2. + =========================================== + L CTT 3. P CCT 2. H CAT 4. R CGT 1. + L CTC 2. P CCC 3. H CAC 1. R CGC 0. + L CTA 3. P CCA 2. Q CAA 4. R CGA 0. + L CTG 2. P CCG 2. 
Q CAG 1. R CGG 2. + =========================================== + I ATT 9. T ACT 1. N AAT 7. S AGT 3. + I ATC 2. T ACC 2. N AAC 4. S AGC 2. + I ATA 4. T ACA 5. K AAA 13. R AGA 5. + M ATG 1. T ACG 2. K AAG 4. R AGG 1. + =========================================== + V GTT 2. A GCT 2. D GAT 1. G GGT 3. + V GTC 2. A GCC 2. D GAC 1. G GGC 1. + V GTA 4. A GCA 3. E GAA 2. G GGA 1. + V GTG 2. A GCG 0. E GAG 1. G GGG 1. + =========================================== + total codons= 166. + T C A G + + 1 31.06 33.68 34.03 35.00 + 2 35.61 35.79 30.89 32.50 + 3 33.33 30.53 35.08 32.50 + + 1 24.70 19.28 39.16 16.87 + 2 28.31 20.48 35.54 15.66 + 3 26.51 17.47 40.36 15.66 + % 26.51 19.08 38.35 16.06 observed, overall totals + % 25.00 22.26 33.10 19.65 expected, even codons per acid + + A C D E F G H I K L + 7. 3. 2. 3. 2. 6. 5. 15. 17. 19. + o-e % -47. -33. -76. -68. -64. -54. 62. 116. 67. 67. + + M N P Q R S T V W Y + 1. 11. 9. 5. 9. 13. 10. 10. 2. 5. + o-e % -62. 66. 12. -17. 19. 21. 6. -2. 0. -5. + total acids= 154. molecular weight= 17421. + + Typical dialogue follows. + +? Menu or option number=23 + Calculate codon usage, base composition + and amino acid composition +? (y/n) (y) Show observed counts +? (y/n) (y) Define segments using keyboard +? Count from (0-1023) (0) =1 +? Count to (1-1023) (1023) =1000 +? (y/n) (y) + strand + + =========================================== + F TTT 13. S TCT 1. Y TAT 1. C TGT 3. + F TTC 4. S TCC 10. Y TAC 1. C TGC 7. + L TTA 1. S TCA 0. * TAA 1. * TGA 4. + L TTG 4. S TCG 1. * TAG 3. W TGG 5. + =========================================== + L CTT 9. P CCT 1. H CAT 3. R CGT 14. + L CTC 7. P CCC 0. H CAC 7. R CGC 14. + L CTA 0. P CCA 0. Q CAA 4. R CGA 9. + L CTG 12. P CCG 1. Q CAG 9. R CGG 8. + =========================================== + I ATT 7. T ACT 4. N AAT 4. S AGT 1. + I ATC 4. T ACC 5. N AAC 3. S AGC 7. + I ATA 1. T ACA 1. K AAA 3. R AGA 2. + M ATG 2. T ACG 1. K AAG 2. R AGG 2. + =========================================== + V GTT 11. 
A GCT 13. D GAT 6. G GGT 9. + V GTC 5. A GCC 10. D GAC 9. G GGC 11. + V GTA 6. A GCA 5. E GAA 6. G GGA 12. + V GTG 8. A GCG 5. E GAG 3. G GGG 8. + =========================================== + + + Total codons= 333. + T C A G + + 1 23.32 37.69 28.99 40.06 + 2 37.15 22.31 38.46 36.59 + 3 39.53 40.00 32.54 23.34 + ----- ----- ----- ----- + = 100% 100% 100% 100% + + 1 17.72 29.43 14.71 38.14 = 100% + 2 28.23 17.42 19.52 34.83 = 100% + 3 30.03 31.23 16.52 22.22 = 100% + % 25.33 26.03 16.92 31.73 Observed, overall totals + % 24.44 22.31 20.90 32.35 Expected, even codons per acid + + A C D E F G H I K L + 33. 10. 15. 9. 17. 40. 10. 12. 5. 33. +O-E % 22. 81. -13. -55. 34. 71. 40. -29. -73. 13. + + M N P Q R S T V W Y + 2. 7. 2. 13. 49. 20. 11. 30. 5. 2. +O-E % -74. -51. -88. 0. 165. -11. -42. 40. 18. -81. +Total acids= 325. Molecular weight= 35831. Hydrophobicity= -17.8 + + +? Count from (0-1023) (0) = + + Codon totals over all genes + =========================================== + F TTT 13. S TCT 1. Y TAT 1. C TGT 3. + F TTC 4. S TCC 10. Y TAC 1. C TGC 7. + L TTA 1. S TCA 0. * TAA 1. * TGA 4. + L TTG 4. S TCG 1. * TAG 3. W TGG 5. + =========================================== + L CTT 9. P CCT 1. H CAT 3. R CGT 14. + L CTC 7. P CCC 0. H CAC 7. R CGC 14. + L CTA 0. P CCA 0. Q CAA 4. R CGA 9. + L CTG 12. P CCG 1. Q CAG 9. R CGG 8. + =========================================== + I ATT 7. T ACT 4. N AAT 4. S AGT 1. + I ATC 4. T ACC 5. N AAC 3. S AGC 7. + I ATA 1. T ACA 1. K AAA 3. R AGA 2. + M ATG 2. T ACG 1. K AAG 2. R AGG 2. + =========================================== + V GTT 11. A GCT 13. D GAT 6. G GGT 9. + V GTC 5. A GCC 10. D GAC 9. G GGC 11. + V GTA 6. A GCA 5. E GAA 6. G GGA 12. + V GTG 8. A GCG 5. E GAG 3. G GGG 8. + =========================================== + + + Total codons= 333. 
+ T C A G + + 1 23.32 37.69 28.99 40.06 + 2 37.15 22.31 38.46 36.59 + 3 39.53 40.00 32.54 23.34 + ----- ----- ----- ----- + = 100% 100% 100% 100% + + 1 17.72 29.43 14.71 38.14 = 100% + 2 28.23 17.42 19.52 34.83 = 100% + 3 30.03 31.23 16.52 22.22 = 100% + % 25.33 26.03 16.92 31.73 Observed, overall totals + % 24.44 22.31 20.90 32.35 Expected, even codons per acid + + A C D E F G H I K L + 33. 10. 15. 9. 17. 40. 10. 12. 5. 33. +O-E % 22. 81. -13. -55. 34. 71. 40. -29. -73. 13. + + M N P Q R S T V W Y + 2. 7. 2. 13. 49. 20. 11. 30. 5. 2. +O-E % -74. -51. -88. 0. 165. -11. -42. 40. 18. -81. +Total acids= 325. Molecular weight= 35831. Hydrophobicity= -17.8 + +.END LIT +.LEFT MARGIN1 +@24. TX 3 @ Plot base composition +.LEFT MARGIN2 +.para +This option plots the base composition of the sequence. The counts for +any combination of bases can be plotted. +.para +If dialogue is requested the user is presented with a check box for +selecting which bases should be counted, and then allowed to define a +window length, and a "plot interval". Otherwise, the AT composition is +plotted with a window of 101 and a plot interval of 5. +.para +Typical dialogue follows. +.lit +? Menu or option number=d24 + Plot base composition + +checkbox: those set are marked X +X 1 T + 2 C +X 3 A + 4 G +? 0,1,2,3,4 =1 + +checkbox: those set are marked X + 1 T + 2 C +X 3 A + 4 G +? 0,1,2,3,4 =3 + +checkbox: those set are marked X + 1 T + 2 C + 3 A + 4 G +? 0,1,2,3,4 =2 + +checkbox: those set are marked X + 1 T +X 2 C + 3 A + 4 G +? 0,1,2,3,4 =4 + +checkbox: those set are marked X + 1 T +X 2 C + 3 A +X 4 G +? 0,1,2,3,4 = + +? odd span length (1-201) (31) = +? plot interval (1-11) (5) = + + missing graphics + + + +.end lit +.left margIN1 +@25. TX 3 @ Plot local deviations in base composition +.LEFT MARGIN2 +.para +The "local deviation" routines are designed to indicate the similarity of +the compositions of different parts of the sequence. 
The composition of +every segment of the sequence is compared with a standard composition. +The levels of similarity are plotted as chi squared values. The standard +can be the composition of the whole sequence, or alternatively that of a +small segment defined by the user. +.para +If dialogue is forced, define the standard region, the window length and +the plot interval. Otherwise the composition of the whole sequence is +taken as a standard. The maximum and minimum observed values of the chi +squared calculation are displayed, and plots will always exactly fill the +available box. Any unusual regions will show as peaks. +.para +The following measure is used: for each window position +calculate (sum((obs-exp)*(obs-exp))/(exp*exp)) +where obs is the observed composition +and exp is the expected composition (the composition of the standard). + The calculation is performed once to find out the range of values and is +then repeated and +plotted so that the plot exactly fills the allocated screen space. +.left margIN1 +@26. TX 3 @ Plot local deviations from dinucleotide composition +.LEFT MARGIN2 +.para +The "local deviation" routines are designed to indicate the similarity of +the compositions of different parts of the sequence. The dinucleotide +composition of every segment of the sequence is compared with a +standard composition. The levels of similarity are plotted as chi +squared values. The standard can be the composition of the whole +sequence, or alternatively that of a small segment defined by the user. +.para +If dialogue is forced, define the standard region, the window length and +the plot interval. Otherwise the composition of the whole sequence is +taken as a standard. The maximum and minimum observed values of the chi +squared calculation are displayed, and plots will always exactly fill the +available box. Any unusual regions will show as peaks. 
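The windowed comparison used by the "local deviation" options can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's code: the function names are hypothetical, the measure quoted for option 25 is read as a sum over words of (obs-exp)*(obs-exp)/(exp*exp), compositions are taken as fractions, and words absent from the standard are skipped to avoid dividing by zero.

```python
from collections import Counter


def composition(segment, k=1):
    """Fractional composition of the overlapping words of length k."""
    words = [segment[i:i + k] for i in range(len(segment) - k + 1)]
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()} if total else {}


def deviation(observed, expected):
    """Sum of (obs-exp)^2 / exp^2 over the words present in the standard."""
    return sum(
        (observed.get(w, 0.0) - e) ** 2 / (e * e)
        for w, e in expected.items()
        if e > 0
    )


def local_deviation(seq, window, interval, k=2, standard=None):
    """Return a list of (0-based window start, score) pairs.

    k=1, 2 or 3 gives base, dinucleotide or trinucleotide composition.
    If no standard segment is given, the whole sequence is the standard.
    """
    expected = composition(standard if standard is not None else seq, k)
    scores = []
    for start in range(0, len(seq) - window + 1, interval):
        observed = composition(seq[start:start + window], k)
        scores.append((start, deviation(observed, expected)))
    return scores
```

Taking the minimum and maximum of the returned scores gives the range needed to scale a plot so that it fills the available box.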
+.para +The following measure is used: for each window position +calculate (sum((obs-exp)*(obs-exp))/(exp*exp)) +where obs is the observed composition +and exp is the expected composition (the composition of the standard). + The calculation is performed once to find out the range of values and is +then repeated and +plotted so that the plot exactly fills the allocated screen space. +.left margin1 +@27. TX 3 @ Plot local deviations from trinucleotide composition +.LEFT MARGIN2 +.para +The "local deviation" routines are designed to indicate the similarity of +the compositions of different parts of the sequence. The trinucleotide +composition of every segment of the sequence is compared with a +standard composition. The levels of similarity are plotted as chi +squared values. The standard can be the composition of the whole +sequence, or alternatively that of a small segment defined by the user. +.para +If dialogue is forced, define the standard region, the window length and +the plot interval. Otherwise the composition of the whole sequence is +taken as a standard. The maximum and minimum observed values of the chi +squared calculation are displayed, and plots will always exactly fill the +available box. Any unusual regions will show as peaks. +.para +The following measure is used: for each window position +calculate (sum((obs-exp)*(obs-exp))/(exp*exp)) +where obs is the observed composition +and exp is the expected composition (the composition of the standard). + The calculation is performed once to find out the range of values and is +then repeated and +plotted so that the plot exactly fills the allocated screen space. +.left margin1 +@28. TX 5 @ Calculate codon constraint +.left margin2 +.para +The purpose of this option (which is somewhat specialised) is to measure +the level of constraint imposed on the sequence by coding for a protein of +the observed composition. 
It measures the strength of the codon bias +averaged over windows of 99 codons and displays the values observed. +.para +Select between defining segments at the keyboard or using an EMBL +feature table. Finish selecting segments by typing a zero start. The value +for each segment is displayed: +.para + Mean (W-EW) / EWD, window 99 10.5 +.para +The codon constraint is the +difference between the observed codon improbability and the mean +improbability for +a sequence of the same composition. See McLachlan, Staden and Boswell, +Nucl. Acids Res. 1984. + +.left margin1 +@59. TX 3 @ Plot negentropy +.LEFT MARGIN2 +.para +This routine is designed to show regions of the sequence that differ in +composition from others, and hence is like the "plot deviation.." routines. +.para +Negentropy or information is defined in the following way: let Pi be the +probability of observing base i, where i = A,C,G or T, then the average +information per base is +I=-sum(Pi.Log(Pi)) (sum over all i). This routine calculates Pi by +calculating the overall composition for the sequence and then plots I for +windows of length defined by the user. +.left margin1 +@30. TX 4 @ Search for hairpin loops +.LEFT MARGIN2 +.para +Used to find simple inverted repeats or potential hairpin loops. + The loops are defined by a range of sizes for +the loop and a minimum number of consecutive base pairs in the stem. +The results can be presented graphically or listed. +A-T, G-C and G-T basepairs are counted. +.para +Define the range of loop sizes and the minimum number of consecutive +basepairs required. Choose between plotted or listed results. +.para +The loops found are plotted as blips on a +horizontal line that represents the sequence; the heights of the lines are +proportional to the number of basepairs in the stems. Note that only +uninterrupted stems are found - i.e. all basepairs must be made. 
To look +for stems with some unpaired bases (or for palindromes) use the inverted +repeat motif class in the pattern searching option. +.para +Typical dialogue follows. +.lit +? Menu or option number=30 + Search for hairpin loops +Define the range of loop sizes +? Minimum loop size (1-30) (1) = +? Maximum loop size (3-60) (3) = +? Minimum number of basepairs (2-20) (6) = +? (y/n) (y) Plot results n + Searching + + T.G + G-C + G.T + T.G + C-G + G-C + T.G + C-G + G.T + GCCGCA GCGGAGG + 49 + + G + G-C + T.G + C-G + G.T + T.G + G-C + CTGCTG GGAGGTC + 56 + + + G + T.G + G-C + G.T + T.G + C-G + G-C + T-A + T.G + AGCGCA CGACTGA + 139 + + A C + G.T + C-G + G.T + C-G + C-G + G-C + TTCGCT CAACGCC + 244 + +.end lit +.LEFT MARGIN1 +@31. TX 4 @ Search for long range inverted repeats +.LEFT MARGIN2 +.para +Searches for inverted repeats. The repeats found are exact matches of at +least 6 consecutive bases. Results can be presented graphically or listed. +Plotted results show the end points of repeats joined by rectangular +lines. +.para +If dialogue is not requested the defaults will be taken. Otherwise choose +between plotted or listed results. If required select to analyse a +restricted segment of the currently active region. Choose a repeat length. +.para +Typical dialogue follows. +.lit +? Menu or option number=D31 + Plot long-range inverted repeats +? (y/n) (y) Plot results n +Define restricted region +? start (1-1023) (1) = +? end (2-1023) (1023) = +? Minimum inverted repeat (6-30) (12) =10 + Searching + 27 909 10 TGCCCAGAGA + +.end lit +.LEFT MARGIN1 +@32. TX 4 @ Search for repeats +.LEFT MARGIN2 +.para +Searches for direct repeats. The repeats found are exact matches of at +least 6 consecutive bases. Results can be presented graphically or listed. +Plotted results show the end points of repeats joined by rectangular +lines. +.para +If dialogue is not requested the defaults will be taken. Otherwise choose +between plotted or listed results. 
If required select to analyse a +restricted segment of the currently active region. Choose a repeat length. +.para +Typical dialogue follows. + +.lit + ? Menu or option number=D32 + Plot repeats +? (y/n) (y) Plot results n +Define restricted region +? start (1-1023) (1) = +? end (2-1023) (1023) = +? Minimum repeat (6-30) (12) =8 + Searching + 619 988 8 GCTGTTGT + 514 646 8 GCTGCTAA + 94 865 8 TCCGCTGG + 146 222 9 GTGGCTGGC + 455 497 8 TCGCCCTC + 454 496 9 CTCGCCCTC + 872 875 8 GCCGCCGC + 510 615 8 CGTTGCTG + 152 913 8 GGCAGCGA + 199 265 8 CGTCGAGG + 689 794 8 AGTTTGGG + 147 223 8 TGGCTGGC + 101 116 8 GACGAGGA + 8 690 8 GTTTGGGC + 52 141 8 TGCTGGTG + +.end lit +.left margin1 +@33. TX 4 @ Search for z dna (total ry, yr) +.LEFT MARGIN2 +.para +Searches for segments of the sequence that might form Z DNA. A window +length is chosen and the number of RY and YR dinucleotides within each +window is plotted. The top of the box corresponds to all RY or YR, the +bottom to zero RY or YR. +.para +If dialogue is requested, select a window length and plot interval. +Otherwise the defaults will be used. +.para +The program contains three +separate ways of doing this (options 33,34,35). +.left margin1 +@34. TX 4 @ Search for z dna (runs of ry, yr) +.LEFT MARGIN2 +.para +Searches for segments of the sequence that might form Z DNA. Results +are plotted. +.para +If dialogue is requested define a window length and plot interval. +Otherwise the defaults will be used. + The routine +counts the number of R in positions 1,3,5 etc =R1, the number of Y in +positions 2,4,6 etc =Y1, the number of Y in positions 1,3,5 etc =Y2 and +the +number of R in positions 2,4,6 etc =R2 for a window length. It plots the +maximum of R1+Y1 and R2+Y2 relative to a minimum of (window +length)/2 and a +maximum of (window length). (see 33,35). +.LEFT MARGIN1 +@35. TX 4 @ Search for z dna (best phased value) +.LEFT MARGIN2 +.para +Searches for segments of the sequence that might form Z DNA. Results +are plotted. 
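For comparison between the three Z DNA options, the counts described above for options 33 and 34 can be sketched in Python for a single window. This is an illustration only: the function names are hypothetical, and it assumes R means a purine (A or G) and Y a pyrimidine (C or T).

```python
PURINES = set("AG")      # R
PYRIMIDINES = set("CT")  # Y


def ry_yr_total(window):
    """Option 33 style count: RY or YR dinucleotides in the window."""
    w = window.upper()
    return sum(
        1
        for a, b in zip(w, w[1:])
        if (a in PURINES and b in PYRIMIDINES)
        or (a in PYRIMIDINES and b in PURINES)
    )


def alternation_score(window):
    """Option 34 style score: the larger of R1+Y1 and R2+Y2."""
    w = window.upper()
    odd = w[0::2]   # positions 1, 3, 5, ...
    even = w[1::2]  # positions 2, 4, 6, ...
    r1 = sum(b in PURINES for b in odd)
    y1 = sum(b in PYRIMIDINES for b in even)
    y2 = sum(b in PYRIMIDINES for b in odd)
    r2 = sum(b in PURINES for b in even)
    return max(r1 + y1, r2 + y2)
```

For a window containing only A, C, G and T, the four counts R1, Y1, Y2 and R2 add up to the window length, which is why the option 34 score cannot fall below half the window length.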
+.para +If dialogue is requested, define a window length and a plot interval. +Otherwise the default values will be used. +.para + The routine +counts the number of consecutive RY or YR dinucleotides in phase. It +moves +through the sequence counting the number of RY or YR dinucleotides; when +the next dinucleotide is not of the correct type the score is set back to +zero and the search restarted using the current base to set the phase. The +plots are done relative to a minimum of zero and a maximum defined by +the +user. (See 33,34). +.LEFT MARGIN1 +@36. TX 4 @ Local similarity or complementarity search +.LEFT MARGIN2 +.PARA +This function is designed to find segments of +local similarity or complementarity. It is therefore like performing +a DIAGON +plot that is +restricted to regions near the main diagonal. Results can be presented +graphically or listed. +.para +Users define +a region to search through, +a span length, a range for searching through and a cut-off score. The +program takes all sections of sequence +of length span within the defined region + and compares them to +all other sequences within the region and +range specified. +If a match above the cutoff is found, the position +of the two sections of sequence and the score are shown in the +following way. +If we have a 70% +match between +a sequence that starts at p1 and a sequence that starts at p2 +the program draws a +diagonal line that starts at p1 with height 70% of the box and which +finishes at p2 with +height 0. +The matches can also be listed. +.para +Here I define the terms range, region, and span and what is compared. +Suppose we have a defined region j1 to j2, a range of i1 to i2 and a span +of +s; the program will take, in turn, all sections of sequence of length s +within j1 and j2 and compare them to all sequences that start a distance +i1+s-1 +to i2+s-1 away from them. 
First it will take the sequence of length s +starting +at j1 and compare it +with the sequence of length s starting at +j1+s-1+i1, then j1+s-1+i1+1, etc up to j1+s-1+i2; then it will take the +sequence of length s starting at j1+1 and compare it with the sequence +starting at j1+s-1+1+i1 etc. This continues until we hit + the right hand end of the +sequence as defined by j2. Note 1)that sequences are not compared with +themselves: the nearest sequence compared to a span s starting at j +starts +at j+s; 2) ranges i1 and i2 are ranges of start positions; 3) by choosing a +range greater than the length of the sequence this routine will do a full +DIAGON analysis except for those points within a distance span of + the main diagonal (see note 1). +.para +Typical dialog follows. +.lit + +? Menu or option number=36 + Search for local similarity or complementarity +? (y/n) (y) Find direct repeats +? (y/n) (y) Keep picture n +? Span (5-200) (15) = +Define restricted region +? start (0-1023) (1) = +? end (2-1023) (1023) = +? Percent match (1.00-100.00) (70.00) = +? Range start (1-50) (1) = +? Range end (1-50) (1) =5 +? (y/n) (y) Plot results n + Working + + + 118 128 + CGAGGAGGAG GTGGA + ** ***** ** ** + GGACGAGGAC GTCGA + 100 110 + + + 119 129 + GAGGAGGAGG TGGAT + ** ***** * * ** + GACGAGGACG TCGAC + 101 111 +? (y/n) (y) Find direct repeats n +? (y/n) (y) Keep picture +? Span (5-200) (15) = +Define restricted region +? start (0-1023) (1) = +? end (2-1023) (1023) = +? Percent match (1.00-100.00) (70.00) = +? Range start (1-50) (1) = +? Range end (1-50) (5) =8 +? (y/n) (y) List results + + Working + + + 178 188 + ACTCAGATCC GGCGG + ***** *** * ** + ACTCAAATCA GTCGC + 156 166 + + + 177 187 + CACTCAGATC CGGCG + ***** *** * ** + AACTCAAATC AGTCG + 157 167 +? (y/n) (y) Find inverted repeats ! +.end lit + +.left margin1 +@37. TX 5 @ Set genetic code +.LEFT MARGIN2 +.para +This function allows the user to change the current active genetic code +for +all the options. 
The user may select: the standard code, the mammalian +mitochondrial code, the yeast mitochondrial code or a personal code +(define +your own). +.para +Select code. If personal, define a codon and select an amino acid. When all +codons have been reset, define a blank codon. +.para +The code differences are: +.lit + Mammalian Yeast + Codon Mitochondrial Mitochondrial Standard + UGA W W STOP + AUA M M I + CUA L T L + AGA STOP R R + AGG STOP R R +.END LIT +.para +Typical dialogue follows. + +.lit +? Menu or option number=37 +X 1 Standard code + 2 Mammalian mitochondrial code + 3 Yeast mitochondrial code + 4 Personal code +? 0,1,2,3,4 =2 + +? Menu or option number=37 +X 1 Standard code + 2 Mammalian mitochondrial code + 3 Yeast mitochondrial code + 4 Personal code +? 0,1,2,3,4 =4 +Define genetic code by typing a codon +followed by a 1 letter amino acid symbol +? Codon=TTT +Default Amino acid symbol=F +? Amino acid symbol=W +? Codon= +.end lit + +.left margin1 +@38. T 3 4 @ Examine repeats +.left margin2 +.para +This function can be used to examine the frequencies of repeated words +within a sequence. It finds all words that occur more than once. The +user selects a minimum word length and the program finds all words of that +length that occur more than once; then it "follows" each repeated word until it +becomes unique. For each word length it can report the number of different +repeated words, the number of occurrences of each word, and their actual +positions and sequences. +.para +It is possible that the algorithm may run out of memory, particularly if a short +minimum word length is chosen, or if the sequence is very long or very +repetitive. If this occurs the longest reported word length will not +necessarily be the longest in the sequence: the memory will have been consumed +before the longest word is found. +.lit +Typical dialogue and output are shown below. + + Expected length of longest repeat 14 + ? Minumim word length (1-6) (6) =6 + Working + ? 
Show repeat frequencies for words of at least length (6-15) (15) =10 + For length 10 the number of different repeated words is 2035 + For length 11 the number of different repeated words is 613 + For length 12 the number of different repeated words is 161 + For length 13 the number of different repeated words is 37 + For length 14 the number of different repeated words is 10 + For length 15 the number of different repeated words is 1 + ? Show repeats for words of length (6-15) (15) =14 + ? Show repeats for words occuring with frequency (2-9999) (2) =2 + + ggtgctcatgccca + occurs at 21611 + occurs at 21851 + ttatccggtgatga + occurs at 4604 + occurs at 8806 + agcaccacgctgac + occurs at 5954 + occurs at 9486 + catgacggaggatg + occurs at 10480 + occurs at 19925 + aaagacgggaaaat + occurs at 11820 + occurs at 43157 + tacaaaaccaattt + occurs at 26797 + occurs at 31369 + cgagaaagagtgcg + occurs at 4260 + occurs at 44305 + gccggatgatggcg + occurs at 7893 + occurs at 16638 + atgacggaggatga + occurs at 10481 + occurs at 19926 + gcggcgaacgaggc + occurs at 11352 + occurs at 18718 + ? Show repeats for words of length (6-15) (15) =! + +Example of not enough memory +---------------------------- + + Expected length of longest repeat 14 + ? Minumim word length (1-6) (6) =1 + Working + Not enough memory + Memory used in bytes 1125996. Length of longest repeat 5 + ? Show repeat frequencies for words of at least length (1-5) (5) =! + +.end lit +.left margin1 +@39. TX 5 @ Translate and list in upto six phases +.LEFT MARGIN2 +.para +This is a general listing function that will perform translations and +produce several forms of output. The possibilities are: +.lit +1) no translation, list one or two strands, two ways of numbering the +sequence. +2) translation, one or two strands, one or three letter codes. + Positions defined by: + a) open reading frames of some minimum length l, l can be 0, hence giving +a complete six phase translation. 
+ b) positions typed on keyboard, again 1 to 6 phases, translations appearing +above and below the dna. + c) positions read from a feature table. + +It should be used in preference to option 5. For publication +without a translation, the option to number ends of lines is more compact +than option 5. Some examples and typical dialogue are given below. Note the +requirement for d39. + +? Menu or option number=D39 +Find open reading frames, translate and list +? (y/n) (y) Show translation + +The segments to translate can be + 1 Typed on the keyboard + 2 Read from a feature table +X 3 Open reading frames +? 1,2,3 = +? Minimum open frame in amino acids (0-7238) (30) = +? (y/n) (y) Use 1 letter codes +Define section of DNA to display +? start (1-7238) (1) = +? end (2-7238) (7238) =300 +? Line length (30-120) (60) = +Which strands should be shown +X 1 + strand only + 2 - strand only + 3 Both strands +? 1,2,3 =3 +? (y/n) (y) Number ends of lines + + + N A T T I S R I D A T F S A R A P N E N + AACGCTACTACTATTAGTAGAATTGATGCCACCTTTTCAGCTCGCGCCCCAAATGAAAAT 60 + . : . : . : . : . : . : + TTGCGATGATGATAATCATCTTAACTACGGTGGAAAAGTCGAGCGCGGGGTTTACTTTTA + * S A G W I F I + A V V I L L I S A V K E A R A G F S F + + I A K Q V I D H L R N V S N G Q T K S T + L N R L L T I C E M Y L M V K L N L L + ATAGCTAAACAGGTTATTGACCATTTGCGAAATGTATCTAATGGTCAAACTAAATCTACT 120 + . : . : . : . : . : . : + TATCGATTTGTCCAATAACTGGTAAACGCTTTACATAGATTACCAGTTTGATTTAGATGA + Y S F L N N V M Q S I Y R I T L S F R S + I A L C T I S W K R F T D L P * V L D V + + R S Q N W E S T V T W N E T S R H R T L + V R R I G N Q L L H G M K L P D T V L * + CGTTCGCAGAATTGGGAATCAACTGTTACATGGAATGAAACTTCCAGACACCGTACTTTA 180 + . : . : . : . : . : . : + GCAAGCGTCTTAACCCTTAGTTGACAATGTACCTTACTTTGAAGGTCTGTGGCATGAAAT + T R L I P F + R E C F Q S D V T V H F S V E L C R V K + + V A Y L K H V E L Q H Q I Q Q L S S K P + GTTGCATATTTAAAACATGTTGAGCTACAGCACCAGATTCAGCAATTAAGCTCTAAGCCA 240 + . : . : . : . : . : . 
: + CAACGTATAAATTTTGTACAACTCGATGTCGTGGTCTAAGTCGTTAATTCGAGATTCGGT + T A Y K F C T S S C C W I + + S A K M T S Y Q K E Q L K V L S N P D L + TCCGCAAAAATGACCTCTTATCAAAAGGAGCAATTAAAGGTACTCTCTAATCCTGACCTG 300 + . : . : . : . : . : . : + AGGCGTTTTTACTGGAGAATAGTTTTCCTCGTTAATTTCCATGAGAGATTAGGACTGGAC + + +? Menu or option number=D39 +Find open reading frames, translate and list +? (y/n) (y) Show translation N +Define section of DNA to display +? start (1-7238) (1) = +? end (2-7238) (7238) =300 +? Line length (30-120) (60) = +Which strands should be shown +X 1 + strand only + 2 - strand only + 3 Both strands +? 1,2,3 = +? (y/n) (y) Number ends of lines + + + AACGCTACTACTATTAGTAGAATTGATGCCACCTTTTCAGCTCGCGCCCCAAATGAAAAT 60 + + ATAGCTAAACAGGTTATTGACCATTTGCGAAATGTATCTAATGGTCAAACTAAATCTACT 120 + + CGTTCGCAGAATTGGGAATCAACTGTTACATGGAATGAAACTTCCAGACACCGTACTTTA 180 + + GTTGCATATTTAAAACATGTTGAGCTACAGCACCAGATTCAGCAATTAAGCTCTAAGCCA 240 + + TCCGCAAAAATGACCTCTTATCAAAAGGAGCAATTAAAGGTACTCTCTAATCCTGACCTG 300 + + +? Menu or option number=D39 +Find open reading frames, translate and list +? (y/n) (y) Show translation +The segments to translate can be + 1 Typed on the keyboard + 2 Read from a feature table +X 3 Open reading frames +? 1,2,3 = +? Minimum open frame in amino acids (0-7238) (30) =0 +? (y/n) (y) Use 1 letter codes N +Define section of DNA to display +? start (1-7238) (1) = +? end (2-7238) (7238) =300 +? Line length (30-120) (60) = +Which strands should be shown +X 1 + strand only + 2 - strand only + 3 Both strands +? 1,2,3 =3 +? (y/n) (y) Number ends of lines + + + AsnAlaThrThrIleSerArgIleAspAlaThrPheSerAlaArgAlaProAsnGluAsn + ThrLeuLeuLeuLeuValGluLeuMetProProPheGlnLeuAlaProGlnMetLysIle + ArgTyrTyrTyr******Asn***CysHisLeuPheSerSerArgProLys***Lys + AACGCTACTACTATTAGTAGAATTGATGCCACCTTTTCAGCTCGCGCCCCAAATGAAAAT 60 + . : . : . : . : . : . 
: + TTGCGATGATGATAATCATCTTAACTACGGTGGAAAAGTCGAGCGCGGGGTTTACTTTTA + ValSerSerSerAsnThrSerAsnIleGlyGlyLys***SerAlaGlyTrpIlePheIle + Arg************TyrPheGlnHisTrpArgLysLeuGluArgGlyLeuHisPheTyr + AlaValValIleLeuLeuIleSerAlaValLysGluAlaArgAlaGlyPheSerPhe + + IleAlaLysGlnValIleAspHisLeuArgAsnValSerAsnGlyGlnThrLysSerThr + ***LeuAsnArgLeuLeuThrIleCysGluMetTyrLeuMetValLysLeuAsnLeuLeu + TyrSer***ThrGlyTyr***ProPheAlaLysCysIle***TrpSerAsn***IleTyr + ATAGCTAAACAGGTTATTGACCATTTGCGAAATGTATCTAATGGTCAAACTAAATCTACT 120 + . : . : . : . : . : . : + TATCGATTTGTCCAATAACTGGTAAACGCTTTACATAGATTACCAGTTTGATTTAGATGA + TyrSerPheLeuAsnAsnValMetGlnSerIleTyrArgIleThrLeuSerPheArgSer + Leu***ValPro***GlnGlyAsnAlaPheHisIle***HisAspPhe***Ile***Glu + IleAlaLeuCysThrIleSerTrpLysArgPheThrAspLeuPro***ValLeuAspVal + + ArgSerGlnAsnTrpGluSerThrValThrTrpAsnGluThrSerArgHisArgThrLeu + ValArgArgIleGlyAsnGlnLeuLeuHisGlyMetLysLeuProAspThrValLeu*** + SerPheAlaGluLeuGlyIleAsnCysTyrMetGlu***AsnPheGlnThrProTyrPhe + CGTTCGCAGAATTGGGAATCAACTGTTACATGGAATGAAACTTCCAGACACCGTACTTTA 180 + . : . : . : . : . : . : + GCAAGCGTCTTAACCCTTAGTTGACAATGTACCTTACTTTGAAGGTCTGTGGCATGAAAT + ThrArgLeuIleProPhe***SerAsnCysProIlePheSerGlySerValThrSer*** + AsnAlaSerAsnProIleLeuGln***MetSerHisPheLysTrpValGlyTyrLysLeu + ArgGluCysPheGlnSerAspValThrValHisPheSerValGluLeuCysArgValLys + + ValAlaTyrLeuLysHisValGluLeuGlnHisGlnIleGlnGlnLeuSerSerLysPro + LeuHisIle***AsnMetLeuSerTyrSerThrArgPheSerAsn***AlaLeuSerHis + SerCysIlePheLysThrCys***AlaThrAlaProAspSerAlaIleLysLeu***Ala + GTTGCATATTTAAAACATGTTGAGCTACAGCACCAGATTCAGCAATTAAGCTCTAAGCCA 240 + . : . : . : . : . : . 
: + CAACGTATAAATTTTGTACAACTCGATGTCGTGGTCTAAGTCGTTAATTCGAGATTCGGT + AsnCysIle***PheMetAsnLeu***LeuValLeuAsnLeuLeu***AlaArgLeuTrp + GlnMetAsnLeuValHisGlnAlaValAlaGlySerGluAlaIleLeuSer***AlaMet + ThrAlaTyrLysPheCysThrSerSerCysCysTrpIle***CysAsnLeuGluLeuGly + + SerAlaLysMetThrSerTyrGlnLysGluGlnLeuLysValLeuSerAsnProAspLeu + ProGlnLys***ProLeuIleLysArgSerAsn***ArgTyrSerLeuIleLeuThrCys + IleArgLysAsnAspLeuLeuSerLysGlyAlaIleLysGlyThrLeu***Ser***Pro + TCCGCAAAAATGACCTCTTATCAAAAGGAGCAATTAAAGGTACTCTCTAATCCTGACCTG 300 + . : . : . : . : . : . : + AGGCGTTTTTACTGGAGAATAGTTTTCCTCGTTAATTTCCATGAGAGATTAGGACTGGAC + GlyCysPheHisGlyArgIleLeuLeuLeuLeu***LeuTyrGluArgIleArgValGln + ArgLeuPheSerArgLysAspPheProAlaIleLeuProValArg***AspGlnGlyThr + AspAlaPheIleValGlu******PheSerCysAsnPheThrSerGluLeuGlySerArg + + +? Menu or option number=D39 +Find open reading frames, translate and list +? (y/n) (y) Show translation +The segments to translate can be + 1 Typed on the keyboard + 2 Read from a feature table +X 3 Open reading frames +? 1,2,3 =1 +? (y/n) (y) Use 1 letter codes +Define section of DNA to display +? start (1-7238) (1) = +? end (2-7238) (7238) =300 +? Line length (30-120) (60) = +Which strands should be shown +X 1 + strand only + 2 - strand only + 3 Both strands +? 1,2,3 = +? (y/n) (y) Number ends of lines N +Translate +? From (0-300) (0) =101 +? To (1-300) (300) =300 +Translate +? From (0-300) (0) =102 +? To (1-300) (300) =200 +Translate +? 
From (0-300) (0) = + + + AACGCTACTACTATTAGTAGAATTGATGCCACCTTTTCAGCTCGCGCCCCAAATGAAAAT + 10 20 30 40 50 60 + + M V K L N L L + W S N * I Y + ATAGCTAAACAGGTTATTGACCATTTGCGAAATGTATCTAATGGTCAAACTAAATCTACT + 70 80 90 100 110 120 + + V R R I G N Q L L H G M K L P D T V L * + S F A E L G I N C Y M E * N F Q T P Y F + CGTTCGCAGAATTGGGAATCAACTGTTACATGGAATGAAACTTCCAGACACCGTACTTTA + 130 140 150 160 170 180 + + L H I * N M L S Y S T R F S N * A L S H + S C I F K T C + GTTGCATATTTAAAACATGTTGAGCTACAGCACCAGATTCAGCAATTAAGCTCTAAGCCA + 190 200 210 220 230 240 + + P Q K * P L I K R S N * R Y S L I L T C + TCCGCAAAAATGACCTCTTATCAAAAGGAGCAATTAAAGGTACTCTCTAATCCTGACCTG + 250 260 270 280 290 300 + + +? Menu or option number=D39 +Find open reading frames, translate and list +? (y/n) (y) Show translation +The segments to translate can be + 1 Typed on the keyboard + 2 Read from a feature table +X 3 Open reading frames +? 1,2,3 =2 +? Embl feature table file=1.FT +? (y/n) (y) Use 1 letter codes +Define section of DNA to display +? start (1-7238) (1) = +? end (2-7238) (7238) =300 +? Line length (30-120) (60) = +Which strands should be shown +X 1 + strand only + 2 - strand only + 3 Both strands +? 1,2,3 =3 +? (y/n) (y) Number ends of lines + + + N A T T I S R I D A T F S A R A P N E N + AACGCTACTACTATTAGTAGAATTGATGCCACCTTTTCAGCTCGCGCCCCAAATGAAAAT 60 + . : . : . : . : . : . : + TTGCGATGATGATAATCATCTTAACTACGGTGGAAAAGTCGAGCGCGGGGTTTACTTTTA + * S A G W I F I + A V V I L L I S A V K E A R A G F S F + + I A K Q V I D H L R N V S N G Q T K S T + L N R L L T I C E M Y L M V K L N L L + ATAGCTAAACAGGTTATTGACCATTTGCGAAATGTATCTAATGGTCAAACTAAATCTACT 120 + . : . : . : . : . : . : + TATCGATTTGTCCAATAACTGGTAAACGCTTTACATAGATTACCAGTTTGATTTAGATGA + Y S F L N N V M Q S I Y R I T L S F R S + I A L C T I S W K R F T D L P * V L D V + + R S Q N W E S T V T W N E T S R H R T L + V R R I G N Q L L H G M K L P D T V L * + CGTTCGCAGAATTGGGAATCAACTGTTACATGGAATGAAACTTCCAGACACCGTACTTTA 180 + . : . : . : . : . : . 
: + GCAAGCGTCTTAACCCTTAGTTGACAATGTACCTTACTTTGAAGGTCTGTGGCATGAAAT + T R L I P F + R E C F Q S D V T V H F S V E L C R V K + + V A Y L K H V E L Q H Q I Q Q L S S K P + GTTGCATATTTAAAACATGTTGAGCTACAGCACCAGATTCAGCAATTAAGCTCTAAGCCA 240 + . : . : . : . : . : . : + CAACGTATAAATTTTGTACAACTCGATGTCGTGGTCTAAGTCGTTAATTCGAGATTCGGT + T A Y K F C T S S C C W I + + S A K M T S Y Q K E Q L K V L S N P D L + TCCGCAAAAATGACCTCTTATCAAAAGGAGCAATTAAAGGTACTCTCTAATCCTGACCTG 300 + . : . : . : . : . : . : + AGGCGTTTTTACTGGAGAATAGTTTTCCTCGTTAATTTCCATGAGAGATTAGGACTGGAC + * L Y E R I R V Q + * F S C N F T S E L G S R +.end lit +.left margin1 +@40. TX 5 @ Translate and write the protein sequence to disk +.LEFT MARGIN2 +.para +This routine allows the user to translate sections of the sequence into +the +1 letter amino acid codes and store the resulting amino acid sequences in +a disk file. +Two modes of use are possible. Either all open reading frames of at least +some minimum length will +automatically be found and translated, or the user can specify that +particular segments be translated. +.para +Mode 1: the user selects to translate all open reading frames. +.para +Either, or both, strands can be +translated. + The output file is in the same format as a PIR .seq file. +Each protein segment is given an entry name that is its start base in +the DNA, and a title that includes its end position, +reading frame and strand (+ for plus, - for minus). +Each segment is terminated by * whether or not +there is a stop codon in the DNA. The file is therefore suitable for input +to FASTA, ALIGNL and ANALYSEPL. +.para +Mode 2: the user selects to identify the segments to translate. +.para +Either, or both, strands can be +translated. +If multiple coding regions +are translated each will be separated from the previous one by a gap of 5 +dashes (-----). +The sections to translate can be +defined from the keyboard or by supplying the name of the appropriate +EMBL +library feature table. 
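As a rough illustration of the translation step described above (it is not the package's own code), the following Python sketch translates a chosen segment into 1 letter codes and terminates it with * whether or not a stop codon is present. The names CODON_TABLE, translate and translate_segment are hypothetical, and the standard genetic code is assumed.

```python
# Sketch only: standard genetic code, bases ordered T, C, A, G as in the
# codon tables printed elsewhere in this document.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string, codon by codon, into 1 letter codes.

    Codons containing anything other than T, C, A, G become 'X'
    (an assumption of this sketch); a trailing partial codon is ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def translate_segment(dna, start, end):
    """Translate bases start..end (1-based, inclusive) on the + strand.

    The result always ends with '*', mirroring the rule that each
    segment is terminated by * whether or not the DNA has a stop codon.
    """
    protein = translate(dna[start - 1:end])
    return protein if protein.endswith("*") else protein + "*"
```

Applied to the first 60 bases of the example sequence used in the listings above, translate gives the same frame 1 residues as the program output (N A T T I S R I D A T F S A R A P N E N).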
+.para +Typical dialogue follows. +.lit +? Menu or option number=40 + Translate and write protein sequence to disk +? (y/n) (y) Translate selected regions +? (y/n) (y) Define segments using keyboard +Translate +? From (0-1023) (0) =1 +? To (1-1023) (1023) =111 +? (y/n) (y) + strand +Translate +? From (0-1023) (0) = +? Output file name=1.OUT + + ? Menu or option number=40 + Translate and write protein sequence to disk +? (y/n) (y) Translate selected regions n +? Minimum open frame in amino acids (5-1000) (30) = + +X 1 + strand only + 2 - strand only + 3 Both strands +? 0,1,2,3 =3 +? File name for translation=1.OUT + +? Menu or option number=6 +Page through text files +? Name of file to read=1.OUT +>P1; 25 + 135 1 + + GAQRLLRRSCWCWRCGGRQRTQGSAGRGRRRRGGGG* +>P1; 238 + 486 1 + + IRCRDCGQRRRGIFDLVDDFHVRRHIVLARKLFEAEGTGVHFHISLMGGNIVTAEVTNVR + VDAGADFAAVRMLALFGAVVPH* +>P1; 556 + 795 1 + + + SSTQVRRASAQTSSLQLESIVAVVNVEVFLAAKHSRFYIAVLFAQFGPLLDARLDRGCGK + GAGRRDQWRGGGVDLANGR* +>P1; 796 + 987 1 + + + FGYADHAFHLRSTSRHSDNVKFDSAGRRRCCCFHLVFSLGSDEEGLLARLLVEVTTIRVV + LRG* +>P1; 2 + 163 2 + + NSVWAWCEVPRDYCAAAAGAGGAEVVNGPRDPLDEDVDDEEEVDSALLVAGSD* +>P1; 176 + 391 2 + + PLRSGGGGVEAPETPSGWPARFAAATVANAVEGFSILWMIFTCAVILSLRVNSLKQKGQG + YTFTFRLWEVT* +>P1; 476 + 628 2 + + SLTEPSASPSPTLLLRFSLVLTEGVPNPALRFGVLPLRPAAFNLNPSLLL* +>P1; 629 + 958 2 + + MSRYSWLLNTAGFTSPFCLPSLGRFWTRGLTVAVEKEPAGETNGVEAALTLPMGVSLGML + TMLFTCAPPAAIPIMLSLIPLAAAAAAVSTWCFLWAAMRKACWRACSLR* +>P1; 3 + 293 3 + + IRFGLGVRCPEITAPQLLVLAVRRSSTDPGIRWTRTSTTRRRWIAHCWWLAATDLSSDHS + DPAAEASRLPKLPVAGLLDSLPRLWPTPSRDFRSCG* +>P1; 411 + 521 3 + + CACRRGSRLCSGTYARPLWCSSPSLSPPPRPRQRCC* +>P1; 1020 + 37 1 - + EFGKYNPLTDNSSPTQDHTDGSHLNEQARQQAFLIAAQRKHQVETAAAAAASGIKLNIIG + MAAGGAQVKSMVSIPKLTPIGKVNAASTPLVSPAGSFSTATVKPRVQKRPKLGKQNGDVK + PAVFSSQEYLDIYNSNDGFKLKAAGLSGSTPNLSAGLGTPSVKTKLNLSSNVGEGEAEGS + VRDYCTKEGEHTYRCKVCSRVYTHISNFCRHYVTSHKRNVKVYPCPFCFKEFTRKDNMTA + HVKIIHKIENPSTALATVAAANLAGQPLGVSGASTPPPPDLSGQNSNQSLPATSNALSTS + 
SSSSTSSSSGSLGPLTTSAPPAPAAAAQ* +>P1; 373 + -1 2 - + AKCESVPLSLLLQRVYAQGQYDGARENHPQDRKSLDGVGHSRGSESSRPATGSFGSLDAS + AAGSEWSELKSVAASHQQCAIHLLLVVDVLVQRIPGSVDDLRTASTSSCGAVISGHLTPS + PNRI* +>P1; 517 + 407 2 - + QQRWRGRGGGLSEGLLHQRGRAYVPLQSLLPRLHAH* +>P1; 649 + 518 2 - + QPGIPRHLQQQRWIQVEGCWSERKHAEPECWIRNSLCQNQAES* +>P1; 853 + 650 2 - + HYRNGGWWSAGEKHGQHTQTNAHWQGQRRLHAIGLACRLLFHSHGQAARPEAAQTQTER + RCKTGCV* +>P1; 958 + 854 2 - + SPQRAGAPTSLPHRCPEKTPGGNSSSGGGQRNQT* +>P1; 179 + 78 3 - + VVRTQISRCQPPAMRYPPPPRRRRPRPADPWVR* +>P1; 479 + 363 3 - + GTTAPKRASIRTAAKSAPASTRTLVTSAVTMLPPISEM* +>P1; 791 + 666 3 - + RPLARSTPPPRHWSRLPAPFPQPRSSRASRSGPNWANRTAM* +>P1; 1022 + 819 3 - + SNSASTTRSPTTAHPRRTTRMVVTSTSRRANKPSSSLPRENTRWKQQQRRRPAESNLTLS + EWRLVERR* +End of file +.end lit + +.LEFT MARGIN1 +@41. TX 5 @ Calculate and write codon table to disk +.LEFT MARGIN2 +.para +This routine calculates codon usage tables +for sections of the sequence +and stores the resulting tables on disk. +The sections to translate can be +defined from the keyboard or by supplying the name of the appropriate +EMBL +library feature table. +.para +If required users can add to an existing codon table stored as a disk file. +Choose between storing observed counts or having them normalised so +that the totals for each amino acid sum to 100. Select between defining +segments at the keyboard or using an EMBL feature table. Define +segments. Signal completion with a zero start. Supply a file name. For +each segment the program will display the counts, at the end it will +display the accumulated totals. +.lit + + Typical dialogue follows. +? Menu or option number=41 + Calculate and write codon table to disk +? (y/n) (y) Start with empty table +? (y/n) (y) Show observed counts +? (y/n) (y) Define segments using keyboard +? Count from (0-1023) (0) =1 +? Count to (1-1023) (1023) =111 +? (y/n) (y) + strand + + =========================================== + F TTT 0. S TCT 0. Y TAT 0. C TGT 0. + F TTC 1. S TCC 1. Y TAC 0. 
C TGC 3. + L TTA 1. S TCA 0. * TAA 0. * TGA 1. + L TTG 2. S TCG 0. * TAG 0. W TGG 2. + =========================================== + L CTT 0. P CCT 0. H CAT 0. R CGT 2. + L CTC 0. P CCC 0. H CAC 0. R CGC 2. + L CTA 0. P CCA 0. Q CAA 1. R CGA 1. + L CTG 1. P CCG 0. Q CAG 2. R CGG 2. + =========================================== + I ATT 0. T ACT 0. N AAT 0. S AGT 0. + I ATC 0. T ACC 1. N AAC 0. S AGC 1. + I ATA 0. T ACA 0. K AAA 0. R AGA 1. + M ATG 0. T ACG 0. K AAG 0. R AGG 0. + =========================================== + V GTT 0. A GCT 1. D GAT 0. G GGT 3. + V GTC 0. A GCC 1. D GAC 0. G GGC 1. + V GTA 0. A GCA 0. E GAA 1. G GGA 4. + V GTG 1. A GCG 0. E GAG 0. G GGG 0. + =========================================== +? Count from (0-1023) (0) = + + Codon totals over all genes + =========================================== + F TTT 0. S TCT 0. Y TAT 0. C TGT 0. + F TTC 1. S TCC 1. Y TAC 0. C TGC 3. + L TTA 1. S TCA 0. * TAA 0. * TGA 1. + L TTG 2. S TCG 0. * TAG 0. W TGG 2. + =========================================== + L CTT 0. P CCT 0. H CAT 0. R CGT 2. + L CTC 0. P CCC 0. H CAC 0. R CGC 2. + L CTA 0. P CCA 0. Q CAA 1. R CGA 1. + L CTG 1. P CCG 0. Q CAG 2. R CGG 2. + =========================================== + I ATT 0. T ACT 0. N AAT 0. S AGT 0. + I ATC 0. T ACC 1. N AAC 0. S AGC 1. + I ATA 0. T ACA 0. K AAA 0. R AGA 1. + M ATG 0. T ACG 0. K AAG 0. R AGG 0. + =========================================== + V GTT 0. A GCT 1. D GAT 0. G GGT 3. + V GTC 0. A GCC 1. D GAC 0. G GGC 1. + V GTA 0. A GCA 0. E GAA 1. G GGA 4. + V GTG 1. A GCG 0. E GAG 0. G GGG 0. + =========================================== +? (y/n) (y) Save table in a file n +.end lit + +.left margin1 +@42. TX 6 @ Codon usage method +.LEFT MARGIN2 +.para +Used to find protein coding regions. For each window length of the +sequence the routine measures the closeness to an expected codon usage. +Results are plotted for each of the three reading frames. Stop and start +codons are also marked on the plots. 
It has the highest resolution of all +such methods, but makes the strongest assumption, i.e. that the codon +usage is known. The latest version is described in Methods in Enzymology +183, 193-211. +.para +Choose whether to use an internal standard (i.e. part of the current +sequence known to code for a protein). If so, define its end points, and +those of any others. Otherwise supply the name of a disk file containing a +table of codon usage. Tables are listed. Choose between using the +observed counts, or two types of normalisation: normalised to give an +average amino acid composition; normalised to no amino acid bias. The +first normalisation is clearly often sensible, but the second removes +valuable information and is only made available for special +circumstances. The final table will be displayed, followed by the +expected scores for window lengths 21, 31 and 41 codons. The scores for +each of the three reading frames are shown (they are logarithmic values) +to help users choose a window length for the analysis. Define a window +length and plot interval. Plotting will start. +.para +The method was first described in +Staden and McLachlan Nucl. Acid Res. 10 141-156 (1982) and the +following is a summary of the initial ideas. 
+The method makes the following main assumptions: the codon +preferences +of all the +genes in the sequence we are examining are similar to those of the +standard; +the sequence is coding +throughout its whole length in only one reading frame; in the coding +frame +the frequency of codon abc has a definite value Fabc +.LEFT MARGIN2 +If we select a sequence a1b1c1a2b2c2a3b3c3,...,anbncnan+1bn+1cn+1 +then the +probability of selecting it in each of the three frames is: +.left margin15 +frame 1: p1=Fa1b1c1.Fa2b2c2....Fanbncn +.left margin15 +frame 2: p2=Fb1c1a2.Fb2c2a3...Fbncnan+1 +.left margin15 +frame 3: p3=Fc1a2b2.Fc2a3b3...Fcnan+1bn+1 +.LEFT MARGIN2 +The probability that selection of a particular sequence was "caused" by it +being a coding sequence is: +.LEFT MARGIN2 +P1=p1/(p1+p2+p3), P2=p2/(p1+p2+p3), P3=p3/(p1+p2+p3). +.LEFT MARGIN2 +The program calculates these values for the given window length but +plots +Log(P/(1-P)) for each of the three frames. At each point along the +sequence +that the program has a +point to plot it finds which of the three values is highest and places a +single point at the 50% level for the corresponding frame. These single +points will join to form a solid line if one frame is consistently the +highest scoring. In addition stop codons are shown as short vertical lines +that bisect the 50% +level of probability. When looking for coding regions +the user should look for solid horizontal lines at the +50% level that are not interrupted by these short vertical lines. +.para +Changes. + Two normalisations are offered: 1) to remove all amino acid +compositional components from the tables, hence leaving only the codon +preference components. In general this is not recommended as the amino +acid +component alone is often sufficient to choose correctly between frames, +but +may be useful in special circumstances. 
2) to change the amino acid +composition components to give an average amino acid composition +rather than +the one contained in the standard (this leaves the codon preference +components unchanged). In general this should be useful as the average +amino acid composition is likely to be closer to the composition of the +genes being hunted than is that of the standard table of codon +preferences. +The average composition +is that recently published by Argos, not the Dayhoff one that we have +used +before. +.para +Typical dialogue follows. +.lit + +? Menu or option number=42 +Staden and McLachlan codon usage method +Codon tables for standards may be read from disk +or calculated from parts of the current sequence +? (y/n) (y) Define internal standard +Define standard +? start (0-1023) (0) =1 +? end (2-1023) (1023) =1000 + =========================================== + F TTT 13. S TCT 1. Y TAT 1. C TGT 3. + F TTC 4. S TCC 10. Y TAC 1. C TGC 7. + L TTA 1. S TCA 0. * TAA 1. * TGA 4. + L TTG 4. S TCG 1. * TAG 3. W TGG 5. + =========================================== + L CTT 9. P CCT 1. H CAT 3. R CGT 14. + L CTC 7. P CCC 0. H CAC 7. R CGC 14. + L CTA 0. P CCA 0. Q CAA 4. R CGA 9. + L CTG 12. P CCG 1. Q CAG 9. R CGG 8. + =========================================== + I ATT 7. T ACT 4. N AAT 4. S AGT 1. + I ATC 4. T ACC 5. N AAC 3. S AGC 7. + I ATA 1. T ACA 1. K AAA 3. R AGA 2. + M ATG 2. T ACG 1. K AAG 2. R AGG 2. + =========================================== + V GTT 11. A GCT 13. D GAT 6. G GGT 9. + V GTC 5. A GCC 10. D GAC 9. G GGC 11. + V GTA 6. A GCA 5. E GAA 6. G GGA 12. + V GTG 8. A GCG 5. E GAG 3. G GGG 8. + =========================================== +Define standard +? start (0-1023) (0) = +Total codons in standard= 333. +X 1 Use observed frequencies + 2 Normalize to average amino acid composition + 3 Normalize to no amino acid bias +? 0,1,2,3 =2 + =========================================== + F TTT 19. S TCT 2. Y TAT 10. C TGT 3. + F TTC 6. S TCC 22. Y TAC 10. 
C TGC 8. + L TTA 2. S TCA 0. * TAA 0. * TGA 0. + L TTG 7. S TCG 2. * TAG 0. W TGG 8. + =========================================== + L CTT 16. P CCT 16. H CAT 4. R CGT 10. + L CTC 12. P CCC 0. H CAC 10. R CGC 10. + L CTA 0. P CCA 0. Q CAA 8. R CGA 7. + L CTG 21. P CCG 16. Q CAG 18. R CGG 6. + =========================================== + I ATT 19. T ACT 13. N AAT 16. S AGT 2. + I ATC 11. T ACC 17. N AAC 12. S AGC 15. + I ATA 3. T ACA 3. K AAA 22. R AGA 1. + M ATG 15. T ACG 3. K AAG 15. R AGG 1. + =========================================== + V GTT 15. A GCT 21. D GAT 14. G GGT 10. + V GTC 7. A GCC 16. D GAC 20. G GGC 13. + V GTA 8. A GCA 8. E GAA 26. G GGA 14. + V GTG 11. A GCG 8. E GAG 13. G GGG 9. + =========================================== +Span length 21 expected mean values: 4.8 -5.7 -4.8 +Span length 31 expected mean values: 7.1 -8.4 -7.2 +Span length 41 expected mean values: 9.5 -11.1 -9.5 +? odd span length (11-101) (25) =41 +? plot interval (1-11) (5) = + + Missing graphics display here + +.end lit + +.left margin1 +@43. TX 6 @ Positional base preference method. +.LEFT MARGIN2 +.para +Used to find protein coding regions. For each window length of the +sequence the routine measures the closeness to an expected pattern of +base frequencies . Results are plotted for each of the three reading +frames. Stop and start codons are also marked on the plots. The method +is particularly useful for showing which reading frame is the most likely +to be coding. The latest version is described in a forthcoming issue of +Methods in Enzymology, but the original ideas were given in +Staden, R. Nucl. Acid Res. 12 551-567 (1984). +.para +If dialogue is requested the following inputs are needed, otherwise the +standard analysis is performed. Choose between a "global" standard, or a +selected one. If the global standard is selected the +expected scores are displayed and the user asked to define a span length +and a plot interval. 
Then users choose between plotting relative or +absolute scores, and can reset the scaling values employed for plotting. +If the global standard is not selected users must define a region of the +sequence to use as a standard, or they can read in a codon table from which +the +program will calculate one. Then they can either use the values +observed in this standard, or they can combine its values for the third +positions in codons with those from the global standard. Next they can +give different weightings to each of the three positions in codons. +.para +In its original form the method + took advantage of the +uneven +use of amino acids by proteins and the structure of the genetic code table +and assumed that there is a typical ("global") +amino acid composition +and no codon preference. The typical amino acid composition is the +average +composition found by Argos (see below). + This composition and no codon preference +determines the frequency of each of the four bases in each of the three +codon positions. This 3x4 frequency table shows unequal use of the bases +and in particular a marked use of G in position 1 and of A in position 2 +(at the expense of G). The routine slides a window along the sequence and +calculates a score for each of the three reading +frames at each window position. It assumes the sequence is coding +throughout its whole length and calculates the probability that it is +coding in each of the three frames. +When tested against all the E. coli sequences in the EMBL sequence +library +it correctly identified the coding frame for 91% of window positions. +(The E. coli +sequences were chosen only for technical reasons: I have no reason to +think +the method would work less well on other organisms with roughly even +base composition.) 
+The routine can plot either absolute or relative values: i.e. absolute values +are the values found by summing the scores for each frame (say p1, p2 +and +p3), and the relative values are then p1/(p1+p2+p3), p2/(p1+p2+p3) and +p3/(p1+p2+p3). +.para +At each point along the sequence +that the program has a +point to plot it finds which of the three values is highest and places a +single point at the 50% level for the corresponding frame. These single +points will join to form a solid line if one frame is consistently the +highest scoring. In addition stop codons are shown as short vertical lines +that bisect the 50% +level of probability. When looking for coding regions +the user should look for solid horizontal lines at the +50% level that are not interrupted by these short vertical lines. + +The absolute mean +values expected on the complement of +the coding strand (and in the same frame) +are 5% lower than those on the coding strand but the relative values +are the same on both strands. Although the +relative values give smoother plots and tend to emphasize the coding +frame, +they therefore cannot be used to decide which strand is coding. The +absolute values plot should be used for this purpose but bearing in mind +the fact that the differences between strands are quite small. +.para +The method has been improved in two overall ways: first it now allows +users to +define their own typical amino acid composition by selecting a standard +sequence from within the sequence they are analysing or from a codon table; +secondly it allows the inclusion of third position preferences. +Again these third position preferences are defined by the use of an +internal standard sequence. Not only can users define their own standards +but they can also give weights to each of the three positions in codons. +This allows different emphasis to be used for each of the three positions. 
+As an example of its use, by giving, in turn, weights of 1.0, 0.0, 0.0, and +0.0, 1.0, 0.0, and finally 0.0, 0.0, 1.0, you could see the separate +contribution made by each of the three positions. It is also possible to +use the third position preferences with the values for the first two +positions taken from the "global" amino acid composition. + In all cases users may choose to plot +absolute or relative values. The expected scores are displayed before +each +analysis and scales are drawn on the plots. +At present this method does not give probabilities of coding; it has only +been tested for its ability to choose the correct reading frame (see +above). It could be used to give probabilities of coding if it was applied to +all known coding and non-coding sequences in the way that the uneven +positional base frequencies method was. It is designed to be used in +conjunction with this method. Note that the average amino acid composition +used +to derive the base frequencies was changed on 17-11-1988, to be + the new average given by McCaldon and Argos in Proteins 4 99-122 +(1988). +A further change is to allow users to select their own scales for +producing the plots. It can be helpful if they want to emphasise or +diminish +certain features. +.para +Typical dialogue follows. +.lit +? Menu or option number=D43 +Positional base preferences method to find protein genes +Select standard source +X 1 Use global standard + 2 Use internal standard + 3 Use codon usage table +? Selection (1-3) (1) =2 +Define region for standard +? start (0-8134) (0) =3171 +? end (3172-8134) (8134) =4700 +Select normalisation +X 1 Use observed frequencies + 2 Combine with global standard +? Selection (1-2) (1) =1 + T C A G Range + 1 0.125 0.249 0.230 0.397 0.272 + 2 0.298 0.245 0.292 0.165 0.132 + 3 0.288 0.313 0.169 0.230 0.144 +? (y/n) (y) Use 1.0 for positional weights +Give weights between 0.0 and 1.0 +to each of the 3 codon positions +? Position 1 (0.00-1.00) (1.00) = +? 
Position 2 (0.00-1.00) (1.00) = +? Position 3 (0.00-1.00) (1.00) = +Expected scores per codon in each frame + 0.136 0.122 0.123 +? odd span length (31-101) (67) = +? plot interval (1-11) (5) = +? (y/n) (y) Plot relative scores +Scaling values: + Minimum maximum range + 0.3121 0.3656 0.0382 +? (y/n) (y) Leave scaling values unchanged + + Graphics not shown + +? Menu or option number=D43 +Positional base preferences method to find protein genes +Select standard source +X 1 Use global standard + 2 Use internal standard + 3 Use codon usage table +? Selection (1-3) (1) =3 +? File name of standard=atpase.cods + =========================================== + F TTT 21. S TCT 33. Y TAT 15. C TGT 5. + F TTC 55. S TCC 40. Y TAC 40. C TGC 4. + L TTA 8. S TCA 7. * TAA 8. * TGA 0. + L TTG 19. S TCG 12. * TAG 1. W TGG 17. + =========================================== + L CTT 22. P CCT 17. H CAT 6. R CGT 73. + L CTC 21. P CCC 4. H CAC 30. R CGC 23. + L CTA 1. P CCA 10. Q CAA 19. R CGA 5. + L CTG 168. P CCG 48. Q CAG 80. R CGG 3. + =========================================== + I ATT 47. T ACT 14. N AAT 17. S AGT 8. + I ATC 98. T ACC 54. N AAC 52. S AGC 26. + I ATA 6. T ACA 7. K AAA 85. R AGA 0. + M ATG 75. T ACG 13. K AAG 28. R AGG 0. + =========================================== + V GTT 67. A GCT 56. D GAT 41. G GGT 90. + V GTC 29. A GCC 53. D GAC 66. G GGC 66. + V GTA 49. A GCA 59. E GAA 101. G GGA 5. + V GTG 57. A GCG 64. E GAG 41. G GGG 8. + =========================================== +Select normalisation +X 1 Use observed frequencies + 2 Combine with global standard +? Selection (1-2) (1) =2 + T C A G Range + 1 0.177 0.211 0.277 0.336 0.159 + 2 0.271 0.238 0.310 0.182 0.128 + 3 0.242 0.301 0.168 0.289 0.132 +? (y/n) (y) Use 1.0 for positional weights +Expected scores per codon in each frame + 0.785 0.736 0.736 +? odd span length (31-101) (67) = +? plot interval (1-11) (5) = +? (y/n) (y) Plot relative scores +Scaling values: + Minimum maximum range + 0.3219 0.3519 0.0214 +? 
(y/n) (y) Leave scaling values unchanged + + Graphics not shown +.end lit +.left margIN1 +@44. TX 6 @ Uneven positional base frequencies. +.LEFT MARGIN2 +.para +Used to find regions of a sequence that might be coding for a protein. The +method looks for sections of the sequence in which the frequency at +which each of the four bases occupies the three positions in codons is +nonrandom. The level of nonrandomness is plotted on a scale that shows +the probability that the sequence is coding. At each position along a +sequence the calculation gives the same value for all six possible reading +frames, so only one value is plotted. +.para +Define the window length and plot interval. +.para +The results are plotted in a box divided by a horizontal line marked "76%". +76% of coding regions achieve values above this line and 76% of +noncoding regions achieve scores below the line. +.para +This method, first described in Staden R. Nucl. Acid Res. 12 551-567 +1984, +looks for uneven positional +usage of bases in codons. +It looks through the sequence in one fixed +phase and counts the number of times each base appears in each of the +three +codon positions: for each window position it counts A1,A2,A3 and +C1,C2,C3 +and G1,G2,G3 and T1,T2,T3 and calculates AMEAN=(A1+A2+A3)/3, and +similarly +CMEAN, GMEAN +and TMEAN; it then calculates +ADIF=abs(A1-AMEAN)+abs(A2-AMEAN)+abs(A3-AMEAN) and similarly +CDIF, GDIF and +TDIF to measure the differences between an even base usage for all +positions in the codons and the observed usage. The routine then +calculates +the sum ADIF+CDIF+GDIF+TDIF and plots this value on the following scale: +the base level is such that no known window in a coding region has a +lower +value, whereas 14% of windows in noncoding sequences score below it. +The +top of the scale is not achieved by any known noncoding +region, but is reached by 16% of known coding regions. 
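The sum ADIF+CDIF+GDIF+TDIF defined above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program's own code: the function name positional_unevenness is hypothetical, raw counts are used exactly as in the formulas above, characters other than A, C, G and T are ignored, and the conversion of the sum onto the plotted scale is not reproduced.

```python
def positional_unevenness(window):
    """Return ADIF+CDIF+GDIF+TDIF for one window read in a fixed phase.

    counts[base][p] is the number of times `base` occupies codon
    position p (0, 1 or 2) within the window.
    """
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for i, base in enumerate(window.upper()):
        if base in counts:  # assumption: ambiguity codes are skipped
            counts[base][i % 3] += 1

    total = 0.0
    for per_position in counts.values():
        mean = sum(per_position) / 3          # e.g. AMEAN=(A1+A2+A3)/3
        total += sum(abs(n - mean) for n in per_position)  # e.g. ADIF
    return total
```

For example, the window ACGACGACG places A, C and G each in a single codon position, giving ADIF=CDIF=GDIF=4 and TDIF=0, whereas AAACCCGGGTTT uses every base evenly across the three positions and scores 0.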
+The bar drawn across the +plot corresponds to a level that is exceeded by 76% of windows in known +coding regions +but is reached by only 24% of windows in known noncoding regions, i.e. +76% of +coding windows score above and 76% of noncoding windows score below. +This is similar to Fickett's method but without +the probabilities and weightings from the Los Alamos sequence library: it +is therefore unbiased but may well give very similar results. +.left margin1 +@45. TX 6 @ Codon improbability on base composition +.LEFT MARGIN2 +.para +Used to find regions of a sequence that might code for a protein. +.para +If dialogue is requested define a window length and plot interval. +.para + The idea of the method is that, of all sequence features +that we know, it is only +coding regions that will give rise to codon biases well above those +expected +from the base composition. +If a region of sequence shows sufficiently strong +codon bias then we conclude that it is coding for a protein. + Using the multinomial distribution we +have derived a function to measure the improbability of observing a +set of codons from a sequence of the given composition. Using the +Poisson +distribution we have worked out the distribution +of the improbability. The program plots the observed improbability minus +the expected improbability (the mean as calculated from the Poisson +distribution). The plots are presented against a scale of units of standard +deviation as measured from the Poisson distribution. As with the other +Staden and McLachlan method the program puts an extra point at a fixed +level for the highest of the three probabilities; for this function this +point is placed at six standard deviations above the mean expected level. +The top of each plot corresponds to 12 standard deviations above the +expected level and the bottom corresponds to the expected value. 
+.para +Analysis of the application +of the method to the EMBL sequence library indicates that the method +does +work for most sequences and that the levels of improbability roughly +correlate with levels of expression. +Coding regions will show high peaks in all three frames making +interpretation more difficult than for some of the other methods. +.left margin1 +@46. TX 6 @ Codon improbability on amino acid composition +.LEFT MARGIN2 +.para +Used to find regions of a sequence that might code for a protein. +.para +If dialogue is requested define a window length and a plot interval. +.para +The idea of the method is that, of all sequence features +that we know, it is only +coding regions that will give rise to codon biases such that, for each +amino acid, some codons are used far more frequently than others. The +method is independent of what the bias actually is, requiring only that it +is present. +If a region of sequence shows sufficiently strong +codon bias then we conclude that it is coding for a protein. + Using the multinomial distribution we +have derived a function to measure the improbability of observing a +set of codons from a sequence of the given composition. Using the +Poisson +distribution we have worked out the distribution +of the improbability. The program plots the observed improbability minus +the expected improbability (the mean as calculated from the Poisson +distribution). The plots are presented against a scale of units of standard +deviation as measured from the Poisson distribution. As with the other +Staden and McLachlan method the program puts an extra point at a fixed +level for the highest of the three probabilities; for this function this +point is placed at six standard deviations above the mean expected level. +The top of each plot corresponds to 12 standard deviations above the +expected level and the bottom corresponds to the expected value. +.left margin1 +@47. 
TX 6 @ Shepherd RNY preference method +.LEFT MARGIN2 +.para +Used to find regions of a sequence that might code for a protein. Based on +the method of Shepherd +(PNAS 78 1596-1600, 1981). +.para +If dialogue is requested define a window length and plot interval. +.para +Shepherd has found that +many genes have a preference for the use of codons of the form RNY +where +R=purine, Y=pyrimidine and N=any base. He has attributed this +to remnants of a primitive genetic code. The calculation is similar to that +for the Staden and McLachlan method, the p1's being simply the number of +RNY codons found in frame 1 etc and the P's being p/(p1+p2+p3). +.left margIN1 +@48. TX 6 @ Ficketts method +.LEFT MARGIN2 +.para +Used to find regions of a sequence that might code for a protein. Based on +the method of Fickett +(Nucl. Acid Res.10 +1982), but plots values for fixed window lengths rather than over the +whole of open reading frames. +.para +If dialogue is requested define a window length and plot interval. The +results are plotted in a box divided into three horizontal strips. +.para +Sections of the sequence with values plotted in the top strip of the box +are adjudged to be coding, those in the middle strip "no decision", and +those in the bottom "not coding". +.para +The program performs the following calculations: let A1 = the number of +occurrences of base A in position 1 of codons, A2 for position 2 etc. +Similarly for bases C,G and T. For each window position calculate +Apos=max(A1,A2,A3)/(min(A1,A2,A3)+1). Similarly for C,G and T to give 4 +positional values. Also count the base composition for the window to +give +Acomp, Ccomp etc. Fickett tested each of these 8 parameters singly as +to +their ability to distinguish coding from noncoding regions and arrived at +probabilities of coding for the range of values each can take = Pcod. He +also measured their relative abilities and gave weightings to each of +the 8 parameters = Pw. 
To calculate the "TESTCODE" for a window we +first lookup the Pcod for each of the calculated compositional and +positional values and then calculate TESTCODE=sum(Pcod*Pw). TESTCODE +is +plotted relative to three levels of decision: the top division="coding", +the middle="no opinion" and the bottom division="non coding". +.left margin1 +@49. TX 6 @ tRNA gene search. +.LEFT MARGIN2 +.para +Used to find segments of a sequence that might code for tRNAs. Looks for +potential cloverleaf forming structures and then for the presence of +expected conserved bases. Presents results graphically or draws out the +cloverleafs. +.para +If dialogue is requested a large number of parameters need to be given +values, including some loop lengths, scores for each of the four stems, +and scores for the conserved bases. +.para +The program was first described in +Staden Nucl. Acid Res 817-825 (1980). + The tRNA's that have + been sequenced so far have two characteristics that can be used +to + locate their genes within long DNA sequences. Firstly they have a + common secondary structure - the cloverleaf - and secondly, + particular bases almost always appear at certain positions in +the + cloverleaf. The cloverleaf is composed of four base-paired +stems + and four loops. Three of the stems are of fixed length but the + fourth, the dhu stem which usually has four base pairs, +sometimes + has only three. All of the loops can vary in size. The following + relationships between the stems in the cloverleaf are assumed in +the + program: (a) there are no bases between one end of the +aminoacyl + stem and the adjoining tuc stem; (b) there are two bases +between + the aminoacyl stem and the dhu stem; (c) there is one base +between + the dhu stem and the anticodon stem; (d) there are at least three + bases between the anticodon stem and the tuc stem. + The program looks first for cloverleaf structure and then, if + required, for conserved bases. 
The sizes of the loops, the number + of basepairs in the stems and the required conserved bases may +all + be specified by the user. The process of looking for the presence + of conserved bases can reduce the number of potential +structures + found considerably. + The + user may also specify that an intron may be present in the +anticodon + loop. +.para +The user may define a minimum number of +base pairs for each stem using the scoring system G-C, A-T=2 and G-T=1 +and +scores for each of the conserved bases. Recommended values for the stem +scores are given by the prompts and the percentage conservation of the +conserved bases as found in the Nucl. Acid Res 1979 paper Gauss, Gruter + and Sprinzl are also given, +but the user must decide which bases are most +likely to be conserved for the sequence being examined. +The output shows the position of the possible gene in the sequence by a +vertical line the height of which shows the number of basepairs made in +the +stems. The cloverleaf structure is also drawn but will scroll up off the +screen. Output of the cloverleafs will look like: +.lit + + 6942 + A + A-U + A-U + G-C + A-U + U-A + A-U + U-A AAU + U UAUCU + AA A !!!!! + AAUG AUAGA A + U !!!! U UCA + C UUAC U + AA A + U-AA A + A-U + A-U + C-G + U-A + U A + U A + GUC + + Typical dialogue follows. + +? Menu or option number=D49 + tRNA search +? Maximum trna length (70-130) (92) = +? Aminoacyl stem score (0-14) (11) = +? Tu stem score (0-10) (8) = +? Anticodon stem score (0-10) (8) = +? D stem score (0-8) (3) = +? Minimum base pairing total (30-32) (32) = +? Minimum intron length (0-30) (0) = +? Minimum length for TU loop (4-12) (6) = +? Maximum length for TU loop (6-12) (9) = +? (y/n) (y) Skip search for conserved bases n +Give a score for each base, then a minimum total at the end +? Base 8, T is 100% conserved. Score (0-100) (0) = +? Base 10, G is 95% conserved. Score (0-100) (0) = +? Base 11, Y is 96% conserved. Score (0-100) (0) = +? 
Base 14, A is 100% conserved. Score (0-100) (0) = +? Base 15, R is 100% conserved. Score (0-100) (0) = +? Base 21, A is 97% conserved. Score (0-100) (0) = +? Base 32, Y is 100% conserved. Score (0-100) (0) = +? Base 33, T is 98% conserved. Score (0-100) (0) = +? Base 37, A is 91% conserved. Score (0-100) (0) = +? Base 48, Y is 100% conserved. Score (0-100) (0) = +? Base 53, G is 100% conserved. Score (0-100) (0) = +? Base 54, T is 95% conserved. Score (0-100) (0) = +? Base 55, T is 97% conserved. Score (0-100) (0) = +? Base 56, C is 100% conserved. Score (0-100) (0) = +? Base 57, R is 100% conserved. Score (0-100) (0) = +? Base 58, A is 100% conserved. Score (0-100) (0) = +? Base 60, Y is 92% conserved. Score (0-100) (0) = +? Base 61, C is 100% conserved. Score (0-100) (0) = +? Minimum total conserved base score (0-0) (0) = +? (y/n) (y) Plot results n + + Searching + + 306 + C + C-G + C-G + G-C + T-A + C-G + A-T + T+G AT + A ATACA + TTC T !!!! G + CTGT TATGG G + G ! ! T GA + C TAAA C + GCG C G + T+GA C + C-G C T + T+G A T + T-A G T + T-A G A + G G G C + A A G A + AGC T C + A T + C T + A + C T + + +.end lit +.left margIN1 +.left margIN1 +@50. TX 7 @ Plot start codons +.left margin2 +.para +This function plots the positions of all start codons for each of the three +reading frames. +.left margin1 +@51. TX 7 @ Plot stop codons +.left margin2 +.para +This function plots the positions of all stop codons for each of the three +reading frames. +.left margIN1 +@52. TX 7 @ Plot stop codons on the complementary strand +.left margin2 +.para +This function plots the positions of all stop codons for each of the three +reading frames on the complementary strand. +.left margin1 +@53. TX 7 @ Plot stop codons on both strands +.left margin2 +.para +This function plots the positions of all stop codons for each of the three +reading frames on both strands. +.left margin1 +@54. 
TX 5 @ Search for longest open reading frames +.left margin2 +.para +This function will report the positions of the ends of +all sections of sequence that contain no stop codons. All six reading +frames are examined. Results are presented in the form of an EMBL feature +table. Hence if the results are stored in a file by use of "direct output +to disk", the file + can be used to translate the +open reading frames in a sequence. +Note that in order for the file to be used as a feature table it +must include either EMBL +or GenBank headers, and a suitable "tail". The simplest header is the word +FEATURES starting in column 1 of the first line of the file. The simplest +tail is 2 empty lines at the end of the file. These lines are not included +when nip writes out results in feature table format. +.para +Define the minimum length of open reading frame to report (in amino +acids). +Choose to search either or both strands. The program displays the end +points, the reading frame and strand. +.para +Typical dialogue follows. +.lit + +? Menu or option number=D54 + Find open reading frames +? Minimum open frame in amino acids (5-1000) (30) =100 + +X 1 + strand only + 2 - strand only + 3 Both strands +? 0,1,2,3 =3 + +FT CDS 1 831 1 831 +FT CDS 1540 2853 1 1314 +FT CDS 3130 4242 1 1113 +FT CDS 5761 6114 1 354 +FT CDS 6187 6711 1 525 +FT CDS 1766 2077 2 312 +FT CDS 2078 2446 2 369 +FT CDS 4136 5500 2 1365 +FT CDS 1335 1637 3 303 +FT CDS 2844 3194 3 351 +FT CDS 6819 7238 3 420 +FT CDS 2073 1711 C 1 363 +FT CDS 2469 2149 C 1 321 +FT CDS 6542 6144 C 3 399 + +.end lit +.left margin1 +@55. TX 8 @ Search for E. coli promoter (general) +.LEFT MARGIN2 +.para +Searches for E. coli promoter-like sequences using a standard weight +matrix. The positions of the matches are plotted. No dialogue is required. +.para +The method was first described in + Staden R. Nucl. Acid Res. 12 505-519 1984. +This search uses a weight matrix taken from the frequency tables +contained +in Hawley, D. K.
and McClure, R., nar 11 2237-2255 (1983). + The weight matrix is +divided into 3 sections that are separated by varying sizes of gap: the - +35 +region, the -10 and the +1 region. +The algorithm first looks for a sufficiently good -35 region, then for the +best -10 region within range and then for the best +1 region within range +of the -10; each separate region must score above the lowest known +score +for the corresponding section. The gap penalty is then applied and two +plots +produced: one with gap penalties, one without. + Scaling is such that no +known promoter scores below the bottom level and no known promoter +scores +above the top level when the weight matrix is applied. +.para +Two other functions also look for E. coli promoters: 92 looks for sites on +the complementary strand and 93 looks for individual -35 and -10 +regions +and plots them on a scale such the top is the highest known value +10% +and +the bottom is the lowest known -10% +.LEFT MARGIN1 +.lit +weights for E. coli promoters +-35 region: +P -50-49-48-47-46-45-44-43-42-41-40-39-38-37-36-35-34-33-32-31-30-29-28-27-26 + +107109109110110110110110110111111110111112112112112112112112112112112112112 +T 41 33 32 25 34 22 35 35 42 27 32 42 47 14 92 94 11 19 15 37 46 34 38 48 34 +C 22 27 18 29 20 14 20 12 22 23 16 25 10 43 7 6 11 18 60 8 25 23 23 17 20 +A 28 38 30 37 35 56 42 42 37 42 39 18 25 26 2 6 2 72 26 50 26 34 25 26 31 +G 16 11 29 19 21 18 13 21 9 19 24 26 29 29 11 6 88 3 11 17 15 21 26 21 27 +-10 region: +P -23-22-21-20-19-18-17-16-15-14-13-12-11-10 -9 -8 -7 -6 -5 + 112112112112112112112112112112112112112112112112112112112 +T 35 28 28 27 39 51 34 43 26 31 89 3 49 15 19108 31 29 21 +C 34 21 24 27 12 25 20 25 20 27 10 2 16 14 22 3 13 16 30 +A 20 39 33 33 39 23 29 16 23 19 2106 29 66 57 1 35 23 31 +G 23 24 27 25 22 13 29 28 43 35 11 1 18 17 14 0 33 24 30 ++ region: +P -2 -1 1 2 3 4 5 6 7 8 9 10 + 86 88 85 88 88 88 88 88 88 88 88 88 +T 16 22 2 42 27 23 20 25 27 15 16 29 +C 29 49 4 25 25 13 18 22 
17 17 16 17 +A 20 9 45 16 24 25 28 24 24 32 35 26 +G 21 8 37 5 12 27 22 17 20 24 21 16 +.end lit +Notes: +E. coli promoters have been shown to contain 2 regions of conserved +sequence +located about 10 and 35 bases upstream of the transcription startsite. +These +are TATAAT and TTGACA with an allowed spacing of 15 to 21 bases +between. The +spacing with maximum efficiency was 17 bases and all but 12 of the 112 +sequences could be aligned with a separation of 17 +/-1 bases. The +standard +promoter has spacing 7 and 17 bases between the startsite and the -10 +region, +and the -10 and -35 regions, respectively. The spacing between the -10 +region +and the startsite is usually 6 or 7 bases but varies between 4 and 8 +bases. +There is an AT rich region of 8 to 10 bases upstream of the -35 region. +Initiation with a purine is highly preferred, with G being used if A is not +present. +.lit +Gap penalties: + 15 0.02 (only exists as mutant) + 16 0.2 + 17 1.0 + 18 0.2 + 19 0.05 (guess) + 20 0.02 (guess) + 21 0.01 (guess) +.end lit +.left margin1 +@56. TX 8 @ Search for E. coli promoter (general) +strand +.LEFT MARGIN2 +.para +This function searches for E. coli promoters on the complementary strand +of +the sequence. See the notes on option 55. +.left margin1 +@57. TX 8 @ Search for E. coli promoter sequences. (-35 and -10) +.LEFT MARGIN2 +.para +This function searches separately for the -35 and -10 sequences of an E. +coli promoter. See the notes on option 55. +.left margIN1 +@58. TX 8 @ Search for prokaryotic ribosome binding sites +.LEFT MARGIN2 +.para +This function searches for the 5' ends of prokaryotic genes using an +unusual weight matrix. The search is relatively slow because the matrix +is 101 bases in length. No dialogue is required. +.para +The method was first described in + Staden Nucl. Acid Res. 12 505-519 1984. This actually looks for more +than +a ribosome binding site as is explained below.
This uses their weight +matrix w101 of Stormo and +Schneider (NAR 10 2971-3024, 1982) +which with a value of 2 finds all gene starts in their library. +.LEFT MARGIN1 +.lit + P-60-59-58-57-56-55-54-53-52-51-50-49-48-47-46-45-44-43-42-41-40-39-38-37-36 + T 5 1 -3 9-14 7 15 -5 3-16-17 4 18 5 -3 -1 2 4 5 -5 7 8 -5-15 6 + C-21 -6-11-21 0 8 -7-12 -1 1 0-19 12 -3 -1 10 2 -8 -5-11 8 1 23 6 -5 + A 7 -2 13 -2 -8-13-18 5 0 -5 13 8-15 9 -4 -7 9 0 -8-11-10 -6 -7 -5 -6 + G -6 -9 -7 0 8-16 -4 -2-16 1 -4 8-14 5 11-13-24 3 7 22-11 -9-15 10 -4 + + P-35-34-33-32-31-30-29-28-27-26-25-24-23-22-21-20-19-18-17-16-15-14-13-12-11 + T 3 4 16 -4 7 11 -4 -1 12 8 10 -1 1 8 2-10-16 11 1 -3 16 -3-36 -8-27 + C 2-14 -3 -8-10-21 2 0 -2 -1-11 -3 -1 5-11 -4 7 0-14 6 -8-20 -7-36-44 + A-12 -1-27 -3 -6 0-12 -3 -4 -7 14 -2 -4 -6 0 12 5 -9 0-11-11 10 8 2 8 + G 4 -5 -6 -3 -1 -4 -1 -4-15 0-14 3 10-19 -3-10 -7 -7 7 1 -8 -6 15 21 42 + + P-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 + T-53-27-26-23 2 -7-14-40-28 0-53 75-62-20-40-10-35 -5-12 -1 4 14-23 7 -2 + C-15-50-43-35-38-29-29 1 -9 1-87-55-64-45 11-22-14-20-15-15-10-22 -5 2 6 + A 0 -3 -5 4-20-11 5 6 -2-15 66-69-52 -5 -4 6 8-24 -7-10 -7 13 14 -9-18 + G 35 22 16 -6 -5-15-25-33-28-53-36-50107 -5-37-44-27-15-23-16-29-47-17-29-15 + + P 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 + T-26 1 4 -7 3 -4 0-10 8-18 7-22-21 8 4 -3 -6 7 -8 1 -5-16-16 7 -6 + C 6 -8 19 -7 9 -3 17 -2 3 -9 5 22 22 8 -1 1 18 6 11-10 -8 7 10 0 7 + A 14-12-42 1 -5 -4-32 12-10 20 -6 -1 3 -4 4-10 -1 -2-14 11 14 -3 2-13 5 + G-23 -7 -1 -6-17 -4 0-15-14 -4-17-10 -5-13 -8 10-13-13 9 -4 -3 10 2 4 -8 + + P 40 + T 0 + C 14 + A 5 + G-21 +.END LIT +These come from w101 of Stormo, Schneider, Gold and Ehrenfeucht Nucl. +Acid Res. 10 2997- +3011, 1982. They report that this matrix gives a score of at least 2 for +all +gene starts in their library whereas all other sequences score 1 or less. +.left margin1 +@29. 
TX 1 @ Reverse and complement the sequence +.LEFT MARGIN2 +.para +Reverses and complements the current active region of the sequence. +.left margin1 +@60. TX 7 @ Search using a dinucleotide weight matrix +.LEFT MARGIN2 +.para +This function performs searches for short sequence +motifs using an appropriate dinucleotide weight matrix. In addition it +can be used to create or modify weight matrices. In order to perform a +search the only input +required is the name of the file containing the weight matrix. +The results can be presented graphically or listed. The graphical +presentation will draw a line at the position of each match found; the +height of the line is proportional to the score. The method is identical to +that using weight matrices derived from nucleotide frequencies, except +that here we use the frequencies of dinucleotides. +.para +For a search, select "use weight matrix", supply the name of the file +containing the weight matrix, and choose between having results plotted +or listed. If dialogue is requested when the function is selected users can +alter the cutoff score employed. +.para +To create a weight matrix several steps are involved. A file containing an +alignment of known motifs is required. (This file must be created before +the current option is selected. The format is as follows: each sequence is +written on a separate line with at least one space at the beginning; each +sequence is terminated by a space character, and can be followed by a +name. The sequences must be aligned.) Supply the name of the file of +aligned sequences. The program reads and displays the sequences. Choose +between "summing logs of weights" or summing weights (i.e. whether to +multiply or add weights). If logs are used all scores will be negative. +Choose whether all positions in the set of aligned sequences should be used or +whether a mask should be applied. If so selected, define a mask as a string of +symbols, in which symbol - means ignore and any other symbol means +use.
E.g. xx-x--abc means use all positions except 3, 5 and 6. +.para +The program will calculate weights as the frequencies of the +dinucleotides at each unmasked position in the set of aligned sequences. +These weights are then applied to the set of aligned sequences to give a +range of "observed" scores. The mean and standard deviation of these +scores are displayed. The user is asked to supply several values to be used +when the weight matrix is applied to other sequences: a cutoff score (by +default, the mean minus 3 standard deviations), a top score for scaling +graphical results (by default, the mean plus 3 standard deviations), and a +position to identify (this means that if a particular base within the +motif is used as a "landmark", such as the A of the AG in splice acceptor +sites, then its position will be marked in plots). All these values are +stored along with the weight matrix. Finally supply the name of a file to +contain the weight matrix. +.para +Weight matrices can be "rescaled" using a set of aligned sequences in +much the same way as a matrix is created. The purpose is to redefine +the cutoff scores, and rescaling does not alter any other values in the +weight matrix file. +.para + The methods have always had to deal with the problem of zeroes in the +matrices. The current versions +employ "Laplace's Law of Succession" in which 1 is +added to each term. + +.lit +Typical dialogue follows. + +? Menu or option number=D60 + + Motif search using dinucleotide weight matrix +X 1 Use weight matrix + 2 Make weight matrix + 3 Rescale weight matrix +? 0,1,2,3 = 2 +? 
Name of aligned sequences file=[RS.MOTIFS]GCN4.SEQ + + + 1 AGCGTGACTCTTCCCGGAA HIS1 + 2 GAGGTGACTCACTTGGAAG HIS1 + 3 CGGATGACTCTTTTTTTTT HIS3 + 4 ACAGTGACTCACGTTTTTT HIS4 + 5 GTCGTGACTCATATGCTTT ARG3 + 6 TGAATGACTCACTTTTTGG ARG4 + 7 TTCTTGACTCGTCTTTTCT CPA1 + 8 CGAATGACTCTTATTGATG CPA2 + 9 AGAATGACTAATTTTACTA TRP5 + 10 TCGTTGACTCATTCTAATC TRP3 + 11 TTGCTGACTCATTACGATT TRP2 + 12 GAGATGACTCTTTTTCTTT IV1 + 13 GCGATGATTCATTTCTCTG IV2 + 14 TAGATGACTCAGTTTAGTC LEU1 + 15 TAAGTGACTCAGTTCTTTC LEU4 + 16 ATGATGACTCTTAAGCATG ILS1 +Length of motif 18 +? (y/n) (y) Sum logs of weights n +? (y/n) (y) Use all motif positions n +x means use, - means ignore +e.g. xx-x---x-x means use positions 1,2,4,8,10 +? Mask=----XXXXXXXX-------- + Applying weights to input sequences + 1 89.000 AGCGTGACTCTTCCCGGA + 2 91.000 GAGGTGACTCACTTGGAA + 3 93.000 CGGATGACTCTTTTTTTT + 4 90.000 ACAGTGACTCACGTTTTT + 5 94.000 GTCGTGACTCATATGCTT + 6 91.000 TGAATGACTCACTTTTTG + 7 81.000 TTCTTGACTCGTCTTTTC + 8 90.000 CGAATGACTCTTATTGAT + 9 75.000 AGAATGACTAATTTTACT + 10 97.000 TCGTTGACTCATTCTAAT + 11 97.000 TTGCTGACTCATTACGAT + 12 93.000 GAGATGACTCTTTTTCTT + 13 69.000 GCGATGATTCATTTCTCT + 14 90.000 TAGATGACTCAGTTTAGT + 15 90.000 TAAGTGACTCAGTTCTTT + 16 90.000 ATGATGACTCTTAAGCAT +Top score 97.000 Bottom score 69.000 +Mean 88.750 Standard deviation 7.319 +Mean minus 3.sd 66.794 Mean plus 3.sd 110.706 +? Cutoff score (-999.00-9999.00) (66.79) = +? Top score for scaling plots (66.79-999.00) (110.71) = +? Position to identify (0-18) (1) = +? Title=GCN4 DI WTS +? Name for new weight matrix file=3.WTS + +? Menu or option number=D60 + Motif search using dinucleotide weight matrix +X 1 Use weight matrix + 2 Make weight matrix + 3 Rescale weight matrix +? 0,1,2,3 = +? Motif weight matrix file=3.WTS + GCN4 DI WTS +? Cutoff score (-9999.00-9999.00) (66.79) =40 +? 
(y/n) (y) Plot results n + 15 42.00 CAACCCGCTCACCGACAA + 29 42.00 ACAACAGCTCACCCACGC + 93 46.00 AGCCTTCCTCATCGCTGC + 153 40.00 CAGCGGAATCAAACTTAA + 408 42.00 CGATGGATTCAAGTTGAA + 469 47.00 TTAGGAACTCCCTCTGTC + 493 60.00 AAGCTGAATCTTAGCAGC + 530 43.00 CGGAGGGCTCAGTGAGGG + 542 47.00 TGAGGGACTACTGCACCA + 678 41.00 CTTCTGCTTCAAAGAGTT + 709 47.00 AATATGACGGCGCACGTG + 848 54.00 GTCAGAACTCAAATCAGT + 940 49.00 CCGTTGACGACCTCCGCA + 992 42.00 TGGGCACCTCACACCAAG + + +.end lit +.left margIN1 +@61. TX 8 @ Search for eukaryotic ribosome binding sites +.LEFT MARGIN2 +.para +Searches for eukaryotic ribosome binding sites using weightings derived +from + Sargan,Gregory,Butterworth febs let 147 133-136 1982. No dialogue is +required. First described in Staden Nucl. Acid Res. 12 505-519 1984. + +.LEFT MARGIN1 +.lit +mRNA WTS FOR EUKARYOTES SARGAN,GREGORY,BUTTERWORTH FEBS LET +147 133-136 1982 +P -7 -6 -5 -4 -3 -2 -1 1 2 3 + 102102102102102102102102102102 +T 19 24 31 12 0 18 5 0102 0 +C 20 15 32 65 5 42 52 0 0 0 +A 50 27 27 19 86 36 34102 0 0 +G 6 29 12 6 11 6 11 0 0102 +VIRAL ONLY +P -7 -6 -5 -4 -3 -2 -1 1 2 3 + 41 41 41 41 41 41 41 41 41 41 +T 14 12 16 4 2 13 9 0 41 0 +C 7 3 13 17 7 9 14 0 0 0 +A 15 10 6 10 27 15 9 41 0 0 +G 5 16 6 10 5 4 9 0 0 41 +.END LIT +The Sargan et al paper puts forward the hypothesis that there is an +interaction between +some mRNA leader sequences and a highly conserved structure in the 18S +rRNA +of eukaryotic ribosomes. The attempt to substantiate the hypothesis +includes +a table of base frequencies for sequences immediately 5' to start codons. +They examined 102 sequences and I have used the base frequencies they +found +as a weight matrix for searching for eukaryotic gene starts. I don't yet +know how good this method is. The viral sequences were found to be +slightly +different but the separate table shown here is not used in the program. +.left margin1 +@62. 
TX 8 @ Search for splice junctions +.LEFT MARGIN2 +.para +Used to search for mRNA splice junctions using a weight matrix. The +default weight matrix is still that derived from the paper of Mount (Nucl. +Acids Res. 10, 459-472). However users may employ their own tables. +By default the positions of possible junctions will +be plotted rather than listed. + The diagram splits the donor plot into 3 horizontal boxes + so that all the +sites marked in any box are from the same reading frame. The acceptor +plot appears above the donor plot and is split in an equivalent way. So +sites marked as donors and acceptors in equivalent boxes are compatible. +i.e. donors from donor box 1 are compatible with acceptors from acceptor box +1, etc. Of course it is the combination of reading frame and splice sites +that really matters, and donors from box 1 can be compatible with acceptors +in box 3 if the reading frame switches. +.para +If dialogue is selected users can employ their own file of weights (see +below for the format), can change the cutoff scores, and can elect to have +the results listed rather than plotted. Listed results show the position +(of the last or first base in the exon), the frame and the matching sequence. +The frequency table shown below is used as a default +weight matrix and AG and GT are obligatory at the appropriate positions. +The plots are scaled so that the top of scale is the highest value achieved +by +a junction sequence in the set used to compile the frequency table, and +the +bottom of the scale is the lowest value achieved by a junction sequence +in +the set used to compile the frequency table. +.para +In the light of current knowledge it would be sensible for users to use +the weight matrix search option (20) +to create matrices that define more specific splice junctions. If so it is +important that the positions "marked" are the last base in the donor exon and +the first base in the acceptor exon. 
To make a weight matrix suitable for +use with this function follow the instructions for option 20 and create +files for both donor and acceptor sites. Then concatenate the two matrix files +with the donor file first. +Note that any positions in the weight matrix that are +100% conserved will be made obligatory (normally the AG and GT). +.LEFT MARGIN1 +.lit + + Mount donors redone 16-4-91 + 12 3 -16.085 -7.500 + P -2 -1 0 1 2 3 4 5 6 7 8 9 + N 136 136 136 136 136 136 136 136 136 136 136 136 + T 28 8 15 17 0 136 9 16 7 84 30 36 + C 41 60 16 7 0 0 3 13 3 17 28 39 + A 40 56 89 12 0 0 83 91 12 23 53 33 + G 27 12 16 100 136 0 41 16 114 12 25 28 + Mount acceptors redone 16-4-91 + 18 15 -26.142 -14.400 + P -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 + N 113 113 113 113 113 113 113 113 113 113 113 113 113 113 113 113 113 113 + T 58 50 57 59 67 56 58 49 47 66 64 31 34 0 0 11 41 31 + C 21 28 34 25 29 33 35 32 42 40 33 25 74 0 0 23 28 41 + A 17 11 11 18 7 17 12 23 15 3 10 29 5 113 0 24 21 21 + G 17 24 11 11 10 7 8 9 9 4 6 28 0 0 113 55 23 20 +.END LIT + +.left margIN1 +@63. TX 7 @ Search using a weight matrix (complementary) +.LEFT MARGIN2 +.para +This function searches the complementary strand of the sequence using +a weight matrix. Many +motifs can bind to either strand of the DNA and this function allows +users to +search the complementary strand without having to change the +orientation of the sequence. See option 20 for more details. +.left margin1 +@64. TX 3 @ Plot observed-expected word frequencies +.LEFT MARGIN2 +.PARA +This option is designed to examine the abundances of short +words in a sequence to see if particular ones are either under or over +represented. It compares the observed and expected frequencies and +plots them along the sequence. 
There has been some work on the relative +amounts of CG dinucleotides in eukaryotic sequences (e.g. Bird, Nature +321, +209-213 (1986)) and this new routine can be used to examine such +biases, or +any others that might be interesting. +.para +The user selects a word - say CG -, a window length, and a maximum and +minimum scale for plotting the results. The +program examines each successive window along the sequence, +with each +window overlapping the previous one by windowlength-1. +The program counts the base frequencies in each window, and the number +of +occurrences of the chosen word within the window. Using the base +frequencies it calculates an expected number of occurrences for the +chosen +word (simply by multiplying the relevant frequencies). It plots +observed-expected, and hence will show regions that are rich or depleted +in +the chosen word. The longest allowed word is 9 characters, but the +calculation of the expected frequencies becomes less appropriate as the +word +length increases above 2. +.para +Typical dialogue follows. +.lit + +? Menu or option number=D64 +Plot composition differences (obs-exp)) +Default String=CG +? String= +? odd span length (3-401) (101) = +? plot interval (1-20) (5) = +? Maximum plot value (-6.31-25.25) (6.31) = +? Minimum plot value (-25.25-6.31) (-6.31) = + + Missing graphics display here + +.end lit +.left margIN1 +@65. TX 9 @ Search for polya sites +.LEFT MARGIN2 +.para +Simply searches for the sequence AATAAA + (Proudfoot and Brownlee Nature 263, 211-214, + 1976) and marks it with a short vertical line. +.left margin1 +@66. TX 1 @ Interconvert t and u +.LEFT MARGIN2 +.para +This function interconverts T and U characters in the active sequence i.e. +between DNA and RNA. +.LEFT MARGIN1 +@67. TX 7 @ Search for patterns of motifs +.left margin2 +.para +This option searches for patterns of motifs. Patterns can be defined +interactively or read from files.
Results can be displayed in several ways +in both graphical and textual form. It is also used to create pattern files for +searching libraries. The option is extremely flexible and consequently the +following documentation is quite lengthy. However the routine is capable +of searching for almost any known pattern. In addition the flexibility +does not necessitate difficulty of use, and the user interface has been +simplified considerably since the methods were first published. +.para +Users should refer to the "typical dialogue" shown below for the most +helpful information on using the program. +.para +There are currently +four ways to display the matching patterns: 1=each individual +motif and its position is listed; 2=all the sequence between, and +including, the two +outermost motifs is listed; 3=graphical, with a vertical line marking the +position +of the leftmost motif; 4 = EMBL feature table format, where the KEYNAM +field is the motif name, the FROM and TO fields denote the ends of the +match, and the DESCRIPTION field is "Program". +.para +When it is defined for the first time a pattern must be entered +interactively at the keyboard, but the pattern description +can be saved to a file. +This file can be used for all subsequent searches. +.para +When defining a pattern interactively +select a motif class and the program will request the required inputs. +.para +The program gives each motif an identifying name and number. +For motifs other than the first, a range of allowed positions must be +defined (note that sets of motifs included using the OR operator will all +be given the same range, and so the program will only request range +values +for the first motif in any such set). +To specify the allowed range for a motif the user must supply the +following: the +identifying number of the motif relative to which the current motif's +positions are to be defined (termed the "reference motif"); a "relative start +position" and a range.
The relative start position can be negative or positive. +A negative start position means that although the reference motif +is searched for first, the current motif can be found to its left. +A zero relative start position means their left ends are superimposed. The +default start position is to butt-join the motif to the right-hand end of the +"reference motif". The range is "the number of extra positions" that the +motif can take. +.para +The program will display the probability of finding each motif. These +values are presented in the following form: .1234E-5 means 0.1234 times +10 +to the power -5. +.para +After the pattern has been defined, the program will type a description +of +it on the screen. It will then allow the user to give an overall cutoff +score and overall probability cutoff. +.para +Typical dialogue for all the different motif classes is displayed below. +.lit + +? Menu or option number=67 + Pattern searcher +? (y/n) (y) Read pattern from keyboard +X 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Complement of weight matrix + 6 Inverted repeat or stem-loop + 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 = +? Motif name=Ematch +? String=AA +Probability of score 2.0000 = 0.595E-01 +X 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Complement of weight matrix + 6 Inverted repeat or stem-loop + 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 =2 +? Motif name=AAA +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-1) (1) = +? Relative start position (-1000-1000) (3) = +? Number of extra positions (0-1000) (0) = +? string=AAA +? 
Minimum matches (1.00-3.00) (3.00) =2 +Probability of score 2.0000 = 0.149E+00 + 1 Exact match +X 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Complement of weight matrix + 6 Inverted repeat or stem-loop + 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 =3 +? Motif name=T'S +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-2) (2) = +? Relative start position (-1000-1000) (4) = +? Number of extra positions (0-1000) (0) = +? String=TTT +? Minimum score (0.00-108.00) (108.00) =72 +Probability of score 72.0000 = 0.258E+00 + 1 Exact match + 2 Percentage match +X 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Complement of weight matrix + 6 Inverted repeat or stem-loop + 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 =4 +? Motif name=GCN4 +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-3) (3) = +? Relative start position (-1000-1000) (4) = +? Number of extra positions (0-1000) (0) = +? Weight matrix file name=GCN4 + GCN4 FROM WEIGHTS 17-11-87 +Probability of score -22.0020 = 0.139E-02 + 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix +X 4 Cut-off score and weight matrix + 5 Complement of weight matrix + 6 Inverted repeat or stem-loop + 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 =5 +? Motif name=GCN4 +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-4) (4) = +? Relative start position (-1000-1000) (20) = +? Number of extra positions (0-1000) (0) = +? 
Weight matrix file name=GCN4 + GCN4 FROM WEIGHTS 17-11-87 +Probability of score -22.0020 = 0.606E-03 + 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix +X 5 Complement of weight matrix + 6 Inverted repeat or stem-loop + 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 =6 +? Motif name=LOOP +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-5) (5) = +? Relative start position (-1000-1000) (20) = +? Number of extra positions (0-1000) (0) = +? Stem length (1-60) (6) = +? Minimum loop length (-6-60) (0) = +? Maximum loop length (0-60) (0) =5 +? Minimum score (1.00-12.00) (12.00) =10 +Probability of score 10.0000 = 0.598E-02 + 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Complement of weight matrix +X 6 Inverted repeat or stem-loop + 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 =7 +? Motif name=Tstep +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-6) (6) = +? (y/n) (y) Relative to 5 prime end +? Relative start position (-1000-1000) (1) = +? Number of extra positions (0-1000) (0) = +? String=TTT +? Step (1-20) (3) = +Probability of score 3.0000 = 0.367E-01 + 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Complement of weight matrix + 6 Inverted repeat or stem-loop +X 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 =8 +? Motif name=REPEAT +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-7) (7) = +? Relative start position (-1000-1000) (4) = +? Number of extra positions (0-1000) (0) =2 +? Repeat length (1-60) (6) = +? Minimum gap (0-60) (0) = +? Maximum gap (0-60) (0) =4 +? 
Minimum score (1.00-6.00) (6.00) =5 +Probability of score 5.0000 = 0.554E-02 + 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Complement of weight matrix + 6 Inverted repeat or stem-loop + 7 Exact match, defined step +X 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 =9 +? (y/n) (y) Save pattern in a file N + +Pattern description + +Motif 1 named Ematch is of class 1 +Which is an exact match to the string +AA +Motif 2 named AAA is of class 2 +which is a match of score 2. to the string +AAA +and the 5 prime base can take positions 3 to 3 +relative to the 5 prime end of motif 1 +It is anded with the previous motif. +Motif 3 named T'S is of class 3 +which is a match of score 72. to the string +TTT +and the 5 prime base can take positions 4 to 4 +relative to the 5 prime end of motif 2 +It is anded with the previous motif. +Motif 4 named GCN4 is of class 4 +Which is a match to a weight matrix with score -22.002 +and the 5 prime base can take positions 4 to 4 +relative to the 5 prime end of motif 3 +It is anded with the previous motif. +Motif 5 named GCN4 is of class 5 +Which is a match to the complement of a weight matrix with score -22.002 +and the 5 prime base can take positions 20 to 20 +relative to the 5 prime end of motif 4 +It is anded with the previous motif. +Motif 6 named LOOP is of class 6 +Which is a stem-loop structure with stem length 6 and score 10. +The loop can have sizes 0 to 5 +and the 5 prime base can take positions 20 to 20 +relative to the 5 prime end of motif 5 +It is anded with the previous motif. +Motif 7 named Tstep is of class 7 +Which is an exact match to the string +TTT +with a step size of 3 +and the 5 prime base can take positions 1 to 1 +relative to the 5 prime end of motif 6 +It is anded with the previous motif. +Motif 8 named REPEAT is of class 8 +Which is a repeat with repeat length 6 and score 5. 
+The loop-out can have sizes 0 to 4 +and the 5 prime base can take positions 4 to 6 +relative to the 5 prime end of motif 7 +It is anded with the previous motif. +Probability of finding pattern = 0.2348E-14 +Expected number of matches = 0.5100E-09 +? Maximum pattern probability (0.00-1.00) (1.00) = +? Minimum pattern score (-9999.00-9999.00) (-9999.00) = + Select display mode +X 1 Motif by motif + 2 Inclusive + 3 Graphical + 4 EMBL feature table +? 0,1,2,3,4 =4 + Searching + + +Total matches found 0 + +Menus and their numbers are +m0 = This menu +m1 = General +m2 = Screen control +m3 = Statistical analysis of content +m4 = Structures and repeats +m5 = Translation and codons +m6 = Gene search by content +m7 = Prokaryotic signal search +m8 = Eukaryotic signal search + ? = Help + ! = Quit +? Menu or option number=67 + Pattern searcher +? (y/n) (y) Read pattern from keyboard +X 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Complement of weight matrix + 6 Inverted repeat or stem-loop + 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 = +? Motif name=Arun +? String=AAAAAA +Probability of score 6.0000 = 0.210E-03 +X 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Complement of weight matrix + 6 Inverted repeat or stem-loop + 7 Exact match, defined step + 8 Direct repeat + 9 Pattern complete +? 0,1,2,3,4,5,6,7,8,9 =9 +? (y/n) (y) Save pattern in a file N + +Pattern description + +Motif 1 named Arun is of class 1 +Which is an exact match to the string +AAAAAA +Probability of finding pattern = 0.2103E-03 +Expected number of matches = 0.1522E+01 +? Maximum pattern probability (0.00-1.00) (1.00) = +? Minimum pattern score (-9999.00-9999.00) (-9999.00) = + Select display mode +X 1 Motif by motif + 2 Inclusive + 3 Graphical + 4 EMBL feature table +? 
0,1,2,3,4 =4 + Searching + + +FT Arun 1582 1587 Program +FT Arun 3160 3165 Program +FT Arun 4204 4209 Program +FT Arun 5691 5696 Program +FT Arun 6710 6715 Program +Total matches found 5 +Minimum and maximum observed scores 6.00 6.00 + +.end lit +.para +These methods allow users to define and search for +complex patterns of motifs defined as single objects. +The programs allow individual DNA motifs to be defined in eight +different +ways, and protein motifs in six. Motifs are combined, using the logical +operators AND, OR and NOT, to describe a pattern. The pattern also +specifies the ranges of allowed relative separations of the individual +motifs. +.para +First some definitions. +.para +A MOTIF is a contiguous subsequence of fixed length. +At its simplest +it could be a single definite base or amino acid; a more complex motif +might be better represented as a consensus or a weight matrix; +two more-abstract types of +motif are direct and inverted repeats. +.para +A PATTERN is a higher order of structure defined by a list of motifs. The +motifs in a pattern are combined using the logical operators AND, OR and +NOT. The list also defines the allowed relative separations of the +motifs. In the current versions of the programs up + to 50 motifs can be combined into a single pattern. So using these +definitions there are two +differences between motifs and patterns: 1) the distances between all +elements of a motif are fixed, but +the separations of parts of patterns can vary; + 2) all characters in a motif are defined +using the same method (class), but different parts of a pattern can be +defined in completely different ways. +.para +Each motif +can be represented in 9 ways (known as the motif class): +.sk1 +.lit + MOTIF CLASSES +CLASS DESCRIPTION + 1 Exact match to a short defined sequence. The IUB symbols + can be used for DNA sequences. + 2 Percentage match to a defined short sequence. In nucleic acids, + the IUB symbols can be used. 
+ 3 Match to a defined sequence, using a score matrix and cutoff + score. The DNA matrix (see option 18) gives scores to IUB symbols + depending on their level of redundancy. MDM78 is used for proteins. + 4 Match to a weight matrix with cutoff score. + 5 As class 4 but on the complementary strand. + 6 Inverted repeat or stem-loop. Fixed stem length, range of + loop sizes, and cutoff score using A-T, G-C=2; G-T=1. + 7 Exact match to short sequence but with a defined step size. + 8 Direct repeat. Fixed repeat length, range of loop-out sizes, + cutoff score, and score matrix (for protein sequences MDM78 and + for nucleic acids an identity matrix). + 9 Membership of a set. A list of sets of allowed amino acids for + each position in the motif. The sets are separated by commas(,). + For example IVL,,,DEKR,FYWILVM defines a motif of length 5 amino + acids in which one of I,V or L must be found in the first position, + then anything in the next two positions, D,E,K or R in the fourth + position and F,Y,W,I,L,V or M in the fifth. This class only applies + to protein sequences because for nucleic acids "membership of a +set" + can be achieved using IUB symbols. + + Classes 1 - 4, 8 and 9 apply to protein sequences, and classes 1-8 to + nucleic acids. + +.end lit +.para +Class 1: exact match. +.para +The motif is defined by a short sequence, which for nucleic acids, + may include IUB symbols. All symbols must match. +.para +Class 2: percentage match +.para +The motif is defined by a short sequence, which for nucleic acids, +may include IUB symbols. The minimum number of matching characters +must +also be specified. +.para +Class 3: match using a score matrix +.para +The motif is defined by a short sequence, which for nucleic acids, +may include IUB symbols. The motif is not compared directly with the +sequence to count the number of matching characters. Instead a matrix is +used to provide a score for all possible pairs of characters. 
The motif +score for +any position along the sequence is the sum of the scores found by +looking up the scores for each pair of aligned characters. A match is +declared if some minimum score is achieved. +.para +Class 4: weight matrix +.para +The motif is defined by a table of values (called weights or scores). The +table gives a score for finding each possible character at each position +along the length of the motif. It therefore +has dimension motif-length x character-set-size, and allows us to give +different scores for each character at each position. It is equivalent to +having a different score matrix for each position along the motif, and +provides the most flexible and specific method of defining motifs. The +weight matrices are created by program NIP option 20 and +stored as files. The file contains the values +for each position, as well as an overall minimum score. +There are two ways in which these values can be used to calculate an +overall +score for any section of the sequence. The simplest way is to add the +values in the file. (This means that the highest possible score +can be calculated by adding the top value at each column +position, and the lowest +by adding the bottom value.) + The normal way of using the values in the file is as +follows. +First the programs divide the values in each column by the column total +so +that they sum to 1.0. +Then the natural +logs of these values are used as scores. When the matrix is applied to a +sequence these logarithmic values are summed (which is of course +equivalent +to multiplying the frequencies). +Note that using the natural logs of the frequencies as +weights and +adding them means that the overall cutoff score must be less than zero, +whereas if the original +values in the weight matrix file are added, the cutoff score will be +greater than zero. 
The search routines therefore decide whether the user +wants to add values or multiply frequencies +by examining the value of the cutoff score: it will add if the cutoff +is +greater than zero and add logs of frequencies if it is less than zero. + Hence we effectively get two +motif classes in one. The program NIP, when creating weight matrix +files, will ask the user whether the scores should be added or multiplied. + If the values in the table have been defined +without using a set of aligned sequences +it is easier for the user to +choose a cutoff score if the values are added. +.para +Class 5: complement of weight matrix +.para +The motif is defined by a weight matrix, but the program searches for its +complement. +.para +Class 6: inverted repeat, or stem-loop +.para +The motif is defined by a repeat length, a minimum score + and a range of loop sizes. The scores are A-T=2, G-C=2, G-T=1, else=0. +The loop sizes are defined by a minimum +and maximum distance from the 3' end of the stem. +For a stem-loop these will be positive numbers. For example to +define a stem of length 8 and loop sizes varying from 3 to 5, the stem +would be set to 8, the minimum start distance to 3 and the maximum +to 5. To define an +inverted repeat the minimum distance will be negative. For example stem +length=9, +minimum distance=-9, and maximum distance=-8 will find +inverted repeats of lengths 9 and 10. +E.g. AAAAATTTT and AAAAATTTTT would be found, the first having a base +at +its centre, the second having none. +.para +Class 7: exact match, defined step size. +.para +The motif is defined by a short sequence, which for nucleic acids, + may include IUB symbols. All symbols must match. The class differs +from +class 1 in that searches will move in steps of some given size. For +example +we could search for a certain codon and use a step size of 3 and hence + keep in a +single reading frame. 
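The class 7 search described above can be sketched as follows. This is only an illustration in Python, not the code used by the programs: the function name `stepped_exact_matches` and its parameters are hypothetical, IUB symbols are not handled, and positions are reported 1-based on the assumption that this matches the program output.

```python
def stepped_exact_matches(sequence, motif, step=3, start=0):
    """Return 1-based positions of exact matches to `motif`.

    Hypothetical helper: only every `step`-th position, counted from the
    0-based offset `start`, is tested, so with step=3 the search stays
    in a single reading frame.
    """
    hits = []
    last_start = len(sequence) - len(motif)
    for i in range(start, last_start + 1, step):
        if sequence[i:i + len(motif)] == motif:
            hits.append(i + 1)  # convert to a 1-based position
    return hits


# With a step of 3 only the in-frame TTT is reported.
print(stepped_exact_matches("ATTTGGTTT", "TTT", step=3))  # [7]
# With a step of 1 (equivalent to class 1) both occurrences are found.
print(stepped_exact_matches("ATTTGGTTT", "TTT", step=1))  # [2, 7]
```

Changing `start` selects one of the other two reading frames.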
+.para +Class 8: direct repeat +.para +The motif is defined by a repeat length, a minimum score + and a range of loop sizes. The scores are defined using MDM78 for protein +sequences and an identity matrix for nucleic acids. +The loop sizes are defined by a minimum +and maximum distance from the 3' end of the stem. +.para +Class 9: membership of a set +.para +This motif class is for protein sequences. It is defined by lists of +allowed amino acids for each position in the motif, and a cut-off score. +Positions at which any amino acid can occur are left blank. +All allowed amino acids for each position give a score of 1. +The motifs can be defined in two ways: either typed at the keyboard or +read +in as a weight-matrix-like file. +When the motif is defined at the keyboard the sets of allowed amino +acids +are separated by commas(,). + For example IVL,,,DEKR,FYWILVM defines a motif of length 5 amino + acids in which one of I,V or L must be found in the first position, + then anything in the next two positions, D,E,K or R in the fourth + position and F,Y,W,I,L,V or M in the fifth. To specify that the +whole motif must match a score of 3 would be required (i.e. one of the +allowed amino acids must be found for each of the three defined +positions). +If the motif is read from a file the file must have been written by +program +NIP, or have been saved by the pattern searching routines. If the +user +elects to save a pattern, and it includes class 9 motifs typed at the +keyboard, then the program will save the class 9 motifs as weight matrix +files. Therefore it will request file names for each motif of this class. +If the motif given above as an example were saved the weight matrix file +would have 5 columns. 
+The first column +would contain zeroes except for the I, V and L rows +which would be set to 1; the next two columns would all be zero; the next +would be zero except for the D,E,K and R rows which would be 1; the final +column would contain 1's in rows F,Y,W,I,L,V and M, with +the rest zero. +.para + +The logical operator (AND, OR or NOT) used to add each motif to the +pattern +is specified by preceding +the class number by the letters A, O or N. A = AND, O = OR, N = NOT. +The default is A, so N2 means include, using the NOT operator, a class 2 +motif; O2 means include, using the OR operator, a class 2 motif; both A2 +and +2 mean include, using the AND operator, a class 2 motif. + +.para +Range setting. +.para +The motifs in a pattern are numbered according to their order in the list. +Apart from the first motif in a pattern all motifs are given a range +of allowed positions relative to a motif further up the list. +For example +suppose we have a pattern defined by A AND B AND C AND D. +Motif A can occur anywhere, but B must have its range of allowed +positions defined relative to the position of motif A, and C's positions +can be defined relative to either A or B, depending on which is most +convenient, and likewise D's positions can be relative to A or B or C. +.para +Notice that the positions of motifs can be defined relative to more than +one motif. Suppose we have a pattern consisting of +motifs A, B and C, and that B occurs 5-10 residues right of A, C occurs 5- +10 +residues right of B, and also C is never more than 15 residues from A. +Then +it is quite consistent with the methods to include motif C into the +pattern +twice using the AND operator: once relative to A and once relative to B. +This will define the relative spacing and the ORDER of the motifs in the +pattern. (If we simply defined the position of C relative to A it could be +found to the left of B). +.para +Motifs combined together using the OR operator are all given the same +range. 
For example suppose we had a pattern A AND (B OR C) AND (D OR E), + then B and C each have the same range, and D and E also have +the same range as one another. The range for D and E can be relative to +A or to B. +.para +Motifs cannot have their ranges defined relative to motifs that are +included using the NOT operator. For example if we had the pattern A NOT +B +AND C, then the range for C can only be defined relative to motif A. +.para +Speed can be gained by arranging the order +of the motifs so that those higher up the list are of types that can be +searched for rapidly and that are also unlikely to be found. +.para +Motifs combined by the OR operator are alternatives: if any one of a set +of motifs +combined by the OR operator is found, then a match is declared. All +alternatives will be reported. For example if we had a pattern defined by +A +AND (B OR C), then all places where A occurs and B is found within range, +and all places where A is found and C is found within range will be +reported. A typical use would be where we might allow a motif to appear +on +either strand of the DNA sequence. For example a weight matrix +representing +the heatshock element could be used in a pattern which included +heatshock +as a motif class 4 combined using the OR operator +with heatshock as a motif class 5. +.para +The probability calculations are performed for each motif as it is +defined. +If an overall probability cut-off is given the calculation is repeated for +each match found. To achieve maximum searching speed do not give an +overall +probability cut-off. Overall cut-off scores should only be used if the +motif +classes used are compatible. +.para +There are currently +several ways to display the matches: 1 = each +motif and its position is listed; 2 = all the sequence between the two +outermost motifs is listed; 3 = graphical, with a spike marking the +position +of the leftmost motif. 
The library versions also give entry names, and a +one +line title; in addition they can be used to produce aligned families of +sequences. When this mode of output is selected the program will write a +separate file for each match. The files will be called ENTRYNAME.DAT +where +ENTRYNAME is the name of the entry in the library. The matching +sequence +will be written out so that the spacing between motifs is constant, and +set to the maximum allowed by the pattern definition. Any gaps will be +filled with dashes (-). If the individual sequences were subsequently +written one above the other +they should line up so that all motifs are in register. There are two types of +output of this sort: one, option 4, writes out whole sequences, the other, +option 5, writes out only the sequences between the two outermost +motifs. +Note that for option 4 users are asked to type the position of the +first motif, and the reason for +this is explained below. +Consider a pattern found in several sequences. Consider only +the first motif in +the pattern and suppose that it was found in different positions in these +sequences. +Say that of these positions the one furthest from the left end was +position 100. Then, in order to ensure that all the sequences would align, +we must specify that motif 1 must start at position 100. +Any sequences in which motif 1 started +nearer to the left end than position 100 would be padded accordingly. +These modes of output +should only be used when the position of each motif is defined relative to +its +immediate neighbour. +.para +The pattern descriptions can be saved to files. These files +can be used instead of typing definitions again at the keyboard. 
As the +files are annotated, +they can easily +be changed using system editors, and the modified versions used to +define the variant patterns for the programs. +.para +Use of lists of entry names +.para +The two programs that operate on libraries have the ability to +restrict their searches to subsets of the libraries. This does not require +sublibraries to be created but instead is achieved by using files +containing a list of the entry names of sequences. The user may choose to +search only those entries on the list or, alternatively, to search all but +those on the list (i.e. in the latter case +the list contains the names of those to be excluded). + The programs can search libraries that have indexes and those that +do not. + If a list of names for inclusion is used, +then the search will be faster if the index is present. In all other +circumstances the whole library will be read. +The list must be in library order except when it is used +to include entries, and an index is available. +The list must contain each entry name on a separate line, with the name +starting in column 1 of the line, i.e. there must be no spaces at the start +of the line. +The list of entry names +can be produced by the keyword searches of nip, pip, etc., as long +as the listings produced have a space character separating the entry name +from the entry description. This will depend on how well the library +reformatting programs work. For example swissprot entry names tend to run +into the beginning of the descriptions, but other libraries are generally +OK. +.para +One use of the programs is to look for patterns that we already know +about, but in new sequences. However it is hoped that they will also be +useful for finding new motifs. For example +several known control regions in +nucleic acid +sequences consist of particular direct or inverted repeats; +the inclusion of +direct and inverted repeats as motif classes +makes it possible to +find previously unknown +motifs of these types. 
+Using these new programs we can +ask questions like: "are there any inverted or direct repeats near to +sections of sequence that contain both a +CCAAT box and a TATA box?"; and to search for such things throughout +the +libraries. In addition, the mode of output in which all the sequence +between +the two outermost motifs found is printed out, allows us to extract +sequences and examine them in more detail for further common +subsequences. +For example we might want to collect together all the sequences +between +putative CCAAT and TATA boxes. +.para +A further use of the inverted repeat motif class is the following. If a +regulatory sequence in DNA is poorly defined but also an inverted repeat, +then it might be an advantage to specify it both as a consensus sequence +and +a superimposed inverted repeat. In this way two weak definitions can be +combined to produce a stronger pattern. +.para +Given only a few examples of a motif it +should be possible to perform initial searches using a +class 3 motif, and then, using plausible matching sequences, create a +more +specific weight matrix for the same motif. +.para +If motifs are combined with the first motif using the OR operator +they will be ignored until all +permutations that include the first motif have been looked for. +The whole search will then be repeated, in +turn, for each of +those motifs that are combined with the first motif using the OR +operator. +An interesting consequence of this is that the program can be used, +without +change, to compare any newly determined sequence with all known +individual +motifs. We achieve this by having a pattern in which all known relevant +motifs are combined using the OR operator. +If we ask to use this pattern with +a sequence, the program will automatically compare each individual +motif in +the pattern with the whole length of the +sequence. As the number of known +motifs grows this should become an increasingly useful standard +procedure. 
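The idea of comparing a new sequence with every known motif, by combining the motifs with the OR operator, can be sketched as below. This is a simplified illustration in Python and not the programs' algorithm: it assumes exact-match (class 1) motifs only, and the function name `scan_alternatives` and the example motif strings are hypothetical.

```python
def scan_alternatives(sequence, motifs):
    """Report every occurrence of every motif in `motifs`.

    `motifs` maps a motif name to its string. Each motif is compared with
    the whole sequence in turn, as happens when all motifs in a pattern
    are combined using OR. Positions are 1-based.
    """
    hits = []
    for name, string in motifs.items():
        pos = sequence.find(string)
        while pos != -1:
            hits.append((name, pos + 1))
            # Continue from the next base so overlapping matches are kept.
            pos = sequence.find(string, pos + 1)
    return hits


known = {"CCAAT": "CCAAT", "TATA": "TATAAA"}
print(scan_alternatives("GGCCAATCTATAAAGG", known))
# [('CCAAT', 3), ('TATA', 9)]
```

All alternatives are reported, so a sequence containing several of the known motifs gives one entry per match.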
+.para +The NOT operator is obviously +useful for making sure particular motifs are not present, but it can also +be used to bracket the levels of matches found. We may want a degree of +match that lies between two limits - binding should occur, but not too +strongly; or base-pairs should form, but not too many. We can specify +this +by asking for a match with a low score, in combination with a match with +a +high score, both for the same motif, but with the high score included +using +the NOT operator. +.para +The algorithm is designed to find all sections of a sequence that satisfy +the pattern rather than only the best match. +Particularly if some of the motifs in a pattern are less well defined than +others, this can often result in the same region of a sequence being +reported as having several matches, but which only vary in the +positions of the weakest motifs. +.para +General remarks on motif searching +.para +Generally motifs are short subsequences that are thought to be +associated with +particular functions in some known sequences. Often +we search for them to try to +understand or interpret other sequences. Sometimes we search for +motifs and +patterns to +test a hypothesis about their role: are they found in the expected +positions in the expected sequences? In doing so we should remember +that, in both proteins and nucleic acids, + what we are really looking for is a particular +three dimensional structure with certain affinities for other structures, +and that we are assuming that the sequence of the motif alone +defines the 3D structure we are searching for. + The overall structure +may be completely different to those in which the motif is functional, +and +hence the motif may have a different shape or be inaccessible. +We should be aware of the +importance of the context in which a motif is found. Where does it lie +relative to the overall structure, is it accessible, is the three +dimensional spacing between +it and other motifs correct? 
For example, is it on the same side of the +double helix, and the correct distance from some other motif? How does +context affect our assessment of the significance of finding a motif? +Finding false mammalian mRNA splice junctions in non-coding sequences +is +far less important than finding false sites in pre-mRNA sequences, but +finding them in the correct places is most important! In other words, it +is +often the case that when we are searching for a motif that is known to +be +necessary for some function, then a positive result in the form of a +match +in the required position, is more important than a high background of +matches in the wrong positions. Being + able to write +down the probability of finding a motif in a random sequence tells us how +well it is defined. +In nucleic +acids the DNA may contain many superimposed types of information such +as +those concerned with histone phasing, protein coding or mRNA secondary +structure. These overlapping "codes" may interfere with one another +causing +matches to motifs to be poorer than expected. +In general we will only have a limited number of examples of the +motif and we do not know how representative they are. +.para +Sequences have superimposed functions: some parts may be of general +structural +importance and give rise to an overall framework, and other parts give +specificity and hence are not common; we may want to use a set of +aligned +sequences to define a motif, but want to use only the framework +positions. + Alternatively we may want to pick out +only those parts of a set of aligned sequences that give a particular +property, and to ignore other similarities that are due to some other +property +and which could obscure the pattern +we are interested in. +It is possible to apply a mask to a set of aligned sequences in +order to give weight to selected positions only. 
+ The ability to define a mask allows certain positions +to be used in the motif and others to be ignored, and yet still permits the +use of a set of aligned sequences to calculate weights. The mask is +requested and applied +by the program and results in the masked positions being zero +in +the weight matrix. The mask is defined in the following way. +Suppose we had a motif of length 15, then the mask +x--x--xx-x will give zero weights to positions 2,3,5,6 and 9 (note it is +the dashes (-) that are significant and that positions +1,4,7,8,10,11,12,13,14 and 15 +will be non-zero). Of course +the same set of sequences could be used with several alternative masks +in +order to extract different features and create corresponding weight +matrices. +.para +The programs are described in Staden,R. +CABIOS 4, 53-60, 1988; Staden,R. + CABIOS 5, 89-96, 1989, and Methods in Enzymology 183, 193-211 (1990). +.left margin1 +@ end of help diff --git a/help/NIPF.RNO b/help/NIPF.RNO new file mode 100644 index 0000000..ccc4bad --- /dev/null +++ b/help/NIPF.RNO @@ -0,0 +1,88 @@ +.NPA +.SP 1 +.left margin1 +@-1. TX 0 @General +.sp +@-2. TX 0 @Screen control +.sp +@-3. TX 0 @Statistical analysis +.sp +@-1. TX 0 @General +.sp +@-2. TX 0 @Screen control +.sp +@-3. TX 0 @Statistical analysis +.sp +@0. TX -1 @NIPF +.sp +@1. TX 1 @ Help +.sp +@2. TX 1 @ Quit +.sp +@3. TX 1 @ Read new sequence +.sp +@4. TX 1 @ Redefine active region +.sp +@5. TX 1 @ List the sequence +.sp +@6. TX 1 @ List a text file +.sp +@7. TX 1 @ Direct output to disk +.sp +@8. TX 1 @ Write active sequence to disk +.sp +@9. TX 1 @ List a translation +.sp +@32. TX 1 @ List showing base differences +.sp +@37. TX 1 @ List showing translation +.sp +@33. TX 1 @ List showing amino acid differences +.sp +@10. TX 2 @ Clear graphics +.sp +@11. TX 2 @ Clear text +.sp +@12. TX 2 @ Draw a ruler +.sp +@13. TX 2 @ Use cross hair +.sp +@14. TX 2 @ Reset margins +.sp +@15. TX 2 @ Label diagram +.sp +@16. TX 2 @ Display a map +.sp +@17. 
TX 3 @ Set comparison mode +.sp +@18. TX 3 @ Set sort mode +.sp +@21. TX 3 @ Count base changes +.sp +@22. TX 3 @ Count codon changes +.sp +@23. TX 3 @ Count genetic events +.sp +@24. TX 3 @ Show table of base changes +.sp +@36. TX 3 @ Show table of expressed base changes +.sp +@39. TX 3 @ Show table of silent base changes +.sp +@38. TX 3 @ Estimate mutation rate +.sp +@25. TX 3 @ Plot base changes +.sp +@26. TX 3 @ Plot expressed changes per base +.sp +@27. TX 3 @ Plot silent changes per base +.sp +@28. TX 3 @ Count expressed changes per base +.sp +@29. TX 3 @ Count silent changes per base +.sp +@30. TX 3 @ Count changed amino acids +.sp +@31. TX 3 @ Plot amino acid variability +.sp +@ end of help diff --git a/help/PIP.RNO b/help/PIP.RNO new file mode 100644 index 0000000..24e42a2 --- /dev/null +++ b/help/PIP.RNO @@ -0,0 +1,2469 @@ +.NPA +.SP 1 +.left margin1 +@-1. TX 0 @General +.sp +@-2. T 0 @Screen control +.sp +@-2. X 0 @Screen +.sp +@-3. T 0 @Statistical analysis of content +.sp +@-3. X 0 @Statistics +.sp +@-4. T 0 @Structures and repeats +.sp +@-4. X 0 @Structures +.sp +@-5. TX 0 @Search +.sp +@0. TX -1 @PIP +.para +This is a program for analysing individual protein sequences. It can read +sequences stored in many of the most commonly used formats, and +performs all of the usual simple analyses. In addition it has very flexible +search procedures and presents many of its results graphically. +.PARA +The following analyses (preceded by their option numbers) are included: +.lit + ? = Help + ! 
= Quit + 3 = read a new sequence + 4 = define active region + 5 = list the sequence + 6 = list a text file + 7 = direct output to disk + 8 = write active sequence to disk + 9 = edit the sequence +10 = clear graphics screen +11 = clear text screen +12 = draw a ruler +13 = use cross hair +14 = reposition plots +15 = label diagram +16 = display a map +17 = search for short sequences +18 = compare a sequence +19 = compare a sequence using a score matrix +20 = search for a sequence using a weight matrix +21 = calculate amino acid composition +22 = plot hydrophobicity +23 = plot charge +24 = plot Robson prediction +25 = plot hydrophobic moment +26 = draw helix wheel +27 = back translate +28 = search for patterns of motifs +.end lit +.para +Some of these methods produce graphical + results +and so the +program is generally used from a graphics terminal (a vdu on which lines +and points can be drawn as well as characters). +.para +For users of VT640's or their equivalents the +terminal must be set nowrap (type NOWRAP) prior to running the program. +.LEFT MARGIN2 +The positions of each of the plots is defined relative to a users drawing +board which has size 1-10,000 in x and 1-10,000 in y. +Plots for +each option are drawn in a window defined by x0,y0 and xlength,ylength. +Where x0,y0 is the position of the bottom left hand corner of the window, + and xlength is the width of the window and ylength the +height of the window. +.lit + --------------------------------------------------------- 10,000 + 1 1 + 1 -------------------------------------- ^ 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 1 1 ylength 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 -------------------------------------- v 1 + 1 x0,y0^ 1 + 1 <---------------xlength--------------> 1 + --------------------------------------------------------- 1 + 1 10,000 + +.end lit +All values are in drawing board units (i.e. 1-10,000, 1-10,000). +The default window positions are read from a file "ANALPMRG" when the +program is started. 
Users can have their own file if required. +.para +The program can handle sequences stored in several formats: +Staden, EMBL, GENBANK, PIR (also known as NBRF) and GCG and they are described +in +the help for 'READ NEW SEQUENCE'. +.para +The options for the program are accessed from 5 main menus: general, +screen control, statistical analysis of content, structure, search. +Both menus and options are selected by number. +.LEFT MARGIN1 +@1. TX 0 @Help +.LEFT MARGIN2 +.para +This option gives online help. The user should select option numbers and +the current documentation will be given. Note that option 0 gives an +introduction to the program, and that ? will get help from anywhere in +the +program. +The following analyses (preceded by their option numbers) are included: +.sp +.left margin1 +@2. TX 0 @Quit +.left margin2 +.para +This function stops the program. +.left margin1 +@3. TX 1 @Read a new sequence +.LEFT MARGIN2 +.para +This option allows users to read in new sequences, browse through annotations, + or search sequence +libraries for keywords. Sequences can be read from "personal" +sequence files or from sequence libraries. These are referred to as the +sequence "source". Personal files can be stored in several formats: +Staden, PIR, EMBL, GENBANK and GCG. +At LMB we use "Staden" format for sequencing and all +the +libraries are stored in their original formats. Note, however, that libraries +such as EMBL or GenBank that are divided into several files (eg GenBank has +13 separate files) are indexed as a whole. This means that users do not need +to know which file contains an entry, only which library. +When the user selects to read in a sequence the program first asks for the +sequence "source". +.para +If the user selects "personal" the program will ask for +the format (Staden, PIR, EMBL, GENBANK or GCG), and then for the name of +the file. For PIR format the user will also be required to know the entry +name of the sequence as the file can contain several. 
For the other formats +only a single entry is expected. The file will be read, its length and +composition will be displayed and the option left. +.para +If the user selects "library" as the sequence source the program will display a +list of available libraries. The programs are capable of handling all current +libraries but which ones are available will vary from site to site. At LMB we +have several libraries and also weekly updates of data gathered between releases. +The program will ask users to select a library and then give a list of options: +.lit + + X 1 Get a sequence + 2 Get annotations + 3 Get entrynames from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + +.end lit +If get a sequence or get annotations is selected users will be asked to +type the entry name. The option will be left when a sequence is selected or +! is typed. The composition and length will be displayed. +.para +The text index contains all words from feature tables, reference titles, +definition lines, keywords lists and comments, so the text index search +is most useful. It is also the fastest. Up to 5 words can be searched for +at once. The words should be typed separated by spaces, for example +.lit + ? Keywords=P53 mouse murine tumo + +.end lit +will search for all entries that contain words starting with p53, mouse, +murine and tumo. Only the unique entries that contain ALL words will be +listed. Before listing the matching entries +the program will show the number of 'hits' for each word and ring the bell. +Escape is possible at this point, or after each screenfull of entries. +In addition to the entry names the text search displays the primary accession +number, the sequence length and up to 80 characters of description. +(The search of 'titles' is now redundant because the full text index +contains all the title words and the search is much faster. It will probably +be removed from the program.) +All searches are independent of case. 
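The text index search described above (up to 5 words, case-independent prefix matching, only entries containing ALL words listed, hit counts per word shown first) can be sketched roughly as follows. This is an illustration only, not the package's code: the function name `search_text_index`, the dictionary-based index and the toy entry names are all assumptions made for the example.

```python
def search_text_index(index, query, max_words=5):
    """Return (hits_per_word, entries_containing_all_words).

    `index` maps an indexed word to the entry names that contain it.
    This layout is hypothetical; the real index file format is not
    described in the help text.
    """
    words = query.upper().split()
    if not 1 <= len(words) <= max_words:
        raise ValueError("between 1 and %d words are expected" % max_words)
    per_word = {}
    for word in words:
        hits = set()
        for indexed_word, entries in index.items():
            # Case-independent match on the START of each indexed word.
            if indexed_word.upper().startswith(word):
                hits.update(entries)
        per_word[word] = hits
    # Only entries that contain every query word are listed.
    common = set.intersection(*per_word.values())
    counts = {word: len(hits) for word, hits in per_word.items()}
    return counts, sorted(common)


# Toy index with made-up entry names, for illustration only.
toy_index = {
    "p53": ["ENTRY_A", "ENTRY_B"],
    "mouse": ["ENTRY_B", "ENTRY_C"],
    "tumour": ["ENTRY_A"],
    "tumor": ["ENTRY_B"],
}
counts, entries = search_text_index(toy_index, "P53 mouse tumo")
```

Because matching is by prefix, the single word `tumo` covers both spellings `tumour` and `tumor` in this toy index.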
Where +possible the program will offer default entry names. +.para +Typical dialogue follows. +.lit +Select sequence source +X 1 Personal file + 2 Sequence library +? Selection (1-2) (1) = +Select sequence file format +X 1 Staden + 2 EMBL + 3 GenBank + 4 PIR + 5 GCG +? Selection (1-5) (1) = +? Sequence file name=M13MP7.SEQ + Contig title removed +Sequence length= 7238 + Sequence composition + T C A G - + 2405. 1539. 1765. 1527. 2. + 33.2% 21.3% 24.4% 21.1% 0.0% + . + . + . + + + Select sequence source + X 1 Personal file + 2 Sequence library + ? Selection (1-2) (1) =2 + Select a library + X 1 EMBL 29 nucleotide library Dec 91 + 2 SWISSPROT 20 protein library Nov 91 + 3 PIR 31 protein library Dec 91 + 4 NRL3D 58 From Brookhaven protein library Dec 91 + 5 GenBank + ? Selection (1-5) (1) = +Library is in EMBL format with indexes + Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) =5 + Search for keywords + ? Keywords=P53 mouse +P53 hits 68 +MOUSE hits 8180 + + MMANT01 X00875 536 Murine gene fragment for cellular tumour antigen + MMANT02 X00876 83 Murine gene fragment for cellular tumour antigen + MMANT03 X00877 21 Murine gene fragment for cellular tumour antigen + MMANT04 X00878 261 Murine gene fragment for cellular tumour antigen + MMANT05 X00879 184 Murine gene fragment for cellular tumour antigen + MMANT06 X00880 113 Murine gene fragment for cellular tumour antigen + MMANT07 X00881 110 Murine gene fragment for cellular tumour antigen + MMANT08 X00882 137 Murine gene fragment for cellular tumour antigen + MMANT09 X00883 74 Murine gene fragment for cellular tumour antigen + MMANT10 X00884 107 Murine gene for cellular tumour antigen p53 (exon + MMANT11 X00885 562 Murine p53 gene 3' region with exon 11 + MMANTP53 M26862 536 Mouse tumor antigen p53 gene, 5' end. + MMLYN M64608 2044 Mouse lyn protein mRNA, complete cds. 
+ MMP53 X00741 1377 Mouse mRNA for transformation associated protein + MMP53A M13872 1285 Mouse p53 mRNA, complete cds, clone pcD53. + MMP53B M13873 1241 Mouse p53 mRNA, complete cds, clone p53-m11. + MMP53C M13874 1322 Mouse p53 mRNA, complete cds, clone p53-m8. + MMP53G1 X01235 554 Mouse genomic DNA for 5' region of cellular tumou + MMP53IN4 X60470 729 M.musculus p53 gene for p53 protein, intron 4 + MMP53P X01236 2132 Mouse pseudogene for cellular tumour antigen p53 + MMP53R X01237 1773 Mouse mRNA for cellular tumour antigen p53 + MMRSB2P5 M64597 196 Mouse B2 repeat in the 3' flank of protein 53 (p5 + 22 different entries found + + Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) =4 + Search for keywords + ? Keywords=alpha + Searching for alpha + AAGHA 623 a.anguilla mrna for glycoprotein hormone alpha subunit precu + AAMALI 3338 a.aegypti mali gene encoding alpha 1-4 glucosidase, complete + AAMALIA 1659 a.aegypti maltase-like i (mali) gene encoding alpha-1,4-gluc + AAMALIB 1832 a.aegypti maltase-like i (mali) mrna encoding alpha-1,4-gluc + ACA13GT 371 alouatta caraya alpha-1,3gt gene, 3' flank. + ADHBADA1 102 duck alpha-d-globin gene, exon 1. + ADHBADA2 1145 duck alpha-a-globin gene and 5' flank + ADHBADWP 513 duck (white pekin) alpha ii (minor) globin mrna, complete co + AEACOXABC 5279 a.eutrophus protein x (acox), acetoin:dcpip oxidoreductase-a + AGA13GT 371 ateles geoffroyi alpha-1,3gt gene, 3' flank. 
+ AGAAAGFP 282 c.tetragonoloba alpha-amylase/alpha-galactosidase fusion pro + AGAABL 138 b.subtilis alpha-amylase signal peptide gene e.coli beta-lac + AGAFAMYA 57 synthetic b.stearothermophilus alpha amylase/s.cerevisiae ma + AGAFAMYB 57 synthetic b.stearothermophilus alpha amylase/s.cerevisiae ma + AGAFAMYC 57 synthetic b.stearothermophilus alpha amylase/s.cerevisiae ma + AGAFCOXA 98 synthetic alpha-factor/cox iv fusion gene signal peptide. + AGAGABA 7876 synthetic gossypium hirsutum (cotton) alpha globulin a and b + AGAMYLS 120 synthetic alpha-amylase gene, 5' end. + AGANPS 95 synthetic gene (jcnf-1) encoding alpha-factor pro-region/han +! + Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) =3 + ? Accession number=v00636 +Entry name LAMBDA + Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) =2 + Default Entry name=LAMBDA + ? Entry name= +ID LAMBDA standard; DNA; PHG; 48502 BP. +XX +AC V00636; J02459; M17233; X00906; +XX +DT 03-JUL-1991 (Rel. 28, Last updated, Version 3) +DT 09-JUN-1982 (Rel. 1, Created) +XX +DE Genome of the bacteriophage lambda (Styloviridae). +XX +KW circular; coat protein; DNA binding protein; genome; +KW origin of replication. +XX +OS Bacteriophage lambda +OC Viridae; ds-DNA nonenveloped viruses; Siphoviridae. +XX +RN [1] +RP 1-48502 +RA Sanger F., Coulson A.R., Hong G.F., Hill D.F., Petersen G.B.; +RT "Nucleotide sequence of bacteriophage lambda DNA"; +RL J. Mol. Biol. 162:729-773(1982). +XX +! + Select a task + X 1 Get a sequence + 2 Get annotations + 3 Get entry names from accession numbers + 4 Search titles for keywords + 5 Search text index for keywords + ? Selection (1-5) (1) = + Default Entry name=LAMBDA + ? 
Entry name= +DE Genome of the bacteriophage lambda (Styloviridae). + Sequence length 48502 + Sequence composition + T C A G - + 11988. 11360. 12336. 12818. 0. + 24.7% 23.4% 25.4% 26.4% 0.0% + +.end lit +.left margin1 +@4. TX 1 @Redefine active region +.LEFT MARGIN2 +.para +For its analytic functions +the program always works on a region of the sequence called the active +region. When a new sequence is read into the program the active region is +automatically set to start at the beginning of the sequence and go +up to the +maximum allowed size of active region the version of the program can +handle. The positions are shown on the screen. +On most machines this will be to the end of the sequence. +This option allows the user define a different region. Note that for +convenience in the +listing and translation functions the user is given access to regions +outside the active region. +.left margin1 +@5. TX 1 @List a sequence +.LEFT MARGIN2 +.para +The sequence can be listed with line lengths from +10 to 120 in multiples of 10. Output can be directed to a disk file by +first selecting disk output. The output looks like: +.lit + + 10 20 30 40 50 60 + MQLNSTEISE LIKQRIAQFN VVSEAHNEGT IVSVSDGVIR IHGLADCMQG EMISLPGNRY + + 70 80 90 100 110 120 + AIALNLERDS VGAVVMGPYA DLAEGMKVKC TGRILEVPVG RGLLGRVVNT LGAPIDGKGP + + 130 140 150 160 170 180 + LDHDGFSAVE AIAPGVIERQ SVDQPVQTGY KAVDSMIPIG RGQRELIIGD RQTGKTALAI + + 190 200 210 220 230 240 + DAIINQRDSG IKCIYVAIGQ KASTISNVVR KLEEHGALAN TIVVVATASE SAALQYLARM + + 250 260 270 280 290 300 + PVALMGEYFR DRGEDALIIY DDLSKQAVAY RQISLLLRRP PGREAFPGDV FYLHSRLLER + + 310 320 330 340 350 360 + AARVNAEYVE AFTKGEVKGK TGSLTALPII ETQAGDVSAF VPTNVISITD GQIFLETNLF + + 370 380 390 400 410 420 + NAGIRPAVNP GISVSRVGGA AQTKIMKKLS GGIRTALAQY RELAAFSQFA SDLDDATRKQ + + 430 440 450 460 470 480 + LDHGQKVTEL LKQKQYAPMS VAQQSLVLFA AERGYLADVE LSKIGSFEAA LLAYVDRDHA + + 490 500 510 520 530 540 + PLMQEINQTG GYNDEIEGKL KGILDSFKAT QSW* + +.end lit +.left margin1 +@6. 
TX 1 @List a text file +.LEFT MARGIN2 +.para +Allows the user to have a text file displayed on the screen. It will appear +one page at a time. +.left margin1 +@7. TX 1 @Direct output to disk +.LEFT MARGIN2 +.para +Used to direct output that would normally appear on the screen to a file. +.para +Select redirection of either text or graphics, and +supply the name of the file that the output should be written to. +.para + The results from the next options selected will not appear on the screen +but will be written to the file. When option 7 is selected again +the file will be +closed and output will again appear on the screen. +.left margin1 +@8. TX 1 @Write active region to disk +.LEFT MARGIN2 +.para +The program has the capability of reading in EMBL, GENBANK, NBRF, GCG +and Staden formats +and of reversing and complementing sequences. This option allows users +to +write the current active sequence to a disk file in Staden format. Hence +it +allows format conversion and crude sequence cutting. +.left margin1 +@9. TX 1 @Edit the sequence +.LEFT MARGIN2 +.para +Used to edit sequences or any other files by giving access to the +computer's system editor. For editing sequences the input file should +have already been created using the listing function "list +sequence". +.para +Supply the name of the file to edit. Wait while the system editor is made +ready (can take a while on a VAX). Use the editor. Exit from the editor. If a +sequence has been edited, and you want to process it, affirm that the +sequence should be "made active". The edited sequence will replace the +original sequence. +.para +This editing method is designed to give users access to an editor with +which they are familiar - i.e. the one on their machine, and yet to allow +them to edit a sequence which contains the landmarks they need in +order to know where they are. 
Users can create files containing simple +listings with numbering, using "list the sequence", and +then edit them with their system editor, using the numbering to know +where they are within the sequence. When the edits are complete they +exit from the editor and the program "analyses" the edited file to extract +only the sequence characters. Define the permitted set of characters to be: +ACDEFGHIKLMNPQRSTVWXYZ-acdefghiklmnpqrstvwxyz. All permitted characters +found in the file will become part of the sequence, all others removed. +.left margin1 +@10. TX 2 @Clear graphics +.LEFT MARGIN2 +.para + Clears the screen of both text and graphics. +.left margin1 +@11. TX 2 @Clear text +.LEFT MARGIN2 +.para + Clears only text from the screen. +.left margin1 +@12. TX 2 @Draw a ruler +.LEFT MARGIN2 +.para +This option +allows the user to draw a ruler or scale along the x axis of the screen to +help identify the coordinates of points of interest. The user can define +the position of the first amino acid to be marked (for example if the +active +region is 1501 to 8000, the user might wish to mark every 1000th amino +acid +starting at either 1501 or 2000 - it depends if the user wishes to treat +the active region as an independent unit with its own numbering starting +at +its left edge, or as part of the whole sequence). The user can also define +the separation of the ticks on the scale and their height. If required the +labelling routine can be used to add numbers to the ticks. +.left margin1 +@13. TX 2 @Use cross hair +.LEFT MARGIN2 +.para +This function puts +a steerable cross on the screen that can be used to find the +coordinates of points in the sequence. The user can move the cross +around using the directional keys; when he hits the space bar the +program will print out the coordinates of the cross in sequence units and +the option will be exited. +.para +If instead, +you hit a , the position will be displayed but the cross will remain on +the screen. 
+.para +If a letter s is hit the sequence around the cross hair is displayed and +the cross remains on the screen. +.left margin1 +@14. TX 2 @Reset margins +.LEFT MARGIN2 +.para +The positions of each of the plots is defined relative to a users drawing +board which has size 1-10,000 in x and 1-10,000 in y. +Plots for +each option are drawn in a window defined by x0,y0 and xlength,ylength. +Where x0,y0 is the position of the bottom left hand corner of the window, + and xlength is the width of the window and ylength the +height of the window. +.lit + --------------------------------------------------------- 10,000 + 1 1 + 1 -------------------------------------- ^ 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 1 1 ylength 1 + 1 1 1 1 1 + 1 1 1 1 1 + 1 -------------------------------------- v 1 + 1 x0,y0^ 1 + 1 <---------------xlength--------------> 1 + --------------------------------------------------------- 1 + 1 10,000 + +.end lit +All values are in drawing board units (i.e. 1-10,000, 1-10,000). +The default window positions are read from a file "ANALMARG" when the +program is started. Users can have their own file if required. +As all the plots start +at the same position in x and have the same width, x0 and xlength are the +same for all options. Generally users will only want to change the start +level of the window y0 and its height ylength. + This option +allows users to change window positions whilst running the program. +The routine prompts first for the number of the option that the users +wishes +to reposition; then for the y start and height; then for the x start and +length. Note that changes to the x values affect all options. If the user +types only carriage return for any value it will remain unchanged. +The cross-hair can be used to choose suitable heights. +.LEFT MARGIN1 +@15. TX 2 @Label a diagram +.LEFT MARGIN2 +.para +This routine allows users to label any diagrams they have produced. They +are asked to type in a label. 
When the user types carriage return to finish +typing the label the cross-hair appears on the screen. The user can +position it anywhere on the screen. If the user types R (for right justify) + the label will be +written on the diagram with its right end at the cross-hair position. +If the user types L (for left justify) the label will be written on the +diagram with its left end at the cross-hair position. +The +cross-hair will then immediately reappear. The user may put the same +label +on another part of the diagram as before or if he hits the space bar he +will be asked if he wishes to type in another label. +.left margin1 +@16. TX 2 @Display a map +.LEFT MARGIN2 +.para +It is often convenient to plot a map alongside graphed analysis in order +to +indicate features within the sequence. This function allows users to +draw +maps using files arranged in the form of EMBL feature tables. Of course +the +EMBL tables are usually only used for nucleic acid sequence annotation +but, +as long as the features are written in the correct format, they can be +employed by this routine. The map is composed of a line representing the +sequence and then further lines denoting the endpoints of each feature +the +user identifies. The user is asked to define the height at which the line +representing the sequence should be drawn; then for the feature height; +then for the features to plot. +.left margin1 +@17. TX 1 5 @Short sequence search +.LEFT MARGIN2 +.para +This routine is used to search for exact matches to short sequences. It is +equivalent to the restriction enzyme search in program NIP. It can +either list matches +or present the results graphically. +.PARA +Select from searching, screen clearing or file listing. Choose a file of +strings and the mode of output required. +.para +The files of short +sequences (strings) and their names +need to be arranged in a particular way. 
For example +.lit +ACID/D/E// +BASIC/R/K/H// +HYDRO/F/L/I/V/Y// +GLYCO/N-S/N-T// ++/R/K/H// +-/D/E// +.end lit +defines various groups of amino acids. +Each string or set of strings must be +preceded by a name, each string must be preceded and +terminated with a slash (/), and +each set of strings by 2 slashes. These collections of strings and their +names can be read from disk or entered from the keyboard. Two files +containing sequences are currently +available. One contains named groups of amino acids. The other simply +contains the names of all amino acids and gives a convenient way of +producing a plot of the positions of all the different +amino acids in the sequence. +The user can select strings +by name from these collections. Results can be displayed name by name +or all +together. +Strings entered from the keyboard need to be separated by slash +characters(/). +For the name by name search the output looks like: +.lit + MATCHES= 12 + NAME SEQUENCE POSITION FRAGMENT LENGTHS + ACID E 7 7 1 + ACID E 10 3 1 + ACID E 24 14 1 + ACID E 28 4 1 + ACID D 36 8 1 + ACID D 46 10 2 + ACID E 51 5 2 + ACID E 67 16 2 + ACID D 69 2 2 + ACID D 81 12 2 + ACID E 84 3 2 + ACID E 96 12 3 + MATCHES= 10 + NAME SEQUENCE POSITION FRAGMENT LENGTHS + BASIC K 13 13 1 + BASIC R 15 2 1 + BASIC H 26 11 1 + BASIC R 40 14 1 + BASIC H 42 2 2 + BASIC R 59 17 2 + BASIC R 68 9 2 + BASIC K 87 19 2 + BASIC K 89 2 2 + BASIC R 93 4 2 + MATCHES= 1 + NAME SEQUENCE POSITION FRAGMENT LENGTHS + GLYCO NST 4 4 3 + + or when the results are ordered only on position the output looks like: + + NAME SEQUENCE POSITION FRAGMENT LENGTHS + GLYCO NST 4 3 + ACID E 7 3 + ACID E 10 3 + BASIC K 13 3 + BASIC R 15 2 + ACID E 24 9 + BASIC H 26 2 + ACID E 28 2 + ACID D 36 8 + BASIC R 40 4 + BASIC H 42 2 + ACID D 46 4 + ACID E 51 5 + BASIC R 59 8 +.end lit +.LEFT MARGIN2 +Graphical output marks the position of each string by a +short vertical line and gives its name at the left end of the +line. 
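As a rough illustration of the string file layout described above (a name, then each string set off by slashes, then a double slash to close the set), the following sketch parses such lines and searches a sequence for exact matches. It is not the program's own code: the function names and the `PROTEIN_SYMBOLS` set are assumptions made for the example, and any pattern character outside that set (such as the - in N-S) is treated as a wild card, as the help text for this option states.

```python
# Assumed set of recognised protein symbols (taken from the permitted
# character list given for the editing option, without the '-').
PROTEIN_SYMBOLS = set("ACDEFGHIKLMNPQRSTVWXYZ")


def parse_string_file(text):
    """Parse lines such as 'ACID/D/E//' into {name: [strings]}."""
    groups = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.endswith("//"):
            raise ValueError("set not terminated by //: %r" % line)
        name, *strings = line[:-2].split("/")
        groups[name] = strings
    return groups


def find_matches(sequence, string):
    """Return the 1-based positions at which `string` matches exactly."""
    seq = sequence.upper()
    pat = string.upper()
    positions = []
    for start in range(len(seq) - len(pat) + 1):
        window = seq[start:start + len(pat)]
        # A pattern character that is not a protein symbol matches anything.
        if all(p not in PROTEIN_SYMBOLS or p == s for p, s in zip(pat, window)):
            positions.append(start + 1)
    return positions


groups = parse_string_file("ACID/D/E//\nGLYCO/N-S/N-T//\n")
# First ten residues of the sequence listed under "List a sequence".
glyco_hits = find_matches("MQLNSTEISE", "N-T")
acid_hits = find_matches("MQLNSTEISE", "E")
```

With the first ten residues of the listed sequence this reports N-T at position 4 and E at positions 7 and 10, in line with the first rows of the sample output.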
If the top of the screen is reached the program gives the user the +oportunity to take a hard copy and then will clear the screen and restart +plotting results at the original start position. +Note that any character in the string +that is not a recognisable protein symbol will be treated as a +wild card character will match with all +characters in the searched sequence. +.para +.lit +Typical dialogue follows. + +Menus and their numbers are +m0 = This menu +m1 = General +m2 = Screen control +m3 = Statistical analysis of content +m4 = Structure +m5 = Search + ? = Help + ! = Quit +? Menu or option number=17 + Search for short sequences +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 =2 + 1 All acids +X 2 Named groups + 3 Personal file + 4 Keyboard +? 0,1,2,3,4 = + +ACID/D/E// +BASIC/R/K/H// +HYDRO/F/L/I/V/Y// +GLYCO/N-S/N-T// ++/R/K/H// +-/D/E// +DIBASIC/RR/KK/RK/KR// +TURN/N/D/G/P/S// +BLOCK/A/Q/E/I/L/M/F/W/V// +INDIF/R/C/H/K/T/Y// +End of file + + +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 = + + 1 All acids +X 2 Named groups + 3 Personal file + 4 Keyboard +? 0,1,2,3,4 = + +? (y/n) (y) All names n +? Name=acid +? Name=basic +? Name=glyco +? Name= + +? (y/n) (y) Show results name by name +? (y/n) (y) List matches + + searching + matches= 59 +NAME SEQUENCE POSITION FRAGMENT LENGTHS +ACID E 7 7 1 +ACID E 10 3 1 +ACID E 24 14 1 +ACID E 28 4 1 +ACID D 36 8 1 +ACID D 46 10 2 +ACID E 51 5 2 +ACID E 67 16 2 +ACID D 69 2 2 +ACID D 81 12 2 +ACID E 84 3 2 +ACID E 96 12 3 +ACID D 116 20 3 +... etc + matches= 61 +NAME SEQUENCE POSITION FRAGMENT LENGTHS +BASIC K 13 13 1 +BASIC R 15 2 1 +BASIC H 26 11 1 +BASIC R 40 14 1 +BASIC H 42 2 2 +BASIC R 59 17 2 + ...etc + matches= 2 +NAME SEQUENCE POSITION FRAGMENT LENGTHS +GLYCO NST 4 4 3 +GLYCO NQT 487 483 28 + 28 483 + + +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 = + + 1 All acids +X 2 Named groups + 3 Personal file + 4 Keyboard +? 
0,1,2,3,4 = + +? (y/n) (y) Selected names + +? Name=basic +? Name=glyco +? Name= + +? (y/n) (y) Show results name by name n +? (y/n) (y) List matches + + searching +NAME SEQUENCE POSITION FRAGMENT LENGTHS +GLYCO NST 4 3 +BASIC K 13 9 +BASIC R 15 2 +BASIC H 26 11 +BASIC R 40 14 +BASIC H 42 2 +BASIC R 59 17 +BASIC R 68 9 +BASIC K 87 19 + ...etc +BASIC R 477 14 +BASIC H 479 2 +GLYCO NQT 487 8 +BASIC K 499 12 +BASIC K 501 2 +BASIC K 508 7 + 7 + +X 1 Search + 2 List enzyme file + 3 Clear text + 4 Clear graphics +? 0,1,2,3,4 = + 1 All acids +X 2 Named groups + 3 Personal file + 4 Keyboard +? 0,1,2,3,4 =4 +Define search strings by typing a string name +followed by the string(s) +? Name=MARY +? String(s)=AL/VI +? Name= +? (y/n) (y) All names +? (y/n) (y) Show results name by name +? (y/n) (y) List matches + + searching + matches= 12 +NAME SEQUENCE POSITION FRAGMENT LENGTHS +MARY VI 38 38 10 +MARY AL 63 25 13 +MARY VI 136 73 16 +MARY AL 177 41 19 +MARY AL 217 40 25 +MARY AL 233 16 37 +MARY AL 243 10 40 +MARY AL 256 13 41 +MARY AL 326 70 45 +MARY VI 345 19 51 +MARY AL 396 51 70 +MARY AL 470 74 73 + + +.END LIT + +.left margin1 +@18. TX 1 5 @Compare a sequence +.LEFT MARGIN2 +.para +This routine slides a short sequence along the current sequence and finds +all positions at which a given percentage of the amino acids match. +Output is in both graphical and listed forms. +.para +If users call for dialogue when the routine is selected they will be given +the choice of keyboard or file input. Define the string, and the percentage +match. Matches will be plotted out and then the user can select to have +them listed. Then the routine cycles around. +.para + The routine slides the search string +along the sequence and marks the positions at which a minimum +percentage score is reached. The graphical output draws a vertical line at +the match position; the height of the line represents the percentage +score, +so that if the line reaches the top of the box the score is 100%. 
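The sliding comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `percent_matches` is invented, and the way the percentage is turned into a required number of identical residues (truncating to a whole number) is a guess, since the help text does not state the rounding rule.

```python
def percent_matches(sequence, string, percent=70.0):
    """Slide `string` along `sequence` and return (position, score) pairs.

    The score is the number of identical residues in the window and the
    position is 1-based. Truncating the required score to a whole number
    is an assumption, not documented behaviour.
    """
    seq = sequence.upper()
    pat = string.upper()
    needed = max(1, int(len(pat) * percent / 100))
    hits = []
    for start in range(len(seq) - len(pat) + 1):
        window = seq[start:start + len(pat)]
        score = sum(1 for a, b in zip(pat, window) if a == b)
        if score >= needed:
            hits.append((start + 1, score))
    return hits


# Small made-up sequence, for illustration only.
hits = percent_matches("GGAIAGGAAAG", "aaa", 70.0)
```

For plotting, the height of each line would then be the score divided by the length of the string, so a full match reaches the top of the box.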
+.para +Typical dialogue follows. +.lit + +? Menu or option number=18 + Find percentage matches +? (y/n) (y) Keep picture + +? String=aaa +? Percent match (1.00-100.00) (70.00) = + + missing graphics + +Total scoring positions above 70.000 percent = 19 +Scores 2 2 2 2 2 2 2 2 2 2 +Positions 61 131 177 217 226 231 232 267 300 301 + +? Number to list (0-19) (0) =3 + + 61 + AIA + * * + aaa + 1 + + 131 + AIA + * * + aaa + 1 + + 177 + ALA + * * + aaa + 1 +? (y/n) (y) Keep picture n + +Default String=aaa +? String=! + +.end lit + +.left margin1 +@19. TX 1 5 @Compare a sequence using a score matrix +.LEFT MARGIN2 +.para +This routine slides a short sequence along the current sequence and finds +all positions at which a given level of similarity (a cutoff score) is +reached. The score is defined by use of a score matrix (MDM78). Output is +in both graphical and listed forms. +.para +If users call for dialogue when the routine is selected they will be given +the choice of keyboard or file input. Define the string and the cutoff +score. Matches will be plotted out and then the user can select to have +them listed. Then the routine cycles around. +.para + The routine slides the search string +along the sequence and marks the positions at which a the cutoff score +is achieved. The graphical output draws a vertical line at +the match position; the height of the line represents the score, +so that if the line reaches the top of the box the score is the maximum +possible. +.para +Typical dialogue follows. +.lit + +Menus and their numbers are +m0 = This menu +m1 = General +m2 = Screen control +m3 = Statistical analysis of content +m4 = Structure +m5 = Search + ? = Help + ! = Quit +? Menu or option number=19 + Find matches using a score matrix +? (y/n) (y) Keep picture + +? String=aaa +Minimum score= 12 Maximum score= 36 +? 
Score (12-36) (36) = + + missing graphics + +For score 24 the number of matches= 507 +scores 35 35 35 34 34 34 34 34 34 34 +positions 226 231 379 112 133 202 227 267 378 +380 + +? Number to list (0-507) (0) =3 + + 226 + ATA + * * + aaa + 1 + + 231 + SAA + ** + aaa + 1 + + 379 + GAA + ** + aaa + 1 +? (y/n) (y) Keep picture n + +Default String=aaa +? String=! +.end lit +.left margin1 +@20. TX 5 @Search for a motif using a weight matrix +.LEFT MARGIN2 +.para +This function performs searches for short sequence +motifs using an appropriate weight matrix. In addition it can be used to +create or modify weight matrices. In order to perform a search the only +input +required is the name of the file containing the weight matrix. +The results can be presented graphically or listed. The graphical +presentation will draw line at the position of any matches found; the +height of the line is proportional to the score. +.para +For a search, select "use weight matrix", supply the name of the file +containing the weight matrix, and choose between having results plotted +or listed. If dialogue is requested when the function is selected users can +alter the cutoff score employed. +.para +To create a weight matrix several steps are involved. A file containing an +alignment of known motifs is required. (This file must be created before +the current option is selected. The format is a follows: each sequence is +written on a separate line with at least one space at the beginning; each +sequence is terminated by a space character, and can be followed by a +name. The sequences must be aligned.) Supply the name of the file of +aligned sequences. The program reads and displays the sequences. Choose +between "summing logs of weights" or summing weights (i.e. whether to +multiply or add weights). If logs are used all scores will be negative. +Choose if all positions in the set of aligned sequences should be used or +if a mask should be applied. 
If so selected, define a mask as a string of +symbols, in which symbol - means ignore and any other symbol means +use. E.g. xx-x--abc means use all positions except 3,5 and 6. +.para +The program will calculate weights as the frequencies of each amino +acid at each unmasked position in the set of aligned sequences. These +weights are then applied to the set of aligned sequences to give a range +of "observed" scores. The mean and standard deviation of these scores are +displayed. The user is asked to supply several values to be used when the +weight matrix is applied to other sequences: a cutoff score (by default, +the mean minus 3 standard deviations), a top score for scaling graphical +results (by default, the mean plus 3 standard deviations), and a position +to identify (this means that if a particular amino acid within the motif +is used as a "landmark", such as the G of the helix-turn-helix motif, then +its position will be marked in plots). All these values are stored along +with the weight matrix. Finally, supply the name of a file to contain the +weight matrix. +.para +Weight matrices can be "rescaled" using a set of aligned sequences in +much the same way as a matrix is created. The purpose is to redefine +the cutoff scores, and rescaling does not alter any other values in the +weight matrix file. +.para +The methods have changed considerably but were first outlined in +Staden, R. Nucl. Acids Res. 12 505-519 1984, and +Staden, R. Genetic +engineering: principles and methods vol 7, Edited by J.K. Setlow and A. +Hollaender, Plenum publishing corp., 1985. +.para + The methods have always had to deal with the problem of zeroes in the +matrices. The current versions +employ "Laplace's Law of Succession" in which 1 is +added to each term. +.para +It is now possible to apply a mask to a set of aligned sequences in +order to give weight to selected positions only. 
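The steps described above (frequencies at each unmasked position, 1 added to each term, summing logs or summing weights, default cutoff at the mean minus 3 standard deviations) might be sketched as below. This is a minimal sketch, not the program's implementation: the function names, the 20-letter `ALPHABET`, the normalisation of counts into frequencies and the use of the sample standard deviation are all assumptions, because the help text does not specify them.

```python
import math
from statistics import mean, stdev

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed symbol set


def build_weights(aligned, mask=None, use_logs=True):
    """Return one weight table per position; masked positions get None."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # '-' means ignore; positions beyond the end of the mask are used.
    mask = (mask or "").ljust(length, "x")
    weights = []
    for pos in range(length):
        if mask[pos] == "-":
            weights.append(None)
            continue
        counts = {aa: 1 for aa in ALPHABET}  # add 1 to each term
        for s in aligned:
            counts[s[pos]] = counts.get(s[pos], 1) + 1
        total = sum(counts.values())
        column = {aa: c / total for aa, c in counts.items()}
        if use_logs:
            column = {aa: math.log(f) for aa, f in column.items()}
        weights.append(column)
    return weights


def score(weights, window):
    """Sum the weights for `window`, skipping masked positions."""
    total = 0.0
    for column, aa in zip(weights, window):
        if column is not None:
            # Symbols never tabulated get the smallest weight (assumption).
            total += column.get(aa, min(column.values()))
    return total


def default_cutoff(scores):
    """Mean minus 3 standard deviations (sample s.d. is an assumption)."""
    return mean(scores) - 3 * stdev(scores)


# First five residues of three of the aligned motifs, used as toy input.
aligned = ["QESVA", "QTKTA", "QAALG"]
w = build_weights(aligned, mask="--xxx")
observed = [score(w, s) for s in aligned]
cutoff = default_cutoff(observed)
```

When logs are summed every weight is the log of a value below 1, which is why all scores come out negative, as noted in the description of the option.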
+Sequences have superimposed functions: some parts may be of general +structural +importance and give rise to an overall framework, and other parts give +specificity and hence are not common; we may want to use a set of +aligned +sequences to define a motif, but want to use only the framework +positions. + Alternatively we may want to pick out +only those parts of a set of aligned sequences that give a particular +property, and to ignore other similarities that are due to some other +property +and which could obscure the pattern +we are interested in. The ability to define a mask allows certain +positions +to be used in the motif and others to be ignored, and yet still permits the +use of a set of aligned sequences to calculate weights. +.para +Typical dialogue is shown below. +.lit +? Menu or option number=20 +X 1 Use weight matrix + 2 Make weight matrix + 3 Rescale weight matrix +? 0,1,2,3 =2 +? Name of aligned sequences file=[rs.motifs]hth.seq + 1 QESVADKMGMGQSGVGALFN LAMBDA.REP + 2 QTKTAKDLGVYQSAINKAIH LAMBDA.CRO + 3 QAALGKMVGVSNVAISQWQR P22.REP + 4 QRAVAKALGISDAAVSQWKE P22.CRO + 5 QAELAQKVGTTQQSIEQLEN 434.REP + 6 QTELATKAGVKQQSIQLIEA 434.CRO + 7 RQEIGQIVGCSRETVGRILK CAP + 8 RGDIGNYLGLTVETISRLLG Fnr + 9 LYDVAEYAGVSYQTVSRVVN LAC.R + 10 IKDVARLAGVSVATVSRVIN GAL.R + 11 TEKTAEAVGVDKSQISRWKR LAMBDA.CII + 12 QRKVADALGINESQISRWKG P22.CI + 13 KEEVAKKCGITPLQVRVWCN MAT.ALPHA + 14 TRKLAQKLGVEQPTLYWHVK TETR.TN10 + 15 TRRLAERLGVQQPALYWHFK TETR.pSC1 + 16 QRELKNELGAGIATITRGSN TRP.REP + 17 RQQLAIIFGIGVSTLYRYFP H-INVERSN + 18 ATEIAHQLSIARSTVYKILE TN3.RESOL + 19 ASHISKTMNIARSTVYKVIN GD.RESOLV + 20 IASVAQHVCLSPSRLSHLFR ARA.C + 21 RAEIAQRLGFRSPNAAEEHL LEX.R +Length of motif 20 +? (y/n) (y) Sum logs of weights +? (y/n) (y) Use all motif positions n +x means use, - means ignore +e.g. xx-x---x-x means use positions 1,2,4,8,10 +? 
Mask=--xxxxxxxxxxxx------ + Applying weights to input sequences + 1 -57.143 QESVADKMGMGQSGVGALFN + 2 -55.087 QTKTAKDLGVYQSAINKAIH + 3 -58.079 QAALGKMVGVSNVAISQWQR + 4 -54.986 QRAVAKALGISDAAVSQWKE + 5 -55.181 QAELAQKVGTTQQSIEQLEN + 6 -55.874 QTELATKAGVKQQSIQLIEA + 7 -56.692 RQEIGQIVGCSRETVGRILK + 8 -57.722 RGDIGNYLGLTVETISRLLG + 9 -55.363 LYDVAEYAGVSYQTVSRVVN + 10 -55.769 IKDVARLAGVSVATVSRVIN + 11 -56.786 TEKTAEAVGVDKSQISRWKR + 12 -55.833 QRKVADALGINESQISRWKG + 13 -56.279 KEEVAKKCGITPLQVRVWCN + 14 -53.125 TRKLAQKLGVEQPTLYWHVK + 15 -55.833 TRRLAERLGVQQPALYWHFK + 16 -58.651 QRELKNELGAGIATITRGSN + 17 -56.749 RQQLAIIFGIGVSTLYRYFP + 18 -56.986 ATEIAHQLSIARSTVYKILE + 19 -60.618 ASHISKTMNIARSTVYKVIN + 20 -58.988 IASVAQHVCLSPSRLSHLFR + 21 -58.002 RAEIAQRLGFRSPNAAEEHL +Top score -53.125 Bottom score -60.618 +Mean -56.655 Standard deviation 1.617 +Mean minus 3.sd -61.505 Mean plus 3.sd -51.804 +? Cutoff score (-999.00-9999.00) (-61.51) = +? Top score for scaling plots (-61.51-999.00) (-51.80) = +? Position to identify (0-20) (1) =9 +? Title=hth +? Name for new weight matrix file=1.wts + +Menus and their numbers are +m0 = This menu +m1 = General +m2 = Screen control +m3 = Statistical analysis of content +m4 = Structure +m5 = Search + ? = Help + ! = Quit +? Menu or option number=20 +X 1 Use weight matrix + 2 Make weight matrix + 3 Rescale weight matrix +? 0,1,2,3 = + +? Motif weight matrix file=1.wts + hth +? (y/n) (y) Use frequencies as weights +? 
(y/n) (y) Plot results n + 5 -61.46 STEISELIKQRIAQFNVVSE + 13 -58.93 KQRIAQFNVVSEAHNEGTIV + 21 -60.42 VVSEAHNEGTIVSVSDGVIR + 57 -59.39 GNRYAIALNLERDSVGAVVM + 59 -61.47 RYAIALNLERDSVGAVVMGP + 79 -59.90 YADLAEGMKVKCTGRILEVP + 88 -61.41 VKCTGRILEVPVGRGLLGRV + 104 -60.38 LGRVVNTLGAPIDGKGPLDH + 127 -60.13 SAVEAIAPGVIERQSVDQPV + 129 -59.91 VEAIAPGVIERQSVDQPVQT + 133 -60.79 APGVIERQSVDQPVQTGYKA + 139 -61.12 RQSVDQPVQTGYKAVDSMIP + 175 -58.90 KTALAIDAIINQRDSGIKCI + 191 -60.95 IKCIYVAIGQKASTISNVVR + 195 -60.94 YVAIGQKASTISNVVRKLEE + 215 -60.66 HGALANTIVVVATASESAAL + 254 -60.56 EDALIIYDDLSKQAVAYRQI + 260 -60.08 YDDLSKQAVAYRQISLLLRR + 297 -61.00 LLERAARVNAEYVEAFTKGE + 314 -61.29 KGEVKGKTGSLTALPIIETQ + 330 -60.49 IETQAGDVSAFVPTNVISIT + 363 -57.63 GIRPAVNPGISVSRVGGAAQ + 365 -61.48 RPAVNPGISVSRVGGAAQTK + 371 -61.02 GISVSRVGGAAQTKIMKKLS + 382 -57.90 QTKIMKKLSGGIRTALAQYR + 394 -60.07 RTALAQYRELAAFSQFASDL + 424 -59.95 GQKVTELLKQKQYAPMSVAQ + 430 -58.89 LLKQKQYAPMSVAQQSLVLF + 432 -61.14 KQKQYAPMSVAQQSLVLFAA + 438 -58.58 PMSVAQQSLVLFAAERGYLA + 458 -61.06 DVELSKIGSFEAALLAYVDR + 466 -61.00 SFEAALLAYVDRDHAPLMQE + 483 -60.48 MQEINQTGGYNDEIEGKLKG + 494 -60.61 DEIEGKLKGILDSFKATQSW + +Menus and their numbers are +m0 = This menu +m1 = General +m2 = Screen control +m3 = Statistical analysis of content +m4 = Structure +m5 = Search + ? = Help + ! = Quit +? Menu or option number=d20 +X 1 Use weight matrix + 2 Make weight matrix + 3 Rescale weight matrix +? 0,1,2,3 = + +? Motif weight matrix file=1.wts + hth +? (y/n) (y) Use frequencies as weights +? Cutoff score (-9999.00-9999.00) (-61.51) =-56. +? (y/n) (y) Plot results n + + +.end lit +.left margin1 +@21. TX 3 @Calculate amino acid composition +.LEFT MARGIN2 +.para +This function calculates the amino acid composition and molecular +weight +for the active region. +.lit +? Menu or option number=21 + Sequence composition + +A C S T P A G N D E Q B Z H +N 3. 32. 23. 18. 57. 47. 16. 28. 31. 28. 0. 0. 7. 
+% 0.6 6.2 4.5 3.5 11.1 9.1 3.1 5.4 6.0 5.4 0.0 0.0 1.4 +W 309. 2786. 2325. 1748. 4051. 2682. 1826. 3222. 4003. 3588. 0. 0. +960. + +A R K M I L V F Y W - X ? +N 30. 24. 11. 40. 47. 41. 14. 15. 1. 0. 0. 0. 1. +% 5.8 4.7 2.1 7.8 9.1 8.0 2.7 2.9 0.2 0.0 0.0 0.0 0.2 +W 4686. 3076. 1443. 4527. 5319. 4065. 2060. 2448. 186. 0. 0. 0. +0. +Total molecular weight= 55328. + +.end lit +.left margin1 +@22. TX 3 4 @Plot hydrophobicity +.LEFT MARGIN2 +.para +This routine plots the hydrophobicity of each section of the sequence +using +the hydrophobicity +values of Kyte and Doolittle (J. Mol. Biol. 157, 105-132 (1982)). +A window of size span is slid along the sequence and a sum calculated +for +each position. +.para +If dialogue is requested select a span length and a plot interval. +.para +The diagrams are on the same scale as Fig. 6 of the Kyte and Doolittle +paper and values of + and - 50 could be assigned to the top and bottom of +the diagram with corresponding values in between (-40,-20,0,20,40 are +shown +in the paper). +.lit +? Menu or option number=d22 + Plot hydrophobicity +? odd span length (1-101) (11) = +? plot interval (1-101) (3) = + + missing graphics +.end lit +.LEFT MARGIN1 +@23. TX 3 4 @Plot charge +.LEFT MARGIN2 +.para +This routine plots the charge of each section of the sequence. +A window of size span is slid along the sequence and a sum calculated +for +each position. Amino acids are assigned charges of 1, -1 or 0. +.para +If dialogue is requested select a span length and a plot interval. +.para +Typical dialogue follows. +.lit + +? Menu or option number=d23 + Plot charge +? odd span length (1-101) (11) = +? plot interval (1-101) (3) = + + missing graphics + +.end lit +.LEFT MARGIN1 +@24. TX 4 @Plot robson prediction +.LEFT MARGIN2 +.para +This routine uses the method of Garnier J, Osguthorpe D J, and Robson B. +(1978) J. Mol. Biol. 120, 97-120 to predict secondary structures. 
The +method divides protein secondary structures into 4 classes: helix, +extended +(usually referred to as sheet), turn and coil. The routine calculates the +likelihood that each segment of the sequence lies in each of these +classes. Results are presented graphically or listed. +.para +If dialogue is requested choose between plotted or listed output. +.para +Each residue +has a +certain probability of being found in each of the 4 classes. This +probability +depends both on its own amino acid type and on the 8 +amino acids found to either side along the protein chain. Four tables of +weights, each 20 by 17 elements, are used to calculate the likelihood that +each residue along the chain falls into one of the four classes of +structure. The most likely structure at each point +is the one with the highest score. +The four values are plotted in strips labelled H, E, T and C. +Below, a strip labelled D for decision is divided into four levels, each +corresponding to one of the four structure types. Their top to bottom +order +is the same as that for the strips above, i.e. C, T, E, and H. For each +residue the program measures which of the four likelihoods is highest. It +places a single dot at the +mid-point of the corresponding strip, and +also at the +appropriate level in the strip labelled D. +.PARA +It should be noted that the method, when tested by Kabsch W and Sander +C, +(1983) FEBS Lett. 155, 179-182, although one of the better ones, was +correct for only about 56% of residues. +.para +Typical dialogue follows. +.lit +? Menu or option number=d24 + Plot Robson secondary structure predictions +?
(y/n) (y) Plot results n + + 9 S 217 -7 -39 15 + 10 E 226 5 -27 -39 + 11 L 233 -7 -26 -15 + 12 I 229 -23 9 4 + 13 K 214 -8 10 -8 + 14 Q 178 42 19 5 + 15 R 131 54 16 3 + 16 I 86 42 -31 -23 + 17 A 55 52 -30 -15 + 18 Q 15 67 4 25 + 19 F -34 86 47 74 + 20 N -41 74 17 106 + 21 V -16 118 -5 100 + 22 V 64 88 5 115 + 23 S 96 38 26 155 + 24 E 133 -25 13 96 + 25 A 118 -98 25 100 + 26 H 110 -150 37 86 + 27 N 57 -201 37 66 + 28 E 51 -140 11 -4 + 29 G 2 -77 37 9 + 30 T 2 28 28 7 + 31 I -11 117 -21 22 + 32 V -23 178 -55 5 + 33 S -54 193 -14 35 + 34 V -46 123 5 30 + 35 S -54 53 51 80 + 36 D -60 1 86 55 + 37 G -66 8 57 49 + 38 V -1 128 -30 -5 + 39 I 11 212 -56 -33 + 40 R 16 204 -44 -57 + ...etc + +.end lit +.LEFT MARGIN1 +@26. TX 4 @Draw a helix wheel +.LEFT MARGIN2 +.para +A helical representation of segments of the sequence is shown. The +display +includes a schematic of the helix showing the links between residues, +with +each vertex numbered according to position; the sequence element at +each +vertex; a symbol denoting a classification as hydrophobic(.), positively +charged(+), negatively charged(-), or otherwise( ). The +residue number of the first sequence element in +the current window is displayed at the top-left-hand +corner of the diagram. Also at the top-left corner the sequence in the +current window is listed. Below this is the total hydrophobicity and +hydrophobic moment for the window calculated according to Eisenberg et +al +J. Mol. Biol. 179, 125-142 (1984). +.para +If dialogue is requested the user is asked for the angle to define the turn +between residues as seen +looking along the helix, and a window length. The window length can be up +to 60, with default 18, and the angle has a default of 100 degrees. Note +that 18 x 100 is 5 turns. When the option is selected the first segment in +the current active region is displayed then the bell rings. 
If the user +types only return, the display will click on by one residue; if another +number is typed, say N, then the display will click forwards (or +backwards +if N is negative) by N residues. If the wheel runs off either end of the +sequence the option will be exited. +.para +Typical dialogue follows. +.lit +? Menu or option number=d26 +? Angle (1-130) (100) = +? Window (1-60) (18) = + + missing graphics + +.end lit +.left margin1 +@25. TX 3 4 @Plot hydrophobic moment +.LEFT MARGIN2 +.para +This routine plots hydrophobic moment and hydrophobicity according to +Eisenberg et al +J. Mol. Biol. 179, 125-142 (1984). The mean hydrophobicity per residue in +the window is plotted on a scale -1.0 to 1.5, and the mean hydrophobic +moment per residue on a scale 0.0 to 1.5. +The hydrophobicity is shown in the top frame with the +hydrophobic moment below. +The plot is arranged so that the +value shown at position x represents the mean value for residues x- +window+1 +to x, where window is the window length. +.para +If dialogue is requested the user can select a window +length, and the angle used for the hydrophobic moment +calculation. +.para +Note that according to Eisenberg et al, in transmembrane proteins an +"initiator" is required. This is either a very hydrophobic single helix +with mean hydrophobicity >=0.68, or a moderately hydrophobic pair of helices whose +mean hydrophobicities sum +to >= 1.1. Other helices are then accepted as transmembrane if their +mean hydrophobicity is >= +0.42. +.para +The following rules are claimed: if the mean hydrophobicity is < 0.51 and points lie below the +line = -0.392 + 0.603x they are "globular", if they lie above this +line they are "surface". If the mean hydrophobicity is > 0.51 and they lie above the line = +0.6 - 0.342x they are "monomeric", if above "multimeric". +.para +Typical dialogue follows. +.lit + +? Menu or option number=d25 +? Angle (1-130) (100) = +? Window (1-60) (18) = +? Plot interval (1-101) (3) = + + missing graphics + + +.end lit +.left margin1 +@27.
TX 1 @Back translate to dna +.LEFT MARGIN2 +.para +This routine back translates protein sequences into DNA using the +standard +genetic code. The level of redundancy can be plotted and the +back translation saved to a file. +.para +The translation can use either the IUB symbols shown below, or a set of +codon +preferences. If a set of codon preferences is used it must conform to +the format of codon tables produced by the nucleotide analysis +program, and the back +translation +will contain the favoured codons. If there is no favoured codon +the IUB symbols will be employed. The window length for +plotting the redundancy is in codons. +.para +The program will plot the redundancy along the sequence and hence can +be +used to find the best sequences to use as primers. Note that the program +plots the inverse, and so the higher the +plot the LESS redundant the sequence. For primers look for peaks rather +than +troughs. +.para +The DNA sequence can be saved to a file and analysed using the nucleotide +analysis program. +Depending on the application it is often useful to produce two back +translations, one using a table of codon preferences and one using the IUB +symbols. This is because the restriction enzyme search program can +distinguish between definite and possible cuts in the sequence. +What the program terms "definite matches" are +ones in +which the specification of the recognition sequence corresponds +exactly to that of the back translation. The program will also find what +it +terms "possible matches" which are ones that depend on the particular +codons +chosen for each amino acid. +These are sites at which recognition +sequences could be engineered to produce a cut in the DNA +without changing the amino +acid, but which are not +necessarily found in the original sequence.
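As an illustration only, the IUB back translation can be sketched in a few lines of Python. The codon table is read off the typical dialogue for this option (for example M=ATG, Q=CAR, L=YTN). The function names and the windowed inverse-redundancy measure are assumptions made for this sketch and are not taken from the program itself.

```python
# Degenerate (IUB) codon used for each amino acid, as seen in the sample
# back translation in this section (e.g. S=WSN, L=YTN, R=MGN).
IUB_CODON = {
    "A": "GCN", "C": "TGY", "D": "GAY", "E": "GAR", "F": "TTY",
    "G": "GGN", "H": "CAY", "I": "ATH", "K": "AAR", "L": "YTN",
    "M": "ATG", "N": "AAY", "P": "CCN", "Q": "CAR", "R": "MGN",
    "S": "WSN", "T": "ACN", "V": "GTN", "W": "TGG", "Y": "TAY",
}

# Number of bases that each NC-IUB symbol stands for.
SYMBOL_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "W": 2, "S": 2, "M": 2, "K": 2,
    "H": 3, "B": 3, "V": 3, "D": 3,
    "N": 4,
}


def back_translate(protein):
    """Return the IUB-symbol back translation of a protein sequence."""
    return "".join(IUB_CODON[aa] for aa in protein.upper())


def codon_redundancy(codon):
    """Number of distinct base triplets a degenerate codon can stand for."""
    count = 1
    for symbol in codon:
        count *= SYMBOL_SIZE[symbol]
    return count


def inverse_redundancy(protein, window=3):
    """Mean of 1/redundancy over each window of codons.

    Hypothetical measure for this sketch: higher values mean LESS
    redundant stretches, which is the sense in which the plot is read.
    """
    inv = [1.0 / codon_redundancy(IUB_CODON[aa]) for aa in protein.upper()]
    return [sum(inv[i:i + window]) / window
            for i in range(len(inv) - window + 1)]


dna = back_translate("MQLNSTEISELIKQRIAQFN")
```

Peaks in the list returned by `inverse_redundancy` would correspond to the stretches worth considering as primers, under the assumption that the window mean is a fair stand-in for what the program plots.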
+.LIT + + + NC-IUB SYMBOLS + + A,C,G,T + R (A,G) 'puRine' + Y (T,C) 'pYrimidine' + W (A,T) 'Weak' + S (C,G) 'Strong' + M (A,C) 'aMino' + K (G,T) 'Keto' + H (A,T,C) 'not G' + B (G,C,T) 'not A' + V (G,A,C) 'not T' + D (G,A,T) 'not C' + N (G,A,C,T) 'aNy' + + Typical dialogue follows. + +? Menu or option number=d27 + Back translate +? (y/n) (y) No codon preference +? (y/n) (y) Plot redundancy n +? (y/n) (y) Save DNA to disk +? File name for DNA sequence=tt: +ATGCARYTNAAYWSNACNGARATHWSNGARYTNATHAARCARMGNATHGCNCARTTYAAY +GTNGTNWSNGARGCNCAYAAYGARGGNACNATHGTNWSNGTNWSNGAYGGNGTNATHMGN +ATHCAYGGNYTNGCNGAYTGYATGCARGGNGARATGATHWSNYTNCCNGGNAAYMGNTAY +GCNATHGCNYTNAAYYTNGARMGNGAYWSNGTNGGNGCNGTNGTNATGGGNCCNTAYGCN +GAYYTNGCNGARGGNATGAARGTNAARTGYACNGGNMGNATHYTNGARGTNCCNGTNGGN +MGNGGNYTNYTNGGNMGNGTNGTNAAYACNYTNGGNGCNCCNATHGAYGGNAARGGNCCN +YTNGAYCAYGAYGGNTTYWSNGCNGTNGARGCNATHGCNCCNGGNGTNATHGARMGNCAR +WSNGTNGAYCARCCNGTNCARACNGGNTAYAARGCNGTNGAYWSNATGATHCCNATHGGN +MGNGGNCARMGNGARYTNATHATHGGNGAYMGNCARACNGGNAARACNGCNYTNGCNATH +GAYGCNATHATHAAYCARMGNGAYWSNGGNATHAARTGYATHTAYGTNGCNATHGGNCAR +AARGCNWSNACNATHWSNAAYGTNGTNMGNAARYTNGARGARCAYGGNGCNYTNGCNAAY +ACNATHGTNGTNGTNGCNACNGCNWSNGARWSNGCNGCNYTNCARTAYYTNGCNMGNATG +CCNGTNGCNYTNATGGGNGARTAYTTYMGNGAYMGNGGNGARGAYGCNYTNATHATHTAY +GAYGAYYTNWSNAARCARGCNGTNGCNTAYMGNCARATHWSNYTNYTNYTNMGNMGNCCN +CCNGGNMGNGARGCNTTYCCNGGNGAYGTNTTYTAYYTNCAYWSNMGNYTNYTNGARMGN +GCNGCNMGNGTNAAYGCNGARTAYGTNGARGCNTTYACNAARGGNGARGTNAARGGNAAR +ACNGGNWSNYTNACNGCNYTNCCNATHATHGARACNCARGCNGGNGAYGTNWSNGCNTTY +GTNCCNACNAAYGTNATHWSNATHACNGAYGGNCARATHTTYYTNGARACNAAYYTNTTY +AAYGCNGGNATHMGNCCNGCNGTNAAYCCNGGNATHWSNGTNWSNMGNGTNGGNGGNGCN +GCNCARACNAARATHATGAARAARYTNWSNGGNGGNATHMGNACNGCNYTNGCNCARTAY +MGNGARYTNGCNGCNTTYWSNCARTTYGCNWSNGAYYTNGAYGAYGCNACNMGNAARCAR +YTNGAYCAYGGNCARAARGTNACNGARYTNYTNAARCARAARCARTAYGCNCCNATGWSN +GTNGCNCARCARWSNYTNGTNYTNTTYGCNGCNGARMGNGGNTAYYTNGCNGAYGTNGAR +YTNWSNAARATHGGNWSNTTYGARGCNGCNYTNYTNGCNTAYGTNGAYMGNGAYCAYGCN
+CCNYTNATGCARGARATHAAYCARACNGGNGGNTAYAAYGAYGARATHGARGGNAARYTN +AARGGNATHYTNGAYWSNTTYAARGCNACNCARWSNTGG--- + + +.end lit + +.LEFT MARGIN1 +@28. TX 5 @Search for patterns of motifs +.left margin2 +.para +This option searches for patterns of motifs. Patterns can be defined +interactively or read from files. Results can be displayed in several ways +in both graphical and textual form. The option is also used to create pattern files for +searching libraries. The option is extremely flexible and consequently the +following documentation is quite lengthy. However the routine is capable +of searching for almost any known pattern. In addition the flexibility +does not necessitate difficulty of use, and the user interface has been +simplified considerably since the methods were first published. +.para +Users should refer to the "typical dialogue" shown below for the most +helpful information on using the program. +.para +There are currently +four ways to display the matching patterns: 1=each individual +motif and its position is listed; 2=all the sequence between, and +including, the two +outermost motifs is listed; 3=graphical, with a vertical line marking the +position +of the leftmost motif; 4 = EMBL feature table format, where the KEYNAM +field is the motif name, the FROM and TO fields denote the ends of the +match, and the DESCRIPTION field is "Program". +.para +When it is defined for the first time a pattern must be entered +interactively at the keyboard, but the pattern description +can be saved to a file. +This file can be used for all subsequent searches. +.para +When defining a pattern interactively +select a motif class and the program will request the required inputs. +.para +The program gives each motif an identifying name and number.
+For motifs other than the first, a range of allowed positions must be +defined (Note that sets of motifs included using the OR operator will all +be given the same range, and so the program will only request range +values +for the first motif in any such set). +To specify the allowed range for a motif the user must supply the +following: the +identifying number of the motif relative to which the current motif's +positions are to be defined (termed the "reference motif"); a "relative start +position"; and a range. The relative start position can be negative or positive. +A negative start position means that although the reference motif +is searched for first, the current motif can be found to its left. +A zero relative start position means their left ends are superimposed. The +default start position is to butt-join the motif to the right-hand end of the +"reference motif". The range is "the number of extra positions" that the +motif can take. +.para +The program will display the probability of finding each motif. These +values are presented in the following form: .1234E-5 means 0.1234 times +10 +to the power -5. +.para +After the pattern has been defined, the program will type a description +of +it on the screen. It will then allow the user to give an overall cutoff +score and overall probability cutoff. +.para +Typical dialogue for all the different motif classes is displayed below. +.lit + +? Menu or option number=28 + Pattern searcher +? (y/n) (y) Read pattern from keyboard +X 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Direct repeat + 6 Membership of set + 7 Pattern complete +? 0,1,2,3,4,5,6,7 = +? Motif name=aa +? String=aa +Probability of score 2.0000 = 0.123E-01 +X 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Direct repeat + 6 Membership of set + 7 Pattern complete +? 0,1,2,3,4,5,6,7 =2 +? Motif name=pmatch +X 1 And + 2 Or + 3 Not +?
0,1,2,3 = +? Number of reference motif (1-1) (1) = +? Relative start position (-1000-1000) (3) = +? Number of extra positions (0-1000) (0) = +? String=qqq +? Minimum matches (1.00-3.00) (3.00) =2 +Probability of score 2.0000 = 0.858E-02 + 1 Exact match +X 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Direct repeat + 6 Membership of set + 7 Pattern complete +? 0,1,2,3,4,5,6,7 =3 +? Motif name=sm +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-2) (2) = +? Relative start position (-1000-1000) (4) = +? Number of extra positions (0-1000) (0) = +? String=wqa +? Minimum score (11.00-53.00) (53.00) =36 +Probability of score 36.0000 = 0.531E-02 + 1 Exact match + 2 Percentage match +X 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Direct repeat + 6 Membership of set + 7 Pattern complete +? 0,1,2,3,4,5,6,7 =4 +? Motif name=hth +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-3) (3) = +? Relative start position (-1000-1000) (4) = +? Number of extra positions (0-1000) (0) = +? Weight matrix file name=hth + HELIX TURN HELIX PABO SAUER WEIGHTS 17-11-87 +Probability of score -51.5860 = 0.230E-04 + 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix +X 4 Cut-off score and weight matrix + 5 Direct repeat + 6 Membership of set + 7 Pattern complete +? 0,1,2,3,4,5,6,7 =5 +? Motif name=repeat +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-4) (4) = +? Relative start position (-1000-1000) (21) = +? Number of extra positions (0-1000) (0) =3 +? Repeat length (1-60) (6) =3 +? Minimum gap (0-60) (0) = +? Maximum gap (0-60) (0) =2 +? Minimum score (11.00-60.00) (36.00) = +Probability of score 36.0000 = 0.445E-01 + 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix +X 5 Direct repeat + 6 Membership of set + 7 Pattern complete +? 0,1,2,3,4,5,6,7 =6 +? 
Motif name=mset +X 1 And + 2 Or + 3 Not +? 0,1,2,3 = +? Number of reference motif (1-5) (5) = +? Relative start position (-1000-1000) (1) = +? Number of extra positions (0-1000) (0) = +X 1 Keyboard input + 2 File input +? 0,1,2 = +Separate sets with commas +? String=AVL,AST,,WYRF +? Minimum matches (1.00-4.00) (4.00) =3 +Probability of score 3.0000 = 0.718E-02 + 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Direct repeat +X 6 Membership of set + 7 Pattern complete +? 0,1,2,3,4,5,6,7 =7 +? (y/n) (y) Save pattern in a file +? Pattern definition file=EXAM.PAT +Motif 6 needs a file name to store set as a weight matrix +? Weight matrix file name=DEMO.WTS +Weight matrix needs a title +? Title=Demonstration class 6 weight matrix + +Pattern description + +Motif 1 named aa is of class 1 +Which is an exact match to the string +aa +Motif 2 named pmatch is of class 2 +which is a match of score 2. to the string +qqq +and the N-terminal residue can take positions 3 to 3 +relative to the N-terminal end of motif 1 +It is anded with the previous motif. +Motif 3 named sm is of class 3 +which is a match of score 36. to the string +wqa +and the N-terminal residue can take positions 4 to 4 +relative to the N-terminal end of motif 2 +It is anded with the previous motif. +Motif 4 named hth is of class 4 +Which is a match to a weight matrix with score -51.586 +and the N-terminal residue can take positions 4 to 4 +relative to the N-terminal end of motif 3 +It is anded with the previous motif. +Motif 5 named repeat is of class 5 +Which is a repeat with repeat length 3 and score 36. +The loop-out can have sizes 0 to 2 +and the N-terminal residue can take positions 21 to 24 +relative to the N-terminal end of motif 4 +It is anded with the previous motif. +Motif 6 named mset is of class 6 +Which is membership of a set with score 3.000 +It is anded with the previous motif. 
+Probability of finding pattern = 0.4109E-14 +Expected number of matches = 0.2539E-10 +? Maximum pattern probability (0.00-1.00) (1.00) = +? Minimum pattern score (-9999.00-9999.00) (-9999.00) = + Select display mode +X 1 Motif by motif + 2 Inclusive + 3 Graphical + 4 EMBL feature table +? 0,1,2,3,4 = + Searching + +Total matches found 0 +Menus and their numbers are +m0 = This menu +m1 = General +m2 = Screen control +m3 = Statistical analysis of content +m4 = Structure +m5 = Search + ? = Help + ! = Quit +? Menu or option number=6 +Page through text files +? Name of file to read=exam.pat + A1 aa Class + aa + @ End of string + A2 pmatch Class + 1 Relative motif + 3 Relative start position + 0 Number of extra positions + qqq + @ End of string + 2.00000 Cutoff + A3 sm Class + 2 Relative motif + 4 Relative start position + 0 Number of extra positions + wqa + @ End of string + 36.00000 Cutoff + A4 hth Class + 3 Relative motif + 4 Relative start position + 0 Number of extra positions +hth File name + A5 repeat Class + 4 Relative motif + 21 Relative start position + 3 Number of extra positions + 3 Length + 0 Minimum loop + 2 Maximum loop + 36.00000 Cutoff + A6 mset Class + 5 Relative motif + 1 Relative start position + 0 Number of extra positions +DEMO.WTS File name +End of file +Menus and their numbers are +m0 = This menu +m1 = General +m2 = Screen control +m3 = Statistical analysis of content +m4 = Structure +m5 = Search + ? = Help + ! = Quit +? Menu or option number=6 +Page through text files +? 
Name of file to read=demo.wts + Demonstration class 6 weight matrix + 4 0 3.000 4.000 + P 1 2 3 4 + N 0 0 0 0 + C 0 0 0 0 + S 0 1 0 0 + T 0 1 0 0 + P 0 0 0 0 + A 1 1 0 0 + G 0 0 0 0 + N 0 0 0 0 + D 0 0 0 0 + E 0 0 0 0 + Q 0 0 0 0 + B 0 0 0 0 + Z 0 0 0 0 + H 0 0 0 0 + R 0 0 0 1 + K 0 0 0 0 + M 0 0 0 0 + I 0 0 0 0 + L 1 0 0 0 + V 1 0 0 0 + F 0 0 0 1 + Y 0 0 0 1 + W 0 0 0 1 +End of file +Menus and their numbers are +m0 = This menu +m1 = General +m2 = Screen control +m3 = Statistical analysis of content +m4 = Structure +m5 = Search + ? = Help + ! = Quit +? Menu or option number=28 + Pattern searcher +? (y/n) (y) Read pattern from keyboard +X 1 Exact match + 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Direct repeat + 6 Membership of set + 7 Pattern complete +? 0,1,2,3,4,5,6,7 =2 +? Motif name=avlst +? String=avlst +? Minimum matches (1.00-5.00) (5.00) =3 +Probability of score 3.0000 = 0.394E-02 + 1 Exact match +X 2 Percentage match + 3 Cut-off score and score matrix + 4 Cut-off score and weight matrix + 5 Direct repeat + 6 Membership of set + 7 Pattern complete +? 0,1,2,3,4,5,6,7 =7 +? (y/n) (y) Save pattern in a file n + +Pattern description + +Motif 1 named avlst is of class 2 +which is a match of score 3. to the string +avlst +Probability of finding pattern = 0.3941E-02 +Expected number of matches = 0.2030E+01 +? Maximum pattern probability (0.00-1.00) (1.00) = +? Minimum pattern score (-9999.00-9999.00) (-9999.00) = + Select display mode +X 1 Motif by motif + 2 Inclusive + 3 Graphical + 4 EMBL feature table +? 0,1,2,3,4 =4 + Searching + +FT avlst 152 156 Program +Total matches found 1 +Minimum and maximum observed scores 3.00 3.00 + +.end lit +.para +General notes +.para +These methods allow users to define and search for +complex patterns of motifs defined as single objects. +The programs allow individual DNA motifs to be defined in eight +different +ways, and protein motifs in six. 
Motifs are combined, using the logical +operators AND, OR and NOT, to describe a pattern. The pattern also +specifies the ranges of allowed relative separations of the individual +motifs. +.para +First some definitions. +.para +A MOTIF is a contiguous subsequence of fixed length. +At its simplest +it could be a single definite base or amino acid; a more complex motif +might be better represented as a consensus or a weight matrix; +two more-abstract types of +motif are direct and inverted repeats. +.para +A PATTERN is a higher order of structure defined by a list of motifs. The +motifs in a pattern are combined using the logical operators AND, OR and +NOT. The list also defines the allowed relative separations of the +motifs. In the current versions of the programs up + to 50 motifs can be combined into a single pattern. So using these +definitions there are two +differences between motifs and patterns: 1) the distances between all +elements of a motif are fixed, but +the separations of parts of patterns can vary; + 2) all characters in a motif are defined +using the same method (class), but different parts of a pattern can be +defined in completely different ways. +.para +Each motif +can be represented in 9 ways (known as the motif class): +.sk1 +.lit + MOTIF CLASSES +CLASS DESCRIPTION + 1 Exact match to a short defined sequence. The IUB symbols + can be used for DNA sequences. + 2 Percentage match to a defined short sequence. In nucleic acids, + the IUB symbols can be used. + 3 Match to a defined sequence, using a score matrix and cutoff + score. The DNA matrix (see option 18) gives scores to IUB symbols + depending on their level of redundancy. MDM78 is used for proteins. + 4 Match to a weight matrix with cutoff score. + 5 As class 4 but on the complementary strand. + 6 Inverted repeat or stem-loop. Fixed stem length, range of + loop sizes, and cutoff score using A-T, G-C=2; G-T=1. + 7 Exact match to short sequence but with a defined step size. + 8 Direct repeat. 
Fixed repeat length, range of loop-out sizes, + cutoff score, and score matrix (for protein sequences MDM78 and + for nucleic acids an identity matrix). + 9 Membership of a set. A list of sets of allowed amino acids for + each position in the motif. The sets are separated by commas(,). + For example IVL,,,DEKR,FYWILVM defines a motif of length 5 amino + acids in which one of I,V or L must be found in the first position, + then anything in the next two positions, D,E,K or R in the fourth + position and F,Y,W,I,L,V or M in the fifth. This class only applies + to protein sequences because for nucleic acids "membership of a +set" + can be achieved using IUB symbols. + + Classes 1 - 4, 8 and 9 apply to protein sequences, and classes 1-8 to + nucleic acids. + +.end lit +.para +Class 1: exact match. +.para +The motif is defined by a short sequence, which for nucleic acids, + may include IUB symbols. All symbols must match. +.para +Class 2: percentage match +.para +The motif is defined by a short sequence, which for nucleic acids, +may include IUB symbols. The minimum number of matching characters +must +also be specified. +.para +Class 3: match using a score matrix +.para +The motif is defined by a short sequence, which for nucleic acids, +may include IUB symbols. The motif is not compared directly with the +sequence to count the number of matching characters. Instead a matrix is +used to provide a score for all possible pairs of characters. The motif +score for +any position along the sequence is the sum of the scores found by +looking-up the scores for each pair of aligned characters. A match is +declared if some minimum score is achieved. +.para +Class 4: weight matrix +.para +The motif is defined by a table of values (called weights or scores). The +table gives a score for finding each possible character at each position +along the length of the motif. 
It therefore +has dimension motif-length x character-set-size, and allows us to give +different scores for each character at each position. It is equivalent to +having a different score matrix for each position along the motif, and +provides the most flexible and specific method of defining motifs. The +weight matrices are created by program PIP option 20 and +stored as files. The file contains the values +for each position, as well as an overall minimum score. +There are two ways in which these values can be used to calculate an +overall +score for any section of the sequence. The simplest way is to add the +values in the file. (This means that the highest possible score +can be calculated by adding the top value at each column +position, and the lowest +by adding the bottom value.) +The normal way of using the values in the file is as +follows. +First the programs divide the values in each column by the column total +so +that they sum to 1.0. +Then the natural +logs of these values are used as scores. When the matrix is applied to a +sequence these logarithmic values are summed (which is of course +equivalent +to multiplying the frequencies). +Note that using the natural logs of the frequencies as +weights and +adding them means that the overall cutoff score must be less than zero, +whereas if the original +values in the weight matrix file are added, the cutoff score will be +greater than zero. The search routines therefore decide whether the user +wants to add values or multiply frequencies +by examining the value of the cutoff score: they will add the values if the cutoff +is +greater than zero and add the logs of the frequencies if it is less than zero. +Hence we effectively get two +motif classes in one. The program PIP, when creating weight matrix +files, will ask the user whether the scores should be added or multiplied.
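The two ways of using a weight matrix described above can be sketched as follows. This is a minimal illustration, not the program's own code: the matrix layout (a list of columns, each mapping a character to its value), the function names and the tiny two-column example are all assumptions, and every value is assumed to be positive so that its logarithm exists.

```python
import math


def window_score(matrix, window, cutoff):
    """Score one window of sequence against a weight matrix.

    The sign of the cutoff selects the method, as described in the text:
    a cutoff above zero adds the stored values, otherwise the natural
    logs of the column frequencies are added. A cutoff of exactly zero
    is not covered by the text; this sketch treats it as the log case.
    """
    if cutoff > 0:
        return sum(column[ch] for column, ch in zip(matrix, window))
    return sum(math.log(column[ch] / sum(column.values()))
               for column, ch in zip(matrix, window))


def search(matrix, sequence, cutoff):
    """Return (position, score) for each window scoring at least cutoff.

    Positions are numbered from 1.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = window_score(matrix, sequence[start:start + width], cutoff)
        if score >= cutoff:
            hits.append((start + 1, score))
    return hits


# Hypothetical two-position matrix over a two-letter alphabet.
matrix = [{"A": 3, "G": 1}, {"A": 1, "G": 3}]

added = search(matrix, "AGGA", cutoff=5)       # values are added
logged = search(matrix, "AGGA", cutoff=-1.0)   # logs of frequencies are added
```

In both calls only the window starting at position 1 passes. The added score is always positive and the summed-log score always negative, which is why the sign of the cutoff is enough to tell the two methods apart.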
+ If the values in the table have been defined +without using a set of aligned sequences +it is easier for the user to +choose a cutoff score if the values are added. +.para +Class 5: complement of weight matrix +.para +The motif is defined by a weight matrix, but the program searches for its +complement. +.para +Class 6: inverted repeat, or stem-loop +.para +The motif is defined by a repeat length, a minimum score + and a range of loop sizes. The scores are A-T=2, G-C=2, G-T=1, else=0. +The loop sizes are defined by a minimum +and maximum distance from the 3' end of the stem. +For a stem-loop these will be positive numbers. For example to +define a stem of length 8 and loop sizes varying from 3 to 5, the stem +would be set to 8, the minimum start distance to 3 and the maximum +to 5. To define an +inverted repeat the minimum distance will be negative. For example stem +length=9, +minimum distance=-9, and maximum distance=-8 will find +inverted repeats of lengths 9 and 10. +E.g. AAAAATTTT and AAAAATTTTT would be found, the first having a base +at +its centre, the second having none. +.para +Class 7: exact match, defined step size. +.para +The motif is defined by a short sequence, which for nucleic acids, + may include IUB symbols. All symbols must match. The class differs +from +class 1 in that searches will move in steps of some given size. For +example +we could search for a certain codon and use a step size of 3 and hence + keep in a +single reading frame. +.para +Class 8: direct repeat +.para +The motif is defined by a repeat length, a minimum score + and a range of loop sizes. The scores are defined using MDM78 for protein +sequences and an identity matrix for nucleic acids. +The loop sizes are defined by a minimum +and maximum distance from the 3' end of the stem. +.para +Class 9: membership of a set +.para +This motif class is for protein sequences. It is defined by lists of +allowed amino acids for each position in the motif, and a cut-off score. 
+Positions at which any amino acid can occur are left blank. +All allowed amino acids for each position give a score of 1. +The motifs can be defined in two ways: either typed at the keyboard or +read +in as a weight-matrix-like file. +When the motif is defined at the keyboard the sets of allowed amino +acids +are separated by commas(,). + For example IVL,,,DEKR,FYWILVM defines a motif of length 5 amino + acids in which one of I,V or L must be found in the first position, + then anything in the next two positions, D,E,K or R in the fourth + position and F,Y,W,I,L,V or M in the fifth. To specify that the +whole motif must match a score of 3 would be required (i.e. one of the +allowed amino acids must be found for each of the three defined +positions). +If the motif is read from a file the file must have been written by +program +PIP, or have been saved by the pattern searching routines. If the +user +elects to save a pattern, and it includes class 9 motifs typed at the +keyboard, then the program will save the class 9 motifs as weight matrix +files. Therefore it will request file names for each motif of this class. +If the motif given above as an example were saved the weight matrix file +would have 5 columns. +The first column +would contain zeroes except for the I, V and L rows +which would be set to 1; the next two columns would all be zero; the next +would be zero except for the D,E,K and R rows which would be 1; the final +column would contain 1's in rows F,Y,W,I,L,V and M, with +the rest zero. +.para + +The logical operator (AND, OR or NOT) used to add each motif to the +pattern +is specified by preceding +the class number by the letters A, O or N. A = AND, O = OR, N = NOT. +The default is A, so N2 means include, using the NOT operator, a class 2 +motif; O2 means include, using the OR operator, a class 2 motif; both A2 +and +2 mean include, using the AND operator, a class 2 motif. + +.para +Range setting. 
+.para +The motifs in a pattern are numbered according to their order in the list. +Apart from the first motif in a pattern all motifs are given a range +of allowed positions relative to a motif further up the list. +For example +suppose we have a pattern defined by A AND B AND C AND D. +Motif A can occur anywhere, but B must have its range of allowed +positions defined relative to the position of motif A, and C's positions +can be defined relative to either A or B, depending on which is most +convenient, and likewise D's positions can be relative to A or B or C. +.para +Notice that the positions of motifs can be defined relative to more than +one motif. Suppose we have a pattern consisting of +motifs A, B and C, and that B occurs 5-10 residues right of A, C occurs 5- +10 +residues right of B, and also C is never more than 15 residues from A. +Then +it is quite consistent with the methods to include motif C into the +pattern +twice using the AND operator: once relative to A and once relative to B. +This will define the relative spacing and the ORDER of the motifs in the +pattern. (If we simply defined the position of C relative to A it could be +found to the left of B). +.para +Motifs combined together using the OR operator are all given the same +range. For example suppose we had a pattern A AND (B OR C) AND (D OR E), + then B and C each have the same range, and D and E also have +the same range as one another. The range for D and E can be relative to +A or to B. +.para +Motifs cannot have their ranges defined relative to motifs that are +included using the NOT operator. For example if we had the pattern A NOT +B +AND C, then the range for C can only be defined relative to motif A. +.para +Speed can be gained by arranging the order +of the motifs so that those higher up the list are of types that can be +searched for rapidly and that are also unlikely to be found. 
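The range rules above can be illustrated with a small sketch. It is not the search routine used by the programs: the function names `in_range` and `find_patterns`, and the representation of matches as plain lists of start positions, are assumptions made purely for illustration. It encodes the A, B, C example, in which B lies 5-10 residues right of A, C lies 5-10 residues right of B, and C is additionally constrained to lie no more than 15 residues right of A (the lower bound of 0 for that last range is also an assumption).

```python
def in_range(anchor, pos, lo, hi):
    """True if pos lies between lo and hi residues to the right of anchor."""
    return lo <= pos - anchor <= hi


def find_patterns(a_hits, b_hits, c_hits):
    """Return (a, b, c) position triples that satisfy the example pattern.

    Motif C is constrained twice, once relative to B and once relative
    to A, mirroring the idea of including C in the pattern twice with AND.
    """
    matches = []
    for a in a_hits:
        for b in b_hits:
            # B must be 5-10 residues right of A.
            if not in_range(a, b, 5, 10):
                continue
            for c in c_hits:
                # C must be 5-10 right of B AND at most 15 right of A.
                if in_range(b, c, 5, 10) and in_range(a, c, 0, 15):
                    matches.append((a, b, c))
    return matches


# Hypothetical match positions for each motif.
hits = find_patterns(a_hits=[100], b_hits=[106, 110], c_hits=[112, 115, 118])
print(hits)
```

Because C is tested against both A and B, a candidate such as position 118 is rejected even when it is correctly spaced from B, which is the effect the double inclusion is meant to achieve.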
+.para +Motifs combined by the OR operator are alternatives: if any one of a set +of motifs +combined by the OR operator is found, then a match is declared. All +alternatives will be reported. For example if we had a pattern defined by +A +AND (B OR C), then all places where A occurs and B is found within range, +and all places where A is found and C is found within range will be +reported. A typical use would be where we might allow a motif to appear +on +either strand of the DNA sequence. For example a weight matrix +representing +the heatshock element could be used in a pattern which included +heatshock +as a motif class 4 combined using the OR operator +with heatshock as a motif class 5. +.para +The probability calculations are performed for each motif as it is +defined. +If an overall probability cut-off is given the calculation is repeated for +each match found. To achieve maximum searching speed do not give an +overall +probability cut-off. Overall cut-off scores should only be used if the +motif +classes used are compatible. +.para +There are currently +several ways to display the matches: 1 = each +motif and its position is listed; 2 = all the sequence between the two +outermost motifs is listed; 3 = graphical, with a spike marking the +position +of the leftmost motif. The library versions also give entry names, and a +one +line title; in addition they can be used to produce aligned families of +sequences. When this mode of output is selected the program will write a +separate file for each match. The files will be called ENTRYNAME.DAT +where +ENTRYNAME is the name of the entry in the library. The matching +sequence +will be written out so that the spacing between motifs is constant, and +set to the maximum allowed by the pattern definition. Any gaps will be +filled with dashes (-). If the individual sequences were subsequently +written one above the other +they should line up so that all motifs are in register. 
There are two types of +output of this sort: one, option 4, writes out whole sequences, the other, +option 5, writes out only the sequences between the two outermost +motifs. +Note that for option 4 users are asked to type the position of the +first motif, and the reason for +this is explained below. +Consider a pattern found in several sequences. Consider only +the first motif in +the pattern and suppose that it was found in different positions in these +sequences. +Say that of these positions the one furthest from the left end was +position 100. Then, in order to ensure that all the sequences would align, +we must specify that motif 1 must start at position 100. +Any sequences in which motif 1 started +nearer to the left end than position 100 would be padded accordingly. +These modes of output +should only be used when the position of each motif is defined relative to +its +immediate neighbour. +.para +The pattern descriptions can be saved to files. These files +can be used instead of typing definitions again at the keyboard. As the +files are annotated, +they can easily +be changed using system editors, and the modified versions used to +define the variant patterns for the programs. +.para +.para +Use of lists of entry names +.para +The two programs that operate on libraries have the ability to +restrict their searches to subsets of the libraries. This does not require +sublibraries to be created but instead is achieved by using files +containing a list of the entry names of sequences. The user may choose to +search only those entries on the list or, alternatively, to search all but +those on the list (i.e.
in the latter case +the list contains the names of those to be excluded). + The programs can search libraries that have indexes and those that +do not. + If a list of names for inclusion is used, +then the search will be faster if the index is present. In all other +circumstances the whole library will be read. +The list must be in library order except when it is used +to include entries, and an index is available. +The list must contain each entry name on a separate line, with the name +starting in column 1 of the line, i.e. there must be no spaces at the start +of the line. +The list of entry names +can be produced by the keyword searches of nip, pip, etc., as long +as the listings produced have a space character separating the entry name +from the entry description. This will depend on how well the library +reformatting programs work. For example swissprot entry names tend to run +into the beginning of the descriptions, but other libraries are generally +OK. + +.para +One use of the programs is to look for patterns that we already know +about, but in new sequences. However it is hoped that they will also be +useful for finding new motifs. For example +several known control regions in +nucleic acid +sequences consist of particular direct or inverted repeats; +the inclusion of +direct and inverted repeats as motif classes +makes it possible to +find previously unknown +motifs of these types. +Using these new programs we can +ask questions like: "are there any inverted or direct repeats near to +sections of sequence that contain both a +CCAAT box and a TATA box?"; and to search for such things throughout +the +libraries. In addition, the mode of output in which all the sequence +between +the two outermost motifs found is printed out, allows us to extract +sequences and examine them in more detail for further common +subsequences. +For example we might want to collect together all the sequences +between +putative CCAAT and TATA boxes.
+.para +A further use of the inverted repeat motif class is the following. If a +regulatory sequence in DNA is poorly defined but is also an inverted repeat, +then it might be an advantage to specify it both as a consensus sequence +and +a superimposed inverted repeat. In this way two weak definitions can be +combined to produce a stronger pattern. +.para +Given only a few examples of a motif it +should be possible to perform initial searches using a +class 3 motif, and then, using plausible matching sequences, create a +more +specific weight matrix for the same motif. +.para +If motifs are combined with the first motif using the OR operator +they will be ignored until all +permutations that include the first motif have been looked for. +The whole search will then be repeated, in +turn, for each of +those motifs that are combined with the first motif using the OR +operator. +An interesting consequence of this is that the program can be used, +without +change, to compare any newly determined sequence with all known +individual +motifs. We achieve this by having a pattern in which all known relevant +motifs are combined using the OR operator. +If we ask to use this pattern with +a sequence, the program will automatically compare each individual +motif in +the pattern with the whole length of the +sequence. As the number of known +motifs grows this should become an increasingly useful standard +procedure. +.para +The NOT operator is obviously +useful for making sure particular motifs are not present, but it can also +be used to bracket the levels of matches found. We may want a degree of +match that lies between two limits - binding should occur, but not too +strongly; or base-pairs should form, but not too many. We can specify +this +by asking for a match with a low score, in combination with a match with +a +high score, both for the same motif, but with the high score included +using +the NOT operator.
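The bracketing idea can be sketched as follows. This is only an illustration under stated assumptions, not the programs' own code: it scores by counting matching characters (in the spirit of a class 2 motif), the names `count_matches` and `bracketed_hits` are hypothetical, positions are 0-based, and the sequence and cutoffs are made up.

```python
def count_matches(motif, window):
    """Number of positions at which motif and window carry the same character."""
    return sum(1 for m, s in zip(motif, window) if m == s)


def bracketed_hits(seq, motif, low, high):
    """Positions where the motif matches at the low cutoff AND NOT at the high one.

    Returns a list of (position, score) pairs with low <= score < high.
    """
    n = len(motif)
    hits = []
    for i in range(len(seq) - n + 1):
        score = count_matches(motif, seq[i:i + n])
        matches_low = score >= low     # motif included with AND, low cutoff
        matches_high = score >= high   # same motif, high cutoff, included with NOT
        if matches_low and not matches_high:
            hits.append((i, score))
    return hits


# Hypothetical sequence: a perfect TATAAT at position 0, weaker copies later.
result = bracketed_hits("TATAATGCTATCATTTTAAT", "TATAAT", low=4, high=6)
print(result)
```

With these cutoffs the perfect copy at the start of the sequence is excluded by the NOT condition, while the imperfect copies further along are kept.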
+.para +The algorithm is designed to find all sections of a sequence that satisfy +the pattern rather than only the best match. +Particularly if some of the motifs in a pattern are less well defined than +others, this can often result in the same region of a sequence being +reported as having several matches that vary only in the +positions of the weakest motifs. +.para +General remarks on motif searching +.para +Generally motifs are short subsequences that are thought to be +associated with +particular functions in some known sequences. Often +we search for them to try to +understand or interpret other sequences. Sometimes we search for +motifs and +patterns to +test a hypothesis about their role: are they found in the expected +positions in the expected sequences? In doing so we should remember +that, in both proteins and nucleic acids, + what we are really looking for is a particular +three dimensional structure with certain affinities for other structures, +and that we are assuming that the sequence of the motif alone +defines the 3D structure we are searching for. + The overall structure +may be completely different to those in which the motif is functional, +and +hence the motif may have a different shape or be inaccessible. +We should be aware of the +importance of the context in which a motif is found. Where does it lie +relative to the overall structure, is it accessible, is the three +dimensional spacing between +it and other motifs correct? For example, is it on the same side of the +double helix, and the correct distance from some other motif? How does +context affect our assessment of the significance of finding a motif? +Finding false mammalian mRNA splice junctions in non-coding sequences +is +far less important than finding false sites in pre-mRNA sequences, but +finding them in the correct places is most important!
In other words, it +is +often the case that when we are searching for a motif that is known to +be +necessary for some function, then a positive result in the form of a +match +in the required position, is more important than a high background of +matches in the wrong positions. Being + able to write +down the probability of finding a motif in a random sequence tells us how +well it is defined. +In nucleic +acids the DNA may contain many superimposed types of information such +as +those concerned with histone phasing, protein coding or mRNA secondary +structure. These overlapping "codes" may interfere with one another +causing +matches to motifs to be poorer than expected. +In general we will only have a limited number of examples of the +motif and we do not know how representative they are. +.para +Sequences have superimposed functions: some parts may be of general +structural +importance and give rise to an overall framework, and other parts give +specificity and hence are not common; we may want to use a set of +aligned +sequences to define a motif, but want to use only the framework +positions. + Alternatively we may want to pick out +only those parts of a set of aligned sequences that give a particular +property, and to ignore other similarities that are due to some other +property +and which could obscure the pattern +we are interested in. +It is possible to apply a mask to a set of aligned sequences in +order to give weight to selected positions only. + The ability to define a mask allows certain positions +to be used in the motif and others to be ignored, and yet still permits the +use of a set of aligned sequences to calculate weights. The mask is +requested and applied +by the program and results in the masked positions being zero +in +the weight matrix. The mask is defined in the following way. 
+Suppose we had a motif of length 15, then the mask +x--x--xx-x will give zero weights to positions 2,3,5,6 and 9 (note it is +the dashes (-) that are significant and that positions +1,4,7,8,10,11,12,13,14 and 15 +will be non-zero). Of course +the same set of sequences could be used with several alternative masks +in +order to extract different features and create corresponding weight +matrices. +.para +The programs are described in Staden,R. +CABIOS 4, 53-60, 1988; Staden,R. + CABIOS 5, 89-96, 1989, and a forthcoming Methods in +Enzymology. +.left margin1 +@ end of help diff --git a/help/README b/help/README new file mode 100644 index 0000000..b662de2 --- /dev/null +++ b/help/README @@ -0,0 +1,38 @@ + README file for help directory of staden package + ----------------------------------------------- + +Should contain (at least) ProgramName_help where ProgramName is each of +bap, dap, gip, mem, mep, nip, nipf, pip, sap, sip and also staden_help +and stadenp_help. + +There are 3 main formats of file in this directory: + +PROGRAM.RNO: + This is the unformatted (runoff/nroff style) help for PROGRAM. + Any changes to the help should be performed on this file. + +program_help: + This is the online formatted help used by PROGRAM. It can also + be printed to produce hardcopy documentation. + +program_menu: + This is a file that describes the menus used in PROGRAM, + together with an index into the program_help file for the + online help. The format for each line is: + +