{\rtf1\mac\deff2 {\fonttbl{\f0\fswiss Chicago;}{\f2\froman New York;}{\f3\fswiss Geneva;}{\f4\fmodern Monaco;}{\f5\fscript Venice;}{\f6\fdecor London;}{\f7\fdecor Athens;}{\f8\fdecor San Francisco;}{\f11\fnil Cairo;}{\f12\fnil Los Angeles;}
{\f13\fnil Zapf Dingbats;}{\f14\fnil Bookman;}{\f15\fnil N Helvetica Narrow;}{\f16\fnil Palatino;}{\f18\fnil Zapf Chancery;}{\f20\froman Times;}{\f21\fswiss Helvetica;}{\f22\fmodern Courier;}{\f23\ftech Symbol;}{\f24\fnil Mobile;}{\f33\fnil Avant Garde;}
{\f34\fnil New Century Schlbk;}}{\colortbl\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;}{\stylesheet{\s243\qc\sa60\sl280
\f20 \sbasedon222\snext0 footer;}{\s244\sl220\tqc\tx4320\tqr\tx8640 \f4\fs16 \sbasedon0\snext0 header;}{\sl220 \f4\fs16 \sbasedon222\snext0 Normal,Screen Font;}{\s2\qc\sa200\sl480 \b\f20\fs36 \sbasedon222\snext2 Chapter Heading;}{\s3\sb200\sa120\sl360
\b\f20\fs32 \sbasedon222\snext0 Main Subheading;}{\s4\qj\sa120\sl280 \f20 \sbasedon222\snext4 Body text;}{\s5\sb400\sa60\sl320\tx560 \b\f20\fs28 \sbasedon222\snext5 Subheading;}{\s6\sb240\sa60\sl280\tx560\tx860 \b\f20 \sbasedon5\snext6 SubSub heading;}{
\s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \sbasedon4\snext7 Indent Body;}{\s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 \sbasedon222\snext8 Figure legends;}{\s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 \sbasedon6\snext9 SubSubSub heading;}}
\paperw11880\paperh16820\margl1440\margr1440\widowctrl\ftnbj\ftnrestart \sectd \linemod0\linex0\cols1\endnhere \pard\plain \s2\qc\sa600\sl480 \b\f20\fs36 {\i\fs48 Contents\par
}\pard\plain \s7\qj\fi-560\li560\sa120\sl400\tx560\tqr\tldot\tx8980 \f20 1\tab Preface\tab 1\par
2\tab Introduction\tab 3\par
3\tab Sequence input, editing and sequence library use\tab 17\par
4\tab Managing sequencing projects\tab 26\par
5\tab Analysing sequences to find genes\tab 51\par
6\tab Searching for motifs in nucleic acid sequences\tab 60\par
7\tab Using patterns to analyse nucleic acid sequences\tab 69\par
8\tab Searching for restriction sites\tab 77\par
9\tab Statistical and structural analysis of nucleotide sequences\tab 83\par
10\tab Translating and listing nucleic acid sequences\tab 93\par
11\tab Statistical and structural analysis of protein sequences\tab 99\par
12\tab Searching for motifs in protein sequences\tab 104\par
13\tab Using patterns to analyse protein sequences\tab 112\par
\pard \s7\qj\fi-560\li560\sa120\sl400\tx560\tqr\tldot\tx8980 14\tab Comparing sequences\tab 123\par
\pard\plain \s2\qc\sa200\sl480\tqr\tldot\tx8980 \b\f20\fs36 \sect \sectd \pgnrestart\linemod0\linex0\cols1\endnhere {\footer \pard\plain \s243\qc\sa60\sl280 \f20 \chpgn \par
}\pard\plain \s2\qc\sa600\sl480 \b\f20\fs36 1. Preface (November, 1992)\par
\pard\plain \s4\qj\sa120\sl280 \f20 This second edition of the manual contains only minor revisions. The changes are mostly to do with managing sequencing projects, which is the subject on which we are currently concentrating our efforts. We have replaced our previous Developing Assembly Program DAP with another developing assembly program BAP that can assemble Bigger projects. Although this new program can handle 8000 readings as opposed to the miserly 1000 of the previous version, it actually uses its space more efficiently over the course of a project. It contains a mechanism for preventing simultaneous use (and hence corruption) of databases. In addition it is approximately four times faster during assembly and five times faster when looking for "internal joins". It now contains a routine for selecting primers and templates during the "walking" stage of a project. The "find internal joins" function now calls up the contig joining editor with the two contigs aligned in the window and the editor has also been speeded up. Numerous other changes have also been made but we still regard BAP as temporary, and are actively working on its replacement which we believe will overcome the limitations that BAP's aged structure has imposed on it. We have also included routines for converting ABI 373A and Pharmacia A.L.F. data to our new trace file format, for automatically marking poor quality regions of readings from these machines and for converting DAP databases to BAP databases.\par
\pard \s4\qj\sa120\sl280 Other changes include providing a postscript option for saving graphics output, and facilities for using the author and freetext indexes of the sequence libraries. The sequence library indexes are v
ery useful and allow rapid searching. The freetext index is derived from ALL the text in the annotations - not just the keywords. We have also added a new repeat examining routine in NIP and a new repeat listing option in SIP.\par
\pard \s4\qj\sa120\sl280 \par
\pard\plain \s2\qc\sa600\sl480 \b\f20\fs36 1. 1 Preface to first edition \par
\pard\plain \s4\qj\sa120\sl280 \f20
It could be said that this manual is long overdue, for, apart from the extensive online help available from within the programs, it is the first printed guide to using a package that has been around for longer than I care to remember. On the other hand, to misquote a cliche much used by reviewers, it could be said that this manual fills a much needed gap, in that I believe the best way to learn about computer programs is to use them. Those who are prepared to experiment and play with programs will discover far more than any manual of reasonable size can hope to convey. However the manual serves to give users an overview of what is available and a starting point for their exploration of the programs.\par
\pard \s4\qj\sa120\sl280 One of my objectives was to be able to distribute the manual on floppy disk so that each site using the programs could print as many copies as they need. We had to balance the quality of the graphics and the sophistication of the layout against the ease of producing updates and the availability of software, and decided to use the WORD4 program running on the Apple Macintosh. The graphics figures reproduced in the manual are far below the quality seen on the terminal screen, and in some cases should be viewed as merely schematic.\par
\pard \s4\qj\sa120\sl280 Most of the chapters are self-contained but users are strongly advised to read sections 3 to 7 in chapter 1, as to do so will save a lot of time.\par
\pard \s4\qj\sa120\sl280 In future editions we will add chapters on other programs in the package and expand the Notes sections to give more information about the theory and algorithms used. We welcome comments and suggestions for improvements.\par
\pard \s4\qj\sa120\sl280 I thank Brian Pashley for transforming my original documents into, what I hope will be, a useful manual.\par
\pard\plain \s3\sb200\sa120\sl360 \b\f20\fs32 Rodger Staden, March 1992.\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 2. Introduction\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Materials\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 2.1\tab Versions\par
2.2\tab Terminals\par
2.3\tab Digitizers\par
2.4\tab Sequencing machines\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab User interfaces\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 3.1\tab The xterm and VAX interface\par
3.2 \tab The X interface\par
3.3\tab Use of the bell\par
3.4\tab Printing and saving results in files\par
3.5\tab Use of feature tables\par
3.6\tab Use of graphics\par
3.7\tab The active region\par
3.8\tab Files of file names\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Character sets\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 4.1\tab Character sets for finished sequences\par
4.2\tab Symbols used in gel readings\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Sequence formats\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 5.1\tab Personal sequence files\par
5.2\tab Sequence libraries\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Conventions used in text\par
7.\tab Notes\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20
In this chapter we give an overview of the chapters on the "Staden Package" of programs. Here we describe the equipment required and outline the scope of the package and the user interfaces. In the next chapter we cover character sets, sequence formats and
sequence library access.\par
\pard \s4\qj\sa120\sl280 The main programs in the package are as follows\:\par
\pard\plain \s7\qj\sa120\sl280\tx1120 \f20 GIP\tab Gel input program\par
\pard \s7\qj\sa120\sl280\tx1120\tx1580 SAP\tab Sequence assembly program\par
\pard \s7\qj\sa120\sl280\tx1120 BAP\tab Sequence assembly program\par
NIP\tab Nucleotide interpretation program\par
PIP\tab Protein interpretation program\par
SIP\tab Similarity investigation program\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx1120 MEP\tab Motif exploration program\par
NIPL\tab Nucleotide interpretation program (library)\par
PIPL\tab Protein interpretation program (library)\par
SIPL\tab Similarity investigation program (library)\par
XBAP\tab Sequence assembly program\par
XNIP\tab Nucleotide interpretation program\par
XPIP\tab Protein interpretation program\par
XSIP\tab Similarity investigation program\par
XMEP\tab Motif exploration program\par
\pard\plain \s4\qj\sa120\sl280 \f20 GIP uses a digitiser for entry of DNA sequences from autoradiographs. SAP, BAP and XBAP handle everything relating to assembling and editing gel readings. NIP provides functions for analysing and interpreting individual nucleotide sequences. PIP provides functions for analysing and interpreting individual protein sequences. MEP analyses families of nucleotide sequences to help discover new motifs. NIPL performs pattern searches on nucleotide sequence libraries. PIPL performs pattern searches on protein sequence libraries. SIP provides functions for comparing and aligning pairs of protein or nucleotide sequences. SIPL searches nucleotide and protein sequence libraries for entries similar to probe sequences. The programs whose names begin with a letter X are X11 (see below) versions of the programs. For example XNIP is an X11 version of NIP.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Materials\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Versions.\par
\pard\plain \s4\qj\sa120\sl280 \f20
The programs run on Apple Macintosh computers, on VAX computers using the VMS operating system, and on SUN workstations (which use the UNIX operating system). The SUN version should run, with only minor changes, on other machines running UNIX, and currently we are aware of versions running on DEC ULTRIX, Silicon Graphics, Alliant FX2800 and Convex machines. Currently the Macintosh version is "frozen" in its April 1990 state, the VAX version is "frozen" in its April 1991 state and all development is being done on the SUN version.
\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.1\tab VAX version.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The VAX version will run on any VAX using the VMS operating system. A FORTRAN compiler is required.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.2\tab UNIX version.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The UNIX version is being used here on SPARCstations and DECstation 5000/240s with at least 8 megabytes of memory, 20
0 megabyte internal disk drives and 700 megabyte external disks. Colour monitors such as the GX are preferable for running the programs which display traces from fluorescent sequencing machines, but monochrome displays are adequate for all other programs.
We also use tape desktop backup packs for archiving, and a cdrom drive for handling the sequence libraries.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.3\tab Other UNIX versions.\par
\pard\plain \s4\qj\sa120\sl280 \f20 Users of UNIX machines other than SUN SPARCstations, DECstation 5000/240 and SGI Indigo R3000 will require a FORTRAN comp
iler and ANSI C. When operated directly on the workstation screen all UNIX versions require X11 release 4.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.4\tab The Macintosh version\par
\pard\plain \s4\qj\sa120\sl280 \f20
The Macintosh version of the package requires a machine with at least 1 megabyte of memory and a 20 megabyte hard disk. It only operates on monochrome screens or colour screens set to black/white mode. The package contains only programs SAP, GIP, NIP, PIP
and SIP. All further information about this version of the package is contained in the notes.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Terminals.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs can also be operated via a serial port using Tektronix terminals, PCs running MS-Kermit, or Apple Macintoshes running Versaterm Pro. The UNIX versions can also be run from X terminals or microcomputers running X emulators.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Digitizers.\par
\pard\plain \s4\qj\sa120\sl280 \f20
The gel reading input program uses a sonic digitizer called a GRAPHBAR GP7 made by Science Accessories Corp., 200 Watson Blvd., Stratford, CT 06497, USA. When ordering specify that the device should be set to use metric units.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Sequencing machines.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs can handle data produced by the Applied Biosystems Inc. 373A and Pharmacia A.L.F. fluorescent sequencing machines.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab User Interfaces\par
\pard\plain \s4\qj\sa120\sl280 \f20
The programs have two user interfaces. The first runs under the terminal emulator xterm and the second runs directly under X. On the VAX, at present only the xterm interface is available, but on UNIX systems either interface can be used. The xterm version of the package will operate on the workstation screen, X terminals, Tektronix terminals, PCs or Macintoshes (see above). When run on the workstation screen the programs have separate text and graphics windows, each of which can be moved, resized and iconized, and the text window can be scrolled in both directions. The versions that run directly under X can only be used on the workstation screen, X terminals or using an X emulator. They produce separate text and graphics windows, an independent, constantly available help window and a separate dialogue window. All input is controlled by mouse selection and dialogue boxes.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.1\tab The xterm and VAX interface\par
\pard\plain \s4\qj\sa120\sl280 \f20 The user interface is common to all programs. It consists of a set of menus and a uniform way of presenting choices and obtaining input from the user. This section describes\:
the menu system; how options are selected and other choices made; how values are
supplied to the program; how help is obtained, and how to escape from any part of a program. In addition it gives information about saving results in files and the use of graphics for presenting results.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.1.\tab Menus and option selection\par
\pard\plain \s4\qj\sa120\sl280 \f20
Each program has several menus and numerous options. Each menu or option has a unique number that is used to identify it. Menu numbers are distinguished from option numbers by being preceded by the letter m (or M; the programs make no distinction between upper and lower case letters). With the exception of some parts of program SAP, the menus are not hierarchical; rather, the options they each contain are simply lists of related functions and their identifying numbers. Therefore options can be selected independently of the menu that is currently being shown on the screen, and the menus are simply memory aids. All options and menus are selected by typing their number when the programs present the prompt \par
\pard \s4\qc\sb120\sa180\sl280 "? Menu or option number =" \par
\pard \s4\qj\sa120\sl280
To select a menu type its number preceded by the letter M. To select an option type its number. If users type only "return" they will get menu m0 which is simply a list of menus. If users select an option they will return to the current menu after the func
tion is completed. Where possible, equivalent or identical options have been given the same numbers in all programs, and so users quickly learn the numbers for the functions they employ most often.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.2\tab Execution and dialogue\par
\pard\plain \s4\qj\sa120\sl280 \f20
All inputs requested by the program (apart from file names) have default values. In addition most of the analytical functions have a default path through which they will pass, so when users select an option, in many cases the program will immediately perfo
rm the operation selected without further dialo
gue. However if users precede an option number by the letter d (e.g. D17), they will force the program to offer dialogue about the selected option before the function operates, hence allowing them to change the value of any of its parameters. In addition,
alternative suboptions will be made available.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.3\tab Help\par
\pard\plain \s4\qj\sa120\sl280 \f20 Help about each option can be obtained by preceding the option number by the symbol ? when users are presented with the prompt "? Menu or option number =" (e.g. ?17 gives help on option 17), but there are two further ways of obtaining help. Whenever the program asks a question users can respond by typing the symbol ? and they will receive information about the current option. In addition, option number 1 in all the programs will give help on all of a program's functions. \par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.4.\tab Quitting \par
\pard\plain \s4\qj\sa120\sl280 \f20 To exit from any point in a program users type ! for quit. If a menu is on the screen this will stop the program, otherwise they will be returned to the last menu. \par
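\pard\plain \s4\qj\sa120\sl280 \f20 The reply conventions described in sections 3.1.1 to 3.1.4 can be summarised as a small classifier. The following Python sketch is illustrative only and is not taken from the package sources; the function name parse_menu_input and the returned labels are hypothetical.\par
\pard\plain \li1720\sa200\sl220 \f4\fs16 
```python
# Hypothetical sketch of how a reply to "? Menu or option number =" is read.
# The prefix letters come from the manual; the names used here are invented.
PREFIXES = (("m", "menu"), ("d", "dialogue"), ("?", "help"))


def parse_menu_input(text):
    """Return an (action, number) pair for one typed reply."""
    reply = text.strip().lower()      # upper and lower case are equivalent
    if reply == "":
        return ("menu", 0)            # "return" alone shows menu m0
    if reply == "!":
        return ("quit", None)         # ! quits from any point
    if reply == "?":
        return ("help", None)         # bare ? asks about the current option
    for prefix, action in PREFIXES:
        if reply.startswith(prefix) and reply[1:].isdigit():
            return (action, int(reply[1:]))
    if reply.isdigit():
        return ("option", int(reply))  # a bare number runs the option
    return ("invalid", None)           # anything else: ask again
```
\par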
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.5.\tab Making selections\par
\pard\plain \s4\qj\sa120\sl280 \f20 Questions and choices are dealt with in three ways. Where there are choices that are not obvious opposites, or there are more than two choices, "radio buttons" and "check boxes" are used.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\pagebb\tx1140 \b\f20 3.1.5.1.\tab Choosing between opposites.\par
\pard\plain \s4\qj\sa120\sl280 \f20 Obvious opposites such as "clear screen" and "keep picture" are presented with only the default shown. For example in this case the default is generally "keep picture" so the program will display\: \par
\pard\plain \li1720\sa200\sl220 \f4\fs16 "Keep picture (y/n) (y) =" \par
\pard\plain \s4\qj\sa120\sl280 \f20 and the picture will be retained if the user types Y or y or only return. If the user types N or n the picture will be cleared. Anything other than these or ? or ! will cause the question to be asked again.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.5.2. \tab Choosing one from many.\par
\pard\plain \s4\qj\sa120\sl280 \f20
Radio buttons are used when only one of a number of choices can be made at any one time. The choices are presented arranged one above the other, each choice with a number for its selection, and the default choice marked with an X. For example when the user
is reading a new sequence file the following choices of format are offered.\par
\pard\plain \li1720\sb300\sl220\tx2460\tx3400 \f4\fs16 Select sequence file format\par
\pard \li1720\sl220\tx2460\tx3400 \tab 1\tab Staden\par
\tab 2\tab EMBL\par
X\tab 3\tab GenBank\par
\tab 4\tab PIR\par
\tab 5\tab GCG\par
\tab 6\tab FASTA\par
\pard \li1720\sa300\sl220\tx2460\tx3400 ? Selection (1-6) (3) =\par
\pard\plain \s4\qj\sb60\sa120\sl280 \f20 Any single option can be selected by typing the option number, and the default option, (here shown as 3), is also obtained by typing only "return". Again help can be obtained by typing ? and quit by typing !.
\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.5.3.\tab Choosing at least one from many.\par
\pard\plain \s4\qj\sa120\sl280 \f20 Check boxes are used when any number of a set of choices can be made (i.e. the choices are not exclusive). Choices are made by typing choice numbers. Each choice c
an be considered as a switch whose setting is reversed when it is selected. Choices that are currently switched on are marked with an X. The user quits from making selections by typing only "return". For example in the routine that plots base composition u
sers can elect to plot the frequencies of any combination of bases, e.g. only A, or A+T, or A+T+G etc. The following check box is offered to the user\: \par
\pard\plain \li1720\sb300\sl220\tx2420\tx3400 \f4\fs16 X\tab 1\tab T\par
\pard \li1720\sl220\tx2420\tx3400 \tab 2\tab C\par
X\tab 3\tab A\par
\tab 4\tab G\par
\pard \li1720\sa300\sl220\tx2420\tx3400 ? Selection (1-4) ( ) =\par
\pard\plain \s4\qj\sb60\sa120\sl280 \f20 As shown this will plot the A+T composition. To switch off T select 1, to switch on C select 2, etc. To quit, having set the bases required, type only "return". \par
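\pard\plain \s4\qj\sa120\sl280 \f20 The switch behaviour of a check box can be sketched in a few lines of Python. This is an illustration under assumptions, not the package's code: the function name toggle_choices is hypothetical, and replies that are not valid choice numbers are simply ignored here.\par
\pard\plain \li1720\sa200\sl220 \f4\fs16 
```python
def toggle_choices(names, switched_on, replies):
    """Apply typed choice numbers as toggles; an empty reply ends input."""
    on = set(switched_on)
    for reply in replies:
        reply = reply.strip()
        if reply == "":
            break                      # "return" alone finishes selection
        if not reply.isdigit():
            continue                   # assumption: ignore unusable replies
        number = int(reply)
        if 1 <= number <= len(names):
            on ^= set([number])        # reverse the setting of this switch
    return [names[i - 1] for i in sorted(on)]
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Starting from the box shown above (T and A switched on), the replies 1, 2 and "return" leave C and A switched on.\par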
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 \page 3.1.6.\tab Input of numerical values \par
\pard\plain \s4\qj\sa120\sl280 \f20 All input of integer or decimal numbers is presented in a standard way with the allowed range shown in brackets and the default value also in brackets. For example\: \par
\pard\plain \li1700\sb160\sa300\sl220 \f4\fs16 ? Window (5-31) (11) = \par
\pard\plain \s4\qj\sa120\sl280 \f20 In this example users could type any number between 5 and 31, or "return" only, or ! or ? (see above). Any other input will cause the program to ask the question again. Typing only "r
eturn" gives the default value (here 11). \par
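\pard\plain \s4\qj\sa120\sl280 \f20 The rules for numerical input can be sketched as follows. This Python fragment is a hedged illustration of the behaviour described above; the function name read_number and the returned labels are hypothetical.\par
\pard\plain \li1720\sa200\sl220 \f4\fs16 
```python
def read_number(reply, low, high, default):
    """Interpret one reply to a prompt such as '? Window (5-31) (11) ='."""
    reply = reply.strip()
    if reply == "":
        return ("value", default)      # "return" alone gives the default
    if reply == "?":
        return ("help", None)
    if reply == "!":
        return ("quit", None)
    try:
        value = float(reply)
    except ValueError:
        return ("retry", None)         # not a number: ask again
    if low <= value <= high:
        return ("value", value)
    return ("retry", None)             # outside the allowed range: ask again
```
\par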
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.7.\tab Input of character strings\par
\pard\plain \s4\qj\sa120\sl280 \f20 Character strings are requested using informative prompts of the form\:\par
\pard\plain \li1720\sb160\sa300\sl220 \f4\fs16 ? Search string =\par
\pard\plain \s4\qj\sa120\sl280 \f20 Or where possible the prompt will be preceded by a default value as in\:\par
\pard\plain \li1720\sb160\sl220 \f4\fs16 Default search string = atatatata\par
\pard \li1720\sa300\sl220 ? Search string =\par
\pard\plain \s4\qj\sa120\sl280 \f20 Question mark (?) or ! will get help or quit. Where appropriate, for example when a whole list of strings have been defined one after the other, typing return only will be a signal to the program that input is complete.
\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.2.\tab The X interface\par
\pard\plain \s4\qj\sa120\sl280 \f20
This interface deals with all the types of interactions described above but options are selected using pulldown menus and all inputs are via appropriately styled dialogue boxes and buttons. Default values are accepted by clicking on an "OK" button, or typi
ng return on the keyboard. Values are changed by overtyping the defaults. Quit is available from each dialogue via a "CANCEL" button. Help is constantly available via a "HELP" button in the main dialogue window. Details such as requestin
g dialogue when an option is selected are dealt with using a button labelled "execute with dialogue" which toggles to "execute".\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.3.\tab Use of the bell \par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs use the bell to indicate that a task is completed. When the bell sounds, the programs will wait until return is typed. Users can quit from these points by typing ! but no help is available.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.4.\tab Printing and saving results in files \par
\pard\plain \s4\qj\sa120\sl280 \f20 A few of the functions in the programs automatically write their textual results to disk files, but for most functions users can choose whether results appear on the terminal screen or go to a file. For these functions the normal, or default, place for results to appear is on the screen, and users need to decide before the function is selected if they want to redirect the results to a file. In all programs the option "Redirect output" gives control over whether results appear on the screen or go to a file. When a program is started results will be sent to the screen. If the option "Redirect output" is selected users will be given the choice of redirecting either text or graphics to a file or of creating a postscript file for the graphics. The program will then ask users to supply a file name. If users elect to redirect output, from that point on, all results will be sent to the file until the option is selected again, in which case the "redirection file" will be closed, and results will again appear on the screen. If these files contain textual results they can be looked at from within the programs by using option "List a text file". Once the program is left users can employ an appropriate system command to print the files. There is no function within the programs to direct files to a printer. If users elect to create a postscript file for the graphics the graphics will also appear on the screen. If they redirect graphics the graphics commands (in Tektronix codes) will only go to the file and will not appear on the screen.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 3.5.\tab Use of feature tables\par
\pard\plain \s4\qj\sa120\sl280 \f20 One particular use of redirection should be noted. The programs can use EMBL/GenBank feature tables as input for directing translation of DNA to protein, etc, but the tables must be stored in separate text files, and cannot be read directly from the sequence libraries. The only routines that can read the sequence libraries are those available under "Read a sequence". So to create a text file containing the feature table for a particular library entry users must redirect text output to disk, and then use "Read a sequence" to display the appropriate feature table. The feature table will be written to the file, and then the file can be used for controlling translation etc. Note however that the redirection mechanism is a general function and it therefore does not add the required header and tail to saved files. To make the files usable as feature tables they need, as a minimum, a line at the top with the word FEATURES starting in column 1, and two empty lines at the end of the file!\par
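\pard\plain \s4\qj\sa120\sl280 \f20 The minimum header and tail can be added with any text editor; the Python sketch below shows the same step for a list of saved lines. It is an illustration only: the function name wrap_feature_table is hypothetical, and it assumes the redirected file holds nothing but the feature table lines.\par
\pard\plain \li1720\sa200\sl220 \f4\fs16 
```python
def wrap_feature_table(saved_lines):
    """Add the FEATURES header line and two empty trailing lines."""
    body = [line.rstrip() for line in saved_lines]   # drop line endings
    # FEATURES must start in column 1, so it is written with no indent.
    return ["FEATURES"] + body + ["", ""]
```
\par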
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.6.\tab Use of graphics \par
\pard\plain \s4\qj\sa120\sl280 \f20 The analytical programs including NIP, PIP and SIP present the results of many of their analyses graphically.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.1.\tab The drawing board and plot positions\par
\pard\plain \s4\qj\sa120\sl280 \f20
The position at which the results for any function appear on the screen is defined relative to a notional user's "drawing board" of dimension 10,000 by 10,000. This drawing board fills the screen and results are drawn in windows defined using symbols x0,y0 and xlength,ylength, where x0,y0 is the position of the bottom left hand corner of the window, and xlength is the width of the window and ylength the height of the window. The window positions for each option are read from a file when a program is started. If required individual users can have their own set of plot positions, and also the positions can be redefined from within the programs using the option "Reposition plots".
\par
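\pard\plain \s4\qj\sa120\sl280 \f20 As an illustration of the drawing board units, the Python sketch below converts a window given as x0,y0 and xlength,ylength into pixel edges. It is not the package's code: the function name window_to_pixels is hypothetical, and it assumes a display whose pixel rows are counted from the top of the screen while drawing board heights are measured from the bottom.\par
\pard\plain \li1720\sa200\sl220 \f4\fs16 
```python
BOARD = 10000   # the drawing board is 10,000 by 10,000 units


def window_to_pixels(x0, y0, xlength, ylength, screen_width, screen_height):
    """Return (left, top, right, bottom) pixel edges of a plot window."""
    left = x0 * screen_width // BOARD
    right = (x0 + xlength) * screen_width // BOARD
    # Assumption: pixel rows count downwards, board heights count upwards.
    bottom = screen_height - y0 * screen_height // BOARD
    top = screen_height - (y0 + ylength) * screen_height // BOARD
    return (left, top, right, bottom)
```
\par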
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.2.\tab The plot interval\par
\pard\plain \s4\qj\sa120\sl280 \f20
For those analyses that draw continuous lines to represent results (for example a plot of base composition) the user is asked to supply the "Plot interval". All the analyses produce a value for every point along the sequence but often i
t is unnecessary to actually plot the values for all the points. The plot interval is simply the distance between the points shown on the screen. If the user selects a plot interval of 1, every point will be plotted; a plot interval of 3 will show every th
ird point. \par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.3.\tab The window length\par
\pard\plain \s4\qj\sa120\sl280 \f20 The word "window" is used in a further way by the programs. Most of the functions that analyse the content of a sequence (the simplest such routine plots the base composition) perform their calculations over a segment of the sequence of a certain length, display the result, then move on by 1 position, and recalculate. The fixed size of segment over which a calculation is performed is called a "window" and the segment size is the "window length". Many analytical functions request "? Window length =", or more frequently "? Odd window length =". An odd number is used so that when a result is displayed for a particular window position it is derived from an equal number of points either side of the window's midpoint.\par
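\pard\plain \s4\qj\sa120\sl280 \f20 The window length and the plot interval can be seen together in a small base composition example. The Python sketch below is illustrative and is not the algorithm used in the package; the function name composition_plot is hypothetical. For brevity it evaluates only the windows that would be plotted, which gives the same points as calculating every position and showing every nth one.\par
\pard\plain \li1720\sa200\sl220 \f4\fs16 
```python
def composition_plot(sequence, bases, window, interval):
    """Return (position, fraction) points for the chosen bases."""
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    half = window // 2                 # equal number of points on each side
    seq = sequence.upper()
    points = []
    # Each window is centred on one base; positions are reported from 1.
    for centre in range(half, len(seq) - half, interval):
        segment = seq[centre - half:centre + half + 1]
        count = sum(1 for base in segment if base in bases)
        points.append((centre + 1, count / window))
    return points
```
\par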
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.4.\tab Use of the cross hair\par
\pard\plain \s4\qj\sa120\sl280 \f20
All programs that produce graphical output provide a function for using a cross hair to examine the plots. After the cross hair function is selected the cross will appear in the graphics window and can be steered around using the mouse or directional keys.
Special keyboard characters hit while the function is in operation produce the following results. For all programs the letter s (for sequence) will show the local sequence around the cross hair position. For the sequence comparison pro
grams that show a dot matrix the two sequences will be displayed above one another. For the sequencing project management programs all the aligned sequences in the contig will be displayed. For the sequence comparison programs the letter m (for matrix) wil
l show a matrix in which all identical characters for a window around the cross hair are marked. The punctuation symbol , will show the local position in sequence units, but leave the cross hair on the screen, whereas the space bar and any other non-specia
l character will show the local position and exit the cross hair function. Further special characters are defined in the chapter on managing sequencing projects.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.5\tab Drawing scales on plots\par
\pard\plain \s4\qj\sa120\sl280 \f20 All the programs have a function "Draw a ruler" which will allow users to add scales to the axes of graphical plots. The scale can be positioned anywhere on the plot.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.6\tab Saving graphics\par
\pard\plain \s4\qj\sa120\sl280 \f20 The best way of saving the graphics is to use the "Redirect output" function to open a postscript file which will then contain a copy of all plots that appear on the screen. This of course requires the file to be opened before the plots are drawn. Many terminals are not capable of dumping their screen contents to a file for subsequent printing. One convenient way of obtaining hard copy of graphical results is to use a microcomputer as a terminal. On the Macintosh we use the terminal emulator Versaterm Pro. This allows graphics to be saved as Macintosh files that can be annotated and printed using MacDraw and other painting programs. Alternatively graphics can be redirected to a file and printed using a laser printer with Tektronix capability (see "Printing and saving results in files"). \par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.7.\tab The active region\par
\pard\plain \s4\qj\sa120\sl280 \f20
All the analytical programs use an "active region" for most of their functions. This is simply the current section of the sequence over which the analysis will be applied. When a sequence is first read in the active region will be set to its whole length,
but the user can restrict the scope of analytical functions by use of an opt
ion called "Define active region". However some functions such as "List the sequence" are always given access to the whole sequence and will allow the user to define a limited range after they have been selected.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.8.\tab Files of file names\par
\pard\plain \s4\qj\sa120\sl280 \f20
A useful device that is employed by many of the programs is that of "files of file names". If a program needs to perform the same operation in turn on each of 20 files, the user should not have to type in 20 file names. Instead the user types in the name o
f a single file which contains the names of the other 20 files. This single file is a file of file names. They are used, for example, to process batches of gel readings, or to compare a sequence against a library of motifs.\par
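\pard\plain \s4\qj\sa120\sl280 \f20 The idea can be illustrated with a minimal Python sketch. It is not part of the package\: the function names are hypothetical, and the layout of one file name per line, with blank lines ignored, is an assumption, since the layout is not specified here.\par

```python
from pathlib import Path
import tempfile


def read_file_of_file_names(fofn_path):
    """Return the names listed in a file of file names.

    Assumption: one file name per line; blank lines are skipped.
    """
    names = []
    for line in Path(fofn_path).read_text().splitlines():
        name = line.strip()
        if name:
            names.append(name)
    return names


def process_batch(fofn_path, operation):
    """Apply the same operation in turn to every file named in the list."""
    base = Path(fofn_path).parent  # names are taken relative to the list itself
    return [operation(base / name) for name in read_file_of_file_names(fofn_path)]


def demo():
    """Create two tiny readings and a file of file names, then count characters in each."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "gel1.seq").write_text("acgt")
        (folder / "gel2.seq").write_text("ttgacc")
        fofn = folder / "readings.fofn"
        with open(fofn, "w") as handle:
            print("gel1.seq", file=handle)
            print("", file=handle)
            print("gel2.seq", file=handle)
        return process_batch(fofn, lambda path: len(path.read_text()))
```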
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab Character Sets\par
\pard\plain \s4\qj\sa120\sl280 \f20 There are two types of character sets employed by the programs\: those for finished sequences and those used during sequencing projects.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 4.1\tab Character sets for finished sequences\par
\pard\plain \s4\qj\sa120\sl280 \f20 The analytical programs will operate with uppercase or lowercase sequence characters. For nucleic acids T and
U are equivalent. For proteins the standard 1 letter codes are used. The analytical programs also use IUB symbols for redundancy in back translations and for sequence searches. The symbols are shown in table 2.1 \par
\pard \s4\qj\li2260\ri2220\sb300\sa120\sl280\box\brsp100\brdrth \tx3420\tx4800 A,C,G,T\par
\pard \s4\qj\li2260\ri2220\sa120\sl280\box\brsp100\brdrth \tx3420\tx4800 R\tab (A,G)\tab 'puRine'\par
Y\tab (T,C)\tab 'pYrimidine'\par
W\tab (A,T)\tab 'Weak'\par
S\tab (C,G)\tab 'Strong'\par
M\tab (A,C)\tab 'aMino'\par
K\tab (G,T)\tab 'Keto'\par
H\tab (A,T,C)\tab 'not G'\par
B\tab (G,C,T)\tab 'not A'\par
V\tab (G,A,C)\tab 'not T'\par
D\tab (G,A,T)\tab 'not C'\par
\pard \s4\qj\li2260\ri2220\sa120\sl280\keepn\box\brsp100\brdrth \tx3420\tx4800 N\tab (G,A,C,T)\tab 'aNy'\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Table 2.1\tab The NC-IUB characters used by the analytical programs\par
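\pard\plain \s4\qj\sa120\sl280 \f20 As an illustration only, the table can be written as a Python lookup and used for a simple search. The matching rule used here (two symbols match when their base sets overlap) and all function names are assumptions, not a description of how the analytical programs are implemented.\par

```python
# Base sets for the NC-IUB symbols listed in the table.
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "TC", "W": "AT", "S": "CG",
    "M": "AC", "K": "GT",
    "H": "ATC", "B": "GCT", "V": "GAC", "D": "GAT",
    "N": "GACT",
}


def bases_for(symbol):
    """Return the set of bases a symbol stands for; case is ignored and U is read as T."""
    symbol = symbol.upper()
    if symbol == "U":
        symbol = "T"
    return set(IUB_CODES[symbol])


def symbols_match(pattern_symbol, sequence_symbol):
    """Assumed rule: two symbols match if their base sets share at least one base."""
    return bool(bases_for(pattern_symbol) & bases_for(sequence_symbol))


def find_matches(pattern, sequence):
    """Return the 1-based start positions at which pattern matches sequence."""
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        if all(symbols_match(p, s) for p, s in zip(pattern, window)):
            hits.append(start + 1)
    return hits
```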
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 4.2\tab Symbols used in gel readings\par
\pard\plain \s4\qj\sa120\sl280 \f20 Th
e information stored about a sequence reading has to show the original sequence, recording any doubts about its interpretation, and also, where possible, allow the changes made during editing to be indicated. Lowercase characters are used by the sequence p
roject management programs for recording readings, and uppercase symbols are used when changes are made during editing. Alternatively the reverse convention can be used. Any other characters in a sequence are treated as dash (-) characters. The symbols are
shown in table 2.2.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 5.\tab Sequence Formats\par
\pard\plain \s4\qj\sa120\sl280 \f20
The data formats for the programs that deal with sequencing projects are described in the chapter on managing sequencing projects. All analytical programs can read sequences stored in several formats. We distinguish between two sources of input namely\:
"sequence libraries" and "personal files".\par
\pard \s4\qj\sa120\sl280 \par
\pard \s4\qj\li1120\ri1200\sa120\sl280\box\brsp100\brdrth \tqc\tx2800 {\b Symbol \tab Meaning}\par
\pard \s4\qj\li1120\ri1200\sl280\box\brsp100\brdrth \tx1400\tqc\tx2800\tqc\tx4240\tqc\tx5640\tx6820 \tab c\tab Definitely\tab c\par
\tab t\tab "\tab t\par
\tab a\tab "\tab a\par
\tab g\tab "\tab g\par
\tab 1\tab Probably\tab c\par
\tab 2\tab "\tab t\par
\tab 3\tab "\tab a\par
\tab 4\tab "\tab g\par
\tab d\tab "\tab c\tab Possibly\tab cc\par
\tab v\tab "\tab t\tab "\tab tt\par
\tab b\tab "\tab a\tab "\tab aa\par
\tab h\tab "\tab g\tab "\tab gg\par
\tab k\tab "\tab c\tab "\tab c-\par
\tab l\tab "\tab t\tab "\tab t-\par
\tab m\tab "\tab a\tab "\tab a-\par
\tab n\tab "\tab g\tab "\tab g-\par
\tab r\tab a or g\par
\tab y\tab c or t\par
\tab 5\tab a or c\par
\tab 6\tab g or t\par
\tab 7\tab a or t\par
\tab 8\tab g or c\par
\tab -\tab a or g or c or t\par
\pard \s4\qj\li1120\ri1200\sl280\box\brsp100\brdrth \tx1400\tqc\tx3780\tqc\tx4240\tqc\tx5640\tx6820 \tab A\tab a set by auto edit or corrected by user\par
\tab C\tab c set by auto edit or corrected by user\par
\tab G\tab g set by auto edit or corrected by user\par
\tab T\tab t set by auto edit or corrected by user\par
\pard \s4\qj\li1120\ri1200\sl280\box\brsp100\brdrth \tx1400\tqc\tx4020\tqc\tx5640\tx6820 \tab *\tab padding character placed by auto assembler\par
\pard \s4\qj\li1120\ri1200\sl280\keepn\box\brsp100\brdrth \tx1400\tqc\tx2800\tqc\tx4240\tqc\tx5640\tx6820 else = -\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa300\sl240\tx1140 \f21\fs20 Table 2.2\tab The symbols used to record gel readings\par
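\pard\plain \s4\qj\sa120\sl280 \f20 The table can be read as a lookup from each symbol to the bases it may represent. The Python sketch below is illustrative only\: it assumes the lowercase convention for readings, reports only the most likely base for the "probably" codes, and uses hypothetical names throughout.\par

```python
# Symbols whose most likely base is c, t, a or g (rows c..n of the table).
MOST_LIKELY = dict(zip("ctag1234dvbhklmn", "ctag" * 4))

# Symbols that stand for a choice between bases.
AMBIGUOUS = {"r": "ag", "y": "ct", "5": "ac", "6": "gt", "7": "at", "8": "gc", "-": "acgt"}


def possible_bases(symbol):
    """Return the bases a single gel reading symbol may represent."""
    if symbol == "*":
        return ""  # padding character: no base at this position
    if symbol in ("A", "C", "G", "T"):
        return symbol.lower()  # set by auto edit or corrected by the user
    if symbol in MOST_LIKELY:
        return MOST_LIKELY[symbol]
    # Any other character is treated as a dash, that is a or g or c or t.
    return AMBIGUOUS.get(symbol, "acgt")
```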
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 5.1\tab Personal sequence files\par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs can read sequences from files in PIR, EMBL, GenBank, GCG, FASTA and Staden formats. Staden format means text files with records of up to 80 characters, read as follows\: all spaces are removed; lines with ";" in the first position are treated as comments, which are displayed when the file is read but not included in the sequence; and if the first line of data contains a 20 character header of the form <---abcdefghij----->, it too is excluded from the processed sequence. This last facility allows the programs to read consensus sequences created by the sequence project management programs. Files in PIR format can contain any number of entries (which the user selects by entry name), but all other formats are expected to contain only one sequence. If they contain more, only the first will be read.\par
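\pard\plain \s4\qj\sa120\sl280 \f20 The Staden format rules can be sketched in Python as follows. This is not the package's own reader\: the function name is hypothetical, blank lines are assumed to carry no data, and the header is assumed to be recognisable as 20 characters opening with < and closing with >.\par

```python
HEADER_LENGTH = 20  # length of a header such as <---abcdefghij----->


def parse_staden(text):
    """Return (comments, sequence) for text in Staden format."""
    comments = []
    pieces = []
    seen_data = False
    for line in text.splitlines():
        if line.startswith(";"):
            # Comment line: shown to the user but not part of the sequence.
            comments.append(line[1:].strip())
            continue
        if not line.strip():
            continue  # assumption: blank lines carry no data
        if not seen_data:
            seen_data = True
            # Assumed header test: 20 characters opening with < and closing with >.
            if (len(line) >= HEADER_LENGTH and line[0] == "<"
                    and line[HEADER_LENGTH - 1] == ">"):
                line = line[HEADER_LENGTH:]
        pieces.append(line.replace(" ", ""))  # all spaces are removed
    return comments, "".join(pieces)
```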
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 5.2\tab Sequence libraries\par
\pard\plain \s4\qj\sa120\sl280 \f20
Users may not appreciate the fact that because the sequence libraries are so large, programs need to use indexes to provide rapid retrieval of individual entries. An index is a list of entry names and pairs of offsets. For each entry name the offsets defin
e the position at which its sequence and annotation s
tart in the large file. The index, which is in any case relatively small, is arranged so that it can be searched quickly - for example the EMBL cdrom index is sorted alphabetically. When the user supplies an entry name the program rapidly finds it in the i
ndex file and then uses the associated offsets to locate the entry in the larger sequence files.\par
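\pard\plain \s4\qj\sa120\sl280 \f20 The principle of an index can be shown with a small Python sketch. It holds the index in memory as sorted entry names with their pairs of offsets\: the binary layout of the real EMBL cdrom index files is not reproduced, and the entry names, offsets and function names are invented for illustration.\par

```python
import bisect
import io


def build_index(records):
    """Sort (entry_name, annotation_offset, sequence_offset) records by entry name."""
    ordered = sorted(records)
    names = [record[0] for record in ordered]
    offsets = [(record[1], record[2]) for record in ordered]
    return names, offsets


def lookup(index, entry_name):
    """Binary-search the sorted names; return the pair of offsets, or None if absent."""
    names, offsets = index
    position = bisect.bisect_left(names, entry_name)
    if position < len(names) and names[position] == entry_name:
        return offsets[position]
    return None


def read_at(library, offset, length):
    """Jump straight to an offset in the large library file and read from there."""
    library.seek(offset)
    return library.read(length)


def demo():
    """Retrieve the sequence of one entry from a toy in-memory library."""
    library = io.BytesIO(b"ID ECOLAC;acgtacgtID HSGLOB;ttgaca")
    index = build_index([("HSGLOB", 18, 28), ("ECOLAC", 0, 10)])
    annotation_offset, sequence_offset = lookup(index, "HSGLOB")
    return read_at(library, sequence_offset, 6)
```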
\pard \s4\qj\sa120\sl280 The sequence libraries are stored in different ways on the VAX and the SUN. On the VAX we adopted the widely used PIR format and indexing method and on the SUN we use the EMBL cdrom format and indexes.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 5.2.1\tab Sequence libraries on the VAX\par
\pard\plain \s4\qj\sa120\sl280 \f20
On the VAX all libraries are stored in PIR format, and except for the facility to select entries by accession number, the same functions are provided as those on the SUN. Note that this means that most libraries need reformatting after they have been read
from the distribution media. Because, for each entry, the sequence and its annotation are stored separately, the reformatting process consumes significant computer resources. T
hese reformatting programs are available from PIR and we give no further information here. The programs that search whole libraries of sequences also expect the libraries to be in PIR format.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 5.2.2\tab Sequence libraries for the UNIX version\par
\pard\plain \s4\qj\sa120\sl280 \f20
For the UNIX version of the programs we use the EMBL cdrom as the primary source of sequence data and have chosen its indexing method for all libraries. These indexes leave the sequence libraries in their distribution format and simply provide offsets to the original files. The cdrom provides the EMBL nucleic acid sequence library and the SWISSPROT protein sequence library. Currently it also includes indexes for entry names, accession numbers, authors and freetext and has an additional "title" file which, for each entry, consists of entry name, entry length and an 80 character description of the entry. These indexes allow rapid retrieval of entries by name or accession number, and the author and freetext indexes can be searched very rapidly. The files can be left on the cdrom or transferred to a hard disk. The programs that search whole libraries of sequences expect the libraries to be in cdrom format or PIR format.\par
\pard \s4\qj\sa120\sl280
We have written our own programs for producing EMBL cdrom type indexes for other sequence libraries. These allow us to use the PIR protein libraries in CODATA format and the updates to the EMBL nucleotide library issued between releases. Others may wish to use them to produce indexes for libraries such as GenBank. In addition to our own programs the scripts that produce the indexes also use the UNIX sort program. We give no further details here but the programs are described in Staden and Dear, 1992.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 5.2.2.1\tab Library description files.\par
\pard\plain \s4\qj\sa120\sl280 \f20
The following information is only relevant to those installing the sequence libraries on a SUN. To make the sequence library handling as flexible as possible we use several levels of files. As stated above, at present we only deal with the EMBL and SWISSPROT libraries as distributed on cdrom and the PIR protein library in CODATA format. By including a "library type" flag in the library description file we also leave open the possibility of using alternative formats. \par
\pard \s4\qj\sa120\sl280 We describe the libraries at 3 levels\: 1) a list of libraries and their types, which points to 2) the files that name each library's individual files and their file types, and finally 3) each library's individual files. The files used are described below.\par
\pard\plain \s7\qj\fi-1120\li1120\sa120\sl280\pagebb\tx1120 \f20 Level 1)\tab The top level file is a list of available libraries which contains\: the library type, the name of the file containing the names of each library's individual files, and the prompt to appear on the user's screen. \par
\pard\plain \s4\qj\sa120\sl280 \f20 Example\: \par
\pard \s4\qj\li1100\sa120\sl280 File name\: SEQUENCELIBRARIES\par
File contents\:\par
\pard\plain \li1120\sl220 \f4\fs16 A\tab EMBLLIBDESCRP EMBL nucleotide library ! in cdrom format\par
A\tab SWISSLIBDESCRP SWISSPROT protein library! in cdrom format\par
\pard \li1120\sa300\sl220 B\tab PIRLIBDESCRP PIR protein library! in CODATA format\par
\pard\plain \s4\qj\sa180\sl280 \f20 The first two libraries are of type A. The logical names are EMBLLIBDESCRP and SWISSLIBDESCRP, and the prompts are "EMBL nucleotide library" and "SWISSPROT protein library". The third library is o
f type B with logical name PIRLIBDESCRP. Space is used as a delimiter and anything to the right of a ! is a comment.\par
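\pard\plain \s4\qj\sa120\sl280 \f20 A minimal Python sketch of reading such a top level file is given below. It follows the rules just stated (white space as delimiter, anything to the right of a ! is a comment); the function name is hypothetical and the package's own reader may differ in detail.\par

```python
def parse_library_list(text):
    """Return (library_type, description_file, prompt) for each library listed."""
    libraries = []
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()  # drop the comment, if any
        if not line:
            continue
        # The first two fields are delimited by white space; the rest is the prompt.
        library_type, description_file, prompt = line.split(None, 2)
        libraries.append((library_type, description_file, prompt))
    return libraries
```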
\pard\plain \s7\qj\fi-1100\li1100\sa120\sl280\tx1120 \f20 Level 2)\tab The file containing the names of the library's individual files contains flags to define the file types and the path or logical names of the files. Current file types are\: \par
\pard\plain \fi100\li980\sl220 \f4\fs16 A\tab Division_lookup\par
B\tab Entryname_index\par
C\tab Accession_target\par
D\tab Accession_hits\par
E\tab Brief_directory\par
F\tab Freetext_target\par
G\tab Freetext_hits\par
H\tab Author_target\par
I\tab Author_hits\par
\pard\plain \s4\qj\sa120 \f20 Example\par
\pard \s4\qj\li1120\sa120 File name\: EMBLLIBDESCRP\par
File contents\:\par
\pard\plain \fi100\li980\sl220 \f4\fs16 A\tab STADTABL/EMBLdiv.lkp\par
B\tab /cdrom/indices/embl/entrynam.idx\par
C\tab /cdrom/indices/embl/acnum.trg\par
D\tab /cdrom/indices/embl/acnum.hit\par
E\tab /cdrom/indices/embl/brief.idx\par
F\tab /cdrom/indices/embl/freetext.trg\par
G\tab /cdrom/indices/embl/freetext.hit\par
H\tab /cdrom/indices/embl/author.trg\par
I\tab /cdrom/indices/embl/author.hit\par
\pard \li1120\sa300\sl220 \par
\pard\plain \s7\qj\fi-1120\li1120\sa120\sl280\tx1120 \f20 Level 3)\tab
The individual library files. The contents of all files below Division_lookup are exactly as they appear on the cdrom. The Division_lookup file is rewritten so the directory structure and file names can be chosen locally. Its format is I6,1x,A. \par
\pard\plain \s4\qj\sb300\sa180\sl280 \f20 The files that define all the programs and standard data files used by the package, staden.login and staden.profile, define the file SEQUENCELIBRARIES, which contains the list of available libraries. As should be clear from the description above, the files for all three levels need to be created (actually modified from the contents of the distribution tape) and all names can be changed locally, or set to be the same as those on the cdrom.\par
\pard\plain \s7\qj\fi-1120\li1120\sa120\sl280\tx1120 \f20 \par
\pard\plain \s4\qj\sa120\sl280 \f20 Example of Division_lookup file \par
\pard \s4\qj\li1120\sa120\sl280 File name\: STADTABL/EMBLdiv.lkp\par
Contents\:\par
\pard\plain \li1120\sl220 \f4\fs16 1\tab /cdrom/embl/fun.dat\par
2\tab /cdrom/embl/inv.dat\par
3\tab /cdrom/embl/mam.dat\par
4\tab /cdrom/embl/org.dat\par
5\tab /cdrom/embl/phg.dat\par
6\tab /cdrom/embl/pln.dat\par
7\tab /cdrom/embl/pri.dat\par
8\tab /cdrom/embl/pro.dat\par
9\tab /cdrom/embl/rod.dat\par
10\tab /cdrom/embl/syn.dat\par
11\tab /cdrom/embl/una.dat\par
12\tab /cdrom/embl/vrl.dat\par
13\tab /cdrom/embl/vrt.dat\par
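\pard\plain \s4\qj\sa120\sl280 \f20 The record format I6,1x,A means a division number right justified in 6 columns, one blank, then the file name. A Python sketch of writing and reading one such record is shown below; the function names are hypothetical.\par

```python
def format_division_record(number, path):
    """Write one record as I6,1x,A: number in 6 columns, one blank, then the path."""
    return str(number).rjust(6) + " " + path


def parse_division_record(line):
    """Read one record laid out as I6,1x,A."""
    number = int(line[:6])    # I6: columns 1 to 6
    path = line[7:].rstrip()  # 1x skips column 7; A takes the rest of the line
    return number, path
```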
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 6.\tab Conventions Used In Text\par
\pard\plain \s4\qj\sa120\sl280 \f20 Obviously the programs can perform many more operations than there is space to describe but, in the selection of uses shown, we have tried to give some feel for the programs' sco
pe. For this reason, and the need to conform as closely as possible to the format of the book, we have chosen specific paths through the programs, rather than attempt to describe all routes. For some sections, such as that on the facilities available for e
diting contigs, this has not been possible and we have instead described how the major commands are used. It should also be noted that the user interactions described in the methods sections are those that would be required if the options were selected in
the "Execute with dialogue" mode. In practice many of the options would normally be used without any dialogue being required.\par
\pard \s4\qj\sa120\sl280
In the section on the user interface we outlined the different modes of obtaining input from users. Throughout the specific chapters we have adopted the following conventions to indicate which mode of input is being employed. When a program requests numeri
cal or string input we have used the term "Define", as in Define "Minimum search score". When a program requests that a choice is
made between several options, as in the case of radio buttons or check boxes, we have used the term "Select". When a program offers a choice between two options in the form of a yes or no answer, as in "Hide translation", we use the terms "Accept" or "Reje
ct". When the digitizer program uses the stylus for input we have used the term "Hit".\par
\pard \s4\qj\sa120\sl280 Because it is difficult to produce figures including pull down menus and dialogue boxes, almost all examples containing user input are taken from the xterm interface. Ho
wever the actual wording of the prompts is the same for both interfaces.\par
\pard \s4\qj\sa120\sl280
The programs contain routines for drawing scales on plots and for simple annotation, but in general such embellishment is not done automatically by the programs. This is because the programs are designed so that many plots can be superimposed, and it is better for the user to explicitly decide to add scales and annotation. More elaborate annotation can be added by saving the graphics output to files which can be handled by, say, Macintosh painting and drawing programs. None of the examples of graphical results shown in the following chapters have added scales\: all are exactly as drawn by the programs.\par
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 7.\tab Notes\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 7.1\par
\pard\plain \s4\qj\sa120\sl280 \f20
Although all the programs in the Macintosh version of the package work, the conversion to this machine was never finished. The package does not provide access to the sequence libraries, handling only simple text files containing sequences, or those generat
ed by the assembly program SAP. The user interface, although using pu
ll down menus and dialogue boxes for all interactions, is not as "Mac like" as many would expect. However many people find this version very useful, and for others, the digitizer program alone makes the package worth having. Data input from a digitizer is
a task suited to a machine like the Macintosh, and the data files can be transferred to a larger machine for assembly and other analysis. With the exception of sequence library access, all the options available in the 1990 VAX version are contained in the
package (See Staden, 1990). We give no further details specific to the Macintosh version.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 8.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1990. An improved sequence handling package that runs on the Apple Macintosh. {\i Comput. Applic. Biosci.} {\b 4}, 387-393.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Staden, R. and Dear, S. 1992. Indexing the sequence libraries\: Software providing a common indexing system for all the standard sequence libraries. {\i DNA Sequence} {\b 3}, 99-105.\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 3. Sequence Input, Editing and Sequence Library Use\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 1.1\tab Introduction to sequence input\par
1.2 \tab Introduction to keyboard input\par
1.3\tab Introduction to input from digitizer\par
1.4\tab Introduction to editing single sequences\par
1.5\tab Introduction to using the sequence libraries\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Sequence input from keyboard\par
2.2\tab Sequence input from digitizer\par
2.3\tab Sequence input from the Pharmacia A.L.F.\par
2.4\tab Sequence input from the ABI 373A.\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.5\tab Editing a nucleic acid sequence using restriction sites and a translation and base numbering as landmarks.\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.6\tab Searching the freetext and author indexes of a sequence library\par
2.7\tab Using accession numbers to retrieve data from a sequence library\par
2.8\tab Displaying the annotations for an entry in a sequence library\par
2.9\tab Reading a sequence from a sequence library\par
2.10\tab Worked example of sequence library access\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
4.\tab References\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we describe sequence input and editing and the use of sequence libraries.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.1\tab Introduction to sequence input and editing\par
\pard\plain \s4\qj\sa120\sl280 \f20 The package contains facilities for input of sequence data from the keyboard, sonic digitizers, and ABI 373A and Pharmacia A.L.F. fluorescent sequencing machines. Editing of single sequences can be performed using system editors such as EDT on the VAX and EMACS on the SUN. Editing of sequence alignments is discussed in the chapter on managing sequencing projects.\par
\pard\plain \s6\sa60\sl280\pagebb\tx560\tx860 \b\f20 1.2\tab Introduction to keyboard input\par
\pard\plain \s4\qj\sa120\sl280 \f20 The program SAP contains an option to enter sequence at the keyboard. It also creates a file of file names and will list the sequences. Users may choose any 4 keys to represent the characters A, C, G and
T. For example 4 adjacent keys in the same order as the lanes on a gel could be used. The program translates these symbols to A, C, G and T, and any other characters are left unchanged. No line of input should be longer than 80 characters. Terminate input
with the symbol @.\par
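\pard\plain \s4\qj\sa120\sl280 \f20 The key translation can be sketched in Python as follows. This is an illustration, not the code used by SAP\: the function name and the choice of keys are hypothetical, and the input is treated as a single string.\par

```python
def translate_keys(typed, keys):
    """Translate the 4 user-chosen keys to A, C, G and T.

    keys holds the 4 characters chosen to represent A, C, G and T, in that
    order. Any other character is left unchanged, and input ends at the
    first @ symbol.
    """
    if len(keys) != 4:
        raise ValueError("exactly 4 keys are needed")
    table = str.maketrans(keys, "ACGT")
    reading = typed.split("@", 1)[0]  # everything before the terminator
    return reading.translate(table)
```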
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.3\tab Introduction to input from digitizer\par
\pard\plain \s4\qj\sa120\sl280 \f20 Digitizers provide a convenient way of entering sequences from films into a computer. The digitizer, which is connected directly to the computer, operates on a light box, and is controlled by a program named GIP (1). The film to be read is taped firmly to the surface of the light box, and the user defines the lane order and the centres of the four lanes to be read. These positions are defined at the point where reading will commence and the program adjusts their values as the film is read. The user reads the sequence and transfers it to the computer by hitting the centres of the bands progressing up the film. Any number of sets of lanes and films can be read in a single run of the program. Each sequence is stored in a separate file and a file of file names is also written. The program also uses a menu, which is a series of reserved areas of the light box surface, for entering commands and uncertainty codes. When the pen is pressed in these areas the program responds accordingly. Each time the pen tip is depressed in the digitizing area the program sounds the bell on the terminal to indicate to the user that a point has been recorded. As the sequence is read the program displays it on the screen. 
\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.4\tab Introduction to editing single sequences\par
\pard\plain \s4\qj\sa120\sl280 \f20
The editing method used by the programs is designed to give users access to an editor with which they are familiar - i.e. the one on their machine, say EDT on a VAX or EMACS on a UNIX system, and yet to allow them to edit a sequence which contains all the landmarks they need in order to know where they are. Users can create a file containing a simple listing of the sequence (single stranded) with numbering, using "list the sequence", and then edit it with their system editor, using the numbering to know where they are within the sequence. When the edits are complete they exit from the editor and the program "analyses" the edited file to extract only the sequence characters. Similarly a file containing a three phase translation, or a file containing a sequence plus its three phase translation, plus its restriction sites marked above the sequence (see figure 3.1), can be edited. In order to be able to "analyse" such complicated listings and correctly extract the sequence the following simple rule is used\: all lines in the file that contain a character that is not A,C,T,G or U are deleted. It is obviously important to be aware of this rule and its implications. For protein sequences only a simple listing, i.e. the sequence plus numbering, can be used.\par
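\pard\plain \s4\qj\sa120\sl280 \f20 The rule can be sketched in Python as follows. This is an illustration rather than the program's own code\: it assumes that white space is ignored and that upper and lower case are treated alike, and the function name is hypothetical.\par

```python
ALLOWED = set("ACGTU")


def extract_sequence(listing):
    """Keep only the lines made up entirely of A, C, G, T or U and join them."""
    kept = []
    for line in listing.splitlines():
        symbols = "".join(line.split())  # assumption: white space is ignored
        if symbols and all(ch in ALLOWED for ch in symbols.upper()):
            kept.append(symbols)
        # Any other line (enzyme names, numbering, translations) is deleted.
    return "".join(kept)
```

\pard\plain \s4\qj\sa120\sl280 \f20 One implication, under the same assumptions\: a translation line that happened to contain only the one letter codes A, C, G and T would be kept as sequence.\par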
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 1.5\tab Introduction to using the sequence libraries\par
\pard\plain \s4\qj\sa120\sl280 \f20 The installation of the sequence libraries is described in the introductory chapter. Direct access to the libraries is provided by all programs that need such a facility\: it is
not performed by separate programs. The facilities currently offered in NIP, PIP, SIP, NIPL, PIPL, and SIPL include the following\:\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Get a sequence by knowing its entry name\par
\tab Get a sequence's annotation by knowing its entry name\par
\tab Get an entry name by knowing its accession number\par
\pard\plain \li1120\ri1240\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 HapII\par
\pard \li1120\ri1240\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth HpaII\par
MspI MseI\par
. .HincII\par
. .HindII\par
. .HpaI DsaV\par
. .. EcoRII\par
. .. TspAI\par
. .. . ApyI\par
. .. . BstNI\par
. .. . MvaI\par
. .. . ScrFI MaeIII\par
. .. . . . BsrI MseI\par
ccggttagactgttaacaacaaccaggttttctactgatataactggttacatttaacgc\par
10 20 30 40 50 60\par
P V R L L T T T R F S T D I T G Y I * R\par
R L D C * Q Q P G F L L I * L V T F N A\par
\pard \li1120\ri1240\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth G * T V N N N Q V F Y * Y N W L H L T P\par
\pard\plain \s8\qj\fi-1140\li1140\sb80\sa120\sl240\tx1140 \f21\fs20 Figure 3.1\tab The first page width of a sequence display that can be edited by the program.\par
\pard\plain \s7\qj\fi-560\li560\sb360\sa120\sl280\tx560 \f20 \tab Search the author index for author names\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab Search the freetext index for keywords\par
\pard\plain \s4\qj\sa120\sl280 \f20 The facilities currently offered in NIPL, PIPL and SIPL include\:\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Search whole library\par
\tab Search only a list of entry names\par
\tab Search all but a list of entry names\par
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Sequence input from keyboard\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Type in gel readings".\par
2.\tab Accept "Use special keys for A,C,T,G".\par
3.\tab Define the keys in turn.\par
4.\tab Define "File of file names". This creates a file of file names so that the readings can be processed as a batch.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define the sequence by typing it in using the selected keys. Finish by typing an @ symbol.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "File name for this gel reading". This is the name for the sequence just entered.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Accept "Type in another reading". This cycles round to step 5. If rejected the next step follows.\par
8.\tab Accept "List gel readings". The batch of readings entered will each be listed, one after the other, headed by their file names, on the screen.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Sequence input from digitizer\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Tape the autoradiograph down securely on the light box.\par
2.\tab Start the program (GIP).\par
3.\tab Define "File of file names".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Using the digitizer pen hit the digitizer menu ORIGIN, program menu ORIGIN, program menu START.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab After the bell has sounded the program will give the default lane order. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab If correct hit CONFIRM otherwise hit RESET. To reset the lane order hit the A,C,G,T boxes in the menu in left to right order.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Hit START, then hit in left to right order, at a height level with the first band to be read, the start positions for the next four lanes. The program will report the mean lane separations and ask for confirmation that they are correct.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Hit START\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Hit the bands on the film in sequence order. If necessary use the uncertainty codes in the program menu. Continue until the sequence is finished.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Hit STOP.\par
10.\tab Define "Name for this reading".\par
11.\tab Accept "Read another sequence". Otherwise the program will stop.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Sequence input from the Pharmacia A.L.F.\par
\pard\plain \s4\qj\sa120\sl280 \f20 After processing and base calling on the PC the data for all 10 clones is contained in a single file, and the user names each using local conventions. Then this single file is transferred to the SUN using PC-NFS. This program allows SUN directories to be mounted as if they were DOS disks and data can be transferred by use of the DOS copy command. On the SUN, to prepare for processing by program XBAP the 10 clones are split into 10 separate files each with the names given on the PC. In addition a file of file names is written. Then the reads for the individual clones need to be examined to clip off the vector sequence and the poor data at the 5' end. See note 2.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Sequence input from the ABI 373A.\par
\pard\plain \s4\qj\sa120\sl280 \f20 After processing and base calling on the Macintosh the data for each clone is contained in 2 files\: one is simply the sequence but the main file contains the raw data, trace data and sequence. For our processing we do not use the sequence file as we can extract all we need from the main file. The user names each file using local conventions and then the folder is transferred to the SUN using TOPS. This program allows SUN directories to be mounted as if they were on the Macintosh and data can be transferred by simply dragging folders on the Macintosh screen. On the SUN, to prepare for processing by program XBAP, a file of file names is written and the reads for the individual clones are examined to clip off the vector sequence and the poor data at the 5' end. See note 2.\par
\pard\plain \s6\fi-560\li560\sb240\sa120\sl280\tx560\tx980 \b\f20 2.5\tab Editing a nucleic acid sequence using restriction sites and a translation and base numbering as landmarks.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select NIP.\par
2.\tab Read in the sequence to be edited.\par
3.\tab Direct output to disk, say creating file edit.seq.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Use the restriction enzyme site search routine (See the relevant chapter) to create a file showing "Names above the sequence", as in figure 3.1.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Close the redirection file.\par
6.\tab Select "Edit the sequence". \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Name of file to edit". This is the file containing the sequence listing, say edit.seq. The system editor will start up.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Edit the sequence.\par
9.\tab Exit from the editor.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Accept "Make edited sequence active". The edited sequence will replace the original sequence. \par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Searching the freetext (or author) index of a sequence library\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Sequence library". The alternative is "Personal file", and if taken would be followed by questions about which of the formats "Staden, EMBL, GenBank, PIR, GCG or FASTA" it was stored in.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select, say, "EMBL nucleotide library".\par
4.\tab Select "Search text index for keywords".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Keywords". Type up to 5 keywords separated by spaces - i.e. space is the delimiting character (see note below about author searches).\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab
The search will start and for each match the program will display the contents of the matching line which includes the entry name, primary accession number, its length and an 80 character description. After every 20 matches the program will ring the bell and the user can escape by typing "!".\par
\tab The commands for searching the author index are effectively the same. Note that for authors it is useful to be able to link words together for names such as De Gaulle or von Meyenberg. The symbol underscore (_) can be used for this purpose - e.g. De_Gaulle or von_meyenberg. The same facility is available for the keyword searches.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.7\tab Using accession numbers to retrieve data from a sequence library\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par
2.\tab Select "Sequence library".\par
3.\tab Select, say, "EMBL nucleotide library".\par
4.\tab Select "Get entry names from accession numbers".\par
5.\tab Define "Accession number". \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab The program will display the entry names corresponding to the accession number. The last entry name found will become the default entry name.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.8\tab Displaying the annotations for an entry in a sequence library\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par
2.\tab Select "Sequence library".\par
3.\tab Select, say, "EMBL nucleotide library".\par
4.\tab Select "Get annotations".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Entry name". The program will display the annotation for the entry. After every 20 lines the program will ring the bell and the user can escape by typing "!".\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.9\tab Reading a sequence from a sequence library\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par
2.\tab Select "Sequence library".\par
3.\tab Select, say, "EMBL nucleotide library".\par
4.\tab Select "Get a sequence".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Entry name". The program will make the sequence the active sequence and display its base composition.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.10\tab Worked example of sequence library access\par
\pard\plain \s4\qj\sa120\sl280 \f20
The worked example in figure 3.2 shows a search of the text index for the keywords p53 and mouse, followed by a search of the author index for the names sanger and coulson, followed by a search on accession number v00636, followed by "Get annotations" for entry lambda, and finally "Get a sequence" for entry lambda. \par
\pard\plain \sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 {\f22\fs18 Select sequence source\par
}\pard \sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 X 1 Personal file\par
2 Sequence library\par
? Selection (1-2) (1) =2\par
Select a library\par
X 1 EMBL 29 nucleotide library Dec 91\par
2 SWISSPROT 20 protein library Nov 91\par
3 PIR 31 protein library Dec 91\par
4 NRL3D 58 From Brookhaven protein library Dec 91\par
5 GenBank example\par
? Selection (1-5) (1) =\par
Library is in EMBL format with indexes\par
Select a task\par
X 1 Get a sequence\par
2 Get annotations\par
3 Get entry names from accession numbers\par
4 Search author index\par
5 Search text index for keywords\par
? Selection (1-5) (1) =5\par
Search for keywords\par
? Keywords=p53 mouse\par
P53 hits 73\par
MOUSE hits 10140\par
\par
MMANT01 X00875 536 Murine gene fragment for cellular tumour antigen\par
MMANT02 X00876 83 Murine gene fragment for cellular tumour antigen\par
MMANT03 X00877 21 Murine gene fragment for cellular tumour antigen\par
MMANT04 X00878 261 Murine gene fragment for cellular tumour antigen\par
MMANT05 X00879 184 Murine gene fragment for cellular tumour antigen\par
MMANT06 X00880 113 Murine gene fragment for cellular tumour antigen\par
MMANT07 X00881 110 Murine gene fragment for cellular tumour antigen\par
MMANT08 X00882 137 Murine gene fragment for cellular tumour antigen\par
}\pard \sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 MMANT09 X00883 74 Murine gene fragment for cellular tumour antigen\par
MMANT10 X00884 107 Murine gene for cellular tumour antigen p53 (exon\par
MMANT11 X00885 562 Murine p53 gene 3' region with exon 11\par
MMANTP53 M26862 536 Mouse tumor antigen p53 gene, 5' end.\par
MMLYN M64608 2044 Mouse lyn protein mRNA, complete cds.\par
MMP53 X00741 1377 Mouse mRNA for transformation associated protein\par
MMP53A M13872 1285 Mouse p53 mRNA, complete cds, clone pcD53.\par
MMP53B M13873 1241 Mouse p53 mRNA, complete cds, clone p53-m11.\par
MMP53C M13874 1322 Mouse p53 mRNA, complete cds, clone p53-m8.\par
MMP53G1 X01235 554 Mouse genomic DNA for 5' region of cellular tumou\par
MMP53IN4 X60470 729 M.musculus p53 gene for p53 protein, intron 4\par
\par
MMP53P X01236 2132 Mouse pseudogene for cellular tumour antigen p53\par
MMP53R X01237 1773 Mouse mRNA for cellular tumour antigen p53\par
MMRSB2P5 M64597 196 Mouse B2 repeat in the 3' flank of protein 53 (p5\par
MMSFFV1 X64656 165 M.musculus Friend spleen focus forming virus (SFF\par
MMSFFV2 X64657 142 M.musculus Friend spleen focus forming virus (SFF\par
24 different entries found\par
\par
Select a task\par
X 1 Get a sequence\par
2 Get annotations\par
3 Get entry names from accession numbers\par
4 Search author index\par
5 Search text index for keywords\par
? Selection (1-5) (1) =4\par
Search for keywords\par
? Keywords=coulson sanger\par
COULSON hits 935\par
SANGER hits 15\par
\par
LAMBDA V00636 48502 Genome of the bacteriophage lambda (Styloviridae)\par
MIBTXX V00654 16338 Complete bovine mitochondrial genome.\par
MIHSCG J01415 16569 Human mitochondrion, complete genome.\par
MIHSM1 M10546 2771 Human mitochondrial DNA, fragment M1, encoding tr\par
MIHSXX V00662 16569 H.sapiens mitochondrial genome\par
MIPX1C01 M10860 130 Bacteriophage phi-X174, nucleotides 3920-4049.\par
MIPX1C02 M10861 115 Bacteriophage phi-X174, nucleotides 3480-3595.\par
MIPX1C03 M10862 121 Bacteriophage phi-X174, nucleotides 4260-4380.\par
MIPX1CTI M10849 130 Bacteriophage phi-X174, nucleotides 3389-3520.\par
PHIX174 V01128 5386 Bacteriophage phi-X174 (cs70 mutation) complete g\par
R17CPRAA M24826 61 Bacteriophage R17 coat protein RNA fragment.\par
11 different entries found\par
\par
Select a task\par
X 1 Get a sequence\par
2 Get annotations\par
3 Get entry names from accession numbers\par
4 Search author index\par
5 Search text index for keywords\par
? Selection (1-5) (1) =3\par
? Accession number=v00636\par
Entry name LAMBDA\par
Select a task\par
X 1 Get a sequence\par
2 Get annotations\par
3 Get entry names from accession numbers\par
4 Search author index\par
5 Search text index for keywords\par
? Selection (1-5) (1) =2\par
Default Entry name=LAMBDA\par
? Entry name=\par
ID LAMBDA standard; DNA; PHG; 48502 BP.\par
}\pard \sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 XX\par
AC V00636; J02459; M17233; X00906;\par
XX\par
DT 09-JUN-1982 (Rel. 01, Created)\par
DT 03-JUL-1991 (Rel. 28, Last updated, Version 3)\par
XX\par
DE Genome of the bacteriophage lambda (Styloviridae).\par
XX\par
KW circular; coat protein; DNA binding protein; genome;\par
KW origin of replication.\par
XX\par
OS Bacteriophage lambda\par
OC Viridae; ds-DNA nonenveloped viruses; Siphoviridae.\par
XX\par
RN [1]\par
RP 1-48502\par
RA Sanger F., Coulson A.R., Hong G.F., Hill D.F., Petersen G.B.;\par
RT "Nucleotide sequence of bacteriophage lambda DNA";\par
RL J. Mol. Biol. 162\:729-773(1982).\par
XX\par
\par
Select a task\par
X 1 Get a sequence\par
2 Get annotations\par
3 Get entry names from accession numbers\par
4 Search author index\par
5 Search text index for keywords\par
? Selection (1-5) (1) =\par
Default Entry name=LAMBDA\par
? Entry name=\par
DE Genome of the bacteriophage lambda (Styloviridae).\par
Sequence length 48502\par
Sequence composition\par
T C A G -\par
11988. 11360. 12336. 12818. 0.\par
}\pard \sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 24.7% 23.4% 25.4% 26.4% 0.0%\par
}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 3.2\tab A worked example of sequence library use.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab NOTES\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
The program menu for GIP is simply a set of boxes drawn on the digitizing surface that each contain a command or uncertainty code. Right-handed users will find it is best to position the menu to the right of the digitizing area, but in practice as long as its top edge is parallel to the digitizer box, it can be put anywhere in the active region. As well as the codes a,c,g,t,1,2,3,4,b,d,h,v,r,y,x,-,5,6,7,8 the following commands are included in the menu\: DELETE removes the last character from the sequence; RESET allows the lane centres to be redefined; START means begin the next stage of the procedure; STOP means stop the current stage in the procedure; CONFIRM means confirm that the last command or set of coordinates is correct. \par
\tab
The digitizing device also has a menu of its own. This lies in a two inch wide strip immediately in front of the digitizing box. Pen positions within this two inch strip are interpreted as commands to the digitizer and are not sent to the GIP program. In general the only time users will need to use the device menu is when they tell GIP where the program menu lies in the digitizing area. This is done by first hitting ORIGIN in the device menu and then hitting the bottom left hand corner of the program menu. If the bell does not sound after hitting START try hitting METRIC in the device menu (the program uses metric units, and some digitizers are set to default to use inches; hitting METRIC switches between the two).\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab
The user should try to hit the bands as near as possible to the centre of the lanes because the program tracks the lanes up the film using the pen positions. If the lane centres get too close the program stops responding to the pen positions of bands and
hence does not ring the bell. If t
his occurs users must hit the reset box in the menu and the program will request them to redefine the lane centres at the current reading position. Then they can continue reading. As a further safeguard the program will only respond to pen positions either
in the menu or very close to the current reading position.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Details about preparing the data from fluorescent sequencing machines for processing by XBAP are contained in the notes for the chapter on managing sequencing projects. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab All of the operation
s described for the EMBL nucleotide library can be performed in exactly the same way for GenBank and the SWISSPROT and PIR protein libraries. For keyword searching the freetext index is most useful because it contains all words in feature tables, definiti
on lines, title lines, keywords and comment lines. The searches are very fast. The search will find all words that start with the given keywords\:
e.g. keyword sugar will match with sugar, sugaractivating, sugars, etc. When several keywords are used together, only entries indexed on all the words will be reported. On the VAX, EMBL, GenBank, SWISSPROT and PIR can all be processed. \par
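\pard\plain \s4\qj\sa120\sl280 \f20 The matching rule in note 3 (each keyword matches every indexed word that starts with it, and only entries indexed on all keywords are reported) can be illustrated with a short Python sketch. It is not the package's own code: the function name search_index, the in-memory toy index and the entry names are hypothetical stand-ins for the library's freetext index, and case-insensitive matching is an assumption based on the upper case keywords echoed in figure 3.2.\par

```python
def search_index(index, keywords):
    """Return the entry names indexed under every keyword, using prefix matching.

    index:    mapping of an indexed word to a set of entry names
              (a hypothetical in-memory stand-in for the freetext index).
    keywords: space-separated query string, as typed at the "Keywords" prompt.
    """
    result = None
    for keyword in keywords.upper().split():
        # Collect the entries for every indexed word that starts with the keyword.
        hits = set()
        for word, entries in index.items():
            if word.upper().startswith(keyword):
                hits |= entries
        # Only entries found for all keywords so far are kept.
        result = hits if result is None else result & hits
    return sorted(result or [])


# Hypothetical toy index: word -> entry names.
toy_index = dict(
    SUGAR=set(["ENTRY1", "ENTRY2"]),
    SUGARS=set(["ENTRY3"]),
    KINASE=set(["ENTRY2", "ENTRY3", "ENTRY4"]),
)

print(search_index(toy_index, "sugar"))         # ['ENTRY1', 'ENTRY2', 'ENTRY3']
print(search_index(toy_index, "sugar kinase"))  # ['ENTRY2', 'ENTRY3']
```

\pard\plain \s4\qj\sa120\sl280 \f20 In the toy index the keyword sugar matches both SUGAR and SUGARS, and adding kinase restricts the report to the entries indexed on both words.\par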
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1984. A computer program to enter DNA gel reading data into a computer. {\i Nucl. Acids Res}. {\b 12}, 499-503.\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 4. Managing Sequencing Projects\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Starting a project database\par
2.2\tab Screening against restriction enzyme recognition sequences\par
2.3 \tab Screening against vector sequences\par
2.4 \tab Entering readings into the project database (assembly)\par
2.5\tab Searching for internal joins\par
2.6\tab Editing in XBAP\par
2.7\tab Joining contigs interactively in XBAP\par
2.8\tab Selecting primers and templates\par
2.9\tab Examining the quality of a consensus\par
2.10\tab Using graphical displays to examine contigs\par
2.11\tab Disassembling contigs\par
2.12\tab Shuffling pads\par
2.13\tab Displaying a contig\par
2.14\tab Highlighting differences between readings and the consensus\par
2.15\tab Screen editing contigs in SAP\par
2.16\tab Automatic editing in SAP\par
2.17\tab Using the original editor in SAP\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
4.\tab References\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20
Data input, assembly, checking and editing are the major tasks of sequence project management. Data input is described in a previous chapter and here we cover everything else. The programs can deal with data derived from autoradiographs, from automated gel reading machines such as the Applied Biosystems 373A and the Pharmacia A.L.F., and from film readers such as the Amersham scanner.\par
\pard \s4\qj\sa120\sl280 We describe two alternative programs for managing sequencing projects. They contain the same assembly and vector screening routines but they differ in their editing methods. One program, SAP (see references 1 and 2), can be operated from simple terminals and emulators, but the other, XBAP (3), requires an X terminal or emulator. XBAP contains a superior editor plus the facility to annotate sequences and display the coloured traces for data derived from fluorescent sequencing machines. Those using autoradiographs will find that SAP is adequate but XBAP is essential for users of fluorescent sequencing machines. Readers should note that several of the methods for displaying contigs described below are probably of value only to those unable to use the screen based contig editor in XBAP.\par
\pard \s4\qj\sa120\sl280
Fluorescent sequencing machines provide machine readable data. This means, given appropriate software, that while making editing decisions the user can see, displayed on the screen, the coloured traces used to derive the sequence. However data from these
machines requires some extra processing. First the machines tend to produce long sequences with po
or quality at their 3' ends and so we have to decide how much of the data to use. Secondly the sequencing machine does not recognise the primer region (as the user would) so we need to have some way of removing it from the data. The poor quality data from
both ends of the sequence and the vector sequences are identified non-interactively by programs clip-seqs and vep. Alternatively these tasks can be performed interactively using program TED (4). We term the data from the 3' end of a reading that is not emp
loyed in the assembly process "unused" sequence. Note that we do not lose this data but simply ignore it until such time as it can be useful for locating joins between contigs, or for double stranding regions of the sequence.\par
\pard \s4\qj\sa120\sl280
The method described here uses a database to store all the data for each sequencing project. The individual sequence readings derived from autoradiographs or from sequencing machines are initially stored in separate files but the program copies them into t
he database during the assembly process. For normal operation the program handles batches of readings - say 24 from a film or machine run. Batch processing is achieved by use of files of file names. \par
\pard \s4\qj\sa120\sl280 Depending on the strategy employed and the stage of the project the following operations may be performed.\par
\pard\plain \s7\qj\fi-560\li560\sb100\sa120\sl280\tx560 \f20 1)\tab Start a project database.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2)\tab Select primers and templates.\par
3)\tab Obtain readings.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4)\tab Put individual readings into the computer and write a file of file names. For data derived from fluorescent sequencing machines choose which data from
the 3' end of the reading should not be used for the assembly process.\par
5)\tab Screen the batch against any vectors that may be present, excising any vector sequence found and passing to the next step, the names of those readings that contain some non-vector sequence.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6)\tab Screen the batch against any restriction sites whose presence would indicate a problem, passing those that do not match on to the next step.\par
7)\tab Compare each reading in the batch with the current contents of the project database adding them to the contigs they overlap, joining contigs or starting new contigs.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8)\tab
Check the number of contigs and the quality of the consensus sequence and plan further experiments. Try to join contigs by searching for overlaps between their ends. (This is particularly useful for those using data from fluorescent sequencing machines,
where although the 3' end of the sequence is not good enough for automatic assembly, it can be valuable for finding overlaps between contigs).\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9)\tab Edit the contigs to resolve disagreements.\par
10)\tab Produce a consensus sequence.\par
11)\tab Analyse the consensus sequence, possibly discovering further errors.\par
\pard\plain \s4\qj\sa120\sl280 \f20
Subsets of these operations will be cycled through repeatedly. A pure shotgun strategy would continue using steps 3-7; a pure primer walking strategy would also include step 2. A number of the steps require almost no user intervention; however, checking quality and final editing decisions are still interactive procedures. The program contains several options, such as displays of the overlapping readings in a contig, to help indicate not only the poorly determined regions but also which clones could be resequenced to resolve ambiguities, and which could usefully be extended or sequenced in the reverse direction to cover difficult regions. It is best to use a command procedure or script for handling steps 5-7.\par
\pard \s4\qj\sa120\sl280 For our projects we have a script which users employ by typing "assemble filename", where filename is the file of file names for the current batch of readings. This script calls all the necessa
ry options in SAP or BAP (see notes) in order to make a backup of the database, screen against any vectors, assemble readings and print a report. In the text below we describe how these operations are performed interactively. \par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Starting a project database\par
\pard\plain \s4\qj\sa120\sl280 \f20 The assembled data for each project is stored in a database. At the beginning of a project it is necessary to create an empty database using program SAP or XBAP.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Open database"\par
2.\tab Select "Start new database"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define the database name. Database names can have from one to 12 letters and must not include a full stop (.). \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Database is for DNA"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab
Define "Database size". This is an initial size and if necessary can be increased later using "Copy database". Roughly speaking it is the number of readings expected to be needed to complete the project. Currently BAP limits the maximum to 8000 and SAP
has a limit of 1000.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Maximum reading length". This is the length of the longest reading that will be added to the database. The minimum is 512 bases, and the maximum 4096.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The program should confirm that "copy 0" of the database has been started. See Note 14 for important information.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Screening against restriction enzyme recognition sequences\par
\pard\plain \s4\qj\sa120\sl280 \f20
For some strategies it is necessary to compare readings against any restriction enzyme recognition sequences that may have been used during cloning and which should not be present in the data. The function operates on single readings or processes batches a
ccessed through files of file names. The algorithm looks for exact matches to recognition sequences. The recognition sequences should be stored in a simple text file with one recognition sequence per record.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Accept "Use file of filenames".\par
2.\tab Define "File of gel reading names". The input file of file names.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
Define "File for names of sequences that pass". A file of file names for those readings that do not contain the recognition sequences. After the run it will contain the names of all the files in the batch that do not match any
of the restriction enzyme recognition sequences. Hence it can be used for further processing of the batch.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "File name of recognition sequences". The name of the file of recognition sequences.\par
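\pard\plain \s4\qj\sa120\sl280 \f20 The effect of this screen on a batch can be sketched in Python as follows. This is an illustration only, not the routine used by SAP or XBAP: the function name screen_readings and the reading names are hypothetical, the readings are held in memory rather than read through a file of file names, and the sketch assumes case-insensitive matching on the reading as given (the text above does not say whether the complementary strand is also searched).\par

```python
def screen_readings(readings, recognition_sequences):
    """Return the names of readings that contain none of the recognition sequences.

    readings:              list of (name, sequence) pairs standing in for the
                           files named in the file of file names.
    recognition_sequences: list of strings, one per record of the
                           recognition sequence file.
    """
    # Normalise the recognition sequences and skip blank records.
    sites = [s.strip().lower() for s in recognition_sequences if s.strip()]
    passed = []
    for name, sequence in readings:
        seq = sequence.lower()
        # A reading passes only if no recognition sequence matches exactly.
        if not any(site in seq for site in sites):
            passed.append(name)
    return passed


# Hypothetical batch: the first reading contains an EcoRI site (GAATTC).
batch = [
    ("read.001", "ttgacgaattcgga"),
    ("read.002", "ttgacggatcagga"),
]
print(screen_readings(batch, ["GAATTC", "AAGCTT"]))  # ['read.002']
```

\pard\plain \s4\qj\sa120\sl280 \f20 The returned list plays the role of the "File for names of sequences that pass", which can then be handed to the next processing step.\par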
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Screening against vector sequences\par
\pard\plain \s4\qj\sa120\sl280 \f20
For most strategies it is necessary to compare readings against any vector sequences that may have been picked up during cloning. The package contains two routines for screening against vectors. The original function simply reports any matches between the
readings and t
he vector sequences and only passes on those that do not match. This function should now only be used to screen for any other sequences that should be excluded from the database, because the newer one (program name VEP for vector excising program) is capab
le of both finding the vector sequences and editing them out automatically. \par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.3.1\tab Clipping off vector sequences\par
\pard\plain \s4\qj\sa120\sl280 \f20 There are two types of vector that may need to be screened out of gel readings\: the sequencing vector and, for cases where, say, whole cosmids
have been shotgunned, the cloning vector. The two tasks are different. When screening out the sequencing vector we may expect to find data to exclude, both from the primer region and from the other side of the cloning site (when, for example, the insert i
s short). When screening out cosmid vector we may find that either the 5' end, or the 3' end, or the whole of the sequence is vector. Also for the cosmid search we need to compare both strands of the sequence. The program (VEP) works slightly differently f
or each of the two cases. Having read the vector sequence from a file the program asks for the "Position of the cloning site". A value of zero signifies that the search will be for the cosmid vector. A nonzero value signifies that the search is for the seq
uencing vector, and so in this case the program then asks for the "Relative position of the primer site". A negative relative position signifies that a reverse primer is being used, otherwise a forward primer is assumed.\par
\pard \s4\qj\sa120\sl280 The program screens a batch of read
ings using a file of file names and creates a new file of file names which contains the names of all those sequences that include some nonvector sequence. For each sequence that contains some vector it writes out a new copy of the file in which the vector
portion is identified.\par
\pard \s4\qj\sa120\sl280
The search, which uses a hashing algorithm, is very rapid. Users specify a "Word length", the "Number of diagonals to combine" and a "Minimum score". The word length is the minimum number of consecutive bases that will count as a match. The algorithm treats the problem like a dot matrix comparison and finds the diagonal with the highest score. Then it adds the scores for the adjacent diagonals, as set by "Number of diagonals to combine". If the combined score is at least "Minimum score" the sequence is marked to indicate that it contains vector. The score represents the proportion of a diagonal that contains matching words, so the maximum score for any diagonal is 1.0.\par
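\pard\plain \s4\qj\sa120\sl280 \f20 The scoring idea can be sketched in Python as below. This is a simplified illustration under stated assumptions and is not the VEP source: the function names are hypothetical, only one strand is compared, the best diagonal is chosen by its count of matching words (so that very short corner diagonals do not dominate), the diagonals combined are taken symmetrically around the best one, and the combined score is the matching words divided by the word positions available on those diagonals.\par

```python
from collections import defaultdict


def diagonal_word_positions(vec_len, read_len, diagonal, word_length):
    """Number of word start positions on one diagonal (vector index minus reading index)."""
    start = max(0, -diagonal)                # first reading index on the diagonal
    end = min(read_len, vec_len - diagonal)  # one past the last reading index
    return max(0, end - start - word_length + 1)


def vector_score(vector, reading, word_length, diagonals_to_combine):
    """Combined score, between 0.0 and 1.0, around the best matching diagonal."""
    vector, reading = vector.lower(), reading.lower()
    # Hash every word of the vector to the positions where it starts.
    word_table = defaultdict(list)
    for i in range(len(vector) - word_length + 1):
        word_table[vector[i:i + word_length]].append(i)
    # Count the matching words on each diagonal of the implied dot matrix.
    matches = defaultdict(int)
    for j in range(len(reading) - word_length + 1):
        for i in word_table.get(reading[j:j + word_length], ()):
            matches[i - j] += 1
    if not matches:
        return 0.0
    best = max(matches, key=matches.get)  # assumption: pick by count of matching words
    half = diagonals_to_combine // 2      # assumption: symmetric window around the best
    matched = 0
    possible = 0
    for d in range(best - half, best + half + 1):
        matched += matches.get(d, 0)
        possible += diagonal_word_positions(len(vector), len(reading), d, word_length)
    return matched / possible if possible else 0.0


def contains_vector(vector, reading, word_length, diagonals_to_combine, minimum_score):
    """True if the combined score reaches the user supplied minimum score."""
    return vector_score(vector, reading, word_length, diagonals_to_combine) >= minimum_score


# Hypothetical data: the reading is an exact copy of part of the vector.
toy_vector = "ggatccgtacgatctagcta"
toy_reading = toy_vector[5:15]
print(vector_score(toy_vector, toy_reading, 4, 1))  # 1.0
```

\pard\plain \s4\qj\sa120\sl280 \f20 Because every word of the toy reading lies on a single diagonal, the score with one diagonal is 1.0; combining more diagonals dilutes the score, which is why the cutoff has to be chosen together with the number of diagonals.\par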
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Define "Input file of file names". This is the file containing the names of all the readings to be screened.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "File name of vector sequence". \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
Define "Position of cloning site". This is the base number, relative to the beginning of the vector sequence, that is on the 3' side of the insert site. For example for m13mp18 the SmaI site is at 6249. A zero value signifies that the search is for cosm
id vector.\par
4.\tab Define "Relative position of 3' end of primer site". This is the position, relative to the cloning site, of the first base that could be included in the sequence. For m13mp18, the 17mer Sequencing Primer and the SmaI site, the position is 41.
\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Word length". Only words of this length will be counted as matches.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Number of diagonals to combine". The scores for this number of diagonals around the highest scoring diagonal will be combined to give the total score.\par
7. \tab Define "Cutoff score". For a match, at least this proportion of the total length of the summed diagonals must contain identical words. \par
8.\tab Define "Output file of passed file names". The name of the file to contain the names of the readings to pass on to the assembly program.\par
\pard\plain \s4\qj\sa120\sl280 \f20 Processing will commence and finishes with a summary stating the number of files processed, the number completely vector, the number partly vector and the number free of vector.\par
\pard\plain \s9\fi-560\li860\sb160\sa60\sl280\tx1140 \b\f20 2.3.2\tab Screening for "vectors"\par
\pard\plain \s4\qj\sa120\sl280 \f20 This function is contained in both SAP and XBAP and operates on single readings or processes batches accessed through files of file names. The algorithm looks for exact matches of length "minimum match length" and disp
lays the overlapping sequences.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Accept "Use file of filenames".\par
2.\tab Define "File of gel reading names". The input file of file names.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
Define "File for names of sequences that pass". A file of file names for those readings that do not contain the vector sequence. After the run it will contain the names of all the files in the batch that do not match the vector sequence. Hence it can be
used for further processing of the batch.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "File name of vector sequence". The name of the file containing the vector sequence.\par
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Entering readings into the project database (Assembly)\par
\pard\plain \s4\qj\sa120\sl280 \f20
Readings are entered into the database using the auto assemble function. This function compares each reading and its complement with a consensus of all the readings already stored in the database. If it finds any overlaps it aligns the overlapping sequences by inserting padding characters, and then adds the new reading to the database. Readings that overlap are added to existing contigs and readings that do not overlap any data in the database start new contigs. If a new reading overlaps two contigs they are joined. Any readings that appear to overlap but which cannot be aligned sufficiently well are not entered and have their names written to a file of failed gel reading names. Note that it is possible that a reading may align well with two contigs (indicating a possible join) but that after it has been added to one of the contigs, the two contigs do not align sufficiently well. In this case, although the reading has been entered into the database, its name will also be added to the file of failed readings. Alignments using more than the maximum number of padding characters, or exceeding the maximum mismatch, may be displayed, but the readings will not be entered into the database. It is advisable to set the consensus cutoff to 51% before running the assembly routine as this will improve the alignments. A typical run of the assembly routine is shown in figure 4.1.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Accept "Permit entry"\par
2.\tab Accept "Use file of file names"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "File of gel reading names". The name of the input file of file names, probably passed on from "Screen against vector".\par
4.\tab Define "File for names of failures". A file to contain the names of the readings that the program fails to enter, or for which joins are not made.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Perform normal shotgun assembly"\par
6.\tab Accept "Permit joins"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Minimum initial match". Only possible overlaps containing exact matches of at least this number of consecutive identical characters will be considered for alignment.\par
8.\tab Define "Maximum number of pads per reading". This is the maximum number of padding characters permitted in any new reading during the alignment procedure.\par
9.\tab Define "Maximum number of pads per reading in contig". This is the maximum number of padding characters permitted in the contig in order to align any new reading.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Maximum percent mismatch after alignment"\par
\pard\plain \li560\ri500\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Automatic sequence assembler\par
\pard \li560\ri500\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Database is logically consistent\par
? (y/n) (y) Permit entry\par
? (y/n) (y) Use file of file names\par
? File of gel reading names=demo.nam\par
? File for names of failures=demo.fail\par
Select entry mode\par
X 1 Perform normal shotgun assembly\par
2 Put all sequences in one contig\par
3 Put all sequences in new contigs\par
? Selection (1-3) (1) =\par
? (y/n) (y) Permit joins\par
? Minimum initial match (12-4097) (15) =\par
? Maximum pads per gel (0-25) (8) =\par
? Maximum pads per gel in contig (0-25) (8) =\par
? Maximum percent mismatch after alignment (0.00-15.00) (8.00) =\par
\par
Results skipped to save space\par
\par
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\par
Processing 4 in batch\par
Gel reading name=hinw.009 \par
Gel reading length= 292\par
Working\par
Contig 1 position 263 matches strand 1 at position 14\par
Contig 2 position 1 matches strand 1 at position 156\par
\pard \li560\ri500\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Total matches found 2\par
Trying to align with contig 1\par
Padding in contig= 1 and in gel= 0\par
Percentage mismatch after alignment = 2.9\par
Best alignment found\par
251 261 271 281\par
aattacagcg tt,cctattg acgggcgcat ccac\par
********** ** ** **** ********** ****\par
aattacagcg ttcccvattg acgggcgcat ccac\par
1 11 21 31\par
Trying to align with contig 2\par
Padding in contig= 0 and in gel= 2\par
Percentage mismatch after alignment = 1.4\par
Best alignment found\par
1 11 21 31 41 51\par
tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par
********** ********** ********** ********** ********** **********\par
tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par
156 166 176 186 196 206\par
61 71 81 91 101 111\par
tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccggcagc gcccacactg\par
********** ********** ********** ********** ***** ** * **********\par
tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccg,ca,c gcccacactg\par
216 226 236 246 256 266\par
121 131\par
ctcagacgac ggtcgctgc\par
********** *********\par
ctcagacgac ggtcgctgc\par
276 286\par
Overlap between contigs 2 and 1\par
Length of overlap between the contigs= -122\par
Entering the new gel reading into contig 1\par
This gel reading has been given the number 4\par
Working\par
Trying to align the two contigs\par
Padding in contig= 2 and in gel= 0\par
Percentage mismatch after alignment = 1.5\par
Best alignment found\par
406 416 426 436 446 456\par
tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par
********** ********** ********** ********** ********** **********\par
tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par
1 11 21 31 41 51\par
466 476 486 496 506 516\par
tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccg,ca,c gcccacactg\par
********** ********** ********** ********** ***** ** * **********\par
tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccggcagc gcccacactg\par
61 71 81 91 101 111\par
526 536\par
ctcagacgac ggtcgct\par
********** *******\par
ctcagacgac ggtcgct\par
121 131\par
Editing contig 1\par
\pard \li560\ri500\sa100\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Completing the join between contigs 1 and 2\par
(Results for other readings skipped to save space)\par
\pard \li560\ri500\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Batch finished\par
9 sequences processed\par
9 sequences entered into database\par
\pard \li560\ri500\sa100\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth 2 joins made\par
\pard\plain \s8\qj\fi-1140\li1140\sb60\sa120\sl240\tx1140 \f21\fs20 Figure 4.1\tab Part of a typical run of "Auto assemble".\par
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Searching for internal joins \par
\pard\plain \s4\qj\sa120\sl280 \f20
The purpose of this function is to use data already in the database to find possible joins between contigs. Although most joins will be made automatically during assembly, some may have been missed because of poor alignments. The function is particularly useful for sequences from fluorescent sequencing machines because it may be possible to find potential joins within the unused data from the 3' ends of readings. When the X version is used, the contig joining editor is automatically called up for each potential join found, with the two contigs aligned in the edit windows.\par
\pard \s4\qj\sa120\sl280
The program strategy is as follows. Take the first contig and calculate its consensus. If unused data is being employed, examine all readings that are in the complementary orientation and sufficiently near to the contig's left end, to see if they have sufficiently good unused sequence which, if present, would protrude from the left end of the contig. If any are found, add the longest such sequence to the left end of the consensus. Do the same for the right end by examining readings that are in their original orientation. Repeat the consensus calculations and extensions for all contigs, hence producing an extended consensus for the whole database. If unused data is not being employed, simply calculate the consensus for the whole database. Now look for possible joins by processing the extended consensus in the following way. Take the last, say 500, bases (termed the "probe length" by the program) of the rightmost consensus and compare them in both orientations with the extended consensus of all the other contigs. Display any sufficiently good alignments. Repeat with the left end of the rightmost contig. Do the same for the ends of all the contigs, always comparing only with the contigs to their left, so that the same matches do not appear twice. \par
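\pard \s4\qj\sa120\sl280 The order of comparisons in this strategy can be illustrated with a short Python sketch. It is not taken from the program. The function names, the use of a plain list of consensus strings ordered from left to right, and the layout of the tuples are all assumptions made for illustration, and no alignment is performed.\par

```python
def reverse_complement(seq):
    # Characters other than a, c, g, t (pads, dashes) are left unchanged.
    pairs = str.maketrans("acgtACGT", "tgcaTGCA")
    return seq.translate(pairs)[::-1]


def probe_comparisons(consensuses, probe_length):
    """Yield (probe_contig, end, sense, probe, target_contig) tuples.

    consensuses is a hypothetical list of extended consensus strings,
    ordered from the leftmost contig (index 0) to the rightmost.
    """
    if probe_length <= 0:
        raise ValueError("probe_length must be positive")
    # Start with the rightmost contig and work towards the left.
    for i in range(len(consensuses) - 1, 0, -1):
        cons = consensuses[i]
        ends = (("right", cons[-probe_length:]), ("left", cons[:probe_length]))
        for end, probe in ends:
            senses = (("+", probe), ("-", reverse_complement(probe)))
            for sense, seq in senses:
                # Only contigs to the left are targets, so each pair of
                # contigs is examined from one side only.
                for j in range(i):
                    yield i, end, sense, seq, j
```

Each tuple names one probe, the end and sense in which it was taken, and the contig to its left with which it would be compared, so that no pair of contigs is reported twice.\par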
\pard \s4\qj\sa120\sl280 Good unused data is defined by sliding a window of "Window size for good data scan" bases outwards along the sequence and stopping when more than "Maximum number of dashes in scan window" dashes appear in the window. Note that it is advisable to have some sort of cutoff because if we simply take all the data it might be of such poor quality that we won't find any good matches. An initial run employing no unused data is also recommended. Sufficiently good alignments are defined by criteria equivalent to those used in auto assemble; however, here we only display alignments that pass all tests.\par
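\pard \s4\qj\sa120\sl280 The window scan for good unused data can be sketched in Python as follows. This is an illustration only. The function name good_unused_length, and the choice to keep just the bases that precede the first failing window, are assumptions and are not details taken from the program.\par

```python
def good_unused_length(unused, window_size, max_dashes):
    """Return how many leading bases of unused are treated as good data.

    unused is the unused sequence read outwards from the end of the
    used part of a reading. The scan stops at the first window holding
    more than max_dashes dashes.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if len(unused) < window_size:
        # Too short for a full window: judge the whole fragment at once.
        return len(unused) if unused.count("-") <= max_dashes else 0
    dashes = unused[:window_size].count("-")
    start = 0
    while True:
        if dashes > max_dashes:
            # Assumption: keep only the bases before the failing window.
            return start
        if start + window_size >= len(unused):
            # The last window passed, so all the unused data is kept.
            return len(unused)
        # Slide the window one base outwards, updating the dash count.
        entering = unused[start + window_size] == "-"
        leaving = unused[start] == "-"
        dashes += int(entering) - int(leaving)
        start += 1
```

For example, under these assumptions good_unused_length("acgtacgt--a-c--", 5, 2) keeps the first 7 bases, because the window starting at the eighth base is the first to contain three dashes.\par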
\pard \s4\qj\sa120\sl280 All numbering is relative to base number one in the contig\: matches off the left end of the contig (i.e. in the unused data) have negative positions, and matches off the right end of the contig (i.e. in the unused data) have positions greater than the contig length. The convention for reporting the orientations of overlaps is as follows\: if neither contig needs to be complemented the positions are as shown. If the program says "contig x in the - sense" then the positions shown assume contig x has been complemented. For example, in the results given in figure 4.2 the positions for the first overlap are as reported, but those for the second assume that the contig in the minus sense (i.e. 443) has been complemented.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find internal joins".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Minimum initial match". Only matches containing this number of consecutive identical characters will be found.\par
3.\tab Define "Maximum pads per sequence". Only alignments containing no more than this number of padding characters in each sequence will be found.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Maximum percent mismatch after alignment". Only alignments with at least this level of similarity will be found. Particularly when poor data from the 3' ends of sequences derived from fluorescent sequencing machines is used, it is important to allow for a high degree of mismatch - say around 75%.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Probe length". This is the length of sequence taken from each end of each contig and compared with the full length of all the other contigs.\par
6.\tab Accept "Employ unused data". This means that, where available, the unused data from the 3' ends of sequences is added to the ends of the contigs.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab
Define "Window size for good data scan". To decide how much of the unused data should be added to the end of a contig, the program scans outwards, counting the number of dashes (-) in a window of the size defined here.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Number of dashes in scan window". If the program finds this many dashes in the scan window it will add no more of the unused data to the end of the contig.\par
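\pard \s4\qj\sa120\sl280 A common way to locate runs of consecutive identical characters, such as those required by "Minimum initial match", is to index every word of that length in one sequence and then look up the words of the other. The Python sketch below shows this idea. The method actually used by the program is not described here, and the name initial_matches is hypothetical.\par

```python
def initial_matches(probe, target, min_match):
    """Return (probe_pos, target_pos) pairs of exact shared words.

    Positions are counted from 0. Each pair marks the start of a run of
    min_match identical characters present in both sequences.
    """
    if min_match <= 0:
        raise ValueError("min_match must be positive")
    # Index every word of length min_match in the probe.
    index = dict()
    for p in range(len(probe) - min_match + 1):
        index.setdefault(probe[p:p + min_match], []).append(p)
    # Look up every word of the target in that index.
    hits = []
    for t in range(len(target) - min_match + 1):
        for p in index.get(target[t:t + min_match], []):
            hits.append((p, t))
    return hits
```

In a scheme of this kind each hit would only be a starting point. The alignment around it would still have to satisfy the limits on padding characters and percentage mismatch before being displayed.\par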
\pard\plain \qj\li680\ri780\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Possible join between contig 445 in the + sense and contig 405\par
\pard \li680\ri780\sl220\box\brsp100\brdrth Percentage mismatch after alignment = 4.9\par
412 422 432 442 452 462\par
405 TTTCCCGACT GGAAAGCGGG CAGTGAGCGC AACGCAATTA ATGTGAG,TT AGCTCACTCA\par
********* * ******** ***** *** ********** ********** **********\par
445 -TTCCCGACT G,AAAGCGGG TAGTGA,CGC AACGCAATTA ATGTGAG-TT AGCTCACTCA\par
-127 -117 -107 -97 -87 -77\par
472 482 492 502 512\par
405 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT\par
********** ********** ********** ********** **\par
445 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT\par
-67 -57 -47 -37 -27\par
Possible join between contig 443 in the - sense and contig 423\par
Percentage mismatch after alignment = 10.4\par
64 74 84 94 104 114\par
423 ATCGAAGAAA GAAAAGGAGG AGAAGATGAT TTTAAAAATG AAACG-CGAT GTCAGATGGG\par
**** ***** ********** ********** ****** ** ***** **** *********\par
443 ATCG,AGAAA GAAAAGGAGG AGAAGATGAT TTTAAA,,TG AAACGACGAT GTCAGATGG,\par
3610 3620 3630 3640 3650 3660\par
124 134 144 154 164\par
423 TTG-ATGAAG TAGAAGTAGG AG-AGGTGGA AGAGAAGAGA GTGGGA\par
*** ****** ********** ** ******* *** ***** ** **\par
443 TTGGATGAAG TAGAAGTAGG AGGAGGTGGA ,GAG,AGAGA GTTGG-\par
\pard \li680\ri780\sl220\keepn\box\brsp100\brdrth 3670 3680 3690 3700 3710\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.2\tab Typical output from "Find internal joins".\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Editing in XBAP\par
\pard\plain \s4\qj\sa120\sl280 \f20 The XBAP editor is mouse-driven and can insert, delete and change readings in contigs. It has facilities to display the traces for data from fluorescent sequenci
ng machines and for annotation of readings. In addition it allows the poor quality data from the ends of readings to be viewed and, if required, added to the sequences. \par
\pard \s4\qj\sa120\sl280
A typical view of the editor is shown in figure 4.3. This includes the edit window showing an 80 character section of a contig (positions 3899 to 3978). Each reading is numbered and named in the left hand panel, with minus signs indicating those in their reverse orientation. Underneath is their consensus. Some of the sequence letters are lighter than the majority, showing that they are "unused". One segment (3933 to 3949) is shaded, which signifies that it has been annotated. The editing cursor is at position 3921. Above this window are the main buttons the user employs to direct the editing process. Below the edit window is a panel showing the traces for readings 37 and 123. Notice they are centred on the cursor position. Here the traces are shown in four different line styles, but on a colour screen they each have different colours. At the bottom of the figure is the search window. These features are described in the relevant sections below.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.1\tab Scrolling through the contig\par
\pard\plain \s4\qj\sa120\sl280 \f20 The editor allows scrolling from one end of a contig to the other using the scroll bar, the scroll buttons or the arrow keys.\par
\pard \s4\qj\sa120\sl280 Action of mouse button presses when the mouse pointer is in the scroll bar\:\par
\pard \s4\qj\li1720\sa120\sl280\tx4520 Middle Mouse Button\tab Set editor position\par
Left Mouse Button\tab Scroll forward one screenful\par
Right Mouse Button\tab Scroll backwards one screenful\par
\pard\plain \li80\ri20\sl220\keepn\box\brsp20\brdrth \f4\fs16 {{\pict\macpict\picw444\pich344
82daffffffff015701bb1101a0008201000affffffff015701bb0900000000000000003100000000015601ba98007e00000000030703e900000000030703e900000000015601ba000102830002830002830007000286aa01a00007000186550140000700028600012000070001860001400007000286000120000b02013ff8
8a00030ffe40000d0402200807c18c0003089220000f06012c28040110808e0003089240000f06022648040100808e0003089220001007012348040f31e3968f00030f924000100702220807911084598f000308122000100701258804111084508f00030812400010070224c804111084508f00030ff22000100701286804
111094508f000308024000100702200807cf3863908f0003080220000b02013ff88a00030ffe4000070002860001200007000186000140000700028600012000070001865501400007000286aa01a00002830002830002830002830026e500001ff9ff0087f8ff01f87ff5ff01fe1fefff0087fcff01fe1ff2ff01f87ff2ff
00e0fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd00380200003cfa
000203fc03fa0008630c1800018180001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd006502000066fa0002030003fa000ac30c38000380c0001f807ffbff05841f8000003cfd0002087f87fbff07e01fe7fffffe1f81fcff03f0ffff87fdff0dc3f0ffff84186000060002106180fd
00051f800000600ffe0002084180fb00021fe018fc000020fd006b020000c3fe0008c01800000300030603fd000a01830c7800078060001ff3faff058418c000000cfd0002087f33fbff07e7ffe7fffffe1f9cfcff03fcffff33fdff0d99e67fff84186000060002107180fd000518c000006003fe0002084180fb00021800
18fc000020fd0072020000c0fe0008c01800000300030603fd000a01830cd8000d8060001ff3fcff07f9ff84186000000cfd0002087e79fbff08e7ffe7cfe7fe1f9e7ffdff1ffcfffe79fff9ffff99e67fff8418600006000210718000006000186000006003fe0002084180fb00041800183018fe000020fd0072020000c0
fe0000c0fe00040300030003fd000a0301989800098030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08e7ffe7cfe7fe1f9e7ffdff1ffcfffe7ffff9ffff9fe7ffff8418600006000210798000006000186000006003fe0002084180fb00041800183018fe000020fd00731d0000c00f0dc3f0781f4003003b1e0f
c0f0de000301981800018030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08e7ffe7ffe7fe1f9e7ffdff1ffcfffe7ffff9ffff9fe7ffff8418600006000210798000006000186000006003fe0002084180fb00041800180018fe000020fd007d790000c0198e60c01831c003f06706030198730003019818000180
30001ff3e47c0f8790e07f841861e1b80c0fc1f078087f3f9e647e1e43ffe7fe270f81fe1f9e7879e787c0fcfffe7f9e607e1f9fe7f03f841866e0761e02106d878619f8001866f0786e0301e16c0841801e0fc61878001801d8f07e0786f020fd007d790000c030cc30c01831800300c30603030c60000300f01800018030
001ff3e339e733c679ff8418c331cc0c186318cc087f879e633ccf19ffe07cc7cfe7fe1f9cf339e7339e7cfffe7f9e79fccf9fe7e79f84186730ce3302106d8cc330600018c398cc73030331fe08418033186618cc001f833830180cc39820fd007d790000c030cc30c01831800300c30603030c60000300f0180001803000
1ff3e799fe79cff9ff841f8619860c00660186087ff39e6799e73fffe7f9e7cfe7fe1f81e79cce79fe7cfffe7f9e79f9e60781e7ff8418661986618210679861e060001f83018661830619b6084180618063318600180618301818630020fd007d790000c030cc30c01831800300c30603030c60000180f01800018060001f
f3e79c0e01cff9ff841987f9860c0fe601fe087ff99e6798073fffe7f9e7cfe7fe1f99e01cce01c07cfffe7f9e79f9e79fe7f03f8418661986618210679fe0c0600018030186618307f9b60841807f8fe331fe00180618301818630020fd007d790000c330cc30c0181f000300c30603030c60000180601800018060001ff3
e79fe67fcff9ff8418c601860c18660180087ff99e6799ff3fffe7f9e7cfe7fe1f9ce7fe1e7f9e7cfffe7f9e79f9e79fe7ff9f8418661986618210639800c060001803018661830601b6084180601861e18000180618301818630020fd007d79000066198c30cc18300003006706033198600000c06018070180c0001ff3e7
9fe67fcff9ff8418c601860c18660180087e799e6799ff3fffe7f9e7cfe7fe1f9ce7fe1e7f9e7cfffe799e79f9e79fe7ff9f8418661986618210639801e060001803018661830601b6084180601861e18000180618301818630020fd007d7900003c0f0c3078ff1f8003fc3b3fc1e0f06000006060ff070ff180001ff3e799
e739cff99f84186319cc0c186318c6087f33cc633ce73fffe7fcc7cfe67e1f9e739f3f399e7cffff33cc799ccf9fe7e79f840cc618ce330210618c63306600180300cc73030319b6084180319860c0c60018033830198cc30020fd0068f9000130c0ef005d1f80679c0f83cffc3f841861f1b87f8fa1f07c087f87e2647e0f
3fffe01e2601f0fe1f9e783f3f83c1601fff87e27c3e1f9fe7f03f84078618761e02106187c6183c00180300786e1fe1f1b60841fe1f0fa0c07c001fe1d9fe0f07830020fd0032f9000130c0ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc00010210f200010840f2000020fd0032f9000130c0
ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc00010210f200010840f2000020fd0032f900011f80ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc00010210f200010840f2000020fd002de500001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc
00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2
000020fd0026e500001ff9ff0087f8ff01f87ff5ff01fe1fefff0087fcff01fe1ff2ff01f87ff2ff00e0fd000283000283000283000283000283000283000283000283000901001f88ff00feff001a010010fc000006fe00010180fe000060fc00000c9d000002ff001f010010fc000006fe00010180fe000060fc00000cc2
000102a8de000002ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de000002ff002316001000004010000600004001800200006000100400000cc200010554de000002ff00231600100000c03000060000c001800300
006000180600000cc2000102a8de000002ff0023160010000180600006000180018001800060000c0300000cc200010554de000002ff0023160010000300c00006000300018000c0006000060180000cc2000102a8de000002ff0023160010000601800006000600018000600060000300c0000cc200010554de000002ff00
23160010000c03000006000c0001800030006000018060000cc2000102a8de000002ff00231600100018060000060018000180001800600000c030000cc200010554de000002ff0023160010000c03000006000c0001800030006000018060000cc2000102a8de000002ff0023160010000601800006000600018000600060
000300c0000cc200010554de000002ff0023160010000300c00006000300018000c0006000060180000cc2000102a8de000002ff0023160010000180600006000180018001800060000c0300000cc200010554de000002ff00231600100000c03000060000c001800300006000180600000cc2000102a8de000002ff002316
001000004010000600004001800200006000100400000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de000002ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de0000
02ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de000002ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001a010010fc000006fe00010180fe000060fc00000c9d000002ff0009
01001f88ff00feff000901001f88ff00feff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff004a010010ed00030c0300c0fa00040781e0300cf90004781e
0780c0fa00040781e0780cf90004781e0040c0fa00040781e1fe0cf90004781e0780c0fa00040781e1fe0cf90002781e02ff004a010010ed00030c0781e0fa00040cc330701ef90004cc330cc1e0fa00040cc330cc1ef90004cc3300c1e0fa00040cc331801ef90004cc330cc1e0fa00040cc330061ef90002cc3302ff004e
010010ed00030c0cc330fa0004186618f033fa0005018661986330fa00041866198633fa000501866181c330fa00041866198033fa0005018661986330fa00041866180633fa000301866182ff004e010010ed00030c0cc330fa0004186619b033fa0005018661986330fa00041866198633fa000501866183c330fa000418
66198033fa0005018661980330fa00041866180c33fa000301866182ff004a010010ed00030c186618f900046619306180fa00040661806618f900046618066180fa0004066186c618f900046619806180fa00040661980618f9000466180c6180fa0002066182ff004a010010ed00030c186618f90004c618306180fa0004
0c61806618f90004c6180c6180fa00040c618cc618f90004c619b86180fa00040c619b8618f90004c618186180fa00020c6182ff004e010010ed00030c186618fa0005038338306180fa0004383380c618fa0005038338386180fa0004383398c618fa0005038339cc6180fa000438339cc618fa0005038338186180fa0002
383382ff004a010010ed00030c186618f90004c1d8306180fa00040c1d838618f90004c1d80c6180fa00040c1d98c618f90004c1d8066180fa00040c1d986618f90004c1d8306180fa00020c1d82ff004a010010ed00030c186618f900046018306180fa00040601860618f900046018066180fa000406019fe618f9000460
18066180fa00040601986618f900046018306180fa0002060182ff004e010010ed00030c0cc330fa00041860183033fa00050186018c0330fa00041860198633fa000501860180c330fa00041860180633fa0005018601986330fa00041860186033fa000301860182ff004e010010ed00030c0cc330fa00041866183033fa
0005018661980330fa00041866198633fa000501866180c330fa00041866198633fa0005018661986330fa00041866186033fa000301866182ff004a010010ed00030c0781e0fa00040cc330301ef90004cc331801e0fa00040cc330cc1ef90004cc3300c1e0fa00040cc330cc1ef90004cc330cc1e0fa00040cc330c01ef9
0002cc3302ff004a010010ed00030c0300c0fa00040781e1fe0cf90004781e1fe0c0fa00040781e0780cf90004781e00c0c0fa00040781e0780cf90004781e0780c0fa00040781e0c00cf90002781e02ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d0100
10ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff0076010010fc00041e0003c078f700650c0781e0301e0307f9fe0c0780c0300c0787f9fe0c0781e1fe1e1fe1e0000c0300c0781e0787f8781e0300c1fe1e0300c0781e0780c0301e1fe7f8
301e0300c0787f8781e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff0076010010fc0004330000c0ccf700650c0cc33078330780c0301e0cc1e0781e0cc0c0301e0cc3303033030330001e0781e0cc330cc0c0cc330781e030330781e0cc330cc1e078330300c078330781e0cc0c0c
c3303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff0076010010fc0004618000c186f700650c186618cc618cc0c03033186330cc331860c03033186618306183061800330cc33186619860c186618cc33030618cc3318661986330cc618300c0cc618cc331860c18661830618cc618cc
33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff0076010010fc0004618000c186f700650c180600cc600cc0c03033180330cc331800c030331806003060030600cc330cc33180601800c180600cc33030600cc3318060180330cc600300c0cc600cc331800c18060030600cc600cc33030600cc330303
30300c0cc0c180330cc6018033180600cc3302ff0076010010fc0004018000c006f700650c18060186601860c0306198061986619800c030619806003060030600cc6198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c18061
9866018061980601866182ff007b010010fc0009018000c006000fc1e076fc00650c18060186601860c0306198061986619800c030619806003060030600786198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c18061986601
8061980601866182ff007d010010fe000b01fe030000c00c00186330cefc00650c18067986679860c03061980619866199e0c030619806003067830601fe61986619806799e0c19e6018661830679866199e6018061986678300c18667986619800c180678306018667986618306798661830618300c1860c1806198667980
6199e601866182ff007b010010fc00090e0000c0380018061986fc00650c180619fe619fe0c0307f9807f9fe7f9860c0307f9806003061830600787f9fe7f980619860c186601fe7f830619fe7f986601807f9fe618300c1fe619fe7f9800c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f9866
01fe7f82ff007b010010fc0009180000c060000fc7f986fc00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c1806198661980619866018661
82ff007b010010fc0009300000c0c00000660186fc00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601866182ff00
7b010010fc0009600000c1800000660186fc00650c18661986619860c0306198661986619860c030619866183061830618006198661986619860c186619866183061986619866198661986618300c18661986619860c186618306198661986618306198661830618300c1860c186619866198661986619866182ff007b0100
10fc0009600000c1800e186318cefc00650c0cc33986339860c030618cc61986618ce0c030618cc33030338303300061986618cc338ce0c0ce331866183033986618ce330cc61986338300c18633986618cc0c0cc338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff007b010010fc00
097f8007f9fe0e0fc1f076fc00650c0781e9861e9860c03061878619866187a0c030618781e0301e8301e00061986618781e87a0c07a1e186618301e9866187a1e078619861e8300c1861e986618780c0781e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff0011010010f3000006fc
00000c9d000002ff0011010010f3000006fc00000c9d000002ff0011010010f3000006fc00000c9d000002ff0011010010f3000006fc00000c9d000002ff000d010010ed00000c9d000002ff0013010010ed00000cd7000002ecaadd000002ff0013010010ed00000cd7000001ec55dd000002ff007e010010fd000c787f80
0000041e0001e0000003fe00650c0781e0301e0307f9fe0c0780c0300c0787f9fe0c0781e1fe1e1fe1e0000c0300c0781e0787f8781e0300c3febeabaaeafabeafaaeababebfeffababeabaaeafa7f8781e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff007e010010fd000ccc0180
00000c33000330000007fe00650c0cc33078330780c0301e0cc1e0781e0cc0c0301e0cc3303033030330001e0781e0cc330cc0c0cc330781e1757757d5f5dd775dd5f57d775755d57d7757d5f5dd0c0cc3303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff007f010010fe000d018601
8000001c6180061800000ffe00650c186618cc618cc0c03033186330cc331860c03033186618306183061800330cc33186619860c186618cc332baebaeebbbaeebbaebbaeeebabaaeaeeebaeebbbae0c18661830618cc618cc33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff007f010010fe00020186
03fe00073c6000061800001bfe00650c180600cc600cc0c03033180330cc331800c030331806003060030600cc330cc33180601800c180600cc33175755dd775d5755d5775dd755755d5dd755dd775d50c18060030600cc600cc33030600cc33030330300c0cc0c180330cc6018033180600cc3302ff007e010010fd000106
03fe00076c60000618000013fe00650c18060186601860c0306198061986619800c030619806003060030600cc6198661980601800c1806018661abaeabaeebbaaeabaaebbaeeaabaaebaeeabaeebbaa0c180600306018660186618306018661830618300c1860c180619866018061980601866182ff007e010010fd000c0c
060003f0cc6e07c618003f03fe00650c18060186601860c0306198061986619800c030619806003060030600786198661980601800c1806018661975755d775dd5755d575dd7755755d5d7755d775dd50c180600306018660186618306018661830618300c1860c180619866018061980601866182ff007f120010000007f8
38060006198c730c6338006183fe00650c18067986679860c03061980619866199e0c030619806003067830601fe61986619806799e0c19e6018661abaefbaeebbbeeabaaebbaeefabaaebaeefbaeebbaa0c180678306018667986618306798661830618300c1860c18061986679806199e601866182ff007e010010fd000c
0c0c0000198c619801d8006003fe00650c180619fe619fe0c0307f9807f9fe7f9860c0307f9806003061830600787f9fe7f980619860c186601fe7f97575dff7fdd7755d57fdff75d755d5ff75dff7fdd50c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe7f82ff007e010010fd000c
060c0003f9fe61980018003f03fe00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c1866018661abaebbaeebbaeeabaaebbaeebabaaebaeebbaeebbaa0c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007f010010fe000d
0186180006180c61980018000183fe00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c186601866197575dd775dd7755d575dd775d755d5d775dd775dd50c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007f010010fe00
0d0186180006180c61980618000183fe00650c18661986619860c0306198661986619860c030619866183061830618006198661986619860c1866198661abaebbaeebbaeebbaeebbaeebabaaebaeebbaeebbae0c186618306198661986618306198661830618300c1860c186619866198661986619866182ff007e010010fd
000ccc300006180c330c6330386183fe00650c0cc33986339860c030618cc61986618ce0c030618cc33030338303300061986618cc338ce0c0ce331866197577dd775ddf775dd75dd777d755d5d777dd775ddd0c0cc338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff007e010010fd
007578300003e80c1e07c1e0383f1fe000000c0781e9861e9860c03061878619866187a0c030618781e0301e8301e00061986618781e87a0c07a1e18661ababebaeebafabeafaebbaebeabaaebaebebaeebafa0c0781e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff0013010010ed
00000cd7000001ec55dd000002ff0013010010ed00000cd7000002ecaadd000002ff0013010010ed00000cd7000001ec55dd000002ff0013010010ed00000cd7000002ecaadd000002ff0013010010ed00000cd7000001ec55dd000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff007f010010
fe000dc0781e000001fe0c000010000003fe00650c02808028080aa2a8200a02000020080282a8aa080280a0aa001fe1e0300c0300c0781e0787f8781e0300c1fe1e0300c0781e0780c0301e1fe7f8301e0300c0787f8781e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff007f1200
10000001c0cc33000001801c000030000007fe00650c04414044140100405011050000501404404010140441101000030330781e0781e0cc330cc0c0cc330781e030330781e0cc330cc1e078330300c078330781e0cc0c0cc3303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff007f12
0010000003c18661800001803c00007000000ffe00650c08222082220200808820888000882208208020220822082000030618cc330cc33186619860c186618cc33030618cc3318661986330cc618300c0cc618cc331860c18661830618cc618cc33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff007f
120010000006c18661800001806c0000f000001bfe00650c10011100110100404440044000441110004010111004001000030600cc330cc33180601800c180600cc33030600cc3318060180330cc600300c0cc600cc331800c18060030600cc600cc33030600cc33030330300c0cc0c180330cc6018033180600cc3302ff00
7f120010000004c00601800001804c0001b0000013fe00650c08020880208200808220082000822088008020208802002000030601866198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c180619866018061980601866182ff
007f010010fe000dc006030003f1b80c07c330003f03fe00650c10041100410100410440104001044110004010411004001000030601866198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c180619866018061980601866182
ff007f120010001fe0c00c0e000619cc0c0c6630006183fe00650c08a2088a2082008082200822a8822088a0802020880200202a8306018661986619806799e0c19e6018661830679866199e6018061986678300c18667986619800c180678306018667986618306798661830618300c1860c18061986679806199e6018661
82ff007f010010fe000dc03803000018060c180630006003fe00650c10455104550100415440154001545510404010551004001000030601fe7f9fe7f980619860c186601fe7f830619fe7f986601807f9fe618300c1fe619fe7f9800c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe
7f82ff007f010010fe000dc060018003f8060c1807f8003f03fe00650c08220882208200808220082000822088208020208802002000030601866198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601
866182ff007f010010fe000dc0c061800618060c180030000183fe00650c10441104410100410440104001044110404010411004001000030601866198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c1806198661980619866
01866182ff007f010010fe000dc18061800619860c180030000183fe00650c08220882208200808220882000822088208020208822082000030619866198661986619860c186619866183061986619866198661986618300c18661986619860c186618306198661986618306198661830618300c1860c18661986619866198
6619866182ff007f010010fe000dc18033000618cc0c0c6030386183fe00650c044410444101004104111040010441044040104104411010000303318661986618cc338ce0c0ce331866183033986618ce330cc61986338300c18633986618cc0c0cc338303318633986618303398661830618300c1860c0cc61986338cc61
8ce331866182ff007f7b0010000007f9fe1e0003e8787f87c030383f1fe000000c02a2082a20820080820a082000822082a08020208280a020000301e18661986618781e87a0c07a1e186618301e9866187a1e078619861e8300c1861e986618780c0781e8301e1861e986618301e98661830618300c1860c078619861e878
6187a1e1866182ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff0040010010fe000bc0307f800001fe0c0000c030fe
0002c0000cbf002201e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff0040160010000001c07801800001801c0001c070000001c0000cbf002203303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff0040160010000003c0cc01800001803c0003
c0f0000003c0000cbf0022061830618cc618cc33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff0040160010000006c0cc03000001806c0006c1b0000006c0000cbf0022060030600cc600cc33030600cc33030330300c0cc0c180330cc6018033180600cc3302ff0040160010000004c1860300000180
4c0004c130000004c0000cbf00220600306018660186618306018661830618300c1860c180619866018061980601866182ff0040010010fe0011c186060003f1b80c0fc0c030000fc0c0000cbf00220600306018660186618306018661830618300c1860c180619866018061980601866182ff0040010010fe0011c1860600
0619cc0c1860c030001860c0000cbf00220678306018667986618306798661830618300c1860c18061986679806199e601866182ff0040010010fe0011c1860c000018060c0060c030001800c0000cbf0022061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe7f82ff0040010010fe0011c1
860c0003f8060c0fe0c030000fc0c0000cbf00220618306018661986618306198661830618300c1860c180619866198061986601866182ff0040010010fe0011c0cc18000618060c1860c030000060c0000cbf00220618306018661986618306198661830618300c1860c180619866198061986601866182ff0040010010fe
0011c0cc18000619860c1860c030000060c0000cbf00220618306198661986618306198661830618300c1860c186619866198661986619866182ff0040010010fe0011c07830000618cc0c1860c0300e1860c0000cbf00220338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff004016
0010000007f830300003e8787f8fa7f9fe0e0fc7f8000cbf002201e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c
9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff007e010010fd0001781efe0007041e03c1e0000003fe00650c0200a0aa2a8200a0282a8282a82808020080280a0282a8280a020080aa0a020080280a028080200a0aa2a8200a020080282a8280a0aa0a0200a020080aa0a020080002a820
2a8aa080aa0a020080280a0200a028080200a028000280a0280a0202a8aa0a02ff007e010010fd0001cc33fe00070c33066330000007fe00650c050110100405011044040440404414050140441104404044110501401011050140441104414050110100405011050140440404411010110501105014010110501400004050
0401014010110501404411050110441405011044000441104411050040101102ff007f010010fe000d0186618000001c6186661800000ffe001b0c088208200808820882080820808222088220822088208082208882fe20298882208220882220882082008088208882208208082208202088820888220202088822000080
88080202fe20198882208220888208822208820882000822088220888080202082ff007f010010fe0002018060fe00073c6006061800001bfe00650c04440010040444010004100041001104411100401000410040044110104004411100401001104440010040444004411100041004001040044400441101040044110000
40440401011010400441110040044401001104440100001004010040044040104002ff007f010010fe0002018060fe00076c60060018000013fe00650c082200200808220080080800808020882208802008008080200822082020082208802008020882200200808220082208800808020020200822008220820200822080
0080820802020820200822088020082200802088220080000802008020082080202002ff007f010010fe000d01b86e0003f0cc6e060030003f03fe00650c1044001004104401000410004100411044110040100041004010441010401044110040100411044001004104401044110004100400104010440104410104010441
000041040401041010401044110040104401004110440100001004010040104040104002ff007f120010000007f9cc730006198c730600e0006183fe00650c0822282008082200800808a0808020882208802288a0808a20082208202288220880200802088222820080822288220880080802282020082228822082022882
208aa080820802020820200822088a2008222880208822008a2a8802288a20082080202282ff007f010010fe000d0186618000198c619f8030006003fe00650c154410100415440100041040410055154551004110404104401545501041154551004010055154410100415441154551000410041010401544115455010411
5455000041540401055010401545510440154411005515440104001004110440154040104102ff007f010010fe000d0186618003f9fe61860018003f03fe00650c0822082008082200800808208080208822088020882080822008220820208822088020080208822082008082208822088008080208202008220882208202
088220800080820802020820200822088220082208802088220082000802088220082080202082ff007f010010fe000d0186618006180c61860618000183fe00650c10441010041044010004104041004110441100411040410440104410104110441100401004110441010041044110441100041004101040104411044101
04110441000041040401041010401044110440104411004110440104001004110440104040104102ff007f010010fe000d0186618006180c61860618000183fe00650c082208200808220882080820808220882208822088208082208822082020882208822088220882208200808220882208820808220820208822088220
8202088220800080820802020820208822088220882208822088220882000822088220882080202082ff007e010010fd000ccc330006180c33060330386183fe00650c104110100410411044040440404441104410441104404044111044101011104410441104441104110100410411104410440404411010111041110441
0101110441000041040401041010111044104411104110444110411044000441104411104040101102ff007e010010fd0075781e0003e80c1e0601e0383f1fe000000c0820a820080820a0280802a0802820882208280a82a0802a0a082208200a882208280a028208820a820080820a88220828080280a8200a0820a88220
8200a882208000808208020208200a0822082a0a0820a828208820a02a000280a82a0a082080200a82ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff0012010010ed00000ce6000103ffba
000002ff0012010010ed00000ce6000103ffba000002ff007b010010fa007201e078618787f9861e1861e0000c0781e0301e0307f9fe0c0780c0300c0787f9fe0c0781e1fe1e1fe1e3ff0c0300c0781e0787f8781e0300c1fe1e0300c0781e0780c0301e1fe7f8301e0300c0787f8781e1fe1e0301e0300c1fe1e0300c1fe0
c1fe7f8307f8780c0301e0780c0781e0300c02ff007b010010fa00720330cc718cc601c633186330000c0cc33078330780c0301e0cc1e0781e0cc0c0301e0cc3303033030333ff1e0781e0cc330cc0c0cc330781e030330781e0cc330cc1e078330300c078330781e0cc0c0cc3303033078330781e030330781e0301e0300c
0780c0cc1e078330cc1e0cc330781e02ff007b010010fa007206198671986601c661986618000c186618cc618cc0c03033186330cc331860c03033186618306183061bff330cc33186619860c186618cc33030618cc3318661986330cc618300c0cc618cc331860c18661830618cc618cc33030618cc33030330300c0cc0c1
86330cc6198633186618cc3302ff007b010010fa007206018679980601e660186600000c180600cc600cc0c03033180330cc331800c030331806003060030603ff330cc33180601800c180600cc33030600cc3318060180330cc600300c0cc600cc331800c18060030600cc600cc33030600cc33030330300c0cc0c180330c
c6018033180600cc3302ff007b010010fa007206018679980601e660186600000c18060186601860c0306198061986619800c030619806003060030603ff6198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c1806198660180
61980601866182ff007b010010fa00720601866d8c0601b630186300000c18060186601860c0306198061986619800c030619806003060030603ff6198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c1806198660180619806
01866182ff007b010010fa00720601866d8787e1b61e1861e0000c18067986679860c03061980619866199e0c0306198060030678306020161986619806799e0c19e6018661830679866199e6018061986678300c18667986619800c180678306018667986618306798661830618300c1860c18061986679806199e6018661
82ff007b010010fa00720601866780c6019e03186030000c180619fe619fe0c0307f9807f9fe7f9860c0307f9806003061830603ff7f9fe7f980619860c186601fe7f830619fe7f986601807f9fe618300c1fe619fe7f9800c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe7f82ff00
7b010010fa0072060186678066019e01986018000c18061986619860c0306198061986619860c030619806003061830603ff6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007b0100
10fa0072060186638066018e01986018000c18061986619860c0306198061986619860c030619806003061830603ff6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007b010010fa00
72061986639866018e61986618000c18661986619860c0306198661986619860c03061986618306183061bff6198661986619860c186619866183061986619866198661986618300c18661986619860c186618306198661986618306198661830618300c1860c186619866198661986619866182ff007b010010fa00720330
cc618cc60186330cc330000c0cc33986339860c030618cc61986618ce0c030618cc3303033830333ff61986618cc338ce0c0ce331866183033986618ce330cc61986338300c18633986618cc0c0cc338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff007b010010fa007201e0786187
87f9861e0781e0000c0781e9861e9860c03061878619866187a0c030618781e0301e8301e3ff61986618781e87a0c07a1e186618301e9866187a1e078619861e8300c1861e986618780c0781e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff0012010010ed00000ce6000103ffba00
0002ff0012010010ed00000ce6000103ffba000002ff0012010010ed00000ce6000103ffba000002ff0012010010ed00000ce6000103ffba000002ff0012010010ed00000ce6000103ffba000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed
00000c9d000002ff000901001f88ff00feff0006fe008955fe0006fe0089aafe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000dfe000187ff8e000201ffc2fe000ffe0003440100f8900002011241fe000ffe000385850020900002011242fe000ffe000344c90020900002011241fe
0013fe000784690022c71c71c094000201f242fe0013fe00074441002320a28a20940002010241fe0013fe000784b1002207a0f980940002010242fe0013fe00074499002208a0804094000201fe41fe0013fe0007850d002208a28a20940002010042fe0013fe000744010022079c71c0940002010041fe000dfe000187ff
8e000201ffc2fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe0006fe0089aafe0006fe008955fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe0012fe000083fdff00f0fdff00fc95000002fe0013fe000042fd000110
80fe00000495000001fe0013fe000082fd00011080fe00000495000002fe0013fe000042fd00011080fe00000495000001fe0013fe000082fd00011080fe00000495000002fe0014fe000a420000180010860301800495000001fe0014fe000a82000018c010860301800495000002fe0014fe000042fe0006c01086000180
0495000001fe0014fe000a820fb338f01087c70f9e0495000002fe0014fe000a4219b318c010866319b30495000001fe0014fe000a8219b318c010866319bf0495000002fe0014fe000a4219b318c010866319b00495000001fe0014fe000a820fb318cc10866319b30495000002fe0014fe000a42019f7e7810866fcf9e04
95000001fe0014fe000682018000001080fe00000495000002fe0014fe000642018000001080fe00000495000001fe0013fe000082fd00011080fe00000495000002fe0013fe000042fd00011080fe00000495000001fe0012fe000083fdff00f0fdff00fc95000002fe000afe0000408b000001fe000afe0000808b000002
fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000efe000080f10000019cff0082fe000efe000040f10000019c000081fe0014fe000080f1000001d000fdaa00a8d20000
82fe0016fe000040f1000001d1000001fd550050d2000081fe0014fe000080f1000001d000fdaa00a8d2000082fe0016fe000040f1000001d1000001fd550050d2000081fe0014fe000080f1000001d000fdaa00a8d2000082fe0016fe000040f1000001d1000001fd550050d2000081fe0021fe000080fe0009079f80000c
f003c0000cfe000001d000fdaa00a8d2000082fe0023fe000040fe00090cc180001d980660001cfe000001d1000001fd550050d2000081fe0020fe000080fd0008c180003d800660002cfe000001d000fdaa00a8d2000082fe0022fe000040fd0008c3003c6d81e6600f0cfe000001d1000001fd550050d2000081fe0021fe
000d80000007e3830006cdf333e0198cfe000001d000fdaa00a8d2000082fe0023fe000d40000007e0c6003ecd9b00600c0cfe000001d1000001fd550050d2000081fe001afe000080fd0008c60066fd9b0060030cfe0000019c000082fe001bfe000040fe00090ccc00660d9b3663198cfe0000019cff0081fe001bfe0000
80fe0009078c003e0cf1e3c78f3ffe0000019cff0082fe0012fe000040f7000003fc0000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0070fe000080f1000301f00002fe0006e0000001000002fe00
1e8000003800003e0001f00000010000e000000e000007c00007000001f0001cfe000610000001000002fe00247000000e00000380007c00007000001c000004000020000003e00007000004000020000007fe00011c82fe0070fe000040f1002f014000050000011000000280000500000140000044000008000040000002
800110000011000001000008800000400022fe000628000002800005fe00158800001100000440001000008800002200000a000050fe001080000880000a0000500000088000002281fe0070fe000080f1000601400008800001fe001504400008800002200000400000080000400000044001fe000610000001000008fe00
02400020fe001f4400000440000880000080000010000004000010000080000020000011000088fe000b800008000011000088000008fe00012082fe0075fe0001401ffaff00f8fa000601400008800001fe001504400008800002200000400000080000400000044001fe000610000001000008fe0002400020fe001f4400
000440000880000080000010000004000010000080000020000011000088fe000b800008000011000088000008fe00012081fe0075fe00018010fa000008fa000601400008800001fe001504400008800002200000400000080000400000044001fe000610000001000008fe0002400020fe001f4400000440000880000080
000010000004000010000080000020000011000088fe000b800008000011000088000008fe00012082fe0075fe00014010fa000008fa00060140000f800001fe001507c0000f800003e000004c000008000040000007c001fe000c10000001000009800000400020fe001f7c000007c0000f80000080000013000004c00010
00009800002000001f0000f8fe001080000980001f0000f80000098000002081fe0075fe00018010fa000008fa000601400008800001fe001504400008800002200000440000080000400000044001fe000c10000001000008800000400020fe001f4400000440000880000080000011000004400010000088000020000011
000088fe00108000088000110000880000088000002082fe0075fe00014010fa000008fa002f014000088000011000000440000880000220000044000008000040000004400110000011000001000008800000400022fe001f4400000440000880000088000011000004400010000088000022000011000088fe0010800008
8000110000880000088000002281fe0079fe0005801078000380fe000008fa002901400008800000e0000004400008800002200000380000080000400000044000e000000e000001000007fe000240001cfe001f440000044000088000007000000e00000380001000007000001c000011000088fe000b8000070000110000
88000007fe00011c82fe0017fe00054010cc000180fe000008fa0000019c000081fe0017fe00058010c0000180fe000008fa0000019c000082fe0017fe00094010c0f1e18780337c08fa0000019c000081fe0017fe000980107998318cc0336608fa0000019cff0082fe0017fe000940100d81f18fc0336608fa0000019cff
0081fe0017fe000980100d83318c00336608fa0000019c000082fe0017fe00094010cd9b318cc0337c08fa0000019c000081fe0017fe0009801078f1f7e7801f6008fa0000019c000082fe0014fe00014010fb00016008fa0000019c000081fe0014fe00018010fb00016008fa0000019c000082fe0013fe00014010fa0000
08fa0000019c000081fe0013fe00018010fa000008fa0000019c000082fe0013fe0001401ffaff00f8fa0000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0013fe0001801ff8ff00e0fc0000019c00
0082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0019fe00074010780003800003fe000020fc0000019c000081fe0019fe00078010cc0001800003fe000020fc
0000019c000082fe0019fe00074010c00001800003fe000020fc0000019c000081fe0019fe000b8010c0f1e187801f3ccdf020fc0000019c000082fe0019fe000b40107998318cc03366cd9820fc0000019c000081fe0019fe000b80100d81f18fc03366cd9820fc0000019c000082fe0019fe000b40100d83318c003366fd
9820fc0000019c000081fe0019fe000b8010cd9b318cc03366fd9820fc0000019c000082fe0019fe000b401078f1f7e7801f3c499820fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00
014010f8000020fc0000019c000081fe0013fe0001801ff8ff00e0fc0000019c000082fe000efe000040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0012fe000080f1000001dc000006c2000082fe0018fe0001401ffdff0080f7000001dc00010180c3000081fe0013
fe00018010fd000080f70000019c000082fe0017fe00014010fd000080f7000001dc000040c2000081fe001bfe00018010fd000080f7000001dc000040e800000cdc000082fe001bfe00014010fd000080f7000001dc000040e7000080dd000081fe001dfe000680100000c00080f7000001dc00018020e8000080dd000082
fe001cfe000640100000c60080f7000001db000010e9000020dc000081fe001cfe000680100000060080f7000001db000010e9000040dc000082fe001cfe000640107d99c78080f7000001db000010e8000010dd000081fe0018fe00068010cd98c60080f7000001c1000010dd000082fe001cfe00064010cd98c60080f700
0001dd000002e7000080dc000081fe001cfe00068010cd98c60080f7000001dd000002e7000080dc000082fe001cfe000640107d98c66080f7000001dd000002e6000004dd000081fe001efe000680100cfbf3c080f7000001dd0002020004e8000004dd000082fe001cfe000240100cfe000080f7000001db000004ea0000
01db000081fe001cfe000280100cfe000080f7000001db000004ea000001db000082fe001bfe00014010fd000080f7000001db000004e8000001dd000081fe001ffe00018010fd000080f7000001c1000001fd000040f3000061f1000082fe0026fe0001401ffdff0080f7000001dd000008e8000002fb000030f300018080
f40002014081fe0022fe000080f1000001dd000008e8000002fb000008f40002010080f40002041082fe001afe000040f1000001dd000008e5000080ee000040f2000081fe001afe000080f1000001dd0002080001e7000040e00002100482fe001afe000040f1000001db000001ea000004fc000002e1000081fe001cfe00
0080f1000001db000001ea000004fc000002e30002400282fe001efe000040f1000001db000001e700042000000202f4000010f0000081fe001efe000080f1000001c000041000000401f40002100020f40002800282fe0024fe000040f1000001ee00004cf1000020e8000008fb000001f40002200020f2000081fe0028fe
000080f100010180ef000080f1000020e8000008fb000001f40002200010f5000301000182fe001cfe000040f100010140de000020e500010332ef000010f2000081fe0020fe000080f1000001ee000001f1000320000080e7000001e2000302000182fe0020fe000040f1000001ef0002020080ef000080eb000010fc0000
08e1000081fe0021fe000080f1000001ef000002ed000080eb000010fc000008e4000304000082fe0021fe000040f1000001e2000040fa000080e6000380100080f5000040f0000081fe002cfe000080f100010110ee000020f800010110f9000003e7000340100080f50002800008f5000304000082fe002bfe000040f100
010108f00002040020f2000040fd000020ed000020fa000080f50002800008f2000081fe0031fe000080f100010108f0000008f600010404fd000040fd000010ed000020fa000040f50002800004f5000308000082fe001ffe000040f100010104de000040fe000010e7000020f0000004f2000081fe0027fe000080f10000
01ed000008f800010801fd0005800000401002e8000010e3000310000082fe0022fe000040f1000001ef0002100008ef0002400002ed000040fc000020e1000081fe002bfe000080f1000001ef000020f60002100040fb000040f70000fcf6000040fc000020e4000310000082fe0029fe000040f1000001e10002100028fd
00014040f900010102f1000308400020f6000002ef000081fe0037fe000080f100010102ee000002f8000020fe000002fc0002400080fb00010202f1000308400020f6000302000002f5000320000082fe0039fe000040f100010102f000044000020070f800040800800080fc0002800040fd00010201f6000040fa000020
f6000304000002f2000081fe003ffe000080f100010102f0000040fe000080fa000620000202010080fa000020fd0002040080f7000080fa000020f60005040000020006f7000340000082fe0032fe000040f100010102ea000001f90002880001fd00048000000810fd0002040040f2000004f0000301002040f5000081fe
003bfe000080f1000001ed00050100010000c0fd0005400000200081fe0005208000200808fd0002080020f2000002ee00012020f8000340000082fe002ffe000040f1000001ef00078000010000c06020f400042000001010fc0002100010f7000080fc000080e1000081fe0040fe000080f1000001f500000efc000080fd
00012180fc000080fd000040fe000020fe000010fc0002100010f8000001fb000080fb000014eb000380000082fe003ffe000040f1000001f5000311000003fc000004f0000011f80002200008f2000302800008fd00034000001cfe000008fd00018004fe000101e0fa000081fe0057fe000080f10002010080fe00000afb
00042080000c80fe00010108fa000001fc000020fe000301000008fb0002200004f2000b018000080060000002000022fe000010fe0007808002003c000218fd000380000082fe0053fe000040f10002010080fd000080fc0005204000306001fe000088f4000002fb0002080002fd0002400004f8000002f9000008fd0007
8000004100004010fe000080fe000342000406fe000080fe000081fe005bfe000080f100070100800e00002020fc0005404000401001fe000010fe000004fe000002fc000024f9000002fd0002400002f8000002f9000c0802060000010000808000b010fe000080fe00074100080100000c41fe000082fe0051fe000040f1
00070100400980008010fc00044020004008f9000002f8000004fe000002fe00018002fd0002800001f2000001fe000a0201000200000100400108fd00078200010080801001fa000081fe005afe000080f100040100001040f900048010008004fd000040fe000002fe000002fc000014fe0005140000048001fe000001fe
0000c0f3000001fc0008800000400100200208fd0002020000fe80052000c0000009fe000082fe0057fe000040f10012010000202002000800000c0000800801000402fe000040fe000002f400040800000480fd000001fe00013ffcfa000004fc000001fb00070400000200200204f900078040400020004004fe000081fe
0056fe000080f100040100004010fc0008130001000402000202f6000004fc000008fe000308000001fc000002fd000003fa000004fc000002fe000008fd0005200400100404fa0008010040800020008002fe000082fe004efe000040f10011010000400804000400002080010002040002fd000040f0000008f9000002fc
0000c0f5000f02800004100020080000040010080220fe0008080000810021000010fb000081fe005cfe000080f100040100208008fc00074040020001180002fd000090fa000008fc000004fe000308000001fe0002540004fc000030f5000f02800002000010000020080010080120fe000b48000042001e000008000003
fe000082fe005dfe000040f10012010020800408000100018040020000e0000104fe000090fb000007fb000010fb000601000041000008fc000008fc00014008f9000002fe0008200000080008100140fe000340000002fd000308010001fe000081fe005afe000080f100040100110004fd000302002004fd000401040000
01fc0003400018c8fc000012f8000340010008fc000008fd0002011010f900010220fd0006101000081000c0fe000340000004fd000304020002fe000082fe005cfe000040f1000c01001100021000008004002004fd000001f8000340002020fc000020fe000910000002000024000010fc000004fd00010404fa000e4000
00400004800000100004600040fe000350000024fd000002fb000081fe0061fe000080f100040100020001fd000318001008fd000001fd000004fd000320004030fc000d2102000014000004800020004020fc000502000e000010f9000040fd000906000008200003800040fe000310000018fd00070200000480000082fe
005bfe000040f1000c01000200009000004020000808fc000090fe000004fd000320018010fb000c08800004000004800010000020fc00070100118000400120fc000008f8000040fd000020fb000008fd00070104000040000081fe005afe000080f10005010004000080fe000320000410fc000090f800020e0010fb0006
a0200004000004fd00011040fb000680204001000020fc000008fe000680000800000880fd000020fb000008fd000301080008fe000082fe005afe000040f1000c010004000060000020c0000220fc000340000004fb0002f00008f8000022fc000320000480fb000640c02004000080fc000610200001000002fe000080fe
00010120fe000340000018fd000001fb000081fe005cfe000080f1000c01000c000020000003000001c0fc00044000000802fd000303000028fa0002100040fe00006afd000080fb0002230020f8000910100001000011000005fd00010210fe000360000014fc000685000810000082fe0059fe000040f100080100080000
60000014fd000018fd00046000001002fd000304000008fc000080fa0004c000880001fa00061c001010000040f9000001fc000002fd00010410fe0003202a8020fc000690000010000081fe005cfe000080f1000801000c00001000000cfd00018180fe000c6000001000000140000c000088fc0002800008fb0003820800
01f8000010fe000040f9000601000020000004fd00010408fe000310800020fc0002b04010fe000082fe005cfe000040f1000801000c000090000011fe000001fc000020fc000604100008000004fd000001fd000980000021002804000280f900040820000040fb0002280002fe0002800008fc000008fe000390002042fc
000040fc000081fe005cfe000080f10008010010000008000010fc000010fe00001efd001a8000040014000104000003c002000004009f800020100004000420f9000004f700008afd000380400012fc000004fe000382000841fc000660102004000082fe005dfe000040f1000901001000010800002040fe00050a080000
0110fd000b401001002400000200000420fc00046140004410fe00010408f900040280000120fc000380000002fc000060fc000002fb000080fc000660000004000081fe0066fe000080f10008010020000008000040fe0008080001000001081802fe000a40005040001b0200000810fe00050200813c0080fd000008f800
0302000001fe000c7800010100800a800100000181fd00012001fe000308000280fc0006e0044000007082fe0067fe000040f1001501004000010400008010000010208080000008061d40fd0008048001808100001008fd00040180838010fd00011002f9000003fe001d1006008407860402002440010010020001e00000
4000c000012000010080fd00011001fe00018c81fe0064fe000080f100070100410000040001fa001d014000050000100019000001000804110000100400004001010000604004fe000010f700208000080081808380055002401e60000010020080015000800020000104000280bcfd000690000002010282fe006bfe0000
40f1001501208100020200020004014000404004101904400404fd00170240101008800020026001b0000100001d00040001002001fc00220400000440001e090001070011fc0000e1200200000c0000040001000010000087fe0cfc00070208000001020281fe006efe000080f1002a01cd00800001000403870410400000
18040212101000000200000c11804000400061c180060c0082000017fe00040100400004fd000051fe00152000a1820002008006038010912807c0003000200008fe000628000a0e01f040fd00010308fe0002040182fe0070fe000040f1002b0102c8a0040080040401d386810010100086020540010001001ff001850000
40018020800803000c000060c0fe0021818001c300000755004000101009c044000480405800610590c01c3004c000001102fe00088680a4320000042020fe00060401e000040c81fe0070fe000080f100660102860800004008000004058006004181004101f0000fe881e00005500001a006000040100099d00040003001
038082000200800091800010804009938030000b5520600018830022600c0700003040c080000205ea084000001810183d403404021800283082fe0070fe000040f10066013c28c20887a0300000cc05048184079c0100860e00b03e1200001820000011e80018202001c0200029000f00bc78020004100803401f87870780
9f86078d103000518000060c00119003fde000c10021ea0008086c018000002005441038003a0c0700100081fe000efe000080f10000019cff0082fe000efe000040f10000019cff0081fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b00
0002fe000afe0000408b000001fe000efe000080f10000019cff0082fe000efe000040f10000019c000081fe0014fe000080f1000001e000fbaa00a0c4000082fe0016fe000040f1000001e1000001fb550040c4000081fe0014fe000080f1000001e000fbaa00a0c4000082fe0016fe000040f1000001e1000001fb550040
c4000081fe0014fe000080f1000001e000fbaa00a0c4000082fe0016fe000040f1000001e1000001fb550040c4000081fe0021fe000d80000001878f0000fc600060000cfe000001e000fbaa00a0c4000082fe0023fe000d400000038cd98000c0e000e0001cfe000001e1000001fb550040c4000081fe0021fe000d800000
058cc18000c16001e0002cfe000001e000fbaa00a0c4000082fe0023fe000d4000000180c1803cf861e3600f0cfe000001e1000001fb550040c4000081fe0021fe000d800003f183870006cc633660198cfe000001e000fbaa00a0c4000082fe0023fe000d400003f18601803e0c6306600c0cfe000001e1000001fb550040
c4000081fe001bfe000d800000018c0180660c6307e0030cfe0000019c000082fe001bfe000d400000018c198066cc633063198cfe0000019cff0081fe001bfe000d80000007efcf003e79f9e0678f3ffe0000019cff0082fe0012fe000040f7000003fc0000019c000081fe000efe000080f10000019c000082fe000efe00
0040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe006ffe000080f1000d01007c00001f0000200001c00010fe0017040000100000e0001f000007c002000001c00000e000007cfd001d3e00003800000400008000400080007000001c0000070003e0001c00001cfe0018
40004000007c000e0000200000080001c000000e00000e0082fe006ffe000040f1000d0100100000040000500002200028fe00170a0000280001100004000001000500000220000110000010fd001d0800004400000a00014000a0014000880000220000088000800022000022fe0018a000a0000010001100005000001400
02200000110000110081fe006ffe000080f1000d0100100000040000880002000044fe00131100004400010000040000010008800002000001fe000010fd003008000040000011000220011002200080000020000008000080002000002000000110011000001000100000880000220002fe0005100000100082fe0074fe00
01401ffaff00f8fa000d0100100000040000880002000044fe00131100004400010000040000010008800002000001fe000010fd003008000040000011000220011002200080000020000008000080002000002000000110011000001000100000880000220002fe0005100000100081fe0075fe00018010fa000008fa0024
0100100000040000880002000044001f001100004400010000040000010008800002000001fe0035100000f80008000040000011000220011002200080000020000008000080002000002000000110011000001000100000880000220002fe0005100000100082fe0074fe00014010fa000008fa000d0100100000040000f8
000200007cfe00131f00007c0001300004000001000f800002000001fe000010fd00390800004000001f0003e001f003e000800000260000098000800026000020000001f001f000001000130000f800003e0002600000100000100081fe0074fe00018010fa000008fa000d0100100000040000880002000044fe00131100
004400011000040000010008800002000001fe000010fd003908000040000011000220011002200080000022000008800080002200002000000110011000001000110000880000220002200000100000100082fe0074fe00014010fa000008fa000d0100100000040000880002200044fe0017110000440001100004000001
000880000220000110000010fd003908000044000011000220011002200088000022000008800080002200002200000110011000001000110000880000220002200000110000110081fe0078fe0005801078000380fe000008fa000d0100100000040000880001c00044fe0017110000440000e000040000010008800001c0
0000e0000010fd00390800003800001100022001100220007000001c000007000080001c00001c000001100110000010000e0000880000220001c000000e00000e0082fe0017fe00054010cc000180fe000008fa0000019c000081fe0017fe00058010c0000180fe000008fa0000019c000082fe0017fe00094010c0f1e187
80337c08fa0000019c000081fe0017fe000980107998318cc0336608fa0000019cff0082fe0017fe000940100d81f18fc0336608fa0000019cff0081fe0017fe000980100d83318c00336608fa0000019c000082fe0017fe00094010cd9b318cc0337c08fa0000019c000081fe0017fe0009801078f1f7e7801f6008fa0000
019c000082fe0014fe00014010fb00016008fa0000019c000081fe0014fe00018010fb00016008fa0000019c000082fe0013fe00014010fa000008fa0000019c000081fe0013fe00018010fa000008fa0000019c000082fe0013fe0001401ffaff00f8fa0000019c000081fe000efe000080f10000019c000082fe000efe00
0040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0013fe0001801ff8ff00e0fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe000180
10f8000020fc0000019c000082fe0019fe00074010780003800003fe000020fc0000019c000081fe0019fe00078010cc0001800003fe000020fc0000019c000082fe0019fe00074010c00001800003fe000020fc0000019c000081fe0019fe000b8010c0f1e187801f3ccdf020fc0000019c000082fe0019fe000b40107998
318cc03366cd9820fc0000019c000081fe0019fe000b80100d81f18fc03366cd9820fc0000019c000082fe0019fe000b40100d83318c003366fd9820fc0000019c000081fe0019fe000b8010cd9b318cc03366fd9820fc0000019c000082fe0019fe000b401078f1f7e7801f3c499820fc0000019c000081fe0013fe000180
10f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0017fe00018010f8000020fc000001c3000006db000082fe0013fe00014010f8000020fc0000019c000081fe001bfe0001801ff8ff00e0fc000001c3000010e0000040fd000082fe0017fe000040f1000001c300012040e1000090fd00
0081fe0016fe000080f1000001c2000020e1000004fd000082fe0012fe000040f1000001a2000002fc000081fe0016fe000080f1000001c3000040e0000002fd000082fe001cfe0001401ffdff0080f7000001c300018010e2000004fc000081fe001ffe00018010fd000080f7000001da000080ea000008e1000001fd0000
82fe001bfe00014010fd000080f7000001db000001c9000004fc000081fe001ffe00018010fd000080f7000001db000001ea000080e0000001fd000082fe0022fe00014010fd000080f7000001db00010208ec0002010004e2000008fc000081fe0020fe000680100000c00080f7000001da000004ea000002e0000080fe00
0082fe001cfe000640100000c60080f7000001da000004ca000008fc000081fe0020fe000680100000060080f7000001da000004ec000002de000080fe000082fe001efe000640107d99c78080f7000001c40002020001e2000010fc000081fe0020fe00068010cd98c60080f7000001db000008e9000001e0000080fe0000
82fe001cfe00064010cd98c60080f7000001db000010c9000010fc000081fe0020fe00068010cd98c60080f7000001db000010eb000002de000040fe000082fe0024fe000640107d98c66080f7000001db00011001ec000304000080e3000020fc000081fe0024fe000680100cfbf3c080f7000001e0000080fc000001e900
0040e1000040fe000082fe0021fe000240100cfe000080f7000001e100010220fc000001ca000020fc000081fe0020fe000280100cfe000080f7000001da000001ec000004de000020fe000082fe0023fe00014010fd000080f7000001e100010408e6000304000020e3000020fc000081fe001ffe00018010fd000080f700
0001db000040e8000020e1000020fe000082fe0020fe0001401ffdff0080f7000001e100010804fd000040c9000040fc000081fe001efe000080f1000001db000080eb000008ed000080f3000020fe000082fe002cfe000040f1000001fd000060e600011002fd0002800080ed000308000010f1000001f4000040fc000081
fe002dfe000080f1000001fd000018e40002800280fe000080ea00040800000380f500010208f3000010fe000082fe002bfe000040f1000001eb000080f800042000000420fe000080e7000004f400010408f5000040fc000081fe002bfe000080f1000001fe000008ef000040f600014010fd000040ed000010ed000004f3
000010fe000082fe0029fe000040f1000001fe000010e500042000100010e9000310000002f0000004f5000080fc000081fe0026fe000080f1000001fe000020f000010408f6000305400001e7000001e1000010fe000082fe002cfe000040f1000001fe00014001f100010404f8000020fe00011002e6000320000010e700
0080fc000081fe0029fe000080f1000001fc000080e2000002ea000010fe000310001010f5000020f2000008fe000082fe002ffe000040f1000001fc000080e7000040fe00040802000020ed000020fc00012010f5000020f5000001fb000081fe002bfe000080f1000001fc000080f200011002f0000020e9000302002010
f500014002f3000008fe000082fe002efe000040f1000001ec00012002f8000040fe000004fe000020e90002020020f400014002f6000001fb000081fe0021fe000080f1000301000004dc000020ed000020ed000001f3000004fe000082fe0029fe000040f1000301000008e4000080fe000004fd000003ee000020ed0000
01f6000001fb000081fe0027fe000080f1000301000030ef00014001f3000004fe00010820ea000080e3000004fe000082fe0032fe000040f1000201001cfe000020f20002400080fa000001fd00010204fe00011010ea0002800004e8000001fb000081fe002dfe000080f10002010010fe000020e2000008fb000010f100
0040fc00018004f6000001f1000004fe000082fe0033fe000040f1000001fc000010e8000001fd0008020800001000040028f1000040fc00018004f6000002f4000002fb000081fe0031fe000080f1000001fc000010f20002800080f100041040040040ec0002410004f6000302000080f4000004fe000082fe0031fe0000
40f1000001ed000301000040fa000001fd000002fe00011080e9000041f4000302000080f7000002fb000081fe001efe000080f1000001d9000010ed000080ec000080f4000002fe000082fe002efe000040f10002010080e4000002fd000001fc0002010001fc000004f7000080ec000040f7000002fb000081fe0032fe00
0080f100010101ee000301000040f40008100000010001000080fd00000bf2000020f0000003f5000002fe000082fe003afe000040f100010101fd000008f3000302000020fa000002fd000901100000010000020080fd00011080f30002100002e8000002fb000081fe003dfe000080f100010101fd000004ef000002f500
0010fc0002020080fd00012040f8000080fd0002020002f6000008fe00011020f600040200080082fe0042fe000040f1000001fc000004ef000001fb000004fc0006a0000004000084fb00012020f9000001fc0002020002f6000008fe00012010f9000004fd0002200081fe0042fe000080f1000001fc000004f700000efe
00070200001000c003c0f5000306000084fb00014010f30002120001fd000008fb000310000020f400040100020082fe0046fe000040f1000001f7000040fc000009fe0007040000104000c002fe000008fc000340000006f800018010f3000014fb000002fb000310000020f7000004fd0002400081fe0042fe000080f100
0001f7000010fc000310800070fd000340030001f5000004fb00042a00008008f9000001f5000020fc000060fe0002204004f600040100010082fe004efe000040f100010104f900010104fc000320400088fd000080fe000380000008fc000040fc0008400020008000010004f9000001f5000001fc000090fe0002204004
f9000008fd0002800081fe0049fe000080f100010108f20008202000840800000880fe000040f9000f4000000400002000200000800100020ef4000008fe00030c000040fc000090f8000103e0fa000380010082fe0057fe000040f100010108fd000002fe00010201fc00074020010208000008fa000008fc000f60000004
000020001001000002000231f400040800008040fe000080fe00010108f800010c10fe000008fe000301000081fe0051fe000080f100010108fd000002fa000670000040100202f0000040fc000a200010000040020001c0c0fb000002fc00070800008040c00080fd0002020840fe00018001fe0001700cfa000340008082
fe005efe000040f1000001fc000001fe000a0400400188000080080401f6000010fc000ca0000002000060000002000004fe000040fb000002fc000310000080fd000040fe0010020440000001000100c000800200026008fe000302000081fe0056fe000080f1000001fc000001fb000b020600008008040110000008f100
030a000050fd00011004fe000020fc000050fb000614000080000001fe00100180040440000010000001300080020004fc000340004082fe005cfe000040f1000001f8000e080020020100010004080090000004fd000308000010fc000310000009fc000304000008fe000020f5001c140000010008000020000240040240
0000100000020803000100000410fe000304000081fe0053fe000080f1000001f5000b0400c0010004100080000004fd000008f6000001fa00010808fe000010fd0002010804f9000301000402fe00030c200802fe000a1200008204040001000002fd000310002082fe0057fe000040f100010140f9000e10000808002002
0002200080000008fd000304000020fc000008fd00077810000804000010fe000010fb000004f40005200030100802fe000b120000840218000080100010fb000081fe0057fe000080f100010140f6000b100010020001c0006000000afd0001041cfb000001fe000910000008000400000220fe000008fd00010202fb0000
02fc00070400001fc0081002fb00060801e000008010fa00012082fe005afe000040f100010180fc00098000001000041000080cfd000340000002fc0002620040fd0005020400001004fe00040408000020fe000008f5000d0200004400020000102000081001fb000010fd000340000090fb000081fe005dfe000080f100
010180fc000040fc000320000410fd000040f900018180fc000002fd000008fe000402000001c0fe000004fd0002020108fc000d4000004400010400004000042001fe000304000030fd000340000080fc00011082fe005efe000040f1000001fb000941c000200002200003e0fd000040f90002804080fd00010204fe0007
3000040000100001fd000304000001fe000008fc000340000040fd000c08800002200180000004000020fd000340400020fb000081fe005afe000080f1000001fb00014220fd000040fa000060fe000080fe0002010020f8000320000004fd00010e40fe000704000006c0040080fc0014810000400000080000800001c002
80000008000020fd00014040fa00010982fe005bfe000040f1000001fb000604180040000140fa0004a000002080fe0002030011fb000302000020fc0002200010fd00050201f0082004fa0008810000080000800009fd00070285000008000040fd000320000020fc00010181fe0052fe000080f1000001fb00010804fd00
0080fd000608000010000020fd0002030008f000012020fe00070202081010000020f90005080000900001fc0006d0400008000090fd000320000020fc00010682fe0058fe000040f100010180fc000608040080000080fd000602000010000040fd0002050006fb000001fc0005020001200040fd0007010206e010080020
f4000006fd0007014000000c000090fd000310800040fc00010281fe005dfe000080f100010180fc00011003fe000001fc00072000011000004040fe0002048002fc000004fe000940000001000080004010fd00068c010008000010fb000080fd0002100004fd0002044020fe000001fc000011f900010282fe005cfe0000
40f100010140fc000610008080000140fd000301000108fe000040fe0002040006fc00040800800040fc0002c00080fc00049000000810f900074000100000200006fd00011040fd000003fc000310000048fc00010481fe005afe000080f100010140fc0005300080000002fc000340000008fa0002080001fc000008f800
0340010008fd000660000004000044fd000004fe0005100000200008fd00070820100010000408fd000310000008fc00010582fe005efe000040f100010140fc0008300041000002200001fe0002400008fc0004e000080009fe0004c000100040fc000380004002f8000304100080fd000008fe000010fd000009fd000748
20000010000408fd00030a000080fc00010881fe005ffe000080f100010140fc000e480020000004000007400080000404fe000a2000940010000080000130fc000080fe0005800000020004fa000302000002fd000910202000100000400010fd000610100800020008fc00000af900010882fe0060fe000040f100010120
fc001c4000120000042000040f80002004040002001001080010202040000208fe00021000e0fc00018004f800010220fb000a1008200040000010002088fe0006901000000a0010fc000308000104fd0002c01081fe0062fe000080f100010120fc000a8000100000080000081819fe000e0200020000024a002020004000
0208fd00010110fb0002040002fa000301000601fc00098200007000008800c022fd000610040022002004fd000304000002fe000301201082fe0064fe000040f100010120fc000680000e00000810fe00120600100002000c005004040020208020000404fe0002080210fe000340011008f90002028140fa0003f0000048
fe000d0700008000020008000022002002fd000a0c000200f8000002182081fe006afe000080f100010110fd002301000001c000100000208803000010020301f8080484804092001000040400800000050cfe000620001010000080fc0011082080200080003800030910008600010038fd00070200080400808040fc000a
0a00080104000004044082fe006afe000040f100010110fe002b3d020300083000100800010000800410010e8e07040802004228000800080200800004060200e00000020990f9001820088040180000c6010204480185000005c0000040001c0008fd0002800020fe000a0200a00104000008038081fe006ffe000080f100
240108000400c702064000080020000040040410000001003001810a02408800000800100103fe00210802013a001c0fdc300000400a000a003800024200e72001010206020002848002fefd000a0610000402024021000188fe000a0102000683000008000082fe006ffe000040f10023010400114110cc0100110f002002
01980000080280008810044010010080000006001001fe002101100bae848073f4a0680078000080388040c0003c0100800100c40c012004024007fd000b1008000004cc008002000302fe000a23a8800800800010000181fe0070fe000080f10048011500400200701118420a08c000060002080801aa8160680040e400a1
20000001006003c00000cc240088022088091c800800602000c070002001900607f057552410010408022018fe000c0e00114000020084008c0009e0fe000a688371c800800321c00082fe0070fe000040f10055010251001dd90441858d919f8001098000a007980026a38008302800ce800fcc03c0cc8043fc1e00e8a070
010301980301f00018802000080b99980ff800150800505000ca12a01620009000101e140000050090004afe000d81e000805cec31207f1070200081fe000efe000080f10000019cff0082fe000efe000040f10000019cff0081fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe
0000408b000001fe000afe0000808b000002fe0006fe008955fe0006fe0089aafe000283000283000283000afe000002b5aa00a8d4000afe000005b5550050d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b500
0010d4000afe000002b5000008d4000afe000004b5000010d40015fe00020200f0fe00053000cc600060c0000008d40015fe0002040198fe00053000cc600060c0000010d40015fe0002020180fe00053000cc000060c0000008d40017fe000d040181e3cf8f3e00cce3e3e79980c2000010d40017fe000d0200f3306cd9b3
00cc63366cd980c2000008d40017fe000d04001bf3ec183300fc63366cd980c2000010d40017fe000d02001b066c183300fc63366cdf80c2000008d40017fe000d04019b366c19b300cc63366cdf80c2000010d40016fe000c0200f1e3ec0f330085fb33e789c1000008d4000afe000004b5000010d4000afe000002b50000
08d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d40014fe000004f600007ff9ff00c3f9
ffd2000010d40014fe000002f6000040f9000043f9ffd2000008d40014fe000004f6000040f9000043f9ffd2000010d40014fe000002f6000040f9000043f9ffd2000008d4001cfe000304001f0cfe00010180fe000040f9000043f9ffd2000010d40025fe00080200198c0000030180fe000343000018fe0003180043f8fc
ff019fffd2000008d40026fe000604001980000003fe00050c0043000018fe0004180043f27ffdff019fffd2000010d40025fe000f0200199c7c78f3c3879f1e0043000018fe0003180043f3fcff019fffd2000008d40027fe001d0400198c66cd9b018cd98c0043e3c799b33cf8f9e043f3e183330c1c187fd2000010d400
27fe001d0200198c60fd83018cd9800043306cdb3306cd9b3043e1cc9933e4c9933fd2000008d40027fe001d0400198c60c183018cd980004333ec1e333ec1998043f3cc9f3304f999ffd2000010d40027fe001d0200198c60cd9b318cd98c0043366c1e3f66c1986043f3cc9f0264f99e7fd2000008d40027fe001d04001f
3f6078f1e7e7999e0043366cdb3f66c19b3043f3cc9f0264f9933fd2000010d40021fe000002f800130c0043e3e799923ec0f9e043f3e19fb704fc187fd2000008d40014fe000004f6000040f9000043f9ffd2000010d40014fe000002f6000040f9000043f9ffd2000008d40014fe000004f6000040f9000043f9ffd20000
10d40014fe000002f6000040f9000043f9ffd2000008d40014fe000004f600007ff9ff00c3f9ffd2000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004
b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4002afe000002f600007ffaff00e1f6ff01f87ffaff00e1f8ff01fe1ffaff01f87ffbff00f0faff01e008d4002bfe000004f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00
011080fb00012010d4002bfe000002f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012008d4002bfe000004f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012010d40033fe000202000ffe000203000cfe000040fa000021f600010840fa0000
21f8ff01fe10fa00010840fb00011080fb00012008d4004ffe000804001980000003000cfe001943e0000600180000210f8000063000000cc000000843f000003ffe000221f87ffdff05fe7ffffe1078fb00110843e000181c00001083c0001c1800002010d40050fe0002020018fe001f03000c000c004330000630180000
210cc000063000000cc000000840c000000cfe000c21f33fffff3ffcfe7ffffe10ccfb001108433000180c0000108660000c18c0002008d40050fe00100400181e3cf8f3e00f999e004330000030fe0004210cc00006fe00090ec000000840c000000cfe000621f33fffff3ffcfeff02fe10c0fb001108433000180c000010
8660000c00c0002010d40053fe004d02000f3306cd9b300cd98c004333c78e3c3879f0210ccf1e3e71f1d00ecf363c0840c3c7400c66f8f021f320c1c30f0c3c7860fe10c0f1f6679f1e3c084337c79f0c3cd810866ccf0c38f1982008d40053fe004d040001bf3ec183300cd9800043e66cc63018cd98210f998366319b30
0fc1bf660840c06cc00c66cd9821f0264c993fe4fe73267e10799b366cd9b3660843e66cd98c66fc10866cc18c18c1982010d40053fe004d020001b066c183300cd98000430666063018cd98210f1f9f66319b300dcfbf7e0840c3ecc00c66cdf821f3264c993f04fe73267e100dfb366fd9b07e0843060cd98c7efc10866c
cf8c18c1982008d40053fe004d040019b366c19b300ccf8c00430661863018cd98210d9833663199e00dd9b3600840c667800c66cd8021f3264c993e64fe73267e100d83366c19b0600843060cd98c60cc10876cd98c18c1982010d40053fe004d02000f1e3ec0f3300f819e0043066cc63318cd98210cd9b366319b000cd9
b3660840c66c000c3ef99821f3264c993264ce73267e10cd99f66cd9b3660843060cd98c66cc1086ecd98c18ccf82008d4004efe000004f90044198c004303c79f9e7e7998210c4f1f3efd99e00ccfb33c0840c3e7800c06c0f021f3264cc387061818667e1078f033e7999e3c084306079f3f3ccc1083c7cfbf7e78182010
d4003dfe000002f900030f000040fa000021fc00010330fd00090840000cc00066c00021f8ff04fe10000030fd00010840fb0002108060fe000301982008d40038fe000004f6000040fa000021fc000101e0fd00090840000780003cc00021f8ff04fe10000030fd00010840fb00011080fc0002f02010d4002bfe000002f6
000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012008d4002bfe000004f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012010d4002afe000002f600007ffaff00e1f6ff01f87ffaff00e1f8ff01fe1ffaff01f87ffbff00f0faff01e008d4000afe
000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b50000
08d40015fe000004f600007ff6ff01fe1ff5ffda000010d40017fe000002f6000040f600010210f6000001da000008d40017fe000004f6000040f600010210f6000001da000010d40017fe000002f6000040f600010210f6000001da000008d4001efe000004fd000301980380fe000040f600010210f6000001da000010d4
001ffe000002fd000301980180fe000040f600020210fcf7000001da000008d40020fe000004fd000701980180000c0040f60002021030f800010c01da000010d40022fe000002fc000691e18ccf1e0040f60005021030000003fb00010c01da000008d40025fe000004fc000690318cd98c0040f6000d0210319be3c7801e
3cd9b1e7cf01da000010d40025fe000002fc0006f1f18cdf800040f6000d0210319b3663003366fdfb366c01da000008d40025fe000004fc000663318cd8000040f6000d0210319b37e0003066fdfbf66c01da000010d40025fe000002fc000663318cd98c0040f6000d0210319b3600003066cd9b066c01da000008d40025
fe000004fc000661f7e7cf1e0040f6000d021030fbe663003366cd9b366cc1da000010d40021fe000002f800020c0040f6000d0210301b03c7801e3ccd99e66781da000008d4001bfe000004f6000058f600050210019b0003fa000001da000010d40019fe000002f600007cf60003021000f3f8000001da000008d40017fe
000004f6000066f600010210f6000001da000010d40017fe000002f6000040f600010210f6000001da000008d40015fe000004f600007ff6ff01fe1ff5ffda000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe0000
04b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d40012fe00010201fbff03e1fffffec0000008d40012fe00010401fb000321000002c0000010d40012fe00010201fb000321000002c00000
08d40012fe00010401fb000321000002c0000010d40012fe00010201fb000321000002c0000008d40014fe00010401fd0005018021001802c0000010d40014fe00010201fd0005018021001802c0000008d40014fe00010401fd0005018021001802c0000010d40015fe000b0201078f1e7c79f021079982c0000008d40015
fe000b04010cd98366cd98210cdb02c0000010d40015fe000b0201061f9f60c198210cde02c0000008d40015fe000b040101983360c198210cde02c0000010d40015fe000b02010cd9b360cd98210cdb02c0000008d40015fe000b0401078f1f60799821079982c0000010d40012fe00010201fb000321000002c0000008d4
0012fe00010401fb000321000002c0000010d40012fe00010201fb000321000002c0000008d40012fe00010401fb000321000002c0000010d40012fe00010201fbff03e1fffffec0000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5
000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000005b5550050d4000afe000002b5aa00a8d400028300028300028300028300028300028300028300028300028300028300028300a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.3\tab A typical display from the contig editor in XBAP\par
\pard\plain \s4\qj\sb160\sa120\sl280 \f20 The four scroll buttons operate as follows\:\par
\pard \s4\qj\li1720\sa120\sl280\tx4520 "<<"\tab Scroll left half a screenful\par
"<"\tab Scroll left one character\par
">"\tab Scroll right one character\par
">>"\tab Scroll right half a screenful\par
\pard \s4\qj\sa120\sl280
The Editor cursor can be positioned anywhere in the edit window by moving the mouse pointer over the character of interest, then pressing the left mouse button. The Editor cursor can also be moved by using the direction arrow keys.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.2\tab Editing operations \par
\pard\plain \s4\qj\sa120\sl280 \f20 The editor operates in two main edit modes: Replace and Insert. Replace allows a character to be replaced by another. Insert allows characters to be inserted into a reading. Characters are entered by typing them from the keyboard. Only valid characters are permitted. Characters can be deleted by positioning the cursor one character to their right, then pressing the delete key. Normally Insert and Delete apply to the consensus line of the contig only. This restriction can be overridden by using the "Super Edit" mode of operation, though it should be employed with caution as misuse may corrupt alignments.\par
Edits can also be performed on the consensus, though they are restricted to insertion and deletion of padding characters ("*"). These edits also have special meanings. A deletion will delete all characters at the position to the left of the cursor in the contig, and move the relative positions of all sequences starting to the right of the cursor position left one character. An insertion will insert the character typed ("*") into all gel reading sequences at the cursor's position in the contig, and move the relative positions of all sequences starting to the right of the cursor position right one character.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.3\tab Use of buttons \par
\pard\plain \s4\qj\sa120\sl280 \f20 The effect of the last edit can be undone by pressing the "Undo" button at the top of the editor window. Pressing it n times will undo the last n edits.\par
\pard \s4\qj\sa120\sl280 The cursor will automatically be positioned at the next problem when the "Find Next Problem" button is selected. The next problem is where the consensus shows either a disagreement ("-") or a pad ("*") character.\par
\pard \s4\qj\sa120\sl280 The edits to the contig can be saved by pressing the "Leave Editor" button and replying "Yes" to the prompt to "Save changes?".\par
As no changes are made to the working copy of the database until this point it is possible to abort the editor if the edit session ends up in an unsatisfactory state.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.4\tab Displaying traces for readings from fluorescent sequencing machines\par
\pard\plain \s4\qj\sa120\sl280 \f20 The original trace data from which the gel reading sequences were derived can be seen by double clicking (two quick clic
ks) with the middle mouse button on the area of interest. The trace will be displayed with the point clicked at the centre of the trace viewport. All traces that are displayed are maintained in one window, which will display a maximum of four traces. When
four traces are already being displayed and a new one is requested, the one at the top of the window is removed and the new one is added to the bottom. Traces can be removed individually by using the "quit" button in the panel next to the trace. \par
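\pard\plain \s4\qj\sa120\sl280 \f20 The replacement rule for the trace window behaves like a fixed-size queue. The following Python sketch illustrates that rule only; it is not XBAP's own code, and the names new_trace_window, show_trace and close_trace are hypothetical.\par

```python
from collections import deque

MAX_TRACES = 4  # the trace window shows at most four traces


def new_trace_window():
    # The leftmost element stands for the trace at the top of the window.
    return deque(maxlen=MAX_TRACES)


def show_trace(window, reading_name):
    # Appending to a full deque drops the leftmost (top) entry,
    # and the new trace is added at the bottom.
    window.append(reading_name)


def close_trace(window, reading_name):
    # Models the "quit" button in the panel next to one trace.
    window.remove(reading_name)
```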
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.5\tab Extending reads with the unused data\par
\pard\plain \s4\qj\sa120\sl280 \f20
Sequence data from fluorescent sequencing machines is normally clipped to remove the primer region, and the poor quality data from the 3' end is marked to be ignored during assembly. Only the sequence used during assembly is made visible in the XBAP editor. However, the unused data is copied into the database and can be viewed from within the editor. The position of this "cutoff" can also be altered. To display the unused sequences, press the "Display Cutoff" button at the top of the editor window. The cutoff sequence appears in grey. This sequence can be incorporated into the editable sequence by moving the cutoff position. This is done by positioning the cursor at the end of the sequence, and using Meta-Left-Arrow and Meta-Right-Arrow to adjust the point of cutoff. The Meta key is a diamond on the Sun keyboard.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.6\tab Using the pop-up menu\par
\pard\plain \s4\qj\sa120\sl280 \f20 A pop-up menu is revealed by depressing the "Control" key on the keyboard and at the same time pressing the left mouse button.\par
\pard \s4\qj\sa120\sl280 The menu has the following functions\:\par
\pard\plain \li1880\sl220 \f4\fs16 Find Next Problem\par
Highlight Disagreements\par
Save Contig\par
Create Tag\par
Edit Tag\par
Delete Tag\par
Search\par
Select Oligo\par
\pard\plain \s4\qj\sa120\sl280 \f20 \par
\pard \s4\qj\sa120\sl280 "Find Next Problem" and "Save Contig" are described above. Operations on tags are described in the section on annotation below, and then searching is outlined.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.7\tab Annotating readings\par
\pard\plain \s4\qj\sa120\sl280 \f20 Parts of a sequence can be annotated to record the positions of primers used for walking, or to mark sites, such as compressions, that have caused problems during sequencing. The annotations ar
e termed "tags". Each tag has a type such as "primer", a position, a length and a comment. Each type has an associated colour that will be shown on the display. First the segment to tag is selected, then it is annotated. The consensus sequence cannot be a
nnotated.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.8\tab Creating a new annotation\par
\pard\plain \s4\qj\sa120\sl280 \f20 Use the left mouse button to position the start of the selection. While this button is being held down, move the mouse to the other end of the segment. The selection can be extended further using the right mouse bu
tton. To create the annotation, invoke the pop-up menu, and select the "Create Tag" function. A small "tag editor" will appear which allows users to select the type of the annotation from a pull-down menu, and specify a comment if desired. To select a new
type pull down the Type menu, and select the entry desired. To enter a comment, simply type into the text window in the tag editor. The annotation is created when the "Leave" button on the tag editor is pressed, and is displayed in the colour defined in th
e tag database file (TAGDB).\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.9\tab Editing an existing annotation\par
\pard\plain \s4\qj\sa120\sl280 \f20
Position the cursor with the left mouse button on the tag, and select "Edit Tag" from the pop-up menu. This invokes the tag editor, and changes to the type and comment of the annotation can be made. The tag is updated when the "Leave" button is pressed.
\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1180 \b\f20 2.6.10\tab Deleting an annotation\par
\pard\plain \s4\qj\sa120\sl280 \f20 To delete an existing annotation, position the cursor with the left mouse button on the tag, and select "Delete Tag" from the pop-up menu.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1160 \b\f20 2.6.11\tab Searching\par
\pard\plain \s4\qj\sa120\sl280 \f20
Selecting "Search" brings up a window which can remain present during normal editor operation. The window allows the user to select the direction of search, the type of search and a value to search on. The value is entered into a value text window, then pr
essing the "search" button performs the search. If successful, the cursor is positioned accordingly. An audible tone indicates failure. Pressing the "ok" button removes the search window. The search window is automatically removed when the contig editor is
exited. There are seven different search modes.\par
\pard\plain \s9\fi-560\li1120\sb180\sa60\sl280\tx1700 \b\f20 2.6.11.1\tab Search by position\par
\pard\plain \s4\qj\sa120\sl280 \f20
This positions the cursor at the numeric position specified in the value text window. For example, a value of "1234" places the cursor at base number 1234 in the contig. Positioning within a reading is achieved by prefixing the number with the "@" character: for example, "@123" positions the cursor at base 123 of the sequence in which the cursor lies. Relative positions can be specified by prefixing the number with a plus or minus character: for example, "+1234" advances the cursor 1234 bases. If possible, the cursor is positioned within the same sequence. The direction buttons have no effect on the operation of "search by position".\par
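\pard\plain \s4\qj\sa120\sl280 \f20 The three forms of position value can be summarised by the following Python sketch. It is an illustration, not XBAP's own code: the function name resolve_position and its arguments are hypothetical, positions are taken to be 1-based, and complemented readings are ignored.\par

```python
def resolve_position(value, cursor_pos, reading_start):
    """Return the contig position the cursor should move to.

    value         -- text typed in the value window: "1234", "@123", "+1234"
    cursor_pos    -- current cursor position in the contig (assumed 1-based)
    reading_start -- contig position of base 1 of the reading under the
                     cursor (hypothetical argument)
    """
    value = value.strip()
    if value.startswith("@"):
        # "@123": base 123 of the reading in which the cursor lies
        return reading_start + int(value[1:]) - 1
    if value[:1] in ("+", "-"):
        # "+1234" or "-10": move relative to the current cursor position
        return cursor_pos + int(value)
    # plain number: absolute base number in the contig
    return int(value)
```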
\pard\plain \s9\fi-560\li1120\sb180\sa60\sl280\tx1720 \b\f20 2.6.11.2\tab Search by reading name\par
\pard\plain \s4\qj\sa120\sl280 \f20
This positions the cursor at the left end of the gel reading specified in the value text window. If the value is prefixed with a slash it is assumed to be a gel reading name. Otherwise it is assumed to be a gel reading number. For example, "123" positions the cursor at the left end of gel reading number 123, and "/a16a12.s1" positions it at the start of reading a16a12.s1. If the value is "/a16" the cursor is positioned at the first reading whose name starts with "a16". The direction buttons have no effect on the operation of "search by reading name".
\par
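\pard\plain \s4\qj\sa120\sl280 \f20 The distinction between reading names and reading numbers can be sketched in Python as follows. This is an illustration under stated assumptions, not XBAP's own code: the function find_reading and the tuple layout of readings are hypothetical, and "first reading" is taken to mean first in the order of the list supplied.\par

```python
def find_reading(value, readings):
    """Return the left end of the requested reading, or None if not found.

    readings -- list of (number, name, left_end) tuples (hypothetical layout)
    """
    if value.startswith("/"):
        # Leading slash: the rest is a reading name or a name prefix.
        prefix = value[1:]
        for number, name, left_end in readings:
            if name.startswith(prefix):
                return left_end
        return None
    # No slash: the value is a gel reading number.
    wanted = int(value)
    for number, name, left_end in readings:
        if number == wanted:
            return left_end
    return None
```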
\pard\plain \s9\fi-560\li1120\sb180\sa60\sl280\tx1700 \b\f20 2.6.11.3\tab Search by tag type\par
\pard\plain \s4\qj\sa120\sl280 \f20
This positions the cursor at the start of the next tag which has the same type as specified by the type value menu. To change the type, select from the menu that pops up when the mouse is clicked on the button labeled "Type\:". The search can be performed either forwards or backwards from the current cursor position. To find all tags, use "search by annotation", with a null text value string.\par
\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.4\tab Search by annotation\par
\pard\plain \s4\qj\sa120\sl280 \f20
This positions the cursor at the start of the next tag which has a comment containing the string specified in the value text window. The search performed is a regular expression search, and certain characters have special meanings. Be careful when your val
ue string contains ".", "*", "[", "^" or "$". The search can be performed either forwards or backwards from the current cursor position.\par
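\pard\plain \s4\qj\sa120\sl280 \f20 The effect of the special characters can be seen in the following Python sketch. It uses Python's re module purely for illustration; the regular expression dialect used by XBAP may differ in detail, and the function comment_matches is hypothetical.\par

```python
import re


def comment_matches(value, comment, literal=False):
    """Return True if the tag comment matches the search value.

    With literal=True the value is escaped first, so that characters such
    as "." and "*" match only themselves.
    """
    pattern = re.escape(value) if literal else value
    return re.search(pattern, comment) is not None
```

\pard\plain \s4\qj\sa120\sl280 \f20 In this sketch the value "a.c" matches the comment "abc" when treated as a regular expression, because "." stands for any character, but not when treated literally. An empty value matches every comment, which corresponds to finding all tags with a null value string.\par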
\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.5\tab Search by sequence\par
\pard\plain \s4\qj\sa120\sl280 \f20
This positions the cursor at the start of the next piece of sequence that matches the value specified in the value text window. The search is for an exact match, which means that the case of the value string is important. The search is performed on the gel readings themselves, rather than the consensus sequence. The search can be performed either forwards or backwards from the current cursor position.\par
\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.6\tab Search by problem\par
\pard\plain \s4\qj\sa120\sl280 \f20 This positions the cursor at the next place in the consensus sequence which is not "A", "C", "G" or "T". The search can be performed either forwards or backwards from the current cursor position.\par
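\pard\plain \s4\qj\sa120\sl280 \f20 The rule used by "search by problem" can be sketched in Python as follows. This is an illustration only, not XBAP's own code: the function next_problem is hypothetical and uses 0-based indices into a consensus held as a string.\par

```python
def next_problem(consensus, cursor, forwards=True):
    """Return the index of the next consensus character that is not
    A, C, G or T, searching from the cursor in the chosen direction.
    Returns None if there is no such character."""
    step = 1 if forwards else -1
    i = cursor + step
    while 0 <= i < len(consensus):
        if consensus[i] not in "ACGT":
            return i
        i += step
    return None
```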
\pard \s4\qj\sa120\sl280 \par
\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.7\tab Search by quality\par
\pard\plain \s4\qj\sa120\sl280 \f20
This positions the cursor at the next place in the consensus sequence where the consensus for each strand is not "A", "C", "G" or "T" or where the two strands disagree. The search can be performed either forwards or backwards from the current cursor posit
ion.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 \par
2.7\tab Joining contigs interactively using XBAP\par
\pard\plain \s4\qj\sa120\sl280 \f20
The operation of the join editor in XBAP is very similar to the one for single contigs described above. It allows the user to align the ends of the two contigs by editing each contig separately. First specify which two contigs are to be joined. The program checks that the two contig numbers are different (it will not allow circles to be formed). The Join Editor consists of two Contig Editors with a disagreement box sandwiched between them. This disagreement box uses exclamation marks to denote mismatches between the two consensuses. A typical example is shown in figure 4.4. Here we see in the top window the right end of one contig and in the bottom window the left end of another. The left end of the overlap is correctly aligned, as indicated by an absence of exclamation marks, but the top contig has an extra character at position 558 which is spoiling the alignment over the next segment. Notice that the "lock" button is highlighted, denoting that the user has asked for the two contigs to scroll together.\par
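\pard\plain \s4\qj\sa120\sl280 \f20 The content of the disagreement box can be sketched in Python as follows. This is an illustration, not XBAP's own code: the function disagreement_line is hypothetical, the two strings are taken to be the visible, already aligned parts of the two consensuses, and matching positions are assumed to be shown as blanks.\par

```python
def disagreement_line(top, bottom):
    """Return a line with "!" wherever the two consensuses differ
    and a blank wherever they agree (blank is an assumption)."""
    return "".join(" " if a == b else "!" for a, b in zip(top, bottom))
```

\pard\plain \s4\qj\sa120\sl280 \f20 A single extra character in one contig shifts everything to its right, so it typically produces a run of exclamation marks rather than a single one, as in the segment following position 558 in the figure.\par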
\pard \s4\qj\sa120\sl280 The best strategy for joining is to align the leftmost character of the right contig with its counterpart in the left contig. Then press the \'d2Lock\'d3 button before editing the contigs to make them align for the whole overlap. The overlap must be of at least one character. Use the scroll bar and the scroll buttons ("<<", "<", ">", and ">>") to adjust the relative positions of the two contigs. The join position can be fixed by pressing the "lock" button at the top of the Join Editor. Locking allows the two contigs to be scrolled as one when using the scroll bar and buttons, with the left ends always in the same position relative to each other. Once locked, it is best to proceed to the right along the contigs, inserting padding characters ("*") into the consensuses to minimise the disagreements. It is important that the user aligns the two contigs throughout the whole region of overlap before completing the join, because it is only at this stage that the two contigs can be edited independently. If a join is completed leaving a region of mismatch, the consensus will consist of dashes and the assembly function will fail to find overlaps in the bad section. Misaligned sections can be corrected using the "super edit" mode of the editor. The join can be completed by pressing the "Leave Editor" button. The percentage mismatch is displayed, and users are required to confirm that they want to perform the join.\par
\pard\plain \li100\ri80\sb100\sl220\keepn\box\brsp20\brdrth \f4\fs16 {{\pict\macpict\picw441\pich144
4685ffffffff008f01b81101a0008201000affffffff008f01b80900000000000000003100000000008e01b798007c00000000014003db00000000014003db00000000008e01b7000102850002850026e600001ff9ff0087f8ff01f87ff5ff01fe1fefff0087fcff01fe1ffcff01f87ff2ff00e0f40026e600001ff9ff0084
f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f4003701003cfa000203fc03fa0008630c18000181
80001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f4005b010066fa0002030003fa000ac30c38000380c0001f807ffbff05841f8000003cfd0002087f87fbff07e01fe7fffffe107efc00030f000078fd00133c0f0000841860000600021f9ffffffe7ff84180fb00021fe018fc000020
f400610100c3fe0008c01800000300030603fd000a01830c7800078060001ff3faff058418c000000cfd0002087f33fbff07e7ffe7fffffe1063fc0003030000ccfd001366198000841860000600021f9ffffffe7ff84180fb0002180018fc000020f400670100c0fe0008c01800000300030603fd000a01830cd8000d8060
001ff3fcff07f9ff84186000000cfd0002087e79fbff08e7ffe7cfe7fe106180fd001b030001860006000066198000841860000600021f9ffffffe7ff84180fb00041800183018fe000020f400670100c0fe0000c0fe00040300030003fd000a0301989800098030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08
e7ffe7cfe7fe106180fd001b030001800006000060180000841860000600021f9ffffffe7ff84180fb00041800183018fe000020f400681c00c00f0dc3f0781f4003003b1e0fc0f0de000301981800018030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08e7ffe7ffe7fe106180fd001b03000180000600006018
0000841860000600021f9ffffffe7ff84180fb00041800180018fe000020f400726e00c0198e60c01831c003f0670603019873000301981800018030001ff3e47c0f8790e07f841861e1b80c0fc1f078087f3f9e647e1e43ffe7fe270f81fe1061878618783f03000180619f81e060180fc0841866e0761e021f9ff87e0e73
f841801e0fc61878001801d8f07e0786f020f400726e00c030cc30c01831800300c30603030c60000300f01800018030001ff3e339e733c679ff8418c331cc0c186318cc087f879e633ccf19ffe07cc7cfe7fe10630cc618cc6183000180618603306018186084186730ce33021f9ff33ce667f8418033186618cc001f8338
30180cc39820f400726e00c030cc30c01831800300c30603030c60000300f01800018030001ff3e799fe79cff9ff841f8619860c00660186087ff39e6799e73fffe7f9e7cfe7fe107e18633186018300018061860619f87e1800841866198661821f9fe799fe4ff84180618063318600180618301818630020f400726e00c0
30cc30c01831800300c30603030c60000180f01800018060001ff3e79c0e01cff9ff841987f9860c0fe601fe087ff99e6798073fffe7f9e7cfe7fe10661fe331fe3f830001806186061860180fc0841866198661821f9fe799fe1ff841807f8fe331fe00180618301818630020f400726e00c330cc30c0181f000300c30603
030c60000180601800018060001ff3e79fe67fcff9ff8418c601860c18660180087ff99e6799ff3fffe7f9e7cfe7fe10631801e18061830001806186061860180060841866198661821f9fe799fe0ff84180601861e18000180618301818630020f400726e0066198c30cc18300003006706033198600000c06018070180c0
001ff3e79fe67fcff9ff8418c601860c18660180087e799e6799ff3fffe7f9e7cfe7fe10631801e18061830001866186061860180060841866198661821f9fe799fe47f84180601861e18000180618301818630020f400726e003c0f0c3078ff1f8003fc3b3fc1e0f06000006060ff070ff180001ff3e799e739cff99f8418
6319cc0c186318c6087f33cc633ce73fffe7fcc7cfe67e10618c60c0c661830000cc3386633060181860840cc618ce33021f9ff33ce663f84180319860c0c60018033830198cc30020f4005efa000130c0ef00531f80679c0f83cffc3f841861f1b87f8fa1f07c087f87e2647e0f3fffe01e2601f0fe106187c0c07c3e9fe0
00781d83c1e060180fc084078618761e021f80787e0e71f841fe1f0fa0c07c001fe1d9fe0f07830020f40032fa000130c0ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef000084fc0001021ffcff01f840f2000020f40032fa000130c0ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef00
0084fc0001021ffcff01f840f2000020f40032fa00011f80ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef000084fc0001021ffcff01f840f2000020f4002de600001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001
087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0087f8ff01f87ff5ff01fe1fefff00
87fcff01fe1ffcff01f87ff2ff00e0f40002850002850002850002850002850002850002850002850007001f88ff01fe00180010fc000006fe00010180fe000060fc00000c9d00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000c
c9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de0001020024151000004010000600004001800200006000100400000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de0001020024151000018060
0006000180018001800060000c0300000cc9000001fa550040de00010200241510000300c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de00010200241510000c03000006000c000180003000600001806000
0cc9000002faaa00a0de000102002415100018060000060018000180001800600000c030000cc9000001fa550040de00010200241510000c03000006000c0001800030006000018060000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de000102002415
10000300c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000180600006000180018001800060000c0300000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de00010200241510000040100006000040018002000060
00100400000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00
010180fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200180010fc000006fe00010180fe000060fc00000c9d0001020007001f88ff01fe0007001f88
ff01fe000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200420010ed00000cfb000307f8780cf800037f9fe0c0f9000307f8780cf800037f8780c0f9000301e0300cf800031e0300c0f900
0301e0780cf800071e0780c000000200420010ed00000cfb00030600cc1ef80003600061e0f900030600cc1ef80003600cc1e0f900030330781ef80003330701e0f900030330cc1ef80007330cc1e000000200420010ed00000cfb000306018633f8000360006330f9000306018633f8000360186330f900030618cc33f800
03618f0330f9000306198633f800076198633000000200420010ed00000cfb000306018033f800036000c330f9000306018633f8000360186330f900030600cc33f80003601b0330f9000306018633f800076018633000000200460010ed00000cfb00040601806180f900036000c618f900040601866180f9000360186618
f900040601866180f9000360130618f900040600066180f900076000661800000200460010ed00000cfb000406e1b86180f900036e018618f9000406e0cc6180f900036e186618f9000406e1866180f900036e030618f9000406e0066180f900076e00c61800000200460010ed00000cfb00040731cc6180f9000373018618
f900040730786180f90003730ce618f900040731866180f9000373030618f9000407300c6180f900077303861800000200440010ed00000cfa000319866180f9000301830618f8000318cc6180f9000301876618f900040619866180f9000361830618f900040618386180f900076180c61800000200440010ed00000cfa00
0319866180f9000301830618f8000319866180f9000301806618f900040619866180f9000361830618f900040618606180f900076180661800000200400010ed00000cfa0002198633f8000301860330f80002198633f8000301806330f900030618cc33f8000361830330f900030618c033f8000761986330000002004200
10ed00000cfb000306198633f8000361860330f9000306198633f8000361986330f900030618cc33f8000361830330f9000306198033f800076198633000000200420010ed00000cfb00030330cc1ef80003330c01e0f900030330cc1ef80003330cc1e0f900030330781ef80003330301e0f900030331801ef80007330cc1
e000000200420010ed00000cfb000301e0780cf800031e0c00c0f9000301e0780cf800031e0780c0f9000301e0300cf800031e1fe0c0f9000301e1fe0cf800071e0780c0000002000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c
9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102004b0010fe000dc0781e000001fe1e0001e0000003fe002f0c1fe0c0781e000001fe7f9fe7f8780c0787f9fe7f8301e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8780cc000102004b1110000001c0cc330000018033000330000007fe00
050c0301e0cc33fe0026300c0300c0cc1e0cc0c0300c078330781e0303307833078330300c0cc1e0300c0301e0780c0cc0cc000102004b1110000003c18661800001806180061800000ffe002f0c0303318661800000300c0300c186331860c0300c0cc618cc33030618cc618cc618300c186330300c030330cc0c1860cc00
0102004a1110000006c18660000001806000061800001bfe00050c0303318060fe0025300c0300c180331800c0300c0cc600cc33030600cc600cc600300c180330300c030330cc0c18cb000102004a1110000004c186600000018060000618000013fe00050c0306198060fe0025300c0300c180619800c0300c1866018661
8306018660186600300c180618300c030619860c18cb000102004a0010fe000dc0cc6e0003f1b86e0fc618003f03fe00050c0306198060fe0025300c0300c180619800c0300c18660186618306018660186600300c180618300c030619860c18cb000102004b0010fe000dc07873000619cc73186338006183fe002f0c0306
199e679fe7f8300c0300c1806199e0c0300c18667986618306018667986678300c180618300c030619860c19e0cc000102004b0010fe000dc0cc6180001806618061d8006003fe002f0c0307f98661800000300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f8300c0307f9fe0c1860cc000102004b
0010fe000dc186618003f806618fe018003f03fe002f0c0306198661800000300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860cc000102004b0010fe000dc186618006180661986018000183fe002f0c0306198661800000300c0300c180619860c0300c1866198661830601
8661986618300c180618300c030619860c1860cc000102004b0010fe000dc186618006198661986618000183fe002f0c0306198661800000300c0300c186619860c0300c18661986618306198661986618300c186618300c030619860c1860cc000102004b0010fe000dc0cc33000618cc33186330386183fe002f0c030618
ce33800000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc618300c030619860c0ce0cc000102004b4410000007f8781e0003e8781e0fa1e0383f1fe000000c0306187a1e800000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c078618300c030619860c07a0cc000102000b00
10ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200630010fe000dc0041e000001fe0c03c7f8000003fe00470c1fe0c0780c030001fe7f9fe7f8780c0787f9fe7f83
01e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e03e400010200640d10000001c00c33000001801c0666fe000007fe00480c0301e0cc1e078000300c0300c0cc1e0cc0c0300c078330781e0303307833078330300c0cc1e0300c0301e0780c0cc0c0
3033078330300c030330cc1e078330cc330780c0cc330780e500010200640d10000003c01c61800001803c0666fe00000ffe00480c03033186330cc000300c0300c186331860c0300c0cc618cc33030618cc618cc618300c186330300c030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc0e500
010200640d10000006c03c60000001806c0606fe00001bfe00480c03033180330cc330300c0300c180331800c0300c0cc600cc33030600cc600cc600300c180330300c030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc0e500010200640d10000004c06c60000001804c0606fe000013fe0048
0c0306198061986330300c0300c180619800c0300c18660186618306018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601860e500010200640010fe000dc0cc6e0003f1b80c0606e0003f03fe00480c03061980619861e0300c0300c180619800c0300c1866018661
8306018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601860e500010200640010fe000dc18c73000619cc0c060730006183fe00480c0306199e619867f8300c0300c1806199e0c0300c18667986618306018667986678300c180618300c030619860c19e0c0306018
6678300c0306019e6198660180601860c180601860e500010200640010fe000dc18c61800018060c1f8018006003fe00480c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f8300c0307f9fe0c1860c030601fe618300c030601867f9fe60180601fe0c180601fe0e500010200
640010fe000dc1fe618003f8060c060018003f03fe00480c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601860e500010200640010fe000dc00c61800618060c060018000183fe00480c0306
198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601860e500010200640010fe000dc00c61800619860c060618000183fe00480c0306198661986000300c0300c186619860c0300c1866198661830619
8661986618300c186618300c030619860c1860c03061986618300c030619866198661986619860c186619860e500010200640010fe000dc00c33000618cc0c060330386183fe00480c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc618300c030619860c0ce0c0303318633830
0c030330ce61986330cc331860c0cc331860e500010200645d10000007f80c1e0003e8787f8601e0383f1fe000000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c078618300c030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e1860e5000102000b0010
ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102007d1110000001e03001000001fe1e0067f8000003fe00660c1fe0c0780c030001fe7f9fe7f8780c0787f9fe7f8301
e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e0300c0780c1fe1e0307f9fe000780c1fe1e1fe1e0787f8781e0780c0787f82007d0d1000000330700300000180330066fe000007fe00660c0301e0cc1e078000300c0300c0cc1e0cc0c0300c078330
781e0303307833078330300c0cc1e0300c0301e0780c0cc0c03033078330300c030330cc1e078330cc330780c0cc330781e0cc1e030330780c030000cc1e03033030330cc0c0cc330cc1e0cc0c02007d0d1000000618f00700000180618066fe00000ffe00660c03033186330cc000300c0300c186331860c0300c0cc618cc
33030618cc618cc618300c186330300c030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc3318633030618cc0c030001863303061830619860c18661986331860c02007d0d1000000619b00f00000180600066fe00001bfe00660c03033180330cc330300c0300c180331800c0300c0cc600cc33
030600cc600cc600300c180330300c030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc3318033030600cc0c030001803303060030601800c18060180331800c02007d0010fe000919301b00000180600066fe000013fe00660c0306198061986330300c0300c180619800c0300c186601866183
06018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d0010fe000d1830330003f1b86e0766e0003f03fe00660c03061980619861e0300c0300c180619800c0300c18660186618306
018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d0010fe000d303063000619cc730ce730006183fe00660c0306199e619867f8300c0300c1806199e0c0300c1866798661830601
8667986678300c180618300c030619860c19e0c03060186678300c0306019e6198660180601860c180601866198061830601860c0307f9806183060030601800c18060180619800c02007d0010fe000de030630000180661986018006003fe00660c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe
619fe618300c1807f8300c0307f9fe0c1860c030601fe618300c030601867f9fe60180601fe0c180601fe7f9807f830601fe0c030001807f83060030601800c180601807f9800c02007d111000000180307f8003f80661986018003f03fe00660c0306198661986330300c0300c180619860c0300c18661986618306018661
986618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d11100000030030030006180661986018000183fe00660c0306198661986330300c0300c180619860c0300c1866198661830601866198
6618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d11100000060030030006198661986618000183fe00660c0306198661986000300c0300c186619860c0300c186619866183061986619866
18300c186618300c030619860c1860c03061986618300c030619866198661986619860c186619866198661830619860c030001866183061830619860c18661986619860c02007d1110000006003003000618cc330ce330386183fe00660c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338
300c0cc618300c030619860c0ce0c03033186338300c030330ce61986330cc331860c0cc33186618cc61830331860c030000cc6183033030330cc0c0cc330cc618cc0c02007d7b10000007f9fe030003e8781e0761e0383f1fe000000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e830
0c078618300c030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e18661878618301e1860c03000078618301e0301e0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102
000b0010ed00000c9d000102000b0010ed00000c9d000102007d0010fe000dc0787f800001fe0c180010000003fe00660c1fe0c0780c030001fe7f9fe7f8780c0787f9fe7f8301e0300c1fe1e0301e0301e1fe7f8780c1fe001fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e0300c0780c1fe1e0
307f9fe7f8780c1fe1e1fe1e0787f8781e0780c0787f82007d1110000001c0cc01800001801c180030000007fe00660c0301e0cc1e078000300c0300c0cc1e0cc0c0300c078330781e0303307833078330300c0cc1e030000301e0780c0cc0c03033078330300c030330cc1e078330cc330780c0cc330781e0cc1e03033078
0c0300c0cc1e03033030330cc0c0cc330cc1e0cc0c02007d1110000003c18601800001803c18007000000ffe00660c03033186330cc000300c0300c186331860c0300c0cc618cc33030618cc618cc618300c1863303000030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc3318633030618cc0c
0300c1863303061830619860c18661986331860c02007d1110000006c18003000001806c1800f000001bfe00660c03033180330cc330300c0300c180331800c0300c0cc600cc33030600cc600cc600300c1803303000030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc3318033030600cc0c03
00c1803303060030601800c18060180331800c02007d1110000004c18003000001804c1801b0000013fe00660c0306198061986330300c0300c180619800c0300c18660186618306018660186600300c1806183000030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300
c1806183060030601800c18060180619800c02007d0010fe000dc1b8060003f1b80c1b8330003f03fe00660c03061980619861e0300c0300c180619800c0300c18660186618306018660186600300c1806183000030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300c1
806183060030601800c18060180619800c02007d1110001fe0c1cc06000619cc0c1cc630006183fe00660c0306199e619867f8300c0300c1806199e0c0300c18667986618306018667986678300c180618307f830619860c19e0c03060186678300c0306019e6198660180601860c180601866198061830601860c0300c180
6183060030601800c18060180619800c02007d0010fe000dc1860c000018060c186630006003fe00660c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f830000307f9fe0c1860c030601fe618300c030601867f9fe60180601fe0c180601fe7f9807f830601fe0c0300c1807f
83060030601800c180601807f9800c02007d0010fe000dc1860c0003f8060c1867f8003f03fe00660c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c1806183000030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c0300c1806183
060030601800c18060180619800c02007d0010fe000dc18618000618060c186030000183fe00660c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c1806183000030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c0300c180618306
0030601800c18060180619800c02007d0010fe000dc18618000619860c186030000183fe00660c0306198661986000300c0300c186619860c0300c18661986618306198661986618300c1866183000030619860c1860c03061986618300c030619866198661986619860c186619866198661830619860c0300c18661830618
30619860c18661986619860c02007d0010fe000dc0cc30000618cc0c186030386183fe00660c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc6183000030619860c0ce0c03033186338300c030330ce61986330cc331860c0cc33186618cc61830331860c0300c0cc6183033030
330cc0c0cc330cc618cc0c02007d7b10000007f878300003e8787f986030383f1fe000000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c0786183000030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e18661878618301e1860c0300c078618301e0301e
0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200790010fa007301e078618787f9861e1861e0000c1fe0c0780c030001fe7f9f
e7f8780c0787f9fe7f8301e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e0300c0780c1fe1e0307f9fe7f8780c1fe1e1fe1e0787f8781e0780c0787f8200790010fa00730330cc718cc601c633186330000c0301e0cc1e078000300c0300c0cc1e0c
c0c0300c078330781e0303307833078330300c0cc1e0300c0301e0780c0cc0c03033078330300c030330cc1e078330cc330780c0cc330781e0cc1e030330780c0300c0cc1e03033030330cc0c0cc330cc1e0cc0c0200790010fa007306198671986601c661986618000c03033186330cc000300c0300c186331860c0300c0c
c618cc33030618cc618cc618300c186330300c030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc3318633030618cc0c0300c1863303061830619860c18661986331860c0200790010fa007306018679980601e660186600000c03033180330cc330300c0300c180331800c0300c0cc600cc3303
0600cc600cc600300c180330300c030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc3318033030600cc0c0300c1803303060030601800c18060180331800c0200790010fa007306018679980601e660186600000c0306198061986330300c0300c180619800c0300c1866018661830601866018
6600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa00730601866d8c0601b630186300000c03061980619861e0300c0300c180619800c0300c18660186618306018660186600300c18
0618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa00730601866d8787e1b61e1861e0000c0306199e619867f8300c0300c1806199e0c0300c18667986618306018667986678300c180618300c03
0619860c19e0c03060186678300c0306019e6198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa00730601866780c6019e03186030000c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f8300c0307f9fe0c18
60c030601fe618300c030601867f9fe60180601fe0c180601fe7f9807f830601fe0c0300c1807f83060030601800c180601807f9800c0200790010fa0073060186678066019e01986018000c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c0306018
6618300c030601866198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa0073060186638066018e01986018000c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c03060186618300c03
0601866198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa0073061986639866018e61986618000c0306198661986000300c0300c186619860c0300c18661986618306198661986618300c186618300c030619860c1860c03061986618300c030619866198
661986619860c186619866198661830619860c0300c1866183061830619860c18661986619860c0200790010fa00730330cc618cc60186330cc330000c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc618300c030619860c0ce0c03033186338300c030330ce61986330cc3318
60c0cc33186618cc61830331860c0300c0cc6183033030330cc0c0cc330cc618cc0c0200790010fa007301e078618787f9861e0781e0000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c078618300c030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e18
661878618301e1860c0300c078618301e0301e0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200
0b0010ed00000c9d0001020007001f88ff01fe0007001f88ff01fe000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c
0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe
000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0
fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c03
0000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0
f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c
0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300
c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f0000102000b0010ed00000c9d00010200590010ed
00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c000
00030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f0000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed0000
0c9d000102000b0010ed00000c9d000102000b0010ed00000c9d0001020007001f88ff01fe0007001f88ff01fe00180010fc000006fe00010180fe000060fc00000c9d00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc90000
01fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de0001020024151000004010000600004001800200006000100400000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de0001020024151000018060000600
0180018001800060000c0300000cc9000001fa550040de00010200241510000300c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de00010200241510000c03000006000c0001800030006000018060000cc900
0002faaa00a0de000102002415100018060000060018000180001800600000c030000cc9000001fa550040de00010200241510000c03000006000c0001800030006000018060000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de000102002415100003
00c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000180600006000180018001800060000c0300000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de00010200241510000040100006000040018002000060001004
00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180
fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200180010fc000006fe00010180fe000060fc00000c9d0001020007001f88ff01fe0007001f88ff01fe
000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200340010ed00000cf700020300c0f70001780cf700020780c0f70001040cf700021fe0c0f70001780cf700021fe0c0f70003780c020033
0010ed00000cf700020701e0f70001cc1ef700020cc1e0f700010c1ef700021801e0f70001cc1ef6000161e0f70003cc1e0200360010ed00000cf700020f0330f80002018633f70002186330f700011c33f70002180330f80002018633f600016330f800040186330200360010ed00000cf700021b0330f80002018633f700
02186330f700013c33f70002180330f80002018033f60001c330f800040186330200370010ed00000cf70002130618f70002066180f700016618f700026c6180f80002180618f8000301806180f70001c618f800040186618200370010ed00000cf70002030618f70002066180f70001c618f70002cc6180f800021b8618f8
000301b86180f80002018618f70003cc618200390010ed00000cf70002030618f700020c6180f80002038618f80003018c6180f800021cc618f8000301cc6180f80002018618f7000378618200370010ed00000cf70002030618f70002386180f70001c618f80003018c6180f700016618f8000301866180f80002030618f7
0003cc618200380010ed00000cf70002030618f70002606180f700016618f8000301fe6180f700016618f8000301866180f80002030618f800040186618200350010ed00000cf70002030330f70001c033f70002186330f700010c33f600016330f80002018633f70002060330f800040186330200370010ed00000cf70002
030330f80002018033f70002186330f700010c33f70002186330f80002018633f70002060330f800040186330200350010ed00000cf700020301e0f8000201801ef700020cc1e0f700010c1ef700020cc1e0f70001cc1ef700020c01e0f70003cc1e0200350010ed00000cf700021fe0c0f8000201fe0cf700020780c0f700
010c0cf700020780c0f70001780cf700020c00c0f70003780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200100010ed00000cad0001ffc0f300010200100010ed00000cad0001ffc0f3000102007d0010fe0002
01fe01fe0007041e0001e0000003fe00660c1fe0c0780c0307f9fe7f9fe1e0301e1fe7f9fe0c0780c0307f8780c0780c0787f9fe1e0307f9fe7f8300c1fe1e1fe7f8780c0787f9fe7f8781e0300c0781e0780c1fe1e0780c0301e0307f8780c1fe7f9fe00078f3dfe1e1fe1e0787f8781e0780c0787f82007d0010fe000201
8003fe00070c33000330000007fe00660c0301e0cc1e0780c0300c03033078330300c0301e0cc1e0780c0cc1e0cc1e0cc0c030330780c0300c0781e030330300c0cc1e0cc0c0300c0cc330781e0cc330cc1e030330cc1e078330780c0cc1e0300c030000cce1c3033030330cc0c0cc330cc1e0cc0c02007d0010fe00020180
07fe00071c6180061800000ffe00660c03033186330cc0c0300c030618cc618300c03033186330cc0c18633186331860c030618cc0c0300c0cc33030618300c186331860c0300c186618cc33186619863303061986330cc618cc0c186330300c03000186ccc3061830619860c18661986331860c02007d0010fe000201800f
fe00073c6000061800001bfe00660c03033180330cc0c0300c030600cc600300c03033180330cc0c18033180331800c030600cc0c0300c0cc33030600300c180331800c0300c180600cc33180601803303060180330cc600cc0c180330300c03000180ccc3060030601800c18060180331800c02007d0010fe000201801bfe
00076c60000018000013fe00660c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c18661830600300c180619800c0300c180601866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007d0010fe000d01b8330003
f0cc6e078030003f03fe00660c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c18661830600300c180619800c0300c180601866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007d0010fe000d01cc63000619
8c730cc0e0006183fe00660c0306199e619860c0300c03060186678300c0306199e619860c1806199e6199e0c030601860c0300c18661830678300c1806199e0c0300c180679866198060180618306018061986601860c180618300c0307f9809e43060030601800c18060180619800c02007c0010fd000c06630000198c61
986030006003fe00660c0307f9867f9fe0c0300c030601fe618300c0307f9867f9fe0c1807f9867f9860c030601fe0c0300c1fe7f830618300c1807f9860c0300c180619fe7f980601807f830601807f9fe601fe0c1807f8300c030001808043060030601800c180601807f9800c02007c0010fd000c067f8003f9fe619fe0
18003f03fe00660c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c180619866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007c0010fd000c06030006180c6198061800
0183fe00660c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c180619866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007d0010fe000d0186030006180c619806180001
83fe00660c03061986619860c0300c03061986618300c03061986619860c18661986619860c030619860c0300c18661830618300c186619860c0300c186619866198661986618306198661986619860c186618300c030001869e43061830619860c18661986619860c02007c0010fd000ccc030006180c330c6330386183fe
00660c030618ce619860c0300c03033186338300c030618ce619860c0cc618ce618ce0c030331860c0300c18661830338300c0cc618ce0c0300c0cc33986618cc330cc61830330cc61986331860c0cc618300c030000cc9e43033030330cc0c0cc330cc618cc0c02007c0010fd007678030003e80c1e07c1e0383f1fe00000
0c0306187a619860c0300c0301e1861e8300c0306187a619860c0786187a6187a0c0301e1860c0300c186618301e8300c0786187a0c0300c0781e986618781e078618301e078619861e1860c078618300c030000789e4301e0301e0780c0781e078618780c0200100010ed00000cad0001ffc0f300010200100010ed00000c
ad0001ffc0f300010200100010ed00000cad0001ffc0f300010200100010ed00000cad0001ffc0f300010200100010ed00000cad0001ffc0f3000102000b0010ed00000c9d000102000b0010ed00000c9d00010200790010fa007301e078618787f9861e1861e0000c1fe0c0780c0307f9fe7f9fe1e0301e1fe7f9fe0c0780
c0307f8780c0780c0787f9fe1e0307f9fe7f8300c1fe1e1fe7f8780c0787f9fe7f8781e0300c0781e0780c1fe1e0780c0301e0307f8780c1fe7f9fe000780c1fe1e1fe1e0787f8781e0780c0787f8200790010fa00730330cc718cc601c633186330000c0301e0cc1e0780c0300c03033078330300c0301e0cc1e0780c0cc1
e0cc1e0cc0c030330780c0300c0781e030330300c0cc1e0cc0c0300c0cc330781e0cc330cc1e030330cc1e078330780c0cc1e0300c030000cc1e03033030330cc0c0cc330cc1e0cc0c0200790010fa007306198671986601c661986618000c03033186330cc0c0300c030618cc618300c03033186330cc0c18633186331860
c030618cc0c0300c0cc33030618300c186331860c0300c186618cc33186619863303061986330cc618cc0c186330300c030001863303061830619860c18661986331860c0200790010fa007306018679980601e660186600000c03033180330cc0c0300c030600cc600300c03033180330cc0c18033180331800c030600cc0
c0300c0cc33030600300c180331800c0300c180600cc33180601803303060180330cc600cc0c180330300c030001803303060030601800c18060180331800c0200790010fa007306018679980601e660186600000c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c1866
1830600300c180619800c0300c180601866198060180618306018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa00730601866d8c0601b630186300000c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c18661830600300
c180619800c0300c180601866198060180618306018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa00730601866d8787e1b61e1861e0000c0306199e619860c0300c03060186678300c0306199e619860c1806199e6199e0c030601860c0300c18661830678300c1806199e0
c0300c180679866198060180618306018061986601860c180618300c0307f9806183060030601800c18060180619800c0200790010fa00730601866780c6019e03186030000c0307f9867f9fe0c0300c030601fe618300c0307f9867f9fe0c1807f9867f9860c030601fe0c0300c1fe7f830618300c1807f9860c0300c1806
19fe7f980601807f830601807f9fe601fe0c1807f8300c030001807f83060030601800c180601807f9800c0200790010fa0073060186678066019e01986018000c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c18061986619806
0180618306018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa0073060186638066018e01986018000c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c180619866198060180618306
018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa0073061986639866018e61986618000c03061986619860c0300c03061986618300c03061986619860c18661986619860c030619860c0300c18661830618300c186619860c0300c1866198661986619866183061986619866
19860c186618300c030001866183061830619860c18661986619860c0200790010fa00730330cc618cc60186330cc330000c030618ce619860c0300c03033186338300c030618ce619860c0cc618ce618ce0c030331860c0300c18661830338300c0cc618ce0c0300c0cc33986618cc330cc61830330cc61986331860c0cc6
18300c030000cc6183033030330cc0c0cc330cc618cc0c0200790010fa007301e078618787f9861e0781e0000c0306187a619860c0300c0301e1861e8300c0306187a619860c0786187a6187a0c0301e1860c0300c186618301e8300c0786187a0c0300c0781e986618781e078618301e078619861e1860c078618300c0300
0078618301e0301e0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102
0007001f88ff01fe00028500028500a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.4\tab A typical display from the join editor in XBAP.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.8\tab Selecting primers and templates\par
\par
\pard\plain \qj \f4\fs16 {\plain \f20 1. Select "Edit contig". The primer and template selection function is available from the popup menu of the contig editor.\par
}\pard \qj {\plain \f20 \par
}\pard \qj {\plain \f20 2. Open the oligo selection window, by selecting "Select Oligo" from the contig editor popup menu.\par
}\pard \qj {\plain \f20 \par
}\pard \qj {\plain \f20 3. Position the cursor where you want the oligo to be chosen. While the oligo selection window is visible, you still have complete control over positioning and editing within the contig editor.\par
}\pard \qj {\plain \f20 \par
}\pard \qj {\plain \f20 4. Indicate the strand for which you require an oligo. This is done by toggling the direction arrow ("----->" or "<------"), if necessary.\par
}\pard \qj {\plain \f20 \par
}\pard \qj {\plain \f20
5. Press the "Find Oligos" button to find all suitable oligos (see "Oligo selection" in Note 17). Information for the oligo closest to the cursor position is given in the output text window. In the contig editor the position of the oligo is marked by a
temporary tag on the consensus. The window is recentered if the oligo is off the screen. Selecting "Display Selection Information" will print a short report on the numbers of oligos considered and rejected during oligo selection. \par
}\pard \qj {\plain \f20 \par
}\pard \qj {\plain \f20 6. If this oligo is not suitable (it may have been previously chosen, and found to be unsuitable by experimentation, say), the next closest oligo can be viewed by pressing "Select Next". \par
}\pard \qj {\plain \f20 \par
}\pard \qj {\plain \f20
7. Suitable templates are automatically identified for the currently displayed oligo (See "Template selection" in Note 18.) By default, the template is that closest to the oligo site. If the choice is not suitable (it may be known to be a poor quality
template, say) another can be chosen from the "Choose Template for this Oligo" menu. Templates that do not appear on the menu can be specified by selecting "other". However, the template must be on the correct strand and be upstream of the oligo. \par
}\pard \qj {\plain \f20 \par
}\pard \qj {\plain \f20
8. A tag can be created for the current oligo by pressing the button "Create a tag for this oligo". The annotation for this tag holds the name of the template and the oligo primer sequence. There are fields to allow the user to specify their own primer
name ("serial#") and comments ("flags") for this tag. An example of oligo tag annotation\: \par
}\pard \qj {\plain \f20 \par
serial#= \par
template=a16a9.s1 \par
sequence=CGTTATGACCTATATTTTGTATG \par
flags=\par
\par
}\pard \qj {\plain \f20 9. The oligo selection window is closed when "Create a tag for this oligo" or "Quit" is selected. \par
}\pard \qj {\plain \f20 \par
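The tag annotation shown in step 8 is a set of name=value lines, so it is easy to read back with a short script. The Python sketch below is only an illustration\: the function name is hypothetical, and it assumes one field per line with everything after the first "=" taken as the value.\par
\f4\fs16
```python
# Illustrative only: the function name and parsing rules are assumptions,
# not part of the program described in this manual.
def parse_oligo_annotation(text):
    """Split an oligo tag annotation into a dictionary of field name to value."""
    fields = dict()
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Everything before the first "=" is the field name.
        name, _, value = line.partition("=")
        fields[name.strip()] = value.strip()
    return fields


# The example annotation from step 8; serial# and flags are left empty.
annotation = """serial#=
template=a16a9.s1
sequence=CGTTATGACCTATATTTTGTATG
flags="""

fields = parse_oligo_annotation(annotation)
print(fields["template"])
print(len(fields["sequence"]))
```
\par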
}\pard\plain \s6\qj\sa60\tx560\tx860 \b\f20 \par
\pard \s6\sa60\sl280\tx560\tx860 2.9\tab Examining the "quality" of a contig\par
\pard\plain \s4\qj\sa120\sl280 \f20 This function reports on the proportion of the consensus that is "well determined" and will display a sequence of symbols that indicate the quality
of the consensus at each position or produce a graphical display. Each strand of the contig is analysed separately using the consensus algorithm, and a position is declared "well determined" if it is assigned one of the symbols a,c,g,t. The current consen
sus calculation cutoff score is used.\par
\pard \s4\qj\sa120\sl280 A summary showing the percentage of the consensus that falls into each category of quality is shown. The analysis divides the data into five categories, assigning each a code as shown in figure 4.5. Code 0 means well determined on both strands and they agree, 1 means well determined on the plus strand only, 2 means well determined on the minus strand only, 3 means not well determined on either strand and 4 means well determined on both strands but they disagree. If the user chooses to have the data displayed graphically, the following scheme is used. A rectangular box is drawn so that the x coordinate represents the length of the contig. The box is notionally divided vertically into 5 possible levels, which are given the y values\: -2,-1,0,1,2. The quality codes assigned to each base position are plotted as rectangles. Each rectangle represents a region in which the quality codes are identical, so a single base having a different code from its immediate neighbours will appear as a very narrow rectangle. A single line at the mid-height therefore shows a perfect sequence. In figure 4.6 we show the result for the section of contig shown in figure 4.8.\par
\pard \s4\qj\sa120\sl280 \par
\par
\par
\par
\par
\pard \s4\qj\li1580\ri1760\sb160\sl280\box\brsp100\brdrth \tqc\tx2000\tqc\tx3960\tqc\tx6360 \tab {\b Strands\tab Quality\tab Y coordinates\par
}\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx2000\tqc\tx3960\tqc\tx6200 {\b \tab OK\tab code\par
}\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx1780\tqc\tx2120\tx2380\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab +\tab -\tab and the same\tab 0\tab 0\tab to\tab 0\par
\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx1780\tqc\tx2120\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab +\tab \tab 1\tab 0\tab to\tab 1\par
\tab -\tab \tab 2\tab -1\tab to\tab 0\par
\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx2120\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab neither\tab 3\tab -1\tab to\tab 1\par
\pard \s4\qj\li1580\ri1760\sa60\sl280\keepn\box\brsp100\brdrth \tqc\tx1780\tqc\tx2120\tx2400\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab +\tab -\tab but different \tab 4\tab -2\tab to\tab 2\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.5\tab The codes and coordinates used by the "Quality plot". \par
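\pard\plain \s4\qj\sa120\sl280 \f20 The coding scheme of figure 4.5 can be sketched in a few lines of Python. This is only an illustration, not the code used by the program\: the function names, the two per-strand consensus strings and the use of "-" for a position that is not well determined are all assumptions made for the example.\par
\pard\plain \sl220 \f4\fs16
```python
# Illustrative only: names and inputs are assumptions, not the program's code.
# Y range plotted for each quality code 0..4, as listed in figure 4.5.
Y_RANGE = [(0, 0), (0, 1), (-1, 0), (-1, 1), (-2, 2)]
WELL_DETERMINED = set("acgt")


def quality_code(plus, minus):
    """Return the code 0 to 4 for one position from the two strand symbols."""
    plus_ok = plus in WELL_DETERMINED
    minus_ok = minus in WELL_DETERMINED
    if plus_ok and minus_ok:
        return 0 if plus == minus else 4
    if plus_ok:
        return 1
    if minus_ok:
        return 2
    return 3


def quality_codes(plus_consensus, minus_consensus):
    """Code every position of two aligned per-strand consensus strings."""
    return [quality_code(p, m) for p, m in zip(plus_consensus, minus_consensus)]


def summary(codes):
    """Percentage of positions falling into each of the five categories."""
    return [100.0 * codes.count(c) / len(codes) for c in range(5)]


def rectangles(codes):
    """Merge runs of identical codes into (start, end, code), 1-based inclusive."""
    regions = []
    for pos, code in enumerate(codes, start=1):
        if regions and regions[-1][2] == code:
            regions[-1] = (regions[-1][0], pos, code)
        else:
            regions.append((pos, pos, code))
    return regions


# Hypothetical ten-base consensus for each strand.
codes = quality_codes("acgt-acgta", "acgta-cgtc")
print(codes)
print(summary(codes))
for start, end, code in rectangles(codes):
    print(start, end, code, Y_RANGE[code])
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 In this made-up example the positions coded 1, 2 and 4 each form a region one base long, which is the case that appears as a very narrow rectangle in the plot.\par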
\par
\pard\plain \li1500\ri1660\sb400\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 94.67 % OK on both strands and they agree(0)\par
\pard \li1500\ri1660\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth 0.67 % OK on plus strand only(1)\par
2.00 % OK on minus strand only(2)\par
2.67 % Bad on both strands(3)\par
0.00 % OK on both strands but they disagree(4)\par
\pard \li1500\ri1660\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth {\fs22 \par
}\pard \li1500\ri1660\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth 3310 3320 3330 3340 3350\par
0000000000 0000000000 0000000000 0000000000 0000000000\par
\par
3360 3370 3380 3390 3400\par
0020000000 0000000032 0000032000 0000000000 0300000030\par
\par
3410 3420 3430 3440 3450\par
\pard \li1500\ri1660\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth 0000000000 0010000000 0000000000 0000000000 0000000000\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.6\tab Listed output from "Examine Quality" showing the results for the section of contig displayed in figure 4.8.\par
\pard \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.10\tab Using graphical displays to examine contigs\par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs contain three graphical displays to aid the examination of contigs. The first simply gives an overview of all the contigs in the database and provides, with the use of a
crosshair, a mechanism for the other two displays to select contigs. One of these displays produces a schematic representation of each of the readings in a contig. The lines in the display show the relative positions of each reading and also their sense. T
he plot is divided vertically into two sections by a line that is identified by an asterisk drawn at each end. All lines that lie above this line represent readings that are in their original sense, all lines below show readings that are in the complementa
ry sense. The final graphical display is of the "quality" of the data as described above.\par
\pard \s4\qj\sa120\sl280
When these graphical displays are visible, users may employ a crosshair, moved by mouse or keyboard commands, to examine the data in more detail. Once the crosshair is positioned, typing one of the keyboard characters S, Q, N or Z makes the program, respectively, show the local aligned sequences in a text window, produce the quality plot, give the names of the nearest readings or zoom into the display. \par
\pard \s4\qj\sa120\sl280 A typical display of all three plots is shown in figure 4.7. The top rectangle shows a separate line for each of the project's contigs. The right-hand one is bisected by a vertical line, indicating that it has been selected by the user. The next rectangle below is divided by a horizontal line marked at each end by an asterisk. Each of the other horizontal lines in the box represents one of the selected contig's gel readings. Those above the dividing line are in their original orientation; those below have been complemented. The box below is also divided by a horizontal line and shows the "quality" for each base in the contig. Rectangular areas marked above the central line show sections that only have a good consensus on the minus strand, and rectangles below show good sections from the other strand. Places where the vertical lines reach the top and bottom of the box show disagreements between the two strands. Places with only the midline have a good consensus on both strands.\par
\pard\plain \li80\sl220\keepn\tx720 \f4\fs16 {{\pict\macpict\picw441\pich231
237effffffff00e601b81101a0008201000affffffff00e601b8090000000000000000310000000000e501b79800780000000001f103bb0000000001f103bb0000000000e501b70001028900028900028900028900028900090100158e550054ff000901000f8eff00f0ff000d0100089b000008f5000010ff000d0100089b
000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b00
0008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b0000
08f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff005d14000801ffc0003ff803ff801ffc00007fff003fff80fb00003ffdff0ef000007fffe0
003ffc03ffff00001ffdff00c0fd00003ffdff03f000007ff8ff04f003fffffefe00003ffcff00f0fc00000ffdff00fcfd000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff
000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff00
0d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d
0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff005813000fff007fffe00ffe00fff007ffffc001ffe000faff00e0fd000e1fffffc0003fffe007fe0001fffff0fd00007ffdff00e0fd00031fffffc0f800041ffe000003feff00e0fc00001ffcff00f8fd000007f0ff00
f0ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010
ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff
000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000901000f8eff00f8ff0009010008
8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009
0100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0018010008ba00007ffaff038000003ffaff00
f8e9000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00
090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001d010008c40000
1ffbff00e0f7000007faff00fcef000007f9ff00f0ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009010008
8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001d
010008e2000307fffffcd600000ffaff00feee000007faff01fe10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010
ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e
000010ff00090100088e000010ff001f010008fc000001fbff00c0c000007ffaff00c0ee000007fbff02fe0010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00
090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e0000
10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001a02000801f9ff00f8c000faff02c00000faff00f0eb000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e0000
10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100188e000030ff00090100188e000030ff00090100188e000030ff00090100188e00023000000b020718
e09000030e31c0000b020718e09000030e31c0000b020789e09000030f13c0000a0100ff8f000101feff00090100188e000030ff000901003f8eff02f800000b0201db8090000303b700000b020799e09000030f33c0000b020718e09000030e31c0000b020618609000030c30c000090100188e000030ff000901003c8e00
0078ff00090100188e000030ff00090100188e000030ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100
088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001d01000ffaff00c0d5000007f0ff00e0f3000001f0ff00e0f6000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00
090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e0000
10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001c02000803fbff00f8c400007ff0ff0200000ff9ff0080f1000010ff00090100088e000010ff00090100088e000010ff00090100088e
000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff000901
00088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001f010008f800f1ff00c0d500007ffbff00c0ed00003ffbff00c0f8000010ff00090100088e000010ff00090100088e0000
10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009010008
8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001f010008e900007ff3ffe2000007fbff00fcea00007ffaff00f0fc000010ff
00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e00
0010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0019010008dc000001f2ff00feef
000003faff0080df000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009010008
8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0019
010008db00007ffbff00f0de000003faff00c0e8000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff000901
00088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff
000901000f8eff00f0ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001
f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce
000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f800
0008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff00190100
08f8000008ce000001f3000010fc000004e1000010ff004203000fffb0fa00034800027fefff03fc02065fe7ff03f3800001fe00133fffffc201880000105177000001042408006002fe0001425ff4ff00fcfb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe001320
00244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410
ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe0013200024420188
0000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff00520300
0873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177
000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa00
0348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe001320002442018800001051770000010424
08006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240
ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe
00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402
065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe
000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc0002
6c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe0003010000
40fe0003e0000004fb0002400410ff000901000f8eff00f0ff005703000873b8fc00052800c8000240ef00030402067cfc00026c00c0ef000313800001fe00136000244201880800105177000001043408086002fe000163f0fe000301000040fe0006e0000004000010fe0002400410ff004c03000852b8fc00022800c8e9
000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe00
00e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc0012400024000188
08001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc0002
2800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe00030100
0040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc00124000
2400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852
b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe
000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc
001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c
03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd00
0163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004a03000852b8fc00022fffc8e9000020fc00026c0040ef00
010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc00001ffcff00f0ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010
ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1
000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc00
0004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f30000
10fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff000901000f8eff00f0ff00028900028900028900028900028900028900a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.7\tab A typical graphical display from XBAP or SAP.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \par
2.11\tab Disassembling contigs\par
\pard\plain \s4\qj\sa120\sl280 \f20
Sometimes it is necessary to drastically alter contigs. We may need to break a contig in two, remove a single reading, remove a whole set of consecutive readings from a contig, or remove a set of readings from the database independent of which contigs they
are in. \par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.11.1\tab Removing a single reading\par
\pard\plain \s4\qj\sa120\sl280 \f20 This function is found in the "Alter relationships" menu. The user types in the number of the reading to be removed. If the reading is required to hold the contig together - i.e. is the only one cove
ring a particular region - the program will create an extra contig consisting of the data to the right of the removed reading. The original contig will be shortened accordingly.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.11.2\tab Removing a set of readings\par
\pard\plain \s4\qj\sa120\sl280 \f20 This function is called "Disassemble readings" and can remove any group of readings from a database. It works in two modes\:
1. A set of adjacent readings in a contig can be removed by the user naming the two end ones (the left one first); 2. A set of readings from any number of contigs can be remove
d by the user giving the name of a file that contains their names. In both modes the program cleans up the database by moving data to fill up any holes made in the files.\par
For both modes of operation the program requests a file of file names. If the user creates their own file (i.e. mode 2) each reading name must be on a separate line of the file. For mode 1 the user names the leftmost then the rightmost reading for removal.
They MUST be in left to right order. They and all intervening readings will be removed. For both modes, if necessary, new contigs will be created. \par
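\pard\plain \s4\qj\sa120\sl280 \f20 Mode 1 can be pictured with a small Python sketch. It assumes, purely for illustration, that a contig is held as a list of reading names in left to right order; the function name and this representation are hypothetical and say nothing about how the database files are actually organised.\par
\pard\plain \sl220 \f4\fs16
```python
# Illustrative only: a contig is modelled as a list of reading names in
# left to right order. This is an assumption, not the database layout.
def readings_to_remove(contig, left, right):
    """Return the named end readings and everything between them."""
    start = contig.index(left)
    end = contig.index(right)
    if start > end:
        raise ValueError("readings must be named in left to right order")
    return contig[start:end + 1]


# Reading names borrowed from figure 4.8; their order here is made up.
contig = ["L3.SEQ", "A21A7.S1", "A16A2.S1"]
selected = readings_to_remove(contig, "A21A7.S1", "A16A2.S1")

# One name per line is the layout a user-made file for mode 2 needs.
for name in selected:
    print(name)
```
\par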
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.11.3\tab Breaking a contig\par
\pard\plain \s4\qj\sa120\sl280 \f20
This function is found in the "Alter relationships" menu. It can be used to break a contig at the beginning of a particular reading so that the identified reading becomes the left end of a new contig. The user types in the number of the reading that will b
ecome the left end.\par
\pard \s4\qj\sa120\sl280 \par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.12\tab Shuffling pads\par
\pard\plain \s4\qj\sa120\sl280 \f20 One weakness of the assembly routine is that padding characters introduced to line up the readings are not always aligned with the pads in other sequences\:
a single problem such as a compression can give rise to pads apparently randomly arranged in the different readings covering the region. This function attempts to shuffle the pads around so that they align with one another, h
ence simplifying editing. No information is lost in the process\: only the positions of padding characters are changed. The function is best used prior to editing.\par
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.13\tab Displaying a contig\par
\pard\plain \s4\qj\sa120\sl280 \f20 The "Display a contig" option shows the aligned readings for any par
t of a contig. Users select "Display a contig", then select the contig. The number, name and strandedness of each reading is shown and the consensus is written below. A typical example, showing part of a contig from positions 3301 to 3450, is seen in figu
re 4.8. Overlapping this region are readings 3, 40, 8, 37, 35 and 2, with archive names L3.SEQ, A21A7.S1 and so on. Readings 3, 8, 35 and 2 are in reverse orientation as indicated by the minus signs. There are a few padding characters in the working versio
ns, but the consensus (shown below each page width) has a definite assignment for every position except 3376. \par
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.14\tab Highlighting differences between readings and the consensus\par
\pard\plain \s4\qj\sa120\sl280 \f20
During the latter stages of a project this option is used to highlight disagreements between individual gel readings and their consensus sequences. Typical output is seen in figure 4.9, which shows the result for the section of contig shown in figure 4.
8. Characters that agree with the consensus are shown as + symbols for the plus
strand and - for the minus strand. Characters that disagree with the consensus are left unchanged and so stand out clearly. Note that a similar display is now more conveniently available within the contig editor.\par
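The masking rule described above can be illustrated with a minimal sketch. It assumes that a reading and the consensus are supplied as aligned strings of equal length and that agreement means an exact character match; the function name "highlight" is hypothetical and is not part of SAP or XBAP.\par

```python
def highlight(reading, consensus, reverse=False):
    """Mask the characters of a reading that agree with the consensus.

    Agreeing characters become '+' (plus strand) or '-' (minus strand).
    Disagreeing characters are left unchanged so that they stand out.
    Assumption: both strings cover the same aligned positions.
    """
    good = "-" if reverse else "+"
    out = []
    for base, cons in zip(reading, consensus):
        out.append(good if base == cons else base)
    return "".join(out)


# Reading A21A7.S1 and the consensus for positions 3351 onwards (figure 4.8).
consensus = "gatctgaccaagcgacag*tttaaa-gtgctgctt"
reading = "gatctgaccaagcgacag*gttaaagttgctgctt"
print(highlight(reading, consensus))
```

For this reading the sketch leaves only the three disagreeing characters (g, g and t) visible, as in the corresponding line of figure 4.9.\par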
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Set the consensus cutoff score.\par
2.\tab Redirect output to disk.\par
3.\tab Display the contig.\par
4.\tab Close the redirection file.\par
5.\tab Select "Highlight disagreements".\par
6.\tab Define the name of the redirection file.\par
7.\tab Define an output file name.\par
8.\tab Select a symbol for good plus strand data.\par
9.\tab Select a symbol for good minus strand data.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \page \par
\pard\plain \li760\ri760\sl220\box\brsp100\brdrth \tqr\tx8240 \f4\fs16 10.\tab Print the file.{\plain \f20 \par
}\pard \li760\ri760\sl220\box\brsp100\brdrth \tqr\tx8240 \tab 3310 3320 3330 3340 3350\par
\pard \li760\ri760\sl220\box\brsp100\brdrth -3\tab L3.SEQ \tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
40\tab A21A7.S1\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
-8\tab A16A2.S1\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
37\tab A21A2.S1\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
\tab CONSENSUS\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
\par
\tab 3360 3370 3380 3390 3400\par
-3\tab L3.SEQ\tab gatctgaccaagcgacag*tttaaa*gtgctgcttgccatt*ctgcgt*a\par
40\tab A21A7.S1\tab gatctgaccaagcgacag*gttaaagttgctgctt\par
-8\tab A16A2.S1\tab gatctgaccaagcgacag*tttaaa*gtgctgcttgccatt*ctgcgt*a\par
37\tab A21A2.S1\tab ga-ctgaccaagcgacag*tttaaa*gtgctgcttgccatt*ctgcgt*a\par
35\tab A16D12.S1\tab gttttaaa-gtgctgcttgccatttctgcgtaa\par
-2\tab L2.SEQ\tab t*ctgcgt*a\par
\tab CONSENSUS\tab gatctgaccaagcgacag*tttaaa-gtgctgcttgccatt*ctgcgt*a\par
\par
\tab 3410 3420 3430 3440 3450\par
-3\tab L3.SEQ\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
-8\tab A16A2.S1\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
37\tab A21A2.S1\tab aaacctatgggtgggaataaaccaatggacagaatcaccgattctcaact\par
35\tab A16D12.S1\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
-2\tab L2.SEQ\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
\pard \li760\ri760\sl220\box\brsp100\brdrth \tab CONSENSUS\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.8\tab Typical output from "Display contig".\par
\pard\plain \li840\ri940\sb320\sl220\box\brsp100\brdrth \f4\fs16 3310 3320 3330 3340 3350\par
\pard \li840\ri940\sl220\box\brsp100\brdrth -3 L3.SEQ --------------------------------------------------\par
40 A21A7.S1 ++++++++++++++++++++++++++++++++++++++++++++++++++\par
-8 A16A2.S1 --------------------------------------------------\par
37 A21A2.S1 ++++++++++++++++++++++++++++++++++++++++++++++++++\par
atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
\par
3360 3370 3380 3390 3400\par
-3 L3.SEQ -------------------------*------------------------\par
40 A21A7.S1 +++++++++++++++++++g+++++gt++++++++\par
-8 A16A2.S1 -------------------------*------------------------\par
37 A21A2.S1 ++-++++++++++++++++++++++*++++++++++++++++++++++++\par
-35 A16D12.S1 -t----------------------t------a-\par
-2 L2.SEQ ----------\par
gatctgaccaagcgacag*tttaaa-gtgctgcttgccatt*ctgcgt*a\par
\par
3410 3420 3430 3440 3450\par
-3 L3.SEQ --------------------------------------------------\par
-8 A16A2.S1 --------------------------------------------------\par
37 A21A2.S1 ++++++++++++g+++++++++++++++++++++++++++++++++++++\par
-35 A16D12.S1 --------------------------------------------------\par
-2 L2.SEQ --------------------------------------------------\par
\pard \li840\ri940\sl220\keepn\box\brsp100\brdrth aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 \par
\pard \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 Figure 4.9\tab Typical output from "Highlight disagreements", showing the results for the section of contig displayed in figure 4.8.\par
\pard \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.15\tab Screen editing contigs in SAP\par
\pard\plain \s4\qj\sa120\sl280 \f20 When using SAP the best way for users to edit a whole contig interactively is to use their preferred external editor on the standard display of a contig. When the screen edit function is selected SAP writ
es a text file containing a display of the contig and passes it to an external editor - say EDT on the VAX or emacs on a UNIX system. The user modifies the file using the editor and when the editor is exited SAP moves the changed contig back into the proje
ct database.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Screen edit".\par
2.\tab Select the contig to edit.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define a temporary file for use by the editor. After a slight pause the editor will start and the first page of the contig will appear on the screen.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Edit the contig using the editor's standard commands.\par
5.\tab Exit from the editor.\par
6.\tab Accept "Put contig back into the database".\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.16\tab Automatic editing of contigs in SAP\par
\pard\plain \s4\qj\sa120\sl280 \f20
This function automatically changes characters in gel readings to make them agree with the consensus sequence. At first sight this may seem like an unethical procedure but as is explained in the notes it is quite legitimate and saves a great deal of time.
In figure 4.10 we show the effect of using autoedit on the section of contig displayed in figure 4.8. All changed characte
rs (for example position 3369, reading A21A7.S1) are denoted by uppercase letters. Note that apart from position 3375 which has an unresolved consensus all other changes have been made. These edits were made using a combined consensus for both strands, but
the standard version of the program treats each strand separately and will only make a change if the consensus for the two strands agree.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Redirect output to disk.\par
2.\tab Select "Display contig".\par
3.\tab Identify the contig to edit/display.\par
4.\tab Close the redirection file.\par
5.\tab Print the file containing the displayed contig.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Check the contig and the original films and annotate the printout to indicate the required edits.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Set the cutoff for the consensus calculation.\par
8.\tab Select "Auto edit".\par
9.\tab Identify the contig and the section to edit. \par
10.\tab The program will display a summary of changes made.\par
11.\tab Display the contig and compare it with the annotated printout.\par
12.\tab Use another editing method to finish the editing.\par
\pard\plain \li820\ri960\sl220\pagebb\box\brsp100\brdrth \f4\fs16 3310 3320 3330 3340 3350\par
\pard \li820\ri960\sl220\box\brsp100\brdrth -3 L3.SEQ atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
40 A21A7.S1 atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
-8 A16A2.S1 atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
37 A21A2.S1 atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
CONSENSUS atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
\par
3360 3370 3380 3390 3400\par
-3 L3.SEQ gatctgaccaagcgacagtttaaa*gtgctgcttgccattctgcgtaaaa\par
40 A21A7.S1 gatctgaccaagcgacagTttaaagGtgctg\par
-8 A16A2.S1 gatctgaccaagcgacagtttaaa*gtgctgcttgccattctgcgtaaaa\par
37 A21A2.S1 gaTctgaccaagcgacagtttaaa*gtgctgcttgccattctgcgtaaaa\par
-35 A16D12.S1 gtttaaa-gtgctgcttgccattctgcgtaaaa\par
-2 L2.SEQ tctgcgtaaaa\par
CONSENSUS gatctgaccaagcgacagtttaaa-gtgctgcttgccattctgcgtaaaa\par
\par
3410 3420 3430 3440 3450\par
-3 L3.SEQ cctatgggtggaataaaccaatggacagaatcaccgattctcaacttag\par
-8 A16A2.S1 cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par
37 A21A2.S1 cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par
-35 A16D12.S1 cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par
-2 L2.SEQ cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par
\pard \li820\ri960\sl220\keepn\box\brsp100\brdrth CONSENSUS cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc{\fs22 \par
}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.10\tab The result of applying the "Auto editor" to the section of contig displayed in figure 4.8.\par
\pard\plain \s6\sb400\sa60\sl280\tx560\tx860 \b\f20 2.17\tab Using the original editor in SAP\par
\pard\plain \s4\qj\sa120\sl280 \f20 This simple editor can insert, delete
and change gel reading sequences by performing one selected operation at a time. It is used during the interactive entry of new readings and interactive joining of contigs. The commands request the position at which the edit is required and the number of
characters to insert, delete or change.\par
\pard\plain \s5\sb400\sa160\sl320\tx560 \b\f20\fs28 3. NOTES\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
As each reading is entered into a project database it is given a unique number. The first is numbered 1, the second 2 and so on. Their original file names (known as "archives" because they are kept outsid
e the database and never edited) are also copied into the database. During assembly contigs are constantly being changed and reordered so the program identifies them by the numbers or names of the readings they contain. Whenever the program asks users to i
dentify a contig or reading they can type its number or its archive name. If they type its archive name they must precede the name by a slash "/" symbol to denote that it is a name rather than a number. For example if the archive name is fred.gel with numb
er 99, users should type /fred.gel or 99 when asked to identify the contig. Generally, when it asks for the reading to be identified, the program will offer the user a default name, and if the user types only return, that contig will be accessed. When a da
tabase is opened the default contig will be the longest one, but if another is accessed, it will subsequently become the current default. \par
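\tab The identification convention in this note can be sketched as follows. This is only an illustration: the function name "parse_identifier" and the form of its return value are hypothetical, and the real programs may treat replies differently.\par

```python
def parse_identifier(text, default=None):
    """Interpret a reply as an archive name, a reading number or the default.

    A leading slash marks an archive name, an empty reply (return only)
    selects the offered default, and anything else is read as a number.
    """
    reply = text.strip()
    if reply == "":
        return ("default", default)
    if reply.startswith("/"):
        return ("name", reply[1:])
    return ("number", int(reply))


print(parse_identifier("/fred.gel"))
print(parse_identifier("99"))
```

\tab Both replies in the example refer to the same reading, once by archive name and once by number.\par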
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab An XBAP database is made from five separate files\: the "archive names" file *.ARN, the "relationships" file *.RLN,
the "sequences" file *.SQN, the "tag" file *.TGN, and the "comments" file *.CCN. If the database is called FRED then version 0 of database FRED comprises files FRED.AR0, FRED.RL0, FRED.SQ0, FRED.TG0 and FRED.CC0. The version is the last symbol in the file
names. If the "copy database" option is used it will ask the user to define a new "version". The normal strategy is to use version 0 for all work and to use other versions as backups. Program SAP uses databases formed from only the first three of these f
iles. Normally the program is used to handle DNA sequences but many of the functions also work on protein sequences. The choice of sequence type is made when the database is started.\par
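\tab The file naming scheme can be summarised in a short sketch. The function name "database_files" and the flag "xbap" are hypothetical; only the five stems and the position of the version symbol are taken from the description above.\par

```python
# Stems for the archive names, relationships, sequences, tag and comments files.
STEMS = ("AR", "RL", "SQ", "TG", "CC")


def database_files(name, version="0", xbap=True):
    """List the files that make up one version of a project database.

    XBAP databases use all five files; SAP databases use only the first three.
    """
    stems = STEMS if xbap else STEMS[:3]
    return [name + "." + stem + version for stem in stems]


print(database_files("FRED"))
print(database_files("FRED", "1", xbap=False))
```

\tab The first call lists FRED.AR0, FRED.RL0, FRED.SQ0, FRED.TG0 and FRED.CC0; the second lists the three files of a SAP database stored as version 1.\par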
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab The vector sequence should be stored in a simple text file with up to 80 characters of data per line. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
Almost all readings are assembled automatically in their first pass through the assembly routine. Those that are not can be dealt with in two ways. Either they can be put through assembly again as single named rea
dings (Users should type n when asked "Use file of file names"), with the parameters set to allow the reading in. Or they can be entered through the assembly routine using the "Put all readings in new contigs" mode, and then joined to the contig they overl
ap using the Contig Joining Editor. If it is found that readings are not being assembled in their first pass through the assembler, then it is likely that the contigs require some editing to improve the consensus. Also it may be that poor quality data is b
eing used, possibly by users overinterpreting films or traces. In the long term it can be more efficient to stop reading early and save time on editing. For those using fluorescent sequencing machines the unused data can be incorporated after assembly.
\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Obviously we cannot use a script to operate a program that expects to be controlled by mouse clicks! The program BAP is an xterm version of XBAP which can be used from a script.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab There is a remote possibility of a join being missed by the "Find internal
joins" routine. If a small contig is wholly contained within a larger one, such that its ends are further than ("Probe length" - "Minimum initial match length") from the ends of the larger contig, and the consensus for the small contig lies to the left of
the consensus for the large contig, the overlap will not be discovered. (See the search strategy.)\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab For those using fluorescent sequencing machines and XBAP the combination of the contig editor and the graphical displays of consensus "quality" will probably
be sufficient for checking and editing contigs as everything can be done at the computer screen. For those using autoradiographs the facility to produce printouts of "display" and "highlight disagreements" options for use while checking films, and the aut
oedit command are most appropriate.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab
In general the quality of a reading deteriorates along the length of the gel and so it is also possible to use a length cutoff for the quality calculation. Only the data from the first section of each reading will be included in the calculation. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab
There are some limitations on the changes that can be made to the contigs when using the SAP screen editor. Alignments must be maintained during editing. Whole lines of sequence should not be deleted or added unless the order of the gel readings in the
contig is preserved. Each line in the contig display consists of gel reading numbers, their names and 50 character sections of sequence. Insertions are limited in the following way. No line of sequence can be extended rightwa
rds more than 5 characters beyond the end of a full length line (a full length line is 50 characters long). Only one character can be added to the left end of full length lines, but sections of sequence beginning further into a line can be extended leftwar
ds up to an equivalent position. Do not delete any non-sequence lines in the file. Before returning the contig to the database the program checks that the rules have been obeyed. If an error is found the number of the erroneous line in the file is displaye
d and the contig will not be changed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab
The following is a justification for using the auto edit function. The general strategy employed when collecting shotgun sequence data is to keep sequencing until the redundancy in the contigs is fairly high, and then to get a printout of a contig, che
ck problems against the films, note corrections on the printout, and make the changes using an interactive editor. In general the consensus is correct except for places where padding characters have been used to accomm
odate a single gel with an extra character, or where the consensus is a dash. The important point for the auto editor is that most edits simply make the gel readings conform to the consensus, or remove columns of pads. The auto editor does the following. 1)
calculates a consensus for the contig (or part of a contig) to be edited, and then uses this consensus to direct the editing of the contig in 3 stages 2) stage 1\:
find and correct all places where, if the order of two adjacent characters is swapped, they will both agree with the consensus (given that they did not match the consensus before). These corrections are termed "transpositions" 3) stage 2\:
find and correct all places where there is a definite consensus but the gel reading has a different character. These corrections are termed "changes". 4) stage 3\:
delete all positions in which the consensus is a padding character. These corrections are termed "deletions". All changed characters are shown in uppercase letters so it will be obvious which characters
have been assigned by the program (except for deletions). The number of each type of correction will be displayed.\par
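\tab The three stages can be sketched for a single reading as follows. This is a simplified illustration, not the program's code: it assumes the reading and the consensus are aligned strings of equal length, it takes "definite consensus" in stage 2 to mean one of a, c, g or t, and the function name "auto_edit" is hypothetical.\par

```python
def auto_edit(reading, consensus):
    """Edit one aligned reading towards the consensus in three stages."""
    read = list(reading)
    counts = dict(transpositions=0, changes=0, deletions=0)

    # Stage 1: swap two adjacent characters when neither matches the
    # consensus as it stands but both match once their order is swapped.
    i = 0
    while i < len(read) - 1:
        a, b = read[i], read[i + 1]
        ca, cb = consensus[i], consensus[i + 1]
        if a != ca and b != cb and a == cb and b == ca:
            read[i], read[i + 1] = b.upper(), a.upper()
            counts["transpositions"] += 1
            i += 2
        else:
            i += 1

    # Stage 2: where the consensus is a definite base and the reading
    # differs, overwrite the character (uppercase marks the edit).
    for i, cons in enumerate(consensus):
        if cons in "acgt" and read[i].lower() != cons:
            read[i] = cons.upper()
            counts["changes"] += 1

    # Stage 3: delete every position whose consensus is the padding character.
    kept = [ch for ch, cons in zip(read, consensus) if cons != "*"]
    counts["deletions"] = len(read) - len(kept)
    return "".join(kept), counts


edited, counts = auto_edit("agct*tc", "acgt*ac")
print(edited, counts["transpositions"], counts["changes"], counts["deletions"])
```

\tab In the example one transposition, one change and one deletion are made. Positions where the consensus is a dash are left untouched by all three stages.\par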
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 11.\tab
The "calculate consensus" function, the "display contig" routine, the contig editor and the "show quality" option use the rules outlined here to calculate a consensus from aligned gel readings. The consensus sequence can contain any of 6 possible symbols\:
a,c,g,t,* and -. The last symbol is assigned if none of the others makes up a sufficient proportion of the aligned characters at any pos
ition in the contig. The following calculation is used to decide which symbol to place in the consensus at each position. Each uncertainty code contributes a score to one of a,c,g,t,* and also to the total at each point. Symbols like r and y which don't co
rrespond to a single base type contribute only to the total at each point. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab Definite assignments, i.e. A,C,G,T,a,c,g,t,b,d,h,v,k,l,m,n,* = 1; probable assignments, i.e. 1,2,3,4 = 0.75; other uncertainty codes, including r,y,5,6,7,8,- = 0.1. A cutoff scor
e between 51 and 100% is set by the user. (When the program starts this is set to 75%.) At each position in the contig we calculate the total score for each of the 5 symbols a,c,g,t and * (denote these by Xi, where i=a,c,g,t or *), and also the sum of the
se totals (denote this by S). Then if 100 Xi / S > the cutoff for any i, symbol i is placed in the consensus; otherwise - is assigned. For the "examine quality" algorithm each strand is treated separately but the calculation is the same. \par
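\tab The calculation for a single position can be sketched as follows. The sketch is deliberately restricted: it scores only a,c,g,t and * as definite assignments (score 1) and r,y,5,6,7,8 and - as uncertainty codes that add 0.1 to the total only. The other codes listed above are omitted because their mapping to base types is not given in this note. The function name "consensus_symbol" is hypothetical.\par

```python
DEFINITE = "acgt*"      # each occurrence scores 1 for its own symbol
UNCERTAIN = "ry5678-"   # each occurrence adds 0.1 to the total only


def consensus_symbol(column, cutoff=75.0):
    """Return the consensus symbol for one column of aligned characters."""
    scores = dict.fromkeys(DEFINITE, 0.0)
    total = 0.0
    for ch in column:
        sym = ch.lower()
        if sym in scores:
            scores[sym] += 1.0
            total += 1.0
        elif sym in UNCERTAIN:
            total += 0.1
    if total == 0.0:
        return "-"
    # Symbol i is chosen when 100 Xi / S exceeds the cutoff.
    for sym in DEFINITE:
        if 100.0 * scores[sym] / total > cutoff:
            return sym
    return "-"


print(consensus_symbol("tttt"))
print(consensus_symbol("*g**-"))
```

\tab In the second call the pad scores 3 out of a total of 4.1, which is about 73% and so below the default cutoff of 75%; a dash is therefore assigned.\par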
12.\tab Databases can
become corrupted if the machine crashes so the programs contain a function "Check database for logical consistency" which checks to see if all the relational data is internally consistent. Some routines automatically perform this check before they start.
Users are advised to make frequent copies of their databases using the "Copy database" option. Note that if BAP is used in "execute with dialogue" mode the "Check logical consistency" function also creates a consensus for the whole database and scans it t
o find any regions which contain 15 dashes in 20 characters. Such a finding would indicate problems with the database.\par
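\tab The scan for regions rich in dashes can be sketched as a sliding window count. The sketch assumes that "15 dashes in 20 characters" means at least 15 dashes in any window of 20 consecutive consensus characters; the function name "dash_rich_regions" is hypothetical.\par

```python
def dash_rich_regions(consensus, window=20, threshold=15):
    """Return the start offsets of windows holding at least threshold dashes."""
    hits = []
    if len(consensus) < window:
        return hits
    count = consensus[:window].count("-")
    for start in range(len(consensus) - window + 1):
        if start > 0:
            # Slide by one: drop the character that left, add the one that entered.
            count -= consensus[start - 1] == "-"
            count += consensus[start + window - 1] == "-"
        if count >= threshold:
            hits.append(start)
    return hits


example = "a" * 20 + "-" * 15 + "a" * 20
print(dash_rich_regions(example))
```

\tab An empty list means that no suspicious region was found. Offsets are counted from 0.\par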
\pard \s7\qj\fi-560\li560\sa120\sl280\pagebb\tx560 13.\tab
We have covered many of the most important or complicated operations performed by SAP and XBAP, but several others have not been mentioned. These include those for creation of consensus sequence files for processing by other programs, and complementing
contigs, both of which are trivial. There is also a set of routines for fixing corrupted databases.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 14.\tab The VAX version of SAP will only a
llow one person to access a sequencing database at a time - producing an "unable to open database" error message if a second person tries. On UNIX machines there is no such check in program SAP so users need to make sure that simultaneous use does not occu
r. Otherwise the data will be corrupted. Program BAP prevents more than one person from using a database at any time. It does so using the following mechanism. When a user requests to open a particular copy (say 0) of a database (say DB) the program checks
for the existence of a file named DB_BUSY0 in the current directory. In normal circumstances, if the file exists, it indicates that somebody else is currently using the database and the program displays the message "Sorry database busy" and does not open
the files. If the file does not exist the program creates it and opens the database. When a user stops using the database (usually by quitting the program) the "busy file" is deleted, hence allowing others to use the database. If the program terminates abn
ormally the busy file will not be deleted and so the database will not be useable until the busy file is explicitly deleted using the rm command. Obviously it is dangerous to delete the file before checking if another user is using the database.\par
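\tab The busy file mechanism can be sketched as follows. Note one deliberate difference: the description above checks for the file and then creates it, whereas this sketch creates the file in a single exclusive step so that two users cannot both succeed. The function names are hypothetical; only the file name pattern (for example DB_BUSY0) is taken from the text.\par

```python
import os


def busy_file_name(database, version):
    """Name of the busy file for one copy of a database, e.g. DB_BUSY0."""
    return database + "_BUSY" + str(version)


def open_database(database, version, directory="."):
    """Try to claim the database; return the busy file path or None if busy."""
    path = os.path.join(directory, busy_file_name(database, version))
    try:
        # O_EXCL makes the call fail if the busy file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.close(fd)
    return path


def close_database(path):
    """Release the database by deleting its busy file."""
    os.remove(path)
```

\tab A caller that receives None would report that the database is busy. As with the real programs, a busy file left behind by an abnormal exit has to be removed by hand.\par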
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 15.\tab
After a run of the assembly routine, reading names can appear in the file of failed reading names for the following reasons. 1. The reading file was not found; 2. the reading file was too short (less than the minimum match length); 3. the reading appear
ed to matc
h somewhere but failed to align sufficiently well (too many padding characters or too high a percentage mismatch); 4. a reading of the same name was already present in the database; 5. the reading was entered but also appeared to match another contig and t
he join was not made. This can occur for two reasons\: a. because the overlap between the two contigs was too large, or b. because after the reading is entered into one contig a new consensus is calculated and compared to the other contig\:
it may then not match as well as it did originally, and the join will not be made.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab
We have recently devised our own file format (called SCF) for storing traces, sequences and confidence values for data produced by automated sequence readers (Dear and Staden, 1992). For ABI data these typically reduce the storage required to 30% of the
original. Data from the ABI 373A and the Pharmacia A.L.F. can be converted to this form using the program makeSCF. Note that A.L.F. files must first be processed by program alfsplit which s
plits the original data into one file per reading. Sequences can be extracted from SCF files in a form suitable for assembly by use of the program trace2seq. To locate and mark regions of a sequence from an automated sequence reader that are of too low a q
uality to be used for assembly we use the script clip-seqs. This script takes as input a file of reading file names. For each reading it renames the original file "original-filename~" and writes a new file called "original-filename" in which the poor quali
ty regions are marked.\par
\pard\plain \qj \f4\fs16 {\plain \f20 \par
}\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 17.\tab The oligo selection engine is the one used in the program OSP. It is described in some detail in\:
Hillier, L., and Green, P. (1991). The parameters controlling the selection of oligos can be changed in the "Oligo Selection Parameters" window. The weigh
ts controlling the scoring of selected oligos can be changed in the "Oligo Selection Weights" window. By default, the oligos are selected from a window that extends 40 bases either side of the cursor. The size and location of this
window relative to the cursor position can be changed in the "Parameters" window. In XBAP oligos are ranked according to their proximity to the cursor position, rather than by their scores. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 18.\tab For simplicity, each reading is considered to represent a template. In practice, many readings can be made off the same template. Suitable templates that are identified are those that\:
1. are in the appropriate sense, 2. have 5' ends that start upstream of the oligo, and 3. are sufficiently close to the o
ligo to be useful. This last criterion relates to the insert size for the subclones used for sequencing and the average reading length. A template is considered useful if a full reading can be made from it, taking into account both of these factors. The d
efault insert size is 1000 bases, and the default average reading length is 400 bases. These values can be changed in the "Parameters" window. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1982. Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. {\i Nucl. Acids Res}. {\b 10 }(15)\:4731-4751.\par
2.\tab Staden, R. 1990. An improved sequence handling package that runs on the Apple Macintosh. {\i Comput. Applic. Biosci}. {\b 4}, 387-393.\par
3.\tab Dear, S. and Staden, R. 1991. A sequence assembly and editing program for efficient management of large projects. {\i Nucl. Acids Res}. {\b 19}, 3907-3911.\par
4.\tab Hillier, L. and Green, P. 1991. OSP\: an oligonucleotide selection program. {\i PCR Methods and Applications}. {\b 1}, 124-128.\par
5.\tab Dear, S. and Staden, R. 1992. A standard file format for data from DNA sequencing instruments. {\i DNA Sequence}. {\b 3}, 107-110.\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 5. Analysing Sequences to Find Genes\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 2.1\tab The uneven positional base frequencies method.\par
2.2\tab The positional base preferences method\par
2.3\tab The codon usage method\par
2.4\tab Searching for open reading frames\par
2.5\tab Searching for tRNA genes\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 We outline three methods for finding protein genes and one for locating tRNA genes, plus routines for finding open reading frames and displaying the p
ositions of stop codons. All the methods are contained in the program NIP. The correct interpretation of the analyses presented requires a good understanding of the underlying ideas used by the methods. Despite this we concentrate here on the use of the te
chniques and refer the reader to earlier publications (1-5) for more background information. \par
\pard \s4\qj\sa120\sl280 The assumption made by the methods for finding protein genes is that protein coding regions, when analysed in terms of 3 letter nonoverlapping "words", will look
different to noncoding regions analysed in the same way. Suppose we analyse a sequence in one reading frame and count its codons. Then we define the "positional base composition" as the frequency at which each of the four base types occupies each of the th
ree positions in codons. In coding regions the positional base frequencies will be less random than they are in noncoding regions. This is the basis of method 1\:
the "Uneven positional base frequencies method". If this reading frame is coding for a protein
the positional base composition will tend towards a particular bias which is common to the majority of genes. This is the basis of method 2 the "Positional base preferences method". If the sequence has a very biased base composition then in protein genes
this may affect the choice of amino acids, and will affect the use of bases in the third positions of codons. This bias is also utilised by the positional base preferences method. Finally if the reading frame is coding for a protein its use of codons is al
so likely to be nonrandom and this is the basis of method 3, the "Codon usage method".\par
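The "positional base composition" defined above can be computed with a minimal sketch such as the following. It only illustrates the definition and is not the code used by NIP; the function name "positional_base_composition" is hypothetical, and codons containing symbols other than a, c, g and t are simply skipped.\par

```python
def positional_base_composition(sequence, frame=0):
    """Frequency of a, c, g and t at each of the three codon positions.

    Returns three rows (codon positions 1 to 3), each holding the
    frequencies of a, c, g and t in that order, for one reading frame.
    """
    bases = "acgt"
    counts = [[0, 0, 0, 0] for _ in range(3)]
    seq = sequence.lower()
    ncodons = 0
    for start in range(frame, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        if all(b in bases for b in codon):
            ncodons += 1
            for pos, b in enumerate(codon):
                counts[pos][bases.index(b)] += 1
    if ncodons == 0:
        return counts
    return [[c / ncodons for c in row] for row in counts]


for row in positional_base_composition("atgatgatg"):
    print(row)
```

In a random sequence all twelve frequencies would be expected to lie close to the overall base composition; the methods described here look for departures from that pattern.\par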
\pard \s4\qj\sa120\sl280
All the methods perform their analyses over segments of the sequence of size "window", and then move the window on by three bases and repeat the calculation. The "Uneven positional base frequencies" method only produces a single value for each segment and
hence cannot distinguish between frames or strands - it only measures the probability that a region is coding and nothing more. The other two methods produce different va
lues for each of the three potential reading frames and hence can help to decide which is coding. Their results are plotted in three separate boxes arranged one above the other. For these we also indicate which of the three reading frames is the highest sc
oring at each position along the sequence. This is done by plotting a single dot at the mid-height of the box that contains the highest score, so that if one frame is the highest scoring for many consecutive positions, the dots will produce a solid line at
the mid-height of its box. We also mark the positions of stop codons. These are represented by short vertical lines and are positioned so that they bisect the mid-height of each box. Start codons are marked at the base of the box for each reading frame.
\par
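The window movement and the marking of stop codons can be sketched as follows. This is an illustration under stated assumptions: the stop codons are taken to be taa, tag and tga, positions are counted from 0, and the function names are hypothetical rather than taken from NIP.\par

```python
def window_starts(length, window, step=3):
    """Start offsets of successive windows, each moved on by three bases."""
    return list(range(0, length - window + 1, step))


def stop_codon_positions(sequence, frame):
    """Offsets of stop codons read in one of the three frames (0, 1 or 2)."""
    stops = ("taa", "tag", "tga")
    seq = sequence.lower()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in stops]


print(window_starts(12, 6))
print(stop_codon_positions("atgtaaccctga", 0))
```

Running the second function once for each frame gives the three sets of positions that would be drawn as short vertical lines in the three boxes.\par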
\pard \s4\qj\sa120\sl280 The search for tRNA genes involves looking for segments that could fold into the cloverleaf structure and which have the expected conserved bases in the appropriate positions.\par
\pard \s4\qj\sa120\sl280 Notice that we have not mentioned searches for relevant "signals" like promoters
or splice junctions which are also useful for finding genes. These searches are described in the chapter on searching for motifs. In the current chapter the only "signal" we include is the stop codon. However as all results are presented graphically it is
easy for users to overlay the displays of signal searches with those presented here and so effectively combine them.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab The uneven positional base frequencies method.\par
\pard\plain \s4\qj\sa120\sl280 \f20 This method produces a single value for each segment of the sequence, and wou
ld give the same result if applied to each reading frame or to the complementary strand. The results are plotted in a box that is cut by a horizontal line. This line is labelled 76% and we expect 76% of noncoding sequences to score below this line and 76%
of coding sequence to score above it. Of the methods described this one makes the fewest assumptions and so is a good unbiased indicator of the probability that a sequence is coding.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Uneven positional base frequencies".\par
2.\tab Define "Odd window length". \par
3.\tab Define "Plot interval".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 5.1. In the example shown the 5' end of the sequence codes for several proteins and the 3' end codes for ribosomal RNAs.\par
\pard\plain \li100\sb300\sl160\keepn \f4\fs16 {\plain \fs16 {\pict\macpict\picw436\pich41
1103ffffffff002801b31101a00082a0008c01000affffffff002801b3070000000022000100010000a000a0a100a400020de801000a00000000000000000700010001220027000100da23000021000101b22300002300262300002100270001230000a000a301000affffffff002801b32300da21000101b2230026210027
0001a000a12000170001001701b2220025000100df2300032300062301002300fb2300fd2300022300fe2302032300ff2300002300fe2300fd2301002300032300002300fd2302022300042300002300052300002301fd2300002300032300002300012302fd2300fd2300002300fd2300ff2301fe23000023000023000223
00062302012300fc2300032300012300002301052300062300fa2300f82302ff2300fb2300002300002300002301002300002300002300002300002302002300002300032300052300012301092300ff2300042300022300002302042300fa2300fc2300fe2300022301002300fd2300002300002302032300002300fe2300
ff2300012300002300022301012300fc2300062300012300032302ff2300002300032300fe2300022301082300f92300fd2300032302022300032300fb2300fa2300002302ff2300fe2300002300fc2300fe2301f92300fb230000230000230000230200230000230000230000230000230100230000230000230000230003
2302fd2300002300002300002301002300002300002300002300002302002300002300002300002300002301002300002300002300032300032302082300fa23000723000223000423010b2300f82300fd2300fa2300fc2302fe2300fd2300fc2300fe23010023000023000023000023000023020023000023000023000023
00002301002300002300002300002300002300022300fe2302002300002300002300002301002300002300002300002300002302022300002300fe2300002300052301002300fb2300002300062300fc2302032300012300002300fc2300032301002300012300ff2300012300032302022300fb2300fd2300ff2302fe2300
032300ff2300042300ff2301fb2300002300002300002300002302002300002300022300032300fe2301002300052300012300032300ff2302042300052300042300002300082301fb2300fc2300fb2300fa2302ff2300fd2300002300fa2300012301ff2300042300ff2300032300012302fc2300012300032300fd230006
2301032300032300032300022300062300fd2302fb2300032300f92300002300002301fb2300f72300022300002300002302002300fe2300ff2300fe2300002301002300052300fe2300ff2300002302fe2300002300002300022300072301fa2300032300ff2300042302ff2300fa23000023000323000323010023000623
00fd2300032300fb2302032300ff2300012300fd2300052302042300fa2300fd2300ff2300002301f82300002300032300fd2300002302002300002300002300002301032300092300062300022300032302fd2300fa2300fe2300fd2300ff2301fd2300fb2300062300022300fe2302fc2300012300062300fc2300032301
032300fe2300002300022300032302002300032300002300032300fe2300022301fe2300052300032300fe2300022302fa2300032300fa2300fe2300002301062300032300ff2300fe2302fd2300ff2300f22300022300fb2301022300fe230000230000230000230200230000230000230000230000230100230000230000
2300002300002302002300002300002300002300022301062300012300032300002302fd2300022300062300042300fd2301002300fd2300032300fc2300fd2302fb2300002300fa2300022300002302fe2300002300002300002300002301002300002300002300002300052302042300022300002300fe23000323010323
00002300002300022302fd2300002300fd2300fe2300ff2301032300062300012300032300f72300002300fa2302062300032300fd2300fd2301ff2300fe2300002300fc2300fe2302002300002300002300002300002301002300022300032300012300002302ff2300042300002300082300042301032300052300fa2300
012300ff2302fe2300fc2300042300032301002300fd2300fa2300062300002302fd2300ff2300032300fa2300fe2301092300fc2300fe2300032300002302052300fa2300fe2300fc2300032301fd2300002300002300fa2300fe2302032300fd2300052300032301032300f82300032300fc2300072302062300fa230005
2300002300032302fd2300fe2300f92300fe2300ff2301012300ff2300fb2300032300002300052302012300ff2300fd2300002300032301002300012300fa2300022300fb23020023000023000023000023000023010023000023000023000023000023020023000023000023000023000023010023000023000023000323
02032300022300012300062300fd2301032300f92300012300ff2300032302fb2300ff2300012300fd2300032301fd2300022300002300032300042302fd2300062300032300fd2300002301ff2300042300032300002302ff2300f823000b2300fb2300f92301002300002300fe2300fd2300fd2302ff2300fe2300002300
002300002301002300002300022300002300fe2302002300002300002300062300022302012300ff2300fd2300012300022300002301fd2300012300022300fe2300ff2302fd2300fe230000230000230000230100230000230000230000230000230200230000230000230000230100230000230000230000230000230200
2300002300002300002300002301002300002300002300002300002302002300002300002300002300002301002300002300002300002300002302002300002300002300022301fe2300002300002300002300002302002300002300002300002300002301002300022300032300062300fd23020423000323000323000023
00fc2301fe2300022300fb2300fc2300fe2302fd2300002300002300002301002300002300002300002300002302002300002300002300002300002300002300002301002300002300002300002300002302002300002300002300002302002300002300002300002300002301002300002300002300002300002302002300
002300002300002300002301002300002300002300032300032302fa2300032300ff2300fe2300022301092300012300062300ff2302002300012300062300022300fb2301002300002300fd2300032300fd2302fd2300032300052300062300fd2301fe23000a2300fb2300032300fd230203230003230000230000230000
2301fc2300fb2300fd2300f82302002300022300002300012300032301022300032300012300022300002302002300f82300082300fa2300012301002300052300002300fe2300002300042302fe2300fe2300ff2300032300032301002300fb2300072300002300fd23020423000023000023000023000023010023000023
00002300002300002302002300002300002300fd2300ff2302042300002300002300fd2301022300fd2300fe2300fa2300032302002300002300002300fe2300022301002300032300032300f823000a2302012300fd2300002300032300002301ff2300fe2300002300002300032302002300002300fd2300032301002300
fa2300032300002300ff2302032300012300f82300002300002301022300032300032300002300002302002300002300fa2300fe2300052301fd2300fd2300032300002300022300002302002300012300032300fd2300022301012300fd2300ff2300032300fb2302fe2300022300002300052300f8230101230000230002
2300002300022302002300fb2300fd2300012301ff2300012300032300042300fb2302032300fb2300032300042300fc2302fa2300032300002300002300072301f92300022300012300fa23000a2302042300002300ff2300fd2300002301f52300012300fd23000223020023000023000123000223000323010523000023
00032300012300002302002300002300002300f52300002301ff2300032300052300fe2300fe2302ff2300fd2300002300062300062301002300ff2300002300fd2302032300f92300ff2300032300fd2300002300fd2301042300ff2300fb2300002300022302002300fe2300032300ff2300012301022300fd2300032300
032302fe2300072300fd2300012300002301002300fd2300062300002300fd2302fb2300002300ff2300fd2300002301042300fd2300ff2300012300022302002300fb2300fd2300052300042301fc2300012300ff2300012302fd2300032300fa2300062300022302002300fe2300022300fb2300022301fe230003230005
2300fd2300032302022300fb2300fd2300002300042301fc2300012300082300032300002302002300002300ff2300002301012300002300002300fd2300fd2302002300022300fe2300052300f62301ff2300fe2300ff2300062300032300fe2300022302fb2300fd2300062300042301fe23000323000323000023000023
02ff2300fb2300002300032300022301f92300fa2300fc2300fb2300002302ff2300072300002300ff2300002301fe2300062300022300092300fd2302002300022300fe2300032301022300002300012300002300fc2302fe2300062300fc2300042300002301002300002300002300002300ff2302fe2300032300fc2300
012300032301002300002300ff2300002300002302fe2300002300ff2300042302fc2300042300002300ff2300fd2301012300ff2300042300002300002302002300002300002300002300002301002300002300002300fd2300002300022302012300fd2300022300fb2300fb2301032300fc23000b2300fd230000230204
2300ff2300fe2300022300fd2301fe2300002300fd2300002300fd2302fe2300032300fc2300032300fe2301ff2300042300fc2300012302002300002300022300062300fd2301032300052300fd2300012300ff2302012300fd2300fe2300022300062301fc2300032300fb2300022300fc2302042300002300fb2300fa23
00002301fb23000623000a2300012302022300fe2300032300fa2300022301fe2300fe2300ff2300fb23000a2302002300012300022300012300002302002300fd2300fd2300032300fa2301012300ff2300012300fd2300052302002300fb2300052300fb2300ff2300fe2301022300032300032300fe2300022302002300
002300002300032300002301fd2300fe2300f72300f923000023020123000623000b2300002301002300fe2300002300fd2300022302fe23000b2300fa2300022300032301fd2300f92300072300042300002302002300ff2300012300fd2300fa2301002300012300042300032300fb2302fe2300fd2300032300022301fd
2300052300042300ff2300012302002300002300002300fc2300042301fd2300022300fd2300fe2300fe2302fd2300032300002300fa2300052301032300002300002300fa2300012302032300052300022300002301fd2300fe2300fe2300002300022302fa2300fe2300fa2300032300022300fe2300032302fa23000623
00ff2300032301fa2300032300032300012300ff2302fe2300052300052300f92300022301fb2300062300ff2300012300002302052300fb2300fc2300002300fb2301032300062300072300012300fa2302fd2300082300f82300fd2300002301012300072300fc2300042302fe2300fa2300f82300022300042301022300
fe2300022300002300fe2302002300fd2300002300f92300fd2301092300002300092300012300022302002300032300002300032300002301002300f52300fa2300032302fd2300002300fc23000923000b2301f92300002300fd2300fa2300fd2302062300052300012300ff2300032300fb2301072300042300002300ff
2300fe2302022300012300002300002300002302002300002300fc2300fea0008da00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.1.\tab Example output from the uneven positional base frequencies method. The 5' end codes for proteins and the 3' end contains ribosomal RNA genes.\par
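\pard\plain \s4\qj\sa120\sl280 \f20 The statistic used by the program is not spelled out here, but the idea can be illustrated with a minimal sketch in Python. It assumes that "unevenness" is measured, for each base, as the total deviation of its counts at the three codon positions from an even spread. The function names and this particular formula are assumptions made for illustration only. Like the method described above, the value obtained is the same whichever reading frame is chosen and is unchanged on the complementary strand.\par
\pard\plain \sl220 \f4\fs16
```python
def positional_unevenness(window, offset=0):
    """Sum, over the four bases, of how unevenly each base is spread
    across the three codon positions of the window (illustrative only)."""
    counts = dict((base, [0, 0, 0]) for base in "ACGT")
    for i, base in enumerate(window.upper()):
        if base in counts:  # ambiguity codes are skipped
            counts[base][(i + offset) % 3] += 1
    score = 0.0
    for base in "ACGT":
        mean = sum(counts[base]) / 3.0  # count expected at each position if even
        score += sum(abs(n - mean) for n in counts[base])
    return score


def scan(sequence, window_length, plot_interval):
    """Return (midpoint, score) pairs for windows stepped along the sequence.
    Midpoints are 1-based; the window is measured in bases (an assumption)."""
    points = []
    for start in range(0, len(sequence) - window_length + 1, plot_interval):
        window = sequence[start:start + window_length]
        midpoint = start + window_length // 2 + 1
        points.append((midpoint, positional_unevenness(window)))
    return points
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Here window_length and plot_interval play the roles of the "Odd window length" and "Plot interval" prompts. Changing the offset argument of positional_unevenness only permutes the three position counts, which is why the score does not depend on the frame.\par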
\pard\plain \s6\sb360\sa60\sl280\tx560\tx860 \b\f20 2.2\tab The positional base preferences method\par
\pard\plain \s4\qj\sa120\sl260 \f20 As a result of the genetic code and the relative frequencies with which amino acids are used in proteins, DNA sequences coding for proteins have a characteristic bias in their positional base frequencies. This method scans a DNA sequence and measures how closely each reading frame matches this expected bias; the closeness is expressed as a "score". By default the program uses a "global" set of expected positional base frequencies, derived from average amino acid compositions in known proteins. Alternatively users may create their own set of expected values by analysing known genes from the same genome. In addition users can combine the "global" values for the first two positions in codons with third position values derived from other genes of the same genome.\par
\pard \s4\qj\sa80\sl260
To use a nonglobal standard, create a codon table in the format described in the chapter on statistical analysis of nucleic acid sequences, using the method "Creating a codon usage file". Alternatively a section of the sequence being analysed can be scanned to produce an internal standard. The method is particularly useful for deciding which reading frame is coding.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.2.1\tab Using the global standard\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Positional base preferences method".\par
2.\tab Select "Standard source" as "Global".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Window length". The default length of 67 should be used for most cases. Shorter windows give noisier plots and the longer the window the more chance there is of missing a short exon.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Plot interval".\par
\pard\plain \s4\qj\sa120\sl260 \f20 The plot will appear as in figure 5.2. This shows a 10,000 base section of sequence tha
t codes for several proteins in each of the three reading frames. See the introduction for an explanation of the plotting scheme used.\par
\pard\plain \s8\qj\fi-1140\li1140\sb300\sa120\sl240\keepn\tx1140 \f21\fs20 {{\pict\macpict\picw447\pich225
0d7effffffff00e001be1101a0008201000affffffff00e001be090000000000000000310000000000df01bd98002400000000008d012000000000008d011f0000000000df01bd000102dd0006007fdfff00fc140040ed000e01f000e1ffffebffff87ff83d40004140040ed000e0110009200002a00008800425c00041400
40ed000e2908009200002c00007800442a0004140040ed000e5a08008c0000140000400044220004170040f3000008fc00068608010c000010fd000324220004170078f3000008fc0002860501fe000010fd000328020004130040f3000008fc0002800701f9000328020004130040f3000014fc0002800101f90003300100
04150040f30008140000100000800087f9000310010004150040f300081400001800010000a4f9000310010004130040f300081400002400010000e4f700018004130040f30008240000240001000018f700018004130040f30008220000240001000018f700018004130040f30008220000220002000018f7000180041300
40f30008220000420002000008f700018004140040fa000002fb0005210000420002f400018004140040fa000002fb000541000042001cf400014004160040fa000003fd0007440041040042001cf400014004170040fb00011003fd0007cc00410c00410024f4000140041d1476befc5eafdbeff59adfb1e0d6ddbbc5ad0f
e1bd24f600031000f7bc1d1476befc5eafdbeff59edfb1e0d6fdbbc5ff0fe1bd20f600031000f7bc1d1476befc5eafdbeff59affb1e0d6ffbbc5ff0fe1bd40f600031000f7bc1b1476befc5eafdbeff59fffb1e0d7ffbbffffdfffbdbff4ff01f7bc1a014008fd000e2288080a0000010380800299008180f4000120041901
400efd000d2288100a00000102810000690081f30001200419016016fd000d5588100200000102810000650081f30001101419016012fd000d5588100100000100410000050081f30001101419016022fd000d9508100100000100410000050081f3000110141a026021b0fe000d8d08100100000600410000030081f30001
102c1a136041c80000030808200100000600410000020041f300010c2c1a1350410e2800030806200100000a00210000020041f300010c6c1a135081015900020006200100000800210000020049f300010aec190e5081008700040005a0014200080022fd000036f300010304180e4880000700040005a000c200080022fd
000012f2000004140e48800004c0240001a000c600080022ed000004140045fe000aa03400004000a600080012ed000004140045fe000aa048000040002602c8001aed000004140045fe000a1048000040002a03480014ed00000413007dfe00011080fd00041a04480014ed000004120043fe000011fc00041904280010ed
000004100042fe000015fc0002016428eb000004100042fe00000dfc0002016430eb0000040f0042fe00000afc00010198ea0000042523400a00000a44013c4001109a0034842208e0400200808100020806088001c094080800042501400afe001e44013c4001109a0034842208e0400200808100020806088001c0940808
000406007fdfff00fc0a0040fb00000ce60000040a0040fb00000ae60000040a0040fb000012e60000040a0040fb000011e60000040b0040fc00010191e60000040b0078fc000101a1e60000040b0040fc00010941e6000004100040fc0002094080fc000010ed000004100040fc00020a0080fc000010ed000004100040fc
00020e0080fc000018ed000004100040fc00020e00c0fc00001ced000004100040fc00020e0040fc000024ed000004130040fc00020a0040fc000324000002f0000004130040fc0002080040fc000322000006f0000004130040fc0002100020fc000322000006f0000004130040fc0002100020fc000322000005f0000004
130040fc0002100020fc000322000009f0000004140040fd000318100020fc000322000009f0000004170040fd000318100020fc0006220000090000c0f300000425235dea924fb4a5900076f67fdddb6f23effd311f5fe9f8769dc2bbc579fa7e5fd7e7f7fd7c25235dea924fb4a6600076f67fdddb6f63effd311f5fedf8
769dc2bbc579fa7e5fd7e7f7fd7c25235debd24fb4a6600076f67fdddb6f63effd209f5fedf8769dc2bbc579fa7e5fd7e7f7fd7c25045debfa4fb4feff1bf6f67fdddb6f7feffd3f9f5feff8769dc2bbc579fa7e5fd7e7f7ff7c1b1440010800004020001000003000004100002080021af4000102041e1740020400004000
001000002a00004100002080021a000004f7000102041e1760020700004000000800002e000040800020800419400004f7000165042017600205000080000008200022002040c0004080840140000af90003100055a4201760040100008000000a680041805280c0024040c400a4000bf9000310004d6420045004008001fe
000f0eac0041825280c00340413800a6000bf900032800806c20045008008001fe000f079200418355802803804100002a0011f900032800803c21045008008001fe001002120081834d803402004100001a0030c0fa00032400801421044808008002fe001002118180458d003404004200001a0040c0fa00034400801421
044808008002fc000e41005c8000030400420000110080a0fc000540004400800421044808004002fc000e6100548000029400240000010080a0fc0005a0004400800421044410004004fc0002220064fe000894002c000001010020fc0005a0004201000423044410004004fc0002220020fe001358001000000101002400
0010000120008201000420044210002004fc000012fc000068fd000e0101002c00001000011000820100041f044210003014fc000012fc000060fc000d82003c00001800021000820200041f047a20000818fc00001efc000060fc000d82000200081c02021000820200041f044220000838fc000010fc000020fc000d8200
0200142c020410110204000417044240000828f0000d44000200146205040a1101080004170442400008e8f0000d4400020022a315040a1101880004160342400005ef000d48000100e2a335040d2a01700004250642d00307440c06fe001910a040025000c00000040800340401018100c8a4456a21741304250643d00304
440c06fe001910a040025000c00000040800340401fe010088dc45ac2074130406007fdfff00fc0a0043fe000008e30000040a0043fe000008e30000040a0043fe000008e30000040e044280000414fc000010e90000040e044280000494fc000018e900000410047a80000776fc0002188020eb00000410044280000756fc
0002148030eb00000410044480000402fc0002278030eb00000410044440000802fc0002278048eb00000410044440000801fc0002264048eb00000410044440000801fc0002224048eb00000410044440000801fc0002224048eb00000410044440000801fc0002204048eb00000410044820001001fc0002205848eb0000
0410044820002001fc0002405948eb0000041105482000400080fd0002402588eb0000041105482000400080fd0002c02588eb0000041105481000800080fd0002800588eb0000041205501000800040fe000301000684eb0000042523701fefe001cb3d2bffeb00020629f73b0ef1c60fef7ddff6f7dfe5f75e54fbacfd37
34fc2523701fefe001cb3d2bffeb00020629f73b0ef1c61fef7ddff6f7dfe5f75e54fbacfd3734fc2523701fefe001cb3d2bffeb00020229f73b4ef1c62fef7ddff6f7dfe5f75e54fbacfd3734fc25237fffefffffcb3d2bffebfffffe29f73b4ef1c67fef7ddff6f7dfe5f75e54fbacfd373dfc1a05501001000040fe0003
01000002fe000360000044f3000109b41a05600801000020fe000301000002fe000360000042f300010ab41a05600805000020fe000302000001fe000360000042f300010e141a05600806000020fe000302000001fe000390000042f3000116041a0560080800002cfe000302000001fe000398000082f3000110041a0540
0908000014fe000012fe000681000088400082f3000110041a05400d08000014fe00001afe000682c00084c00081f3000110041a05400308000002fe00001afe000682400104c00081f3000110041a094002f00000020000082afe000682420103200101f3000120041a0940009000000300003426fe00069c250100200101
f3000120041b09400080000001000054a4fe000664248100200201f400022020041f0040fc0003c0004364fe000c60288200204201000004000002fa00023040041e0040fc0002a00043fd000c4018820020a40100000c000102fa00023040041e0040fc0002a00040fd000c4010620020a40100000a000102fa0002314004
1c0040fc0002200080fb000a6200132801000012000142fa00023140041f0078fc0002100080fb000a24001318010000121002cdfd00050800004a80041f0040fc0002100080fb000a14001b10010000111802adfd00050800004a80041f0040fc0002080080fb000a14001c0000880021e802b5fd0005140000ca80041f00
40fc00010801fa00141400040000740020280c35803400001400014400041f0040fc00010c01fa00141800040000440040240c04803600001400010400042523400000aa0a020ec280020c801021001c809050009204c405501c846aee0573625284900c2523400000aa0a020ffe80020c801021001c8090500091ffc407f0
1cffebfffff3fffe84900c06007fdfff00fc02dd00a00083ff}}\par
\pard \s8\qj\fi-1140\li1140\sa120\sl240\tx1140 Figure 5.2\tab Example output from the positional base preferences method. Most of the sequence is coding for proteins.\par
\pard\plain \s9\fi-560\li860\sb400\sa60\sl280\tx1140 \b\f20 2.2.2\tab Using a nonglobal standard\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Make an appropriate codon usage file as described in the chapter on statistical analysis of nucleic acid sequences.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Positional base preferences method".\par
3.\tab Select "Standard source" as "Codon usage table".\par
4.\tab Define "File name of standard". The file will be read and displayed on the screen.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Normalisation" as "Combine with global standard". This option uses the global values for the first two positions of codons combined with the third position values from our codon table. The alternative, "Use observed frequencies", uses all three positions from our codon table. The positional base frequencies to be used will be displayed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Use 1.0 for positional weights". The alternative allows users to
give greater or lesser emphasis to any of the three positions by defining weights for each. The program displays the "Expected scores per codon in each frame".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Window length". Windows shorter than the default of 67 may be useful if the bias is sufficiently strong. Look at the "Expected scores per codon in each frame" to help decide.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Plot interval".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Accept "Plot relative scores". This means that for each frame we plot its score divided by the sum of the scores for all three frames. It produces
smoother plots than the alternative "Plot absolute scores" which simply plots the scores for each frame. The minimum and maximum expected scores for the given standard and window length are displayed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Accept "Leave scaling values unchanged". The expected scores just displayed will be used to scale the plots. If required the user can change the scaling values at this point.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 The plot will now appear as in figure 5.2. Typical dialogue is shown in figure 5.3.\par
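\pard\plain \s4\qj\sa120\sl280 \f20 The "Plot relative scores" option of step 9 amounts to a simple normalisation, sketched below in Python. The function name is hypothetical, and the handling of a zero total is an assumption added so that the sketch cannot divide by zero.\par
\pard\plain \sl220 \f4\fs16
```python
def relative_scores(frame_scores):
    """Divide each frame's score by the sum of the scores for all three frames."""
    total = sum(frame_scores)
    if total == 0:
        # Assumption: with no signal at all, treat the three frames as equal.
        return [1.0 / 3.0] * 3
    return [score / total for score in frame_scores]
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Because the three relative scores always sum to one, they lie close to one third whenever the frames score similarly, which is consistent with the minimum and maximum values shown in figure 5.3.\par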
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab The codon usage method\par
\pard\plain \s4\qj\sa120\sl280 \f20 The codon usage method scans along a sequence and measures the closeness of each reading frame's codon composition to an expected set of codon frequencies. Of the methods described it is the most sensitive, but it consequently makes the strongest assumption, namely that we know the approximate codon usage of the genes being searched for. The codon usage of a gene depends on both its codon preferences and the amino acid composition of its protein product. For this reason the program offers three methods of "normalisation": the table of codon usage may be used as read ("Observed frequencies"); it may be transformed to reflect an average amino acid composition ("Normalise to average amino acid composition"); or it may be transformed to have no amino acid bias ("Normalise to no amino acid bias"). The table can be read from a file produced by "Creating a codon usage file", as described in the chapter on statistical analysis of nucleic acid sequences, or the user can define a region of the current sequence as an "internal standard". In the latter case the program will calculate the codon usage for the defined region.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Codon usage method".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Reject "Define internal standard". If an internal standard is used the program will ask for the end points of the segments over which to calculate the codon usage.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "File name of standard". The file will be read and displayed on the screen.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Normalisation" as "Average amino acid composition". The program will display the expected values for each reading frame for the window lengths 21, 31 and 41 codons. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Window length".\par
6.\tab Select "Plot interval".\par
\pard\plain \s4\qj\sa120\sl280 \f20
The plot will appear as in figure 5.4. This shows a 10,000 base section of sequence that codes for several proteins in each of the three reading frames. See the introduction for an explanation of the plotting scheme used.\par
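\pard\plain \s4\qj\sa120\sl280 \f20 As an illustration of how a window can be compared with a codon usage table, the Python sketch below treats the table as codon probabilities and asks which of the three frames makes the window most likely. This is one common formulation and is not claimed to be the program's own scoring. The function names, the pseudocount of 1 and the input format of "codon count" pairs are all assumptions.\par
\pard\plain \sl220 \f4\fs16
```python
import math
from itertools import product


def codon_frequencies(counts_text):
    """Parse 'CODON count' pairs into frequencies that sum to 1.
    Every codon starts with a pseudocount of 1 so no frequency is zero."""
    table = dict(("".join(c), 1.0) for c in product("TCAG", repeat=3))
    fields = counts_text.split()
    for codon, count in zip(fields[0::2], fields[1::2]):
        table[codon] += float(count)
    total = sum(table.values())
    return dict((codon, n / total) for codon, n in table.items())


def frame_probabilities(window, freq):
    """Relative likelihood of each of the three frames for one window.
    The window must contain only upper case T, C, A and G."""
    n_codons = (len(window) - 2) // 3  # same number of codons in every frame
    log_likelihoods = []
    for frame in range(3):
        codons = [window[frame + 3 * j: frame + 3 * j + 3] for j in range(n_codons)]
        log_likelihoods.append(sum(math.log(freq[c]) for c in codons))
    best = max(log_likelihoods)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - best) for v in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Sliding such a window along the sequence and plotting the three values at each step gives one curve per reading frame, in the manner of figure 5.4.\par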
\pard\plain \li1840\ri1980\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Positional base preferences method to find protein genes\par
\pard \li1840\ri1980\sl220\box\brsp100\brdrth Select standard source\par
X 1 Use global standard\par
2 Use internal standard\par
3 Use codon usage table\par
? Selection (1-3) (1) =3\par
? File name of standard=atpase.cods\par
===========================================\par
F TTT 21. S TCT 33. Y TAT 15. C TGT 5.\par
F TTC 55. S TCC 40. Y TAC 40. C TGC 4.\par
L TTA 8. S TCA 7. * TAA 8. * TGA 0.\par
L TTG 19. S TCG 12. * TAG 1. W TGG 17.\par
===========================================\par
L CTT 22. P CCT 17. H CAT 6. R CGT 73.\par
L CTC 21. P CCC 4. H CAC 30. R CGC 23.\par
L CTA 1. P CCA 10. Q CAA 19. R CGA 5.\par
L CTG 168. P CCG 48. Q CAG 80. R CGG 3.\par
===========================================\par
I ATT 47. T ACT 14. N AAT 17. S AGT 8.\par
I ATC 98. T ACC 54. N AAC 52. S AGC 26.\par
I ATA 6. T ACA 7. K AAA 85. R AGA 0.\par
M ATG 75. T ACG 13. K AAG 28. R AGG 0.\par
===========================================\par
V GTT 67. A GCT 56. D GAT 41. G GGT 90.\par
V GTC 29. A GCC 53. D GAC 66. G GGC 66.\par
V GTA 49. A GCA 59. E GAA 101. G GGA 5.\par
V GTG 57. A GCG 64. E GAG 41. G GGG 8.\par
===========================================\par
Select normalisation\par
X 1 Use observed frequencies\par
2 Combine with global standard\par
? Selection (1-2) (1) =2\par
T C A G Range\par
1 0.177 0.211 0.277 0.336 0.159\par
2 0.271 0.238 0.310 0.182 0.128\par
3 0.242 0.301 0.168 0.289 0.132\par
? Use 1.0 for positional weights (y/n) (y) =\par
Expected scores per codon in each frame\par
0.785 0.736 0.736\par
? odd span length (31-101) (67) =\par
? plot interval (1-11) (5) =\par
? Plot relative scores (y/n) (y) =\par
\par
Minimum maximum range\par
0.3219 0.3519 0.0214\par
\pard \li1840\ri1980\sl220\keepn\box\brsp100\brdrth ? Leave scaling values unchanged (y/n) (y) =\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.3\tab
Typical dialogue from the "Positional base preferences method" using a nonglobal standard in the form of a codon table to specify the values for the third positions in codons.\par
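\pard\plain \s4\qj\sa120\sl280 \f20 The third position values in figure 5.3 can be checked directly from the codon table shown in the same dialogue. The Python sketch below counts each base at each codon position, weighted by the codon counts, and divides by the row total. The function name and the plain text input format are assumptions; the counts are transcribed from the figure.\par
\pard\plain \sl220 \f4\fs16
```python
def positional_base_frequencies(counts_text):
    """Return three rows (codon positions 1 to 3) of T, C, A, G frequencies."""
    rows = [[0.0] * 4 for _ in range(3)]
    fields = counts_text.split()
    for codon, count in zip(fields[0::2], fields[1::2]):
        for position, base in enumerate(codon):
            rows[position]["TCAG".index(base)] += float(count)
    return [[n / sum(row) for n in row] for row in rows]


# Codon counts transcribed from the dialogue in figure 5.3 (atpase.cods)
ATPASE = """
TTT 21 TCT 33 TAT 15 TGT 5    TTC 55 TCC 40 TAC 40 TGC 4
TTA 8 TCA 7 TAA 8 TGA 0       TTG 19 TCG 12 TAG 1 TGG 17
CTT 22 CCT 17 CAT 6 CGT 73    CTC 21 CCC 4 CAC 30 CGC 23
CTA 1 CCA 10 CAA 19 CGA 5     CTG 168 CCG 48 CAG 80 CGG 3
ATT 47 ACT 14 AAT 17 AGT 8    ATC 98 ACC 54 AAC 52 AGC 26
ATA 6 ACA 7 AAA 85 AGA 0      ATG 75 ACG 13 AAG 28 AGG 0
GTT 67 GCT 56 GAT 41 GGT 90   GTC 29 GCC 53 GAC 66 GGC 66
GTA 49 GCA 59 GAA 101 GGA 5   GTG 57 GCG 64 GAG 41 GGG 8
"""

third = positional_base_frequencies(ATPASE)[2]
print(" ".join("%.3f" % f for f in third))  # prints 0.242 0.301 0.168 0.289
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 The printed values agree with row 3 of the table in the dialogue. Rows 1 and 2 of the dialogue do not come from this calculation, because "Combine with global standard" was selected and those rows are taken from the global standard.\par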
\pard\plain \s6\sb400\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Searching for open reading frames\par
\pard\plain \s4\qj\sa120\sl280 \f20 This routine finds all open reading frames of at least a user-defined minimum length and writes the results in the form of an EMBL feature table.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find open reading frames".\par
\pard\plain \sl220\keepn \f4\fs16 {{\pict\macpict\picw442\pich218
0f42ffffffff00d901b91101a0008201000affffffff00d901b9090000000000000000310000000000d801b898002400000000008d012000000000008d011f0000000000d801b8000102dd0006007fdfff00fc1e0040fb000ef0fe00f26100dc0e004000180ffa40fe00020ffdc0fa0000041f0070fc000f01110101159180
a412004000280906c0fe0002100240fa0000041f0040fc000f011101010d92808232004008280802a0fe0002100240fa0000041f0040fc000f02110101080a81027200c008288801a0fe0002100040fa0000041f0070fc000f02110081080a89037108d49425900120fe0002200040fa0000041f0040fc000f02090082000a
8900a118d59445700120fe0002200040fa0000041f0070fc0015040a00c200048900a114dd9446700120000003c00020fa0000041f0040fc0015040a002200048a002124db5446100020000002000020fa0000041f0040fc0015040a002e00044a000123526282100020000002000020fa0000041f0070fc0015040a001000
004a000121226280000020000002000020fa000004220040fc000b040a001000005a0001212223fe000920000002000020000008fd000004210040fc00010404fd00055c0000a02003fe000920000002000020000018fd000004210070fc00010404fd0005540000a02003fe000910000002000020000018fd000004200040
fc000004fc0005740000a02001fe000910000002000020000018fd000004210070fc000008fc0005500000c02001fe000e100000040000200000180020000004210040fc000008fc0005100000c00001fe000e1000000400001000001400500000041e0070fc000008f90002c00001fe000e10000004000010000014005000
00041c0040fc000018f90000c0fc000e1000001400001000001400480000041c0040fc000018f9000040fc000e1000001c00001000002400480000041e067c66de6dd21858f9000040fc000e1e7ff6fc00003dbebfe797cf9ddefc1e066c66de6dd21850f9000040fc000e1e7ff6e400003dbebfe797cf9ddefc1a067c66de
6dd2185ff3ff02fe7ff6feff08fdbebfff97ff9ddefc1a066c66de6dd21850f3000e1e7ff6e400003dbebfe797cf9ddefc180040fc000010f3000e100000200000094002a40584001004180070fc000010f3000e080000400000094002a40584003004180040fc000010f3000e080000400000094007424804002804180040
fc000010f3000e0800004000000a400543c804002804180070fc000020f3000e0800004000000a4004033804002804180060fc000020f3000e0800004000000a3004023804002804180070fc000020f3000e04000040000006300c023004002804180060fc000020f3000e040000400000063008003002002804190060fd00
010220f3000e040000400000063008001002004804190078fd00010220f3000e040000400000060808001002004804190058fd00010220f3000e040000400000060808000002004804190078fd00010220f3000e040000800000020808000002004804190048fd00010320f3000302000080fe000708080000020048041900
44fd00010520f3000302000080fe00070810000002004804190074fd00010520f3000302000080fe000708100000020044041a0644040000014520f3000302000080fe000704100000020084041a067406008001c520f3000302000080fe000704100000020084041a06440a0080012540f3000302000180fe000704100000
010084041a06440a0080022540f3000302000180fe000704100000010084041a06720901400224c0f3000301800280fe00070220000001008704252362796940a3daec02e005042000000800400000a70019e403041201200220210005b90484252373f9df7fffdaec02e005042000000800400000a70019ffff0412012003
e0210005ff04fc06007fdfff00fc180643803f0e1e00c0f2000171eefd000101e0fe0002ff00041906728041111200c0f20002891280fe00010120fe0002810004190642804090a200c0f200028a1280fe00010120fe00028100041906424040a0a10140f20002860180fe00010120fe00028100041a06724080e0618140f3
000301060180fe00010220fe00028080041a0642408000018120f3000301040180fe00010210fe00028080041a0672298000004920f3000301000080fe000702100000010080041a014419fe00014920f3000301000080fe000702100000010080041a014416fe00012920f3000301000080fe000702100000010080041a01
7406fe00013620f3000e0100004000000202100000010080041a014406fe00011620f3000e020000400000020210000001004004190074fd00011620f3000e020000400000060210000001004004190044fd00010620f3000e020000400000060210000001004004180044fc000020f3000e02000040000006021000000100
4004180074fc000020f3000e020000400000060208020002002004180044fc000020f3000e0200004000000a0408020002002004180074fc000020f3000e0200004000000a0408030002002004180048fc000010f3000e0400004000000a0408030002002004180048fc000010f3000e040000400000090408030002002004
230078fc001d3c7a36ac17fffffdf7dddefebfffb1fc0000768bba9b5c0e85c31a003cfc230068fc001d3c7a36ac17fffffdf7dddefebfffb1fc0000768bba9b5c0e85c39c003cfc230068fc001d3c7a36ac17fffffdf7dddefebfffb1f40000768bba9b5c0ec5c39c003cfc23007ffcff0efc7a36ac17fffffdf7dddefebf
ffb1feff0bf68bba9f5ffec7c39ffffcfc180048fc000010f3000e100000200000111404c482880010041c0078fc000010f9000040fc000e100000200000111404c484880010041c0068fc000010f9000040fc000e100000200000111404c484880010041c0070fc000010f90000c0fc000e10000020000011340524848800
10041e0070fc000008f90002c00001fe000e10000020000010b4052454480008041e0070fc000008f90002c00001fe000e10000020000020f405285850000804210070fc000008fc0005400000c00001fe000e10000020000020e805285850000804210050fc000008fc0005640000a00003fe000e10000020000020880628
7850000804220040fc00010804fd0005640000a00003fe000e100000200000208806287850000804220070fc00010404fd0005640001200003fe000e200000200000208806283050000804230040fc000b040a001000005a0001210203fe000e2000003c0000200800283050000804230070fc000b040a003000005a000123
0203fe000e200000240000200800280050000804230040fc001d040a002800005a0001230203001000200000240000200800180060000804230040fc001d040a004c00009a0001250302861000200000020000200000180020000404230070fc001d040a0044000099008114830286300120000003c0004000001000200004
04230040fc001d020900820000890081148302853001200000022000400000100020000404230070fc00180211008200048900c108850445480120000002200040000010fe00010404230040fc000f02110102000a8900c208850429480120fe0005100240000010fe00010404200040fc000f01110102018a808122088484
294802a0fe0002100240fb00010404200070fc000f011101010291808122004484288906a0fe0002100540fb000104042523400184262c0000949223065500813a00449418898ac68212084805400420800000106d6c2523700184262c0000f4ee22fe7500ff3e007c7c1887fac68212084ff8800420800000106ffc06007f
dfff00fc070040e0000104fc070070e000010704070040e000010404070040e000010404070070e000010404070040e000010404070070e000010404070040e0000104040b0040e6000008fc000108040b0070e6000008fc000108040b0040e6000008fc000108040b0070e6000008fc000108040b0040e6000008fc000108
040b0050e6000008fc000108040c0070e600013404fd000108040d0070e6000734040028000008040d0070e6000774040038000008040d0070e6000754070048000008040d0068e60007540700480000080425107fdcef8d2bebf7efdfffc720ffcda7fdfbfeef0fff303dfbe9fc7fdff44eddf6976ef80425107fdcef8d2b
ebf7efdfffc720ffcda7fdfbfeef0fff303dfbe9fc7fdfb44efdf6976ef00425107fdcef8d2bebf7efdfffc720ffcda7fdfbfeef0fff303dfbe9fc7fdfb44cfdf6976ef00425107fdcef8d2bebf7efdfffc720ffcda7fdfbfeef0fff303dfbedfc7fdffc4ffdfe976efffc140048ed00030800002cfe000784082084840010
04180078fc000008f300030c00003cfe00078408208486001004180048fc000008f300030c000034fe0007840821034a001004180044fc000008f3000312000034fe0007840811034a002004180074fc000008f3000e12000024000001040811024a0020041c0044fc000018fc000010f9000e12000024000001020811003a
002004200074fc000018fc000010fe000020fd000e120000240000010210110029002004200044fc000018fc000010fe000020fd000e120000220000010210110001004004210044fd00010418fc000010fe000020fd000e12000022000019021012000100400422017404fe00011424fc000010fe000020fd000e12000022
000016021012000100400423014406fe00011424fc000018fe00012020fe000e11000042000016021012000100400423017406fe00013a24fc00002cfe0013306080000021000042000016021014000100800425014419fe00072a2400000600002cfe00135250c0000021000042000026021014000100800425014429fe00
1e4a2400000600042c0020015a50c000002100004200002202100c000100800425237229800000492400000600042a002001de914040002100004200002202100c000100800425234240800000412400000600062a002002d695404000210401420000220210080000808004252372404060604126000006080a6a03300ad6
8f40c000210601820000220220080000808004252342404090a081460400090c0a4a02b016c18820c001208a01820000200120080000810004251d428040912080c20c00090c1a4102b010418820a001a08a12810000200120fe0002810004251d728040911080c10a00091419812470204100212002c08912811000400120
fe00028100042523529a212a1190d95e0dcb3aa381ddf873c10835a20ac0972e8338a04801202028048108a42523739a3f2e1e90d9ffffbbfb6381ddfff3c1081fbffec0f7ec83ffffc801e0202804ff08a406007fdfff00fc02dd00a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.4\tab Example output from the codon usage method. Most of the sequence is coding for proteins.\par
\pard\plain \s7\qj\fi-560\li560\sb400\sa120\sl280\tx560 \f20 2.\tab Define "Minimum open frame in amino acids".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Strands". The alternatives are\: + strand only, - strand only, or both strands. Typical output is shown in figure 5.5.\par
\pard\plain \li2120\ri2240\sb400\sl220\box\brsp100\brdrth \f4\fs16 FT CDS 525..965 \par
\pard \li2120\ri2240\sl220\box\brsp100\brdrth FT CDS 956..1789 \par
FT CDS 2128..2607 \par
FT CDS 2604..3155 \par
FT CDS 3159..4709 \par
FT CDS 4733..5623 \par
FT CDS 5539..7032 \par
FT CDS 7044..7454 \par
FT CDS 7797..8134 \par
FT CDS complement(2227..2634)\par
FT CDS complement(2250..3023)\par
FT CDS complement(3027..3899)\par
FT CDS complement(3903..4760)\par
FT CDS complement(4327..4626)\par
FT CDS complement(4646..5332)\par
FT CDS complement(5345..5647)\par
FT CDS complement(5635..6012)\par
FT CDS complement(6016..6441)\par
FT CDS complement(6445..7083)\par
FT CDS complement(7035..7445)\par
\pard \qj\li2120\ri2240\sl220\keepn\box\brsp100\brdrth FT CDS complement(7406..7777)\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.5\tab Typical output from "Find open reading frames"\par
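\pard\plain \s4\qj\sa120\sl280 \f20 As an illustration only, the following sketch shows one way that output of the kind shown in figure 5.5 could be produced. It assumes that an open reading frame is any stop-free run of codons between two in-frame stop codons, or between a stop codon and the end of the sequence, and that the reported range excludes the stop codon; the program's own definition may differ. The function names "open_frames" and "feature_lines" are hypothetical and are not part of the program.\par

```python
STOPS = ("TAA", "TAG", "TGA")
COMPLEMENT = dict(zip("ACGT", "TGCA"))


def reverse_complement(seq):
    # Assumes plain A, C, G, T only.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def open_frames(seq, min_codons):
    """Return sorted (start, end) pairs, 1-based inclusive, of stop-free runs.

    min_codons is assumed to be at least 1.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        run_start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOPS:
                if (pos - run_start) // 3 >= min_codons:
                    found.append((run_start + 1, pos))
                run_start = pos + 3
        # Stop-free run that reaches the end of the sequence.
        last = frame + ((len(seq) - frame) // 3) * 3
        if (last - run_start) // 3 >= min_codons:
            found.append((run_start + 1, last))
    return sorted(found)


def feature_lines(seq, min_codons, strands="both"):
    """Format the open frames as feature table lines (layout is an assumption)."""
    lines = []
    if strands in ("+", "both"):
        for start, end in open_frames(seq, min_codons):
            lines.append("FT   CDS   %d..%d" % (start, end))
    if strands in ("-", "both"):
        n = len(seq)
        reverse = open_frames(reverse_complement(seq), min_codons)
        # Convert reverse-strand coordinates back to forward-strand numbering.
        for start, end in sorted((n - e + 1, n - s + 1) for s, e in reverse):
            lines.append("FT   CDS   complement(%d..%d)" % (start, end))
    return lines
```

\pard\plain \s4\qj\sa120\sl280 \f20 Calling feature_lines(sequence, 100, "both") would list every stop-free run of at least 100 codons on either strand, with the minus strand entries written in the complement(...) form used in the figure.\par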
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Searching for tRNA genes\par
\pard\plain \s4\qj\sa120\sl280 \f20 tRNA genes have two classes of feature that can be used to locate them in genomic sequences\: their ability to fold into the cloverleaf secondary structure, and the presence of specific "conserved" bases at particular positions relative to this structure. The level of congruence with the canonical structure is quite variable\: some tRNA genes contain intervening sequences and others, particularly those from organelles, have few of the conserved bases. The program searches for potential cloverleaf-forming structures and, optionally, for the presence of conserved bases. The user can define the range of loop sizes, the minimum numbers of potential base pairs, a range of intron sizes, and which, if any, of the conserved bases should be present. The results are presented either textually or graphically.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "tRNA search".\par
2.\tab Define "Maximum tRNA length".\par
3.\tab Define "Aminoacyl stem score". See note 8.\par
4.\tab Define "Tu stem score".\par
5.\tab Define "Anticodon stem score".\par
6.\tab Define "D stem score".\par
7.\tab Define "Minimum base pairing total".\par
8.\tab Define "Minimum intron length".\par
9.\tab Define "Maximum intron length".\par
10.\tab Define "Minimum length for TU loop".\par
11.\tab Define "Maximum length for TU loop".\par
12.\tab Accept "Skip search for conserved bases". See notes section.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 13.\tab Reject "Plot results".
This gives listed output in which the potential cloverleafs are displayed. The alternative plotted output simply draws a vertical line to represent the score for the potential gene, at the position it has been found. Typical dialogue and the beginning of s
ome listed output is shown in figure 5.6.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab In general, for finding protein genes, we recommend the use of all the methods. The "Uneven positional base frequencies" method can show which regions are likely to be coding but not which strand or frame. The "Positional base preferences" method can show the correct frame and also help to find which regions are coding. The "Codon usage" method has the greatest resolution, having been used successfully with windows of 11 codons, and can help to find small exons and to pinpoint exon/intron boundaries.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab When the "Uneven positional base frequencies" calculation was applied to all the sequences in the 1984 version of the EMBL library, 14% of noncoding segments failed to reach the value represented by the base of the box, whereas all coding segments did. The top value of the box was not reached by any noncoding segments but was exceeded by 16% of coding sequences. 76% of noncoding segments failed to reach the line labelled 76% but 76% of coding segments fell above it. We would not expect this result to change significantly if it were recalculated on the current libraries.\par
3.\tab When the "Positional base preferences" method, using "global" values, was applied to all the {\i E. coli} genes in the 1984 version of the EMBL library it chose the correct reading frame for 91% of coding segments. {\i E. coli}
sequences were used for technical rather than scientific reasons and we have no reason to believe that other organisms should give significantly different results. This result used only the values for the first two positions in codons and so for genes wit
h a strongly biased base composition we would expect even better discrimination.\par
\pard\plain \li1180\ri1440\sb100\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 tRNA search\par
\pard \li1180\ri1440\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth ? Maximum trna length (70-130) (92) =\par
? Aminoacyl stem score (0-14) (11) =\par
? Tu stem score (0-10) (8) =\par
? Anticodon stem score (0-10) (8) =\par
? D stem score (0-8) (3) =\par
? Minimum base pairing total (30-44) (30) =\par
? Minimum intron length (0-30) (0) =\par
? Maximum intron length (0-30) (0) =\par
? Minimum length for TU loop (4-12) (6) =\par
? Maximum length for TU loop (6-12) (9) =\par
? Skip search for conserved bases (y/n) (y) =n\par
Give a score for each base, then a minimum total at the end\par
? Base 8, T is 100% conserved. Score (0-100) (0) =\par
? Base 10, G is 95% conserved. Score (0-100) (0) =\par
? Base 11, Y is 96% conserved. Score (0-100) (0) =\par
? Base 14, A is 100% conserved. Score (0-100) (0) =\par
? Base 15, R is 100% conserved. Score (0-100) (0) =\par
? Base 21, A is 97% conserved. Score (0-100) (0) =\par
? Base 32, Y is 100% conserved. Score (0-100) (0) =\par
? Base 33, T is 98% conserved. Score (0-100) (0) =\par
? Base 37, A is 91% conserved. Score (0-100) (0) =\par
? Base 48, Y is 100% conserved. Score (0-100) (0) =\par
? Base 53, G is 100% conserved. Score (0-100) (0) =\par
? Base 54, T is 95% conserved. Score (0-100) (0) =\par
? Base 55, T is 97% conserved. Score (0-100) (0) =\par
? Base 56, C is 100% conserved. Score (0-100) (0) =\par
? Base 57, R is 100% conserved. Score (0-100) (0) =\par
? Base 58, A is 100% conserved. Score (0-100) (0) =\par
? Base 60, Y is 92% conserved. Score (0-100) (0) =\par
? Base 61, C is 100% conserved. Score (0-100) (0) =\par
? Minimum total conserved base score (0-0) (0) =\par
? Plot results (y/n) (y) =n\par
264\par
t\par
t-a\par
c-g\par
a-t\par
t+g\par
\pard \li1180\ri1440\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth a-t\par
a a\par
a-t gta\par
c aacgc\par
a t !!!! c\par
cgt gtgcg a\par
!!! t cga\par
a gca c\par
g t g\par
c aa t\par
a-t a\par
t-a t a\par
t-a\par
t-a\par
g t\par
c g\par
\pard \li1180\ri1440\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth caa\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.6\tab Typical dialogue and textual output from "Find tRNA genes".\par
\pard\plain \s7\qj\fi-560\li560\sa80\sl280\tx560 \f20 4.\tab If the codon table used by the "Codon usage" me
thod is normalised to have average amino acid composition it retains its codon preference bias for each amino acid type but now the amino acid composition is the average of all proteins. In general this is optimal\:
we have the expected codon preference bia
s plus an expected amino acid bias. If we normalise to no amino acid bias we are safeguarding ourselves against missing a protein of anomalous composition but at the expense of not employing all of the useful information for distinguishing coding from nonc
oding. \par
\pard \s7\qj\fi-560\li560\sa80\sl280\tx560 5.\tab The program also contains a graphical version of Fickett's method (6), except that here we use a window to analyse each segment of the sequence rather than giving a single value for each open reading frame. The tables used are those from the original publication.\par
\pard \s7\qj\fi-560\li560\sa80\sl280\tx560 6.\tab If the results from the "Find open reading frames" option are directed to disk (See the introductory chapter), the file can be used by the routines that use feature tables as input.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab The program also contains several routines for plotting the positions of stop and start codons for either strand of the sequence. One form of the output is included in figures 5.2 and 5.4.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab The tRNA gene search uses a simple scoring system for base pairing\: A-T and G-C base pairs each score 2 and G-T scores 1. The "Minimum base pairing total" allows low cutoffs to be set for each individual stem while still requiring a reasonable level of overall stability. In this way a low score for one stem can be compensated by a high score in another.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab The cloverleaf is composed of four base-paired stems and four loops. Three of the stems are of fixed length but the fourth, the dhu stem which usually has four base pairs, sometimes has only three. All of the loops can vary in size. The following relationships between the stems in the cloverleaf are assumed in the program\: (a) there are no bases between one end of the aminoacyl stem and the adjoining tuc stem; (b) there are two bases between the aminoacyl stem and the dhu stem; (c) there is one base between the dhu stem and the anticodon stem; (d) there are at least three bases between the anticodon stem and the tuc stem.\par
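\pard\plain \s4\qj\sa120\sl280 \f20 The base pairing scores described in the notes above can be sketched as follows. This is a minimal illustration, not the program's code. The function names are hypothetical, and the default cutoffs are simply the defaults offered in the dialogue of figure 5.6. We assume here that the "Minimum base pairing total" is compared with the sum of the four stem scores.\par

```python
# A-T and G-C pairs score 2, G-T pairs score 1, anything else scores 0.
PAIR_SCORES = dict(AT=2, TA=2, GC=2, CG=2, GT=1, TG=1)

# Defaults taken from the dialogue in figure 5.6.
DEFAULT_CUTOFFS = dict(aminoacyl=11, tu=8, anticodon=8, d=3)


def stem_score(five_prime_arm, three_prime_arm):
    """Score one stem; both arms are written 5' to 3'."""
    # The first base of the 5' arm pairs with the last base of the 3' arm.
    pairs = zip(five_prime_arm.upper(), reversed(three_prime_arm.upper()))
    return sum(PAIR_SCORES.get(a + b, 0) for a, b in pairs)


def passes(stem_scores, cutoffs=DEFAULT_CUTOFFS, minimum_total=30):
    """True if every stem reaches its own cutoff and the total is high enough."""
    each_ok = all(stem_scores[name] >= cutoffs[name] for name in cutoffs)
    return each_ok and sum(stem_scores.values()) >= minimum_total
```

\pard\plain \s4\qj\sa120\sl280 \f20 For example, a seven base pair stem containing one G-T pair and six A-T or G-C pairs scores 13 out of a possible 14. Lowering an individual stem cutoff while keeping the minimum total unchanged lets a weak stem be accepted only when the other stems make up the difference.\par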
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. and McLachlan, A.D. 1982. Codon preference and its use in identifying protein coding regions in long DNA sequences. {\i Nucl. Acids Res.} {\b 10}\:151-156.\par
2.\tab Staden, R. 1984. Measurements of the effects that coding for a protein has on a DNA sequence and their use for finding genes. {\i Nucl. Acids Res}. {\b 12}\:551-567.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1985. Computer methods to locate genes and signals in nucleic acid sequences. (in) {\i Genetic Engineering, Principle and Methods}, Setlow J.K., Hollaender A., (eds.), {\b 7}\:
67-114, (Plenum Press, New York).\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Staden, R. 1990. Finding Protein Coding Regions in Genomic Sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:163-180 (Academic Press, New York).\par
5.\tab Staden, R. 1980. A computer program to search for tRNA genes. {\i Nucl. Acids Res}. {\b 8}\:817-825.\par
6.\tab Fickett, J.W. 1982. Recognition of protein coding regions in DNA sequences. {\i Nucl. Acids Res}. {\b 10}\:5303-5318.\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 6. Searching for Motifs in Nucleic Acid Sequences\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Searching for percentage matches to consensus sequences\par
2.2\tab Searching for consensus sequences using a score matrix\par
2.3\tab Using weight matrices for searching nucleotide sequences\par
2.4\tab Using "hardwired" motif searches.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
4.\tab References\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 The program NIP contains several ways of defining and searching for motifs (1-4), and also contains a number of "hardwired" motifs that are already
defined and can be selected as separate searches. We describe searches for percentage matches to consensus sequences, the use of score matrices and the creation and use of nucleotide and dinucleotide weight matrices (see note 7). In addition we give detail
s of the "hardwired" motifs available from the program. In another chapter we have covered searches for exact matches to consensus sequences by describing how to find restriction enzyme recognition sequences. When searching for exact matches, percentage ma
tches or using a score matrix the search string or consensus sequence may include IUB redundancy codes. All of the searches produce both listed and graphical output. The listed output displays the matching sequence and its position and the graphical output
draws a box to represent the length of the sequence, and plots vertical lines within the box at the positions of matches. The heights of the lines are proportional to the match score (see figure 6.1).\par
\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw447\pich44
032fffffffff002b01be1101a0008201000affffffff002b01be0900000000000000003100000000002a01bd98002400000000001d012000000000001d011f00000000002a01bd000102dd0006007fdfff00fc060040df000004060040df000004060040df0000041002400088f7000020f1000001fd0000041002400088f7
000020f1000001fd0000041002400088f7000020f1000001fd00000421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe0005014200
05c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020
012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc0003021004
60fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482
b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501
420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00406007fdfff00fc02dd00a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.1\tab Typical graphical output from a motif sea
rch. It shows a rectangular box in which each match is identified by a vertical line whose height gives the match score and whose x coordinate indicates the position in the sequence.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Searching for percentage matches to consensus sequences\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find percentage matches".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par
3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par
4.\tab Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default.\par
5.\tab Accept "This sense". The alternative directs the program to search for the complement of the string.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Percent match". The search is performed and the results are presented graphically (see figure 6.1). The number of matches is displayed, followed by the scores and positions of the top 10 matches.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define the number of matches to "Display". For the number of mat
ches chosen the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round to step 3. See figure 6.2.\par
\pard\plain \li220\ri280\sb400\sl220\box\brsp100\brdrth \f4\fs16 Find percentage matches\par
\pard \li220\ri280\sl220\box\brsp100\brdrth ? Type in string (y/n) (y) =\par
? Keep picture (y/n) (y) =\par
? String=AAAATTTT\par
STRING=AAAATTTT\par
? This sense (y/n) (y) =\par
? Percent match (1.00-100.00) (70.00) =\par
\par
Total scoring positions above 70.000 percent = 41\par
Scores 7 7 7 7 6 6 6 6 6 6\par
Positions 428 534 2994 7026 130 191 192 372 427 429\par
? Display (0-41) (0) =4\par
\par
428\par
aaaatatt\par
***** **\par
AAAATTTT\par
1\par
\par
534\par
aaagtttt\par
*** ****\par
AAAATTTT\par
1\par
2994\par
aaaatttc\par
*******\par
AAAATTTT\par
1\par
\par
7026\par
aaaacttt\par
**** ***\par
AAAATTTT\par
\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth 1\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.2\tab Worked example for the percentage match search\par
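\pard\plain \s4\qj\sa120\sl280 \f20 The percentage match search can be sketched as below. This is an illustration under stated assumptions rather than the program's own code. The function names are hypothetical, a position is reported when its score is at least the requested percentage of the string length, and IUB redundancy codes in the search string match any base they represent.\par

```python
# IUB redundancy codes and the bases each one represents.
IUB = dict(A="A", C="C", G="G", T="T", R="AG", Y="CT", W="AT", S="CG",
           M="AC", K="GT", H="ACT", B="CGT", V="ACG", D="AGT", N="ACGT")


def percent_matches(sequence, string, percent):
    """Return (position, score) pairs, positions 1-based."""
    sequence = sequence.upper()
    string = string.upper()
    needed = percent * len(string) / 100.0
    hits = []
    for start in range(len(sequence) - len(string) + 1):
        window = sequence[start:start + len(string)]
        score = sum(1 for base, symbol in zip(window, string)
                    if base in IUB.get(symbol, symbol))
        if score >= needed:
            hits.append((start + 1, score))
    return hits


def alignment_lines(window, string):
    """Matching sequence, asterisks under matches, then the search string."""
    string = string.upper()
    marks = "".join("*" if base.upper() in IUB.get(symbol, symbol) else " "
                    for base, symbol in zip(window, string))
    return [window.lower(), marks, string]
```

\pard\plain \s4\qj\sa120\sl280 \f20 With the string AAAATTTT and a 70 percent cutoff, a window must match at 6 or more of the 8 positions, which is consistent with the scores of 7 and 6 listed in figure 6.2.\par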
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.2\tab Searching for consensus sequences using a score matrix\par
\pard\plain \s4\qj\sa120\sl280 \f20
A score matrix gives a score for the alignment of each possible pair of sequence symbols. The matrix used by this program includes all the IUB redundancy codes and gives scores that represent the level of redundancy. The matrix is shown in figure 6.3.
\par
\pard\plain \s7\qj\fi-560\li560\sb200\sa120\sl280\tx560 \f20 1.\tab Select "Find matches using a score matrix".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par
3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par
4.\tab Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default.\par
5.\tab Accept "This sense". The alternative directs the program to search for the complement of the string. The program displays the maximum possible score for the string.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Score". The search is performed and the results are presented graphically (see figure 6.1). The number of matches is displayed, followed by the scores and positions of the top 10 matches.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab
Define the number of matches to "Display". For the number of matches chosen the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round
to step 3. The dialogue shown in figure 6.2 is almost exactly the same as that for "Searching for consensus sequences using a score matrix".\par
\pard\plain \li1580\ri1560\sb300\sl220\box\brsp100\brdrth \f4\fs16 T C A G - R Y W S M K H B V D N ?\par
\pard \li1580\ri1560\sl220\box\brsp100\brdrth T 36 0 0 0 9 0 18 18 0 0 18 12 12 0 12 9 0\par
C 0 36 0 0 9 0 18 0 18 18 0 12 12 12 0 9 0\par
A 0 0 36 0 9 18 0 18 0 18 0 12 0 12 12 9 0\par
G 0 0 0 36 9 18 0 0 18 0 18 0 12 12 12 9 0\par
- 9 9 9 9 36 18 18 18 18 18 18 27 27 27 27 36 0\par
R 0 0 18 18 18 36 0 9 9 9 9 6 6 12 12 18 0\par
Y 18 18 0 0 18 0 36 9 9 9 9 12 12 6 6 18 0\par
W 18 0 18 0 18 9 9 36 0 9 9 12 6 6 12 18 0\par
S 0 18 0 18 18 9 9 0 36 9 9 6 12 12 6 18 0\par
M 0 18 18 0 18 9 9 9 9 36 0 12 6 12 6 18 0\par
K 18 0 0 18 18 9 9 9 9 0 36 6 12 6 12 18 0\par
H 12 12 12 0 27 6 12 12 6 12 6 36 8 8 8 27 0\par
B 12 12 0 12 27 6 12 6 12 6 12 8 36 8 8 27 0\par
V 0 12 12 12 27 12 6 6 12 12 6 8 8 36 8 27 0\par
D 12 0 12 12 27 12 6 12 6 6 12 8 8 8 36 27 0\par
N 9 9 9 9 36 18 18 18 18 18 18 27 27 27 27 36 0\par
\pard \li1580\ri1560\sl220\keepn\box\brsp100\brdrth ? 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.3\tab The DNA score matrix using IUB symbols\par
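\pard\plain \s4\qj\sa120\sl280 \f20 A search using the score matrix can be sketched as follows. The four rows are copied from figure 6.3, but the function names are hypothetical. We assume that the sequence being searched contains only the plain bases T, C, A and G, and that the maximum possible score for a string is the best score that any such sequence could achieve against it; the program may define the maximum differently.\par

```python
SYMBOLS = "TCAG-RYWSMKHBVDN?"

# Rows for the four plain bases, copied from figure 6.3.
ROWS = dict(
    T="36 0 0 0 9 0 18 18 0 0 18 12 12 0 12 9 0",
    C="0 36 0 0 9 0 18 0 18 18 0 12 12 12 0 9 0",
    A="0 0 36 0 9 18 0 18 0 18 0 12 0 12 12 9 0",
    G="0 0 0 36 9 18 0 0 18 0 18 0 12 12 12 9 0",
)
SCORE = dict((base, dict(zip(SYMBOLS, map(int, row.split()))))
             for base, row in ROWS.items())


def maximum_score(string):
    """Best total any plain-base sequence could score against the string."""
    return sum(max(SCORE[base][symbol] for base in SCORE)
               for symbol in string.upper())


def matrix_matches(sequence, string, cutoff):
    """Return (position, score) pairs, positions 1-based, scoring >= cutoff."""
    sequence = sequence.upper()
    string = string.upper()
    hits = []
    for start in range(len(sequence) - len(string) + 1):
        window = sequence[start:start + len(string)]
        score = sum(SCORE[base][symbol] for base, symbol in zip(window, string))
        if score >= cutoff:
            hits.append((start + 1, score))
    return hits
```

\pard\plain \s4\qj\sa120\sl280 \f20 For the string TYN the best achievable score under this assumption is 36 + 18 + 9 = 63, so a cutoff of 63 reports only those positions where T is followed by a pyrimidine and then any base.\par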
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Using weight matrices for searching nucleotide sequences\par
\pard\plain \s4\qj\sa120\sl280 \f20 A we
ight matrix is the most sensitive way of defining a motif. It is a table of values that gives scores for each base type in each position along a motif. For a motif of length 8 bases the weight matrix would be a table 8 positions long and 4 deep. The simple
st way of choosing the values for the table is to take an alignment of all known examples of the motif and to count the frequency of occurrence of each base type at each position. These frequencies can be used as the table of weights. When the table is use
d to search a new sequence the program calculates a score for each position along the sequence by adding or multiplying (see note 6) the relevant values in the table. All positions that exceed some cutoff score are reported as matching the original set of
motifs.\par
\pard \s4\qj\sa120\sl280 How can we select a suitable cutoff score? The simplest way is to apply the weight matrix to all the known occurrences of the motif - i.e. the set of sequence segments used to create the table - and to see what scores they achieve. The cutoff can be selected accordingly. For convenience the weight matrix is stored as a file along with its cutoff score, a title that is displayed when the file is read, and a few other values needed by the program. A routine for creating weight matrix files from sets of aligned sequences is included in the program. When a search using the weight matrix is performed the program will either list the matching sequence segments or plot their positions as for the other motif search methods.\par
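\pard\plain \s4\qj\sa120\sl280 \f20 The idea of building a weight matrix from aligned examples and scoring a new sequence with it can be sketched as below. This is not the program's implementation. The function names are hypothetical, the scores are sums of natural logarithms of base frequencies, and a pseudocount is added to every count so that a base never seen at a position does not give the logarithm of zero. The pseudocount is our assumption; how the program treats zero counts is not described here. The three example motifs are taken from the aligned sequences listed in figure 6.4.\par

```python
import math

BASES = "ACGT"


def make_weights(aligned, pseudocount=1.0):
    """One table of base frequencies per motif position."""
    weights = []
    for i in range(len(aligned[0])):
        column = [seq[i].upper() for seq in aligned]
        total = len(column) + pseudocount * len(BASES)
        weights.append(dict((b, (column.count(b) + pseudocount) / total)
                            for b in BASES))
    return weights


def log_score(weights, segment):
    """Sum of logs of the frequencies; segment must contain only A, C, G, T."""
    return sum(math.log(w[base]) for w, base in zip(weights, segment.upper()))


def search(weights, sequence, cutoff):
    """Return (position, score, segment) for every position scoring >= cutoff."""
    n = len(weights)
    hits = []
    for start in range(len(sequence) - n + 1):
        segment = sequence[start:start + n]
        score = log_score(weights, segment)
        if score >= cutoff:
            hits.append((start + 1, score, segment))
    return hits


aligned = ["CTCTAGAAGTTTCTAGAG", "TTTTAGAATGTTCTAGAA", "TTCTAGAACATTCGAAGA"]
weights = make_weights(aligned)
# Choose the cutoff from the scores of the known examples themselves.
training_scores = [log_score(weights, seq) for seq in aligned]
cutoff = min(training_scores)
```

\pard\plain \s4\qj\sa120\sl280 \f20 Using the lowest training score as the cutoff guarantees that every known example would be found again; a stricter or looser cutoff trades missed motifs against false matches.\par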
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.3.1\tab Creating a weight matrix file from a set of aligned sequences\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par
2.\tab Select "Make weight matrix".\par
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 3.\tab
Define "Name of aligned sequences file". We assume the file of aligned sequences has already been created (See note 3). The program reads and displays the contents of the file numbering each sequence as it goes. Then it displays the length of the longes
t sequence.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Sum logs of weights". The alternative is to sum the weights when calculating scores (see note 4). \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Accept "Use all motif
positions". The alternative allows the user to define a "mask" which identifies positions within the motif that should be ignored when the matrix is created (see note 5). The program now calculates the weights and applies them in turn to each of the seque
nces in the file. The number and score for each sequence is displayed, followed by the top, bottom and mean scores and the standard deviation. In addition the mean plus and minus 3 standard deviations is displayed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Cutoff score". The default is the mean minus 3 standard deviations, but users may, for example, decide to use the lowest score obtained by the sequences in the file.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Top score for scaling plots". This parameter is used by the graphics output routine when scaling the plots. Its value will influence the height of lines plotted to represent matches.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Position to identify". When a search is performed it is not always appropriate to report the position of a match relative to the leftmost base in the motif. For example, when performing a splice junction search we may want to know the position of the G in the conserved GT, rather than the position of the first base in the matrix. The "Position to identify" allows the user to define which base is marked. The bases in the table are numbered 1, 2, 3 and so on.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define a "Title". This is a title that will be displayed when the matrix file is read prior to performing a search. It is limited to 60 characters.\par
10.\tab Define "Name for new weight matrix file". Give a name for the weight matrix file. Typical dialogue is shown in figure 6.4.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 \page 2.3.2\tab Searching using a weight matrix\par
\pard\plain \s4\qj\sa120\sl280 \f20 Once a weight matrix has been stored in a file it can be used to search any sequence. Results can be displayed graphically or the matching sequence segments can be listed out with their scores.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par
2.\tab Select "Use weight matrix".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Motif weight matrix file". The name of the file containing the weight matrix. The program reads the file and displays its title.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define
"Cutoff score". The default will be the value set when the weight matrix file was created. If the score is negative the program will calculate sums of logs of frequencies, otherwise it will add frequencies.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Accept "Plot results". Alternatively they will be listed.\par
\pard\plain \s4\qj\sa120\sl280 \f20 Typical listed output is shown in figure 6.5.\par
\pard\plain \li1440\ri1500\sb200\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Motif search using weight matrix\par
\pard \li1440\ri1500\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select operation\par
X 1 Use weight matrix\par
2 Make weight matrix\par
3 Rescale weight matrix\par
? Selection (1-3) (1) =2\par
? Name of aligned sequences file=heatshock.seq\par
1 ATAAAGAATATTCTAGAA\par
2 CTCGAGAAATTTCTCTGG 144\par
3 TTCTCGTTGCTTCGAGAG 36\par
4 GCCTCGAATGTTCGCGAA 15\par
5 GACTGGAATGTTCTGACC 45 DROSOPHILA HSP68\par
6 ATCTCGAATTTTCCCCTC 12\par
7 ATCCAGAAGCCTCYAGAA 35 DROSOPHILA HSP83\par
8 CTCTAGAAGTTTCTAGAG 25\par
9 TTCTAGAGACTTCCAGTT 15\par
10 CCCCAGAAACTTCCACGG 147 DROSOPHILA HSP22\par
11 GCGAAGAAAATTCGAGAG 46\par
12 TGCCGGTATTTTCTAGAT 26\par
13 CCCGAGAAGTTTCGTGTC 97 DROSOPHILA HSP23\par
14 TTCCGGACTCTTCTAGAA 13 DROSOPHILA HSP26\par
15 CTCGAGAAAGCTCGCGAA 204 XENOPUS HSP70\par
16 CTCGCGAATCTTCCGCGA 194\par
17 CTCGCGAAAGTTCTTCGG 139\par
18 CTCGGGAAACTTCGGGTC 72\par
19 TGCCAGAAGTTGCTAGCA 124 XENOPUS HSP30\par
20 CTCGGGAACGTCCCAGAA 14\par
21 ATCCCGAAACTTCTAGTT 129 SOYBEAN HSP17\par
22 GTCCAGAATGTTTCTGAA 98\par
23 TTTCAGAAAATTCTAGTT 78\par
24 CCCAAGGACTTTCTCGAA 28\par
25 TTTTAGAATGTTCTAGAA 179 DICTYOSTELIUM DIRS-1\par
26 TTCTAGAACATTCGAAGA 169\par
Length of motif 18\par
? Sum logs of weights (y/n) (y) =\par
? Use all motif positions (y/n) (y) =\par
Applying matrix to input sequences\par
1 -15.609 ATAAAGAATATTCTAGAA\par
2 -15.965 CTCGAGAAATTTCTCTGG\par
3 -18.186 TTCTCGTTGCTTCGAGAG\par
4 -15.331 GCCTCGAATGTTCGCGAA\par
5 -20.897 GACTGGAATGTTCTGACC\par
6 -17.347 ATCTCGAATTTTCCCCTC\par
7 -16.271 ATCCAGAAGCCTCYAGAA\par
8 -12.227 CTCTAGAAGTTTCTAGAG\par
9 -15.933 TTCTAGAGACTTCCAGTT\par
10 -15.604 CCCCAGAAACTTCCACGG\par
11 -17.866 GCGAAGAAAATTCGAGAG\par
12 -17.159 TGCCGGTATTTTCTAGAT\par
13 -16.399 CCCGAGAAGTTTCGTGTC\par
14 -14.646 TTCCGGACTCTTCTAGAA\par
15 -14.801 CTCGAGAAAGCTCGCGAA\par
16 -16.163 CTCGCGAATCTTCCGCGA\par
17 -16.280 CTCGCGAAAGTTCTTCGG\par
18 -15.598 CTCGGGAAACTTCGGGTC\par
19 -17.721 TGCCAGAAGTTGCTAGCA\par
20 -16.257 CTCGGGAACGTCCCAGAA\par
21 -14.243 ATCCCGAAACTTCTAGTT\par
22 -16.456 GTCCAGAATGTTTCTGAA\par
23 -15.453 TTTCAGAAAATTCTAGTT\par
24 -17.443 CCCAAGGACTTTCTCGAA\par
25 -13.335 TTTTAGAATGTTCTAGAA\par
26 -15.914 TTCTAGAACATTCGAAGA\par
Top score -12.227 Bottom score -20.897\par
Mean -16.119 Standard deviation 1.636\par
Mean minus 3.sd -21.028 Mean plus 3.sd -11.210\par
? Cutoff score (-999.00-9999.00) (-21.03) =\par
? Top score for scaling plots (-21.03-999.00) (-11.21) =\par
? Position to identify (0-18) (1) =\par
? Title=Heatshock weights 24-10-91\par
\pard \li1440\ri1500\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth ? Name for new weight matrix file=heatshock.wts\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.4\tab An example run of creating a weight matrix\par
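\pard\plain \s4\qj\sa120\sl280 \f20 The default cutoff offered in figure 6.4 can be checked from the 26 scores listed there. The short sketch below recomputes the mean, the standard deviation and the mean plus and minus 3 standard deviations. It uses the population form of the standard deviation (dividing by the number of sequences), which is our inference from the figure's values and not something the text states.\par

```python
import statistics

# Scores of the 26 aligned sequences, copied from figure 6.4.
SCORES = [
    -15.609, -15.965, -18.186, -15.331, -20.897, -17.347, -16.271,
    -12.227, -15.933, -15.604, -17.866, -17.159, -16.399, -14.646,
    -14.801, -16.163, -16.280, -15.598, -17.721, -16.257, -14.243,
    -16.456, -15.453, -17.443, -13.335, -15.914,
]

mean = statistics.mean(SCORES)
sd = statistics.pstdev(SCORES)      # population standard deviation
default_cutoff = mean - 3 * sd      # offered as the default "Cutoff score"
default_top = mean + 3 * sd         # offered as "Top score for scaling plots"

print("Top score %.3f  Bottom score %.3f" % (max(SCORES), min(SCORES)))
print("Mean %.3f  Standard deviation %.3f" % (mean, sd))
print("Mean minus 3.sd %.3f  Mean plus 3.sd %.3f" % (default_cutoff, default_top))
```

\pard\plain \s4\qj\sa120\sl280 \f20 The values obtained agree closely with those printed in the figure. Note that the lowest score achieved by any sequence in the file (-20.897) lies just above the default cutoff, so both choices of cutoff mentioned in the text would recover all 26 examples.\par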
\pard\plain \li1400\ri1500\sb300\sl220\box\brsp100\brdrth \f4\fs16 Motif search using weight matrix\par
\pard \li1400\ri1500\sl220\box\brsp100\brdrth Select operation\par
X 1 Use weight matrix\par
2 Make weight matrix\par
3 Rescale weight matrix\par
? Selection (1-3) (1) =\par
? Motif weight matrix file=heatshock.wts\par
Heatshock weights 24-10-91\par
? Cutoff score (-9999.00-9999.00) (-21.03) =\par
? Plot results (y/n) (y) =\par
\par
619 -20.84 gctcggaagcttctgctc\par
818 -20.74 ttggcgaagctttcaaag\par
1190 -21.02 gccaggtaagtttcagac\par
1601 -20.91 tttgcgactgttcggtaa\par
2387 -20.24 cgctcgcagattctggac\par
2534 -20.87 gccgagaagatcatcgaa\par
2890 -16.38 ctcccggatgttctggag\par
2989 -19.54 ctcgcgaaaatttctgct\par
3451 -20.76 atcctggaagttccggtt\par
6020 -20.73 tctcaggaactgctggaa\par
6335 -20.51 gctgagaaattccgtgac\par
7107 -20.31 ctctggtctggtcgagaa\par
7117 -19.61 gtcgagaaaatccaggta\par
\pard \li1400\ri1500\sl220\keepn\box\brsp100\brdrth 7892 -20.18 cttccgaaagtgctgcat\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.5\tab Example run of a search using a weight matrix to produce text output.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Using "hardwired" motif searches.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The program contains predefined motif definitions for the following\:\par
\pard \s4\qj\li1120\sa120\sl280 {\i E. coli} promoters\par
prokaryotic ribosome binding sites\par
mRNA splice junctions\par
eukaryotic ribosome binding sites\par
polyadenylation sites\par
\pard \s4\qj\sb240\sa120\sl280 All except the po
lyadenylation site, which is simply defined as an exact match to the string AATAAA, are represented as weight matrices. Each search is performed simply by the user selecting the appropriate option from the menu and each plots its results in its own graphic
s window. The ribosome binding site searches are reading frame specific and so they normally plot their results to fit nicely with the output from the "gene search by content" methods described in the chapter on finding genes. Likewise the splice junction
searches produce separate output for each of the three reading frames. Below, as an example of using the hardwired motifs, we show how to perform such a search.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.4.1\tab Searching for splice junctions\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Splice search using weight matrix". The program automatically reads in weight matrices that define the donor and acceptor sites and displays their titles.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Donor cutoff score". The default is stored in the file.\par
3.\tab Define "Acceptor cutoff score". The default is stored in the file.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4. \tab Accept "Plot results". The alternative lists the results giving the position, score, matching sequence and reading frame. A typical plotted result appears in figure 6.6.\par
\pard\plain \qj\ri-100\sb240\sl480\keepn \f4\fs16 {{\pict\macpict\picw454\pich123
04be00000000007b01c6001102ff0c00fffe0000002d8f9e002d8f9e00000000004e011f000000000001000a00000000004e011f0098802400000000004e011f0000000000000000002d8f9e002d8f9e000000010001000100000000000000000000000000439867000000010000ffffffffffff0001000000000000000000
00004e011f00000000004e011f000002dd0006007fdfff00fc060040df000004060040df0000040a0040e9000020f80000040a0040e9000020f80000040c0040e9000020fa00022000040c0040e9000020fa0002200004110040eb0005200020000080fd0002200004170040fd000001f200071000200020000090fd000220
0004170040fd000001f200071000600020080090fd0002200004170040fd000001f200071000600020080090fd0002200004180040fe00011001f2000712006000200c0090fd000224000406007fdfff00fc060040df0000040a0040ee000008f30000040a0040ee000008f30000040a0040ee000008f30000040a0040ee00
0008f30000040a0040ee000008f30000040a0040ee000008f300000c0a0040ee000008f300000c0a0040ee000008f300000c0e0040ee000008fe000010f700000c180040f6000001fc0002010008fe000010fc000008fd00000c2002400004fd0005400010000001fc000601000808001010fc000008fe0001800c06007fdf
ff00fc060040df000004060040df0000040a0040fc000004e50000040a0040fc000004e50000040c0040fc000004e700020104040c0041fc000004e70002010404100041fc000004fe000008eb0002010404100041fc000004fe000008eb0002010404150041fc000004fe00010814f3000010fb00020104041a014180fd00
0004fe0005081400400040f7000010fb00020904041b02498008fe000904400200081400400040f7000050fb000209040406007fdfff00fc060040df000004060040df000004060040df000004060040df000004060040df0000040a0040fe000010e30000040e0040fe000010f4000001f10000040e0040fe000010f40000
01f10000040e0040fe000010f4000001f10000040e0040fe000018f4000001f1000004180040fe000018f60002080001fb00040800008001fc0000041d04400000081afd000005fc000308080001fb00040800008001fc00000406007fdfff00fc060040df000004060040df0000040a0040f8000008e90000040a0040f800
0008e90000040a0040f8000008e90000040a0040f8000008e90000040e0040f8000008ee000004fd000004140040fa0002400008f6000002fa000004fd000004180040fe000040fe0002400008f6000002fa000004fd000004190040fe000040fe0002400008f600010a02fb000004fd000004220048fe000a402000004000
4801000001fe0006408000000a0202fc000004fd00000406007fdfff00fc060040df000004060040df000004060040df000004060040df000004090040e2000340000004090040e20003400000040c0340000002e50003400000040c0340000002e50003400000040e0340000002e70005080040020004120340000002eb00
0001fe0005080040020004120340080002eb000001fe00050800400200041b044008020280f6000040fd000008fd000001fe000508004002000406007fdfff00fc02dd000000ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.6\tab
Typical graphical output from the hardwired splice junction search. The results are presented in a reading frame specific way\: the bottom three boxes show results for donor sites and the top three boxes those for acceptor sites. In both cases the vertical ordering of the boxes is frame 0 at the bottom, frame 1 in the middle and frame 2 at the top. For example, there is a very strong peak corresponding to an acceptor in frame 1 that can be seen just over halfway along the sequence.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
For this program a motif is a short segment of sequence of fixed length. More complex structures, termed "patterns", which we define as sets of motifs separated by varying gaps, are covered in another chapter. The current chapter should be read before the chapter on patterns.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab It is debatable whether the gain in sensitivity afforded by the use of a score matrix is of value for searching nucleotide sequences; however, it is very important for protein sequences.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
The files of aligned sequences used to make weight matrices have the following format. Each sequence should be on a separate line. The sequence should start in column 2 and is terminated by a new line or a space. Anything after the space is treated as
a comment. The files can be created by previous searches or using an editor.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab The frequencies in the weight matrix can be used in two ways to calculate scores for sequences. Some users prefer to add the frequencies to give a total score, and others to multiply them by summing their logs. If we regard the frequencies as probabilities then multiplication seems the correct procedure. The user chooses which method will be employed when the weight matrix is created; however, the choice can be overridden when the matrix is used. If multiplication is selected then all results will be presented as sums of logs.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Masking th
e weight matrix is particularly useful in cases where a limited number of examples of a motif are available, or when the motif may have several components. In the first case the limited number of examples may make the matrix unrepresentative of the motif b
ecause the bases in the unconserved positions may bias the results of searches. When a large number of examples is available to create the matrix, the unconserved positions should tend towards equal base composition and hence have no influence on the overa
ll score. We stated that a motif might have several components\: for example a motif might have both structural and specificity components. We may want to separate out the two parts and masking provides such a facility.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab
The weight matrix handling routine contains a further option "Rescale weight matrix". If the user has edited a weight matrix to change the frequency values this provides a way of selecting a new cutoff score. It allows users to read in a set of aligned
sequences and a weight matrix and to apply the matrix to the set of sequences to see the range of scores achieved. A new weight matrix file containing the selected cutoff score is written to disk.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab The program also contains a set of routines identical to those used to create and search for nucleotide weight matrices, but which deal instead with dinucleotide weight matrices. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab The reader is reminded that most options in the program, if selected when in "execute without dialogue" mode, will automatically use a set of defaults and produce a
result with little or no user input. Most motif searches require far less user input than that shown above, where we have tried to show the scope of the methods.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab
Although the program contains hardwired motifs we expect most sites that use the programs to accumulate their own libraries of motifs and patterns, which users can employ by simply knowing the names of the corresponding files.\par
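\pard \s4\qj\sa120\sl280 The two scoring methods described in note 4 can be sketched in Python as follows. This is only an illustration and not the program's own code. The aligned sequences are made-up examples, the function names are hypothetical, and the pseudocount used to avoid taking the log of zero is an assumption that the text does not describe.\par

```python
import math

BASES = "ACGT"

def build_frequency_matrix(aligned):
    # One row per motif position, one column per base, in BASES order.
    length = len(aligned[0])
    matrix = [[0] * len(BASES) for _ in range(length)]
    for seq in aligned:
        for pos, base in enumerate(seq.upper()):
            matrix[pos][BASES.index(base)] += 1
    return matrix

def additive_score(matrix, segment):
    # First method: add the raw frequencies of the observed bases.
    return sum(matrix[pos][BASES.index(base)]
               for pos, base in enumerate(segment.upper()))

def log_score(matrix, segment, pseudocount=1.0):
    # Second method: multiply the frequencies by summing their logs.
    # The pseudocount is an assumption made here to avoid log(0).
    total = 0.0
    for pos, base in enumerate(segment.upper()):
        row = matrix[pos]
        prob = (row[BASES.index(base)] + pseudocount) / (sum(row) + pseudocount * len(BASES))
        total += math.log(prob)
    return total

# Made-up aligned examples of a six base motif.
aligned = ["TTGACA", "TTGACT", "TTTACA", "CTGACA"]
matrix = build_frequency_matrix(aligned)
print(additive_score(matrix, "TTGACA"), log_score(matrix, "TTGACA"))
```
\par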
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \page 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1984. Computer methods to locate signals in nucleic acid sequences. {\i Nucl. Acids Res}. {\b 12}\:521-538.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Staden, R. 1985. Computer methods to locate genes and signals in nucleic acid sequences. (in) {\i Genetic Engineering, Principle and Methods, }Setlow J.K., Hollaender A., (eds.), {\b 7}\:
67-114, (Plenum Press, New York).\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4 (1)}\:53-60.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 7. Using Patterns to Analyse Nucleic Acid Sequences\par
\pard\plain \s5\sb200\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Creating a pattern file containing an exact match motif and weight matrix motif.\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.2\tab Searching a sequence using a pattern file\par
2.3\tab Comparing a sequence against a library of patterns\par
2.4\tab Searching sequence libraries for patterns\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
4.\tab References\par
\pard\plain \s5\sb200\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 Here we describe one of the most powerful facilities provided by the program NIP\: the ability to define and search for complex patterns of motifs (1-3).
In another chapter we give details of searching for individual motifs but here we show how to create patterns and libraries of patterns and to use them to search single sequences and sequence libraries. Once a pattern has been defined and stored in a file it can be used to search any sequence. In addition, if users want to routinely screen sequences against libraries of patterns this can be achieved by use of files of file names. The program can produce several alternative forms of output. It will display the segment of sequence matching each individual motif in the pattern, display all the sequence between and including the two outermost motifs, produce a description of the match in the form of an EMBL feature table, or draw a simple graphical plot.\par
\pard \s4\qj\sa120\sl280 At the end of the chapter we describe how a related program NIPL is used to search libraries of sequences to find patterns. NIPL is capable of producing alignments of sequence families.\par
\pard \s4\qj\sa120\sl280 Patterns are defined as sets of motifs with variable spacing. Each motif in a pattern can be defined using any of several methods, and their positions relative to one another are defined in terms of minimum and maximum separations. In addition, by the use of logical operators, each motif can be declared to be essential (the AND operator), optional (the OR operator), or forbidden (the NOT operator). The following methods (termed "classes" by the program) for defining motifs are provided\: 1) exact match to a short sequence; 2) percentage match to a short sequence; 3) match to a short sequence using a score matrix and cutoff score; 4) match to a weight matrix; 5) match to the complement of a weight matrix; 6) inverted repeat or stem-loop; 7) exact match to a short sequence with a defined step; 8) direct repeat. Classes 1, 2, 3 and 7 permit the use of IUB redundancy codes.\par
\pard \s4\qj\sa120\sl280 The motifs in a pattern are numbered sequentially and motif spacing is defined in the following way. When a new motif is added to a pattern the user specifies the "Reference motif" by its number and then a "Relative start po
sition". The "Relative start position" is defined by taking the first base of the "Reference motif" as position 1, the next as 2, and so on. Then the user defines the allowed variation in the spacing by specifying the "Number of extra positions". Notice th
at the position of a motif can be defined relative to any other motif, and that a negative "Relative start position" declares the motif to be to the left of its "Reference motif".\par
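\pard \s4\qj\sa120\sl280 The spacing rule can be illustrated with a short Python sketch. It is not the program's code and the function names are hypothetical. It uses the values from the worked example in figure 7.1 (reference motif found at 505, "Relative start position" 10, "Number of extra positions" 20) and is restricted to positive relative start positions, because the text does not spell out the arithmetic for zero or negative values.\par

```python
def allowed_start_window(reference_start, relative_start, extra_positions):
    # The first base of the reference motif counts as relative position 1.
    first = reference_start + relative_start - 1
    return first, first + extra_positions

def relative_position(reference_start, motif_start):
    # Inverse mapping: sequence position back to a relative position.
    return motif_start - reference_start + 1

# Values taken from figure 7.1: motif 1 at 505, motif 2 found at 528.
window = allowed_start_window(505, 10, 20)
found_at = relative_position(505, 528)
print(window, found_at)
```
\par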
\pard \s4\qj\sa120\sl280 The program calculates and displays the probability of finding each individual motif in the current sequence, the product of the probabilities for all the motifs in a pattern (the "Probability of finding pattern"), and the "Expected number of matches". In addition to the cutoffs used for the individual motifs, users can apply two pattern cutoffs\: "Maximum pattern probability" and "Minimum pattern score".\par
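\pard \s4\qj\sa120\sl280 As a small arithmetic check, the product of the two motif probabilities shown in figure 7.1 can be computed in Python. The function name is hypothetical. The result agrees with the displayed "Probability of finding pattern" to about two significant figures; the small difference is presumably due to rounding of the displayed motif probabilities, which is an assumption. The calculation of the "Expected number of matches" is not attempted here because its details are not given in this chapter.\par

```python
import math

def pattern_probability(motif_probabilities):
    # Product of the individual motif probabilities.
    return math.prod(motif_probabilities)

# Motif probabilities as displayed in figure 7.1.
p = pattern_probability([0.870e-03, 0.117e-02])
print(p)
```
\par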
Below we describe\: how to create a pattern; how to use a pattern file to search a sequence; how to use a "File of pattern file names" to search a sequence for a whole library of
patterns. To describe how to create a pattern file we first show all the steps to make one containing two motifs, and then, to save space, the parts specific to the individual motif types are sketched in the notes section.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2. Methods\par
\pard\plain \s6\fi-560\li560\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Creating a pattern file containing an exact match motif and weight matrix motif.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher".\par
2.\tab Select "Pattern definition mode" as "Use keyboard".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Results display mode" as "Motif by motif". The alternatives are listed in the introduction.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Motif definition mode" as "Exact match".\par
5.\tab Define "Motif name". Each motif can be given an 8 character name.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "String". Type in the sequence of the motif. The program will display the probability of finding the motif.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Select "Motif definition mode" as "Weight matrix".\par
8.\tab Define "Motif name".\par
9.\tab Select "Logical operator" as "AND". The alternatives are "OR" and "NOT".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Select "Number of reference motif". At this stage the only choice is 1 and this is the default.\par
11.\tab Define "Relative start position". The base position relative to the "Reference motif". See the introduction.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 12.\tab Define "Number of extra positions".\par
13.\tab Define "Weight matrix file name". Type the name of the file containing the weight matrix.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The program now cycles round to step 7 and all subsequent passes round the loop to add further motifs to the pattern would differ only in the details for the different motif "classes".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 14.\tab Select "Pattern complete"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 15.\tab Accept "Save pattern in a file". The alternative does not save the pattern and so it can only be used once on the current sequence.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab Define "Pattern definition file". Give a name for the new file.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 17.\tab Define "Pattern title". All patterns can have a 60 character title that can be displayed when the pattern file is read and the sequence searched. The program will now display a detailed textual description of the pattern, the "Probability of finding the pattern" and the "Expected number of matches".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 18.\tab Define "Maximum pattern probability". Note that this is a maximum\: any match with a greater probability of being found will be rejected. If no value is specified the search will be quicker (see notes).\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 19.\tab
Define "Minimum pattern score". A minimum pattern score only makes sense if all the motifs in the pattern are defined with compatible scoring methods. For example percentage matches and weight matrices using sums of logs are incompatible. Searching wil
l now commence and any matches displayed using the chosen method. A worked example of creating such a pattern and performing a search is shown in figure 7.1, and the actual pattern file is shown in figure 7.2.\par
\pard\plain \li1360\ri1300\sb200\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Pattern searcher\par
\pard \li1360\ri1300\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select pattern definition mode\par
X 1 Use keyboard \par
2 Use pattern file \par
3 Use file of pattern file names\par
? Selection (1-3) (1) =\par
Select results display mode\par
X 1 Motif by motif \par
2 Inclusive \par
3 Graphical \par
4 EMBL feature table \par
? Selection (1-4) (1) =\par
Select motif definition mode\par
X 1 Exact match \par
2 Percentage match \par
3 Cut-off score and score matrix \par
4 Cut-off score and weight matrix\par
5 Complement of weight matrix \par
6 Inverted repeat or stem-loop \par
7 Exact match, defined step \par
8 Direct repeat \par
9 Pattern complete \par
? Selection (1-9) (1) =\par
? Motif name=T run\par
? String=TTTTT\par
Probability of score 5.0000 = 0.870E-03\par
Select motif definition mode\par
X 1 Exact match \par
2 Percentage match \par
3 Cut-off score and score matrix \par
4 Cut-off score and weight matrix\par
5 Complement of weight matrix \par
6 Inverted repeat or stem-loop \par
7 Exact match, defined step \par
8 Direct repeat \par
9 Pattern complete \par
? Selection (1-9) (1) =4\par
? Motif name=heat\par
Select logical operator\par
X 1 And \par
2 Or \par
3 Not \par
? Selection (1-3) (1) =\par
? Number of reference motif (1-1) (1) =\par
? Relative start position (-1000-1000) (6) =10\par
? Number of extra positions (0-1000) (0) =20\par
? Weight matrix file name=heatshock.wts\par
Heatshock weights 18-12-90 \par
Probability of score -21.0280 = 0.117E-02\par
Select motif definition mode\par
1 Exact match \par
2 Percentage match \par
3 Cut-off score and score matrix \par
X 4 Cut-off score and weight matrix\par
5 Complement of weight matrix \par
6 Inverted repeat or stem-loop \par
7 Exact match, defined step \par
8 Direct repeat \par
9 Pattern complete \par
? Selection (1-9) (4) =9\par
? Save pattern in a file (y/n) (y) =\par
? Pattern definition file=_paper.pat\par
? Pattern title=demo pattern\par
Pattern description\par
\par
demo pattern \par
Motif 1 named T run is of class 1\par
Which is an exact match to the string\par
TTTTT\par
Motif 2 named heat is of class 4\par
Which is a match to a weight matrix with score -21.028\par
and the 5 prime base can take positions 10 to 30\par
relative to the 5 prime end of motif 1\par
It is anded with the previous motif.\par
Probability of finding pattern = 0.1015E-05\par
Expected number of matches = 0.1734E+00\par
? Maximum pattern probability (0.00-1.00) (1.00) =\par
? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
Working\par
Match\par
505 T run \par
ttttt\par
528 heat \par
ttaaagaaagttttatac\par
Total matches found 1\par
\pard \li1360\ri1300\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Minimum and maximum observed scores -15.34 -15.34\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 7.1\tab Worked example of creating a simple pattern and performing a search.\par
\pard\plain \li2380\ri2520\sb300\sl220\box\brsp100\brdrth \f4\fs16 demo pattern \par
\pard \li2380\ri2520\sl220\box\brsp100\brdrth A1 T run Class \par
TTTTT\par
@ End of string\par
A4 heat Class \par
1 Relative motif\par
10 Relative start position\par
20 Number of extra positions\par
\pard \li2380\ri2520\sl220\keepn\box\brsp100\brdrth heatshock.wts\par
\pard\plain \s8\qj\fi-1140\li1140\sb80\sa120\sl240\tx1140 \f21\fs20 Figure 7.2\tab The pattern file created by the work shown in figure 7.1.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.2\tab Searching a sequence using a pattern file\par
\pard\plain \s7\qj\fi-560\li560\sb160\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Pattern definition mode" as "Use pattern file".\par
3.\tab Select "Results display mode" as "Inclusive"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Pattern definition file". Type the name of the file containing the pattern. The pr
ogram will read the file then display its title, a detailed textual description of the pattern, the "Probability of finding the pattern", and the "Expected number of matches".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Maximum pattern probability". \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Minimum pattern score". Searching will now commence and any matches displayed using the chosen method. A worked example, using the pattern file created in figure 7.1 is shown in figure 7.3.\par
\pard\plain \li1300\ri1320\sb300\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Pattern searcher\par
\pard \li1300\ri1320\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select pattern definition mode\par
X 1 Use keyboard \par
2 Use pattern file \par
3 Use file of pattern file names\par
? Selection (1-3) (1) =2\par
? Pattern definition file=_paper.pat\par
Select results display mode\par
X 1 Motif by motif \par
2 Inclusive \par
3 Graphical \par
4 EMBL feature table \par
? Selection (1-4) (1) =2\par
Probability of score 5.0000 = 0.870E-03\par
Heatshock weights 18-12-90 \par
Probability of score -21.0280 = 0.117E-02\par
\par
Pattern description\par
\par
demo pattern \par
Motif 1 named T run is of class 1\par
Which is an exact match to the string\par
TTTTT\par
Motif 2 named heat is of class 4\par
Which is a match to a weight matrix with score -21.028\par
and the 5 prime base can take positions 10 to 30\par
relative to the 5 prime end of motif 1\par
It is anded with the previous motif.\par
Probability of finding pattern = 0.1015E-05\par
Expected number of matches = 0.1734E+00\par
? Maximum pattern probability (0.00-1.00) (1.00) =\par
? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
Working\par
505 T run \par
tttttgatgcttgactctaagccttaaagaaagttttatac\par
Total matches found 1\par
\pard \li1300\ri1320\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Minimum and maximum observed scores -15.34 -15.34\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 7.3\tab Worked example of using a pattern file as input.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.3\tab Comparing a sequence against a library of patterns\par
\pard\plain \s4\qj\sa120\sl280 \f20
This mode of operation allows a sequence to be searched, in turn, for any number of patterns each stored in a separate pattern file. The names of the files containing the individual patterns must be stored in a simple text file. This file is called "a file
of pattern file names" and its name is the only user input required to define the search.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par
2.\tab Select "Pattern definition mode" as "Use file of pattern file names".\par
3.\tab Select "Results display mode" as "Inclusive"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
Define "File of pattern file names". Type the name of the file containing the list of pattern file names. The program will read the file and then, in turn, all the pattern files it names. Each of these patterns will be compared against the current seque
nce but only those that give matches will produce any output. The pattern title and each match will be displayed.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Searching sequence libraries for patterns\par
\pard\plain \s4\qj\sa120\sl280 \f20
The program NIPL can be used to search sequence libraries for patterns. Its use is similar to the pattern search routine described above, except that it does not have the facility for creating pattern files, so they must be created beforehand using NIP. In
addition to its obvious application of finding new occurrences of patterns or checking on their frequency it is a usef
ul way of obtaining sequence alignments. It can restrict its search to a list of named entries or can search all but those on a list of entries. It can restrict its output to showing the highest scoring match in each sequence, but by default it will show a
ll matches.\par
\pard \s4\qj\sa120\sl280
Of its modes of output, two require further description. The first "Padded sections" creates a new file for each match. The file will contain the sequence between and including the two outermost motifs in the pattern. It will be gapped to the f
urthest extent defined by the pattern, which means that if all the files were subsequently written one above the other all the motifs in the pattern would be exactly aligned, with the sections between them containing the requisite numbers of padding charac
ters. The second such mode of output is called "Complete padded sequences". Here the user must know the maximum distance between the leftmost motif and the start of all the sequences that match. A trial run in which only the positions of matches are report
ed is usually required. The user gives this maximum distance to the program. The program then writes a new file containing the full length of all matching sequences, again maximally gapped (including their left ends) so that they would all align if written
above one another. For both of these modes of output the files created are named "entryname" where "entryname" is the name given to the sequence in the sequence library. These modes are best used with the option "Report all matches" rejected, so that only
the best match for each sequence is reported. The sequences can be lined up using the sequence assembly program SAP.\par
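\pard \s4\qj\sa120\sl280 The idea behind "Padded sections" can be sketched in Python for a pattern of two motifs. This is an illustration only. The function names, the choice of "-" as the padding character and the placement of the padding at the right hand end of the spacer are all assumptions; the text does not say how the program lays them out. The first match is the one from figure 7.3, and the second, with a four base spacer, is hypothetical.\par

```python
def max_gap_length(reference_length, relative_start, extra_positions):
    # Bases between the end of the reference motif and the latest allowed
    # start of the second motif (relative position 1 = first reference base).
    return relative_start + extra_positions - 1 - reference_length

def pad_section(first_motif, gap, second_motif, max_gap, pad_char="-"):
    # Pad the spacer to the widest gap the pattern allows so that the
    # motifs of every match fall in the same columns.
    if len(gap) > max_gap:
        raise ValueError("gap is longer than the pattern allows")
    return first_motif + gap + pad_char * (max_gap - len(gap)) + second_motif

# Pattern from figure 7.1: motif 1 has 5 bases, relative start 10, 20 extra.
max_gap = max_gap_length(5, 10, 20)
observed = pad_section("ttttt", "gatgcttgactctaagcc", "ttaaagaaagttttatac", max_gap)
hypothetical = pad_section("ttttt", "gatc", "ttaaagaaagttttatac", max_gap)
print(observed)
print(hypothetical)
```
\par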
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select NIPL.\par
2.\tab Define "Name for results file."\par
3.\tab Select a library.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
Select "Search whole library". The alternatives are "Search only a list of entries" and "Search all but a list of entries". The files containing the list of entries should contain one entry name per line, left justified.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Results display mode" as "Inclusive". The alternatives include "Motif by motif", "Scores only", "Complete padded sequences" and "Padded sections".\par
6.\tab Accept "Report all matches". The alternative only shows the best match for each sequence.\par
7.\tab Define "Pattern definition file". The name of the file containing the pattern created using NIP. \par
\tab The program displays a textual description of the pattern and the expected number of matches per 1000 residues assuming an average nucleic acid composition.\par
8.\tab Define "Maximum pattern probability". The program will run much more quickly if none is given.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define "Minimum pattern score".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The search will start.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab The "exact match" motif class requires a consensus sequence. The "percentage match" motif class requires a consensus sequence and a cutoff score. The "score matrix" motif class requires a consensus sequence and a cutoff score. The "weight matrix" search and the "complement of a weight matrix" only require the name of the file containing the matrix. The "inverted repeat" or "stem-loop" motif class requires a stem length, minimum and maximum loop sizes, and a cutoff score using scores A-T = G-C = 2, G-T = 1. Note that if the user defines an inverted repeat as a "Reference motif" the "Relative position" can be defined from either its 5' or 3' end. The "direct repeat" motif class requires a repeat length, the minimum and maximum gap between the two occurrences of the repeat, and a minimum score.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab The motif class "Exact match, defined step" is rarely used. A typical use might be to find a start codon followed, for some minimum distance, by no stop codons
in the same reading frame. The step would have the value 3 to keep the reading frame the same as that of the start codon, and the stop codon searches would be included using the NOT operator.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
The details of the probability calculations are outside the scope of this article. They are quite rapid and are essential both for assessing the statistical significance of any matches found and for allowing meaningful cutoffs to be applied to patterns. Obviously, in general, cutoff scores are inappropriate for patterns containing a mixture of motif classes.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
The program calculates the "Probability of finding the pattern" and the "Expected number of matches". The first figure is actually the product of the individual motif probabilities but the latter figure is more useful because it takes into account the a
llowed variation in spacing between motifs and the length of the current sequence. In both cases the composition of the current sequence is also used so that different probabilities would be calculated for other sequences.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab
The pattern definition system is very flexible. Assume that a laboratory has a large library of patterns stored in its computer. Different groups or users may want to screen their sequences against different subsets of a pattern library. Each group ther
efore uses its own "File of pattern file names" which contains only the names of the pattern files that are relevant to their sequences. Of course a pattern may contain only one motif. Hence a library of patterns can include both simple and comp
lex patterns. In the same way a laboratory may have a large library of weight matrices defining different motifs and different users may want to combine them in different ways to produce their own patterns. \par
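\pard \s4\qj\sa120\sl280 The stem scoring scheme quoted in note 1 (A-T = G-C = 2, G-T = 1) can be sketched in Python. This is a minimal illustration and not the program's implementation. The function name is hypothetical, and scoring any other pairing as zero is an assumption, since the text gives no value for mismatches.\par

```python
# Pair scores from note 1; any other pairing scores 0 (an assumption).
PAIR_SCORES = dict(AT=2, TA=2, GC=2, CG=2, GT=1, TG=1)

def stem_score(left_arm, right_arm):
    # The first base of the left arm pairs with the last base of the
    # right arm, the second with the second to last, and so on.
    if len(left_arm) != len(right_arm):
        raise ValueError("both arms of the stem must have the same length")
    total = 0
    for a, b in zip(left_arm.upper(), reversed(right_arm.upper())):
        total += PAIR_SCORES.get(a + b, 0)
    return total

perfect = stem_score("GGGAT", "ATCCC")  # five Watson-Crick pairs
wobble = stem_score("GGGGT", "ATTCC")   # includes two G-T pairs
print(perfect, wobble)
```
\par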
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4(1)}\:53-60.\par
2.\tab Staden, R. 1989. Methods for calculating the probabilities of finding patterns in sequences. {\i CABIOS} {\b 5(2)}\:89-96.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 8. Searching for Restriction Sites\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Search for restriction sites and list them enzyme by enzyme\par
2.2\tab Search for restriction sites and list them by position\par
2.3\tab Search for restriction sites and list their names above the sequence\par
2.4\tab Search for restriction sites and plot their positions\par
2.5\tab Find restriction enzymes that cut infrequently\par
2.6\tab Producing a back translation from a protein sequence\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20
The program NIP contains a routine for finding and displaying the positions of the cut sites of restriction enzyme recognition sequences. Linear or circular sequences can be searched and the results can be listed in various forms or displayed graphically.
The recognition sequences to be searched for can be typed on the keyboard or read from files. The format of these files is given in note 1. At the end of the chapter we also describe how to pro
duce back translations of protein sequences so that these routines can be used to search them for restriction sites.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Search for restriction enzyme sites and list them enzyme by enzyme\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Input source" as "All enzymes file". A number of standard files are available and users may also have their own.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Accept "Search for all names". \par
4.\tab Select "Order results enzyme by enzyme".\par
5.\tab Accept "List matches".\par
6.\tab Accept "The sequence is linear". The alternative is circular.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Accept "Search for definite matches". The alternative is to search for possible matches in a sequence containing IUB redundancy codes.\par
\pard\plain \s4\qj\sa120\sl280 \f20
The results will then appear in the form shown in figure 8.1. Each match is numbered and its enzyme name given, followed by the matching sequence with the cut site indicated by a ' symbol. The position of the cut site is given, followed by the length of the potential fragment ending at that site and then by a list of fragment sizes sorted by length.\par
\pard\plain \li1160\ri1380\sl220\box\brsp100\brdrth \f4\fs16 Matches found= 3\par
\pard \li1160\ri1380\sl220\box\brsp100\brdrth Name Sequence Position Fragment length\par
1 AccII cg'cg 313 312 51\par
2 AccII cg'cg 364 51 188\par
3 AccII cg'cg 552 188 312\par
449 449\par
Matches found= 6\par
Name Sequence Position Fragment length\par
1 AciI cc'gc 503 502 12\par
2 AciI gc'gg 553 50 12\par
3 AciI gc'gg 714 161 50\par
4 AciI gc'gg 872 158 105\par
5 AciI gc'gg 884 12 158\par
6 AciI cc'gc 896 12 161\par
105 502\par
Matches found= 3\par
Name Sequence Position Fragment length\par
1 AcyI gg'cgtc 698 697 5\par
2 AcyI gg'cgtc 765 67 67\par
\pard \li1160\ri1380\sl220\keepn\box\brsp100\brdrth 3 AcyI ga'cgcc 996 231 231\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 8.1\tab Typical output from "List enzyme by enzyme".\par
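\pard\plain \s4\qj\sa120\sl280 \f20 The fragment lengths in figure 8.1 follow directly from the cut positions. The sketch below is not taken from NIP. It assumes a linear sequence, treats each reported position as the first base after the cut, and uses a sequence length of 1000, which is inferred from the figure rather than stated in it. The function name is our own.\par

```python
def fragment_lengths(cut_positions, sequence_length):
    """Return fragment lengths, in sequence order, for a linear sequence.

    Each cut position is taken to be the 1-based number of the first
    base after the cut, which is the convention figure 8.1 appears to use.
    """
    cuts = sorted(cut_positions)
    starts = [1] + cuts                                  # first base of each fragment
    ends = [cut - 1 for cut in cuts] + [sequence_length]  # last base of each fragment
    return [end - start + 1 for start, end in zip(starts, ends)]


# AccII cuts from figure 8.1, assuming a 1000 base sequence
lengths = fragment_lengths([313, 364, 552], 1000)
print(lengths)          # fragments in sequence order
print(sorted(lengths))  # the same fragments sorted on length
```

\pard\plain \s4\qj\sa120\sl280 \f20 For the AccII cuts at 313, 364 and 552 this gives fragments of 312, 51, 188 and 449 bases, which agrees with the two right hand columns of the figure.\par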
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Search for restriction enzyme sites and list them by position\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
2.\tab Select "Input source" as "All enzymes file". \par
3.\tab Accept "Search for all names". \par
4.\tab Select "Order results by position".\par
5.\tab Accept "List matches". \par
6.\tab Accept "The sequence is linear".\par
7.\tab Accept "Search for definite matches". \par
\pard\plain \s4\qj\sa120\sl280 \f20 The results will then appear in the form shown in figure 8.2. Each match is numbered and its enzyme name given, followed by the matching sequence with the cut site indicated by a ' symbol. The position of the cut site is given, followed by the length of the potential fragment ending at that site.\par
\pard\plain \s6\fi-540\li560\sb240\sa60\sl280\tx560 \b\f20 2.3\tab Search for restriction enzyme sites and list their names above the sequence\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
2.\tab Select "Input source" as "All enzymes file". \par
3.\tab Accept "Search for all names". \par
4.\tab Select "Show names above the sequence".\par
5.\tab Reject "Hide translation".\par
6.\tab Accept "Use 1 letter codes".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Line length". This is the number of bases that will appear on each line of output. It must be a multiple of 30. \par
\pard\plain \li1640\ri1720\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Name Sequence Position Fragment length\par
\pard \li1640\ri1720\sl220\box\brsp100\brdrth 1 HapII c'cgg 2 1\par
2 HpaII c'cgg 2 0\par
3 MspI c'cgg 2 0\par
4 MseI t'taa 14 12\par
5 HincII gtt'aac 15 1\par
6 HindII gtt'aac 15 0\par
7 HpaI gtt'aac 15 0\par
8 DsaV 'ccagg 23 8\par
9 EcoRII 'ccagg 23 0\par
10 TspAI 'ccagg 23 0\par
11 ApyI cc'agg 25 2\par
12 BstNI cc'agg 25 0\par
13 MvaI cc'agg 25 0\par
14 ScrFI cc'agg 25 0\par
15 MaeIII 'gttac 47 22\par
16 BsrI actggt' 49 2\par
17 MseI t'taa 55 6\par
18 MaeII a'cgt 63 8\par
19 SfaNI gcatcaacaa'gata 86 23\par
\pard \li1640\ri1720\sl220\keepn\box\brsp100\brdrth 20 MaeII a'cgt 91 5\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 8.2\tab Typical output from "List by position".\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 8.\tab Accept "The sequence is linear".\par
9.\tab Accept "Search for definite matches". \par
\pard\plain \s4\qj\sa120\sl280 \f20 The results will then appear in the form shown in figure 8.3. The sequence is listed with a 3 phase translation underneath and every tenth base numbered. Above the sequence the positions of the cut sites of restriction enzymes are marked.\par
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Search for restriction enzyme sites and plot their positions \par
\pard\plain \s7\qj\fi-560\li560\sa80\sl260\tx560 \f20 1.\tab Select "Search".\par
2.\tab Select "Input source" as "All enzymes file". \par
3.\tab Accept "Search for all names". \par
4.\tab Select "Order results by position".\par
5.\tab Reject "List matches". \par
6.\tab Accept "The sequence is linear".\par
7.\tab Accept "Search for definite matches".\par
\pard\plain \s4\qj\sa80\sl260 \f20 The results will then appear in the form shown in figure 8.4. Each enzyme that has a match is named at the left edge of the display and its cut sites are marked by short
vertical lines. If the display window fills up the bell will ring. Users may then take a screen dump before typing return. The program then displays the message " ? Restart plotting from bottom of frame". To do so type return. To quit type !.\par
\pard\plain \li1200\ri1240\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Search for restriction enzyme sites\par
\pard \li1200\ri1240\sl220\box\brsp100\brdrth Select operation\par
X 1 Search\par
2 List enzyme file\par
3 Clear text\par
4 Clear graphics\par
? Selection (1-4) (1) =\par
Select input source\par
1 All enzymes file\par
X 2 Six cutter file\par
3 Four cutter file\par
4 Personal file\par
5 Keyboard\par
? Selection (1-5) (2) =1\par
? Search for all names (y/n) (y) =\par
Select results display mode\par
X 1 Order results enzyme by enzyme\par
2 Order results by position\par
3 Show only infrequent cutters\par
4 Show names above the sequence\par
? Selection (1-4) (1) =4\par
? Hide translation (y/n) (y) =n\par
? Use 1 letter codes (y/n) (y) =\par
? Line length (30-90) (60) =\par
? The sequence is linear (y/n) (y) =\par
? Search for definite matches (y/n) (y) =\par
\par
HapII\par
HpaII\par
MspI MseI\par
. .HincII\par
. .HindII\par
. .HpaI DsaV\par
. .. EcoRII\par
. .. TspAI\par
. .. . ApyI\par
. .. . BstNI\par
. .. . MvaI\par
. .. . ScrFI MaeIII\par
. .. . . . BsrI MseI\par
ccggttagactgttaacaacaaccaggttttctactgatataactggttacatttaacgc\par
10 20 30 40 50 60\par
P V R L L T T T R F S T D I T G Y I * R\par
R L D C * Q Q P G F L L I * L V T F N A\par
\pard \li1200\ri1240\sl220\keepn\box\brsp100\brdrth G * T V N N N Q V F Y * Y N W L H L T P\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 8.3\tab Typical dialogue and output for a "Names above the sequence" search.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Finding restriction enzymes that cut infrequently\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
2.\tab Select "Input source" as "All enzymes file". \par
3.\tab Accept "Search for all names". \par
4.\tab Select "Show only infrequent cutters".\par
5.\tab Define "Maximum number of cuts".\par
6.\tab Accept "The sequence is linear".\par
\pard\plain \li160\ri200\sl220\keepn\box\brsp100\brdrth \f4\fs16 {{\pict\macpict\picw430\pich254
0b99ffffffff00fd01ad1101a0008201000affffffff00fd01ad090000000000000000310000002400fa01ac9800240000000000b7011f0000000000b7011f0000002400fa01ac000102dd001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001
f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000210000006007fdfff00fc06f5000020ea0006f5000020ea0006f5000020ea0006f5000020ea0006f5000020ea0006f5000020ea0006007f
dfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc040020de00040020de00040020de00040020de00040020de00040020de0006007fdfff00fc1402
000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000210000006007fdfff00fc06fb000004e40006fb000004e40006fb00
0004e40006fb000004e40006fb000004e40006fb000004e40006007fdfff00fc0af8000080fe000080eb000af8000080fe000080eb000af8000080fe000080eb000af8000080fe000080eb000af8000080fe000080eb0006007fdfff00fc040020de00040020de00040020de00040020de00040020de00040020de0006007f
dfff00fc0602000040e0000602000040e0000602000040e0000602000040e0000602000040e00006007fdfff00fc0602000040e0000602000040e0000602000040e0000602000040e0000602000040e00006007fdfff00fc0602000040e0000602000040e0000602000040e0000602000040e0000602000040e00006020000
40e00006007fdfff00fc06eb000040f40006eb000040f40006eb000040f40006eb000040f40006eb000040f40006007fdfff00fc06eb000040f40006eb000040f40006eb000040f40006eb000040f40006eb000040f40006eb000040f40006007fdfff00fc06eb000010f40006eb000010f40006eb000010f40006eb000010
f40006eb000010f40006007fdfff00fc040020de00040020de00040020de00040020de00040020de00040020de0006007fdfff00fc06eb000008f40006eb000008f40006eb000008f40006eb000008f40006eb000008f40006007fdfff00fc06fe000020e10006fe000020e10006fe000020e10006fe000020e10006fe0000
20e10006fe000020e10006007fdfff00fc06fe000020e10006fe000020e10006fe000020e10006fe000020e10006fe000020e10006007fdfff00fc06fe000020e10006fe000020e10006fe000020e10006fe000020e10006fe000020e10006007fdfff00fc06eb000008f40006eb000008f40006eb000008f40006eb000008
f40006eb000008f40006eb000008f40006007fdfff00fc06eb000010f40006eb000010f40006eb000010f40006eb000010f40006eb000010f40006007fdfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fe000008e10006fe000008e10006fe000008e1
0006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fa000080e50006fa000080e50006fa000080e50006fa000080e50006fa000080e50006007fdfff00fc06eb000008f40006eb000008f40006eb000008f40006eb000008f40006eb000008f40006eb000008f40006007fdfff00fc06fe000008e100
06fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc02dd00a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00020000000e00
252c000800140554696d65730300140d00092e0004000001002b010b055472753949a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a000c0000001800252a0a055366614e49a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a0014000000
2000252a08055363724649a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a001c0000002800252a08044d766149a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00260000003200252a0a044d737049a00097a10096000c010000000200
000000000000a1009a0008fffd00000011000001000a002e0000003a00252a08044d736549a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00370000004300252a09064d6165494949a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00
400000004c00252a09054d61654949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00490000005500252a09054d70614949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00510000005d00252a08044d706149a00097a10096000c01
0000000200000000000000a1009a0008fffc00000011000001000a00590000006500252a080648696e644949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00630000006f00252a0a0648696e634949a00097a10096000c010000000200000000000000a1009a0008fffc000000
11000001000a006b0000007700252a080648696e503149a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00750000008100252a0a0548696e3649a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a007d0000008900252a080448686149a0
0097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00870000009300252a0a054861704949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a008f0000009b00252a08054861654949a00097a10096000c010000000200000000000000a1009a00
08fffd00000011000001000a0098000000a400252a090645636f524949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00a1000000ad00252a090745636c31333649a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00a9000000b50025
2a080444736156a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00b2000000be00252a090444646549a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00ba000000c600252a080443666f49a00097a10096000c01000000020000000000
0000a1009a0008fffc00000011000001000a00c3000000cf00252a09054273744f49a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00cc000000d800252a09054273744e49a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00d4000000
e000252a080442737249a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00de000000ea00252a0a084273703134334949a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00e6000000f200252a08054273694c49a00097a10096000c0100
00000200000000000000a1009a0008fffd00000011000001000a00f0000000fc00252a0a0441707949a00097a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 8.4\tab Typical output from "Plot positions".\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 7.\tab Accept "Search for definite matches". \par
\pard\plain \s4\qj\sa120\sl280 \f20 The name and number of cut sites of every enzyme that cuts no more than the "Maximum number of cuts" will then be displayed.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Producing a back translation from a protein sequence \par
\pard\plain \s4\qj\sa120\sl280 \f20
The routine for producing back translations is contained in the program PIP. It back translates protein sequences into DNA using the standard genetic code. The translation can use either the IUB symbols or a set of codon preferences. If a set of codon preferences is used, it must conform to the format of the codon tables produced by the nucleotide interpretation program, and the back translation will contain the favoured codons. If there is no favoured codon for an amino acid, the IUB symbols will be employed for it. The program can plot the redundancy along the sequence and hence can be used to find the best sequences to use as primers. The DNA sequence can be saved to a file and analysed using the nucleotide analysis program. \par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Back translate".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "No codon preference". The alternative will cause the program to ask for "File name of codon table", which should be in the same format as those created by the nucleotide interpretation program.
\par
3.\tab Reject "Plot redundancy". The alternative will ask for a window length to use for the plot. The window length is in codons. A plot will appear in which the best primers are sited at the peaks and the worst at the troughs.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Save DNA to disk".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "File name for DNA sequence". This file can later be read into program NIP and all the searches described above employed.\par
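\pard\plain \s4\qj\sa120\sl280 \f20 As an illustration of back translation with IUB symbols, the sketch below writes, for each codon position, the IUB symbol that covers every base found at that position among the codons for the amino acid in the standard genetic code. This is only one possible scheme and is an assumption on our part: the exact rule used by PIP is not described here, and for leucine, serine and arginine the per position union covers a few codons for other amino acids. All names in the code are our own.\par

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# IUB symbol for each set of bases (bases written in alphabetical order)
SYMBOL = dict(A="A", C="C", G="G", T="T", AG="R", CT="Y", AT="W", CG="S",
              AC="M", GT="K", CGT="B", AGT="D", ACT="H", ACG="V", ACGT="N")


def codons_for(acid):
    """List every codon that encodes the given one letter amino acid code."""
    codons = []
    for index, letter in enumerate(AMINO):
        if letter == acid:
            codons.append(BASES[index // 16] + BASES[(index // 4) % 4] + BASES[index % 4])
    return codons


def back_translate(protein):
    """Back translate a protein into DNA written with IUB redundancy symbols."""
    dna = []
    for acid in protein.upper():
        codons = codons_for(acid)
        if not codons:
            raise ValueError("unknown amino acid: " + acid)
        for position in range(3):
            # union of the bases seen at this codon position
            bases = "".join(sorted(set(codon[position] for codon in codons)))
            dna.append(SYMBOL[bases])
    return "".join(dna)


print(back_translate("MFL"))
```

\pard\plain \s4\qj\sa120\sl280 \f20 With this scheme methionine gives ATG, phenylalanine TTY and leucine YTN.\par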
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \page 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
The file containing the definitions of the restriction enzyme names and their recognition sequences uses the standard IUB redundancy symbols and has the following format. Each name is followed by a /, then each of its recognition sequences is followed by a /. The last recognition sequence for each enzyme is followed by //. The cut sites should be indicated by a '. If the cut site is not contained in the recognition sequence, the recognition sequence should be extended by sufficient N symbols. For example the two lines from the standard file shown below define the enzymes Alw21I and Alw26I. These files are kindly updated each month by Dr. Rich Roberts.\par
\pard \s7\qj\li1720\sa120\sl280\tx1720 Alw21I/GWGCW'C//\par
Alw26I/GTCTCN'NNNN/'NNNNNGATCC//\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab
To search for a subset of the restriction enzymes in a file the user should reject "Search for all names". The program will then ask for the names of the enzymes wanted and extract their recognition sequences from the file. Alternatively, a user who always uses the same subset can create a file containing only those enzymes by editing the standard file. This file would then be selected as "Personal file" for "Input source".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
The routine also allows names and recognition sequences to be entered on the keyboard. This is selected as "Keyboard" for "Input source", and the program will prompt for names and their recognition sequences. In this way the routine can be used to search for exact matches to any short sequence. Again IUB redundancy codes can be used.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab When back translating from proteins it is often useful to produce two back translations, one using a table of codon preferences and one using the IUB symbols. This is because the restriction enzyme search program can distinguish between definite and possible cuts in the sequence. Those matches that the program terms "definite matches" are ones in which the specification of the recognition sequence corresponds exactly to that of the back translation. The program will also find what it terms "possible matches", which are ones that depend on the particular codons chosen for each amino acid. These are sites at which recognition sequences could be engineered to produce a cut in the DNA without changing the amino acid, but which are not necessarily found in the original sequence. \par
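\pard\plain \s4\qj\sa120\sl280 \f20 The file format described in note 1 is simple enough to read with a few lines of code. The sketch below is not part of NIP: it parses one line of an enzyme file and then looks for definite matches in a linear sequence, reporting each cut as the number of the first base after the cut, as in figures 8.1 and 8.2. The function names are our own, and the MseI entry in the example is written by us in the file format to match the site shown in figure 8.2.\par

```python
# Bases covered by each IUB redundancy symbol
IUB = dict(A="A", C="C", G="G", T="T", R="AG", Y="CT", W="AT", S="CG",
           M="AC", K="GT", B="CGT", D="AGT", H="ACT", V="ACG", N="ACGT")


def parse_enzyme_line(line):
    """Split 'Name/SITE1/SITE2//' into a name and a list of (site, cut offset)."""
    body = line.strip()
    if not body.endswith("//"):
        raise ValueError("an enzyme entry must end with //")
    fields = body[:-2].split("/")
    sites = []
    for text in fields[1:]:
        cut = text.index("'")  # number of bases before the cut mark
        sites.append((text.replace("'", "").upper(), cut))
    return fields[0], sites


def find_cuts(seq, sites):
    """Return sorted 1-based cut positions of definite matches in a linear sequence."""
    seq = seq.upper()
    cuts = []
    for pattern, cut in sites:
        for start in range(len(seq) - len(pattern) + 1):
            window = seq[start:start + len(pattern)]
            # a definite match needs every base to be allowed by the symbol
            if all(base in IUB[code] for base, code in zip(window, pattern)):
                cuts.append(start + cut + 1)
    return sorted(cuts)


name, sites = parse_enzyme_line("Alw26I/GTCTCN'NNNN/'NNNNNGATCC//")
print(name, sites)

seq = "ccggttagactgttaacaacaaccaggttttctactgatataactggttacatttaacgc"
print(find_cuts(seq, parse_enzyme_line("MseI/T'TAA//")[1]))
```

\pard\plain \s4\qj\sa120\sl280 \f20 Applied to the sequence of figure 8.3, the MseI entry gives cuts at 14 and 55, as listed in figure 8.2.\par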
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 9. Statistical and Structural Analysis of Nucleotide Sequences\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Calculating the base composition\par
2.2\tab Calculating the dinucleotide composition\par
2.3\tab Calculating the codon composition\par
2.4 \tab Creating a codon usage file\par
2.5\tab Plotting the base composition\par
2.6 \tab Searching for anomalous compositions\par
2.7\tab Search for anomalous word usage\par
2.8\tab Calculate codon constraint\par
2.9 \tab Searching for stem-loops\par
2.10\tab Searching for long range inverted repeats\par
2.11\tab Searching for long range repeats\par
2.12\tab Searching for repeated words\par
2.13\tab Searching for possible Z DNA\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
4.\tab References\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we deal with performing simple statistical and structural analysis of nucleotide sequences and also describe some more unusual test
s. We cover base, dinucleotide and codon compositions, potential amino acid compositions, and the relative frequencies of each base in each position of codons. We describe how to produce plots to show regions of unusual composition and to measure the codon
bias for a gene. In addition we describe a set of functions for finding "structures" in nucleotide sequences, including short range inverted repeats or stem-loops, long range inverted repeats, long range direct repeats, and Z DNA. All the methods are cont
ained in the program NIP.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Calculating the base composition\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Select "Calculate base composition". The composition of the active region is shown.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.2\tab Calculating the dinucleotide composition\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab
Select "Calculate dinucleotide composition". The dinucleotide composition of the active region and an expected dinucleotide composition are shown. The expected composition is calculated from the base composition assuming a random order of bases in the sequence. See figure 9.1.\par
\pard\plain \li1180\ri1440\sb200\sl220\box\brsp100\brdrth \f4\fs16 T C A G\par
\pard \li1180\ri1440\sl220\box\brsp100\brdrth Obs Expected Obs Expected Obs Expected Obs Expected\par
T 5.86 5.97 6.18 5.99 4.24 5.91 8.14 6.56\par
C 6.10 5.99 5.14 6.02 5.91 5.93 7.38 6.59\par
A 5.57 5.91 5.64 5.93 7.91 5.84 5.05 6.49\par
\pard \li1180\ri1440\sl220\keepn\box\brsp100\brdrth G 6.90 6.56 7.56 6.59 6.11 6.49 6.30 7.22\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa200\sl240\tx1140 \f21\fs20 Figure 9.1\tab The dinucleotide composition display\par
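\pard\plain \s4\qj\sa120\sl280 \f20 The expected values in figure 9.1 can be understood as products of the single base frequencies. The sketch below is our own illustration, not the NIP code: it assumes that observed dinucleotides are counted at every overlapping position and that both sets of values are expressed as percentages.\par

```python
from collections import Counter


def dinucleotide_composition(seq):
    """Return (observed, expected) dinucleotide percentages keyed by pair."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two bases")
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_bases = len(seq)
    n_pairs = len(seq) - 1
    observed = dict()
    expected = dict()
    for first in "TCAG":
        for second in "TCAG":
            pair = first + second
            observed[pair] = 100.0 * pair_counts[pair] / n_pairs
            # random order of bases: product of the two base frequencies
            expected[pair] = 100.0 * base_counts[first] * base_counts[second] / (n_bases * n_bases)
    return observed, expected


observed, expected = dinucleotide_composition("AACC")
print(observed["AC"], expected["AC"])
```

\pard\plain \s4\qj\sa120\sl280 \f20 A pair whose observed value is well above or below its expected value is over or under represented relative to a random ordering of the same bases.\par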
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Calculating the codon composition\par
\pard\plain \s4\qj\sa120\sl280 \f20 This function calculates codon counts, amino acid composition, protein molecular weight, hydrophobicity and base compositions. Users select the segments of the sequence to be analysed. The segments can be defined on the keyboard or from an EMBL/GenBank feature table.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate codon composition".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Show observed counts". The alternative displays its codon tables so that the total for each amino acid sums to 100. This makes it easier to see any bias present in the codon usage.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Accept "Define segments using keyboard". The alternative is to use a feature table.\par
4.\tab Define "From". The start of the segment to be analysed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab
Define "To". The end of the segment to be analysed. The results will be displayed as in figure 9.2 and then the program will again ask "From". The user should define a zero value for "From" when all segments of interest have been analysed. The program will then display a cumulative total for all the values it calculates.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The counts are broken down into several figures. Apart from the codon counts we see the base composition by position in codon expressed as a percentage of each base's own frequency; base composition by position in codon expressed as a percentage of the overall base composition of the segment; base composition expected for the observed amino acid composition if there were no codon preference; percentage deviations of the observed amino acid composition from an average amino acid composition (1); the molecular weight and hydrophobicity (2) of the putative amino acid sequence.\par
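\pard\plain \s4\qj\sa120\sl280 \f20 The two tables of base composition by position in codon differ only in what each count is divided by. The sketch below is our own illustration of that difference, not the NIP code. It assumes the segment starts in frame, and it follows the layout of figure 9.2, in which the columns of the first table and the rows of the second table each sum to 100.\par

```python
BASES = "TCAG"


def position_composition(coding_seq):
    """Return (by_base, by_position) percentage tables, indexed [position][base]."""
    seq = coding_seq.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("need at least one complete codon")
    # counts[p][b]: occurrences of base b at codon position p (bases in TCAG order)
    counts = [[0] * 4 for _ in range(3)]
    for i in range(3 * n_codons):
        counts[i % 3][BASES.index(seq[i])] += 1
    by_base = []
    by_position = []
    for p in range(3):
        row = []
        for b in range(4):
            total = counts[0][b] + counts[1][b] + counts[2][b]
            # share of all occurrences of this base that fall at position p
            row.append(100.0 * counts[p][b] / total if total else 0.0)
        by_base.append(row)
        # share of the bases at position p that are each base
        by_position.append([100.0 * c / n_codons for c in counts[p]])
    return by_base, by_position


by_base, by_position = position_composition("ATGATT")
print(by_base[1][0], by_position[2][0])
```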
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Creating a codon usage file\par
\pard\plain \s4\qj\sa120\sl280 \f20 This method writes a file of codon usage in the form of a codon tab
le (see figure 9.2). Such tables can be used by several other methods contained within the programs. If required the user can start with an existing file and add to it.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate a codon table and write it to disk".\par
2.\tab Accept "Start with empty table".\par
\pard\plain \li440\ri500\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Calculate base, codon and amino acid compositions\par
\pard \li440\ri500\sl220\box\brsp100\brdrth ? Show observed counts (y/n) (y) =\par
? Define segments using keyboard (y/n) (y) =\par
\par
? From (0-8134) (0) =1\par
? To (1-8134) (8134) =1000\par
? + strand (y/n) (y) =\par
===========================================\par
F TTT 5. S TCT 7. Y TAT 4. C TGT 2.\par
F TTC 17. S TCC 3. Y TAC 5. C TGC 3.\par
L TTA 3. S TCA 4. * TAA 3. * TGA 1.\par
L TTG 4. S TCG 3. * TAG 0. W TGG 7.\par
===========================================\par
L CTT 3. P CCT 6. H CAT 6. R CGT 3.\par
L CTC 1. P CCC 1. H CAC 4. R CGC 2.\par
L CTA 0. P CCA 4. Q CAA 3. R CGA 1.\par
L CTG 36. P CCG 6. Q CAG 5. R CGG 4.\par
===========================================\par
I ATT 12. T ACT 3. N AAT 6. S AGT 0.\par
I ATC 13. T ACC 5. N AAC 7. S AGC 7.\par
I ATA 1. T ACA 2. K AAA 9. R AGA 0.\par
M ATG 9. T ACG 7. K AAG 3. R AGG 1.\par
===========================================\par
V GTT 6. A GCT 5. D GAT 7. G GGT 9.\par
V GTC 3. A GCC 6. D GAC 6. G GGC 9.\par
V GTA 7. A GCA 2. E GAA 5. G GGA 5.\par
V GTG 9. A GCG 7. E GAG 3. G GGG 3.\par
===========================================\par
Total codons= 333.\par
T C A G\par
1 25.00 34.27 40.28 35.94\par
2 45.42 28.63 36.02 22.27\par
3 29.58 37.10 23.70 41.80\par
----- ----- ----- -----\par
= 100% 100% 100% 100%\par
1 21.32 25.53 25.53 27.63 = 100%\par
2 38.74 21.32 22.82 17.12 = 100%\par
3 25.23 27.63 15.02 32.13 = 100%\par
% 28.43 24.82 21.12 25.63 Observed, overall totals\par
% 29.65 23.25 23.95 23.15 Expected, even codons per acid\par
A C D E F G H I K L\par
20. 5. 13. 8. 22. 26. 10. 26. 12. 47.\par
O-E % -27. -11. -25. -61. 71. 10. 38. 52. -36. 59.\par
M N P Q R S T V W Y\par
9. 13. 17. 8. 11. 24. 17. 25. 7. 9.\par
O-E % 14. -10. 1. -39. -41. 6. -11. 15. 64. -15.\par
\pard \li440\ri500\sl220\keepn\box\brsp100\brdrth Total acids= 329. Molecular weight= 36493. Hydrophobicity= 64.7\par
\pard\plain \s8\qj\fi-1140\li1140\sb80\sa280\sl240\tx1140 \f21\fs20 Figure 9.2\tab A worked example of calculating codon, base and amino acid compositions.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 3.\tab Accept "Show observed counts". The alternative is to have the counts for each amino acid type sum to 100.\par
4.\tab Accept "Define segments using keyboard". The alternative is to use an EMBL/GenBank feature table.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "From". The start of the segment to count over.\par
6.\tab Define "To". The end of the segment.\par
7.\tab Accept "+ strand". Alternatively the minus strand.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The table will appear on the screen and the program will cycle round to step 5. When all segments have been defined a zero v
alue for "From" will instruct the program to display on the screen a table which is the sum of all the individual tables.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Name for codon table file". Give the name of the file in which to save the final table. \par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Plotting the base composition\par
\pard\plain \s4\qj\sa120\sl280 \f20 This function plots the base composition for each "window length" of the sequence. The frequency of any combination of bases can be plotted.\par
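\pard\plain \s4\qj\sa120\sl280 \f20 The calculation behind such a plot can be sketched as follows. This is our own illustration under stated assumptions, not the NIP code: the window has an odd length so that each count belongs to the base at its centre, the window moves on one base at a time, and only every "interval" window is kept for plotting. The function and parameter names are hypothetical.\par

```python
def window_composition(seq, bases="AT", window=11, interval=1):
    """Return (centre position, count) pairs for a sliding window.

    The centre position is 1-based; the count is the number of bases in
    the window that belong to the chosen combination of bases.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    seq = seq.upper()
    half = window // 2
    flags = [1 if base in bases else 0 for base in seq]
    count = sum(flags[:window])  # count for the first window
    points = []
    for centre in range(half, len(seq) - half):
        if centre > half:
            # slide by one base: add the new base, drop the old one
            count += flags[centre + half] - flags[centre - half - 1]
        if (centre - half) % interval == 0:
            points.append((centre + 1, count))
    return points


print(window_composition("AACCAATT", bases="AT", window=3, interval=2))
```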
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot base composition".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select which combination of bases to plot. The default is A+T, but any single base or combination of bases can be used.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
Select "Odd window length". This is the size of the window over which each count is made. It is "odd" so that the plotted point corresponds exactly to the centre of each window. The count is made over the window, the window is then moved on by 1 base, and the count is repeated.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Plot interval". Especially when using long windows it is unnecessary to plot the results for every point along the sequence. A plot interval of 5 means that the value for every fifth point will be plotted. The plot will appear in the form shown in figure 9.3.\par
\pard\plain \ri-100\sb360\sl220\keepn \f4\fs16 {\plain \fs16 {\pict\macpict\picw451\pich82
343affffffff005101c21101a00082a0008c01000affffffff005101c2070000000022000100010000a000a0a100a400020de801000a0000000000000000070001000122004f000100b223000021000101c123000023004e23000021004f0001230000a000a301000affffffff005101c22300b221000101c123004e21004f
0001a000a122003c000100ff2300fb2300fa2300f82300fa2300fb2300fe2300022300012300022301002300002300022300042300ff2300012300002300ff2300fe2300032300022300022300032300012300fd2302022300fe2300022300fe2300fd2300032300fd23000323000223000023000023000023000123000523
00fe2301002300ff2300fe2300ff2300002300012300002300fd2300002300032300022300002300fe2300002300ff2300fd2302002300032300fd2300fe2300fe2300002300002300022300032300022300002300012300002300022300002301012300fd2300022300fe2300002300ff2300fe2300002300032300002300
022300fe2300ff2300fe2300fd2302002300002300fe2300022300002300fe2300022300fe2300022300032300032300ff2300002300fe2300032302ff2300012300032300002300fa2300ff2300012300002300002300002300fb2300002300002300022300002300022301fe2300052300002300032300002300fe2300ff
2300002300fe2300032300ff2300fd2300002300012300ff2302012300032300ff2300fd2300002300062300fe2300022300fe2300ff2300fe2300022300002300fe2300ff2301002300002300012300022300fe2300002300022300fd2300012300fd2300002300022300002300fe2300002302022300012300022300fe23
00032300002300022300fe2300022300fe2300032300022300022300fe2300fe2301002300002300002300002300ff2300fb2300022300fe2300002300002300002300002300fd2300002300032300ff2302002300032300fe2300022300002300fe2300002300022300fe2300002300ff2300002300fe2300002300032302
002300022300fd2300012300032300002300ff2300002300fd2300fc2300022300002300022300fe2300022301012300002300022300032300012300002300022300fe2300ff2300fd2300032300fe2300002300fd2300022302012300002300002300022300fd2300012300002300022300fe2300f8230002230002230000
2300032300002300fd2300002300002300062300002301012300ff2300fe2300002300052300022300002300fc2300002300ff2300002300012300ff2300002300002300fe2302032300fd2300ff2300012300ff2300002300042300ff2300fe2300022300012300002300ff2300002300002301fe2300ff2300fe23000023
00ff2300012300002300002300022300002300012300022300002300002300002302012300022300fe2300002300022300fe2300002300002300002300ff2300fe2300032300022300fd2300002302012300002300fd2300022300fe2300ff2300fe2300022300012300ff2300002300002300002300012300ff2300002301
002300fe2300032300ff2300002300012300032300042300fe2300022300fe2300fe2300022300002300002302fd2300fe2300022300002300fe2300022300fe2300002300022300012300002300002300ff2300002300012301ff2300012300002300002300ff2300002300fe230000230000230002230000230000230000
2300fd2300012302002300022300fd2300fe2300002300ff2300fe2300022300002300012300032300ff2300042300002300002302002300022300fe2300022300fd2300002300fb2300002300002300022300fe2300022300002300012300022300012301002300042300002300002300002300002300002300fe2300fe23
00022300002300002300052300002300002302fe2300022300002300002300fe2300ff2300fc2300002300022300022300002300002300fe2300fe2300022301002300fd2300fe2300002300022300012300022300022300002300002300fe2300022300002300fb2300fe2302ff2300fe2300002300002300032300ff2300
032300012300002300022300fe2300002300022300fe2300002301002300ff2300012300fd2300002300002300002300ff2300002300002300002300fd2300002300002300012300ff2302012300032300002300032300022300022300032300fe2300002300002300ff2300002300fc2300002300002302ff230000230001
2300022300022300002300012300ff2300012300022300012300fd2300ff2300fe2300fe2301ff2300fe2300fd2300fd2300002300fe2300022300052300002300012300022300032300032300ff2300012300002300ff2300fe2300002300002302022300fe2300002300fe2300ff2300012300002300ff23000023000123
00ff2300fd2300002300012300ff2300fd2301fc2300fd2300022300012300002300ff2300002300fd2300032300032300002300fe2300ff2300002300032302022300002300002300012300022300002300002300012300ff2300fe2300032300002300ff2300012300032301002300002300022300002300fe2300ff2300
012300fd2300ff2300fe2300022300002300002300002300fe2302032300022300012300ff2300012300fd2300022300012300022300fb2300ff2300012300002300002300002302002300fd2300022300012300022300fe2300ff2300002300012300002300002300ff2300fd230001230000230000230103230000230000
2300022300002300012300022300fe2300002300002300022300fd2300012300ff2300012302ff2300fe2300022300fe2300ff2300002300012300fd2300002300022300002300002300002300002300002301012300022300fe2300ff2300012300002300022300fe2300002300ff2300012300002300002300ff23000023
02fe2300022300fe2300052300fe2300ff2300032300002300002300002300012300042300fb2300032300002300fd2301002300012300fd2300002300ff2300012300ff2300032300002300fd2300fd2300002300012300002300032302ff2300002300012300052300022300012300fb2300002300ff2300012300fd2300
002300022300fe2300022302012300ff2300fe2300002300032300fd2300002300032300fc2300012300002300032300ff2300012300022301fd2300fe2300ff2300032300fd2300012300fd2300002300002300002300022300012300ff2300032300fe2302032300002300022300002300fe2300fd2300002300ff230001
2300ff2300002300fe2300002300002300032300fd2301ff2300012300ff2300002300032300002300fb2300fd2300032300022300002300002300012300fd2300032302002300022300012300fd2300032300002300ff2300012300022300fe2300002300022300002300002300002300fe2300002300002300ff23000023
02012300ff2300012300002300002300002300fd2300002300002300002300fd2300fe2300022300022300fe2301032300032300ff2300012300022300fe2300002300002300002300ff2300002300fe2300022300012300032302fd2300002300022300012300042300032300fd2300fe2300fe2300002300022300fe2300
002300fc2300012300022301002300012300ff2300002300002300002300002300012300002300002300022300022300002300012300002302ff2300002300fe2300fd2300012300ff2300fe2300002300002300ff2300fe2300ff2300032300fe2300022301fe2300ff2300fe2300fe2300002300022300fe2300042300fe
2300022300002300012300032300022300fe2302002300022300fe2300002300022300fe2300002300ff2300002300fe2300002300022300fe2300002300ff2300012302ff2300002300002300002300012300002300022300002300002300012300022300012300022300002300fe2301002300022300fe2300002300fd23
00ff2300fd2300002300fe2300002300002300fd2300012300042300032302012300002300ff2300002300032300012300002300022300fd2300012300002300022300fe2300ff2300fe2301002300ff2300012300002300fd2300022300002300012300002300fd2300ff2300fe2300022300002300002302032300002300
fe2300022300012300032300ff2300032300002300fe2300022300002300022300002300012300002301ff2300012300ff2300002300012300022300012300002300ff2300002300fd2300002300fe2300fe2300ff2302fd2300002300032300012300002300022300002300002300002300022300fe2300fe2300ff230001
2300022302fe2300022300fe2300002300022300022300002300002300002300fc2300002300022300002300fe2300022301fe2300002300002300022300002300002300002300022300fc2300ff2300012300ff2300002300002300fe2302ff2300fd2300012300002300fb2300ff2300032300fe23000223000223000123
00002300002300032300022300fd2301012300022300fd2300002300002300002300012300022300fe2300002300fd2300022300fe2300022300012300fc2300012300ff2300012300002302022300002300002300002300012300022300012300022300fd2300fe2300022300012300fd2300022300012302002300002300
022300002300fd2300002300012300002300fd2300ff2300012300032300fd2300022300012301022300fe2300022300022300002300fe2300002300002300fd2300012300ff2300fe2300ff2300002300002300fe2302022300032300fe2300032300002300002300002300ff2300fe2300fd230000230002230000230001
2300002301002300ff2300002300012300ff2300012300022300002300002300012300042300032300fb2300022300fe2302002300fe2300fd2300002300ff2300fe2300022300012300002300002300022300002300012300022300022301002300002300fe2300fd2300002300012300002300ff2300fe2300ff23000323
00fe2300022300002300fe2302022300002300fe2300ff2300fe2300032300ff2300002300fe2300002300032300002300ff2300002300012300032302002300002300002300fd2300022300012300002300fd2300002300002300022300fd2300002300012300ff2301fe2300002300002300002300022300012300ff2300
012300022300fd2300fe2300002300002300002300ff2302012300002300ff2300012300022300042300ff2300fe2300002300002300ff2300032300012300002300ff2301002300002300fe2300ff2300032300002300fe2300002300002300032300ff2300032300fe2300002300002302ff230000230000230000230000
2300012300ff2300002300fe2300002300ff2300002300fd2300002300fe2300002301fe2300ff2300002300002300002300052300012300022300002300032300032300002300002300022300fe2302fd2300002300002300002300fb2300052300012300022300fe2300052300002300ff2300032300012300ff2302fd23
00002300012300ff2300fe2300002300002300fb2300022300012300fd2300ff2300002300002300fd2301012300022300fe2300032300022300012300ff2300002300012300ff2300012300002300fd2300fd2300002300002302032300ff2300032300002300fe2300022300fe2300022300012300002300ff2300012300
ff2300002300002300032300fe2300022300022300002301fe2300fe2300002300ff2300fd2300032300002300fe2300fd2300032300022300fe2300032300022300022302002300002300002300fe2300fe2300042300fe2300002300fe2300ff2300fe2300ff2300002300fe2300ff2301002300fe230000230002230003
2300002300012300022300012300ff2300002300fd2300fe2300022300fe2302002300ff2300fc2300002300042300012300032300fd2300002300022300012300022300fe2300ff2300002300012302002300ff2300002300002300012300022300032300fe2300002300002300022300fe2300022300fd2300fe23010023
00002300ff2300fe2300002300022300002300012300ff2300002300012300ff2300fe2300002300ff2302032300fd2300042300ff2300012300022300032300002300fe2300002300ff2300fb2300002300002300022301002300002300012300022300fd2300062300002300002300fe2300ff2300fe2300002300ff2300
012300ff2300012302fd2300022300002300002300012300002300002300002300022300012300002300022300002300fe2300022302002300002300fe2300022300fe2300ff2300002300012300022300fe2300ff2300002300fe2300022300002301012300002300ff2300002300002300012300002300ff2300fe230000
2300002300ff2300002300002300012302002300032300ff2300002300fe2300022300032300002300fe2300fd2300fd2300002300022300012300022301fe2300022300002300002300012300052300ff2300fe2300002300002300fe2300022300fe2300ff2300002300fe23020023000023000223000023000023000023
00012300ff2300032300022300fe2300fe2300002300002300002301002300002300002300022300002300002300fe2300022300022300fe2300002300fe2300ff2300012300022302022300002300002300fe2300022300002300002300002300002300002300002300002300012300002300002302022300fb2300022300
fe2300fd2300012300fd2300fd2300002300fd2300022300002300002300002300002300fe2300022300fc2300002300022300fe2301002300022300022300032300fd2300012300002300022300012300fd2300002300fd2300022300fe2300032302002300002300ff2300002300fe230000230000230000230002230000
2300012300ff2300032300fe2300002301002300002300002300002300ff2300002300012300032300002300022300012300002300ff2300012300ff2302002300fe2300002300ff2300fd2300fc2300022300002300032300fd2300052300fe2300022300012300032301002300022300fe2300022300fd2300002300fe23
00002300002300ff2300fd2300fe2300022300fe2300022300002302002300fe2300022300032300012300022300002300002300fe2300ff2300012300002300fa2300032300002302002300ff2300032300fe2300fd2300002300022300fe2300fe2300022300022300002300012300032300002301ff2300012300002300
022300002300fd2300fd2300002300fe2300032300022300012300002300ff2300002302032300fe2300022300fe2300002300022300fb2300002300022300fe2300032300022300fe2300ff2300fe2301032300ff2300012300fd2300002300002300002300002300022300fe2300022300fd2300032300002300002300fe
2302ff2300002300fc2300042300032300fe2300ff2300fe2300fe2300022300022300002300012300fd2300fe2301022300022300042300002300002300002300002300002300002300002300ff2300012300022300fd2300002302002300fe2300022300002300032300fe2300ff23000123000223000123000223000023
00002300fe2300002302042300fe2300fe2300002300042300fe2300032300ff2300012300002300ff2300012300ff2300002300fc2301002300022300002300fd2300002300002300012300002300022300002300002300fd2300032300002300fe2300ff2302fe2300ff2300012300002300ff2300002300002300012300
002300022300012300022300fb2300072300fc2301042300002300012300002300ff2300fe2300032300fd2300fb2300002300fd2300ff2300002300002300012302ff2300002300012300002300032300002300fd2300ff2300002300002300012300022300002300012300fd2300022300002300032300fe2300002302ff
2300002300fb2300052300002300fe2300002300002300002300002300022300032300fe2300fc2300002300fe2301032300ff2300012300ff2300fb2300002300002300032300022300002300012300032300ff2300012300002302052300fe2300002300ff2300fd2300002300002300002300fe23000223000023000023
00002300002300002301012300ff2300002300fe2300ff2300002300002300012300ff2300002300002300002300002300042300022302fe2300ff2300012300ff2300fe2300ff2300fe2300fd2300fe2300ff2300012300002300022300002300032301032300002300ff2300042300002300002300ff2300fe2300002300
ff2300012300022300fe2300fd2300fd2300002302002300fe2300002300ff2300fb2300012300ff2300032300fe2300022300032300002300002300032300002302042300002300012300002300ff2300002300012300022300fb2300002300032300fd2300002300022300fe2301022300fe230002230000230001230003
2300fd2300fd2300032300ff2300002300002300002300fe2300fe2302052300ff2300002300002300fc2300002300022300002300002300fd2300012300fd2300022300fe2300ff2300fc2301002300fd2300022300fe2300002300ff2300012300002300022300fe2300032300002300ff2300012300ff2302fe23000023
00fd2300ff2300fe2300002300002300002300002300002300022300012300032300022300012301ff2300032300002300022300fe2300fe2300ff2300fd2300002300fe2300fd2300fe2300022300022300002302012300022300012300ff2300042300002300ff2300012300fd2300ff2300fe2300ff2300012300fd2300
fe2302022300fe2300052300022300012300002300002300032300022300002300002300002300fd2300002300002300012301fd2300002300fd2300022300fe2300022300fe2300ff2300fe2300002300002300002300002300022300002302032300002300012300002300022300002300012300fc2300fe2300022300fe
2300ff2300fe2300fe2300022301022300012300ff2300012300002300032300022300002300002300fe2300002300002300022300012300ff2300fb2300022300fe2300022300002302002300fe2300fd2300032300fd2300fe2300ff23000023000123000223000223000323000123000523000523020523000323000123
00ff2300fe2300002300fd2300fd2300002300fe2300002300002300022300012300022300032301022300002300fe2300002300ff2300fe2300fe2300ff2300fe2300ff2300fe2300fd2300002300002300fe2302022300002300012300fd2300002300002300022300fe2300002300022300002300fe2300022300fd2300
012301002300022300052300fe2300fd2300002300012300022300022300032300002300fd2300012300022300012302022300fd2300002300fe2300ff2300012300ff2300fe2300022300032300012300ff2300002300fe2300ff2300fe2301fe2300002300ff2300fd2300032300fd2300002300032300fe2300032300fd
2300ff2300002300012300002302022300012300022300fe2300042300fe2300022300fe2300022300fc2300fd2300ff2300012300fd2300032302022300012300fd2300002300022300012300ff2300012300ff2300002300fb2300022300012300002300022301012300ff2300fe23000223000323000023000023000223
00fe2300fe2300022300022300002300012300022302fe2300ff2300032300002300fe2300002300002300002300002300002300ff2300002300002300fe2300fe2300ff2301002300002300fe2300022300fd2300002300002300032300002300fd2300002300012300ff2300012300002302ff2300002300002300012300
002300ff2300012300022300002300012300ff2300002300002300012300022301022300fc2300022300002300002300002300022300fe2300fd2300002300fd2300012300022300fe2300002302ff2300032300012300002300052300002300002300fd2300032300022300fb2300fe2300002300ff230000230203230000
2300002300fd2300032300022300002300fe2300022300fe2300fe2300022300fd2300002300fe2300002301002300ff2300012300002300ff2300fe2300022300002300fe2300ff2300012300002300002300022300012302ff2300012300022300012300fd2300022300012300002300022300002300fd2300fe2300ff23
00042300ff2300fd2300002300002300fe2300022301032300002300fd2300002300002300032300fe2300052300002300fd2300fe2300032300002300022300002302fe2300002300022300032300002300ff2300032300fe2300ff2300012300022300012300ff2300012300042300fc2301ff2300012300022300002300
fe2300022300022300fe2300002300fe2300022300fe2300fd2300022300fe2302ff2300fc2300002300002300ff2300002300fd2300fe2300fd2300fd2300012300ff2300012300ff2300002302012300042300032300002300fd2300002300042300022300012300042300002300002300032300002300032301fe230000
2300002300002300022300fe2300002300022300fe2300042300002300002300fe2300002300fe2302ff2300fe2300002300ff2300002300fe2300002300002300002300002300002300022300fe2300022300002300032301fe2300002300022300012300fd2300022300012300fd2300ff23000323000023000023000023
00032300002302002300fe2300022300022300fe2300002300022300fe2300fe2300002300042300002300fb2300fd2300002302fe2300052300fe2300002300fd2300002300022300012300022300012300fc2300012300ff2300002300012301022300fe2300022300012300002300ff2300fe2300022300002300012300
002300ff2300002300002300012302042300fe2300fd2300fe2300032300002300002300002300022300002300032300022300002300fe2300002300ff2301002300012300ff2300fc2300fd2300022300002300002300012300002300002300002300002300022300002302022300fc2300fd2300022300002300fd230004
2300ff2300fe2300fd2300002300022300fc2300022300022301fc2300ff2300032300022300012300fd2300002300fd2300012300002300ff2300002300002300012300042302fe2300032300002300002300002300ff2300fe2300002300fe2300002300fd2300ff23000123000223000123000223020023000223000123
00002300022300fe2300022300002300012300fd2300002300002300022300032300022300fe2300fe2300002300002300002301002300022300fd2300012300022300002300002300022300002300fc2300002300002300002300ff2300012302ff2300fe2300002300022300002300002300032300fe2300ff2300032300
022300fe2300002300002300002301fe2300022300022300012300ff2300002300002300012300fd2300022300002300fe2300002300002300022302002300002300002300012300022300fe2300ff2300fc2300002300fd2300022300fd2300002300fc2300042300fc2301022300022300002300fe230002230000230001
2300ff2300012300022300002300032300002300022300fe2302002300fe2300022300fd2300012300ff2300012300002300002300022300002300002300022300012300ff2302fc2300002300002300022300fe2300ff2300032300fd2300002300002300002300002300fd2300012300002301ff23000023000023000323
00fb2300032300032300ff2300002300012300022300002300002300022300012300fd2302002300032300fd2300fe2300022300fe2300022300fe2300022300fe2300002300002300042300fc2300042301002300fe2300002300022300002300002300002300002300012300ff2300012300022300fe2300022300fe2302
022300012300fd2300022300fe2300ff2300032300fb2300032300ff2300fe2300022300fe2300022300032301fe2300032300fd2300002300002300002300022300fd2300002300012300fd2300022300032300012300fd2302032300002300002300002300022300fd2300002300fe2300002300ff2300012300ff230000
2300fe2300022300fe2302032300fd2300022300fe2300fe2300042300002300002300002300002300002300fe2300022300002300002301fe2300fe2300022300022300012300022300002300002300fe2300022300002300002300002300fd2300002302002300002300fc2300022300fe23000223000223000023000023
00002300fe2300022300012300022300fb2301002300fe2300022300002300022300002300012300ff2300012300022300012300002300002300fd2300ff2302002300fe2300fe2300002300002300ff2300002300032300002300002300022300002300fe2300002300032300ff2300012300002300022300002300012302
022300012300fc2300002300002300fe2300ff2300fe2300022300fc2300002300ff2300012300ff2300fe2301ff2300002300fc2300022300fd2300002300012300002300002300052300fd2300002300022300032300fe2302002300002300032300022300002300002300022300fb2300002300012300022300fd230001
2300ff2300002301fd2300012300022300fb2300022300012300002300fd2300022300012300002300002300022300fe2300022300fe2302ff2300002300fe2300002300022300fe2300002300fe2300ff2300012300002300002300002300fc2300012301002300002300002300022300032300fd23000123000423000123
00022300002300002300fe2300ff2300002302fe2300002300002300fe2300002300002300022300022300002300032300fe2300ff2300012300002300022302fd2300002300012300fd2300002300032300002300002300ff2300002300fe2300002300022300002300fe2301002300002300002300032300022300002300
012300ff2300002300002300002300fe2300ff2300002300002300002302002300012300002300002300ff2300002300002300012300002300002300ff2300fe2300022300002300002301fe2300002300fe2300022300002300002300fe2300002300002300042300002300fe2300002300022300002302002300fe230000
2300fb2300022300012300ff2300012300fd2300002300022300fe2300022300fe2300002301032300ff2300052300032300002300032300002300022300002300fe2300022300fe2300fe2300ff2300002302002300002300012300002300002300022300002300002300032300032300002300ff2300002300fe23000223
00002302002300002300fe2300002300ff2300012300002300002300002300022300fd2300fc2300002300022300002301022300fe2300002300002300002300002300002300fe2300fc2300fc2300022300fd2300032300fb2300032302fc2300fe2300022300032300fe2300002300032300042300002300012300022300
012300ff2300fe2300022301fe2300fd2300032300002300032300ff2300032300fe2300fd2300002300002300fd2300022300012300032300ff2300032300fd2300012300002300042302fe2300fd2300002300012300ff2300002300fe2300ff2300fc2300022300022300fe2300fd2300fe2300032302ff230000230001
2300ff2300002300012300022300022300002300012300002300002300002300002300032301ff2300002300fb2300002300002300002300002300fd2300fd2300012300ff2300012300022300fe2300022302032300022300032300002300002300fe2300032300002300042300fe2300fe2300fd23000223000123000223
01fe2300002300002300fd2300022300002300032300fd2300fd2300002300fe2300002300022300fe2300fd2300fe2302ff2300002300012300002300022300002300012300002300002300022300032300fd2300fe2300ff2300fe2301022300002300012300002300ff2300002300032300032300022300012300022300
002300022300002300012302ff2300fe2300fe2300022300fe2300002300fd2300022300012300002300002300042300fc2300022300022302012300002300ff2300002300002300fe2300fe2300022300002300fe2300ff2300002300012300ff2300002300002301002300fe2300002300022300002300fe2300ff230001
2300ff2300012300fd2300fe2300002300ff2300012302042300012300032300002300ff2300fe2300ff2300042300ff2300fd2300fe2300fe2300ff2300032300002301032300ff2300fe2300002300022300002300012300002300002300002300002300022300002300002300002302012300ff2300002300fe2300ff23
00032300fe2300002300002300fd2300002300052300002300fe2300002301ff2300fc2300002300022300fb2300002300002300032300002300002300022300022300012300032300022300022302fe2300022300fe2300022300002300012300002300ff2300012300002300ff2300012300022300012300fd2302002300
ff2300fc2300ff2300002300fe2300ff2300fe2300032300002300002300022300002300012300022301022300012300fd2300fe2300002300002300002300002300002300fd2300ff2300012300002300ff2300002300fe2300032300002300fd2300022302012300002300002300002300022300fe230000230003230000
2300fd2300fd2300fe2300ff2300032300022301002300002300002300002300062300052300002300032300fd2300002300002300002300002300002300fe2300002302ff2300012300022300002300012300022300fe2300032300002300002300002300fd2300002300002300002301ff2300fe2300ff2300002300fc23
00022300002300fe2300022300002300022300002300002300032300fe2302022300002300fe2300032300ff2300012300ff2300002300012300ff2300012300002300ff2300fe2300032302002300ff2300012300ff2300012300002300032300ff2300fe2300ff2300012300ff2300012300ff2300002300fe2301002300
ff2300012300022300fe2300002300002300002300002300fd2300032300022300fd2300002300012302fd2300022300012300ff2300002300fe2300002300022300fc2300022300fe2300002300042300012300002301002300022300012300002300002300002300ff2300fe2300002300022300fd2300012300fd230002
2300012302ff2300002300fe2300fe2300002300ff2300032300fd2300fe2300022300fd2300002300012300002300002302002300002300002300ff2300032300032300fe2300002300042300fe2300022300002300012300022300fe2300002301002300ff2300fe2300022300002300fe23000023000223000023000023
00fe2300022300002300002300012302002300ff2300012300022300fe2300022300002300fe2300002300ff2300002300002300fc2300022300fe2301ff2300012300ff2300fe2300002300002300022300012300002300022300fe2300022300002300fe2300ff2302fe2300ff2300032300032300fd2300012300ff2300
002300032300022300002300fe2300fe2300042300002301fe2300022300fc2300002300022300fe2300ff2300002300fe2300002300ff2300002300002300fe2300022300012302ff2300002300032300032300022300002300002300002300002300002300fc2300002300ff2300fe2300002302002300002300fd230003
2300022300012300002300002300022300fe2300022300022300fc2300022300fe2300002300042300fe2300022300012301022300002300fe2300fd2300022300002300fc2300002300ff2300fd2300002300012300032300002300002302002300002300002300022300022300fe2300002300002300fe23000223000523
00fe2300022300fe2300002300ff2301012300ff2300fe2300002300022300fc2300022300002300002300002300002300022300fe2300002300022302fe2300fe2300fd2300002300002300022300fe2300022300002300fd2300032300012300042300002300002301032300002300012300002300ff2300fe2300002300
002300ff2300fe2300022300fc2300022300022300012302ff2300002300002300012300ff2300012300fd2300fe2300002300042300fe2300fe2300022300fd2300002302012300ff2300012300ff2300fe2300022300002300012300002300022300fe2300002300fd2300032300ff2300002301002300002300002300fe
2300ff2300012300002300002300032300002300ff2300012300ff2300012300002302022300fe2300002300ff2300fe2300022300012300002300ff2300fe2300002300022300fe2300002300002301ff2300fb2300012300022300022300fe2300002300002300022300012300032300022300fd2300012300022302fe23
00022300002300fe2300ff2300012300022300002300fe2300002300002300022300fe2300002300002301022300022300032300fe2300ff2300012300022300012300002300002300ff2300fe2300ff2300012300ff2300fc2302002300002300002300fd2300ff2300012300022300012300002300022300052300fe2300
002300032300002302ff2300032300002300fe2300fd2300002300002300022300002300002300fe2300002300022300012300ff2301012300ff2300fd2300002300002300002300fe2300022300fc2300ff2300fd2300fe2300fe2300042300fc2302002300ff230003230000230005230003230002230000230001230000
2300022300fd2300002300002300002300fb2301002300fe2300ff2300002300032300012300ff2300002300032300022300fe2300032300032300ff2300fe2302022300012300ff2300fd2300012300ff2300fe2300022300012300ff2300002300012300032300fc2300012300ff2300fc2300fd2300fd23000023020023
00fd2300032300002300022300032300002300052300012300002300022300fe2300032300fc2300002301012300fd2300002300fe2300002300022300fd2300012300022300fe2300022300fe2300002300ff2300002302002300fe2300022300002300012300022300fd2300002300002300032300002300fe2300002300
ff2300012300fd2301032300022300022300002300032300002300002300002300012300002300fd2300032300fc2300002300002302002300002300002300012300022300fd2300032300fe2300ff2300012300002300ff2300fe2300002300002301002300002300032300fd2300fe2300022300022300fe2300fe230005
2300fd2300002300022300012300002302ff2300002300fe2300fe2300002300042300002300fc2300022300022300032300032300002300032300002300fe2302002300002300002300fc2300fe2300fd2300fe2300fd2300ff2300012300ff2300002300032300fe2300022301fe2300002300ff23000023000023000023
00fe2300002300002300032300002300002300002300022300012302ff2300002300002300fe2300032300022300fe2300ff2300fe2300022300002300fd2300002300fb2300fd2301002300fd2300012300002300022300fe2300032300032300052300032300022300002300002300022300012302002300ff2300002300
fc2300042300fe2300fe2300ff2300fe2300fa2300fd2300fe2300002300022300002300002301002300012300002300032300ff2300012300ff2300fd2300002300fe2300022300002300002300002300fe2302002300032300002300002300ff2300012300ff2300002300002300032300fe230002230003230000230000
2302002300032300002300ff2300fe2300fe2300fc2300012300022300012300002300022300002300052300002301002300fe2300ff2300fe2300002300002300fe2300ff2300032300022300fe2300002300002300fe2300002302002300002300ff2300002300032300032300ff2300fe2300032300002300002300fd23
00002300002300002300032300002300ff2300002300002300002301012300002300022300fd2300002300fc2300022300fe2300052300022300002300002300002300012300002302002300002300002300ff2300002300002300012300ff2300012300022300fe2300fd2300ff2300fe2300002302fd2300fe2300032300
002300022300fe230002230003230000230008230003230003230005a0008da00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa340\sl240\tx1140 \f21\fs20 Figure 9.3\tab A typical base composition plot. This is an A+T plot for bacteriophage Lambda and shows that one half is A+T rich and the other G+C rich.\par
\pard\plain \s6\sb240\sa100\sl280\tx560\tx860 \b\f20 2.6\tab Searching for anomalous compositions\par
\pard\plain \s4\qj\sa120\sl280 \f20
This "search" is performed by comparing a standard composition against each segment of the sequence and plotting the difference. The difference between the observed and expected composition at each point is expressed as the chi-square value. Any one of the base, dinucleotide or trinucleotide compositions can be used as the standard. No expected level of divergence is assumed, so the program always scales the results so that the plot fills the allotted space on the screen. At the end the observed range is displayed.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot dinucleotide composition differences as chi squared". Alternatively, select the base or trinucleotide option.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Start". Define the position of the first base to be used in the standard.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "End". Define the position of the last base to be used in the standard. The default standard region is the whole sequence.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Odd window length". \par
5.\tab Define "Plot interval".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 9.4.\par
\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw448\pich119
06f6ffffffff007601bf1101a0008201000affffffff007601bf0900000000000000003100000000007501be98002400000000004e012000000000004e011f00000000007501be000102dd0006007fdfff00fc0a0040fc000002e50000040a0040fc000002e50000040a0040fc000003e50000040a0040fc000003e5000004
0a0040fc000003e50000040b0040fc00010380e60000040b0040fc00010280e60000040b0040fc00010280e60000040b0040fc00010240e60000040b0040fc00010240e60000040d0040fc0003027ffff8e80000040d0040fc000302000008e80000040d0040fc000302000008e80000040d0040fc000302000008e8000004
0d0040fc000302000008e80000040d0040fc000302000008e80000040d0040fc000304000008e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040e
0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc00030400
0004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000002e9000102040f0040fc000304000002ea00028002040f0040fc000304000002ea00028002040f0040fc000304000002ea0002a006040f0040fc000304000002ea00
02a00604130040fc000304000002f6000080f60002a00604130040fc000304000002f6000080f60002a00604130040fc000304000002f6000080f60002a00544140040fc000308000002f700010180f60002a00564150040fc000308000002f700010180f70003017005641a0040fc000308000002fd000003fc00010180f8
000420015005441a0040fc000308000002fd000003fc00010140f8000420015005441a0040fc000308000002fd000005fc00010240f8000420015005441e0040fe000620000800000202fe000005fd0002400240f8000430015005841e0040fe000620000800000202fe000005fd0002400240f8000430015005841f0040fe
000620000800000202fe00010480fe0002400240f8000450015005841f0040fe0011200008000001060002020480000001c00240f8000450015005841f0040fe0011300008000001060002020480000001c00240f800045001100584230040fe0011300008000001050002020480000001c00440fd000018fd000450011005
84230040fe0011300008000001050006060480020001a00440fd000014fd00044801080584230040fe0011300008000001050006060480020001200440fd000017fd0004c801080984241540018000500008000001090006060880060001200440fd0009110008800088020809842415400160005000080000010900050608
80050002200420fd0009110008800108020809042523400120005000080000010900050608800500022004200180080021000880010802080904252340012000500008000001110005050880050002200420028008002080088001080208090425234002200050001000000111000905888005040221842002800810208018
a001080208090425234002200050001000000111000905888009040221882002800810608018a00108020a1104250640022000500010fe001990800909884008870211882004401410608015600108020a1004250640021000500010fe001990800989884008950211882004401430808075600105020a1004250640021000
500010fe001990800889484008f90212882004401428808045500105020a1004250640021000500010fe0019a080088948400809020a88200440142880804550010502061004250640021800900010fe0019a080108948400808840a48100440142880804550010704051004250640041800880050fe0019a08010b9504090
08840a4810044012ac8040455101070405a004250640041802880050fe0019a080106050409000840a5010042022c50040835101068405a00425064004084b0800d0fe0019a08010605048900094065010082021430040821281028405a00425064004064d0dffa0fe0019a04010403048d000b40450104831e1030044800e
8900840560042506780406b4020020fe00196040204020555000b804700c48124003004480088900840560042506480805b4020020fe000d40402040203560006804100ca80cfe00087a80085600880040042203444805b0fb000d405fa0000033600048000003b80cfe00080a80005600d000400406007fdfff00fc02dd00
a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 9.4\tab An anomalous composition plot. This shows an immunoglobulin switch region and the plateau corresponds to a segment composed entirely of A and G bases.\par
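\pard\plain \s4\qj\sa120\sl280 \f20 The comparison described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's own code: the function names, the use of overlapping words, and the decision to ignore words that never occur in the standard are all assumptions made for the example. The arguments mirror the prompts above ("Start", "End", "Odd window length" and "Plot interval"), with k set to 1, 2 or 3 for base, dinucleotide or trinucleotide composition.\par

```python
from collections import Counter


def word_counts(seq, k):
    # Count overlapping words of length k (an assumption for this sketch).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def chi_square_profile(seq, k, start, end, window, interval):
    # start and end are 1-based, inclusive positions of the standard region.
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    standard = word_counts(seq[start - 1:end], k)
    total = sum(standard.values())
    half = window // 2
    profile = []
    # Slide a window along the sequence, one plotted point per interval.
    for centre in range(half, len(seq) - half, interval):
        segment = seq[centre - half:centre + half + 1]
        observed = word_counts(segment, k)
        n = sum(observed.values())
        chi = 0.0
        # Words absent from the standard have no expected count and are
        # skipped here (a simplification, not taken from the manual).
        for word, count in standard.items():
            expected = n * count / total
            chi += (observed[word] - expected) ** 2 / expected
        # Report the 1-based position of the window centre.
        profile.append((centre + 1, chi))
    return profile


if __name__ == "__main__":
    sequence = "ACGT" * 5 + "A" * 20
    for position, value in chi_square_profile(sequence, 1, 1, 20, 5, 5):
        print(position, round(value, 3))
```

Since no expected level of divergence is assumed, a plotting routine built on this sketch would scale the vertical axis to the smallest and largest values in the returned profile.\par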
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.7\tab Search for anomalous word usage\par
\pard\plain \s4\qj\sa120\sl280 \f20 This function examines the abundances of short words in a nucleotide sequence to see whether particular ones are under- or over-represented (3). It compares the observed and expected frequencies and plots them for each segment of the sequence. There has been some work on the relative abundances of CG dinucleotides in eukaryotic sequences (e.g. reference 4) and this routine can be used to examine such biases or any others that might be of interest.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot observed-expected word usage".\par
2.\tab Define "String". That is the word to search for. The default is CG.\par
3.\tab Define "Odd window length".\par
4.\tab Define "Plot interval".\par
5.\tab Define "Maximum plot value". Define the maximum expected value for the plot.\par
6.\tab Define "Minimum plot value".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 9.5.\par
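\pard\plain \s4\qj\sa120\sl280 \f20 The calculation behind such a plot might look like the following Python sketch. The manual does not say how the expected frequency is derived, so the example assumes it is the product of the base frequencies within each window multiplied by the number of word positions. It also assumes that the plotted value is observed minus expected, as the menu title suggests. The function name is hypothetical.\par

```python
def word_usage_profile(seq, word, window, interval):
    # Returns (position, observed - expected) for each plotted window.
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    k = len(word)
    half = window // 2
    profile = []
    for centre in range(half, len(seq) - half, interval):
        segment = seq[centre - half:centre + half + 1]
        positions = len(segment) - k + 1
        # Observed: overlapping occurrences of the word in this window.
        observed = sum(1 for i in range(positions) if segment[i:i + k] == word)
        # Expected: assumed here to follow from the window's own base
        # frequencies, treating successive bases as independent.
        expected = float(positions)
        for base in word:
            expected *= segment.count(base) / len(segment)
        # Report the 1-based position of the window centre.
        profile.append((centre + 1, observed - expected))
    return profile


if __name__ == "__main__":
    for position, value in word_usage_profile("CG" * 10, "CG", 5, 5):
        print(position, round(value, 3))
```

In this sketch the "Maximum plot value" and "Minimum plot value" prompts would correspond to the limits of the vertical axis chosen when the returned values are drawn.\par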
\pard\plain \ri-60\sb200\sl220\keepn \f4\fs16 {\plain \fs16 {\pict\macpict\picw453\pich122
0800ffffffff007901c41101a00082a0008c01000affffffff007901c4070000000022000100010000a000a0a100a400020de801000a000000000000000007000100012200770001008a23000021000101c32300002300762300002100770001230000a000a301000affffffff007901c423008a21000101c3230076210077
0001a000a120003b0001003b01c322003b00011a082300022302002300fe2302022300fe2301fe2300022300fe2302022300fe2301022300002302002300002300fe2302022300002301022300002302fc2300042300fe2301fe2300ff2302012300022301002300fd2300002302002300002302002300002301fe23000223
00002302fe2300022301fd2300032300002302fd2300002302fe2300002300002301ff2300002302fe2300002301002300002300022302012300022301002300012302002300002300022302002300002301002300002302002300002301002300012300022302fe2300022302fe2300002301002300022300002302fe2300
022301002300002300fe2302022300002301002300002300022302002300012302002300022301002300002300012302002300022301fe2300002302022300002300002302002300002301002300032302002300002300ff2301012300002302ff2300002301002300002300012302ff2300002302012300002301022300fe
2300002302022300fe2300002301fd2300002302022300002302012300002300ff2301fc2300002302002300ff2301012300002300002302022300fd2302002300fe2301ff2300032302fe2300002300022301002300012302002300002301ff2300002300002302002300012302ff2300012301ff2300fe23000223020123
00002301002300022300002302fe2300022302fd2300fe2300002301022300002302002300002301002300fe2300022302002300012301002300022302fe2300002300002302002300002301002300022302002300002300002301002300002302002300002302002300032300fb2301002300022302002300022301012300
002300022302002300002300002301012300022302012300ff2302002300002300fe2301002300002302002300002301002300022300002302002300fe2302022300012301ff2300002302fe2300ff2300002301fe2300022302fe2300002301ff2300002300002302002300fe230200230000230100230000230002230201
2300ff2300012301022300002302fe2300002302002300002300002301022300fe2302fd2300002301002300002300022302002300fe2301022300012302022300002300fe2302022300002301002300fe2302ff2300002300012301ff2300042302002300002302002300ff2300012301fd2300002302022300002301fe23
00002300002302002300002300002302002300002301002300002302002300ff2300002301fe2300022302fe2300022301fe2300fe2300002302002300022302fe2300022301fe2300ff2302fe2300022300002301002300002302002300012302022300002300fe2301022300fe2302fd2300022301002300fe2300002302
022300002300002301012300ff2302fe2300032302002300ff2300002301012300ff2302fe2300002301002300002300002302ff2300012302002300ff2301002300012300002302fd2300022301002300fe2302032300002300002301022300fe2302022300012302ff2300002300012301ff230000230200230000230000
2301002300012302ff2300002300012302002300ff2301002300002302012300002300022301002300002302002300002301022300002302012300002300002302ff2300012301022300fe23020023000023000023010023000223020023000123020223000123000023010223000023020023000023010023000023000223
02012300002300022301002300002302002300012302002300002300ff2301002300002302002300002301012300ff2300002302002300002302012300002301002300002300002302002300002301ff2300002302002300012300002301ff2300fe2302032300ff2302002300002300002301002300002302002300fe2300
002301022300012302002300002300ff2302002300002301002300002302002300002300012301002300002302002300002302022300fd2301fe2300022300012302ff2300012301002300ff2302002300012300fd2301022300fe2302002300002302022300002300002301fe2300002302fd2300fe2300002301022300fe
2302002300002300022302002300032301ff2300002302002300012300ff2301012300002302032300002301022300022300fb23020123000023020023000023010023000223000023020023000023010223000023020023000023000123020023000023010223000023020023000023000023010023000023000023020023
00012301002300002302ff2300012300002302022300002301fe2300022302002300012300002301042300002302002300012302002300002301002300002300002302002300002301002300ff2302fe2300fe2300002301ff2300002302032300fe2302022300002300fe2301ff2300002302002300002300fe2301002300
002302002300fd2300ff2302002300012301fb2300002302002300fd2300ff2301fe2300032302ff2300002301fe2300fd2300012302ff2300012302fd2300ff2301002300002300fe2302ff2300fe2301002300002302002300fe2300002302002300002301002300ff2302002300fe2300ff2301fe2300fd2300002302fe
2300ff2302fe230000a0008da00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 9.5\tab
A plot of anomalous word usage. This shows a plot of CG usage for the Human CMV immediate-early region. The frequency of CG is much lower than would be expected from the composition.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.8\tab Calculate codon constraint\par
\pard\plain \s4\qj\sa120\sl280 \f20 This method measures the level of constraint imposed on a sequence by coding for a protein. The codon constraint is the difference between the observed codon improbability and the mean improbability for a sequence of the same composition. It is therefore a measure of codon bias, which the program calculates over windows of 99 codons (see reference 5). The user can select segments to analyse either by defining them on the keyboard or by using an EMBL/GenBank feature table. The result for each selected segment, which is simply a single number, is displayed.\par
\pard\plain \s7\qj\fi-560\li560\sa80\sl280\tx560 \f20 1.\tab Select "Calculate codon constraint".\par
2.\tab Accept "Define segments using keyboard".\par
3.\tab Define "From". The start of the segment.\par
4.\tab Define "To". The end of the segment.\par
5.\tab Accept "+ strand".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The result will be displayed, and the program will ask for the next segment to be defined. \par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.9\tab Searching for stem-loop structures\par
\pard\plain \s4\qj\sa120\sl280 \f20 This routine finds simple putative stem-loop structures having a minimum number of base pairs in their stems. Results can be listed or plotted.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search for hairpin loops".\par
2.\tab Define "Minimum loop size".\par
3.\tab Define "Maximum loop size".\par
4.\tab Define "Minimum number of base pairs"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Reject "Plot results". The alternative writes out the stem-loops as shown in figure 9.6. The plotted output marks the position of each stem, the height of the mark showing the length of the stem.\par
\pard\plain \li3480\ri3940\sb200\sl220\box\brsp100\brdrth \f4\fs16 g\par
\pard \li3480\ri3940\sl220\box\brsp100\brdrth g.t\par
t.g\par
c-g\par
a-t\par
t.g\par
t.g\par
g-c\par
t.g\par
g.t\par
g.t\par
t.g\par
t.g\par
g-c\par
t.g\par
tggcga gttttaa\par
\pard \li3480\ri3940\sl220\keepn\box\brsp100\brdrth 843\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 9.6\tab A typical textual display from the routine for finding simple hairpin loops.\par
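\pard\plain \s4\qj\sa120\sl280 \f20 The search described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the code used by the program. The function name find_stem_loops, its parameter names and the 0-based coordinates are all hypothetical. It accepts A-T, G-C and G-T pairs and extends each stem outwards from a candidate loop until the first mismatch.\par
\pard\plain \sl220 \f4\fs16 
```python
# Hypothetical sketch: exact-stem hairpin search allowing A-T, G-C and G-T pairs.
PAIRS = frozenset(["at", "ta", "gc", "cg", "gt", "tg"])


def find_stem_loops(seq, min_loop=3, max_loop=8, min_pairs=4):
    """Return (stem_start, loop_start, loop_size, stem_length) tuples, 0-based."""
    seq = seq.lower()
    n = len(seq)
    hits = []
    for loop_start in range(1, n):
        for loop_size in range(min_loop, max_loop + 1):
            left = loop_start - 1            # last base of the 5' arm
            right = loop_start + loop_size   # first base of the 3' arm
            pairs = 0
            # Extend the stem outwards while the two bases can pair.
            while left >= 0 and right < n and seq[left] + seq[right] in PAIRS:
                pairs += 1
                left -= 1
                right += 1
            if pairs >= min_pairs:
                hits.append((left + 1, loop_start, loop_size, pairs))
    return hits
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 For the sequence gggcaaaagccc the sketch reports a single stem of four base pairs enclosing a loop of four bases.\par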
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.10\tab Searching for long range inverted repeats\par
\pard\plain \s4\qj\sa120\sl280 \f20 This method finds inverted repeats. It allows for no mismatches, insertions or deletions within the matching segments.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find long range inverted repeats".\par
2.\tab Accept "Plot results". The alternative lists all the matching segments.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Start". The beginning of the region to analyse. In general the whole sequence will be analysed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "End".\par
5.\tab Define "Minimum inverted repeat". The length of the minimum match.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The results will now be plotted in an unusual way as shown in figure 9.7 in which the positions of matching segments are joined by rectangular lines.\par
\pard\plain \li100\sb200\sl220\keepn\box\brsp20\brdrth \f4\fs16 {{\pict\macpict\picw445\pich118
0448ffffffff007501bc1101a0008201000affffffff007501bc0900000000000000003100000000007401bb98001e00000000003d00f000000000003d00ec00000000007401bb000102e3000701001fe6ff00c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c0070100
18e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c00a00
7ff1ff00c0f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00e007ff5ff00e0fe000040f60000c00f017818fb0000
01f4ff00f0fb0000c019017818fb000501c1800000e0fe000040fe00017030fb0000c019017818fb000501c1800000e0fe000040fe00017030fb0000c019017818fb000501c1800000e0fe000040fe00017030fb0000c01502781807f7ff00e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040
fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1
800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01102781804fc000001f5ff01f030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c180
0000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01c1678
1804000007ffffe1c1800000f0006000400008007030fb0000c01c167ffffc000007ffffe1c1800000f0006000400008007030fb0000c01c16781804000007ffffe1c1800000f0006000400008007030fb0000c01c16781804000007ffffe1c1800000f0006000400008007030fb0000c01c1678180407fe07ffffe1c18000
00f0006000400008007030fb0000c01c1678180407fe07ffffe1c1800000f0006000400008007030fb0000c01c1678180407fe07ffffe1c1800000f0006000400008007030fb0000c002e300a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl20\tx1140 \f21\fs20 Figure 9.7\tab
A plot of direct or inverted repeats. Each matching segment is joined by a rectangular line. Here we show the direct repeats of at least 25 bases in a mouse immunoglobulin switch region.\par
\pard\plain \s6\sb120\sa40\sl280\tx560\tx860 \b\f20 2.11\tab Searching for long range repeats\par
\pard\plain \s4\qj\sa120\sl260 \f20 This method finds direct repeats. It allows for no mismatches, insertions or deletions within the matching segments.\par
\pard\plain \s7\qj\fi-560\li560\sa80\sl260\tx560 \f20 1.\tab Select "Find long range repeats".\par
2.\tab Accept "Plot results". The alternative lists all the matching segments.\par
\pard \s7\qj\fi-560\li560\sa80\sl260\tx560 3.\tab Define "Start". The beginning of the region to analyse. In general the whole sequence will be analysed.\par
\pard \s7\qj\fi-560\li560\sa80\sl260\tx560 4.\tab Define "End".\par
5.\tab Define "Minimum repeat". The length of the minimum match.\par
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 The results will now be plotted in an unusual way as shown in figure 9.7 in which the positions of matching segments are joined by rectangular lines.\par
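\pard\plain \s4\qj\sa120\sl260 \f20 A minimal sketch of exact repeat finding, for both the direct and the inverted case, is given here. It assumes a simple index of every word of the minimum length. The program's own method is not described in this manual, so the names exact_repeats and reverse_complement are hypothetical. Positions are 1-based and each pair marks the start of two matching segments of the minimum length.\par
\pard\plain \sl220 \f4\fs16 
```python
from collections import defaultdict

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a lower-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def exact_repeats(seq, min_len, inverted=False):
    """Return sorted (pos_a, pos_b) pairs, 1-based, of exact matches of min_len bases."""
    seq = seq.lower()
    index = defaultdict(list)
    for i in range(len(seq) - min_len + 1):
        index[seq[i:i + min_len]].append(i)
    pairs = set()
    for word, positions in index.items():
        if inverted:
            # The partner segment reads as the reverse complement of the word.
            partners = index.get(reverse_complement(word), [])
        else:
            partners = positions
        for i in positions:
            for j in partners:
                if i < j:
                    pairs.add((i + 1, j + 1))
    return sorted(pairs)
```
\par
\pard\plain \s4\qj\sa120\sl260 \f20 Like the routines described above, the sketch allows no mismatches, insertions or deletions. Unlike them it does not merge adjacent matches into longer segments.\par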
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.12\tab Searching for repeated words\par
\pard\plain \s7\qj\sa120\sl260\tx540 \f20 \tab This function can be used to examine the frequencies of repeated words within a sequence. It finds all words that occ
ur more than once. A "word" is a particular sequence of bases so we are dealing only with exact repeats. The user selects a minimum word length and the program finds all words of that length that occur more than once. Then it "follows" each repeated word u
ntil it becomes unique. For each word length it can report the number of different repeated words, the number of occurrences of each word, and their actual sequences and positions.\par
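\pard\plain \s4\qj\sa120\sl260 \f20 The idea can be illustrated with a short Python sketch. It is an assumption-laden illustration and not the program's algorithm. Instead of following each repeated word until it becomes unique, it simply recounts the words at each successive length. The names repeated_words and repeat_summary are hypothetical. Positions are 1-based, as in the program's output.\par
\pard \sl220 {\f22\fs18 
```python
from collections import defaultdict


def repeated_words(seq, word_len):
    """Map each word of word_len bases that occurs more than once to its 1-based positions."""
    seq = seq.lower()
    positions = defaultdict(list)
    for i in range(len(seq) - word_len + 1):
        positions[seq[i:i + word_len]].append(i + 1)
    return dict((word, where) for word, where in positions.items() if len(where) > 1)


def repeat_summary(seq, min_len):
    """Return (length, number of different repeated words) for each length from min_len up."""
    summary = []
    length = min_len
    while True:
        words = repeated_words(seq, length)
        if not words:
            break  # no word of this length is repeated, so no longer word can be
        summary.append((length, len(words)))
        length += 1
    return summary
```
\par
}\pard \s4\qj\sa120\sl260 For the sequence gattacaccgattaca and a minimum word length of 6, the sketch finds two different repeated words of length 6 and one of length 7.\par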
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 1.\tab Select "Examine repeats".\par
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 2.\tab Define "Minimum word length". The maximum expected and observed word lengths are displayed.\par
3.\tab Define "Minimum word length for display of repeated word frequencies". The number of different repeated words of each length is listed.\par
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 4.\tab Define "Minimum frequency for display of repeated words". \par
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 5.\tab Define "Minimum word length for display of repeated words". All words occurring this number of times and of this given word length will be displayed.\par
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 \par
\pard\plain \sl220\box\brsp100\brdrth \f4\fs16 {\f22\fs18 Expected length of longest repeat 12\par
}\pard \sl220\box\brsp100\brdrth {\f22\fs18 ? Minumim word length (1-6) (6) = \par
Working\par
Memory used in bytes 75164. Length of longest repeat 13\par
? Show repeat frequencies for words of at least length (6-13) (13) = 10\par
For length 10 the number of different repeated words is 86\par
For length 11 the number of different repeated words is 21\par
For length 12 the number of different repeated words is 5\par
For length 13 the number of different repeated words is 2\par
? Show repeats for words of length (6-13) (13) = 10\par
? Show repeats for words occuring with frequency (2-9999) (2) = 3\par
aaggcatcat\par
occurs at 276\par
occurs at 969\par
occurs at 6938\par
gtctggcggc\par
occurs at 1891\par
occurs at 4714\par
occurs at 7250\par
? Show repeats for words of length (6-13) (13) = 12\par
? Show repeats for words occuring with frequency (2-9999) (2) = \par
gttactggtggt\par
occurs at 641\par
occurs at 851\par
aaaggcatcatg\par
occurs at 968\par
occurs at 6937\par
aaggcatcatgg\par
occurs at 969\par
occurs at 6938\par
ttactggtggtg\par
occurs at 642\par
occurs at 852\par
ctgctgggccgt\par
occurs at 3477\par
occurs at 6424\par
}\pard \sl220\box\brsp100\brdrth {\f22\fs18 ? Show repeats for words of length (6-13) (13) =!\par
}\pard \sl220 {\f22\fs18 \par
}{\f22\fs20 Figure 9.8 Typical output from "Examine repeats".\par
}\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 \par
2.13\tab Searching for possible Z DNA\par
\pard\plain \s4\qj\sa60\sl260 \f20
The program contains three algorithms for searching for sequences with the potential for forming Z DNA. In varying ways they look for segments of alternating purines and pyrimidines and they all plot their results. A typical result is shown in figure 9.9.
\par
\pard\plain \sl220\keepn \f4\fs16 {{\pict\macpict\picw447\pich119
0512ffffffff007601be1101a0008201000affffffff007601be0900000000000000003100000000007501bd98002400000000004e012000000000004e011f00000000007501bd000102dd0006007fdfff00fc060040df000004060040df000004060040df000004060040df000004060040df000004060040df0000040600
40df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040
df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df0000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001
fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080
f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041702400040f6000001fb000380000004fb000440002000041702400040f6000001fb000380000004fb000440002000041702400040f6000001fb000380000004fb000440002000041702400040f6000001fb00
0380000004fb000440002000041802400040f600010180fc0003c0000004fb000440002000041802400040f600010180fc0003c0000004fb000440002000041802400040f600010180fc0003c0000004fb0004400020000421044000400020fc0005020002000181fc0006c0004004000440fe000440003084142104400040
0020fc0005020002000181fc0006c0004004000440fe0004400030841421044000400020fc0005020002000181fc0006c0004004000440fe0004400030841421044000400020fc0005020002000181fc0006c0004004000440fe0004400030841422044000c00030fc0005020003000281fd0007014000c006000440fe0004
600051843c22044000c00030fc0005020003000281fd0007014000c006000440fe0004600051843c22044000c00030fc0005020003000281fd0007014000c006000440fe0004600051843c23044000c01430fc001903004300028181020042014000c0060146600040006404d3563c23044000c01430fc0019030043000281
81020042014000c0060146600060006404d3563c23044000c01c30fc001903004300028181020042014000c00601e6600060006406d3563c23044000c01c30fc001903004300028181020042014000c00601e6600050006406d3563c23044000c01e28fc0019030062800282818600a3014000c00601e6a00088006406d55e
3c23044000c01628fc0019030062800282818600a3014000c0060156a00088006405d55e3c23044000c01628fc0019030062800282818600a3014000c0060156a00084006405d55e3c20045ffffff7effaff03fefffefefeff02bfff7ffdff095fbfff87fffffddd7ffc060050df000004060060df000004060060df000004
060060df000004060040df00000406007fdfff00fc02dd00a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 9.9\tab A plot of predictions for potential Z DNA containing some high peaks produced by regions of alternating purines and pyrimidines.\par
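\pard\plain \s4\qj\sa120\sl280 \f20 The three algorithms are not specified in this chapter, so the following Python sketch shows only the general idea of scoring alternation between purines and pyrimidines in a sliding window. The function name alternation_scores and the scoring rule (one point for each adjacent pair of bases that switches class) are assumptions made for illustration.\par
\pard\plain \sl220 \f4\fs16 
```python
PURINES = frozenset("ag")


def alternation_scores(seq, window):
    """For each window start, count adjacent bases that switch between purine and pyrimidine."""
    seq = seq.lower()
    # Any character that is not a or g is treated as a pyrimidine in this sketch.
    is_purine = [base in PURINES for base in seq]
    switches = [int(is_purine[i] != is_purine[i + 1]) for i in range(len(seq) - 1)]
    scores = []
    for start in range(len(seq) - window + 1):
        scores.append(sum(switches[start:start + window - 1]))
    return scores
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 A window that alternates perfectly scores window-1, so plotting the scores along the sequence gives peaks over regions of alternating purines and pyrimidines.\par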
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Whenever the program reads a sequence file it always displays the base composition to provide the user with a check on the correctness of the file.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab
The search for anomalous words function operates in the following way. Users select a "word" (say CG) and a window length. The program examines each successive window along the sequence, with each window overlapping the previous one by windowlength-1 bases. For each window position the program calculates the base composition and the number of occurrences of the chosen word. From the base composition it calculates an expected number of occurrences of the chosen word by simply multiplying the relevant frequencies and assuming random ordering. It plots observed minus expected, hence showing regions that are enriched or depleted in the chosen word.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
The codon constraint calculation offers a measure of the codon bias that is independent of any set tables of expected codons. Although some users may find the underlying mathematics difficult to understand, the values obtained provide an interesting measure. It was shown (5) for a set of {\i E. coli} genes that their values of codon constraint correlated with their levels of expression.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab The algorithm for finding possible stem loops counts A-T, G-C and G-T pairs as matching but will only find stems with no mismatches or loopouts.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab The long range inverted and direct repeat routines are fast but only find exact matches. More flexible and exhaustive methods are described in the chapter on sequence comparisons.\par
6.\tab It is also possible to use the pattern searching routines to define and search for inverted and direct repeats. They are particularly useful for finding specific structures - for example tRNA folds.\par
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 7.\tab
It is possible that the "Examine repeats" algorithm may run out of memory, particularly if a short minimum word length is chosen or the sequence is very long or very repetitive. If this occurs the maximum word length reported may not be the longest in t
he sequence\: the memory will have been consumed before it was found.\par
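\pard\plain \s4\qj\sa120\sl280 \f20 The calculation described in note 2 can be sketched in Python as follows. The function name observed_minus_expected is hypothetical, and the expected count is taken here as the product of the window's base frequencies multiplied by the number of positions at which the word could start. That last factor is an assumption, since the note does not state how the product of frequencies is scaled.\par
\pard\plain \sl220 \f4\fs16 
```python
def observed_minus_expected(seq, word, window):
    """Return observed minus expected counts of word for every window position."""
    seq = seq.lower()
    word = word.lower()
    slots = window - len(word) + 1   # possible word start positions in one window
    values = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        observed = sum(1 for i in range(slots) if segment.startswith(word, i))
        # Expected count from the window's own base composition, assuming random order.
        expected = float(slots)
        for base in word:
            expected *= segment.count(base) / window
        values.append(observed - expected)
    return values
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Positive values mark windows enriched in the word and negative values mark windows depleted in it.\par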
\pard\plain \s5\sb320\sa60\sl320\tx560 \b\f20\fs28 \page 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab McCaldon,P. and Argos,P. 1988. Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences. {\i Proteins} {\b 4}, 99-122.\par
2.\tab Sweet,R.M. and Eisenberg,D. 1983. Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. {\i J. Mol. Biol.} {\b 171}, 479-488.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Honess,R.W., Gompels,U.A., Barrell,B.G., Craxton,M., Cameron,K.R., Staden,R., Chang,Y.-N. and Hayward,G.S. 1989. Deviations from expected frequencies of CpG dinucleotides in herpesvirus DNAs may be diagnostic of differences in the states of their latent genomes. {\i J. Gen. Virol.} {\b 70}, 837-855.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Bird,A.P. 1980. DNA methylation and the frequency of CpG in animal DNA. {\i Nucl. Acids Res.} {\b 8}, 1499-1504.\par
5.\tab McLachlan,A.D., Staden,R. and Boswell,D.R. 1984. A method for measuring the non-random bias of a codon usage table. {\i Nucl. Acids Res.} {\b 12}, 9567-9575.\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 10. Translating and Listing Nucleic Acid Sequences\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Listing the sequence with all six reading frames translated\par
2.2\tab Listing the sequence with its open reading frames translated\par
2.3\tab Listing the sequence with defined segments translated\par
2.4\tab Listing the sequence with translated segments defined from a feature table\par
2.5\tab Producing a file of protein sequences for all open reading frames.\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.6\tab Producing a file of protein sequences for segments defined from a feature table\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we deal with producing simple listings from nucleotide sequences. All functions are contained in the program NIP. We can list the sequence alone, in single or double stranded format, or with translations to protein. The translations can be of all six phases, all open reading frames, or of specified segments. The positions of these segments can be defined on the keyboard or read from an EMBL/GenBank feature table. Translations can use the one letter or three letter codes. In addition we can produce files that contain only the protein translations and are suitable for processing by other programs. Again the positions of the translated segments can be defined on the keyboard, read from a feature table, or be all open reading frames. For the user, producing all these results is very simple, so we only give examples of "methods" and show what the results look like. All outputs that list the sequence can be produced from the menu option named "Translate and list".\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Listing the sequence with all six reading frames translated\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par
2.\tab Accept "Show translation".\par
3.\tab Select "The segments to translate will be "All six frames"".\par
4.\tab Accept "Use 1 letter codes".\par
5.\tab Define "Start". Where to list from.\par
6.\tab Define "End". Where to list to.\par
7.\tab Define "Line length". The number of characters in each line of output.\par
8.\tab Reject "Number ends of lines". With this option rejected, the positions are written underneath each line.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The listing will then appear. Given the choices taken it will look the same as figure 10.1.\par
\pard\plain \li1240\ri1280\sb200\sl220\box\brsp100\brdrth \f4\fs16 Q D Y I G H H L N N L Q L D L R T F S L\par
\pard \li1240\ri1280\sl220\box\brsp100\brdrth R I T * D T T * I T F S W T C V H S R W\par
G L H R T P P E * P S A G P A Y I L A\par
caggattacataggacaccacctgaataaccttcagctggacctgcgtacattctcgctg\par
1010 1020 1030 1040 1050 1060\par
gtcctaatgtatcctgtggtggacttattggaagtcgacctggacgcatgtaagagcgac\par
L I V Y S V V Q I V K L Q V Q T C E R Q\par
P N C L V G G S Y G E A P G A Y M R A P\par
S * M P C W R F L R * S S R R V N E S\par
\par
V D P Q N P P A T F W T I N I D S M F F\par
W I H K T P Q P P S G Q S I L T P C S S\par
G G S T K P P S H L L D N Q Y * L H V L\par
gtggatccacaaaaccccccagccaccttctggacaatcaatattgactccatgttcttc\par
1070 1080 1090 1100 1110 1120\par
cacctaggtgttttggggggtcggtggaagacctgttagttataactgaggtacaagaag\par
H I W L V G W G G E P C D I N V G H E E\par
P D V F G G L W R R S L * Y Q S W T R R\par
T S G C F G G A V K Q V I L I S E M N K\par
\par
S V V L G L L F L V L F R S V A K K A T\par
R W C W V C C S W F Y S V A * P K R R P\par
L G G A G S V V P G F I P * R S Q K G D\par
tcggtggtgctgggtctgttgttcctggttttattccgtagcgtagccaaaaaggcgacc\par
1130 1140 1150 1160 1170 1180\par
agccaccacgacccagacaacaaggaccaaaataaggcatcgcatcggtttttccgctgg\par
R H H Q T Q Q E Q N * E T A Y G F L R G\par
P P A P D T T G P K I G Y R L W F P S W\par
E T T S P R N N R T K N R L T A L F A V\par
\par
S G V P G K F Q T A I E L V I G F V N G\par
A V C Q V S F R P R L S W * S A L L M V\par
Q R C A R * V S D R D * A G D R L C * W\par
agcggtgtgccaggtaagtttcagaccgcgattgagctggtgatcggctttgttaatggt\par
1190 1200 1210 1220 1230 1240\par
tcgccacacggtccattcaaagtctggcgctaactcgaccactagccgaaacaattacca\par
A T H W T L K L G R N L Q H D A K N I T\par
R H A L Y T E S R S Q A P S R S Q * H Y\par
\pard \li1240\ri1280\sl220\keepn\box\brsp100\brdrth L P T G P L N * V A I S S T I P K T L P\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 10.1\tab A six phase translation using the 1 letter codes\par
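\pard\plain \s4\qj\sa120\sl280 \f20 The translation behind a listing such as figure 10.1 can be sketched in Python as follows. This is an illustration only, assuming the standard genetic code and a sequence of lower-case a, c, g and t. The names translate and six_frames are hypothetical, and the sketch returns the six protein strings without reproducing the layout of the listing.\par
\pard\plain \sl220 \f4\fs16 
```python
# Standard genetic code, codons ordered with bases t, c, a, g at each position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    (a + b + c, AMINO[16 * i + 4 * j + k])
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
)
COMPLEMENT = str.maketrans("acgt", "tgca")


def translate(seq):
    """Translate complete codons from the start of seq, using X for unknown codons."""
    seq = seq.lower()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return six protein strings: frames 1 to 3 forward, then 1 to 3 of the reverse complement."""
    seq = seq.lower()
    reverse = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (seq, reverse) for offset in range(3)]
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Applied to the first 60 bases shown in figure 10.1, the first string returned is QDYIGHHLNNLQLDLRTFSL, which matches the top line of the figure. The reverse strand translations are returned in their own reading direction, whereas the figure prints them right to left beneath the complementary strand.\par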
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Listing the sequence with its open reading frames translated\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par
2.\tab Accept "Show translation".\par
3.\tab Select "The segments to translate will be "Open reading frames"".\par
4.\tab Define "Minimum open frame in amino acids".\par
5.\tab Accept "Use 1 letter codes".\par
6.\tab Define "Start". Where to list from.\par
7.\tab Define "End". Where to list to.\par
8.\tab Define "Line length". The number of characters in each line of output.\par
9.\tab Select "Both strands"\par
10.\tab Accept "Number ends of lines".\par
\pard\plain \s4\qj\sa120\sl280 \f20 A typical result is shown in figure 10.2.\par
\pard\plain \li720\ri680\sb200\sl220\box\brsp100\brdrth \tx7780 \f4\fs16 Q D Y I G H H L N N L Q L D L R T F S L\par
\pard \li720\ri680\sl220\box\brsp100\brdrth \tx7780 caggattacataggacaccacctgaataaccttcagctggacctgcgtacattctcgctg\tab 1060\par
. \: . \: . \: . \: . \: . \:\par
gtcctaatgtatcctgtggtggacttattggaagtcgacctggacgcatgtaagagcgac\par
L I V Y S V V Q I V K L Q V Q T C E R Q\par
* S S R R V N E S\par
\par
V D P Q N P P A T F W T I N I D S M F F\par
gtggatccacaaaaccccccagccaccttctggacaatcaatattgactccatgttcttc\tab 1120\par
. \: . \: . \: . \: . \: . \:\par
cacctaggtgttttggggggtcggtggaagacctgttagttataactgaggtacaagaag\par
H I W L V G W G G E P C D I N V G H E E\par
T S G C F G G A V K Q V I L I S E M N K\par
\par
S V V L G L L F L V L F R S V A K K A T\par
tcggtggtgctgggtctgttgttcctggttttattccgtagcgtagccaaaaaggcgacc\tab 1180\par
. \: . \: . \: . \: . \: . \:\par
agccaccacgacccagacaacaaggaccaaaataaggcatcgcatcggtttttccgctgg\par
R H H Q T Q Q E Q N * E T A Y G F L R G\par
E T T S P R N N R T K N R L T A L F A V\par
\par
S G V P G K F Q T A I E L V I G F V N G\par
agcggtgtgccaggtaagtttcagaccgcgattgagctggtgatcggctttgttaatggt\tab 1240\par
. \: . \: . \: . \: . \: . \:\par
tcgccacacggtccattcaaagtctggcgctaactcgaccactagccgaaacaattacca\par
A T H W T L K L G R N L Q H D A K N I T\par
L P T G P L N * V A I S S T I P K T L P\par
\par
S V K D M Y H G K S K L I A P L A L T I\par
agcgtgaaagacatgtaccatggcaaaagcaagctgattgctccgctggccctgacgatc\tab 1300\par
. \: . \: . \: . \: . \: . \:\par
tcgcactttctgtacatggtaccgttttcgttcgactaacgaggcgaccgggactgctag\par
A H F V H V M A F A L Q N S R Q G Q R D\par
\pard \li720\ri680\sl220\keepn\box\brsp100\brdrth \tx7780 L T F S M Y W P L L L S I A G S A R V I\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa180\sl240\tx1140 \f21\fs20 Figure 10.2\tab A listing showing the translation of open reading frames from both strands of a sequence from position 1001 to 1300\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Listing the sequence with defined segments translated\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par
2.\tab Accept "Show translation".\par
3.\tab Select "The segments to translate will be "Typed on the keyboard"".\par
4.\tab Accept "Use 1 letter codes".\par
5.\tab Define "Start". Where to list from.\par
6.\tab Define "End". Where to list to.\par
7.\tab Define "Line length". The number of characters in each line of output.\par
8.\tab Select "Both strands".\par
9.\tab Accept "Number ends of lines".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Translate from". Define the start of the next segment to translate - say the next exon.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 11.\tab Define "Translate to". Define the end of the next segment to translate.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 12.\tab Select "Strand". As both strands have been selected above the program will allow either to be translated for each defined segment.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The program will now cycle through steps 10, 11 and 12 until a zero value is defined for "Translate from", at which point the listing will appear. Given the choices made it will look the same as figure 10.2.
\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Listing the sequence with translated segments defined from a feature table\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par
2.\tab Accept "Show translation".\par
3.\tab Select "The segments to translate will be "Read from a feature table"".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Feature table file name". Type the name of the file containing the appropriate feature table in EMBL/GenBank format.\par
5.\tab Define "Operator". This defines which feature table operators should be employed when selecting the segments to translate.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Use 1 letter codes"\par
7.\tab Define "Start". Where to list from.\par
8.\tab Define "End". Where to list to.\par
9.\tab Define "Line length". The number of characters in each line of output.\par
10.\tab Select "Both strands"\par
11.\tab Accept "Number ends of lines".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The program will now read the feature table file and translate the segments defined using the selected operator(s) and the listing will appear as in figure 10.2.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Producing a file of protein sequences for all open reading frames.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and write protein sequences to disk".\par
2.\tab Reject "Translate selected regions". The alternative is "Open reading frames".\par
3.\tab Define "Minimum open frame in amino acids".\par
4.\tab Select "Both strands".\par
5.\tab Define "File name for translation".\par
\pard\plain \s4\qj\sa120\sl280 \f20
A typical results file is shown in figure 10.3. The file is written in FASTA format, that is, each entry consists of an entry name line starting with a > symbol (here the first entry name is 188, the start of the DNA segment), followed by a title (here in EMBL feature table format, giving the start and end of the DNA that produced the protein), followed by the sequence terminated by an *.\par
\pard \s4\qj\sa120\sl280 \par
\pard\plain \sl220 \f4\fs16 {\f22\fs18 \par
}\pard \li1260\ri1360\sl220\box\brsp100\brdrth {\f22\fs18 >188 188..733\par
}\pard \li1260\ri1360\sl220\box\brsp100\brdrth {\f22\fs18 TMEVNKKQLADIFGASIRTIQNWQEQGMPVLRGGGKGNEVLYDSAAVIKWYAERDAEIEN\par
EKLRREVEELRQASEADLQPGTIEYERHRLTRAQADAQELKNARDSAEVVETAFCTFVLS\par
RIAGEIASILDGLPLSVQRRFPELENRHVDFLKRDIIKAMNKAAALDELIPGLLSEYIEQ\par
SG*\par
>711 711..2633\par
VNISNSQVNRLRHFVRAGLRSLFRPEPQTAVEWADANYYLPKESAYQEGRWETLPFQRAI\par
MNAMGSDYIREVNVVKSARVGYSKMLLGVYAYFIEHKQRNTLIWLPTDGDAENFMKTHVE\par
PTIRDIPSLLALAPWYGKKHRDNTLTMKRFTNGRGFWCLGGKAAKNYREKSVDVAGYDEL\par
AAFDDDIEQEGSPTFLGDKRIEGSVWPKSIRGSTPKVRGTCQIERAASESPHFMRFHVAC\par
PHCGEEQYLKFGDKETPFGLKWTPDDPSSVFYLCEHNACVIRQQELDFTDARYICEKTGI\par
WTRDGILWFSSSGEEIEPPDSVTFHIWTAYSPFTTWVQIVKDWMKTKGDTGKRKTFVNTT\par
LGETWEAKIGERPDAEVMAERKEHYSAPVPDRVAYLTAGIDSQLDRYEMRVWGWGPGEES\par
WLIDRQIIMGRHDDEQTLLRVDEAINKTYTRRNGAEMSISRICWDTGGIDPTIVYERSKK\par
HGLFRVIPIKGASVYGKPVASMPRKRNKNGVYLTEIGTDTAKEQIYNRFTLTPEGDEPLP\par
GAVHFPNNPDIFDLTEAQQLTAEEQVEKWVDGRKKILWDSKKRRNEALDCFVYALAALRI\par
SISRWQLDLSALLASLQEEDGAATNKKTLADYARALSGEDE*\par
>74 complement(74..727)\par
LFDIFTQQPRYQFIQRGCFVHGFDDIPFQEINMSVFQFRKTPLHRQGEPVENTGNFTCDP\par
RQHESTECGFHHFSGVSGILQFLCVGLRTRKSMAFVLNSSWLEICLAGLPQFFNLPAQLF\par
VLNFSIPFGIPFYDGGRVIKHLITLATASQNGHSLFLPVLNGTDTRTENVSQLLFVDFHC\par
SFHGQKQRKETTEAKKPRFQHLSFPFFSEGILNKNIKL*\par
>313 complement(313..732) \par
PDCSIYSLSNPGISSSSAAALFMALMISRFRKSTCRFSSSGKRRCTDRGSPSRILAISPA\par
IRDSTKVQNAVSTTSAESLAFFSSCASACARVSRWRSYSIVPGWRSASLACRSSSTSRRS\par
}\pard \li1260\ri1360\sl220\box\brsp100\brdrth {\f22\fs18 FSFSISASLSAYHFMTAAES*\par
}\pard \li1260\ri1360\sl220 {\f22\fs18 \par
}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 10.3\tab The contents of a file containing the protein sequences of the open reading frames found by the program\par
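\pard\plain \s4\qj\sa120\sl280 \f20 A file of this kind could be produced along the lines of the following Python sketch. It is a hedged illustration and not the program's code. It assumes the standard genetic code, treats an open reading frame as the codons preceding a stop codon (back to the previous stop or the start of the strand), excludes the stop codon from the reported range, and ignores frames that run off the end of the sequence. The names open_frames and orf_fasta are hypothetical. Protein lines are not wrapped at 60 characters as they are in figure 10.3.\par
\pard\plain \sl220 \f4\fs16 
```python
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    (a + b + c, AMINO[16 * i + 4 * j + k])
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
)
COMPLEMENT = str.maketrans("acgt", "tgca")


def open_frames(strand, min_aa):
    """Yield (start, end, protein) for stop-terminated open frames, 0-based half-open."""
    for offset in range(3):
        protein = ""
        start = offset
        for i in range(offset, len(strand) - 2, 3):
            amino = CODON_TABLE.get(strand[i:i + 3], "X")
            if amino == "*":
                if len(protein) >= min_aa:
                    yield start, i, protein
                protein = ""
                start = i + 3
            else:
                protein += amino


def orf_fasta(seq, min_aa):
    """Return FASTA lines with EMBL-style locations for both strands."""
    seq = seq.lower()
    n = len(seq)
    lines = []
    for start, end, protein in open_frames(seq, min_aa):
        lines.append(">%d %d..%d" % (start + 1, start + 1, end))
        lines.append(protein + "*")
    reverse = seq.translate(COMPLEMENT)[::-1]
    for start, end, protein in open_frames(reverse, min_aa):
        # Map reverse-strand coordinates back onto the forward strand, 1-based.
        low, high = n - end + 1, n - start
        lines.append(">%d complement(%d..%d)" % (low, low, high))
        lines.append(protein + "*")
    return lines
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 As in figure 10.3, each entry name is the lower coordinate of the DNA segment and frames found on the reverse strand are given a complement location.\par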
\pard\plain \s6\fi-560\li560\sb240\sa60\sl280\tx560 \b\f20 2.6\tab Producing a file of protein sequences for segments defined from a feature table\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and write protein sequences to disk".\par
2.\tab Accept "Translate selected regions".\par
3.\tab Reject "Define segments using keyboard". The alternative is to use a feature table.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Feature table file name". Type the name of the file containing the appropriate feature table in EMBL/GenBank format.\par
5.\tab Define "Operator". This defines which feature table operators should be employed when selecting the segments to translate.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "File name for translation".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The program will now read the feature table file and translate the segments defined using the selected operator(s). The results will be stored as in figure 10.3.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab To produce a listing without translation, use the "Translate and list" function with the "Show translation" option rejected. Alternatively, use the function "List the sequence".
\par
2.\tab The program asks both "Where to list from, and to" and "Define segments to translate". The two ranges are requested separately so that 5' and 3' untranslated regions can be included in the listing.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
The feature table file employed by the programs is a simple text file containing the data for the current sequence. Because of the multiplicity of different sequence library formats we have not provided the facility of reading such data directly from li
braries. The feature tables for individual library entries must be extracted (see the introductory chapter) or files can be created for new sequences.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
The current feature tables use "operators" such as "join" or "order" to specify which segments should be translated together to make a complete protein sequence. The program allows users to select which ones to employ, the default being "Use all operato
rs".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab The program contains a function "Set genetic code" which allows users to choose from a menu of codes or to define their own by specifying amino acid and codon pairs. This sets the code for all functions.
\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 11. Statistical and Structural Analysis of Protein Sequences\par
\pard\plain \s3\sb200\sa120\sl360 \b\f20\fs32 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Plotting hydrophobicity\par
2.2\tab Plotting charge\par
2.3\tab Plotting hydrophobic moment and hydrophobicity\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700\tx1980 2.4\tab Drawing helical wheels\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.5\tab Producing a Robson secondary structure prediction\par
2.6\tab Calculating the amino acid composition and molecular weight\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
4.\tab References\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we describe the use of routines for plotting hydrophobicity, charge and hydrophobic moments, drawing helical wheels, predicting secondary structure, and calculating amino acid composition and molecular weight. All of these routines are contained in the program PIP and are straightforward to use.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Plotting hydrophobicity\par
\pard\plain \s4\qj\sa120\sl280 \f20 This method uses the values of Kyte and Doolittle (1).\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot hydrophobicity".\par
2.\tab Define "Window length".\par
3.\tab Define "Plot interval".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 11.1.\par
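\pard\plain \s4\qj\sa120\sl280 \f20 The calculation behind such a plot can be sketched in Python. This is an illustration only and not the code used by PIP. The hydropathy values are those published by Kyte and Doolittle (1); the function name, the default window of 11, the use of the window mean, and the convention of reporting each value at the last residue of the window are assumptions made for the sketch.\par
\pard\plain \sl220 \f4\fs16

```python
# Kyte and Doolittle (1982) hydropathy values, one per standard amino acid.
KD = dict(zip(
    "ARNDCQEGHILKMFPSTWYV",
    [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
     3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2],
))


def hydrophobicity_profile(seq, window=11, interval=1):
    """Return (position, mean hydropathy) pairs for a sliding window.

    Positions are 1-based and name the last residue of each window.
    Both the mean and that convention are assumptions of this sketch.
    """
    points = []
    for end in range(window, len(seq) + 1, interval):
        segment = seq[end - window:end]
        mean = sum(KD[res] for res in segment) / window
        points.append((end, round(mean, 3)))
    return points


# Hypothetical usage: a short peptide, window of 5, one point every 2 residues.
profile = hydrophobicity_profile("DKFLEDVKKLYHS", window=5, interval=2)
for position, value in profile:
    print(position, value)
```

\pard\plain \s4\qj\sa120\sl280 \f20 Here "window" and "interval" play the roles of the "Window length" and "Plot interval" prompts. A longer window gives a smoother curve and a larger interval gives fewer plotted points.\par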
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Plotting charge\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot charge".\par
2.\tab Define "Window length".\par
3.\tab Define "Plot interval".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear and will be similar to that shown in figure 11.1.\par
\pard\plain \sl220\keepn \f4\fs16 {{\pict\macpict\picw448\pich81
0396ffffffff005001bf1101a0008201000affffffff005001bf0900000000000000003100000000004f01be9800240000000000350120000000000035011f00000000004f01be000102dd0006007fdfff00fc060040df000004060040df000004060040df000004060040df000004060078df000004060040df0000040600
40df000004060040df000004060040df00000407017840e0000004070140b0e000000407014108e000000407014104e000000407014204e00000040b017a02fc000020e60000040c014202fd00010250e60000040f014402fd00010590e900031000000418014401fd00010490f800010380f8000020fe0003700000041c01
4801fd00010808f800010480fd000010fd000060fe0003880000041d017801fd00010808f800010440fe00010428fd000090fe00038804000424074801000002000804fe00010110fd00010440fe00010a28fe000801108004010806000424075000800005001004fe000101a8fd00010840fe000d09440000020111800b01
04090004240c500080000480100200800002a4fd00010840fe000d10c40020030a0a4009010709002425236000800004e0100201400002440020000210400004401002005004960c400882010881e42523780080000810100202300002430050000d1020001ba0200100900490004810820090412425234000800008102001
02080002008088001120200020104000811004600037f0c20090222406007fdfff00fc2523400040002008e00104032c040022020020c0180080038000420808000020000c00600a14241440004000200500010400c204001201002080080080fe0002420410fd000404004004142113780030004005000084000104001401
0040000401fd0002220410fb00024004141f13400008004005000098000088000c010040000401fd0002240410f900000c1f134000080040020000e00000900000010040000407fd00021c0220f90000041c05400009008002fc0008600000010e80000208fc000103e0f900000416044000068080fb000040fe0004918000
0210f2000004110378000441f6000490800003f0f20000040d0340000022f60000a0ee0000040d0340000022f6000060ee000004090340000014e2000004090340000008e2000004060078df000004060040df000004060040df000004060040df000004060040df000004060078df000004060040df000004060040df0000
04060040df000004060040df00000406007fdfff00fc02dd00a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.1\tab A hydrophobicity plot using the values of Kyte and Doolittle.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Plotting hydrophobic moment and hydrophobicity\par
\pard\plain \s4\qj\sa120\sl280 \f20 This method plots the hydrophobic moment and the hydrophobicity as defined by Eisenberg {\i et al} (2).\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot hydrophobic moment".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Angle". This is the angle between successive residues when the helix is viewed end on. The default value of 100 degrees is that found in alpha helices.\par
3.\tab Define "Window length". The default of 18, if used in conjunction with the default "Angle", is equivalent to 5 turns of the helix.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Plot interval".\par
\pard\plain \s4\qj\sa120\sl280\tx560 \f20
The plot will appear as in figure 11.2, with the hydrophobicity shown above the hydrophobic moment. The scale for the hydrophobicity runs from -1.0 to 1.5 and for the hydrophobic moment from 0.0 to 1.5. The program plots the mean values for each window position, with the value at position x representing the segment from (x - window length + 1) to x.\par
\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw447\pich160
0659ffffffff009f01be1101a0008201000affffffff009f01be0900000000000000003100000000009e01bd9800240000000000670120000000000067011f00000000009e01bd000102dd0006007fdfff00fc060040df000004060070df000004060040df000004060070df000004060040df000004060070df0000040600
40df000004060070df000004060040df00000406007edf000004060040df000004060070df000004060040df000004060070df000004060040df000004060078df000004060074df0000040a0072fc000008e50000040a0061fc000038e50000040e007ffc000044f2000020f500000413016080fd000084f2000050f90000
08fe00000414017080fe00010104f2000048f9000016fe00000419016040fe00010202f2000088fe000001fe000502110008000419017030fe00010401f2000084fe000902818000052088140004200c40080001000401e00800000380f900012102fe00090242600004e096120004240c70080002800800101400000c40fd
000008fe000e320380000004741000040061210004240c40080004800800082200003040fd000034fe000e4a004000001c0c0c00080001210004250c70040004500800042200004020fe00131c440000e04c0020000020000b0e080000c08004250c400400086f10000241f8038010fe001323820001104000100000400000
91100000808004241e7e02000800f00001800404001100001c4002000e0880000c400080000060e0fd0000041f0340020008fd0012800208000ef000244002001004800003b00080f90000041e0370010010fc000b023800000fc0228001001003fe0002080080f90000041c0340010010fc000101c0fe00053042800100a0
fd00010801f8000004170370008020f70005084280008160fd00010401f8000004160340008040f700040481000041fc000107c6f8000004150370004440f700040301000021fb000028f8000004110340004a80f300001afb000010f80000041002700031f2000006fb000010f8000004060040df00000406007edf000004
060040df000004060070df000004060040df000004060070df000004060040df000004060070df000004060040df000004060070df000004060040df00000406007fdfff00fc060040df000004060040df000004060070df000004060040df000004060040df000004060040df000004060070df000004060040df00000406
0040df000004060070df000004060040df000004060040df000004060070df000004060040df000004060040df000004060040df00000406007edf000004060040df000004060040df000004060070df000004060040df000004060040df000004060070df000004060040df000004060040df000004060040df0000040600
70df0000040a0040f6000010eb0000040a0040f600002ceb0000040a0070f6000024eb0000040a0040f6000042eb0000040c0040f80002800082eb00000411007efe000040fd000303400082eb000004110040fe0000a0fd000304400101eb000004130040fe0000a0fd0005082002010001ed0000041e044000000110fd00
070820040100030010f9000040fe000103e0fd0000042204700000011cfd000730100800c0048128fd000480000004c0fe00010410fd00000423044000000204fd000720081000200482c4fe00050380000006a0fe00012808fd00000424044000000402fe0008204008100020088404fe0005024000000920fe00015808fd
00000424047000001c02fe0008504004100020084404fe0005044000004920fe00018004fd00000425104000002002700000484004200010084802fe000f84400000a820040001000400e00000042523400000200190000048800420001010480200000144420000b0100a018a000403100080042523700001200109000084
80044000087030020000024826000110105206740002241003000425234001c2c0000a8008848002800008801002000804482508071010b208000003d812040004251b40022440000a601703000280000500000104340428291408000d0108fe0004080d080004241b60041800000411a00200030000070000012a44043019
240800030108fd0003088800041e077f88100000040a60f9000701d982880000c408fe000090fc00025000041b014050fd00000cf7000601037800008210fe000070fc00025000040e014030ed000103f0f8000220000406007fdfff00fc02dd00a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.2\tab A hydrophobic moment (below) and hydrophobicity plot. The hydrophobicity plot displays the mean va
lues on a scale of -1.5 to 1.0 and the hydrophobic moment on a scale of 0.0 to 1.5.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Drawing helical wheels\par
\pard\plain \s4\qj\sa120\sl280 \f20 This method draws helical wheels for any segment of the sequence (3). In addition it displays the hydrophobic moment for the segment (2).\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Draw helix wheel".\par
2.\tab Define "Angle". The default angle of 100 degrees is that found in alpha helices.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Window length". The default of 18, if used in conjunction with the default "Angle", is equivalent to 5 turns of the helix.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Step". To display the wheel for a position N residues from the current one, type N; the new display will appear in place of the previous one. The default value of N is 1, so by repeatedly pressing carriage return the user can step through the sequence residue by residue.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The display for the current position in the sequence will appear as in figure 11.3 and the bell will ring. The program now allows the user to step through the sequence, displaying the helix wheel for each position.
\par
\pard\plain \li900\ri960\sb500\sl220\keepn\box\brsp120\brdrth \f4\fs16 {{\pict\macpict\picw355\pich329
0c64ffffffff014801621101a00082a0008c01000affffffff0148016209000000000000000031010f01050121011338a10096000c010000000200000000000000a1009a0008fffd00000004000001000a01100106011e01112c000c00150948656c76657469636103001504010d000c2e00040000010028011a01070144a0
0097a0008da0008c01000affffffff0148016231012600ba013800c838a10096000c010000000200000000000000a1009a0008fffc00000004000001000a012700bb013500c628013100bc014ca00097a0008da0008c01000affffffff0148016231011d0087012e009538a10096000c010000000200000000000000a1009a
0008fffc00000004000001000a011d0088012b009328012700890146a00097a0008da0008c01000affffffff014801623100df004600f1005438a10096000c010000000200000000000000a1009a0008fffd00000004000001000a00e0004700ee00532800ea00480156a00097a0008da0008c01000affffffff0148016231
0097003900a8004738a10096000c010000000200000000000000a1009a0008fffc00000004000001000a0097003a00a500452800a1003b0159a00097a0008da0008c01000affffffff0148016231006b004d007c005b38a10096000c010000000200000000000000a1009a0008fffc00000004000001000a006b004e007900
59280075004f014ca00097a0008da0008c01000affffffff01480162310032008a0044009838a10096000c010000000200000000000000a1009a0008fffc00000004000001000a0033008b0041009628003d008c014ba00097a0008da0008c01000affffffff0148016231002b00ba003d00c838a10096000c010000000200
000000000000a1009a0008fffd00000004000001000a002c00bb003a00c628003600bc0144a00097a0008da0008c01000affffffff0148016231003300f1004500ff38a10096000c010000000200000000000000a1009a0008fffd00000004000001000a003400f2004200fd2b37080148a00097a0008da0008c01000affff
ffff0148016231005101190063012738a10096000c010000000200000000000000a1009a0008fffd00000004000001000a0052011a006001252b281e0145a00097a0008da0008c01000affffffff014801623100b9014400cb015238a10096000c010000000200000000000000a1009a0008fffc00000004000001000a00b9
014500c701512b2b67014ba00097a0008da0008c01000affffffff01480162310098014400aa015238a10096000c010000000200000000000000a1009a0008fffc00000004000001000a0099014500a701512800a30146014ba00097a0008da0008c01000affffffff0148016231003e00ba004f00c838a10096000c010000
000200000000000000a1009a0008fffc00000004000001000a003f00bb004d00c728004900bc0131a00097a0008da0008c01000affffffff014801623100b9013100ca013f38a10096000c010000000200000000000000a1009a0008fffd00000005000001000a00ba013200c8013e2b777b0132a00097a0008da0008c0100
0affffffff014801623101080090011a009e38a10096000c010000000200000000000000a1009a0008fffc00000004000001000a010900910117009c28011300920133a00097a0008da0008c01000affffffff01480162310075005b0087006938a10096000c010000000200000000000000a1009a0008fffd000000050000
01000a0076005c00840068280080005d0134a00097a0008da0008c01000affffffff0148016231005c0109006e011738a10096000c010000000200000000000000a1009a0008fffc00000005000001000a005d010a006b0116280067010b0135a00097a0008da0008c01000affffffff014801623100f900fe010b010c38a1
0096000c010000000200000000000000a1009a0008fffd00000004000001000a00fa00ff0108010b28010401000136a00097a0008da0008c01000affffffff014801623100d5005700e7006538a10096000c010000000200000000000000a1009a0008fffd00000004000001000a00d6005800e400632800e000590137a000
97a0008da0008c01000affffffff014801623100480093005a00a138a10096000c010000000200000000000000a1009a0008fffc00000005000001000a00490094005700a028005300950138a00097a0008da0008c01000affffffff01480162310098013200a9014038a10096000c010000000200000000000000a1009a00
08fffc00000004000001000a0099013300a7013e2b9f500139a00097a0008da0008c01000affffffff0148016231010f00b7011c00d038a10096000c010000000200000000000000a1009a0008fffd00000009000001000a011000b8011e00cd28011a00b9023130a00097a0008da0008c01000affffffff01480162310097
004a00a6006338a10096000c010000000200000000000000a1009a0008fffd00000009000001000a0098004b00a600602800a2004c023131a00097a0008da0008c01000affffffff0148016231004600e3005700f838a10096000c010000000200000000000000a1009a0008fffc00000008000001000a004700e4005500f6
28005100e5023132a00097a0008da0008c01000affffffff014801623100e2011700f3012c38a10096000c010000000200000000000000a1009a0008fffc00000007000001000a00e3011800f101292b349c023133a00097a0008da10096000c010000000200000000000000a1009a0008fffd0000003a000001000a000000
00000e007728000a00010d444b464c4544564b4b4c594853a00097a10096000c010000000200000000000000a1009a0008000400000007000001000a00180002003400132b0218044d20200d2a0e0148a00097a10096000c030000000200000000000000a1009a0008000b00000004000001000a0018000d00420031280022
001a05372e38310d2800300016062d322e39370d2b070e03313532a00097a0008c01000affffffff0148016231003300890045009738a10096000c010000000200000000000000a1009a0008fffd00000005000001000a0034008a00420096296e014ba00097a0008da0008c01000affffffff014801623100f30123010401
3138a10096000c010000000200000000000000a1009a0008fffd00000005000001000a00f40124010201302b9ac00153a00097a0008d01000affffffff0148016207000000002200bc01210000a000a0a100a4000209fd01000a0000000000000000070001000109ffffffffffffffff22005900bf62632300002100fc009f
23000023cc8723000021006d00fe2300002100ee00fe2300002100d8006b23000023338723000021009f0120230000239f6323000023a29c23000021005f00e2230000233278230000a000a301000affffffff0148016222005900bf62632100fc009f23cc8721006d00fe2100ee00fe2100d8006b23338721009f0120239f
6323a29c21005f00e2233278a000a1a10096000c030000000200000000000000a1009a0008fffc00000003000001000a002000f9003101020d000e28002c00fa012ba00097a10096000c030000000200000000000000a1009a0008fffc00000003000001000a002100820032008b28002d0083012ba00097a10096000c0300
00000200000000000000a1009a0008fffc00000003000001000a0096015800a701612bd675012ba00097a10096000c030000000200000000000000a1009a0008fffc00000003000001000a00b7015700c801602800c30158012ba00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a00
4401250055012f280050012a012da00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a001900b7002a00c128002500bc012da00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a011d0107012e0111280129010c012da00097a10096000c030000
000200000000000000a1009a0008fffc0000ffff000001000a013600b6014700c028014200bc012ea00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a012a007c013b00862801360082012ea00097a10096000c030000000200000000000000a1009a0008fffc0000fffe000001000a
00e4003100f5003b2800f00037012ea00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a0092002400a3002e28009e002a012ea00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a005a003e006b00472800660043012ea00097a00083ff}}
\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 11.3\tab A typica
l helix wheel display using a window of only 13 residues. The display includes a schematic of the helix showing the links between residues, with each vertex numbered according to position; the residue type at each vertex; a symbol denoting a classification
as hydrophobic (.), positively charged (+), negatively charged (-), or otherwise (). The residue number of the first sequence element in the current window is displayed at the top left corner along with the sequence. Below this is the total hydrophobicity
and hydrophobic moment according to Eisenberg {\i et al }(2).\par
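\pard\plain \s4\qj\sa120\sl280 \f20 The two numbers shown with the wheel can be reproduced with a short Python sketch. It is an illustration and not the code used by PIP. The scale is the normalised consensus scale of Eisenberg {\i et al} (2); the function names are hypothetical, and it is an assumption that the program forms the moment as the magnitude of the vector sum of the residue hydrophobicities placed at successive multiples of "Angle". With the 13 residue segment displayed in figure 11.3 this gives the values shown there.\par
\pard\plain \sl220 \f4\fs16

```python
import math

# Normalised consensus hydrophobicity scale of Eisenberg et al (1984).
EISENBERG = dict(zip(
    "ARNDCQEGHILKMFPSTWYV",
    [0.62, -2.53, -0.78, -0.90, 0.29, -0.85, -0.74, 0.48, -0.40, 1.38,
     1.06, -1.50, 0.64, 1.19, 0.12, -0.18, -0.05, 0.81, 0.26, 1.08],
))


def wheel_angles(length, angle=100.0):
    """Angular position in degrees of each residue around the wheel."""
    return [(i * angle) % 360.0 for i in range(length)]


def total_hydrophobicity(segment):
    """Sum of the scale values over the segment."""
    return sum(EISENBERG[res] for res in segment)


def hydrophobic_moment(segment, angle=100.0):
    """Magnitude of the vector sum of hydrophobicities around the helix."""
    x = y = 0.0
    for i, res in enumerate(segment):
        theta = math.radians(i * angle)
        x += EISENBERG[res] * math.cos(theta)
        y += EISENBERG[res] * math.sin(theta)
    return math.hypot(x, y)


segment = "DKFLEDVKKLYHS"                        # the window of figure 11.3
print(round(total_hydrophobicity(segment), 2))   # -2.97
print(round(hydrophobic_moment(segment), 2))     # 7.81
print(18 * 100.0 / 360.0)                        # 5.0 turns for the defaults
```

\pard\plain \s4\qj\sa120\sl280 \f20 The last line checks the statement in step 3: a window of 18 residues at 100 degrees per residue spans 1800 degrees, which is 5 complete turns.\par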
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Producing a Robson secondary structure prediction\par
\pard\plain \s4\qj\sa120\sl280 \f20 This routine uses the method of Garnier {\i et al} (4) to predict the positions of alpha helices, beta sheets, turns and random coil. The results can be either plotted or listed.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Robson secondary structure prediction".\par
\pard \s7\qj\fi-560\li560\ri-100\sa120\sl280\tx560 \page 2.\tab Accept "Plot results". The alternative produces a listing like that shown in figure 11.4.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 11.5, and the program also prints a count of the number of positions at which each of the four structure types is the highest scoring.\par
\pard\plain \li1500\ri1460\sb200\sl220\box\brsp100\brdrth \tqr\tx3220\tqr\tx4700\tqr\tx6140\tqr\tx7420 \f4\fs16 350 P\tab 274\tab -178\tab -84\tab -77\par
\pard \li1500\ri1460\sl220\box\brsp100\brdrth \tqr\tx3220\tqr\tx4700\tqr\tx6140\tqr\tx7420 351 L\tab 16\tab -192\tab -21\tab -38\par
352 K\tab 371\tab -223\tab -75\tab -68\par
353 L\tab 365\tab -152\tab -101\tab -65\par
354 S\tab 331\tab -82\tab -84\tab -63\par
355 K\tab 311\tab -43\tab -110\tab -88\par
356 A\tab 280\tab -23\tab -110\tab -80\par
357 V\tab 234\tab -12\tab -135\tab -75\par
358 H\tab 177\tab -10\tab -143\tab -92\par
359 K\tab 153\tab 2\tab -180\tab -138\par
360 A\tab 158\tab 52\tab -175\tab -130\par
361 V\tab 144\tab 78\tab -187\tab -115\par
362 L\tab 132\tab 58\tab -186\tab -80\par
363 T\tab 124\tab 63\tab -142\tab -78\par
364 I\tab 144\tab 32\tab -111\tab -43\par
365 D\tab 120\tab -49\tab -29\tab 5\par
366 E\tab 103\tab -80\tab 13\tab 43\par
367 K\tab 111\tab -113\tab 23\tab 42\par
368 G\tab 132\tab -127\tab -13\tab 64\par
369 T\tab 172\tab -132\tab -42\tab 52\par
\pard \li1500\ri1460\sl220\keepn\box\brsp100\brdrth \tqr\tx3220\tqr\tx4700\tqr\tx6140\tqr\tx7420 370 E\tab 216\tab -170\tab -122\tab -4{\b \par
}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa200\sl240\tx1140 \f21\fs20 Figure 11.4\tab A listing of the Robson secondary structure prediction. It includes the sequence position, the residue type and the values for the four structure classes.\par
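\pard\plain \s4\qj\sa120\sl280 \f20 The count of highest scoring structure types can be illustrated with a short Python sketch that works on rows taken from the listing in figure 11.4. It is not the code used by PIP and it does not compute the prediction itself. The function names are hypothetical, and the column order helix, extended, turn, coil is an assumption based on the order of the strips in the plot.\par
\pard\plain \sl220 \f4\fs16

```python
# A few rows from the listing in figure 11.4: position, residue, four scores.
# The column order helix, extended, turn, coil is an assumption.
ROWS = [
    (350, "P", 274, -178, -84, -77),
    (359, "K", 153, 2, -180, -138),
    (366, "E", 103, -80, 13, 43),
    (370, "E", 216, -170, -122, -4),
]
CLASSES = "HETC"


def decision(scores):
    """Return the letter of the highest scoring structure class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best]


def count_decisions(rows):
    """Count how often each class is the highest scoring one."""
    counts = dict.fromkeys(CLASSES, 0)
    for row in rows:
        counts[decision(row[2:])] += 1
    return counts


counts = count_decisions(ROWS)
print([(c, counts[c]) for c in CLASSES])
```

\pard\plain \s4\qj\sa120\sl280 \f20 For these rows the first column is always the largest, so under the assumed column order every position is assigned to helix.\par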
\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw446\pich256
0d0fffffffff00ff01bd1101a0008201000affffffff00ff01bd090000000000000000310000000000fe01bc9800240000000000a601200000000000a6011f0000000000fe01bc000102dd0006007fdfff00fc060040df000004060040df000004060040df000004060041df00000407014280e0000004060042df0000040b
0042fd00010140e500000410014380fe000101a0fd0000c0ea000004110640000008800120fe000101a0ea000004160c40000019900124000010022380fc00000cf1000004200c40000066e80216000070022240fc000012fe00014001fc000060fd00010404240e400000a68804190000908214400006fe000612200000a0
0180fe00010190fd00010c04252340001100080409020111421820000b02800012500000a181900000040f10001006001204252340003f0008040903c20d3c0020001103c00022900000a1827900038e080800380a90120425234061410004340104a205200010021084200022910000a272279ea495300800440ba8218425
234292e03b1c5f070a12f300e0d0059e9830002751860122f20493de5161f800440bb82d042508429400000648010814fe00171004a0b814002009b621221400600860c008008508a44104250843140000018000880cfe000d0f0460c01ad9200a4952220c0020fe000604808590424004230842140000018000880cfe000b
0d046080032ac006014a2204fc000605e883904380041f014018fc000050fc000084fe000604400401861c04fc0006051500500200041e014008fc000030fc000048fe0005040000018018fb0006060500600000041a0040fb000020fc000070fb0002010018fb0006020600200000040e0040f5000050f2000002fc000004
0a0040f5000040ec000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df00000406007fdfff00fc060040df000004060040df000004060040df00000407014380e0000004060041df000004060041df0000040a0041f7000020ea000004100041fe0002
8000c0fd000060ea000004190640000008e00120fe00010161fb000008fd000001f60000041b064000000da00120fe000202e280fc000008fe00018001f60000041c0640600053200110fe000202e280fc00000cfe0002c00180f70000042306405010a3100212fe000202a280fe0002c00014fe0004c001800002fc000304
00100425064090110310020efe001904948000000140002400000121018000c200018000000420120425234090190210040e0000c0049c8000000220002480000121814000a38c014000000c202e8425234090190210140d020330280880000c222000248100022281430123920f6000000ab02d04252340891d02182c0503
0234d80040001a54202025830022224124821291083000300ab44d04252342f766eef868e50484150323c0002a55d0303be2e033e34124c21279381000500efc560425234287e60018800504c4170000440722881049c39491322231242a1a01e00880580adc440425234306a00015000108241600003c05418810c943140b
4c1211343c0c00000980480a8a440425234106000016000108141400001218c1800d0902140780140d38240c000009a0841b0a8004252340040000060000881c080000111000000f0e02140780140608000c00000950852b0a8004252340040000040000880008000001900000070002080500180600000400000908872b0b
000420014004fc0002900008fe000050fb000f040018060000040000050481230300041b0040fb000060fc000050fb000304000802fc0006060700a30000041b0040fb000020fc000060fb000304000802fc0006020100e2000004140040f5000060f9000008fb0006020000220000040a0040e5000002fc0000040a0040e5
000002fc000004060040df000004060040df000004060040df000004060040df00000406007fdfff00fc060040df000004060040df000004060040df0000040b014380f8000008ea0000040a0042f700000cea0000040e0043f700000cf1000008fb0000040e0042f700000af100000cfb0000040f014380f8000012f10000
14fb0000041c014004fd000008fd000012fd000004fc00010204fd000014fb0000041e01410afd000014fd000012fd00000afc0001060cfe00010224fc000101041f014292fd000014fd000012fd00000bfc00010a0cfe0001c522fd000203010421014291fd000024fd000621200000020904fd00060a12000001a522fd00
02028284240642910000a00024fd00182150000007110f000007000a12000002252200010000028204252342508000d006443804000820900000091109c0000900091200300218e2180380000282042523426043f0918582280c000e20900030089109200008801112002802180128048000028404252343bfc32912899a68
1200113e3000380e9f8a60001c802f1e002803f7ff2804802002858425234000440f0e480184122011201180c80890902000204e21120044040001281c406164840425234000440208500184225320c009410808a09010002052e0a110441c0001241040b2a44404252340002402085001044254a00009220810a060100040
2380a12d841000012620210a34440425234000240000300002829c60000924041040600800802080c127022000014220210e08440425234000180000200002818000000614041000000881000080812002200001424011000848042402400018fd000c0300800000061c04a000000981fe000c01200240000082401b000028
04200040fb000002fc00061004c000000a42fe000c012002400000014004000030041f0040fb000002fb0005028000000a64fe00070120014000000180fe000130041b0040f40005038000000614fe000301200180fe000080fe00011004160040f4000002fe00010614fe000301400080f9000004110040ef000008fe0003
01400080f90000040a0040ea0000c0f70000040a0040ea000080f7000004060040df000004060040df000004060040df00000406007fdfff00fc0a0040e40000c0fd0000040a0040e40000a0fd00000413014280f4000080fe000006f7000090fd00000417014280f600021000c0fe000009f80006011001c00000041c0143
80f600026800a0fe00010880fe000001fd00060110032000000421014280fa000002fe0002480110fe00011080fe00010280fe00060108042000000421014280fa000005fe0002c40110fe00011040fe00010640fe000602080420000004250f4000060001c000380405300001040110fe00011040fe000b3420e000000204
0820008004250f40000500023000280a04a80003020210fe00101040080000482090000004060820018004252340000900021000440a0868000c0102080038001020140000c0211000000405901002400425234000110002080082120864001401020801e4002021e21c0100111000000801d0100240042523400011000404
00826108040010008208030300202222240100121000000800201004200425234000108004040103a09002001000e208040381c012022201000a080000080000100420042523400020600404010000900200200024078c00418014022202000c08010008000008c82104252341c06020080401000060020020001804f00022
001c0243e200000801c008000008a81704252342a0ff100ffe01ffff0ffe105fffebf811fda3bff3ff789ffffff802201ffffffdeff5042506401b000e880202fe0002012840fe000610003200000180fe00090b06201000000518148423064004000bc80202fe0002012480fc00001cfe000080fe00090a84383000000300
0c841e0640040002480104fd0001a680fc000008fa00090a8c2ce00000020000041e0640000002280104fd0001e380fc000008fa0002045005fe000302000004170040fe0002100104fd00018080f40002042003fb000004140040fe0002100104fd000080f1000003fb0000040a0040fc0000fce50000040a0040fc000020
e5000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df00000406007fdfff00fc060040df000004060043df00000407014280e000000407014280e000000407014280e000000407014380e000000425134000203b1807070200f000e0c0000e00300006
40fe000cf000000e1001f800000b980c04060040df000004060040df000004060040df000004060040df000004060040df0000042406407000eee000e0fe0011032380000001c00038800001e10100000208fd000304201204060040df000004060040df000004060040df000004060040df000004060040df000004060040
df0000042202439f80fe000018fd00111e200010061f0260000c000e1e000001f7fefc00010184060040df000004060040df000004060040df000004060040df000004060040df000004252340005f0007fc01ffff0ffc001fffe3f801fd81bff3fe3801fffff800000ffffff8cfe004060040df000004060040df00000406
0040df000004060040df000004060040df000004060040df00000406007fdfff00fc02dd00a00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.5\tab A secondary structure plot using the method of Robson. The likelihood that each 17 residue segment of the sequence forms each of the four structure classes, helix (H), extended (E, normally termed sheet), turn (T) and coil (C), is plotted across the screen in four strips. Below these is a "decision" strip (D) in which a single dot is plotted for the highest scoring structure class at each point. Here we see a sequence that is predicted to be predominantly helical.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Calculating the amino acid composition and molecular weight\par
\pard\plain \s4\qj\sa120\sl280 \f20
Select "Count amino acid composition". The composition and molecular weight are displayed as in figure 11.6. Each column contains the one letter code for the amino acid, the number of occurrences of that amino acid in the sequence, that number expressed as a percentage, and the contribution of that amino acid to the molecular weight.\par
\pard\plain \li220\ri280\sb200\sl220\box\brsp100\brdrth \f4\fs16 Sequence composition\par
\pard \li220\ri280\sl220\box\brsp100\brdrth A     C     S     T     P     A     G     N     D     E     Q     B     Z     H\par
N    0.   14.   19.   12.   30.   26.    3.   10.   11.    4.    0.    0.    0.\par
%   0.0   5.3   7.3   4.6  11.5   9.9   1.1   3.8   4.2   1.5   0.0   0.0   0.0\par
W    0. 1219. 1921. 1165. 2132. 1483.  342. 1151. 1420.  513.    0.    0.    0.\par
\par
A     R     K     M     I     L     V     F     Y     W     -     X     ? \par
N    7.    7.   10.   15.   39.   23.   13.   11.    8.    0.    0.    0.    0.\par
%   2.7   2.7   3.8   5.7  14.9   8.8   5.0   4.2   3.1   0.0   0.0   0.0   0.0\par
W 1093.  897. 1312. 1697. 4413. 2280. 1913. 1795. 1490.    0.    0.    0.    0.\par
\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth Total molecular weight= 28256.254\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.6\tab A typical molecular weight and composition display. For each residue type it shows the number of occurrences, the percentage and the contribution to the molecular weight.\par
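\pard\plain \s4\qj\sa120\sl280 \f20 The arithmetic behind this display can be sketched in Python. It is an illustration and not the code used by PIP. The table holds standard average residue masses, which agree with the contributions in figure 11.6 (for example 14 serines give 1219). The function names are hypothetical, and adding one water molecule for the chain termini is an assumption about how the total is formed.\par
\pard\plain \sl220 \f4\fs16

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = dict(zip(
    "GASPVTCLINDQKEMHFRYW",
    [57.0519, 71.0788, 87.0782, 97.1167, 99.1326, 101.1051, 103.1388,
     113.1594, 113.1594, 114.1038, 115.0886, 128.1307, 128.1741,
     129.1155, 131.1926, 137.1411, 147.1766, 156.1875, 163.1760,
     186.2132],
))
WATER = 18.0153  # assumption: added once for the free chain termini


def composition(seq):
    """Return (residue, count, percent, weight contribution) rows."""
    rows = []
    for res in sorted(set(seq)):
        n = seq.count(res)
        percent = 100.0 * n / len(seq)
        rows.append((res, n, round(percent, 1), round(n * RESIDUE_MASS[res])))
    return rows


def molecular_weight(seq):
    """Sum of residue masses plus one water for the chain termini."""
    return sum(RESIDUE_MASS[res] for res in seq) + WATER


# Hypothetical usage on a three residue peptide.
for row in composition("SSL"):
    print(row)
print(round(molecular_weight("SSL"), 3))
```

\pard\plain \s4\qj\sa120\sl280 \f20 The sketch handles only the 20 standard one letter codes; the ambiguity codes B, Z and X and the symbols "-" and "?" that appear in the display are not covered.\par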
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab The methods described in the chapters on motif and pattern searching can also be used to search for specifi
c structures. For example a sequence can be searched for all the structures contained in the PROSITE motif library.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab It is often convenient to produce displays in which several of the plots described above appear together on the screen.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Kyte, J. and Doolittle, R.F. 1982. A simple method for displaying the hydropathic character of a protein. {\i J. Mol. Biol.} {\b 157}\:105-132.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. 1984. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. {\i J. Mol. Biol.} {\b 179}\:125-142.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Schiffer, M. and Edmundson, A.B. 1967. Use of helical wheels to represent the structures of proteins and to identify segments with helical potential. {\i Biophys. J.} {\b 7}\:121-135.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Garnier, J., Osguthorpe, D.J., and Robson, B. 1978. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. {\i J. Mol. Biol.} {\b 120}\:97-120.\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 12. Searching for Motifs in Protein Sequences\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Searching for exact matches.\par
2.2\tab Searching for percentage matches to consensus sequences\par
2.3\tab Searching for consensus sequences using a score matrix\par
2.4\tab Using weight matrices for searching protein sequences\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
4.\tab References\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20
The program PIP contains several ways of defining and searching for motifs (1,2). We describe searches for exact matches and percentage matches, the use of score matrices and the creation and use of weight matrices. All of the searches produce
both listed and graphical output.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Searching for exact matches.\par
\pard\plain \s4\qj\sa120\sl280 \f20
The routine for finding and displaying the positions of exact matches to sequences can display its results in various forms. It is equivalent to the restriction enzyme search routine in the nucleotide analysis programs. The sequences to be searched for ca
n be typed on the keyboard or read from files. The format of these files is given in the notes. Here we give only a single example of the use of the routine which shows how to produce a plot of the positions of all amino acid types in a sequence.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab
Select "Input source" as "All acids file". A number of standard files are available and users may also have their own. The one selected simply contains the one letter codes for all the standard amino acids.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Accept "Search for all names". The alternative allows users to select a subset of the entries in the file by name.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Order results name by name".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Reject "List matches". If results are listed the output gives the name and position of each match and also the separations between matches.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The results will then appear in the form shown in figure 12.1. \par
\pard\plain \li80\ri80\sl220\keepn\box\brsp40\brdrth \f4\fs16 {{\pict\macpict\picw441\pich182
14a4ffffffff00b501b81101a0008201000affffffff00b501b8090000000000000000310000000000b201b798002a000000000083014400000000008301440000000000b201b7000102d70020f90002020080fd000a8000008000010204200201fb000620000200401004fe0020f90002020080fd000a8000008000010204
200201fb000620000200401004fe00220050fa0002020080fd000a8000008000010204200201fb000620000200401004fe00220050fa0002020080fd000a8000008000010204200201fb000620000200401004fe00220070fa0002020080fd000a8000008000010204200201fb000620000200401004fe00220020fa000202
0080fd000a8000008000010204200201fb000620000200401004fe00220020fa0002020080fd000a8000008000010204200201fb000620000200401004fe0009fd000007deff01c00005d900014000070050da00014000070050da00014000070050da00014000070070da000140000b0070fe000007deff01c00025fb0018
c340000e04308008944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e04308008944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e04308008944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e043080
08944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e04308008944000004080c01000400404880128004afe000340481008fd000b0020fe000007deff01c00026fc00018004fe000a0100400001000a00020120fc0008224412200041000820fe000308010000280070fd00018004fe
000a0100400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100
400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100400001000a00020120fc0008224412200041000820fe00010801ff0009fd000007deff01c00028fd0005014041802010fe00fe101900040500180001080080010084001804028000500500000480002a0020fe000501
4041802010fe00fe101900040500180001080080010084001804028000500500000480002a0050fe0005014041802010fe00fe101900040500180001080080010084001804028000500500000480002a0060fe0005014041802010fe00fe101900040500180001080080010084001804028000500500000480002a0010fe00
05014041802010fe00fe101900040500180001080080010084001804028000500500000480000b0070fe000007deff01c00026fc0014040010042000890000400310080040004180112058fe0008010401100800008004fd00280060fd0014040010042000890000400310080040004180112058fe00080104011008000080
04fd00280050fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd00280070fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd00280050fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd0028
0050fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd0009fd000007deff01c00027fd0004040a000080fc00092a010808100001000090fe000e040100008021048630000050008000290020fe0004040a000080fc00092a010808100001000090fe000e04010000802104863000
0050008000290050fe0004040a000080fc00092a010808100001000090fe000e040100008021048630000050008000290050fe0004040a000080fc00092a010808100001000090fe000e040100008021048630000050008000290050fe0004040a000080fc00092a010808100001000090fe000e0401000080210486300000
500080000b0070fe000007deff01c000230020fa00070800801009010408fc000920000090000020200120fe000301000001fd00230060fa00070800801009010408fc000920000090000020200120fe000301000001fd00230050fa00070800801009010408fc000920000090000020200120fe000301000001fd00230070
fa00070800801009010408fc000920000090000020200120fe000301000001fd00230040fa00070800801009010408fc000920000090000020200120fe000301000001fd00230040fa00070800801009010408fc000920000090000020200120fe000301000001fd0009fd000007deff01c00021fd00080100880004800000
40fd0002100101fc0005020000101440fa000022fe00230050fe0008010088000480000040fd0002100101fc0005020000101440fa000022fe00230070fe0008010088000480000040fd0002100101fc0005020000101440fa000022fe00230070fe0008010088000480000040fd0002100101fc0005020000101440fa0000
22fe00230050fe0008010088000480000040fd0002100101fc0005020000101440fa000022fe000b0050fe000007deff01c0001ffd000604000001400108fd000010fc000028f8000020fe000380000080fe00210050fe000604000001400108fd000010fc000028f8000020fe000380000080fe00210070fe000604000001
400108fd000010fc000028f8000020fe000380000080fe00210050fe000604000001400108fd000010fc000028f8000020fe000380000080fe00210050fe000604000001400108fd000010fc000028f8000020fe000380000080fe000b0050fe000007deff01c00029fd00230220000410c020462080000081000024028812
06016000a0005000084842100c48208028ff0029fd00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800002b0040fe00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800002b0040fe00250220000410c020462080000081000024
02881206016000a0005000084842100c4820802800002b0040fe00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800002b0040fe00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800000b0070fe000007deff01c00026fc000008
fd000c06000200008004010840000001fe000016fd000a5800044c00040000620000280050fd000008fd000c06000200008004010840000001fe000016fd000a5800044c00040000620000280060fd000008fd000c06000200008004010840000001fe000016fd000a5800044c00040000620000280060fd000008fd000c06
000200008004010840000001fe000016fd000a5800044c00040000620000280060fd000008fd000c06000200008004010840000001fe000016fd000a5800044c000400006200000b0070fe000007deff01c00027fc0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd
0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd00
12540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd0012540430210000800802800860b2a20100001808fe0004100a821022fd000302002090ff0009fd000007deff01c0001bfb00011008fc000040fc000008fd000001f9000001fe000002fd001d0050fc00011008fc00
0040fc000008fd000001f9000001fe000002fd001d0050fc00011008fc000040fc000008fd000001f9000001fe000002fd001d0070fc00011008fc000040fc000008fd000001f9000001fe000002fd001d0050fc00011008fc000040fc000008fd000001f9000001fe000002fd000b0050fe000007deff01c00027fb002304
488809088d15210106240210080004400048001502010223060000800082000c500000290020fc002304488809088d15210106240210080004400048001502010223060000800082000c500000290050fc002304488809088d15210106240210080004400048001502010223060000800082000c500000290070fc00230448
8809088d15210106240210080004400048001502010223060000800082000c500000290050fc002304488809088d15210106240210080004400048001502010223060000800082000c500000290070fc002104488809088d15210106240210080004400048001502010223060000800082000c50ff0009fd000007deff01c0
0020fc000001fa000020fa0014010000120020004048000003000004010000020000220070fd000001fa000020fa0014010000120020004048000003000004010000020000220040fd000001fa000020fa0014010000120020004048000003000004010000020000220060fd000001fa000020fa0014010000120020004048
000003000004010000020000220040fd000001fa000020fa00140100001200200040480000030000040100000200000b0040fe000007deff01c00028fc0024a02800404010200008400080000010080240002021880800200000100020010880418000002a0070fd0024a02800404010200008400080000010080240002021
880800200000100020010880418000002a0040fd0024a02800404010200008400080000010080240002021880800200000100020010880418000002a0060fd0024a02800404010200008400080000010080240002021880800200000100020010880418000002a0040fd0024a0280040401020000840008000001008024000
2021880800200000100020010880418000000b0070fe000007deff01c00024fa000a820010400004c008201044fd0006a6000400000102fd000662000020040204ff0024fa000a820010400004c008201044fd0006a6000400000102fd0008620000200402040000260060fb000a820010400004c008201044fd0006a60004
00000102fd0008620000200402040000260050fb000a820010400004c008201044fd0006a6000400000102fd0008620000200402040000260050fb000a820010400004c008201044fd0006a6000400000102fd0008620000200402040000260050fb000a820010400004c008201044fd0006a6000400000102fd0008620000
2004020400000b0070fe000007deff01c0000dfa000301000002fb000001e9000f0020fb000301000002fb000001e9000f0050fb000301000002fb000001e9000f0040fb000301000002fb000001e9000f0040fb000301000002fb000001e9000b0070fe000007deff01c00028fc0024022004030450001016004001c02806
369020a0101a404280048180c49001222052000100002a0020fd0024022004030450001016004001c02806369020a0101a404280048180c49001222052000100002a0050fd0024022004030450001016004001c02806369020a0101a404280048180c49001222052000100002a0070fd0024022004030450001016004001c0
2806369020a0101a404280048180c49001222052000100002a0050fd0024022004030450001016004001c02806369020a0101a404280048180c49001222052000100000b0050fe000007deff01c00002d700a0008c310002000100b5001038a10096000c010000000200000000000000a1009a0008fffd0000000300000100
0a00050002000f000a2c000c00150948656c76657469636103001504010d00082e0004000001002b030c0159a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a000e00020018000a2a090157a00097a10096000c010000000200000000000000a1009a0008fffd0000000300000100
0a00150002001f000a2a070156a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a001f00020029000a2a0a0154a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a002700020031000a2a080153a00097a10096000c01000000020000000000
0000a1009a0008fffe00000003000001000a00300002003a000a2a090152a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a003900020043000a2a090151a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00420002004c000a2a090150a0
0097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a004a00020054000a2a08014da00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a00530002005d000a2a090148a00097a10096000c010000000200000000000000a1009a0008fffe0000000300
0001000a005c00020066000a2a09014ca00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00640002006e000a2a08014ba00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a006e00020078000a2a0a0149a00097a10096000c01000000020000
0000000000a1009a0008fffe00000003000001000a007600020080000a2a080148a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00800002008a000a2a0a0147a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a008800020092000a2a08
0146a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00900002009a000a2a080145a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a0099000200a3000a2a090144a00097a10096000c010000000200000000000000a1009a0008fffe0000
0003000001000a00a2000200ac000a2a090143a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00aa000200b4000a2a080141a00097a0008da00083ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb40\sa120\sl240\tx1140 \f21\fs20 Figure 12.1\tab Typical graphical output from "Search for exact matches" in which the position of each matching string (here individual amino acid types) is marked.\par
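\pard\plain \s4\qj\sa120\sl280 \f20 The exact match search can be illustrated with a short Python sketch. It is not the PIP source code: the function names are hypothetical, the input text follows the file format given in note 1 of this chapter, and positions are assumed to be counted from 1.\par
\pard\plain \li220\ri280\sl220 \f4\fs16 
```python
def parse_definitions(text):
    """Parse 'Name/peptide/peptide//' records into (name, peptides) pairs."""
    entries = []
    for record in text.split("//"):
        fields = [field.strip() for field in record.split("/") if field.strip()]
        if fields:
            entries.append((fields[0], fields[1:]))
    return entries


def peptide_matches(peptide, segment):
    """The symbol '-' in a peptide matches any amino acid."""
    return all(p == "-" or p == s for p, s in zip(peptide, segment))


def find_exact(entries, sequence):
    """Return (name, peptide, position) for each match, ordered name by name."""
    hits = []
    for name, peptides in entries:
        for peptide in peptides:
            width = len(peptide)
            for start in range(len(sequence) - width + 1):
                if peptide_matches(peptide, sequence[start:start + width]):
                    hits.append((name, peptide, start + 1))
    return hits


# Example definitions from note 1, written on one line for brevity.
DEFINITIONS = "Acidic/D/E// Basic/R/K/H// Glyco/N-S/N-T//"
entries = parse_definitions(DEFINITIONS)
hits = find_exact(entries, "MDNKSERNAT")
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 For the made-up sequence MDNKSERNAT the sketch reports N-S at position 3 and N-T at position 8, along with the single acidic and basic residues.\par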
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Searching for percentage matches to sequences\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find percentage matches".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par
3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par
4.\tab Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Percent match". The search is performed, the results are presented graphically, the number of matches displayed, and the scores and positions of the top 10 matches displayed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define the number of matches to "Display". For the number of matches chose
n the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round to step 3.\par
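\pard\plain \s4\qj\sa120\sl280 \f20 The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a match is assumed to be any window whose percentage of identical characters is at least the value given for "Percent match", and the display order follows the listing in figure 12.3.\par
\pard\plain \li220\ri280\sl220 \f4\fs16 
```python
def percent_matches(query, sequence, min_percent):
    """Slide the query along the sequence and keep windows at or above min_percent."""
    width = len(query)
    hits = []
    for start in range(len(sequence) - width + 1):
        segment = sequence[start:start + width]
        identical = sum(1 for q, s in zip(query, segment) if q == s)
        percent = 100.0 * identical / width
        if percent >= min_percent:
            hits.append((percent, start + 1, segment))
    # Best matches first; ties are ordered by position.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits


def show_alignment(query, segment):
    """Matching segment above the query, with asterisks marking identities."""
    marks = "".join("*" if q == s else " " for q, s in zip(query, segment))
    return [segment, marks, query]


hits = percent_matches("ALPHA", "GGALPIIGG", 60)
lines = show_alignment("ALPHA", hits[0][2])
```
\par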
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Searching for sequences using a score matrix\par
\pard\plain \s4\qj\sa120\sl280 \f20
A score matrix gives a score for the alignment of each possible pair of sequence symbols. This method is more sensitive than the simple percentage match search. The default matrix MDM78 used by this program is shown in figure 12.2.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find matches using a score matrix".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par
3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default. The program displays the minimum and maximum possible scores for the string.\par
5.\tab Define "Score". The search is performed, the results are presented graphically, the number of matches displayed, and the scores and positions of the top 10 matches displayed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab
Define the number of matches to "Display". For the number of matches chosen the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round
to step 3. An example run is shown in figure 12.3.\par
\pard\plain \li220\ri280\sb200\sl220\box\brsp100\brdrth \f4\fs16 C S T P A G N D E Q B Z H R K M I L V F Y W - X ? \par
\pard \li220\ri280\sl220\box\brsp100\brdrth C 22 10 8 7 8 7 6 5 5 5 5 5 7 6 5 5 8 4 8 6 10 2 10 10 10 10\par
S 10 12 11 11 11 11 11 10 10 9 10 10 9 10 10 8 9 7 9 7 7 8 10 10 10 10\par
T 8 11 13 10 11 10 10 10 10 9 10 10 9 9 10 9 10 8 10 7 7 5 10 10 10 10\par
P 7 11 10 16 11 9 9 9 9 10 9 10 10 10 9 8 8 7 9 5 5 4 10 10 10 10\par
A 8 11 11 11 12 11 10 10 10 10 10 10 9 8 9 9 9 8 10 6 7 4 10 10 10 10\par
G 7 11 10 9 11 15 10 11 10 9 10 10 8 7 8 7 7 6 9 5 5 3 10 10 10 10\par
N 6 11 10 9 10 10 12 12 11 11 12 11 12 10 11 8 8 7 8 6 8 6 10 10 10 10\par
D 5 10 10 9 10 11 12 14 13 12 13 12 11 9 10 7 8 6 8 4 6 3 10 10 10 10\par
E 5 10 10 9 10 10 11 13 14 12 12 13 11 9 10 8 8 7 8 5 6 3 10 10 10 10\par
Q 5 9 9 10 10 9 11 12 12 14 11 13 13 11 11 9 8 8 8 5 6 5 10 10 10 10\par
B 5 10 10 9 10 10 12 13 12 11 13 11 11 10 10 8 8 6 8 5 7 4 10 10 10 10\par
Z 5 10 10 10 10 10 11 12 13 13 11 14 12 10 10 8 8 8 8 5 6 4 10 10 10 10\par
H 7 9 9 10 9 8 12 11 11 13 11 12 16 12 10 8 8 8 8 8 10 7 10 10 10 10\par
R 6 10 9 10 8 7 10 9 9 11 10 10 12 16 13 10 8 7 8 6 6 12 10 10 10 10\par
K 5 10 10 9 9 8 11 10 10 11 10 10 10 13 15 10 8 7 8 5 6 7 10 10 10 10\par
M 5 8 9 8 9 7 8 7 8 9 8 8 8 10 10 16 12 14 12 10 8 6 10 10 10 10\par
I 8 9 10 8 9 7 8 8 8 8 8 8 8 8 8 12 15 12 14 11 9 5 10 10 10 10\par
L 4 7 8 7 8 6 7 6 7 8 6 8 8 7 7 14 12 16 12 12 9 8 10 10 10 10\par
V 8 9 10 9 10 9 8 8 8 8 8 8 8 8 8 12 14 12 14 9 8 4 10 10 10 10\par
F 6 7 7 5 6 5 6 4 5 5 5 5 8 6 5 10 11 12 9 19 17 10 10 10 10 10\par
Y 10 7 7 5 7 5 8 6 6 6 7 6 10 6 6 8 9 9 8 17 20 10 10 10 10 10\par
W 2 8 5 4 4 3 6 3 3 5 4 4 7 12 7 6 5 8 4 10 10 27 10 10 10 10\par
- 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
X 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
? 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa300\sl240\tx1140 \f21\fs20 Figure 12.2\tab The amino acid score matrix MDM78.\par
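\pard\plain \s4\qj\sa120\sl280 \f20 The way a score matrix is applied can be checked by hand against the example run in figure 12.3. The Python sketch below is an illustration, not the PIP source: the function names are hypothetical, only the four rows of MDM78 needed for the string ALPHA are copied from figure 12.2, and a match is assumed to be a window scoring at or above the chosen score.\par
\pard\plain \li220\ri280\sl220 \f4\fs16 
```python
# Column order and rows copied from figure 12.2 (standard amino acid columns only).
COLUMNS = "CSTPAGNDEQBZHRKMILVFYW"
ROWS = dict(
    A=[8, 11, 11, 11, 12, 11, 10, 10, 10, 10, 10, 10, 9, 8, 9, 9, 9, 8, 10, 6, 7, 4],
    L=[4, 7, 8, 7, 8, 6, 7, 6, 7, 8, 6, 8, 8, 7, 7, 14, 12, 16, 12, 12, 9, 8],
    P=[7, 11, 10, 16, 11, 9, 9, 9, 9, 10, 9, 10, 10, 10, 9, 8, 8, 7, 9, 5, 5, 4],
    H=[7, 9, 9, 10, 9, 8, 12, 11, 11, 13, 11, 12, 16, 12, 10, 8, 8, 8, 8, 8, 10, 7],
)


def window_score(query, segment):
    """Sum the matrix scores for each aligned pair of symbols."""
    return sum(ROWS[q][COLUMNS.index(s)] for q, s in zip(query, segment))


def score_range(query):
    """Minimum and maximum scores the query string can achieve."""
    return (sum(min(ROWS[q]) for q in query), sum(max(ROWS[q]) for q in query))


def search(query, sequence, cutoff):
    """Return (position, score) for every window scoring at least cutoff."""
    width = len(query)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = window_score(query, sequence[start:start + width])
        if score >= cutoff:
            hits.append((start + 1, score))
    return hits


limits = score_range("ALPHA")
example = window_score("ALPHA", "PLDHD")
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 With these rows the string ALPHA has a score range of 23 to 72 and scores 62 against the segment PLDHD, in agreement with the values listed in figure 12.3.\par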
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Using weight matrices for searching protein sequences\par
\pard\plain \s4\qj\sa120\sl280 \f20
A weight matrix is the most sensitive way of defining a motif. It is a table of values that gives scores for each amino acid type in each position along a motif. For a motif of length 8 amino acids the weight matrix would be a table 8 positions long and, a
llowing for 26 amino acid symbols, 26 deep. The simplest way of choosing the values for the table is to take an alignment of all known
examples of the motif and to count the frequency of occurrence of each amino acid type at each position. These frequencies can be used as the table of weights. When the table is used to search a new sequence the program calculates a score for each position
along the sequence by adding or multiplying (see notes) the relevant values in the table. All positions that exceed some cutoff score are reported as matching the original set of motifs.\par
\pard \s4\qj\sa120\sl280 How can we select a suitable cutoff score? The simplest way is to apply the weight matrix to all the known occurrences of the motif (i.e. the set of sequence segments used to create the table) and to see what scores they achieve. The cutoff can be selected accordingly. For convenience the weight matrix is stored as a file along with its cutoff score, a title that is displayed when the file is read, and a few other values needed by the program. A routine for creating weight matrix files from sets of aligned sequences is included in the program. When a search using the weight matrix is performed the program will either list the matching sequence segments or plot their positions as for the other motif search methods.\par
\pard\plain \li2000\ri2260\sb200\sl220\box\brsp100\brdrth \f4\fs16 Find matches using a score matrix\par
\pard \li2000\ri2260\sl220\box\brsp100\brdrth ? Keep picture (y/n) (y) =\par
? String=ALPHA\par
Minimum score= 23 Maximum score= 72\par
? Score (23-72) (72) =60\par
\par
For score 60 the number of matches= 5\par
Scores 62 62 62 61 61\par
Positions 120 217 420 54 326\par
? Display (0-5) (0) =\par
\par
120\par
PLDHD\par
* *\par
ALPHA\par
1\par
\par
217\par
ALANT\par
**\par
ALPHA\par
1\par
\par
420\par
QLDHG\par
* *\par
ALPHA\par
1\par
\par
54\par
SLPGN\par
**\par
ALPHA\par
1\par
\par
326\par
ALPII\par
***\par
ALPHA\par
1\par
? Keep picture (y/n) (y) =\par
Default String=ALPHA\par
\pard \li2000\ri2260\sl220\keepn\box\brsp100\brdrth ? String=!\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa420\sl240\tx1140 \f21\fs20 Figure 12.3\tab An example of the listed output from "Search using a score matrix".\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.4.1\tab Creating a weight matrix file from a set of aligned sequences\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par
2.\tab Select "Make weight matrix".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
Define "Name of aligned sequences file". We assume the file of aligned sequences has already been created (see note 5). The program reads and displays the contents of the file numbering each sequence as it goes. Then it displays the length of the longes
t sequence.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Sum logs of weights". The alternative is to sum the weights when calculating scores (see note 6). \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Accept "Use all motif positions". The alternative allows the user to define a "mask" which identifies positions within the motif that should be ignored when the matrix is created (see note 7). The program now calculates the weights and applies them in turn to each of the sequences in the file. The number and score for each sequence are displayed, followed by the top, bottom and mean scores and the standard deviation. In addition the mean plus and minus 3 standard deviations are displayed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Cutoff score". The default is the mean minus 3 standard deviations, but users may, for example, decide to use the lowest score obtained by the sequences in the file.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Top score for scaling plots". This parameter is used by the graphics output routine when scaling the plots. Its value will influence the height of lines plotted to represent matches.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab
Define "Position to identify". When a search is performed it is not always appropriate to report the position of a match relative to the leftmost amino acid in the motif. For example when performing a helix-turn-helix motif search we may want to know the position of the well conserved glycine rather than the position of the first amino acid in the matrix. The "Position to identify" allows the user to define which amino acid is marked. The amino acids in the table are numbered 1, 2, 3 and so on.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define a "Title". This is a title that will be displayed when the matrix file is read prior to performing a search. It is limited to 60 characters.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Name for new weight matrix file". Give a name for the weight matrix file.\par
\pard\plain \s4\qj\sa120\sl280 \f20 See the example run in figure 12.4.\par
\pard\plain \li1240\ri1180\sb300\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Motif search using weight matrix\par
\pard \li1240\ri1180\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select operation\par
X 1 Use weight matrix\par
2 Make weight matrix\par
3 Rescale weight matrix\par
? Selection (1-3) (1) =2\par
? Name of aligned sequences file=atpbinding.seq\par
1 GETLGIVGESGSG\par
2 GESLGVVGESGGGKSTFAR OppF\par
3 GDVISIDGSSGSGKSTFLR HisP\par
4 GEFVVFVGPSGGGKSTLLR MalK E. coli\par
5 NQVTAFIGPSGGGKSTLLR PstB\par
6 GRVMALVGENGAGKSTMMK RbsA(N)\par
7 GEVIGIVGRSGSGKSTLTK HlyB\par
8 GECFGLLGPNGAGKSTITR NodI R. leguminosarum\par
9 GEMAFLTGHSGAGKSTLLK FtsE E. coli\par
10 GQRELIIGDRQTGKTALAI ATPase\par
11 GGKVGLFGGAGVGKTVNMM ATPase\par
12 GRIVEIYGPESSGKTTLTL RecA\par
13 RSNLLVLAGAGSGKTRVLV UvrD\par
14 GGKIGLFGGAGVGKTVGIM ATPase Bovine\par
15 SKIIFVVGGPGSGKGTQCE Adenylate Kinase Rabbit\par
16 NQSILITGESGAGKTVNTK Myosin Rabbit\par
17 HVNVGTIGHVDHGKTTLTA EF-Tu E. coli\par
18 YRNIGISAHIDAGKTTERI EF-G E. coli\par
19 EYKLVVVGARGVGKSALTI v-ras (HARVEY)\par
20 EYKLVVVGASGVGKSALTI v-ras (KIRSTEN)\par
21 EYKLVVVGAVGVGKSALTI pEJ BLADDER CARCINOMA TRANSFORMING\par
22 EYKLVVVGAGGVGKSALTI pEJ BLADDER CARCINOMA CELLULAR\par
Length of motif 19\par
? Sum logs of weights (y/n) (y) =\par
? Use all motif positions (y/n) (y) =\par
Applying weights to input sequences\par
1 -36.651 GETLGIVGESGSGKSQSLR\par
2 -35.780 GESLGVVGESGGGKSTFAR\par
3 -38.180 GDVISIDGSSGSGKSTFLR\par
4 -35.403 GEFVVFVGPSGGGKSTLLR\par
5 -39.039 NQVTAFIGPSGGGKSTLLR\par
6 -40.653 GRVMALVGENGAGKSTMMK\par
7 -34.017 GEVIGIVGRSGSGKSTLTK\par
8 -37.454 GECFGLLGPNGAGKSTITR\par
9 -36.474 GEMAFLTGHSGAGKSTLLK\par
10 -43.431 GQRELIIGDRQTGKTALAI\par
11 -40.210 GGKVGLFGGAGVGKTVNMM\par
12 -40.720 GRIVEIYGPESSGKTTLTL\par
13 -45.143 RSNLLVLAGAGSGKTRVLV\par
14 -40.684 GGKIGLFGGAGVGKTVGIM\par
15 -45.197 SKIIFVVGGPGSGKGTQCE\par
16 -39.098 NQSILITGESGAGKTVNTK\par
17 -43.832 HVNVGTIGHVDHGKTTLTA\par
18 -44.817 YRNIGISAHIDAGKTTERI\par
19 -36.305 EYKLVVVGARGVGKSALTI\par
20 -35.101 EYKLVVVGASGVGKSALTI\par
21 -36.305 EYKLVVVGAVGVGKSALTI\par
22 -36.711 EYKLVVVGAGGVGKSALTI\par
Top score -34.017 Bottom score -45.197\par
Mean -39.146 Standard deviation 3.441\par
Mean minus 3.sd -49.470 Mean plus 3.sd -28.822\par
? Cutoff score (-999.00-9999.00) (-49.47) =\par
? Top score for scaling plots (-49.47-999.00) (-28.82) =\par
? Position to identify (0-19) (1) =13\par
? Title=ATP binding motif\par
\pard \li1240\ri1180\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth ? Name for new weight matrix file=atpbinding.wts\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa320\sl240\tx1140 \f21\fs20 Figure 12.4\tab An example run of the creation of a weight matrix from a set of aligned sequences.\par
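\pard\plain \s4\qj\sa120\sl280 \f20 The calculation behind such a run can be sketched in Python. This is an illustration under stated assumptions and is not expected to reproduce the scores in figure 12.4: the function names are hypothetical, the three aligned sequences are invented, and both the pseudocount used to avoid taking the log of zero and the use of the population standard deviation are assumptions, since this chapter does not say how the program treats them.\par
\pard\plain \li220\ri280\sl220 \f4\fs16 
```python
import math


def count_residues(aligned):
    """Count how often each amino acid occurs at each motif position."""
    length = max(len(sequence) for sequence in aligned)
    counts = [dict() for _ in range(length)]
    for sequence in aligned:
        for position, residue in enumerate(sequence):
            counts[position][residue] = counts[position].get(residue, 0) + 1
    return counts


def log_score(counts, n_sequences, segment, pseudocount=1.0):
    """Sum of logs of frequencies; the pseudocount is an assumption."""
    total = 0.0
    for column, residue in zip(counts, segment):
        frequency = (column.get(residue, 0) + pseudocount) / (n_sequences + pseudocount)
        total += math.log(frequency)
    return total


def summarise(scores):
    """Mean, standard deviation, and mean minus and plus 3 standard deviations."""
    mean = sum(scores) / len(scores)
    variance = sum((score - mean) ** 2 for score in scores) / len(scores)
    deviation = math.sqrt(variance)
    return mean, deviation, mean - 3 * deviation, mean + 3 * deviation


ALIGNED = ["GKS", "GKT", "AKS"]  # invented example, not a real motif
counts = count_residues(ALIGNED)
scores = [log_score(counts, len(ALIGNED), sequence) for sequence in ALIGNED]
mean, deviation, default_cutoff, scaling_top = summarise(scores)
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 As in the example run, the mean minus 3 standard deviations would serve as the default cutoff and the mean plus 3 standard deviations as the default top score for scaling plots.\par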
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.4.2\tab Searching using a weight matrix\par
\pard\plain \s4\qj\sa120\sl280 \f20 Once a weight matrix has been stored in a file it can be used to search any sequence. Results can be displayed graphically or the matching sequence segments can be listed out with their scores.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par
2.\tab Select "Use weight matrix".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Motif weight matrix file". The name of the file containing the weight matrix. The program reads the file and displays its title.\par
4.\tab Accept "Use frequencies as weights". The alternative will use the weight matrix file as a definition of a "Membership of set" motif (see note 10).\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab
Define "Cutoff score". The default will be the value set when the weight matrix file was created. If the score is negative the program will calculate sums of logs of frequencies, otherwise it will add frequencies.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Plot results". Alternatively they will be listed.\par
The results will appear.\par
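\pard\plain \s4\qj\sa120\sl280 \f20 A search of this kind can be sketched in Python as follows. The sketch is an illustration only: the function names and the three-position table of counts are hypothetical, scores are formed by adding the table values (the sum of logs alternative is omitted), and a match is assumed to be a window scoring at or above the cutoff. It also shows the "Membership of set" alternative of note 10 and the effect of the "Position to identify" value stored when the matrix was made.\par
\pard\plain \li220\ri280\sl220 \f4\fs16 
```python
# Hypothetical three-position table of residue counts, not read from a real file.
WEIGHTS = [dict(G=2, A=1), dict(K=3), dict(S=2, T=1)]


def added_score(weights, segment):
    """Add the table value for the residue found at each motif position."""
    return sum(column.get(residue, 0) for column, residue in zip(weights, segment))


def membership_score(weights, segment):
    """Score 1 for each position whose residue is non-zero in the table."""
    return sum(1 for column, residue in zip(weights, segment)
               if column.get(residue, 0) > 0)


def search(weights, sequence, cutoff, position_to_identify=1, scorer=added_score):
    """Return (reported position, score) for windows scoring at least cutoff."""
    width = len(weights)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = scorer(weights, sequence[start:start + width])
        if score >= cutoff:
            # start is 0-based; motif positions are numbered from 1.
            hits.append((start + position_to_identify, score))
    return hits


by_weight = search(WEIGHTS, "MAKSGKTW", 6, position_to_identify=2)
by_set = search(WEIGHTS, "MAKSGKTW", 3, position_to_identify=2,
                scorer=membership_score)
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 With "Position to identify" set to 2, both searches report the positions of the lysines at 3 and 6 rather than the first residue of each matching segment.\par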
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \page 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab The files containing the definitions of peptides that can be searched for by the exact match search routine have the following format. Each name is followed by a /, then each of its peptide sequences is followed by a /. The last peptide sequence for each name is followed by //. For example a file might contain the following.\par
\pard \s7\qj\li1720\sb200\sa120\sl280\tx1880 Acidic/D/E//\par
\pard \s7\qj\li1720\sa120\sl280\tx1880 Basic/R/K/H//\par
Glyco/N-S/N-T//\par
\pard \s7\qj\fi-560\li560\sb200\sa120\sl280\tx560 \tab Users could then search for these named sets of sequences. Note that the symbol - matches any amino acid.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab To search for a subset of the names in a file used by the exact match routine the user should reject "Search for all names"; the program will then ask for the names wanted and extract their sequences from the file. Alternatively, if a user always uses the same subset, a file containing only those names could be created. This file would then be selected as "Personal file" for "Input source".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
The exact match routine also allows names and their sequences to be entered on the keyboard. This is selected as "Keyboard" for "Input source", and the program will prompt for names and their sequences. In this way the routine can be used to search for
exact matches to any short sequence. \par
4.\tab For this program a motif is a short segment of sequence of fixed length. More complex structures termed "patterns", which we define as sets of motifs separated by varying gaps, are covered in another chapter. The current chapter should be read before the chapter on patterns.\par
5.\tab The files of aligned sequences used to make weight matrices have the following format. Each sequence should be on a separate line. The sequence should start in column 2 and is terminated by a new line or a space. Anything after the space is tr
eated as a comment. The files can be created by previous searches or using an editor.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab
The frequencies in the weight matrix can be used in two ways to calculate scores for sequences. Some users prefer to add the frequencies to give a total score, and others to multiply them by summing their logs. If we regard the frequencies as probabilities then multiplication seems the correct procedure. The user chooses which method will be used when the weight matrix is created; however, the choice can be overridden when the matrix is used. If multiplication is selected then all results will be presented as sums of logs.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab
Masking the weight matrix is particularly useful in cases where a limited number of examples of a motif are available, or when the motif may have several components. In the first case the limited number of examples may make the matrix unrepresentative o
f the motif because the amino acids in the unconserved positions may bias the results of searches. We stated that a motif might have several components\:
for example it might have both structural and specificity components. We may want to separate out the two parts and again masking provides such a facility.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab
The weight matrix handling routine contains a further option "Rescale weight matrix". If the user has edited a weight matrix to change the frequency values this provides a way of selecting a new cutoff score. It allows users to read in a set of aligned sequences and a weight matrix and to apply the matrix to the set of sequences to see the range of scores achieved. A new weight matrix file containing the selected cutoff score is written to disk.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab
The program contains no hardwired motifs as we expect most sites that use the programs to accumulate their own libraries of motifs and patterns, and to use the PROSITE library, both of which users can employ by simply knowing the names of the correspond
ing files.\par
10.\tab The weight matrix search can also be used as a "Membership of a set" search. This means that at each position in the motif, any amino acid type that is non-zero in the weight matrix is counted as a match and scores a value of 1. See the chapter on searching protein sequences for patterns.\par
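\pard\plain \s4\qj\sa120\sl280 \f20 As a minimal sketch of the two scoring methods described in note 6 (it is not the program's own code), the Python fragment below builds a table of residue counts from a few aligned examples and scores a segment first by adding the counts and then by summing their natural logs. The function names, the use of raw counts, the base of the logs and the "floor" value given to residues never seen in a column are all assumptions made for illustration.\par

```python
import math

def build_counts(aligned):
    # One table of residue counts per column of the aligned examples.
    width = len(aligned[0])
    columns = [dict() for _ in range(width)]
    for seq in aligned:
        for i, residue in enumerate(seq):
            columns[i][residue] = columns[i].get(residue, 0) + 1
    return columns

def score_sum(columns, segment):
    # Additive score: total of the counts for the residues observed.
    return sum(col.get(res, 0) for col, res in zip(columns, segment))

def score_log(columns, segment, floor=0.5):
    # Multiplicative score expressed as a sum of logs. "floor" is an
    # assumed stand-in count for residues never seen in a column.
    return sum(math.log(col.get(res, floor)) for col, res in zip(columns, segment))

examples = ["GKT", "GKS", "GRT", "AKT"]   # hypothetical aligned motif examples
matrix = build_counts(examples)
additive = score_sum(matrix, "GKT")
logsum = score_log(matrix, "GKT")
```
\par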
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4(1)}\:53-60.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 13. Using Patterns to Analyse Protein Sequences\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 1.1\tab Introduction to the PROSITE motif library\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Creating a pattern file containing a weight matrix motif and a membership of a set motif.\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.2\tab Searching a sequence using a pattern file\par
2.3\tab Comparing a sequence against a library of patterns including PROSITE\par
2.4\tab Searching libraries for patterns\par
2.5\tab Preparing the PROSITE motif library for use by the programs\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
4.\tab References\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 Here we describe one of the most powerful facilities provided by the program PIP\: the ability to define and search sequences or libraries of sequences for complex patterns of motifs. In another chapter we give details of searching for individual motifs, but here we show how to create individual patterns and libraries of patterns and to use them to search sequences. Once a pattern has been defined and stored in a file it can be used to search any sequence. In addition, users who want to screen sequences routinely against libraries of patterns can do so by using files of file names. For example, the program can use the PROSITE protein motif library. The program can produce several alternative forms of output. It will display the segment of sequence matching each individual motif in the pattern, display all the sequence between and including the two outermost motifs, produce a description of the match in the form of a SWISSPROT feature table, or draw a simple graphical plot.\par
\pard \s4\qj\sa120\sl280 Towards the end of the chapter we describe how a related program PIPL is used to search libraries of sequences to find patterns. This program can produce alignments of sequence families.\par
\pard \s4\qj\sa120\sl280
Patterns are defined as sets of motifs with variable spacing. Each motif in a pattern can be defined using any of several methods, and their positions relative to one another are defined in terms of minimum and maximum separations. In addition, by the use of logical operators, each motif can be declared to be essential (the AND operator), optional (the OR operator), or forbidden (the NOT operator). The following methods (termed "classes" by the program) for defining motifs are provided\: 1) exact match to a short sequence; 2) percentage match to a short sequence; 3) match to a short sequence using a score matrix and cutoff score; 4) match to a weight matrix; 5) direct repeat; 6) membership of a set.\par
\pard \s4\qj\sa120\sl280
The motifs in a pattern are numbered sequentially and motif spacing is defined in the following way. When a new motif is added to a pattern the user specifies the "Reference motif" by its number and then a "Relative start position". The "Relative start pos
iti
on" is defined by taking the first base of the "Reference motif" as position 1, the next as 2, and so on. Then the user defines the allowed variation in the spacing by specifying the "Number of extra positions". Notice that the position of a motif can be d
efined relative to any other motif, and that a negative "Relative start position" declares the motif to be to the left of its "Reference motif".\par
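\pard\plain \s4\qj\sa120\sl280 \f20 The spacing rule can be sketched in a few lines of Python. This is an illustration only\: the function name is hypothetical, and the way zero and negative relative positions are counted is an assumption that simply extends the rule "first position of the reference motif is position 1".\par

```python
def allowed_starts(reference_start, relative_start, extra_positions):
    # The first position of the reference motif counts as position 1,
    # so a relative start of 1 coincides with the reference motif.
    first = reference_start + relative_start - 1
    return list(range(first, first + extra_positions + 1))

# Reference motif found at residue 162. The new motif has a relative
# start position of 22 and 5 extra positions.
starts = allowed_starts(162, 22, 5)
```
\par
Here the new motif may begin at any of the six residues 183 to 188.\par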
\pard \s4\qj\sa120\sl280 The program calculates and displays the probability of finding each individual motif in the current sequence, the product of the probabilities for all the motifs in a pattern (the "Probability of finding pattern"), and the "Expected number of matches". In addition to the cutoffs used for the individual motifs, users can apply two pattern cutoffs\: "Maximum pattern probability" and "Minimum pattern score".\par
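\pard\plain \s4\qj\sa120\sl280 \f20 The arithmetic behind these two figures can be sketched as follows. The product of motif probabilities is as described in the text. The formula for the expected number of matches is only an assumed approximation, since the program's exact calculation is not given here, and the sequence length of 500 is a hypothetical value. The motif probabilities are the rounded values printed in figure 13.3, so the product differs slightly from the figure's 0.4368E-06.\par

```python
def pattern_probability(motif_probabilities):
    # Product of the individual motif probabilities.
    product = 1.0
    for p in motif_probabilities:
        product *= p
    return product

def rough_expected_matches(probability, extra_positions, sequence_length):
    # Assumed approximation: every allowed spacing and every start
    # position in the sequence is treated as an independent trial.
    placements = 1
    for extra in extra_positions:
        placements *= extra + 1
    return probability * placements * sequence_length

p = pattern_probability([0.302e-04, 0.145e-01])
e = rough_expected_matches(p, [5], 500)
```
\par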
\pard \s4\qj\sa120\sl280 Below we describe\: how to create a pattern; how to use a pattern file to search a sequence; how to use a "File of pattern file names" to search a sequence for a whole library of patterns; how to use a pattern file
to search a whole library of sequences; how to reformat the PROSITE motif library into a form compatible with these search programs. To describe how to create a pattern file we first show all the steps to make one containing two motifs, and then, to save
space, the parts specific to the individual motif types are sketched in the notes section.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.1\tab Introduction to the PROSITE motif library\par
\pard\plain \s4\qj\sa120\sl280 \f20 A library of protein motifs (in our terminology, because they include variable gaps, many would be called patterns) has
recently become available from Amos Bairoch, Departement de Biochimie Medicale, University of Geneva. Currently it contains over 500 patterns/motifs and arrives on tape or cdrom in two files\:
a .DAT file and a .DOC file. There is also a user documentation file PROSITE.USR. Here we outline the library structure and what is required to prepare the PROSITE library for use by our programs. A typical entry in the .DAT file is shown in figure 13.1.
\par
\pard \s4\qj\sa120\sl280 Each entry has an accession number (in figure 13.1 PS00197), a pattern definition (in figure 13.1 C-x(1,2)-[STA]-x(2)-C-[STA]-\{P\}-C) and a documentation file cross reference (in figure 13.1 PDOC00175). This pattern means\:
C, gap of 1 or 2, any of STA, gap of 2, C, any of STA, not P, C.\par
\pard \s4\qj\sa120\sl280
We need to convert all of these patterns into our pattern definitions (as membership of a set, with the appropriate gap ranges) and write each into a separate pattern file with corresponding "membership of a set" weight matrices. After the conversion each
pattern file is named accession_number.pat (here PS00197.PAT). The corresponding matrix files are accession_number.wtsa, accession_number.wtsb, etc for however many are needed (here PS00197.WTSA and PS00197.WTSB)\:
two are needed because of the variable gap.\par
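\pard\plain \s4\qj\sa120\sl280 \f20 To illustrate why the variable gap leads to two motifs, the sketch below writes the ferredoxin pattern of figure 13.1 as two membership of a set motifs and searches a sequence for them. It is not the conversion performed by our programs\: the data layout, the function names and the test sequence are hypothetical, and positions are counted from 0 as is usual in Python.\par

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NOT_P = AMINO_ACIDS.replace("P", "")

# Each motif is a list of allowed-residue strings, one per position.
MOTIF_A = ["C"]
MOTIF_B = ["STA", AMINO_ACIDS, AMINO_ACIDS, "C", "STA", NOT_P, "C"]
RELATIVE_START = 3    # a gap of 1 or 2 puts motif B at position 3 or 4
EXTRA_POSITIONS = 1

def motif_matches(motif, sequence, start):
    # True if every position of the motif allows the residue found.
    if start < 0 or start + len(motif) > len(sequence):
        return False
    return all(sequence[start + i] in allowed for i, allowed in enumerate(motif))

def find_pattern(sequence):
    # Return 0-based (start of motif A, start of motif B) pairs.
    hits = []
    for a in range(len(sequence)):
        if not motif_matches(MOTIF_A, sequence, a):
            continue
        first = a + RELATIVE_START - 1
        for b in range(first, first + EXTRA_POSITIONS + 1):
            if motif_matches(MOTIF_B, sequence, b):
                hits.append((a, b))
    return hits

hits = find_pattern("GGCASGGCSAC")
```
\par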
In addition we can optionally split the .DAT and .DOC files into separate files, one for each entry, with names accession_number.dat and accession_number.doc. We also create an index for the library which gives a one line description of each pattern and ends with the pattern file and documentation file numbers. The start of the index is shown in figure 13.2. So, referring to figure 13.2, the name of the pattern file for Glycosaminoglycan attachment site is PS00002.PAT, and that of the documentation file is PDOC00002.DOC.\par
\pard \s4\qj\sa120\sl280
Finally we create a file of file names for all the patterns in the library. If this file of file names is PROSITE.NAM then to use the complete PROSITE library from program PIP, users select "pattern searcher" and choose the option "use file of pattern file
names", and give the file name PROSITE.NAM. For any matches found, the accession number and pattern title will be displayed.\par
\pard\plain \li360\ri440\sl220\pagebb\box\brsp40\brdrth \f4\fs16 ID 2FE2S_FERREDOXIN; PATTERN.\par
\pard \li360\ri440\sl220\box\brsp40\brdrth AC PS00197;\par
DT APR-1990 (CREATED); APR-1990 (DATA UPDATE); APR-1990 (INFO UPDATE).\par
DE 2Fe-2S ferredoxins, iron-sulfur binding region signature.\par
PA C-x(1,2)-[STA]-x(2)-C-[STA]-\{P\}-C.\par
NR /RELEASE=14,15409;\par
NR /TOTAL=69(69); /POSITIVE=63(63); /UNKNOWN=0(0); /FALSE_POS=6(6);\par
NR /FALSE_NEG=5(5);\par
CC /TAXO-RANGE=A?EP?; /MAX-REPEAT=1;\par
CC /SITE=1,iron_sulfur; /SITE=5,iron_sulfur; /SITE=8,iron_sulfur;\par
DR P15788, FER$APHHA , T; P00250, FER$APHSA , T; P00223, FER$ARCLA , T;\par
DR P00227, FER$BRANA , T; P07838, FER$BRYMA , T; P13106, FER$BUMFI , T;\par
DR P00247, FER$CHLFR , T; P07839, FER$CHLRE , T; P00222, FER$COLES , T;\par
DO PDOC00175;\par
\pard \li360\ri440\sl220\keepn\box\brsp40\brdrth //\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 13.1\tab A typical entry from the PROSITE library\par
\pard\plain \li440\ri480\sb300\sl220\box\brsp100\brdrth \f4\fs16 N-glycosylation site. 00001,00001\par
\pard \li440\ri480\sl220\box\brsp100\brdrth Glycosaminoglycan attachment site. 00002,00002\par
Tyrosine sulfatation site. 00003,00003\par
\pard \li440\ri480\sl220\keepn\box\brsp100\brdrth cAMP- and cGMP-dependent protein kinase phosphorylation site. 00004,00004\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 13.2\tab The start of the index created by the conversion program\par
\pard\plain \s4\qj\sa120\sl280 \f20
In order to make the PROSITE library useable by the search programs it is only necessary to run a program named SPLITP3. Two other programs, SPLITP1 and SPLITP2, only make the original files marginally easier to manage and produce an index. SPLITP1 split
s the PROSITE.DAT file to create a separate file for each entry. Each file is automatically named PSentry_number.DAT. In addition it creates an index for the library (see above).\par
\pard \s4\qj\sa120\sl280 SPLITP2 performs the same operation for the PROSITE.DOC file, except that no index is created. Files are named PSentry_number.DOC.\par
\pard \s4\qj\sa120\sl280
SPLITP3 creates a separate pattern file and weight matrix files for each PROSITE entry from the file PROSITE.DAT. Pattern files are named PSentry_number.PAT, weight matrix files PSentry_number.WTSA, PSentry_number.WTSB, etc. The pattern title is the one line description of the motif. SPLITP3 also creates a file of file names. Notice that it will ask for a path name so that the path can be included in the file of file names. This is the path to the directory in which the pattern files are stored.\par
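\pard\plain \s4\qj\sa120\sl280 \f20 The naming scheme can be summarised by the short sketch below. It does not reproduce SPLITP3\: the function names and the directory "/motifs/" are hypothetical, and the layout of one pattern file name per line in the file of file names is an assumption.\par

```python
def prosite_file_names(accession, matrix_count, path=""):
    # Pattern file plus one weight matrix file per motif, with the
    # suffixes WTSA, WTSB and so on.
    names = [path + accession + ".PAT"]
    for i in range(matrix_count):
        names.append(path + accession + ".WTS" + chr(ord("A") + i))
    return names

def file_of_file_names(accessions, path=""):
    # Assumed layout: one pattern file name per line.
    return [path + accession + ".PAT" for accession in accessions]

names = prosite_file_names("PS00197", 2, "/motifs/")
listing = file_of_file_names(["PS00001", "PS00002"], "/motifs/")
```
\par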
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\fi-560\li560\sb240\sa60\sl280\tx560\tx920 \b\f20 2.1\tab Creating a pattern file containing a weight matrix motif and a membership of a set motif.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par
2.\tab Select "Pattern definition mode" as "Use keyboard".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Results display mode" as "Inclusive". The alternatives are listed in the introduction.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Motif definition mode" as "Weight matrix"\par
5.\tab Define "Motif name". Each motif can be given an 8 character name\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Weight matrix file name". Type in the name of the file containing the weight matrix. The program will display the probability of finding the motif.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Select "Motif definition mode" as "Membership of a set".\par
8.\tab Define "Motif name".\par
9.\tab Select "Logical operator" as "AND". The alternatives are "OR" and "NOT".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Select "Number of reference motif". At this stage the only choice is 1 and this is the default.\par
11.\tab Define "Relative start position". The base position relative to the "Reference motif". See the introduction.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 12.\tab Define "Number of extra positions".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 13.\tab Select input mode as "Keyboard". The alternative is an existing file in the form of a weight matrix.\par
14.\tab Define "String". Type in the sets of allowed residue types using the one letter code. See note 1\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 15.\tab Define the "Minimum matches". This is the number of positions within the motif that must match. The default is that
all positions must match but users may want to allow some flexibility by giving a lower score.\par
\tab The program now cycles round to step 7. Subsequent passes round the loop, which add further motifs to the pattern, differ only in the details required for the different motif "classes".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab Select "Pattern complete"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 17.\tab Accept "Save pattern in a file". The alternative does not save the pattern and so it can only be used once on the current sequence.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 18.\tab Define "Pattern definition file". Give a name for the new file.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 19.\tab Define "Pattern title". All patterns can have a 60 character title that can be displayed when the pattern file is read and the sequence searched.\par
20.\tab Define "Weight matrix file name". The membership of a set motifs are stored in the form of weight matrices, and so the program needs the user to define a file name.\par
21.\tab Define "Title". Type in a title for the weight matrix like file. The title will be displayed when the file is read.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The program will now display a detailed textual description of the pattern, the "Probability of finding the pattern" and the "Expected number of matches" (see figure 13.3).\par
22.\tab Define "Maximum pattern probability". Note that this is a maximum\: any match with a greater probability of being found will be rejected. If no value is specified the search will be quicker (see notes).\par
\pard\plain \li1240\ri1360\sl220\pagebb\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Pattern searcher\par
\pard \li1240\ri1360\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select pattern definition mode\par
X 1 Use keyboard\par
2 Use pattern file\par
3 Use file of pattern file names\par
? Selection (1-3) (1) =1\par
Select results display mode\par
X 1 Motif by motif\par
2 Inclusive\par
3 Graphical\par
4 SWISSPROT feature table\par
? Selection (1-4) (1) =2\par
Select motif definition mode\par
X 1 Exact match\par
2 Percentage match\par
3 Cut-off score and score matrix\par
4 Cut-off score and weight matrix\par
5 Direct repeat\par
6 Membership of set\par
7 Pattern complete\par
? Selection (1-7) (1) =4\par
? Motif name=atp\par
? Weight matrix file name=atpbinding.wts\par
ATP binding\par
Probability of score -47.8010 = 0.302E-04\par
Select motif definition mode\par
1 Exact match\par
2 Percentage match\par
3 Cut-off score and score matrix\par
X 4 Cut-off score and weight matrix\par
5 Direct repeat\par
6 Membership of set\par
7 Pattern complete\par
? Selection (1-7) (4) =6\par
? Motif name=hydro\par
Select logical operator\par
X 1 And\par
2 Or\par
3 Not\par
? Selection (1-3) (1) =\par
? Number of reference motif (1-1) (1) =\par
? Relative start position (-1000-1000) (20) =22\par
? Number of extra positions (0-1000) (0) =5\par
Select input mode\par
X 1 Keyboard\par
2 File\par
? Selection (1-2) (1) =\par
Separate sets with commas\par
? String=ivl,ivl,,,rkhde\par
? Minimum matches (1.00-5.00) (3.00) =\par
Probability of score 3.000 = 0.145E-01\par
Select motif definition mode\par
1 Exact match\par
2 Percentage match\par
3 Cut-off score and score matrix\par
4 Cut-off score and weight matrix\par
5 Direct repeat\par
X 6 Membership of set\par
7 Pattern complete\par
? Selection (1-7) (6) =7\par
? Save pattern in a file (y/n) (y) =\par
? Pattern definition file=_paper.pat\par
? Pattern title=atpbinding plus\par
? Weight matrix file name=_hydro.wts\par
Weight matrix needs a title\par
? Title=hydrophobic and + spot\par
Pattern description\par
atpbinding plus\par
Motif 1 named atp is of class 4\par
Which is a match to a weight matrix with score -47.801\par
Motif 2 named hydro is of class 6\par
Which is membership of a set with score 3.000\par
It is anded with the previous motif.\par
Probability of finding pattern = 0.4368E-06\par
Expected number of matches = 0.1350E-02\par
? Maximum pattern probability (0.00-1.00) (1.00) =\par
? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
{\f22\fs18 162\par
} GQRELIIGDRQTGKTALAIDAIINQR\par
Total matches found 1\par
\pard \li1240\ri1360\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Minimum and maximum observed scores -38.35 -38.35\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa300\sl240\tx1140 \f21\fs20 Figure 13.3\tab The creation and use of a pattern containing a weight matrix motif and a membership of a set motif.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 23.\tab
Define "Minimum pattern score". A minimum pattern score only makes sense if all the motifs in the pattern are defined with compatible scoring methods. For example membership of a set motifs and weight matrices using sums of logs are incompatible. Searc
hing will now commence and any matches displayed using the chosen method. In figure 13.3 we show a typical run i
n which a pattern containing a weight matrix and a membership of a set motif is created and stored on disk. Figure 13.4 shows the contents of the pattern file. \par
\pard\plain \li2260\ri2380\sb200\sl220\box\brsp100\brdrth \f4\fs16 atpbinding plus \par
\pard \li2260\ri2380\sl220\box\brsp100\brdrth A4 atp Class \par
atpbinding.wts \par
A6 hydro Class \par
1 Relative motif\par
22 Relative start position\par
5 Number of extra positions\par
\pard \li2260\ri2380\sl220\keepn\box\brsp100\brdrth _hydro.wts \par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa40\sl240\tx1140 \f21\fs20 Figure 13.4\tab The pattern file created in the worked example shown in figure 13.3.\par
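\pard\plain \s4\qj\sa120\sl280 \f20 The membership of a set string typed at step 14 (in figure 13.3, ivl,ivl,,,rkhde) can be interpreted as in the following sketch. The function names are hypothetical, and counting only the positions that restrict the residue type is an assumption, chosen because it agrees with the default of 3 minimum matches shown in figure 13.3. The segment IINQR is the part of the match in figure 13.3 that lies 22 positions from the start of the first motif.\par

```python
def parse_sets(string):
    # "ivl,ivl,,,rkhde" gives one entry per position. An empty entry
    # means that any residue type is allowed there.
    return [part.upper() for part in string.split(",")]

def count_matches(sets, segment):
    # Score 1 for every restricted position whose residue is allowed.
    score = 0
    for allowed, residue in zip(sets, segment.upper()):
        if allowed and residue in allowed:
            score += 1
    return score

sets = parse_sets("ivl,ivl,,,rkhde")
score = count_matches(sets, "IINQR")
```
\par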
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Searching a sequence using a pattern file\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par
2.\tab Select "Pattern definition mode" as "Use pattern file".\par
3.\tab Select "Results display mode" as "Inclusive"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
Define "Pattern definition file". Type the name of the file containing the pattern. The program will read the file then display its title, a detailed textual description of the pattern, the "Probability of finding the pattern", and the "Expected number
of matches".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Maximum pattern probability". \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab D
efine "Minimum pattern score". Searching will now commence and any matches displayed using the chosen method. Figure 13.5 shows a typical run using a pattern file and output in the form of a SWISSPROT feature table.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Comparing a sequence against a library of patterns including PROSITE\par
\pard\plain \s4\qj\sa120\sl280 \f20
This mode of operation allows a sequence to be searched, in turn, for any number of patterns each stored in a separate pattern file. The names of the files containing the individual patterns must be stored in a simple text
file. This file is called "a file of pattern file names" and its name is the only user input required to define the search. The file of file names could contain references to entries in the PROSITE motif library and also include the names of other patterns
.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par
2.\tab Select "Pattern definition mode" as "Use file of pattern file names".\par
3.\tab Select "Results display mode" as "Inclusive"\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "File of pattern file names". Type the name of the file containing the list of pattern file na
mes. The program will read the file and then, in turn, all the pattern files it names. Each of these patterns will be compared against the current sequence but only those that give matches will produce any output. The pattern title and each match will be d
isplayed.\par
\pard\plain \li1240\ri1360\sb320\sl220\box\brsp40\brdrth \f4\fs16 Pattern searcher\par
\pard \li1240\ri1360\sl220\box\brsp40\brdrth Select pattern definition mode\par
X 1 Use keyboard\par
2 Use pattern file\par
3 Use file of pattern file names\par
? Selection (1-3) (1) =2\par
? Pattern definition file=_paper.pat\par
Select results display mode\par
X 1 Motif by motif\par
2 Inclusive\par
3 Graphical\par
4 SWISSPROT feature table\par
? Selection (1-4) (1) =4\par
ATP binding sequences\par
Probability of score -47.8010 = 0.302E-04\par
hydrophobic and + spot\par
Probability of score 3.0000 = 0.145E-01\par
\par
Pattern description\par
\par
atpbinding plus\par
Motif 1 named atp is of class 4\par
Which is a match to a weight matrix with score -47.801\par
Motif 2 named hydro is of class 6\par
Which is membership of a set with score 3.000\par
It is anded with the previous motif.\par
Probability of finding pattern = 0.4368E-06\par
Expected number of matches = 0.1350E-02\par
? Maximum pattern probability (0.00-1.00) (1.00) =\par
? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
\par
FT atp 162 187 Program\par
\par
Total matches found 1\par
\pard \li1240\ri1360\sl220\keepn\box\brsp40\brdrth Minimum and maximum observed scores -38.35 -38.35\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 13.5\tab Worked example of using a pattern file to search a sequence, and writing the results in the form of a SWISSPROT feature table.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.4\tab Searching libraries for patterns\par
\pard\plain \s4\qj\sa120\sl280 \f20 The program PIPL can be used to search whole sequence
libraries for patterns. Its use is similar to the pattern search routine described above, except that it does not have the facility for creating pattern files, so they must be created beforehand using PIP. In addition to its obvious application of finding
new occurrences of patterns or checking on their frequency it is a useful way of obtaining sequence alignments. It can restrict its search to a list of named entries or can search all but those on a list of entries. It can restrict its output to showing t
he highest scoring match in each sequence, but by default it will show all matches.\par
\pard \s4\qj\sa120\sl280
Of its modes of output two require further description. The first "Padded sections" creates a new file for each match. The file will contain the sequence between and including the two outermost motifs in the pattern. It will be gapped to the furthest exten
t defined by the pattern, which means that if all the files were subsequently written one above the other all the motifs in the pattern would be exactly aligned, with the s
ections between them containing the requisite numbers of padding characters. The second such mode of output is called "Complete padded sequences". Here the user must know the maximum distance between the leftmost motif and the start of all the sequences th
at match. A trial run in which only the positions of matches are reported is usually required. The user gives this maximum distance to the program. The program then writes a new file containing the full length of all matching sequences, again maximally gap
ped (including their left ends) so that they would all align if written above one another. For both of these modes of output the files created are named "entryname" where "entryname" is the name given to the sequence in the sequence library. These modes ar
e best used with the option "Report all matches" rejected, so that only the best match for each sequence is reported. The sequences can be lined up using the sequence assembly program SAP.\par
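\pard\plain \s4\qj\sa120\sl280 \f20 The idea behind "Padded sections" can be sketched as below. This is not PIPL's code\: the function name, the use of "-" as the padding character, the example sequences and the placing of the padding at the right-hand end of each gap are all assumptions, and the motifs are assumed not to overlap.\par

```python
def pad_sections(sequence, motif_spans, gap_widths, pad="-"):
    # motif_spans: 0-based (start, end) pairs for each matched motif,
    # end exclusive. gap_widths: the largest gap the pattern allows
    # between each pair of neighbouring motifs.
    pieces = []
    for i, (start, end) in enumerate(motif_spans):
        pieces.append(sequence[start:end])
        if i + 1 < len(motif_spans):
            gap = sequence[end:motif_spans[i + 1][0]]
            pieces.append(gap + pad * (gap_widths[i] - len(gap)))
    return "".join(pieces)

a = pad_sections("MKCAAGHTW", [(2, 3), (6, 8)], [4])
b = pad_sections("CAHT", [(0, 1), (2, 4)], [4])
```
\par
Both results have the same length, so writing one above the other aligns the two motifs in each.\par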
\pard \s4\qj\sa120\sl280 The searches, which have recently been recoded, are very rapid. For example, a search of the current SWISSPROT library for a pattern defining the globin family as 6 weight matrices with widely varying gaps finds only globins and takes less than 4 minutes using a single processor on an Alliant FX2800. This time includes reading in the whole library as stored in EMBL CDROM format.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select PIPL.\par
2.\tab Define "Name for results file".\par
3.\tab Select a library.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Search whole library". The alternatives are "Search only a list of entries" and "Search all but a list of entries"
. The files containing the list of entries should contain one entry name per line, left justified.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Results display mode" as "Inclusive". The alternatives include "Motif by motif", "Scores only", "Complete padded sequences" and "Padded sections".\par
6.\tab Accept "Report all matches". The alternative only shows the best match for each sequence.\par
7.\tab Define "Pattern definition file". The name of the file containing the pattern created using PIP. \par
\tab The program displays a textual description of the pattern and the expected number of matches per 1000 residues assuming an average amino acid composition.\par
8.\tab Define "Maximum pattern probability". The program will run much more quickly if none is given.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define "Minimum pattern score".\par
\pard\plain \s4\qj\sa120\sl280 \f20 The search will start.\par
A typical run is shown in figure 13.6.\par
\pard\plain \li1120\ri1280\sb200\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 PIPL (Protein interpretation program (library)) V4.1 Jul 1991\par
\pard \li1120\ri1280\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Author\: Rodger Staden\par
Searches protein libraries for patterns of motifs\par
\par
? Name for results file=globin.res\par
Select a library\par
1 EMBL nucleotide library \par
X 2 SWISSPROT protein library \par
3 Personal file in PIR format \par
4 Personal file in FASTA format \par
? Selection (1-4) (2) =\par
Library is in EMBL format with indexes\par
Select a task\par
X 1 Search whole library \par
2 Search only a list of entries \par
3 Search all but a list of entries \par
? Selection (1-3) (1) =\par
Select results display mode\par
X 1 Motif by motif \par
2 Inclusive \par
3 Scores only \par
4 Complete padded sequences\par
5 Padded sections \par
? Selection (1-5) (1) =5\par
? (y/n) (y) Report all matches n\par
? Pattern definition file=globin.pat\par
globin 1 \par
Probability of score -34.5300 = 0.197E-02\par
globin 2 \par
Probability of score -44.6000 = 0.409E-02\par
globin 3 \par
Probability of score -75.1000 = 0.293E-01\par
globin 4 \par
Probability of score -36.1000 = 0.147E-01\par
globin 5 \par
Probability of score -73.7000 = 0.375E-01\par
globin 6 \par
Probability of score -55.9000 = 0.483E-01\par
\par
Pattern description\par
Globin pattern file \par
Motif 1 named g1 is of class 4\par
Which is a match to a weight matrix with score -34.530\par
Motif 2 named g2 is of class 4\par
Which is a match to a weight matrix with score -44.600\par
and the N-terminal residue can take positions 17 to 22\par
relative to the N-terminal end of motif 1\par
It is anded with the previous motif.\par
Motif 3 named g3 is of class 4\par
Which is a match to a weight matrix with score -75.100\par
and the N-terminal residue can take positions 27 to 35\par
relative to the N-terminal end of motif 2\par
It is anded with the previous motif.\par
Motif 4 named g4 is of class 4\par
Which is a match to a weight matrix with score -36.100\par
and the N-terminal residue can take positions 29 to 53\par
relative to the N-terminal end of motif 3\par
It is anded with the previous motif.\par
Motif 5 named g5 is of class 4\par
Which is a match to a weight matrix with score -73.700\par
and the N-terminal residue can take positions 12 to 16\par
relative to the N-terminal end of motif 4\par
It is anded with the previous motif.\par
Motif 6 named g6 is of class 4\par
Which is a match to a weight matrix with score -55.900\par
and the N-terminal residue can take positions 29 to 33\par
relative to the N-terminal end of motif 5\par
It is anded with the previous motif.\par
Probability of finding pattern = 0.6273E-11\par
Expected number of matches per 1000 residues = 0.2119E-03\par
? Maximum pattern probability (0.00-1.00) (1.00) =\par
\pard \li1120\ri1280\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth ? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 13.6\tab A typical run of PIPL using a pattern of 6 weight matrices to search the SWISSPROT library.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Preparing the PROSITE motif library for use by the programs\par
\pard\plain \s4\qj\sa120\sl280 \f20 Only the program SPLITP3 is essential for preparing the PROSITE library for use by our programs. \par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select SPLITP3\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Prosite library file". Type the name of the file containing the PROSITE library (usually PROSITE.DAT).\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
Define "Name for file of pattern file names". This is the file of file names that users will employ to search the whole library. It will be convenient for them if an environment variable is defined for this file name.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Path name of motif directory". This is the full path name, including the final /, to the directory in which the converted library will be stored.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
The "exact match" motif class requires a consensus sequence. The "percentage match" motif class requires a consensus sequence and a cutoff score. The "score matrix" motif class uses the MDM78 matrix and requires a consensus sequence and a cutoff score.
The "weight matrix" search only requires the name of the file containing the matrix. The "direct repeat" motif class requires a repeat length, the minimum and maximum gap between the t
wo occurrences of the repeat, and a minimum score. The "membership of a set" motif class defines sets of residue types that are allowed at each position in the motif. When they are first entered into the pattern they are normally typed on the keyboard, but
when they are stored in a file, they are written in the same format as a weight matrix. To enter them on the keyboard use the following format. Type the one letter codes for the set of residue types allowed at each position terminated by a comma (,). For
positions where any residue type is allowed simply type an extra comma. For example VLI,FY,,,DE means any of Valine, Leucine or Isoleucine in the first position, either Phenylalanine or Tyrosine in the next position, anything in the next two positions, and
Aspartic acid or Glutamic acid in the next. When the pattern is stored on the disk the program will request a name for the file and a title for the motif.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab The details of the probability calculations are outside the scope of this article. They are quite rapid and are essential both for assessing the statistical significance of any matches found and for allowing meaningful cutoffs to be applied to patterns. Obviously, in general, cutoff scores are inappropriate for patterns containing a mixture of motif classes.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
The program calculates the "Probability of finding the pattern" and the "Expected number of matches". The first figure is actually the product of the individual motif probabilities but the latter figure is more useful because it takes into accoun
t the allowed variation in spacing between motifs and the length of the current sequence. In both cases the composition of the current sequence is also used so that different probabilities would be calculated for other sequences.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
The pattern definition system is very flexible. Assume that a laboratory has a large library of patterns stored in its computer. Different groups or users may want to screen their sequences against different subsets of a pattern library. Each group ther
efore uses its own "File o
f pattern file names" which contains only the names of the pattern files that are relevant to their sequences. Of course a pattern may contain only one motif. Hence a library of patterns can include both simple and complex patterns. In the same way a labor
atory may have a large library of weight matrices defining different motifs and different users may want to combine them in different ways to produce their own patterns.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab Also, of course, a library does not have to be used solely for performing mass screenings\: each individual entry can be used as a single pattern by giving the name of its pattern file, e.g. pathname/PS00002.PAT.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab
Note that 5 of the PROSITE motifs contain the symbols > or <, which mean that the motifs must appear exactly at the N or C termini of the sequences. Currently our methods have no mechanism for such definitions, so KDEL motifs, for example, will be permitted to occur anywhere in a sequence.\par
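\pard\plain \s4\qj\sa120\sl280 \f20 The keyboard format for "membership of a set" motifs described in note 1 can be illustrated with a short Python sketch. It is not part of the program: the function names are hypothetical, an empty field is taken to mean "any residue type", and a final comma is assumed to terminate the last position rather than to add a new one.\par
```python
def parse_set_motif(text):
    """Split a keyboard-style set motif into one entry per position.

    Each position is terminated by a comma; an empty entry means any
    residue type is allowed there (represented here as None).
    """
    fields = text.split(",")
    # Assumption: a trailing comma closes the last position.
    if fields and fields[-1] == "":
        fields = fields[:-1]
    return [set(field) if field else None for field in fields]


def matches_at(positions, sequence, start):
    """Return True if the motif matches sequence beginning at index start."""
    if start < 0 or start + len(positions) > len(sequence):
        return False
    for offset, allowed in enumerate(positions):
        if allowed is not None and sequence[start + offset] not in allowed:
            return False
    return True


# The example from note 1: five positions, the third and fourth unrestricted.
motif = parse_set_motif("VLI,FY,,,DE")
```
\par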
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4(1)}\:53-60.\par
2.\tab Staden, R. 1989. Methods for calculating the probabilities of finding patterns in sequences. {\i CABIOS} {\b 5(2)}\:89-96.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 14. Comparing Sequences\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Methods\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Producing a dot matrix plot (or list) of exact matches\par
2.2\tab Producing a dot matrix plot using the proportional algorithm\par
2.3\tab Producing a dot matrix plot using the quick scan algorithm\par
2.4\tab Producing a list of all matching segments using the proportional algorithm\par
2.5\tab Calculating the expected scores for the proportional algorithm\par
2.6\tab Calculating the observed scores for the proportional algorithm\par
2.7\tab Producing an optimal alignment\par
2.8\tab Comparing a sequence against a library of sequences\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
4.\tab References\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we describe methods for comparing and aligning pairs of nucleic acid or protein
sequences. The program described (SIP), the original version of which was first described in 1982 (1), is based around several methods for producing "dot matrix" plots and includes routines for assessing the statistical significance of the plots, plus a d
ynamic programming algorithm for finding optimal alignments. At the end of the chapter we describe a program SIPL that is used for comparing a single sequence against a whole library of sequences.\par
\pard \s4\qj\sa120\sl280 We assume the reader is familiar with the general principle of dot matrix diagrams. The program uses a number of different algorithms to calculate the score for each point in a dot matrix and the user defines a minimum score so that only those points in the diagram for which the score is at least this value will be marked with a dot. The first scoring method finds uninterrupted sections of perfect identity, i.e. those that contain no mismatches, insertions or deletions. Generally this method, termed "the identities algorithm", is of limited value, but it runs very quickly.\par
\pard \s4\qj\sa120\sl280
The second method looks for sections where a proportion of the characters in the sequence are similar, again allowing no insertions or deletions. For a thorough analysis this method, termed "the proportional algorithm", is the best. The original method of this type was first described by McLachlan (2) and involves calculating a score for each position in the matrix by summing points found when looking forwards and backwards along a diagonal line of a given length (the window). The algorithm does not simply look for identity but uses a score matrix that contains scores for every possible pair of characters. For comparing amino acid sequences we usually use the score matrix MDM78 (3), which is shown in figure 14.1. It is also possible to use other matrices, including an identity matrix for proteins. For nucleic acids we usually use an identity matrix.\par
\pard\plain \li220\ri280\sl220\box\brsp100\brdrth \f4\fs16 C S T P A G N D E Q B Z H R K M I L V F Y W - X ? \par
\pard \li220\ri280\sl220\box\brsp100\brdrth C 22 10 8 7 8 7 6 5 5 5 5 5 7 6 5 5 8 4 8 6 10 2 10 10 10 10\par
S 10 12 11 11 11 11 11 10 10 9 10 10 9 10 10 8 9 7 9 7 7 8 10 10 10 10\par
T 8 11 13 10 11 10 10 10 10 9 10 10 9 9 10 9 10 8 10 7 7 5 10 10 10 10\par
P 7 11 10 16 11 9 9 9 9 10 9 10 10 10 9 8 8 7 9 5 5 4 10 10 10 10\par
A 8 11 11 11 12 11 10 10 10 10 10 10 9 8 9 9 9 8 10 6 7 4 10 10 10 10\par
G 7 11 10 9 11 15 10 11 10 9 10 10 8 7 8 7 7 6 9 5 5 3 10 10 10 10\par
N 6 11 10 9 10 10 12 12 11 11 12 11 12 10 11 8 8 7 8 6 8 6 10 10 10 10\par
D 5 10 10 9 10 11 12 14 13 12 13 12 11 9 10 7 8 6 8 4 6 3 10 10 10 10\par
E 5 10 10 9 10 10 11 13 14 12 12 13 11 9 10 8 8 7 8 5 6 3 10 10 10 10\par
Q 5 9 9 10 10 9 11 12 12 14 11 13 13 11 11 9 8 8 8 5 6 5 10 10 10 10\par
B 5 10 10 9 10 10 12 13 12 11 13 11 11 10 10 8 8 6 8 5 7 4 10 10 10 10\par
Z 5 10 10 10 10 10 11 12 13 13 11 14 12 10 10 8 8 8 8 5 6 4 10 10 10 10\par
H 7 9 9 10 9 8 12 11 11 13 11 12 16 12 10 8 8 8 8 8 10 7 10 10 10 10\par
R 6 10 9 10 8 7 10 9 9 11 10 10 12 16 13 10 8 7 8 6 6 12 10 10 10 10\par
K 5 10 10 9 9 8 11 10 10 11 10 10 10 13 15 10 8 7 8 5 6 7 10 10 10 10\par
M 5 8 9 8 9 7 8 7 8 9 8 8 8 10 10 16 12 14 12 10 8 6 10 10 10 10\par
I 8 9 10 8 9 7 8 8 8 8 8 8 8 8 8 12 15 12 14 11 9 5 10 10 10 10\par
L 4 7 8 7 8 6 7 6 7 8 6 8 8 7 7 14 12 16 12 12 9 8 10 10 10 10\par
V 8 9 10 9 10 9 8 8 8 8 8 8 8 8 8 12 14 12 14 9 8 4 10 10 10 10\par
F 6 7 7 5 6 5 6 4 5 5 5 5 8 6 5 10 11 12 9 19 17 10 10 10 10 10\par
Y 10 7 7 5 7 5 8 6 6 6 7 6 10 6 6 8 9 9 8 17 20 10 10 10 10 10\par
W 2 8 5 4 4 3 6 3 3 5 4 4 7 12 7 6 5 8 4 10 10 27 10 10 10 10\par
- 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
X 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
? 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 14.1\tab The amino acid score matrix MDM78.\par
\pard\plain \s4\qj\sa120\sl280 \f20
For the proportional method plotting dots at the centres of windows that reach the cutoff leads to a persistence effect that, to some extent, can be mitigated by a variation on the method. If, for example, all the high scoring amino acids are clustered at
the left end of a particular diagonal segment, dots will continue to be plotted to their right until the window score drops below the cutoff. Instead of plotting a single point for each window that reaches the cutoff score, the variant method plots p
oints for all the identities that lie in windows that reach the cutoff. Obviously the persistence effect can be more pronounced for long windows and low cutoff scores, but note that the variant method will plot nothing if there are no identities present, a
nd so similar regions could be missed! A further variant, useful for comparing a sequence against itself, ignores the main diagonal.\par
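\pard\plain \s4\qj\sa120\sl280 \f20 The basic proportional algorithm can be sketched in Python as follows. This is a deliberately naive illustration, not the code used by SIP: the function names are hypothetical, positions are counted from 0, and an identity score stands in for a matrix such as MDM78, which a caller could supply through the score argument.\par
```python
def proportional_scores(seq_a, seq_b, window, score):
    """Yield (i, j, total) for every window centre on every diagonal.

    score(x, y) is a caller-supplied pair score. The window length must
    be odd so that each window has a well-defined centre.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    half = window // 2
    for i in range(half, len(seq_a) - half):
        for j in range(half, len(seq_b) - half):
            total = 0
            # Sum the pair scores along the diagonal through (i, j).
            for k in range(-half, half + 1):
                total += score(seq_a[i + k], seq_b[j + k])
            yield i, j, total


def identity(x, y):
    """Identity score matrix: 1 for a match, 0 otherwise."""
    return 1 if x == y else 0


def dots(seq_a, seq_b, window, cutoff, score=identity):
    """Return the window centres whose summed score reaches the cutoff."""
    return [(i, j)
            for i, j, total in proportional_scores(seq_a, seq_b, window, score)
            if total >= cutoff]
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Recomputing every window from scratch keeps the sketch short; a practical implementation would update the sum incrementally as the window slides along each diagonal.\par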
\pard \s4\qj\sa120\sl280 The third comparison method, called "quick scan", is really a combination of the first two, and is similar to the FASTP program of Lipman and Pearson (4), but produces a dot matrix diagram. The algorithm is as follows. The dot matrix positions are found for all words of some minimum length (obviously length 1 is most sensitive) that are common to both sequences. Imagine a diagonal line running from corner to corner of the diagram, at right angles to the diagonals in the dot matrix. The scores for the common words (according to the current score matrix, e.g. MDM78) are accumulated at the appropriate positions on that imaginary line, hence producing a histogram. The histogram is analysed to find its mean and standard deviation. The diagonals that lie above some cutoff score (defined in standard deviation units) are rescanned using the proportional algorithm, and a diagram is produced. The method is very fast, and is also employed by the library comparison program (see below).\par
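\pard\plain \s4\qj\sa120\sl280 \f20 The first stage of the quick scan method, building the diagonal histogram and choosing the diagonals to rescan, might be sketched in Python like this. The function names are hypothetical, the word search is a plain double loop rather than the fast lookup a real program would use, and the population standard deviation is an assumption; the text does not say which estimator is used.\par
```python
from statistics import mean, pstdev


def diagonal_histogram(seq_a, seq_b, word_length, score):
    """Accumulate the scores of all common words onto their diagonals.

    A diagonal is identified by i - j, shifted by len(seq_b) - 1 so that
    the histogram index is never negative.
    """
    histogram = [0] * (len(seq_a) + len(seq_b) - 1)
    for i in range(len(seq_a) - word_length + 1):
        word = seq_a[i:i + word_length]
        for j in range(len(seq_b) - word_length + 1):
            if seq_b[j:j + word_length] == word:
                diagonal = i - j + len(seq_b) - 1
                # Each matching word contributes the score of its identities.
                histogram[diagonal] += sum(score(c, c) for c in word)
    return histogram


def best_diagonals(histogram, n_sd):
    """Return indices of diagonals scoring more than n_sd s.d. above the mean."""
    threshold = mean(histogram) + n_sd * pstdev(histogram)
    return [d for d, total in enumerate(histogram) if total > threshold]
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Only the diagonals returned by best_diagonals would then be rescanned with the proportional algorithm. Note that for word lengths above 1 this sketch counts overlapping words separately.\par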
\pard \s4\qj\sa120\sl280 \par
\pard \s4\qj\sa120\sl280 The dynamic programming alignment algorithm contained in the program is based on that of Myers and Miller (5). It guarantees to produce alignments with the opt
imum score given a score matrix, a gap start penalty, and a gap extension penalty. It is very useful to have the dot matrix methods and the alignment routine together in the same program because it allows users to produce a dot matrix diagram to help selec
t which regions of the sequence they wish to align. Selection is made by use of the crosshair. The crosshair is positioned first at the bottom left hand end of the segment to be aligned and then at the top right of the segment. When the alignment routine i
s selected the segment will be aligned. The alignment can replace the original segment of the sequence. By repeated plotting of dot matrices, followed by alignment, very long sequences can easily be aligned. \par
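\pard\plain \s4\qj\sa120\sl280 \f20 To make the roles of the score matrix, the gap start penalty and the gap extension penalty concrete, here is a small Python sketch that computes the optimal global alignment score with affine gap penalties. It is a standard dynamic programming recurrence that keeps one row at a time and returns the score only; it is not the Myers and Miller algorithm itself, which also recovers the alignment in linear space. The function name is hypothetical, and the convention that a gap of length k costs gap_start + k * gap_extend is an assumption.\par
```python
def affine_alignment_score(seq_a, seq_b, score, gap_start, gap_extend):
    """Best global alignment score; a gap of length k costs
    gap_start + k * gap_extend (an assumed convention)."""
    neg = float("-inf")
    n, m = len(seq_a), len(seq_b)
    # best[j]: best score for seq_a[:i] against seq_b[:j]
    # down[j]: the same, but ending with seq_a[i-1] opposite a gap
    best = [0] + [-(gap_start + j * gap_extend) for j in range(1, m + 1)]
    down = [neg] * (m + 1)
    for i in range(1, n + 1):
        diag = best[0]                      # value at (i-1, j-1)
        best[0] = -(gap_start + i * gap_extend)
        right = neg                         # ending with seq_b[j-1] opposite a gap
        for j in range(1, m + 1):
            # Extend an existing gap or open a new one, whichever is better.
            down[j] = max(down[j] - gap_extend,
                          best[j] - gap_start - gap_extend)
            right = max(right - gap_extend,
                        best[j - 1] - gap_start - gap_extend)
            match = diag + score(seq_a[i - 1], seq_b[j - 1])
            diag = best[j]                  # save (i-1, j) before overwriting
            best[j] = max(match, down[j], right)
    return best[m]
```
\par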
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Producing a dot matrix plot (or list) of exact matches\par
\pard\plain \s4\qj\sa120\sl280 \f20 This method is relatively fast and can be useful for very similar sequences. It marks the position of every exact match of some minimum length with a dot or lists out the matching segments.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Apply identities algorithm".\par
2.\tab Define "Identity score". \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
Select "Plot or List". The plot will appear as in figure 14.2, which shows a comparison of two protein sequences using a score of 2. Listed output displays the matching segments and defines their positions. \par
\pard\plain \li1700\sb300\sl220\keepn \f4\fs16 {{\pict\macpict\picw283\pich299
112800000000012b011b001102ff0c00fffe0000003cb4bc003cb4bc0000000000fc00ef000000000001000a0000000000fc00ef0098801e0000000000fc00ef0000000000000000003cb4bc003cb4bc00000001000100010000000000000000000000000048c23f000000010000ffffffffffff0001000000000000000000
0000fc00ef0000000000fc00ef000002e30006003fe5ff00f80f0020f6000020f8000020fc000104080d0020fa000302000004f0000048060020e50000081b0020fe00042000000802fb0002100002fe000040fd00031000000817012008fc000004fd000020fe00011001f9000301000008100020fb000001f600018080fa
00011008150020fe00014002f5000308000004fd000304000008060020e5000008110020f2000312010008fe000020fc0000080e0020f3000020fe000040f8000008060020e50000080d0320000020f3000001f7000008130020fb000040fa000080f90005200000100008110320000080f700042800000440f70000080b04
2002000001ea00010808110020f7000080f90002012048fc000102080f0020fc000001f60000a0f8000102080a0020fb000020ec0000481c05200010000001fe000010fd000008fc0002010004fe0003800000080c012001f800010202f10000080b0020ea000002fe00010408140320000020f6000302000004fb000008fe
00000816022010c0f9000020fc00040200100008fc0002200008130020fc000002fe000080fa000010f800010808150320000010fe000002f9000048fd000020fa000008160020fd0005100000040080fa000008fa0003040080080c0020f20002040080f7000008140020fe00010104f600010440fb000001fe0000081200
20f700041000808010f8000010fe0000080a0020ef000008f80000080a0020fa000010ed000008110020f3000040fb000710100000080000080a0020ee000010f90000080c012080fa00010802ef000008110020f6000040f8000780008000080000080e0020f5000002fe000010f60000080e0020f5000002fe000030f600
0008180620200000100020fd00041081808010f8000010fe000008110020f6000020f800072000000400000408060020e50000080c0020f60002200004f300000814012410f60002200040fb000010fe000308000008100022f9000008f40006400400100010080a0020fd000080ea0000080c0020fd00010888ec00010108
110320800004fe00040400001810f0000008130020f900010404fe000001fe000001f7000008060020e5000008100022f9000008f4000640060010000408080020e700020200080e0320800008fc00010802ef0000080a0020f5000008f2000008150020fd000308000040f6000080fd000408000001280d02200020fd0001
0108ed000008150020fd000001fe000010fd000008f6000380000008160020f8000340000042fe00011002fe000002fb0000080a0020f3000080f40000080a0020f6000010f1000008080020e70002020008190020fe00014002fe0002100008f900044004000010fd000008140620200000100020fc000081f5000004fe00
00080c0020fb000080ee0002040008100020fb000080f200010240fe00010108150020fa00018001fc0002040080fe000008fa0000080a0030f7000002f0000008100020fe000080fe0002010001ef000088100020fb000080f20006024000000401080c0020fa000021ef0002200008160020fe00042000000802fb000010
fc0000c0fa0000080d0020f7000320000001f3000008150020fc000088fd000080f90002012008fc000102081a0320000020fa00070400400002000004fd0002080008fe0000080d0320800004fe000004ec000008130020fa00018001fc000004fd000020f9000008140020f30002400080fe00010810fe00030800000814
0620200000100020fc000081f5000004fe0000080e0020f5000080f8000001fc000028120020f8000008fe000004f9000002fc000008100020fb000080f200010240fe000101080d012008ec0002080002fe000008060020e5000008100020f5000080fa0002800001fc000028060020e50000080a0020ee000080f9000008
0a0020f0000004f7000008120020fe000002f70000c0fa000006fc00000811042000020002f1000080fd000320000008160020fd000004fc000020fc0002020004fa0002200008060020e5000008180020fe0004200000080afb000030fe0002400040fa00000816042000800040fe000040f5000010fe000020fe0000080a
0020ee000080f90000080f0020f00002080080fc000304000008140020fd00051000000400c0fa000018f800018008120030f7000002fb000080fe000080fb000008060020e50000080f05208010000008f3000080f9000008160020fe000080fe000080f9000020fa000401000200080e0020f5000008f6000004fe000008
11072000200008000001fb000040f30000080d0320000080f9000020f10000081c042000040001fc000320000804fe000004fd000702400001404000080a0020fc000010eb0000080e0020fa000010f7000004f80000080d0020fa000004f0000380004008160020fe00046002000802fb000010fc000044fa000008120020
fb000010fc000010fc000010f80000080a0020f5000001f2000008150020fd000411c8000020fe000302000020f5000008150020fc000080f90002010040fb00050400008000081e0320000002fe00071000002000081082fe00040210000002fd000280000812052010c0000004f5000340100008fa0000080b012802eb00
0004fd0000081b042000040001fc000320000804fc000a40020002400000404000080a0020fc000010eb000008090320000008e8000008120020f700042000000108fa000080fc000008060020e5000008100020fa000021f7000008fa0002200008180020fc000080f90005010040000004fe0005040400800008110020f2
000304000020fc00040420000008140020fe000080fc000002fe000010f50002100008140920020000011140000020fb000020f600010808140020fe000080fc000002fe000010f500021010080a0020f4000010f3000008060020e5000008120020fe000001f5000004fa000001fe0000081002200004fe000080f6000004
f7000008140020fd0005100000040080fa000008f8000180081b0320000002fb00042000081082fe000006fe000002fd00028000081c0020fc000040fd000610008080100010fe000004fe000010fe0000080b0020f200018080f6000008090320000010e8000008150320000010fe000002f9000048fd000020fa0000080a
0020f2000040f500000807012004e6000008140020f90002100008f9000040fe000010fd000008060020e5000008140320000020f6000302000004fb00000cfe0000080e022010c0f10002100008fa0000080e0020fd000040ef000008fd0000081402200020fe00040201000010f9000001f800000810012020f800010180
fb000002f8000008140020fe000080fe0002010001f8000080f9000088060020e5000008090320000080e80000080a0020f2000002f5000008070020e6000120080b042002001001ea00010808120020f9000008fb000020fb000081fc000008190020fe000080fe000080f9000020fd000304000001fe0000081103200000
80f7000028fe000040f700000811072000200040020001f4000004fa000008100020ef000640020000080002fe0000081a042000800040fe000040fa000004fd000010fe000020fe0000080d0020fa000004f0000380004008120020f8000001fc000001fc000010fb000008130020fb000001fd000010fb000080f9000110
08110320000004fe00040400001810f00000080d0320800004fe000004ec0000080e02200010f00002010004fb000008140022f900000cfd000002f9000640040010000008060020e5000008060020e5000008160020fc000020fe000040fb00040810000040fa000008130020fc000002fe000080fa000010f80001080811
072000200040020001f4000004fa000008160020fe0002200008fc000008fb0002400202fa0000080c0020ed00010208fc000140081402200004fe000080f4000004fd000004fe0000080f0020fe00014002f2000004fa000008180020fd00014002fe000080fa000010fc000008fe00010808130020fc000001f60004a000
000208fc000142080c0020eb0002080002fe000008060020e50000080c0020f4000040f50002040008180020fe0002020002fe000010fd000040fa000004fc0000081002200020fd000001f6000001f8000008120021fd000040fc000080f5000008fd000008190020fd000610000004008080fb00040800012008fc000182
081605200000010004fb000020fd000340008018f9000008150020fc000080f90002010040fb0005040000800008100022f9000008f40006500400100200080a0030f7000001f0000008110020f3000040fb000010fe000308000008130022f9000308000020f7000640040010000008060020e50000080f0020fc00010402
f000044000080008160020f5000010fe000020fe00080100200002010000080a0020f1000020f6000008110320000804fe00040400001810f0000008060020e50000081a0020fe000080fe000080fe000020fd000020fa000001fe000008130320000080f90002200028fe000040f70000080f0028f90002220008f3000304
021008110320000080f7000028fe000040f7000008140020fd00018001f60000a0fe000040fc000102080c0020fa000021ef0002200008110020fd000302000010fd000001f2000008140620000080000088fa000028fe000040f70000080a0020e9000004fe00000812012020f7000080fb000302000040fb0000080b0520
8010000008ea000008120020fd000411c8000020fb000020f5000008120020fc000010fb000080f8000001fc000028140020fc0005020002008004fb000010f800010848160020fe00044002000010fd000001fa000004fa0000080e0020fd000004f4000004f8000008120020fd0002040001f700014080f9000110081100
20fc000304000040f9000001f70000081605200000010004fb000030fd000340000010f9000008120020fe000002f7000040fa000004fc000008140020fd000040f7000304002020fb0003200000081302200004fe00018080f200010240fe00010108150020fd000001fe000010fd000008f60003800000080a0020f20000
80f50000081102200044fe00018220f5000040f90000080c02200040ed000004fc0000080d0020f0000008fa0003040000080e0020fe000008f8000040f3000008160320810004fe000004fe00010202f8000004fb0000080a0020f6000040f100000813012001f800018002fa0002012008fc000102080c0020ed00010208
fc000140080e0020fc000088f0000040fd0000080a0020e9000008fe0000080e0020fa000008f1000008fe0000081605200000010004fb000020fd000340400010f90000080a0020e9000004fe000008160020fe000080fe000080f90000a0fa000001fe000008090320040010e8000008060020e50000080a0020fa000008
ed0000081605200000010004fb000020fd000340400010f90000081c0320000002fb00042800081086fe000002fe00010202fe00028000080a0020f3000080f4000008120020fc000040f7000010fe000004fa0000080e0020fd000004f4000004f80000081b0020fe000620000008020008fe00010410fc0002400002fc00
00080a0020f2000008f5000008060020e5000008140020fb000008f9000304020020fb0003300000081408202000001000200021fe000081f30002200008190020fe00042000080802fe000308000010fc000042fa0000080f0020f90002220008f30003040000081002200020fd000001f6000001f80000080f012001fb00
0008fe000002f1000008190020fc000002fe000080fa000010fe000080fe0003080008081b0020fd0008100000040080000080fd000008fd000001fd0001802806003fe5ff00f80000ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb80\sa400\sl240\tx1140 \f21\fs20 Figure 14.2\tab A dot-matrix for two related protein sequences using the "Identities algorithm" and a score of 2. Notice that the similarity is not apparent. \par
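\pard\plain \s4\qj\sa120\sl280 \f20 The listed output of the identities algorithm can be imitated with the following Python sketch. It assumes that the "Identity score" is the minimum length of an uninterrupted run of identical symbols, reports positions counted from 0, and uses a hypothetical function name; the program's own search is certainly faster than this double loop.\par
```python
def identity_segments(seq_a, seq_b, min_length):
    """Return (start_a, start_b, length) for maximal runs of identical symbols."""
    segments = []
    for i in range(len(seq_a)):
        for j in range(len(seq_b)):
            if seq_a[i] != seq_b[j]:
                continue
            # Skip positions that merely continue a run begun earlier
            # on the same diagonal.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            length = 0
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            if length >= min_length:
                segments.append((i, j, length))
    return segments
```
\par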
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Producing a dot matrix plot using the proportional algorithm\par
\pard\plain \s4\qj\sa120\sl280 \f20 This method gives the most thorough analysis.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Apply proportional algorithm".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Odd window length". The size of window over which the scores for each point are summed.\par
3.\tab Define "Proportional score". All points achieving at least this score will be marked with a dot in the diagram.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 14.3.\par
\pard\plain \qj\li1700\sb300\sl480\keepn \f4\fs16 {{\pict\macpict\picw283\pich301
08a200000000012d011b001102ff0c00fffe0000003c32b0003c32b00000000000fc00ed000000000001000a0000000000fc00ed0098801e0000000000fc00ed0000000000000000003c32b0003c32b000000001000100010000000000000000000000000048ae57000000010000ffffffffffff0001000000000000000000
0000fc00ed0000000000fc00ed000002e30006007fe5ff00f0060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100b0040f200010180f60000100a0040f2000003f50000100a0040f2000006f50000100a0040f2000004f50000100d0340000020f5000008f5000010090340000020e800
0010060040e5000010090340000080e80000100802400001e70000100802400003e7000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
10060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f2000060f50000100a0040f2000040f5000010060040e50000100a0040ea000040fd0000100c0040ec0002040080fd000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040
e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040eb000080fc0000100a0040ec000001fb0000100a0040ec000002fb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
10060040e50000100a0040f9000010ee0000100a0040f9000030ee0000100a0040f9000060ee0000100a0040f90000c0ee0000100e0040f9000080fc000020f40000100a0040eb000040fc000010060040e5000010060040e5000010060040e5000010060040e50000100a0040ee000004f9000010060040e5000010060040
e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f0000002f7000010060040e5000010
060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010070040e600018010060040e5000010060040e50000100b0040fd000101
80eb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f4000002f30000100a0040f4000006f30000100a0040f400000cf30000100a0040f4000008f30000100a0040f4000010f30000100a0040f4
000030f30000100a0040f4000060f30000100a0040f4000040f3000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010090040e8000301000010060040e5
000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f7000004f00000100a0040f7000004f0000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5
000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f9000008ee000010060040e5000010060040e50000100a0040f9000020ee0000100a0040f9
000040ee0000100a0040f9000080ee0000100a0040fa000001ed0000100a0040fa000002ed0000100a0040fa000004ed0000100a0040fa000004ed0000100a0040fa000008ed000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500
0010060040e5000010060040e5000010060040e5000010060040e50000100a0040fb000040ec0000100a0040fb000080ec0000100a0040fc000001eb0000100a0040fc000002eb0000100a0040fc000006eb0000100a0040fc000004eb0000100a0040fc000008eb0000100a0040fc000010eb0000100a0040fc000020eb00
00100a0040fc000060eb0000100a0040fc000080eb0000100b0040fd00010180eb0000100a0040fd000001ea0000100a0040fd000002ea0000100a0040fd000004ea0000100a0040fd000008ea0000100a0040fd000008ea0000100a0040fd000010ea0000100a0040fd000020ea0000100a0040fd000040ea000010060040
e50000100a0040fe000001e9000010060040e50000100a0040fe000002e90000100a0040fe000004e90000100a0040fe000008e9000010060040e50000100e0040fe000010f0000040fb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500001006
0040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100d0040fd000303000020ed000010060040e5000010060040e5000010060040e50000100a0040fc00000ceb0000100a0040fc00
0008eb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500001006007fe5ff00f00000ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 14.3\tab
A dot-matrix for the two related protein sequences shown in figure 14.2, but here using the "Proportional algorithm" with a window of 21 and a score of 240. Notice that the similarity is now apparent. \par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Producing a dot matrix plot using the quick scan algorithm\par
\pard\plain \s4\qj\sa120\sl280 \f20
This method is very fast. Using the current score matrix it accumulates the scores for all the exact matches that lie on each diagonal. The mean diagonal score and its standard deviation are calculated, and those diagonals that have scores more than a chosen number of standard deviations above the mean are rescanned using the proportional algorithm. The points that reach the proportional algorithm's cutoff are plotted.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Apply quick scan algorithm".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Identity score". The minimum number of consecutive identical sequence symbols that count as a match.\par
3.\tab Define "Odd window length". The size of window over which the scores for each point are summed when the proportional algorithm is applied to the best diagonals.\par
4.\tab Define "Proportional score". For the best diagonals all points achieving at least this score will be marked with a dot in the diagram.\par
5.\tab Define "Number of s.d. above mean". Diagonals whose scores exceed the mean by more than this number of standard deviations are rescanned using the proportional algorithm.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 14.4.\par
\pard\plain \qj\li1720\sb300\sl480\keepn \f4\fs16 {{\pict\macpict\picw283\pich301
07fa00000000012d011b001102ff0c00fffe0000003c32b0003c32b00000000000fc00ed000000000001000a0000000000fc00ed0098801e0000000000fc00ed0000000000000000003c32b0003c32b0000000010001000100000000000000000000000000491cbd000000010000ffffffffffff0001000000000000000000
0000fc00ed0000000000fc00ed000002e30006007fe5ff00f0060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5
000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500
0010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
10060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010
060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500001006
0040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100600
40e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040
e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5
000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500
0010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f9000008ee000010060040e5000010060040e50000100a0040f9000020ee0000100a0040f9000040ee0000100a0040f9000080ee0000100a0040fa
000001ed0000100a0040fa000002ed0000100a0040fa000004ed0000100a0040fa000004ed0000100a0040fa000008ed000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
10060040e50000100a0040fb000040ec0000100a0040fb000080ec0000100a0040fc000001eb0000100a0040fc000002eb0000100a0040fc000006eb0000100a0040fc000004eb0000100a0040fc000008eb0000100a0040fc000010eb0000100a0040fc000020eb0000100a0040fc000060eb0000100a0040fc000080eb00
00100b0040fd00010180eb0000100a0040fd000001ea0000100a0040fd000002ea0000100a0040fd000004ea0000100a0040fd000008ea0000100a0040fd000008ea0000100a0040fd000010ea0000100a0040fd000020ea0000100a0040fd000040ea000010060040e50000100a0040fe000001e9000010060040e5000010
0a0040fe000002e90000100a0040fe000004e90000100a0040fe000008e9000010060040e50000100a0040fe000010e9000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
10060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010
06007fe5ff00f00000ff}}\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 14.4\tab
A dot-matrix for the two related protein sequences shown in figures 14.2 and 14.3, but here using the "Quick scan algorithm" with an identity score of 1 and a window of 21 and a score of 240 for the proportional algorithm. Notice that the simil
arity is now apparent but the absence of background "noise" is misleading.\par
\pard\plain \s6\fi-540\li560\sb240\sa60\sl280\tx860 \b\f20 2.4\tab Producing a list of all matching segments using the proportional algorithm\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "List matching segments".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Odd window length". The size of window over which the scores for each point are summed.\par
3.\tab Define "Proportional score". All segments achieving at least this score will be listed out with the two sequences written one above the other. See figure 14.5.\par
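\pard\plain \s4\qj\sa120\sl280 \f20 The listing step can be sketched as follows. This is a minimal illustration, not the program's code. The function name is hypothetical, the default identity score (1 for identical characters, 0 otherwise) is an assumption standing in for the program's score matrix, and the positions reported are the 1-based window starts, which is a choice made for the sketch.\par
\pard\plain \li480\ri540\sl220 \f4\fs16
```python
def list_matching_segments(seq_v, seq_h, window, cutoff, score=None):
    """Return (pos_v, pos_h, total, seg_v, seg_h) for each window scoring >= cutoff.

    Positions are 1-based starts of the window in each sequence.
    The default identity score is an assumption, not the program's matrix.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    if score is None:
        score = lambda a, b: 1 if a == b else 0
    hits = []
    for i in range(len(seq_v) - window + 1):
        for j in range(len(seq_h) - window + 1):
            # Sum the pair scores along the diagonal over the whole window.
            total = sum(score(seq_v[i + k], seq_h[j + k]) for k in range(window))
            if total >= cutoff:
                hits.append((i + 1, j + 1, total,
                             seq_v[i:i + window], seq_h[j:j + window]))
    return hits
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Passing a different score function as the last argument lets the same loop be tried with any scoring scheme.\par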
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Calculating the expected scores for the proportional algorithm\par
\pard\plain \s4\qj\sa120\sl280 \f20 This function calculates the probability of achieving each possible score using the proportional algorithm. Hence it provides a method of setting
cutoff scores and assessing the statistical significance of the scores found. The algorithm calculates the "Double matching probability" described by McLachlan (2) which is defined as the probability of finding the scores in two infinitely long sequences
of the same composition as the pair being compared. It is very much faster than the alternative of repeatedly scrambling and recomparing the sequences. The program offers three ways for the user to see the results of the calculation\:
the user can type a \par
\pard\plain \li2320\ri2720\sl220\box\brsp100\brdrth \f4\fs16 List matching segments\par
\pard \li2320\ri2720\sl220\box\brsp100\brdrth ? Odd window length (1-401) (11) =\par
? Proportional score (1-567) (252) =\par
Working\par
62\par
GLRRGLDVKDLEHPIEVPVGK\par
DLAEGMKVKCTGRILEVPVGR\par
81\par
63\par
LRRGLDVKDLEHPIEVPVGKA\par
LAEGMKVKCTGRILEVPVGRG\par
82\par
65\par
RGLDVKDLEHPIEVPVGKATL\par
EGMKVKCTGRILEVPVGRGLL\par
84\par
66\par
GLDVKDLEHPIEVPVGKATLG\par
GMKVKCTGRILEVPVGRGLLG\par
85\par
67\par
LDVKDLEHPIEVPVGKATLGR\par
MKVKCTGRILEVPVGRGLLGR\par
\pard \li2320\ri2720\sl220\keepn\box\brsp100\brdrth 86\par
\pard\plain \s8\qj\fi-1140\li1140\sb60\sa400\sl240\tx1140 \f21\fs20 Figure 14.5\tab A typical run of "List matching segments".\par
\pard\plain \s4\qj\sa120\sl280 \f20 score and the program will display its probability; the user can type a probability and the program will display the corresponding score; or the program will list the full range of scores and probabilities.
\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate expected scores".\par
2.\tab Define "Odd window length".\par
\tab The calculation takes a noticeable time.\par
3.\tab Select "List scores and probabilities".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Number of steps between scores". This allows, say, every fifth score to be listed if the user defines the number of steps to be 5. The list will appear as in figure 14.6.\par
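\pard\plain \s4\qj\sa120\sl280 \f20 The idea behind the calculation can be sketched as below. This is an illustration under stated assumptions rather than the program's implementation. It takes the probability of each single-pair score from the compositions of the two sequences, convolves that distribution over the window, and sums the tail. The function names are hypothetical, the score function is supplied by the caller, and it is assumed that the listed probability is that of reaching at least the given score.\par
\pard\plain \li480\ri540\sl220 \f4\fs16
```python
from collections import Counter

def pair_score_distribution(seq_v, seq_h, score):
    """Probability of each single-pair score for residues drawn at random
    from the compositions of the two sequences."""
    freq_v = Counter(seq_v)
    freq_h = Counter(seq_h)
    dist = Counter()
    for a, count_a in freq_v.items():
        for b, count_b in freq_h.items():
            dist[score(a, b)] += (count_a / len(seq_v)) * (count_b / len(seq_h))
    return dist

def window_score_distribution(pair_dist, window):
    """Convolve the single-pair distribution with itself 'window' times."""
    total = Counter()
    total[0] = 1.0
    for _ in range(window):
        step = Counter()
        for s1, p1 in total.items():
            for s2, p2 in pair_dist.items():
                step[s1 + s2] += p1 * p2
        total = step
    return total

def probability_of_reaching(window_dist, cutoff):
    """Probability that a window scores at least 'cutoff'."""
    return sum(p for s, p in window_dist.items() if s >= cutoff)
```
\par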
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Calculating the observed scores for the proportional algorithm\par
\pard\plain \s4\qj\sa120\sl280 \f20
This function applies the proportional algorithm, but instead of producing a dot matrix it accumulates the scores and their frequencies of occurrence. It provides a method of setting cutoff scores and assessing the statistical significance of the scores found. The program offers three ways for the user to see the results of the calculation\: the user can type a score and the program will display its frequency; the user can type a frequency and the program will display the corresponding score; or the program will list the full range of scores and frequencies. The frequencies are expressed as percentages.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate observed scores".\par
2.\tab Define "Odd window length".\par
\tab The calculation takes a noticeable time.\par
\pard\plain \li1320\ri1300\sl220\box\brsp100\brdrth \f4\fs16 Calculate expected proportional scores\par
\pard \li1320\ri1300\sl220\box\brsp100\brdrth ? Odd window length (1-401) (21) =\par
Working\par
Average score= 196.99062\par
Select probability display mode\par
1 Show probability for a score\par
X 2 Show score for a probability\par
3 List scores and probabilities\par
? Selection (1-3) (2) =3\par
? Number of steps between scores (1-10) (5) =\par
\par
5 0.10000E+01 200 0.40004E+00 395 0.00000E+00\par
10 0.10000E+01 205 0.24037E+00 400 0.00000E+00\par
15 0.10000E+01 210 0.12555E+00 405 0.00000E+00\par
20 0.10000E+01 215 0.56905E-01 410 0.00000E+00\par
25 0.10000E+01 220 0.22402E-01 415 0.00000E+00\par
30 0.10000E+01 225 0.76821E-02 420 0.00000E+00\par
35 0.10000E+01 230 0.23031E-02 425 0.00000E+00\par
40 0.10000E+01 235 0.60614E-03 430 0.00000E+00\par
45 0.10000E+01 240 0.14064E-03 435 0.00000E+00\par
50 0.10000E+01 245 0.28888E-04 440 0.00000E+00\par
55 0.10000E+01 250 0.52741E-05 445 0.00000E+00\par
60 0.10000E+01 255 0.85917E-06 450 0.00000E+00\par
65 0.10000E+01 260 0.12534E-06 455 0.00000E+00\par
70 0.10000E+01 265 0.16433E-07 460 0.00000E+00\par
75 0.10000E+01 270 0.19425E-08 465 0.00000E+00\par
80 0.10000E+01 275 0.20772E-09 470 0.00000E+00\par
85 0.10000E+01 280 0.20155E-10 475 0.00000E+00\par
90 0.10000E+01 285 0.17801E-11 480 0.00000E+00\par
95 0.10000E+01 290 0.14353E-12 485 0.00000E+00\par
100 0.10000E+01 295 0.10599E-13 490 0.00000E+00\par
105 0.10000E+01 300 0.71886E-15 495 0.00000E+00\par
110 0.10000E+01 305 0.44920E-16 500 0.00000E+00\par
115 0.10000E+01 310 0.25938E-17 505 0.00000E+00\par
\pard \li1320\ri1300\sl220\keepn\box\brsp100\brdrth 120 0.10000E+01 315 0.13881E-18 510 0.00000E+00\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa500\sl240\tx1140 \f21\fs20 Figure 14.6\tab A typical run of "Calculate expected proportional scores." The scores are listed in three columns alongside their probabilities. e.g. score 250 has a probability 0.527x10
{\up6 -5}{\plain \b\f20 .}{\up6 \par
}\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 3.\tab Select "List scores and percentages".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Number of steps between scores". This allows, say, every fifth score to be listed if the user defines the number of steps to be 5. The list will appear as in figure 14.7.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \par
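\pard\plain \s4\qj\sa120\sl280 \f20 A minimal sketch of the accumulation follows. It assumes an identity score in place of the program's score matrix, uses a hypothetical function name, and ignores any special treatment of points near the ends of the sequences. Each row gives a score, the number of points reaching at least that score, and the same number as a percentage of all points.\par
\pard\plain \li480\ri540\sl220 \f4\fs16
```python
from collections import Counter

def observed_score_percentages(seq_v, seq_h, window, score=None):
    """Return rows of (score, points_reaching, percent_reaching)."""
    if score is None:
        score = lambda a, b: 1 if a == b else 0
    counts = Counter()
    for i in range(len(seq_v) - window + 1):
        for j in range(len(seq_h) - window + 1):
            total = sum(score(seq_v[i + k], seq_h[j + k]) for k in range(window))
            counts[total] += 1
    points = sum(counts.values())
    rows = []
    reaching = points
    # Walk the scores upwards, removing points that fall short of the next score.
    for s in sorted(counts):
        rows.append((s, reaching, 100.0 * reaching / points))
        reaching -= counts[s]
    return rows
```
\par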
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.7\tab Producing an optimal alignment\par
\pard\plain \s7\qj\sa120\sl280\tx0 \f20 This function produces an optimal alignment for any segments of the two sequences
using the algorithm of Myers and Miller (5). It guarantees to produce alignments with the optimum score, given a score matrix, a "gap start penalty" and a "gap extension penalty". That is starting a gap costs a fixed penalty F and each residue added to the
gap costs a further penalty E, so for \par
\pard\plain \li1980\ri2060\sb400\sl220\box\brsp100\brdrth \f4\fs16 Calculate observed proportional scores\par
\pard \li1980\ri2060\sl220\box\brsp100\brdrth ? Odd window length (1-401) (21) =\par
Working\par
Maximum observed score is 285\par
Select score display mode\par
X 1 Show percentage reaching a score\par
2 Show score for a percentage\par
3 List scores and percentages\par
? Selection (1-3) (1) =3\par
? Number of steps between scores (1-10) (5) =\par
156 236949 0.99998E+02\par
161 236938 0.99993E+02\par
166 236792 0.99932E+02\par
171 235882 0.99548E+02\par
176 232582 0.98155E+02\par
181 222875 0.94058E+02\par
186 203232 0.85769E+02\par
191 171507 0.72380E+02\par
196 131216 0.55376E+02\par
201 89194 0.37642E+02\par
206 52791 0.22279E+02\par
211 27315 0.11528E+02\par
216 12117 0.51137E+01\par
221 4890 0.20637E+01\par
226 1774 0.74867E+00\par
231 656 0.27685E+00\par
236 263 0.11099E+00\par
241 111 0.46845E-01\par
246 66 0.27854E-01\par
251 36 0.15193E-01\par
256 23 0.97065E-02\par
261 16 0.67524E-02\par
266 15 0.63303E-02\par
271 10 0.42202E-02\par
276 6 0.25321E-02\par
\pard \li1980\ri2060\sl220\box\brsp100\brdrth 281 2 0.84405E-03\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 14.7\tab
A typical run of "Calculate observed scores." The scores are followed by their observed number of occurrences expressed both absolutely and as a percentage of the total number of points.\par
\pard\plain \s4\qj\sa120\sl280 \f20
a gap of length K residues the penalty is F + KE. Gaps at the ends of sequences incur no penalty. The size of the segments of sequence that can be aligned at once is limited to 5000 characters. The user can select the start and end of the segments by use of the crosshair simply by clicking on any dot matrix plot. After the alignment has been produced the user can elect to have it replace the original sequence segments. By alternate use of dot matrix plotting and alignment, very long sequences can be aligned.
\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Align sequences". The crosshair will appear in the graphics window. \par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Position the crosshair on the bottom left of the segment to be aligned and hit the space bar on the keyboard. The bell will ring.\par
3.\tab Position the crosshair on the top right of the segment to be aligned and hit the space bar on the keyboard. The bell will ring.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Penalty for starting each gap".\par
5.\tab Define "Penalty for each residue in gap".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab A noticeable time will elapse before the alignment is displayed on the screen. A typical alignment is shown in figure 14.8.\par
6.\tab Reject "Keep alignment". If the alignment is "kept" the padded sequences from the alignment will replace the original sequences in the active region.\par
\pard\plain \li480\ri540\sl220\box\brdrth \f4\fs16 Align the sequences\par
\pard \li480\ri540\sl220\box\brdrth Aligning region 1 to 461\par
with region 1 to 514\tab \tab Working\par
V 1 11 21 31 41 51\par
MA--TGKIVQ VIGA------ VVDVEFPQDA VPRVYDALEV QNG------N ERLVL-----\par
* * * ** * * * * *\par
MQLNSTEISE LIKQRIAQFN VVSEAHNEGT IVSVSDGVIR IHGLADCMQG EMISLPGNRY\par
H 1 11 21 31 41 51\par
V 61 71 81 91 101 111\par
EVQQQLGGGI VRTIAMGSSD GLRRGLDVKD LEHPIEVPVG KATLGRIMNV LGEPVDMKGE\par
* * ** * * ** ***** *** * ** * * **\par
AIALNLERDS VGAVVMGPYA DLAEGMKVKC TGRILEVPVG RGLLGRVVNT LGAPIDGKGP\par
H 61 71 81 91 101 111\par
V 121 131 141 151 161 171\par
IGEEERWAIH RAAPSYEELS NSQELLETGI KVIDLMCPFA KGGKVGLFGG AGVGKTVNMM\par
* ** * ** * * * * * * ***\par
LDHDGFSAVE AIAPGVIERQ SVDQPVQTGY KAVDSMIPIG RGQRELIIGD RQTGKTALAI\par
H 121 131 141 151 161 171\par
V 181 191 201 211 221 231\par
ELIRNIAIEH SGYS-VFAGV GERTREGNDF YHEMTDSNVI DKVSLVYGQM NEPPGNRLRV\par
* * ** * * *\par
DAI--INQRD SGIKCIYVAI GQKASTISNV VRKLEEHGAL ANTIVVVATA SESAALQYLA\par
H 181 191 201 211 221 231\par
V 241 251 261 271 281 291\par
ALTGLTMAEK FRDEGRDVLL FVDNIYRYTL AGTEVSALLG RMPSAVGYQP TLAEEMGVLQ\par
* * *** * * * * * * ** * * *\par
RMPVALMGEY FRDRGEDALI IYDDLSKQAV AYRQISLLLR RPPGREAFPG DVFYLHSRLL\par
H 241 251 261 271 281 291\par
V 301 311 321 331 341 351\par
ERITST---- ---------- -KTGSITSVQ AVYVPADDLT DPSPATTFAH LDATVVLSRQ\par
** **** * * * * * *\par
ERAARVNAEY VEAFTKGEVK GKTGSLTALP IIETQAGDVS AFVPTNVISI TDGQIFLETN\par
H 301 311 321 331 341 351\par
V 361 371 381 391 401 411\par
IASLGIYPAV DPLDSTSRQL DPLVVGQEHY DTAR----GV QSILQRYQEL KDIIAILGMD\par
** *** * * ** * * * * * **\par
LFNAGIRPAV NPGISVSR-- ---VGGAAQT KIMKKLSGGI RTALAQYREL AAFSQFAS--\par
H 361 371 381 391 401 411\par
V 421 431 441 451 461 471\par
ELSEEDKLVV ARARKIQRFL SQ----PFFV AE----VFTG SPGKYVSLKD --TIRGFKGI\par
* * * * * * * * * * * *\par
DLDDATRKQL DHGQKVTELL KQKQYAPMSV AQQSLVLFAA ERG-YLADVE LSKIGSFEAA\par
H 421 431 441 451 461 471\par
V 481 491 501 511 521\par
MEG--EYDHL P-EQAFYMVG SIEEAVE--- --------KA KKL*\par
** * * * * *\par
LLAYVDRDHA PLMQEINQTG GYNDEIEGKL KGILDSFKAT QSW*\par
H 481 491 501 511 521\par
Conservation 22.5%\par
\pard \li480\ri540\sl220\keepn\box\brdrth Number of padding characters inserted 63 and 10\par
\pard\plain \s8\qj\fi-1140\li1140\sb60\sa300\sl240\tx1140 \f21\fs20 Figure 14.8\tab A typical output from "Align the sequences". The horizontal and vertical sequences are labelled H and V.\par
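\pard\plain \s4\qj\sa120\sl280 \f20 The gap penalty scheme can be illustrated by scoring an alignment that has already been padded. The sketch below is not the Myers and Miller algorithm and does not search for the optimum. It only shows how a penalty of F + KE is charged for each internal gap while gaps at the ends are free. The function names are hypothetical and the identity score is an assumption standing in for the score matrix.\par
\pard\plain \li480\ri540\sl220 \f4\fs16
```python
def gap_penalty(length, start_penalty, extend_penalty):
    """Penalty F + K*E for a gap of K residues, and 0 when there is no gap."""
    if length <= 0:
        return 0
    return start_penalty + length * extend_penalty

def internal_gap_lengths(padded):
    """Lengths of the runs of '-' that are not at either end of the sequence."""
    stripped = padded.strip("-")
    runs = []
    run = 0
    for ch in stripped:
        if ch == "-":
            run += 1
        elif run:
            runs.append(run)
            run = 0
    return runs

def alignment_score(padded_v, padded_h, start_penalty, extend_penalty, score=None):
    """Score two padded sequences of equal length; end gaps cost nothing."""
    if len(padded_v) != len(padded_h):
        raise ValueError("padded sequences must have the same length")
    if score is None:
        score = lambda a, b: 1 if a == b else 0
    total = sum(score(a, b) for a, b in zip(padded_v, padded_h)
                if a != "-" and b != "-")
    for padded in (padded_v, padded_h):
        for k in internal_gap_lengths(padded):
            total -= gap_penalty(k, start_penalty, extend_penalty)
    return total
```
\par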
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.8\tab Comparing a sequence against a library of sequences\par
\pard\plain \s4\qj\sa120\sl280 \f20
The program SIPL is used for comparing a probe sequence against a whole library of sequences. The searches are very fast and use the "Quick scan" algorithm described above to produce a list of matching sequences sorted in score order. Optionally, this
is followed by the production of optimal alignments using the Myers and Miller (5) algorithm. The program will search the whole of a library or restrict its search using a list of entry names. The list of
entry names can be used either as a list of sequences to search or conversely as a list of sequences to exclude from a search.\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select SIPL.\par
2.\tab Select "Personal file".\par
3.\tab Select "Format".\par
4.\tab Define "Name of sequence file". The name of the file containing the probe sequence.\par
5.\tab Define "Name of results file".\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Display alignments". The alternative will stop after producing a list of the best matching sequences.\par
7.\tab Define "Minimum library sequence length". This permits the search to skip sequences that are too short to be of interest.\par
8.\tab Define "Maximum number of scores to list". The maximum number of sequences that will be included in the results file.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab
Define "Identity score". This is the minimum number of consecutive sequence characters that will be counted as a match. Only matches of at least this length will be included in the overall score. For proteins maximum sensitivity is gained using a value
of 1, but for nucleic acids values of 4 or 6 are necessary to achieve reasonable speed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Number of sd above mean". This means the number of standard deviations above the mean that a diagonal must score in order for it to be scanned using the proportional algorithm.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 11.\tab Define "Odd window length". This is the window size for the rescanning of high scoring diagonals using the proportional algorithm.\par
12.\tab Define "Proportional score". The score used by the proportional algorithm. It depends on the window length and the score matrix.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 13.\tab Define "Minimum global score". This is the total score achieved using the proportional algorithm when all the diagonals scoring the defined number of standard deviations above the mean are rescanned.
\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 14.\tab Define "Penalty for starting a gap". This is for the alignment algorithm.\par
15.\tab Define "Penalty for each residue in gap". See above.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab Select a library to search. The default library will reflect the composition of the probe sequence. That is, a probe sequence that is less than 85% acgt will be guessed to be a protein.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 17.\tab Select "Search whole library". The alternatives allow the search to be restricted using a list of entry names.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The search will start. A large number of parameters are required but for normal use the default value can be taken for them all. A worked example is shown in figure 14.9.\par
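\pard\plain \s4\qj\sa120\sl280 \f20 The rule used to choose the default library in step 16 can be sketched as below. The function name is hypothetical, and the treatment of upper case and of non-alphabetic characters is an assumption made for the sketch. Only the 85% threshold comes from the description above.\par
\pard\plain \li480\ri540\sl220 \f4\fs16
```python
def guess_sequence_type(sequence, threshold=85.0):
    """Guess 'protein' when fewer than 'threshold' percent of characters are a, c, g or t."""
    residues = [c for c in sequence.lower() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    percent = 100.0 * sum(1 for c in residues if c in "acgt") / len(residues)
    if percent < threshold:
        return "protein"
    return "nucleic acid"
```
\par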
\pard\plain \li220\ri240\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 SIPL (Similarity investigation program (Library)) V3.0 June 1991\par
\pard \li220\ri240\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Author\: Rodger Staden\par
Compares a probe protein or nucleic acid\par
sequence against a library of sequences\par
\par
Select probe sequence\par
Select sequence source\par
X 1 Personal file \par
2 Sequence library\par
? Selection (1-2) (1) =2\par
Select a library\par
1 EMBL nucleotide library \par
X 2 SWISSPROT protein library\par
3 PIR protein library \par
? Selection (1-3) (2) =\par
Library is in EMBL format with indexes\par
Select a task\par
X 1 Get a sequence \par
2 Get annotations \par
3 Get entry names from accession numbers \par
4 Search titles for keywords \par
5 Search keyword index for keywords \par
? Selection (1-5) (1) =\par
? Entry name=bacr$halha\par
DE BACTERIORHODOPSIN PRECURSOR (BR) (GENE NAME\: BOP). \par
Sequence length= 262\par
Sequence composition\par
A C S T P A G N D E Q B Z H\par
N 0. 14. 19. 12. 30. 26. 3. 10. 11. 4. 0. 0. 0.\par
% 0.0 5.3 7.3 4.6 11.5 9.9 1.1 3.8 4.2 1.5 0.0 0.0 0.0\par
W 0. 1219. 1921. 1165. 2132. 1483. 342. 1151. 1420. 513. 0. 0. 0.\par
\par
A R K M I L V F Y W - X ? \par
N 7. 7. 10. 15. 39. 23. 13. 11. 8. 0. 0. 0. 0.\par
% 2.7 2.7 3.8 5.7 14.9 8.8 5.0 4.2 3.1 0.0 0.0 0.0 0.0\par
W 1093. 897. 1312. 1697. 4413. 2280. 1913. 1795. 1490. 0. 0. 0. 0.\par
Total molecular weight= 28256.254\par
? Results file=sipl.res\par
? Display alignments (y/n) (y) =\par
? Minimum library sequence length (10-20000) (209) =\par
? Maximum number of scores to list (1-10000) (20) =10\par
? Identity score (1-3) (1) =\par
? Number of sd above mean (0.00-10.00) (3.00) =\par
? Odd window length (1-31) (11) =\par
? Proportional score (1-297) (132) =\par
? Minimum global score (1-69168) (1729) =\par
? Penalty for starting a gap (1-100) (10) =\par
? Penalty for each residue in gap (1-100) (10) =\par
Select a library\par
1 EMBL nucleotide library \par
X 2 SWISSPROT protein library\par
3 PIR protein library \par
4 Personal file in PIR format \par
? Selection (1-4) (2) =\par
Library is in EMBL format with indexes\par
Select a task\par
X 1 Search whole library \par
2 Search only a list of entries \par
3 Search all but a list of entries \par
? Selection (1-3) (1) =3\par
? File of entry names=skip.nam\par
21794 entries processed, 25 above cutoff, sorting now\par
Entries exceeding sd cutoff= 4439\par
Mean number of diagonals above span cutoff 1.32012\par
List in score order\par
31007 BACA$HALSA DE ARCHAERHODOPSIN PRECURSOR (AR). \par
12177 BACH$NATPH DE HALORHODOPSIN PRECURSOR (HR) (GENE NAME\: HOP). \par
10999 BACH$HALSP DE HALORHODOPSIN PRECURSOR (HR) (GENE NAME\: HOP). \par
3999 HYAC$ECOLI DE HYPOTHETICAL 27.6 KD PROTEIN IN HYAB 3'REGION (GENE NAM\par
2670 OPS4$DROME DE OPSIN RH4 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NA\par
2573 PYR1$MESAU DE CAD PROTEIN (CONTAINS\: GLUTAMINE-DEPENDENT CARBAMOYL-PH\par
2328 PFLA$ECOLI DE PYRUVATE FORMATE-LYASE ACTIVATING ENZYME. \par
2194 DCOP$CANAL DE OROTIDINE 5'-PHOSPHATE DECARBOXYLASE (EC 4.1.1.23) (OMP\par
2145 BCM1$HUMAN DE LYMPHOCYTE ACTIVATION MARKER BLAST-1 PRECURSOR (BCM1 SU\par
2103 LAG3$HUMAN DE LAG-3 PROTEIN PRECURSOR (FDC PROTEIN) (GENE NAME\: LAG3 \par
BACA$HALSA DE ARCHAERHODOPSIN PRECURSOR (AR). \par
V 1 11 21 31 41 51\par
MLELLPTAVE GVSQAQITGR PEWIWLALGT ALMGLGTLYF LVKGMGVSDP DAKKFYAITT\par
* ** ** ** ** ** ** ** *** ** * * * ** \par
M-DPIALTAA VGADLLGDGR PETLWLGIGT LLMLIGTFYF IVKGWGVTDK EAREYYSITI\par
H 1 11 21 31 41 51\par
V 61 71 81 91 101 111\par
LVPAIAFTMY LSMLLGYGLT MVPFGGEQNP IYWARYADWL FTTPLLLLDL ALLVDADQGT\par
*** ** * *** * *** * * * ** ******* ********** *** * \par
LVPGIASAAY LSMFFGIGLT EVQVGSEMLD IYYARYADWL FTTPLLLLDL ALLAKVDRVS\par
H 61 71 81 91 101 111\par
V 121 131 141 151 161 171\par
ILALVGADGI MIGTGLVGAL TKVYSYRFVW WAISTAAMLY ILYVLFFGFT SKAESMRPEV\par
* *** * ** ******* * * * ** * ** * * ***\par
IGTLVGVDAL MIVTGLVGAL SHTPLARYTW WLFSTICMIV VLYFLATSLR AAAKERGPEV\par
H 121 131 141 151 161 171\par
V 181 191 201 211 221 231\par
ASTFKVLRNV TVVLWSAYPV VWLIGSEGAG IVPLNIETLL FMVLDVSAKV GFGLILLRSR\par
**** * *** *** * ** **** * * ***** ****** *** *** ******\par
ASTFNTLTAL VLVLWTAYPI LWIIGTEGAG VVGLGIETLL FMVLDVTAKV GFGFILLRSR\par
H 181 191 201 211 221 231\par
V 241 251 261\par
AIFGEAEAPE PSAGDGAAAT SD\par
** * **** **** * *\par
AILGDTEAPE PSAG-AEASA AD\par
H 241 251 261\par
Conservation 56.1%\par
\pard \li220\ri240\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Number of padding characters inserted 0 and 2\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 14.9\tab A run of SIPL using an entry from a sequence library and a file of entries to be excluded from the search.\par
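\pard\plain \s4\qj\sa120\sl280 \f20 The way the "Identity score" and "Number of sd above mean" parameters interact can be sketched as follows. This is an illustration under stated assumptions, not the SIPL code. Each diagonal is scored by counting the characters that lie in runs of at least the identity score, and the diagonals scoring at least the mean plus the chosen number of standard deviations are selected for rescanning. The function names are hypothetical, and the use of the population standard deviation is an assumption.\par
\pard\plain \li480\ri540\sl220 \f4\fs16
```python
from statistics import mean, pstdev

def diagonal_scores(probe, target, identity_score=1):
    """Map each diagonal (target index minus probe index) to its score."""
    scores = dict()
    for d in range(-(len(probe) - 1), len(target)):
        run = 0
        total = 0
        i = max(0, -d)
        j = i + d
        while i < len(probe) and j < len(target):
            if probe[i] == target[j]:
                run += 1
            else:
                # A run only counts if it is long enough.
                if run >= identity_score:
                    total += run
                run = 0
            i += 1
            j += 1
        if run >= identity_score:
            total += run
        scores[d] = total
    return scores

def diagonals_above_cutoff(scores, n_sd):
    """Diagonals scoring at least mean + n_sd standard deviations."""
    values = list(scores.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return sorted(d for d, s in scores.items() if s >= cutoff)
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Raising the identity score removes isolated single matches from the diagonal scores, which is why longer values make nucleic acid searches faster at some cost in sensitivity.\par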
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
The variants on the proportional algorithm are selected by setting parameters using a special menu. This includes the facility to switch off the main diagonal for all options, which is useful when comparing a sequence against itself.\par
2.\tab For nucleotide sequences the program also has a function to complement a sequence. If the sequence on one axis is the complement of that on the other, the plots will show possible base pairing.\par
3.\tab When the crosshair is being employed, in addition to the standard special keys, the letter m will produce a display showing all the identical sequence characters around the crosshair position. The display is in the form of a matrix.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
Users should not be misled by the "Quick scan" algorithm. Its function is to perform rapid comparisons. The plots it produces may look quite striking because they will contain almost no background; however, such plots tell nothing about the significance of the similarities displayed.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab By using the "Reposition plots" function users can display several dot matrix plots on the screen at the same time. In this way plots from several pairs of sequence comparisons can be viewed together.
\par
6.\tab The library search program SIPL is of limited use for searching the nucleic acid libraries because it does not deal properly with sequences longer than 20,000 characters, but simply truncates them.\par
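\pard\plain \s4\qj\sa120\sl280 \f20 The complementing described in note 2 can be sketched as below. The function name is hypothetical, and it is assumed that the sequence is complemented base by base without being reversed and that characters other than a, c, g and t are left unchanged.\par
\pard\plain \li480\ri540\sl220 \f4\fs16
```python
def complement(sequence):
    """Complement each base; characters other than a, c, g, t are left as they are."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return sequence.translate(table)
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 An identity match between one sequence and the complement of the other then marks a position where the two original bases could pair.\par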
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1. Staden, R. 1982. An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences. {\i Nucl. Acids Res.} {\b 10(9)}\:2951-2961.\par
2. McLachlan, A.D. 1971. Test for comparing related amino acid sequences. {\i J. Mol. Biol.} {\b 61}\:409-424.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3. Schwartz, R.M. and Dayhoff, M.O. 1978. Matrices for detecting distant relationships. (in) {\i Atlas of Protein Sequence and Structure,} {\b 5 suppl. 3}\:353-358, Nat. Biomed. Res. Found., Washington D.C.
\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4. Lipman, D.J. and Pearson, W.R. 1985. Rapid and sensitive protein similarity searches. {\i Science} {\b 227}\:1435-1441.\par
5. Myers, E.W. and Miller, W. 1988. Optimal alignments in linear space. {\i Comput. Applic. Biosci.} {\b 4}\:11-17.\par
}