{\rtf1\mac\deff2 {\fonttbl{\f0\fswiss Chicago;}{\f2\froman New York;}{\f3\fswiss Geneva;}{\f4\fmodern Monaco;}{\f5\fscript Venice;}{\f6\fdecor London;}{\f7\fdecor Athens;}{\f8\fdecor San Francisco;}{\f11\fnil Cairo;}{\f12\fnil Los Angeles;}
{\f13\fnil Zapf Dingbats;}{\f14\fnil Bookman;}{\f15\fnil N Helvetica Narrow;}{\f16\fnil Palatino;}{\f18\fnil Zapf Chancery;}{\f20\froman Times;}{\f21\fswiss Helvetica;}{\f22\fmodern Courier;}{\f23\ftech Symbol;}{\f24\fnil Mobile;}{\f33\fnil Avant Garde;}
{\f34\fnil New Century Schlbk;}}{\colortbl\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;}{\stylesheet{\s243\qc\sa60\sl280
\f20 \sbasedon222\snext0 footer;}{\s244\sl220\tqc\tx4320\tqr\tx8640 \f4\fs16 \sbasedon0\snext0 header;}{\sl220 \f4\fs16 \sbasedon222\snext0 Normal,Screen Font;}{\s2\qc\sa200\sl480 \b\f20\fs36 \sbasedon222\snext2 Chapter Heading;}{\s3\sb200\sa120\sl360
\b\f20\fs32 \sbasedon222\snext0 Main Subheading;}{\s4\qj\sa120\sl280 \f20 \sbasedon222\snext4 Body text;}{\s5\sb400\sa60\sl320\tx560 \b\f20\fs28 \sbasedon222\snext5 Subheading;}{\s6\sb240\sa60\sl280\tx560\tx860 \b\f20 \sbasedon5\snext6 SubSub heading;}{
\s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \sbasedon4\snext7 Indent Body;}{\s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 \sbasedon222\snext8 Figure legends;}{\s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 \sbasedon6\snext9 SubSubSub heading;}}
\paperw11880\paperh16820\margl1440\margr1440\widowctrl\ftnbj\ftnrestart \sectd \linemod0\linex0\cols1\endnhere \pard\plain \s2\qc\sa600\sl480 \b\f20\fs36 {\i\fs48 Contents\par
}\pard\plain \s7\qj\fi-560\li560\sa120\sl400\tx560\tqr\tldot\tx8980 \f20 1\tab Preface\tab 1\par
2\tab Introduction\tab 3\par
3\tab Sequence input, editing and sequence library use\tab 17\par
4\tab Managing sequencing projects\tab 26\par
5\tab Analysing sequences to find genes\tab 51\par
6\tab Searching for motifs in nucleic acid sequences\tab 60\par
7\tab Using patterns to analyse nucleic acid sequences\tab 69\par
8\tab Searching for restriction sites\tab 77\par
9\tab Statistical and structural analysis of nucleotide sequences\tab 83\par
10\tab Translating and listing nucleic acid sequences\tab 93\par
11\tab Statistical and structural analysis of protein sequences\tab 99\par
12\tab Searching for motifs in protein sequences\tab 104\par
13\tab Using patterns to analyse protein sequences\tab 112\par
\pard \s7\qj\fi-560\li560\sa120\sl400\tx560\tqr\tldot\tx8980 14\tab Comparing sequences\tab 123\par
\pard\plain \s2\qc\sa200\sl480\tqr\tldot\tx8980 \b\f20\fs36 \sect \sectd \pgnrestart\linemod0\linex0\cols1\endnhere {\footer \pard\plain \s243\qc\sa60\sl280 \f20 \chpgn \par
}\pard\plain \s2\qc\sa600\sl480 \b\f20\fs36 1. Preface (November, 1992)\par
\pard\plain \s4\qj\sa120\sl280 \f20 This second edition of the manual contains only minor revisions. The changes mostly concern the management of sequencing projects, which is the subject on which we are currently concentrating our efforts. We have replaced our previous Developing Assembly Program, DAP, with another developing assembly program, BAP, that can assemble Bigger projects. Although this new program can handle 8000 readings as opposed to the miserly 1000 of the previous version, it actually uses its space more efficiently over the course of a project. It contains a mechanism for preventing simultaneous use (and hence corruption) of databases. In addition it is approximately four times faster during assembly and five times faster when looking for "internal joins". It now contains a routine for selecting primers and templates during the "walking" stage of a project. The "find internal joins" function now calls up the contig joining editor with the two contigs aligned in the window, and the editor has also been speeded up. Numerous other changes have also been made but we still regard BAP as temporary, and are actively working on its replacement, which we believe will overcome the limitations that BAP's aged structure has imposed on it. We have also included routines for converting ABI 373A and Pharmacia A.L.F. data to our new trace file format, for automatically marking poor quality regions of readings from these machines and for converting DAP databases to BAP databases.\par
\pard \s4\qj\sa120\sl280 Other changes include providing a PostScript option for saving graphics output, and facilities for using the author and freetext indexes of the sequence libraries. The sequence library indexes are very useful and allow rapid searching. The freetext index is derived from ALL the text in the annotations - not just the keywords. We have also added a new repeat examining routine in NIP and a new repeat listing option in SIP.\par
\pard \s4\qj\sa120\sl280 \par
\pard\plain \s2\qc\sa600\sl480 \b\f20\fs36 1. 1 Preface to first edition \par
\pard\plain \s4\qj\sa120\sl280 \f20 It could be said that this manual is long overdue, for, apart from the extensive online help available from within the programs, it is the first printed guide to using a package that has been around for longer than I care to remember. On the other hand, to misquote a cliche much used by reviewers, it could be said that this manual fills a much needed gap, in that I believe the best way to learn about computer programs is to use them. Those who are prepared to experiment and play with programs will discover far more than any manual of reasonable size can hope to convey. However, the manual serves to give users an overview of what is available and a starting point for their exploration of the programs.\par
\pard \s4\qj\sa120\sl280 One of my objectives was to be able to distribute the manual on floppy disk so that each site using the programs could print as many copies as they need. We had to balance the quality of the graphics and the sophistication of the layout against the ease of producing updates and the availability of software, and decided to use the WORD4 program running on the Apple Macintosh. The graphics figures reproduced in the manual are far below the quality seen on the terminal screen, and in some cases should be viewed as merely schematic.\par
\pard \s4\qj\sa120\sl280 Most of the chapters are self-contained but users are strongly advised to read sections 3 to 7 in chapter 1, as to do so will save a lot of time.\par
\pard \s4\qj\sa120\sl280 In future editions we will add chapters on other programs in the package and expand the Notes sections to give more information about the theory and algorithms used. We welcome comments and suggestions for improvements.\par
\pard \s4\qj\sa120\sl280 I thank Brian Pashley for transforming my original documents into, what I hope will be, a useful manual.\par
\pard\plain \s3\sb200\sa120\sl360 \b\f20\fs32 Rodger Staden, March 1992.\par
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 2. Introduction\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
2.\tab Materials\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 2.1\tab Versions\par
2.2\tab Terminals\par
2.3\tab Digitizers\par
2.4\tab Sequencing machines\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab User interfaces\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 3.1\tab The xterm and VAX interface\par
3.2\tab The X interface\par
3.3\tab Use of the bell\par
3.4\tab Printing and saving results in files\par
3.5\tab Use of feature tables\par
3.6\tab Use of graphics\par
3.7\tab The active region\par
3.8\tab Files of file names\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Character sets\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 4.1\tab Character sets for finished sequences\par
4.2\tab Symbols used in gel readings\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Sequence formats\par
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 5.1\tab Personal sequence files\par
5.2\tab Sequence libraries\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Conventions used in text\par
7.\tab Notes\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we give an overview of the chapters on the "Staden Package" of programs. Here we describe the equipment required and outline the scope of the package and the user interfaces. In the next chapter we cover character sets, sequence formats and sequence library access.\par
\pard \s4\qj\sa120\sl280 The main programs in the package are as follows\:\par
\pard\plain \s7\qj\sa120\sl280\tx1120 \f20 GIP\tab Gel input program\par
\pard \s7\qj\sa120\sl280\tx1120\tx1580 SAP\tab Sequence assembly program\par
\pard \s7\qj\sa120\sl280\tx1120 BAP\tab Sequence assembly program\par
NIP\tab Nucleotide interpretation program\par
PIP\tab Protein interpretation program\par
SIP\tab Similarity investigation program\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx1120 MEP\tab Motif exploration program\par
NIPL\tab Nucleotide interpretation program (library)\par
PIPL\tab Protein interpretation program (library)\par
SIPL\tab Similarity investigation program (library)\par
XBAP\tab Sequence assembly program\par
XNIP\tab Nucleotide interpretation program\par
XPIP\tab Protein interpretation program\par
XSIP\tab Similarity investigation program\par
XMEP\tab Motif exploration program\par
\pard\plain \s4\qj\sa120\sl280 \f20 GIP uses a digitizer for entry of DNA sequences from autoradiographs. SAP, BAP and XBAP handle everything relating to assembling and editing gel readings. NIP provides functions for analysing and interpreting individual nucleotide sequences. PIP provides functions for analysing and interpreting individual protein sequences. MEP analyses families of nucleotide sequences to help discover new motifs. NIPL performs pattern searches on nucleotide sequence libraries. PIPL performs pattern searches on protein sequence libraries. SIP provides functions for comparing and aligning pairs of protein or nucleotide sequences. SIPL searches nucleotide and protein sequence libraries for entries similar to probe sequences. The programs whose names begin with a letter X are X11 (see below) versions of the programs. For example XNIP is an X11 version of NIP.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Materials\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Versions.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs run on Apple Macintosh computers, on VAX computers using the VMS operating system, and on SUN workstations (which use the UNIX operating system). The SUN version should run, with only minor changes, on other machines running UNIX, and currently we are aware of versions running on DEC ULTRIX, Silicon Graphics, Alliant FX2800 and Convex machines. Currently the Macintosh version is "frozen" in its April 1990 state, the VAX version is "frozen" in its April 1991 state and all development is being done on the SUN version.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.1\tab VAX version.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The VAX version will run on any VAX using the VMS operating system. A FORTRAN compiler is required.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.2\tab UNIX version.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The UNIX version is being used here on SPARCstations and DECstation 5000/240s with at least 8 megabytes of memory, 200 megabyte internal disk drives and 700 megabyte external disks. Colour monitors such as the GX are preferable for running the programs that display traces from fluorescent sequencing machines, but monochrome displays are adequate for all other programs. We also use tape desktop backup packs for archiving, and a CD-ROM drive for handling the sequence libraries.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.3\tab Other UNIX versions.\par
\pard\plain \s4\qj\sa120\sl280 \f20 Users of UNIX machines other than SUN SPARCstations, DECstation 5000/240 and SGI Indigo R3000 will require a FORTRAN compiler and ANSI C. When operated directly on the workstation screen all UNIX versions require X11 release 4.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.1.4\tab The Macintosh version\par
\pard\plain \s4\qj\sa120\sl280 \f20 The Macintosh version of the package requires a machine with at least 1 megabyte of memory and a 20 megabyte hard disk. It only operates on monochrome screens or colour screens set to black/white mode. The package contains only programs SAP, GIP, NIP, PIP and SIP. All further information about this version of the package is contained in the notes.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Terminals.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs can also be operated via a serial port using Tektronix terminals, PCs running MS-Kermit, or Apple Macintoshes running Versaterm Pro. The UNIX versions can also be run from X terminals or microcomputers running X emulators.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Digitizers.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The gel reading input program uses a sonic digitizer called a GRAPHBAR GP7 made by Science Accessories Corp., 200 Watson Blvd., Stratford, CT 06497, USA. When ordering, specify that the device should be set to use metric units.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Sequencing machines.\par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs can handle data produced by the Applied Biosystems Inc. 373A and Pharmacia A.L.F. fluorescent sequencing machines.\par
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab User Interfaces\par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs have two user interfaces. The first runs under the terminal emulator xterm and the second runs directly under X. On the VAX, at present only the xterm interface is available, but on UNIX systems either interface can be used. The xterm version of the package will operate on the workstation screen, X terminals, Tektronix terminals, PCs or Macintoshes (see above). When run on the workstation screen the programs have separate text and graphics windows, each of which can be moved, resized and iconized, and the text window can be scrolled in both directions. The versions that run directly under X can only be used on the workstation screen, X terminals or using an X emulator. They produce separate text and graphics windows, an independent, constantly available help window and a separate dialogue window. All input is controlled by mouse selection and dialogue boxes.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.1\tab The xterm and VAX interface\par
\pard\plain \s4\qj\sa120\sl280 \f20 The user interface is common to all programs. It consists of a set of menus and a uniform way of presenting choices and obtaining input from the user. This section describes\: the menu system; how options are selected and other choices made; how values are supplied to the program; how help is obtained, and how to escape from any part of a program. In addition it gives information about saving results in files and the use of graphics for presenting results.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.1.\tab Menus and option selection\par
\pard\plain \s4\qj\sa120\sl280 \f20 Each program has several menus and numerous options. Each menu or option has a unique number that is used to identify it. Menu numbers are distinguished from option numbers by being preceded by the letter m (or M; the programs make no distinction between upper and lower case letters). With the exception of some parts of program SAP, the menus are not hierarchical; rather, the options they each contain are simply lists of related functions and their identifying numbers. Therefore options can be selected independently of the menu that is currently being shown on the screen, and the menus are simply memory aids. All options and menus are selected by typing their number when the programs present the prompt \par
\pard \s4\qc\sb120\sa180\sl280 "? Menu or option number =" \par
\pard \s4\qj\sa120\sl280 To select a menu type its number preceded by the letter M. To select an option type its number. If users type only "return" they will get menu m0 which is simply a list of menus. If users select an option they will return to the current menu after the function is completed. Where possible, equivalent or identical options have been given the same numbers in all programs, and so users quickly learn the numbers for the functions they employ most often.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.2\tab Execution and dialogue\par
\pard\plain \s4\qj\sa120\sl280 \f20 All inputs requested by the program (apart from file names) have default values. In addition most of the analytical functions have a default path through which they will pass, so when users select an option, in many cases the program will immediately perform the operation selected without further dialogue. However if users precede an option number by the letter d (e.g. D17), they will force the program to offer dialogue about the selected option before the function operates, hence allowing them to change the value of any of its parameters. In addition, alternative suboptions will be made available.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.3\tab Help\par
\pard\plain \s4\qj\sa120\sl280 \f20 Help about each option can be obtained by preceding the option number by the symbol ? when users are presented with the prompt "? Menu or option number" (e.g. ?17 gives help on option 17), but there are two further ways of obtaining help. Whenever the program asks a question users can respond by typing the symbol ? and they will receive information about the current option. In addition, option number 1 in all the programs will give help on all of a program's functions. \par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.4.\tab Quitting \par
\pard\plain \s4\qj\sa120\sl280 \f20 To exit from any point in a program users type ! for quit. If a menu is on the screen this will stop the program, otherwise they will be returned to the last menu. \par
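\pard\plain \s4\qj\sa120\sl280 \f20 As an illustration only, the replies accepted at the "? Menu or option number =" prompt can be classified as in the following Python sketch. It is not code from the package: the function name parse_menu_response and the labels it returns are hypothetical, and treating a bare ? at this prompt as a request for general help is an assumption.\par

```python
def parse_menu_response(text):
    """Classify one reply to the '? Menu or option number =' prompt.

    Returns a (kind, number) pair. The kind labels are hypothetical.
    """
    reply = text.strip().lower()  # upper and lower case are equivalent
    if reply == "":
        return ("menu", 0)  # only "return": show menu m0, the list of menus
    if reply == "!":
        return ("quit", None)
    kind = "option"
    if reply[0] == "m":
        kind, reply = "menu", reply[1:]
    elif reply[0] == "d":
        kind, reply = "dialogue", reply[1:]  # force dialogue before running
    elif reply[0] == "?":
        kind, reply = "help", reply[1:]
    if kind == "help" and reply == "":
        return ("help", None)  # assumption: a bare ? asks for general help
    if not reply.isdecimal():
        return ("invalid", None)  # the prompt would simply be shown again
    return (kind, int(reply))
```

\pard\plain \s4\qj\sa120\sl280 \f20 Under these assumptions the replies M3, 17, D17 and ?17 are classified as menu 3, option 17, option 17 with dialogue, and help on option 17 respectively.\par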
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.5.\tab Making selections\par
\pard\plain \s4\qj\sa120\sl280 \f20 Questions and choices are dealt with in three ways. Where there are choices that are not obvious opposites, or there are more than two choices, "radio buttons" and "check boxes" are used.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\pagebb\tx1140 \b\f20 3.1.5.1.\tab Choosing between opposites.\par
\pard\plain \s4\qj\sa120\sl280 \f20 Obvious opposites such as "clear screen" and "keep picture" are presented with only the default shown. For example in this case the default is generally "keep picture" so the program will display\: \par
\pard\plain \li1720\sa200\sl220 \f4\fs16 "Keep picture (y/n) (y) =" \par
\pard\plain \s4\qj\sa120\sl280 \f20 and the picture will be retained if the user types Y or y or only return. If the user types N or n the picture will be cleared. Anything other than these or ? or ! will cause the question to be asked again.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.5.2. \tab Choosing one from many.\par
\pard\plain \s4\qj\sa120\sl280 \f20 Radio buttons are used when only one of a number of choices can be made at any one time. The choices are presented arranged one above the other, each choice with a number for its selection, and the default choice marked with an X. For example when the user is reading a new sequence file the following choices of format are offered.\par
\pard\plain \li1720\sb300\sl220\tx2460\tx3400 \f4\fs16 Select sequence file format\par
\pard \li1720\sl220\tx2460\tx3400 \tab 1\tab Staden\par
\tab 2\tab EMBL\par
X\tab 3\tab GenBank\par
\tab 4\tab PIR\par
\tab 5\tab GCG\par
\tab 6\tab FASTA\par
\pard \li1720\sa300\sl220\tx2460\tx3400 ? Selection (1-6) (3) =\par
\pard\plain \s4\qj\sb60\sa120\sl280 \f20 Any single option can be selected by typing the option number, and the default option (here shown as 3) is also obtained by typing only "return". Again help can be obtained by typing ? and quit by typing !.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.5.3.\tab Choosing at least one from many.\par
\pard\plain \s4\qj\sa120\sl280 \f20 Check boxes are used when any number of a set of choices can be made (i.e. the choices are not exclusive). Choices are made by typing choice numbers. Each choice can be considered as a switch whose setting is reversed when it is selected. Choices that are currently switched on are marked with an X. The user quits from making selections by typing only "return". For example in the routine that plots base composition users can elect to plot the frequencies of any combination of bases, e.g. only A, or A+T, or A+T+G etc. The following check box is offered to the user\: \par
\pard\plain \li1720\sb300\sl220\tx2420\tx3400 \f4\fs16 X\tab 1\tab T\par
\pard \li1720\sl220\tx2420\tx3400 \tab 2\tab C\par
X\tab 3\tab A\par
\tab 4\tab G\par
\pard \li1720\sa300\sl220\tx2420\tx3400 ? Selection (1-4) ( ) =\par
\pard\plain \s4\qj\sb60\sa120\sl280 \f20 As shown this will plot the A+T composition. To switch off T select 1, to switch on C select 2, and so on. To quit, having set the bases required, type only "return". \par
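\pard\plain \s4\qj\sa120\sl280 \f20 The switch-like behaviour of check boxes can be sketched in Python as follows. This is an illustration and not code from the package: the function name apply_check_box_replies is hypothetical, and silently ignoring unrecognised replies is an assumption.\par

```python
def apply_check_box_replies(labels, switched_on, replies):
    """Return the labels left switched on after a series of replies.

    labels      -- choice labels in menu order, numbered from 1
    switched_on -- choice numbers marked with an X at the start
    replies     -- strings typed by the user, in order
    """
    selected = set(switched_on)
    for reply in replies:
        reply = reply.strip()
        if reply == "":
            break  # only "return": selection is finished
        if not reply.isdecimal():
            continue  # assumption: unrecognised replies are ignored
        number = int(reply)
        if 1 <= number <= len(labels):
            # each choice is a switch whose setting is reversed when selected
            selected.symmetric_difference_update([number])
    return [labels[n - 1] for n in sorted(selected)]
```

\pard\plain \s4\qj\sa120\sl280 \f20 Starting from the display above, with T and A switched on, the replies 1, 2 and "return" leave C and A selected.\par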
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 \page 3.1.6.\tab Input of numerical values \par
\pard\plain \s4\qj\sa120\sl280 \f20 All input of integer or decimal numbers is presented in a standard way with the allowed range shown in brackets and the default value also in brackets. For example\: \par
\pard\plain \li1700\sb160\sa300\sl220 \f4\fs16 ? Window (5-31) (11) = \par
\pard\plain \s4\qj\sa120\sl280 \f20 In this example users could type any number between 5 and 31, or "return" only, or ! or ? (see above). Any other input will cause the program to ask the question again. Typing only "return" gives the default value (here 11). \par
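\pard\plain \s4\qj\sa120\sl280 \f20 The handling of a numerical reply can be sketched in Python as below. The function name read_number and the returned labels are hypothetical; the sketch only mirrors the behaviour described above and is not taken from the programs.\par

```python
def read_number(reply, low, high, default):
    """Interpret one reply to a prompt such as '? Window (5-31) (11) ='.

    Returns ("value", n), ("help", None), ("quit", None), or
    ("ask_again", None) when the reply is not acceptable.
    """
    reply = reply.strip()
    if reply == "":
        return ("value", default)  # only "return" accepts the default
    if reply == "?":
        return ("help", None)
    if reply == "!":
        return ("quit", None)
    try:
        value = float(reply)
    except ValueError:
        return ("ask_again", None)  # not a number
    if low <= value <= high:
        return ("value", value)
    return ("ask_again", None)  # outside the allowed range
```

\pard\plain \s4\qj\sa120\sl280 \f20 With the prompt shown above, an empty reply gives the default 11, whereas a reply of 40 is outside the range 5 to 31 and the question would be asked again.\par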
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.1.7.\tab Input of character strings\par
\pard\plain \s4\qj\sa120\sl280 \f20 Character strings are requested using informative prompts of the form\:\par
\pard\plain \li1720\sb160\sa300\sl220 \f4\fs16 ? Search string =\par
\pard\plain \s4\qj\sa120\sl280 \f20 Or where possible the prompt will be preceded by a default value as in\:\par
\pard\plain \li1720\sb160\sl220 \f4\fs16 Default search string = atatatata\par
\pard \li1720\sa300\sl220 ? Search string =\par
\pard\plain \s4\qj\sa120\sl280 \f20 Question mark (?) or ! will get help or quit. Where appropriate, for example when a whole list of strings has been defined one after the other, typing return only will be a signal to the program that input is complete.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.2.\tab The X interface\par
\pard\plain \s4\qj\sa120\sl280 \f20 This interface deals with all the types of interactions described above but options are selected using pulldown menus and all inputs are via appropriately styled dialogue boxes and buttons. Default values are accepted by clicking on an "OK" button, or typing return on the keyboard. Values are changed by overtyping the defaults. Quit is available from each dialogue via a "CANCEL" button. Help is constantly available via a "HELP" button in the main dialogue window. Details such as requesting dialogue when an option is selected are dealt with using a button labelled "execute with dialogue" which toggles to "execute".\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.3.\tab Use of the bell \par
\pard\plain \s4\qj\sa120\sl280 \f20 The programs use the bell to indicate that a task is completed. When the bell sounds, the programs will wait until return is typed. Users can quit from these points by typing ! but no help is available.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.4.\tab Printing and saving results in files \par
\pard\plain \s4\qj\sa120\sl280 \f20 A few of the functions in the programs automatically write their textual results to disk files, but for most functions users can choose whether results appear on the terminal screen or go to a file. For these functions the normal, or default, place for results to appear is on the screen, and users need to decide before the function is selected if they want to redirect the results to a file. In all programs the option "Redirect output" gives control over whether results appear on the screen or go to a file. When a program is started results will be sent to the screen. If the option "Redirect output" is selected users will be given the choice of redirecting either text or graphics to a file or of creating a PostScript file for the graphics. The program will then ask users to supply a file name. If users elect to redirect output, from that point on, all results will be sent to the file until the option is selected again, in which case the "redirection file" will be closed, and results will again appear on the screen. If these files contain textual results they can be looked at from within the programs by using option "List a text file". Once the program is left users can employ an appropriate system command to print the files. There is no function within the programs to direct files to a printer. If users elect to create a PostScript file for the graphics the graphics will also appear on the screen. If they redirect graphics the graphics commands (in Tektronix codes) will only go to the file and will not appear on the screen.\par
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 3.5.\tab Use of feature tables\par
\pard\plain \s4\qj\sa120\sl280 \f20 One particular use of redirection should be noted. The programs can use EMBL/GenBank feature tables as input for directing translation of DNA to protein, etc, but the tables must be stored in separate text files, and cannot be read directly from the sequence libraries. The only routines that can read the sequence libraries are those available under "Read a sequence". So to create a text file containing the feature table for a particular library entry users must redirect text output to disk, and then use "Read a sequence" to display the appropriate feature table. The feature table will be written to the file, and then the file can be used for controlling translation etc. Note however that the redirection mechanism is a general function and it therefore does not add the required header and tail to saved files. To make the files usable as feature tables they need, as a minimum, a line at the top with the word FEATURES starting in column 1, and two empty lines at the end of the file.\par
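\pard\plain \s4\qj\sa120\sl280 \f20 The minimum header and tail can be added outside the programs, for example with a short Python helper such as the sketch below. The function name make_feature_table is hypothetical and the sketch works on a list of lines from the redirected file; it adds only what is described above, namely a first line holding the word FEATURES and two empty lines at the end.\par

```python
def make_feature_table(saved_lines):
    """Wrap the lines of a redirected feature table with header and tail.

    saved_lines -- lines of the redirection file, without line terminators
    Returns a new list: FEATURES in column 1, the saved lines, two empty lines.
    """
    body = [line.rstrip() for line in saved_lines]  # drop trailing blanks
    return ["FEATURES"] + body + ["", ""]
```

\pard\plain \s4\qj\sa120\sl280 \f20 The resulting lines can then be written to a new text file, one per line, and that file supplied to the programs as the feature table.\par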
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.6.\tab Use of graphics \par
\pard\plain \s4\qj\sa120\sl280 \f20 The analytical programs including NIP, PIP and SIP present the results of many of their analyses graphically.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.1.\tab The drawing board and plot positions\par
\pard\plain \s4\qj\sa120\sl280 \f20 The position at which the results for any function appear on the screen is defined relative to a notional user's "drawing board" of dimension 10,000 by 10,000. This drawing board fills the screen and results are drawn in windows defined using symbols x0,y0 and xlength,ylength, where x0,y0 is the position of the bottom left hand corner of the window, xlength is the width of the window and ylength is its height. The window positions for each option are read from a file when a program is started. If required, individual users can have their own set of plot positions, and the positions can also be redefined from within the programs using the option "Reposition plots".\par
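\pard\plain \s4\qj\sa120\sl280 \f20 To make the drawing board convention concrete, the following Python sketch converts a plot window given as x0,y0 and xlength,ylength into a pixel rectangle. The function name window_to_pixels, the integer rounding and the pixel origin at the top left of the screen are all assumptions made for the illustration; the manual does not describe how the programs perform this mapping.\par

```python
BOARD_SIZE = 10000  # the notional drawing board is 10,000 by 10,000


def window_to_pixels(x0, y0, xlength, ylength, screen_width, screen_height):
    """Convert a plot window on the drawing board to a pixel rectangle.

    x0, y0 is the bottom left hand corner of the window on the board.
    Returns (left, top, width, height) in pixels, with the pixel origin
    assumed to be at the top left of the screen.
    """
    left = x0 * screen_width // BOARD_SIZE
    width = xlength * screen_width // BOARD_SIZE
    bottom = y0 * screen_height // BOARD_SIZE
    height = ylength * screen_height // BOARD_SIZE
    top = screen_height - bottom - height  # flip the vertical axis
    return (left, top, width, height)
```

\pard\plain \s4\qj\sa120\sl280 \f20 A window that covers the whole board, with x0,y0 at 0,0 and both lengths equal to 10,000, maps to the whole screen under these assumptions.\par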
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.2.\tab The plot interval\par
\pard\plain \s4\qj\sa120\sl280 \f20 For those analyses that draw continuous lines to represent results (for example a plot of base composition) the user is asked to supply the "Plot interval". All the analyses produce a value for every point along the sequence but often it is unnecessary to actually plot the values for all the points. The plot interval is simply the distance between the points shown on the screen. If the user selects a plot interval of 1, every point will be plotted; a plot interval of 3 will show every third point. \par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.3.\tab The window length\par
\pard\plain \s4\qj\sa120\sl280 \f20 The word "window" is used in a further way by the programs. Most of the functions that analyse the content of a sequence (the simplest such routine plots the base composition) perform their calculations over a segment of the sequence of a certain length, display the result, then move on by 1 position, and recalculate. The fixed size of segment over which a calculation is performed is called a "window" and the segment size is the "window length". Many analytical functions request "? Window length =", or more frequently "? Odd window length =". An odd number is used so that when a result is displayed for a particular window position it is derived from an equal number of points either side of the window's midpoint.\par
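\pard\plain \s4\qj\sa120\sl280 \f20 The roles of the odd window length and the plot interval can be seen together in the following Python sketch of a base composition calculation. It is an illustration under stated assumptions and not the routine used by the programs: the function name windowed_composition is hypothetical, positions are reported starting from 1, the result is expressed as a fraction of the window, and only the windows that would be plotted are evaluated rather than every position.\par

```python
def windowed_composition(sequence, bases, window_length, plot_interval):
    """Return (midpoint position, fraction) points for a composition plot.

    sequence      -- the sequence as a string
    bases         -- the bases to count, for example "AT"
    window_length -- odd number of positions in each window
    plot_interval -- distance between successive plotted points
    """
    if window_length % 2 == 0:
        raise ValueError("window length must be odd")
    half = window_length // 2  # equal number of points either side
    sequence = sequence.upper()
    bases = bases.upper()
    points = []
    for mid in range(half, len(sequence) - half, plot_interval):
        window = sequence[mid - half: mid + half + 1]
        count = sum(1 for base in window if base in bases)
        points.append((mid + 1, count / window_length))
    return points
```

\pard\plain \s4\qj\sa120\sl280 \f20 With a plot interval of 1 a point is produced for every window position; with a plot interval of 3 only every third position is kept, which is usually sufficient for a continuous line on the screen.\par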
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.4.\tab Use of the cross hair\par
\pard\plain \s4\qj\sa120\sl280 \f20 All programs that produce graphical output provide a function for using a cross hair to examine the plots. After the cross hair function is selected the cross will appear in the graphics window and can be steered around using the mouse or directional keys. Special keyboard characters hit while the function is in operation produce the following results. For all programs the letter s (for sequence) will show the local sequence around the cross hair position. For the sequence comparison programs that show a dot matrix the two sequences will be displayed above one another. For the sequencing project management programs all the aligned sequences in the contig will be displayed. For the sequence comparison programs the letter m (for matrix) will show a matrix in which all identical characters for a window around the cross hair are marked. The punctuation symbol , will show the local position in sequence units, but leave the cross hair on the screen, whereas the space bar and any other non-special character will show the local position and exit the cross hair function. Further special characters are defined in the chapter on managing sequencing projects.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.5\tab Drawing scales on plots\par
\pard\plain \s4\qj\sa120\sl280 \f20 All the programs have a function "Draw a ruler" which will allow users to add scales to the axes of graphical plots. The scale can be positioned anywhere on the plot.\par
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 3.6.6\tab Saving graphics\par
\pard\plain \s4\qj\sa120\sl280 \f20 The best way of saving the graphics is to use the "Redirect output" function to open a PostScript file which will then contain a copy of all plots that appear on the screen. This of course requires the file to be opened before the plots are drawn. Many terminals are not capable of dumping their screen contents to a file for subsequent printing. One convenient way of obtaining hard copy of graphical results is to use a microcomputer as a terminal. On the Macintosh we use the terminal emulator Versaterm Pro. This allows graphics to be saved as Macintosh files that can be annotated and printed using MacDraw and other painting programs. Alternatively graphics can be redirected to a file and printed using a laser printer with Tektronix capability (see "Printing and saving results in files"). \par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.7.\tab The active region\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
All the analytical programs use an "active region" for most of their functions. This is simply the current section of the sequence over which the analysis will be applied. When a sequence is first read in the active region will be set to its whole length, but the user can restrict the scope of analytical functions by use of an option called "Define active region". However some functions such as "List the sequence" are always given access to the whole sequence and will allow the user to define a limited range after they have been selected.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 3.8.\tab Files of file names\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
A useful device that is employed by many of the programs is that of "files of file names". If a program needs to perform the same operation in turn on each of 20 files, the user should not have to type in 20 file names. Instead the user types in the name of a single file which contains the names of the other 20 files. This single file is a file of file names. They are used, for example, to process batches of gel readings, or to compare a sequence against a library of motifs.\par
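\pard\plain \s4\qj\sa120\sl280 \f20 The idea can be sketched in a few lines of Python. This is only an illustration and not the package's own code\: it assumes one file name per line with blank lines ignored, and the function names and the example file name are hypothetical.\par

```python
def read_file_of_file_names(lines):
    """Return the file names listed one per line, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def process_batch(fofn_path, operation):
    """Apply the same operation in turn to every file named in the list."""
    with open(fofn_path) as handle:
        names = read_file_of_file_names(handle)
    return [operation(name) for name in names]


# Example with an in-memory list standing in for the file of file names.
listing = ["gel001.seq", "", "gel002.seq", "gel003.seq"]
print(read_file_of_file_names(listing))
```

\pard\plain \s4\qj\sa120\sl280 \f20 A call such as process_batch("batch.fofn", len) would then apply the operation to each named file in turn, where "batch.fofn" is a made-up name.\par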
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab Character Sets\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 There are two types of character sets employed by the programs\: those for finished sequences and those used during sequencing projects.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 4.1\tab Character sets for finished sequences\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The analytical programs will operate with uppercase or lowercase sequence characters. For nucleic acids T and U are equivalent. For proteins the standard one-letter codes are used. The analytical programs also use IUB symbols for redundancy in back translations and for sequence searches. The symbols are shown in table 2.1.\par
|
|
\pard \s4\qj\li2260\ri2220\sb300\sa120\sl280\box\brsp100\brdrth \tx3420\tx4800 A,C,G,T\par
|
|
\pard \s4\qj\li2260\ri2220\sa120\sl280\box\brsp100\brdrth \tx3420\tx4800 R\tab (A,G)\tab 'puRine'\par
|
|
Y\tab (T,C)\tab 'pYrimidine'\par
|
|
W\tab (A,T)\tab 'Weak'\par
|
|
S\tab (C,G)\tab 'Strong'\par
|
|
M\tab (A,C)\tab 'aMino'\par
|
|
K\tab (G,T)\tab 'Keto'\par
|
|
H\tab (A,T,C)\tab 'not G'\par
|
|
B\tab (G,C,T)\tab 'not A'\par
|
|
V\tab (G,A,C)\tab 'not T'\par
|
|
D\tab (G,A,T)\tab 'not C'\par
|
|
\pard \s4\qj\li2260\ri2220\sa120\sl280\keepn\box\brsp100\brdrth \tx3420\tx4800 N\tab (G,A,C,T)\tab 'aNy'\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Table 2.1\tab The NC-IUB characters used by the analytical programs\par
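\pard\plain \s4\qj\sa120\sl280 \f20 As an illustration of how the redundancy symbols in the table above can be used in a sequence search, the following Python sketch matches a motif written in IUB symbols against a sequence. It is a minimal example under stated assumptions, not the search routine used by the programs\: the names IUB, symbol_matches and find_motif are hypothetical, positions are reported counting from 1, and U in the sequence is treated as T.\par

```python
# Each IUB symbol maps to the set of bases it stands for (from the table).
IUB = dict(
    A="A", C="C", G="G", T="T", U="T",
    R="AG", Y="TC", W="AT", S="CG", M="AC", K="GT",
    H="ATC", B="GCT", V="GAC", D="GAT", N="GACT",
)


def symbol_matches(pattern_symbol, base):
    """True if the sequence base is one of those allowed by the IUB symbol."""
    base = base.upper().replace("U", "T")
    return base in IUB[pattern_symbol.upper()]


def find_motif(sequence, motif):
    """Return the 1-based start positions at which the motif matches."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        if all(symbol_matches(m, b) for m, b in zip(motif, window)):
            hits.append(start + 1)
    return hits


# GTYRAC is used here purely as an example of a redundant motif.
print(find_motif("ccggttagactgttaac", "GTYRAC"))
```

\pard\plain \s4\qj\sa120\sl280 \f20 Because uppercase and lowercase are folded together, the sketch follows the statement above that the analytical programs accept either case.\par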
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 4.2\tab Symbols used in gel readings\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The information stored about a sequence reading has to show the original sequence, recording any doubts about its interpretation, and also, where possible, allow the changes made during editing to be indicated. Lowercase characters are used by the sequence project management programs for recording readings, and uppercase symbols are used when changes are made during editing. Alternatively the reverse convention can be used. Any other characters in a sequence are treated as dash (-) characters. The symbols are shown in table 2.2.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 5.\tab Sequence Formats\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The data formats for the programs that deal with sequencing projects are described in the chapter on managing sequencing projects. All analytical programs can read sequences stored in several formats. We distinguish between two sources of input, namely "sequence libraries" and "personal files".\par
|
|
\pard \s4\qj\sa120\sl280 \par
|
|
\pard \s4\qj\li1120\ri1200\sa120\sl280\box\brsp100\brdrth \tqc\tx2800 {\b Symbol \tab Meaning}\par
|
|
\pard \s4\qj\li1120\ri1200\sl280\box\brsp100\brdrth \tx1400\tqc\tx2800\tqc\tx4240\tqc\tx5640\tx6820 \tab c\tab Definitely\tab c\par
|
|
\tab t\tab "\tab t\par
|
|
\tab a\tab "\tab a\par
|
|
\tab g\tab "\tab g\par
|
|
\tab 1\tab Probably\tab c\par
|
|
\tab 2\tab "\tab t\par
|
|
\tab 3\tab "\tab a\par
|
|
\tab 4\tab "\tab g\par
|
|
\tab d\tab "\tab c\tab Possibly\tab cc\par
|
|
\tab v\tab "\tab t\tab "\tab tt\par
|
|
\tab b\tab "\tab a\tab "\tab aa\par
|
|
\tab h\tab "\tab g\tab "\tab gg\par
|
|
\tab k\tab "\tab c\tab "\tab c-\par
|
|
\tab l\tab "\tab t\tab "\tab t-\par
|
|
\tab m\tab "\tab a\tab "\tab a-\par
|
|
\tab n\tab "\tab g\tab "\tab g-\par
|
|
\tab r\tab a or g\par
|
|
\tab y\tab c or t\par
|
|
\tab 5\tab a or c\par
|
|
\tab 6\tab g or t\par
|
|
\tab 7\tab a or t\par
|
|
\tab 8\tab g or c\par
|
|
\tab -\tab a or g or c or t\par
|
|
\pard \s4\qj\li1120\ri1200\sl280\box\brsp100\brdrth \tx1400\tqc\tx3780\tqc\tx4240\tqc\tx5640\tx6820 \tab A\tab a set by auto edit or corrected by user\par
|
|
\tab C\tab c set by auto edit or corrected by user\par
|
|
\tab G\tab g set by auto edit or corrected by user\par
|
|
\tab T\tab t set by auto edit or corrected by user\par
|
|
\pard \s4\qj\li1120\ri1200\sl280\box\brsp100\brdrth \tx1400\tqc\tx4020\tqc\tx5640\tx6820 \tab *\tab padding character placed by auto assembler\par
|
|
\pard \s4\qj\li1120\ri1200\sl280\keepn\box\brsp100\brdrth \tx1400\tqc\tx2800\tqc\tx4240\tqc\tx5640\tx6820 else = -\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa300\sl240\tx1140 \f21\fs20 Table 2.2\tab The symbols used to record gel readings\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 5.1\tab Personal sequence files\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The programs can read sequences from files in PIR, EMBL, GenBank, GCG, FASTA and Staden formats. Staden format means text files with records of up to 80 characters. All spaces are removed; lines with ";" in the first position are treated as comments and will be displayed when the file is read but not included in the sequence; and if the first line of data contains a 20 character header of the form <---abcdefghij-----> it too will not be included in the processed sequence. This last facility allows the programs to read consensus sequences created by the sequence project management programs. Files in PIR format can contain any number of entries (which the user selects by entry name), but all other formats are expected to contain only one sequence. If they contain more, only the first will be read.\par
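\pard\plain \s4\qj\sa120\sl280 \f20 The rules for Staden format given above can be sketched as follows. This is an illustrative Python reading of the description, not the programs' own reader\: the function name read_staden is hypothetical, and it assumes that the 20 character header, when present, occupies the first 20 columns of the first data line.\par

```python
def read_staden(lines):
    """Return (comments, sequence) from the lines of a Staden format file."""
    comments = []
    parts = []
    first_data_line = True
    for line in lines:
        line = line.rstrip()
        if line.startswith(";"):
            # Comment lines are shown to the user but are not sequence.
            comments.append(line[1:].strip())
            continue
        if first_data_line:
            first_data_line = False
            # Assumed header shape: 20 characters running from "<" to ">".
            if len(line) >= 20 and line[0] == "<" and line[19] == ">":
                line = line[20:]
        # All spaces are removed from the sequence records.
        parts.append(line.replace(" ", ""))
    return comments, "".join(parts)


example = [
    "; consensus written by an assembly run",
    "<---abcdefghij----->ACGT AC",
    "GG TT",
]
print(read_staden(example))
```

\pard\plain \s4\qj\sa120\sl280 \f20 The sketch does not enforce the 80 character record limit; a stricter reader could reject longer lines.\par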
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 5.2\tab Sequence libraries\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
Users may not appreciate that, because the sequence libraries are so large, programs need to use indexes to provide rapid retrieval of individual entries. An index is a list of entry names and pairs of offsets. For each entry name the offsets define the positions at which its sequence and annotation start in the large file. The index, which is in any case relatively small, is arranged so that it can be searched quickly; for example, the EMBL cdrom index is sorted alphabetically. When the user supplies an entry name the program rapidly finds it in the index file and then uses the associated offsets to locate the entry in the larger sequence files.\par
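\pard\plain \s4\qj\sa120\sl280 \f20 The lookup described above can be illustrated with a binary search over a sorted list of entry names. The Python sketch below is a simplified in-memory model and does not reproduce the EMBL cdrom index file layout\: the function names, entry names and offsets are all invented for the example.\par

```python
import bisect


def build_index(records):
    """Sort (entry_name, annotation_offset, sequence_offset) records by name."""
    ordered = sorted(records)
    names = [rec[0] for rec in ordered]
    offsets = [(rec[1], rec[2]) for rec in ordered]
    return names, offsets


def find_entry(names, offsets, entry_name):
    """Binary search the sorted names; return the offset pair or None."""
    pos = bisect.bisect_left(names, entry_name)
    if pos < len(names) and names[pos] == entry_name:
        return offsets[pos]
    return None


# Hypothetical entries; the offsets would be byte positions in the library.
names, offsets = build_index([
    ("HSMYC", 5200, 6100),
    ("ECLACZ", 0, 900),
    ("MMIGK", 12400, 13050),
])
print(find_entry(names, offsets, "HSMYC"))
print(find_entry(names, offsets, "NOSUCH"))
```

\pard\plain \s4\qj\sa120\sl280 \f20 With the offsets in hand a program would seek directly to those positions in the library file rather than scanning it from the start.\par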
|
|
\pard \s4\qj\sa120\sl280 The sequence libraries are stored in different ways on the VAX and the SUN. On the VAX we adopted the widely used PIR format and indexing method and on the SUN we use the EMBL cdrom format and indexes.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 5.2.1\tab Sequence libraries on the VAX\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
On the VAX all libraries are stored in PIR format, and except for the facility to select entries by accession number, the same functions are provided as those on the SUN. Note that this means that most libraries need reformatting after they have been read from the distribution media. Because, for each entry, the sequence and its annotation are stored separately, the reformatting process consumes significant computer resources. These reformatting programs are available from PIR and we give no further information here. The programs that search whole libraries of sequences also expect the libraries to be in PIR format.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 5.2.2.\tab Sequence libraries for the UNIX version\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
For the UNIX version of the programs we use the EMBL cdrom as the primary source of sequence data and have chosen their indexing method for all libraries. These indexes leave the sequence libraries in their distribution format and simply provide offsets to the original files. The cdrom provides the EMBL nucleic acid sequence library and the SWISSPROT protein sequence library. Currently it also includes indexes for entry names, accession numbers, authors and freetext, and has an additional "title" file which, for each entry, consists of entry name, entry length and an 80 character description of the entry. These indexes allow rapid retrieval of entries by name or accession number, and the author and freetext indexes can be searched very rapidly. The files can be left on the cdrom or transferred to a hard disk. The programs that search whole libraries of sequences expect the libraries to be in cdrom format or PIR format.\par
|
|
\pard \s4\qj\sa120\sl280
|
|
We have written our own programs for producing EMBL cdrom type indexes for other sequence libraries. These allow us to use the PIR protein libraries in CODATA format and the between-release updates of the EMBL nucleotide library. Others may wish to use them to produce indexes for libraries such as GenBank. In addition to our own programs, the scripts that produce the indexes also use the UNIX sort program. We give no further details here, but the programs are described in Staden and Dear, 1992.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 5.2.2.1\tab Library description files.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The following information is only relevant to those installing the sequence libraries on a SUN. To make the sequence library handling as flexible as possible we use several levels of files. As stated above, at present we only deal with the EMBL and SWISSPROT libraries as distributed on cdrom and the PIR protein library in CODATA format. By including a "library type" flag in the library description file we also leave open the possibility of using alternative formats. \par
|
|
\pard \s4\qj\sa120\sl280 We describe the libraries at 3 levels\: 1) a list of libraries and their types, which points to 2) the files which name each library's individual files and their file types, and finally 3) each library's individual files. The files used are described below.\par
|
|
\pard\plain \s7\qj\fi-1120\li1120\sa120\sl280\pagebb\tx1120 \f20 Level 1)\tab The top level file is a list of available libraries which contains\: the library type, the name of the file containing the names of each library's individual files, and the prompt to appear on the user's screen. \par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Example\: \par
|
|
\pard \s4\qj\li1100\sa120\sl280 File name\: SEQUENCELIBRARIES\par
|
|
File contents\:\par
|
|
\pard\plain \li1120\sl220 \f4\fs16 A\tab EMBLLIBDESCRP EMBL nucleotide library ! in cdrom format\par
|
|
A\tab SWISSLIBDESCRP SWISSPROT protein library! in cdrom format\par
|
|
\pard \li1120\sa300\sl220 B\tab PIRLIBDESCRP PIR protein library! in CODATA format\par
|
|
\pard\plain \s4\qj\sa180\sl280 \f20 The first two libraries are of type A. The logical names are EMBLLIBDESCRP and SWISSLIBDESCRP, and the prompts are "EMBL nucleotide library" and "SWISSPROT protein library". The third library is of type B with logical name PIRLIBDESCRP. Space is used as a delimiter and anything to the right of a ! is a comment.\par
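\pard\plain \s4\qj\sa120\sl280 \f20 For those writing their own tools around these files, the layout of the top level file can be read as in the following Python sketch. It is a reading of the description above rather than the package's code\: the function name parse_library_list is hypothetical, and it assumes that the first two space separated fields are the type and logical name and that the rest of the line, up to any !, is the prompt.\par

```python
def parse_library_list(lines):
    """Return (library_type, logical_name, prompt) for each library listed."""
    libraries = []
    for line in lines:
        # Anything to the right of a ! is a comment.
        line = line.split("!", 1)[0].strip()
        if not line:
            continue
        # Split on white space at most twice so the prompt keeps its spaces.
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        lib_type, logical_name, prompt = fields
        libraries.append((lib_type, logical_name, prompt.strip()))
    return libraries


example = [
    "A EMBLLIBDESCRP EMBL nucleotide library ! in cdrom format",
    "A SWISSLIBDESCRP SWISSPROT protein library! in cdrom format",
    "B PIRLIBDESCRP PIR protein library! in CODATA format",
]
for entry in parse_library_list(example):
    print(entry)
```

\pard\plain \s4\qj\sa120\sl280 \f20 Lines with fewer than three fields are silently skipped here; a real installation check might prefer to report them.\par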
|
|
\pard\plain \s7\qj\fi-1100\li1100\sa120\sl280\tx1120 \f20 Level 2)\tab The file containing the names of each library's individual files contains flags to define the file types and the path or logical names of the files. Current file types are\: \par
|
|
\pard\plain \fi100\li980\sl220 \f4\fs16 A\tab Division_lookup\par
|
|
B\tab Entryname_index\par
|
|
C\tab Accession_target\par
|
|
D\tab Accession_hits\par
|
|
E\tab Brief_directory\par
|
|
F\tab Freetext_target\par
|
|
G\tab Freetext_hits\par
|
|
H\tab Author_target\par
|
|
I\tab Author_hits\par
|
|
\pard\plain \s4\qj\sa120 \f20 Example\par
|
|
\pard \s4\qj\li1120\sa120 File name\: EMBLLIBDESCRP\par
|
|
File contents\:\par
|
|
\pard\plain \fi100\li980\sl220 \f4\fs16 A\tab STADTABL/EMBLdiv.lkp\par
|
|
B\tab /cdrom/indices/embl/entrynam.idx\par
|
|
C\tab /cdrom/indices/embl/acnum.trg\par
|
|
D\tab /cdrom/indices/embl/acnum.hit\par
|
|
E\tab /cdrom/indices/embl/brief.idx\par
|
|
F\tab /cdrom/indices/embl/freetext.trg\par
|
|
G\tab /cdrom/indices/embl/freetext.hit\par
|
|
H\tab /cdrom/indices/embl/author.trg\par
|
|
I\tab /cdrom/indices/embl/author.hit\par
|
|
\pard \li1120\sa300\sl220 \par
|
|
\pard\plain \s7\qj\fi-1120\li1120\sa120\sl280\tx1120 \f20 Level 3)\tab The individual library files. The contents of all files below Division_lookup are exactly as they appear on the cdrom. The Division_lookup file is rewritten so the directory structure and file names can be chosen locally. Its format, in Fortran notation, is I6,1x,A, i.e. a division number in a field six characters wide, one space, then the file name. \par
|
|
\pard\plain \s4\qj\sb300\sa180\sl280 \f20 The files staden.login and staden.profile, which define all the programs and standard data files used by the package, define the file SEQUENCELIBRARIES which contains the list of available libraries. As should be clear from the description above, the three levels need to be created (actually modified from the contents of the distribution tape) and all names can be changed locally, or set to be the same as those on the cdrom.\par
|
|
\pard\plain \s7\qj\fi-1120\li1120\sa120\sl280\tx1120 \f20 \par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Example of Division_lookup file \par
|
|
\pard \s4\qj\li1120\sa120\sl280 File name\: STADTABL/EMBLdiv.lkp\par
|
|
Contents\:\par
|
|
\pard\plain \li1120\sl220 \f4\fs16 1\tab /cdrom/embl/fun.dat\par
|
|
2\tab /cdrom/embl/inv.dat\par
|
|
3\tab /cdrom/embl/mam.dat\par
|
|
4\tab /cdrom/embl/org.dat\par
|
|
5\tab /cdrom/embl/phg.dat\par
|
|
6\tab /cdrom/embl/pln.dat\par
|
|
7\tab /cdrom/embl/pri.dat\par
|
|
8\tab /cdrom/embl/pro.dat\par
|
|
9\tab /cdrom/embl/rod.dat\par
|
|
10\tab /cdrom/embl/syn.dat\par
|
|
11\tab /cdrom/embl/una.dat\par
|
|
12\tab /cdrom/embl/vrl.dat\par
|
|
13\tab /cdrom/embl/vrt.dat\par
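\pard\plain \s4\qj\sa120\sl280 \f20 A Division_lookup file in the I6,1x,A layout can be written and read back as in the Python sketch below. It assumes the fixed column reading of that format (a right justified number in columns 1 to 6, a space, then the file name); the function names are hypothetical and only two of the divisions listed above are used.\par

```python
def format_division_line(number, path):
    """Write one record as a six column integer, one space, then the name."""
    return "%6d %s" % (number, path)


def parse_division_lookup(lines):
    """Return a mapping from division number to library file name."""
    divisions = dict()
    for line in lines:
        if not line.strip():
            continue
        number = int(line[0:6])   # the I6 field
        path = line[7:].strip()   # skip the 1x column, take the A field
        divisions[number] = path
    return divisions


records = [
    format_division_line(1, "/cdrom/embl/fun.dat"),
    format_division_line(13, "/cdrom/embl/vrt.dat"),
]
for record in records:
    print(record)
print(parse_division_lookup(records)[13])
```

\pard\plain \s4\qj\sa120\sl280 \f20 Changing the path part of each record is all that is needed to move the library files to a local disk.\par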
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 6.\tab Conventions Used In Text\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Obviously the programs can perform many more operations than there is space to describe but, in the selection of uses shown, we have tried to give some feel for the programs' scope. For this reason, and the need to conform as closely as possible to the format of the book, we have chosen specific paths through the programs, rather than attempt to describe all routes. For some sections, such as that on the facilities available for editing contigs, this has not been possible and we have instead described how the major commands are used. It should also be noted that the user interactions described in the methods sections are those that would be required if the options were selected in the "Execute with dialogue" mode. In practice many of the options would normally be used without any dialogue being required.\par
|
|
\pard \s4\qj\sa120\sl280
|
|
In the section on the user interface we outlined the different modes of obtaining input from users. Throughout the specific chapters we have adopted the following conventions to indicate which mode of input is being employed. When a program requests numerical or string input we have used the term "Define", as in Define "Minimum search score". When a program requests that a choice is made between several options, as in the case of radio buttons or check boxes, we have used the term "Select". When a program offers a choice between two options in the form of a yes or no answer, as in "Hide translation", we use the terms "Accept" or "Reject". When the digitizer program uses the stylus for input we have used the term "Hit".\par
|
|
\pard \s4\qj\sa120\sl280 Because it is difficult to produce figures including pull down menus and dialogue boxes, almost all examples containing user input are taken from the xterm interface. However the actual wording of the prompts is the same for both interfaces.\par
|
|
\pard \s4\qj\sa120\sl280
|
|
The programs contain routines for drawing scales on plots and for simple annotation, but in general such embellishment is not done automatically by the programs. This is because the programs are designed so that many plots can be superimposed, and it is better for the user to explicitly decide to add scales and annotation. More elaborate annotation can be added by saving the graphics output to files which can be handled by, say, Macintosh painting and drawing programs. None of the examples of graphical results shown in the following chapters has added scales\: all are exactly as drawn by the programs.\par
|
|
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \par
|
|
\par
|
|
\par
|
|
\par
|
|
7.\tab NOTES\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 7.1\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
Although all the programs in the Macintosh version of the package work, the conversion to this machine was never finished. The package does not provide access to the sequence libraries, handling only simple text files containing sequences, or those generated by the assembly program SAP. The user interface, although using pull down menus and dialogue boxes for all interactions, is not as "Mac like" as many would expect. However many people find this version very useful, and for others, the digitizer program alone makes the package worth having. Data input from a digitizer is a task suited to a machine like the Macintosh, and the data files can be transferred to a larger machine for assembly and other analysis. With the exception of sequence library access, all the options available in the 1990 VAX version are contained in the package (see Staden, 1990). We give no further details specific to the Macintosh version.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 8.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1990. An improved sequence handling package that runs on the Apple Macintosh. {\i Comput. Applic. Biosci.} {\b 4}, 387-393.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Staden, R. and Dear, S. 1992. Indexing the sequence libraries\: Software providing a common indexing system for all the standard sequence libraries. {\i DNA Sequence} {\b 3}, 99-105.\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 3. Sequence Input, Editing and Sequence Library Use\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 1.1\tab Introduction to sequence input\par
|
|
1.2 \tab Introduction to keyboard input\par
|
|
1.3\tab Introduction to input from digitizer\par
|
|
1.4\tab Introduction to editing single sequences\par
|
|
1.5\tab Introduction to using the sequence libraries\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Sequence input from keyboard\par
|
|
2.2\tab Sequence input from digitizer\par
|
|
2.3\tab Sequence input from the Pharmacia A.L.F.\par
|
|
2.4\tab Sequence input from the ABI 373A.\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.5\tab Editing a nucleic acid sequence using restriction sites and a translation and base numbering as landmarks.\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.6\tab Searching the freetext and author indexes of a sequence library\par
|
|
2.7\tab Using accession numbers to retrieve data from a sequence library\par
|
|
2.8\tab Displaying the annotations for an entry in a sequence library\par
|
|
2.9\tab Reading a sequence from a sequence library\par
|
|
2.10\tab Worked example of sequence library access\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
4.\tab References\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we describe sequence input and editing and the use of sequence libraries.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.1\tab Introduction to sequence input and editing\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The package contains facilities for input of sequence data from the keyboard, sonic digitizers, and ABI 373A and Pharmacia A.L.F. fluorescent sequencing machines. Editing of single sequences can be performed using system editors such as EDT on the VAX and EMACS on the SUN. Editing of sequence alignments is discussed in the chapter on managing sequencing projects.\par
|
|
\pard\plain \s6\sa60\sl280\pagebb\tx560\tx860 \b\f20 1.2\tab Introduction to keyboard input\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The program SAP contains an option to enter sequence at the keyboard. It also creates a file of file names and will list the sequences. Users may choose any 4 keys to represent the characters A, C, G and T. For example 4 adjacent keys in the same order as the lanes on a gel could be used. The program translates these symbols to A, C, G and T, and any other characters are left unchanged. No line of input should be longer than 80 characters. Terminate input with the symbol @.\par
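\pard\plain \s4\qj\sa120\sl280 \f20 The key translation described above amounts to a simple character mapping. The following Python sketch illustrates it and is not taken from SAP\: the function name translate_keys and the choice of the keys h, j, k and l are hypothetical.\par

```python
def translate_keys(typed, keys):
    """Translate typed characters to bases.

    keys is a string of four characters standing for A, C, G and T in
    that order. Other characters are left unchanged and input ends at
    the first @ symbol.
    """
    table = dict(zip(keys, "ACGT"))
    out = []
    for ch in typed:
        if ch == "@":
            break
        out.append(table.get(ch, ch))
    return "".join(out)


# Four adjacent keys chosen to mirror the lane order on a gel.
print(translate_keys("hjklh-k@", "hjkl"))
```

\pard\plain \s4\qj\sa120\sl280 \f20 Uncertainty codes and other symbols pass through untouched because only the four chosen keys appear in the table.\par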
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.3\tab Introduction to input from digitizer\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Digitizers provide a convenient way of entering sequences from films into a computer. The digitizer, which is connected directly to the computer, operates on a light box, and is controlled by a program named GIP (1). The film to be read is taped firmly to the surface of the light box, and the user defines the lane order and the centres of the four lanes to be read. These positions are defined at the point where reading will commence and the program adjusts their values as the film is read. The user reads the sequence and transfers it to the computer by hitting the centres of the bands progressing up the film. Any number of sets of lanes and films can be read in a single run of the program. Each sequence is stored in a separate file and a file of file names is also written. The program also uses a menu, which is a series of reserved areas of the light box surface, for entering commands and uncertainty codes. When the pen is pressed in these areas the program responds accordingly. Each time the pen tip is depressed in the digitizing area the program sounds the bell on the terminal to indicate to the user that a point has been recorded. As the sequence is read the program displays it on the screen.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.4\tab Introduction to editing single sequences\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The editing method used by the programs is designed to give users access to an editor with which they are familiar, i.e. the one on their machine (say EDT on a VAX or EMACS on a UNIX system), and yet to allow them to edit a sequence listing which contains all the landmarks they need in order to know where they are. Users can create a file containing a simple listing of the sequence (single stranded) with numbering, using "List the sequence", and then edit it with their system editor, using the numbering to know where they are within the sequence. When the edits are complete they exit from the editor and the program "analyses" the edited file to extract only the sequence characters. Similarly a file containing a three phase translation, or a file containing a sequence plus its three phase translation, plus its restriction sites marked above the sequence (see figure 3.1), can be edited. In order to be able to "analyse" such complicated listings and correctly extract the sequence the following simple rule is used\: all lines in the file that contain a character that is not A, C, T, G or U are deleted. It is obviously important to be aware of this rule and its implications. For protein sequences only a simple listing, i.e. the sequence plus numbering, can be used.\par
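\pard\plain \s4\qj\sa120\sl280 \f20 The extraction rule can be illustrated with a short Python sketch. It is one possible reading of the rule and not the programs' own routine\: the function name extract_sequence is hypothetical, and the sketch assumes that spaces are ignored when a line is tested and that both uppercase and lowercase bases are accepted.\par

```python
def extract_sequence(lines):
    """Keep only the lines made up entirely of A, C, G, T or U."""
    allowed = set("ACGTUacgtu")
    kept = []
    for line in lines:
        chars = line.replace(" ", "").strip()
        # A line with any other character (digits, enzyme names,
        # amino acid symbols, dots) is deleted as a whole.
        if chars and all(ch in allowed for ch in chars):
            kept.append(chars)
    return "".join(kept)


listing = [
    "  MspI   MseI",
    "  .      .HincII",
    "ccggttagactgttaac",
    "        10",
    " P  V  R  L  L ",
]
print(extract_sequence(listing))
```

\pard\plain \s4\qj\sa120\sl280 \f20 One implication is visible in the sketch\: a line the user adds while editing survives only if it contains nothing but base symbols, and under this reading a translation line made up solely of the residues A, C, G and T would be kept as if it were sequence.\par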
|
|
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 1.5\tab Introduction to using the sequence libraries\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The installation of the sequence libraries is described in the introductory chapter. Direct access to the libraries is provided by all programs that need such a facility\: it is not performed by separate programs. The facilities currently offered in NIP, PIP, SIP, NIPL, PIPL, and SIPL include the following\:\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Get a sequence by knowing its entry name\par
|
|
\tab Get a sequence's annotation by knowing its entry name\par
|
|
\tab Get an entry name by knowing its accession number\par
|
|
\pard\plain \li1120\ri1240\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 HapII\par
|
|
\pard \li1120\ri1240\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth HpaII\par
|
|
MspI MseI\par
|
|
. .HincII\par
|
|
. .HindII\par
|
|
. .HpaI DsaV\par
|
|
. .. EcoRII\par
|
|
. .. TspAI\par
|
|
. .. . ApyI\par
|
|
. .. . BstNI\par
|
|
. .. . MvaI\par
|
|
. .. . ScrFI MaeIII\par
|
|
. .. . . . BsrI MseI\par
|
|
ccggttagactgttaacaacaaccaggttttctactgatataactggttacatttaacgc\par
|
|
10 20 30 40 50 60\par
|
|
P V R L L T T T R F S T D I T G Y I * R\par
|
|
R L D C * Q Q P G F L L I * L V T F N A\par
|
|
\pard \li1120\ri1240\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth G * T V N N N Q V F Y * Y N W L H L T P\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb80\sa120\sl240\tx1140 \f21\fs20 Figure 3.1\tab The first page width of a sequence display that can be edited by the program.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sb360\sa120\sl280\tx560 \f20 \tab Search the author index for author names\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab Search the freetext index for keywords\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The facilities currently offered in NIPL, PIPL and SIPL include\:\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Search whole library\par
|
|
\tab Search only a list of entry names\par
|
|
\tab Search all but a list of entry names\par
|
|
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Sequence input from keyboard\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Type in gel readings".\par
|
|
2.\tab Accept "Use special keys for A,C,T,G".\par
|
|
3.\tab Define the keys in turn.\par
|
|
4.\tab Define "File of file names". This is a file of file names, written so that the readings can be processed as a batch.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define the sequence by typing it in using the selected keys. Finish by typing an @ symbol.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "File name for this gel reading". This is the name for the sequence just entered.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Accept "Type in another reading". This cycles round to step 5. If rejected the next step follows.\par
|
|
8.\tab Accept "List gel readings". The batch of readings entered will each be listed, one after the other, headed by their file names, on the screen.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Sequence input from digitizer\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Tape the autoradiograph down securely on the light box.\par
|
|
2.\tab Start the program (GIP).\par
|
|
3.\tab Define "File of file names".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Using the digitizer pen hit the digitizer menu ORIGIN, program menu ORIGIN, program menu START.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab After the bell has sounded the program will give the default lane order. \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab If correct hit CONFIRM otherwise hit RESET. To reset the lane order hit the A,C,G,T boxes in the menu in left to right order.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Hit START, then hit, in left to right order and at a height level with the first band to be read, the start positions for the next four lanes. The program will report the mean lane separations and ask for confirmation that they are correct.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Hit START.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Hit the bands on the film in sequence order. If necessary use the uncertainty codes in the program menu. Continue until the sequence is finished.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Hit STOP.\par
|
|
10.\tab Define "Name for this reading".\par
|
|
11.\tab Accept "Read another sequence". Otherwise the program will stop.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Sequence input from the Pharmacia A.L.F.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 After processing and base calling on the PC the data for all 10 clones is contained in a single file, and the user names each clone using local conventions. This single file is then transferred to the SUN using PC-NFS. This program allows SUN directories to be mounted as if they were DOS disks, and data can be transferred by use of the DOS copy command. On the SUN, to prepare for processing by program XBAP, the 10 clones are split into 10 separate files, each with the name given on the PC. In addition a file of file names is written. Then the reads for the individual clones need to be examined to clip off the vector sequence and the poor data at the 5' end. See note 2.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Sequence input from the ABI 373A.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 After processing and base calling on the Macintosh the data for each clone is contained in 2 files\: one is simply the sequence but the main file contains the raw data, trace data and sequence. For our processing we do not use the sequence file as we can extract all we need from the main file. The user names each file using local conventions and then the folder is transferred to the SUN using TOPS. This program allows SUN directories to be mounted as if they were on the Macintosh and data can be transferred by simply dragging folders on the Macintosh screen. On the SUN, to prepare for processing by program XBAP, a file of file names is written and the reads for the individual clones are examined to clip off the vector sequence and the poor data at the 5' end. See note 2.\par
|
|
\pard\plain \s6\fi-560\li560\sb240\sa120\sl280\tx560\tx980 \b\f20 2.5\tab Editing a nucleic acid sequence using restriction sites and a translation and base numbering as landmarks.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select NIP.\par
|
|
2.\tab Read in the sequence to be edited.\par
|
|
3.\tab Direct output to disk, say creating file edit.seq.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Use the restriction enzyme site search routine (See the relevant chapter) to create a file showing "Names above the sequence", as in figure 3.1.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Close the redirection file.\par
|
|
6.\tab Select "Edit the sequence". \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Name of file to edit". This is the file containing the sequence listing, say edit.seq. The system editor will start up.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Edit the sequence.\par
|
|
9.\tab Exit from the editor.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Accept "Make edited sequence active". The edited sequence will replace the original sequence. \par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Searching the freetext (or author) index of a sequence library\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Sequence library". The alternative is "Personal file", and if taken would be followed by questions about which of the formats "Staden, EMBL, GenBank, PIR, GCG or FASTA" it was stored in.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select, say, "EMBL nucleotide library".\par
|
|
4.\tab Select "Search text index for keywords".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Keywords". Type up to 5 keywords separated by spaces, i.e. space is the delimiting character (see note below about author searches).\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab The search will start and for each match the program will display the contents of the matching line, which includes the entry name, primary accession number, its length and an 80 character description. After every 20 matches the program will ring the bell and the user can escape by typing "!".\par
|
|
\tab The commands for searching the author index are effectively the same. Note that for authors it is useful to be able to link words together for names such as De Gaule or von Meyenberg. The symbol underscore (_) can be used for this purpose - e.g. De_Gaule or von_meyenberg. The same facility is available for the keyword searches.\par
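\pard\plain \s4\qj\sa120\sl280 \f20 The keyword rules above (space as delimiter, at most 5 keywords, underscore to link the parts of one name) can be illustrated with a minimal Python sketch. It is not the program's own code. The function name parse_query is hypothetical, and folding to upper case and turning the underscore back into a space are assumptions about how the index stores words.\par

```python
def parse_query(text, max_keywords=5):
    # Space is the delimiting character between keywords.
    words = text.split()
    if len(words) > max_keywords:
        raise ValueError("at most %d keywords are allowed" % max_keywords)
    # An underscore links the parts of one name. Restoring the space and
    # folding to upper case are assumptions made for this illustration.
    return [w.replace("_", " ").upper() for w in words]


print(parse_query("von_meyenberg sanger"))  # ['VON MEYENBERG', 'SANGER']
```
\par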
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.7\tab Using accession numbers to retrieve data from a sequence library\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par
|
|
2.\tab Select "Sequence library".\par
|
|
3.\tab Select, say, "EMBL nucleotide library".\par
|
|
4.\tab Select "Get entry names from accession numbers".\par
|
|
5.\tab Define "Accession number". \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab The program will display the entry names corresponding to the accession number. The last entry name found will become the default entry name.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.8\tab Displaying the annotations for an entry in a sequence library\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par
|
|
2.\tab Select "Sequence library".\par
|
|
3.\tab Select, say, "EMBL nucleotide library".\par
|
|
4.\tab Select "Get annotations".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Entry name". The program will display the annotation for the entry. After every 20 lines the program will ring the bell and the user can escape by typing "!".\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.9\tab Reading a sequence from a sequence library\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Read new sequence".\par
|
|
2.\tab Select "Sequence library".\par
|
|
3.\tab Select, say, "EMBL nucleotide library".\par
|
|
4.\tab Select "Get a sequence".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Entry name". The program will make the sequence the active sequence and display its base composition.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.10\tab Worked example of sequence library access\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The worked example in figure 3.2 shows a search of the text index for the keywords p53 and mouse, followed by a search of the author index for the names sanger and coulson, followed by a search on accession number v00636, followed by "Get annotations" for entry lambda, and finally "Get a sequence" for entry lambda. \par
|
|
\pard\plain \sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 {\f22\fs18 Select sequence source\par
|
|
}\pard \sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 X 1 Personal file\par
|
|
2 Sequence library\par
|
|
? Selection (1-2) (1) =2\par
|
|
Select a library\par
|
|
X 1 EMBL 29 nucleotide library Dec 91\par
|
|
2 SWISSPROT 20 protein library Nov 91\par
|
|
3 PIR 31 protein library Dec 91\par
|
|
4 NRL3D 58 From Brookhaven protein library Dec 91\par
|
|
5 GenBank example\par
|
|
? Selection (1-5) (1) =\par
|
|
Library is in EMBL format with indexes\par
|
|
Select a task\par
|
|
X 1 Get a sequence\par
|
|
2 Get annotations\par
|
|
3 Get entry names from accession numbers\par
|
|
4 Search author index\par
|
|
5 Search text index for keywords\par
|
|
? Selection (1-5) (1) =5\par
|
|
Search for keywords\par
|
|
? Keywords=p53 mouse\par
|
|
P53 hits 73\par
|
|
MOUSE hits 10140\par
|
|
\'00\par
|
|
MMANT01 X00875 536 Murine gene fragment for cellular tumour antigen\par
|
|
MMANT02 X00876 83 Murine gene fragment for cellular tumour antigen\par
|
|
MMANT03 X00877 21 Murine gene fragment for cellular tumour antigen\par
|
|
MMANT04 X00878 261 Murine gene fragment for cellular tumour antigen\par
|
|
MMANT05 X00879 184 Murine gene fragment for cellular tumour antigen\par
|
|
MMANT06 X00880 113 Murine gene fragment for cellular tumour antigen\par
|
|
MMANT07 X00881 110 Murine gene fragment for cellular tumour antigen\par
|
|
MMANT08 X00882 137 Murine gene fragment for cellular tumour antigen\par
|
|
}\pard \sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 MMANT09 X00883 74 Murine gene fragment for cellular tumour antigen\par
|
|
MMANT10 X00884 107 Murine gene for cellular tumour antigen p53 (exon\par
|
|
MMANT11 X00885 562 Murine p53 gene 3' region with exon 11\par
|
|
MMANTP53 M26862 536 Mouse tumor antigen p53 gene, 5' end.\par
|
|
MMLYN M64608 2044 Mouse lyn protein mRNA, complete cds.\par
|
|
MMP53 X00741 1377 Mouse mRNA for transformation associated protein\par
|
|
MMP53A M13872 1285 Mouse p53 mRNA, complete cds, clone pcD53.\par
|
|
MMP53B M13873 1241 Mouse p53 mRNA, complete cds, clone p53-m11.\par
|
|
MMP53C M13874 1322 Mouse p53 mRNA, complete cds, clone p53-m8.\par
|
|
MMP53G1 X01235 554 Mouse genomic DNA for 5' region of cellular tumou\par
|
|
MMP53IN4 X60470 729 M.musculus p53 gene for p53 protein, intron 4\par
|
|
\'00\par
|
|
MMP53P X01236 2132 Mouse pseudogene for cellular tumour antigen p53\par
|
|
MMP53R X01237 1773 Mouse mRNA for cellular tumour antigen p53\par
|
|
MMRSB2P5 M64597 196 Mouse B2 repeat in the 3' flank of protein 53 (p5\par
|
|
MMSFFV1 X64656 165 M.musculus Friend spleen focus forming virus (SFF\par
|
|
MMSFFV2 X64657 142 M.musculus Friend spleen focus forming virus (SFF\par
|
|
24 different entries found\par
|
|
\'00\par
|
|
Select a task\par
|
|
X 1 Get a sequence\par
|
|
2 Get annotations\par
|
|
3 Get entry names from accession numbers\par
|
|
4 Search author index\par
|
|
5 Search text index for keywords\par
|
|
? Selection (1-5) (1) =4\par
|
|
Search for keywords\par
|
|
? Keywords=coulson sanger\par
|
|
COULSON hits 935\par
|
|
SANGER hits 15\par
|
|
\'00\par
|
|
LAMBDA V00636 48502 Genome of the bacteriophage lambda (Styloviridae)\par
|
|
MIBTXX V00654 16338 Complete bovine mitochondrial genome.\par
|
|
MIHSCG J01415 16569 Human mitochondrion, complete genome.\par
|
|
MIHSM1 M10546 2771 Human mitochondrial DNA, fragment M1, encoding tr\par
|
|
MIHSXX V00662 16569 H.sapiens mitochondrial genome\par
|
|
MIPX1C01 M10860 130 Bacteriophage phi-X174, nucleotides 3920-4049.\par
|
|
MIPX1C02 M10861 115 Bacteriophage phi-X174, nucleotides 3480-3595.\par
|
|
MIPX1C03 M10862 121 Bacteriophage phi-X174, nucleotides 4260-4380.\par
|
|
MIPX1CTI M10849 130 Bacteriophage phi-X174, nucleotides 3389-3520.\par
|
|
PHIX174 V01128 5386 Bacteriophage phi-X174 (cs70 mutation) complete g\par
|
|
R17CPRAA M24826 61 Bacteriophage R17 coat protein RNA fragment.\par
|
|
11 different entries found\par
|
|
\'00\par
|
|
Select a task\par
|
|
X 1 Get a sequence\par
|
|
2 Get annotations\par
|
|
3 Get entry names from accession numbers\par
|
|
4 Search author index\par
|
|
5 Search text index for keywords\par
|
|
? Selection (1-5) (1) =3\par
|
|
? Accession number=v00636\par
|
|
Entry name LAMBDA\par
|
|
Select a task\par
|
|
X 1 Get a sequence\par
|
|
2 Get annotations\par
|
|
3 Get entry names from accession numbers\par
|
|
4 Search author index\par
|
|
5 Search text index for keywords\par
|
|
? Selection (1-5) (1) =2\par
|
|
Default Entry name=LAMBDA\par
|
|
? Entry name=\par
|
|
ID LAMBDA standard; DNA; PHG; 48502 BP.\par
|
|
}\pard \sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 XX\par
|
|
AC V00636; J02459; M17233; X00906;\par
|
|
XX\par
|
|
DT 09-JUN-1982 (Rel. 01, Created)\par
|
|
DT 03-JUL-1991 (Rel. 28, Last updated, Version 3)\par
|
|
XX\par
|
|
DE Genome of the bacteriophage lambda (Styloviridae).\par
|
|
XX\par
|
|
KW circular; coat protein; DNA binding protein; genome;\par
|
|
KW origin of replication.\par
|
|
XX\par
|
|
OS Bacteriophage lambda\par
|
|
OC Viridae; ds-DNA nonenveloped viruses; Siphoviridae.\par
|
|
XX\par
|
|
RN [1]\par
|
|
RP 1-48502\par
|
|
RA Sanger F., Coulson A.R., Hong G.F., Hill D.F., Petersen G.B.;\par
|
|
RT "Nucleotide sequence of bacteriophage lambda DNA";\par
|
|
RL J. Mol. Biol. 162\:729-773(1982).\par
|
|
XX\par
|
|
\'00\par
|
|
Select a task\par
|
|
X 1 Get a sequence\par
|
|
2 Get annotations\par
|
|
3 Get entry names from accession numbers\par
|
|
4 Search author index\par
|
|
5 Search text index for keywords\par
|
|
? Selection (1-5) (1) =\par
|
|
Default Entry name=LAMBDA\par
|
|
? Entry name=\par
|
|
DE Genome of the bacteriophage lambda (Styloviridae).\par
|
|
Sequence length 48502\par
|
|
Sequence composition\par
|
|
T C A G -\par
|
|
11988. 11360. 12336. 12818. 0.\par
|
|
}\pard \sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth {\f22\fs18 24.7% 23.4% 25.4% 26.4% 0.0%\par
|
|
}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 3.2\tab A worked example of sequence library use.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab NOTES\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab The program menu for GIP is simply a set of boxes drawn on the digitizing surface that each contain a command or uncertainty code. Right handed users will find it is best to position the menu to the right of the digitizing area, but in practice as long as its top edge is parallel to the digitizer box, it can be put anywhere in the active region. As well as the codes a,c,g,t,1,2,3,4,b,d,h,v,r,y,x,-,5,6,7,8 the following commands are included in the menu\: DELETE removes the last character from the sequence; RESET allows the lane centres to be redefined; START means begin the next stage of the procedure; STOP means stop the current stage in the procedure; CONFIRM means confirm that the last command or set of coordinates is correct. \par
|
|
\tab The digitizing device also has a menu of its own. This lies in a two inch wide strip immediately in front of the digitizing box. Pen positions within this two inch strip are interpreted as commands to the digitizer and are not sent to the GIP program. In general the only time users will need to use the device menu is when they tell GIP where the program menu lies in the digitizing area. This is done by first hitting ORIGIN in the device menu and then hitting the bottom left hand corner of the program menu. If the bell does not sound after hitting START, try hitting METRIC in the device menu (the program uses metric units, and some digitizers are set to default to use inches; hitting METRIC switches between the two).\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The user should try to hit the bands as near as possible to the centre of the lanes because the program tracks the lanes up the film using the pen positions. If the lane centres get too close the program stops responding to the pen positions of bands and hence does not ring the bell. If this occurs users must hit the RESET box in the menu and the program will request them to redefine the lane centres at the current reading position. Then they can continue reading. As a further safeguard the program will only respond to pen positions either in the menu or very close to the current reading position.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Details about preparing the data from fluorescent sequencing machines for processing by XBAP are contained in the notes for the chapter on managing sequencing projects. \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab All of the operations described for the EMBL nucleotide library can be performed in exactly the same way for GenBank and the SWISSPROT and PIR protein libraries. For keyword searching the freetext index is most useful because it contains all words in feature tables, definition lines, title lines, keywords and comment lines. The searches are very fast. The search will find all words that start with the given keywords\: e.g. keyword sugar will match with sugar, sugaractivating, sugars, etc. When several keywords are used together, only entries indexed on all the words will be reported. On the VAX, EMBL, GenBank, SWISSPROT and PIR can all be processed. \par
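\pard\plain \s4\qj\sa120\sl280 \f20 The matching rule in note 3 (each keyword matches any indexed word that starts with it, and an entry is reported only if every keyword matches) can be sketched in Python as follows. This is an illustration only. The in-memory list of entries, the entry names and the function name search are hypothetical stand-ins for the library's freetext index, whose real layout is not described here.\par

```python
def search(index, keywords):
    # index: list of (entry_name, indexed_words) pairs, a hypothetical
    # stand-in for the freetext index of a sequence library.
    hits = []
    for name, words in index:
        # Every keyword must be a prefix of at least one indexed word.
        if all(any(w.startswith(k.upper()) for w in words) for k in keywords):
            hits.append(name)
    return hits


index = [
    ("ENTRY1", ["SUGAR", "TRANSPORT"]),
    ("ENTRY2", ["SUGARS", "KINASE"]),
    ("ENTRY3", ["KINASE"]),
]
print(search(index, ["sugar"]))            # ['ENTRY1', 'ENTRY2']
print(search(index, ["sugar", "kinase"]))  # ['ENTRY2']
```
\par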
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1984. A computer program to enter DNA gel reading data into a computer. {\i Nucl. Acids Res}. {\b 12}, 499-503.\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 4. Managing Sequencing Projects\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Starting a project database\par
|
|
2.2\tab Screening against restriction enzyme recognition sequences\par
|
|
2.3 \tab Screening against vector sequences\par
|
|
2.4 \tab Entering readings into the project database (assembly)\par
|
|
2.5\tab Searching for internal joins\par
|
|
2.6\tab Editing in XBAP\par
|
|
2.7\tab Joining contigs interactively in XBAP\par
|
|
2.8\tab Selecting primers and templates\par
|
|
2.9\tab Examining the quality of a consensus\par
|
|
2.10\tab Using graphical displays to examine contigs\par
|
|
2.11\tab Disassembling contigs\par
|
|
2.12\tab Shuffling pads\par
|
|
2.13\tab Displaying a contig\par
|
|
2.14\tab Highlighting differences between readings and the consensus\par
|
|
2.15\tab Screen editing contigs in SAP\par
|
|
2.16\tab Automatic editing in SAP\par
|
|
2.17\tab Using the original editor in SAP\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
4.\tab References\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Data input, assembly, checking and editing are the major tasks of sequence project management. Data input is described in a previous chapter and here we cover everything else. The programs can deal with data derived from autoradiographs, from automated gel reading machines such as the Applied Biosystems 373A and the Pharmacia A.L.F., and from film readers such as the Amersham scanner.\par
|
|
\pard \s4\qj\sa120\sl280 We describe two alternative programs for managing sequencing projects. They contain the same assembly and vector screening routines but they differ in their editing methods. One program, SAP (see references 1 and 2), can be operated from simple terminals and emulators, but the other, XBAP (3), requires an X terminal or emulator. XBAP contains a superior editor plus the facility to annotate sequences and display the coloured traces for data derived from fluorescent sequencing machines. Those using autoradiographs will find that SAP is adequate but XBAP is essential for users of fluorescent sequencing machines. Readers should note that several of the methods for displaying contigs described below are probably of value only to those unable to use the screen based contig editor in XBAP.\par
|
|
\pard \s4\qj\sa120\sl280 Fluorescent sequencing machines provide machine readable data. This means, given appropriate software, that while making editing decisions the user can see, displayed on the screen, the coloured traces used to derive the sequence. However data from these machines requires some extra processing. First the machines tend to produce long sequences with poor quality at their 3' ends and so we have to decide how much of the data to use. Secondly the sequencing machine does not recognise the primer region (as the user would) so we need to have some way of removing it from the data. The poor quality data from both ends of the sequence and the vector sequences are identified non-interactively by programs clip-seqs and vep. Alternatively these tasks can be performed interactively using program TED (4). We term the data from the 3' end of a reading that is not employed in the assembly process "unused" sequence. Note that we do not lose this data but simply ignore it until such time as it can be useful for locating joins between contigs, or for double stranding regions of the sequence.\par
|
|
\pard \s4\qj\sa120\sl280 The method described here uses a database to store all the data for each sequencing project. The individual sequence readings derived from autoradiographs or from sequencing machines are initially stored in separate files but the program copies them into the database during the assembly process. For normal operation the program handles batches of readings - say 24 from a film or machine run. Batch processing is achieved by use of files of file names. \par
|
|
\pard \s4\qj\sa120\sl280 Depending on the strategy employed and the stage of the project the following operations may be performed.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sb100\sa120\sl280\tx560 \f20 1)\tab Start a project database.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2)\tab Select primers and templates.\par
|
|
3)\tab Obtain readings.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4)\tab Put individual readings into the computer and write a file of file names. For data derived from fluorescent sequencing machines choose which data from
|
|
the 3' end of the reading should not be used for the assembly process.\par
|
|
5)\tab Screen the batch against any vectors that may be present, excising any vector sequence found and passing the names of those readings that contain some non-vector sequence on to the next step.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6)\tab Screen the batch against any restriction sites whose presence would indicate a problem, passing those that do not match on to the next step.\par
|
|
7)\tab Compare each reading in the batch with the current contents of the project database adding them to the contigs they overlap, joining contigs or starting new contigs.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8)\tab
|
|
Check the number of contigs and the quality of the consensus sequence and plan further experiments. Try to join contigs by searching for overlaps between their ends. (This is particularly useful for those using data from fluorescent sequencing machines,
|
|
where although the 3' end of the sequence is not good enough for automatic assembly, it can be valuable for finding overlaps between contigs).\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9)\tab Edit the contigs to resolve disagreements.\par
|
|
10)\tab Produce a consensus sequence.\par
|
|
11)\tab Analyse the consensus sequence, possibly discovering further errors.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Subsets of these operations will be cycled through repeatedly. A pure shotgun strategy would continue using steps 3-7, whereas a pure primer walking strategy would also include step 2. A number of the steps require almost no user intervention; however, checking quality and final editing decisions are still interactive procedures. The program contains several options, such as displays of the overlapping readings in a contig, to help indicate not only the poorly determined regions but also which clones could be resequenced to resolve ambiguities, or which can usefully be extended or sequenced in the reverse direction to cover difficult regions. It is best to use a command procedure or script for handling steps 5-7.\par
|
|
\pard \s4\qj\sa120\sl280 For our projects we have a script which users employ by typing "assemble filename", where filename is the file of file names for the current batch of readings. This script calls all the necessary options in SAP or BAP (see notes) in order to make a backup of the database, screen against any vectors, assemble readings and print a report. In the text below we describe how these operations are performed interactively. \par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Starting a project database\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The assembled data for each project is stored in a database. At the beginning of a project it is necessary to create an empty database using program SAP or XBAP.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Open database"\par
|
|
2.\tab Select "Start new database"\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define the database name. Database names can have from one to 12 letters and must not include full stop (.). \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Database is for DNA"\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab
|
|
Define "Database size". This is an initial size and if necessary can be increased later using "Copy database". Roughly speaking it is the number of readings expected to be needed to complete the project. Currently BAP limits the maximum to 8000 and SAP
|
|
has a limit of 1000.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Maximum reading length". This is the length of the longest reading that will be added to the database. The minimum is 512 bases, and the maximum 4096.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The program should confirm that "copy 0" of the database has been started. See Note 14 for important information.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Screening against restriction enzyme recognition sequences\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 For some strategies it is necessary to compare readings against any restriction enzyme recognition sequences that may have been used during cloning and which should not be present in the data. The function operates on single readings or processes batches accessed through files of file names. The algorithm looks for exact matches to recognition sequences. The recognition sequences should be stored in a simple text file with one recognition sequence per record.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Accept "Use file of filenames".\par
|
|
2.\tab Define "File of gel reading names". The input file of file names.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
Define "File for names of sequences that pass". A file of file names for those readings that do not contain the recognition sequences. After the run it will contain the names of all the files in the batch that do not match any
|
|
of the restriction enzyme recognition sequences. Hence it can be used for further processing of the batch.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "File name of recognition sequences". The name of the file of recognition sequences.\par
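\pard\plain \s4\qj\sa120\sl280 \f20 The screening logic of this section (exact matching against each recognition sequence, with only the names of clean readings passed on) can be sketched in Python as follows. This is a minimal illustration and not the code used by SAP or XBAP. The function names, the in-memory list standing in for the file of file names, the upper case comparison and the two example recognition sequences are all assumptions.\par

```python
def passes_screen(reading, recognition_sequences):
    # A reading passes when it contains none of the recognition sequences.
    # Matching is exact; comparing in upper case is an assumption.
    reading = reading.upper()
    return not any(site.upper() in reading for site in recognition_sequences)


def screen_batch(readings, recognition_sequences):
    # readings: list of (name, sequence) pairs standing in for the files
    # listed in the input file of file names. Returns the names that pass,
    # i.e. the contents of the file for names of sequences that pass.
    return [name for name, seq in readings
            if passes_screen(seq, recognition_sequences)]


sites = ["GAATTC", "CCCGGG"]  # illustrative recognition sequences
batch = [("read1", "ttgacgaattcgga"), ("read2", "ttgacgtacgga")]
print(screen_batch(batch, sites))  # ['read2']
```
\par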
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Screening against vector sequences\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 For most strategies it is necessary to compare readings against any vector sequences that may have been picked up during cloning. The package contains two routines for screening against vectors. The original function simply reports any matches between the readings and the vector sequences and only passes on those that do not match. This function should now only be used to screen for any other sequences that should be excluded from the database, because the newer one (program name VEP for vector excising program) is capable of both finding the vector sequences and editing them out automatically. \par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.3.1\tab Clipping off vector sequences\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 There are two types of vector that may need to be screened out of gel readings\: the sequencing vector and, for cases where, say, whole cosmids have been shotgunned, the cloning vector. The two tasks are different. When screening out the sequencing vector we may expect to find data to exclude, both from the primer region and from the other side of the cloning site (when, for example, the insert is short). When screening out cosmid vector we may find that either the 5' end, or the 3' end, or the whole of the sequence is vector. Also for the cosmid search we need to compare both strands of the sequence. The program (VEP) works slightly differently for each of the two cases. Having read the vector sequence from a file the program asks for the "Position of the cloning site". A value of zero signifies that the search will be for the cosmid vector. A nonzero value signifies that the search is for the sequencing vector, and so in this case the program then asks for the "Relative position of the primer site". A negative relative position signifies that a reverse primer is being used, otherwise a forward primer is assumed.\par
|
|
\pard \s4\qj\sa120\sl280 The program screens a batch of readings using a file of file names and creates a new file of file names which contains the names of all those sequences that include some nonvector sequence. For each sequence that contains some vector it writes out a new copy of the file in which the vector portion is identified.\par
|
|
\pard \s4\qj\sa120\sl280 The search, which uses a hashing algorithm, is very rapid. Users specify a "Word length", the "Number of diagonals to combine" and a "Minimum score". The word length is the minimum number of consecutive bases that will count as a match. The algorithm treats the problem like a dot matrix comparison and finds the diagonal with the highest score. Then it adds the scores for the adjacent diagonals, up to the "Number of diagonals to combine". If the combined score is at least the "Minimum score" the sequence is marked to indicate that it contains vector. The score represents the proportion of a diagonal that contains matching words, so the maximum score for any diagonal is 1.0.\par
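\pard\plain \s4\qj\sa120\sl280 \f20 The diagonal scoring idea can be sketched in Python as follows. This is not the VEP source and should be read as one possible interpretation. Assumptions made here: the diagonals to combine are centred on the best diagonal, and the combined score pools the matched positions and lengths of those diagonals so that it remains a proportion between 0 and 1. All function names and the example vector sequence are invented for the illustration, and only one strand is compared.\par

```python
from collections import defaultdict


def covered_positions(reading, vector, word_length):
    # Hashing step: index every word of the vector by its start positions.
    starts = defaultdict(list)
    for j in range(len(vector) - word_length + 1):
        starts[vector[j:j + word_length]].append(j)
    # For each diagonal d = i - j, record the reading positions that lie
    # inside at least one matching word.
    covered = defaultdict(set)
    for i in range(len(reading) - word_length + 1):
        for j in starts.get(reading[i:i + word_length], []):
            covered[i - j].update(range(i, i + word_length))
    return covered


def diagonal_length(d, n_reading, n_vector):
    # Number of cells on diagonal d of the dot matrix.
    return min(n_reading, n_vector + d) - max(0, d)


def vector_score(reading, vector, word_length, n_diagonals):
    covered = covered_positions(reading, vector, word_length)
    if not covered:
        return 0.0

    def score(d):
        # Proportion of the diagonal that contains matching words.
        return len(covered[d]) / diagonal_length(d, len(reading), len(vector))

    best = max(covered, key=score)
    half = n_diagonals // 2  # assumption: diagonals centred on the best one
    hits = 0
    cells = 0
    for d in range(best - half, best + half + 1):
        length = diagonal_length(d, len(reading), len(vector))
        if length > 0:
            hits += len(covered.get(d, ()))
            cells += length
    return hits / cells


vector = "ACGTTGCAAGCTTGGCACTGGCCGTCGTTTTAC"  # made-up sequence
reading = vector[5:25]                        # a reading that is all vector
print(vector_score(reading, vector, 4, 1))    # 1.0
```

\pard\plain \s4\qj\sa120\sl280 \f20 A reading would then be marked as containing vector when the returned value is at least the chosen minimum score.\par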
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Define "Input file of file names". This is the file containing the names of all the readings to be screened.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "File name of vector sequence". \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Position of cloning site". This is the base number, relative to the beginning of the vector sequence, that is on the 3' side of the insert site. For example for m13mp18 the SmaI site is at 6249. A zero value signifies that the search is for cosmid vector.\par
|
|
4.\tab Define "Relative position of 3' end of primer site". This is the position, relative to the cloning site, of the first base that could be included in the sequence. For m13mp18, the 17mer Sequencing Primer and the SmaI site, the position is 41.
|
|
\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Word length". Only words of this length will be counted as matches.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Number of diagonals to combine". The scores for this number of diagonals around the highest scoring diagonal will be combined to give the total score.\par
|
|
7. \tab Define "Cutoff score". For a match, at least this proportion of the total length of the summed diagonals must contain identical words. \par
|
|
8.\tab Define "Output file of passed file names". The name of the file to contain the names of the readings to pass on to the assembly program.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Processing will commence and finishes with a summary stating the number of files processed, the number completely vector, the number partly vector and the number free of vector.\par
|
|
\pard\plain \s9\fi-560\li860\sb160\sa60\sl280\tx1140 \b\f20 2.3.2\tab Screening for "vectors"\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This function is contained in both SAP and XBAP and operates on single readings or processes batches accessed through files of file names. The algorithm looks for exact matches of length "minimum match length" and displays the overlapping sequences.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Accept "Use file of filenames".\par
|
|
2.\tab Define "File of gel reading names". The input file of file names.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
Define "File for names of sequences that pass". A file of file names for those readings that do not contain the vector sequence. After the run it will contain the names of all the files in the batch that do not match the vector sequence. Hence it can be
|
|
used for further processing of the batch.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "File name of vector sequence". The name of the file containing the vector sequence.\par
|
|
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Entering readings into the project database (Assembly)\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Readings are entered into the database using the auto assemble function. This function compares each reading and its complement with a consensus of all the readings already stored in the database. If it finds any overlaps it aligns the overlapping sequences by inserting padding characters, and then adds the new reading to the database. Readings that overlap are added to existing contigs and readings that do not overlap any data in the database start new contigs. If a new reading overlaps two contigs they are joined. Any readings that appear to overlap but which cannot be aligned sufficiently well are not entered and have their names written to a file of failed gel reading names. Note that it is possible that a reading may align well with two contigs (indicating a possible join) but that after it has been added to one of the contigs, the two contigs do not align sufficiently well. In this case, although the reading has been entered into the database its name will also be added to the file of failed readings. Alignments using more than the maximum number of padding characters, or exceeding the maximum mismatch, may be displayed, but the readings will not be entered into the database. It is advisable to set the consensus cutoff to 51% before running the assembly routine as this will improve the alignments. A typical run of the assembly routine is shown in figure 4.1.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Accept "Permit entry"\par
|
|
2.\tab Accept "Use file of file names"\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "File of gel reading names". The name of the input file of file names, probably passed on from "Screen against vector".\par
|
|
4.\tab Define "File for names of failures". A file to contain the names of the readings that the program fails to enter, or for which joins are not made.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Perform normal shotgun assembly"\par
|
|
6.\tab Accept "Permit joins"\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Minimum initial match". Only possible overlaps containing exact matches of at least this number of consecutive identical characters will be considered for alignment.\par
|
|
8.\tab Define "Maximum number of pads per reading". This is the maximum number of padding characters permitted in any new reading during the alignment procedure.\par
9.\tab Define "Maximum number of pads per reading in contig". This is the maximum number of padding characters permitted in the contig in order to align any new reading.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Maximum percent mismatch after alignment"\par
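\pard\plain \s4\qj\sa120\sl280 \f20 The limits set in steps 8 to 10 can be pictured as a single pass or fail test applied to each candidate alignment. The following Python fragment is only a minimal sketch of that idea. The function name and argument names are hypothetical, the default limits are simply the defaults shown in the prompts of figure 4.1, and the fragment is not taken from the program itself.\par

```python
# Hypothetical helper: the three limits mirror the prompts in figure 4.1,
# but the function is an illustration, not the program's own code.
def alignment_acceptable(pads_in_reading, pads_in_contig, percent_mismatch,
                         max_pads_reading=8, max_pads_contig=8,
                         max_percent_mismatch=8.0):
    """Return True if an alignment stays within all three limits."""
    return (pads_in_reading <= max_pads_reading
            and pads_in_contig <= max_pads_contig
            and percent_mismatch <= max_percent_mismatch)


# Values reported for reading hinw.009 against contig 1 in figure 4.1.
print(alignment_acceptable(pads_in_reading=0, pads_in_contig=1,
                           percent_mismatch=2.9))
```

\pard \s4\qj\sa120\sl280 An alignment is rejected as soon as any one limit is exceeded, which matches the statement above that such readings may be displayed but are not entered.\par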
|
|
\pard\plain \li560\ri500\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Automatic sequence assembler\par
|
|
\pard \li560\ri500\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Database is logically consistent\par
|
|
? (y/n) (y) Permit entry\par
|
|
? (y/n) (y) Use file of file names\par
|
|
? File of gel reading names=demo.nam\par
|
|
? File for names of failures=demo.fail\par
|
|
Select entry mode\par
|
|
X 1 Perform normal shotgun assembly\par
|
|
2 Put all sequences in one contig\par
|
|
3 Put all sequences in new contigs\par
|
|
? Selection (1-3) (1) =\par
|
|
? (y/n) (y) Permit joins\par
|
|
? Minimum initial match (12-4097) (15) =\par
|
|
? Maximum pads per gel (0-25) (8) =\par
|
|
? Maximum pads per gel in contig (0-25) (8) =\par
|
|
? Maximum percent mismatch after alignment (0.00-15.00) (8.00) =\par
|
|
\par
|
|
Results skipped to save space\par
|
|
\par
|
|
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\par
|
|
Processing 4 in batch\par
|
|
Gel reading name=hinw.009 \par
|
|
Gel reading length= 292\par
|
|
Working\par
|
|
Contig 1 position 263 matches strand 1 at position 14\par
|
|
Contig 2 position 1 matches strand 1 at position 156\par
|
|
\pard \li560\ri500\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Total matches found 2\par
|
|
Trying to align with contig 1\par
|
|
Padding in contig= 1 and in gel= 0\par
|
|
Percentage mismatch after alignment = 2.9\par
|
|
Best alignment found\par
|
|
251 261 271 281\par
|
|
aattacagcg tt,cctattg acgggcgcat ccac\par
|
|
********** ** ** **** ********** ****\par
|
|
aattacagcg ttcccvattg acgggcgcat ccac\par
|
|
1 11 21 31\par
|
|
Trying to align with contig 2\par
|
|
Padding in contig= 0 and in gel= 2\par
|
|
Percentage mismatch after alignment = 1.4\par
|
|
Best alignment found\par
|
|
1 11 21 31 41 51\par
|
|
tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par
|
|
********** ********** ********** ********** ********** **********\par
|
|
tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par
|
|
156 166 176 186 196 206\par
|
|
61 71 81 91 101 111\par
|
|
tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccggcagc gcccacactg\par
|
|
********** ********** ********** ********** ***** ** * **********\par
|
|
tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccg,ca,c gcccacactg\par
|
|
216 226 236 246 256 266\par
|
|
121 131\par
|
|
ctcagacgac ggtcgctgc\par
|
|
********** *********\par
|
|
ctcagacgac ggtcgctgc\par
|
|
276 286\par
|
|
Overlap between contigs 2 and 1\par
|
|
Length of overlap between the contigs= -122\par
|
|
Entering the new gel reading into contig 1\par
|
|
This gel reading has been given the number 4\par
|
|
Working\par
|
|
Trying to align the two contigs\par
|
|
Padding in contig= 2 and in gel= 0\par
|
|
Percentage mismatch after alignment = 1.5\par
|
|
Best alignment found\par
|
|
406 416 426 436 446 456\par
|
|
tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par
|
|
********** ********** ********** ********** ********** **********\par
|
|
tgcacgacat cgagtatgag agttatatcc cgggcgcgct ctgcttgtac atggacctca\par
|
|
1 11 21 31 41 51\par
|
|
466 476 486 496 506 516\par
|
|
tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccg,ca,c gcccacactg\par
|
|
********** ********** ********** ********** ***** ** * **********\par
|
|
tgtacctctt tgtctccgtg ctctacttca tgccctccga gcccggcagc gcccacactg\par
|
|
61 71 81 91 101 111\par
|
|
526 536\par
|
|
ctcagacgac ggtcgct\par
|
|
********** *******\par
|
|
ctcagacgac ggtcgct\par
|
|
121 131\par
|
|
Editing contig 1\par
|
|
\pard \li560\ri500\sa100\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Completing the join between contigs 1 and 2\par
|
|
(Results for other readings skipped to save space)\par
|
|
\pard \li560\ri500\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Batch finished\par
|
|
9 sequences processed\par
|
|
9 sequences entered into database\par
|
|
\pard \li560\ri500\sa100\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth 2 joins made\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb60\sa120\sl240\tx1140 \f21\fs20 Figure 4.1\tab Part of a typical run of "Auto assemble".\par
|
|
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Searching for internal joins \par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The purpose of this function is to use data already in the database to find possible joins between contigs. Although most joins will be made automatically during assembly, some may have been missed because of poor alignments. The function is particularly useful for sequences from fluorescent sequencing machines because it may be possible to find potential joins within the unused data from the 3' ends of readings. When the X version is used, the contig joining editor is automatically called up for each potential join found, with the two contigs aligned in the edit windows.\par
|
|
\pard \s4\qj\sa120\sl280 The program strategy is as follows. Take the first contig and calculate its consensus. If unused data is being employed, examine all readings that are in the complementary orientation and sufficiently near to the contig's left end, to see if they have sufficiently good unused sequence that would protrude from the left end of the contig. If any are found, add the longest such sequence to the left end of the consensus. Do the same for the right end by examining readings that are in their original orientation. Repeat the consensus calculations and extensions for all contigs, hence producing an extended consensus for the whole database. If unused data is not being employed, simply calculate the consensus for the whole database. Now look for possible joins by processing the extended consensus in the following way. Take the last, say 500, bases (termed the "probe length" by the program) of the rightmost consensus and compare them in both orientations with the extended consensus of all the other contigs. Display any sufficiently good alignments. Repeat with the left end of the rightmost contig. Do the same for the ends of all the contigs, always comparing only with the contigs to their left, so that the same matches do not appear twice.\par
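\pard \s4\qj\sa120\sl280 The order in which probes are compared can be sketched as follows. This Python fragment only enumerates which end of which contig is compared with which other contig. It performs no alignment and ignores the two orientations, and the function name and the use of a simple left-to-right list of contig names are assumptions made for illustration.\par

```python
def probe_comparisons(contig_names):
    """List (probe contig, probe end, target contig) in scan order.

    `contig_names` is assumed to be ordered from leftmost to rightmost.
    """
    comparisons = []
    # Work from the rightmost contig back towards the first one.
    for i in range(len(contig_names) - 1, 0, -1):
        for end in ("right", "left"):
            # Compare only with contigs to the left of the probe contig,
            # so that no pair of contigs is reported twice.
            for target in contig_names[:i]:
                comparisons.append((contig_names[i], end, target))
    return comparisons


for item in probe_comparisons(["A", "B", "C"]):
    print(item)
```

\pard \s4\qj\sa120\sl280 With three contigs this gives six comparisons, namely both ends of C against A and B, then both ends of B against A.\par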
|
|
\pard \s4\qj\sa120\sl280 Good unused data is defined by sliding a window of "Window size for good data scan" bases outwards along the sequence and stopping when more than "Maximum number of dashes in scan window" dashes appear in the window. Note that it is advisable to have some sort of cutoff, because if we simply take all the data it might be of such poor quality that we won't find any good matches. An initial run employing no unused data is also recommended. Sufficiently good alignments are defined by criteria equivalent to those used in auto assemble; however, here we only display alignments that pass all tests.\par
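\pard \s4\qj\sa120\sl280 The window scan described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's code. The function name is hypothetical, the unused data is assumed to be given as a string read outwards from the contig end, and the choice to keep only the bases before the first failing window is an assumption, since the text does not say exactly where the cut is made.\par

```python
def good_unused_length(unused, window_size, max_dashes):
    """Number of leading bases of `unused` judged to be good.

    The scan stops at the first window that holds more than `max_dashes`
    dashes. Keeping only the bases before that window is an assumption
    of this sketch.
    """
    if len(unused) < window_size:
        # Too short for a full window: treat the whole string as one window.
        return len(unused) if unused.count("-") <= max_dashes else 0
    for start in range(len(unused) - window_size + 1):
        if unused[start:start + window_size].count("-") > max_dashes:
            return start
    return len(unused)


# The window starting at offset 4 ("ac--") holds two dashes, one too many.
print(good_unused_length("acgtac--gt", window_size=4, max_dashes=1))  # 4
```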
|
|
\pard \s4\qj\sa120\sl280 All numbering is relative to base number one in the contig\: matches off the left end of the contig (i.e. in the unused data) have negative positions, and matches off the right end of the contig (i.e. in the unused data) have positions greater than the contig length. The convention for reporting the orientations of overlaps is as follows\: if neither contig needs to be complemented the positions are as shown. If the program says "contig x in the - sense" then the positions shown assume contig x has been complemented. For example, in the results given in figure 4.2 the positions for the first overlap are as reported, but those for the second assume that the contig in the minus sense (i.e. 443) has been complemented.\par
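\pard\plain \s4\qj\sa120\sl280 \f20 The numbering convention can be summarised by a small Python sketch. The function name and the returned labels are hypothetical, and treating every position below one as lying off the left end is an assumption, since the text mentions only negative positions.\par

```python
def locate(position, contig_length):
    """Classify a reported position relative to base one of the contig."""
    if position < 1:
        # Assumption: anything before base one is unused data on the left.
        return "unused data off the left end"
    if position > contig_length:
        return "unused data off the right end"
    return "within the contig"


# Position -127 is reported for contig 445 in figure 4.2; the contig
# length used here is an arbitrary illustrative value.
print(locate(-127, contig_length=1000))
```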
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find internal joins".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Minimum initial match". Only matches containing at least this number of consecutive identical characters will be found.\par
|
|
3.\tab Define "Maximum pads per sequence". Only alignments containing less than or equal to this number of padding characters in each sequence will be found.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Maximum percent mismatch after alignment". Only alignments with no more than this percentage of mismatches will be found. Particularly when poor data from the 3' ends of sequences derived from fluorescent sequencing machines is used, it is important to allow for a high degree of mismatch, say around 75%.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Probe length". This is the length of sequence taken from each end of each contig and compared with the full length of all other contigs.\par
|
|
6.\tab Accept "Employ unused data". This means, where available, add the unused data from the 3' ends of sequences, to the ends of the contigs.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Window size for good data scan". To decide how much of the unused data should be added to the end of a contig, the program scans outwards, counting the number of dashes (-) over a window of the size defined here.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Number of dashes in scan window". If the program finds this many dashes in the scan window it will add no more of the unused data to the end of the contig.\par
|
|
\pard\plain \qj\li680\ri780\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Possible join between contig 445 in the + sense and contig 405\par
|
|
\pard \li680\ri780\sl220\box\brsp100\brdrth Percentage mismatch after alignment = 4.9\par
|
|
412 422 432 442 452 462\par
|
|
405 TTTCCCGACT GGAAAGCGGG CAGTGAGCGC AACGCAATTA ATGTGAG,TT AGCTCACTCA\par
|
|
********* * ******** ***** *** ********** ********** **********\par
|
|
445 -TTCCCGACT G,AAAGCGGG TAGTGA,CGC AACGCAATTA ATGTGAG-TT AGCTCACTCA\par
|
|
-127 -117 -107 -97 -87 -77\par
|
|
472 482 492 502 512\par
|
|
405 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT\par
|
|
********** ********** ********** ********** **\par
|
|
445 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT\par
|
|
-67 -57 -47 -37 -27\par
|
|
Possible join between contig 443 in the - sense and contig 423\par
|
|
Percentage mismatch after alignment = 10.4\par
|
|
64 74 84 94 104 114\par
|
|
423 ATCGAAGAAA GAAAAGGAGG AGAAGATGAT TTTAAAAATG AAACG-CGAT GTCAGATGGG\par
|
|
**** ***** ********** ********** ****** ** ***** **** *********\par
|
|
443 ATCG,AGAAA GAAAAGGAGG AGAAGATGAT TTTAAA,,TG AAACGACGAT GTCAGATGG,\par
|
|
3610 3620 3630 3640 3650 3660\par
|
|
124 134 144 154 164\par
|
|
423 TTG-ATGAAG TAGAAGTAGG AG-AGGTGGA AGAGAAGAGA GTGGGA\par
|
|
*** ****** ********** ** ******* *** ***** ** **\par
|
|
443 TTGGATGAAG TAGAAGTAGG AGGAGGTGGA ,GAG,AGAGA GTTGG-\par
|
|
\pard \li680\ri780\sl220\keepn\box\brsp100\brdrth 3670 3680 3690 3700 3710\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.2\tab Typical output from "Find internal joins".\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Editing in XBAP\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The XBAP editor is mouse-driven and can insert, delete and change readings in contigs. It has facilities to display the traces for data from fluorescent sequencing machines and for annotation of readings. In addition it allows the poor quality data from the ends of readings to be viewed and, if required, added to the sequences.\par
|
|
\pard \s4\qj\sa120\sl280 A typical view of the editor is shown in figure 4.3. This includes the edit window showing an 80 character section of a contig (positions 3899 to 3978). Each reading is numbered and named in the left hand panel, with minus signs indicating those in their reverse orientation. Underneath is their consensus. Some of the sequence letters are lighter than the majority, showing that they are "unused". One segment (3933 to 3949) is shaded, which signifies that it has been annotated. The editing cursor is at position 3921. Above this window are the main buttons the user employs to direct the editing process. Below the edit window is a panel showing the traces for readings 37 and 123. Notice that they are centred on the cursor position. Here the traces are shown in four different line styles, but on a colour screen they each have a different colour. At the bottom of the figure is the search window. These features are described in the relevant sections below.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.1\tab Scrolling through the contig\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The editor allows scrolling from one end of a contig to the other using the scroll bar and scroll buttons and also the arrow keys.\par
|
|
\pard \s4\qj\sa120\sl280 Action of mouse button presses when the mouse pointer is in the scroll bar\:\par
|
|
\pard \s4\qj\li1720\sa120\sl280\tx4520 Middle Mouse Button\tab Set editor position\par
|
|
Left Mouse Button\tab Scroll forward one screenful\par
|
|
Right Mouse Button\tab Scroll backwards one screenful\par
|
|
\pard\plain \li80\ri20\sl220\keepn\box\brsp20\brdrth \f4\fs16 {{\pict\macpict\picw444\pich344
|
|
82daffffffff015701bb1101a0008201000affffffff015701bb0900000000000000003100000000015601ba98007e00000000030703e900000000030703e900000000015601ba000102830002830002830007000286aa01a00007000186550140000700028600012000070001860001400007000286000120000b02013ff8
|
|
8a00030ffe40000d0402200807c18c0003089220000f06012c28040110808e0003089240000f06022648040100808e0003089220001007012348040f31e3968f00030f924000100702220807911084598f000308122000100701258804111084508f00030812400010070224c804111084508f00030ff22000100701286804
|
|
111094508f000308024000100702200807cf3863908f0003080220000b02013ff88a00030ffe4000070002860001200007000186000140000700028600012000070001865501400007000286aa01a00002830002830002830002830026e500001ff9ff0087f8ff01f87ff5ff01fe1fefff0087fcff01fe1ff2ff01f87ff2ff
|
|
00e0fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd00380200003cfa
|
|
000203fc03fa0008630c1800018180001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd006502000066fa0002030003fa000ac30c38000380c0001f807ffbff05841f8000003cfd0002087f87fbff07e01fe7fffffe1f81fcff03f0ffff87fdff0dc3f0ffff84186000060002106180fd
|
|
00051f800000600ffe0002084180fb00021fe018fc000020fd006b020000c3fe0008c01800000300030603fd000a01830c7800078060001ff3faff058418c000000cfd0002087f33fbff07e7ffe7fffffe1f9cfcff03fcffff33fdff0d99e67fff84186000060002107180fd000518c000006003fe0002084180fb00021800
|
|
18fc000020fd0072020000c0fe0008c01800000300030603fd000a01830cd8000d8060001ff3fcff07f9ff84186000000cfd0002087e79fbff08e7ffe7cfe7fe1f9e7ffdff1ffcfffe79fff9ffff99e67fff8418600006000210718000006000186000006003fe0002084180fb00041800183018fe000020fd0072020000c0
|
|
fe0000c0fe00040300030003fd000a0301989800098030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08e7ffe7cfe7fe1f9e7ffdff1ffcfffe7ffff9ffff9fe7ffff8418600006000210798000006000186000006003fe0002084180fb00041800183018fe000020fd00731d0000c00f0dc3f0781f4003003b1e0f
|
|
c0f0de000301981800018030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08e7ffe7ffe7fe1f9e7ffdff1ffcfffe7ffff9ffff9fe7ffff8418600006000210798000006000186000006003fe0002084180fb00041800180018fe000020fd007d790000c0198e60c01831c003f06706030198730003019818000180
|
|
30001ff3e47c0f8790e07f841861e1b80c0fc1f078087f3f9e647e1e43ffe7fe270f81fe1f9e7879e787c0fcfffe7f9e607e1f9fe7f03f841866e0761e02106d878619f8001866f0786e0301e16c0841801e0fc61878001801d8f07e0786f020fd007d790000c030cc30c01831800300c30603030c60000300f01800018030
|
|
001ff3e339e733c679ff8418c331cc0c186318cc087f879e633ccf19ffe07cc7cfe7fe1f9cf339e7339e7cfffe7f9e79fccf9fe7e79f84186730ce3302106d8cc330600018c398cc73030331fe08418033186618cc001f833830180cc39820fd007d790000c030cc30c01831800300c30603030c60000300f0180001803000
|
|
1ff3e799fe79cff9ff841f8619860c00660186087ff39e6799e73fffe7f9e7cfe7fe1f81e79cce79fe7cfffe7f9e79f9e60781e7ff8418661986618210679861e060001f83018661830619b6084180618063318600180618301818630020fd007d790000c030cc30c01831800300c30603030c60000180f01800018060001f
|
|
f3e79c0e01cff9ff841987f9860c0fe601fe087ff99e6798073fffe7f9e7cfe7fe1f99e01cce01c07cfffe7f9e79f9e79fe7f03f8418661986618210679fe0c0600018030186618307f9b60841807f8fe331fe00180618301818630020fd007d790000c330cc30c0181f000300c30603030c60000180601800018060001ff3
|
|
e79fe67fcff9ff8418c601860c18660180087ff99e6799ff3fffe7f9e7cfe7fe1f9ce7fe1e7f9e7cfffe7f9e79f9e79fe7ff9f8418661986618210639800c060001803018661830601b6084180601861e18000180618301818630020fd007d79000066198c30cc18300003006706033198600000c06018070180c0001ff3e7
|
|
9fe67fcff9ff8418c601860c18660180087e799e6799ff3fffe7f9e7cfe7fe1f9ce7fe1e7f9e7cfffe799e79f9e79fe7ff9f8418661986618210639801e060001803018661830601b6084180601861e18000180618301818630020fd007d7900003c0f0c3078ff1f8003fc3b3fc1e0f06000006060ff070ff180001ff3e799
|
|
e739cff99f84186319cc0c186318c6087f33cc633ce73fffe7fcc7cfe67e1f9e739f3f399e7cffff33cc799ccf9fe7e79f840cc618ce330210618c63306600180300cc73030319b6084180319860c0c60018033830198cc30020fd0068f9000130c0ef005d1f80679c0f83cffc3f841861f1b87f8fa1f07c087f87e2647e0f
|
|
3fffe01e2601f0fe1f9e783f3f83c1601fff87e27c3e1f9fe7f03f84078618761e02106187c6183c00180300786e1fe1f1b60841fe1f0fa0c07c001fe1d9fe0f07830020fd0032f9000130c0ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc00010210f200010840f2000020fd0032f9000130c0
|
|
ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc00010210f200010840f2000020fd0032f900011f80ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc00010210f200010840f2000020fd002de500001ff9ff048400000180fc0004087fffffe7f8ff01fe1fefff0084fc
|
|
00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2000020fd0026e500001ff9ff0084f80001087ff5ff01fe1fefff0084fc00010210f200010840f2
|
|
000020fd0026e500001ff9ff0087f8ff01f87ff5ff01fe1fefff0087fcff01fe1ff2ff01f87ff2ff00e0fd000283000283000283000283000283000283000283000283000901001f88ff00feff001a010010fc000006fe00010180fe000060fc00000c9d000002ff001f010010fc000006fe00010180fe000060fc00000cc2
|
|
000102a8de000002ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de000002ff002316001000004010000600004001800200006000100400000cc200010554de000002ff00231600100000c03000060000c001800300
|
|
006000180600000cc2000102a8de000002ff0023160010000180600006000180018001800060000c0300000cc200010554de000002ff0023160010000300c00006000300018000c0006000060180000cc2000102a8de000002ff0023160010000601800006000600018000600060000300c0000cc200010554de000002ff00
|
|
23160010000c03000006000c0001800030006000018060000cc2000102a8de000002ff00231600100018060000060018000180001800600000c030000cc200010554de000002ff0023160010000c03000006000c0001800030006000018060000cc2000102a8de000002ff0023160010000601800006000600018000600060
|
|
000300c0000cc200010554de000002ff0023160010000300c00006000300018000c0006000060180000cc2000102a8de000002ff0023160010000180600006000180018001800060000c0300000cc200010554de000002ff00231600100000c03000060000c001800300006000180600000cc2000102a8de000002ff002316
|
|
001000004010000600004001800200006000100400000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de000002ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de0000
|
|
02ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001f010010fc000006fe00010180fe000060fc00000cc2000102a8de000002ff001f010010fc000006fe00010180fe000060fc00000cc200010554de000002ff001a010010fc000006fe00010180fe000060fc00000c9d000002ff0009
|
|
01001f88ff00feff000901001f88ff00feff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff004a010010ed00030c0300c0fa00040781e0300cf90004781e
|
|
0780c0fa00040781e0780cf90004781e0040c0fa00040781e1fe0cf90004781e0780c0fa00040781e1fe0cf90002781e02ff004a010010ed00030c0781e0fa00040cc330701ef90004cc330cc1e0fa00040cc330cc1ef90004cc3300c1e0fa00040cc331801ef90004cc330cc1e0fa00040cc330061ef90002cc3302ff004e
|
|
010010ed00030c0cc330fa0004186618f033fa0005018661986330fa00041866198633fa000501866181c330fa00041866198033fa0005018661986330fa00041866180633fa000301866182ff004e010010ed00030c0cc330fa0004186619b033fa0005018661986330fa00041866198633fa000501866183c330fa000418
|
|
66198033fa0005018661980330fa00041866180c33fa000301866182ff004a010010ed00030c186618f900046619306180fa00040661806618f900046618066180fa0004066186c618f900046619806180fa00040661980618f9000466180c6180fa0002066182ff004a010010ed00030c186618f90004c618306180fa0004
|
|
0c61806618f90004c6180c6180fa00040c618cc618f90004c619b86180fa00040c619b8618f90004c618186180fa00020c6182ff004e010010ed00030c186618fa0005038338306180fa0004383380c618fa0005038338386180fa0004383398c618fa0005038339cc6180fa000438339cc618fa0005038338186180fa0002
|
|
383382ff004a010010ed00030c186618f90004c1d8306180fa00040c1d838618f90004c1d80c6180fa00040c1d98c618f90004c1d8066180fa00040c1d986618f90004c1d8306180fa00020c1d82ff004a010010ed00030c186618f900046018306180fa00040601860618f900046018066180fa000406019fe618f9000460
|
|
18066180fa00040601986618f900046018306180fa0002060182ff004e010010ed00030c0cc330fa00041860183033fa00050186018c0330fa00041860198633fa000501860180c330fa00041860180633fa0005018601986330fa00041860186033fa000301860182ff004e010010ed00030c0cc330fa00041866183033fa
|
|
0005018661980330fa00041866198633fa000501866180c330fa00041866198633fa0005018661986330fa00041866186033fa000301866182ff004a010010ed00030c0781e0fa00040cc330301ef90004cc331801e0fa00040cc330cc1ef90004cc3300c1e0fa00040cc330cc1ef90004cc330cc1e0fa00040cc330c01ef9
|
|
0002cc3302ff004a010010ed00030c0300c0fa00040781e1fe0cf90004781e1fe0c0fa00040781e0780cf90004781e00c0c0fa00040781e0780cf90004781e0780c0fa00040781e0c00cf90002781e02ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d0100
|
|
10ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff0076010010fc00041e0003c078f700650c0781e0301e0307f9fe0c0780c0300c0787f9fe0c0781e1fe1e1fe1e0000c0300c0781e0787f8781e0300c1fe1e0300c0781e0780c0301e1fe7f8
|
|
301e0300c0787f8781e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff0076010010fc0004330000c0ccf700650c0cc33078330780c0301e0cc1e0781e0cc0c0301e0cc3303033030330001e0781e0cc330cc0c0cc330781e030330781e0cc330cc1e078330300c078330781e0cc0c0c
|
|
c3303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff0076010010fc0004618000c186f700650c186618cc618cc0c03033186330cc331860c03033186618306183061800330cc33186619860c186618cc33030618cc3318661986330cc618300c0cc618cc331860c18661830618cc618cc
|
|
33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff0076010010fc0004618000c186f700650c180600cc600cc0c03033180330cc331800c030331806003060030600cc330cc33180601800c180600cc33030600cc3318060180330cc600300c0cc600cc331800c18060030600cc600cc33030600cc330303
|
|
30300c0cc0c180330cc6018033180600cc3302ff0076010010fc0004018000c006f700650c18060186601860c0306198061986619800c030619806003060030600cc6198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c18061
|
|
9866018061980601866182ff007b010010fc0009018000c006000fc1e076fc00650c18060186601860c0306198061986619800c030619806003060030600786198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c18061986601
|
|
8061980601866182ff007d010010fe000b01fe030000c00c00186330cefc00650c18067986679860c03061980619866199e0c030619806003067830601fe61986619806799e0c19e6018661830679866199e6018061986678300c18667986619800c180678306018667986618306798661830618300c1860c1806198667980
|
|
6199e601866182ff007b010010fc00090e0000c0380018061986fc00650c180619fe619fe0c0307f9807f9fe7f9860c0307f9806003061830600787f9fe7f980619860c186601fe7f830619fe7f986601807f9fe618300c1fe619fe7f9800c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f9866
|
|
01fe7f82ff007b010010fc0009180000c060000fc7f986fc00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c1806198661980619866018661
|
|
82ff007b010010fc0009300000c0c00000660186fc00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601866182ff00
|
|
7b010010fc0009600000c1800000660186fc00650c18661986619860c0306198661986619860c030619866183061830618006198661986619860c186619866183061986619866198661986618300c18661986619860c186618306198661986618306198661830618300c1860c186619866198661986619866182ff007b0100
|
|
10fc0009600000c1800e186318cefc00650c0cc33986339860c030618cc61986618ce0c030618cc33030338303300061986618cc338ce0c0ce331866183033986618ce330cc61986338300c18633986618cc0c0cc338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff007b010010fc00
|
|
097f8007f9fe0e0fc1f076fc00650c0781e9861e9860c03061878619866187a0c030618781e0301e8301e00061986618781e87a0c07a1e186618301e9866187a1e078619861e8300c1861e986618780c0781e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff0011010010f3000006fc
|
|
00000c9d000002ff0011010010f3000006fc00000c9d000002ff0011010010f3000006fc00000c9d000002ff0011010010f3000006fc00000c9d000002ff000d010010ed00000c9d000002ff0013010010ed00000cd7000002ecaadd000002ff0013010010ed00000cd7000001ec55dd000002ff007e010010fd000c787f80
|
|
0000041e0001e0000003fe00650c0781e0301e0307f9fe0c0780c0300c0787f9fe0c0781e1fe1e1fe1e0000c0300c0781e0787f8781e0300c3febeabaaeafabeafaaeababebfeffababeabaaeafa7f8781e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff007e010010fd000ccc0180
|
|
00000c33000330000007fe00650c0cc33078330780c0301e0cc1e0781e0cc0c0301e0cc3303033030330001e0781e0cc330cc0c0cc330781e1757757d5f5dd775dd5f57d775755d57d7757d5f5dd0c0cc3303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff007f010010fe000d018601
|
|
8000001c6180061800000ffe00650c186618cc618cc0c03033186330cc331860c03033186618306183061800330cc33186619860c186618cc332baebaeebbbaeebbaebbaeeebabaaeaeeebaeebbbae0c18661830618cc618cc33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff007f010010fe00020186
|
|
03fe00073c6000061800001bfe00650c180600cc600cc0c03033180330cc331800c030331806003060030600cc330cc33180601800c180600cc33175755dd775d5755d5775dd755755d5dd755dd775d50c18060030600cc600cc33030600cc33030330300c0cc0c180330cc6018033180600cc3302ff007e010010fd000106
|
|
03fe00076c60000618000013fe00650c18060186601860c0306198061986619800c030619806003060030600cc6198661980601800c1806018661abaeabaeebbaaeabaaebbaeeaabaaebaeeabaeebbaa0c180600306018660186618306018661830618300c1860c180619866018061980601866182ff007e010010fd000c0c
|
|
060003f0cc6e07c618003f03fe00650c18060186601860c0306198061986619800c030619806003060030600786198661980601800c1806018661975755d775dd5755d575dd7755755d5d7755d775dd50c180600306018660186618306018661830618300c1860c180619866018061980601866182ff007f120010000007f8
|
|
38060006198c730c6338006183fe00650c18067986679860c03061980619866199e0c030619806003067830601fe61986619806799e0c19e6018661abaefbaeebbbeeabaaebbaeefabaaebaeefbaeebbaa0c180678306018667986618306798661830618300c1860c18061986679806199e601866182ff007e010010fd000c
|
|
0c0c0000198c619801d8006003fe00650c180619fe619fe0c0307f9807f9fe7f9860c0307f9806003061830600787f9fe7f980619860c186601fe7f97575dff7fdd7755d57fdff75d755d5ff75dff7fdd50c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe7f82ff007e010010fd000c
|
|
060c0003f9fe61980018003f03fe00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c1866018661abaebbaeebbaeeabaaebbaeebabaaebaeebbaeebbaa0c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007f010010fe000d
|
|
0186180006180c61980018000183fe00650c18061986619860c0306198061986619860c030619806003061830600cc6198661980619860c186601866197575dd775dd7755d575dd775d755d5d775dd775dd50c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007f010010fe00
|
|
0d0186180006180c61980618000183fe00650c18661986619860c0306198661986619860c030619866183061830618006198661986619860c1866198661abaebbaeebbaeebbaeebbaeebabaaebaeebbaeebbae0c186618306198661986618306198661830618300c1860c186619866198661986619866182ff007e010010fd
|
|
000ccc300006180c330c6330386183fe00650c0cc33986339860c030618cc61986618ce0c030618cc33030338303300061986618cc338ce0c0ce331866197577dd775ddf775dd75dd777d755d5d777dd775ddd0c0cc338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff007e010010fd
|
|
007578300003e80c1e07c1e0383f1fe000000c0781e9861e9860c03061878619866187a0c030618781e0301e8301e00061986618781e87a0c07a1e18661ababebaeebafabeafaebbaebeabaaebaebebaeebafa0c0781e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff0013010010ed
|
|
00000cd7000001ec55dd000002ff0013010010ed00000cd7000002ecaadd000002ff0013010010ed00000cd7000001ec55dd000002ff0013010010ed00000cd7000002ecaadd000002ff0013010010ed00000cd7000001ec55dd000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff007f010010
|
|
fe000dc0781e000001fe0c000010000003fe00650c02808028080aa2a8200a02000020080282a8aa080280a0aa001fe1e0300c0300c0781e0787f8781e0300c1fe1e0300c0781e0780c0301e1fe7f8301e0300c0787f8781e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff007f1200
|
|
10000001c0cc33000001801c000030000007fe00650c04414044140100405011050000501404404010140441101000030330781e0781e0cc330cc0c0cc330781e030330781e0cc330cc1e078330300c078330781e0cc0c0cc3303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff007f12
|
|
0010000003c18661800001803c00007000000ffe00650c08222082220200808820888000882208208020220822082000030618cc330cc33186619860c186618cc33030618cc3318661986330cc618300c0cc618cc331860c18661830618cc618cc33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff007f
|
|
120010000006c18661800001806c0000f000001bfe00650c10011100110100404440044000441110004010111004001000030600cc330cc33180601800c180600cc33030600cc3318060180330cc600300c0cc600cc331800c18060030600cc600cc33030600cc33030330300c0cc0c180330cc6018033180600cc3302ff00
|
|
7f120010000004c00601800001804c0001b0000013fe00650c08020880208200808220082000822088008020208802002000030601866198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c180619866018061980601866182ff
|
|
007f010010fe000dc006030003f1b80c07c330003f03fe00650c10041100410100410440104001044110004010411004001000030601866198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c180619866018061980601866182
|
|
ff007f120010001fe0c00c0e000619cc0c0c6630006183fe00650c08a2088a2082008082200822a8822088a0802020880200202a8306018661986619806799e0c19e6018661830679866199e6018061986678300c18667986619800c180678306018667986618306798661830618300c1860c18061986679806199e6018661
|
|
82ff007f010010fe000dc03803000018060c180630006003fe00650c10455104550100415440154001545510404010551004001000030601fe7f9fe7f980619860c186601fe7f830619fe7f986601807f9fe618300c1fe619fe7f9800c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe
|
|
7f82ff007f010010fe000dc060018003f8060c1807f8003f03fe00650c08220882208200808220082000822088208020208802002000030601866198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601
|
|
866182ff007f010010fe000dc0c061800618060c180030000183fe00650c10441104410100410440104001044110404010411004001000030601866198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c1806198661980619866
|
|
01866182ff007f010010fe000dc18061800619860c180030000183fe00650c08220882208200808220882000822088208020208822082000030619866198661986619860c186619866183061986619866198661986618300c18661986619860c186618306198661986618306198661830618300c1860c18661986619866198
|
|
6619866182ff007f010010fe000dc18033000618cc0c0c6030386183fe00650c044410444101004104111040010441044040104104411010000303318661986618cc338ce0c0ce331866183033986618ce330cc61986338300c18633986618cc0c0cc338303318633986618303398661830618300c1860c0cc61986338cc61
|
|
8ce331866182ff007f7b0010000007f9fe1e0003e8787f87c030383f1fe000000c02a2082a20820080820a082000822082a08020208280a020000301e18661986618781e87a0c07a1e186618301e9866187a1e078619861e8300c1861e986618780c0781e8301e1861e986618301e98661830618300c1860c078619861e878
|
|
6187a1e1866182ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff0040010010fe000bc0307f800001fe0c0000c030fe
|
|
0002c0000cbf002201e1fe1e0301e0300c1fe1e0300c1fe0c1fe7f8307f8780c0301e0780c0781e0300c02ff0040160010000001c07801800001801c0001c070000001c0000cbf002203303033078330781e030330781e0301e0300c0780c0cc1e078330cc1e0cc330781e02ff0040160010000003c0cc01800001803c0003
|
|
c0f0000003c0000cbf0022061830618cc618cc33030618cc33030330300c0cc0c186330cc6198633186618cc3302ff0040160010000006c0cc03000001806c0006c1b0000006c0000cbf0022060030600cc600cc33030600cc33030330300c0cc0c180330cc6018033180600cc3302ff0040160010000004c1860300000180
|
|
4c0004c130000004c0000cbf00220600306018660186618306018661830618300c1860c180619866018061980601866182ff0040010010fe0011c186060003f1b80c0fc0c030000fc0c0000cbf00220600306018660186618306018661830618300c1860c180619866018061980601866182ff0040010010fe0011c1860600
|
|
0619cc0c1860c030001860c0000cbf00220678306018667986618306798661830618300c1860c18061986679806199e601866182ff0040010010fe0011c1860c000018060c0060c030001800c0000cbf0022061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe7f82ff0040010010fe0011c1
|
|
860c0003f8060c0fe0c030000fc0c0000cbf00220618306018661986618306198661830618300c1860c180619866198061986601866182ff0040010010fe0011c0cc18000618060c1860c030000060c0000cbf00220618306018661986618306198661830618300c1860c180619866198061986601866182ff0040010010fe
|
|
0011c0cc18000619860c1860c030000060c0000cbf00220618306198661986618306198661830618300c1860c186619866198661986619866182ff0040010010fe0011c07830000618cc0c1860c0300e1860c0000cbf00220338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff004016
|
|
0010000007f830300003e8787f8fa7f9fe0e0fc7f8000cbf002201e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c
|
|
9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff007e010010fd0001781efe0007041e03c1e0000003fe00650c0200a0aa2a8200a0282a8282a82808020080280a0282a8280a020080aa0a020080280a028080200a0aa2a8200a020080282a8280a0aa0a0200a020080aa0a020080002a820
|
|
2a8aa080aa0a020080280a0200a028080200a028000280a0280a0202a8aa0a02ff007e010010fd0001cc33fe00070c33066330000007fe00650c050110100405011044040440404414050140441104404044110501401011050140441104414050110100405011050140440404411010110501105014010110501400004050
|
|
0401014010110501404411050110441405011044000441104411050040101102ff007f010010fe000d0186618000001c6186661800000ffe001b0c088208200808820882080820808222088220822088208082208882fe20298882208220882220882082008088208882208208082208202088820888220202088822000080
|
|
88080202fe20198882208220888208822208820882000822088220888080202082ff007f010010fe0002018060fe00073c6006061800001bfe00650c04440010040444010004100041001104411100401000410040044110104004411100401001104440010040444004411100041004001040044400441101040044110000
|
|
40440401011010400441110040044401001104440100001004010040044040104002ff007f010010fe0002018060fe00076c60060018000013fe00650c082200200808220080080800808020882208802008008080200822082020082208802008020882200200808220082208800808020020200822008220820200822080
|
|
0080820802020820200822088020082200802088220080000802008020082080202002ff007f010010fe000d01b86e0003f0cc6e060030003f03fe00650c1044001004104401000410004100411044110040100041004010441010401044110040100411044001004104401044110004100400104010440104410104010441
|
|
000041040401041010401044110040104401004110440100001004010040104040104002ff007f120010000007f9cc730006198c730600e0006183fe00650c0822282008082200800808a0808020882208802288a0808a20082208202288220880200802088222820080822288220880080802282020082228822082022882
|
|
208aa080820802020820200822088a2008222880208822008a2a8802288a20082080202282ff007f010010fe000d0186618000198c619f8030006003fe00650c154410100415440100041040410055154551004110404104401545501041154551004010055154410100415441154551000410041010401544115455010411
|
|
5455000041540401055010401545510440154411005515440104001004110440154040104102ff007f010010fe000d0186618003f9fe61860018003f03fe00650c0822082008082200800808208080208822088020882080822008220820208822088020080208822082008082208822088008080208202008220882208202
|
|
088220800080820802020820200822088220082208802088220082000802088220082080202082ff007f010010fe000d0186618006180c61860618000183fe00650c10441010041044010004104041004110441100411040410440104410104110441100401004110441010041044110441100041004101040104411044101
|
|
04110441000041040401041010401044110440104411004110440104001004110440104040104102ff007f010010fe000d0186618006180c61860618000183fe00650c082208200808220882080820808220882208822088208082208822082020882208822088220882208200808220882208820808220820208822088220
|
|
8202088220800080820802020820208822088220882208822088220882000822088220882080202082ff007e010010fd000ccc330006180c33060330386183fe00650c104110100410411044040440404441104410441104404044111044101011104410441104441104110100410411104410440404411010111041110441
|
|
0101110441000041040401041010111044104411104110444110411044000441104411104040101102ff007e010010fd0075781e0003e80c1e0601e0383f1fe000000c0820a820080820a0280802a0802820882208280a82a0802a0a082208200a882208280a028208820a820080820a88220828080280a8200a0820a88220
|
|
8200a882208000808208020208200a0822082a0a0820a828208820a02a000280a82a0a082080200a82ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff0012010010ed00000ce6000103ffba
|
|
000002ff0012010010ed00000ce6000103ffba000002ff007b010010fa007201e078618787f9861e1861e0000c0781e0301e0307f9fe0c0780c0300c0787f9fe0c0781e1fe1e1fe1e3ff0c0300c0781e0787f8781e0300c1fe1e0300c0781e0780c0301e1fe7f8301e0300c0787f8781e1fe1e0301e0300c1fe1e0300c1fe0
|
|
c1fe7f8307f8780c0301e0780c0781e0300c02ff007b010010fa00720330cc718cc601c633186330000c0cc33078330780c0301e0cc1e0781e0cc0c0301e0cc3303033030333ff1e0781e0cc330cc0c0cc330781e030330781e0cc330cc1e078330300c078330781e0cc0c0cc3303033078330781e030330781e0301e0300c
|
|
0780c0cc1e078330cc1e0cc330781e02ff007b010010fa007206198671986601c661986618000c186618cc618cc0c03033186330cc331860c03033186618306183061bff330cc33186619860c186618cc33030618cc3318661986330cc618300c0cc618cc331860c18661830618cc618cc33030618cc33030330300c0cc0c1
|
|
86330cc6198633186618cc3302ff007b010010fa007206018679980601e660186600000c180600cc600cc0c03033180330cc331800c030331806003060030603ff330cc33180601800c180600cc33030600cc3318060180330cc600300c0cc600cc331800c18060030600cc600cc33030600cc33030330300c0cc0c180330c
|
|
c6018033180600cc3302ff007b010010fa007206018679980601e660186600000c18060186601860c0306198061986619800c030619806003060030603ff6198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c1806198660180
|
|
61980601866182ff007b010010fa00720601866d8c0601b630186300000c18060186601860c0306198061986619800c030619806003060030603ff6198661980601800c180601866183060186619806018061986600300c18660186619800c180600306018660186618306018661830618300c1860c1806198660180619806
|
|
01866182ff007b010010fa00720601866d8787e1b61e1861e0000c18067986679860c03061980619866199e0c0306198060030678306020161986619806799e0c19e6018661830679866199e6018061986678300c18667986619800c180678306018667986618306798661830618300c1860c18061986679806199e6018661
|
|
82ff007b010010fa00720601866780c6019e03186030000c180619fe619fe0c0307f9807f9fe7f9860c0307f9806003061830603ff7f9fe7f980619860c186601fe7f830619fe7f986601807f9fe618300c1fe619fe7f9800c18061830601fe619fe7f830619fe7f8307f8300c1fe0c1807f9fe619807f986601fe7f82ff00
|
|
7b010010fa0072060186678066019e01986018000c18061986619860c0306198061986619860c030619806003061830603ff6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007b0100
|
|
10fa0072060186638066018e01986018000c18061986619860c0306198061986619860c030619806003061830603ff6198661980619860c186601866183061986619866018061986618300c18661986619800c180618306018661986618306198661830618300c1860c180619866198061986601866182ff007b010010fa00
|
|
72061986639866018e61986618000c18661986619860c0306198661986619860c03061986618306183061bff6198661986619860c186619866183061986619866198661986618300c18661986619860c186618306198661986618306198661830618300c1860c186619866198661986619866182ff007b010010fa00720330
|
|
cc618cc60186330cc330000c0cc33986339860c030618cc61986618ce0c030618cc3303033830333ff61986618cc338ce0c0ce331866183033986618ce330cc61986338300c18633986618cc0c0cc338303318633986618303398661830618300c1860c0cc61986338cc618ce331866182ff007b010010fa007201e0786187
|
|
87f9861e0781e0000c0781e9861e9860c03061878619866187a0c030618781e0301e8301e3ff61986618781e87a0c07a1e186618301e9866187a1e078619861e8300c1861e986618780c0781e8301e1861e986618301e98661830618300c1860c078619861e8786187a1e1866182ff0012010010ed00000ce6000103ffba00
|
|
0002ff0012010010ed00000ce6000103ffba000002ff0012010010ed00000ce6000103ffba000002ff0012010010ed00000ce6000103ffba000002ff0012010010ed00000ce6000103ffba000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed00000c9d000002ff000d010010ed
|
|
00000c9d000002ff000901001f88ff00feff0006fe008955fe0006fe0089aafe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000dfe000187ff8e000201ffc2fe000ffe0003440100f8900002011241fe000ffe000385850020900002011242fe000ffe000344c90020900002011241fe
|
|
0013fe000784690022c71c71c094000201f242fe0013fe00074441002320a28a20940002010241fe0013fe000784b1002207a0f980940002010242fe0013fe00074499002208a0804094000201fe41fe0013fe0007850d002208a28a20940002010042fe0013fe000744010022079c71c0940002010041fe000dfe000187ff
|
|
8e000201ffc2fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe0006fe0089aafe0006fe008955fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe0012fe000083fdff00f0fdff00fc95000002fe0013fe000042fd000110
|
|
80fe00000495000001fe0013fe000082fd00011080fe00000495000002fe0013fe000042fd00011080fe00000495000001fe0013fe000082fd00011080fe00000495000002fe0014fe000a420000180010860301800495000001fe0014fe000a82000018c010860301800495000002fe0014fe000042fe0006c01086000180
|
|
0495000001fe0014fe000a820fb338f01087c70f9e0495000002fe0014fe000a4219b318c010866319b30495000001fe0014fe000a8219b318c010866319bf0495000002fe0014fe000a4219b318c010866319b00495000001fe0014fe000a820fb318cc10866319b30495000002fe0014fe000a42019f7e7810866fcf9e04
|
|
95000001fe0014fe000682018000001080fe00000495000002fe0014fe000642018000001080fe00000495000001fe0013fe000082fd00011080fe00000495000002fe0013fe000042fd00011080fe00000495000001fe0012fe000083fdff00f0fdff00fc95000002fe000afe0000408b000001fe000afe0000808b000002
|
|
fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000efe000080f10000019cff0082fe000efe000040f10000019c000081fe0014fe000080f1000001d000fdaa00a8d20000
|
|
82fe0016fe000040f1000001d1000001fd550050d2000081fe0014fe000080f1000001d000fdaa00a8d2000082fe0016fe000040f1000001d1000001fd550050d2000081fe0014fe000080f1000001d000fdaa00a8d2000082fe0016fe000040f1000001d1000001fd550050d2000081fe0021fe000080fe0009079f80000c
|
|
f003c0000cfe000001d000fdaa00a8d2000082fe0023fe000040fe00090cc180001d980660001cfe000001d1000001fd550050d2000081fe0020fe000080fd0008c180003d800660002cfe000001d000fdaa00a8d2000082fe0022fe000040fd0008c3003c6d81e6600f0cfe000001d1000001fd550050d2000081fe0021fe
|
|
000d80000007e3830006cdf333e0198cfe000001d000fdaa00a8d2000082fe0023fe000d40000007e0c6003ecd9b00600c0cfe000001d1000001fd550050d2000081fe001afe000080fd0008c60066fd9b0060030cfe0000019c000082fe001bfe000040fe00090ccc00660d9b3663198cfe0000019cff0081fe001bfe0000
|
|
80fe0009078c003e0cf1e3c78f3ffe0000019cff0082fe0012fe000040f7000003fc0000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0070fe000080f1000301f00002fe0006e0000001000002fe00
|
|
1e8000003800003e0001f00000010000e000000e000007c00007000001f0001cfe000610000001000002fe00247000000e00000380007c00007000001c000004000020000003e00007000004000020000007fe00011c82fe0070fe000040f1002f014000050000011000000280000500000140000044000008000040000002
|
|
800110000011000001000008800000400022fe000628000002800005fe00158800001100000440001000008800002200000a000050fe001080000880000a0000500000088000002281fe0070fe000080f1000601400008800001fe001504400008800002200000400000080000400000044001fe000610000001000008fe00
|
|
02400020fe001f4400000440000880000080000010000004000010000080000020000011000088fe000b800008000011000088000008fe00012082fe0075fe0001401ffaff00f8fa000601400008800001fe001504400008800002200000400000080000400000044001fe000610000001000008fe0002400020fe001f4400
|
|
000440000880000080000010000004000010000080000020000011000088fe000b800008000011000088000008fe00012081fe0075fe00018010fa000008fa000601400008800001fe001504400008800002200000400000080000400000044001fe000610000001000008fe0002400020fe001f4400000440000880000080
|
|
000010000004000010000080000020000011000088fe000b800008000011000088000008fe00012082fe0075fe00014010fa000008fa00060140000f800001fe001507c0000f800003e000004c000008000040000007c001fe000c10000001000009800000400020fe001f7c000007c0000f80000080000013000004c00010
|
|
00009800002000001f0000f8fe001080000980001f0000f80000098000002081fe0075fe00018010fa000008fa000601400008800001fe001504400008800002200000440000080000400000044001fe000c10000001000008800000400020fe001f4400000440000880000080000011000004400010000088000020000011
|
|
000088fe00108000088000110000880000088000002082fe0075fe00014010fa000008fa002f014000088000011000000440000880000220000044000008000040000004400110000011000001000008800000400022fe001f4400000440000880000088000011000004400010000088000022000011000088fe0010800008
|
|
8000110000880000088000002281fe0079fe0005801078000380fe000008fa002901400008800000e0000004400008800002200000380000080000400000044000e000000e000001000007fe000240001cfe001f440000044000088000007000000e00000380001000007000001c000011000088fe000b8000070000110000
|
|
88000007fe00011c82fe0017fe00054010cc000180fe000008fa0000019c000081fe0017fe00058010c0000180fe000008fa0000019c000082fe0017fe00094010c0f1e18780337c08fa0000019c000081fe0017fe000980107998318cc0336608fa0000019cff0082fe0017fe000940100d81f18fc0336608fa0000019cff
|
|
0081fe0017fe000980100d83318c00336608fa0000019c000082fe0017fe00094010cd9b318cc0337c08fa0000019c000081fe0017fe0009801078f1f7e7801f6008fa0000019c000082fe0014fe00014010fb00016008fa0000019c000081fe0014fe00018010fb00016008fa0000019c000082fe0013fe00014010fa0000
|
|
08fa0000019c000081fe0013fe00018010fa000008fa0000019c000082fe0013fe0001401ffaff00f8fa0000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0013fe0001801ff8ff00e0fc0000019c00
|
|
0082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0019fe00074010780003800003fe000020fc0000019c000081fe0019fe00078010cc0001800003fe000020fc
|
|
0000019c000082fe0019fe00074010c00001800003fe000020fc0000019c000081fe0019fe000b8010c0f1e187801f3ccdf020fc0000019c000082fe0019fe000b40107998318cc03366cd9820fc0000019c000081fe0019fe000b80100d81f18fc03366cd9820fc0000019c000082fe0019fe000b40100d83318c003366fd
|
|
9820fc0000019c000081fe0019fe000b8010cd9b318cc03366fd9820fc0000019c000082fe0019fe000b401078f1f7e7801f3c499820fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00
|
|
014010f8000020fc0000019c000081fe0013fe0001801ff8ff00e0fc0000019c000082fe000efe000040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0012fe000080f1000001dc000006c2000082fe0018fe0001401ffdff0080f7000001dc00010180c3000081fe0013
|
|
fe00018010fd000080f70000019c000082fe0017fe00014010fd000080f7000001dc000040c2000081fe001bfe00018010fd000080f7000001dc000040e800000cdc000082fe001bfe00014010fd000080f7000001dc000040e7000080dd000081fe001dfe000680100000c00080f7000001dc00018020e8000080dd000082
|
|
fe001cfe000640100000c60080f7000001db000010e9000020dc000081fe001cfe000680100000060080f7000001db000010e9000040dc000082fe001cfe000640107d99c78080f7000001db000010e8000010dd000081fe0018fe00068010cd98c60080f7000001c1000010dd000082fe001cfe00064010cd98c60080f700
|
|
0001dd000002e7000080dc000081fe001cfe00068010cd98c60080f7000001dd000002e7000080dc000082fe001cfe000640107d98c66080f7000001dd000002e6000004dd000081fe001efe000680100cfbf3c080f7000001dd0002020004e8000004dd000082fe001cfe000240100cfe000080f7000001db000004ea0000
|
|
01db000081fe001cfe000280100cfe000080f7000001db000004ea000001db000082fe001bfe00014010fd000080f7000001db000004e8000001dd000081fe001ffe00018010fd000080f7000001c1000001fd000040f3000061f1000082fe0026fe0001401ffdff0080f7000001dd000008e8000002fb000030f300018080
|
|
f40002014081fe0022fe000080f1000001dd000008e8000002fb000008f40002010080f40002041082fe001afe000040f1000001dd000008e5000080ee000040f2000081fe001afe000080f1000001dd0002080001e7000040e00002100482fe001afe000040f1000001db000001ea000004fc000002e1000081fe001cfe00
|
|
0080f1000001db000001ea000004fc000002e30002400282fe001efe000040f1000001db000001e700042000000202f4000010f0000081fe001efe000080f1000001c000041000000401f40002100020f40002800282fe0024fe000040f1000001ee00004cf1000020e8000008fb000001f40002200020f2000081fe0028fe
|
|
000080f100010180ef000080f1000020e8000008fb000001f40002200010f5000301000182fe001cfe000040f100010140de000020e500010332ef000010f2000081fe0020fe000080f1000001ee000001f1000320000080e7000001e2000302000182fe0020fe000040f1000001ef0002020080ef000080eb000010fc0000
|
|
08e1000081fe0021fe000080f1000001ef000002ed000080eb000010fc000008e4000304000082fe0021fe000040f1000001e2000040fa000080e6000380100080f5000040f0000081fe002cfe000080f100010110ee000020f800010110f9000003e7000340100080f50002800008f5000304000082fe002bfe000040f100
|
|
010108f00002040020f2000040fd000020ed000020fa000080f50002800008f2000081fe0031fe000080f100010108f0000008f600010404fd000040fd000010ed000020fa000040f50002800004f5000308000082fe001ffe000040f100010104de000040fe000010e7000020f0000004f2000081fe0027fe000080f10000
|
|
01ed000008f800010801fd0005800000401002e8000010e3000310000082fe0022fe000040f1000001ef0002100008ef0002400002ed000040fc000020e1000081fe002bfe000080f1000001ef000020f60002100040fb000040f70000fcf6000040fc000020e4000310000082fe0029fe000040f1000001e10002100028fd
|
|
00014040f900010102f1000308400020f6000002ef000081fe0037fe000080f100010102ee000002f8000020fe000002fc0002400080fb00010202f1000308400020f6000302000002f5000320000082fe0039fe000040f100010102f000044000020070f800040800800080fc0002800040fd00010201f6000040fa000020
|
|
f6000304000002f2000081fe003ffe000080f100010102f0000040fe000080fa000620000202010080fa000020fd0002040080f7000080fa000020f60005040000020006f7000340000082fe0032fe000040f100010102ea000001f90002880001fd00048000000810fd0002040040f2000004f0000301002040f5000081fe
|
|
003bfe000080f1000001ed00050100010000c0fd0005400000200081fe0005208000200808fd0002080020f2000002ee00012020f8000340000082fe002ffe000040f1000001ef00078000010000c06020f400042000001010fc0002100010f7000080fc000080e1000081fe0040fe000080f1000001f500000efc000080fd
|
|
00012180fc000080fd000040fe000020fe000010fc0002100010f8000001fb000080fb000014eb000380000082fe003ffe000040f1000001f5000311000003fc000004f0000011f80002200008f2000302800008fd00034000001cfe000008fd00018004fe000101e0fa000081fe0057fe000080f10002010080fe00000afb
|
|
00042080000c80fe00010108fa000001fc000020fe000301000008fb0002200004f2000b018000080060000002000022fe000010fe0007808002003c000218fd000380000082fe0053fe000040f10002010080fd000080fc0005204000306001fe000088f4000002fb0002080002fd0002400004f8000002f9000008fd0007
|
|
8000004100004010fe000080fe000342000406fe000080fe000081fe005bfe000080f100070100800e00002020fc0005404000401001fe000010fe000004fe000002fc000024f9000002fd0002400002f8000002f9000c0802060000010000808000b010fe000080fe00074100080100000c41fe000082fe0051fe000040f1
|
|
00070100400980008010fc00044020004008f9000002f8000004fe000002fe00018002fd0002800001f2000001fe000a0201000200000100400108fd00078200010080801001fa000081fe005afe000080f100040100001040f900048010008004fd000040fe000002fe000002fc000014fe0005140000048001fe000001fe
|
|
0000c0f3000001fc0008800000400100200208fd0002020000fe80052000c0000009fe000082fe0057fe000040f10012010000202002000800000c0000800801000402fe000040fe000002f400040800000480fd000001fe00013ffcfa000004fc000001fb00070400000200200204f900078040400020004004fe000081fe
|
|
0056fe000080f100040100004010fc0008130001000402000202f6000004fc000008fe000308000001fc000002fd000003fa000004fc000002fe000008fd0005200400100404fa0008010040800020008002fe000082fe004efe000040f10011010000400804000400002080010002040002fd000040f0000008f9000002fc
|
|
0000c0f5000f02800004100020080000040010080220fe0008080000810021000010fb000081fe005cfe000080f100040100208008fc00074040020001180002fd000090fa000008fc000004fe000308000001fe0002540004fc000030f5000f02800002000010000020080010080120fe000b48000042001e000008000003
|
|
fe000082fe005dfe000040f10012010020800408000100018040020000e0000104fe000090fb000007fb000010fb000601000041000008fc000008fc00014008f9000002fe0008200000080008100140fe000340000002fd000308010001fe000081fe005afe000080f100040100110004fd000302002004fd000401040000
|
|
01fc0003400018c8fc000012f8000340010008fc000008fd0002011010f900010220fd0006101000081000c0fe000340000004fd000304020002fe000082fe005cfe000040f1000c01001100021000008004002004fd000001f8000340002020fc000020fe000910000002000024000010fc000004fd00010404fa000e4000
|
|
00400004800000100004600040fe000350000024fd000002fb000081fe0061fe000080f100040100020001fd000318001008fd000001fd000004fd000320004030fc000d2102000014000004800020004020fc000502000e000010f9000040fd000906000008200003800040fe000310000018fd00070200000480000082fe
|
|
005bfe000040f1000c01000200009000004020000808fc000090fe000004fd000320018010fb000c08800004000004800010000020fc00070100118000400120fc000008f8000040fd000020fb000008fd00070104000040000081fe005afe000080f10005010004000080fe000320000410fc000090f800020e0010fb0006
|
|
a0200004000004fd00011040fb000680204001000020fc000008fe000680000800000880fd000020fb000008fd000301080008fe000082fe005afe000040f1000c010004000060000020c0000220fc000340000004fb0002f00008f8000022fc000320000480fb000640c02004000080fc000610200001000002fe000080fe
|
|
00010120fe000340000018fd000001fb000081fe005cfe000080f1000c01000c000020000003000001c0fc00044000000802fd000303000028fa0002100040fe00006afd000080fb0002230020f8000910100001000011000005fd00010210fe000360000014fc000685000810000082fe0059fe000040f100080100080000
|
|
60000014fd000018fd00046000001002fd000304000008fc000080fa0004c000880001fa00061c001010000040f9000001fc000002fd00010410fe0003202a8020fc000690000010000081fe005cfe000080f1000801000c00001000000cfd00018180fe000c6000001000000140000c000088fc0002800008fb0003820800
|
|
01f8000010fe000040f9000601000020000004fd00010408fe000310800020fc0002b04010fe000082fe005cfe000040f1000801000c000090000011fe000001fc000020fc000604100008000004fd000001fd000980000021002804000280f900040820000040fb0002280002fe0002800008fc000008fe000390002042fc
|
|
000040fc000081fe005cfe000080f10008010010000008000010fc000010fe00001efd001a8000040014000104000003c002000004009f800020100004000420f9000004f700008afd000380400012fc000004fe000382000841fc000660102004000082fe005dfe000040f1000901001000010800002040fe00050a080000
|
|
0110fd000b401001002400000200000420fc00046140004410fe00010408f900040280000120fc000380000002fc000060fc000002fb000080fc000660000004000081fe0066fe000080f10008010020000008000040fe0008080001000001081802fe000a40005040001b0200000810fe00050200813c0080fd000008f800
|
|
0302000001fe000c7800010100800a800100000181fd00012001fe000308000280fc0006e0044000007082fe0067fe000040f1001501004000010400008010000010208080000008061d40fd0008048001808100001008fd00040180838010fd00011002f9000003fe001d1006008407860402002440010010020001e00000
|
|
4000c000012000010080fd00011001fe00018c81fe0064fe000080f100070100410000040001fa001d014000050000100019000001000804110000100400004001010000604004fe000010f700208000080081808380055002401e60000010020080015000800020000104000280bcfd000690000002010282fe006bfe0000
|
|
40f1001501208100020200020004014000404004101904400404fd00170240101008800020026001b0000100001d00040001002001fc00220400000440001e090001070011fc0000e1200200000c0000040001000010000087fe0cfc00070208000001020281fe006efe000080f1002a01cd00800001000403870410400000
|
|
18040212101000000200000c11804000400061c180060c0082000017fe00040100400004fd000051fe00152000a1820002008006038010912807c0003000200008fe000628000a0e01f040fd00010308fe0002040182fe0070fe000040f1002b0102c8a0040080040401d386810010100086020540010001001ff001850000
|
|
40018020800803000c000060c0fe0021818001c300000755004000101009c044000480405800610590c01c3004c000001102fe00088680a4320000042020fe00060401e000040c81fe0070fe000080f100660102860800004008000004058006004181004101f0000fe881e00005500001a006000040100099d00040003001
|
|
038082000200800091800010804009938030000b5520600018830022600c0700003040c080000205ea084000001810183d403404021800283082fe0070fe000040f10066013c28c20887a0300000cc05048184079c0100860e00b03e1200001820000011e80018202001c0200029000f00bc78020004100803401f87870780
|
|
9f86078d103000518000060c00119003fde000c10021ea0008086c018000002005441038003a0c0700100081fe000efe000080f10000019cff0082fe000efe000040f10000019cff0081fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b00
|
|
0002fe000afe0000408b000001fe000efe000080f10000019cff0082fe000efe000040f10000019c000081fe0014fe000080f1000001e000fbaa00a0c4000082fe0016fe000040f1000001e1000001fb550040c4000081fe0014fe000080f1000001e000fbaa00a0c4000082fe0016fe000040f1000001e1000001fb550040
|
|
c4000081fe0014fe000080f1000001e000fbaa00a0c4000082fe0016fe000040f1000001e1000001fb550040c4000081fe0021fe000d80000001878f0000fc600060000cfe000001e000fbaa00a0c4000082fe0023fe000d400000038cd98000c0e000e0001cfe000001e1000001fb550040c4000081fe0021fe000d800000
|
|
058cc18000c16001e0002cfe000001e000fbaa00a0c4000082fe0023fe000d4000000180c1803cf861e3600f0cfe000001e1000001fb550040c4000081fe0021fe000d800003f183870006cc633660198cfe000001e000fbaa00a0c4000082fe0023fe000d400003f18601803e0c6306600c0cfe000001e1000001fb550040
|
|
c4000081fe001bfe000d800000018c0180660c6307e0030cfe0000019c000082fe001bfe000d400000018c198066cc633063198cfe0000019cff0081fe001bfe000d80000007efcf003e79f9e0678f3ffe0000019cff0082fe0012fe000040f7000003fc0000019c000081fe000efe000080f10000019c000082fe000efe00
|
|
0040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe006ffe000080f1000d01007c00001f0000200001c00010fe0017040000100000e0001f000007c002000001c00000e000007cfd001d3e00003800000400008000400080007000001c0000070003e0001c00001cfe0018
|
|
40004000007c000e0000200000080001c000000e00000e0082fe006ffe000040f1000d0100100000040000500002200028fe00170a0000280001100004000001000500000220000110000010fd001d0800004400000a00014000a0014000880000220000088000800022000022fe0018a000a0000010001100005000001400
|
|
02200000110000110081fe006ffe000080f1000d0100100000040000880002000044fe00131100004400010000040000010008800002000001fe000010fd003008000040000011000220011002200080000020000008000080002000002000000110011000001000100000880000220002fe0005100000100082fe0074fe00
|
|
01401ffaff00f8fa000d0100100000040000880002000044fe00131100004400010000040000010008800002000001fe000010fd003008000040000011000220011002200080000020000008000080002000002000000110011000001000100000880000220002fe0005100000100081fe0075fe00018010fa000008fa0024
|
|
0100100000040000880002000044001f001100004400010000040000010008800002000001fe0035100000f80008000040000011000220011002200080000020000008000080002000002000000110011000001000100000880000220002fe0005100000100082fe0074fe00014010fa000008fa000d0100100000040000f8
|
|
000200007cfe00131f00007c0001300004000001000f800002000001fe000010fd00390800004000001f0003e001f003e000800000260000098000800026000020000001f001f000001000130000f800003e0002600000100000100081fe0074fe00018010fa000008fa000d0100100000040000880002000044fe00131100
|
|
004400011000040000010008800002000001fe000010fd003908000040000011000220011002200080000022000008800080002200002000000110011000001000110000880000220002200000100000100082fe0074fe00014010fa000008fa000d0100100000040000880002200044fe0017110000440001100004000001
|
|
000880000220000110000010fd003908000044000011000220011002200088000022000008800080002200002200000110011000001000110000880000220002200000110000110081fe0078fe0005801078000380fe000008fa000d0100100000040000880001c00044fe0017110000440000e000040000010008800001c0
|
|
0000e0000010fd00390800003800001100022001100220007000001c000007000080001c00001c000001100110000010000e0000880000220001c000000e00000e0082fe0017fe00054010cc000180fe000008fa0000019c000081fe0017fe00058010c0000180fe000008fa0000019c000082fe0017fe00094010c0f1e187
|
|
80337c08fa0000019c000081fe0017fe000980107998318cc0336608fa0000019cff0082fe0017fe000940100d81f18fc0336608fa0000019cff0081fe0017fe000980100d83318c00336608fa0000019c000082fe0017fe00094010cd9b318cc0337c08fa0000019c000081fe0017fe0009801078f1f7e7801f6008fa0000
|
|
019c000082fe0014fe00014010fb00016008fa0000019c000081fe0014fe00018010fb00016008fa0000019c000082fe0013fe00014010fa000008fa0000019c000081fe0013fe00018010fa000008fa0000019c000082fe0013fe0001401ffaff00f8fa0000019c000081fe000efe000080f10000019c000082fe000efe00
|
|
0040f10000019c000081fe000efe000080f10000019c000082fe000efe000040f10000019c000081fe0013fe0001801ff8ff00e0fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe00018010f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0013fe000180
|
|
10f8000020fc0000019c000082fe0019fe00074010780003800003fe000020fc0000019c000081fe0019fe00078010cc0001800003fe000020fc0000019c000082fe0019fe00074010c00001800003fe000020fc0000019c000081fe0019fe000b8010c0f1e187801f3ccdf020fc0000019c000082fe0019fe000b40107998
|
|
318cc03366cd9820fc0000019c000081fe0019fe000b80100d81f18fc03366cd9820fc0000019c000082fe0019fe000b40100d83318c003366fd9820fc0000019c000081fe0019fe000b8010cd9b318cc03366fd9820fc0000019c000082fe0019fe000b401078f1f7e7801f3c499820fc0000019c000081fe0013fe000180
|
|
10f8000020fc0000019c000082fe0013fe00014010f8000020fc0000019c000081fe0017fe00018010f8000020fc000001c3000006db000082fe0013fe00014010f8000020fc0000019c000081fe001bfe0001801ff8ff00e0fc000001c3000010e0000040fd000082fe0017fe000040f1000001c300012040e1000090fd00
|
|
0081fe0016fe000080f1000001c2000020e1000004fd000082fe0012fe000040f1000001a2000002fc000081fe0016fe000080f1000001c3000040e0000002fd000082fe001cfe0001401ffdff0080f7000001c300018010e2000004fc000081fe001ffe00018010fd000080f7000001da000080ea000008e1000001fd0000
|
|
82fe001bfe00014010fd000080f7000001db000001c9000004fc000081fe001ffe00018010fd000080f7000001db000001ea000080e0000001fd000082fe0022fe00014010fd000080f7000001db00010208ec0002010004e2000008fc000081fe0020fe000680100000c00080f7000001da000004ea000002e0000080fe00
|
|
0082fe001cfe000640100000c60080f7000001da000004ca000008fc000081fe0020fe000680100000060080f7000001da000004ec000002de000080fe000082fe001efe000640107d99c78080f7000001c40002020001e2000010fc000081fe0020fe00068010cd98c60080f7000001db000008e9000001e0000080fe0000
|
|
82fe001cfe00064010cd98c60080f7000001db000010c9000010fc000081fe0020fe00068010cd98c60080f7000001db000010eb000002de000040fe000082fe0024fe000640107d98c66080f7000001db00011001ec000304000080e3000020fc000081fe0024fe000680100cfbf3c080f7000001e0000080fc000001e900
|
|
0040e1000040fe000082fe0021fe000240100cfe000080f7000001e100010220fc000001ca000020fc000081fe0020fe000280100cfe000080f7000001da000001ec000004de000020fe000082fe0023fe00014010fd000080f7000001e100010408e6000304000020e3000020fc000081fe001ffe00018010fd000080f700
|
|
0001db000040e8000020e1000020fe000082fe0020fe0001401ffdff0080f7000001e100010804fd000040c9000040fc000081fe001efe000080f1000001db000080eb000008ed000080f3000020fe000082fe002cfe000040f1000001fd000060e600011002fd0002800080ed000308000010f1000001f4000040fc000081
|
|
fe002dfe000080f1000001fd000018e40002800280fe000080ea00040800000380f500010208f3000010fe000082fe002bfe000040f1000001eb000080f800042000000420fe000080e7000004f400010408f5000040fc000081fe002bfe000080f1000001fe000008ef000040f600014010fd000040ed000010ed000004f3
|
|
000010fe000082fe0029fe000040f1000001fe000010e500042000100010e9000310000002f0000004f5000080fc000081fe0026fe000080f1000001fe000020f000010408f6000305400001e7000001e1000010fe000082fe002cfe000040f1000001fe00014001f100010404f8000020fe00011002e6000320000010e700
|
|
0080fc000081fe0029fe000080f1000001fc000080e2000002ea000010fe000310001010f5000020f2000008fe000082fe002ffe000040f1000001fc000080e7000040fe00040802000020ed000020fc00012010f5000020f5000001fb000081fe002bfe000080f1000001fc000080f200011002f0000020e9000302002010
|
|
f500014002f3000008fe000082fe002efe000040f1000001ec00012002f8000040fe000004fe000020e90002020020f400014002f6000001fb000081fe0021fe000080f1000301000004dc000020ed000020ed000001f3000004fe000082fe0029fe000040f1000301000008e4000080fe000004fd000003ee000020ed0000
|
|
01f6000001fb000081fe0027fe000080f1000301000030ef00014001f3000004fe00010820ea000080e3000004fe000082fe0032fe000040f1000201001cfe000020f20002400080fa000001fd00010204fe00011010ea0002800004e8000001fb000081fe002dfe000080f10002010010fe000020e2000008fb000010f100
|
|
0040fc00018004f6000001f1000004fe000082fe0033fe000040f1000001fc000010e8000001fd0008020800001000040028f1000040fc00018004f6000002f4000002fb000081fe0031fe000080f1000001fc000010f20002800080f100041040040040ec0002410004f6000302000080f4000004fe000082fe0031fe0000
|
|
40f1000001ed000301000040fa000001fd000002fe00011080e9000041f4000302000080f7000002fb000081fe001efe000080f1000001d9000010ed000080ec000080f4000002fe000082fe002efe000040f10002010080e4000002fd000001fc0002010001fc000004f7000080ec000040f7000002fb000081fe0032fe00
|
|
0080f100010101ee000301000040f40008100000010001000080fd00000bf2000020f0000003f5000002fe000082fe003afe000040f100010101fd000008f3000302000020fa000002fd000901100000010000020080fd00011080f30002100002e8000002fb000081fe003dfe000080f100010101fd000004ef000002f500
|
|
0010fc0002020080fd00012040f8000080fd0002020002f6000008fe00011020f600040200080082fe0042fe000040f1000001fc000004ef000001fb000004fc0006a0000004000084fb00012020f9000001fc0002020002f6000008fe00012010f9000004fd0002200081fe0042fe000080f1000001fc000004f700000efe
|
|
00070200001000c003c0f5000306000084fb00014010f30002120001fd000008fb000310000020f400040100020082fe0046fe000040f1000001f7000040fc000009fe0007040000104000c002fe000008fc000340000006f800018010f3000014fb000002fb000310000020f7000004fd0002400081fe0042fe000080f100
|
|
0001f7000010fc000310800070fd000340030001f5000004fb00042a00008008f9000001f5000020fc000060fe0002204004f600040100010082fe004efe000040f100010104f900010104fc000320400088fd000080fe000380000008fc000040fc0008400020008000010004f9000001f5000001fc000090fe0002204004
|
|
f9000008fd0002800081fe0049fe000080f100010108f20008202000840800000880fe000040f9000f4000000400002000200000800100020ef4000008fe00030c000040fc000090f8000103e0fa000380010082fe0057fe000040f100010108fd000002fe00010201fc00074020010208000008fa000008fc000f60000004
|
|
000020001001000002000231f400040800008040fe000080fe00010108f800010c10fe000008fe000301000081fe0051fe000080f100010108fd000002fa000670000040100202f0000040fc000a200010000040020001c0c0fb000002fc00070800008040c00080fd0002020840fe00018001fe0001700cfa000340008082
|
|
fe005efe000040f1000001fc000001fe000a0400400188000080080401f6000010fc000ca0000002000060000002000004fe000040fb000002fc000310000080fd000040fe0010020440000001000100c000800200026008fe000302000081fe0056fe000080f1000001fc000001fb000b020600008008040110000008f100
|
|
030a000050fd00011004fe000020fc000050fb000614000080000001fe00100180040440000010000001300080020004fc000340004082fe005cfe000040f1000001f8000e080020020100010004080090000004fd000308000010fc000310000009fc000304000008fe000020f5001c140000010008000020000240040240
|
|
0000100000020803000100000410fe000304000081fe0053fe000080f1000001f5000b0400c0010004100080000004fd000008f6000001fa00010808fe000010fd0002010804f9000301000402fe00030c200802fe000a1200008204040001000002fd000310002082fe0057fe000040f100010140f9000e10000808002002
|
|
0002200080000008fd000304000020fc000008fd00077810000804000010fe000010fb000004f40005200030100802fe000b120000840218000080100010fb000081fe0057fe000080f100010140f6000b100010020001c0006000000afd0001041cfb000001fe000910000008000400000220fe000008fd00010202fb0000
|
|
02fc00070400001fc0081002fb00060801e000008010fa00012082fe005afe000040f100010180fc00098000001000041000080cfd000340000002fc0002620040fd0005020400001004fe00040408000020fe000008f5000d0200004400020000102000081001fb000010fd000340000090fb000081fe005dfe000080f100
|
|
010180fc000040fc000320000410fd000040f900018180fc000002fd000008fe000402000001c0fe000004fd0002020108fc000d4000004400010400004000042001fe000304000030fd000340000080fc00011082fe005efe000040f1000001fb000941c000200002200003e0fd000040f90002804080fd00010204fe0007
|
|
3000040000100001fd000304000001fe000008fc000340000040fd000c08800002200180000004000020fd000340400020fb000081fe005afe000080f1000001fb00014220fd000040fa000060fe000080fe0002010020f8000320000004fd00010e40fe000704000006c0040080fc0014810000400000080000800001c002
|
|
80000008000020fd00014040fa00010982fe005bfe000040f1000001fb000604180040000140fa0004a000002080fe0002030011fb000302000020fc0002200010fd00050201f0082004fa0008810000080000800009fd00070285000008000040fd000320000020fc00010181fe0052fe000080f1000001fb00010804fd00
|
|
0080fd000608000010000020fd0002030008f000012020fe00070202081010000020f90005080000900001fc0006d0400008000090fd000320000020fc00010682fe0058fe000040f100010180fc000608040080000080fd000602000010000040fd0002050006fb000001fc0005020001200040fd0007010206e010080020
|
|
f4000006fd0007014000000c000090fd000310800040fc00010281fe005dfe000080f100010180fc00011003fe000001fc00072000011000004040fe0002048002fc000004fe000940000001000080004010fd00068c010008000010fb000080fd0002100004fd0002044020fe000001fc000011f900010282fe005cfe0000
|
|
40f100010140fc000610008080000140fd000301000108fe000040fe0002040006fc00040800800040fc0002c00080fc00049000000810f900074000100000200006fd00011040fd000003fc000310000048fc00010481fe005afe000080f100010140fc0005300080000002fc000340000008fa0002080001fc000008f800
|
|
0340010008fd000660000004000044fd000004fe0005100000200008fd00070820100010000408fd000310000008fc00010582fe005efe000040f100010140fc0008300041000002200001fe0002400008fc0004e000080009fe0004c000100040fc000380004002f8000304100080fd000008fe000010fd000009fd000748
|
|
20000010000408fd00030a000080fc00010881fe005ffe000080f100010140fc000e480020000004000007400080000404fe000a2000940010000080000130fc000080fe0005800000020004fa000302000002fd000910202000100000400010fd000610100800020008fc00000af900010882fe0060fe000040f100010120
|
|
fc001c4000120000042000040f80002004040002001001080010202040000208fe00021000e0fc00018004f800010220fb000a1008200040000010002088fe0006901000000a0010fc000308000104fd0002c01081fe0062fe000080f100010120fc000a8000100000080000081819fe000e0200020000024a002020004000
|
|
0208fd00010110fb0002040002fa000301000601fc00098200007000008800c022fd000610040022002004fd000304000002fe000301201082fe0064fe000040f100010120fc000680000e00000810fe00120600100002000c005004040020208020000404fe0002080210fe000340011008f90002028140fa0003f0000048
|
|
fe000d0700008000020008000022002002fd000a0c000200f8000002182081fe006afe000080f100010110fd002301000001c000100000208803000010020301f8080484804092001000040400800000050cfe000620001010000080fc0011082080200080003800030910008600010038fd00070200080400808040fc000a
|
|
0a00080104000004044082fe006afe000040f100010110fe002b3d020300083000100800010000800410010e8e07040802004228000800080200800004060200e00000020990f9001820088040180000c6010204480185000005c0000040001c0008fd0002800020fe000a0200a00104000008038081fe006ffe000080f100
|
|
240108000400c702064000080020000040040410000001003001810a02408800000800100103fe00210802013a001c0fdc300000400a000a003800024200e72001010206020002848002fefd000a0610000402024021000188fe000a0102000683000008000082fe006ffe000040f10023010400114110cc0100110f002002
|
|
01980000080280008810044010010080000006001001fe002101100bae848073f4a0680078000080388040c0003c0100800100c40c012004024007fd000b1008000004cc008002000302fe000a23a8800800800010000181fe0070fe000080f10048011500400200701118420a08c000060002080801aa8160680040e400a1
|
|
20000001006003c00000cc240088022088091c800800602000c070002001900607f057552410010408022018fe000c0e00114000020084008c0009e0fe000a688371c800800321c00082fe0070fe000040f10055010251001dd90441858d919f8001098000a007980026a38008302800ce800fcc03c0cc8043fc1e00e8a070
|
|
010301980301f00018802000080b99980ff800150800505000ca12a01620009000101e140000050090004afe000d81e000805cec31207f1070200081fe000efe000080f10000019cff0082fe000efe000040f10000019cff0081fe000afe0000808b000002fe000afe0000408b000001fe000afe0000808b000002fe000afe
|
|
0000408b000001fe000afe0000808b000002fe0006fe008955fe0006fe0089aafe000283000283000283000afe000002b5aa00a8d4000afe000005b5550050d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b500
|
|
0010d4000afe000002b5000008d4000afe000004b5000010d40015fe00020200f0fe00053000cc600060c0000008d40015fe0002040198fe00053000cc600060c0000010d40015fe0002020180fe00053000cc000060c0000008d40017fe000d040181e3cf8f3e00cce3e3e79980c2000010d40017fe000d0200f3306cd9b3
|
|
00cc63366cd980c2000008d40017fe000d04001bf3ec183300fc63366cd980c2000010d40017fe000d02001b066c183300fc63366cdf80c2000008d40017fe000d04019b366c19b300cc63366cdf80c2000010d40016fe000c0200f1e3ec0f330085fb33e789c1000008d4000afe000004b5000010d4000afe000002b50000
|
|
08d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d40014fe000004f600007ff9ff00c3f9
|
|
ffd2000010d40014fe000002f6000040f9000043f9ffd2000008d40014fe000004f6000040f9000043f9ffd2000010d40014fe000002f6000040f9000043f9ffd2000008d4001cfe000304001f0cfe00010180fe000040f9000043f9ffd2000010d40025fe00080200198c0000030180fe000343000018fe0003180043f8fc
|
|
ff019fffd2000008d40026fe000604001980000003fe00050c0043000018fe0004180043f27ffdff019fffd2000010d40025fe000f0200199c7c78f3c3879f1e0043000018fe0003180043f3fcff019fffd2000008d40027fe001d0400198c66cd9b018cd98c0043e3c799b33cf8f9e043f3e183330c1c187fd2000010d400
|
|
27fe001d0200198c60fd83018cd9800043306cdb3306cd9b3043e1cc9933e4c9933fd2000008d40027fe001d0400198c60c183018cd980004333ec1e333ec1998043f3cc9f3304f999ffd2000010d40027fe001d0200198c60cd9b318cd98c0043366c1e3f66c1986043f3cc9f0264f99e7fd2000008d40027fe001d04001f
|
|
3f6078f1e7e7999e0043366cdb3f66c19b3043f3cc9f0264f9933fd2000010d40021fe000002f800130c0043e3e799923ec0f9e043f3e19fb704fc187fd2000008d40014fe000004f6000040f9000043f9ffd2000010d40014fe000002f6000040f9000043f9ffd2000008d40014fe000004f6000040f9000043f9ffd20000
|
|
10d40014fe000002f6000040f9000043f9ffd2000008d40014fe000004f600007ff9ff00c3f9ffd2000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004
|
|
b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4002afe000002f600007ffaff00e1f6ff01f87ffaff00e1f8ff01fe1ffaff01f87ffbff00f0faff01e008d4002bfe000004f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00
|
|
011080fb00012010d4002bfe000002f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012008d4002bfe000004f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012010d40033fe000202000ffe000203000cfe000040fa000021f600010840fa0000
|
|
21f8ff01fe10fa00010840fb00011080fb00012008d4004ffe000804001980000003000cfe001943e0000600180000210f8000063000000cc000000843f000003ffe000221f87ffdff05fe7ffffe1078fb00110843e000181c00001083c0001c1800002010d40050fe0002020018fe001f03000c000c004330000630180000
|
|
210cc000063000000cc000000840c000000cfe000c21f33fffff3ffcfe7ffffe10ccfb001108433000180c0000108660000c18c0002008d40050fe00100400181e3cf8f3e00f999e004330000030fe0004210cc00006fe00090ec000000840c000000cfe000621f33fffff3ffcfeff02fe10c0fb001108433000180c000010
|
|
8660000c00c0002010d40053fe004d02000f3306cd9b300cd98c004333c78e3c3879f0210ccf1e3e71f1d00ecf363c0840c3c7400c66f8f021f320c1c30f0c3c7860fe10c0f1f6679f1e3c084337c79f0c3cd810866ccf0c38f1982008d40053fe004d040001bf3ec183300cd9800043e66cc63018cd98210f998366319b30
|
|
0fc1bf660840c06cc00c66cd9821f0264c993fe4fe73267e10799b366cd9b3660843e66cd98c66fc10866cc18c18c1982010d40053fe004d020001b066c183300cd98000430666063018cd98210f1f9f66319b300dcfbf7e0840c3ecc00c66cdf821f3264c993f04fe73267e100dfb366fd9b07e0843060cd98c7efc10866c
|
|
cf8c18c1982008d40053fe004d040019b366c19b300ccf8c00430661863018cd98210d9833663199e00dd9b3600840c667800c66cd8021f3264c993e64fe73267e100d83366c19b0600843060cd98c60cc10876cd98c18c1982010d40053fe004d02000f1e3ec0f3300f819e0043066cc63318cd98210cd9b366319b000cd9
|
|
b3660840c66c000c3ef99821f3264c993264ce73267e10cd99f66cd9b3660843060cd98c66cc1086ecd98c18ccf82008d4004efe000004f90044198c004303c79f9e7e7998210c4f1f3efd99e00ccfb33c0840c3e7800c06c0f021f3264cc387061818667e1078f033e7999e3c084306079f3f3ccc1083c7cfbf7e78182010
|
|
d4003dfe000002f900030f000040fa000021fc00010330fd00090840000cc00066c00021f8ff04fe10000030fd00010840fb0002108060fe000301982008d40038fe000004f6000040fa000021fc000101e0fd00090840000780003cc00021f8ff04fe10000030fd00010840fb00011080fc0002f02010d4002bfe000002f6
|
|
000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012008d4002bfe000004f6000040fa000021f600010840fa000021f8ff01fe10fa00010840fb00011080fb00012010d4002afe000002f600007ffaff00e1f6ff01f87ffaff00e1f8ff01fe1ffaff01f87ffbff00f0faff01e008d4000afe
|
|
000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b50000
|
|
08d40015fe000004f600007ff6ff01fe1ff5ffda000010d40017fe000002f6000040f600010210f6000001da000008d40017fe000004f6000040f600010210f6000001da000010d40017fe000002f6000040f600010210f6000001da000008d4001efe000004fd000301980380fe000040f600010210f6000001da000010d4
|
|
001ffe000002fd000301980180fe000040f600020210fcf7000001da000008d40020fe000004fd000701980180000c0040f60002021030f800010c01da000010d40022fe000002fc000691e18ccf1e0040f60005021030000003fb00010c01da000008d40025fe000004fc000690318cd98c0040f6000d0210319be3c7801e
|
|
3cd9b1e7cf01da000010d40025fe000002fc0006f1f18cdf800040f6000d0210319b3663003366fdfb366c01da000008d40025fe000004fc000663318cd8000040f6000d0210319b37e0003066fdfbf66c01da000010d40025fe000002fc000663318cd98c0040f6000d0210319b3600003066cd9b066c01da000008d40025
|
|
fe000004fc000661f7e7cf1e0040f6000d021030fbe663003366cd9b366cc1da000010d40021fe000002f800020c0040f6000d0210301b03c7801e3ccd99e66781da000008d4001bfe000004f6000058f600050210019b0003fa000001da000010d40019fe000002f600007cf60003021000f3f8000001da000008d40017fe
|
|
000004f6000066f600010210f6000001da000010d40017fe000002f6000040f600010210f6000001da000008d40015fe000004f600007ff6ff01fe1ff5ffda000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe0000
|
|
04b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d40012fe00010201fbff03e1fffffec0000008d40012fe00010401fb000321000002c0000010d40012fe00010201fb000321000002c00000
|
|
08d40012fe00010401fb000321000002c0000010d40012fe00010201fb000321000002c0000008d40014fe00010401fd0005018021001802c0000010d40014fe00010201fd0005018021001802c0000008d40014fe00010401fd0005018021001802c0000010d40015fe000b0201078f1e7c79f021079982c0000008d40015
|
|
fe000b04010cd98366cd98210cdb02c0000010d40015fe000b0201061f9f60c198210cde02c0000008d40015fe000b040101983360c198210cde02c0000010d40015fe000b02010cd9b360cd98210cdb02c0000008d40015fe000b0401078f1f60799821079982c0000010d40012fe00010201fb000321000002c0000008d4
|
|
0012fe00010401fb000321000002c0000010d40012fe00010201fb000321000002c0000008d40012fe00010401fb000321000002c0000010d40012fe00010201fbff03e1fffffec0000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000004b5
|
|
000010d4000afe000002b5000008d4000afe000004b5000010d4000afe000002b5000008d4000afe000005b5550050d4000afe000002b5aa00a8d400028300028300028300028300028300028300028300028300028300028300028300a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.3\tab A typical display from the contig editor in XBAP\par
|
|
\pard\plain \s4\qj\sb160\sa120\sl280 \f20 The four scroll buttons operate as follows\:\par
|
|
\pard \s4\qj\li1720\sa120\sl280\tx4520 "<<"\tab Scroll left half a screenful\par
|
|
"<"\tab Scroll left one character\par
|
|
">"\tab Scroll right one character\par
|
|
">>"\tab Scroll right half a screenful\par
|
|
\pard \s4\qj\sa120\sl280
|
|
The Editor cursor can be positioned anywhere in the edit window by moving the mouse pointer over the character of interest, then pressing the left mouse button. The Editor cursor can also be moved by using the direction arrow keys.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.2\tab Editing operations \par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The editor operates in two main edit modes - Replace
|
|
and Insert. Replace allows a character to be replaced by another. Insert allows characters to be inserted into a reading. Characters are entered by typing them from the keyboard. Only valid characters are permitted. Characters can be deleted by positionin
|
|
g the cursor one character to their right, then pressing the delete key. Normally Insert and Delete apply to the consensus line of the contig only. This restriction can be overridden by using the "Super Edit" mode of operation, though it should be employed w
|
|
ith caution as misuse may corrupt alignments.\par
|
|
|
|
Edits can also be performed on the consensus, though they are restricted to insertion and deletion of padding characters ("*"). These edits also have special meanings. A deletion will delete all characters at the position to the left of the cursor in the c
|
|
ontig, and move the relative positions of all sequences starting to the right of the cursor position left one character. An insertion will insert the character typed ("*") into all gel reading sequences at the
|
|
cursor's position in the contig, and move the relative positions of all sequences starting to the right of the cursor position right one character.\par
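\pard\plain \s4\qj\sa120\sl280 \f20 The effect of inserting a pad on the consensus can be sketched as follows. This is an illustration only, not XBAP code. The function name insert_pad, the representation of readings as (start, sequence) pairs with 0-based contig positions, and the choice to shift a reading that starts exactly at the cursor are all assumptions made for the example.\par

\pard\plain \li1880\sl220 \f4\fs16
```python
def insert_pad(readings, pos):
    """Insert a pad at contig position pos (0-based) across all readings.

    readings is a list of (start, sequence) tuples; a new list is returned.
    """
    result = []
    for start, seq in readings:
        end = start + len(seq)  # one past the last base of this reading
        if start >= pos:
            # Reading begins at or right of the cursor: shift it right by one.
            result.append((start + 1, seq))
        elif pos < end:
            # Reading spans the cursor: the pad goes into its sequence.
            offset = pos - start
            result.append((start, seq[:offset] + "*" + seq[offset:]))
        else:
            # Reading ends left of the cursor: unchanged.
            result.append((start, seq))
    return result


# Hypothetical example data: three overlapping readings.
reads = [(0, "ACGTAC"), (2, "GTACGG"), (5, "CGGT")]
padded = insert_pad(reads, 4)
```

\pard\plain \s4\qj\sa120\sl280 \f20 In this sketch the two readings that span the cursor each gain a pad, and the third reading, which starts to the right of the cursor, keeps its sequence but moves one position to the right, so the alignment of the columns is preserved.\par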
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.3\tab Use of buttons \par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The effect of the last edit can be undone by pressing the "Undo" button at the top of the editor window. Pressing it n times will undo the last n edits.\par
|
|
\pard \s4\qj\sa120\sl280 The cursor will automatically be positioned at the next problem when the "Find Next Problem" button is selected. The next problem is where the consensus shows either a disagreement ("-") or a pad ("*") character.\par
|
|
\pard \s4\qj\sa120\sl280 The edits to the contig can be saved by pressing the "Leave Editor" button and replying "Yes" to the prompt to "Save changes?".\par
|
|
As no changes are made to the working copy of the database until this point, it is possible to abort the editor if the edit session ends up in an unsatisfactory state.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.4\tab Displaying traces for readings from fluorescent sequencing machines\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The original trace data from which the gel reading sequences were derived can be seen by double clicking (two quick clic
|
|
ks) with the middle mouse button on the area of interest. The trace will be displayed with the point clicked at the centre of the trace viewport. All traces that are displayed are maintained in one window, which will display a maximum of four traces. When
|
|
four traces are already being displayed and a new one is requested, the one at the top of the window is removed and the new one is added to the bottom. Traces can be removed individually by using the "quit" button in the panel next to the trace. \par
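\pard\plain \s4\qj\sa120\sl280 \f20 The behaviour of the trace window can be modelled as a bounded queue. The sketch below is only a model of the rule described above, not the program's implementation. The names show_trace, quit_trace and MAX_TRACES, and the reading names used, are hypothetical.\par

\pard\plain \li1880\sl220 \f4\fs16
```python
from collections import deque

MAX_TRACES = 4  # the window shows at most four traces


def show_trace(window, name):
    """Add a trace at the bottom; the top one drops out when the window is full."""
    window.append(name)  # a deque with maxlen discards its leftmost item itself


def quit_trace(window, name):
    """Remove one trace, as the "quit" button next to it does."""
    window.remove(name)


window = deque(maxlen=MAX_TRACES)
for name in ["r1", "r2", "r3", "r4", "r5"]:
    show_trace(window, name)
quit_trace(window, "r3")
```

\pard\plain \s4\qj\sa120\sl280 \f20 Requesting the fifth trace pushes the oldest one off the top, and the explicit removal then leaves three traces in their original order.\par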
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.5\tab Extending reads with the unused data\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
Sequence data from fluorescent sequencing machines is normally clipped to remove the primer region and the poor quality data from the 3' end is marked to be ignored during assembly. Only the sequence used during assembly is made visible in the XBAP editor.
|
|
However, the unused data is copied into the database and can be viewed from within the editor. The position of this "cutoff" can also be altered. To display the unused sequences, press the "Display Cutoff" button at the to
|
|
p of the editor window. The cutoff sequence appears in grey. This sequence can be incorporated into the editable sequence, by moving the cutoff position. This is done by positioning the cursor at the end of the sequence, and using Meta-Left-Arrow and Meta-
|
|
Right-Arrow to adjust the point of cutoff. The Meta key is a diamond on the Sun keyboard.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.6\tab Using the pop-up menu\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 A pop-up menu is revealed by depressing the "Control" key on the keyboard and at the same time pressing the left mouse button.\par
|
|
\pard \s4\qj\sa120\sl280 The menu has the following functions\:\par
|
|
\pard\plain \li1880\sl220 \f4\fs16 Find Next Problem\par
|
|
Highlight Disagreements\par
|
|
Save Contig\par
|
|
Create Tag\par
|
|
Edit Tag\par
|
|
Delete Tag\par
|
|
Search\par
|
|
Select Oligo\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 \par
|
|
\pard \s4\qj\sa120\sl280 "Find Next Problem" and "Save Contig" are described above. Operations on tags are described in the section on annotation below, and then searching is outlined.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.7\tab Annotating readings\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Parts of a sequence can be annotated to record the positions of primers used for walking, or to mark sites, such as compressions, that have caused problems during sequencing. The annotations ar
|
|
e termed "tags". Each tag has a type such as "primer", a position, a length and a comment. Each type has an associated colour that will be shown on the display. First the segment to tag is selected, then it is annotated. The consensus sequence cannot be a
|
|
nnotated.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.8\tab Creating a new annotation\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Use the left mouse button to position the start of the selection. While this button is being held down, move the mouse to the other end of the segment. The selection can be extended further using the right mouse bu
|
|
tton. To create the annotation, invoke the pop-up menu, and select the "Create Tag" function. A small "tag editor" will appear which allows users to select the type of the annotation from a pull-down menu, and specify a comment if desired. To select a new
|
|
type pull down the Type menu, and select the entry desired. To enter a comment, simply type into the text window in the tag editor. The annotation is created when the "Leave" button on the tag editor is pressed, and is displayed in the colour defined in th
|
|
e tag database file (TAGDB).\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.6.9\tab Editing an existing annotation\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
Position the cursor with the left mouse button on the tag, and select "Edit Tag" from the pop-up menu. This invokes the tag editor, in which the type and comment of the annotation can be changed. The tag is updated when the "Leave" button is pressed.
|
|
\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1180 \b\f20 2.6.10\tab Deleting an annotation\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 To delete an existing annotation, position the cursor with the left mouse button on the tag, and select "Delete Tag" from the pop-up menu.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1160 \b\f20 2.6.11\tab Searching\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
Selecting "Search" brings up a window which can remain present during normal editor operation. The window allows the user to select the direction of search, the type of search and a value to search on. The value is entered into a value text window, then pr
|
|
essing the "search" button performs the search. If successful, the cursor is positioned accordingly. An audible tone indicates failure. Pressing the "ok" button removes the search window. The search window is automatically removed when the contig editor is
|
|
exited. There are seven different search modes.\par
|
|
\pard\plain \s9\fi-560\li1120\sb180\sa60\sl280\tx1700 \b\f20 2.6.11.1\tab Search by position\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This positions the cursor at the numeric position specified in the value text window. Eg a value of "1234" causes the cursor to be placed at base number 1234 in the contig. Positioning within a reading is achieved by prefixing the number with the "@" char
|
|
acter, eg "@123" positions the cursor at base 123 of the sequence in which the cursor lies. Relative positions can be specified by prefixing the number with a plus or minus charac
|
|
ter. Eg "+1234" will advance the cursor 1234 bases. If possible, the cursor is positioned within the same sequence. The direction buttons have no effect on the operation of "search by position".\par
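\pard\plain \s4\qj\sa120\sl280 \f20 The three forms of value can be summarised in a short sketch. It is not taken from XBAP. The function resolve_position and its arguments are hypothetical, positions are assumed to be numbered from 1, and the rule that the cursor stays within the same sequence where possible is not modelled.\par

\pard\plain \li1880\sl220 \f4\fs16
```python
def resolve_position(value, cursor, reading_start=None):
    """Return the contig position that a "search by position" value refers to.

    cursor is the current contig position. reading_start is the contig
    position of base 1 of the reading under the cursor, or None if the
    cursor is not in a reading.
    """
    if value.startswith("@"):
        # "@123": base 123 of the reading in which the cursor lies.
        if reading_start is None:
            raise ValueError("no reading under the cursor")
        return reading_start + int(value[1:]) - 1
    if value.startswith("+") or value.startswith("-"):
        # "+1234" or "-1234": relative to the current cursor position.
        return cursor + int(value)  # int() accepts a leading sign
    # "1234": absolute base number in the contig.
    return int(value)
```

\pard\plain \s4\qj\sa120\sl280 \f20 With the cursor at base 10, a value of "+1234" gives base 1244 in this sketch, and "@123" in a reading whose first base lies at contig position 500 gives base 622.\par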
|
|
\pard\plain \s9\fi-560\li1120\sb180\sa60\sl280\tx1720 \b\f20 2.6.11.2\tab Search by reading name\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This positions the cursor at the left end of the gel reading specified in the value text window. If the value is prefixed with a slash it is assumed to be a gel reading name. Otherwise it is assumed to be a gel reading number. Eg "123" positions the cursor
|
|
at the left end of gel readi
|
|
ng number 123. "/a16a12.s1" positions at the start of reading a16a12.s1. If the value is "/a16" the cursor is positioned at the first reading whose name starts with "a16". The direction buttons have no effect on the operation of "search by reading name".
|
|
\par
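\pard\plain \s4\qj\sa120\sl280 \f20 The distinction between names and numbers can be sketched as below. This is a hedged illustration, not the program's code. The function find_reading and the (number, name, left end) records are hypothetical, and "first" is taken to mean first in the order in which the records are supplied.\par

\pard\plain \li1880\sl220 \f4\fs16
```python
def find_reading(value, readings):
    """Return the left end of the reading selected by value, or None.

    readings is a list of (number, name, left_end) tuples.
    """
    if value.startswith("/"):
        # A leading slash marks a reading name; a prefix is enough.
        prefix = value[1:]
        for number, name, left_end in readings:
            if name.startswith(prefix):
                return left_end
        return None
    # Without a slash the value is taken to be a reading number.
    wanted = int(value)
    for number, name, left_end in readings:
        if number == wanted:
            return left_end
    return None


# Hypothetical example data.
readings = [(7, "a15b03.s1", 1), (123, "a16a12.s1", 240), (9, "a16c01.s1", 410)]
```

\pard\plain \s4\qj\sa120\sl280 \f20 With these example records, "123" and "/a16a12.s1" both select the reading whose left end is at 240, and so does "/a16", because that reading is the first whose name begins with "a16".\par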
|
|
\pard\plain \s9\fi-560\li1120\sb180\sa60\sl280\tx1700 \b\f20 2.6.11.3\tab Search by tag type\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This positions the cursor at the start of the next tag which has the same type as specified by the type value menu. To change the type, select from the menu that pops up when the mouse is clicked on the button labeled "Type\:". Th
|
|
e search can be performed either forwards or backwards from the current cursor position. To find all tags, use "search by annotation", with a null text value string.\par
|
|
\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.4\tab Search by annotation\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This positions the cursor at the start of the next tag which has a comment containing the string specified in the value text window. The search performed is a regular expression search, and certain characters have special meanings. Be careful when your val
|
|
ue string contains ".", "*", "[", "^" or "$". The search can be performed either forwards or backwards from the current cursor position.\par
|
|
\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.5\tab Search by sequence\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This positions the cursor at the start of the next piece of sequence that matches the value specified in the text value window. The search is for an exact match, which means that the case of the value string is important. The search is performed on the gel
|
|
readings themselves, rather than the consensus sequence. The search can be performed either forwards or backwards from the current cursor position.\par
|
|
\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.6\tab Search by problem\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This positions the cursor at the next place in the consensus sequence which is not "A", "C", "G" or "T". The search can be performed either forwards or backwards from the current cursor position.\par
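\pard\plain \s4\qj\sa120\sl280 \f20 A minimal sketch of this search is given below. It is an illustration under stated assumptions, not the XBAP routine. The function next_problem is hypothetical, positions are 0-based, the consensus is assumed to be in upper case, and the search is assumed to begin at the character next to the cursor rather than at the cursor itself.\par

\pard\plain \li1880\sl220 \f4\fs16
```python
def next_problem(consensus, cursor, forwards=True):
    """Return the next position whose consensus character is not A, C, G or T.

    Returns None when no such position exists in the chosen direction.
    """
    step = 1 if forwards else -1
    pos = cursor + step
    while 0 <= pos < len(consensus):
        if consensus[pos] not in "ACGT":
            return pos
        pos += step
    return None


# Hypothetical consensus with a disagreement ("-") and a pad ("*").
consensus = "ACG-TA*CGT"
```

\pard\plain \s4\qj\sa120\sl280 \f20 Searching forwards from position 0 of this example stops at the dash at position 3, and searching again from there stops at the pad at position 6.\par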
|
|
\pard \s4\qj\sa120\sl280 \par
|
|
\pard\plain \s9\fi-560\li1120\sa60\sl280\tx1700 \b\f20 2.6.11.7\tab Search by quality\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This positions the cursor at the next place in the consensus sequence where the consensus for each strand is not "A", "C", "G" or "T" or where the two strands disagree. The search can be performed either forwards or backwards from the current cursor posit
|
|
ion.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 \par
|
|
2.7\tab Joining contigs interactively using XBAP\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The operation of the join editor in XBAP is very similar to the one for single contigs described above. It allows the user to align the ends of the two contigs by editing each contig separately. First specify which two contigs are to be joined. The program
|
|
checks that the two contig numbers are different (it will not allow circles to be formed!). The Join Editor consists of two Contig Editors with a disagreement box sandwiched between them. This disagreement box 
|
|
uses exclamation marks to denote mismatches between the two consensuses. A typical example is shown in figure 4.4. Here we see in the top window the right end of one contig and in the bottom window the left end of another. The left end of the overlap is c
|
|
orrectly aligned, as indicated by an absense of exclamation marks, but the top contig has an extra character at position 558 which is spoiling the alignment over the next segment. Notice that the "lock" button is highlighted denoting that the user has aske
|
|
d for the two contigs to scroll together.\par
\pard \s4\qj\sa120\sl280 The best strategy for joining is to align the leftmost character of the right contig with its counterpart in the left contig. Then press the \'d2Lock\'d3 button before editing the contigs to make them align over the whole overlap. The overlap must be at least one character long. Use the scroll bar and the scroll buttons ("<<", "<", ">", and ">>") to adjust the relative positions of the two contigs. The join position can be fixed by pressing the "lock" button at the top of the Join Editor. Locking allows the two contigs to be scrolled as one when using the scroll bar and buttons, with their left ends held in the same position relative to each other. Once locked, it is best to proceed to the right along the contigs, inserting padding characters ("*") into the consensuses to minimise the disagreements. It is important to align the two contigs throughout the whole region of overlap before completing the join, because it is only at this stage that the two contigs can be edited independently. If a join is completed leaving a region of mismatch, the consensus there will consist of dashes and the assembly function will fail to find overlaps in the bad section. Misaligned sections can be corrected using the "super edit" mode of the editor. The join is completed by pressing the "Leave Editor" button. The percentage mismatch is displayed, and users are required to confirm that they want to perform the join.\par
\pard\plain \li100\ri80\sb100\sl220\keepn\box\brsp20\brdrth \f4\fs16 {{\pict\macpict\picw441\pich144
|
|
4685ffffffff008f01b81101a0008201000affffffff008f01b80900000000000000003100000000008e01b798007c00000000014003db00000000014003db00000000008e01b7000102850002850026e600001ff9ff0087f8ff01f87ff5ff01fe1fefff0087fcff01fe1ffcff01f87ff2ff00e0f40026e600001ff9ff0084
|
|
f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f4003701003cfa000203fc03fa0008630c18000181
|
|
80001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f4005b010066fa0002030003fa000ac30c38000380c0001f807ffbff05841f8000003cfd0002087f87fbff07e01fe7fffffe107efc00030f000078fd00133c0f0000841860000600021f9ffffffe7ff84180fb00021fe018fc000020
|
|
f400610100c3fe0008c01800000300030603fd000a01830c7800078060001ff3faff058418c000000cfd0002087f33fbff07e7ffe7fffffe1063fc0003030000ccfd001366198000841860000600021f9ffffffe7ff84180fb0002180018fc000020f400670100c0fe0008c01800000300030603fd000a01830cd8000d8060
|
|
001ff3fcff07f9ff84186000000cfd0002087e79fbff08e7ffe7cfe7fe106180fd001b030001860006000066198000841860000600021f9ffffffe7ff84180fb00041800183018fe000020f400670100c0fe0000c0fe00040300030003fd000a0301989800098030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08
|
|
e7ffe7cfe7fe106180fd001b030001800006000060180000841860000600021f9ffffffe7ff84180fb00041800183018fe000020f400681c00c00f0dc3f0781f4003003b1e0fc0f0de000301981800018030001ff3fcff07f9ff84186000000cfd0002087e7ffbff08e7ffe7ffe7fe106180fd001b03000180000600006018
|
|
0000841860000600021f9ffffffe7ff84180fb00041800180018fe000020f400726e00c0198e60c01831c003f0670603019873000301981800018030001ff3e47c0f8790e07f841861e1b80c0fc1f078087f3f9e647e1e43ffe7fe270f81fe1061878618783f03000180619f81e060180fc0841866e0761e021f9ff87e0e73
|
|
f841801e0fc61878001801d8f07e0786f020f400726e00c030cc30c01831800300c30603030c60000300f01800018030001ff3e339e733c679ff8418c331cc0c186318cc087f879e633ccf19ffe07cc7cfe7fe10630cc618cc6183000180618603306018186084186730ce33021f9ff33ce667f8418033186618cc001f8338
|
|
30180cc39820f400726e00c030cc30c01831800300c30603030c60000300f01800018030001ff3e799fe79cff9ff841f8619860c00660186087ff39e6799e73fffe7f9e7cfe7fe107e18633186018300018061860619f87e1800841866198661821f9fe799fe4ff84180618063318600180618301818630020f400726e00c0
|
|
30cc30c01831800300c30603030c60000180f01800018060001ff3e79c0e01cff9ff841987f9860c0fe601fe087ff99e6798073fffe7f9e7cfe7fe10661fe331fe3f830001806186061860180fc0841866198661821f9fe799fe1ff841807f8fe331fe00180618301818630020f400726e00c330cc30c0181f000300c30603
|
|
030c60000180601800018060001ff3e79fe67fcff9ff8418c601860c18660180087ff99e6799ff3fffe7f9e7cfe7fe10631801e18061830001806186061860180060841866198661821f9fe799fe0ff84180601861e18000180618301818630020f400726e0066198c30cc18300003006706033198600000c06018070180c0
|
|
001ff3e79fe67fcff9ff8418c601860c18660180087e799e6799ff3fffe7f9e7cfe7fe10631801e18061830001866186061860180060841866198661821f9fe799fe47f84180601861e18000180618301818630020f400726e003c0f0c3078ff1f8003fc3b3fc1e0f06000006060ff070ff180001ff3e799e739cff99f8418
|
|
6319cc0c186318c6087f33cc633ce73fffe7fcc7cfe67e10618c60c0c661830000cc3386633060181860840cc618ce33021f9ff33ce663f84180319860c0c60018033830198cc30020f4005efa000130c0ef00531f80679c0f83cffc3f841861f1b87f8fa1f07c087f87e2647e0f3fffe01e2601f0fe106187c0c07c3e9fe0
|
|
00781d83c1e060180fc084078618761e021f80787e0e71f841fe1f0fa0c07c001fe1d9fe0f07830020f40032fa000130c0ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef000084fc0001021ffcff01f840f2000020f40032fa000130c0ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef00
|
|
0084fc0001021ffcff01f840f2000020f40032fa00011f80ef00001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef000084fc0001021ffcff01f840f2000020f4002de600001ff9ff048400000180fc0004087fffffe7f8ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001
|
|
087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0084f80001087ff5ff01fe10ef000084fc0001021ffcff01f840f2000020f40026e600001ff9ff0087f8ff01f87ff5ff01fe1fefff00
|
|
87fcff01fe1ffcff01f87ff2ff00e0f40002850002850002850002850002850002850002850002850007001f88ff01fe00180010fc000006fe00010180fe000060fc00000c9d00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000c
|
|
c9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de0001020024151000004010000600004001800200006000100400000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de0001020024151000018060
|
|
0006000180018001800060000c0300000cc9000001fa550040de00010200241510000300c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de00010200241510000c03000006000c000180003000600001806000
|
|
0cc9000002faaa00a0de000102002415100018060000060018000180001800600000c030000cc9000001fa550040de00010200241510000c03000006000c0001800030006000018060000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de000102002415
|
|
10000300c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000180600006000180018001800060000c0300000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de00010200241510000040100006000040018002000060
|
|
00100400000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00
|
|
010180fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200180010fc000006fe00010180fe000060fc00000c9d0001020007001f88ff01fe0007001f88
|
|
ff01fe000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200420010ed00000cfb000307f8780cf800037f9fe0c0f9000307f8780cf800037f8780c0f9000301e0300cf800031e0300c0f900
|
|
0301e0780cf800071e0780c000000200420010ed00000cfb00030600cc1ef80003600061e0f900030600cc1ef80003600cc1e0f900030330781ef80003330701e0f900030330cc1ef80007330cc1e000000200420010ed00000cfb000306018633f8000360006330f9000306018633f8000360186330f900030618cc33f800
|
|
03618f0330f9000306198633f800076198633000000200420010ed00000cfb000306018033f800036000c330f9000306018633f8000360186330f900030600cc33f80003601b0330f9000306018633f800076018633000000200460010ed00000cfb00040601806180f900036000c618f900040601866180f9000360186618
|
|
f900040601866180f9000360130618f900040600066180f900076000661800000200460010ed00000cfb000406e1b86180f900036e018618f9000406e0cc6180f900036e186618f9000406e1866180f900036e030618f9000406e0066180f900076e00c61800000200460010ed00000cfb00040731cc6180f9000373018618
|
|
f900040730786180f90003730ce618f900040731866180f9000373030618f9000407300c6180f900077303861800000200440010ed00000cfa000319866180f9000301830618f8000318cc6180f9000301876618f900040619866180f9000361830618f900040618386180f900076180c61800000200440010ed00000cfa00
|
|
0319866180f9000301830618f8000319866180f9000301806618f900040619866180f9000361830618f900040618606180f900076180661800000200400010ed00000cfa0002198633f8000301860330f80002198633f8000301806330f900030618cc33f8000361830330f900030618c033f8000761986330000002004200
|
|
10ed00000cfb000306198633f8000361860330f9000306198633f8000361986330f900030618cc33f8000361830330f9000306198033f800076198633000000200420010ed00000cfb00030330cc1ef80003330c01e0f900030330cc1ef80003330cc1e0f900030330781ef80003330301e0f900030331801ef80007330cc1
|
|
e000000200420010ed00000cfb000301e0780cf800031e0c00c0f9000301e0780cf800031e0780c0f9000301e0300cf800031e1fe0c0f9000301e1fe0cf800071e0780c0000002000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c
|
|
9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102004b0010fe000dc0781e000001fe1e0001e0000003fe002f0c1fe0c0781e000001fe7f9fe7f8780c0787f9fe7f8301e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8780cc000102004b1110000001c0cc330000018033000330000007fe00
|
|
050c0301e0cc33fe0026300c0300c0cc1e0cc0c0300c078330781e0303307833078330300c0cc1e0300c0301e0780c0cc0cc000102004b1110000003c18661800001806180061800000ffe002f0c0303318661800000300c0300c186331860c0300c0cc618cc33030618cc618cc618300c186330300c030330cc0c1860cc00
|
|
0102004a1110000006c18660000001806000061800001bfe00050c0303318060fe0025300c0300c180331800c0300c0cc600cc33030600cc600cc600300c180330300c030330cc0c18cb000102004a1110000004c186600000018060000618000013fe00050c0306198060fe0025300c0300c180619800c0300c1866018661
|
|
8306018660186600300c180618300c030619860c18cb000102004a0010fe000dc0cc6e0003f1b86e0fc618003f03fe00050c0306198060fe0025300c0300c180619800c0300c18660186618306018660186600300c180618300c030619860c18cb000102004b0010fe000dc07873000619cc73186338006183fe002f0c0306
|
|
199e679fe7f8300c0300c1806199e0c0300c18667986618306018667986678300c180618300c030619860c19e0cc000102004b0010fe000dc0cc6180001806618061d8006003fe002f0c0307f98661800000300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f8300c0307f9fe0c1860cc000102004b
|
|
0010fe000dc186618003f806618fe018003f03fe002f0c0306198661800000300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860cc000102004b0010fe000dc186618006180661986018000183fe002f0c0306198661800000300c0300c180619860c0300c1866198661830601
|
|
8661986618300c180618300c030619860c1860cc000102004b0010fe000dc186618006198661986618000183fe002f0c0306198661800000300c0300c186619860c0300c18661986618306198661986618300c186618300c030619860c1860cc000102004b0010fe000dc0cc33000618cc33186330386183fe002f0c030618
|
|
ce33800000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc618300c030619860c0ce0cc000102004b4410000007f8781e0003e8781e0fa1e0383f1fe000000c0306187a1e800000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c078618300c030619860c07a0cc000102000b00
|
|
10ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200630010fe000dc0041e000001fe0c03c7f8000003fe00470c1fe0c0780c030001fe7f9fe7f8780c0787f9fe7f83
|
|
01e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e03e400010200640d10000001c00c33000001801c0666fe000007fe00480c0301e0cc1e078000300c0300c0cc1e0cc0c0300c078330781e0303307833078330300c0cc1e0300c0301e0780c0cc0c0
|
|
3033078330300c030330cc1e078330cc330780c0cc330780e500010200640d10000003c01c61800001803c0666fe00000ffe00480c03033186330cc000300c0300c186331860c0300c0cc618cc33030618cc618cc618300c186330300c030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc0e500
|
|
010200640d10000006c03c60000001806c0606fe00001bfe00480c03033180330cc330300c0300c180331800c0300c0cc600cc33030600cc600cc600300c180330300c030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc0e500010200640d10000004c06c60000001804c0606fe000013fe0048
|
|
0c0306198061986330300c0300c180619800c0300c18660186618306018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601860e500010200640010fe000dc0cc6e0003f1b80c0606e0003f03fe00480c03061980619861e0300c0300c180619800c0300c1866018661
|
|
8306018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601860e500010200640010fe000dc18c73000619cc0c060730006183fe00480c0306199e619867f8300c0300c1806199e0c0300c18667986618306018667986678300c180618300c030619860c19e0c0306018
|
|
6678300c0306019e6198660180601860c180601860e500010200640010fe000dc18c61800018060c1f8018006003fe00480c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f8300c0307f9fe0c1860c030601fe618300c030601867f9fe60180601fe0c180601fe0e500010200
|
|
640010fe000dc1fe618003f8060c060018003f03fe00480c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601860e500010200640010fe000dc00c61800618060c060018000183fe00480c0306
|
|
198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601860e500010200640010fe000dc00c61800619860c060618000183fe00480c0306198661986000300c0300c186619860c0300c1866198661830619
|
|
8661986618300c186618300c030619860c1860c03061986618300c030619866198661986619860c186619860e500010200640010fe000dc00c33000618cc0c060330386183fe00480c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc618300c030619860c0ce0c0303318633830
|
|
0c030330ce61986330cc331860c0cc331860e500010200645d10000007f80c1e0003e8787f8601e0383f1fe000000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c078618300c030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e1860e5000102000b0010
|
|
ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102007d1110000001e03001000001fe1e0067f8000003fe00660c1fe0c0780c030001fe7f9fe7f8780c0787f9fe7f8301
|
|
e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e0300c0780c1fe1e0307f9fe000780c1fe1e1fe1e0787f8781e0780c0787f82007d0d1000000330700300000180330066fe000007fe00660c0301e0cc1e078000300c0300c0cc1e0cc0c0300c078330
|
|
781e0303307833078330300c0cc1e0300c0301e0780c0cc0c03033078330300c030330cc1e078330cc330780c0cc330781e0cc1e030330780c030000cc1e03033030330cc0c0cc330cc1e0cc0c02007d0d1000000618f00700000180618066fe00000ffe00660c03033186330cc000300c0300c186331860c0300c0cc618cc
|
|
33030618cc618cc618300c186330300c030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc3318633030618cc0c030001863303061830619860c18661986331860c02007d0d1000000619b00f00000180600066fe00001bfe00660c03033180330cc330300c0300c180331800c0300c0cc600cc33
|
|
030600cc600cc600300c180330300c030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc3318033030600cc0c030001803303060030601800c18060180331800c02007d0010fe000919301b00000180600066fe000013fe00660c0306198061986330300c0300c180619800c0300c186601866183
|
|
06018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d0010fe000d1830330003f1b86e0766e0003f03fe00660c03061980619861e0300c0300c180619800c0300c18660186618306
|
|
018660186600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d0010fe000d303063000619cc730ce730006183fe00660c0306199e619867f8300c0300c1806199e0c0300c1866798661830601
|
|
8667986678300c180618300c030619860c19e0c03060186678300c0306019e6198660180601860c180601866198061830601860c0307f9806183060030601800c18060180619800c02007d0010fe000de030630000180661986018006003fe00660c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe
|
|
619fe618300c1807f8300c0307f9fe0c1860c030601fe618300c030601867f9fe60180601fe0c180601fe7f9807f830601fe0c030001807f83060030601800c180601807f9800c02007d111000000180307f8003f80661986018003f03fe00660c0306198661986330300c0300c180619860c0300c18661986618306018661
|
|
986618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d11100000030030030006180661986018000183fe00660c0306198661986330300c0300c180619860c0300c1866198661830601866198
|
|
6618300c180618300c030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c030001806183060030601800c18060180619800c02007d11100000060030030006198661986618000183fe00660c0306198661986000300c0300c186619860c0300c186619866183061986619866
|
|
18300c186618300c030619860c1860c03061986618300c030619866198661986619860c186619866198661830619860c030001866183061830619860c18661986619860c02007d1110000006003003000618cc330ce330386183fe00660c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338
|
|
300c0cc618300c030619860c0ce0c03033186338300c030330ce61986330cc331860c0cc33186618cc61830331860c030000cc6183033030330cc0c0cc330cc618cc0c02007d7b10000007f9fe030003e8781e0761e0383f1fe000000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e830
|
|
0c078618300c030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e18661878618301e1860c03000078618301e0301e0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102
|
|
000b0010ed00000c9d000102000b0010ed00000c9d000102007d0010fe000dc0787f800001fe0c180010000003fe00660c1fe0c0780c030001fe7f9fe7f8780c0787f9fe7f8301e0300c1fe1e0301e0301e1fe7f8780c1fe001fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e0300c0780c1fe1e0
|
|
307f9fe7f8780c1fe1e1fe1e0787f8781e0780c0787f82007d1110000001c0cc01800001801c180030000007fe00660c0301e0cc1e078000300c0300c0cc1e0cc0c0300c078330781e0303307833078330300c0cc1e030000301e0780c0cc0c03033078330300c030330cc1e078330cc330780c0cc330781e0cc1e03033078
|
|
0c0300c0cc1e03033030330cc0c0cc330cc1e0cc0c02007d1110000003c18601800001803c18007000000ffe00660c03033186330cc000300c0300c186331860c0300c0cc618cc33030618cc618cc618300c1863303000030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc3318633030618cc0c
|
|
0300c1863303061830619860c18661986331860c02007d1110000006c18003000001806c1800f000001bfe00660c03033180330cc330300c0300c180331800c0300c0cc600cc33030600cc600cc600300c1803303000030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc3318033030600cc0c03
|
|
00c1803303060030601800c18060180331800c02007d1110000004c18003000001804c1801b0000013fe00660c0306198061986330300c0300c180619800c0300c18660186618306018660186600300c1806183000030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300
|
|
c1806183060030601800c18060180619800c02007d0010fe000dc1b8060003f1b80c1b8330003f03fe00660c03061980619861e0300c0300c180619800c0300c18660186618306018660186600300c1806183000030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300c1
|
|
806183060030601800c18060180619800c02007d1110001fe0c1cc06000619cc0c1cc630006183fe00660c0306199e619867f8300c0300c1806199e0c0300c18667986618306018667986678300c180618307f830619860c19e0c03060186678300c0306019e6198660180601860c180601866198061830601860c0300c180
|
|
6183060030601800c18060180619800c02007d0010fe000dc1860c000018060c186630006003fe00660c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f830000307f9fe0c1860c030601fe618300c030601867f9fe60180601fe0c180601fe7f9807f830601fe0c0300c1807f
|
|
83060030601800c180601807f9800c02007d0010fe000dc1860c0003f8060c1867f8003f03fe00660c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c1806183000030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c0300c1806183
|
|
060030601800c18060180619800c02007d0010fe000dc18618000618060c186030000183fe00660c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c1806183000030619860c1860c03060186618300c030601866198660180601860c180601866198061830601860c0300c180618306
|
|
0030601800c18060180619800c02007d0010fe000dc18618000619860c186030000183fe00660c0306198661986000300c0300c186619860c0300c18661986618306198661986618300c1866183000030619860c1860c03061986618300c030619866198661986619860c186619866198661830619860c0300c18661830618
|
|
30619860c18661986619860c02007d0010fe000dc0cc30000618cc0c186030386183fe00660c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc6183000030619860c0ce0c03033186338300c030330ce61986330cc331860c0cc33186618cc61830331860c0300c0cc6183033030
|
|
330cc0c0cc330cc618cc0c02007d7b10000007f878300003e8787f986030383f1fe000000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c0786183000030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e18661878618301e1860c0300c078618301e0301e
|
|
0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200790010fa007301e078618787f9861e1861e0000c1fe0c0780c030001fe7f9f
|
|
e7f8780c0787f9fe7f8301e0300c1fe1e0301e0301e1fe7f8780c1fe7f9fe0c0307f8787f9fe1e0301e1fe7f9fe1e0780c0301e0781e0307f8781e0300c0780c1fe1e0307f9fe7f8780c1fe1e1fe1e0787f8781e0780c0787f8200790010fa00730330cc718cc601c633186330000c0301e0cc1e078000300c0300c0cc1e0c
|
|
c0c0300c078330781e0303307833078330300c0cc1e0300c0301e0780c0cc0c03033078330300c030330cc1e078330cc330780c0cc330781e0cc1e030330780c0300c0cc1e03033030330cc0c0cc330cc1e0cc0c0200790010fa007306198671986601c661986618000c03033186330cc000300c0300c186331860c0300c0c
|
|
c618cc33030618cc618cc618300c186330300c030330cc0c1860c030618cc618300c03061986330cc61986618cc0c186618cc3318633030618cc0c0300c1863303061830619860c18661986331860c0200790010fa007306018679980601e660186600000c03033180330cc330300c0300c180331800c0300c0cc600cc3303
|
|
0600cc600cc600300c180330300c030330cc0c1800c030600cc600300c03060180330cc60180600cc0c180600cc3318033030600cc0c0300c1803303060030601800c18060180331800c0200790010fa007306018679980601e660186600000c0306198061986330300c0300c180619800c0300c1866018661830601866018
|
|
6600300c180618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa00730601866d8c0601b630186300000c03061980619861e0300c0300c180619800c0300c18660186618306018660186600300c18
|
|
0618300c030619860c1800c03060186600300c030601806198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa00730601866d8787e1b61e1861e0000c0306199e619867f8300c0300c1806199e0c0300c18667986618306018667986678300c180618300c03
|
|
0619860c19e0c03060186678300c0306019e6198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa00730601866780c6019e03186030000c0307f9867f9fe1e0300c0300c1807f9860c0300c1fe619fe7f830601fe619fe618300c1807f8300c0307f9fe0c18
|
|
60c030601fe618300c030601867f9fe60180601fe0c180601fe7f9807f830601fe0c0300c1807f83060030601800c180601807f9800c0200790010fa0073060186678066019e01986018000c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c0306018
|
|
6618300c030601866198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa0073060186638066018e01986018000c0306198661986330300c0300c180619860c0300c18661986618306018661986618300c180618300c030619860c1860c03060186618300c03
|
|
0601866198660180601860c180601866198061830601860c0300c1806183060030601800c18060180619800c0200790010fa0073061986639866018e61986618000c0306198661986000300c0300c186619860c0300c18661986618306198661986618300c186618300c030619860c1860c03061986618300c030619866198
|
|
661986619860c186619866198661830619860c0300c1866183061830619860c18661986619860c0200790010fa00730330cc618cc60186330cc330000c030618ce61986000300c0300c0cc618ce0c0300c18633986618303318633986338300c0cc618300c030619860c0ce0c03033186338300c030330ce61986330cc3318
|
|
60c0cc33186618cc61830331860c0300c0cc6183033030330cc0c0cc330cc618cc0c0200790010fa007301e078618787f9861e0781e0000c0306187a61986000300c0300c0786187a0c0300c1861e986618301e1861e9861e8300c078618300c030619860c07a0c0301e1861e8300c0301e07a619861e0781e1860c0781e18
|
|
661878618301e1860c0300c078618301e0301e0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200
|
|
0b0010ed00000c9d0001020007001f88ff01fe0007001f88ff01fe000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c
|
|
0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe
|
|
000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0
|
|
fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c03
|
|
0000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0
|
|
f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c
|
|
0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300
|
|
c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f0000102000b0010ed00000c9d00010200590010ed
|
|
00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c00000030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f000010200590010ed00000cfa0000c0fd0003c0300c03fe00290c0300c0000c0300c0300c0300c0000c0300c000
|
|
00030000300c030000300c0300c000000300c0300003fe000e0c0300c0000c0000c0300c0300c030fe0000c0f0000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed0000
|
|
0c9d000102000b0010ed00000c9d000102000b0010ed00000c9d0001020007001f88ff01fe0007001f88ff01fe00180010fc000006fe00010180fe000060fc00000c9d00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc90000
|
|
01fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de0001020024151000004010000600004001800200006000100400000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de0001020024151000018060000600
|
|
0180018001800060000c0300000cc9000001fa550040de00010200241510000300c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de00010200241510000c03000006000c0001800030006000018060000cc900
|
|
0002faaa00a0de000102002415100018060000060018000180001800600000c030000cc9000001fa550040de00010200241510000c03000006000c0001800030006000018060000cc9000002faaa00a0de00010200241510000601800006000600018000600060000300c0000cc9000001fa550040de000102002415100003
|
|
00c00006000300018000c0006000060180000cc9000002faaa00a0de00010200241510000180600006000180018001800060000c0300000cc9000001fa550040de000102002415100000c03000060000c001800300006000180600000cc9000002faaa00a0de00010200241510000040100006000040018002000060001004
|
|
00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180
|
|
fe000060fc00000cc9000001fa550040de00010200200010fc000006fe00010180fe000060fc00000cc9000002faaa00a0de00010200200010fc000006fe00010180fe000060fc00000cc9000001fa550040de00010200180010fc000006fe00010180fe000060fc00000c9d0001020007001f88ff01fe0007001f88ff01fe
|
|
000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200340010ed00000cf700020300c0f70001780cf700020780c0f70001040cf700021fe0c0f70001780cf700021fe0c0f70003780c020033
|
|
0010ed00000cf700020701e0f70001cc1ef700020cc1e0f700010c1ef700021801e0f70001cc1ef6000161e0f70003cc1e0200360010ed00000cf700020f0330f80002018633f70002186330f700011c33f70002180330f80002018633f600016330f800040186330200360010ed00000cf700021b0330f80002018633f700
|
|
02186330f700013c33f70002180330f80002018033f60001c330f800040186330200370010ed00000cf70002130618f70002066180f700016618f700026c6180f80002180618f8000301806180f70001c618f800040186618200370010ed00000cf70002030618f70002066180f70001c618f70002cc6180f800021b8618f8
|
|
000301b86180f80002018618f70003cc618200390010ed00000cf70002030618f700020c6180f80002038618f80003018c6180f800021cc618f8000301cc6180f80002018618f7000378618200370010ed00000cf70002030618f70002386180f70001c618f80003018c6180f700016618f8000301866180f80002030618f7
|
|
0003cc618200380010ed00000cf70002030618f70002606180f700016618f8000301fe6180f700016618f8000301866180f80002030618f800040186618200350010ed00000cf70002030330f70001c033f70002186330f700010c33f600016330f80002018633f70002060330f800040186330200370010ed00000cf70002
|
|
030330f80002018033f70002186330f700010c33f70002186330f80002018633f70002060330f800040186330200350010ed00000cf700020301e0f8000201801ef700020cc1e0f700010c1ef700020cc1e0f70001cc1ef700020c01e0f70003cc1e0200350010ed00000cf700021fe0c0f8000201fe0cf700020780c0f700
|
|
010c0cf700020780c0f70001780cf700020c00c0f70003780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d00010200100010ed00000cad0001ffc0f300010200100010ed00000cad0001ffc0f3000102007d0010fe0002
|
|
01fe01fe0007041e0001e0000003fe00660c1fe0c0780c0307f9fe7f9fe1e0301e1fe7f9fe0c0780c0307f8780c0780c0787f9fe1e0307f9fe7f8300c1fe1e1fe7f8780c0787f9fe7f8781e0300c0781e0780c1fe1e0780c0301e0307f8780c1fe7f9fe00078f3dfe1e1fe1e0787f8781e0780c0787f82007d0010fe000201
|
|
8003fe00070c33000330000007fe00660c0301e0cc1e0780c0300c03033078330300c0301e0cc1e0780c0cc1e0cc1e0cc0c030330780c0300c0781e030330300c0cc1e0cc0c0300c0cc330781e0cc330cc1e030330cc1e078330780c0cc1e0300c030000cce1c3033030330cc0c0cc330cc1e0cc0c02007d0010fe00020180
|
|
07fe00071c6180061800000ffe00660c03033186330cc0c0300c030618cc618300c03033186330cc0c18633186331860c030618cc0c0300c0cc33030618300c186331860c0300c186618cc33186619863303061986330cc618cc0c186330300c03000186ccc3061830619860c18661986331860c02007d0010fe000201800f
|
|
fe00073c6000061800001bfe00660c03033180330cc0c0300c030600cc600300c03033180330cc0c18033180331800c030600cc0c0300c0cc33030600300c180331800c0300c180600cc33180601803303060180330cc600cc0c180330300c03000180ccc3060030601800c18060180331800c02007d0010fe000201801bfe
|
|
00076c60000018000013fe00660c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c18661830600300c180619800c0300c180601866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007d0010fe000d01b8330003
|
|
f0cc6e078030003f03fe00660c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c18661830600300c180619800c0300c180601866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007d0010fe000d01cc63000619
|
|
8c730cc0e0006183fe00660c0306199e619860c0300c03060186678300c0306199e619860c1806199e6199e0c030601860c0300c18661830678300c1806199e0c0300c180679866198060180618306018061986601860c180618300c0307f9809e43060030601800c18060180619800c02007c0010fd000c06630000198c61
|
|
986030006003fe00660c0307f9867f9fe0c0300c030601fe618300c0307f9867f9fe0c1807f9867f9860c030601fe0c0300c1fe7f830618300c1807f9860c0300c180619fe7f980601807f830601807f9fe601fe0c1807f8300c030001808043060030601800c180601807f9800c02007c0010fd000c067f8003f9fe619fe0
|
|
18003f03fe00660c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c180619866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007c0010fd000c06030006180c6198061800
|
|
0183fe00660c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c180619866198060180618306018061986601860c180618300c030001809e43060030601800c18060180619800c02007d0010fe000d0186030006180c619806180001
|
|
83fe00660c03061986619860c0300c03061986618300c03061986619860c18661986619860c030619860c0300c18661830618300c186619860c0300c186619866198661986618306198661986619860c186618300c030001869e43061830619860c18661986619860c02007c0010fd000ccc030006180c330c6330386183fe
|
|
00660c030618ce619860c0300c03033186338300c030618ce619860c0cc618ce618ce0c030331860c0300c18661830338300c0cc618ce0c0300c0cc33986618cc330cc61830330cc61986331860c0cc618300c030000cc9e43033030330cc0c0cc330cc618cc0c02007c0010fd007678030003e80c1e07c1e0383f1fe00000
|
|
0c0306187a619860c0300c0301e1861e8300c0306187a619860c0786187a6187a0c0301e1860c0300c186618301e8300c0786187a0c0300c0781e986618781e078618301e078619861e1860c078618300c030000789e4301e0301e0780c0781e078618780c0200100010ed00000cad0001ffc0f300010200100010ed00000c
|
|
ad0001ffc0f300010200100010ed00000cad0001ffc0f300010200100010ed00000cad0001ffc0f300010200100010ed00000cad0001ffc0f3000102000b0010ed00000c9d000102000b0010ed00000c9d00010200790010fa007301e078618787f9861e1861e0000c1fe0c0780c0307f9fe7f9fe1e0301e1fe7f9fe0c0780
|
|
c0307f8780c0780c0787f9fe1e0307f9fe7f8300c1fe1e1fe7f8780c0787f9fe7f8781e0300c0781e0780c1fe1e0780c0301e0307f8780c1fe7f9fe000780c1fe1e1fe1e0787f8781e0780c0787f8200790010fa00730330cc718cc601c633186330000c0301e0cc1e0780c0300c03033078330300c0301e0cc1e0780c0cc1
|
|
e0cc1e0cc0c030330780c0300c0781e030330300c0cc1e0cc0c0300c0cc330781e0cc330cc1e030330cc1e078330780c0cc1e0300c030000cc1e03033030330cc0c0cc330cc1e0cc0c0200790010fa007306198671986601c661986618000c03033186330cc0c0300c030618cc618300c03033186330cc0c18633186331860
|
|
c030618cc0c0300c0cc33030618300c186331860c0300c186618cc33186619863303061986330cc618cc0c186330300c030001863303061830619860c18661986331860c0200790010fa007306018679980601e660186600000c03033180330cc0c0300c030600cc600300c03033180330cc0c18033180331800c030600cc0
|
|
c0300c0cc33030600300c180331800c0300c180600cc33180601803303060180330cc600cc0c180330300c030001803303060030601800c18060180331800c0200790010fa007306018679980601e660186600000c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c1866
|
|
1830600300c180619800c0300c180601866198060180618306018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa00730601866d8c0601b630186300000c03061980619860c0300c03060186600300c03061980619860c18061980619800c030601860c0300c18661830600300
|
|
c180619800c0300c180601866198060180618306018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa00730601866d8787e1b61e1861e0000c0306199e619860c0300c03060186678300c0306199e619860c1806199e6199e0c030601860c0300c18661830678300c1806199e0
|
|
c0300c180679866198060180618306018061986601860c180618300c0307f9806183060030601800c18060180619800c0200790010fa00730601866780c6019e03186030000c0307f9867f9fe0c0300c030601fe618300c0307f9867f9fe0c1807f9867f9860c030601fe0c0300c1fe7f830618300c1807f9860c0300c1806
|
|
19fe7f980601807f830601807f9fe601fe0c1807f8300c030001807f83060030601800c180601807f9800c0200790010fa0073060186678066019e01986018000c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c18061986619806
|
|
0180618306018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa0073060186638066018e01986018000c03061986619860c0300c03060186618300c03061986619860c18061986619860c030601860c0300c18661830618300c180619860c0300c180619866198060180618306
|
|
018061986601860c180618300c030001806183060030601800c18060180619800c0200790010fa0073061986639866018e61986618000c03061986619860c0300c03061986618300c03061986619860c18661986619860c030619860c0300c18661830618300c186619860c0300c1866198661986619866183061986619866
|
|
19860c186618300c030001866183061830619860c18661986619860c0200790010fa00730330cc618cc60186330cc330000c030618ce619860c0300c03033186338300c030618ce619860c0cc618ce618ce0c030331860c0300c18661830338300c0cc618ce0c0300c0cc33986618cc330cc61830330cc61986331860c0cc6
|
|
18300c030000cc6183033030330cc0c0cc330cc618cc0c0200790010fa007301e078618787f9861e0781e0000c0306187a619860c0300c0301e1861e8300c0306187a619860c0786187a6187a0c0301e1860c0300c186618301e8300c0786187a0c0300c0781e986618781e078618301e078619861e1860c078618300c0300
|
|
0078618301e0301e0780c0781e078618780c02000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102000b0010ed00000c9d000102
|
|
0007001f88ff01fe00028500028500a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.4\tab A typical display from the join editor in XBAP.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.8\tab Selecting primers and templates\par
|
|
\par
|
|
\pard\plain \qj \f4\fs16 {\plain \f20 1. Select "Edit contig". The primer and template selection function is available from the popup menu of the contig editor.\par
|
|
}\pard \qj {\plain \f20 \par
|
|
}\pard \qj {\plain \f20 2. Open the oligo selection window, by selecting "Select Oligo" from the contig editor popup menu.\par
|
|
}\pard \qj {\plain \f20 \par
|
|
}\pard \qj {\plain \f20 3. Position the cursor where you want the oligo to be chosen. While the oligo selection window is visible, you retain complete control over positioning and editing within the contig editor.\par
|
|
}\pard \qj {\plain \f20 \par
|
|
}\pard \qj {\plain \f20 4. Indicate the strand for which you require an oligo. This is done by toggling the direction arrow ("----->" or "<------"), if necessary.\par
|
|
}\pard \qj {\plain \f20 \par
|
|
}\pard \qj {\plain \f20
|
|
5. Press the "Find Oligos" button to find all suitable oligos (see "Oligo selection" in Note 17). Information for the oligo closest to the cursor position is given in the output text window. In the contig editor the position of the oligo is marked by a temporary tag on the consensus. The window is recentered if the oligo is off the screen. Selecting "Display Selection Information" prints a short report on the numbers of oligos considered and rejected during oligo selection. \par
|
|
}\pard \qj {\plain \f20 \par
|
|
}\pard \qj {\plain \f20 6. If this oligo is not suitable (for example, it may have been chosen previously and found by experiment to be unsuitable), the next closest oligo can be viewed by pressing "Select Next". \par
|
|
}\pard \qj {\plain \f20 \par
|
|
}\pard \qj {\plain \f20
|
|
7. Suitable templates are automatically identified for the currently displayed oligo (see "Template selection" in Note 18). By default, the template chosen is the one closest to the oligo site. If this choice is not suitable (for example, it may be known to be a poor quality template), another can be chosen from the "Choose Template for this Oligo" menu. Templates that do not appear on the menu can be specified by selecting "other". However, the template must be on the correct strand and be upstream of the oligo. \par
|
|
}\pard \qj {\plain \f20 \par
|
|
}\pard \qj {\plain \f20
|
|
8. A tag can be created for the current oligo by pressing the button "Create a tag for this oligo". The annotation for this tag holds the name of the template and the oligo primer sequence. There are also fields in which users can enter their own primer name ("serial#") and comments ("flags") for this tag. An example of an oligo tag annotation\: \par
|
|
}\pard \qj {\plain \f20 \par
|
|
serial#= \par
|
|
template=a16a9.s1 \par
|
|
sequence=CGTTATGACCTATATTTTGTATG \par
|
|
flags=\par
|
|
\par
|
|
}\pard \qj {\plain \f20 9. The oligo selection window is closed when "Create a tag for this oligo" or "Quit" is selected. \par
|
|
}\pard \qj {\plain \f20 \par
|
|
}\pard\plain \s6\qj\sa60\tx560\tx860 \b\f20 \par
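{\pard\plain \s4\qj\sa120\sl280 \f20 The tag annotation shown in step 8 is a short list of "key=value" lines, so it is easy to read back outside the program, for instance when compiling a list of primers to order. The following is only a minimal sketch in Python, assuming the annotation text has been copied out of the tag exactly as in the example above; the function name parse_oligo_tag is hypothetical and is not part of XBAP.\par
\pard\plain \sl220 \f4\fs16 
```python
def parse_oligo_tag(annotation):
    """Split the 'key=value' lines of an oligo tag annotation into a dict.

    Hypothetical helper: the field names (serial#, template, sequence,
    flags) are taken from the example annotation in the text; empty
    fields are kept as empty strings.
    """
    fields = dict()
    for line in annotation.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between fields
        # Split at the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


# The example annotation from step 8, one field per line.
EXAMPLE_TAG = """serial#=
template=a16a9.s1
sequence=CGTTATGACCTATATTTTGTATG
flags="""

tag = parse_oligo_tag(EXAMPLE_TAG)
print(tag["template"], tag["sequence"], len(tag["sequence"]))
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Fields that the user left blank, such as "serial#" and "flags" in the example, are returned as empty strings rather than being dropped, so every tag yields the same set of keys.\par
}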
|
|
\pard \s6\sa60\sl280\tx560\tx860 2.9\tab Examining the "quality" of a contig\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This function reports the proportion of the consensus that is "well determined" and either lists a sequence of symbols indicating the quality of the consensus at each position or produces a graphical display. Each strand of the contig is analysed separately using the consensus algorithm, and a position is declared "well determined" if it is assigned one of the symbols a,c,g,t. The current consensus calculation cutoff score is used.\par
|
|
\pard \s4\qj\sa120\sl280 A summary is shown giving the percentage of the consensus that falls into each quality category. The analysis divides the data into five categories, each assigned a code as shown in figure 4.5. Code 0 means well determined on both strands and the strands agree; 1 means well determined on the plus strand only; 2 means well determined on the minus strand only; 3 means not well determined on either strand; and 4 means well determined on both strands but the strands disagree. If the user chooses to have the data displayed graphically, the following scheme is used. A rectangular box is drawn so that its x axis spans the length of the contig. The box is notionally divided vertically into five levels, with y values -2, -1, 0, 1 and 2. The quality codes assigned to each base position are plotted as rectangles whose vertical extent depends on the code (figure 4.5). Each rectangle represents a region in which the quality codes are identical, so a single base with a different code from its immediate neighbours appears as a very narrow rectangle. A region drawn as only a single line at mid-height (code 0) is well determined on both strands, and the strands agree. Figure 4.6 shows the result for the section of contig shown in figure 4.8.\par
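{\pard\plain \s4\qj\sa120\sl280 \f20 The five categories can be expressed as a small decision rule applied to each position. The following Python sketch is illustrative only and is not the program's own code. It assumes that the per-strand consensus symbols are already available as two strings of equal length, that the minus strand consensus is written in the same orientation as the plus strand so that agreement is simple equality, and that any symbol other than a,c,g,t (here "-") marks a position that is not well determined. The names quality_code and quality_codes are hypothetical.\par
\pard\plain \sl220 \f4\fs16 
```python
# Symbols that count as "well determined" according to the text.
WELL_DETERMINED = set("acgt")


def quality_code(plus_symbol, minus_symbol):
    """Return the quality code (0 to 4) for one consensus position.

    Assumption: both symbols refer to the same orientation, so two
    well determined strands agree exactly when the symbols are equal.
    """
    plus_ok = plus_symbol in WELL_DETERMINED
    minus_ok = minus_symbol in WELL_DETERMINED
    if plus_ok and minus_ok:
        # 0: both strands good and in agreement, 4: both good but different.
        return 0 if plus_symbol == minus_symbol else 4
    if plus_ok:
        return 1  # well determined on the plus strand only
    if minus_ok:
        return 2  # well determined on the minus strand only
    return 3      # not well determined on either strand


def quality_codes(plus_consensus, minus_consensus):
    """Apply quality_code position by position along two consensus strings."""
    return [quality_code(p, m) for p, m in zip(plus_consensus, minus_consensus)]


# Invented example strings, "-" standing for a poorly determined position.
print(quality_codes("acgt-a-a", "acg-tac-"))
```
\par
}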
|
|
\pard \s4\qj\sa120\sl280 \par
|
|
\par
|
|
\par
|
|
\par
|
|
\par
|
|
\pard \s4\qj\li1580\ri1760\sb160\sl280\box\brsp100\brdrth \tqc\tx2000\tqc\tx3960\tqc\tx6360 \tab {\b Strands\tab Quality\tab Y coordinates\par
|
|
}\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx2000\tqc\tx3960\tqc\tx6200 {\b \tab OK\tab code\par
|
|
}\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx1780\tqc\tx2120\tx2380\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab +\tab -\tab and the same\tab 0\tab 0\tab to\tab 0\par
|
|
\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx1780\tqc\tx2120\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab +\tab \tab 1\tab 0\tab to\tab 1\par
|
|
\tab -\tab \tab 2\tab -1\tab to\tab 0\par
|
|
\pard \s4\qj\li1580\ri1760\sa120\sl280\box\brsp100\brdrth \tqc\tx2120\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab neither\tab 3\tab -1\tab to\tab 1\par
|
|
\pard \s4\qj\li1580\ri1760\sa60\sl280\keepn\box\brsp100\brdrth \tqc\tx1780\tqc\tx2120\tx2400\tqc\tx3960\tqr\tx6000\tx6220\tqr\tx6740 \tab +\tab -\tab but different \tab 4\tab -2\tab to\tab 2\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.5\tab The codes and coordinates used by the "Quality plot". \par
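{\pard\plain \s4\qj\sa120\sl280 \f20 The way the graphical display is built from the codes in figure 4.5 can be sketched as a run-length grouping: consecutive positions with the same code are merged into one rectangle whose y range is taken from the table. The Python below is a minimal sketch under that reading of the text and is not the program's own plotting code; the names Y_RANGE and quality_rectangles are hypothetical, and positions are assumed to be numbered from 1 unless another starting position is given.\par
\pard\plain \sl220 \f4\fs16 
```python
# y range for each quality code, copied from figure 4.5.
Y_RANGE = dict([
    (0, (0, 0)),    # both strands good and the same: midline only
    (1, (0, 1)),    # plus strand only
    (2, (-1, 0)),   # minus strand only
    (3, (-1, 1)),   # neither strand
    (4, (-2, 2)),   # both strands good but different
])


def quality_rectangles(codes, first_position=1):
    """Group equal neighbouring codes into (start, end, y_low, y_high).

    start and end are inclusive base positions, so a single base that
    differs from its neighbours gives a rectangle one base wide.
    """
    rectangles = []
    start = 0
    for i in range(1, len(codes) + 1):
        # Close the current run at the end of the data or when the code changes.
        if i == len(codes) or codes[i] != codes[start]:
            y_low, y_high = Y_RANGE[codes[start]]
            rectangles.append(
                (first_position + start, first_position + i - 1, y_low, y_high)
            )
            start = i
    return rectangles


print(quality_rectangles([0, 0, 2, 0, 3, 3]))
```
\par
}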
|
|
\par
|
|
\pard\plain \li1500\ri1660\sb400\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 94.67 % OK on both strands and they agree(0)\par
|
|
\pard \li1500\ri1660\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth 0.67 % OK on plus strand only(1)\par
|
|
2.00 % OK on minus strand only(2)\par
|
|
2.67 % Bad on both strands(3)\par
|
|
0.00 % OK on both strands but they disagree(4)\par
|
|
\pard \li1500\ri1660\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth {\fs22 \par
|
|
}\pard \li1500\ri1660\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth 3310 3320 3330 3340 3350\par
|
|
0000000000 0000000000 0000000000 0000000000 0000000000\par
|
|
\par
|
|
3360 3370 3380 3390 3400\par
|
|
0020000000 0000000032 0000032000 0000000000 0300000030\par
|
|
\par
|
|
3410 3420 3430 3440 3450\par
|
|
\pard \li1500\ri1660\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth 0000000000 0010000000 0000000000 0000000000 0000000000\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.6\tab Listed output from "Examine Quality" showing the results for the section of contig displayed in figure 4.8.\par
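{\pard\plain \s4\qj\sa120\sl280 \f20 The percentages in the summary are simply the share of positions carrying each code. As a check on that reading, the Python sketch below recomputes the summary from the three rows of codes listed in figure 4.6 (150 positions in all). It is an illustration only and not the program's own code; the names LABELS, FIGURE_ROWS and summarise are hypothetical, and the output format merely imitates the listing.\par
\pard\plain \sl220 \f4\fs16 
```python
# Category labels in code order 0 to 4, worded as in the listed output.
LABELS = [
    "OK on both strands and they agree",
    "OK on plus strand only",
    "OK on minus strand only",
    "Bad on both strands",
    "OK on both strands but they disagree",
]

# The three rows of quality codes from figure 4.6 (positions 3301 to 3450).
FIGURE_ROWS = [
    "0000000000 0000000000 0000000000 0000000000 0000000000",
    "0020000000 0000000032 0000032000 0000000000 0300000030",
    "0000000000 0010000000 0000000000 0000000000 0000000000",
]


def summarise(codes):
    """Return one summary line per category: percentage, label and code."""
    total = len(codes)
    lines = []
    for code, label in enumerate(LABELS):
        percent = 100.0 * codes.count(code) / total
        lines.append("%5.2f %% %s(%d)" % (percent, label, code))
    return lines


# Drop the spaces that separate the blocks of ten and keep the digits.
codes = [int(ch) for row in FIGURE_ROWS for ch in row if ch.isdigit()]
summary = summarise(codes)
for line in summary:
    print(line)
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 With these 150 codes the counts are 142, 1, 3, 4 and 0 for codes 0 to 4, which reproduces the percentages 94.67, 0.67, 2.00, 2.67 and 0.00 given in the listing.\par
}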
|
|
\pard \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.10\tab Using graphical displays to examine contigs\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The programs contain three graphical displays to aid the examination of contigs. The first gives an overview of all the contigs in the database and, through a crosshair, provides the means of selecting the contig shown in the other two displays. The second produces a schematic representation of each of the readings in a contig. The lines in this display show the relative position of each reading and also its sense. The plot is divided vertically into two sections by a line that is identified by an asterisk drawn at each end. All lines that lie above this line represent readings that are in their original sense; all lines below it show readings that are in the complementary sense. The final graphical display is of the "quality" of the data as described above.\par
|
|
\pard \s4\qj\sa120\sl280
|
|
When these graphical displays are visible, users may move a crosshair, using the mouse or keyboard commands, to examine the data in more detail. Once the crosshair is positioned, typing the keyboard character S, Q, N or Z will, respectively, show the local aligned sequences in a text window, produce the quality plot, give the names of the nearest readings or zoom into the display. \par
|
|
\pard \s4\qj\sa120\sl280 A typical display of all three plots is shown in figure 4.7. The top rectangle shows a separate line for each of the project's contigs. The right-hand one is bisected by a vertical line indicating that it has been selected by the user. The next rectangle below is divided by a horizontal line marked at each end by an asterisk. Each of the other horizontal lines in the box represents one of the selected contig's gel readings. Those above the dividing line are in their original orientation; those below have been complemented. The box below is also divided by a horizontal line and shows the "quality" for each base in the contig. Rectangular areas marked above the central line show sections that only have a good consensus on the minus strand, and rectangles below show good sections from the other strand. Places where the vertical lines reach the top and bottom of the box show disagreements between the two strands. Places with only the midline have a good consensus on both strands.\par
|
|
\pard\plain \li80\sl220\keepn\tx720 \f4\fs16 {{\pict\macpict\picw441\pich231
|
|
237effffffff00e601b81101a0008201000affffffff00e601b8090000000000000000310000000000e501b79800780000000001f103bb0000000001f103bb0000000000e501b70001028900028900028900028900028900090100158e550054ff000901000f8eff00f0ff000d0100089b000008f5000010ff000d0100089b
|
|
000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b00
|
|
0008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b0000
|
|
08f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff005d14000801ffc0003ff803ff801ffc00007fff003fff80fb00003ffdff0ef000007fffe0
|
|
003ffc03ffff00001ffdff00c0fd00003ffdff03f000007ff8ff04f003fffffefe00003ffcff00f0fc00000ffdff00fcfd000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff
|
|
000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff00
|
|
0d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d
|
|
0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff005813000fff007fffe00ffe00fff007ffffc001ffe000faff00e0fd000e1fffffc0003fffe007fe0001fffff0fd00007ffdff00e0fd00031fffffc0f800041ffe000003feff00e0fc00001ffcff00f8fd000007f0ff00
|
|
f0ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010
|
|
ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff
|
|
000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000d0100089b000008f5000010ff000901000f8eff00f8ff0009010008
|
|
8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009
|
|
0100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0018010008ba00007ffaff038000003ffaff00
|
|
f8e9000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00
|
|
090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001d010008c40000
|
|
1ffbff00e0f7000007faff00fcef000007f9ff00f0ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009010008
|
|
8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001d
|
|
010008e2000307fffffcd600000ffaff00feee000007faff01fe10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010
|
|
ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e
|
|
000010ff00090100088e000010ff001f010008fc000001fbff00c0c000007ffaff00c0ee000007fbff02fe0010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00
|
|
090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e0000
|
|
10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001a02000801f9ff00f8c000faff02c00000faff00f0eb000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e0000
|
|
10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100188e000030ff00090100188e000030ff00090100188e000030ff00090100188e00023000000b020718
|
|
e09000030e31c0000b020718e09000030e31c0000b020789e09000030f13c0000a0100ff8f000101feff00090100188e000030ff000901003f8eff02f800000b0201db8090000303b700000b020799e09000030f33c0000b020718e09000030e31c0000b020618609000030c30c000090100188e000030ff000901003c8e00
|
|
0078ff00090100188e000030ff00090100188e000030ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100
|
|
088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001d01000ffaff00c0d5000007f0ff00e0f3000001f0ff00e0f6000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00
|
|
090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e0000
|
|
10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001c02000803fbff00f8c400007ff0ff0200000ff9ff0080f1000010ff00090100088e000010ff00090100088e000010ff00090100088e
|
|
000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff000901
|
|
00088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001f010008f800f1ff00c0d500007ffbff00c0ed00003ffbff00c0f8000010ff00090100088e000010ff00090100088e0000
|
|
10ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009010008
|
|
8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff001f010008e900007ff3ffe2000007fbff00fcea00007ffaff00f0fc000010ff
|
|
00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e00
|
|
0010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0019010008dc000001f2ff00feef
|
|
000003faff0080df000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0009010008
|
|
8e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff0019
|
|
010008db00007ffbff00f0de000003faff00c0e8000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff000901
|
|
00088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff00090100088e000010ff
|
|
000901000f8eff00f0ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001
|
|
f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce
|
|
000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f800
|
|
0008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff00190100
|
|
08f8000008ce000001f3000010fc000004e1000010ff004203000fffb0fa00034800027fefff03fc02065fe7ff03f3800001fe00133fffffc201880000105177000001042408006002fe0001425ff4ff00fcfb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe001320
|
|
00244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410
|
|
ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe0013200024420188
|
|
0000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff00520300
|
|
0873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177
|
|
000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa00
|
|
0348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe001320002442018800001051770000010424
|
|
08006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240
|
|
ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe
|
|
00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402
|
|
065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe
|
|
000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc0002
|
|
6c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe000301000040fe0003e0000004fb0002400410ff005203000873b0fa000348000240ef00030402065cfc00026c00c0ef000313800001fe00132000244201880000105177000001042408006002fe00014250fe0003010000
|
|
40fe0003e0000004fb0002400410ff000901000f8eff00f0ff005703000873b8fc00052800c8000240ef00030402067cfc00026c00c0ef000313800001fe00136000244201880800105177000001043408086002fe000163f0fe000301000040fe0006e0000004000010fe0002400410ff004c03000852b8fc00022800c8e9
|
|
000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe00
|
|
00e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc0012400024000188
|
|
08001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc0002
|
|
2800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe00030100
|
|
0040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc00124000
|
|
2400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852
|
|
b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe
|
|
000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc
|
|
001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c
|
|
03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd00
|
|
0163f0fe000301000040fe0000e0fc000010fe0002400410ff004c03000852b8fc00022800c8e9000020fc00026c0040ef00010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc000010fe0002400410ff004a03000852b8fc00022fffc8e9000020fc00026c0040ef00
|
|
010180fc001240002400018808001041770000010434000820fd000163f0fe000301000040fe0000e0fc00001ffcff00f0ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010
|
|
ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1
|
|
000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc00
|
|
0004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff0019010008f8000008ce000001f30000
|
|
10fc000004e1000010ff0019010008f8000008ce000001f3000010fc000004e1000010ff000901000f8eff00f0ff00028900028900028900028900028900028900a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.7\tab A typical graphical display from XBAP or SAP.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \par
|
|
2.11\tab Disassembling contigs\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
Sometimes it is necessary to alter contigs drastically. We may need to break a contig in two, remove a single reading, remove a whole set of consecutive readings from a contig, or remove a set of readings from the database regardless of which contigs they are in.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.11.1\tab Removing a single reading\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This function is found in the "Alter relationships" menu. The user types in the number of the reading to be removed. If the reading is required to hold the contig together, i.e. it is the only one covering a particular region, the program will create an extra contig consisting of the data to the right of the removed reading. The original contig will be shortened accordingly.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.11.2\tab Removing a set of readings\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This function is called "Disassemble readings" and can remove any group of readings from a database. It works in two modes\: 1. a set of adjacent readings in a contig can be removed by the user naming the two end ones (the left one first); 2. a set of readings from any number of contigs can be removed by the user giving the name of a file that contains their names. In both modes the program cleans up the database by moving data to fill any holes made in the files.\par
For both modes of operation the program requests a file of file names. If the user creates their own file (i.e. mode 2), each reading name must be on a separate line of the file. For mode 1 the user names the leftmost then the rightmost reading for removal. They MUST be in left to right order. They and all intervening readings will be removed. For both modes, if necessary, new contigs will be created.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.11.3\tab Breaking a contig\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This function is found in the "Alter relationships" menu. It can be used to break a contig at the beginning of a particular reading so that the identified reading becomes the left end of a new contig. The user types in the number of the reading that will become the left end.\par
|
|
\pard \s4\qj\sa120\sl280 \par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.12\tab Shuffling pads\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 One weakness of the assembly routine is that padding characters introduced to line up the readings are not always aligned with the pads in other sequences\: a single problem such as a compression can give rise to pads apparently randomly arranged in the different readings covering the region. This function attempts to shuffle the pads around so that they align with one another, hence simplifying editing. No information is lost in the process\: only the positions of padding characters are changed. The function is best used prior to editing.\par
|
|
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.13\tab Displaying a contig\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The "Display a contig" option shows the aligned readings for any part of a contig. Users select "Display a contig", then select the contig. The number, name and strandedness of each reading are shown and the consensus is written below. A typical example, showing part of a contig from positions 3301 to 3450, is seen in figure 4.8. Overlapping this region are readings 3, 40, 8, 37, 35 and 2, with archive names L3.SEQ, A21A7.S1 and so on. Readings 3, 8, 35 and 2 are in reverse orientation, as indicated by the minus signs. There are a few padding characters in the working versions, but the consensus (shown below each page width) has a definite assignment for every position except 3376.\par
|
|
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.14\tab Highlighting differences between readings and the consensus\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
During the latter stages of a project this option is used to highlight disagreements between individual gel readings and their consensus sequences. Typical output is seen in figure 4.9, which shows the result for the section of contig shown in figure 4.8. Characters that agree with the consensus are shown as + symbols for the plus strand and - for the minus strand. Characters that disagree with the consensus are left unchanged and so stand out clearly. Note that a similar display is now more conveniently available within the contig editor.\par
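The substitution rule described above can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the function name highlight and its arguments are hypothetical, and the reading and consensus are assumed to be already aligned strings of equal length.\par

```python
def highlight(reading, consensus, reverse=False):
    """Replace characters that agree with the consensus by a strand symbol."""
    good = "-" if reverse else "+"
    shown = []
    for r, c in zip(reading, consensus):
        # Agreeing characters are masked; disagreeing ones are left unchanged.
        shown.append(good if r == c else r)
    return "".join(shown)
```

Applied to reading 37 (A21A2.S1) and the consensus for positions 3351 to 3400 in figure 4.8, the sketch leaves only the dash at position 3353 and the pad at position 3376 visible.\par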
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Set the consensus cutoff score.\par
|
|
2.\tab Redirect output to disk.\par
|
|
3.\tab Display the contig.\par
|
|
4.\tab Close the redirection file.\par
|
|
5.\tab Select "Highlight disagreements".\par
|
|
6.\tab Define the name of the redirection file.\par
|
|
7.\tab Define an output file name.\par
|
|
8.\tab Select a symbol for good plus strand data.\par
|
|
9.\tab Select a symbol for good minus strand data.\par
|
|
10.\tab Print the file.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \page \par
\pard\plain \li760\ri760\sl220\box\brsp100\brdrth \tqr\tx8240 \f4\fs16 \tab 3310 3320 3330 3340 3350\par
|
|
\pard \li760\ri760\sl220\box\brsp100\brdrth -3\tab L3.SEQ \tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
40\tab A21A7.S1\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
-8\tab A16A2.S1\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
37\tab A21A2.S1\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
\tab CONSENSUS\tab atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
\par
|
|
\tab 3360 3370 3380 3390 3400\par
|
|
-3\tab L3.SEQ\tab gatctgaccaagcgacag*tttaaa*gtgctgcttgccatt*ctgcgt*a\par
|
|
40\tab A21A7.S1\tab gatctgaccaagcgacag*gttaaagttgctgctt\par
|
|
-8\tab A16A2.S1\tab gatctgaccaagcgacag*tttaaa*gtgctgcttgccatt*ctgcgt*a\par
|
|
37\tab A21A2.S1\tab ga-ctgaccaagcgacag*tttaaa*gtgctgcttgccatt*ctgcgt*a\par
|
|
35\tab A16D12.S1\tab gttttaaa-gtgctgcttgccatttctgcgtaa\par
|
|
-2\tab L2.SEQ\tab t*ctgcgt*a\par
|
|
\tab CONSENSUS\tab gatctgaccaagcgacag*tttaaa-gtgctgcttgccatt*ctgcgt*a\par
|
|
\par
|
|
\tab 3410 3420 3430 3440 3450\par
|
|
-3\tab L3.SEQ\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
|
|
-8\tab A16A2.S1\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
|
|
37\tab A21A2.S1\tab aaacctatgggtgggaataaaccaatggacagaatcaccgattctcaact\par
|
|
35\tab A16D12.S1\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
|
|
-2\tab L2.SEQ\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
|
|
\pard \li760\ri760\sl220\box\brsp100\brdrth \tab CONSENSUS\tab aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.8\tab Typical output from "Display contig".\par
|
|
\pard\plain \li840\ri940\sb320\sl220\box\brsp100\brdrth \f4\fs16 3310 3320 3330 3340 3350\par
|
|
\pard \li840\ri940\sl220\box\brsp100\brdrth -3 L3.SEQ --------------------------------------------------\par
|
|
40 A21A7.S1 ++++++++++++++++++++++++++++++++++++++++++++++++++\par
|
|
-8 A16A2.S1 --------------------------------------------------\par
|
|
37 A21A2.S1 ++++++++++++++++++++++++++++++++++++++++++++++++++\par
|
|
atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
\par
|
|
3360 3370 3380 3390 3400\par
|
|
-3 L3.SEQ -------------------------*------------------------\par
|
|
40 A21A7.S1 +++++++++++++++++++g+++++gt++++++++\par
|
|
-8 A16A2.S1 -------------------------*------------------------\par
|
|
37 A21A2.S1 ++-++++++++++++++++++++++*++++++++++++++++++++++++\par
|
|
-35 A16D12.S1 -t----------------------t------a-\par
|
|
-2 L2.SEQ ----------\par
|
|
gatctgaccaagcgacag*tttaaa-gtgctgcttgccatt*ctgcgt*a\par
|
|
\par
|
|
3410 3420 3430 3440 3450\par
|
|
-3 L3.SEQ --------------------------------------------------\par
|
|
-8 A16A2.S1 --------------------------------------------------\par
|
|
37 A21A2.S1 ++++++++++++g+++++++++++++++++++++++++++++++++++++\par
|
|
-35 A16D12.S1 --------------------------------------------------\par
|
|
-2 L2.SEQ --------------------------------------------------\par
|
|
\pard \li840\ri940\sl220\keepn\box\brsp100\brdrth aaacctatgggt*ggaataaaccaatggacagaatcaccgattctcaact\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 \par
|
|
\pard \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 Figure 4.9\tab Typical output from "Highlight disagreements", showing the results for the section of contig displayed in figure 4.8.\par
|
|
\pard \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.15\tab Screen editing contigs in SAP\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 When using SAP the best way for users to edit a whole contig interactively is to use their preferred external editor on the standard display of a contig. When the screen edit function is selected SAP writes a text file containing a display of the contig and passes it to an external editor, say EDT on the VAX or emacs on a UNIX system. The user modifies the file using the editor, and when the editor is exited SAP moves the changed contig back into the project database.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Screen edit".\par
|
|
2.\tab Select the contig to edit.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define a temporary file for use by the editor. After a slight pause the editor will start and the first page of the contig will appear on the screen.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Edit the contig using the editor's standard commands.\par
|
|
5.\tab Exit from the editor.\par
|
|
6.\tab Accept "Put contig back into the database".\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.16\tab Automatic editing of contigs in SAP\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This function automatically changes characters in gel readings to make them agree with the consensus sequence. At first sight this may seem like an unethical procedure but, as explained in the notes, it is quite legitimate and saves a great deal of time. In figure 4.10 we show the effect of using autoedit on the section of contig displayed in figure 4.8. All changed characters (for example position 3369, reading A21A7.S1) are denoted by uppercase letters. Note that, apart from position 3375, which has an unresolved consensus, all other changes have been made. These edits were made using a combined consensus for both strands, but the standard version of the program treats each strand separately and will only make a change if the consensus sequences for the two strands agree.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Redirect output to disk.\par
|
|
2.\tab Select "Display contig".\par
|
|
3.\tab Identify the contig to edit/display.\par
|
|
4.\tab Close the redirection file.\par
|
|
5.\tab Print the file containing the displayed contig.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Check the contig and the original films and annotate the printout to indicate the required edits.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Set the cutoff for the consensus calculation.\par
|
|
8.\tab Select "Auto edit".\par
|
|
9.\tab Identify the contig and the section to edit. \par
|
|
10.\tab The program will display a summary of changes made.\par
|
|
11.\tab Display the contig and compare it with the annotated printout.\par
|
|
12.\tab Use another editing method to finish the editing.\par
|
|
\pard\plain \li820\ri960\sl220\pagebb\box\brsp100\brdrth \f4\fs16 3310 3320 3330 3340 3350\par
|
|
\pard \li820\ri960\sl220\box\brsp100\brdrth -3 L3.SEQ atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
40 A21A7.S1 atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
-8 A16A2.S1 atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
37 A21A2.S1 atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
CONSENSUS atggttacgccagactatcaaatatgctgcttgaggcttattcgggcgca\par
|
|
\par
|
|
3360 3370 3380 3390 3400\par
|
|
-3 L3.SEQ gatctgaccaagcgacagtttaaa*gtgctgcttgccattctgcgtaaaa\par
|
|
40 A21A7.S1 gatctgaccaagcgacagTttaaagGtgctg\par
|
|
-8 A16A2.S1 gatctgaccaagcgacagtttaaa*gtgctgcttgccattctgcgtaaaa\par
|
|
37 A21A2.S1 gaTctgaccaagcgacagtttaaa*gtgctgcttgccattctgcgtaaaa\par
|
|
-35 A16D12.S1 gtttaaa-gtgctgcttgccattctgcgtaaaa\par
|
|
-2 L2.SEQ tctgcgtaaaa\par
|
|
CONSENSUS gatctgaccaagcgacagtttaaa-gtgctgcttgccattctgcgtaaaa\par
|
|
\par
|
|
3410 3420 3430 3440 3450\par
|
|
-3 L3.SEQ cctatgggtggaataaaccaatggacagaatcaccgattctcaacttag\par
|
|
-8 A16A2.S1 cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par
|
|
37 A21A2.S1 cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par
|
|
-35 A16D12.S1 cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par
|
|
-2 L2.SEQ cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc\par
|
|
\pard \li820\ri960\sl220\keepn\box\brsp100\brdrth CONSENSUS cctatgggtggaataaaccaatggacagaatcaccgattctcaacttagc{\fs22 \par
|
|
}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 4.10\tab The result of applying the "Auto editor" to the section of contig displayed in figure 4.8.\par
|
|
\pard\plain \s6\sb400\sa60\sl280\tx560\tx860 \b\f20 2.17\tab Using the original editor in SAP\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This simple editor can insert, delete
|
|
and change gel reading sequences by performing one selected operation at a time. It is used during the interactive entry of new readings and interactive joining of contigs. The commands request the position at which the edit is required and the number of
|
|
characters to insert, delete or change.\par
|
|
\pard\plain \s5\sb400\sa160\sl320\tx560 \b\f20\fs28 3. NOTES\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
|
|
As each reading is entered into a project database it is given a unique number. The first is numbered 1, the second 2 and so on. Their original file names (known as "archives" because they are kept outside the database and never edited) are also copied into the database. During assembly contigs are constantly being changed and reordered, so the program identifies them by the numbers or names of the readings they contain. Whenever the program asks users to identify a contig or reading they can type its number or its archive name. If they type its archive name they must precede the name by a slash "/" symbol to denote that it is a name rather than a number. For example, if the archive name is fred.gel with number 99, users should type /fred.gel or 99 when asked to identify the contig. Generally, when it asks for a contig to be identified, the program will offer the user a default name, and if the user types only return, that contig will be accessed. When a database is opened the default contig will be the longest one, but if another is accessed, it will subsequently become the current default.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab An XBAP database is made from five separate files\: the "archive names" file *.ARN, the "relationships" file *.RLN, the "sequences" file *.SQN, the "tag" file *.TGN, and the "comments" file *.CCN. If the database is called FRED then version 0 of database FRED comprises files FRED.AR0, FRED.RL0, FRED.SQ0, FRED.TG0 and FRED.CC0. The version is the last symbol in the file names. If the "copy database" option is used it will ask the user to define a new "version". The normal strategy is to use version 0 for all work and to use other versions as backups. Program SAP uses databases formed from only the first three of these files. Normally the program is used to handle DNA sequences but many of the functions also work on protein sequences. The choice of sequence type is made when the database is started.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab The vector sequence should be stored in a simple text file with up to 80 characters of data per line. \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
|
|
Almost all readings are assembled automatically in their first pass through the assembly routine. Those that are not can be dealt with in two ways. Either they can be put through assembly again as single named readings (users should type n when asked "Use file of file names"), with the parameters set to allow the reading in, or they can be entered through the assembly routine using the "Put all readings in new contigs" mode, and then joined to the contig they overlap using the Contig Joining Editor. If it is found that readings are not being assembled in their first pass through the assembler, then it is likely that the contigs require some editing to improve the consensus. It may also be that poor quality data is being used, possibly by users overinterpreting films or traces. In the long term it can be more efficient to stop reading early and save time on editing. For those using fluorescent sequencing machines the unused data can be incorporated after assembly.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Obviously we cannot use a script to operate a program that expects to be controlled by mouse clicks! The program BAP is an xterm version of XBAP which can be used from a script.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab There is a remote possibility of a join being missed by the "Find internal joins" routine. If a small contig is wholly contained within a larger one, such that its ends are further than ("Probe length" - "Minimum initial match length") from the ends of the larger contig, and the consensus for the small contig lies to the left of the consensus for the large contig, the overlap will not be discovered. (See the search strategy).\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab For those using fluorescent sequencing machines and XBAP, the combination of the contig editor and the graphical displays of consensus "quality" will probably be sufficient for checking and editing contigs, as everything can be done at the computer screen. For those using autoradiographs, the facility to produce printouts from the "display" and "highlight disagreements" options for use while checking films, together with the autoedit command, is most appropriate.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab
|
|
In general the quality of a reading deteriorates along the length of the gel and so it is also possible to use a length cutoff for the quality calculation. Only the data from the first section of each reading will be included in the calculation. \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab
|
|
There are some limitations on the changes that can be made to the contigs when using the SAP screen editor. Alignments must be maintained during editing. Whole lines of sequence should not be deleted or added unless the order of the gel readings in the contig is preserved. Each line in the contig display consists of gel reading numbers, their names and 50 character sections of sequence. Insertions are limited in the following way. No line of sequence can be extended rightwards more than 5 characters beyond the end of a full length line (a full length line is 50 characters long). Only one character can be added to the left end of full length lines, but sections of sequence beginning further into a line can be extended leftwards up to an equivalent position. Do not delete any non-sequence lines in the file. Before returning the contig to the database the program checks that the rules have been obeyed. If an error is found the number of the erroneous line in the file is displayed and the contig will not be changed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab
|
|
The following is a justification for using the auto edit function. The general strategy employed when collecting shotgun sequence data is to keep sequencing until the redundancy in the contigs is fairly high, and then to get a printout of a contig, check problems against the films, note corrections on the printout, and make the changes using an interactive editor. In general the consensus is correct except for places where padding characters have been used to accommodate a single gel with an extra character, or where the consensus is a dash. The important point for the auto editor is that most edits simply make the gel readings conform to the consensus, or remove columns of pads. The auto editor first calculates a consensus for the contig (or part of a contig) to be edited, and then uses this consensus to direct the editing of the contig in three stages. Stage 1\: find and correct all places where, if the order of two adjacent characters is swapped, they will both agree with the consensus (given that they did not match the consensus before). These corrections are termed "transpositions". Stage 2\: find and correct all places where there is a definite consensus but the gel reading has a different character. These corrections are termed "changes". Stage 3\: delete all positions in which the consensus is a padding character. These corrections are termed "deletions". All changed characters are shown in uppercase letters so it will be obvious which characters have been assigned by the program (except for deletions). The number of each type of correction will be displayed.\par
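\tab The three stages can be sketched for a single reading as follows. This is a simplified illustration under stated assumptions, not the program's implementation: the function name auto_edit is hypothetical, the reading and consensus are assumed to be aligned lowercase strings of equal length, * is taken as the padding character and - as the unresolved consensus symbol.\par

```python
def auto_edit(reading, consensus, pad="*", unresolved="-"):
    """Edit one aligned reading toward the consensus; return (edited, counts)."""
    chars = list(reading)
    counts = dict(transpositions=0, changes=0, deletions=0)

    # Stage 1: swap two adjacent characters when the swap makes both agree.
    i = 0
    while i + 1 < len(chars):
        a, b = chars[i], chars[i + 1]
        definite = consensus[i] in "acgt" and consensus[i + 1] in "acgt"
        if definite and a == consensus[i + 1] and b == consensus[i] and a != b:
            chars[i], chars[i + 1] = b.upper(), a.upper()
            counts["transpositions"] += 1
            i += 2
        else:
            i += 1

    # Stage 2: where the consensus is definite, make the reading agree with it.
    for i, c in enumerate(consensus):
        if c != pad and c != unresolved and chars[i].lower() != c:
            chars[i] = c.upper()
            counts["changes"] += 1

    # Stage 3: delete every position at which the consensus is a pad.
    kept = [ch for ch, c in zip(chars, consensus) if c != pad]
    counts["deletions"] = len(chars) - len(kept)
    return "".join(kept), counts
```

\tab In this sketch deletions are counted per reading; characters assigned by the sketch appear in uppercase, as in figure 4.10, and positions with an unresolved consensus are left untouched.\par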
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 11.\tab The "calculate consensus" function, the "display contig" routine, the contig editor and the "show quality" option use the rules outlined here to calculate a consensus from aligned gel readings. The consensus sequence can contain any of 6 possible symbols\: a,c,g,t,* and -. The last symbol is assigned if none of the others makes up a sufficient proportion of the aligned characters at a position in the contig. The following calculation is used to decide which symbol to place in the consensus at each position. Each uncertainty code contributes a score to one of a,c,g,t,* and also to the total at each point. Symbols like r and y, which do not correspond to a single base type, contribute only to the total at each point.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab Definite assignments, i.e. A,C,G,T,a,c,g,t,b,d,h,v,k,l,m,n,* = 1\par
\tab Probable assignments, i.e. 1,2,3,4 = 0.75\par
\tab Other uncertainty codes, including r,y,5,6,7,8,- = 0.1\par
\tab A cutoff score between 51 and 100% is set by the user (when the program starts this is set to 75%). At each position in the contig we calculate the total score for each of the 5 symbols a,c,g,t and * (denote these by Xi, where i=a,c,g,t or *), and also the sum of these totals (denote this by S). Then if 100 Xi / S > the cutoff for any i, symbol i is placed in the consensus; otherwise - is assigned. For the "examine quality" algorithm each strand is treated separately but the calculation is the same.\par
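\tab The calculation for a single position can be sketched as below. The scores and the 75% default come from the note above; everything else is an assumption made for illustration. In particular, the source does not say which base each uncertainty code stands for, so the two mapping tables are hypothetical, and S is taken here to include the 0.1 contributions of codes such as r and y.\par

```python
# Assumed mapping of definite codes to the symbol they score for.
DEFINITE = dict(zip("acgt*dvbhklmn", "acgt*ctagctag"))
# Assumed mapping of the "probable" codes 1-4 to bases.
PROBABLE = dict(zip("1234", "ctag"))

def consensus_symbol(column, cutoff=75.0):
    """Return the consensus symbol for one column of aligned characters."""
    scores = dict.fromkeys("acgt*", 0.0)
    total = 0.0
    for ch in column:
        key = ch.lower()
        if key in DEFINITE:
            scores[DEFINITE[key]] += 1.0
            total += 1.0
        elif key in PROBABLE:
            scores[PROBABLE[key]] += 0.75
            total += 0.75
        else:
            # r, y, 5-8, "-" and similar codes count only toward the total.
            total += 0.1
    if total == 0.0:
        return "-"
    for symbol in "acgt*":
        # Symbol i wins when 100 * Xi / S exceeds the cutoff.
        if 100.0 * scores[symbol] / total > cutoff:
            return symbol
    return "-"
```

\tab With the default cutoff, a column of three a characters and one t gives exactly 75%, which does not exceed the cutoff, so the sketch returns a dash; three a characters and one - give about 97% and return a.\par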
|
|
12.\tab Databases can become corrupted if the machine crashes, so the programs contain a function "Check database for logical consistency" which checks to see if all the relational data is internally consistent. Some routines automatically perform this check before they start. Users are advised to make frequent copies of their databases using the "Copy database" option. Note that if BAP is used in "execute with dialogue" mode the "Check logical consistency" function also creates a consensus for the whole database and scans it to find any regions which contain 15 dashes in 20 characters. Such a finding would indicate problems with the database.\par
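\tab The scan for dash-rich regions can be pictured as a sliding window over the consensus. The sketch below is an illustration only: the function name dash_rich_windows is hypothetical, and "15 dashes in 20 characters" is interpreted as at least 15 dashes in any window of 20 consecutive positions.\par

```python
def dash_rich_windows(consensus, window=20, min_dashes=15):
    """Return 1-based start positions of windows with too many dashes."""
    hits = []
    for start in range(len(consensus) - window + 1):
        # Count the unresolved positions in this stretch of consensus.
        if consensus[start:start + window].count("-") >= min_dashes:
            hits.append(start + 1)
    return hits
```

\tab An empty result means no suspicious region was found; a consensus shorter than the window is never flagged by this sketch.\par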
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\pagebb\tx560 13.\tab
|
|
We have covered many of the most important or complicated operations performed by SAP and XBAP, but several others have not been mentioned. These include those for creation of consensus sequence files for processing by other programs, and complementing contigs, both of which are trivial. There is also a set of routines for fixing corrupted databases.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 14.\tab The VAX version of SAP will only allow one person to access a sequencing database at a time, producing an "unable to open database" error message if a second person tries. On UNIX machines there is no such check in program SAP, so users need to make sure that simultaneous use does not occur. Otherwise the data will be corrupted. Program BAP prevents more than one person from using a database at any time. It does so using the following mechanism. When a user requests to open a particular copy (say 0) of a database (say DB) the program checks for the existence of a file named DB_BUSY0 in the current directory. In normal circumstances, if the file exists, it indicates that somebody else is currently using the database, and the program displays the message "Sorry database busy" and does not open the files. If the file does not exist the program creates it and opens the database. When a user stops using the database (usually by quitting the program) the "busy file" is deleted, hence allowing others to use the database. If the program terminates abnormally the busy file will not be deleted, and so the database will not be usable until the busy file is explicitly deleted using the rm command. Obviously it is dangerous to delete the file before checking if another user is using the database.\par
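\tab The busy-file mechanism can be sketched as follows. This is not BAP's code: the function names are hypothetical, and where the text describes a check followed by a create, the sketch substitutes a single exclusive create (the O_EXCL flag) so that two users cannot both pass the check at the same moment.\par

```python
import os

def busy_file_path(database, version, directory="."):
    """Build the busy-file name, for example DB_BUSY0 for copy 0 of DB."""
    return os.path.join(directory, database + "_BUSY" + str(version))

def try_open(database, version, directory="."):
    """Create the busy file and return True, or return False if busy."""
    path = busy_file_path(database, version, directory)
    try:
        # Fails if the file already exists, i.e. the database is in use.
        handle = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(handle)
    return True

def release(database, version, directory="."):
    """Delete the busy file so that others can open the database."""
    os.remove(busy_file_path(database, version, directory))
```

\tab A caller would report "Sorry database busy" when try_open returns False, and call release on quitting. As in the text, a busy file left behind by an abnormal termination has to be removed by hand.\par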
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 15.\tab After a run of the assembly routine, reading names can appear in the file of failed reading names for the following reasons. 1. The reading file was not found; 2. the reading file was too short (less than the minimum match length); 3. the reading appeared to match somewhere but failed to align sufficiently well (too many padding characters or too high a percentage mismatch); 4. a reading of the same name was already present in the database; 5. the reading was entered but also appeared to match another contig and the join was not made. This can occur for two reasons\: a. because the overlap between the two contigs was too large, or b. because after the reading is entered into one contig a new consensus is calculated and compared to the other contig\: it may then not match as well as it did originally, and the join will not be made.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab We have recently devised our own file format (called SCF) for storing traces, sequences and confidence values for data produced by automated sequence readers (Dear and Staden, 1992). For ABI data this format typically reduces the storage required to 30% of the original. Data from the ABI 373A and the Pharmacia A.L.F. can be converted to this form using the program makeSCF. Note that A.L.F. files must first be processed by program alfsplit, which splits the original data into one file per reading. Sequences can be extracted from SCF files in a form suitable for assembly by use of the program trace2seq. To locate and mark regions of a sequence from an automated sequence reader that are of too low a quality to be used for assembly we use the script clip-seqs. This script takes as input a file of reading file names. For each reading it renames the original file "original-filename~" and writes a new file called "original-filename" in which the poor quality regions are marked.\par
|
|
\pard\plain \qj \f4\fs16 {\plain \f20 \par
|
|
}\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 17.\tab The oligo selection engine is the one used in the program OSP. It is described in some detail in Hillier, L., and Green, P. (1991). The parameters controlling the selection of oligos can be changed in the "Oligo Selection Parameters" window. The weights controlling the scoring of selected oligos can be changed in the "Oligo Selection Weights" window. By default, the oligos are selected from a window that extends 40 bases either side of the cursor. The size and location of this window relative to the cursor position can be changed in the "Parameters" window. In XBAP oligos are ranked according to their proximity to the cursor position, rather than by their scores.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 18.\tab For simplicity, each reading is considered to represent a template. In practice, many readings can be made from the same template. Suitable templates are those that\: 1. are in the appropriate sense, 2. have 5' ends that start upstream of the oligo, and 3. are sufficiently close to the oligo to be useful. This last criterion relates to the insert size for the subclones used for sequencing and the average reading length. A template is considered useful if a full reading can be made from it, taking into account both of these factors. The default insert size is 1000 bases, and the default average reading length is 400 bases. These values can be changed in the "Parameters" window.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1982. Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. {\i Nucl. Acids Res}. {\b 10 }(15)\:4731-4751.\par
|
|
2.\tab Staden, R. 1990. An improved sequence handling package that runs on the Apple Macintosh. {\i Comput. Applic. Biosci}. {\b 4}, 387-393.\par
3.\tab Dear, S. and Staden, R. 1991. A sequence assembly and editing program for efficient management of large projects. {\i Nucl. Acids Res}. {\b 19}, 3907-3911.\par
4.\tab Hillier, L. and Green, P. 1991. OSP\: an oligonucleotide selection program. {\i PCR Methods and Applications}. {\b 1}, 124-128.\par
5.\tab Dear, S. and Staden, R. 1992. A standard file format for data from DNA sequencing instruments. {\i DNA Sequence}. {\b 3}, 107-110.\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 5. Analysing Sequences to Find Genes\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1720 2.1\tab The uneven positional base frequencies method.\par
|
|
2.2\tab The positional base preferences method\par
|
|
2.3\tab The codon usage method\par
|
|
2.4\tab Searching for open reading frames\par
|
|
2.5\tab Searching for tRNA genes\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 We outline three methods for finding protein genes and one for locating tRNA genes, plus routines for finding open reading frames and displaying the positions of stop codons. All the methods are contained in the program NIP. Correct interpretation of the analyses requires a good understanding of the ideas underlying the methods. Nevertheless, we concentrate here on the use of the techniques and refer the reader to earlier publications (1-5) for more background information.\par
|
|
\pard \s4\qj\sa120\sl280 The assumption made by the methods for finding protein genes is that protein coding regions, when analysed in terms of three-letter nonoverlapping "words", will look different from noncoding regions analysed in the same way. Suppose we analyse a sequence in one reading frame and count its codons. We then define the "positional base composition" as the frequency with which each of the four base types occupies each of the three positions in codons. In coding regions the positional base frequencies will be less random than they are in noncoding regions. This is the basis of method 1, the "Uneven positional base frequencies method". If this reading frame is coding for a protein, the positional base composition will tend towards a particular bias which is common to the majority of genes. This is the basis of method 2, the "Positional base preferences method". If the sequence has a very biased base composition then in protein genes this may affect the choice of amino acids, and will affect the use of bases in the third positions of codons. This bias is also utilised by the positional base preferences method. Finally, if the reading frame is coding for a protein its use of codons is also likely to be nonrandom, and this is the basis of method 3, the "Codon usage method".\par
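The definition of positional base composition given above can be illustrated with a short sketch. It is not taken from NIP\: the function name, the T, C, A, G column order and the handling of a trailing partial codon are assumptions made for this example only.\par

```python
from collections import Counter

BASES = "TCAG"  # column order is an assumption made for this sketch


def positional_base_composition(seq, frame=0):
    """Return three rows (codon positions 1, 2, 3) of base frequencies.

    Each row lists the fraction of T, C, A and G observed at that codon
    position when seq is read in the given frame (0, 1 or 2).
    """
    seq = seq.upper()
    counts = [Counter(), Counter(), Counter()]
    # Ignore a trailing partial codon so every position is sampled equally.
    usable = len(seq) - (len(seq) - frame) % 3
    for i in range(frame, usable):
        counts[(i - frame) % 3][seq[i]] += 1
    table = []
    for position_counts in counts:
        total = sum(position_counts[b] for b in BASES)
        row = [position_counts[b] / total if total else 0.0 for b in BASES]
        table.append(row)
    return table


if __name__ == "__main__":
    for row in positional_base_composition("ATGGCTGCAAAATGA"):
        print(row)
```

In a coding frame the three rows would be expected to differ from one another, whereas in a noncoding region they should be closer to the overall base composition.\par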
|
|
\pard \s4\qj\sa120\sl280 All the methods perform their analyses over segments of the sequence of size "window", and then move the window on by three bases and repeat the calculation. The "Uneven positional base frequencies" method produces only a single value for each segment and hence cannot distinguish between frames or strands; it measures only the probability that a region is coding. The other two methods produce different values for each of the three potential reading frames and hence can help to decide which is coding. Their results are plotted in three separate boxes arranged one above the other. For these we also indicate which of the three reading frames is the highest scoring at each position along the sequence. This is done by plotting a single dot at the mid-height of the box that contains the highest score, so that if one frame is the highest scoring for many consecutive positions, the dots will produce a solid line at the mid-height of its box. We also mark the positions of stop codons. These are represented by short vertical lines centred on the mid-height of each box. Start codons are marked at the base of the box for each reading frame.\par
|
|
\pard \s4\qj\sa120\sl280 The search for tRNA genes involves looking for segments that could fold into the cloverleaf structure and which have the expected conserved bases in the appropriate positions.\par
|
|
\pard \s4\qj\sa120\sl280 Notice that we have not mentioned searches for relevant "signals" such as promoters or splice junctions, which are also useful for finding genes. These searches are described in the chapter on searching for motifs. In the current chapter the only "signal" we include is the stop codon. However, as all results are presented graphically, it is easy for users to overlay the displays of signal searches with those presented here and so effectively combine them.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab The uneven positional base frequencies method.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This method produces a single value for each segment of the sequence, and would give the same result if applied to each reading frame or to the complementary strand. The results are plotted in a box that is cut by a horizontal line. This line is labelled 76%, and we expect 76% of noncoding sequences to score below this line and 76% of coding sequences to score above it. Of the methods described this one makes the fewest assumptions and so is a good unbiased indicator of the probability that a sequence is coding.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Uneven positional base frequencies".\par
|
|
2.\tab Define "Odd window length". \par
|
|
3.\tab Define "Plot interval".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 5.1. In the example shown the 5' end of the sequence codes for several proteins and the 3' end codes for ribosomal RNAs.\par
|
|
\pard\plain \li100\sb300\sl160\keepn \f4\fs16 {\plain \fs16 {\pict\macpict\picw436\pich41
|
|
1103ffffffff002801b31101a00082a0008c01000affffffff002801b3070000000022000100010000a000a0a100a400020de801000a00000000000000000700010001220027000100da23000021000101b22300002300262300002100270001230000a000a301000affffffff002801b32300da21000101b2230026210027
|
|
0001a000a12000170001001701b2220025000100df2300032300062301002300fb2300fd2300022300fe2302032300ff2300002300fe2300fd2301002300032300002300fd2302022300042300002300052300002301fd2300002300032300002300012302fd2300fd2300002300fd2300ff2301fe23000023000023000223
|
|
00062302012300fc2300032300012300002301052300062300fa2300f82302ff2300fb2300002300002300002301002300002300002300002300002302002300002300032300052300012301092300ff2300042300022300002302042300fa2300fc2300fe2300022301002300fd2300002300002302032300002300fe2300
|
|
ff2300012300002300022301012300fc2300062300012300032302ff2300002300032300fe2300022301082300f92300fd2300032302022300032300fb2300fa2300002302ff2300fe2300002300fc2300fe2301f92300fb230000230000230000230200230000230000230000230000230100230000230000230000230003
|
|
2302fd2300002300002300002301002300002300002300002300002302002300002300002300002300002301002300002300002300032300032302082300fa23000723000223000423010b2300f82300fd2300fa2300fc2302fe2300fd2300fc2300fe23010023000023000023000023000023020023000023000023000023
|
|
00002301002300002300002300002300002300022300fe2302002300002300002300002301002300002300002300002300002302022300002300fe2300002300052301002300fb2300002300062300fc2302032300012300002300fc2300032301002300012300ff2300012300032302022300fb2300fd2300ff2302fe2300
|
|
032300ff2300042300ff2301fb2300002300002300002300002302002300002300022300032300fe2301002300052300012300032300ff2302042300052300042300002300082301fb2300fc2300fb2300fa2302ff2300fd2300002300fa2300012301ff2300042300ff2300032300012302fc2300012300032300fd230006
|
|
2301032300032300032300022300062300fd2302fb2300032300f92300002300002301fb2300f72300022300002300002302002300fe2300ff2300fe2300002301002300052300fe2300ff2300002302fe2300002300002300022300072301fa2300032300ff2300042302ff2300fa23000023000323000323010023000623
|
|
00fd2300032300fb2302032300ff2300012300fd2300052302042300fa2300fd2300ff2300002301f82300002300032300fd2300002302002300002300002300002301032300092300062300022300032302fd2300fa2300fe2300fd2300ff2301fd2300fb2300062300022300fe2302fc2300012300062300fc2300032301
|
|
032300fe2300002300022300032302002300032300002300032300fe2300022301fe2300052300032300fe2300022302fa2300032300fa2300fe2300002301062300032300ff2300fe2302fd2300ff2300f22300022300fb2301022300fe230000230000230000230200230000230000230000230000230100230000230000
|
|
2300002300002302002300002300002300002300022301062300012300032300002302fd2300022300062300042300fd2301002300fd2300032300fc2300fd2302fb2300002300fa2300022300002302fe2300002300002300002300002301002300002300002300002300052302042300022300002300fe23000323010323
|
|
00002300002300022302fd2300002300fd2300fe2300ff2301032300062300012300032300f72300002300fa2302062300032300fd2300fd2301ff2300fe2300002300fc2300fe2302002300002300002300002300002301002300022300032300012300002302ff2300042300002300082300042301032300052300fa2300
|
|
012300ff2302fe2300fc2300042300032301002300fd2300fa2300062300002302fd2300ff2300032300fa2300fe2301092300fc2300fe2300032300002302052300fa2300fe2300fc2300032301fd2300002300002300fa2300fe2302032300fd2300052300032301032300f82300032300fc2300072302062300fa230005
|
|
2300002300032302fd2300fe2300f92300fe2300ff2301012300ff2300fb2300032300002300052302012300ff2300fd2300002300032301002300012300fa2300022300fb23020023000023000023000023000023010023000023000023000023000023020023000023000023000023000023010023000023000023000323
|
|
02032300022300012300062300fd2301032300f92300012300ff2300032302fb2300ff2300012300fd2300032301fd2300022300002300032300042302fd2300062300032300fd2300002301ff2300042300032300002302ff2300f823000b2300fb2300f92301002300002300fe2300fd2300fd2302ff2300fe2300002300
|
|
002300002301002300002300022300002300fe2302002300002300002300062300022302012300ff2300fd2300012300022300002301fd2300012300022300fe2300ff2302fd2300fe230000230000230000230100230000230000230000230000230200230000230000230000230100230000230000230000230000230200
|
|
2300002300002300002300002301002300002300002300002300002302002300002300002300002300002301002300002300002300002300002302002300002300002300022301fe2300002300002300002300002302002300002300002300002300002301002300022300032300062300fd23020423000323000323000023
|
|
00fc2301fe2300022300fb2300fc2300fe2302fd2300002300002300002301002300002300002300002300002302002300002300002300002300002300002300002301002300002300002300002300002302002300002300002300002302002300002300002300002300002301002300002300002300002300002302002300
|
|
002300002300002300002301002300002300002300032300032302fa2300032300ff2300fe2300022301092300012300062300ff2302002300012300062300022300fb2301002300002300fd2300032300fd2302fd2300032300052300062300fd2301fe23000a2300fb2300032300fd230203230003230000230000230000
|
|
2301fc2300fb2300fd2300f82302002300022300002300012300032301022300032300012300022300002302002300f82300082300fa2300012301002300052300002300fe2300002300042302fe2300fe2300ff2300032300032301002300fb2300072300002300fd23020423000023000023000023000023010023000023
|
|
00002300002300002302002300002300002300fd2300ff2302042300002300002300fd2301022300fd2300fe2300fa2300032302002300002300002300fe2300022301002300032300032300f823000a2302012300fd2300002300032300002301ff2300fe2300002300002300032302002300002300fd2300032301002300
|
|
fa2300032300002300ff2302032300012300f82300002300002301022300032300032300002300002302002300002300fa2300fe2300052301fd2300fd2300032300002300022300002302002300012300032300fd2300022301012300fd2300ff2300032300fb2302fe2300022300002300052300f8230101230000230002
|
|
2300002300022302002300fb2300fd2300012301ff2300012300032300042300fb2302032300fb2300032300042300fc2302fa2300032300002300002300072301f92300022300012300fa23000a2302042300002300ff2300fd2300002301f52300012300fd23000223020023000023000123000223000323010523000023
|
|
00032300012300002302002300002300002300f52300002301ff2300032300052300fe2300fe2302ff2300fd2300002300062300062301002300ff2300002300fd2302032300f92300ff2300032300fd2300002300fd2301042300ff2300fb2300002300022302002300fe2300032300ff2300012301022300fd2300032300
|
|
032302fe2300072300fd2300012300002301002300fd2300062300002300fd2302fb2300002300ff2300fd2300002301042300fd2300ff2300012300022302002300fb2300fd2300052300042301fc2300012300ff2300012302fd2300032300fa2300062300022302002300fe2300022300fb2300022301fe230003230005
|
|
2300fd2300032302022300fb2300fd2300002300042301fc2300012300082300032300002302002300002300ff2300002301012300002300002300fd2300fd2302002300022300fe2300052300f62301ff2300fe2300ff2300062300032300fe2300022302fb2300fd2300062300042301fe23000323000323000023000023
|
|
02ff2300fb2300002300032300022301f92300fa2300fc2300fb2300002302ff2300072300002300ff2300002301fe2300062300022300092300fd2302002300022300fe2300032301022300002300012300002300fc2302fe2300062300fc2300042300002301002300002300002300002300ff2302fe2300032300fc2300
|
|
012300032301002300002300ff2300002300002302fe2300002300ff2300042302fc2300042300002300ff2300fd2301012300ff2300042300002300002302002300002300002300002300002301002300002300002300fd2300002300022302012300fd2300022300fb2300fb2301032300fc23000b2300fd230000230204
|
|
2300ff2300fe2300022300fd2301fe2300002300fd2300002300fd2302fe2300032300fc2300032300fe2301ff2300042300fc2300012302002300002300022300062300fd2301032300052300fd2300012300ff2302012300fd2300fe2300022300062301fc2300032300fb2300022300fc2302042300002300fb2300fa23
|
|
00002301fb23000623000a2300012302022300fe2300032300fa2300022301fe2300fe2300ff2300fb23000a2302002300012300022300012300002302002300fd2300fd2300032300fa2301012300ff2300012300fd2300052302002300fb2300052300fb2300ff2300fe2301022300032300032300fe2300022302002300
|
|
002300002300032300002301fd2300fe2300f72300f923000023020123000623000b2300002301002300fe2300002300fd2300022302fe23000b2300fa2300022300032301fd2300f92300072300042300002302002300ff2300012300fd2300fa2301002300012300042300032300fb2302fe2300fd2300032300022301fd
|
|
2300052300042300ff2300012302002300002300002300fc2300042301fd2300022300fd2300fe2300fe2302fd2300032300002300fa2300052301032300002300002300fa2300012302032300052300022300002301fd2300fe2300fe2300002300022302fa2300fe2300fa2300032300022300fe2300032302fa23000623
|
|
00ff2300032301fa2300032300032300012300ff2302fe2300052300052300f92300022301fb2300062300ff2300012300002302052300fb2300fc2300002300fb2301032300062300072300012300fa2302fd2300082300f82300fd2300002301012300072300fc2300042302fe2300fa2300f82300022300042301022300
|
|
fe2300022300002300fe2302002300fd2300002300f92300fd2301092300002300092300012300022302002300032300002300032300002301002300f52300fa2300032302fd2300002300fc23000923000b2301f92300002300fd2300fa2300fd2302062300052300012300ff2300032300fb2301072300042300002300ff
|
|
2300fe2302022300012300002300002300002302002300002300fc2300fea0008da00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.1.\tab Example output from the uneven positional base frequencies method. The 5' end codes for proteins and the 3' end contains ribosomal RNA genes.\par
|
|
\pard\plain \s6\sb360\sa60\sl280\tx560\tx860 \b\f20 2.2\tab The positional base preferences method\par
|
|
\pard\plain \s4\qj\sa120\sl260 \f20 As a result of the genetic code and the relative frequencies with which amino acids are used in proteins, DNA sequences coding for proteins have a particular bias in their positional base frequencies. This method scans DNA sequences and measures how closely the positional base frequencies of each reading frame match this bias. The closeness to the expected bias is expressed as a "score". By default the program will use a "global" set of expected values for the positional base frequencies, which are derived from average amino acid compositions in known proteins. Alternatively, users may create their own set of expected values by analysing known genes from the same genome. In addition, users can combine the "global" values for the first two positions in codons with third position values derived from other genes of the same genome.\par
|
|
\pard \s4\qj\sa80\sl260 In order to use a nonglobal standard, a codon table in the format described in the chapter on statistical analysis of nucleic acid sequences can be created using the method "Creating a codon usage file". Alternatively, a section of the sequence being analysed can be scanned to produce an internal standard. The method is particularly useful for selecting which reading frame is coding.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.2.1\tab Using the global standard\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Positional base preferences method".\par
|
|
2.\tab Select "Standard source" as "Global".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Window length". The default length of 67 should be used for most cases. Shorter windows give noisier plots and the longer the window the more chance there is of missing a short exon.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Plot interval".\par
|
|
\pard\plain \s4\qj\sa120\sl260 \f20 The plot will appear as in figure 5.2. This shows a 10,000 base section of sequence that codes for several proteins in each of the three reading frames. See the introduction for an explanation of the plotting scheme used.\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb300\sa120\sl240\keepn\tx1140 \f21\fs20 {{\pict\macpict\picw447\pich225
|
|
0d7effffffff00e001be1101a0008201000affffffff00e001be090000000000000000310000000000df01bd98002400000000008d012000000000008d011f0000000000df01bd000102dd0006007fdfff00fc140040ed000e01f000e1ffffebffff87ff83d40004140040ed000e0110009200002a00008800425c00041400
|
|
40ed000e2908009200002c00007800442a0004140040ed000e5a08008c0000140000400044220004170040f3000008fc00068608010c000010fd000324220004170078f3000008fc0002860501fe000010fd000328020004130040f3000008fc0002800701f9000328020004130040f3000014fc0002800101f90003300100
|
|
04150040f30008140000100000800087f9000310010004150040f300081400001800010000a4f9000310010004130040f300081400002400010000e4f700018004130040f30008240000240001000018f700018004130040f30008220000240001000018f700018004130040f30008220000220002000018f7000180041300
|
|
40f30008220000420002000008f700018004140040fa000002fb0005210000420002f400018004140040fa000002fb000541000042001cf400014004160040fa000003fd0007440041040042001cf400014004170040fb00011003fd0007cc00410c00410024f4000140041d1476befc5eafdbeff59adfb1e0d6ddbbc5ad0f
|
|
e1bd24f600031000f7bc1d1476befc5eafdbeff59edfb1e0d6fdbbc5ff0fe1bd20f600031000f7bc1d1476befc5eafdbeff59affb1e0d6ffbbc5ff0fe1bd40f600031000f7bc1b1476befc5eafdbeff59fffb1e0d7ffbbffffdfffbdbff4ff01f7bc1a014008fd000e2288080a0000010380800299008180f4000120041901
|
|
400efd000d2288100a00000102810000690081f30001200419016016fd000d5588100200000102810000650081f30001101419016012fd000d5588100100000100410000050081f30001101419016022fd000d9508100100000100410000050081f3000110141a026021b0fe000d8d08100100000600410000030081f30001
|
|
102c1a136041c80000030808200100000600410000020041f300010c2c1a1350410e2800030806200100000a00210000020041f300010c6c1a135081015900020006200100000800210000020049f300010aec190e5081008700040005a0014200080022fd000036f300010304180e4880000700040005a000c200080022fd
|
|
000012f2000004140e48800004c0240001a000c600080022ed000004140045fe000aa03400004000a600080012ed000004140045fe000aa048000040002602c8001aed000004140045fe000a1048000040002a03480014ed00000413007dfe00011080fd00041a04480014ed000004120043fe000011fc00041904280010ed
|
|
000004100042fe000015fc0002016428eb000004100042fe00000dfc0002016430eb0000040f0042fe00000afc00010198ea0000042523400a00000a44013c4001109a0034842208e0400200808100020806088001c094080800042501400afe001e44013c4001109a0034842208e0400200808100020806088001c0940808
|
|
000406007fdfff00fc0a0040fb00000ce60000040a0040fb00000ae60000040a0040fb000012e60000040a0040fb000011e60000040b0040fc00010191e60000040b0078fc000101a1e60000040b0040fc00010941e6000004100040fc0002094080fc000010ed000004100040fc00020a0080fc000010ed000004100040fc
|
|
00020e0080fc000018ed000004100040fc00020e00c0fc00001ced000004100040fc00020e0040fc000024ed000004130040fc00020a0040fc000324000002f0000004130040fc0002080040fc000322000006f0000004130040fc0002100020fc000322000006f0000004130040fc0002100020fc000322000005f0000004
|
|
130040fc0002100020fc000322000009f0000004140040fd000318100020fc000322000009f0000004170040fd000318100020fc0006220000090000c0f300000425235dea924fb4a5900076f67fdddb6f23effd311f5fe9f8769dc2bbc579fa7e5fd7e7f7fd7c25235dea924fb4a6600076f67fdddb6f63effd311f5fedf8
|
|
769dc2bbc579fa7e5fd7e7f7fd7c25235debd24fb4a6600076f67fdddb6f63effd209f5fedf8769dc2bbc579fa7e5fd7e7f7fd7c25045debfa4fb4feff1bf6f67fdddb6f7feffd3f9f5feff8769dc2bbc579fa7e5fd7e7f7ff7c1b1440010800004020001000003000004100002080021af4000102041e1740020400004000
|
|
001000002a00004100002080021a000004f7000102041e1760020700004000000800002e000040800020800419400004f7000165042017600205000080000008200022002040c0004080840140000af90003100055a4201760040100008000000a680041805280c0024040c400a4000bf9000310004d6420045004008001fe
|
|
000f0eac0041825280c00340413800a6000bf900032800806c20045008008001fe000f079200418355802803804100002a0011f900032800803c21045008008001fe001002120081834d803402004100001a0030c0fa00032400801421044808008002fe001002118180458d003404004200001a0040c0fa00034400801421
|
|
044808008002fc000e41005c8000030400420000110080a0fc000540004400800421044808004002fc000e6100548000029400240000010080a0fc0005a0004400800421044410004004fc0002220064fe000894002c000001010020fc0005a0004201000423044410004004fc0002220020fe001358001000000101002400
|
|
0010000120008201000420044210002004fc000012fc000068fd000e0101002c00001000011000820100041f044210003014fc000012fc000060fc000d82003c00001800021000820200041f047a20000818fc00001efc000060fc000d82000200081c02021000820200041f044220000838fc000010fc000020fc000d8200
|
|
0200142c020410110204000417044240000828f0000d44000200146205040a1101080004170442400008e8f0000d4400020022a315040a1101880004160342400005ef000d48000100e2a335040d2a01700004250642d00307440c06fe001910a040025000c00000040800340401018100c8a4456a21741304250643d00304
|
|
440c06fe001910a040025000c00000040800340401fe010088dc45ac2074130406007fdfff00fc0a0043fe000008e30000040a0043fe000008e30000040a0043fe000008e30000040e044280000414fc000010e90000040e044280000494fc000018e900000410047a80000776fc0002188020eb00000410044280000756fc
|
|
0002148030eb00000410044480000402fc0002278030eb00000410044440000802fc0002278048eb00000410044440000801fc0002264048eb00000410044440000801fc0002224048eb00000410044440000801fc0002224048eb00000410044440000801fc0002204048eb00000410044820001001fc0002205848eb0000
|
|
0410044820002001fc0002405948eb0000041105482000400080fd0002402588eb0000041105482000400080fd0002c02588eb0000041105481000800080fd0002800588eb0000041205501000800040fe000301000684eb0000042523701fefe001cb3d2bffeb00020629f73b0ef1c60fef7ddff6f7dfe5f75e54fbacfd37
|
|
34fc2523701fefe001cb3d2bffeb00020629f73b0ef1c61fef7ddff6f7dfe5f75e54fbacfd3734fc2523701fefe001cb3d2bffeb00020229f73b4ef1c62fef7ddff6f7dfe5f75e54fbacfd3734fc25237fffefffffcb3d2bffebfffffe29f73b4ef1c67fef7ddff6f7dfe5f75e54fbacfd373dfc1a05501001000040fe0003
|
|
01000002fe000360000044f3000109b41a05600801000020fe000301000002fe000360000042f300010ab41a05600805000020fe000302000001fe000360000042f300010e141a05600806000020fe000302000001fe000390000042f3000116041a0560080800002cfe000302000001fe000398000082f3000110041a0540
|
|
0908000014fe000012fe000681000088400082f3000110041a05400d08000014fe00001afe000682c00084c00081f3000110041a05400308000002fe00001afe000682400104c00081f3000110041a094002f00000020000082afe000682420103200101f3000120041a0940009000000300003426fe00069c250100200101
|
|
f3000120041b09400080000001000054a4fe000664248100200201f400022020041f0040fc0003c0004364fe000c60288200204201000004000002fa00023040041e0040fc0002a00043fd000c4018820020a40100000c000102fa00023040041e0040fc0002a00040fd000c4010620020a40100000a000102fa0002314004
|
|
1c0040fc0002200080fb000a6200132801000012000142fa00023140041f0078fc0002100080fb000a24001318010000121002cdfd00050800004a80041f0040fc0002100080fb000a14001b10010000111802adfd00050800004a80041f0040fc0002080080fb000a14001c0000880021e802b5fd0005140000ca80041f00
|
|
40fc00010801fa00141400040000740020280c35803400001400014400041f0040fc00010c01fa00141800040000440040240c04803600001400010400042523400000aa0a020ec280020c801021001c809050009204c405501c846aee0573625284900c2523400000aa0a020ffe80020c801021001c8090500091ffc407f0
|
|
1cffebfffff3fffe84900c06007fdfff00fc02dd00a00083ff}}\par
|
|
\pard \s8\qj\fi-1140\li1140\sa120\sl240\tx1140 Figure 5.2\tab Example output from the positional base preferences method. Most of the sequence is coding for proteins.\par
|
|
\pard\plain \s9\fi-560\li860\sb400\sa60\sl280\tx1140 \b\f20 2.2.2\tab Using a nonglobal standard\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Make an appropriate codon usage file as described in the chapter on statistical analysis of nucleic acid sequences.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Positional base preferences method".\par
|
|
3.\tab Select "Standard source" as "Codon usage table".\par
|
|
4.\tab Define "File name of standard". The file will be read and displayed on the screen.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Normalisation" as "Combine with global standard". This option uses the global values for the first two positions of codons combined with the third position values from our codon table. The alternative, "Use observed frequencies", uses all three positions from our codon table. The positional base frequencies to be used will be displayed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Use 1.0 for positional weights". The alternative allows users to give greater or lesser emphasis to any of the three positions by defining weights for each. The program displays the "Expected scores per codon in each frame".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Window length". Windows shorter than the default of 67 may be useful if the bias is sufficiently strong. Look at the "Expected scores per codon in each frame" to help decide.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Plot interval".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Accept "Plot relative scores". This means that for each frame we plot its score divided by the sum of the scores for all three frames. It produces smoother plots than the alternative, "Plot absolute scores", which simply plots the scores for each frame. The minimum and maximum expected scores for the given standard and window length are displayed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Accept "Leave scaling values unchanged". The expected scores just displayed will be used to scale the plots. If required the user can change the scaling values at this point.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 The plot will now appear as in figure 5.2. Typical dialogue is shown in figure 5.3.\par
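The "Plot relative scores" option described in step 9 amounts to a simple normalisation, sketched below. The function name and the treatment of an all-zero window are assumptions made for illustration; this is not the code used by NIP.\par

```python
def relative_scores(frame_scores):
    """Divide each frame score by the sum of the scores for all frames."""
    total = sum(frame_scores)
    if total == 0:
        # Assumption for this sketch: report zeros when nothing scored.
        return [0.0 for _ in frame_scores]
    return [score / total for score in frame_scores]


if __name__ == "__main__":
    # Per-codon expected scores as shown in the dialogue of figure 5.3.
    print(relative_scores([0.785, 0.736, 0.736]))
```

Because the three relative scores always sum to one, a rise in one frame is necessarily matched by a fall in the others, which is one way to understand why the relative plots separate the frames more cleanly than the absolute ones.\par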
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab The codon usage method\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The codon usage method scans along a sequence and measures the closeness of each reading frame's codon composition to an expected set of codons. Of the methods described it is the most sensitive, but it also makes the strongest assumption, namely that we know the approximate codon usage for the genes being searched for. The codon usage will depend on the codon preferences and the amino acid composition of the protein product. For this reason the program contains three methods of "normalisation". The table of codon usage may be used as read ("Observed frequencies"); the table may be transformed to reflect an average amino acid composition ("Normalise to average amino acid composition"); or the table may be transformed to have no amino acid bias ("Normalise to no amino acid bias"). The table can be read from a file produced by "Creating a codon usage file" as described in the chapter on statistical analysis of nucleic acid sequences, or an "internal standard" can be used, in which case the user defines a region of the current sequence and the program calculates the codon usage for that region.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Codon usage method".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Reject "Define internal standard". If an internal standard is used the program will ask for the end points of the segments over which to calculate the codon usage.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "File name of standard". The file will be read and displayed on the screen.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Normalisation" as "Average amino acid composition". The program will display the expected values for each reading frame for the window lengths 21, 31 and 41 codons. \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Window length".\par
|
|
6.\tab Select "Plot interval".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The plot will appear as in figure 5.4. This shows a 10,000 base section of sequence that codes for several proteins in each of the three reading frames. See the introduction for an explanation of the plotting scheme used.\par
|
|
\pard\plain \li1840\ri1980\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Positional base preferences method to find protein genes\par
|
|
\pard \li1840\ri1980\sl220\box\brsp100\brdrth Select standard source\par
|
|
X 1 Use global standard\par
|
|
2 Use internal standard\par
|
|
3 Use codon usage table\par
|
|
? Selection (1-3) (1) =3\par
|
|
? File name of standard=atpase.cods\par
|
|
===========================================\par
|
|
F TTT 21. S TCT 33. Y TAT 15. C TGT 5.\par
|
|
F TTC 55. S TCC 40. Y TAC 40. C TGC 4.\par
|
|
L TTA 8. S TCA 7. * TAA 8. * TGA 0.\par
|
|
L TTG 19. S TCG 12. * TAG 1. W TGG 17.\par
|
|
===========================================\par
|
|
L CTT 22. P CCT 17. H CAT 6. R CGT 73.\par
|
|
L CTC 21. P CCC 4. H CAC 30. R CGC 23.\par
|
|
L CTA 1. P CCA 10. Q CAA 19. R CGA 5.\par
|
|
L CTG 168. P CCG 48. Q CAG 80. R CGG 3.\par
|
|
===========================================\par
|
|
I ATT 47. T ACT 14. N AAT 17. S AGT 8.\par
|
|
I ATC 98. T ACC 54. N AAC 52. S AGC 26.\par
|
|
I ATA 6. T ACA 7. K AAA 85. R AGA 0.\par
|
|
M ATG 75. T ACG 13. K AAG 28. R AGG 0.\par
|
|
===========================================\par
|
|
V GTT 67. A GCT 56. D GAT 41. G GGT 90.\par
|
|
V GTC 29. A GCC 53. D GAC 66. G GGC 66.\par
|
|
V GTA 49. A GCA 59. E GAA 101. G GGA 5.\par
|
|
V GTG 57. A GCG 64. E GAG 41. G GGG 8.\par
|
|
===========================================\par
|
|
Select normalisation\par
|
|
X 1 Use observed frequencies\par
|
|
2 Combine with global standard\par
|
|
? Selection (1-2) (1) =2\par
|
|
T C A G Range\par
|
|
1 0.177 0.211 0.277 0.336 0.159\par
|
|
2 0.271 0.238 0.310 0.182 0.128\par
|
|
3 0.242 0.301 0.168 0.289 0.132\par
|
|
? Use 1.0 for positional weights (y/n) (y) =\par
|
|
Expected scores per codon in each frame\par
|
|
0.785 0.736 0.736\par
|
|
? odd span length (31-101) (67) =\par
|
|
? plot interval (1-11) (5) =\par
|
|
? Plot relative scores (y/n) (y) =\par
|
|
\par
|
|
Minimum maximum range\par
|
|
0.3219 0.3519 0.0214\par
|
|
\pard \li1840\ri1980\sl220\keepn\box\brsp100\brdrth ? Leave scaling values unchanged (y/n) (y) =\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.3\tab
|
|
Typical dialogue from the "Positional base preferences method" using a nonglobal standard in the form of a codon table to specify the values for the third positions in codons.\par
|
|
\pard\plain \s6\sb400\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Searching for open reading frames\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This routine finds all open reading frames of some minimum length and writes its results in the form of an EMBL feature table. \par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find open reading frames".\par
|
|
\pard\plain \sl220\keepn \f4\fs16 {{\pict\macpict\picw442\pich218
|
|
0f42ffffffff00d901b91101a0008201000affffffff00d901b9090000000000000000310000000000d801b898002400000000008d012000000000008d011f0000000000d801b8000102dd0006007fdfff00fc1e0040fb000ef0fe00f26100dc0e004000180ffa40fe00020ffdc0fa0000041f0070fc000f01110101159180
|
|
a412004000280906c0fe0002100240fa0000041f0040fc000f011101010d92808232004008280802a0fe0002100240fa0000041f0040fc000f02110101080a81027200c008288801a0fe0002100040fa0000041f0070fc000f02110081080a89037108d49425900120fe0002200040fa0000041f0040fc000f02090082000a
|
|
8900a118d59445700120fe0002200040fa0000041f0070fc0015040a00c200048900a114dd9446700120000003c00020fa0000041f0040fc0015040a002200048a002124db5446100020000002000020fa0000041f0040fc0015040a002e00044a000123526282100020000002000020fa0000041f0070fc0015040a001000
|
|
004a000121226280000020000002000020fa000004220040fc000b040a001000005a0001212223fe000920000002000020000008fd000004210040fc00010404fd00055c0000a02003fe000920000002000020000018fd000004210070fc00010404fd0005540000a02003fe000910000002000020000018fd000004200040
|
|
fc000004fc0005740000a02001fe000910000002000020000018fd000004210070fc000008fc0005500000c02001fe000e100000040000200000180020000004210040fc000008fc0005100000c00001fe000e1000000400001000001400500000041e0070fc000008f90002c00001fe000e10000004000010000014005000
|
|
00041c0040fc000018f90000c0fc000e1000001400001000001400480000041c0040fc000018f9000040fc000e1000001c00001000002400480000041e067c66de6dd21858f9000040fc000e1e7ff6fc00003dbebfe797cf9ddefc1e066c66de6dd21850f9000040fc000e1e7ff6e400003dbebfe797cf9ddefc1a067c66de
|
|
6dd2185ff3ff02fe7ff6feff08fdbebfff97ff9ddefc1a066c66de6dd21850f3000e1e7ff6e400003dbebfe797cf9ddefc180040fc000010f3000e100000200000094002a40584001004180070fc000010f3000e080000400000094002a40584003004180040fc000010f3000e080000400000094007424804002804180040
|
|
fc000010f3000e0800004000000a400543c804002804180070fc000020f3000e0800004000000a4004033804002804180060fc000020f3000e0800004000000a3004023804002804180070fc000020f3000e04000040000006300c023004002804180060fc000020f3000e040000400000063008003002002804190060fd00
|
|
010220f3000e040000400000063008001002004804190078fd00010220f3000e040000400000060808001002004804190058fd00010220f3000e040000400000060808000002004804190078fd00010220f3000e040000800000020808000002004804190048fd00010320f3000302000080fe000708080000020048041900
|
|
44fd00010520f3000302000080fe00070810000002004804190074fd00010520f3000302000080fe000708100000020044041a0644040000014520f3000302000080fe000704100000020084041a067406008001c520f3000302000080fe000704100000020084041a06440a0080012540f3000302000180fe000704100000
|
|
010084041a06440a0080022540f3000302000180fe000704100000010084041a06720901400224c0f3000301800280fe00070220000001008704252362796940a3daec02e005042000000800400000a70019e403041201200220210005b90484252373f9df7fffdaec02e005042000000800400000a70019ffff0412012003
|
|
e0210005ff04fc06007fdfff00fc180643803f0e1e00c0f2000171eefd000101e0fe0002ff00041906728041111200c0f20002891280fe00010120fe0002810004190642804090a200c0f200028a1280fe00010120fe00028100041906424040a0a10140f20002860180fe00010120fe00028100041a06724080e0618140f3
|
|
000301060180fe00010220fe00028080041a0642408000018120f3000301040180fe00010210fe00028080041a0672298000004920f3000301000080fe000702100000010080041a014419fe00014920f3000301000080fe000702100000010080041a014416fe00012920f3000301000080fe000702100000010080041a01
|
|
7406fe00013620f3000e0100004000000202100000010080041a014406fe00011620f3000e020000400000020210000001004004190074fd00011620f3000e020000400000060210000001004004190044fd00010620f3000e020000400000060210000001004004180044fc000020f3000e02000040000006021000000100
|
|
4004180074fc000020f3000e020000400000060208020002002004180044fc000020f3000e0200004000000a0408020002002004180074fc000020f3000e0200004000000a0408030002002004180048fc000010f3000e0400004000000a0408030002002004180048fc000010f3000e040000400000090408030002002004
|
|
230078fc001d3c7a36ac17fffffdf7dddefebfffb1fc0000768bba9b5c0e85c31a003cfc230068fc001d3c7a36ac17fffffdf7dddefebfffb1fc0000768bba9b5c0e85c39c003cfc230068fc001d3c7a36ac17fffffdf7dddefebfffb1f40000768bba9b5c0ec5c39c003cfc23007ffcff0efc7a36ac17fffffdf7dddefebf
|
|
ffb1feff0bf68bba9f5ffec7c39ffffcfc180048fc000010f3000e100000200000111404c482880010041c0078fc000010f9000040fc000e100000200000111404c484880010041c0068fc000010f9000040fc000e100000200000111404c484880010041c0070fc000010f90000c0fc000e10000020000011340524848800
|
|
10041e0070fc000008f90002c00001fe000e10000020000010b4052454480008041e0070fc000008f90002c00001fe000e10000020000020f405285850000804210070fc000008fc0005400000c00001fe000e10000020000020e805285850000804210050fc000008fc0005640000a00003fe000e10000020000020880628
|
|
7850000804220040fc00010804fd0005640000a00003fe000e100000200000208806287850000804220070fc00010404fd0005640001200003fe000e200000200000208806283050000804230040fc000b040a001000005a0001210203fe000e2000003c0000200800283050000804230070fc000b040a003000005a000123
|
|
0203fe000e200000240000200800280050000804230040fc001d040a002800005a0001230203001000200000240000200800180060000804230040fc001d040a004c00009a0001250302861000200000020000200000180020000404230070fc001d040a0044000099008114830286300120000003c0004000001000200004
|
|
04230040fc001d020900820000890081148302853001200000022000400000100020000404230070fc00180211008200048900c108850445480120000002200040000010fe00010404230040fc000f02110102000a8900c208850429480120fe0005100240000010fe00010404200040fc000f01110102018a808122088484
|
|
294802a0fe0002100240fb00010404200070fc000f011101010291808122004484288906a0fe0002100540fb000104042523400184262c0000949223065500813a00449418898ac68212084805400420800000106d6c2523700184262c0000f4ee22fe7500ff3e007c7c1887fac68212084ff8800420800000106ffc06007f
|
|
dfff00fc070040e0000104fc070070e000010704070040e000010404070040e000010404070070e000010404070040e000010404070070e000010404070040e0000104040b0040e6000008fc000108040b0070e6000008fc000108040b0040e6000008fc000108040b0070e6000008fc000108040b0040e6000008fc000108
|
|
040b0050e6000008fc000108040c0070e600013404fd000108040d0070e6000734040028000008040d0070e6000774040038000008040d0070e6000754070048000008040d0068e60007540700480000080425107fdcef8d2bebf7efdfffc720ffcda7fdfbfeef0fff303dfbe9fc7fdff44eddf6976ef80425107fdcef8d2b
|
|
ebf7efdfffc720ffcda7fdfbfeef0fff303dfbe9fc7fdfb44efdf6976ef00425107fdcef8d2bebf7efdfffc720ffcda7fdfbfeef0fff303dfbe9fc7fdfb44cfdf6976ef00425107fdcef8d2bebf7efdfffc720ffcda7fdfbfeef0fff303dfbedfc7fdffc4ffdfe976efffc140048ed00030800002cfe000784082084840010
|
|
04180078fc000008f300030c00003cfe00078408208486001004180048fc000008f300030c000034fe0007840821034a001004180044fc000008f3000312000034fe0007840811034a002004180074fc000008f3000e12000024000001040811024a0020041c0044fc000018fc000010f9000e12000024000001020811003a
|
|
002004200074fc000018fc000010fe000020fd000e120000240000010210110029002004200044fc000018fc000010fe000020fd000e120000220000010210110001004004210044fd00010418fc000010fe000020fd000e12000022000019021012000100400422017404fe00011424fc000010fe000020fd000e12000022
|
|
000016021012000100400423014406fe00011424fc000018fe00012020fe000e11000042000016021012000100400423017406fe00013a24fc00002cfe0013306080000021000042000016021014000100800425014419fe00072a2400000600002cfe00135250c0000021000042000026021014000100800425014429fe00
|
|
1e4a2400000600042c0020015a50c000002100004200002202100c000100800425237229800000492400000600042a002001de914040002100004200002202100c000100800425234240800000412400000600062a002002d695404000210401420000220210080000808004252372404060604126000006080a6a03300ad6
|
|
8f40c000210601820000220220080000808004252342404090a081460400090c0a4a02b016c18820c001208a01820000200120080000810004251d428040912080c20c00090c1a4102b010418820a001a08a12810000200120fe0002810004251d728040911080c10a00091419812470204100212002c08912811000400120
|
|
fe00028100042523529a212a1190d95e0dcb3aa381ddf873c10835a20ac0972e8338a04801202028048108a42523739a3f2e1e90d9ffffbbfb6381ddfff3c1081fbffec0f7ec83ffffc801e0202804ff08a406007fdfff00fc02dd00a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.4\tab Example output from the codon usage method. Most of the sequence is coding for proteins.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sb400\sa120\sl280\tx560 \f20 2.\tab Define "Minimum open frame in amino acids".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Strands". The alternatives are\: + strand only, - strand only, or both strands. Typical output is shown in figure 5.5.\par
|
|
\pard\plain \li2120\ri2240\sb400\sl220\box\brsp100\brdrth \f4\fs16 FT CDS 525..965 \par
|
|
\pard \li2120\ri2240\sl220\box\brsp100\brdrth FT CDS 956..1789 \par
|
|
FT CDS 2128..2607 \par
|
|
FT CDS 2604..3155 \par
|
|
FT CDS 3159..4709 \par
|
|
FT CDS 4733..5623 \par
|
|
FT CDS 5539..7032 \par
|
|
FT CDS 7044..7454 \par
|
|
FT CDS 7797..8134 \par
|
|
FT CDS complement(2227..2634)\par
|
|
FT CDS complement(2250..3023)\par
|
|
FT CDS complement(3027..3899)\par
|
|
FT CDS complement(3903..4760)\par
|
|
FT CDS complement(4327..4626)\par
|
|
FT CDS complement(4646..5332)\par
|
|
FT CDS complement(5345..5647)\par
|
|
FT CDS complement(5635..6012)\par
|
|
FT CDS complement(6016..6441)\par
|
|
FT CDS complement(6445..7083)\par
|
|
FT CDS complement(7035..7445)\par
|
|
\pard \qj\li2120\ri2240\sl220\keepn\box\brsp100\brdrth FT CDS complement(7406..7777)\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.5\tab Typical output from "Find open reading frames"\par
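\pard\plain \s4\qj\sa120\sl280 \f20 The kind of search described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the program's own code. The function names open_frames and feature_table are hypothetical, an open frame is assumed here to be any stop-free run of codons between two stop codons (or a sequence end), start codons are not required, and the stop codon itself is excluded from the reported span. The program's actual rules may differ.\par

```python
STOPS = ("TAA", "TAG", "TGA")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def open_frames(seq, min_codons):
    """Yield (start, end) 1-based inclusive spans of stop-free codon runs."""
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOPS:
                if (pos - run_start) // 3 >= min_codons:
                    yield run_start + 1, pos
                run_start = pos + 3
            pos += 3
        # A run may also be closed by the end of the sequence.
        if (pos - run_start) // 3 >= min_codons:
            yield run_start + 1, pos


def feature_table(seq, min_codons, strands="both"):
    """Return EMBL-style FT CDS lines for the chosen strand(s)."""
    seq = seq.upper()
    n = len(seq)
    lines = []
    if strands in ("+", "both"):
        for start, end in sorted(open_frames(seq, min_codons)):
            lines.append("FT   CDS             %d..%d" % (start, end))
    if strands in ("-", "both"):
        spans = []
        for start, end in open_frames(reverse_complement(seq), min_codons):
            # Convert minus-strand coordinates back to plus-strand numbering.
            spans.append((n - end + 1, n - start + 1))
        for start, end in sorted(spans):
            lines.append("FT   CDS             complement(%d..%d)" % (start, end))
    return lines
```

Calling feature_table(sequence, 100, "both") would return lines laid out like those in figure 5.5, with minus-strand frames reported in plus-strand coordinates inside complement(...).\par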
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Searching for tRNA genes\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 tRNA genes have two classes of feature that can be used to locate them in genomic sequences\: their ability to fold into the cloverleaf secondary structure, and the presence of specific "conserved" bases at particular positions relative to this structure. The level of congruence with the canonical structure is quite variable\: some tRNA genes contain intervening sequences and others, particularly those from organelles, have few of the conserved bases. The program searches for potential cloverleaf-forming structures and, optionally, for the presence of conserved bases. The user can define the range of loop sizes, the minimum numbers of potential base pairs, a range of intron sizes, and which, if any, of the conserved bases should be present. The results are presented either textually or graphically. \par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "tRNA search".\par
|
|
2.\tab Define "Maximum tRNA length".\par
|
|
3.\tab Define "Aminoacyl stem score". See note 8.\par
|
|
4.\tab Define "Tu stem score".\par
|
|
5.\tab Define "Anticodon stem score".\par
|
|
6.\tab Define "D stem score".\par
|
|
7.\tab Define "Minimum base pairing total".\par
|
|
8.\tab Define "Minimum intron length".\par
|
|
9.\tab Define "Maximum intron length".\par
|
|
10.\tab Define "Minimum length for TU loop".\par
|
|
11.\tab Define "Maximum length for TU loop".\par
|
|
12.\tab Accept "Skip search for conserved bases". See notes section.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 13.\tab Reject "Plot results". This gives listed output in which the potential cloverleafs are displayed. The alternative, plotted output, simply draws a vertical line at the position of each potential gene, with a height that represents its score. Typical dialogue and the beginning of some listed output are shown in figure 5.6.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
|
|
In general, for finding protein genes, we recommend the use of all the methods. The "Uneven positional base frequencies" method can show which regions are likely to be coding but not which strand or frame. The "Positional base preferences" method can show the correct frame and also help to find which regions are coding. The "Codon usage" method has the greatest resolution, having been used successfully with windows of 11 codons, and can help to find small exons and to pinpoint exon/intron boundaries.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab
|
|
When the "Uneven positional base frequencies" calculation was applied to all the sequences in the 1984 version of the EMBL library, 14% of noncoding segments failed to reach the value represented by the base of the box, whereas all coding segments did. The top value of the box was not reached by any noncoding segments but was exceeded by 16% of coding sequences. 76% of noncoding segments failed to reach the line labelled 76% but 76% of coding segments fell above it. We would not expect this result to change significantly if it were to be recalculated on the current libraries.\par
|
|
3.\tab When the "Positional base preferences" method, using "global" values, was applied to all the {\i E. coli} genes in the 1984 version of the EMBL library it chose the correct reading frame for 91% of coding segments. {\i E. coli}
|
|
sequences were used for technical rather than scientific reasons and we have no reason to believe that other organisms should give significantly different results. This result used only the values for the first two positions in codons and so for genes wit
|
|
h a strongly biased base composition we would expect even better discrimination.\par
|
|
\pard\plain \li1180\ri1440\sb100\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 tRNA search\par
|
|
\pard \li1180\ri1440\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth ? Maximum trna length (70-130) (92) =\par
|
|
? Aminoacyl stem score (0-14) (11) =\par
|
|
? Tu stem score (0-10) (8) =\par
|
|
? Anticodon stem score (0-10) (8) =\par
|
|
? D stem score (0-8) (3) =\par
|
|
? Minimum base pairing total (30-44) (30) =\par
|
|
? Minimum intron length (0-30) (0) =\par
|
|
? Maximum intron length (0-30) (0) =\par
|
|
? Minimum length for TU loop (4-12) (6) =\par
|
|
? Maximum length for TU loop (6-12) (9) =\par
|
|
? Skip search for conserved bases (y/n) (y) =n\par
|
|
Give a score for each base, then a minimum total at the end\par
|
|
? Base 8, T is 100% conserved. Score (0-100) (0) =\par
|
|
? Base 10, G is 95% conserved. Score (0-100) (0) =\par
|
|
? Base 11, Y is 96% conserved. Score (0-100) (0) =\par
|
|
? Base 14, A is 100% conserved. Score (0-100) (0) =\par
|
|
? Base 15, R is 100% conserved. Score (0-100) (0) =\par
|
|
? Base 21, A is 97% conserved. Score (0-100) (0) =\par
|
|
? Base 32, Y is 100% conserved. Score (0-100) (0) =\par
|
|
? Base 33, T is 98% conserved. Score (0-100) (0) =\par
|
|
? Base 37, A is 91% conserved. Score (0-100) (0) =\par
|
|
? Base 48, Y is 100% conserved. Score (0-100) (0) =\par
|
|
? Base 53, G is 100% conserved. Score (0-100) (0) =\par
|
|
? Base 54, T is 95% conserved. Score (0-100) (0) =\par
|
|
? Base 55, T is 97% conserved. Score (0-100) (0) =\par
|
|
? Base 56, C is 100% conserved. Score (0-100) (0) =\par
|
|
? Base 57, R is 100% conserved. Score (0-100) (0) =\par
|
|
? Base 58, A is 100% conserved. Score (0-100) (0) =\par
|
|
? Base 60, Y is 92% conserved. Score (0-100) (0) =\par
|
|
? Base 61, C is 100% conserved. Score (0-100) (0) =\par
|
|
? Minimum total conserved base score (0-0) (0) =\par
|
|
? Plot results (y/n) (y) =n\par
|
|
264\par
|
|
t\par
|
|
t-a\par
|
|
c-g\par
|
|
a-t\par
|
|
t+g\par
|
|
\pard \li1180\ri1440\sl220\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth a-t\par
|
|
a a\par
|
|
a-t gta\par
|
|
c aacgc\par
|
|
a t !!!! c\par
|
|
cgt gtgcg a\par
|
|
!!! t cga\par
|
|
a gca c\par
|
|
g t g\par
|
|
c aa t\par
|
|
a-t a\par
|
|
t-a t a\par
|
|
t-a\par
|
|
t-a\par
|
|
g t\par
|
|
c g\par
|
|
\pard \li1180\ri1440\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth caa\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 5.6\tab Typical dialogue and textual output from "Find tRNA genes".\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa80\sl280\tx560 \f20 4.\tab If the codon table used by the "Codon usage" me
|
|
thod is normalised to have average amino acid composition it retains its codon preference bias for each amino acid type but now the amino acid composition is the average of all proteins. In general this is optimal\:
|
|
we have the expected codon preference bia
|
|
s plus an expected amino acid bias. If we normalise to no amino acid bias we are safeguarding ourselves against missing a protein of anomalous composition but at the expense of not employing all of the useful information for distinguishing coding from nonc
|
|
oding. \par
|
|
\pard \s7\qj\fi-560\li560\sa80\sl280\tx560 5.\tab
|
|
The program also contains a graphical version of Fickett's method (6), except that here we use a window to analyse each segment of the sequence rather than giving a single value for each open reading frame. The tables used are those from the original publication.\par
|
|
\pard \s7\qj\fi-560\li560\sa80\sl280\tx560 6.\tab If the results from the "Find open reading frames" option are directed to disk (See the introductory chapter), the file can be used by the routines that use feature tables as input.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab The program also contains several routines for plotting the positions of stop and start codons for either strand of the sequence. One form of the output is included in figures 5.2 and 5.4.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab The tRNA gene search uses a simple scoring system for base pairing\: A-T and G-C base pairs each score 2 and G-T scores 1. The use of a "Minimum base pairing total" allows low cutoffs to be set for each individual stem while still requiring a reasonable overall level of stability. In this way a low score for one stem can be compensated by a high score in another.\par
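\pard\plain \s4\qj\sa120\sl280 \f20 The scoring rule in note 8 can be illustrated with a short Python sketch. The pair scores (2 for A-T and G-C, 1 for G-T) come from the note; the function names stem_score and passes, and the way the stems are passed in, are hypothetical and are not taken from the program.\par

```python
# Scores from note 8: A-T and G-C pairs score 2, G-T scores 1.
PAIR_SCORES = dict((
    (("A", "T"), 2), (("T", "A"), 2),
    (("G", "C"), 2), (("C", "G"), 2),
    (("G", "T"), 1), (("T", "G"), 1),
))


def stem_score(five_prime, three_prime):
    """Score one stem; both arms are written 5' to 3'."""
    # Reverse the second arm so that paired bases line up.
    partner = three_prime.upper()[::-1]
    arm = five_prime.upper()
    return sum(PAIR_SCORES.get((a, b), 0) for a, b in zip(arm, partner))


def passes(stems, cutoffs, minimum_total):
    """Apply a cutoff to each stem and a minimum to the overall total."""
    scores = [stem_score(a, b) for a, b in stems]
    each_ok = all(s >= c for s, c in zip(scores, cutoffs))
    return each_ok and sum(scores) >= minimum_total
```

With a low cutoff on one stem and a demanding total, a weak stem is accepted only when the other stems score highly enough to make up the difference.\par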
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab The cloverleaf is composed of four base-paired stems and four loops. Three of the stems are of fixed length but the fourth, the dhu stem, which usually has four base pairs, sometimes has only three. All of the loops can vary in size. The following relationships between the stems in the cloverleaf are assumed in the program\: (a) there are no bases between one end of the aminoacyl stem and the adjoining tuc stem; (b) there are two bases between the aminoacyl stem and the dhu stem; (c) there is one base between the dhu stem and the anticodon stem; (d) there are at least three bases between the anticodon stem and the tuc stem.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. and McLachlan, A.D. 1982. Codon preference and its use in identifying protein coding regions in long DNA sequences. {\i Nucl. Acids Res.} {\b 10}\:141-156.\par
|
|
2.\tab Staden, R. 1984. Measurements of the effects that coding for a protein has on a DNA sequence and their use for finding genes. {\i Nucl. Acids Res}. {\b 12}\:551-567.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1985. Computer methods to locate genes and signals in nucleic acid sequences. (in) {\i Genetic Engineering, Principles and Methods}, Setlow J.K., Hollaender A., (eds.), {\b 7}\:67-114, (Plenum Press, New York).\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Staden, R. 1990. Finding Protein Coding Regions in Genomic Sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:163-180 (Academic Press, New York).\par
|
|
5.\tab Staden, R. 1980. A computer program to search for tRNA genes. {\i Nucl. Acids Res}. {\b 8}\:817-825.\par
|
|
6.\tab Fickett, J.W. 1982. Recognition of protein coding regions in DNA sequences. {\i Nucl. Acids Res}. {\b 10}\:5303-5318.\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 6. Searching for Motifs in Nucleic Acid Sequences\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Searching for percentage matches to consensus sequences\par
|
|
2.2\tab Searching for consensus sequences using a score matrix\par
|
|
2.3\tab Using weight matrices for searching nucleotide sequences\par
|
|
2.4\tab Using "hardwired" motif searches\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
4.\tab References\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The program NIP contains several ways of defining and searching for motifs (1-4), and also contains a number of "hardwired" motifs that are already
|
|
defined and can be selected as separate searches. We describe searches for percentage matches to consensus sequences, the use of score matrices and the creation and use of nucleotide and dinucleotide weight matrices (see note 7). In addition we give detail
|
|
s of the "hardwired" motifs available from the program. In another chapter we have covered searches for exact matches to consensus sequences by describing how to find restriction enzyme recognition sequences. When searching for exact matches, percentage ma
|
|
tches or using a score matrix the search string or consensus sequence may include IUB redundancy codes. All of the searches produce both listed and graphical output. The listed output displays the matching sequence and its position and the graphical output
|
|
draws a box to represent the length of the sequence, and plots vertical lines within the box at the positions of matches. The heights of the lines are proportional to the match score (see figure 6.1).\par
|
|
\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw447\pich44
|
|
032fffffffff002b01be1101a0008201000affffffff002b01be0900000000000000003100000000002a01bd98002400000000001d012000000000001d011f00000000002a01bd000102dd0006007fdfff00fc060040df000004060040df000004060040df0000041002400088f7000020f1000001fd0000041002400088f7
|
|
000020f1000001fd0000041002400088f7000020f1000001fd00000421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe0005014200
|
|
05c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020
|
|
012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc0003021004
|
|
60fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482
|
|
b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00421044482b81210fc000302100460fc00078080000020012008fe000501
|
|
420005c00421044482b81210fc000302100460fc00078080000020012008fe000501420005c00406007fdfff00fc02dd00a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.1\tab Typical graphical output from a motif sea
|
|
rch. It shows a rectangular box in which each match is identified by a vertical line whose height gives the match score and whose x coordinate indicates the position in the sequence.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Searching for percentage matches to consensus sequences\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find percentage matches".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par
|
|
3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par
|
|
4.\tab Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default.\par
|
|
5.\tab Accept "This sense". The alternative directs the program to search for the complement of the string.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Percent match". The search is performed, the results are presented graphically (see figure 6.1), the number of matches is displayed, and the scores and positions of the top 10 matches are displayed.
|
|
\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define the number of matches to "Display". For the number of mat
|
|
ches chosen the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round to step 3. See figure 6.2.\par
|
|
\pard\plain \li220\ri280\sb400\sl220\box\brsp100\brdrth \f4\fs16 Find percentage matches\par
|
|
\pard \li220\ri280\sl220\box\brsp100\brdrth ? Type in string (y/n) (y) =\par
|
|
? Keep picture (y/n) (y) =\par
|
|
? String=AAAATTTT\par
|
|
STRING=AAAATTTT\par
|
|
? This sense (y/n) (y) =\par
|
|
? Percent match (1.00-100.00) (70.00) =\par
|
|
\par
|
|
Total scoring positions above 70.000 percent = 41\par
|
|
Scores 7 7 7 7 6 6 6 6 6 6\par
|
|
Positions 428 534 2994 7026 130 191 192 372 427 429\par
|
|
? Display (0-41) (0) =4\par
|
|
\par
|
|
428\par
|
|
aaaatatt\par
|
|
***** **\par
|
|
AAAATTTT\par
|
|
1\par
|
|
\par
|
|
534\par
|
|
aaagtttt\par
|
|
*** ****\par
|
|
AAAATTTT\par
|
|
1\par
|
|
2994\par
|
|
aaaatttc\par
|
|
*******\par
|
|
AAAATTTT\par
|
|
1\par
|
|
\par
|
|
7026\par
|
|
aaaacttt\par
|
|
**** ***\par
|
|
AAAATTTT\par
|
|
\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth 1\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.2\tab Worked example for the percentage match search\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.2\tab Searching for consensus sequences using a score matrix\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
A score matrix gives a score for the alignment of each possible pair of sequence symbols. The matrix used by this program includes all the IUB redundancy codes and gives scores that represent the level of redundancy. The matrix is shown in figure 6.3.
|
|
\par
|
|
\pard\plain \s7\qj\fi-560\li560\sb200\sa120\sl280\tx560 \f20 1.\tab Select "Find matches using a score matrix".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par
|
|
3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par
|
|
4.\tab Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default.\par
|
|
5.\tab Accept "This sense". The alternative directs the program to search for the complement of the string. The program displays the maximum possible score for the string.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Score". The search is performed, the results are presented graphically (see figure 6.1), the number of matches is displayed, and the scores and positions of the top 10 matches are displayed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab
|
|
Define the number of matches to "Display". For the number of matches chosen the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisks. The program now cycles round to step 3. The dialogue is almost exactly the same as that shown in figure 6.2 for the percentage match search.\par
|
|
\pard\plain \li1580\ri1560\sb300\sl220\box\brsp100\brdrth \f4\fs16 T C A G - R Y W S M K H B V D N ?\par
|
|
\pard \li1580\ri1560\sl220\box\brsp100\brdrth T 36 0 0 0 9 0 18 18 0 0 18 12 12 0 12 9 0\par
|
|
C 0 36 0 0 9 0 18 0 18 18 0 12 12 12 0 9 0\par
|
|
A 0 0 36 0 9 18 0 18 0 18 0 12 0 12 12 9 0\par
|
|
G 0 0 0 36 9 18 0 0 18 0 18 0 12 12 12 9 0\par
|
|
- 9 9 9 9 36 18 18 18 18 18 18 27 27 27 27 36 0\par
|
|
R 0 0 18 18 18 36 0 9 9 9 9 6 6 12 12 18 0\par
|
|
Y 18 18 0 0 18 0 36 9 9 9 9 12 12 6 6 18 0\par
|
|
W 18 0 18 0 18 9 9 36 0 9 9 12 6 6 12 18 0\par
|
|
S 0 18 0 18 18 9 9 0 36 9 9 6 12 12 6 18 0\par
|
|
M 0 18 18 0 18 9 9 9 9 36 0 12 6 12 6 18 0\par
|
|
K 18 0 0 18 18 9 9 9 9 0 36 6 12 6 12 18 0\par
|
|
H 12 12 12 0 27 6 12 12 6 12 6 36 8 8 8 27 0\par
|
|
B 12 12 0 12 27 6 12 6 12 6 12 8 36 8 8 27 0\par
|
|
V 0 12 12 12 27 12 6 6 12 12 6 8 8 36 8 27 0\par
|
|
D 12 0 12 12 27 12 6 12 6 6 12 8 8 8 36 27 0\par
|
|
N 9 9 9 9 36 18 18 18 18 18 18 27 27 27 27 36 0\par
|
|
\pard \li1580\ri1560\sl220\keepn\box\brsp100\brdrth ? 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.3\tab The DNA score matrix using IUB symbols\par
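\pard\plain \s4\qj\sa120\sl280 \f20 A score matrix search can be sketched in Python as a table lookup summed over a window. The values below are copied from figure 6.3, but only for the symbols T, C, A, G, R, Y and N to keep the example short. The function names pair_score, max_score and matrix_matches are hypothetical, and reporting windows whose score is at least the cutoff is an assumption.\par

```python
SYMBOLS = "TCAGRYN"
# Rows and columns follow SYMBOLS; values are taken from figure 6.3.
ROWS = (
    (36, 0, 0, 0, 0, 18, 9),     # T
    (0, 36, 0, 0, 0, 18, 9),     # C
    (0, 0, 36, 0, 18, 0, 9),     # A
    (0, 0, 0, 36, 18, 0, 9),     # G
    (0, 0, 18, 18, 36, 0, 18),   # R
    (18, 18, 0, 0, 0, 36, 18),   # Y
    (9, 9, 9, 9, 18, 18, 36),    # N
)
INDEX = dict((s, i) for i, s in enumerate(SYMBOLS))


def pair_score(a, b):
    """Score for aligning sequence symbol a with string symbol b."""
    return ROWS[INDEX[a]][INDEX[b]]


def max_score(string):
    """Best possible score: every symbol aligned with itself."""
    return sum(pair_score(c, c) for c in string.upper())


def matrix_matches(sequence, string, cutoff):
    """Return (score, position) for each window scoring at least cutoff."""
    sequence = sequence.upper()
    string = string.upper()
    n = len(string)
    hits = []
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        score = sum(pair_score(s, q) for s, q in zip(window, string))
        if score >= cutoff:
            hits.append((score, i + 1))
    return hits
```

Because a redundancy code such as R scores 18 rather than 36 against A or G, a string containing redundancy codes cannot reach its own maximum score against an unambiguous sequence, so the cutoff should be chosen with the displayed maximum in mind.\par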
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Using weight matrices for searching nucleotide sequences\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 A we
|
|
ight matrix is the most sensitive way of defining a motif. It is a table of values that gives scores for each base type in each position along a motif. For a motif of length 8 bases the weight matrix would be a table 8 positions long and 4 deep. The simple
|
|
st way of choosing the values for the table is to take an alignment of all known examples of the motif and to count the frequency of occurrence of each base type at each position. These frequencies can be used as the table of weights. When the table is use
|
|
d to search a new sequence the program calculates a score for each position along the sequence by adding or multiplying (see note 6) the relevant values in the table. All positions that exceed some cutoff score are reported as matching the original set of
|
|
motifs.\par
|
|
\pard \s4\qj\sa120\sl280
|
|
How can we select a suitable cutoff score? The simplest way is to apply the weight matrix to all the known occurrences of the motif - i.e. the set of sequence segments used to create the table - and to see what scores they achieve. The cutoff can b
|
|
e selected accordingly. For convenience the weight matrix is stored as a file along with its cutoff score, a title that is displayed when the file is read, and a few other values needed by the program. A routine for creating weight matrix files from sets of 
|
|
aligned sequences is included in the program. When a search using the weight matrix is performed the program will either list the matching sequence segments or plot their positions as for the other motif search methods.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.3.1\tab Creating a weight matrix file from a set of aligned sequences\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par
|
|
2.\tab Select "Make weight matrix".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 3.\tab
|
|
Define "Name of aligned sequences file".  We assume the file of aligned sequences has already been created (see note 3). The program reads and displays the contents of the file, numbering each sequence as it goes. Then it displays the length of the longes
|
|
t sequence.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Sum logs of weights". The alternative is to sum the weights when calculating scores (see note 4). \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Accept "Use all motif
|
|
positions". The alternative allows the user to define a "mask" which identifies positions within the motif that should be ignored when the matrix is created (see note 5). The program now calculates the weights and applies them in turn to each of the seque
|
|
nces in the file. The number and score for each sequence are displayed, followed by the top, bottom and mean scores and the standard deviation. In addition the mean plus and minus 3 standard deviations are displayed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Cutoff score". The default is the mean minus 3 standard deviations, but users may, for example, decide to use the lowest score obtained by the sequences in the file.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Top score for scaling plots". This parameter is used by the graphics output routine when scaling the plots. Its value will influence the height of lines plotted to represent matches.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Position to identify". When a search is performed it is not always appropriate to report the position of a match relative to the leftmost base in the motif. For example wh
|
|
en performing a splice junction search we may want to know the position of the G in the conserved GT, rather than the position of the first base in the matrix. The "Position to identify" allows the user to define which base is marked. The bases in the tabl
|
|
e are numbered 1, 2, 3 and so on.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define a "Title". This is a title that will be displayed when the matrix file is read prior to performing a search. It is limited to 60 characters.\par
|
|
10.\tab Define "Name for new weight matrix file". Give a name for the weight matrix file. Typical dialogue is shown in figure 6.4.\par
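The default cutoff offered in step 6 can be reproduced from the scores listed in figure 6.4. The sketch below is our own illustration rather than the program's code; the function name is hypothetical, and the use of the population standard deviation (dividing by the number of sequences) is an assumption that we chose because it agrees with the values printed in the example run.\par

```python
import statistics

# Scores of the 26 heat shock sequences, copied from figure 6.4.
SCORES = [
    -15.609, -15.965, -18.186, -15.331, -20.897, -17.347, -16.271,
    -12.227, -15.933, -15.604, -17.866, -17.159, -16.399, -14.646,
    -14.801, -16.163, -16.280, -15.598, -17.721, -16.257, -14.243,
    -16.456, -15.453, -17.443, -13.335, -15.914,
]

def summarise(scores, n_sd=3.0):
    """Return top, bottom, mean, sd and the mean minus and plus n_sd deviations."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)      # population form: an assumption
    return dict(top=max(scores), bottom=min(scores), mean=mean, sd=sd,
                low=mean - n_sd * sd, high=mean + n_sd * sd)
```

Here summarise(SCORES)["low"] is about -21.03, the default cutoff shown in the dialogue, and a user who prefers the lowest observed score would take summarise(SCORES)["bottom"] instead.\par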
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 \page 2.3.2\tab Searching using a weight matrix\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Once a weight matrix has been stored in a file it can be used to search any sequence. Results can be displayed graphically or the matching sequence segments can be listed out with their scores.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par
|
|
2.\tab Select "Use weight matrix".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Motif weight matrix file". The name of the file containing the weight matrix. The program reads the file and displays its title.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define
|
|
"Cutoff score". The default will be the value set when the weight matrix file was created. If the score is negative the program will calculate sums of logs of frequencies, otherwise it will add frequencies.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Accept "Plot results". Alternatively they will be listed.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The results will appear as in figure 6.5.\par
|
|
\pard\plain \li1440\ri1500\sb200\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Motif search using weight matrix\par
|
|
\pard \li1440\ri1500\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select operation\par
|
|
X 1 Use weight matrix\par
|
|
2 Make weight matrix\par
|
|
3 Rescale weight matrix\par
|
|
? Selection (1-3) (1) =2\par
|
|
? Name of aligned sequences file=heatshock.seq\par
|
|
1 ATAAAGAATATTCTAGAA\par
|
|
2 CTCGAGAAATTTCTCTGG 144\par
|
|
3 TTCTCGTTGCTTCGAGAG 36\par
|
|
4 GCCTCGAATGTTCGCGAA 15\par
|
|
5 GACTGGAATGTTCTGACC 45 DROSOPHILA HSP68\par
|
|
6 ATCTCGAATTTTCCCCTC 12\par
|
|
7 ATCCAGAAGCCTCYAGAA 35 DROSOPHILA HSP83\par
|
|
8 CTCTAGAAGTTTCTAGAG 25\par
|
|
9 TTCTAGAGACTTCCAGTT 15\par
|
|
10 CCCCAGAAACTTCCACGG 147 DROSOPHILA HSP22\par
|
|
11 GCGAAGAAAATTCGAGAG 46\par
|
|
12 TGCCGGTATTTTCTAGAT 26\par
|
|
13 CCCGAGAAGTTTCGTGTC 97 DROSOPHILA HSP23\par
|
|
14 TTCCGGACTCTTCTAGAA 13 DROSOPHILA HSP26\par
|
|
15 CTCGAGAAAGCTCGCGAA 204 XENOPUS HSP70\par
|
|
16 CTCGCGAATCTTCCGCGA 194\par
|
|
17 CTCGCGAAAGTTCTTCGG 139\par
|
|
18 CTCGGGAAACTTCGGGTC 72\par
|
|
19 TGCCAGAAGTTGCTAGCA 124 XENOPUS HSP30\par
|
|
20 CTCGGGAACGTCCCAGAA 14\par
|
|
21 ATCCCGAAACTTCTAGTT 129 SOYBEAN HSP17\par
|
|
22 GTCCAGAATGTTTCTGAA 98\par
|
|
23 TTTCAGAAAATTCTAGTT 78\par
|
|
24 CCCAAGGACTTTCTCGAA 28\par
|
|
25 TTTTAGAATGTTCTAGAA 179 DICTYOSTELIUM DIRS-1\par
|
|
26 TTCTAGAACATTCGAAGA 169\par
|
|
Length of motif 18\par
|
|
? Sum logs of weights (y/n) (y) =\par
|
|
? Use all motif positions (y/n) (y) =\par
|
|
Applying matrix to input sequences\par
|
|
1 -15.609 ATAAAGAATATTCTAGAA\par
|
|
2 -15.965 CTCGAGAAATTTCTCTGG\par
|
|
3 -18.186 TTCTCGTTGCTTCGAGAG\par
|
|
4 -15.331 GCCTCGAATGTTCGCGAA\par
|
|
5 -20.897 GACTGGAATGTTCTGACC\par
|
|
6 -17.347 ATCTCGAATTTTCCCCTC\par
|
|
7 -16.271 ATCCAGAAGCCTCYAGAA\par
|
|
8 -12.227 CTCTAGAAGTTTCTAGAG\par
|
|
9 -15.933 TTCTAGAGACTTCCAGTT\par
|
|
10 -15.604 CCCCAGAAACTTCCACGG\par
|
|
11 -17.866 GCGAAGAAAATTCGAGAG\par
|
|
12 -17.159 TGCCGGTATTTTCTAGAT\par
|
|
13 -16.399 CCCGAGAAGTTTCGTGTC\par
|
|
14 -14.646 TTCCGGACTCTTCTAGAA\par
|
|
15 -14.801 CTCGAGAAAGCTCGCGAA\par
|
|
16 -16.163 CTCGCGAATCTTCCGCGA\par
|
|
17 -16.280 CTCGCGAAAGTTCTTCGG\par
|
|
18 -15.598 CTCGGGAAACTTCGGGTC\par
|
|
19 -17.721 TGCCAGAAGTTGCTAGCA\par
|
|
20 -16.257 CTCGGGAACGTCCCAGAA\par
|
|
21 -14.243 ATCCCGAAACTTCTAGTT\par
|
|
22 -16.456 GTCCAGAATGTTTCTGAA\par
|
|
23 -15.453 TTTCAGAAAATTCTAGTT\par
|
|
24 -17.443 CCCAAGGACTTTCTCGAA\par
|
|
25 -13.335 TTTTAGAATGTTCTAGAA\par
|
|
26 -15.914 TTCTAGAACATTCGAAGA\par
|
|
Top score -12.227 Bottom score -20.897\par
|
|
Mean -16.119 Standard deviation 1.636\par
|
|
Mean minus 3.sd -21.028 Mean plus 3.sd -11.210\par
|
|
? Cutoff score (-999.00-9999.00) (-21.03) =\par
|
|
? Top score for scaling plots (-21.03-999.00) (-11.21) =\par
|
|
? Position to identify (0-18) (1) =\par
|
|
? Title=Heatshock weights 24-10-91\par
|
|
\pard \li1440\ri1500\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth ? Name for new weight matrix file=heatshock.wts\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.4\tab An example run of creating a weight matrix\par
|
|
\pard\plain \li1400\ri1500\sb300\sl220\box\brsp100\brdrth \f4\fs16 Motif search using weight matrix\par
|
|
\pard \li1400\ri1500\sl220\box\brsp100\brdrth Select operation\par
|
|
X 1 Use weight matrix\par
|
|
2 Make weight matrix\par
|
|
3 Rescale weight matrix\par
|
|
? Selection (1-3) (1) =\par
|
|
? Motif weight matrix file=heatshock.wts\par
|
|
Heatshock weights 24-10-91\par
|
|
? Cutoff score (-9999.00-9999.00) (-21.03) =\par
|
|
? Plot results (y/n) (y) =\par
|
|
\par
|
|
619 -20.84 gctcggaagcttctgctc\par
|
|
818 -20.74 ttggcgaagctttcaaag\par
|
|
1190 -21.02 gccaggtaagtttcagac\par
|
|
1601 -20.91 tttgcgactgttcggtaa\par
|
|
2387 -20.24 cgctcgcagattctggac\par
|
|
2534 -20.87 gccgagaagatcatcgaa\par
|
|
2890 -16.38 ctcccggatgttctggag\par
|
|
2989 -19.54 ctcgcgaaaatttctgct\par
|
|
3451 -20.76 atcctggaagttccggtt\par
|
|
6020 -20.73 tctcaggaactgctggaa\par
|
|
6335 -20.51 gctgagaaattccgtgac\par
|
|
7107 -20.31 ctctggtctggtcgagaa\par
|
|
7117 -19.61 gtcgagaaaatccaggta\par
|
|
\pard \li1400\ri1500\sl220\keepn\box\brsp100\brdrth 7892 -20.18 cttccgaaagtgctgcat\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.5\tab Example run of a search using a weight matrix to produce text output.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Using "hardwired" motif searches.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The program contains predefined motif definitions for the following\:\par
|
|
\pard \s4\qj\li1120\sa120\sl280 {\i E. coli} promoters\par
|
|
prokaryotic ribosome binding sites\par
|
|
mRNA splice junctions\par
|
|
eukaryotic ribosome binding sites\par
|
|
polyadenylation sites\par
|
|
\pard \s4\qj\sb240\sa120\sl280 All except the po
|
|
lyadenylation site, which is simply defined as an exact match to the string AATAAA, are represented as weight matrices. Each search is performed simply by the user selecting the appropriate option from the menu and each plots its results in its own graphic
|
|
s window. The ribosome binding site searches are reading frame specific and so they normally plot their results to fit nicely with the output from the "gene search by content" methods described in the chapter on finding genes. Likewise the splice junction
|
|
searches produce separate output for each of the three reading frames. Below, as an example of using the hardwired motifs, we show how to perform such a search.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.4.1\tab Searching for splice junctions\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Splice search using weight matrix". The program automatically reads in weight matrices that define the donor and acceptor sites and displays their titles.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Donor cutoff score". The default is stored in the file.\par
|
|
3.\tab Define "Acceptor cutoff score". The default is stored in the file.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4. \tab Accept "Plot results". The alternative lists the results giving the position, score, matching sequence and reading frame. A typical plotted result appears in figure 6.6.\par
|
|
\pard\plain \qj\ri-100\sb240\sl480\keepn \f4\fs16 {{\pict\macpict\picw454\pich123
|
|
04be00000000007b01c6001102ff0c00fffe0000002d8f9e002d8f9e00000000004e011f000000000001000a00000000004e011f0098802400000000004e011f0000000000000000002d8f9e002d8f9e000000010001000100000000000000000000000000439867000000010000ffffffffffff0001000000000000000000
|
|
00004e011f00000000004e011f000002dd0006007fdfff00fc060040df000004060040df0000040a0040e9000020f80000040a0040e9000020f80000040c0040e9000020fa00022000040c0040e9000020fa0002200004110040eb0005200020000080fd0002200004170040fd000001f200071000200020000090fd000220
|
|
0004170040fd000001f200071000600020080090fd0002200004170040fd000001f200071000600020080090fd0002200004180040fe00011001f2000712006000200c0090fd000224000406007fdfff00fc060040df0000040a0040ee000008f30000040a0040ee000008f30000040a0040ee000008f30000040a0040ee00
|
|
0008f30000040a0040ee000008f30000040a0040ee000008f300000c0a0040ee000008f300000c0a0040ee000008f300000c0e0040ee000008fe000010f700000c180040f6000001fc0002010008fe000010fc000008fd00000c2002400004fd0005400010000001fc000601000808001010fc000008fe0001800c06007fdf
|
|
ff00fc060040df000004060040df0000040a0040fc000004e50000040a0040fc000004e50000040c0040fc000004e700020104040c0041fc000004e70002010404100041fc000004fe000008eb0002010404100041fc000004fe000008eb0002010404150041fc000004fe00010814f3000010fb00020104041a014180fd00
|
|
0004fe0005081400400040f7000010fb00020904041b02498008fe000904400200081400400040f7000050fb000209040406007fdfff00fc060040df000004060040df000004060040df000004060040df000004060040df0000040a0040fe000010e30000040e0040fe000010f4000001f10000040e0040fe000010f40000
|
|
01f10000040e0040fe000010f4000001f10000040e0040fe000018f4000001f1000004180040fe000018f60002080001fb00040800008001fc0000041d04400000081afd000005fc000308080001fb00040800008001fc00000406007fdfff00fc060040df000004060040df0000040a0040f8000008e90000040a0040f800
|
|
0008e90000040a0040f8000008e90000040a0040f8000008e90000040e0040f8000008ee000004fd000004140040fa0002400008f6000002fa000004fd000004180040fe000040fe0002400008f6000002fa000004fd000004190040fe000040fe0002400008f600010a02fb000004fd000004220048fe000a402000004000
|
|
4801000001fe0006408000000a0202fc000004fd00000406007fdfff00fc060040df000004060040df000004060040df000004060040df000004090040e2000340000004090040e20003400000040c0340000002e50003400000040c0340000002e50003400000040e0340000002e70005080040020004120340000002eb00
|
|
0001fe0005080040020004120340080002eb000001fe00050800400200041b044008020280f6000040fd000008fd000001fe000508004002000406007fdfff00fc02dd000000ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 6.6\tab
|
|
Typical graphical output from the hardwired splice junction search. The results are presented in a reading frame specific way: the bottom three boxes show results for donor sites and the top three boxes those for acceptor sit
|
|
es. In both cases the vertical ordering of the boxes is frame 0 at the bottom, frame 1 in
|
|
the middle and frame 2 at the top. For example there is a very strong peak corresponding to an acceptor in frame 1 that can be seen just over halfway along the sequence.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
|
|
For this program a motif is a short segment of sequence of fixed length. More complex structures termed "patterns", which we define as sets of motifs separated by varying gaps, are covered in another chapter.  The current chapter should be read before the 
|
|
chapter on patterns. \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab It is debatable whether the gain in sensitivity that is afforded by the use of a score matrix is of value for searching nucleotide sequences; however, it is very important for protein sequences.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
The files of aligned sequences used to make weight matrices have the following format. Each sequence should be on a separate line. The sequence should start in column 2 and is terminated by a new line or a space. Anything after the space is treated as
|
|
a comment. The files can be created by previous searches or using an editor.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab The frequencies in the weight
|
|
matrix can be used in two ways to calculate scores for sequences. Some users prefer to add the frequencies to give a total score, and others to multiply them by summing their logs. If we regard the frequencies as probabilities then multiplication seems the
|
|
correct procedure. The user chooses which method will be employed when the weight matrix is created; however, the choice can be overridden when the matrix is used. If multiplication is selected then all results will be presented as sums of logs.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Masking th
|
|
e weight matrix is particularly useful in cases where a limited number of examples of a motif are available, or when the motif may have several components. In the first case the limited number of examples may make the matrix unrepresentative of the motif b
|
|
ecause the bases in the unconserved positions may bias the results of searches. When a large number of examples is available to create the matrix, the unconserved positions should tend towards equal base composition and hence have no influence on the overa
|
|
ll score. We stated that a motif might have several components\: for example a motif might have both structural and specificity components. We may want to separate out the two parts and masking provides such a facility.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab
|
|
The weight matrix handling routine contains a further option "Rescale weight matrix". If the user has edited a weight matrix to change the frequency values this provides a way of selecting a new cutoff score. It allows users to read in a set of aligned
|
|
sequences and a weight matrix and to apply the matrix to the set of sequences to see the range of scores achieved. A new weight matrix file containing the selected cutoff score is written to disk.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab The program also contains a set of routines identical to those used to create and search for nucleotide weight matrices, but which deal instead with dinucleotide weight matrices. \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab The reader is reminded that most options in the program, if selected when in "execute without dialogue" mode, will automatically use a set of defaults and produce a
|
|
result with little or no user input. Most motif searches require far less user input than that shown above, where we have tried to show the scope of the methods.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab
|
|
Although the program contains hardwired motifs we expect most sites that use the programs to accumulate their own libraries of motifs and patterns, which users can employ by simply knowing the names of the corresponding files.\par
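The aligned sequences file format described in note 3 is simple enough to read with a few lines of code. The following sketch is illustrative only; the function name is hypothetical and the example lines in the usage note are modelled on the listing in figure 6.4 rather than taken from a real file.\par

```python
def read_aligned(lines):
    """Return (sequence, comment) pairs from the lines of an aligned sequences file."""
    entries = []
    for line in lines:
        body = line.rstrip()[1:]            # the sequence starts in column 2
        if not body:
            continue                        # assumption: skip empty lines
        # The sequence ends at the first space; the rest is a comment.
        sequence, _, comment = body.partition(" ")
        entries.append((sequence, comment.strip()))
    return entries
```

For example, read_aligned([" CTCGAGAAATTTCTCTGG 144", " GACTGGAATGTTCTGACC 45 DROSOPHILA HSP68"]) returns the two 18 base sequences with "144" and "45 DROSOPHILA HSP68" as their comments.\par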
|
|
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \page 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1984. Computer methods to locate signals in nucleic acid sequences. {\i Nucl. Acids Res}. {\b 12}\:521-538.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Staden, R. 1985. Computer methods to locate genes and signals in nucleic acid sequences. (in) {\i Genetic Engineering, Principles and Methods, }Setlow J.K., Hollaender A., (eds.), {\b 7}\:
|
|
67-114, (Plenum Press, New York).\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4 (1)}\:53-60.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 7. Using Patterns to Analyse Nucleic Acid Sequences\par
|
|
\pard\plain \s5\sb200\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Creating a pattern file containing an exact match motif and weight matrix motif.\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.2\tab Searching a sequence using a pattern file\par
|
|
2.3\tab Comparing a sequence against a library of patterns\par
|
|
2.4\tab Searching sequence libraries for patterns\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
4.\tab References\par
|
|
\pard\plain \s5\sb200\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Here we describe one of the most powerful facilities provided by the program NIP\: the ability to define and search for complex patterns of motifs (1-3).
|
|
In another chapter we give details of searching for individual motifs but here we show how to create patterns and libraries of patterns and to use them to search single sequences and sequence libraries. Once a pattern has been defined and stored in a file 
|
|
it can be used to search any sequence. In addition if users want to routinely screen sequences against libraries of patterns this can be achieved by use of files of file names. The program can produce several alternative forms of output. It will display the s
|
|
egment of sequence matching each individual motif in the pattern, display all the sequence between and including the two outermost motifs, produce a description of the match in the form of an EMBL feature table, or draw a simple graphical plot.\par
|
|
\pard \s4\qj\sa120\sl280 At the end of the chapter we describe how a related program NIPL is used to search libraries of sequences to find patterns. NIPL is capable of producing alignments of sequence families.\par
|
|
\pard \s4\qj\sa120\sl280 Patterns are defined as sets of motifs with variable spacing. Each motif in a pat
|
|
tern can be defined using any of several methods, and their positions relative to one another are defined in terms of minimum and maximum separations. In addition, by the use of logical operators, each motif can be declared to be essential (the AND operator)
|
|
, optional (the OR operator), or forbidden (the NOT operator). The following methods (termed "classes" by the program) for defining motifs are provided\:
|
|
1) exact match to a short sequence; 2) percentage match to a short sequence; 3) match to a short sequen
|
|
ce using a score matrix and cutoff score; 4) match to a weight matrix; 5) match to the complement of a weight matrix; 6) inverted repeat or stem-loop; 7) exact match to a short sequence with a defined step; 8) direct repeat. Classes 1, 2, 3 and 7 permit t
|
|
he use of IUB redundancy codes.\par
|
|
\pard \s4\qj\sa120\sl280 The motifs in a pattern are numbered sequentially and motif spacing is defined in the following way. When a new motif is added to a pattern the user specifies the "Reference motif" by its number and then a "Relative start po
|
|
sition". The "Relative start position" is defined by taking the first base of the "Reference motif" as position 1, the next as 2, and so on. Then the user defines the allowed variation in the spacing by specifying the "Number of extra positions". Notice th
|
|
at the position of a motif can be defined relative to any other motif, and that a negative "Relative start position" declares the motif to be to the left of its "Reference motif".\par
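The spacing rule can be made concrete with a small sketch. This is our illustration, not the program's code; the function names are hypothetical, and we assume that the same arithmetic is applied whatever the sign of the "Relative start position".\par

```python
def allowed_starts(ref_start, relative_start, extra_positions):
    """Return the first and last sequence positions at which a motif may start.

    ref_start is the sequence position of the first base of the reference
    motif, which counts as relative position 1.
    """
    first = ref_start + relative_start - 1
    return first, first + extra_positions

def spacing_ok(ref_start, motif_start, relative_start, extra_positions):
    """True if motif_start lies inside the allowed window."""
    first, last = allowed_starts(ref_start, relative_start, extra_positions)
    return first <= motif_start <= last
```

With the values used in figure 7.1 (reference motif found at 505, relative start position 10 and 20 extra positions) the second motif may start anywhere from 514 to 534, so the match reported at 528 is accepted.\par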
|
|
\pard \s4\qj\sa120\sl280 The probability of finding each individual motif in the current sequence, th
|
|
e product of the probabilities for all the motifs in a pattern (the "Probability of finding pattern"), and the "Expected number of matches" are calculated and displayed by the program. In addition to the cutoffs used for the individual motifs, users can apply two 
|
|
pattern cutoffs\: "Maximum pattern probability" and "Minimum pattern score".\par
|
|
Below we describe\: how to create a pattern; how to use a pattern file to search a sequence; and how to use a "File of pattern file names" to search a sequence for a whole library of 
|
|
patterns. To describe how to create a pattern file we first show all the steps to make one containing two motifs, and then, to save space, the parts specific to the individual motif types are sketched in the notes section.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2. Methods\par
|
|
\pard\plain \s6\fi-560\li560\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Creating a pattern file containing an exact match motif and weight matrix motif.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher".\par
|
|
2.\tab Select "Pattern definition mode" as "Use keyboard".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Results display mode" as "Motif by motif". The alternatives are listed in the introduction.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Motif definition mode" as "Exact match".\par
|
|
5.\tab Define "Motif name". Each motif can be given an 8 character name.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "String". Type in the sequence of the motif. The program will display the probability of finding the motif.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Select "Motif definition mode" as "Weight matrix".\par
|
|
8.\tab Define "Motif name".\par
|
|
9.\tab Select "Logical operator" as "AND". The alternatives are "OR" and "NOT".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Select "Number of reference motif". At this stage the only choice is 1 and this is the default.\par
|
|
11.\tab Define "Relative start position". The base position relative to the "Reference motif". See the introduction.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 12.\tab Define "Number of extra positions".\par
|
|
13.\tab Define "Weight matrix file name". Type the name of the file containing the weight matrix.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The program now cycles round to step 7 and all subsequent passes round the loop to add further motifs to the pattern would differ only in the details for the different motif "classes".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 14.\tab Select "Pattern complete".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 15.\tab Accept "Save pattern in a file". The alternative does not save the pattern and so it can only be used once on the current sequence.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab Define "Pattern definition file". Give a name for the new file.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 17. \tab 
|
|
Define "Pattern title". All patterns can have a 60 character title that can be displayed when the pattern file is read and the sequence searched. The program will now display a detailed textual description of the pattern, the "Probability of finding 
|
|
the pattern" and the "Expected number of matches".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 18.\tab Define "Maximum pattern probability". Note that this is a maximum\: any match with a greater probability of being found will be rejected. If no value is specified the search will be quicker (see notes).\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 19.\tab
|
|
Define "Minimum pattern score". A minimum pattern score only makes sense if all the motifs in the pattern are defined with compatible scoring methods. For example percentage matches and weight matrices using sums of logs are incompatible. Searching wil
|
|
l now commence and any matches will be displayed using the chosen method. A worked example of creating such a pattern and performing a search is shown in figure 7.1, and the actual pattern file is shown in figure 7.2.\par
|
|
\pard\plain \li1360\ri1300\sb200\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Pattern searcher\par
|
|
\pard \li1360\ri1300\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select pattern definition mode\par
|
|
X 1 Use keyboard \par
|
|
2 Use pattern file \par
|
|
3 Use file of pattern file names\par
|
|
? Selection (1-3) (1) =\par
|
|
Select results display mode\par
|
|
X 1 Motif by motif \par
|
|
2 Inclusive \par
|
|
3 Graphical \par
|
|
4 EMBL feature table \par
|
|
? Selection (1-4) (1) =\par
|
|
Select motif definition mode\par
|
|
X 1 Exact match \par
|
|
2 Percentage match \par
|
|
3 Cut-off score and score matrix \par
|
|
4 Cut-off score and weight matrix\par
|
|
5 Complement of weight matrix \par
|
|
6 Inverted repeat or stem-loop \par
|
|
7 Exact match, defined step \par
|
|
8 Direct repeat \par
|
|
9 Pattern complete \par
|
|
? Selection (1-9) (1) =\par
|
|
? Motif name=T run\par
|
|
? String=TTTTT\par
|
|
Probability of score 5.0000 = 0.870E-03\par
|
|
Select motif definition mode\par
|
|
X 1 Exact match \par
|
|
2 Percentage match \par
|
|
3 Cut-off score and score matrix \par
|
|
4 Cut-off score and weight matrix\par
|
|
5 Complement of weight matrix \par
|
|
6 Inverted repeat or stem-loop \par
|
|
7 Exact match, defined step \par
|
|
8 Direct repeat \par
|
|
9 Pattern complete \par
|
|
? Selection (1-9) (1) =4\par
|
|
? Motif name=heat\par
|
|
Select logical operator\par
|
|
X 1 And \par
|
|
2 Or \par
|
|
3 Not \par
|
|
? Selection (1-3) (1) =\par
|
|
? Number of reference motif (1-1) (1) =\par
|
|
? Relative start position (-1000-1000) (6) =10\par
|
|
? Number of extra positions (0-1000) (0) =20\par
|
|
? Weight matrix file name=heatshock.wts\par
|
|
Heatshock weights 18-12-90 \par
|
|
Probability of score -21.0280 = 0.117E-02\par
|
|
Select motif definition mode\par
|
|
1 Exact match \par
|
|
2 Percentage match \par
|
|
3 Cut-off score and score matrix \par
|
|
X 4 Cut-off score and weight matrix\par
|
|
5 Complement of weight matrix \par
|
|
6 Inverted repeat or stem-loop \par
|
|
7 Exact match, defined step \par
|
|
8 Direct repeat \par
|
|
9 Pattern complete \par
|
|
? Selection (1-9) (4) =9\par
|
|
? Save pattern in a file (y/n) (y) =\par
|
|
? Pattern definition file=_paper.pat\par
|
|
? Pattern title=demo pattern\par
|
|
Pattern description\par
|
|
\par
|
|
demo pattern \par
|
|
Motif 1 named T run is of class 1\par
|
|
Which is an exact match to the string\par
|
|
TTTTT\par
|
|
Motif 2 named heat is of class 4\par
|
|
Which is a match to a weight matrix with score -21.028\par
|
|
and the 5 prime base can take positions 10 to 30\par
|
|
relative to the 5 prime end of motif 1\par
|
|
It is anded with the previous motif.\par
|
|
Probability of finding pattern = 0.1015E-05\par
|
|
Expected number of matches = 0.1734E+00\par
|
|
? Maximum pattern probability (0.00-1.00) (1.00) =\par
|
|
? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
|
|
Working\par
|
|
Match\par
|
|
505 T run \par
|
|
ttttt\par
|
|
528 heat \par
|
|
ttaaagaaagttttatac\par
|
|
Total matches found 1\par
|
|
\pard \li1360\ri1300\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Minimum and maximum observed scores -15.34 -15.34\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 7.1\tab Worked example of creating a simple pattern and performing a search.\par
|
|
\pard\plain \li2380\ri2520\sb300\sl220\box\brsp100\brdrth \f4\fs16 demo pattern \par
|
|
\pard \li2380\ri2520\sl220\box\brsp100\brdrth A1 T run Class \par
|
|
TTTTT\par
|
|
@ End of string\par
|
|
A4 heat Class \par
|
|
1 Relative motif\par
|
|
10 Relative start position\par
|
|
20 Number of extra positions\par
|
|
\pard \li2380\ri2520\sl220\keepn\box\brsp100\brdrth heatshock.wts\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb80\sa120\sl240\tx1140 \f21\fs20 Figure 7.2\tab The pattern file created by the work shown in figure 7.1.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.2\tab Searching a sequence using a pattern file\par
|
|
\pard\plain \s7\qj\fi-560\li560\sb160\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Pattern definition mode" as "Use pattern file".\par
|
|
3.\tab Select "Results display mode" as "Inclusive".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Pattern definition file". Type the name of the file containing the pattern. The pr
|
|
ogram will read the file then display its title, a detailed textual description of the pattern, the "Probability of finding the pattern", and the "Expected number of matches".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Maximum pattern probability". \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Minimum pattern score". Searching will now commence and any matches will be displayed using the chosen method. A worked example, using the pattern file created in figure 7.1, is shown in figure 7.3.\par
|
|
\pard\plain \li1300\ri1320\sb300\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Pattern searcher\par
|
|
\pard \li1300\ri1320\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select pattern definition mode\par
|
|
X 1 Use keyboard \par
|
|
2 Use pattern file \par
|
|
3 Use file of pattern file names\par
|
|
? Selection (1-3) (1) =2\par
|
|
? Pattern definition file=_paper.pat\par
|
|
Select results display mode\par
|
|
X 1 Motif by motif \par
|
|
2 Inclusive \par
|
|
3 Graphical \par
|
|
4 EMBL feature table \par
|
|
? Selection (1-4) (1) =2\par
|
|
Probability of score 5.0000 = 0.870E-03\par
|
|
Heatshock weights 18-12-90 \par
|
|
Probability of score -21.0280 = 0.117E-02\par
|
|
\par
|
|
Pattern description\par
|
|
\par
|
|
demo pattern \par
|
|
Motif 1 named T run is of class 1\par
|
|
Which is an exact match to the string\par
|
|
TTTTT\par
|
|
Motif 2 named heat is of class 4\par
|
|
Which is a match to a weight matrix with score -21.028\par
|
|
and the 5 prime base can take positions 10 to 30\par
|
|
relative to the 5 prime end of motif 1\par
|
|
It is anded with the previous motif.\par
|
|
Probability of finding pattern = 0.1015E-05\par
|
|
Expected number of matches = 0.1734E+00\par
|
|
? Maximum pattern probability (0.00-1.00) (1.00) =\par
|
|
? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
|
|
Working\par
|
|
505 T run \par
|
|
tttttgatgcttgactctaagccttaaagaaagttttatac\par
|
|
Total matches found 1\par
|
|
\pard \li1300\ri1320\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Minimum and maximum observed scores -15.34 -15.34\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 7.3\tab Worked example of using a pattern file as input.\par
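\pard\plain \s4\qj\sa120\sl280 \f20 The figures reported in the dialogue of figure 7.3 can be checked by hand. As note 4 of this chapter explains, the "Probability of finding the pattern" is the product of the individual motif probabilities. The short Python sketch below repeats that arithmetic using the two motif probabilities displayed in the figure. The function names are illustrative and are not part of NIP, and the second helper is only a ratio of the two reported figures, not the program's own method of calculating the expected number of matches.\par
\pard\plain \sl220 \f4\fs16 
```python
from math import prod

# Motif probabilities as displayed in figure 7.3 (already rounded by the program).
T_RUN = 0.870e-03
HEAT = 0.117e-02

def pattern_probability(motif_probabilities):
    # Note 4: the pattern probability is the product of the motif probabilities.
    return prod(motif_probabilities)

def implied_placements(expected_matches, probability):
    # Hypothetical helper: ratio of the two reported figures, i.e. how many
    # independent placements would give that expectation. This is plain
    # arithmetic, not the program's own calculation.
    return expected_matches / probability

p = pattern_probability([T_RUN, HEAT])
print(p)                                       # close to the reported 0.1015E-05
print(implied_placements(0.1734, 0.1015e-05))  # on the order of 1.7e5
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 The product differs slightly from the reported value, presumably because the displayed motif probabilities have been rounded. The large ratio between the two reported figures reflects the sequence length and the allowed spacings between the motifs, as described in note 4.\par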
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.3\tab Comparing a sequence against a library of patterns\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This mode of operation allows a sequence to be searched, in turn, for any number of patterns, each stored in a separate pattern file. The names of the files containing the individual patterns must be stored in a simple text file. This file is called a "file of pattern file names" and its name is the only user input required to define the search.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par
|
|
2.\tab Select "Pattern definition mode" as "Use file of pattern file names".\par
|
|
3.\tab Select "Results display mode" as "Inclusive"\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
|
|
Define "File of pattern file names". Type the name of the file containing the list of pattern file names. The program will read the file and then, in turn, all the pattern files it names. Each of these patterns will be compared against the current sequence but only those that give matches will produce any output. The pattern title and each match will be displayed.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Searching sequence libraries for patterns\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The program NIPL can be used to search sequence libraries for patterns. Its use is similar to the pattern search routine described above, except that it does not have the facility for creating pattern files, so they must be created beforehand using NIP. In addition to its obvious application of finding new occurrences of patterns or checking on their frequency, it is a useful way of obtaining sequence alignments. It can restrict its search to a list of named entries or can search all but those on a list of entries. It can restrict its output to showing the highest scoring match in each sequence, but by default it will show all matches.\par
|
|
\pard \s4\qj\sa120\sl280
|
|
Of its modes of output, two require further description. The first, "Padded sections", creates a new file for each match. The file will contain the sequence between and including the two outermost motifs in the pattern. It will be gapped to the furthest extent defined by the pattern, which means that if all the files were subsequently written one above the other all the motifs in the pattern would be exactly aligned, with the sections between them containing the requisite numbers of padding characters. The second such mode of output is called "Complete padded sequences". Here the user must know the maximum distance between the leftmost motif and the start of all the sequences that match. A trial run in which only the positions of matches are reported is usually required. The user gives this maximum distance to the program. The program then writes a new file containing the full length of all matching sequences, again maximally gapped (including their left ends) so that they would all align if written above one another. For both of these modes of output the files created are named "entryname" where "entryname" is the name given to the sequence in the sequence library. These modes are best used with the option "Report all matches" rejected, so that only the best match for each sequence is reported. The sequences can be lined up using the sequence assembly program SAP.\par
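\pard\plain \s4\qj\sa120\sl280 \f20 The idea behind "Padded sections" can be illustrated with a minimal Python sketch. It assumes that each match has already been split into its motifs and the stretches of sequence between them, and that the longest spacing allowed by the pattern is known for each gap. The function name, the padding character "-", the choice to pad at the right of each spacer, and the example sequences are all assumptions made for illustration; the manual does not say how NIPL itself places or writes the padding.\par
\pard\plain \sl220 \f4\fs16 
```python
def pad_section(motifs, spacers, max_spacer_lengths, pad="-"):
    # motifs: matched motif strings, left to right
    # spacers: sequence between consecutive motifs (one fewer than motifs)
    # max_spacer_lengths: longest spacer the pattern allows at each gap
    pieces = [motifs[0]]
    for spacer, longest, motif in zip(spacers, max_spacer_lengths, motifs[1:]):
        if len(spacer) > longest:
            raise ValueError("spacer longer than the pattern allows")
        # Pad the spacer out to the maximum length so the next motif lines up.
        pieces.append(spacer + pad * (longest - len(spacer)))
        pieces.append(motif)
    return "".join(pieces)

# Two invented matches to the same two-motif pattern, with at most 12 bases
# allowed between the motifs.
rows = [
    pad_section(["TTTTT", "GAAGC"], ["acgtacgtac"], [12]),
    pad_section(["TTTTT", "GAAGC"], ["acgtacgtacgt"], [12]),
]
for row in rows:
    print(row)
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Written one above the other, the two rows have the same length and both motifs start in the same columns, which is the property described above.\par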
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select NIPL.\par
|
|
2.\tab Define "Name for results file."\par
|
|
3.\tab Select a library.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
|
|
Select "Search whole library". The alternatives are "Search only a list of entries" and "Search all but a list of entries". The files containing the list of entries should contain one entry name per line, left justified.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Results display mode" as "Inclusive". The alternatives include "Motif by motif", "Scores only", "Complete padded sequences" and "Padded sections".\par
|
|
6.\tab Accept "Report all matches". The alternative only shows the best match for each sequence.\par
|
|
7.\tab Define "Pattern definition file". The name of the file containing the pattern created using NIP. \par
|
|
\tab The program displays a textual description of the pattern and the expected number of matches per 1000 residues assuming an average nucleic acid composition.\par
|
|
8.\tab Define "Maximum pattern probability". The program will run much more quickly if none is given.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define "Minimum pattern score".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The search will start.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
|
|
The "exact match" motif class requires a consensus sequence. The "percentage match" motif class requires a consensus sequence and a cutoff score. The "score matrix" motif class requires a consensus sequence and a cutoff score. The "weight matrix" search and the "complement of a weight matrix" only require the name of the file containing the matrix. The "inverted repeat" or "stem-loop" requires a stem length, minimum and maximum loop sizes, and a cutoff score using scores A-T = G-C = 2, G-T = 1. Note that if the user defines an inverted repeat as a "Reference motif" the "Relative position" can be defined from either its 5' or its 3' end. The "direct repeat" motif class requires a repeat length, the minimum and maximum gap between the two occurrences of the repeat, and a minimum score.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab The motif class "Exact match, defined step" is rarely used. A typical use might be to find a start codon followed, for some minimum distance, by no stop codons in the same reading frame. The step would have the value 3 to keep the reading frame the same as that of the start codon, and the stop codon searches would be included using the NOT operator.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
The details of the probability calculations are outside the scope of this article. They are quite rapid and are essential both for assessing the statistical significance of any matches found and for allowing meaningful cutoffs to be applied to patterns. Obviously, in general, cutoff scores are inappropriate for patterns containing a mixture of motif classes.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
|
|
The program calculates the "Probability of finding the pattern" and the "Expected number of matches". The first figure is actually the product of the individual motif probabilities, but the latter figure is more useful because it takes into account the allowed variation in spacing between motifs and the length of the current sequence. In both cases the composition of the current sequence is also used, so that different probabilities would be calculated for other sequences.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab
|
|
The pattern definition system is very flexible. Assume that a laboratory has a large library of patterns stored in its computer. Different groups or users may want to screen their sequences against different subsets of a pattern library. Each group therefore uses its own "File of pattern file names" which contains only the names of the pattern files that are relevant to their sequences. Of course a pattern may contain only one motif. Hence a library of patterns can include both simple and complex patterns. In the same way a laboratory may have a large library of weight matrices defining different motifs and different users may want to combine them in different ways to produce their own patterns. \par
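\pard\plain \s4\qj\sa120\sl280 \f20 The stem scoring rule quoted in note 1 (A-T = G-C = 2, G-T = 1) can be sketched in a few lines of Python. The function below scores two arms of a candidate inverted repeat by pairing the first base of the 5' arm with the last base of the 3' arm and working inwards. The function name and the decision to score all other pairings as 0 are assumptions for illustration, since the manual only states the three scores given above.\par
\pard\plain \sl220 \f4\fs16 
```python
# Scores from note 1: A-T = G-C = 2, G-T = 1. Any other pairing is scored 0
# here, which is an assumption (the manual does not give mismatch scores).
PAIR_SCORES = dict(AT=2, TA=2, GC=2, CG=2, GT=1, TG=1)

def stem_score(five_prime_arm, three_prime_arm):
    # Both arms are written 5' to 3'; the 3' arm is read backwards so that
    # the outermost bases of the stem are paired first.
    if len(five_prime_arm) != len(three_prime_arm):
        raise ValueError("arms must have the same length")
    total = 0
    for a, b in zip(five_prime_arm.upper(), reversed(three_prime_arm.upper())):
        total += PAIR_SCORES.get(a + b, 0)
    return total

# Four Watson-Crick pairs and one G-T pair in a stem of length 5.
print(stem_score("GGCAT", "GTGCC"))
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 A candidate stem would then be accepted when its score reaches the cutoff score defined for the motif.\par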
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4(1)}\:53-60.\par
|
|
2.\tab Staden, R. 1989. Methods for calculating the probabilities of finding patterns in sequences. {\i CABIOS} {\b 5(2)}\:89-96.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 8. Searching for Restriction Sites\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Search for restriction sites and list them enzyme by enzyme\par
|
|
2.2\tab Search for restriction sites and list them by position\par
|
|
2.3\tab Search for restriction sites and list their names above the sequence\par
|
|
2.4\tab Search for restriction sites and plot their positions\par
|
|
2.5\tab Find restriction enzymes that cut infrequently\par
|
|
2.6\tab Producing a back translation from a protein sequence\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The program NIP contains a routine for finding and displaying the positions of the cut sites of restriction enzyme recognition sequences. Linear or circular sequences can be searched and the results can be listed in various forms or displayed graphically. The recognition sequences to be searched for can be typed on the keyboard or read from files. The format of these files is given in note 1. At the end of the chapter we also describe how to produce back translations of protein sequences so that these routines can be used to search them for restriction sites.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Search for restriction enzyme sites and list them enzyme by enzyme\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select "Input source" as "All enzymes file". A number of standard files are available and users may also have their own.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Accept "Search for all names". \par
|
|
4.\tab Select "Order results enzyme by enzyme".\par
|
|
5.\tab Accept "List matches".\par
|
|
6.\tab Accept "The sequence is linear". The alternative is circular.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Accept "Search for definite matches". The alternative is to search for possible matches in a sequence containing IUB redundancy codes.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The results will then appear in the form shown in figure 8.1. Each match is numbered and its enzyme name given, followed by the matching sequence with the cut site indicated by a ' symbol. The position of the cut site is given, followed by the length of the potential fragment ending at that site, followed by a list of fragment sizes sorted on length.\par
|
|
\pard\plain \li1160\ri1380\sl220\box\brsp100\brdrth \f4\fs16 Matches found= 3\par
|
|
\pard \li1160\ri1380\sl220\box\brsp100\brdrth Name Sequence Position Fragment length\par
|
|
1 AccII cg'cg 313 312 51\par
|
|
2 AccII cg'cg 364 51 188\par
|
|
3 AccII cg'cg 552 188 312\par
|
|
449 449\par
|
|
Matches found= 6\par
|
|
Name Sequence Position Fragment length\par
|
|
1 AciI cc'gc 503 502 12\par
|
|
2 AciI gc'gg 553 50 12\par
|
|
3 AciI gc'gg 714 161 50\par
|
|
4 AciI gc'gg 872 158 105\par
|
|
5 AciI gc'gg 884 12 158\par
|
|
6 AciI cc'gc 896 12 161\par
|
|
105 502\par
|
|
Matches found= 3\par
|
|
Name Sequence Position Fragment length\par
|
|
1 AcyI gg'cgtc 698 697 5\par
|
|
2 AcyI gg'cgtc 765 67 67\par
|
|
\pard \li1160\ri1380\sl220\keepn\box\brsp100\brdrth 3 AcyI ga'cgcc 996 231 231\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 8.1\tab Typical output from "List enzyme by enzyme".\par
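\pard\plain \s4\qj\sa120\sl280 \f20 The fragment lengths in figure 8.1 follow directly from the cut positions. The Python sketch below reproduces them for a linear sequence. The function name is illustrative, and the sequence length of 1000 is not stated in the text; it is inferred from the figure, where a last cut at position 552 is followed by a final fragment of 449 bases.\par
\pard\plain \sl220 \f4\fs16 
```python
def fragment_lengths(cut_positions, sequence_length):
    # Linear sequence. Each position is the first base after the cut, as in
    # the Position column of figure 8.1, so the first fragment is one base
    # shorter than the first position.
    lengths = []
    previous = 1
    for cut in sorted(cut_positions):
        lengths.append(cut - previous)
        previous = cut
    # Final fragment runs from the last cut to the end of the sequence.
    lengths.append(sequence_length - previous + 1)
    return lengths

# AccII cuts from figure 8.1; the length 1000 is inferred from the figure.
accii = fragment_lengths([313, 364, 552], 1000)
print(accii)          # fragments in order along the sequence
print(sorted(accii))  # the same fragments sorted on length
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 The first list corresponds to the left-hand fragment column of the figure and the sorted list to the right-hand column.\par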
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Search for restriction enzyme sites and list them by position\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
|
|
2.\tab Select "Input source" as "All enzymes file". \par
|
|
3.\tab Accept "Search for all names". \par
|
|
4.\tab Select "Order results by position".\par
|
|
5.\tab Accept "List matches". \par
|
|
6.\tab Accept "The sequence is linear".\par
|
|
7.\tab Accept "Search for definite matches". \par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The results will then appear in the form shown in figure 8.2. Each match is numbered and its enzyme name given, followed by the matching sequence with the cut site indicated by a ' symbol. The position of the cut site is given, followed by the length of the potential fragment ending at that site.\par
|
|
\pard\plain \s6\fi-540\li560\sb240\sa60\sl280\tx560 \b\f20 2.3\tab Search for restriction enzyme sites and list their names above the sequence\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
|
|
2.\tab Select "Input source" as "All enzymes file". \par
|
|
3.\tab Accept "Search for all names". \par
|
|
4.\tab Select "Show names above the sequence".\par
|
|
5.\tab Reject "Hide translation".\par
|
|
6.\tab Accept "Use 1 letter codes".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Line length". This is the number of bases that will appear on each line of output. It must be a multiple of 30. \par
|
|
\pard\plain \li1640\ri1720\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Name Sequence Position Fragment length\par
|
|
\pard \li1640\ri1720\sl220\box\brsp100\brdrth 1 HapII c'cgg 2 1\par
|
|
2 HpaII c'cgg 2 0\par
|
|
3 MspI c'cgg 2 0\par
|
|
4 MseI t'taa 14 12\par
|
|
5 HincII gtt'aac 15 1\par
|
|
6 HindII gtt'aac 15 0\par
|
|
7 HpaI gtt'aac 15 0\par
|
|
8 DsaV 'ccagg 23 8\par
|
|
9 EcoRII 'ccagg 23 0\par
|
|
10 TspAI 'ccagg 23 0\par
|
|
11 ApyI cc'agg 25 2\par
|
|
12 BstNI cc'agg 25 0\par
|
|
13 MvaI cc'agg 25 0\par
|
|
14 ScrFI cc'agg 25 0\par
|
|
15 MaeIII 'gttac 47 22\par
|
|
16 BsrI actggt' 49 2\par
|
|
17 MseI t'taa 55 6\par
|
|
18 MaeII a'cgt 63 8\par
|
|
19 SfaNI gcatcaacaa'gata 86 23\par
|
|
\pard \li1640\ri1720\sl220\keepn\box\brsp100\brdrth 20 MaeII a'cgt 91 5\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 8.2\tab Typical output from "List by position".\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 8.\tab Accept "The sequence is linear".\par
|
|
9.\tab Accept "Search for definite matches". \par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The results will then appear in the form shown in figure 8.3. The sequence is listed with a 3 phase translation underneath and every tenth base numbered. Above the sequence the positions of the cut sites of restriction enzymes are marked.\par
|
|
\pard\plain \s6\sb160\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Search for restriction enzyme sites and plot their positions \par
|
|
\pard\plain \s7\qj\fi-560\li560\sa80\sl260\tx560 \f20 1.\tab Select "Search".\par
|
|
2.\tab Select "Input source" as "All enzymes file". \par
|
|
3.\tab Accept "Search for all names". \par
|
|
4.\tab Select "Order results by position".\par
|
|
5.\tab Reject "List matches". \par
|
|
6.\tab Accept "The sequence is linear".\par
|
|
7.\tab Accept "Search for definite matches".\par
|
|
\pard\plain \s4\qj\sa80\sl260 \f20 The results will then appear in the form shown in figure 8.4. Each enzyme that has a match is named at the left edge of the display and its cut sites are marked by short vertical lines. If the display window fills up the bell will ring. Users may then take a screen dump before typing return. The program then displays the message " ? Restart plotting from bottom of frame". To do so type return. To quit type !.\par
|
|
\pard\plain \li1200\ri1240\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Search for restriction enzyme sites\par
|
|
\pard \li1200\ri1240\sl220\box\brsp100\brdrth Select operation\par
|
|
X 1 Search\par
|
|
2 List enzyme file\par
|
|
3 Clear text\par
|
|
4 Clear graphics\par
|
|
? Selection (1-4) (1) =\par
|
|
Select input source\par
|
|
1 All enzymes file\par
|
|
X 2 Six cutter file\par
|
|
3 Four cutter file\par
|
|
4 Personal file\par
|
|
5 Keyboard\par
|
|
? Selection (1-5) (2) =1\par
|
|
? Search for all names (y/n) (y) =\par
|
|
Select results display mode\par
|
|
X 1 Order results enzyme by enzyme\par
|
|
2 Order results by position\par
|
|
3 Show only infrequent cutters\par
|
|
4 Show names above the sequence\par
|
|
? Selection (1-4) (1) =4\par
|
|
? Hide translation (y/n) (y) =n\par
|
|
? Use 1 letter codes (y/n) (y) =\par
|
|
? Line length (30-90) (60) =\par
|
|
? The sequence is linear (y/n) (y) =\par
|
|
? Search for definite matches (y/n) (y) =\par
|
|
\par
|
|
HapII\par
|
|
HpaII\par
|
|
MspI MseI\par
|
|
. .HincII\par
|
|
. .HindII\par
|
|
. .HpaI DsaV\par
|
|
. .. EcoRII\par
|
|
. .. TspAI\par
|
|
. .. . ApyI\par
|
|
. .. . BstNI\par
|
|
. .. . MvaI\par
|
|
. .. . ScrFI MaeIII\par
|
|
. .. . . . BsrI MseI\par
|
|
ccggttagactgttaacaacaaccaggttttctactgatataactggttacatttaacgc\par
|
|
10 20 30 40 50 60\par
|
|
P V R L L T T T R F S T D I T G Y I * R\par
|
|
R L D C * Q Q P G F L L I * L V T F N A\par
|
|
\pard \li1200\ri1240\sl220\keepn\box\brsp100\brdrth G * T V N N N Q V F Y * Y N W L H L T P\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 8.3\tab Typical dialogue and output for a "Names above the sequence" search.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Finding restriction enzymes that cut infrequently\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
|
|
2.\tab Select "Input source" as "All enzymes file". \par
|
|
3.\tab Accept "Search for all names". \par
|
|
4.\tab Select "Show only infrequent cutters".\par
|
|
5.\tab Define "Maximum number of cuts".\par
|
|
6.\tab Accept "The sequence is linear".\par
|
|
\pard\plain \li160\ri200\sl220\keepn\box\brsp100\brdrth \f4\fs16 {{\pict\macpict\picw430\pich254
|
|
0b99ffffffff00fd01ad1101a0008201000affffffff00fd01ad090000000000000000310000002400fa01ac9800240000000000b7011f0000000000b7011f0000002400fa01ac000102dd001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001
|
|
f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000210000006007fdfff00fc06f5000020ea0006f5000020ea0006f5000020ea0006f5000020ea0006f5000020ea0006f5000020ea0006007f
|
|
dfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc040020de00040020de00040020de00040020de00040020de00040020de0006007fdfff00fc1402
|
|
000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000010ff001402000080fd000001f00002100040fc000210000006007fdfff00fc06fb000004e40006fb000004e40006fb00
|
|
0004e40006fb000004e40006fb000004e40006fb000004e40006007fdfff00fc0af8000080fe000080eb000af8000080fe000080eb000af8000080fe000080eb000af8000080fe000080eb000af8000080fe000080eb0006007fdfff00fc040020de00040020de00040020de00040020de00040020de00040020de0006007f
|
|
dfff00fc0602000040e0000602000040e0000602000040e0000602000040e0000602000040e00006007fdfff00fc0602000040e0000602000040e0000602000040e0000602000040e0000602000040e00006007fdfff00fc0602000040e0000602000040e0000602000040e0000602000040e0000602000040e00006020000
|
|
40e00006007fdfff00fc06eb000040f40006eb000040f40006eb000040f40006eb000040f40006eb000040f40006007fdfff00fc06eb000040f40006eb000040f40006eb000040f40006eb000040f40006eb000040f40006eb000040f40006007fdfff00fc06eb000010f40006eb000010f40006eb000010f40006eb000010
|
|
f40006eb000010f40006007fdfff00fc040020de00040020de00040020de00040020de00040020de00040020de0006007fdfff00fc06eb000008f40006eb000008f40006eb000008f40006eb000008f40006eb000008f40006007fdfff00fc06fe000020e10006fe000020e10006fe000020e10006fe000020e10006fe0000
|
|
20e10006fe000020e10006007fdfff00fc06fe000020e10006fe000020e10006fe000020e10006fe000020e10006fe000020e10006007fdfff00fc06fe000020e10006fe000020e10006fe000020e10006fe000020e10006fe000020e10006007fdfff00fc06eb000008f40006eb000008f40006eb000008f40006eb000008
|
|
f40006eb000008f40006eb000008f40006007fdfff00fc06eb000010f40006eb000010f40006eb000010f40006eb000010f40006eb000010f40006007fdfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fe000008e10006fe000008e10006fe000008e1
|
|
0006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fa000080e50006fa000080e50006fa000080e50006fa000080e50006fa000080e50006007fdfff00fc06eb000008f40006eb000008f40006eb000008f40006eb000008f40006eb000008f40006eb000008f40006007fdfff00fc06fe000008e100
|
|
06fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc06fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006fe000008e10006007fdfff00fc02dd00a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00020000000e00
|
|
252c000800140554696d65730300140d00092e0004000001002b010b055472753949a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a000c0000001800252a0a055366614e49a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a0014000000
|
|
2000252a08055363724649a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a001c0000002800252a08044d766149a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00260000003200252a0a044d737049a00097a10096000c010000000200
|
|
000000000000a1009a0008fffd00000011000001000a002e0000003a00252a08044d736549a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00370000004300252a09064d6165494949a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00
|
|
400000004c00252a09054d61654949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00490000005500252a09054d70614949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00510000005d00252a08044d706149a00097a10096000c01
|
|
0000000200000000000000a1009a0008fffc00000011000001000a00590000006500252a080648696e644949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00630000006f00252a0a0648696e634949a00097a10096000c010000000200000000000000a1009a0008fffc000000
|
|
11000001000a006b0000007700252a080648696e503149a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00750000008100252a0a0548696e3649a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a007d0000008900252a080448686149a0
|
|
0097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00870000009300252a0a054861704949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a008f0000009b00252a08054861654949a00097a10096000c010000000200000000000000a1009a00
|
|
08fffd00000011000001000a0098000000a400252a090645636f524949a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00a1000000ad00252a090745636c31333649a00097a10096000c010000000200000000000000a1009a0008fffc00000011000001000a00a9000000b50025
|
|
2a080444736156a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00b2000000be00252a090444646549a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00ba000000c600252a080443666f49a00097a10096000c01000000020000000000
|
|
0000a1009a0008fffc00000011000001000a00c3000000cf00252a09054273744f49a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00cc000000d800252a09054273744e49a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00d4000000
|
|
e000252a080442737249a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00de000000ea00252a0a084273703134334949a00097a10096000c010000000200000000000000a1009a0008fffd00000011000001000a00e6000000f200252a08054273694c49a00097a10096000c0100
|
|
00000200000000000000a1009a0008fffd00000011000001000a00f0000000fc00252a0a0441707949a00097a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 8.4\tab Typical output from "Plot positions".\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 7.\tab Accept "Search for definite matches". \par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The names and numbers of cut sites of all enzymes that cut the sequence no more than the "Maximum number of cuts" will then be displayed.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Producing a back translation from a protein sequence \par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The routine for producing back translations is contained in the program PIP. It back translates protein sequences into DNA using the standard genetic code. The translation can use either the IUB symbols or a set of codon preferences. If a set of codon preferences is used they must conform to the format of codon tables produced by the nucleotide interpretation program, and the back translation will contain the favoured codons. If, for any amino acid, there is no favoured codon, the IUB symbols will be employed. The program will plot the redundancy along the sequence and hence can be used to find the best sequences to use as primers. The DNA sequence can be saved to a file and analysed using the nucleotide analysis program. \par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Back translate".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "No codon preference". The alternative will cause the program to ask for "File name of codon table", which should be in the same format as those created by the nucleotide interpretation program.
|
|
\par
|
|
3.\tab Reject "Plot redundancy". The alternative will ask for a window length to use for the plot. The window length is in codons. A plot will appear in which the best primers are sited at the peaks and the worst at the troughs.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Save DNA to disk"\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "File name for DNA sequence". This file can later be read into program NIP and all the searches described above employed.\par
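\pard\plain \s4\qj\sa120\sl280 \f20 A back translation using IUB symbols can be sketched in Python as follows. The table gives one redundant codon per amino acid for the standard genetic code. It is a commonly used convention and is only an assumption about what PIP produces, since the manual does not list the program's own table. Note that the single codons chosen here for arginine, leucine and serine (each encoded by six codons) are over-inclusive, because one IUB codon cannot describe exactly six codons. The function name is illustrative.\par
\pard\plain \sl220 \f4\fs16 
```python
# One IUB-coded codon per amino acid (standard genetic code). The entries
# for R, L and S also match a few codons for other amino acids; this table
# is an illustrative convention, not taken from PIP.
IUB_CODONS = dict(
    A="GCN", R="MGN", N="AAY", D="GAY", C="TGY", Q="CAR", E="GAR",
    G="GGN", H="CAY", I="ATH", L="YTN", K="AAR", M="ATG", F="TTY",
    P="CCN", S="WSN", T="ACN", W="TGG", Y="TAY", V="GTN",
)

def back_translate(protein):
    # Replace each one-letter amino acid code by its redundant codon.
    codons = []
    for residue in protein.upper():
        if residue not in IUB_CODONS:
            raise ValueError("unknown amino acid symbol: " + residue)
        codons.append(IUB_CODONS[residue])
    return "".join(codons)

print(back_translate("MFW"))
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Positions holding A, C, G or T are fully determined, so a stretch of the output with few redundancy symbols corresponds to a region of low redundancy of the kind sought when choosing primers.\par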
|
|
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \page 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
|
|
The file containing the definitions of the restriction enzyme names and their recognition sequences uses the standard IUB redundancy symbols and has the following format. Each name is followed by a /, then each of its recognition sequences is followed by a /. The last recognition sequence for each enzyme is followed by //. The cut sites should be indicated by a '. If the cut site is not contained in the recognition sequence, the recognition sequence should be extended by sufficient N symbols. For example the two lines from the standard file shown below define the enzymes Alw21I and Alw26I. These files are kindly updated each month by Dr. Rich Roberts.\par
|
|
\pard \s7\qj\li1720\sa120\sl280\tx1720 Alw21I/GWGCW'C//\par
|
|
Alw26I/GTCTCN'NNNN/'NNNNNGATCC//\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab
|
|
To search for a subset of the restriction enzymes in a file the user should reject "Search for all names" and the program will ask for the names of the enzymes wanted and extract their recognition sequences from the file. Alternatively, if a user always works with the same subset, a file containing only those enzymes can be created by editing the standard file. This file would then be selected as "Personal file" for "Input source".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
The routine also allows names and recognition sequences to be entered on the keyboard. This is selected as "Keyboard" for "Input source", and the program will prompt for names and their recognition sequences. In this way the routine can be used to search for exact matches to any short sequence. Again IUB redundancy codes can be used.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab When back translating from proteins it is often useful to produce two back translations, one using a table of codon preferences and one using the IUB symbols. This is because the restriction enzyme search program can distinguish between definite and possible cuts in the sequence. Those matches that the program terms "definite matches" are ones in which the specification of the recognition sequence corresponds exactly to that of the back translation. The program will also find what it terms "possible matches", which are ones that depend on the particular codons chosen for each amino acid. These are sites at which recognition sequences could be engineered to produce a cut in the DNA without changing the amino acid, but which are not necessarily found in the original sequence. \par
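\pard\plain \s4\qj\sa120\sl280 \f20 The enzyme file format described in note 1 is simple enough to read with a few lines of Python. The sketch below parses one line of the file and then searches a sequence for definite matches to a recognition sequence, treating the IUB symbols as sets of bases. The function names are illustrative and this is not the code used by NIP. The reported position is taken to be the first base after the cut, which agrees with the positions listed in figure 8.2, and only the strand given is searched.\par
\pard\plain \sl220 \f4\fs16 
```python
import re

# Standard IUB redundancy symbols and the bases each one stands for.
IUB = dict(A="A", C="C", G="G", T="T", R="AG", Y="CT", W="AT", S="CG",
           M="AC", K="GT", B="CGT", D="AGT", H="ACT", V="ACG", N="ACGT")

def parse_enzyme_line(line):
    # "Name/site1/site2//" -> (name, [(recognition_sequence, cut_offset), ...])
    fields = line.strip().split("/")
    sites = []
    for field in fields[1:]:
        if not field:
            continue  # empty fields come from the closing "//"
        offset = field.find("'")  # number of bases to the left of the cut
        sites.append((field.replace("'", ""), offset))
    return fields[0], sites

def find_cuts(sequence, recognition_sequence, cut_offset):
    # Definite matches only: the sequence is assumed to contain A, C, G, T.
    if cut_offset < 0:
        raise ValueError("no cut site marked in the recognition sequence")
    pattern = "".join("[" + IUB[s] + "]" for s in recognition_sequence.upper())
    # A lookahead lets overlapping sites be found as well.
    regex = re.compile("(?=" + pattern + ")")
    return [m.start() + cut_offset + 1 for m in regex.finditer(sequence.upper())]

print(parse_enzyme_line("Alw21I/GWGCW'C//"))
# First 60 bases shown in figure 8.3, searched for MseI (t'taa).
seq = "ccggttagactgttaacaacaaccaggttttctactgatataactggttacatttaacgc"
print(find_cuts(seq, "TTAA", 1))
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 For the MseI example the two positions found are the ones listed for that enzyme in figure 8.2.\par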
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 9. Statistical and Structural Analysis of Nucleotide Sequences\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Calculating the base composition\par
|
|
2.2\tab Calculating the dinucleotide composition\par
|
|
2.3\tab Calculating the codon composition\par
|
|
2.4 \tab Creating a codon usage file\par
|
|
2.5\tab Plotting the base composition\par
|
|
2.6 \tab Searching for anomalous compositions\par
|
|
2.7\tab Search for anomalous word usage\par
|
|
2.8\tab Calculate codon constraint\par
|
|
2.9 \tab Searching for stem-loops\par
|
|
2.10\tab Searching for long range inverted repeats\par
|
|
2.11\tab Searching for long range repeats\par
|
|
2.12\tab Searching for repeated words\par
|
|
2.13\tab Searching for possible Z DNA\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
4.\tab References\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we deal with performing simple statistical and structural analysis of nucleotide sequences and also describe some more unusual test
|
|
s. We cover base, dinucleotide and codon compositions, potential amino acid compositions, and the relative frequencies of each base in each position of codons. We describe how to produce plots to show regions of unusual composition and to measure the codon
|
|
bias for a gene. In addition we describe a set of functions for finding "structures" in nucleotide sequences, including short range inverted repeats or stem-loops, long range inverted repeats, long range direct repeats, and Z DNA. All the methods are cont
|
|
ained in the program NIP.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Calculating the base composition\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Select "Calculate base composition". The composition of the active region is shown.\par
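\pard\plain \s4\qj\sa120\sl280 \f20 For readers who want to check the figures outside NIP, the calculation can be sketched in a few lines of Python. This is only an illustration and not the NIP code: the function name base_composition and its 1-based start and end arguments are assumptions made for the example, and characters other than T, C, A and G are simply ignored.\par

```python
from collections import Counter

def base_composition(seq, start=1, end=None):
    """Percentage of T, C, A and G in seq[start..end] (1-based, inclusive).

    Hypothetical helper; characters other than T, C, A, G are ignored.
    """
    region = seq[start - 1:end].upper()
    counts = Counter(region)
    total = sum(counts[b] for b in "TCAG")
    if total == 0:
        raise ValueError("no T, C, A or G in the selected region")
    return dict((b, 100.0 * counts[b] / total) for b in "TCAG")

# Whole sequence: two of each base, so 25.0 for every base.
composition = base_composition("TTCAGGCA")
```

\pard\plain \s4\qj\sa120\sl280 \f20 Passing start and end restricts the count to part of the sequence, in the same spirit as the active region used by the program.\par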
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.2\tab Calculating the dinucleotide composition\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 \tab Select "Calculate dinucleotide composition". The observed dinucleotide composition of the active region and an expected dinucleotide composition are shown. The expected composition is calculated from the base composition, assuming a random order of bases in the sequence. See figure 9.1.\par
|
|
\pard\plain \li1180\ri1440\sb200\sl220\box\brsp100\brdrth \f4\fs16 T C A G\par
|
|
\pard \li1180\ri1440\sl220\box\brsp100\brdrth Obs Expected Obs Expected Obs Expected Obs Expected\par
|
|
T 5.86 5.97 6.18 5.99 4.24 5.91 8.14 6.56\par
|
|
C 6.10 5.99 5.14 6.02 5.91 5.93 7.38 6.59\par
|
|
A 5.57 5.91 5.64 5.93 7.91 5.84 5.05 6.49\par
|
|
\pard \li1180\ri1440\sl220\keepn\box\brsp100\brdrth G 6.90 6.56 7.56 6.59 6.11 6.49 6.30 7.22\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa200\sl240\tx1140 \f21\fs20 Figure 9.1\tab The dinucleotide composition display\par
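\pard\plain \s4\qj\sa120\sl280 \f20 Under the random-order assumption the expected percentage of a dinucleotide XY is 100 times the frequency of X times the frequency of Y. The Python sketch below illustrates this; it is not the NIP code. The function name is hypothetical, and counting overlapping pairs is an assumption, since the text does not say how the observed pairs are counted.\par

```python
from collections import Counter

BASES = "TCAG"

def dinucleotide_composition(seq):
    """Return (observed, expected) percentages keyed by dinucleotide.

    Hypothetical helper. Assumption: observed pairs are counted at every
    position, so neighbouring pairs overlap.
    """
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    singles = Counter(seq)
    n_pairs = sum(pairs[x + y] for x in BASES for y in BASES)
    n_bases = sum(singles[b] for b in BASES)
    if n_pairs == 0:
        raise ValueError("sequence too short to contain a dinucleotide")
    freq = dict((b, singles[b] / n_bases) for b in BASES)
    observed = dict((x + y, 100.0 * pairs[x + y] / n_pairs)
                    for x in BASES for y in BASES)
    # Random order of bases: P(XY) = P(X) * P(Y).
    expected = dict((x + y, 100.0 * freq[x] * freq[y])
                    for x in BASES for y in BASES)
    return observed, expected

observed, expected = dinucleotide_composition("ACGT")
```

\pard\plain \s4\qj\sa120\sl280 \f20 The expected values in figure 9.1 are consistent with this product rule: the TT and CC entries imply T and C frequencies of about 0.244 and 0.245, whose product gives an expected TC value close to the 5.99 shown.\par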
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Calculating the codon composition\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This function counts codons and calculates the amino acid composition, protein molecular weight, hydrophobicity and base compositions. Users select the segments of the sequence to be analysed. The segments can be defined on the keyboard or from an EMBL/GenBank feature table.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate codon composition".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Show observed counts". The alternative scales the codon table so that the counts for each amino acid sum to 100. This makes it easier to see any bias present in the codon usage.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Accept "Define segments using keyboard". The alternative is to use a feature table.\par
|
|
4.\tab Define "From". The start of the segment to be analysed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "To". The end of the segment to be analysed. The results will be displayed as in figure 9.2 and the program will then ask for "From" again. Enter zero for "From" when all segments of interest have been analysed. The program will then display a cumulative total for all the values it calculates.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The results are broken down into several parts. Apart from the codon counts we see the base composition by position in codon expressed as a percentage of each base's own frequency; the base composition by position in codon expressed as a percentage of the overall base composition of the segment; the base composition expected for the observed amino acid composition if there were no codon preference; the percentage deviations of the observed amino acid composition from an average amino acid composition (1); and the molecular weight and hydrophobicity (2) of the putative amino acid sequence.\par
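\pard\plain \s4\qj\sa120\sl280 \f20 Three of these quantities can be illustrated with a short Python sketch: the codon counts, the counts rescaled so that each amino acid sums to 100, and the base composition at each codon position. It is a sketch under stated assumptions and not the NIP implementation. The function names are hypothetical, the standard genetic code is assumed, and the position table is taken to be the percentage of all codons carrying each base at that position, so that each row sums to 100 as in the second block of figure 9.2.\par

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
# Standard genetic code in TCAG order; * marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
GENETIC_CODE = dict(zip(CODONS, AMINO))

def codon_counts(seq, start, end):
    """Count codons in seq[start..end] (1-based, inclusive), in frame from start."""
    region = seq[start - 1:end].upper()
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    return Counter(c for c in codons if c in GENETIC_CODE)

def per_amino_acid_percent(counts):
    """Rescale counts so that the codons of each amino acid sum to 100."""
    totals = Counter()
    for codon, n in counts.items():
        totals[GENETIC_CODE[codon]] += n
    return dict((c, 100.0 * n / totals[GENETIC_CODE[c]])
                for c, n in counts.items())

def position_composition(counts):
    """Percentage of T, C, A, G at codon positions 1, 2 and 3 (rows sum to 100)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons counted")
    rows = []
    for pos in range(3):
        at_pos = Counter()
        for codon, n in counts.items():
            at_pos[codon[pos]] += n
        rows.append(dict((b, 100.0 * at_pos[b] / total) for b in BASES))
    return rows

counts = codon_counts("TTTTTCCTGCTGCTGTAA", 1, 18)
percent = per_amino_acid_percent(counts)
rows = position_composition(counts)
```

\pard\plain \s4\qj\sa120\sl280 \f20 In this small example the two phenylalanine codons each score 50 after rescaling, while CTG, the only leucine codon present, scores 100.\par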
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Creating a codon usage file\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This method writes a file of codon usage in the form of a codon tab
|
|
le (see figure 9.2). Such tables can be used by several other methods contained within the programs. If required the user can start with an existing file and add to it.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate a codon table and write it to disk".\par
|
|
2.\tab Accept "Start with empty table".\par
|
|
\pard\plain \li440\ri500\sl220\pagebb\box\brsp100\brdrth \f4\fs16 Calculate base, codon and amino acid compositions\par
|
|
\pard \li440\ri500\sl220\box\brsp100\brdrth ? Show observed counts (y/n) (y) =\par
|
|
? Define segments using keyboard (y/n) (y) =\par
|
|
\par
|
|
? From (0-8134) (0) =1\par
|
|
? To (1-8134) (8134) =1000\par
|
|
? + strand (y/n) (y) =\par
|
|
===========================================\par
|
|
F TTT 5. S TCT 7. Y TAT 4. C TGT 2.\par
|
|
F TTC 17. S TCC 3. Y TAC 5. C TGC 3.\par
|
|
L TTA 3. S TCA 4. * TAA 3. * TGA 1.\par
|
|
L TTG 4. S TCG 3. * TAG 0. W TGG 7.\par
|
|
===========================================\par
|
|
L CTT 3. P CCT 6. H CAT 6. R CGT 3.\par
|
|
L CTC 1. P CCC 1. H CAC 4. R CGC 2.\par
|
|
L CTA 0. P CCA 4. Q CAA 3. R CGA 1.\par
|
|
L CTG 36. P CCG 6. Q CAG 5. R CGG 4.\par
|
|
===========================================\par
|
|
I ATT 12. T ACT 3. N AAT 6. S AGT 0.\par
|
|
I ATC 13. T ACC 5. N AAC 7. S AGC 7.\par
|
|
I ATA 1. T ACA 2. K AAA 9. R AGA 0.\par
|
|
M ATG 9. T ACG 7. K AAG 3. R AGG 1.\par
|
|
===========================================\par
|
|
V GTT 6. A GCT 5. D GAT 7. G GGT 9.\par
|
|
V GTC 3. A GCC 6. D GAC 6. G GGC 9.\par
|
|
V GTA 7. A GCA 2. E GAA 5. G GGA 5.\par
|
|
V GTG 9. A GCG 7. E GAG 3. G GGG 3.\par
|
|
===========================================\par
|
|
Total codons= 333.\par
|
|
T C A G\par
|
|
1 25.00 34.27 40.28 35.94\par
|
|
2 45.42 28.63 36.02 22.27\par
|
|
3 29.58 37.10 23.70 41.80\par
|
|
----- ----- ----- -----\par
|
|
= 100% 100% 100% 100%\par
|
|
1 21.32 25.53 25.53 27.63 = 100%\par
|
|
2 38.74 21.32 22.82 17.12 = 100%\par
|
|
3 25.23 27.63 15.02 32.13 = 100%\par
|
|
% 28.43 24.82 21.12 25.63 Observed, overall totals\par
|
|
% 29.65 23.25 23.95 23.15 Expected, even codons per acid\par
|
|
A C D E F G H I K L\par
|
|
20. 5. 13. 8. 22. 26. 10. 26. 12. 47.\par
|
|
O-E % -27. -11. -25. -61. 71. 10. 38. 52. -36. 59.\par
|
|
M N P Q R S T V W Y\par
|
|
9. 13. 17. 8. 11. 24. 17. 25. 7. 9.\par
|
|
O-E % 14. -10. 1. -39. -41. 6. -11. 15. 64. -15.\par
|
|
\pard \li440\ri500\sl220\keepn\box\brsp100\brdrth Total acids= 329. Molecular weight= 36493. Hydrophobicity= 64.7\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb80\sa280\sl240\tx1140 \f21\fs20 Figure 9.2\tab A worked example of calculating codon, base and amino acid compositions.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 3.\tab Accept "Show observed counts". The alternative is to have the counts for each amino acid type sum to 100.\par
|
|
4.\tab Accept "Define segments using keyboard". The alternative is to use an EMBL/GenBank feature table.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "From". The start of the segment to count over.\par
|
|
6.\tab Define "To". The end of the segment.\par
|
|
7.\tab Accept "+ strand". The alternative is to use the minus strand.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The table will appear on the screen and the program will cycle round to step 5. When all segments have been defined a zero v
|
|
alue for "From" will instruct the program to display on the screen a table which is the sum of all the individual tables.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab Define "Name for codon table file". Give the name of the file in which to save the final table. \par
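\pard\plain \s4\qj\sa120\sl280 \f20 The cycle of defining segments and summing their tables can be sketched in Python as follows. This is an illustration only and not the NIP code. The function names are hypothetical, and it is assumed that a minus-strand segment is read as the reverse complement of the stated range. Writing the file is left out because the file layout is not specified here.\par

```python
from collections import Counter

COMPLEMENT = dict(zip("TCAG", "AGTC"))

def reverse_complement(seq):
    """Reverse complement; characters other than T, C, A, G become N."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))

def segment_table(seq, start, end, plus_strand=True):
    """Codon counts for seq[start..end] (1-based, inclusive).

    Assumption: the minus strand is read as the reverse complement
    of the same range.
    """
    region = seq[start - 1:end].upper()
    if not plus_strand:
        region = reverse_complement(region)
    return Counter(region[i:i + 3] for i in range(0, len(region) - 2, 3))

def cumulative_table(seq, segments, initial=None):
    """Sum the tables of all segments, optionally starting from an existing table."""
    total = Counter(initial or ())
    for start, end, plus_strand in segments:
        total.update(segment_table(seq, start, end, plus_strand))
    return total

sequence = "ATGAAATTTCAT"
# One segment on the plus strand and one on the minus strand.
table = cumulative_table(sequence, [(1, 6, True), (7, 12, False)])
```

\pard\plain \s4\qj\sa120\sl280 \f20 The optional initial argument plays the role of starting with an existing table and adding to it, rather than starting with an empty one.\par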
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Plotting the base composition\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This function plots the base composition for each "window length" of the sequence. The frequency of any combination of bases can be plotted.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot base composition".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Select which combination of bases to plot. The default is A+T, but any single base or combination of bases can be used.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Odd window length". This is the size of the window over which each count is made; it is odd so that each plotted point corresponds exactly to the centre of its window. The count is made over the window, the window is moved on by 1 base, and the count is repeated.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Plot interval". When using long windows it is usually unnecessary to plot the result for every point along the sequence. A plot interval of 5 means that the value for every fifth point is plotted. The plot will appear in the form shown in figure 9.3.\par
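\pard\plain \s4\qj\sa120\sl280 \f20 The windowed count behind the plot can be sketched in Python as below. It is not the NIP code: the function name and its arguments are hypothetical, the value is expressed as a percentage of the window, and the plot interval is assumed to select every n-th window starting from the first. Only the points are computed; drawing them is left to whatever plotting tool is available.\par

```python
def window_composition(seq, bases="AT", window=11, interval=1):
    """Return (centre position, percent) pairs for a sliding window.

    Hypothetical helper. Positions are 1-based and refer to the centre
    base of each window, which is why the window length must be odd.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    seq = seq.upper()
    half = window // 2
    hits = [1 if b in bases else 0 for b in seq]
    points = []
    if len(seq) < window:
        return points
    count = sum(hits[:window])  # count for the first window
    for centre in range(half, len(seq) - half):
        if centre > half:
            # Move on by one base: add the new base, drop the old one.
            count += hits[centre + half] - hits[centre - half - 1]
        if (centre - half) % interval == 0:
            points.append((centre + 1, 100.0 * count / window))
    return points

every_point = window_composition("AATTGGCCAA", "AT", 3, 1)
every_second = window_composition("AATTGGCCAA", "AT", 3, 2)
```

\pard\plain \s4\qj\sa120\sl280 \f20 With a window of 3 the ten-base example yields eight points, centred on positions 2 to 9, and a plot interval of 2 keeps four of them.\par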
|
|
\pard\plain \ri-100\sb360\sl220\keepn \f4\fs16 {\plain \fs16 {\pict\macpict\picw451\pich82
|
|
343affffffff005101c21101a00082a0008c01000affffffff005101c2070000000022000100010000a000a0a100a400020de801000a0000000000000000070001000122004f000100b223000021000101c123000023004e23000021004f0001230000a000a301000affffffff005101c22300b221000101c123004e21004f
|
|
0001a000a122003c000100ff2300fb2300fa2300f82300fa2300fb2300fe2300022300012300022301002300002300022300042300ff2300012300002300ff2300fe2300032300022300022300032300012300fd2302022300fe2300022300fe2300fd2300032300fd23000323000223000023000023000023000123000523
|
|
00fe2301002300ff2300fe2300ff2300002300012300002300fd2300002300032300022300002300fe2300002300ff2300fd2302002300032300fd2300fe2300fe2300002300002300022300032300022300002300012300002300022300002301012300fd2300022300fe2300002300ff2300fe2300002300032300002300
|
|
022300fe2300ff2300fe2300fd2302002300002300fe2300022300002300fe2300022300fe2300022300032300032300ff2300002300fe2300032302ff2300012300032300002300fa2300ff2300012300002300002300002300fb2300002300002300022300002300022301fe2300052300002300032300002300fe2300ff
|
|
2300002300fe2300032300ff2300fd2300002300012300ff2302012300032300ff2300fd2300002300062300fe2300022300fe2300ff2300fe2300022300002300fe2300ff2301002300002300012300022300fe2300002300022300fd2300012300fd2300002300022300002300fe2300002302022300012300022300fe23
|
|
00032300002300022300fe2300022300fe2300032300022300022300fe2300fe2301002300002300002300002300ff2300fb2300022300fe2300002300002300002300002300fd2300002300032300ff2302002300032300fe2300022300002300fe2300002300022300fe2300002300ff2300002300fe2300002300032302
|
|
002300022300fd2300012300032300002300ff2300002300fd2300fc2300022300002300022300fe2300022301012300002300022300032300012300002300022300fe2300ff2300fd2300032300fe2300002300fd2300022302012300002300002300022300fd2300012300002300022300fe2300f8230002230002230000
|
|
2300032300002300fd2300002300002300062300002301012300ff2300fe2300002300052300022300002300fc2300002300ff2300002300012300ff2300002300002300fe2302032300fd2300ff2300012300ff2300002300042300ff2300fe2300022300012300002300ff2300002300002301fe2300ff2300fe23000023
|
|
00ff2300012300002300002300022300002300012300022300002300002300002302012300022300fe2300002300022300fe2300002300002300002300ff2300fe2300032300022300fd2300002302012300002300fd2300022300fe2300ff2300fe2300022300012300ff2300002300002300002300012300ff2300002301
|
|
002300fe2300032300ff2300002300012300032300042300fe2300022300fe2300fe2300022300002300002302fd2300fe2300022300002300fe2300022300fe2300002300022300012300002300002300ff2300002300012301ff2300012300002300002300ff2300002300fe230000230000230002230000230000230000
|
|
2300fd2300012302002300022300fd2300fe2300002300ff2300fe2300022300002300012300032300ff2300042300002300002302002300022300fe2300022300fd2300002300fb2300002300002300022300fe2300022300002300012300022300012301002300042300002300002300002300002300002300fe2300fe23
|
|
00022300002300002300052300002300002302fe2300022300002300002300fe2300ff2300fc2300002300022300022300002300002300fe2300fe2300022301002300fd2300fe2300002300022300012300022300022300002300002300fe2300022300002300fb2300fe2302ff2300fe2300002300002300032300ff2300
|
|
032300012300002300022300fe2300002300022300fe2300002301002300ff2300012300fd2300002300002300002300ff2300002300002300002300fd2300002300002300012300ff2302012300032300002300032300022300022300032300fe2300002300002300ff2300002300fc2300002300002302ff230000230001
|
|
2300022300022300002300012300ff2300012300022300012300fd2300ff2300fe2300fe2301ff2300fe2300fd2300fd2300002300fe2300022300052300002300012300022300032300032300ff2300012300002300ff2300fe2300002300002302022300fe2300002300fe2300ff2300012300002300ff23000023000123
|
|
00ff2300fd2300002300012300ff2300fd2301fc2300fd2300022300012300002300ff2300002300fd2300032300032300002300fe2300ff2300002300032302022300002300002300012300022300002300002300012300ff2300fe2300032300002300ff2300012300032301002300002300022300002300fe2300ff2300
|
|
012300fd2300ff2300fe2300022300002300002300002300fe2302032300022300012300ff2300012300fd2300022300012300022300fb2300ff2300012300002300002300002302002300fd2300022300012300022300fe2300ff2300002300012300002300002300ff2300fd230001230000230000230103230000230000
|
|
2300022300002300012300022300fe2300002300002300022300fd2300012300ff2300012302ff2300fe2300022300fe2300ff2300002300012300fd2300002300022300002300002300002300002300002301012300022300fe2300ff2300012300002300022300fe2300002300ff2300012300002300002300ff23000023
|
|
02fe2300022300fe2300052300fe2300ff2300032300002300002300002300012300042300fb2300032300002300fd2301002300012300fd2300002300ff2300012300ff2300032300002300fd2300fd2300002300012300002300032302ff2300002300012300052300022300012300fb2300002300ff2300012300fd2300
|
|
002300022300fe2300022302012300ff2300fe2300002300032300fd2300002300032300fc2300012300002300032300ff2300012300022301fd2300fe2300ff2300032300fd2300012300fd2300002300002300002300022300012300ff2300032300fe2302032300002300022300002300fe2300fd2300002300ff230001
|
|
2300ff2300002300fe2300002300002300032300fd2301ff2300012300ff2300002300032300002300fb2300fd2300032300022300002300002300012300fd2300032302002300022300012300fd2300032300002300ff2300012300022300fe2300002300022300002300002300002300fe2300002300002300ff23000023
|
|
02012300ff2300012300002300002300002300fd2300002300002300002300fd2300fe2300022300022300fe2301032300032300ff2300012300022300fe2300002300002300002300ff2300002300fe2300022300012300032302fd2300002300022300012300042300032300fd2300fe2300fe2300002300022300fe2300
|
|
002300fc2300012300022301002300012300ff2300002300002300002300002300012300002300002300022300022300002300012300002302ff2300002300fe2300fd2300012300ff2300fe2300002300002300ff2300fe2300ff2300032300fe2300022301fe2300ff2300fe2300fe2300002300022300fe2300042300fe
|
|
2300022300002300012300032300022300fe2302002300022300fe2300002300022300fe2300002300ff2300002300fe2300002300022300fe2300002300ff2300012302ff2300002300002300002300012300002300022300002300002300012300022300012300022300002300fe2301002300022300fe2300002300fd23
|
|
00ff2300fd2300002300fe2300002300002300fd2300012300042300032302012300002300ff2300002300032300012300002300022300fd2300012300002300022300fe2300ff2300fe2301002300ff2300012300002300fd2300022300002300012300002300fd2300ff2300fe2300022300002300002302032300002300
|
|
fe2300022300012300032300ff2300032300002300fe2300022300002300022300002300012300002301ff2300012300ff2300002300012300022300012300002300ff2300002300fd2300002300fe2300fe2300ff2302fd2300002300032300012300002300022300002300002300002300022300fe2300fe2300ff230001
|
|
2300022302fe2300022300fe2300002300022300022300002300002300002300fc2300002300022300002300fe2300022301fe2300002300002300022300002300002300002300022300fc2300ff2300012300ff2300002300002300fe2302ff2300fd2300012300002300fb2300ff2300032300fe23000223000223000123
|
|
00002300002300032300022300fd2301012300022300fd2300002300002300002300012300022300fe2300002300fd2300022300fe2300022300012300fc2300012300ff2300012300002302022300002300002300002300012300022300012300022300fd2300fe2300022300012300fd2300022300012302002300002300
|
|
022300002300fd2300002300012300002300fd2300ff2300012300032300fd2300022300012301022300fe2300022300022300002300fe2300002300002300fd2300012300ff2300fe2300ff2300002300002300fe2302022300032300fe2300032300002300002300002300ff2300fe2300fd230000230002230000230001
|
|
2300002301002300ff2300002300012300ff2300012300022300002300002300012300042300032300fb2300022300fe2302002300fe2300fd2300002300ff2300fe2300022300012300002300002300022300002300012300022300022301002300002300fe2300fd2300002300012300002300ff2300fe2300ff23000323
|
|
00fe2300022300002300fe2302022300002300fe2300ff2300fe2300032300ff2300002300fe2300002300032300002300ff2300002300012300032302002300002300002300fd2300022300012300002300fd2300002300002300022300fd2300002300012300ff2301fe2300002300002300002300022300012300ff2300
|
|
012300022300fd2300fe2300002300002300002300ff2302012300002300ff2300012300022300042300ff2300fe2300002300002300ff2300032300012300002300ff2301002300002300fe2300ff2300032300002300fe2300002300002300032300ff2300032300fe2300002300002302ff230000230000230000230000
|
|
2300012300ff2300002300fe2300002300ff2300002300fd2300002300fe2300002301fe2300ff2300002300002300002300052300012300022300002300032300032300002300002300022300fe2302fd2300002300002300002300fb2300052300012300022300fe2300052300002300ff2300032300012300ff2302fd23
|
|
00002300012300ff2300fe2300002300002300fb2300022300012300fd2300ff2300002300002300fd2301012300022300fe2300032300022300012300ff2300002300012300ff2300012300002300fd2300fd2300002300002302032300ff2300032300002300fe2300022300fe2300022300012300002300ff2300012300
|
|
ff2300002300002300032300fe2300022300022300002301fe2300fe2300002300ff2300fd2300032300002300fe2300fd2300032300022300fe2300032300022300022302002300002300002300fe2300fe2300042300fe2300002300fe2300ff2300fe2300ff2300002300fe2300ff2301002300fe230000230002230003
|
|
2300002300012300022300012300ff2300002300fd2300fe2300022300fe2302002300ff2300fc2300002300042300012300032300fd2300002300022300012300022300fe2300ff2300002300012302002300ff2300002300002300012300022300032300fe2300002300002300022300fe2300022300fd2300fe23010023
|
|
00002300ff2300fe2300002300022300002300012300ff2300002300012300ff2300fe2300002300ff2302032300fd2300042300ff2300012300022300032300002300fe2300002300ff2300fb2300002300002300022301002300002300012300022300fd2300062300002300002300fe2300ff2300fe2300002300ff2300
|
|
012300ff2300012302fd2300022300002300002300012300002300002300002300022300012300002300022300002300fe2300022302002300002300fe2300022300fe2300ff2300002300012300022300fe2300ff2300002300fe2300022300002301012300002300ff2300002300002300012300002300ff2300fe230000
|
|
2300002300ff2300002300002300012302002300032300ff2300002300fe2300022300032300002300fe2300fd2300fd2300002300022300012300022301fe2300022300002300002300012300052300ff2300fe2300002300002300fe2300022300fe2300ff2300002300fe23020023000023000223000023000023000023
|
|
00012300ff2300032300022300fe2300fe2300002300002300002301002300002300002300022300002300002300fe2300022300022300fe2300002300fe2300ff2300012300022302022300002300002300fe2300022300002300002300002300002300002300002300002300012300002300002302022300fb2300022300
|
|
fe2300fd2300012300fd2300fd2300002300fd2300022300002300002300002300002300fe2300022300fc2300002300022300fe2301002300022300022300032300fd2300012300002300022300012300fd2300002300fd2300022300fe2300032302002300002300ff2300002300fe230000230000230000230002230000
|
|
2300012300ff2300032300fe2300002301002300002300002300002300ff2300002300012300032300002300022300012300002300ff2300012300ff2302002300fe2300002300ff2300fd2300fc2300022300002300032300fd2300052300fe2300022300012300032301002300022300fe2300022300fd2300002300fe23
|
|
00002300002300ff2300fd2300fe2300022300fe2300022300002302002300fe2300022300032300012300022300002300002300fe2300ff2300012300002300fa2300032300002302002300ff2300032300fe2300fd2300002300022300fe2300fe2300022300022300002300012300032300002301ff2300012300002300
|
|
022300002300fd2300fd2300002300fe2300032300022300012300002300ff2300002302032300fe2300022300fe2300002300022300fb2300002300022300fe2300032300022300fe2300ff2300fe2301032300ff2300012300fd2300002300002300002300002300022300fe2300022300fd2300032300002300002300fe
|
|
2302ff2300002300fc2300042300032300fe2300ff2300fe2300fe2300022300022300002300012300fd2300fe2301022300022300042300002300002300002300002300002300002300002300ff2300012300022300fd2300002302002300fe2300022300002300032300fe2300ff23000123000223000123000223000023
|
|
00002300fe2300002302042300fe2300fe2300002300042300fe2300032300ff2300012300002300ff2300012300ff2300002300fc2301002300022300002300fd2300002300002300012300002300022300002300002300fd2300032300002300fe2300ff2302fe2300ff2300012300002300ff2300002300002300012300
|
|
002300022300012300022300fb2300072300fc2301042300002300012300002300ff2300fe2300032300fd2300fb2300002300fd2300ff2300002300002300012302ff2300002300012300002300032300002300fd2300ff2300002300002300012300022300002300012300fd2300022300002300032300fe2300002302ff
|
|
2300002300fb2300052300002300fe2300002300002300002300002300022300032300fe2300fc2300002300fe2301032300ff2300012300ff2300fb2300002300002300032300022300002300012300032300ff2300012300002302052300fe2300002300ff2300fd2300002300002300002300fe23000223000023000023
|
|
00002300002300002301012300ff2300002300fe2300ff2300002300002300012300ff2300002300002300002300002300042300022302fe2300ff2300012300ff2300fe2300ff2300fe2300fd2300fe2300ff2300012300002300022300002300032301032300002300ff2300042300002300002300ff2300fe2300002300
|
|
ff2300012300022300fe2300fd2300fd2300002302002300fe2300002300ff2300fb2300012300ff2300032300fe2300022300032300002300002300032300002302042300002300012300002300ff2300002300012300022300fb2300002300032300fd2300002300022300fe2301022300fe230002230000230001230003
|
|
2300fd2300fd2300032300ff2300002300002300002300fe2300fe2302052300ff2300002300002300fc2300002300022300002300002300fd2300012300fd2300022300fe2300ff2300fc2301002300fd2300022300fe2300002300ff2300012300002300022300fe2300032300002300ff2300012300ff2302fe23000023
|
|
00fd2300ff2300fe2300002300002300002300002300002300022300012300032300022300012301ff2300032300002300022300fe2300fe2300ff2300fd2300002300fe2300fd2300fe2300022300022300002302012300022300012300ff2300042300002300ff2300012300fd2300ff2300fe2300ff2300012300fd2300
|
|
fe2302022300fe2300052300022300012300002300002300032300022300002300002300002300fd2300002300002300012301fd2300002300fd2300022300fe2300022300fe2300ff2300fe2300002300002300002300002300022300002302032300002300012300002300022300002300012300fc2300fe2300022300fe
|
|
2300ff2300fe2300fe2300022301022300012300ff2300012300002300032300022300002300002300fe2300002300002300022300012300ff2300fb2300022300fe2300022300002302002300fe2300fd2300032300fd2300fe2300ff23000023000123000223000223000323000123000523000523020523000323000123
|
|
00ff2300fe2300002300fd2300fd2300002300fe2300002300002300022300012300022300032301022300002300fe2300002300ff2300fe2300fe2300ff2300fe2300ff2300fe2300fd2300002300002300fe2302022300002300012300fd2300002300002300022300fe2300002300022300002300fe2300022300fd2300
|
|
012301002300022300052300fe2300fd2300002300012300022300022300032300002300fd2300012300022300012302022300fd2300002300fe2300ff2300012300ff2300fe2300022300032300012300ff2300002300fe2300ff2300fe2301fe2300002300ff2300fd2300032300fd2300002300032300fe2300032300fd
|
|
2300ff2300002300012300002302022300012300022300fe2300042300fe2300022300fe2300022300fc2300fd2300ff2300012300fd2300032302022300012300fd2300002300022300012300ff2300012300ff2300002300fb2300022300012300002300022301012300ff2300fe23000223000323000023000023000223
|
|
00fe2300fe2300022300022300002300012300022302fe2300ff2300032300002300fe2300002300002300002300002300002300ff2300002300002300fe2300fe2300ff2301002300002300fe2300022300fd2300002300002300032300002300fd2300002300012300ff2300012300002302ff2300002300002300012300
|
|
002300ff2300012300022300002300012300ff2300002300002300012300022301022300fc2300022300002300002300002300022300fe2300fd2300002300fd2300012300022300fe2300002302ff2300032300012300002300052300002300002300fd2300032300022300fb2300fe2300002300ff230000230203230000
|
|
2300002300fd2300032300022300002300fe2300022300fe2300fe2300022300fd2300002300fe2300002301002300ff2300012300002300ff2300fe2300022300002300fe2300ff2300012300002300002300022300012302ff2300012300022300012300fd2300022300012300002300022300002300fd2300fe2300ff23
|
|
00042300ff2300fd2300002300002300fe2300022301032300002300fd2300002300002300032300fe2300052300002300fd2300fe2300032300002300022300002302fe2300002300022300032300002300ff2300032300fe2300ff2300012300022300012300ff2300012300042300fc2301ff2300012300022300002300
|
|
fe2300022300022300fe2300002300fe2300022300fe2300fd2300022300fe2302ff2300fc2300002300002300ff2300002300fd2300fe2300fd2300fd2300012300ff2300012300ff2300002302012300042300032300002300fd2300002300042300022300012300042300002300002300032300002300032301fe230000
|
|
2300002300002300022300fe2300002300022300fe2300042300002300002300fe2300002300fe2302ff2300fe2300002300ff2300002300fe2300002300002300002300002300002300022300fe2300022300002300032301fe2300002300022300012300fd2300022300012300fd2300ff23000323000023000023000023
|
|
00032300002302002300fe2300022300022300fe2300002300022300fe2300fe2300002300042300002300fb2300fd2300002302fe2300052300fe2300002300fd2300002300022300012300022300012300fc2300012300ff2300002300012301022300fe2300022300012300002300ff2300fe2300022300002300012300
|
|
002300ff2300002300002300012302042300fe2300fd2300fe2300032300002300002300002300022300002300032300022300002300fe2300002300ff2301002300012300ff2300fc2300fd2300022300002300002300012300002300002300002300002300022300002302022300fc2300fd2300022300002300fd230004
|
|
2300ff2300fe2300fd2300002300022300fc2300022300022301fc2300ff2300032300022300012300fd2300002300fd2300012300002300ff2300002300002300012300042302fe2300032300002300002300002300ff2300fe2300002300fe2300002300fd2300ff23000123000223000123000223020023000223000123
|
|
00002300022300fe2300022300002300012300fd2300002300002300022300032300022300fe2300fe2300002300002300002301002300022300fd2300012300022300002300002300022300002300fc2300002300002300002300ff2300012302ff2300fe2300002300022300002300002300032300fe2300ff2300032300
|
|
022300fe2300002300002300002301fe2300022300022300012300ff2300002300002300012300fd2300022300002300fe2300002300002300022302002300002300002300012300022300fe2300ff2300fc2300002300fd2300022300fd2300002300fc2300042300fc2301022300022300002300fe230002230000230001
|
|
2300ff2300012300022300002300032300002300022300fe2302002300fe2300022300fd2300012300ff2300012300002300002300022300002300002300022300012300ff2302fc2300002300002300022300fe2300ff2300032300fd2300002300002300002300002300fd2300012300002301ff23000023000023000323
|
|
00fb2300032300032300ff2300002300012300022300002300002300022300012300fd2302002300032300fd2300fe2300022300fe2300022300fe2300022300fe2300002300002300042300fc2300042301002300fe2300002300022300002300002300002300002300012300ff2300012300022300fe2300022300fe2302
|
|
022300012300fd2300022300fe2300ff2300032300fb2300032300ff2300fe2300022300fe2300022300032301fe2300032300fd2300002300002300002300022300fd2300002300012300fd2300022300032300012300fd2302032300002300002300002300022300fd2300002300fe2300002300ff2300012300ff230000
|
|
2300fe2300022300fe2302032300fd2300022300fe2300fe2300042300002300002300002300002300002300fe2300022300002300002301fe2300fe2300022300022300012300022300002300002300fe2300022300002300002300002300fd2300002302002300002300fc2300022300fe23000223000223000023000023
|
|
00002300fe2300022300012300022300fb2301002300fe2300022300002300022300002300012300ff2300012300022300012300002300002300fd2300ff2302002300fe2300fe2300002300002300ff2300002300032300002300002300022300002300fe2300002300032300ff2300012300002300022300002300012302
|
|
022300012300fc2300002300002300fe2300ff2300fe2300022300fc2300002300ff2300012300ff2300fe2301ff2300002300fc2300022300fd2300002300012300002300002300052300fd2300002300022300032300fe2302002300002300032300022300002300002300022300fb2300002300012300022300fd230001
|
|
2300ff2300002301fd2300012300022300fb2300022300012300002300fd2300022300012300002300002300022300fe2300022300fe2302ff2300002300fe2300002300022300fe2300002300fe2300ff2300012300002300002300002300fc2300012301002300002300002300022300032300fd23000123000423000123
|
|
00022300002300002300fe2300ff2300002302fe2300002300002300fe2300002300002300022300022300002300032300fe2300ff2300012300002300022302fd2300002300012300fd2300002300032300002300002300ff2300002300fe2300002300022300002300fe2301002300002300002300032300022300002300
|
|
012300ff2300002300002300002300fe2300ff2300002300002300002302002300012300002300002300ff2300002300002300012300002300002300ff2300fe2300022300002300002301fe2300002300fe2300022300002300002300fe2300002300002300042300002300fe2300002300022300002302002300fe230000
|
|
2300fb2300022300012300ff2300012300fd2300002300022300fe2300022300fe2300002301032300ff2300052300032300002300032300002300022300002300fe2300022300fe2300fe2300ff2300002302002300002300012300002300002300022300002300002300032300032300002300ff2300002300fe23000223
|
|
00002302002300002300fe2300002300ff2300012300002300002300002300022300fd2300fc2300002300022300002301022300fe2300002300002300002300002300002300fe2300fc2300fc2300022300fd2300032300fb2300032302fc2300fe2300022300032300fe2300002300032300042300002300012300022300
|
|
012300ff2300fe2300022301fe2300fd2300032300002300032300ff2300032300fe2300fd2300002300002300fd2300022300012300032300ff2300032300fd2300012300002300042302fe2300fd2300002300012300ff2300002300fe2300ff2300fc2300022300022300fe2300fd2300fe2300032302ff230000230001
|
|
2300ff2300002300012300022300022300002300012300002300002300002300002300032301ff2300002300fb2300002300002300002300002300fd2300fd2300012300ff2300012300022300fe2300022302032300022300032300002300002300fe2300032300002300042300fe2300fe2300fd23000223000123000223
|
|
01fe2300002300002300fd2300022300002300032300fd2300fd2300002300fe2300002300022300fe2300fd2300fe2302ff2300002300012300002300022300002300012300002300002300022300032300fd2300fe2300ff2300fe2301022300002300012300002300ff2300002300032300032300022300012300022300
|
|
002300022300002300012302ff2300fe2300fe2300022300fe2300002300fd2300022300012300002300002300042300fc2300022300022302012300002300ff2300002300002300fe2300fe2300022300002300fe2300ff2300002300012300ff2300002300002301002300fe2300002300022300002300fe2300ff230001
|
|
2300ff2300012300fd2300fe2300002300ff2300012302042300012300032300002300ff2300fe2300ff2300042300ff2300fd2300fe2300fe2300ff2300032300002301032300ff2300fe2300002300022300002300012300002300002300002300002300022300002300002300002302012300ff2300002300fe2300ff23
|
|
00032300fe2300002300002300fd2300002300052300002300fe2300002301ff2300fc2300002300022300fb2300002300002300032300002300002300022300022300012300032300022300022302fe2300022300fe2300022300002300012300002300ff2300012300002300ff2300012300022300012300fd2302002300
|
|
ff2300fc2300ff2300002300fe2300ff2300fe2300032300002300002300022300002300012300022301022300012300fd2300fe2300002300002300002300002300002300fd2300ff2300012300002300ff2300002300fe2300032300002300fd2300022302012300002300002300002300022300fe230000230003230000
|
|
2300fd2300fd2300fe2300ff2300032300022301002300002300002300002300062300052300002300032300fd2300002300002300002300002300002300fe2300002302ff2300012300022300002300012300022300fe2300032300002300002300002300fd2300002300002300002301ff2300fe2300ff2300002300fc23
|
|
00022300002300fe2300022300002300022300002300002300032300fe2302022300002300fe2300032300ff2300012300ff2300002300012300ff2300012300002300ff2300fe2300032302002300ff2300012300ff2300012300002300032300ff2300fe2300ff2300012300ff2300012300ff2300002300fe2301002300
|
|
ff2300012300022300fe2300002300002300002300002300fd2300032300022300fd2300002300012302fd2300022300012300ff2300002300fe2300002300022300fc2300022300fe2300002300042300012300002301002300022300012300002300002300002300ff2300fe2300002300022300fd2300012300fd230002
|
|
2300012302ff2300002300fe2300fe2300002300ff2300032300fd2300fe2300022300fd2300002300012300002300002302002300002300002300ff2300032300032300fe2300002300042300fe2300022300002300012300022300fe2300002301002300ff2300fe2300022300002300fe23000023000223000023000023
|
|
00fe2300022300002300002300012302002300ff2300012300022300fe2300022300002300fe2300002300ff2300002300002300fc2300022300fe2301ff2300012300ff2300fe2300002300002300022300012300002300022300fe2300022300002300fe2300ff2302fe2300ff2300032300032300fd2300012300ff2300
|
|
002300032300022300002300fe2300fe2300042300002301fe2300022300fc2300002300022300fe2300ff2300002300fe2300002300ff2300002300002300fe2300022300012302ff2300002300032300032300022300002300002300002300002300002300fc2300002300ff2300fe2300002302002300002300fd230003
|
|
2300022300012300002300002300022300fe2300022300022300fc2300022300fe2300002300042300fe2300022300012301022300002300fe2300fd2300022300002300fc2300002300ff2300fd2300002300012300032300002300002302002300002300002300022300022300fe2300002300002300fe23000223000523
|
|
00fe2300022300fe2300002300ff2301012300ff2300fe2300002300022300fc2300022300002300002300002300002300022300fe2300002300022302fe2300fe2300fd2300002300002300022300fe2300022300002300fd2300032300012300042300002300002301032300002300012300002300ff2300fe2300002300
|
|
002300ff2300fe2300022300fc2300022300022300012302ff2300002300002300012300ff2300012300fd2300fe2300002300042300fe2300fe2300022300fd2300002302012300ff2300012300ff2300fe2300022300002300012300002300022300fe2300002300fd2300032300ff2300002301002300002300002300fe
|
|
2300ff2300012300002300002300032300002300ff2300012300ff2300012300002302022300fe2300002300ff2300fe2300022300012300002300ff2300fe2300002300022300fe2300002300002301ff2300fb2300012300022300022300fe2300002300002300022300012300032300022300fd2300012300022302fe23
|
|
00022300002300fe2300ff2300012300022300002300fe2300002300002300022300fe2300002300002301022300022300032300fe2300ff2300012300022300012300002300002300ff2300fe2300ff2300012300ff2300fc2302002300002300002300fd2300ff2300012300022300012300002300022300052300fe2300
|
|
002300032300002302ff2300032300002300fe2300fd2300002300002300022300002300002300fe2300002300022300012300ff2301012300ff2300fd2300002300002300002300fe2300022300fc2300ff2300fd2300fe2300fe2300042300fc2302002300ff230003230000230005230003230002230000230001230000
|
|
2300022300fd2300002300002300002300fb2301002300fe2300ff2300002300032300012300ff2300002300032300022300fe2300032300032300ff2300fe2302022300012300ff2300fd2300012300ff2300fe2300022300012300ff2300002300012300032300fc2300012300ff2300fc2300fd2300fd23000023020023
|
|
00fd2300032300002300022300032300002300052300012300002300022300fe2300032300fc2300002301012300fd2300002300fe2300002300022300fd2300012300022300fe2300022300fe2300002300ff2300002302002300fe2300022300002300012300022300fd2300002300002300032300002300fe2300002300
|
|
ff2300012300fd2301032300022300022300002300032300002300002300002300012300002300fd2300032300fc2300002300002302002300002300002300012300022300fd2300032300fe2300ff2300012300002300ff2300fe2300002300002301002300002300032300fd2300fe2300022300022300fe2300fe230005
|
|
2300fd2300002300022300012300002302ff2300002300fe2300fe2300002300042300002300fc2300022300022300032300032300002300032300002300fe2302002300002300002300fc2300fe2300fd2300fe2300fd2300ff2300012300ff2300002300032300fe2300022301fe2300002300ff23000023000023000023
|
|
00fe2300002300002300032300002300002300002300022300012302ff2300002300002300fe2300032300022300fe2300ff2300fe2300022300002300fd2300002300fb2300fd2301002300fd2300012300002300022300fe2300032300032300052300032300022300002300002300022300012302002300ff2300002300
|
|
fc2300042300fe2300fe2300ff2300fe2300fa2300fd2300fe2300002300022300002300002301002300012300002300032300ff2300012300ff2300fd2300002300fe2300022300002300002300002300fe2302002300032300002300002300ff2300012300ff2300002300002300032300fe230002230003230000230000
|
|
2302002300032300002300ff2300fe2300fe2300fc2300012300022300012300002300022300002300052300002301002300fe2300ff2300fe2300002300002300fe2300ff2300032300022300fe2300002300002300fe2300002302002300002300ff2300002300032300032300ff2300fe2300032300002300002300fd23
|
|
00002300002300002300032300002300ff2300002300002300002301012300002300022300fd2300002300fc2300022300fe2300052300022300002300002300002300012300002302002300002300002300ff2300002300002300012300ff2300012300022300fe2300fd2300ff2300fe2300002302fd2300fe2300032300
|
|
002300022300fe230002230003230000230008230003230003230005a0008da00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa340\sl240\tx1140 \f21\fs20 Figure 9.3\tab A typical base composition plot. This is an A+T plot for bacteriophage Lambda and shows that one half is A+T rich and the other G+C rich.\par
|
|
\pard\plain \s6\sb240\sa100\sl280\tx560\tx860 \b\f20 2.6\tab Searching for anomalous compositions\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This "search" is performed by comparing a standard composition against each segment of the sequence and plotting the difference. The difference between the observed and expected composition at each point is expressed as a chi-square value. Any one of the base, dinucleotide or trinucleotide compositions can be used as the standard. No expected level of divergence is assumed, so the program always scales the results so that the plots fill the allotted space on the screen. At the end the observed range is displayed.\par
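The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's own code: the function names, the window handling and the exact form of the chi-square sum (squared difference over expected count, summed over the words of the standard) are all assumptions made for the example.\par

```python
from collections import Counter

def composition(seq, k=1):
    """Return the frequency of each k-letter word in seq (overlapping counts)."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(words)
    counts = Counter(words)
    return dict((w, c / total) for w, c in counts.items())

def chi_square_profile(seq, standard, window, step=1, k=1):
    """Chi-square of each window's word counts against the standard frequencies.

    Hypothetical helper: returns (window centre, chi-square) pairs, with
    1-based centre positions. Words absent from the standard are ignored.
    """
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        segment = seq[start:start + window]
        n_words = window - k + 1
        observed = Counter(segment[i:i + k] for i in range(n_words))
        chi2 = 0.0
        for word, freq in standard.items():
            expected = freq * n_words
            chi2 += (observed.get(word, 0) - expected) ** 2 / expected
        profile.append((start + window // 2 + 1, chi2))
    return profile
```

Taking the standard from the whole sequence, as in the default described below, a call such as chi_square_profile(seq, composition(seq), window=11, step=5) gives one value per plotted point; windows whose composition departs most from the standard give the highest values.\par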
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot dinucleotide composition differences as chi squared". Alternatively, select the corresponding base or trinucleotide option.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Start". Define the position of the first base to be used in the standard.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "End". Define last base of the standard. The default standard region is the whole sequence.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Odd window length". \par
|
|
5.\tab Define "Plot interval".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 9.4.\par
|
|
\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw448\pich119
|
|
06f6ffffffff007601bf1101a0008201000affffffff007601bf0900000000000000003100000000007501be98002400000000004e012000000000004e011f00000000007501be000102dd0006007fdfff00fc0a0040fc000002e50000040a0040fc000002e50000040a0040fc000003e50000040a0040fc000003e5000004
|
|
0a0040fc000003e50000040b0040fc00010380e60000040b0040fc00010280e60000040b0040fc00010280e60000040b0040fc00010240e60000040b0040fc00010240e60000040d0040fc0003027ffff8e80000040d0040fc000302000008e80000040d0040fc000302000008e80000040d0040fc000302000008e8000004
|
|
0d0040fc000302000008e80000040d0040fc000302000008e80000040d0040fc000304000008e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040d0040fc000304000004e80000040e
|
|
0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc00030400
|
|
0004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000004e9000102040e0040fc000304000002e9000102040f0040fc000304000002ea00028002040f0040fc000304000002ea00028002040f0040fc000304000002ea0002a006040f0040fc000304000002ea00
|
|
02a00604130040fc000304000002f6000080f60002a00604130040fc000304000002f6000080f60002a00604130040fc000304000002f6000080f60002a00544140040fc000308000002f700010180f60002a00564150040fc000308000002f700010180f70003017005641a0040fc000308000002fd000003fc00010180f8
|
|
000420015005441a0040fc000308000002fd000003fc00010140f8000420015005441a0040fc000308000002fd000005fc00010240f8000420015005441e0040fe000620000800000202fe000005fd0002400240f8000430015005841e0040fe000620000800000202fe000005fd0002400240f8000430015005841f0040fe
|
|
000620000800000202fe00010480fe0002400240f8000450015005841f0040fe0011200008000001060002020480000001c00240f8000450015005841f0040fe0011300008000001060002020480000001c00240f800045001100584230040fe0011300008000001050002020480000001c00440fd000018fd000450011005
|
|
84230040fe0011300008000001050006060480020001a00440fd000014fd00044801080584230040fe0011300008000001050006060480020001200440fd000017fd0004c801080984241540018000500008000001090006060880060001200440fd0009110008800088020809842415400160005000080000010900050608
|
|
80050002200420fd0009110008800108020809042523400120005000080000010900050608800500022004200180080021000880010802080904252340012000500008000001110005050880050002200420028008002080088001080208090425234002200050001000000111000905888005040221842002800810208018
|
|
a001080208090425234002200050001000000111000905888009040221882002800810608018a00108020a1104250640022000500010fe001990800909884008870211882004401410608015600108020a1004250640021000500010fe001990800989884008950211882004401430808075600105020a1004250640021000
|
|
500010fe001990800889484008f90212882004401428808045500105020a1004250640021000500010fe0019a080088948400809020a88200440142880804550010502061004250640021800900010fe0019a080108948400808840a48100440142880804550010704051004250640041800880050fe0019a08010b9504090
|
|
08840a4810044012ac8040455101070405a004250640041802880050fe0019a080106050409000840a5010042022c50040835101068405a00425064004084b0800d0fe0019a08010605048900094065010082021430040821281028405a00425064004064d0dffa0fe0019a04010403048d000b40450104831e1030044800e
|
|
8900840560042506780406b4020020fe00196040204020555000b804700c48124003004480088900840560042506480805b4020020fe000d40402040203560006804100ca80cfe00087a80085600880040042203444805b0fb000d405fa0000033600048000003b80cfe00080a80005600d000400406007fdfff00fc02dd00
|
|
a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 9.4\tab An anomalous composition plot. This shows an immunoglobulin switch region and the plateau corresponds to a segment composed entirely of A and G bases.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.7\tab Search for anomalous word usage\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This function is designed to examine the abundances of short words in a nucleotide sequence to see if particular ones are either under- or over-represented (3). It compares the observed and expected frequencies and plots their difference for each segment of the sequence. There has been some work on the relative abundances of CG dinucleotides in eukaryotic sequences (e.g. reference 4) and this routine can be used to examine such biases or any others that might be of interest.\par
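As a rough illustration of the calculation, the following Python sketch computes observed minus expected counts for a word in each window. It assumes, as described in the Notes to this chapter, that the expected count is obtained by multiplying the base frequencies found in the window; the function name, the default window length and the choice to report window centres are assumptions made for the example, not details of the program.\par

```python
def word_usage_profile(seq, word="CG", window=101, step=1):
    """Observed minus expected counts of word in each window of seq.

    Hypothetical helper: returns (window centre, observed - expected)
    pairs, with 1-based centre positions.
    """
    profile = []
    positions = window - len(word) + 1      # places a word can start in a window
    for start in range(0, len(seq) - window + 1, step):
        segment = seq[start:start + window]
        # Observed: overlapping occurrences of the word in this window.
        observed = sum(1 for i in range(positions)
                       if segment[i:i + len(word)] == word)
        # Expected: product of the window's own base frequencies,
        # assuming random ordering, times the number of start positions.
        expected = float(positions)
        for base in word:
            expected *= segment.count(base) / window
        profile.append((start + window // 2 + 1, observed - expected))
    return profile
```

Positive values mark windows enriched in the word and negative values mark windows depleted in it, which is the sense in which the plot is read.\par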
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot observed-expected word usage".\par
|
|
2.\tab Define "String". That is the word to search for. The default is CG.\par
|
|
3.\tab Define "Odd window length".\par
|
|
4.\tab Define "Plot interval".\par
|
|
5.\tab Define "Maximum plot value". Define the maximum expected value for the plot.\par
|
|
6.\tab Define "Minimum plot value".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 9.5.\par
|
|
\pard\plain \ri-60\sb200\sl220\keepn \f4\fs16 {\plain \fs16 {\pict\macpict\picw453\pich122
|
|
0800ffffffff007901c41101a00082a0008c01000affffffff007901c4070000000022000100010000a000a0a100a400020de801000a000000000000000007000100012200770001008a23000021000101c32300002300762300002100770001230000a000a301000affffffff007901c423008a21000101c3230076210077
|
|
0001a000a120003b0001003b01c322003b00011a082300022302002300fe2302022300fe2301fe2300022300fe2302022300fe2301022300002302002300002300fe2302022300002301022300002302fc2300042300fe2301fe2300ff2302012300022301002300fd2300002302002300002302002300002301fe23000223
|
|
00002302fe2300022301fd2300032300002302fd2300002302fe2300002300002301ff2300002302fe2300002301002300002300022302012300022301002300012302002300002300022302002300002301002300002302002300002301002300012300022302fe2300022302fe2300002301002300022300002302fe2300
|
|
022301002300002300fe2302022300002301002300002300022302002300012302002300022301002300002300012302002300022301fe2300002302022300002300002302002300002301002300032302002300002300ff2301012300002302ff2300002301002300002300012302ff2300002302012300002301022300fe
|
|
2300002302022300fe2300002301fd2300002302022300002302012300002300ff2301fc2300002302002300ff2301012300002300002302022300fd2302002300fe2301ff2300032302fe2300002300022301002300012302002300002301ff2300002300002302002300012302ff2300012301ff2300fe23000223020123
|
|
00002301002300022300002302fe2300022302fd2300fe2300002301022300002302002300002301002300fe2300022302002300012301002300022302fe2300002300002302002300002301002300022302002300002300002301002300002302002300002302002300032300fb2301002300022302002300022301012300
|
|
002300022302002300002300002301012300022302012300ff2302002300002300fe2301002300002302002300002301002300022300002302002300fe2302022300012301ff2300002302fe2300ff2300002301fe2300022302fe2300002301ff2300002300002302002300fe230200230000230100230000230002230201
|
|
2300ff2300012301022300002302fe2300002302002300002300002301022300fe2302fd2300002301002300002300022302002300fe2301022300012302022300002300fe2302022300002301002300fe2302ff2300002300012301ff2300042302002300002302002300ff2300012301fd2300002302022300002301fe23
|
|
00002300002302002300002300002302002300002301002300002302002300ff2300002301fe2300022302fe2300022301fe2300fe2300002302002300022302fe2300022301fe2300ff2302fe2300022300002301002300002302002300012302022300002300fe2301022300fe2302fd2300022301002300fe2300002302
|
|
022300002300002301012300ff2302fe2300032302002300ff2300002301012300ff2302fe2300002301002300002300002302ff2300012302002300ff2301002300012300002302fd2300022301002300fe2302032300002300002301022300fe2302022300012302ff2300002300012301ff230000230200230000230000
|
|
2301002300012302ff2300002300012302002300ff2301002300002302012300002300022301002300002302002300002301022300002302012300002300002302ff2300012301022300fe23020023000023000023010023000223020023000123020223000123000023010223000023020023000023010023000023000223
|
|
02012300002300022301002300002302002300012302002300002300ff2301002300002302002300002301012300ff2300002302002300002302012300002301002300002300002302002300002301ff2300002302002300012300002301ff2300fe2302032300ff2302002300002300002301002300002302002300fe2300
|
|
002301022300012302002300002300ff2302002300002301002300002302002300002300012301002300002302002300002302022300fd2301fe2300022300012302ff2300012301002300ff2302002300012300fd2301022300fe2302002300002302022300002300002301fe2300002302fd2300fe2300002301022300fe
|
|
2302002300002300022302002300032301ff2300002302002300012300ff2301012300002302032300002301022300022300fb23020123000023020023000023010023000223000023020023000023010223000023020023000023000123020023000023010223000023020023000023000023010023000023000023020023
|
|
00012301002300002302ff2300012300002302022300002301fe2300022302002300012300002301042300002302002300012302002300002301002300002300002302002300002301002300ff2302fe2300fe2300002301ff2300002302032300fe2302022300002300fe2301ff2300002302002300002300fe2301002300
|
|
002302002300fd2300ff2302002300012301fb2300002302002300fd2300ff2301fe2300032302ff2300002301fe2300fd2300012302ff2300012302fd2300ff2301002300002300fe2302ff2300fe2301002300002302002300fe2300002302002300002301002300ff2302002300fe2300ff2301fe2300fd2300002302fe
|
|
2300ff2302fe230000a0008da00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 9.5\tab A plot of anomalous word usage. This shows a plot of CG usage for the human CMV immediate-early region. The frequency of CG is much lower than would be expected from the composition.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.8\tab Calculate codon constraint\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This method measures the level of constraint imposed on a sequence by coding for a protein. The codon constraint is the difference between the observed codon improbability and the mean improbability for a sequence of the same composition. That is, it is a measure of codon bias; the program performs the calculation over windows of 99 codons (see reference 5). The user can select segments to analyse either by defining them on the keyboard or by using an EMBL/GenBank feature table. The result for each selected segment, which is simply a single number, is displayed.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa80\sl280\tx560 \f20 1.\tab Select "Calculate codon constraint".\par
|
|
2.\tab Accept "Define segments using keyboard".\par
|
|
3.\tab Define "From". The start of the segment.\par
|
|
4.\tab Define "To". The end of the segment.\par
|
|
5.\tab Accept "+ strand".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The result will be displayed, and the program will ask for the next segment to be defined. \par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.9\tab Searching for stem-loop structures\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This routine finds simple putative stem-loop structures having a minimum number of base pairs in their stems. Results can be listed or plotted.\par
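A search of this kind can be sketched in Python as follows. The sketch assumes the pairing rules given in the Notes to this chapter (A-T, G-C and G-T pairs, with no mismatches or loopouts in the stem); the function name, the default parameter values and the form of the returned tuples are assumptions made for the example, not details of the program.\par

```python
PAIRS = set(["AT", "TA", "GC", "CG", "GT", "TG"])

def find_hairpins(seq, min_loop=3, max_loop=10, min_pairs=4):
    """Return (stem_start, loop_start, loop_end, stem_length) tuples, 0-based."""
    seq = seq.upper()
    hits = []
    for loop_start in range(1, len(seq)):
        for loop_len in range(min_loop, max_loop + 1):
            loop_end = loop_start + loop_len - 1
            if loop_end >= len(seq) - 1:
                break                        # no room left for a 3' arm
            # Skip loops that could be shrunk by pairing their end bases:
            # the same stem is reported once, with its smallest legal loop.
            if loop_len >= min_loop + 2 and seq[loop_start] + seq[loop_end] in PAIRS:
                continue
            left = loop_start - 1            # last base of the 5' arm
            right = loop_end + 1             # first base of the 3' arm
            pairs = 0
            # Extend the stem outwards while the bases can pair.
            while left >= 0 and right < len(seq) and seq[left] + seq[right] in PAIRS:
                pairs += 1
                left -= 1
                right += 1
            if pairs >= min_pairs:
                hits.append((left + 1, loop_start, loop_end, pairs))
    return hits
```

The three parameters correspond to the minimum loop size, maximum loop size and minimum number of base pairs requested in the steps below.\par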
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search for hairpin loops".\par
|
|
2.\tab Define "Minimum loop size".\par
|
|
3.\tab Define "Maximum loop size".\par
|
|
4.\tab Define "Minimum number of base pairs".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Reject "Plot results" to have the stem-loops written out as shown in figure 9.6. If plotting is accepted instead, the output marks the position of each stem, the height of the mark showing the length of the stem.\par
|
|
\pard\plain \li3480\ri3940\sb200\sl220\box\brsp100\brdrth \f4\fs16 g\par
|
|
\pard \li3480\ri3940\sl220\box\brsp100\brdrth g.t\par
|
|
t.g\par
|
|
c-g\par
|
|
a-t\par
|
|
t.g\par
|
|
t.g\par
|
|
g-c\par
|
|
t.g\par
|
|
g.t\par
|
|
g.t\par
|
|
t.g\par
|
|
t.g\par
|
|
g-c\par
|
|
t.g\par
|
|
tggcga gttttaa\par
|
|
\pard \li3480\ri3940\sl220\keepn\box\brsp100\brdrth 843\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 9.6\tab A typical textual display from the routine for finding simple hairpin loops.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.10\tab Searching for long range inverted repeats\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This method finds inverted repeats. It allows for no mismatches, insertions or deletions within the matching segments.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find long range inverted repeats".\par
|
|
2.\tab Accept "Plot results". The alternative lists out all the matching segments.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Start". The beginning of the region to analyse. In general the whole sequence will be analysed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "End".\par
|
|
5.\tab Define "Minimum inverted repeat". The length of the minimum match.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The results will now be plotted in an unusual way as shown in figure 9.7 in which the positions of matching segments are joined by rectangular lines.\par
|
|
\pard\plain \li100\sb200\sl220\keepn\box\brsp20\brdrth \f4\fs16 {{\pict\macpict\picw445\pich118
|
|
0448ffffffff007501bc1101a0008201000affffffff007501bc0900000000000000003100000000007401bb98001e00000000003d00f000000000003d00ec00000000007401bb000102e3000701001fe6ff00c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c0070100
|
|
18e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c007010018e60000c00a00
|
|
7ff1ff00c0f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00b014018f2000040f60000c00e007ff5ff00e0fe000040f60000c00f017818fb0000
|
|
01f4ff00f0fb0000c019017818fb000501c1800000e0fe000040fe00017030fb0000c019017818fb000501c1800000e0fe000040fe00017030fb0000c019017818fb000501c1800000e0fe000040fe00017030fb0000c01502781807f7ff00e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040
|
|
fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1
|
|
800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01a02781804fc000501c1800000e0fe000040fe00017030fb0000c01102781804fc000001f5ff01f030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c180
|
|
0000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01a02781804fc000e01c1800000f0006000400008007030fb0000c01c1678
|
|
1804000007ffffe1c1800000f0006000400008007030fb0000c01c167ffffc000007ffffe1c1800000f0006000400008007030fb0000c01c16781804000007ffffe1c1800000f0006000400008007030fb0000c01c16781804000007ffffe1c1800000f0006000400008007030fb0000c01c1678180407fe07ffffe1c18000
|
|
00f0006000400008007030fb0000c01c1678180407fe07ffffe1c1800000f0006000400008007030fb0000c01c1678180407fe07ffffe1c1800000f0006000400008007030fb0000c002e300a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl20\tx1140 \f21\fs20 Figure 9.7\tab A plot of direct or inverted repeats. Each matching segment is joined by a rectangular line. Here we show the direct repeats of at least 25 bases in a mouse immunoglobulin switch region.\par
|
|
\pard\plain \s6\sb120\sa40\sl280\tx560\tx860 \b\f20 2.11\tab Searching for long range repeats\par
|
|
\pard\plain \s4\qj\sa120\sl260 \f20 This method finds direct repeats. It allows for no mismatches, insertions or deletions within the matching segments.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa80\sl260\tx560 \f20 1.\tab Select "Find long range repeats".\par
|
|
2.\tab Accept "Plot results". The alternative lists out all the matching segments.\par
|
|
\pard \s7\qj\fi-560\li560\sa80\sl260\tx560 3.\tab Define "Start". The beginning of the region to analyse. In general the whole sequence will be analysed.\par
|
|
\pard \s7\qj\fi-560\li560\sa80\sl260\tx560 4.\tab Define "End".\par
|
|
5.\tab Define "Minimum repeat". The length of the minimum match.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 The results will now be plotted in an unusual way as shown in figure 9.7 in which the positions of matching segments are joined by rectangular lines.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.12\tab Searching for repeated words\par
|
|
\pard\plain \s7\qj\sa120\sl260\tx540 \f20 \tab This function can be used to examine the frequencies of repeated words within a sequence. It finds all words that occur more than once. A "word" is a particular sequence of bases, so we are dealing only with exact repeats. The user selects a minimum word length and the program finds all words of that length that occur more than once. Then it "follows" each repeated word until it becomes unique. For each word length it can report the number of different repeated words, the number of occurrences of each word, and their actual sequences and positions.\par
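The idea can be illustrated with a short Python sketch. It is a simplification under stated assumptions: rather than following each repeated word until it becomes unique, it simply re-scans the sequence at each successive word length, and the function names are hypothetical.\par

```python
from collections import defaultdict

def repeated_words(seq, length):
    """Map each word of the given length that occurs more than once
    to the list of its 1-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - length + 1):
        positions[seq[i:i + length]].append(i + 1)
    return dict((w, p) for w, p in positions.items() if len(p) > 1)

def repeat_summary(seq, min_length):
    """Count the distinct repeated words at each length until none remain.

    Returns (length, number of different repeated words) pairs.
    """
    summary = []
    length = min_length
    while True:
        found = repeated_words(seq, length)
        if not found:
            break
        summary.append((length, len(found)))
        length += 1
    return summary
```

Here repeat_summary gives the kind of per-length counts listed in the program's output, and repeated_words gives the sequences and positions for one chosen length. Holding every word of the minimum length in memory also suggests why short minimum word lengths are the expensive case.\par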
|
|
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 1.\tab Select "Examine repeats".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 2.\tab Define "Minimum word length". The maximum expected and observed word lengths are displayed.\par
|
|
3.\tab Define "Minimum word length for display of repeated word frequencies". The number of different repeated words of each length is listed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 4.\tab Define "Minimum frequency for display of repeated words". \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 5.\tab Define "Minimum word length for display of repeated words". All words occurring this number of times and of this given word length will be displayed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 \par
|
|
\pard\plain \sl220\box\brsp100\brdrth \f4\fs16 {\f22\fs18 Expected length of longest repeat 12\par
|
|
}\pard \sl220\box\brsp100\brdrth {\f22\fs18 ? Minumim word length (1-6) (6) = \par
|
|
Working\par
|
|
Memory used in bytes 75164. Length of longest repeat 13\par
|
|
? Show repeat frequencies for words of at least length (6-13) (13) = 10\par
|
|
For length 10 the number of different repeated words is 86\par
|
|
For length 11 the number of different repeated words is 21\par
|
|
For length 12 the number of different repeated words is 5\par
|
|
For length 13 the number of different repeated words is 2\par
|
|
? Show repeats for words of length (6-13) (13) = 10\par
|
|
? Show repeats for words occuring with frequency (2-9999) (2) = 3\par
|
|
aaggcatcat\par
|
|
occurs at 276\par
|
|
occurs at 969\par
|
|
occurs at 6938\par
|
|
gtctggcggc\par
|
|
occurs at 1891\par
|
|
occurs at 4714\par
|
|
occurs at 7250\par
|
|
? Show repeats for words of length (6-13) (13) = 12\par
|
|
? Show repeats for words occuring with frequency (2-9999) (2) = \par
|
|
gttactggtggt\par
|
|
occurs at 641\par
|
|
occurs at 851\par
|
|
aaaggcatcatg\par
|
|
occurs at 968\par
|
|
occurs at 6937\par
|
|
aaggcatcatgg\par
|
|
occurs at 969\par
|
|
occurs at 6938\par
|
|
ttactggtggtg\par
|
|
occurs at 642\par
|
|
occurs at 852\par
|
|
ctgctgggccgt\par
|
|
occurs at 3477\par
|
|
occurs at 6424\par
|
|
}\pard \sl220\box\brsp100\brdrth {\f22\fs18 ? Show repeats for words of length (6-13) (13) =!\par
|
|
}\pard \sl220 {\f22\fs18 \par
|
|
}{\f22\fs20 Figure 9.8 Typical output from "Examine repeats".\par
|
|
}\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 \par
|
|
2.13\tab Searching for possible Z DNA\par
|
|
\pard\plain \s4\qj\sa60\sl260 \f20 The program contains three algorithms for searching for sequences with the potential to form Z DNA. Each, in its own way, looks for segments of alternating purines and pyrimidines, and all of them plot their results. A typical result is shown in figure 9.9.\par
|
|
\pard\plain \sl220\keepn \f4\fs16 {{\pict\macpict\picw447\pich119
|
|
0512ffffffff007601be1101a0008201000affffffff007601be0900000000000000003100000000007501bd98002400000000004e012000000000004e011f00000000007501bd000102dd0006007fdfff00fc060040df000004060040df000004060040df000004060040df000004060040df000004060040df0000040600
|
|
40df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040
|
|
df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df0000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001
|
|
fb000080f40000040e0040f4000001fb000080f40000040e0040f4000001fb000080f40000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080
|
|
f600022000041202400040f6000001fb000080f600022000041202400040f6000001fb000080f600022000041702400040f6000001fb000380000004fb000440002000041702400040f6000001fb000380000004fb000440002000041702400040f6000001fb000380000004fb000440002000041702400040f6000001fb00
|
|
0380000004fb000440002000041802400040f600010180fc0003c0000004fb000440002000041802400040f600010180fc0003c0000004fb000440002000041802400040f600010180fc0003c0000004fb0004400020000421044000400020fc0005020002000181fc0006c0004004000440fe000440003084142104400040
|
|
0020fc0005020002000181fc0006c0004004000440fe0004400030841421044000400020fc0005020002000181fc0006c0004004000440fe0004400030841421044000400020fc0005020002000181fc0006c0004004000440fe0004400030841422044000c00030fc0005020003000281fd0007014000c006000440fe0004
|
|
600051843c22044000c00030fc0005020003000281fd0007014000c006000440fe0004600051843c22044000c00030fc0005020003000281fd0007014000c006000440fe0004600051843c23044000c01430fc001903004300028181020042014000c0060146600040006404d3563c23044000c01430fc0019030043000281
|
|
81020042014000c0060146600060006404d3563c23044000c01c30fc001903004300028181020042014000c00601e6600060006406d3563c23044000c01c30fc001903004300028181020042014000c00601e6600050006406d3563c23044000c01e28fc0019030062800282818600a3014000c00601e6a00088006406d55e
|
|
3c23044000c01628fc0019030062800282818600a3014000c0060156a00088006405d55e3c23044000c01628fc0019030062800282818600a3014000c0060156a00084006405d55e3c20045ffffff7effaff03fefffefefeff02bfff7ffdff095fbfff87fffffddd7ffc060050df000004060060df000004060060df000004
|
|
060060df000004060040df00000406007fdfff00fc02dd00a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 9.9\tab A plot of predictions for potential Z DNA containing some high peaks produced by regions of alternating purines and pyrimidines.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Whenever the program reads a sequence file it always displays the base composition to provide the user with a check on the correctness of the file.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab The search for anomalous words function operates in the following way. Users select a "word" (say CG) and a window length. The program examines each successive window along the sequence, with each window overlapping the previous one by all but one base (window length - 1 bases). For each window position the program calculates the base composition and the number of occurrences of the chosen word. From the base composition it calculates an expected number of occurrences of the chosen word by simply multiplying the relevant frequencies and assuming random ordering. It plots observed - expected, hence showing regions that are enriched or depleted in the chosen word.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab The codon constraint calculation offers a measure of the codon bias that is independent of any set tables of expected codons. Although some users may find the underlying mathematics difficult to understand, the values obtained provide an interesting measure. It was shown (5) for a set of {\i E. coli} genes that their values of codon constraint correlated with their levels of expression.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab The algorithm for finding possible stem loops counts A-T, G-C and G-T pairs as matching but will only find stems with no mismatches or loopouts.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab The long range inverted and direct repeat routines are fast but only find exact matches. More flexible and exhaustive methods are described in the chapter on sequence comparisons.\par
|
|
6.\tab It is also possible to use the pattern searching routines to define and search for inverted and direct repeats. They are particularly useful for finding specific structures - for example tRNA folds.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl260\tx560 7.\tab It is possible that the "Examine repeats" algorithm may run out of memory, particularly if a short minimum word length is chosen or the sequence is very long or very repetitive. If this occurs, the maximum word length reported may not be the longest in the sequence\: the memory will have been consumed before it was found.\par
|
|
\pard\plain \s5\sb320\sa60\sl320\tx560 \b\f20\fs28 \page 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab McCaldon,P. and Argos,P. 1988 Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences. {\i Proteins} {\b 4}, 99-122.\par
|
|
2.\tab Sweet,R.M. and Eisenberg,D. 1983. Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. {\i J. Mol. Biol}. {\b 171}\:479-488.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Honess,R.W., Gompels,U.A., Barrell,B.G., Craxton,M., Cameron,K.R., Staden,R., Chang,Y.-N and Hayward,G.S. 1989 Deviations from expected frequencies
|
|
of CpG dinucleotides in herpesvirus DNAs may be diagnostic of differences in the states of their latent genomes. {\i J. Gen. Virol}, {\b 70}, 837-855.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Bird,A.P. 1980 DNA methylation and the frequency of CpG in animal DNA. {\i Nucl. Acids Res}. {\b 8}, 1499-1504.\par
|
|
5.\tab McLachlan, A.D., Staden, R., and Boswell, D.R. 1984. A method for measuring the non-random bias of a codon usage table. {\i Nucl. Acids Res}. {\b 12}\:9567-9575.\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 10. Translating and Listing Nucleic Acid Sequences\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Listing the sequence with all six reading frames translated\par
|
|
2.2\tab Listing the sequence with its open reading frames translated\par
|
|
2.3\tab Listing the sequence with defined segments translated\par
|
|
2.4\tab Listing the sequence with translated segments defined from a feature table\par
|
|
2.5\tab Producing a file of protein sequences for all open reading frames.\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.6\tab Producing a file of protein sequences for segments defined from a feature table\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we deal with producing simple listings from nucleotide sequences. All functions are contained in the program NIP. We can list the sequence alone, in single or double stranded format, or with translations to protein. The translations can be of all six phases, all open reading frames, or of specified segments. The positions of these segments can be defined on the keyboard or read from an EMBL/GenBank feature table. Translations can use the one letter or three letter codes. In addition we can produce files containing only the protein translations, which are suitable for processing by other programs. Again the positions of the translated segments can be defined on the keyboard, read from a feature table, or be all open reading frames. For the user, producing all these results is very simple, so we only give examples of "methods" and show what the results look like. All outputs that list the sequence can be produced from the menu option named "Translate and list".\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Listing the sequence with all six reading frames translated\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par
|
|
2.\tab Accept "Show translation".\par
|
|
3.\tab Select "The segments to translate will be "All six frames"".\par
|
|
4.\tab Accept "Use 1 letter codes".\par
|
|
5.\tab Define "Start". Where to list from.\par
|
|
6.\tab Define "End". Where to list to.\par
|
|
7.\tab Define "Line length". The number of characters in each line of output.\par
|
|
8.\tab Reject "Number ends of lines". This alternative writes the positions underneath each line.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The listing will then appear. Given the choices taken it will look the same as figure 10.1.\par
|
|
\pard\plain \li1240\ri1280\sb200\sl220\box\brsp100\brdrth \f4\fs16 Q D Y I G H H L N N L Q L D L R T F S L\par
|
|
\pard \li1240\ri1280\sl220\box\brsp100\brdrth R I T * D T T * I T F S W T C V H S R W\par
|
|
G L H R T P P E * P S A G P A Y I L A\par
|
|
caggattacataggacaccacctgaataaccttcagctggacctgcgtacattctcgctg\par
|
|
1010 1020 1030 1040 1050 1060\par
|
|
gtcctaatgtatcctgtggtggacttattggaagtcgacctggacgcatgtaagagcgac\par
|
|
L I V Y S V V Q I V K L Q V Q T C E R Q\par
|
|
P N C L V G G S Y G E A P G A Y M R A P\par
|
|
S * M P C W R F L R * S S R R V N E S\par
|
|
\par
|
|
V D P Q N P P A T F W T I N I D S M F F\par
|
|
W I H K T P Q P P S G Q S I L T P C S S\par
|
|
G G S T K P P S H L L D N Q Y * L H V L\par
|
|
gtggatccacaaaaccccccagccaccttctggacaatcaatattgactccatgttcttc\par
|
|
1070 1080 1090 1100 1110 1120\par
|
|
cacctaggtgttttggggggtcggtggaagacctgttagttataactgaggtacaagaag\par
|
|
H I W L V G W G G E P C D I N V G H E E\par
|
|
P D V F G G L W R R S L * Y Q S W T R R\par
|
|
T S G C F G G A V K Q V I L I S E M N K\par
|
|
\par
|
|
S V V L G L L F L V L F R S V A K K A T\par
|
|
R W C W V C C S W F Y S V A * P K R R P\par
|
|
L G G A G S V V P G F I P * R S Q K G D\par
|
|
tcggtggtgctgggtctgttgttcctggttttattccgtagcgtagccaaaaaggcgacc\par
|
|
1130 1140 1150 1160 1170 1180\par
|
|
agccaccacgacccagacaacaaggaccaaaataaggcatcgcatcggtttttccgctgg\par
|
|
R H H Q T Q Q E Q N * E T A Y G F L R G\par
|
|
P P A P D T T G P K I G Y R L W F P S W\par
|
|
E T T S P R N N R T K N R L T A L F A V\par
|
|
\par
|
|
S G V P G K F Q T A I E L V I G F V N G\par
|
|
A V C Q V S F R P R L S W * S A L L M V\par
|
|
Q R C A R * V S D R D * A G D R L C * W\par
|
|
agcggtgtgccaggtaagtttcagaccgcgattgagctggtgatcggctttgttaatggt\par
|
|
1190 1200 1210 1220 1230 1240\par
|
|
tcgccacacggtccattcaaagtctggcgctaactcgaccactagccgaaacaattacca\par
|
|
A T H W T L K L G R N L Q H D A K N I T\par
|
|
R H A L Y T E S R S Q A P S R S Q * H Y\par
|
|
\pard \li1240\ri1280\sl220\keepn\box\brsp100\brdrth L P T G P L N * V A I S S T I P K T L P\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 10.1\tab A six phase translation using the 1 letter codes\par
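\pard\plain \s4\qj\sa120\sl280 \f20 The translations shown in figure 10.1 can be reproduced with a few lines of code. The following Python sketch is an illustration only and is not how NIP is implemented: the names are ours and the standard genetic code is assumed. It translates the first line of the figure in all six phases.\par
\pard\plain \sl220 \f4\fs16
```python
# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_CODE = dict(zip(CODONS, AMINO))


def translate(dna, code=STANDARD_CODE):
    """Translate dna from its first base; unknown codons become X."""
    dna = dna.lower()
    return "".join(code.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    """Return the complementary strand written 5' to 3'."""
    return dna.lower().translate(str.maketrans("acgt", "tgca"))[::-1]


def six_frames(dna):
    """Return the three forward and the three reverse translations."""
    rev = reverse_complement(dna)
    forward = [translate(dna[offset:]) for offset in range(3)]
    reverse = [translate(rev[offset:]) for offset in range(3)]
    return forward, reverse


first_line = "caggattacataggacaccacctgaataaccttcagctggacctgcgtacattctcgctg"
forward, reverse = six_frames(first_line)
print(forward[0])   # first phase of the top strand
print(reverse[0])   # first phase of the complementary strand, read 5' to 3'
```
\par
\pard\plain \s4\qj\sa120\sl280 \f20 Note that the figure prints the translations of the complementary strand underneath that strand, so they read from right to left, whereas the sketch returns them in the usual 5' to 3' direction.\par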
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Listing the sequence with its open reading frames translated\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par
|
|
2.\tab Accept "Show translation".\par
|
|
3.\tab Select "The segments to translate will be "Open reading frames"".\par
|
|
4.\tab Define "Minimum open frame in amino acids".\par
|
|
5.\tab Accept "Use 1 letter codes".\par
|
|
6.\tab Define "Start". Where to list from.\par
|
|
7.\tab Define "End". Where to list to.\par
|
|
8.\tab Define "Line length". The number of characters in each line of output.\par
|
|
9.\tab Select "Both strands"\par
|
|
10.\tab Accept "Number ends of lines".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 A typical result is shown in figure 10.2.\par
|
|
\pard\plain \li720\ri680\sb200\sl220\box\brsp100\brdrth \tx7780 \f4\fs16 Q D Y I G H H L N N L Q L D L R T F S L\par
|
|
\pard \li720\ri680\sl220\box\brsp100\brdrth \tx7780 caggattacataggacaccacctgaataaccttcagctggacctgcgtacattctcgctg\tab 1060\par
|
|
. \: . \: . \: . \: . \: . \:\par
|
|
gtcctaatgtatcctgtggtggacttattggaagtcgacctggacgcatgtaagagcgac\par
|
|
L I V Y S V V Q I V K L Q V Q T C E R Q\par
|
|
* S S R R V N E S\par
|
|
\par
|
|
V D P Q N P P A T F W T I N I D S M F F\par
|
|
gtggatccacaaaaccccccagccaccttctggacaatcaatattgactccatgttcttc\tab 1120\par
|
|
. \: . \: . \: . \: . \: . \:\par
|
|
cacctaggtgttttggggggtcggtggaagacctgttagttataactgaggtacaagaag\par
|
|
H I W L V G W G G E P C D I N V G H E E\par
|
|
T S G C F G G A V K Q V I L I S E M N K\par
|
|
\par
|
|
S V V L G L L F L V L F R S V A K K A T\par
|
|
tcggtggtgctgggtctgttgttcctggttttattccgtagcgtagccaaaaaggcgacc\tab 1180\par
|
|
. \: . \: . \: . \: . \: . \:\par
|
|
agccaccacgacccagacaacaaggaccaaaataaggcatcgcatcggtttttccgctgg\par
|
|
R H H Q T Q Q E Q N * E T A Y G F L R G\par
|
|
E T T S P R N N R T K N R L T A L F A V\par
|
|
\par
|
|
S G V P G K F Q T A I E L V I G F V N G\par
|
|
agcggtgtgccaggtaagtttcagaccgcgattgagctggtgatcggctttgttaatggt\tab 1240\par
|
|
. \: . \: . \: . \: . \: . \:\par
|
|
tcgccacacggtccattcaaagtctggcgctaactcgaccactagccgaaacaattacca\par
|
|
A T H W T L K L G R N L Q H D A K N I T\par
|
|
L P T G P L N * V A I S S T I P K T L P\par
|
|
\par
|
|
S V K D M Y H G K S K L I A P L A L T I\par
|
|
agcgtgaaagacatgtaccatggcaaaagcaagctgattgctccgctggccctgacgatc\tab 1300\par
|
|
. \: . \: . \: . \: . \: . \:\par
|
|
tcgcactttctgtacatggtaccgttttcgttcgactaacgaggcgaccgggactgctag\par
|
|
A H F V H V M A F A L Q N S R Q G Q R D\par
|
|
\pard \li720\ri680\sl220\keepn\box\brsp100\brdrth \tx7780 L T F S M Y W P L L L S I A G S A R V I\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa180\sl240\tx1140 \f21\fs20 Figure 10.2\tab A listing showing the translation of open reading frames from both strands of a sequence from position 1001 to 1300\par
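\pard\plain \s4\qj\sa120\sl280 \f20 The effect of "Minimum open frame in amino acids" can be sketched as follows. We assume here that an open reading frame is simply a stretch between stop codons that is at least the minimum length; the exact definition used by the program may differ. The function name is ours, and the input is the translation of a single phase with * marking stops.\par
\pard\plain \sl220 \f4\fs16
```python
def open_frames(frame_translation, min_length):
    """Find stop-free stretches in one translated phase.

    frame_translation is a protein string in which * marks a stop codon.
    Returns (first_residue, last_residue, peptide) tuples, numbered from 1
    within the phase, for stretches of at least min_length amino acids.
    """
    found = []
    start = 0
    # A trailing stop is appended so the final stretch is also examined.
    for i, residue in enumerate(frame_translation + "*"):
        if residue == "*":
            if i - start >= min_length:
                found.append((start + 1, i, frame_translation[start:i]))
            start = i + 1
    return found


print(open_frames("RIT*DTT*ITFSWTCVHSR", 5))
```
\par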
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Listing the sequence with defined segments translated\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par
|
|
2.\tab Accept "Show translation".\par
|
|
3.\tab Select "The segments to translate will be "Typed on the keyboard"".\par
|
|
4.\tab Accept "Use 1 letter codes".\par
|
|
5.\tab Define "Start". Where to list from.\par
|
|
6.\tab Define "End". Where to list to.\par
|
|
7.\tab Define "Line length". The number of characters in each line of output.\par
|
|
8.\tab Select "Both strands".\par
|
|
9.\tab Accept "Number ends of lines".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Translate from". Define the start of the next segment to translate - say the next exon.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 11.\tab Define "Translate to". Define the end of the next segment to translate.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 12.\tab Select "Strand". As both strands have been selected above the program will allow either to be translated for each defined segment.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The program will now cycle through steps 10, 11 and 12 until a zero value is defined for "Translate from", at which point the listing will appear. Given the choices made it will look the same as figure 10.2.
|
|
\par
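\pard\plain \s4\qj\sa120\sl280 \f20 The segments defined in steps 10 to 12 can be thought of as a list of (from, to, strand) triples. The Python sketch below shows one way such a list could be spliced into a single coding sequence ready for translation. It is an illustration under our own assumptions: the names are hypothetical, positions are taken to be numbered from 1 and inclusive, and the segments are assumed to be exons that are read together.\par
\pard\plain \sl220 \f4\fs16
```python
def reverse_complement(dna):
    """Return the complementary strand written 5' to 3'."""
    return dna.lower().translate(str.maketrans("acgt", "tgca"))[::-1]


def join_segments(dna, segments):
    """Splice together segments given as (start, end, strand) tuples.

    Positions are numbered from 1 and are inclusive. strand is "+" for
    the listed strand or "-" for its complement. Segments are joined in
    the order given.
    """
    pieces = []
    for start, end, strand in segments:
        piece = dna[start - 1:end].lower()
        if strand == "-":
            piece = reverse_complement(piece)
        pieces.append(piece)
    return "".join(pieces)


print(join_segments("aaacccgggttt", [(1, 3, "+"), (7, 9, "+")]))
```
\par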
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Listing the sequence with translated segments defined from a feature table\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and list".\par
|
|
2.\tab Accept "Show translation".\par
|
|
3.\tab Select "The segments to translate will be "Read from a feature table"".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Feature table file name". Type the name of the file containing the appropriate feature table in EMBL/GenBank format.\par
|
|
5.\tab Define "Operator". This defines which feature table operators should be employed when selecting the segments to translate.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Use 1 letter codes"\par
|
|
7.\tab Define "Start". Where to list from.\par
|
|
8.\tab Define "End". Where to list to.\par
|
|
9.\tab Define "Line length". The number of characters in each line of output.\par
|
|
10.\tab Select "Both strands"\par
|
|
11.\tab Accept "Number ends of lines".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The program will now read the feature table file and translate the segments defined using the selected operator(s) and the listing will appear as in figure 10.2.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Producing a file of protein sequences for all open reading frames.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and write protein sequences to disk".\par
|
|
2.\tab Reject "Translate selected regions". The alternative is "Open reading frames".\par
|
|
3.\tab Define "Minimum open frame in amino acids".\par
|
|
4.\tab Select "Both strands".\par
|
|
5.\tab Define "File name for translation".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
A typical results file is shown in figure 10.3. It shows that the file is written in FASTA format, i.e. an entry name line starting with a > symbol (here the first entry name is 188, the start of the DNA segment), followed by a title (here in EMBL feature table format giving the start and end of the DNA that produced the protein), followed by the sequence terminated by an *.\par
|
|
\pard \s4\qj\sa120\sl280 \par
|
|
\pard\plain \sl220 \f4\fs16 {\f22\fs18 \par
|
|
}\pard \li1260\ri1360\sl220\box\brsp100\brdrth {\f22\fs18 >188 188..733\par
|
|
}\pard \li1260\ri1360\sl220\box\brsp100\brdrth {\f22\fs18 TMEVNKKQLADIFGASIRTIQNWQEQGMPVLRGGGKGNEVLYDSAAVIKWYAERDAEIEN\par
|
|
EKLRREVEELRQASEADLQPGTIEYERHRLTRAQADAQELKNARDSAEVVETAFCTFVLS\par
|
|
RIAGEIASILDGLPLSVQRRFPELENRHVDFLKRDIIKAMNKAAALDELIPGLLSEYIEQ\par
|
|
SG*\par
|
|
>711 711..2633\par
|
|
VNISNSQVNRLRHFVRAGLRSLFRPEPQTAVEWADANYYLPKESAYQEGRWETLPFQRAI\par
|
|
MNAMGSDYIREVNVVKSARVGYSKMLLGVYAYFIEHKQRNTLIWLPTDGDAENFMKTHVE\par
|
|
PTIRDIPSLLALAPWYGKKHRDNTLTMKRFTNGRGFWCLGGKAAKNYREKSVDVAGYDEL\par
|
|
AAFDDDIEQEGSPTFLGDKRIEGSVWPKSIRGSTPKVRGTCQIERAASESPHFMRFHVAC\par
|
|
PHCGEEQYLKFGDKETPFGLKWTPDDPSSVFYLCEHNACVIRQQELDFTDARYICEKTGI\par
|
|
WTRDGILWFSSSGEEIEPPDSVTFHIWTAYSPFTTWVQIVKDWMKTKGDTGKRKTFVNTT\par
|
|
LGETWEAKIGERPDAEVMAERKEHYSAPVPDRVAYLTAGIDSQLDRYEMRVWGWGPGEES\par
|
|
WLIDRQIIMGRHDDEQTLLRVDEAINKTYTRRNGAEMSISRICWDTGGIDPTIVYERSKK\par
|
|
HGLFRVIPIKGASVYGKPVASMPRKRNKNGVYLTEIGTDTAKEQIYNRFTLTPEGDEPLP\par
|
|
GAVHFPNNPDIFDLTEAQQLTAEEQVEKWVDGRKKILWDSKKRRNEALDCFVYALAALRI\par
|
|
SISRWQLDLSALLASLQEEDGAATNKKTLADYARALSGEDE*\par
|
|
>74 complement(74..727)\par
|
|
LFDIFTQQPRYQFIQRGCFVHGFDDIPFQEINMSVFQFRKTPLHRQGEPVENTGNFTCDP\par
|
|
RQHESTECGFHHFSGVSGILQFLCVGLRTRKSMAFVLNSSWLEICLAGLPQFFNLPAQLF\par
|
|
VLNFSIPFGIPFYDGGRVIKHLITLATASQNGHSLFLPVLNGTDTRTENVSQLLFVDFHC\par
|
|
SFHGQKQRKETTEAKKPRFQHLSFPFFSEGILNKNIKL*\par
|
|
>313 complement(313..732) \par
|
|
PDCSIYSLSNPGISSSSAAALFMALMISRFRKSTCRFSSSGKRRCTDRGSPSRILAISPA\par
|
|
IRDSTKVQNAVSTTSAESLAFFSSCASACARVSRWRSYSIVPGWRSASLACRSSSTSRRS\par
|
|
}\pard \li1260\ri1360\sl220\box\brsp100\brdrth {\f22\fs18 FSFSISASLSAYHFMTAAES*\par
|
|
}\pard \li1260\ri1360\sl220 {\f22\fs18 \par
|
|
}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 10.3\tab The contents of a file containing the protein sequences of the open reading frames found by the program\par
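\pard\plain \s4\qj\sa120\sl280 \f20 The layout of figure 10.3 is easy to generate or to check from other programs. The following Python sketch builds the lines of one entry in that layout. The function name and its arguments are ours, the line width of 60 is read off the figure, and the peptide in the example is shortened for illustration.\par
\pard\plain \sl220 \f4\fs16
```python
def fasta_entry(start, end, protein, complement=False, width=60):
    """Return the lines of one entry laid out as in figure 10.3."""
    location = "%d..%d" % (start, end)
    if complement:
        location = "complement(%s)" % location
    # Entry name (the start position) followed by the title.
    lines = [">%d %s" % (start, location)]
    body = protein + "*"            # the sequence is terminated by an *
    for i in range(0, len(body), width):
        lines.append(body[i:i + width])
    return lines


for line in fasta_entry(74, 727, "LFDIFTQQPRYQ", complement=True):
    print(line)
```
\par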
|
|
\pard\plain \s6\fi-560\li560\sb240\sa60\sl280\tx560 \b\f20 2.6\tab Producing a file of protein sequences for segments defined from a feature table\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Translate and write protein sequences to disk".\par
|
|
2.\tab Accept "Translate selected regions".\par
|
|
3.\tab Reject "Define segments using keyboard". The alternative is to use a feature table.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Feature table file name". Type the name of the file containing the appropriate feature table in EMBL/GenBank format.\par
|
|
5.\tab Define "Operator". This defines which feature table operators should be employed when selecting the segments to translate.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "File name for translation"\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The program will now read the feature table file and translate the segments defined using the selected operator(s). The results will be stored as in figure 10.3.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab To produce a listing without translation the "Translate and list" function can be used with the "Show translation" option rejected. Alternatively the function "List the sequence" can be used.
|
|
\par
|
|
2.\tab Some users may be confused by the fact that the program asks "Where to list from, and to" and also "Define segments to translate". This allows for 5' and 3' untranslated regions to be included in the listing.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
The feature table file employed by the programs is a simple text file containing the data for the current sequence. Because of the multiplicity of different sequence library formats we have not provided the facility of reading such data directly from libraries. The feature tables for individual library entries must be extracted (see the introductory chapter) or files can be created for new sequences.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
|
|
The current feature tables use "operators" such as "join" or "order" to specify which segments should be translated together to make a complete protein sequence. The program allows users to select which ones to employ, the default being "Use all operators".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab The program contains a function "Set genetic code" which allows users to choose from a menu of codes or to define their own by specifying amino acid and codon pairs. This sets the code for all functions.
|
|
\par
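\pard\plain \s4\qj\sa120\sl280 \f20 Defining a genetic code by amino acid and codon pairs, as in note 5, amounts to overriding entries in a codon table. The Python sketch below illustrates the idea. It is not the program's own code: the names are ours, and the change shown (reading tga as tryptophan) is just an example of a user-defined pair.\par
\pard\plain \sl220 \f4\fs16
```python
# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]


def make_code(changes=()):
    """Return a codon table: the standard code plus user-defined pairs.

    changes is a sequence of (amino_acid, codon) pairs.
    """
    code = dict(zip(CODONS, AMINO))
    for amino_acid, codon in changes:
        code[codon.lower()] = amino_acid
    return code


standard = make_code()
# Example change: read tga as tryptophan instead of stop.
altered = make_code([("W", "tga")])
print(standard["tga"], altered["tga"])
```
\par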
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 11. Statistical and Structural Analysis of Protein Sequences\par
|
|
\pard\plain \s3\sb200\sa120\sl360 \b\f20\fs32 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Plotting hydrophobicity\par
|
|
2.2 \tab Plotting charge\par
|
|
2.3\tab Plotting hydrophobic moment and hydrophobicity\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700\tx1980 2.4\tab Drawing helical wheels\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.5\tab Producing a Robson secondary structure prediction\par
|
|
2.6\tab Calculating the amino acid composition and molecular weight\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
4.\tab References\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we describe the use of routines for plotting hydrophobicity, charge and hydrophobic moments, drawing helical wheels and predicting secondary structure. Use of all these routines is very straightforward and they are contained in the program PIP.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Plotting hydrophobicity\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This method uses the values of Kyte and Doolittle (1).\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot hydrophobicity".\par
|
|
2.\tab Define "Window length".\par
|
|
3.\tab Define "Plot interval".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 11.1.\par
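\pard\plain \s4\qj\sa120\sl280 \f20 The roles of "Window length" and "Plot interval" can be seen in the following Python sketch, which averages the Kyte and Doolittle values over a sliding window. It is an illustration only: the function name is ours, and reporting each value against the first residue of its window is an assumption, not necessarily the convention used by PIP.\par
\pard\plain \sl220 \f4\fs16
```python
# Hydropathy values of Kyte and Doolittle, keyed by one letter code.
KYTE_DOOLITTLE = dict(
    A=1.8, R=-4.5, N=-3.5, D=-3.5, C=2.5, Q=-3.5, E=-3.5, G=-0.4,
    H=-3.2, I=4.5, L=3.8, K=-3.9, M=1.9, F=2.8, P=-1.6, S=-0.8,
    T=-0.7, W=-0.9, Y=-1.3, V=4.2)


def window_means(protein, scale, window, interval=1):
    """Mean scale value for each window, taken every interval residues.

    Returns (first_residue, mean) pairs with residues numbered from 1.
    """
    points = []
    for start in range(0, len(protein) - window + 1, interval):
        segment = protein[start:start + window]
        mean = sum(scale[res] for res in segment) / window
        points.append((start + 1, mean))
    return points


peptide = "TMEVNKKQLADIFGASIRTI"
for position, value in window_means(peptide, KYTE_DOOLITTLE, 7, 3):
    print(position, round(value, 2))
```
\par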
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Plotting charge\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot charge".\par
|
|
2.\tab Define "Window length".\par
|
|
3. \tab Define "Plot interval".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear and will be similar to that shown in figure 11.1.\par
|
|
\pard\plain \sl220\keepn \f4\fs16 {{\pict\macpict\picw448\pich81
|
|
0396ffffffff005001bf1101a0008201000affffffff005001bf0900000000000000003100000000004f01be9800240000000000350120000000000035011f00000000004f01be000102dd0006007fdfff00fc060040df000004060040df000004060040df000004060040df000004060078df000004060040df0000040600
|
|
40df000004060040df000004060040df00000407017840e0000004070140b0e000000407014108e000000407014104e000000407014204e00000040b017a02fc000020e60000040c014202fd00010250e60000040f014402fd00010590e900031000000418014401fd00010490f800010380f8000020fe0003700000041c01
|
|
4801fd00010808f800010480fd000010fd000060fe0003880000041d017801fd00010808f800010440fe00010428fd000090fe00038804000424074801000002000804fe00010110fd00010440fe00010a28fe000801108004010806000424075000800005001004fe000101a8fd00010840fe000d09440000020111800b01
|
|
04090004240c500080000480100200800002a4fd00010840fe000d10c40020030a0a4009010709002425236000800004e0100201400002440020000210400004401002005004960c400882010881e42523780080000810100202300002430050000d1020001ba0200100900490004810820090412425234000800008102001
|
|
02080002008088001120200020104000811004600037f0c20090222406007fdfff00fc2523400040002008e00104032c040022020020c0180080038000420808000020000c00600a14241440004000200500010400c204001201002080080080fe0002420410fd000404004004142113780030004005000084000104001401
|
|
0040000401fd0002220410fb00024004141f13400008004005000098000088000c010040000401fd0002240410f900000c1f134000080040020000e00000900000010040000407fd00021c0220f90000041c05400009008002fc0008600000010e80000208fc000103e0f900000416044000068080fb000040fe0004918000
|
|
0210f2000004110378000441f6000490800003f0f20000040d0340000022f60000a0ee0000040d0340000022f6000060ee000004090340000014e2000004090340000008e2000004060078df000004060040df000004060040df000004060040df000004060040df000004060078df000004060040df000004060040df0000
|
|
04060040df000004060040df00000406007fdfff00fc02dd00a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.1\tab A hydrophobicity plot using the values of Kyte and Doolittle.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Plotting hydrophobic moment and hydrophobicity\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This method plots the hydrophobic moment and the hydrophobicity as defined by Eisenberg {\i et al} (2).\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Plot hydrophobic moment".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Angle". This is the angle between the residues when the helix is viewed end on. The default value of 100 degrees is that found in alpha helices.\par
|
|
3.\tab Define "Window length". The default of 18, if used in conjunction with the default "Angle", is equivalent to 5 turns of the helix.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Plot interval".\par
|
|
\pard\plain \s4\qj\sa120\sl280\tx560 \f20
|
|
The plot will appear as in figure 11.2, with the hydrophobicity shown above the hydrophobic moment. The scale for the hydrophobicity runs from -1.0 to 1.5 and for the hydrophobic moment from 0.0 to 1.5. The program plots the mean values for each window position with the value at position x representing the segment from x-window length+1 to x.\par
|
|
\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw447\pich160
|
|
0659ffffffff009f01be1101a0008201000affffffff009f01be0900000000000000003100000000009e01bd9800240000000000670120000000000067011f00000000009e01bd000102dd0006007fdfff00fc060040df000004060070df000004060040df000004060070df000004060040df000004060070df0000040600
|
|
40df000004060070df000004060040df00000406007edf000004060040df000004060070df000004060040df000004060070df000004060040df000004060078df000004060074df0000040a0072fc000008e50000040a0061fc000038e50000040e007ffc000044f2000020f500000413016080fd000084f2000050f90000
|
|
08fe00000414017080fe00010104f2000048f9000016fe00000419016040fe00010202f2000088fe000001fe000502110008000419017030fe00010401f2000084fe000902818000052088140004200c40080001000401e00800000380f900012102fe00090242600004e096120004240c70080002800800101400000c40fd
|
|
000008fe000e320380000004741000040061210004240c40080004800800082200003040fd000034fe000e4a004000001c0c0c00080001210004250c70040004500800042200004020fe00131c440000e04c0020000020000b0e080000c08004250c400400086f10000241f8038010fe001323820001104000100000400000
|
|
91100000808004241e7e02000800f00001800404001100001c4002000e0880000c400080000060e0fd0000041f0340020008fd0012800208000ef000244002001004800003b00080f90000041e0370010010fc000b023800000fc0228001001003fe0002080080f90000041c0340010010fc000101c0fe00053042800100a0
|
|
fd00010801f8000004170370008020f70005084280008160fd00010401f8000004160340008040f700040481000041fc000107c6f8000004150370004440f700040301000021fb000028f8000004110340004a80f300001afb000010f80000041002700031f2000006fb000010f8000004060040df00000406007edf000004
|
|
060040df000004060070df000004060040df000004060070df000004060040df000004060070df000004060040df000004060070df000004060040df00000406007fdfff00fc060040df000004060040df000004060070df000004060040df000004060040df000004060040df000004060070df000004060040df00000406
|
|
0040df000004060070df000004060040df000004060040df000004060070df000004060040df000004060040df000004060040df00000406007edf000004060040df000004060040df000004060070df000004060040df000004060040df000004060070df000004060040df000004060040df000004060040df0000040600
|
|
70df0000040a0040f6000010eb0000040a0040f600002ceb0000040a0070f6000024eb0000040a0040f6000042eb0000040c0040f80002800082eb00000411007efe000040fd000303400082eb000004110040fe0000a0fd000304400101eb000004130040fe0000a0fd0005082002010001ed0000041e044000000110fd00
|
|
070820040100030010f9000040fe000103e0fd0000042204700000011cfd000730100800c0048128fd000480000004c0fe00010410fd00000423044000000204fd000720081000200482c4fe00050380000006a0fe00012808fd00000424044000000402fe0008204008100020088404fe0005024000000920fe00015808fd
|
|
00000424047000001c02fe0008504004100020084404fe0005044000004920fe00018004fd00000425104000002002700000484004200010084802fe000f84400000a820040001000400e00000042523400000200190000048800420001010480200000144420000b0100a018a000403100080042523700001200109000084
|
|
80044000087030020000024826000110105206740002241003000425234001c2c0000a8008848002800008801002000804482508071010b208000003d812040004251b40022440000a601703000280000500000104340428291408000d0108fe0004080d080004241b60041800000411a00200030000070000012a44043019
|
|
240800030108fd0003088800041e077f88100000040a60f9000701d982880000c408fe000090fc00025000041b014050fd00000cf7000601037800008210fe000070fc00025000040e014030ed000103f0f8000220000406007fdfff00fc02dd00a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.2\tab A hydrophobic moment (below) and hydrophobicity plot. The hydrophobicity plot displays the mean values on a scale of -1.5 to 1.0 and the hydrophobic moment on a scale of 0.0 to 1.5.\par
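\pard\plain \s4\qj\sa120\sl280 \f20 The mean hydrophobic moment of a window can be sketched in Python as follows. The calculation treats each residue as a vector whose length is its hydrophobicity and whose direction advances by "Angle" degrees per residue, and takes the length of the vector sum divided by the window length. The names are ours, and the caller is assumed to supply the per-residue hydrophobicity values from the chosen scale (the values of reference 2 are not reproduced here).\par
\pard\plain \sl220 \f4\fs16
```python
import math


def mean_moment(values, angle=100.0):
    """Mean hydrophobic moment of one window of hydrophobicity values."""
    radians = math.radians(angle)
    sin_sum = sum(h * math.sin(n * radians) for n, h in enumerate(values))
    cos_sum = sum(h * math.cos(n * radians) for n, h in enumerate(values))
    return math.hypot(sin_sum, cos_sum) / len(values)


def moment_profile(values, window=18, angle=100.0, interval=1):
    """Return (x, mean moment) pairs, where x is the last residue
    (numbered from 1) of the window running from x - window + 1 to x."""
    points = []
    for end in range(window, len(values) + 1, interval):
        points.append((end, mean_moment(values[end - window:end], angle)))
    return points


# Eighteen identical residues at 100 degrees make five full turns,
# so their vectors cancel and the moment is zero to rounding error.
print(mean_moment([1.0] * 18))
```
\par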
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Drawing helical wheels\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This method draws helical wheels for any segment of the sequence (3). In addition it displays the hydrophobic moment for the segment (2).\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Draw helix wheel".\par
|
|
2.\tab Define "Angle". The default angle of 100 degrees is that found in alpha helices.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Window length". The default of 18, if used in conjunction with the default "Angle", is equivalent to 5 turns of the helix.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Step". To produce a display for a sequence position N residues from the current one type N, and the display will appear in place of the previous one. The default value of N is 1, so by repeatedly hitting carriage return the user can step, residue by residue, through the sequence.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The display for the current position in the sequence will appear as in figure 11.3 and the bell will ring. The program now allows the user to "step" through the sequence, displaying the helix wheel for each position.
|
|
\par
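\pard\plain \s4\qj\sa120\sl280 \f20 The geometry of a helical wheel is simple enough to sketch in a few lines of Python. Each successive residue is placed "Angle" degrees further round a circle, so with the defaults of 100 degrees and 18 residues the segment makes exactly 5 turns and every residue falls on its own spoke. The function name and the returned layout are ours and say nothing about how the program draws its display.\par
\pard\plain \sl220 \f4\fs16
```python
import math


def wheel_positions(segment, angle=100.0, radius=1.0):
    """Place each residue of segment on a circle, viewed down the helix.

    Residue n (counting from 0) sits at n * angle degrees. Returns
    (residue, degrees, x, y) tuples.
    """
    placed = []
    for n, residue in enumerate(segment):
        degrees = (n * angle) % 360.0
        x = radius * math.cos(math.radians(degrees))
        y = radius * math.sin(math.radians(degrees))
        placed.append((residue, degrees, x, y))
    return placed


for residue, degrees, x, y in wheel_positions("TMEVNKKQLADIFGASIR"):
    print(residue, degrees, round(x, 2), round(y, 2))
```
\par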
|
|
\pard\plain \li900\ri960\sb500\sl220\keepn\box\brsp120\brdrth \f4\fs16 {{\pict\macpict\picw355\pich329
|
|
0c64ffffffff014801621101a00082a0008c01000affffffff0148016209000000000000000031010f01050121011338a10096000c010000000200000000000000a1009a0008fffd00000004000001000a01100106011e01112c000c00150948656c76657469636103001504010d000c2e00040000010028011a01070144a0
|
|
0097a0008da0008c01000affffffff0148016231012600ba013800c838a10096000c010000000200000000000000a1009a0008fffc00000004000001000a012700bb013500c628013100bc014ca00097a0008da0008c01000affffffff0148016231011d0087012e009538a10096000c010000000200000000000000a1009a
|
|
0008fffc00000004000001000a011d0088012b009328012700890146a00097a0008da0008c01000affffffff014801623100df004600f1005438a10096000c010000000200000000000000a1009a0008fffd00000004000001000a00e0004700ee00532800ea00480156a00097a0008da0008c01000affffffff0148016231
|
|
0097003900a8004738a10096000c010000000200000000000000a1009a0008fffc00000004000001000a0097003a00a500452800a1003b0159a00097a0008da0008c01000affffffff0148016231006b004d007c005b38a10096000c010000000200000000000000a1009a0008fffc00000004000001000a006b004e007900
|
|
59280075004f014ca00097a0008da0008c01000affffffff01480162310032008a0044009838a10096000c010000000200000000000000a1009a0008fffc00000004000001000a0033008b0041009628003d008c014ba00097a0008da0008c01000affffffff0148016231002b00ba003d00c838a10096000c010000000200
|
|
000000000000a1009a0008fffd00000004000001000a002c00bb003a00c628003600bc0144a00097a0008da0008c01000affffffff0148016231003300f1004500ff38a10096000c010000000200000000000000a1009a0008fffd00000004000001000a003400f2004200fd2b37080148a00097a0008da0008c01000affff
|
|
ffff0148016231005101190063012738a10096000c010000000200000000000000a1009a0008fffd00000004000001000a0052011a006001252b281e0145a00097a0008da0008c01000affffffff014801623100b9014400cb015238a10096000c010000000200000000000000a1009a0008fffc00000004000001000a00b9
|
|
014500c701512b2b67014ba00097a0008da0008c01000affffffff01480162310098014400aa015238a10096000c010000000200000000000000a1009a0008fffc00000004000001000a0099014500a701512800a30146014ba00097a0008da0008c01000affffffff0148016231003e00ba004f00c838a10096000c010000
|
|
000200000000000000a1009a0008fffc00000004000001000a003f00bb004d00c728004900bc0131a00097a0008da0008c01000affffffff014801623100b9013100ca013f38a10096000c010000000200000000000000a1009a0008fffd00000005000001000a00ba013200c8013e2b777b0132a00097a0008da0008c0100
|
|
0affffffff014801623101080090011a009e38a10096000c010000000200000000000000a1009a0008fffc00000004000001000a010900910117009c28011300920133a00097a0008da0008c01000affffffff01480162310075005b0087006938a10096000c010000000200000000000000a1009a0008fffd000000050000
|
|
01000a0076005c00840068280080005d0134a00097a0008da0008c01000affffffff0148016231005c0109006e011738a10096000c010000000200000000000000a1009a0008fffc00000005000001000a005d010a006b0116280067010b0135a00097a0008da0008c01000affffffff014801623100f900fe010b010c38a1
|
|
0096000c010000000200000000000000a1009a0008fffd00000004000001000a00fa00ff0108010b28010401000136a00097a0008da0008c01000affffffff014801623100d5005700e7006538a10096000c010000000200000000000000a1009a0008fffd00000004000001000a00d6005800e400632800e000590137a000
|
|
97a0008da0008c01000affffffff014801623100480093005a00a138a10096000c010000000200000000000000a1009a0008fffc00000005000001000a00490094005700a028005300950138a00097a0008da0008c01000affffffff01480162310098013200a9014038a10096000c010000000200000000000000a1009a00
|
|
08fffc00000004000001000a0099013300a7013e2b9f500139a00097a0008da0008c01000affffffff0148016231010f00b7011c00d038a10096000c010000000200000000000000a1009a0008fffd00000009000001000a011000b8011e00cd28011a00b9023130a00097a0008da0008c01000affffffff01480162310097
|
|
004a00a6006338a10096000c010000000200000000000000a1009a0008fffd00000009000001000a0098004b00a600602800a2004c023131a00097a0008da0008c01000affffffff0148016231004600e3005700f838a10096000c010000000200000000000000a1009a0008fffc00000008000001000a004700e4005500f6
|
|
28005100e5023132a00097a0008da0008c01000affffffff014801623100e2011700f3012c38a10096000c010000000200000000000000a1009a0008fffc00000007000001000a00e3011800f101292b349c023133a00097a0008da10096000c010000000200000000000000a1009a0008fffd0000003a000001000a000000
|
|
00000e007728000a00010d444b464c4544564b4b4c594853a00097a10096000c010000000200000000000000a1009a0008000400000007000001000a00180002003400132b0218044d20200d2a0e0148a00097a10096000c030000000200000000000000a1009a0008000b00000004000001000a0018000d00420031280022
|
|
001a05372e38310d2800300016062d322e39370d2b070e03313532a00097a0008c01000affffffff0148016231003300890045009738a10096000c010000000200000000000000a1009a0008fffd00000005000001000a0034008a00420096296e014ba00097a0008da0008c01000affffffff014801623100f30123010401
|
|
3138a10096000c010000000200000000000000a1009a0008fffd00000005000001000a00f40124010201302b9ac00153a00097a0008d01000affffffff0148016207000000002200bc01210000a000a0a100a4000209fd01000a0000000000000000070001000109ffffffffffffffff22005900bf62632300002100fc009f
|
|
23000023cc8723000021006d00fe2300002100ee00fe2300002100d8006b23000023338723000021009f0120230000239f6323000023a29c23000021005f00e2230000233278230000a000a301000affffffff0148016222005900bf62632100fc009f23cc8721006d00fe2100ee00fe2100d8006b23338721009f0120239f
|
|
6323a29c21005f00e2233278a000a1a10096000c030000000200000000000000a1009a0008fffc00000003000001000a002000f9003101020d000e28002c00fa012ba00097a10096000c030000000200000000000000a1009a0008fffc00000003000001000a002100820032008b28002d0083012ba00097a10096000c0300
|
|
00000200000000000000a1009a0008fffc00000003000001000a0096015800a701612bd675012ba00097a10096000c030000000200000000000000a1009a0008fffc00000003000001000a00b7015700c801602800c30158012ba00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a00
|
|
4401250055012f280050012a012da00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a001900b7002a00c128002500bc012da00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a011d0107012e0111280129010c012da00097a10096000c030000
|
|
000200000000000000a1009a0008fffc0000ffff000001000a013600b6014700c028014200bc012ea00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a012a007c013b00862801360082012ea00097a10096000c030000000200000000000000a1009a0008fffc0000fffe000001000a
|
|
00e4003100f5003b2800f00037012ea00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a0092002400a3002e28009e002a012ea00097a10096000c030000000200000000000000a1009a0008fffc0000ffff000001000a005a003e006b00472800660043012ea00097a00083ff}}
|
|
\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 11.3\tab A typical helix wheel display using a window of only 13 residues. The display includes a schematic of the helix showing the links between residues, with each vertex numbered according to position; the residue type at each vertex; a symbol denoting a classification as hydrophobic (.), positively charged (+), negatively charged (-), or otherwise (). The residue number of the first sequence element in the current window is displayed at the top left corner along with the sequence. Below this is the total hydrophobicity and hydrophobic moment according to Eisenberg {\i et al} (2).\par
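\pard\plain \s4\qj\sa120\sl280 \f20 As a rough illustration of the two quantities shown in the display, the following Python sketch sums the hydrophobicity values over a window and computes the hydrophobic moment as the length of the vector sum of those values around the helix. The scale values, the 100 degree angle between successive residues, the function names and the example window are assumptions made for illustration and are not taken from the program; the values should be checked against reference 2 before use.\par

```python
import math

# Eisenberg consensus hydrophobicity values, quoted from memory as an
# assumption; check them against reference 2 before relying on them.
EISENBERG = dict(
    A=0.62, R=-2.53, N=-0.78, D=-0.90, C=0.29, Q=-0.85, E=-0.74,
    G=0.48, H=-0.40, I=1.38, L=1.06, K=-1.50, M=0.64, F=1.19,
    P=0.12, S=-0.18, T=-0.05, W=0.81, Y=0.26, V=1.08,
)


def total_hydrophobicity(window):
    """Sum of the per-residue hydrophobicity values in the window."""
    return sum(EISENBERG[res] for res in window)


def hydrophobic_moment(window, angle_deg=100.0):
    """Length of the vector sum of the residue hydrophobicities, with
    successive residues rotated by angle_deg around the helix axis."""
    delta = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[res] * math.sin(i * delta)
                  for i, res in enumerate(window))
    cos_sum = sum(EISENBERG[res] * math.cos(i * delta)
                  for i, res in enumerate(window))
    return math.hypot(sin_sum, cos_sum)


window = "DKFLEDVKKLYHS"  # hypothetical 13-residue window
print("total hydrophobicity: %.2f" % total_hydrophobicity(window))
print("hydrophobic moment:   %.2f" % hydrophobic_moment(window))
```
\par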
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Producing a Robson secondary structure prediction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This routine uses the method of Garnier {\i et al} (4) to predict the positions of alpha helices, beta sheets, turns and random coil. The results can be either plotted or listed.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Robson secondary structure prediction".\par
|
|
\pard \s7\qj\fi-560\li560\ri-100\sa120\sl280\tx560 \page 2.\tab Accept "Plot results". The alternative produces a listing like that shown in figure 11.4.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 11.5, and the program also prints a count of the number of positions at which each of the four structure types is the highest scoring.\par
|
|
\pard\plain \li1500\ri1460\sb200\sl220\box\brsp100\brdrth \tqr\tx3220\tqr\tx4700\tqr\tx6140\tqr\tx7420 \f4\fs16 350 P\tab 274\tab -178\tab -84\tab -77\par
|
|
\pard \li1500\ri1460\sl220\box\brsp100\brdrth \tqr\tx3220\tqr\tx4700\tqr\tx6140\tqr\tx7420 351 L\tab 16\tab -192\tab -21\tab -38\par
|
|
352 K\tab 371\tab -223\tab -75\tab -68\par
|
|
353 L\tab 365\tab -152\tab -101\tab -65\par
|
|
354 S\tab 331\tab -82\tab -84\tab -63\par
|
|
355 K\tab 311\tab -43\tab -110\tab -88\par
|
|
356 A\tab 280\tab -23\tab -110\tab -80\par
|
|
357 V\tab 234\tab -12\tab -135\tab -75\par
|
|
358 H\tab 177\tab -10\tab -143\tab -92\par
|
|
359 K\tab 153\tab 2\tab -180\tab -138\par
|
|
360 A\tab 158\tab 52\tab -175\tab -130\par
|
|
361 V\tab 144\tab 78\tab -187\tab -115\par
|
|
362 L\tab 132\tab 58\tab -186\tab -80\par
|
|
363 T\tab 124\tab 63\tab -142\tab -78\par
|
|
364 I\tab 144\tab 32\tab -111\tab -43\par
|
|
365 D\tab 120\tab -49\tab -29\tab 5\par
|
|
366 E\tab 103\tab -80\tab 13\tab 43\par
|
|
367 K\tab 111\tab -113\tab 23\tab 42\par
|
|
368 G\tab 132\tab -127\tab -13\tab 64\par
|
|
369 T\tab 172\tab -132\tab -42\tab 52\par
|
|
\pard \li1500\ri1460\sl220\keepn\box\brsp100\brdrth \tqr\tx3220\tqr\tx4700\tqr\tx6140\tqr\tx7420 370 E\tab 216\tab -170\tab -122\tab -4{\b \par
|
|
}\pard\plain \s8\qj\fi-1140\li1140\sb120\sa200\sl240\tx1140 \f21\fs20 Figure 11.4\tab A listing of the Robson secondary structure prediction. It includes the sequence position, the residue type and the values for the four structure classes.\par
|
|
\pard\plain \sb200\sl220\keepn \f4\fs16 {{\pict\macpict\picw446\pich256
|
|
0d0fffffffff00ff01bd1101a0008201000affffffff00ff01bd090000000000000000310000000000fe01bc9800240000000000a601200000000000a6011f0000000000fe01bc000102dd0006007fdfff00fc060040df000004060040df000004060040df000004060041df00000407014280e0000004060042df0000040b
|
|
0042fd00010140e500000410014380fe000101a0fd0000c0ea000004110640000008800120fe000101a0ea000004160c40000019900124000010022380fc00000cf1000004200c40000066e80216000070022240fc000012fe00014001fc000060fd00010404240e400000a68804190000908214400006fe000612200000a0
|
|
0180fe00010190fd00010c04252340001100080409020111421820000b02800012500000a181900000040f10001006001204252340003f0008040903c20d3c0020001103c00022900000a1827900038e080800380a90120425234061410004340104a205200010021084200022910000a272279ea495300800440ba8218425
|
|
234292e03b1c5f070a12f300e0d0059e9830002751860122f20493de5161f800440bb82d042508429400000648010814fe00171004a0b814002009b621221400600860c008008508a44104250843140000018000880cfe000d0f0460c01ad9200a4952220c0020fe000604808590424004230842140000018000880cfe000b
|
|
0d046080032ac006014a2204fc000605e883904380041f014018fc000050fc000084fe000604400401861c04fc0006051500500200041e014008fc000030fc000048fe0005040000018018fb0006060500600000041a0040fb000020fc000070fb0002010018fb0006020600200000040e0040f5000050f2000002fc000004
|
|
0a0040f5000040ec000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df00000406007fdfff00fc060040df000004060040df000004060040df00000407014380e0000004060041df000004060041df0000040a0041f7000020ea000004100041fe0002
|
|
8000c0fd000060ea000004190640000008e00120fe00010161fb000008fd000001f60000041b064000000da00120fe000202e280fc000008fe00018001f60000041c0640600053200110fe000202e280fc00000cfe0002c00180f70000042306405010a3100212fe000202a280fe0002c00014fe0004c001800002fc000304
|
|
00100425064090110310020efe001904948000000140002400000121018000c200018000000420120425234090190210040e0000c0049c8000000220002480000121814000a38c014000000c202e8425234090190210140d020330280880000c222000248100022281430123920f6000000ab02d04252340891d02182c0503
|
|
0234d80040001a54202025830022224124821291083000300ab44d04252342f766eef868e50484150323c0002a55d0303be2e033e34124c21279381000500efc560425234287e60018800504c4170000440722881049c39491322231242a1a01e00880580adc440425234306a00015000108241600003c05418810c943140b
|
|
4c1211343c0c00000980480a8a440425234106000016000108141400001218c1800d0902140780140d38240c000009a0841b0a8004252340040000060000881c080000111000000f0e02140780140608000c00000950852b0a8004252340040000040000880008000001900000070002080500180600000400000908872b0b
|
|
000420014004fc0002900008fe000050fb000f040018060000040000050481230300041b0040fb000060fc000050fb000304000802fc0006060700a30000041b0040fb000020fc000060fb000304000802fc0006020100e2000004140040f5000060f9000008fb0006020000220000040a0040e5000002fc0000040a0040e5
|
|
000002fc000004060040df000004060040df000004060040df000004060040df00000406007fdfff00fc060040df000004060040df000004060040df0000040b014380f8000008ea0000040a0042f700000cea0000040e0043f700000cf1000008fb0000040e0042f700000af100000cfb0000040f014380f8000012f10000
|
|
14fb0000041c014004fd000008fd000012fd000004fc00010204fd000014fb0000041e01410afd000014fd000012fd00000afc0001060cfe00010224fc000101041f014292fd000014fd000012fd00000bfc00010a0cfe0001c522fd000203010421014291fd000024fd000621200000020904fd00060a12000001a522fd00
|
|
02028284240642910000a00024fd00182150000007110f000007000a12000002252200010000028204252342508000d006443804000820900000091109c0000900091200300218e2180380000282042523426043f0918582280c000e20900030089109200008801112002802180128048000028404252343bfc32912899a68
|
|
1200113e3000380e9f8a60001c802f1e002803f7ff2804802002858425234000440f0e480184122011201180c80890902000204e21120044040001281c406164840425234000440208500184225320c009410808a09010002052e0a110441c0001241040b2a44404252340002402085001044254a00009220810a060100040
|
|
2380a12d841000012620210a34440425234000240000300002829c60000924041040600800802080c127022000014220210e08440425234000180000200002818000000614041000000881000080812002200001424011000848042402400018fd000c0300800000061c04a000000981fe000c01200240000082401b000028
|
|
04200040fb000002fc00061004c000000a42fe000c012002400000014004000030041f0040fb000002fb0005028000000a64fe00070120014000000180fe000130041b0040f40005038000000614fe000301200180fe000080fe00011004160040f4000002fe00010614fe000301400080f9000004110040ef000008fe0003
|
|
01400080f90000040a0040ea0000c0f70000040a0040ea000080f7000004060040df000004060040df000004060040df00000406007fdfff00fc0a0040e40000c0fd0000040a0040e40000a0fd00000413014280f4000080fe000006f7000090fd00000417014280f600021000c0fe000009f80006011001c00000041c0143
|
|
80f600026800a0fe00010880fe000001fd00060110032000000421014280fa000002fe0002480110fe00011080fe00010280fe00060108042000000421014280fa000005fe0002c40110fe00011040fe00010640fe000602080420000004250f4000060001c000380405300001040110fe00011040fe000b3420e000000204
|
|
0820008004250f40000500023000280a04a80003020210fe00101040080000482090000004060820018004252340000900021000440a0868000c0102080038001020140000c0211000000405901002400425234000110002080082120864001401020801e4002021e21c0100111000000801d0100240042523400011000404
|
|
00826108040010008208030300202222240100121000000800201004200425234000108004040103a09002001000e208040381c012022201000a080000080000100420042523400020600404010000900200200024078c00418014022202000c08010008000008c82104252341c06020080401000060020020001804f00022
|
|
001c0243e200000801c008000008a81704252342a0ff100ffe01ffff0ffe105fffebf811fda3bff3ff789ffffff802201ffffffdeff5042506401b000e880202fe0002012840fe000610003200000180fe00090b06201000000518148423064004000bc80202fe0002012480fc00001cfe000080fe00090a84383000000300
|
|
0c841e0640040002480104fd0001a680fc000008fa00090a8c2ce00000020000041e0640000002280104fd0001e380fc000008fa0002045005fe000302000004170040fe0002100104fd00018080f40002042003fb000004140040fe0002100104fd000080f1000003fb0000040a0040fc0000fce50000040a0040fc000020
|
|
e5000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df000004060040df00000406007fdfff00fc060040df000004060043df00000407014280e000000407014280e000000407014280e000000407014380e000000425134000203b1807070200f000e0c0000e00300006
|
|
40fe000cf000000e1001f800000b980c04060040df000004060040df000004060040df000004060040df000004060040df0000042406407000eee000e0fe0011032380000001c00038800001e10100000208fd000304201204060040df000004060040df000004060040df000004060040df000004060040df000004060040
|
|
df0000042202439f80fe000018fd00111e200010061f0260000c000e1e000001f7fefc00010184060040df000004060040df000004060040df000004060040df000004060040df000004252340005f0007fc01ffff0ffc001fffe3f801fd81bff3fe3801fffff800000ffffff8cfe004060040df000004060040df00000406
|
|
0040df000004060040df000004060040df000004060040df00000406007fdfff00fc02dd00a00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.5\tab A secondary structure plot using the method of Robson. The likelihood that each 17 residue segment of the sequence forms each of the four structure classes, helix (H), extended (E, normally termed sheet), turn (T) and coil (C), is plotted across the screen in four strips. Below these is a "decision" strip (D) in which a single dot is plotted for the highest scoring structure class at each point. Here we see a sequence that is predicted to be predominantly helical.\par
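\pard\plain \s4\qj\sa120\sl280 \f20 The decision strip and the printed counts can be sketched in Python as below. This is only an illustration and not the program's own code: the scores are the first five rows of the listing in figure 11.4, the column order (helix, extended, turn, coil) is assumed from the order of the strips in figure 11.5, and the function names are hypothetical.\par

```python
# Scores for residues 350-354 copied from the listing in figure 11.4.
# The column order helix, extended, turn, coil is an assumption taken
# from the order of the strips described for figure 11.5.
CLASSES = ("H", "E", "T", "C")
ROWS = [
    (350, "P", (274, -178, -84, -77)),
    (351, "L", (16, -192, -21, -38)),
    (352, "K", (371, -223, -75, -68)),
    (353, "L", (365, -152, -101, -65)),
    (354, "S", (331, -82, -84, -63)),
]


def decide(scores):
    """Return the class letter with the highest score at one position."""
    best = max(range(len(CLASSES)), key=lambda i: scores[i])
    return CLASSES[best]


def count_decisions(rows):
    """Count the positions at which each class is the highest scoring."""
    counts = dict((c, 0) for c in CLASSES)
    for _position, _residue, scores in rows:
        counts[decide(scores)] += 1
    return counts


decisions = [decide(scores) for _p, _r, scores in ROWS]
print("".join(decisions))
print(count_decisions(ROWS))
```
\par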
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Calculating the composition and molecular weight of a sequence.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
Select "Count amino acid composition". The composition and molecular weight are displayed as in figure 11.6. Each column contains the one letter code for the amino acid, the number of occurrences of that amino acid in the sequence, that number expressed as a percentage, and its contribution to the molecular weight.\par
|
|
\pard\plain \li220\ri280\sb200\sl220\box\brsp100\brdrth \f4\fs16 Sequence composition\par
|
|
\pard \li220\ri280\sl220\box\brsp100\brdrth A     C     S     T     P     A     G     N     D     E     Q     B     Z     H\par
N    0.   14.   19.   12.   30.   26.    3.   10.   11.    4.    0.    0.    0.\par
%   0.0   5.3   7.3   4.6  11.5   9.9   1.1   3.8   4.2   1.5   0.0   0.0   0.0\par
W    0. 1219. 1921. 1165. 2132. 1483.  342. 1151. 1420.  513.    0.    0.    0.\par
\par
A     R     K     M     I     L     V     F     Y     W     -     X     ? \par
N    7.    7.   10.   15.   39.   23.   13.   11.    8.    0.    0.    0.    0.\par
%   2.7   2.7   3.8   5.7  14.9   8.8   5.0   4.2   3.1   0.0   0.0   0.0   0.0\par
W 1093.  897. 1312. 1697. 4413. 2280. 1913. 1795. 1490.    0.    0.    0.    0.\par
|
|
\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth Total molecular weight= 28256.254\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 11.6\tab A typical molecular weight and composition display. For each residue type it shows the number of occurrences, the percentage and the contribution to the molecular weight.\par
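\pard\plain \s4\qj\sa120\sl280 \f20 The calculation behind such a display can be sketched in Python as follows. The residue masses are standard average values rather than figures taken from the program, and adding one water molecule to the summed residue weights is an assumption about how the total is formed; the function names and the example sequence are hypothetical.\par

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
# These are standard textbook values, not taken from the program.
RESIDUE_MASS = dict(
    G=57.05, A=71.08, S=87.08, P=97.12, V=99.13, T=101.10, C=103.14,
    L=113.16, I=113.16, N=114.10, D=115.09, Q=128.13, K=128.17,
    E=129.12, M=131.19, H=137.14, F=147.18, R=156.19, Y=163.18,
    W=186.21,
)
WATER = 18.015  # one water for the free chain ends (an assumption)


def composition(sequence):
    """Return (code, count, percent, weight) for each residue type present."""
    counts = Counter(sequence)
    total = len(sequence)
    rows = []
    for code in sorted(counts):
        n = counts[code]
        rows.append((code, n, 100.0 * n / total, n * RESIDUE_MASS[code]))
    return rows


def molecular_weight(sequence):
    """Sum of the residue masses plus one water molecule."""
    return sum(RESIDUE_MASS[res] for res in sequence) + WATER


seq = "DKFLEDVKKLYHS"  # hypothetical example sequence
for code, n, percent, weight in composition(seq):
    print("%s %3d %5.1f %8.2f" % (code, n, percent, weight))
print("Total molecular weight = %.3f" % molecular_weight(seq))
```
\par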
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab The methods described in the chapters on motif and pattern searching can also be used to search for specific structures. For example, a sequence can be searched for all the structures contained in the PROSITE motif library.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab It is often convenient to produce displays in which several of the plots described above appear together on the screen.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Kyte, J. and Doolittle, R.F. 1982. A simple method for displaying the hydropathic character of a protein. {\i J. Mol. Biol.} {\b 157}\:105-132.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. 1984. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. {\i J. Mol. Biol.} {\b 179}\:125-142.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Schiffer, M. and Edmundson, A.B. 1967. Use of helical wheels to represent the structures of proteins and to identify the segments with helical potential. {\i Biophys. J.} {\b 7}\:121-135.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Garnier, J., Osguthorpe, D.J., and Robson, B. 1978. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. {\i J. Mol. Biol.} {\b 120}\:97-120.\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 12. Searching for Motifs in Protein Sequences\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Searching for exact matches.\par
|
|
2.2\tab Searching for percentage matches to consensus sequences\par
|
|
2.3\tab Searching for consensus sequences using a score matrix\par
|
|
2.4\tab Using weight matrices for searching protein sequences\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
4.\tab References\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The program PIP contains several ways of defining and searching for motifs (1,2). We describe searches for exact matches and percentage matches, the use of score matrices and the creation and use of weight matrices. All of the searches produce
|
|
both listed and graphical output.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Searching for exact matches.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The routine for finding and displaying the positions of exact matches to sequences can display its results in various forms. It is equivalent to the restriction enzyme search routine in the nucleotide analysis programs. The sequences to be searched for can be typed on the keyboard or read from files. The format of these files is given in the notes. Here we give only a single example of the use of the routine, which shows how to produce a plot of the positions of all amino acid types in a sequence.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Search".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab
|
|
Select "Input source" as "All acids file". A number of standard files are available and users may also have their own. The one selected simply contains the one letter codes for all the standard amino acids.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Accept "Search for all names". The alternative allows users to select a subset of the entries in the file by name.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Order results name by name".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Reject "List matches". If results are listed the output gives the name and position of each match and also the separations between matches.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The results will then appear in the form shown in figure 12.1. \par
|
|
\pard\plain \li80\ri80\sl220\keepn\box\brsp40\brdrth \f4\fs16 {{\pict\macpict\picw441\pich182
|
|
14a4ffffffff00b501b81101a0008201000affffffff00b501b8090000000000000000310000000000b201b798002a000000000083014400000000008301440000000000b201b7000102d70020f90002020080fd000a8000008000010204200201fb000620000200401004fe0020f90002020080fd000a8000008000010204
|
|
200201fb000620000200401004fe00220050fa0002020080fd000a8000008000010204200201fb000620000200401004fe00220050fa0002020080fd000a8000008000010204200201fb000620000200401004fe00220070fa0002020080fd000a8000008000010204200201fb000620000200401004fe00220020fa000202
|
|
0080fd000a8000008000010204200201fb000620000200401004fe00220020fa0002020080fd000a8000008000010204200201fb000620000200401004fe0009fd000007deff01c00005d900014000070050da00014000070050da00014000070050da00014000070070da000140000b0070fe000007deff01c00025fb0018
|
|
c340000e04308008944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e04308008944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e04308008944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e043080
|
|
08944000004080c01000400404880128004afe000340481008fd00270050fc0018c340000e04308008944000004080c01000400404880128004afe000340481008fd000b0020fe000007deff01c00026fc00018004fe000a0100400001000a00020120fc0008224412200041000820fe000308010000280070fd00018004fe
|
|
000a0100400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100
|
|
400001000a00020120fc0008224412200041000820fe000308010000280020fd00018004fe000a0100400001000a00020120fc0008224412200041000820fe00010801ff0009fd000007deff01c00028fd0005014041802010fe00fe101900040500180001080080010084001804028000500500000480002a0020fe000501
|
|
4041802010fe00fe101900040500180001080080010084001804028000500500000480002a0050fe0005014041802010fe00fe101900040500180001080080010084001804028000500500000480002a0060fe0005014041802010fe00fe101900040500180001080080010084001804028000500500000480002a0010fe00
|
|
05014041802010fe00fe101900040500180001080080010084001804028000500500000480000b0070fe000007deff01c00026fc0014040010042000890000400310080040004180112058fe0008010401100800008004fd00280060fd0014040010042000890000400310080040004180112058fe00080104011008000080
|
|
04fd00280050fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd00280070fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd00280050fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd0028
|
|
0050fd0014040010042000890000400310080040004180112058fe0008010401100800008004fd0009fd000007deff01c00027fd0004040a000080fc00092a010808100001000090fe000e040100008021048630000050008000290020fe0004040a000080fc00092a010808100001000090fe000e04010000802104863000
|
|
0050008000290050fe0004040a000080fc00092a010808100001000090fe000e040100008021048630000050008000290050fe0004040a000080fc00092a010808100001000090fe000e040100008021048630000050008000290050fe0004040a000080fc00092a010808100001000090fe000e0401000080210486300000
|
|
500080000b0070fe000007deff01c000230020fa00070800801009010408fc000920000090000020200120fe000301000001fd00230060fa00070800801009010408fc000920000090000020200120fe000301000001fd00230050fa00070800801009010408fc000920000090000020200120fe000301000001fd00230070
|
|
fa00070800801009010408fc000920000090000020200120fe000301000001fd00230040fa00070800801009010408fc000920000090000020200120fe000301000001fd00230040fa00070800801009010408fc000920000090000020200120fe000301000001fd0009fd000007deff01c00021fd00080100880004800000
|
|
40fd0002100101fc0005020000101440fa000022fe00230050fe0008010088000480000040fd0002100101fc0005020000101440fa000022fe00230070fe0008010088000480000040fd0002100101fc0005020000101440fa000022fe00230070fe0008010088000480000040fd0002100101fc0005020000101440fa0000
|
|
22fe00230050fe0008010088000480000040fd0002100101fc0005020000101440fa000022fe000b0050fe000007deff01c0001ffd000604000001400108fd000010fc000028f8000020fe000380000080fe00210050fe000604000001400108fd000010fc000028f8000020fe000380000080fe00210070fe000604000001
|
|
400108fd000010fc000028f8000020fe000380000080fe00210050fe000604000001400108fd000010fc000028f8000020fe000380000080fe00210050fe000604000001400108fd000010fc000028f8000020fe000380000080fe000b0050fe000007deff01c00029fd00230220000410c020462080000081000024028812
|
|
06016000a0005000084842100c48208028ff0029fd00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800002b0040fe00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800002b0040fe00250220000410c020462080000081000024
|
|
02881206016000a0005000084842100c4820802800002b0040fe00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800002b0040fe00250220000410c02046208000008100002402881206016000a0005000084842100c4820802800000b0070fe000007deff01c00026fc000008
|
|
fd000c06000200008004010840000001fe000016fd000a5800044c00040000620000280050fd000008fd000c06000200008004010840000001fe000016fd000a5800044c00040000620000280060fd000008fd000c06000200008004010840000001fe000016fd000a5800044c00040000620000280060fd000008fd000c06
|
|
000200008004010840000001fe000016fd000a5800044c00040000620000280060fd000008fd000c06000200008004010840000001fe000016fd000a5800044c000400006200000b0070fe000007deff01c00027fc0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd
|
|
0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd0012540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd00
|
|
12540430210000800802800860b2a20100001808fe0004100a821022fd0005020020900000290020fd0012540430210000800802800860b2a20100001808fe0004100a821022fd000302002090ff0009fd000007deff01c0001bfb00011008fc000040fc000008fd000001f9000001fe000002fd001d0050fc00011008fc00
|
|
0040fc000008fd000001f9000001fe000002fd001d0050fc00011008fc000040fc000008fd000001f9000001fe000002fd001d0070fc00011008fc000040fc000008fd000001f9000001fe000002fd001d0050fc00011008fc000040fc000008fd000001f9000001fe000002fd000b0050fe000007deff01c00027fb002304
|
|
488809088d15210106240210080004400048001502010223060000800082000c500000290020fc002304488809088d15210106240210080004400048001502010223060000800082000c500000290050fc002304488809088d15210106240210080004400048001502010223060000800082000c500000290070fc00230448
|
|
8809088d15210106240210080004400048001502010223060000800082000c500000290050fc002304488809088d15210106240210080004400048001502010223060000800082000c500000290070fc002104488809088d15210106240210080004400048001502010223060000800082000c50ff0009fd000007deff01c0
|
|
0020fc000001fa000020fa0014010000120020004048000003000004010000020000220070fd000001fa000020fa0014010000120020004048000003000004010000020000220040fd000001fa000020fa0014010000120020004048000003000004010000020000220060fd000001fa000020fa0014010000120020004048
|
|
000003000004010000020000220040fd000001fa000020fa00140100001200200040480000030000040100000200000b0040fe000007deff01c00028fc0024a02800404010200008400080000010080240002021880800200000100020010880418000002a0070fd0024a02800404010200008400080000010080240002021
|
|
880800200000100020010880418000002a0040fd0024a02800404010200008400080000010080240002021880800200000100020010880418000002a0060fd0024a02800404010200008400080000010080240002021880800200000100020010880418000002a0040fd0024a0280040401020000840008000001008024000
|
|
2021880800200000100020010880418000000b0070fe000007deff01c00024fa000a820010400004c008201044fd0006a6000400000102fd000662000020040204ff0024fa000a820010400004c008201044fd0006a6000400000102fd0008620000200402040000260060fb000a820010400004c008201044fd0006a60004
|
|
00000102fd0008620000200402040000260050fb000a820010400004c008201044fd0006a6000400000102fd0008620000200402040000260050fb000a820010400004c008201044fd0006a6000400000102fd0008620000200402040000260050fb000a820010400004c008201044fd0006a6000400000102fd0008620000
|
|
2004020400000b0070fe000007deff01c0000dfa000301000002fb000001e9000f0020fb000301000002fb000001e9000f0050fb000301000002fb000001e9000f0040fb000301000002fb000001e9000f0040fb000301000002fb000001e9000b0070fe000007deff01c00028fc0024022004030450001016004001c02806
|
|
369020a0101a404280048180c49001222052000100002a0020fd0024022004030450001016004001c02806369020a0101a404280048180c49001222052000100002a0050fd0024022004030450001016004001c02806369020a0101a404280048180c49001222052000100002a0070fd0024022004030450001016004001c0
|
|
2806369020a0101a404280048180c49001222052000100002a0050fd0024022004030450001016004001c02806369020a0101a404280048180c49001222052000100000b0050fe000007deff01c00002d700a0008c310002000100b5001038a10096000c010000000200000000000000a1009a0008fffd0000000300000100
|
|
0a00050002000f000a2c000c00150948656c76657469636103001504010d00082e0004000001002b030c0159a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a000e00020018000a2a090157a00097a10096000c010000000200000000000000a1009a0008fffd0000000300000100
|
|
0a00150002001f000a2a070156a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a001f00020029000a2a0a0154a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a002700020031000a2a080153a00097a10096000c01000000020000000000
|
|
0000a1009a0008fffe00000003000001000a00300002003a000a2a090152a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a003900020043000a2a090151a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00420002004c000a2a090150a0
|
|
0097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a004a00020054000a2a08014da00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a00530002005d000a2a090148a00097a10096000c010000000200000000000000a1009a0008fffe0000000300
|
|
0001000a005c00020066000a2a09014ca00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00640002006e000a2a08014ba00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a006e00020078000a2a0a0149a00097a10096000c01000000020000
|
|
0000000000a1009a0008fffe00000003000001000a007600020080000a2a080148a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00800002008a000a2a0a0147a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a008800020092000a2a08
|
|
0146a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00900002009a000a2a080145a00097a10096000c010000000200000000000000a1009a0008fffd00000003000001000a0099000200a3000a2a090144a00097a10096000c010000000200000000000000a1009a0008fffe0000
|
|
0003000001000a00a2000200ac000a2a090143a00097a10096000c010000000200000000000000a1009a0008fffe00000003000001000a00aa000200b4000a2a080141a00097a0008da00083ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb40\sa120\sl240\tx1140 \f21\fs20 Figure 12.1\tab Typical graphical output from "Search for exact matches" in which the position of each matching string (here individual amino acid types) is marked.\par
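\pard\plain \s4\qj\sa120\sl280 \f20 The search behind this display amounts to recording every position at which each string occurs. A minimal Python sketch, assuming 1-based positions and using hypothetical function names and a hypothetical example sequence, is given below; it is not the program's own routine.\par

```python
AMINO_ACIDS = "CSTPAGNDEQHRKMILVFYW"  # the twenty standard one letter codes


def find_exact(sequence, string):
    """Return the 1-based start positions of every exact match,
    including overlapping ones."""
    positions = []
    start = sequence.find(string)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(string, start + 1)
    return positions


def search_all(sequence, strings):
    """Search for each string in turn, so results are ordered name by name."""
    return [(s, find_exact(sequence, s)) for s in strings]


seq = "DKFLEDVKKLYHS"  # hypothetical example sequence
for name, positions in search_all(seq, AMINO_ACIDS):
    if positions:
        print(name, positions)
```
\par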
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Searching for percentage matches to sequences\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find percentage matches".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par
|
|
3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par
|
|
4.\tab Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Percent match". The search is performed and the results are presented graphically. The number of matches is displayed, followed by the scores and positions of the top 10 matches.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define the number of matches to "Display". For the number of matches chosen, the program will display the search string and the matching sequence written one above the other, with matching characters indicated by asterisks. The program now cycles round to step 3.\par
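\pard\plain \s4\qj\sa120\sl280 \f20 The percentage match search can be pictured as sliding the search string along the sequence and counting identical characters in each window. The Python sketch below illustrates this under stated assumptions: the function names, the example sequence and the exact layout of the asterisk line are hypothetical, and the program's own scoring details may differ.\par

```python
def percentage_matches(sequence, string, min_percent):
    """Slide the search string along the sequence and return
    (percent, position) for every window that reaches min_percent."""
    width = len(string)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        identical = sum(1 for a, b in zip(string, window) if a == b)
        percent = 100.0 * identical / width
        if percent >= min_percent:
            hits.append((percent, start + 1))
    return hits


def show_match(sequence, string, position):
    """Return three lines: the search string, the matching sequence and
    asterisks under the matching characters."""
    window = sequence[position - 1:position - 1 + len(string)]
    marks = "".join("*" if a == b else " " for a, b in zip(string, window))
    return [string, window, marks]


seq = "DKFLEDVKKLYHS"  # hypothetical example sequence
hits = percentage_matches(seq, "KKLY", 50)
print(len(hits), "matches")
for percent, position in sorted(hits, reverse=True)[:10]:  # top 10 matches
    print("%5.1f %d" % (percent, position))
for line in show_match(seq, "KKLY", 2):
    print(line)
```
\par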
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Searching for sequences using a score matrix\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
A score matrix gives a score for the alignment of each possible pair of sequence symbols, so that similar but non-identical residues can still contribute to a match. This makes the method more sensitive than the simple percentage match search. The default matrix used by this program, MDM78, is shown in figure 12.2.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Find matches using a score matrix".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Accept "Type in strings". The alternative allows the string to be extracted from a named file.\par
|
|
3.\tab Reject "Keep picture". This will cause the graphics window to be cleared. The alternative leaves it unchanged.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
|
|
Define "String". Type in the search string. When the program cycles round to this point again the previous string will be offered as a default. The program displays the minimum and maximum possible scores for the string.\par
|
|
5.\tab Define "Score". The search is performed and the results are presented graphically. The number of matches is displayed, followed by the scores and positions of the top 10 matches.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab
|
|
Define the number of matches to "Display". For the number of matches chosen the program will display the search string and matching sequence written one above the other with matching characters indicated by asterisk symbols. The program now cycles round
|
|
to step 3. An example run is shown in figure 12.3.\par
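\pard\plain \s4\qj\sa120\sl280 \f20 The search performed in steps 4 to 6 can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the program's own code. The function names are hypothetical, the four matrix rows are copied from figure 12.2, and a window is assumed to match when its score reaches the cutoff.\par

```python
# Column order of the MDM78 table in figure 12.2 (the last column is the space symbol).
SYMBOLS = "CSTPAGNDEQBZHRKMILVFYW-X? "

# Rows of figure 12.2 for the four residue types that occur in the query ALPHA.
ROWS = dict(
    A="8 11 11 11 12 11 10 10 10 10 10 10 9 8 9 9 9 8 10 6 7 4 10 10 10 10",
    L="4 7 8 7 8 6 7 6 7 8 6 8 8 7 7 14 12 16 12 12 9 8 10 10 10 10",
    P="7 11 10 16 11 9 9 9 9 10 9 10 10 10 9 8 8 7 9 5 5 4 10 10 10 10",
    H="7 9 9 10 9 8 12 11 11 13 11 12 16 12 10 8 8 8 8 8 10 7 10 10 10 10",
)
MATRIX = dict((sym, [int(v) for v in row.split()]) for sym, row in ROWS.items())


def segment_score(query, segment):
    """Sum the matrix scores for each aligned pair of symbols."""
    return sum(MATRIX[q][SYMBOLS.index(s)] for q, s in zip(query, segment))


def score_bounds(query):
    """Lowest and highest totals that any segment could score against the query."""
    low = sum(min(MATRIX[q]) for q in query)
    high = sum(max(MATRIX[q]) for q in query)
    return (low, high)


def find_matches(query, sequence, cutoff):
    """Return (1-based position, score) for every window scoring at least cutoff."""
    hits = []
    for start in range(len(sequence) - len(query) + 1):
        score = segment_score(query, sequence[start:start + len(query)])
        if score >= cutoff:
            hits.append((start + 1, score))
    return hits
```

With these rows score_bounds("ALPHA") gives (23, 72), the minimum and maximum shown in figure 12.3, and segment_score("ALPHA", "ALPII") gives 61, the score listed there for position 326.\par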
|
|
\pard\plain \li220\ri280\sb200\sl220\box\brsp100\brdrth \f4\fs16 C S T P A G N D E Q B Z H R K M I L V F Y W - X ? \par
|
|
\pard \li220\ri280\sl220\box\brsp100\brdrth C 22 10 8 7 8 7 6 5 5 5 5 5 7 6 5 5 8 4 8 6 10 2 10 10 10 10\par
|
|
S 10 12 11 11 11 11 11 10 10 9 10 10 9 10 10 8 9 7 9 7 7 8 10 10 10 10\par
|
|
T 8 11 13 10 11 10 10 10 10 9 10 10 9 9 10 9 10 8 10 7 7 5 10 10 10 10\par
|
|
P 7 11 10 16 11 9 9 9 9 10 9 10 10 10 9 8 8 7 9 5 5 4 10 10 10 10\par
|
|
A 8 11 11 11 12 11 10 10 10 10 10 10 9 8 9 9 9 8 10 6 7 4 10 10 10 10\par
|
|
G 7 11 10 9 11 15 10 11 10 9 10 10 8 7 8 7 7 6 9 5 5 3 10 10 10 10\par
|
|
N 6 11 10 9 10 10 12 12 11 11 12 11 12 10 11 8 8 7 8 6 8 6 10 10 10 10\par
|
|
D 5 10 10 9 10 11 12 14 13 12 13 12 11 9 10 7 8 6 8 4 6 3 10 10 10 10\par
|
|
E 5 10 10 9 10 10 11 13 14 12 12 13 11 9 10 8 8 7 8 5 6 3 10 10 10 10\par
|
|
Q 5 9 9 10 10 9 11 12 12 14 11 13 13 11 11 9 8 8 8 5 6 5 10 10 10 10\par
|
|
B 5 10 10 9 10 10 12 13 12 11 13 11 11 10 10 8 8 6 8 5 7 4 10 10 10 10\par
|
|
Z 5 10 10 10 10 10 11 12 13 13 11 14 12 10 10 8 8 8 8 5 6 4 10 10 10 10\par
|
|
H 7 9 9 10 9 8 12 11 11 13 11 12 16 12 10 8 8 8 8 8 10 7 10 10 10 10\par
|
|
R 6 10 9 10 8 7 10 9 9 11 10 10 12 16 13 10 8 7 8 6 6 12 10 10 10 10\par
|
|
K 5 10 10 9 9 8 11 10 10 11 10 10 10 13 15 10 8 7 8 5 6 7 10 10 10 10\par
|
|
M 5 8 9 8 9 7 8 7 8 9 8 8 8 10 10 16 12 14 12 10 8 6 10 10 10 10\par
|
|
I 8 9 10 8 9 7 8 8 8 8 8 8 8 8 8 12 15 12 14 11 9 5 10 10 10 10\par
|
|
L 4 7 8 7 8 6 7 6 7 8 6 8 8 7 7 14 12 16 12 12 9 8 10 10 10 10\par
|
|
V 8 9 10 9 10 9 8 8 8 8 8 8 8 8 8 12 14 12 14 9 8 4 10 10 10 10\par
|
|
F 6 7 7 5 6 5 6 4 5 5 5 5 8 6 5 10 11 12 9 19 17 10 10 10 10 10\par
|
|
Y 10 7 7 5 7 5 8 6 6 6 7 6 10 6 6 8 9 9 8 17 20 10 10 10 10 10\par
|
|
W 2 8 5 4 4 3 6 3 3 5 4 4 7 12 7 6 5 8 4 10 10 27 10 10 10 10\par
|
|
- 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
|
|
X 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
|
|
? 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
|
|
\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa300\sl240\tx1140 \f21\fs20 Figure 12.2\tab The amino acid score matrix MDM78.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.4\tab Using weight matrices for searching protein sequences\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
A weight matrix is the most sensitive way of defining a motif. It is a table of values that gives scores for each amino acid type in each position along a motif. For a motif of length 8 amino acids the weight matrix would be a table 8 positions long and, a
|
|
llowing for 26 amino acid symbols, 26 deep. The simplest way of choosing the values for the table is to take an alignment of all known
|
|
examples of the motif and to count the frequency of occurrence of each amino acid type at each position. These frequencies can be used as the table of weights. When the table is used to search a new sequence the program calculates a score for each position
|
|
along the sequence by adding or multiplying (see note 6) the relevant values in the table. All positions that exceed some cutoff score are reported as matching the original set of motifs.\par
|
|
\pard \s4\qj\sa120\sl280 How can we select a suitable cutoff score? The simplest way is to ap
|
|
ply the weight matrix to all the known occurrences of the motif - i.e. the set of sequence segments used to create the table - and to see what scores they achieve. The cutoff can be selected accordingly. For convenience the weight matrix is stored as a fil
|
|
e along with its cutoff score, a title that is displayed when the file is read, and a few other values needed by the program. A routine for creating weight matrix files from sets of aligned sequences is included in the program. When a search using the weight
|
|
matrix is performed the program will either list the matching sequence segments or plot their positions as for the other motif search methods.\par
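As a rough illustration of the scheme just described, the Python sketch below builds a table of counts from a set of aligned sequences and scores windows of a new sequence either by adding the weights or by summing their logs. The function names, the use of raw counts as weights, and the floor value substituted for residue types never seen at a position are all assumptions made for the example; the text does not say how the program treats such residues.\par

```python
import math


def make_weight_matrix(aligned):
    """Count the occurrences of each residue type at each motif position."""
    length = max(len(seq) for seq in aligned)
    counts = [dict() for _ in range(length)]
    for seq in aligned:
        for position, residue in enumerate(seq):
            counts[position][residue] = counts[position].get(residue, 0) + 1
    return counts


def segment_score(counts, segment, use_logs, floor=0.5):
    """Score one segment by adding the weights or by summing their logs."""
    total = 0.0
    for column, residue in zip(counts, segment):
        weight = column.get(residue, 0)
        if use_logs:
            # floor is an assumed stand-in for residues never seen at this position
            total += math.log(max(weight, floor) / sum(column.values()))
        else:
            total += weight
    return total


def search(counts, sequence, cutoff, use_logs=True):
    """Return (1-based position, score) for each window scoring at least cutoff."""
    width = len(counts)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = segment_score(counts, sequence[start:start + width], use_logs)
        if score >= cutoff:
            hits.append((start + 1, score))
    return hits
```

A table built from the three aligned segments GKS, GKT and AKS gives the segment GKS an additive score of 7 (2 + 3 + 2). When logs are summed every score is zero or negative, which is why the scores in a log based search are negative numbers.\par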
|
|
\pard\plain \li2000\ri2260\sb200\sl220\box\brsp100\brdrth \f4\fs16 Find matches using a score matrix\par
|
|
\pard \li2000\ri2260\sl220\box\brsp100\brdrth ? Keep picture (y/n) (y) =\par
|
|
? String=ALPHA\par
|
|
Minimum score= 23 Maximum score= 72\par
|
|
? Score (23-72) (72) =60\par
|
|
\par
|
|
For score 60 the number of matches= 5\par
|
|
Scores 62 62 62 61 61\par
|
|
Positions 120 217 420 54 326\par
|
|
? Display (0-5) (0) =\par
|
|
\par
|
|
120\par
|
|
PLDHD\par
|
|
* *\par
|
|
ALPHA\par
|
|
1\par
|
|
\par
|
|
217\par
|
|
ALANT\par
|
|
**\par
|
|
ALPHA\par
|
|
1\par
|
|
\par
|
|
420\par
|
|
QLDHG\par
|
|
* *\par
|
|
ALPHA\par
|
|
1\par
|
|
\par
|
|
54\par
|
|
SLPGN\par
|
|
**\par
|
|
ALPHA\par
|
|
1\par
|
|
\par
|
|
326\par
|
|
ALPII\par
|
|
***\par
|
|
ALPHA\par
|
|
1\par
|
|
? Keep picture (y/n) (y) =\par
|
|
Default String=ALPHA\par
|
|
\pard \li2000\ri2260\sl220\keepn\box\brsp100\brdrth ? String=!\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa420\sl240\tx1140 \f21\fs20 Figure 12.3\tab An example of the listed output from "Search using a score matrix".\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.4.1\tab Creating a weight matrix file from a set of aligned sequences\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par
|
|
2.\tab Select "Make weight matrix".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
Define "Name of aligned sequences file". We assume the file of aligned sequences has already been created (see note 5). The program reads and displays the contents of the file numbering each sequence as it goes. Then it displays the length of the longes
|
|
t sequence.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Accept "Sum logs of weights". The alternative is to sum the weights when calculating scores (see note 6). \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Accept "Use all motif positions". The alternative allows the user to define a "mask" which i
|
|
dentifies positions within the motif that should be ignored when the matrix is created (see note 7). The program now calculates the weights and applies them in turn to each of the sequences in the file. The number and score for each sequence is displayed,
|
|
followed by the top, bottom and mean scores and the standard deviation. In addition the mean plus and minus 3 standard deviations is displayed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Cutoff score". The default is the mean minus 3 standard deviations, but users may, for example, decide to use the lowest score obtained by the sequences in the file.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Define "Top score for scaling plots". This parameter is used by the graphics output routine when scaling the plots. Its value will influence the height of lines plotted to represent matches.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab
|
|
Define "Position to identify". When a search is performed it is not always appropriate to report the position of a match relative to the leftmost amino acid in the motif. For example when performing a helix-turn-helix motif search we may want to know
|
|
the position of the well conserved glycine rather than the position of the first amino acid in the matrix. The "Position to identify" allows the user to define which amino acid is marked. The amino acids in the table are numbered 1, 2, 3 and so on.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define a "Title". This is a title that will be displayed when the matrix file is read prior to performing a search. It is limited to 60 characters.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Name for new weight matrix file". Give a name for the weight matrix file.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 See the example run in figure 12.4.\par
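The summary figures offered when choosing a cutoff can be recomputed from the individual scores. The Python sketch below does this for the 22 scores listed in figure 12.4. The function name is hypothetical, and the use of the population standard deviation is an assumption, chosen because it appears to agree with the figure.\par

```python
import statistics

# Scores of the 22 aligned sequences as listed in figure 12.4.
SCORES = [
    -36.651, -35.780, -38.180, -35.403, -39.039, -40.653, -34.017, -37.454,
    -36.474, -43.431, -40.210, -40.720, -45.143, -40.684, -45.197, -39.098,
    -43.832, -44.817, -36.305, -35.101, -36.305, -36.711,
]


def cutoff_summary(scores):
    """Return the values displayed before the cutoff score is requested."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population form: an assumption
    return dict(
        top=max(scores),
        bottom=min(scores),
        mean=mean,
        sd=sd,
        mean_minus_3sd=mean - 3 * sd,
        mean_plus_3sd=mean + 3 * sd,
    )


summary = cutoff_summary(SCORES)
print(round(summary["mean"], 3), round(summary["sd"], 3))
print(round(summary["mean_minus_3sd"], 2), round(summary["mean_plus_3sd"], 2))
```

The default cutoff is the mean minus 3 standard deviations, so every sequence used to build the matrix scores above it. Choosing the bottom score instead gives a stricter search that still finds all the sequences in the file.\par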
|
|
\pard\plain \li1240\ri1180\sb300\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Motif search using weight matrix\par
|
|
\pard \li1240\ri1180\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select operation\par
|
|
X 1 Use weight matrix\par
|
|
2 Make weight matrix\par
|
|
3 Rescale weight matrix\par
|
|
? Selection (1-3) (1) =2\par
|
|
? Name of aligned sequences file=atpbinding.seq\par
|
|
1 GETLGIVGESGSG\par
|
|
2 GESLGVVGESGGGKSTFAR OppF\par
|
|
3 GDVISIDGSSGSGKSTFLR HisP\par
|
|
4 GEFVVFVGPSGGGKSTLLR MalK E. coli\par
|
|
5 NQVTAFIGPSGGGKSTLLR PstB\par
|
|
6 GRVMALVGENGAGKSTMMK RbsA(N)\par
|
|
7 GEVIGIVGRSGSGKSTLTK HlyB\par
|
|
8 GECFGLLGPNGAGKSTITR NodI R. leguminosarum\par
|
|
9 GEMAFLTGHSGAGKSTLLK FtsE E. coli\par
|
|
10 GQRELIIGDRQTGKTALAI ATPase\par
|
|
11 GGKVGLFGGAGVGKTVNMM ATPase\par
|
|
12 GRIVEIYGPESSGKTTLTL RecA\par
|
|
13 RSNLLVLAGAGSGKTRVLV UvrD\par
|
|
14 GGKIGLFGGAGVGKTVGIM ATPase Bovine\par
|
|
15 SKIIFVVGGPGSGKGTQCE Adenylate Kinase Rabbit\par
|
|
16 NQSILITGESGAGKTVNTK Myosin Rabbit\par
|
|
17 HVNVGTIGHVDHGKTTLTA EF-Tu E. coli\par
|
|
18 YRNIGISAHIDAGKTTERI EF-G E. coli\par
|
|
19 EYKLVVVGARGVGKSALTI v-ras (HARVEY)\par
|
|
20 EYKLVVVGASGVGKSALTI v-ras (KIRSTEN)\par
|
|
21 EYKLVVVGAVGVGKSALTI pEJ BLADDER CARCINOMA TRANSFORMING\par
|
|
22 EYKLVVVGAGGVGKSALTI pEJ BLADDER CARCINOMA CELLULAR\par
|
|
Length of motif 19\par
|
|
? Sum logs of weights (y/n) (y) =\par
|
|
? Use all motif positions (y/n) (y) =\par
|
|
Applying weights to input sequences\par
|
|
1 -36.651 GETLGIVGESGSGKSQSLR\par
|
|
2 -35.780 GESLGVVGESGGGKSTFAR\par
|
|
3 -38.180 GDVISIDGSSGSGKSTFLR\par
|
|
4 -35.403 GEFVVFVGPSGGGKSTLLR\par
|
|
5 -39.039 NQVTAFIGPSGGGKSTLLR\par
|
|
6 -40.653 GRVMALVGENGAGKSTMMK\par
|
|
7 -34.017 GEVIGIVGRSGSGKSTLTK\par
|
|
8 -37.454 GECFGLLGPNGAGKSTITR\par
|
|
9 -36.474 GEMAFLTGHSGAGKSTLLK\par
|
|
10 -43.431 GQRELIIGDRQTGKTALAI\par
|
|
11 -40.210 GGKVGLFGGAGVGKTVNMM\par
|
|
12 -40.720 GRIVEIYGPESSGKTTLTL\par
|
|
13 -45.143 RSNLLVLAGAGSGKTRVLV\par
|
|
14 -40.684 GGKIGLFGGAGVGKTVGIM\par
|
|
15 -45.197 SKIIFVVGGPGSGKGTQCE\par
|
|
16 -39.098 NQSILITGESGAGKTVNTK\par
|
|
17 -43.832 HVNVGTIGHVDHGKTTLTA\par
|
|
18 -44.817 YRNIGISAHIDAGKTTERI\par
|
|
19 -36.305 EYKLVVVGARGVGKSALTI\par
|
|
20 -35.101 EYKLVVVGASGVGKSALTI\par
|
|
21 -36.305 EYKLVVVGAVGVGKSALTI\par
|
|
22 -36.711 EYKLVVVGAGGVGKSALTI\par
|
|
Top score -34.017 Bottom score -45.197\par
|
|
Mean -39.146 Standard deviation 3.441\par
|
|
Mean minus 3.sd -49.470 Mean plus 3.sd -28.822\par
|
|
? Cutoff score (-999.00-9999.00) (-49.47) =\par
|
|
? Top score for scaling plots (-49.47-999.00) (-28.82) =\par
|
|
? Position to identify (0-19) (1) =13\par
|
|
? Title=ATP binding motif\par
|
|
\pard \li1240\ri1180\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth ? Name for new weight matrix file=atpbinding.wts\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa320\sl240\tx1140 \f21\fs20 Figure 12.4\tab An example run of the creation of a weight matrix from a set of aligned sequences.\par
|
|
\pard\plain \s9\fi-560\li860\sa60\sl280\tx1140 \b\f20 2.4.2\tab Searching using a weight matrix\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Once a weight matrix has been stored in a file it can be used to search any sequence. Results can be displayed graphically or the matching sequence segments can be listed out with their scores.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Motif search using weight matrix".\par
|
|
2.\tab Select "Use weight matrix".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Define "Motif weight matrix file". The name of the file containing the weight matrix. The program reads the file and displays its title.\par
|
|
4.\tab Accept "Use frequencies as weights". The alternative will use the weight matrix file as a definition of a "Membership of set" motif (see note 10).\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab
|
|
Define "Cutoff score". The default will be the value set when the weight matrix file was created. If the score is negative the program will calculate sums of logs of frequencies, otherwise it will add frequencies.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Plot results". Alternatively they will be listed.\par
|
|
The results will appear.\par
|
|
\pard\plain \s5\sa60\sl320\tx560 \b\f20\fs28 \page 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab The files containing the definitions of peptides that can be searched for by the exact match search routine have the following format. Each name is followed by a /, th
|
|
en each of its peptide sequences is followed by a /. The last peptide sequence for each name is followed by //. For example a file might contain the following.\par
|
|
\pard \s7\qj\li1720\sb200\sa120\sl280\tx1880 Acidic/D/E//\par
|
|
\pard \s7\qj\li1720\sa120\sl280\tx1880 Basic/R/K/H//\par
|
|
Glyco/N-S/N-T//\par
|
|
\pard \s7\qj\fi-560\li560\sb200\sa120\sl280\tx560 \tab Users could then search for these named sets of sequences. Note that the symbol - matches any amino acid.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab To search for a subset of the names in a file employed by the exact match routine the user should reject "Search for all names" and the program will ask for the names wanted and extract their sequences
|
|
from the file. Alternatively, if a user was always using the same subset, then a file containing only those names could be created. This file would then be selected as "Personal file" for "Input source".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
The exact match routine also allows names and their sequences to be entered on the keyboard. This is selected as "Keyboard" for "Input source", and the program will prompt for names and their sequences. In this way the routine can be used to search for
|
|
exact matches to any short sequence. \par
|
|
4.\tab For this pr
|
|
ogram a motif is a short segment of sequence of fixed length. More complex structures termed "patterns", which we define as sets of motifs separated by varying gaps, are covered in another chapter. The current chapter should be read before the chapter on pa
|
|
tterns. \par
|
|
5.\tab The files of aligned sequences used to make weight matrices have the following format. Each sequence should be on a separate line. The sequence should start in column 2 and is terminated by a new line or a space. Anything after the space is tr
|
|
eated as a comment. The files can be created by previous searches or using an editor.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab
|
|
The frequencies in the weight matrix can be used in two ways to calculate scores for sequences. Some users prefer to add the frequencies to give a total score, and others to multiply them by summing their logs. If we regard the frequencies as probabilit
|
|
ies then multiplication seems the correct procedure. The user chooses which method will be used when the weight matrix is created, however the choice can be overridden wh
|
|
en the matrix is used. If multiplication is selected then all results will be presented as sums of logs.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab
|
|
Masking the weight matrix is particularly useful in cases where a limited number of examples of a motif are available, or when the motif may have several components. In the first case the limited number of examples may make the matrix unrepresentative o
|
|
f the motif because the amino acids in the unconserved positions may bias the results of searches. We stated that a motif might have several components\:
|
|
for example it might have both structural and specificity components. We may want to separate out the two parts and again masking provides such a facility.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 8.\tab
|
|
The weight matrix handling routine contains a further option "Rescale weight matrix". If the user has edited a weight matrix to change the frequency values this provides a way of selecting a new cutoff score. It allows users to read in a set of aligned
|
|
sequences and a weight matrix and to apply the matrix to the set of sequences to see the range of scores achieved. A new weight matrix file containing the selected cutoff score is written to disk.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab
|
|
The program contains no hardwired motifs as we expect most sites that use the programs to accumulate their own libraries of motifs and patterns, and to use the PROSITE library, both of which users can employ by simply knowing the names of the correspond
|
|
ing files.\par
|
|
10.\tab The weight matrix search can also be used as a "Membership of a set" search. This means that at each position in the motif, any amino acid type tha
|
|
t is non-zero in the weight matrix is counted as a match and scores a value 1. See the chapter on searching protein sequences for patterns.\par
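\pard\plain \s4\qj\sa120\sl280 \f20 The file format of note 1 is simple enough to read in a few lines. The Python sketch below is an illustration only, not the program's parser. The function names are hypothetical, and it assumes that names and peptides contain no / characters and that white space between entries can be ignored.\par

```python
def parse_definitions(text):
    """Map each name to its list of peptide sequences."""
    table = dict()
    for entry in text.split("//"):
        fields = [field.strip() for field in entry.split("/") if field.strip()]
        if fields:
            table[fields[0]] = fields[1:]
    return table


def matches_at(peptide, sequence, start):
    """True if peptide matches sequence at 0-based start; - matches any residue."""
    window = sequence[start:start + len(peptide)]
    if len(window) != len(peptide):
        return False
    return all(p == "-" or p == s for p, s in zip(peptide, window))


def search(table, sequence):
    """Return (name, peptide, 1-based position) for every exact match."""
    hits = []
    for name, peptides in table.items():
        for peptide in peptides:
            for start in range(len(sequence) - len(peptide) + 1):
                if matches_at(peptide, sequence, start):
                    hits.append((name, peptide, start + 1))
    return hits
```

Reading the three example lines of note 1 gives the names Acidic, Basic and Glyco. Searching the sequence ANGSA for Glyco reports N-S at position 2, because the - symbol matches the G.\par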
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4(1)}\:53-60.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 13. Using Patterns to Analyse Protein Sequences\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 1.1\tab Introduction to the PROSITE motif library\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Creating a pattern file containing a weight matrix motif and a membership of a set motif.\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.2\tab Searching a sequence using a pattern file\par
|
|
2.3\tab Comparing a sequence against a library of patterns including PROSITE\par
|
|
2.4\tab Searching libraries for patterns\par
|
|
2.5\tab Preparing the PROSITE motif library for use by the programs\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
4.\tab References\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Here we describe one of the most powerful facilities provided by the program PIP\: the ability to d
|
|
efine and search sequences or libraries of sequences for complex patterns of motifs. In another chapter we give details of searching for individual motifs but here we show how to create individual patterns and libraries of patterns and to use them to searc
|
|
h sequences. Once a pattern has been defined and stored in a file it can be used to search any sequence. In addition if users want to routinely screen sequences against libraries of patterns this can be achieved by use of files of file names. For example, the
|
|
program can use the PROSITE protein motif library. The program can produce several alternative forms of output. It will display the segment of sequence matching each individual motif in the pattern, display all the sequence between and including the two o
|
|
utermost motifs, produce a description of the match in the form of a SWISSPROT feature table, or draw a simple graphical plot.\par
|
|
\pard \s4\qj\sa120\sl280 Towards the end of the chapter we describe how a related program PIPL is used to search libraries of sequences to find patterns. This program can produce alignments of sequence families.\par
|
|
\pard \s4\qj\sa120\sl280
|
|
Patterns are defined as sets of motifs with variable spacing. Each motif in a pattern can be defined using any of several methods, and their positions relative to one another are defined in terms of minimum and maximum separations. In addition, by the use of
|
|
logical operators, each motif can be declared to be essential (the AND operator), optional (the OR operator), or forbidden (the NOT operator). The following methods (termed "classes" by the program) for defining motifs are provided\:
|
|
1) exact match to a short sequence; 2) percentage match to a short sequence; 3) match to a short sequence using a score matrix and cutoff score; 4) match to a weight matrix; 5) direct repeat; 6) membership of a set. \par
|
|
\pard \s4\qj\sa120\sl280
|
|
The motifs in a pattern are numbered sequentially and motif spacing is defined in the following way. When a new motif is added to a pattern the user specifies the "Reference motif" by its number and then a "Relative start position". The "Relative start pos
|
|
iti
|
|
on" is defined by taking the first base of the "Reference motif" as position 1, the next as 2, and so on. Then the user defines the allowed variation in the spacing by specifying the "Number of extra positions". Notice that the position of a motif can be d
|
|
efined relative to any other motif, and that a negative "Relative start position" declares the motif to be to the left of its "Reference motif".\par
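The spacing rule can be written out as a small Python sketch. The function names are hypothetical, positions are 1-based sequence positions, and the simple linear arithmetic, including its treatment of relative positions of zero or less, is an assumption rather than a statement of how the program counts.\par

```python
def allowed_starts(reference_start, relative_start, extra_positions):
    """Sequence positions at which the new motif may start.

    reference_start is the sequence position of the first residue of the
    reference motif, which counts as relative position 1.
    """
    first = reference_start + relative_start - 1
    return range(first, first + extra_positions + 1)


def spacing_ok(reference_start, motif_start, relative_start, extra_positions):
    """True if a motif found at motif_start satisfies the spacing rule."""
    return motif_start in allowed_starts(reference_start, relative_start, extra_positions)
```

For example, with a reference motif starting at position 100, a "Relative start position" of 22 and 5 extra positions, the new motif may start anywhere from 121 to 126.\par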
|
|
\pard \s4\qj\sa120\sl280 The probability of finding each individual motif in the current sequence, the product of the probabilities for
|
|
all the motifs in a pattern "Probability of finding pattern", and the "Expected number of matches" is calculated and displayed by the program. In addition to the cutoffs used for the individual motifs, users can apply two pattern cutoffs\:
|
|
"Maximum pattern probability" and "Minimum pattern score".\par
|
|
\pard \s4\qj\sa120\sl280 Below we describe\: how to create a pattern; how to use a pattern file to search a sequence; how to use a "File of pattern file names" to search a sequence for a whole library of patterns; how to use a pattern file
|
|
to search a whole library of sequences; how to reformat the PROSITE motif library into a form compatible with these search programs. To describe how to create a pattern file we first show all the steps to make one containing two motifs, and then, to save
|
|
space, the parts specific to the individual motif types are sketched in the notes section.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 1.1\tab Introduction to the PROSITE motif library\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 A library of protein motifs (in our terminology, because they include variable gaps, many would be called patterns) has
|
|
recently become available from Amos Bairoch, Departement de Biochimie Medicale, University of Geneva. Currently it contains over 500 patterns/motifs and arrives on tape or CD-ROM in two files\:
|
|
a .DAT file and a .DOC file. There is also a user documentation file PROSITE.USR. Here we outline the library structure and what is required to prepare the PROSITE library for use by our programs. A typical entry in the .DAT file is shown in figure 13.1.
|
|
\par
|
|
\pard \s4\qj\sa120\sl280 Each entry has an accession number (in figure 13.1 PS00197), a pattern definition (in figure 13.1 C-x(1,2)-[STA]-x(2)-C-[STA]-\{P\}-C) and a documentation file cross reference (in figure 13.1 PDOC00175). This pattern means\:
|
|
C, gap of 1 or 2, any of STA, gap of 2, C, any of STA, not P, C.\par
|
|
\pard \s4\qj\sa120\sl280
|
|
We need to convert all of these patterns into our pattern definitions (as membership of a set, with the appropriate gap ranges) and write each into a separate pattern file with corresponding "membership of a set" weight matrices. After the conversion each
|
|
pattern file is named accession_number.pat (here PS00197.PAT). The corresponding matrix files are accession_number.wtsa, accession_number.wtsb, etc for however many are needed (here PS00197.WTSA and PS00197.WTSB)\:
|
|
two are needed because of the variable gap.\par
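The conversion can be sketched in Python as follows. This is not the SPLITP3 code. The function names and the tuple layout are hypothetical, and only the syntax elements that appear in the PA line of figure 13.1 are handled. An element that is neither a gap, a bracketed set nor a single letter is taken to be the curly bracket form listing forbidden residues.\par

```python
def parse_prosite(pattern):
    """Translate a PA string into (kind, residues, min_length, max_length) tuples."""
    elements = []
    for item in pattern.rstrip(".").split("-"):
        if item.startswith("x"):
            # x, x(2) or x(1,2) is a gap of any residues
            bounds = item[1:].strip("()")
            numbers = [int(n) for n in bounds.split(",")] if bounds else [1]
            elements.append(("gap", "", numbers[0], numbers[-1]))
        elif item.startswith("["):
            elements.append(("any_of", item[1:-1], 1, 1))
        elif item[0].isalpha():
            elements.append(("any_of", item, 1, 1))
        else:
            # remaining form: residues inside curly brackets are forbidden
            elements.append(("none_of", item[1:-1], 1, 1))
    return elements


def split_at_variable_gaps(elements):
    """Group elements into fixed-spacing blocks, one per weight matrix file."""
    blocks = []
    current = []
    for element in elements:
        kind, residues, low, high = element
        if kind == "gap" and low != high:
            blocks.append(current)
            current = []
        else:
            current.append(element)
    blocks.append(current)
    return blocks
```

Applied to the PA line of figure 13.1 this gives eight elements and two blocks: the leading C, and everything after the gap of 1 or 2. That agrees with the two weight matrix files PS00197.WTSA and PS00197.WTSB.\par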
|
|
|
|
In addition we can optionally split the .DAT and .DOC files into separate files, one for each entry, with names accession_number.dat and accession_number.doc. Also we create an index for the library which gives a one line description of each pattern, and en
|
|
ds with the pattern file and do
|
|
cumentation file numbers. The start of the file is shown in figure 13.2. So, referring to figure 13.2, the name of the pattern file for Glycosaminoglycan attachment site is PS00002.PAT, and for the documentation file PDOC00002.DOC.\par
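The naming scheme is easy to apply mechanically. The Python sketch below derives the file names from one line of the index. The function names are hypothetical, and it assumes that the two numbers are the last item on the line and are separated by a comma, as in figure 13.2.\par

```python
import string


def files_for_index_line(line):
    """Return (description, pattern file, documentation file) for one index line."""
    description, numbers = line.rsplit(None, 1)
    pattern_number, doc_number = numbers.split(",")
    return (description, "PS" + pattern_number + ".PAT", "PDOC" + doc_number + ".DOC")


def weight_matrix_files(accession, count):
    """Name the weight matrix files .WTSA, .WTSB and so on for one pattern."""
    return [accession + ".WTS" + string.ascii_uppercase[i] for i in range(count)]
```

The second line of figure 13.2 yields PS00002.PAT and PDOC00002.DOC, and weight_matrix_files("PS00197", 2) yields PS00197.WTSA and PS00197.WTSB.\par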
|
|
\pard \s4\qj\sa120\sl280
|
|
Finally we create a file of file names for all the patterns in the library. If this file of file names is PROSITE.NAM then to use the complete PROSITE library from program PIP, users select "pattern searcher" and choose the option "use file of pattern file
|
|
names", and give the file name PROSITE.NAM. For any matches found, the accession number and pattern title will be displayed.\par
|
|
\pard\plain \li360\ri440\sl220\pagebb\box\brsp40\brdrth \f4\fs16 ID 2FE2S_FERREDOXIN; PATTERN.\par
|
|
\pard \li360\ri440\sl220\box\brsp40\brdrth AC PS00197;\par
|
|
DT APR-1990 (CREATED); APR-1990 (DATA UPDATE); APR-1990 (INFO UPDATE).\par
|
|
DE 2Fe-2S ferredoxins, iron-sulfur binding region signature.\par
|
|
PA C-x(1,2)-[STA]-x(2)-C-[STA]-\{P\}-C.\par
|
|
NR /RELEASE=14,15409;\par
|
|
NR /TOTAL=69(69); /POSITIVE=63(63); /UNKNOWN=0(0); /FALSE_POS=6(6);\par
|
|
NR /FALSE_NEG=5(5);\par
|
|
CC /TAXO-RANGE=A?EP?; /MAX-REPEAT=1;\par
|
|
CC /SITE=1,iron_sulfur; /SITE=5,iron_sulfur; /SITE=8,iron_sulfur;\par
|
|
DR P15788, FER$APHHA , T; P00250, FER$APHSA , T; P00223, FER$ARCLA , T;\par
|
|
DR P00227, FER$BRANA , T; P07838, FER$BRYMA , T; P13106, FER$BUMFI , T;\par
|
|
DR P00247, FER$CHLFR , T; P07839, FER$CHLRE , T; P00222, FER$COLES , T;\par
|
|
DO PDOC00175;\par
|
|
\pard \li360\ri440\sl220\keepn\box\brsp40\brdrth //\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 13.1\tab A typical entry from the PROSITE library\par
|
|
\pard\plain \li440\ri480\sb300\sl220\box\brsp100\brdrth \f4\fs16 N-glycosylation site.                                      00001,00001\par
|
|
\pard \li440\ri480\sl220\box\brsp100\brdrth Glycosaminoglycan attachment site. 00002,00002\par
|
|
Tyrosine sulfatation site. 00003,00003\par
|
|
\pard \li440\ri480\sl220\keepn\box\brsp100\brdrth cAMP- and cGMP-dependent protein kinase phosphorylation site. 00004,00004\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 13.2\tab The start of the index created by the conversion program\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
In order to make the PROSITE library useable by the search programs it is only necessary to run a program named SPLITP3. Two other programs, SPLITP1 and SPLITP2, only make the original files marginally easier to manage and produce an index. SPLITP1 split
|
|
s the PROSITE.DAT file to create a separate file for each entry. Each file is automatically named PSentry_number.DAT. In addition it creates an index for the library (see above).\par
|
|
\pard \s4\qj\sa120\sl280 SPLITP2 performs the same operation for the PROSITE.DOC file, except that no index is created. Files are named PSentry_number.DOC.\par
|
|
\pard \s4\qj\sa120\sl280
|
|
SPLITP3 creates a separate pattern file and weight matrix files for each PROSITE entry from the file PROSITE.DAT. Pattern files are named PSentry_number.PAT, weight matrix files PSentry_number.WTSA, PSentry_number.WTSB, etc. The pattern title is the one li
|
|
ne description of the motif. SPLITP3 also creates a file of file names. Notice that it will ask for a path name so that the path can be included in the file of file names. This is the path to the directory in which the pattern files are stored.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\fi-560\li560\sb240\sa60\sl280\tx560\tx920 \b\f20 2.1\tab Creating a pattern file containing a weight matrix motif and a membership of a set motif.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par
|
|
2.\tab Select "Pattern definition mode" as "Use keyboard".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Select "Results display mode" as "Inclusive". The alternatives are listed in the introduction.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Motif definition mode" as "Weight matrix"\par
|
|
5.\tab Define "Motif name". Each motif can be given an 8 character name.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Weight matrix file name". Type in the name of the file containing the weight matrix. The program will display the probability of finding the motif.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 7.\tab Select "Motif definition mode" as "Membership of a set".\par
|
|
8.\tab Define "Motif name".\par
|
|
9.\tab Select "Logical operator" as "AND". The alternatives are "OR" and "NOT".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Select "Number of reference motif". At this stage the only choice is 1 and this is the default.\par
|
|
11.\tab Define "Relative start position". The base position relative to the "Reference motif". See the introduction.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 12.\tab Define "Number of extra positions".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 13.\tab Select input mode as "Keyboard". The alternative is an existing file in the form of a weight matrix.\par
|
|
14.\tab Define "String". Type in the sets of allowed residue types using the one letter code. See note 1.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 15.\tab Define the "Minimum matches". This is the number of positions within the motif that must match. The default is that
|
|
all positions must match but users may want to allow some flexibility by giving a lower score.\par
|
|
\tab The program now cycles round to step 7 and all subsequent passes round the loop to add further motifs to the pattern would differ only in the details for the different motif "classes".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab Select "Pattern complete"\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 17.\tab Accept "Save pattern in a file". The alternative does not save the pattern and so it can only be used once on the current sequence.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 18.\tab Define "Pattern definition file". Give a name for the new file.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 19.\tab Define "Pattern title". All patterns can have a 60 character title that can be displayed when the pattern file is read and the sequence searched.\par
|
|
20.\tab Define "Weight matrix file name". The membership of a set motifs are stored in the form of weight matrices, and so the program needs the user to define a file name.\par
|
|
21.\tab Define "Title". Type in a title for the weight-matrix-like file. The title will be displayed when the file is read.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab The program will now display a detailed textual description of the pattern, the "Probability of finding the pattern" and the "Expected number of matches" (see figure 13.3).\par
|
|
22.\tab Define "Maximum pattern probability". Yes, maximum\: any match with a greater probability of being found will be rejected. If no value is specified the search will be quicker (see notes).\par
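\pard\plain \s4\qj\sa120\sl280 \f20 The "Membership of a set" motif defined in steps 13 to 15 can be illustrated with a short Python sketch. The function names are hypothetical. An empty set between two commas is assumed to mark a position that places no constraint on the residue and adds nothing to the score, which is consistent with the default of 3 minimum matches offered for the five position string in figure 13.3.\par

```python
def parse_sets(string):
    """Split a comma-separated definition into one set of residues per position."""
    return [frozenset(part.strip().upper()) for part in string.split(",")]


def membership_score(sets, segment):
    """Count the positions whose residue belongs to the set for that position."""
    return sum(1 for allowed, residue in zip(sets, segment) if residue in allowed)


def find_set_matches(sets, sequence, minimum_matches):
    """Return (1-based position, score) for windows with enough matching positions."""
    width = len(sets)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = membership_score(sets, sequence[start:start + width])
        if score >= minimum_matches:
            hits.append((start + 1, score))
    return hits
```

With the string ivl,ivl,,,rkhde the segment IVAAK scores 3, one for each of the three defined positions, while IVAAQ scores 2 and would be rejected at the default "Minimum matches" of 3.\par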
|
|
\pard\plain \li1240\ri1360\sl220\pagebb\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 Pattern searcher\par
|
|
\pard \li1240\ri1360\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Select pattern definition mode\par
|
|
X 1 Use keyboard\par
|
|
2 Use pattern file\par
|
|
3 Use file of pattern file names\par
|
|
? Selection (1-3) (1) =1\par
|
|
Select results display mode\par
|
|
X 1 Motif by motif\par
|
|
2 Inclusive\par
|
|
3 Graphical\par
|
|
4 SWISSPROT feature table\par
|
|
? Selection (1-4) (1) =2\par
|
|
Select motif definition mode\par
|
|
X 1 Exact match\par
|
|
2 Percentage match\par
|
|
3 Cut-off score and score matrix\par
|
|
4 Cut-off score and weight matrix\par
|
|
5 Direct repeat\par
|
|
6 Membership of set\par
|
|
7 Pattern complete\par
|
|
? Selection (1-7) (1) =4\par
|
|
? Motif name=atp\par
|
|
? Weight matrix file name=atpbinding.wts\par
|
|
ATP binding\par
|
|
Probability of score -47.8010 = 0.302E-04\par
|
|
Select motif definition mode\par
|
|
1 Exact match\par
|
|
2 Percentage match\par
|
|
3 Cut-off score and score matrix\par
|
|
X 4 Cut-off score and weight matrix\par
|
|
5 Direct repeat\par
|
|
6 Membership of set\par
|
|
7 Pattern complete\par
|
|
? Selection (1-7) (4) =6\par
|
|
? Motif name=hydro\par
|
|
Select logical operator\par
|
|
X 1 And\par
|
|
2 Or\par
|
|
3 Not\par
|
|
? Selection (1-3) (1) =\par
|
|
? Number of reference motif (1-1) (1) =\par
|
|
? Relative start position (-1000-1000) (20) =22\par
|
|
? Number of extra positions (0-1000) (0) =5\par
|
|
Select input mode\par
|
|
X 1 Keyboard\par
|
|
2 File\par
|
|
? Selection (1-2) (1) =\par
|
|
Separate sets with commas\par
|
|
? String=ivl,ivl,,,rkhde\par
|
|
? Minimum matches (1.00-5.00) (3.00) =\par
|
|
Probability of score 3.000 = 0.145E-01\par
|
|
Select motif definition mode\par
|
|
1 Exact match\par
|
|
2 Percentage match\par
|
|
3 Cut-off score and score matrix\par
|
|
4 Cut-off score and weight matrix\par
|
|
5 Direct repeat\par
|
|
X 6 Membership of set\par
|
|
7 Pattern complete\par
|
|
? Selection (1-7) (6) =7\par
|
|
? Save pattern in a file (y/n) (y) =\par
|
|
? Pattern definition file=_paper.pat\par
|
|
? Pattern title=atpbinding plus\par
|
|
? Weight matrix file name=_hydro.wts\par
|
|
Weight matrix needs a title\par
|
|
? Title=hydrophobic and + spot\par
|
|
Pattern description\par
|
|
atpbinding plus\par
|
|
Motif 1 named atp is of class 4\par
|
|
Which is a match to a weight matrix with score -47.801\par
|
|
Motif 2 named hydro is of class 6\par
|
|
Which is membership of a set with score 3.000\par
|
|
It is anded with the previous motif.\par
|
|
Probability of finding pattern = 0.4368E-06\par
|
|
Expected number of matches = 0.1350E-02\par
|
|
? Maximum pattern probability (0.00-1.00) (1.00) =\par
|
|
? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
|
|
{\f22\fs18 162\par
|
|
} GQRELIIGDRQTGKTALAIDAIINQR\par
|
|
Total matches found 1\par
|
|
\pard \li1240\ri1360\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Minimum and maximum observed scores -38.35 -38.35\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa300\sl240\tx1140 \f21\fs20 Figure 13.3\tab The creation and use of a pattern containing a weight matrix motif and a membership of a set motif.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 23.\tab Define "Minimum pattern score". A minimum pattern score only makes sense if all the motifs in the pattern are defined with compatible scoring methods. For example, membership of a set motifs and weight matrices using sums of logs are incompatible. Searching will now commence and any matches will be displayed using the chosen method. Figure 13.3 shows a typical run in which a pattern containing a weight matrix and a membership of a set motif is created and stored on disk. Figure 13.4 shows the contents of the pattern file.\par
|
|
\pard\plain \li2260\ri2380\sb200\sl220\box\brsp100\brdrth \f4\fs16 atpbinding plus \par
|
|
\pard \li2260\ri2380\sl220\box\brsp100\brdrth A4 atp Class \par
|
|
atpbinding.wts \par
|
|
A6 hydro Class \par
|
|
1 Relative motif\par
|
|
22 Relative start position\par
|
|
5 Number of extra positions\par
|
|
\pard \li2260\ri2380\sl220\keepn\box\brsp100\brdrth _hydro.wts \par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa40\sl240\tx1140 \f21\fs20 Figure 13.4\tab The pattern file created in the worked example shown in figure 13.3.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Searching a sequence using a pattern file\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par
|
|
2.\tab Select "Pattern definition mode" as "Use pattern file".\par
|
|
3.\tab Select "Results display mode" as "Inclusive"\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Pattern definition file". Type the name of the file containing the pattern. The program will read the file then display its title, a detailed textual description of the pattern, the "Probability of finding the pattern", and the "Expected number of matches".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Define "Maximum pattern probability". \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Define "Minimum pattern score". Searching will now commence and any matches displayed using the chosen method. Figure 13.5 shows a typical run using a pattern file and output in the form of a SWISSPROT feature table.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Comparing a sequence against a library of patterns including PROSITE\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This mode of operation allows a sequence to be searched, in turn, for any number of patterns, each stored in a separate pattern file. The names of the files containing the individual patterns must be stored in a simple text file. This file is called "a file of pattern file names" and its name is the only user input required to define the search. The file of file names can contain references to entries in the PROSITE motif library as well as the names of other patterns.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Pattern searcher"\par
|
|
2.\tab Select "Pattern definition mode" as "Use file of pattern file names".\par
|
|
3.\tab Select "Results display mode" as "Inclusive"\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "File of pattern file names". Type the name of the file containing the list of pattern file names. The program will read the file and then, in turn, all the pattern files it names. Each of these patterns will be compared against the current sequence but only those that give matches will produce any output. The pattern title and each match will be displayed.\par
|
|
\pard\plain \li1240\ri1360\sb320\sl220\box\brsp40\brdrth \f4\fs16 Pattern searcher\par
|
|
\pard \li1240\ri1360\sl220\box\brsp40\brdrth Select pattern definition mode\par
|
|
X 1 Use keyboard\par
|
|
2 Use pattern file\par
|
|
3 Use file of pattern file names\par
|
|
? Selection (1-3) (1) =2\par
|
|
? Pattern definition file=_paper.pat\par
|
|
Select results display mode\par
|
|
X 1 Motif by motif\par
|
|
2 Inclusive\par
|
|
3 Graphical\par
|
|
4 SWISSPROT feature table\par
|
|
? Selection (1-4) (1) =4\par
|
|
ATP binding sequences\par
|
|
Probability of score -47.8010 = 0.302E-04\par
|
|
hydrophobic and + spot\par
|
|
Probability of score 3.0000 = 0.145E-01\par
|
|
\par
|
|
Pattern description\par
|
|
\par
|
|
atpbinding plus\par
|
|
Motif 1 named atp is of class 4\par
|
|
Which is a match to a weight matrix with score -47.801\par
|
|
Motif 2 named hydro is of class 6\par
|
|
Which is membership of a set with score 3.000\par
|
|
It is anded with the previous motif.\par
|
|
Probability of finding pattern = 0.4368E-06\par
|
|
Expected number of matches = 0.1350E-02\par
|
|
? Maximum pattern probability (0.00-1.00) (1.00) =\par
|
|
? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
|
|
\par
|
|
FT atp 162 187 Program\par
|
|
\par
|
|
Total matches found 1\par
|
|
\pard \li1240\ri1360\sl220\keepn\box\brsp40\brdrth Minimum and maximum observed scores -38.35 -38.35\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 13.5\tab Worked example of using a pattern file to search a sequence, and writing the results in the form of a SWISSPROT feature table.\par
|
|
\pard\plain \s6\sa60\sl280\tx560\tx860 \b\f20 \page 2.4\tab Searching libraries for patterns\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The program PIPL can be used to search whole sequence libraries for patterns. Its use is similar to the pattern search routine described above, except that it does not have the facility for creating pattern files, so they must be created beforehand using PIP. In addition to its obvious application of finding new occurrences of patterns or checking on their frequency, it is a useful way of obtaining sequence alignments. It can restrict its search to a list of named entries or can search all but those on a list of entries. It can restrict its output to showing the highest scoring match in each sequence, but by default it will show all matches.\par
|
|
\pard \s4\qj\sa120\sl280 Of its modes of output, two require further description. The first, "Padded sections", creates a new file for each match. The file will contain the sequence between and including the two outermost motifs in the pattern. It will be gapped to the furthest extent defined by the pattern, which means that if all the files were subsequently written one above the other all the motifs in the pattern would be exactly aligned, with the sections between them containing the requisite numbers of padding characters. The second such mode of output is called "Complete padded sequences". Here the user must know the maximum distance between the leftmost motif and the start of all the sequences that match. A trial run in which only the positions of matches are reported is usually required. The user gives this maximum distance to the program. The program then writes a new file containing the full length of all matching sequences, again maximally gapped (including their left ends) so that they would all align if written above one another. For both of these modes of output the files created are named "entryname", where "entryname" is the name given to the sequence in the sequence library. These modes are best used with the option "Report all matches" rejected, so that only the best match for each sequence is reported. The sequences can be lined up using the sequence assembly program SAP.\par
|
|
\pard \s4\qj\sa120\sl280 The searches, which have recently been recoded, are very rapid. For example, a search of the current SWISSPROT library for a pattern defining the globin family as 6 weight matrices with widely varying gaps finds only globins and takes less than 4 minutes using a single processor on an Alliant FX2800. This time includes reading in the whole library as stored in EMBL CDROM format.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select PIPL.\par
|
|
2.\tab Define "Name for results file."\par
|
|
3.\tab Select a library.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Select "Search whole library". The alternatives are "Search only a list of entries" and "Search all but a list of entries". The file containing the list of entries should contain one entry name per line, left justified.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Select "Results display mode" as "Inclusive". The alternatives include "Motif by motif", "Scores only", "Complete padded sequences" and "Padded sections".\par
|
|
6.\tab Accept "Report all matches". The alternative only shows the best match for each sequence.\par
|
|
7.\tab Define "Pattern definition file". Give the name of the file containing the pattern created using PIP.\par
|
|
\tab The program displays a textual description of the pattern and the expected number of matches per 1000 residues assuming an average amino acid composition.\par
|
|
8.\tab Define "Maximum pattern probability". The program will run much more quickly if none is given.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab Define "Minimum pattern score".\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The search will start.\par
|
|
A typical run is shown in figure 13.6.\par
|
|
\pard\plain \li1120\ri1280\sb200\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 PIPL (Protein interpretation program (library)) V4.1 Jul 1991\par
|
|
\pard \li1120\ri1280\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Author\: Rodger Staden\par
|
|
Searches protein libraries for patterns of motifs\par
|
|
\par
|
|
? Name for results file=globin.res\par
|
|
Select a library\par
|
|
1 EMBL nucleotide library \par
|
|
X 2 SWISSPROT protein library \par
|
|
3 Personal file in PIR format \par
|
|
4 Personal file in FASTA format \par
|
|
? Selection (1-4) (2) =\par
|
|
Library is in EMBL format with indexes\par
|
|
Select a task\par
|
|
X 1 Search whole library \par
|
|
2 Search only a list of entries \par
|
|
3 Search all but a list of entries \par
|
|
? Selection (1-3) (1) =\par
|
|
Select results display mode\par
|
|
X 1 Motif by motif \par
|
|
2 Inclusive \par
|
|
3 Scores only \par
|
|
4 Complete padded sequences\par
|
|
5 Padded sections \par
|
|
? Selection (1-5) (1) =5\par
|
|
? (y/n) (y) Report all matches n\par
|
|
? Pattern definition file=globin.pat\par
|
|
globin 1 \par
|
|
Probability of score -34.5300 = 0.197E-02\par
|
|
globin 2 \par
|
|
Probability of score -44.6000 = 0.409E-02\par
|
|
globin 3 \par
|
|
Probability of score -75.1000 = 0.293E-01\par
|
|
globin 4 \par
|
|
Probability of score -36.1000 = 0.147E-01\par
|
|
globin 5 \par
|
|
Probability of score -73.7000 = 0.375E-01\par
|
|
globin 6 \par
|
|
Probability of score -55.9000 = 0.483E-01\par
|
|
\par
|
|
Pattern description\par
|
|
Globin pattern file \par
|
|
Motif 1 named g1 is of class 4\par
|
|
Which is a match to a weight matrix with score -34.530\par
|
|
Motif 2 named g2 is of class 4\par
|
|
Which is a match to a weight matrix with score -44.600\par
|
|
and the N-terminal residue can take positions 17 to 22\par
|
|
relative to the N-terminal end of motif 1\par
|
|
It is anded with the previous motif.\par
|
|
Motif 3 named g3 is of class 4\par
|
|
Which is a match to a weight matrix with score -75.100\par
|
|
and the N-terminal residue can take positions 27 to 35\par
|
|
relative to the N-terminal end of motif 2\par
|
|
It is anded with the previous motif.\par
|
|
Motif 4 named g4 is of class 4\par
|
|
Which is a match to a weight matrix with score -36.100\par
|
|
and the N-terminal residue can take positions 29 to 53\par
|
|
relative to the N-terminal end of motif 3\par
|
|
It is anded with the previous motif.\par
|
|
Motif 5 named g5 is of class 4\par
|
|
Which is a match to a weight matrix with score -73.700\par
|
|
and the N-terminal residue can take positions 12 to 16\par
|
|
relative to the N-terminal end of motif 4\par
|
|
It is anded with the previous motif.\par
|
|
Motif 6 named g6 is of class 4\par
|
|
Which is a match to a weight matrix with score -55.900\par
|
|
and the N-terminal residue can take positions 29 to 33\par
|
|
relative to the N-terminal end of motif 5\par
|
|
It is anded with the previous motif.\par
|
|
Probability of finding pattern = 0.6273E-11\par
|
|
Expected number of matches per 1000 residues = 0.2119E-03\par
|
|
? Maximum pattern probability (0.00-1.00) (1.00) =\par
|
|
\pard \li1120\ri1280\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth ? Minimum pattern score (-9999.00-9999.00) (-9999.00) =\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 13.6\tab A typical run of PIPL using a pattern of 6 weight matrices to search the SWISSPROT library.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Preparing the PROSITE motif library for use by the programs\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 Only the program SPLITP3 is essential for preparing the PROSITE library for use by our programs. \par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select SPLITP3\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Prosite library file". Type the name of the file containing the PROSITE library (usually PROSITE.DAT).\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
Define "Name for file of pattern file names". This is the file of file names that users will employ to search the whole library. It will be convenient for them if an environment variable is defined for this file name.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Path name of motif directory". This is the full path name, including the final /, to the directory in which the converted library will be stored.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab The "exact match" motif class requires a consensus sequence. The "percentage match" motif class requires a consensus sequence and a cutoff score. The "score matrix" motif class uses the MDM78 matrix and requires a consensus sequence and a cutoff score. The "weight matrix" motif class requires only the name of the file containing the matrix. The "direct repeat" motif class requires a repeat length, the minimum and maximum gap between the two occurrences of the repeat, and a minimum score. The "membership of a set" motif class defines sets of residue types that are allowed at each position in the motif. When they are first entered into the pattern they are normally typed on the keyboard, but when they are stored in a file they are written in the same format as a weight matrix. To enter them on the keyboard use the following format. Type the one-letter codes for the set of residue types allowed at each position, terminated by a comma (,). For positions where any residue type is allowed simply type an extra comma. For example, VLI,FY,,,DE means any of Valine, Leucine or Isoleucine in the first position, either Phenylalanine or Tyrosine in the next position, anything in the next two positions, and Aspartic acid or Glutamic acid in the next. When the pattern is stored on the disk the program will request a name for the file and a title for the motif.\par
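\tab As a rough illustration of the set notation (a minimal Python sketch, not the program's own code), the functions below split a keyboard definition into one set per position and count matches in a window. The function names are hypothetical, and it is an assumption that positions allowing any residue type contribute nothing to the "Minimum matches" count.\par

```python
def parse_sets(definition):
    """Split a keyboard definition such as VLI,FY,,,DE into one set per position.

    An empty field means that any residue type is allowed at that position.
    """
    return [set(field.upper()) for field in definition.split(",")]


def count_matches(sets, segment):
    """Count constrained positions whose residue belongs to the allowed set.

    Assumption: unconstrained (empty) positions add nothing to the count.
    """
    return sum(1 for allowed, residue in zip(sets, segment.upper())
               if allowed and residue in allowed)


def find_motif(sets, sequence, minimum_matches):
    """Return the 1-based start positions at which enough positions match."""
    width = len(sets)
    return [start + 1
            for start in range(len(sequence) - width + 1)
            if count_matches(sets, sequence[start:start + width]) >= minimum_matches]
```

\tab With these definitions, find_motif(parse_sets("VLI,FY,,,DE"), "GGVFAADGG", 3) reports a single match starting at position 3.\par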
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab The details of the probability calculations are outside the scope of this article. They are quite rapid and are essential both for assessing the statistical significance of any matches found and for allowing meaningful cutoffs to be applied to patterns. In general, cutoff scores are inappropriate for patterns containing a mixture of motif classes.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab The program calculates the "Probability of finding the pattern" and the "Expected number of matches". The first figure is the product of the individual motif probabilities, but the second is more useful because it takes into account the allowed variation in spacing between motifs and the length of the current sequence. In both cases the composition of the current sequence is also used, so different probabilities would be calculated for other sequences.\par
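\tab The product rule can be checked against the displayed values with a few lines of Python (a sketch only; the function name is hypothetical, and the input probabilities are the rounded figures printed in figures 13.3 and 13.6). The products agree with the displayed "Probability of finding pattern" values, 0.4368E-06 and 0.6273E-11, to within about one percent, the small difference presumably reflecting the rounding of the printed motif probabilities. The "Expected number of matches" also depends on the spacing and the sequence length and is not reproduced here.\par

```python
import math


def pattern_probability(motif_probabilities):
    """Multiply the individual motif probabilities together."""
    return math.prod(motif_probabilities)


# Motif probabilities as printed in figure 13.3 (weight matrix, membership of a set).
two_motif = pattern_probability([0.302e-04, 0.145e-01])

# Motif probabilities as printed for the six globin weight matrices in figure 13.6.
globin = pattern_probability(
    [0.197e-02, 0.409e-02, 0.293e-01, 0.147e-01, 0.375e-01, 0.483e-01])

print("%.4e" % two_motif)
print("%.4e" % globin)
```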
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab The pattern definition system is very flexible. Assume that a laboratory has a large library of patterns stored in its computer. Different groups or users may want to screen their sequences against different subsets of the pattern library. Each group therefore uses its own "File of pattern file names", which contains only the names of the pattern files that are relevant to its sequences. Of course a pattern may contain only one motif, so a library of patterns can include both simple and complex patterns. In the same way a laboratory may have a large library of weight matrices defining different motifs, and different users may want to combine them in different ways to produce their own patterns.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab Also, of course, a library does not have to be used solely for performing mass screenings\: each individual entry can be used as a single pattern by giving the name of its pattern file, e.g. pathname/PS00002.PAT.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab Note that 5 of the PROSITE motifs contain the symbols > or <, which mean that the motifs must appear exactly at the N or C termini of the sequences. Currently our methods have no mechanism for such definitions, so that, for example, KDEL motifs will be permitted to occur anywhere in a sequence.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1988. Methods to define and locate patterns of motifs in sequences. {\i CABIOS} {\b 4(1)}\:53-60.\par
|
|
2.\tab Staden, R. 1989. Methods for calculating the probabilities of finding patterns in sequences. {\i CABIOS} {\b 5(2)}\:89-96.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Staden, R. 1990. Searching for patterns in protein and nucleic acid sequences. (in) {\i Methods in Enzymology} R.F. Doolittle (ed.), {\b 183}\:193-211 (Academic Press, New York).\par
|
|
\pard\plain \s2\qc\sa200\sl480 \b\f20\fs36 \page 14. Comparing Sequences\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 Table of contents\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Introduction\par
|
|
2.\tab Methods\par
|
|
\pard \s7\qj\fi-560\li1700\sa120\sl280\tx1700 2.1\tab Producing a dot matrix plot (or list) of exact matches\par
|
|
2.2\tab Producing a dot matrix plot using the proportional algorithm\par
|
|
2.3\tab Producing a dot matrix plot using the quick scan algorithm\par
|
|
2.4\tab Producing a list of all matching segments using the proportional algorithm\par
|
|
2.5\tab Calculating the expected scores for the proportional algorithm\par
|
|
2.6\tab Calculating the observed scores for the proportional algorithm\par
|
|
2.7\tab Producing an optimal alignment\par
|
|
2.8\tab Comparing a sequence against a library of sequences\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Notes\par
|
|
4.\tab References\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 1.\tab Introduction\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 In this chapter we describe methods for comparing and aligning pairs of nucleic acid or protein sequences. The program described (SIP), the original version of which was first described in 1982 (1), is based around several methods for producing "dot matrix" plots and includes routines for assessing the statistical significance of the plots, plus a dynamic programming algorithm for finding optimal alignments. At the end of the chapter we describe a program, SIPL, that is used for comparing a single sequence against a whole library of sequences.\par
|
|
\pard \s4\qj\sa120\sl280 We assume the reader is familiar with the general principle of dot matrix diagrams. The program uses a number of different algorithms to calculate the score for each point in a dot matrix, and the user defines a minimum score so that only those points in the diagram for which the score is at least this value will be marked with a dot. The first scoring method finds uninterrupted sections of perfect identity, i.e. those that contain no mismatches, insertions or deletions. Generally this method, termed "the identities algorithm", is of limited value, but it runs very quickly.\par
|
|
\pard \s4\qj\sa120\sl280 The second method looks for sections where a proportion of the characters in the sequence are similar, again allowing no insertions or deletions. For a thorough analysis this method, termed "the proportional algorithm", is the best. The original method of this type was described by McLachlan (2) and involves calculating a score for each position in the matrix by summing points found when looking forwards and backwards along a diagonal line of a given length (the window). The algorithm does not simply look for identity but uses a score matrix that contains scores for every possible pair of characters. For comparing amino acid sequences we usually use the score matrix MDM78 (3), which is shown in figure 14.1. It is also possible to use other matrices, including an identity matrix for proteins. For nucleic acids we usually use an identity matrix.\par
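As an illustration only, the windowed scoring can be sketched in Python as follows. This is not the SIP implementation\: the function names are hypothetical, an odd window length centred on each point is assumed, and a simple identity score stands in for a full score matrix such as MDM78.\par

```python
def identity_score(a, b):
    """Score 1 for identical characters and 0 otherwise (an identity matrix)."""
    return 1 if a == b else 0


def proportional_dots(seq1, seq2, window, cutoff, score=identity_score):
    """Return the (i, j) points whose window score reaches the cutoff.

    The score for a point is the sum of the pair scores found by looking
    backwards and forwards along the diagonal through that point.
    Indices are 0-based; window is assumed to be odd.
    """
    half = window // 2
    dots = []
    for i in range(half, len(seq1) - half):
        for j in range(half, len(seq2) - half):
            total = sum(score(seq1[i + k], seq2[j + k])
                        for k in range(-half, half + 1))
            if total >= cutoff:
                dots.append((i, j))
    return dots
```

For example, comparing the sequence ACGTACGT against itself with a window of 3 and a cutoff of 3 marks the main diagonal and the two short diagonals produced by the internal repeat.\par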
|
|
\pard\plain \li220\ri280\sl220\box\brsp100\brdrth \f4\fs16 C S T P A G N D E Q B Z H R K M I L V F Y W - X ? \par
|
|
\pard \li220\ri280\sl220\box\brsp100\brdrth C 22 10 8 7 8 7 6 5 5 5 5 5 7 6 5 5 8 4 8 6 10 2 10 10 10 10\par
|
|
S 10 12 11 11 11 11 11 10 10 9 10 10 9 10 10 8 9 7 9 7 7 8 10 10 10 10\par
|
|
T 8 11 13 10 11 10 10 10 10 9 10 10 9 9 10 9 10 8 10 7 7 5 10 10 10 10\par
|
|
P 7 11 10 16 11 9 9 9 9 10 9 10 10 10 9 8 8 7 9 5 5 4 10 10 10 10\par
|
|
A 8 11 11 11 12 11 10 10 10 10 10 10 9 8 9 9 9 8 10 6 7 4 10 10 10 10\par
|
|
G 7 11 10 9 11 15 10 11 10 9 10 10 8 7 8 7 7 6 9 5 5 3 10 10 10 10\par
|
|
N 6 11 10 9 10 10 12 12 11 11 12 11 12 10 11 8 8 7 8 6 8 6 10 10 10 10\par
|
|
D 5 10 10 9 10 11 12 14 13 12 13 12 11 9 10 7 8 6 8 4 6 3 10 10 10 10\par
|
|
E 5 10 10 9 10 10 11 13 14 12 12 13 11 9 10 8 8 7 8 5 6 3 10 10 10 10\par
|
|
Q 5 9 9 10 10 9 11 12 12 14 11 13 13 11 11 9 8 8 8 5 6 5 10 10 10 10\par
|
|
B 5 10 10 9 10 10 12 13 12 11 13 11 11 10 10 8 8 6 8 5 7 4 10 10 10 10\par
|
|
Z 5 10 10 10 10 10 11 12 13 13 11 14 12 10 10 8 8 8 8 5 6 4 10 10 10 10\par
|
|
H 7 9 9 10 9 8 12 11 11 13 11 12 16 12 10 8 8 8 8 8 10 7 10 10 10 10\par
|
|
R 6 10 9 10 8 7 10 9 9 11 10 10 12 16 13 10 8 7 8 6 6 12 10 10 10 10\par
|
|
K 5 10 10 9 9 8 11 10 10 11 10 10 10 13 15 10 8 7 8 5 6 7 10 10 10 10\par
|
|
M 5 8 9 8 9 7 8 7 8 9 8 8 8 10 10 16 12 14 12 10 8 6 10 10 10 10\par
|
|
I 8 9 10 8 9 7 8 8 8 8 8 8 8 8 8 12 15 12 14 11 9 5 10 10 10 10\par
|
|
L 4 7 8 7 8 6 7 6 7 8 6 8 8 7 7 14 12 16 12 12 9 8 10 10 10 10\par
|
|
V 8 9 10 9 10 9 8 8 8 8 8 8 8 8 8 12 14 12 14 9 8 4 10 10 10 10\par
|
|
F 6 7 7 5 6 5 6 4 5 5 5 5 8 6 5 10 11 12 9 19 17 10 10 10 10 10\par
|
|
Y 10 7 7 5 7 5 8 6 6 6 7 6 10 6 6 8 9 9 8 17 20 10 10 10 10 10\par
|
|
W 2 8 5 4 4 3 6 3 3 5 4 4 7 12 7 6 5 8 4 10 10 27 10 10 10 10\par
|
|
- 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
|
|
X 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
|
|
? 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
|
|
\pard \li220\ri280\sl220\keepn\box\brsp100\brdrth 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 14.1\tab The amino acid score matrix MDM78.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 For the proportional method, plotting dots at the centres of windows that reach the cutoff leads to a persistence effect that can, to some extent, be mitigated by a variation on the method. If, for example, all the high scoring amino acids are clustered at the left end of a particular diagonal segment, dots will continue to be plotted to their right until the window score drops below the cutoff. Instead of plotting a single point for each window that reaches the cutoff score, the variant method plots points for all the identities that lie in windows that reach the cutoff. The persistence effect can be more pronounced for long windows and low cutoff scores, but note that the variant method will plot nothing if there are no identities present, and so similar regions could be missed! A further variant, useful for comparing a sequence against itself, ignores the main diagonal.\par
|
|
\pard \s4\qj\sa120\sl280 The third comparison method, called "quick scan", is really a combination of the first two, and is similar to the FASTP program of Lipman and Pearson (4), but produces a dot matrix diagram. The algorithm is as follows. The dot matrix positions are found for all words of some minimum length (obviously length 1 is most sensitive) that are common to both sequences. Imagine a diagonal line running from corner to corner of the diagram, at right angles to the diagonals in the dot matrix. The scores for the common words (according to the current score matrix, e.g. MDM78) are accumulated at the appropriate positions on that imaginary line, hence producing a histogram. The histogram is analysed to find its mean and standard deviation. The diagonals that lie above some cutoff score (defined in standard deviation units) are rescanned using the proportional algorithm, and a diagram is produced. The method is very fast, and is also employed by the library comparison program (see below).\par
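The first stage of this procedure can be sketched in Python as follows (an illustration under stated assumptions, not the SIP code). The function names are hypothetical; each common word is scored simply by its length, as with an identity matrix, and diagonals are indexed by the difference between the positions in the two sequences. The rescanning of the selected diagonals with the proportional algorithm is omitted.\par

```python
from collections import Counter
from statistics import mean, pstdev


def diagonal_histogram(seq1, seq2, word_length):
    """Accumulate a score on each diagonal for every word common to both sequences."""
    positions = dict()
    for j in range(len(seq2) - word_length + 1):
        positions.setdefault(seq2[j:j + word_length], []).append(j)
    histogram = Counter()
    for i in range(len(seq1) - word_length + 1):
        for j in positions.get(seq1[i:i + word_length], []):
            # Assumption: identity scoring, so a word scores its own length.
            histogram[i - j] += word_length
    return histogram


def high_diagonals(seq1, seq2, word_length, sd_cutoff):
    """Return the diagonals scoring more than sd_cutoff standard deviations above the mean."""
    histogram = diagonal_histogram(seq1, seq2, word_length)
    diagonals = range(-(len(seq2) - 1), len(seq1))
    values = [histogram[d] for d in diagonals]
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []
    return [d for d in diagonals if histogram[d] > centre + sd_cutoff * spread]
```

Comparing ACGTGCA against itself with words of length 3 and a cutoff of 3 standard deviations selects only the main diagonal (diagonal 0) for rescanning.\par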
|
|
\pard \s4\qj\sa120\sl280 \par
|
|
\pard \s4\qj\sa120\sl280 The dynamic programming alignment algorithm contained in the program is based on that of Myers and Miller (5). It guarantees to produce alignments with the optimum score given a score matrix, a gap start penalty, and a gap extension penalty. It is very useful to have the dot matrix methods and the alignment routine together in the same program because it allows users to produce a dot matrix diagram to help select which regions of the sequence they wish to align. Selection is made by use of the crosshair. The crosshair is positioned first at the bottom left hand end of the segment to be aligned and then at the top right of the segment. When the alignment routine is selected the segment will be aligned. The alignment can replace the original segment of the sequence. By repeated plotting of dot matrices, followed by alignment, very long sequences can easily be aligned.\par
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 2.\tab Methods\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.1\tab Producing a dot matrix plot (or list) of exact matches\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This method is relatively fast and can be useful for very similar sequences. It marks the position of every exact match of some minimum length with a dot or lists out the matching segments.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Apply identities algorithm".\par
|
|
2.\tab Define "Identity score". \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab
|
|
Select "Plot or List". The plot will appear as in figure 14.2, which shows a comparison of two protein sequences using a score of 2. Listed output displays the matching segments and defines their positions. \par
|
|
\pard\plain \li1700\sb300\sl220\keepn \f4\fs16 {{\pict\macpict\picw283\pich299
|
|
112800000000012b011b001102ff0c00fffe0000003cb4bc003cb4bc0000000000fc00ef000000000001000a0000000000fc00ef0098801e0000000000fc00ef0000000000000000003cb4bc003cb4bc00000001000100010000000000000000000000000048c23f000000010000ffffffffffff0001000000000000000000
|
|
0000fc00ef0000000000fc00ef000002e30006003fe5ff00f80f0020f6000020f8000020fc000104080d0020fa000302000004f0000048060020e50000081b0020fe00042000000802fb0002100002fe000040fd00031000000817012008fc000004fd000020fe00011001f9000301000008100020fb000001f600018080fa
|
|
00011008150020fe00014002f5000308000004fd000304000008060020e5000008110020f2000312010008fe000020fc0000080e0020f3000020fe000040f8000008060020e50000080d0320000020f3000001f7000008130020fb000040fa000080f90005200000100008110320000080f700042800000440f70000080b04
|
|
2002000001ea00010808110020f7000080f90002012048fc000102080f0020fc000001f60000a0f8000102080a0020fb000020ec0000481c05200010000001fe000010fd000008fc0002010004fe0003800000080c012001f800010202f10000080b0020ea000002fe00010408140320000020f6000302000004fb000008fe
|
|
00000816022010c0f9000020fc00040200100008fc0002200008130020fc000002fe000080fa000010f800010808150320000010fe000002f9000048fd000020fa000008160020fd0005100000040080fa000008fa0003040080080c0020f20002040080f7000008140020fe00010104f600010440fb000001fe0000081200
|
|
20f700041000808010f8000010fe0000080a0020ef000008f80000080a0020fa000010ed000008110020f3000040fb000710100000080000080a0020ee000010f90000080c012080fa00010802ef000008110020f6000040f8000780008000080000080e0020f5000002fe000010f60000080e0020f5000002fe000030f600
|
|
0008180620200000100020fd00041081808010f8000010fe000008110020f6000020f800072000000400000408060020e50000080c0020f60002200004f300000814012410f60002200040fb000010fe000308000008100022f9000008f40006400400100010080a0020fd000080ea0000080c0020fd00010888ec00010108
|
|
110320800004fe00040400001810f0000008130020f900010404fe000001fe000001f7000008060020e5000008100022f9000008f4000640060010000408080020e700020200080e0320800008fc00010802ef0000080a0020f5000008f2000008150020fd000308000040f6000080fd000408000001280d02200020fd0001
|
|
0108ed000008150020fd000001fe000010fd000008f6000380000008160020f8000340000042fe00011002fe000002fb0000080a0020f3000080f40000080a0020f6000010f1000008080020e70002020008190020fe00014002fe0002100008f900044004000010fd000008140620200000100020fc000081f5000004fe00
|
|
00080c0020fb000080ee0002040008100020fb000080f200010240fe00010108150020fa00018001fc0002040080fe000008fa0000080a0030f7000002f0000008100020fe000080fe0002010001ef000088100020fb000080f20006024000000401080c0020fa000021ef0002200008160020fe00042000000802fb000010
|
|
fc0000c0fa0000080d0020f7000320000001f3000008150020fc000088fd000080f90002012008fc000102081a0320000020fa00070400400002000004fd0002080008fe0000080d0320800004fe000004ec000008130020fa00018001fc000004fd000020f9000008140020f30002400080fe00010810fe00030800000814
|
|
0620200000100020fc000081f5000004fe0000080e0020f5000080f8000001fc000028120020f8000008fe000004f9000002fc000008100020fb000080f200010240fe000101080d012008ec0002080002fe000008060020e5000008100020f5000080fa0002800001fc000028060020e50000080a0020ee000080f9000008
|
|
0a0020f0000004f7000008120020fe000002f70000c0fa000006fc00000811042000020002f1000080fd000320000008160020fd000004fc000020fc0002020004fa0002200008060020e5000008180020fe0004200000080afb000030fe0002400040fa00000816042000800040fe000040f5000010fe000020fe0000080a
|
|
0020ee000080f90000080f0020f00002080080fc000304000008140020fd00051000000400c0fa000018f800018008120030f7000002fb000080fe000080fb000008060020e50000080f05208010000008f3000080f9000008160020fe000080fe000080f9000020fa000401000200080e0020f5000008f6000004fe000008
|
|
11072000200008000001fb000040f30000080d0320000080f9000020f10000081c042000040001fc000320000804fe000004fd000702400001404000080a0020fc000010eb0000080e0020fa000010f7000004f80000080d0020fa000004f0000380004008160020fe00046002000802fb000010fc000044fa000008120020
|
|
fb000010fc000010fc000010f80000080a0020f5000001f2000008150020fd000411c8000020fe000302000020f5000008150020fc000080f90002010040fb00050400008000081e0320000002fe00071000002000081082fe00040210000002fd000280000812052010c0000004f5000340100008fa0000080b012802eb00
|
|
0004fd0000081b042000040001fc000320000804fc000a40020002400000404000080a0020fc000010eb000008090320000008e8000008120020f700042000000108fa000080fc000008060020e5000008100020fa000021f7000008fa0002200008180020fc000080f90005010040000004fe0005040400800008110020f2
|
|
000304000020fc00040420000008140020fe000080fc000002fe000010f50002100008140920020000011140000020fb000020f600010808140020fe000080fc000002fe000010f500021010080a0020f4000010f3000008060020e5000008120020fe000001f5000004fa000001fe0000081002200004fe000080f6000004
|
|
f7000008140020fd0005100000040080fa000008f8000180081b0320000002fb00042000081082fe000006fe000002fd00028000081c0020fc000040fd000610008080100010fe000004fe000010fe0000080b0020f200018080f6000008090320000010e8000008150320000010fe000002f9000048fd000020fa0000080a
|
|
0020f2000040f500000807012004e6000008140020f90002100008f9000040fe000010fd000008060020e5000008140320000020f6000302000004fb00000cfe0000080e022010c0f10002100008fa0000080e0020fd000040ef000008fd0000081402200020fe00040201000010f9000001f800000810012020f800010180
|
|
fb000002f8000008140020fe000080fe0002010001f8000080f9000088060020e5000008090320000080e80000080a0020f2000002f5000008070020e6000120080b042002001001ea00010808120020f9000008fb000020fb000081fc000008190020fe000080fe000080f9000020fd000304000001fe0000081103200000
|
|
80f7000028fe000040f700000811072000200040020001f4000004fa000008100020ef000640020000080002fe0000081a042000800040fe000040fa000004fd000010fe000020fe0000080d0020fa000004f0000380004008120020f8000001fc000001fc000010fb000008130020fb000001fd000010fb000080f9000110
|
|
08110320000004fe00040400001810f00000080d0320800004fe000004ec0000080e02200010f00002010004fb000008140022f900000cfd000002f9000640040010000008060020e5000008060020e5000008160020fc000020fe000040fb00040810000040fa000008130020fc000002fe000080fa000010f80001080811
|
|
072000200040020001f4000004fa000008160020fe0002200008fc000008fb0002400202fa0000080c0020ed00010208fc000140081402200004fe000080f4000004fd000004fe0000080f0020fe00014002f2000004fa000008180020fd00014002fe000080fa000010fc000008fe00010808130020fc000001f60004a000
|
|
000208fc000142080c0020eb0002080002fe000008060020e50000080c0020f4000040f50002040008180020fe0002020002fe000010fd000040fa000004fc0000081002200020fd000001f6000001f8000008120021fd000040fc000080f5000008fd000008190020fd000610000004008080fb00040800012008fc000182
|
|
081605200000010004fb000020fd000340008018f9000008150020fc000080f90002010040fb0005040000800008100022f9000008f40006500400100200080a0030f7000001f0000008110020f3000040fb000010fe000308000008130022f9000308000020f7000640040010000008060020e50000080f0020fc00010402
|
|
f000044000080008160020f5000010fe000020fe00080100200002010000080a0020f1000020f6000008110320000804fe00040400001810f0000008060020e50000081a0020fe000080fe000080fe000020fd000020fa000001fe000008130320000080f90002200028fe000040f70000080f0028f90002220008f3000304
|
|
021008110320000080f7000028fe000040f7000008140020fd00018001f60000a0fe000040fc000102080c0020fa000021ef0002200008110020fd000302000010fd000001f2000008140620000080000088fa000028fe000040f70000080a0020e9000004fe00000812012020f7000080fb000302000040fb0000080b0520
|
|
8010000008ea000008120020fd000411c8000020fb000020f5000008120020fc000010fb000080f8000001fc000028140020fc0005020002008004fb000010f800010848160020fe00044002000010fd000001fa000004fa0000080e0020fd000004f4000004f8000008120020fd0002040001f700014080f9000110081100
|
|
20fc000304000040f9000001f70000081605200000010004fb000030fd000340000010f9000008120020fe000002f7000040fa000004fc000008140020fd000040f7000304002020fb0003200000081302200004fe00018080f200010240fe00010108150020fd000001fe000010fd000008f60003800000080a0020f20000
|
|
80f50000081102200044fe00018220f5000040f90000080c02200040ed000004fc0000080d0020f0000008fa0003040000080e0020fe000008f8000040f3000008160320810004fe000004fe00010202f8000004fb0000080a0020f6000040f100000813012001f800018002fa0002012008fc000102080c0020ed00010208
|
|
fc000140080e0020fc000088f0000040fd0000080a0020e9000008fe0000080e0020fa000008f1000008fe0000081605200000010004fb000020fd000340400010f90000080a0020e9000004fe000008160020fe000080fe000080f90000a0fa000001fe000008090320040010e8000008060020e50000080a0020fa000008
|
|
ed0000081605200000010004fb000020fd000340400010f90000081c0320000002fb00042800081086fe000002fe00010202fe00028000080a0020f3000080f4000008120020fc000040f7000010fe000004fa0000080e0020fd000004f4000004f80000081b0020fe000620000008020008fe00010410fc0002400002fc00
|
|
00080a0020f2000008f5000008060020e5000008140020fb000008f9000304020020fb0003300000081408202000001000200021fe000081f30002200008190020fe00042000080802fe000308000010fc000042fa0000080f0020f90002220008f30003040000081002200020fd000001f6000001f80000080f012001fb00
|
|
0008fe000002f1000008190020fc000002fe000080fa000010fe000080fe0003080008081b0020fd0008100000040080000080fd000008fd000001fd0001802806003fe5ff00f80000ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb80\sa400\sl240\tx1140 \f21\fs20 Figure 14.2\tab A dot-matrix for two related protein sequences using the "Identities algorithm" and a score of 2. Notice that the similarity is not apparent. \par
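\pard\plain \s4\qj\sa120\sl280 \f20 The identities method described above can be illustrated with a short Python sketch. This is not the program's own code: the function name exact_matches and the 0-based positions are assumptions made for illustration. It walks every diagonal of the comparison and reports each run of identical symbols that is at least the chosen minimum length.\par

```python
def exact_matches(seq_h, seq_v, min_length):
    """Return (start_h, start_v, length) for every run of identical
    symbols of at least min_length along any diagonal (0-based)."""
    matches = []
    # A diagonal is identified by its offset, i - j.
    for offset in range(-(len(seq_v) - 1), len(seq_h)):
        i = max(offset, 0)
        j = max(-offset, 0)
        run = 0
        while i < len(seq_h) and j < len(seq_v):
            if seq_h[i] == seq_v[j]:
                run += 1
            else:
                if run >= min_length:
                    matches.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # Flush a run that reaches the end of the diagonal.
        if run >= min_length:
            matches.append((i - run, j - run, run))
    return matches


if __name__ == "__main__":
    print(exact_matches("EVPVGK", "LEVPVGR", 2))
```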
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.2\tab Producing a dot matrix plot using the proportional algorithm\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 This method gives the most thorough analysis.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Apply proportional algorithm".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Odd window length". This is the size of the window over which the scores for each point are summed; it must be an odd number.\par
|
|
3.\tab Define "Proportional score". All points achieving at least this score will be marked with a dot in the diagram.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 14.3.\par
|
|
\pard\plain \qj\li1700\sb300\sl480\keepn \f4\fs16 {{\pict\macpict\picw283\pich301
|
|
08a200000000012d011b001102ff0c00fffe0000003c32b0003c32b00000000000fc00ed000000000001000a0000000000fc00ed0098801e0000000000fc00ed0000000000000000003c32b0003c32b000000001000100010000000000000000000000000048ae57000000010000ffffffffffff0001000000000000000000
|
|
0000fc00ed0000000000fc00ed000002e30006007fe5ff00f0060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100b0040f200010180f60000100a0040f2000003f50000100a0040f2000006f50000100a0040f2000004f50000100d0340000020f5000008f5000010090340000020e800
|
|
0010060040e5000010090340000080e80000100802400001e70000100802400003e7000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
|
|
10060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f2000060f50000100a0040f2000040f5000010060040e50000100a0040ea000040fd0000100c0040ec0002040080fd000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040
|
|
e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040eb000080fc0000100a0040ec000001fb0000100a0040ec000002fb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
|
|
10060040e50000100a0040f9000010ee0000100a0040f9000030ee0000100a0040f9000060ee0000100a0040f90000c0ee0000100e0040f9000080fc000020f40000100a0040eb000040fc000010060040e5000010060040e5000010060040e5000010060040e50000100a0040ee000004f9000010060040e5000010060040
|
|
e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f0000002f7000010060040e5000010
|
|
060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010070040e600018010060040e5000010060040e50000100b0040fd000101
|
|
80eb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f4000002f30000100a0040f4000006f30000100a0040f400000cf30000100a0040f4000008f30000100a0040f4000010f30000100a0040f4
|
|
000030f30000100a0040f4000060f30000100a0040f4000040f3000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010090040e8000301000010060040e5
|
|
000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f7000004f00000100a0040f7000004f0000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5
|
|
000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f9000008ee000010060040e5000010060040e50000100a0040f9000020ee0000100a0040f9
|
|
000040ee0000100a0040f9000080ee0000100a0040fa000001ed0000100a0040fa000002ed0000100a0040fa000004ed0000100a0040fa000004ed0000100a0040fa000008ed000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500
|
|
0010060040e5000010060040e5000010060040e5000010060040e50000100a0040fb000040ec0000100a0040fb000080ec0000100a0040fc000001eb0000100a0040fc000002eb0000100a0040fc000006eb0000100a0040fc000004eb0000100a0040fc000008eb0000100a0040fc000010eb0000100a0040fc000020eb00
|
|
00100a0040fc000060eb0000100a0040fc000080eb0000100b0040fd00010180eb0000100a0040fd000001ea0000100a0040fd000002ea0000100a0040fd000004ea0000100a0040fd000008ea0000100a0040fd000008ea0000100a0040fd000010ea0000100a0040fd000020ea0000100a0040fd000040ea000010060040
|
|
e50000100a0040fe000001e9000010060040e50000100a0040fe000002e90000100a0040fe000004e90000100a0040fe000008e9000010060040e50000100e0040fe000010f0000040fb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500001006
|
|
0040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100d0040fd000303000020ed000010060040e5000010060040e5000010060040e50000100a0040fc00000ceb0000100a0040fc00
|
|
0008eb000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500001006007fe5ff00f00000ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 14.3\tab
|
|
A dot-matrix for the two related protein sequences shown in figure 14.2, but here using the "Proportional algorithm" with a window of 21 and a score of 240. Notice that the similarity is now apparent. \par
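\pard\plain \s4\qj\sa120\sl280 \f20 As a rough illustration of the proportional method, the Python sketch below sums the scores over an odd window along the diagonal through each point and keeps the points that reach the cutoff. The names proportional_dots and toy_score are hypothetical, toy_score is a stand-in for a real score matrix, and restricting the scan to points with a complete window is an assumption about edge handling.\par

```python
def toy_score(a, b):
    # Hypothetical stand-in for a score matrix lookup: 10 for identity.
    return 10 if a == b else 0


def proportional_dots(seq_h, seq_v, window, cutoff, score=toy_score):
    """Return (i, j, total) for every point whose windowed score,
    summed along the diagonal and centred on (i, j), reaches cutoff."""
    if window % 2 != 1:
        raise ValueError("window length must be odd")
    half = window // 2
    dots = []
    for i in range(half, len(seq_h) - half):
        for j in range(half, len(seq_v) - half):
            total = 0
            for k in range(-half, half + 1):
                total += score(seq_h[i + k], seq_v[j + k])
            if total >= cutoff:
                dots.append((i, j, total))
    return dots


if __name__ == "__main__":
    print(proportional_dots("GLDVK", "GMKVK", 5, 30))
```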
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.3\tab Producing a dot matrix plot using the quick scan algorithm\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This method is very fast. Using the current score matrix, it accumulates the scores for all the exact matches that lie on each diagonal. The mean diagonal score and its standard deviation are calculated, and those diagonals whose scores are more than a chosen number of standard deviations above the mean are rescanned using the proportional algorithm. The points that reach the proportional algorithm's cutoff are plotted.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Apply quick scan algorithm".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Identity score". The minimum number of consecutive identical sequence symbols that count as a match.\par
|
|
3.\tab Define "Odd window length". The size of window over which the scores for each point are summed when the proportional algorithm is applied to the best diagonals.\par
|
|
4.\tab Define "Proportional score". For the best diagonals all points achieving at least this score will be marked with a dot in the diagram.\par
|
|
5.\tab Define "Number of s.d. above mean". Diagonals whose scores exceed the mean by more than this number of standard deviations are rescanned using the proportional algorithm.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The plot will appear as in figure 14.4.\par
|
|
\pard\plain \qj\li1720\sb300\sl480\keepn \f4\fs16 {{\pict\macpict\picw283\pich301
|
|
07fa00000000012d011b001102ff0c00fffe0000003c32b0003c32b00000000000fc00ed000000000001000a0000000000fc00ed0098801e0000000000fc00ed0000000000000000003c32b0003c32b0000000010001000100000000000000000000000000491cbd000000010000ffffffffffff0001000000000000000000
|
|
0000fc00ed0000000000fc00ed000002e30006007fe5ff00f0060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5
|
|
000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500
|
|
0010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
|
|
10060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010
|
|
060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500001006
|
|
0040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100600
|
|
40e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040
|
|
e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5
|
|
000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e500
|
|
0010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000100a0040f9000008ee000010060040e5000010060040e50000100a0040f9000020ee0000100a0040f9000040ee0000100a0040f9000080ee0000100a0040fa
|
|
000001ed0000100a0040fa000002ed0000100a0040fa000004ed0000100a0040fa000004ed0000100a0040fa000008ed000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
|
|
10060040e50000100a0040fb000040ec0000100a0040fb000080ec0000100a0040fc000001eb0000100a0040fc000002eb0000100a0040fc000006eb0000100a0040fc000004eb0000100a0040fc000008eb0000100a0040fc000010eb0000100a0040fc000020eb0000100a0040fc000060eb0000100a0040fc000080eb00
|
|
00100b0040fd00010180eb0000100a0040fd000001ea0000100a0040fd000002ea0000100a0040fd000004ea0000100a0040fd000008ea0000100a0040fd000008ea0000100a0040fd000010ea0000100a0040fd000020ea0000100a0040fd000040ea000010060040e50000100a0040fe000001e9000010060040e5000010
|
|
0a0040fe000002e90000100a0040fe000004e90000100a0040fe000008e9000010060040e50000100a0040fe000010e9000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e50000
|
|
10060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010060040e5000010
|
|
06007fe5ff00f00000ff}}\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa240\sl240\tx1140 \f21\fs20 Figure 14.4\tab
|
|
A dot-matrix for the two related protein sequences shown in figures 14.2 and 14.3, but here using the "Quick scan algorithm" with an identity score of 1, and a window of 21 and a score of 240 for the proportional algorithm. Notice that the similarity is now apparent but the absence of background "noise" is misleading.\par
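\pard\plain \s4\qj\sa120\sl280 \f20 The first stage of the quick scan can be sketched in Python as follows. The sketch assumes an identity score of 1, so that every single identical pair counts as a match, and the function names diagonal_scores and best_diagonals are hypothetical. It accumulates a score for each diagonal and then selects the diagonals lying more than a chosen number of standard deviations above the mean; those would then be rescanned with the proportional algorithm.\par

```python
from statistics import mean, pstdev


def diagonal_scores(seq_h, seq_v, score):
    """Accumulate the score of every identical pair on its diagonal.
    Index k in the result corresponds to offset k - (len(seq_v) - 1)."""
    totals = [0] * (len(seq_h) + len(seq_v) - 1)
    for i, a in enumerate(seq_h):
        for j, b in enumerate(seq_v):
            if a == b:
                totals[i - j + len(seq_v) - 1] += score(a, b)
    return totals


def best_diagonals(totals, n_sd, len_v):
    """Return the offsets (i - j) of diagonals whose score is more
    than n_sd standard deviations above the mean diagonal score."""
    threshold = mean(totals) + n_sd * pstdev(totals)
    return [k - (len_v - 1) for k, t in enumerate(totals) if t > threshold]


if __name__ == "__main__":
    h, v = "ABCDEFGH", "XABCDEFG"
    totals = diagonal_scores(h, v, lambda a, b: 1)
    print(best_diagonals(totals, 3, len(v)))
```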
|
|
\pard\plain \s6\fi-540\li560\sb240\sa60\sl280\tx860 \b\f20 2.4\tab Producing a list of all matching segments using the proportional algorithm\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "List matching segments".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Define "Odd window length". This is the size of the window over which the scores for each point are summed; it must be an odd number.\par
|
|
3.\tab Define "Proportional score". All segments achieving at least this score will be listed out with the two sequences written one above the other. See figure 14.5.\par
|
|
\pard\plain \li2320\ri2720\sl220\box\brsp100\brdrth \f4\fs16 List matching segments\par
\pard \li2320\ri2720\sl220\box\brsp100\brdrth ? Odd window length (1-401) (11) =\par
? Proportional score (1-567) (252) =\par
Working\par
62\par
GLRRGLDVKDLEHPIEVPVGK\par
DLAEGMKVKCTGRILEVPVGR\par
81\par
63\par
LRRGLDVKDLEHPIEVPVGKA\par
LAEGMKVKCTGRILEVPVGRG\par
82\par
65\par
RGLDVKDLEHPIEVPVGKATL\par
EGMKVKCTGRILEVPVGRGLL\par
84\par
66\par
GLDVKDLEHPIEVPVGKATLG\par
GMKVKCTGRILEVPVGRGLLG\par
85\par
67\par
LDVKDLEHPIEVPVGKATLGR\par
MKVKCTGRILEVPVGRGLLGR\par
\pard \li2320\ri2720\sl220\keepn\box\brsp100\brdrth 86\par
\pard\plain \s8\qj\fi-1140\li1140\sb60\sa400\sl240\tx1140 \f21\fs20 Figure 14.5\tab A typical run of "List matching segments".\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.5\tab Calculating the expected scores for the proportional algorithm\par
\pard\plain \s4\qj\sa120\sl280 \f20 This function calculates the probability of achieving each possible score using the proportional algorithm. Hence it provides a method of setting cutoff scores and assessing the statistical significance of the scores found. The algorithm calculates the "Double matching probability" described by McLachlan (2), which is defined as the probability of finding the scores in two infinitely long sequences of the same composition as the pair being compared. It is very much faster than the alternative of repeatedly scrambling and recomparing the sequences. The program offers three ways for the user to see the results of the calculation\: the user can type a score and the program will display its probability; the user can type a probability and the program will display the corresponding score; or the program will list the full range of scores and probabilities.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate expected scores".\par
|
|
2.\tab Define "Odd window length".\par
|
|
\tab The calculation takes a noticeable time.\par
|
|
3.\tab Select "List scores and probabilities".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Number of steps between scores". This allows, say, every fifth score to be listed if the user defines the number of steps to be 5. The list will appear as in figure 14.6.\par
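\pard\plain \s4\qj\sa120\sl280 \f20 The idea of deriving score probabilities from sequence composition, rather than by scrambling, can be illustrated with the simplified Python sketch below. It is not necessarily the exact calculation McLachlan describes. It assumes that each window position is independent, that scores are non-negative integers, and that the score function is supplied by the caller; all function names are hypothetical. The distribution of a single residue-pair score is built from the two compositions and then convolved with itself once for each position in the window.\par

```python
from collections import Counter


def pair_score_distribution(seq_h, seq_v, score, max_score):
    """Probability of each single-pair score, given the composition
    of the two sequences (index = score)."""
    dist = [0.0] * (max_score + 1)
    freq_h = Counter(seq_h)
    freq_v = Counter(seq_v)
    for a, count_a in freq_h.items():
        for b, count_b in freq_v.items():
            p = (count_a / len(seq_h)) * (count_b / len(seq_v))
            dist[score(a, b)] += p
    return dist


def window_score_distribution(pair_dist, window):
    """Convolve the pair distribution with itself window times to get
    the distribution of the summed window score."""
    dist = [1.0]
    for _ in range(window):
        new = [0.0] * (len(dist) + len(pair_dist) - 1)
        for s, p in enumerate(dist):
            for t, q in enumerate(pair_dist):
                new[s + t] += p * q
        dist = new
    return dist


def tail_probability(dist, cutoff):
    """Probability of a window score of at least cutoff."""
    return sum(dist[cutoff:])


if __name__ == "__main__":
    pair = pair_score_distribution("AB", "AB", lambda a, b: int(a == b), 1)
    print(tail_probability(window_score_distribution(pair, 3), 2))
```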
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.6\tab Calculating the observed scores for the proportional algorithm\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
This function applies the proportional algorithm, but instead of producing a dot matrix it accumulates the scores and their frequencies of occurrence. It provides a method of setting cutoff scores and assessing the statistical significance of the scores found. The program offers three ways for the user to see the results of the calculation\: the user can type a score and the program will display its frequency; the user can type a frequency and the program will display the corresponding score; or the program will list the full range of scores and frequencies. The frequencies are expressed as percentages.\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Calculate observed scores".\par
|
|
2.\tab Define "Odd window length".\par
|
|
\tab The calculation takes a noticeable time.\par
|
|
\pard\plain \li1320\ri1300\sl220\box\brsp100\brdrth \f4\fs16 Calculate expected proportional scores\par
|
|
\pard \li1320\ri1300\sl220\box\brsp100\brdrth ? Odd window length (1-401) (21) =\par
|
|
Working\par
|
|
Average score= 196.99062\par
|
|
Select probability display mode\par
|
|
1 Show probability for a score\par
|
|
X 2 Show score for a probability\par
|
|
3 List scores and probabilities\par
|
|
? Selection (1-3) (2) =3\par
|
|
? Number of steps between scores (1-10) (5) =\par
|
|
\par
|
|
5 0.10000E+01 200 0.40004E+00 395 0.00000E+00\par
|
|
10 0.10000E+01 205 0.24037E+00 400 0.00000E+00\par
|
|
15 0.10000E+01 210 0.12555E+00 405 0.00000E+00\par
|
|
20 0.10000E+01 215 0.56905E-01 410 0.00000E+00\par
|
|
25 0.10000E+01 220 0.22402E-01 415 0.00000E+00\par
|
|
30 0.10000E+01 225 0.76821E-02 420 0.00000E+00\par
|
|
35 0.10000E+01 230 0.23031E-02 425 0.00000E+00\par
|
|
40 0.10000E+01 235 0.60614E-03 430 0.00000E+00\par
|
|
45 0.10000E+01 240 0.14064E-03 435 0.00000E+00\par
|
|
50 0.10000E+01 245 0.28888E-04 440 0.00000E+00\par
|
|
55 0.10000E+01 250 0.52741E-05 445 0.00000E+00\par
|
|
60 0.10000E+01 255 0.85917E-06 450 0.00000E+00\par
|
|
65 0.10000E+01 260 0.12534E-06 455 0.00000E+00\par
|
|
70 0.10000E+01 265 0.16433E-07 460 0.00000E+00\par
|
|
75 0.10000E+01 270 0.19425E-08 465 0.00000E+00\par
|
|
80 0.10000E+01 275 0.20772E-09 470 0.00000E+00\par
|
|
85 0.10000E+01 280 0.20155E-10 475 0.00000E+00\par
|
|
90 0.10000E+01 285 0.17801E-11 480 0.00000E+00\par
|
|
95 0.10000E+01 290 0.14353E-12 485 0.00000E+00\par
|
|
100 0.10000E+01 295 0.10599E-13 490 0.00000E+00\par
|
|
105 0.10000E+01 300 0.71886E-15 495 0.00000E+00\par
|
|
110 0.10000E+01 305 0.44920E-16 500 0.00000E+00\par
|
|
115 0.10000E+01 310 0.25938E-17 505 0.00000E+00\par
|
|
\pard \li1320\ri1300\sl220\keepn\box\brsp100\brdrth 120 0.10000E+01 315 0.13881E-18 510 0.00000E+00\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa500\sl240\tx1140 \f21\fs20 Figure 14.6\tab A typical run of "Calculate expected proportional scores". The scores are listed in three columns alongside their probabilities; for example, a score of 250 has a probability of 0.527x10
|
|
{\up6 -5}{\plain \b\f20 .}{\up6 \par
|
|
}\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 3.\tab Select "List scores and percentages".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Number of steps between scores". This allows, say, every fifth score to be listed if the user defines the number of steps to be 5. The list will appear as in figure 14.7.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \par
|
|
\pard\plain \li1980\ri2060\sb400\sl220\box\brsp100\brdrth \f4\fs16 Calculate observed proportional scores\par
\pard \li1980\ri2060\sl220\box\brsp100\brdrth ? Odd window length (1-401) (21) =\par
Working\par
Maximum observed score is 285\par
Select score display mode\par
X 1 Show percentage reaching a score\par
2 Show score for a percentage\par
3 List scores and percentages\par
? Selection (1-3) (1) =3\par
? Number of steps between scores (1-10) (5) =\par
156 236949 0.99998E+02\par
161 236938 0.99993E+02\par
166 236792 0.99932E+02\par
171 235882 0.99548E+02\par
176 232582 0.98155E+02\par
181 222875 0.94058E+02\par
186 203232 0.85769E+02\par
191 171507 0.72380E+02\par
196 131216 0.55376E+02\par
201 89194 0.37642E+02\par
206 52791 0.22279E+02\par
211 27315 0.11528E+02\par
216 12117 0.51137E+01\par
221 4890 0.20637E+01\par
226 1774 0.74867E+00\par
231 656 0.27685E+00\par
236 263 0.11099E+00\par
241 111 0.46845E-01\par
246 66 0.27854E-01\par
251 36 0.15193E-01\par
256 23 0.97065E-02\par
261 16 0.67524E-02\par
266 15 0.63303E-02\par
271 10 0.42202E-02\par
276 6 0.25321E-02\par
\pard \li1980\ri2060\sl220\box\brsp100\brdrth 281 2 0.84405E-03\par
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa400\sl240\tx1140 \f21\fs20 Figure 14.7\tab A typical run of "Calculate observed scores." The scores are followed by their observed number of occurrences expressed both absolutely and as a percentage of the total number of points.\par
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.7\tab Producing an optimal alignment\par
\pard\plain \s7\qj\sa120\sl280\tx0 \f20 This function produces an optimal alignment for any segments of the two sequences using the algorithm of Myers and Miller (5). It guarantees to produce alignments with the optimum score, given a score matrix, a "gap start penalty" and a "gap extension penalty". That is, starting a gap costs a fixed penalty F and each residue added to the gap costs a further penalty E, so for a gap of length K residues the penalty is F + KE. Gaps at the ends of sequences incur no penalty. The size of the segments of sequence that can be aligned at once is limited to 5000 characters. The user can select the start and end of the segments by use of the crosshair simply by clicking on any dot matrix plot. After the alignment has been produced, the user can elect to have it replace the original sequence segments. By alternate use of dot matrix plotting and alignment, very long sequences can be aligned.\par
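\pard\plain \s4\qj\sa120\sl280 \f20 The gap penalty rule stated above, F + KE for a gap of K residues with no charge for gaps at the ends, can be checked with a small Python sketch. The function names are hypothetical, and the use of the "-" character for padding follows the alignment listing in figure 14.8; the penalty values in the example are arbitrary.\par

```python
def gap_penalty(length, start_penalty, extension_penalty):
    """Penalty F + K*E for one gap of K residues; zero for no gap."""
    if length <= 0:
        return 0
    return start_penalty + length * extension_penalty


def total_gap_penalty(padded, start_penalty, extension_penalty):
    """Sum the penalties of all internal gaps in a padded sequence.
    Gaps at either end are stripped first, so they cost nothing."""
    total = 0
    run = 0
    for symbol in padded.strip("-"):
        if symbol == "-":
            run += 1
        else:
            total += gap_penalty(run, start_penalty, extension_penalty)
            run = 0
    return total


if __name__ == "__main__":
    # One internal gap of two residues with F = 10 and E = 2.
    print(total_gap_penalty("MA--TGKIVQ", 10, 2))
```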
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select "Align sequences". The crosshair will appear in the graphics window. \par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 2.\tab Position the crosshair on the bottom left of the segment to be aligned and hit the space bar on the keyboard. The bell will ring.\par
|
|
3.\tab Position the crosshair on the top right of the segment to be aligned and hit the space bar on the keyboard. The bell will ring.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Define "Penalty for starting each gap".\par
|
|
5.\tab Define "penalty for each residue in gap".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 \tab A noticeable time will elapse before the alignment is displayed on the screen. A typical alignment is shown in figure 14.8.\par
|
|
6.\tab Reject "Keep alignment". If the alignment is "kept" the padded sequences from the alignment will replace the original sequences in the active region.\par
|
|
\pard\plain \li480\ri540\sl220\box\brdrth \f4\fs16 Align the sequences\par
|
|
\pard \li480\ri540\sl220\box\brdrth Aligning region 1 to 461\par
|
|
with region 1 to 514\tab \tab Working\par
|
|
V 1 11 21 31 41 51\par
|
|
MA--TGKIVQ VIGA------ VVDVEFPQDA VPRVYDALEV QNG------N ERLVL-----\par
|
|
* * * ** * * * * *\par
|
|
MQLNSTEISE LIKQRIAQFN VVSEAHNEGT IVSVSDGVIR IHGLADCMQG EMISLPGNRY\par
|
|
H 1 11 21 31 41 51\par
|
|
V 61 71 81 91 101 111\par
|
|
EVQQQLGGGI VRTIAMGSSD GLRRGLDVKD LEHPIEVPVG KATLGRIMNV LGEPVDMKGE\par
|
|
* * ** * * ** ***** *** * ** * * **\par
|
|
AIALNLERDS VGAVVMGPYA DLAEGMKVKC TGRILEVPVG RGLLGRVVNT LGAPIDGKGP\par
|
|
H 61 71 81 91 101 111\par
|
|
V 121 131 141 151 161 171\par
|
|
IGEEERWAIH RAAPSYEELS NSQELLETGI KVIDLMCPFA KGGKVGLFGG AGVGKTVNMM\par
|
|
* ** * ** * * * * * * ***\par
|
|
LDHDGFSAVE AIAPGVIERQ SVDQPVQTGY KAVDSMIPIG RGQRELIIGD RQTGKTALAI\par
|
|
H 121 131 141 151 161 171\par
|
|
V 181 191 201 211 221 231\par
|
|
ELIRNIAIEH SGYS-VFAGV GERTREGNDF YHEMTDSNVI DKVSLVYGQM NEPPGNRLRV\par
|
|
* * ** * * *\par
|
|
DAI--INQRD SGIKCIYVAI GQKASTISNV VRKLEEHGAL ANTIVVVATA SESAALQYLA\par
|
|
H 181 191 201 211 221 231\par
|
|
V 241 251 261 271 281 291\par
|
|
ALTGLTMAEK FRDEGRDVLL FVDNIYRYTL AGTEVSALLG RMPSAVGYQP TLAEEMGVLQ\par
|
|
* * *** * * * * * * ** * * *\par
|
|
RMPVALMGEY FRDRGEDALI IYDDLSKQAV AYRQISLLLR RPPGREAFPG DVFYLHSRLL\par
|
|
H 241 251 261 271 281 291\par
|
|
V 301 311 321 331 341 351\par
|
|
ERITST---- ---------- -KTGSITSVQ AVYVPADDLT DPSPATTFAH LDATVVLSRQ\par
|
|
** **** * * * * * *\par
|
|
ERAARVNAEY VEAFTKGEVK GKTGSLTALP IIETQAGDVS AFVPTNVISI TDGQIFLETN\par
|
|
H 301 311 321 331 341 351\par
|
|
V 361 371 381 391 401 411\par
|
|
IASLGIYPAV DPLDSTSRQL DPLVVGQEHY DTAR----GV QSILQRYQEL KDIIAILGMD\par
|
|
** *** * * ** * * * * * **\par
|
|
LFNAGIRPAV NPGISVSR-- ---VGGAAQT KIMKKLSGGI RTALAQYREL AAFSQFAS--\par
|
|
H 361 371 381 391 401 411\par
|
|
V 421 431 441 451 461 471\par
|
|
ELSEEDKLVV ARARKIQRFL SQ----PFFV AE----VFTG SPGKYVSLKD --TIRGFKGI\par
|
|
* * * * * * * * * * * *\par
|
|
DLDDATRKQL DHGQKVTELL KQKQYAPMSV AQQSLVLFAA ERG-YLADVE LSKIGSFEAA\par
|
|
H 421 431 441 451 461 471\par
|
|
V 481 491 501 511 521\par
|
|
MEG--EYDHL P-EQAFYMVG SIEEAVE--- --------KA KKL*\par
|
|
** * * * * *\par
|
|
LLAYVDRDHA PLMQEINQTG GYNDEIEGKL KGILDSFKAT QSW*\par
|
|
H 481 491 501 511 521\par
|
|
Conservation 22.5%\par
|
|
\pard \li480\ri540\sl220\keepn\box\brdrth Number of padding characters inserted 63 and 10\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb60\sa300\sl240\tx1140 \f21\fs20 Figure 14.8\tab A typical output from "Align the sequences". The horizontal and vertical sequences are labelled H and V.\par
|
|
\pard\plain \s6\sb240\sa60\sl280\tx560\tx860 \b\f20 2.8\tab Comparing a sequence against a library of sequences\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20
|
|
The program SIPL compares a probe sequence against a whole library of sequences. The searches are very fast. They use the "Quick scan" algorithm described above to produce a list of matching sequences sorted in score order. Optionally, optimal alignments are then produced using the algorithm of Myers and Miller (5). The program will either search the whole of a library or restrict its search using a list of entry names. The list can be treated either as the sequences to search or, conversely, as the sequences to exclude from the search.\par
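\pard\plain \s4\qj\sa120\sl280 \f20 The three ways of using a list of entry names can be pictured with a short Python sketch. It is only an illustration and not part of SIPL. The function name, the mode names and the case-insensitive comparison of entry names are all assumptions made for the example.\par
\pard\plain \li220\ri240\sl220 \f4\fs16

```python
def select_entries(library_names, listed_names, mode):
    """Choose which library entries to search.

    mode is "whole", "only" or "all_but" (hypothetical names for the
    three alternatives: whole library, only the list, all but the list).
    """
    # Assumption: entry names are compared without regard to case.
    listed = set(name.upper() for name in listed_names)
    if mode == "whole":
        return list(library_names)
    if mode == "only":
        return [n for n in library_names if n.upper() in listed]
    if mode == "all_but":
        return [n for n in library_names if n.upper() not in listed]
    raise ValueError("unknown mode: " + mode)


library = ["BACR$HALHA", "BACA$HALSA", "BACH$NATPH"]
print(select_entries(library, ["bacr$halha"], "all_but"))
```

\par
\pard\plain \s4\qj\sa120\sl280 \f20 Excluding the probe's own entry in this way keeps a trivial self-match out of the list of results.\par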
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Select SIPL.\par
|
|
2.\tab Select "Personal file".\par
|
|
3.\tab Select "Format".\par
|
|
4.\tab Define "Name of sequence file". This is the name of the file containing the probe sequence.\par
|
|
5.\tab Define "Name of results file".\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 6.\tab Accept "Display alignments". The alternative will stop after producing a list of the best matching sequences.\par
|
|
7.\tab Define "Minimum library sequence length". This permits the search to skip sequences that are too short to be of interest.\par
|
|
8.\tab Define "Maximum number of scores to list". The maximum number of sequences that will be included in the results file.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 9.\tab
|
|
Define "Identity score". This is the minimum number of consecutive identical sequence characters that will be counted as a match. Only matches of at least this length are included in the overall score. For proteins, maximum sensitivity is gained using a value of 1, but for nucleic acids values of 4 or 6 are necessary to achieve reasonable speed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 10.\tab Define "Number of sd above mean". This is the number of standard deviations above the mean that a diagonal must score in order to be rescanned using the proportional algorithm.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 11.\tab Define "Odd window length". This is the window size for the rescanning of high scoring diagonals using the proportional algorithm.\par
|
|
12.\tab Define "Proportional score". The score used by the proportional algorithm. It depends on the window length and the score matrix.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 13.\tab Define "Minimum global score". This is a cutoff on the total score achieved using the proportional algorithm when all the diagonals scoring at least the defined number of standard deviations above the mean are rescanned.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 14.\tab Define "Penalty for starting a gap". This is for the alignment algorithm.\par
|
|
15.\tab Define "Penalty for each residue in gap". See above.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 16.\tab Select a library to search. The default library reflects the composition of the probe sequence; that is, a probe sequence that is less than 85% a, c, g and t is guessed to be a protein.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 17.\tab Select "Search whole library". The alternatives allow the search to be restricted using a list of entry names.\par
|
|
\pard\plain \s4\qj\sa120\sl280 \f20 The search will then start. A large number of parameters are requested, but for normal use the default values can be accepted for all of them. A worked example is shown in figure 14.9.\par
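\pard\plain \s4\qj\sa120\sl280 \f20 The roles of "Identity score" and "Number of sd above mean" (steps 9 and 10) can be illustrated with a minimal Python sketch. It is not SIPL's code and makes several assumptions: matches are exact words of k characters, each match adds 1 to the score of its diagonal, diagonals with no match count as zero, and the population standard deviation is used. All function and parameter names are hypothetical.\par
\pard\plain \li220\ri240\sl220 \f4\fs16

```python
from collections import Counter
from statistics import mean, pstdev


def diagonal_scores(probe, target, k):
    """Count exact k-character matches on each diagonal (i - j)."""
    # Index every k-character word of the target by its start position.
    positions = dict()
    for j in range(len(target) - k + 1):
        positions.setdefault(target[j:j + k], []).append(j)
    scores = Counter()
    for i in range(len(probe) - k + 1):
        for j in positions.get(probe[i:i + k], []):
            scores[i - j] += 1
    return scores


def diagonals_to_rescan(scores, n_diagonals, sd_above_mean):
    """Return the diagonals scoring at least mean + sd_above_mean * sd."""
    # Assumption: diagonals without any match contribute a score of zero.
    values = list(scores.values()) + [0] * (n_diagonals - len(scores))
    cutoff = mean(values) + sd_above_mean * pstdev(values)
    return sorted(d for d, s in scores.items() if s >= cutoff)


probe, target, k = "ACDEFGHIK", "XXACDEFGYY", 2
scores = diagonal_scores(probe, target, k)
# Number of diagonals available to words of length k.
n_diagonals = (len(probe) - k + 1) + (len(target) - k + 1) - 1
print(diagonals_to_rescan(scores, n_diagonals, 3.0))
```

\par
\pard\plain \s4\qj\sa120\sl280 \f20 In this picture a larger value of k gives fewer word matches to examine, which is consistent with the advice to use 4 or 6 for nucleic acids. Only the diagonals returned by the second function would go on to the slower rescanning step.\par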
|
|
\pard\plain \li220\ri240\sl220\brdrt\brsp100\brdrth \brdrl\brsp100\brdrth \brdrr\brsp100\brdrth \f4\fs16 SIPL (Similarity investigation program (Library)) V3.0 June 1991\par
|
|
\pard \li220\ri240\sl220\brdrl\brsp100\brdrth \brdrr\brsp100\brdrth Author\: Rodger Staden\par
|
|
Compares a probe protein or nucleic acid\par
|
|
sequence against a library of sequences\par
|
|
\par
|
|
Select probe sequence\par
|
|
Select sequence source\par
|
|
X 1 Personal file \par
|
|
2 Sequence library\par
|
|
? Selection (1-2) (1) =2\par
|
|
Select a library\par
|
|
1 EMBL nucleotide library \par
|
|
X 2 SWISSPROT protein library\par
|
|
3 PIR protein library \par
|
|
? Selection (1-3) (2) =\par
|
|
Library is in EMBL format with indexes\par
|
|
Select a task\par
|
|
X 1 Get a sequence \par
|
|
2 Get annotations \par
|
|
3 Get entry names from accession numbers \par
|
|
4 Search titles for keywords \par
|
|
5 Search keyword index for keywords \par
|
|
? Selection (1-5) (1) =\par
|
|
? Entry name=bacr$halha\par
|
|
DE BACTERIORHODOPSIN PRECURSOR (BR) (GENE NAME\: BOP). \par
|
|
Sequence length= 262\par
|
|
Sequence composition\par
|
|
A C S T P A G N D E Q B Z H\par
|
|
N 0. 14. 19. 12. 30. 26. 3. 10. 11. 4. 0. 0. 0.\par
|
|
% 0.0 5.3 7.3 4.6 11.5 9.9 1.1 3.8 4.2 1.5 0.0 0.0 0.0\par
|
|
W 0. 1219. 1921. 1165. 2132. 1483. 342. 1151. 1420. 513. 0. 0. 0.\par
|
|
\par
|
|
A R K M I L V F Y W - X ? \par
|
|
N 7. 7. 10. 15. 39. 23. 13. 11. 8. 0. 0. 0. 0.\par
|
|
% 2.7 2.7 3.8 5.7 14.9 8.8 5.0 4.2 3.1 0.0 0.0 0.0 0.0\par
|
|
W 1093. 897. 1312. 1697. 4413. 2280. 1913. 1795. 1490. 0. 0. 0. 0.\par
|
|
Total molecular weight= 28256.254\par
|
|
? Results file=sipl.res\par
|
|
? Display alignments (y/n) (y) =\par
|
|
? Minimum library sequence length (10-20000) (209) =\par
|
|
? Maximum number of scores to list (1-10000) (20) =10\par
|
|
? Identity score (1-3) (1) =\par
|
|
? Number of sd above mean (0.00-10.00) (3.00) =\par
|
|
? Odd window length (1-31) (11) =\par
|
|
? Proportional score (1-297) (132) =\par
|
|
? Minimum global score (1-69168) (1729) =\par
|
|
? Penalty for starting a gap (1-100) (10) =\par
|
|
? Penalty for each residue in gap (1-100) (10) =\par
|
|
Select a library\par
|
|
1 EMBL nucleotide library \par
|
|
X 2 SWISSPROT protein library\par
|
|
3 PIR protein library \par
|
|
4 Personal file in PIR format \par
|
|
? Selection (1-4) (2) =\par
|
|
Library is in EMBL format with indexes\par
|
|
Select a task\par
|
|
X 1 Search whole library \par
|
|
2 Search only a list of entries \par
|
|
3 Search all but a list of entries \par
|
|
? Selection (1-3) (1) =3\par
|
|
? File of entry names=skip.nam\par
|
|
21794 entries processed, 25 above cutoff, sorting now\par
|
|
Entries exceeding sd cutoff= 4439\par
|
|
Mean number of diagonals above span cutoff 1.32012\par
|
|
List in score order\par
|
|
31007 BACA$HALSA DE ARCHAERHODOPSIN PRECURSOR (AR). \par
|
|
12177 BACH$NATPH DE HALORHODOPSIN PRECURSOR (HR) (GENE NAME\: HOP). \par
|
|
10999 BACH$HALSP DE HALORHODOPSIN PRECURSOR (HR) (GENE NAME\: HOP). \par
|
|
3999 HYAC$ECOLI DE HYPOTHETICAL 27.6 KD PROTEIN IN HYAB 3'REGION (GENE NAM\par
|
|
2670 OPS4$DROME DE OPSIN RH4 (INNER R7 PHOTORECEPTOR CELLS OPSIN) (GENE NA\par
|
|
2573 PYR1$MESAU DE CAD PROTEIN (CONTAINS\: GLUTAMINE-DEPENDENT CARBAMOYL-PH\par
|
|
2328 PFLA$ECOLI DE PYRUVATE FORMATE-LYASE ACTIVATING ENZYME. \par
|
|
2194 DCOP$CANAL DE OROTIDINE 5'-PHOSPHATE DECARBOXYLASE (EC 4.1.1.23) (OMP\par
|
|
2145 BCM1$HUMAN DE LYMPHOCYTE ACTIVATION MARKER BLAST-1 PRECURSOR (BCM1 SU\par
|
|
2103 LAG3$HUMAN DE LAG-3 PROTEIN PRECURSOR (FDC PROTEIN) (GENE NAME\: LAG3 \par
|
|
BACA$HALSA DE ARCHAERHODOPSIN PRECURSOR (AR). \par
|
|
V 1 11 21 31 41 51\par
|
|
MLELLPTAVE GVSQAQITGR PEWIWLALGT ALMGLGTLYF LVKGMGVSDP DAKKFYAITT\par
|
|
* ** ** ** ** ** ** ** *** ** * * * ** \par
|
|
M-DPIALTAA VGADLLGDGR PETLWLGIGT LLMLIGTFYF IVKGWGVTDK EAREYYSITI\par
|
|
H 1 11 21 31 41 51\par
|
|
V 61 71 81 91 101 111\par
|
|
LVPAIAFTMY LSMLLGYGLT MVPFGGEQNP IYWARYADWL FTTPLLLLDL ALLVDADQGT\par
|
|
*** ** * *** * *** * * * ** ******* ********** *** * \par
|
|
LVPGIASAAY LSMFFGIGLT EVQVGSEMLD IYYARYADWL FTTPLLLLDL ALLAKVDRVS\par
|
|
H 61 71 81 91 101 111\par
|
|
V 121 131 141 151 161 171\par
|
|
ILALVGADGI MIGTGLVGAL TKVYSYRFVW WAISTAAMLY ILYVLFFGFT SKAESMRPEV\par
|
|
* *** * ** ******* * * * ** * ** * * ***\par
|
|
IGTLVGVDAL MIVTGLVGAL SHTPLARYTW WLFSTICMIV VLYFLATSLR AAAKERGPEV\par
|
|
H 121 131 141 151 161 171\par
|
|
V 181 191 201 211 221 231\par
|
|
ASTFKVLRNV TVVLWSAYPV VWLIGSEGAG IVPLNIETLL FMVLDVSAKV GFGLILLRSR\par
|
|
**** * *** *** * ** **** * * ***** ****** *** *** ******\par
|
|
ASTFNTLTAL VLVLWTAYPI LWIIGTEGAG VVGLGIETLL FMVLDVTAKV GFGFILLRSR\par
|
|
H 181 191 201 211 221 231\par
|
|
V 241 251 261\par
|
|
AIFGEAEAPE PSAGDGAAAT SD\par
|
|
** * **** **** * *\par
|
|
AILGDTEAPE PSAG-AEASA AD\par
|
|
H 241 251 261\par
|
|
Conservation 56.1%\par
|
|
\pard \li220\ri240\sl220\keepn\brdrl\brsp100\brdrth \brdrb\brsp100\brdrth \brdrr\brsp100\brdrth Number of padding characters inserted 0 and 2\par
|
|
\pard\plain \s8\qj\fi-1140\li1140\sb120\sa120\sl240\tx1140 \f21\fs20 Figure 14.9\tab A run of SIPL using an entry from a sequence library and a file of entries to be excluded from the search.\par
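\pard\plain \s4\qj\sa120\sl280 \f20 The rule used in step 16 to choose the default library, under which a probe that is less than 85% a, c, g and t is guessed to be a protein, can be written as a short Python sketch. The function name is hypothetical, and the treatment of case and of non-alphabetic characters is an assumption; the program's own handling is not described here.\par
\pard\plain \li220\ri240\sl220 \f4\fs16

```python
def guess_sequence_type(sequence, threshold=0.85):
    """Guess the sequence type from the fraction of a, c, g and t."""
    # Assumption: case is ignored and only alphabetic characters count.
    letters = [c for c in sequence.lower() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in "acgt" for c in letters) / len(letters)
    # Less than 85% acgt is guessed to be a protein.
    return "nucleic acid" if fraction >= threshold else "protein"


# The first 20 residues of the archaerhodopsin sequence in figure 14.9.
print(guess_sequence_type("MLELLPTAVEGVSQAQITGR"))
```

\par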
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 3.\tab Notes\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab
|
|
The variants on the proportional algorithm are selected by setting parameters using a special menu. This includes the facility to switch off the main diagonal for all options, which is useful when comparing a sequence against itself.\par
|
|
2.\tab For nucleotide sequences the program also has a function to complement a sequence. If the sequence on one axis is the complement of that on the other, the plots will show possible base pairing.\par
|
|
3.\tab When the cross hair is being employed, in addition to the standard special keys, the letter m will produce a display showing all the identical sequence characters around the cross hair position. The display is in the form of a matrix.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab
|
|
Users should not be misled by the "Quick scan" algorithm. Its function is to perform rapid comparisons. The plots it produces may look quite striking because they contain almost no background; however, such plots say nothing about the significance of the similarities displayed.\par
|
|
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 5.\tab By using the "Reposition plots" function, users can display several dot matrix plots on the screen at the same time. In this way plots from comparisons of several pairs of sequences can be viewed together.\par
|
|
6.\tab The library search program SIPL is of limited use for searching the nucleic acid libraries because it does not deal properly with sequences longer than 20,000 characters, but simply truncates them.\par
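\pard\plain \s4\qj\sa120\sl280 \f20 Note 2 can be illustrated with a minimal Python sketch. It is not the program's code. It assumes that complementing means a base-by-base exchange of a with t and c with g without reversing the sequence, which the note does not state, and it only considers a-t and c-g pairs. The function names are hypothetical.\par
\pard\plain \li220\ri240\sl220 \f4\fs16

```python
# Translation table exchanging a with t and c with g (both cases).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def complement(sequence):
    """Return the base-by-base complement; other characters are unchanged."""
    return sequence.translate(COMPLEMENT)


def pairing_dots(horizontal, vertical):
    """List (i, j) where horizontal[i] could base pair with vertical[j]."""
    # An identity between one sequence and the complement of the other
    # marks a position where the two original bases could pair.
    comp = complement(vertical)
    return [(i, j)
            for i, h in enumerate(horizontal)
            for j, v in enumerate(comp)
            if h == v]


print(pairing_dots("ac", "tg"))
```

\par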
|
|
\pard\plain \s5\sb400\sa60\sl320\tx560 \b\f20\fs28 4.\tab References\par
|
|
\pard\plain \s7\qj\fi-560\li560\sa120\sl280\tx560 \f20 1.\tab Staden, R. 1982. An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences. {\i Nucl. Acids Res.} {\b 10(9)}\:2951-2961.\par
2.\tab McLachlan, A.D. 1971. Test for comparing related amino acid sequences. {\i J. Mol. Biol.} {\b 61}\:409-424.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 3.\tab Schwartz, R.M. and Dayhoff, M.O. 1978. Matrices for detecting distant relationships. (in) {\i Atlas of Protein Sequence and Structure,} {\b 5 suppl. 3}\:353-358, Nat. Biomed. Res. Found., Washington D.C.\par
\pard \s7\qj\fi-560\li560\sa120\sl280\tx560 4.\tab Lipman, D.J. and Pearson, W.R. 1985. Rapid and sensitive protein similarity searches. {\i Science} {\b 227}\:1435-1441.\par
5.\tab Myers, E.W. and Miller, W. 1988. Optimal alignments in linear space. {\i Comput. Applic. Biosci.} {\b 4}\:11-17.\par
|
|
}
|