staden-lg/src/cop/COP.GUIDE

		  Checking Xdap Databases For Errors
			Using COP Version 1.1

			      Simon Dear
			    16 March 1992


0. Introduction

The program cop checks for editing errors in xdap project databases.
It uses a robust method that can detect insertions, deletions and
changes that have been inadvertently made. In later versions places
where there is reliant on traces of insufficient quality will be
detectable also.


1. Usage

The program allows the user to specify, the project name, the project
version, the consensus calculation cutoff percentage and a search path
for where traces are to be found:

	cop [-p project]
	    [-v version]
	    [-c consensus_cutoff_percentage]
	    [-r raw_data_search_path]
	    [-h]

An example: cop can be run on F59B2.??0 with the command:

	cop -p f59b2 -v 0 -r ~mmm/F59B2 -c 66

If the project and/or version are not specified, the user is prompted
for them. The default consensus cutoff percentage is 100%

If a trace file cannot be found in the current working directory and
the -r option is not used, the environment variable RAWDATA is used to
find the file.


2. How cop works

Cop works on a problem exclusion principle. It ignores problem areas
(places where there are insertions, deletions, changes, or where the
trace quality is poor) and concentrates on identifying places where
the coverage is good. It then reports regions where coverage is poor.
Unfortunately it isn't possible to provide explanations using this
approach.

The algorithm is as follows, and is performed on each contig.

a) The consensus for the contig is calculated and a "coverage"
array (to record areas of good coverage) is initialised.

b) Each gel reading in the contig is investigated. Information about
the trace file (its name, and size of cutoffs) is read from the
database. The trace file is read in.

c) The consensus of the region in which the gel reading lies is
aligned with the clipped trace sequence. If necessary, the consensus
is complemented. The alignment is performed using Myers and Miller's
algorithm [1], in the incarnation supplied in the fasta package.

d) A map is made relating the bases in the raw sequence and the bases
in the consensus. Places where trace quality is poor are removed from
this map.  For each region in the consensus where there is perfect
alignment (with no deletions, insertions, changes but are mapped) the
coverage array is updated.  Each entry in this array represents a pairs
of adjacent bases, and both must be adjacent in the alignment for the
entry to be marked as covered.

e) Once all the readings in the contig have been processed, all gaps
in the coverage are reported.


A. References

[1] Myers, E.W. and Miller, W. 1988. Optimal alignments in linear
space. CABIOS 4(1):11-17.