\documentstyle[12pt]{article}

\title{A trace display and editing program for data from fluorescence based 
sequencing machines}
\author{Timothy Gleeson \and LaDeana Hillier}

\begin{document}
\maketitle
\section*{}
\subsection*{}
\subsubsection*{ABSTRACT}

``Ted'' ({\em T}race {\em ed}itor) 
is a graphical editor for sequence and trace data from automated 
fluorescence sequencing machines.  It provides facilities 
for viewing sequence and trace data (in top or bottom strand 
orientation), for editing the base sequence,  for 
automated or manual trimming of the head (vector) and tail 
(uncertain data) from the sequence, for vertical and horizontal trace 
scaling, for keeping a history of sequence editing, and for output of 
the edited sequence.  Ted has been used extensively in the C. 
elegans genome sequencing project,
both as a stand-alone program and integrated into 
the Staden sequence assembly package, and  has 
greatly aided in the efficiency 
and accuracy of sequence editing.  It runs in the X 
windows environment on Sun workstations and is available from the 
authors.  Ted currently supports sequence and trace data from the ABI 
373A and Pharmacia A.L.F. sequencers.

\subsubsection*{INTRODUCTION}
	Time involved in sequence editing is extensive, and anything easing 
that burden will improve the efficiency of any major sequencing 
project.  Having sequence and trace data available online in easily-
manipulable form is invaluable. Ted (a Trace-EDitor) was developed to 
fill this role in the C. elegans genome 
sequencing project [1]. 

\subsubsection*{METHODS}

{\em Computing Design and Implementation.}
When designing ted, we had a number of specific computing goals 
in mind including portability and adaptability.  For portability, we 
chose to write ted in ANSI C using the X windowing system and the 
Xaw toolkit.  X provides basic capabilities for the creation and use 
of windows, and the toolkit contains a number of pre-packaged 
components, such as the ``sliders'' used for scrolling. X also allows 
site, user and per-run defaults to be set.  Adaptability is also an 
important goal since we are providing a new function to 
research groups who are constantly adding new requirements.  

	Stylistically, we have followed an ``Abstract Data Type''
discipline.  In this discipline, a program is split into a number of 
modules which provide separate, well-defined functions.  We 
separate the interface of a module from its implementation.  For 
example, a unified internal sequence format is used.  This can store 
a varying amount of information.  However, there is a clear and 
simple interface by which the rest of the program accesses this 
module.  Such a style is not well supported by C, but its adoption has 
been very successful.  The addition of new sequencing machines, and 
thus new external data formats, may cause some changes in the 
internal representation of the sequence but should not affect  
the rest of the program.

	Ted accepts a large number of optional command line arguments,
many of which can also be specified as system defaults. This
supports a mode of working whereby ted is invoked not directly by the
user but instead by a script or another application which supplies
arguments appropriate to the editing task.


{\em Graphical Interface.}
Ted currently accepts data from two fluorescence based sequencing
machines, the Pharmacia A.L.F. and the ABI 373A.
The sequencing machine data consists of 
four traces of fluorescence levels together with the machine's 
interpretation, which is a sequence of bases.  
Ted displays 
the traces and the machine-generated base list.  
A second, initially identical, list of bases is provided for correction 
by the user.

	Ted has an X windows based 
graphical interface. The trace file
can either be input from the command line or by 
clicking on the INPUT button after the program has been invoked.  
Other parameters which the user may specify on the
command line include: the output 
file name; a base position or sequence string on which the trace is 
to be centered;  a default trace magnification;  a 5' vector sequence 
for automated elimination of the sequence head (vector); top or 
bottom strand orientation; or any of the usual X-window parameters (e.g. 
display, geometry...).

	The graphics display (Figure 1) consists of the control 
panel, the base position information, the original and edited sequence 
data, and the graphical representation of the trace.  The user may 
begin by using the control panel INPUT button to input a new trace 
file at which time the user selects whether to view the sequence
and trace in top or bottom strand orientation.
The trace file is displayed and, if a 5' vector sequence has been 
specified on the command line, the program attempts to select a 
cutoff point corresponding to the vector sequence at the ``head'' of the 
trace file.  The bases beyond the ``cutoff'' point are  
displayed on a shaded background.  The user may modify the cutoff 
position by clicking on the ``Adj left cut'' button and clicking on the 
position of the desired cutoff.  Similarly, the user may adjust the 
right cutoff of the sequence (chosen by starting at the 5' end of the 
sequence and looking for the first occurrence when 2 out of 5 bases 
are 'N') by scrolling along the sequence to that point, clicking on the 
``Adj right cut'' button, and clicking on the appropriate base.  
Automation of the ``cutoff'' process is optional; the user may compile 
the program with that feature turned ``off.'' 

	Clicking on the ``Edit seq'' button allows the user to enter the edit 
mode.  The ``Search'' button can be used to skip from ``problem'' to 
``problem'' (i.e., ambiguity to ambiguity) or to look for runs of 
identical bases (e.g., TTTT) which are often mis-called by
the machine software.

  Bases can be inserted, deleted, or replaced as with
any ordinary word-processor.  In difficult-to-read areas,  
the trace may be vertically or horizontally scaled by dragging or 
clicking on the magnification scroll bar or by clicking on the 
vertical scaling buttons (``Scale down'', ``Scale up''), respectively.  
Finally, the edited sequence is saved to an ascii file using the 
``Output'' button.  A history of the editing session can also be saved
along with the sequence. 
The ``Quit'' button is used 
to exit the program.  When reinvoking ted on an edited trace file the 
edited base sequence, rather than the original sequence, is shown in 
the edited base window.  The user may invoke ted by calling in any one 
of the previous editing sessions.   


\subsubsection*{APPLICATIONS AND CONCLUSIONS}

	In the C. elegans genome sequencing project, data from the ABI or 
A.L.F. sequencing machines' computers are transferred to Sun 
workstations.  
The user invokes a Unix shell script that calls ted systematically 
on each of the new set of trace files creating a set of sequence files.
The sequence files that are deemed to be of acceptable quality
are then entered into the sequence 
assembly program xdap [2] where the sequences are assembled into 
contigs.  Portions of the ted trace-editor have been incorporated 
into the xdap ``trace manager,''  which is used in 
conjunction with the contig editor to view sets of aligned traces 
at sites of discrepancies in the aligned sequences.  

	Ted is also used at the stage of choosing oligo primers for the 
``walking'' stage of the sequencing project.  It can be invoked directly 
from the oligo selection program, osp [3], to allow examination
of the trace data in the region of the primers so that  
integrity of the sequence data can be verified.

	Currently, no other programs are known to be available 
which support editing of the ABI trace data. 
Further, the modular design of the program should allow
support for new types of sequencing machines, with new data 
formats, to be implemented in a straightforward fashion.  


\subsubsection*{AVAILABILITY}
	Ted is freely available from the authors or from Rodger Staden and
Simon Dear (MRC Laboratory of Molecular Biology, Hills Road, Cambridge,
UK, CB2 2QH) for use on Sun workstations running X-windows (or OpenLook).


\subsubsection*{ACKNOWLEDGMENTS}
	The authors would like to thank all members of the C. elegans
sequencing project with special thanks to the following people:
John Sulston, Bob Waterston,  
Phil Green, Rick Wilson, Richard Durbin, Simon Dear, and Rodger Staden 
for their helpful suggestions for improvements in the ted interface 
and for their parts in the development of ted.  This work was 
supported by the Medical Research Council and NIH grant R01-HG00136.

\subsubsection*{REFERENCES}

1. Waterston, R., Sulston, J., et al. (1991), in preparation.

2. Dear, S. and Staden, R. (1991) Nuc. Acids Res.,  in press.

3. Hillier, L. and Green, P. (1991) submitted.


{\bf Figure 1 legend.}

Figure 1 shows a ``screen dump'' of the ted graphical interface.  
The display consists of
the control panel and the synchronized view of the base position
information, original and edited sequence data, 
and graphical representation of the trace (with each nucleotide's trace
 being represented
by a different color).  The control
panel allows the user to read in new trace files (in either
bottom or top strand orientation)
as well as to search for a string of nucleotides or a certain base position.
Scroll bars allow the user to adjust the magnification of or scroll through
the sequence and trace data.  The user may also choose to change the vertical
magnification of the trace data.  Further, sequence on the head (vector)
or tail (uncertain data) of the sequence may be ``cutoff'' 
using the adjust left and right cutoff buttons. Bases can be inserted, 
deleted, or replaced as with
any ordinary word-processor in the sequence data window. Finally, the
sequence may be written to an ascii file using the output button on
the control panel.

\end{document}