125 lines
6.4 KiB
Text
125 lines
6.4 KiB
Text
|
|
|
|
XYLEM.DOC update 10 Aug 1994
|
|
|
|
XYLEM: TOOLS FOR MANIPULATION OF GENETIC DATABASES
|
|
Brian Fristensky, University of Manitoba
|
|
|
|
Fristensky, B. (1993) Feature expressions: creating and manipulating
|
|
sequence datasets. Nucleic Acids Research 21:5997-6003.
|
|
|
|
SPLITDB - Splits files containing one or more GenBank entries into
|
|
annotation, sequence, and index files. Indexfiles can also serve as
|
|
namefiles for GETLOC. Sequence files are in the format required for
|
|
use with the Pearson programs (FASTA,LFASTA etc.).
|
|
|
|
GETLOC - Reads a file containing LOCUS names (namefile) and
|
|
retrieves either annotation, sequence, or both from a split
|
|
database or database subset created by SPLITDB.
|
|
|
|
FETCH - A c-shell script that provides a convenient menu-driven
|
|
front end for retrieval of database entries using GETLOC.
|
|
|
|
FINDKEY - A c-shell script that provides a convenient menu-driven
|
|
front end for keyword searches of database annotation files,
|
|
using IDENTIFY.
|
|
|
|
IDENTIFY- Given line-numbered output from grep, IDENTIFY uses the
|
|
index file to determine which entries contained the keywords
|
|
searched for by grep. It then produces a namefile for use by
|
|
GETLOC. Namefiles can serve as logical databases, and utilities
|
|
such as the Unix comm command can perform logical operations on
|
|
these namefiles to produce database subsets.
|
|
|
|
FEATURES/GETOB - Given a namefile, pulls objects (mRNA, tRNA, CDS
|
|
etc.) from each of the named entries, using the new
|
|
DDBJ/EMBL/GenBank International Features Table Format. A future
|
|
version will also allow the annotation of sites within objects that
|
|
are extracted.
|
|
|
|
DBSTAT - Calculates amino acid frequencies in a protein database.
|
|
|
|
RIBOSOME - Given a file of one or more nucleic acids (eg. output
|
|
from GETOB) , RIBOSOME translates them into protein, using either
|
|
the universal genetic code or an alternative genetic code supplied
|
|
by the user. All ambiguities that can be resolved are translated.
|
|
|
|
PROT2NUC - reverse translates a sequence from protein to nucleic
|
|
acid, using IUPAC-IUB ambiguity codes.
|
|
|
|
SHUFFLE - Given a random seed, shuffles each sequence in a Pearson-
|
|
format (.wrp) file. Shuffling is done locally in overlapping windows
|
|
across the length of a given sequence. The window size and overlap
|
|
length can be specified by the user.
|
|
|
|
REFORM - Reformats multiply aligned nucleic acid or protein
|
|
sequences for publication. Output for M. Waterman's RALIGN
|
|
program, or the MBCRR MASE editor, can be directly used as input.
|
|
A variety of options are available for representing gaps, consensus
|
|
sequences and other features.
|
|
|
|
Fristensky (Cornell) Sequence Analysis Package - General purpose
|
|
sequence analysis package written in Standard Pascal. Features
|
|
include: sequence numbering, formatting, & translation, restriction
|
|
site searches & mapping, matrix similarity searches, TESTCODE
|
|
analysis, base composition analysis. All programs are interactive
|
|
and read free-format, BIONET, and GenBank files.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
XYLEM DATABASE TOOLS
|
|
|
|
|
|
|
|
----------
|
|
| .gen | getloc
|
|
|----------|<--------------------------
|
|
| GenBank | |
|
|
---------- |
|
|
| |
|
|
| splitgb |
|
|
/|\ |
|
|
/ | \ |
|
|
/ | \ |
|
|
/ | \ |
|
|
/ | \ |
|
|
/ | \ |
|
|
v v v |
|
|
---------- ---------- ---------- |
|
|
| .ano | | .wrp | | .ind | |
|
|
|----------| |----------| |----------| |
|
|
|annotation| | sequence | | index | |
|
|
---------- ---------- ---------- |
|
|
| \ | / |
|
|
| \ | / |
|
|
| \ | / |
|
|
| \ | / |
|
|
grep -n | \ | / |
|
|
| \ | / |
|
|
| | |
|
|
| | -------------------------------+
|
|
| ^ |
|
|
v | getob |
|
|
---------- ---------- v
|
|
| .grep | identify | .nam | ----------
|
|
|----------| --------->|----------| | .wrp |
|
|
| numbered | | LOCUS | ----------
|
|
|file lines| ---------- | eg. mRNA |
|
|
---------- | ^ | tRNA |
|
|
| | | rRNA |
|
|
| | | CDS |
|
|
--comm-- ----------
|
|
(logical operations on
|
|
sets of names)
|
|
|
|
Dr. Brian Fristensky
|
|
Dept. of Plant Science
|
|
University of Manitoba
|
|
Winnipeg, MB R3T 2N2 CANADA
|
|
204-474-6085
|
|
frist@cc.umanitoba.ca
|
|
|