gde_linux/CORE/xylem/xylem.doc



          XYLEM.DOC                                          update 10 Aug 1994

                   XYLEM:  TOOLS FOR MANIPULATION OF GENETIC DATABASES
                    Brian Fristensky, University of Manitoba

       Fristensky, B. (1993) Feature expressions: creating and manipulating
       sequence datasets. Nucleic Acids Research 21:5997-6003.

          SPLITDB - Splits  files containing one or more GenBank entries into
          annotation, sequence, and index files.  Indexfiles can also serve as
          namefiles for GETLOC. Sequence files are in the format required for
          use with the Pearson programs (FASTA,LFASTA etc.).

          GETLOC  - Reads a file containing LOCUS names (namefile)  and
          retrieves either annotation, sequence, or both from a split
          database or database subset created by SPLITDB.

          FETCH - A c-shell script that provides a convenient menu-driven
          front end for retrieval of database entries using GETLOC.

          FINDKEY - A c-shell script that provides a convenient menu-driven
          front end for keyword searches of database annotation files,
          using IDENTIFY.

          IDENTIFY- Given line-numbered output from grep, IDENTIFY uses the
          index file to determine which entries contained the keywords
          searched for by grep. It  then produces a namefile for use by
          GETLOC.  Namefiles can serve as logical databases, and utilities
          such as the Unix comm command can perform logical operations on
          these namefiles to produce database subsets.

          FEATURES/GETOB - Given a namefile, pulls objects (mRNA, tRNA, CDS
          etc.) from each of the named entries, using the new
          DDBJ/EMBL/GenBank International Features Table Format.  A future
          version will also allow the annotation of sites within objects that
          are extracted.

          DBSTAT - Calculates amino acid frequencies in a protein database.

          RIBOSOME -  Given a file of one or more nucleic acids (eg. output
          from GETOB) , RIBOSOME translates them into protein, using either
          the universal genetic code or an alternative genetic code supplied
          by the user.  All ambiguities that can be resolved are translated.

          PROT2NUC - reverse translates a sequence from protein to nucleic
          acid, using IUPAC-IUB ambiguity codes.

          SHUFFLE - Given a random seed,  shuffles each sequence in a Pearson-
          format (.wrp) file. Shuffling is done locally in overlapping windows
          across the length of a given sequence.  The window size and overlap
          length can be specified by the user.

	  REFORM - Reformats multiply aligned nucleic acid or protein
	  sequences for publication.  Output for M. Waterman's RALIGN
	  program, or the MBCRR MASE editor,  can be directly used as input.
          A variety of options are available for representing gaps, consensus
          sequences and other features.

          Fristensky (Cornell) Sequence Analysis Package - General purpose
          sequence analysis package written in Standard Pascal. Features
          include: sequence numbering, formatting, & translation, restriction
          site searches & mapping, matrix similarity searches, TESTCODE
          analysis, base composition analysis. All programs are interactive
          and read free-format, BIONET, and GenBank files.


                         XYLEM DATABASE TOOLS


                              ----------
                             |   .gen   |         getloc
                             |----------|<--------------------------
                             | GenBank  |                          |
                              ----------                           |
                                  |                                |
                                  | splitgb                        |
                                 /|\                               |
                               /  |  \                             |
                             /    |    \                           |
                           /      |      \                         |
                         /        |        \                       |
                       /          |          \                     |
                      v           v           v                    |
                 ----------   ----------   ----------              |
                |  .ano    | |   .wrp   | |  .ind    |             |
                |----------| |----------| |----------|             |
                |annotation| | sequence | |   index  |             |
                 ----------   ----------   ----------              |
                 |    \           |           /                    |
                 |      \         |         /                      |
                 |        \       |       /                        |
                 |          \     |     /                          |
        grep -n  |            \   |   /                            |
                 |              \ | /                              |
                 |                |                                |
                 |                | -------------------------------+
                 |                ^                                |
                 v                |                         getob  |
           ----------             ----------                       v
          | .grep    | identify  |   .nam   |                   ----------
          |----------| --------->|----------|                  |   .wrp   |
          | numbered |           |  LOCUS   |                   ----------
          |file lines|            ----------                   | eg. mRNA |
           ----------             |        ^                   |     tRNA |
                                  |        |                   |     rRNA |
                                  |        |                   |     CDS  |
                                   --comm--                     ----------
                                     (logical operations on
                                      sets of names)

                         	Dr. Brian Fristensky
                         	Dept. of Plant Science
                         	University of Manitoba
                         	Winnipeg, MB R3T 2N2 CANADA
                         	204-474-6085
                         	frist@cc.umanitoba.ca