staden-lg/src/indexseqlibs/CHANGES

Wed Feb 17 11:30:28 GMT 1993
----------------------------
freetext.c
  PIR 35.0. Changes to format
	One field identifier has changed in the PIR-International
	databases. All "#Title" tags for submitted citations have been
	converted to the new tag "#Description" which will not be
	standardized. This information may be considered free text.
  Changed code to reflect this.

access4.c
  The record size stored in acnum.hit header was 18. It should be
  4.

piraccession.script
emblaccession.script
genbaccession.script
  The name of the accession number index files are now acnum.hit and
  acnum.trg.


Thu Jan 21 15:32:26 GMT 1993
----------------------------
genbentryname1.c
pirentryname1.c
  These programs now give the offset of the FIRST base in the
  sequence. The entryname index previously being created was not
  in accordance with the standard specification. This change
  corresponds to changes to programs in the Staden package,
  which are included in release 1993.0 of the package.


Thu Jan 21 15:29:56 GMT 1993
----------------------------
genbentryname1.c
  The sequence offsets created in the entryname index were
  calculated wrongly. With the use with the Staden package
  it caused the first line of the entry to be omitted.


genbaccession.script
genbauthor.script
genbdivision.script
genbentryname.script
genbfreetext.script
genbtitle.script
  Genbank has 13 divisions

division.c
genbdivision.script
pirdivision.script
  Routines and scripts to create division lookup files.


Thu Jul 16 17:27:43 BST 1992
----------------------------
freetext.c
  Look for words in "OG" (EMBL/SWISSPROT) and "GN" (SWISSPROT)
  lines.


Tue Jun 16 16:56:09 BST 1992
----------------------------

freetext4.c
hitNtrg.c
  Creation of author and freetext indexes was in error. Each
  occurrance of author/word in the final sorted list was being written
  to the target file, rather than just once as it should have been.
  This bug did not affect the functionality but only the performance
  of the Staden programs that use the indexes.


Wed May 20 10:43:56 BST 1992
----------------------------

title2.c
entryname2.c
  In the embl updates it is possible that an entry appears more
  than once. These programs have been modified so that they ignore
  all but the first occurrence of the entry name, so that the brief
  and entryname index have the correct number of entries. This is
  not a clean solution, as words, authors, and accession numbers
  for the more recent entry won't appear in the annotation of the
  entry.


Wed May 13 17:22:09 BST 1992
----------------------------

author.c
hitNtrg.c
emblauthor.script
pirauthor.script
genbauthor.script
swissauthor.script
  Programs and scripts to create the new author indexes have been
  written. They are based closely on the freetext index. The program
  hitNtrg.c is almost identical to freetext4.c but takes the string
  length to be written to the target file from the command line.
  It is possible to write the accession number creation routines
  in the same fashion.


Wed Apr  1 16:33:11 BST 1992
----------------------------

freetext4.c Version 1.1
  Words that were longer than target file field width were not being
  truncated, thus corrupting the index. Fixed.


embltitle1.c Version 1.1
pirtitle1.c Version 1.1
pirtitle2.c Version 1.1
genbtitle1.c Version 1.1
  From some sources, the sequence libraries end each line with a
  carriage return followed by a new line character. The programs
  were changed to filter out non-printable characters in the title
  lines.

Wed Apr  1 18:48:12 BST 1992
----------------------------

genbaccession.script Version 1.1
piraccession.script Version 1.1
  The second sort in these scripts was in error, causing the file
  access.sorted2 to in fact no be sorted on accession number.  The
  command "${SORT} +1 +0..." should have been "${SORT} -b +1...".


Wed Apr 22 1992
---------------

freetext.c Version 1.1
  The line offset for PIR should be 16 not 15. This would only affect
  libraries where the 10th character of the entry name is significant
  and excluding it would result in a different sort order.

author.c Version 1.0
  A new program for extracting author names from sequence libraries.
  We have yet to see the EMBL CR-ROM author indexes, so this program
  may change. No scripts written yet. Subsequence processing of output
  file will include:
	1) Sorting on entry name, removing duplicate entry-name/author
	entries. (sort -u ...)
	2) Assigning entry numbers, using freetext2.c
	3) Sorting on author name. (sort -b +1 ...)
	4) Creation of indexes with program similar to freetext4 (differing
	only by the fact that the target string will be a different size.)