Wed Feb 17 11:30:28 GMT 1993 ---------------------------- freetext.c PIR 35.0. Changes to format One field identifier has changed in the PIR-International databases. All "#Title" tags for submitted citations have been converted to the new tag "#Description" which will not be standardized. This information may be considered free text. Changed code to reflect this. access4.c The record size stored in acnum.hit header was 18. It should be 4. piraccession.script emblaccession.script genbaccession.script The name of the accession number index files are now acnum.hit and acnum.trg. Thu Jan 21 15:32:26 GMT 1993 ---------------------------- genbentryname1.c pirentryname1.c These programs now give the offset of the FIRST base in the sequence. The entryname index previously being created was not in accordance with the standard specification. This change corresponds to changes to programs in the Staden package, which are included in release 1993.0 of the package. Thu Jan 21 15:29:56 GMT 1993 ---------------------------- genbentryname1.c The sequence offsets created in the entryname index were calculated wrongly. With the use with the Staden package it caused the first line of the entry to be omitted. genbaccession.script genbauthor.script genbdivision.script genbentryname.script genbfreetext.script genbtitle.script Genbank has 13 divisions division.c genbdivision.script pirdivision.script Routines and scripts to create division lookup files. Thu Jul 16 17:27:43 BST 1992 ---------------------------- freetext.c Look for words in "OG" (EMBL/SWISSPROT) and "GN" (SWISSPROT) lines. Tue Jun 16 16:56:09 BST 1992 ---------------------------- freetext4.c hitNtrg.c Creation of author and freetext indexes was in error. Each occurrance of author/word in the final sorted list was being written to the target file, rather than just once as it should have been. This bug did not affect the functionality but only the performance of the Staden programs that use the indexes. Wed May 20 10:43:56 BST 1992 ---------------------------- title2.c entryname2.c In the embl updates it is possible that an entry appears more than once. These programs have been modified so that they ignore all but the first occurrence of the entry name, so that the brief and entryname index have the correct number of entries. This is not a clean solution, as words, authors, and accession numbers for the more recent entry won't appear in the annotation of the entry. Wed May 13 17:22:09 BST 1992 ---------------------------- author.c hitNtrg.c emblauthor.script pirauthor.script genbauthor.script swissauthor.script Programs and scripts to create the new author indexes have been written. They are based closely on the freetext index. The program hitNtrg.c is almost identical to freetext4.c but takes the string length to be written to the target file from the command line. It is possible to write the accession number creation routines in the same fashion. Wed Apr 1 16:33:11 BST 1992 ---------------------------- freetext4.c Version 1.1 Words that were longer than target file field width were not being truncated, thus corrupting the index. Fixed. embltitle1.c Version 1.1 pirtitle1.c Version 1.1 pirtitle2.c Version 1.1 genbtitle1.c Version 1.1 From some sources, the sequence libraries end each line with a carriage return followed by a new line character. The programs were changed to filter out non-printable characters in the title lines. Wed Apr 1 18:48:12 BST 1992 ---------------------------- genbaccession.script Version 1.1 piraccession.script Version 1.1 The second sort in these scripts was in error, causing the file access.sorted2 to in fact no be sorted on accession number. The command "${SORT} +1 +0..." should have been "${SORT} -b +1...". Wed Apr 22 1992 --------------- freetext.c Version 1.1 The line offset for PIR should be 16 not 15. This would only affect libraries where the 10th character of the entry name is significant and excluding it would result in a different sort order. author.c Version 1.0 A new program for extracting author names from sequence libraries. We have yet to see the EMBL CR-ROM author indexes, so this program may change. No scripts written yet. Subsequence processing of output file will include: 1) Sorting on entry name, removing duplicate entry-name/author entries. (sort -u ...) 2) Assigning entry numbers, using freetext2.c 3) Sorting on author name. (sort -b +1 ...) 4) Creation of indexes with program similar to freetext4 (differing only by the fact that the target string will be a different size.)