The following notes are specific to EMBL release 79, and supplement the general details of the process described in BrassicaDB Nucleotide Sequence Process.
Extracted Brassica accessions from the EMBL flat files
| | Brassica | EMBL |
|---|---|---|
| EST | 43,746 | 21,434,594 |
| HTG | 4 | 68,255 |
| GSS | 596,930 | 8,471,953 |
| Other genomic seqs | 2,680 | 4,476,642 |
| Total | 643,360 | 34,451,444 |
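The extraction step can be illustrated with a minimal Python sketch. It is not the script used for this release; it assumes the standard EMBL flat-file layout (two-letter line codes, `AC` for accessions, `OS` for organism, `//` as the entry terminator) and selects entries whose organism name begins with the genus. The function name `brassica_accessions` and the sample accessions are hypothetical.

```python
from typing import Iterable, Iterator


def brassica_accessions(lines: Iterable[str], genus: str = "Brassica") -> Iterator[str]:
    """Yield the primary accession of each entry whose OS line names the genus."""
    accession = None
    is_match = False
    for line in lines:
        code = line[:2]
        if code == "AC" and accession is None:
            # AC lines list accessions separated by ';'; the first one is primary.
            accession = line[5:].split(";")[0].strip()
        elif code == "OS":
            # The OS line carries the organism name, e.g. "Brassica napus (rape)".
            if line[5:].strip().split(" ")[0] == genus:
                is_match = True
        elif code == "//":
            # '//' terminates an entry: emit it if it matched, then reset state.
            if is_match and accession:
                yield accession
            accession = None
            is_match = False


if __name__ == "__main__":
    # Made-up entries, reduced to the lines the sketch actually reads.
    sample = [
        "AC   AB000001;",
        "OS   Brassica napus (rape)",
        "//",
        "AC   AB000002;",
        "OS   Arabidopsis thaliana (thale cress)",
        "//",
    ]
    print(list(brassica_accessions(sample)))
```

Matching on the first word of the `OS` line is a simplification; a production filter might instead test the `OC` lineage or a taxonomy identifier.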
Loading the generated ace files into an empty database yields ...
| Class | # objects | % change (cf r78) |
|---|---|---|
| DNA | 643,359 | +0.02% |
| Paper | 1,061 | +3.8% |
| Protein | 0 | |
| Sequence | 646,071 | +0.04% |
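The "% change" column is presumably the difference between the r79 and r78 object counts, expressed as a percentage of the r78 count. A small sketch of that arithmetic follows; the function names are hypothetical, and returning `None` when the previous count is zero is a choice made here, not something stated in these notes.

```python
from typing import Optional


def percent_change(current: int, previous: int) -> Optional[float]:
    """Return the change from previous to current as a percentage of previous.

    Returns None when previous is zero, since the ratio is undefined there.
    """
    if previous == 0:
        return None
    return (current - previous) * 100.0 / previous


def format_change(current: int, previous: int) -> str:
    """Format the change with an explicit sign, or an empty string if undefined."""
    change = percent_change(current, previous)
    return "" if change is None else f"{change:+.1f}%"
```

The r78 counts are not recorded in these notes, so the example is left without real inputs; calling `format_change(new_count, old_count)` for each class would produce one cell of the column.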
Parsed UniProt Release 1.11, released 7/6/2004 (results). Built the SPTrEMBL and EMBL BLAST databases for the BrassicaDB compute.
Building the database with EMBL r79, post-79 updates to today (14/06/04), the BBSRC SSR data, UniProt 1.11 and the BrassicaDB legacy dataset gives ...
| Class | # objects |
|---|---|
| Author | 10,002 |
| DNA | 643,765 |
| Gene_Product | 11,113 |
| Paper | 7,261 |
| Journal | 1,207 |
| Peptide | 1,876 |
| Protein | 1,876 |
| Sequence | 646,540 |
| Species | 28 |
Started the BrassicaDB BLAST compute at 11:35. This process is being hindered by a large increase in the number of EST query sequences and by hardware/software problems with our Linux cluster. For the present we are limiting the BLASTN analysis of the SSR flanking sequences to BrassicaDB rather than EMBL.
Completed the BLAST analysis at 4:20 pm.
Processed the BLAST output into ace files and applied them to the database. The Brassica GSS sequences had already been mapped to the TIGR v5 pseudomolecules as part of a separate development exercise. The output of that mapping is a GFF file used to drive GBrowse/MySQL. It was decided to recode the parsing of this file so that annotation is looked up from the TIGR v5 XML files as the reference source.
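As an illustration of the GFF parsing step, here is a minimal Python sketch that splits one GFF line into its tab-separated columns. It is not the recoded parser described above; it assumes the usual GFF column order (sequence, source, feature, start, end, score, strand, frame/phase, attributes) and leaves the attributes column as an unparsed string, since its layout differs between GFF versions. The names `GffFeature` and `parse_gff_line` are hypothetical, and the lookup against the TIGR v5 XML files is not shown.

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    seqid: str
    source: str
    feature_type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    phase: str
    attributes: str


def parse_gff_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF line; return None for blank lines and '#' comments."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 tab-separated fields, got {len(fields)}")
    seqid, source, feature_type, start, end, score, strand, phase = fields[:8]
    # The attributes column is optional in older GFF files.
    attributes = fields[8] if len(fields) > 8 else ""
    return GffFeature(
        seqid=seqid,
        source=source,
        feature_type=feature_type,
        start=int(start),
        end=int(end),
        # A '.' in the score column means "no score".
        score=None if score == "." else float(score),
        strand=strand,
        phase=phase,
        attributes=attributes,
    )
```

Each parsed feature's coordinates could then be used to find overlapping annotation in whichever reference source is chosen.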
Forged intra-database protein object links and the links to the (frozen) Mendel-GFDb and Mendel-ESTS databases.
Last modified: Thu Sep 16 22:06:57 BST 2004