Download SPTrEMBL from the EBI's FTP site. The database is built weekly, so this gives an up-to-date copy.
Extract the Brassica records from the flat files (brass_embl.pl):
```
% zcat sptrembl/sprot.dat.Z | ~/perl/brass_embl/brass_embl.pl
% mv brass_embl.dat Brassica_SwissPROT.dat
% zcat sptrembl/trembl.dat.Z | ~/perl/brass_embl/brass_embl.pl
% mv brass_embl.dat Brassica_TrEMBL_.dat
% zcat sptrembl/trembl_new.dat.Z | ~/perl/brass_embl/brass_embl.pl
% mv brass_embl.dat Brassica_TrEMBL_new.dat
```
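The source of brass_embl.pl is not shown here. As a rough illustration of this kind of filter, the Python sketch below splits a SwissPROT/TrEMBL flat file into records (each record ends with a `//` line) and keeps a record when one of its OS (organism species) lines names Brassica. That selection rule is an assumption; the real script may select differently, and the table below distinguishes "selected" Brassica peptides from all of them.

```python
import sys
from typing import Iterable, Iterator, List


def split_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each flat-file record as a list of lines; a '//' line ends a record."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if line.startswith("//"):
            yield record
            record = []


def is_brassica(record: List[str]) -> bool:
    """ASSUMED selection rule: an OS (organism species) line mentions Brassica."""
    return any(line.startswith("OS   ") and "Brassica" in line for line in record)


def filter_brassica(lines: Iterable[str]) -> Iterator[str]:
    """Pass through only the lines of records that satisfy is_brassica()."""
    for record in split_records(lines):
        if is_brassica(record):
            yield from record


if __name__ == "__main__" and not sys.stdin.isatty():
    # Act as a filter only when input is piped in, e.g.
    #   zcat sptrembl/sprot.dat.Z | python filter_brassica.py > Brassica_SwissPROT.dat
    sys.stdout.writelines(filter_brassica(sys.stdin))
```

Matching on the OS line rather than searching the OC lineage for the bare string "Brassica" matters: the lineage of Arabidopsis entries includes Brassicales and Brassicaceae, so a plain substring test on OC would pull those in as well.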
| | SwissPROT | TrEMBL | TrEMBL new | Total |
|---|---|---|---|---|
| Selected Brassica peptides | 162 | 479 | 186 | 827 |
| Brassica peptides | 216 | 552 | 212 | 980 |
| Total records | 84,893 | 235,493 | 117,496 | 437,882 |
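Because every flat-file record ends with a `//` terminator line, a record count such as the "Total records" row can be reproduced by counting those lines. A minimal Python sketch (the shell equivalent is `zcat sptrembl/trembl.dat.Z | grep -c '^//'`):

```python
import sys
from typing import Iterable


def count_records(lines: Iterable[str]) -> int:
    """Count flat-file records: each record ends with a '//' terminator line."""
    return sum(1 for line in lines if line.startswith("//"))


if __name__ == "__main__" and not sys.stdin.isatty():
    # e.g. zcat sptrembl/trembl.dat.Z | python count_records.py
    print(count_records(sys.stdin))
```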
Run the SwissPROT parsing script (sp2ace_m.pl) over the extracted files to generate the ace file.
```
% ~/perl/seq2ace/sp2ace/sp2ace_m.pl *.dat
% mv sp_seq.ace SPTrEMBL.ace
```
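sp2ace_m.pl is not reproduced here either. The sketch below only shows the general shape of such a conversion: pull the primary accession (first entry on the first AC line), the DE description and the sequence (the lines after SQ) from one record, then write a Protein object and a Peptide object in .ace form, with a blank line ending each object. The tag names `Title` and `Peptide` inside the Protein object are hypothetical and would have to match BrassicaDB's models.

```python
from typing import Dict, List


def parse_record(record: List[str]) -> Dict[str, str]:
    """Extract primary accession, description and sequence from one flat-file record."""
    accession = ""
    title: List[str] = []
    seq: List[str] = []
    in_sequence = False
    for line in record:
        code = line[:2]
        if code == "AC" and not accession:
            # "AC   P12345; Q99999;": the first accession is the primary one
            accession = line[5:].split(";")[0].strip()
        elif code == "DE":
            title.append(line[5:].strip())
        elif code == "SQ":
            in_sequence = True  # sequence lines follow until '//'
        elif line.startswith("//"):
            break
        elif in_sequence:
            seq.append("".join(line.split()))  # drop spaces between residue blocks
    return {"accession": accession, "title": " ".join(title), "sequence": "".join(seq)}


def ace_quote(text: str) -> str:
    """Quote a string for .ace, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ace(entry: Dict[str, str]) -> str:
    """Emit a Protein object and its Peptide object; the Protein tags are HYPOTHETICAL."""
    acc = ace_quote(entry["accession"])
    return (
        f"Protein : {acc}\n"
        f"Title {ace_quote(entry['title'])}\n"  # hypothetical tag name
        f"Peptide {acc}\n"                      # hypothetical tag name
        "\n"
        f"Peptide : {acc}\n"
        f"{entry['sequence']}\n"
        "\n"
    )
```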
Parse the ace file into a scratch database and compare the number of objects in each class with the production database.
| | BrassicaDB | Scratch database |
|---|---|---|
| Peptide | 758 | 823 |
| Protein | 758 (1,176) | 823 |
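Before parsing, the number of objects to expect in the scratch database can be taken from the ace file itself. This Python sketch counts distinct object names per class by matching `Class : "name"` header lines; it counts names rather than header lines because repeated headers for the same name merge into a single object when the file is parsed. It assumes headers always start at the beginning of a line, as in the files written by the step above.

```python
import re
import sys
from collections import defaultdict
from typing import Dict, Iterable, Set

# An .ace object header: 'Class : "name"' at the start of a line.
HEADER = re.compile(r'^(\w+)\s*:\s*"((?:[^"\\]|\\.)*)"')


def count_objects(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct object names per class in .ace text."""
    names: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = HEADER.match(line)
        if match:
            names[match.group(1)].add(match.group(2))
    return {cls: len(found) for cls, found in names.items()}


if __name__ == "__main__" and not sys.stdin.isatty():
    # e.g. python count_ace.py < SPTrEMBL.ace
    for cls, n in sorted(count_objects(sys.stdin).items()):
        print(f"{cls}\t{n}")
```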
The peptides from TrEMBL new could be a problem, since their accessions will change when they are added to TrEMBL. This is fine if the proteins are re-acquired for every build, but not so good if incremental builds are preferred.
For now we will go with a re-acquisition strategy, but this may change later.
So strip the peptide data out of the current database and load the new data.
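The trade-off between the two strategies can be made concrete with a small Python sketch. An incremental build only needs the difference between the old and new accession sets, but a TrEMBL new entry whose accession changes shows up as one deletion plus one addition even though it is the same protein; a full re-acquisition simply deletes every current name. The deletions can be written as an ace file using the ACeDB `-D Class : "name"` form, which should be checked against the local ACeDB version. The accession values in the test and the choice of classes are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def incremental_changes(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_delete, to_add) needed to move a database from `old` to `new`."""
    return old - new, new - old


def delete_ace(objects: Dict[str, Iterable[str]]) -> str:
    """Build .ace text that deletes whole objects, one '-D' header per object."""
    out = []
    for cls in sorted(objects):
        for name in sorted(objects[cls]):
            out.append(f'-D {cls} : "{name}"\n\n')
    return "".join(out)
```

For the re-acquisition strategy, passing every current Peptide and Protein name to `delete_ace` and parsing the result would clear the old data before the new ace file is loaded.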
Also generate a BLAST database of SPTrEMBL to use for analysis until the next rebuild.
```
% zcat *.dat.Z | /usr/local/blast2/sp2fasta -g - > SPTrEMBL
% /usr/local/blast2/setdb -t SPTrEMBL SPTrEMBL
```
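sp2fasta turns the flat files into FASTA before the BLAST database is built. Its exact header layout (and what `-g` does) is not described here, so the Python sketch below uses a hypothetical `>accession|entry_name description` header. It takes the entry name from the ID line, the primary accession from the first AC line, the description from the DE lines and the sequence from the lines after SQ, wrapping residues at 60 per line.

```python
import sys
from typing import Iterable, Iterator, List


def split_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each flat-file record as a list of lines; a '//' line ends a record."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if line.startswith("//"):
            yield record
            record = []


def to_fasta(record: List[str], width: int = 60) -> str:
    """Convert one record to FASTA; the header layout is a HYPOTHETICAL choice."""
    entry_name = accession = ""
    desc: List[str] = []
    seq: List[str] = []
    in_sequence = False
    for line in record:
        code = line[:2]
        if code == "ID":
            entry_name = line[5:].split()[0]
        elif code == "AC" and not accession:
            accession = line[5:].split(";")[0].strip()
        elif code == "DE":
            desc.append(line[5:].strip())
        elif code == "SQ":
            in_sequence = True
        elif line.startswith("//"):
            break
        elif in_sequence:
            seq.append("".join(line.split()))
    sequence = "".join(seq)
    body = "\n".join(sequence[i:i + width] for i in range(0, len(sequence), width))
    return f">{accession}|{entry_name} {' '.join(desc)}\n{body}\n"


if __name__ == "__main__" and not sys.stdin.isatty():
    # e.g. zcat *.dat.Z | python sp_to_fasta.py > SPTrEMBL
    sys.stdout.writelines(to_fasta(r) for r in split_records(sys.stdin))
```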
Last modified: Wed Jun 19 15:18:24 MET DST