The UK Crop Plant Bioinformatics Network

BrassicaDB Peptide Process


Download SPTrEMBL from the EBI's FTP site. The database is built weekly, so this gives an up-to-date copy.

Extract the Brassica records from the flat files (brass_embl.pl):

% zcat sptrembl/sprot.dat.Z | ~/perl/brass_embl/brass_embl.pl
% mv brass_embl.dat Brassica_SwissPROT.dat
% zcat sptrembl/trembl.dat.Z | ~/perl/brass_embl/brass_embl.pl
% mv brass_embl.dat Brassica_TrEMBL_.dat
% zcat sptrembl/trembl_new.dat.Z | ~/perl/brass_embl/brass_embl.pl
% mv brass_embl.dat Brassica_TrEMBL_new.dat
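
The script brass_embl.pl is not reproduced in these notes. As a rough illustration of the kind of filtering this step involves, the Python sketch below splits a SwissPROT-format flat file into records at the `//` terminator lines and keeps those whose OS (organism species) line begins with "Brassica". The function names and the matching rule are assumptions made for illustration, not a description of the real script, and the sketch does not attempt whatever further rule separates "Selected Brassica peptides" from "Brassica peptides" in the counts.

```python
def iter_records(lines):
    """Yield each flat-file record as a list of lines, ending at its '//' line."""
    record = []
    for line in lines:
        record.append(line)
        if line.startswith("//"):
            yield record
            record = []


def is_brassica(record):
    """True if any OS line of the record starts with the genus name Brassica."""
    return any(
        line.startswith("OS   ") and line[5:].startswith("Brassica")
        for line in record
    )


def filter_brassica(lines):
    """Return (kept_records, total_record_count) for an iterable of lines."""
    kept = []
    total = 0
    for record in iter_records(lines):
        total += 1
        if is_brassica(record):
            kept.append(record)
    return kept, total
```

Because `filter_brassica` accepts any iterable of lines, it could be fed from `sys.stdin` in the same way the commands above pipe `zcat` output into the Perl script.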

                            SwissPROT    TrEMBL  TrEMBL new     Total
Selected Brassica peptides        162       479         186       827
Brassica peptides                 216       552         212       980
Total records                  84,893   235,493     117,496   437,882

Parse the resulting files with the SwissPROT parsing script (sp2ace_m.pl) to generate the ace files.

% ~/perl/seq2ace/sp2ace/sp2ace_m.pl *.dat
% mv sp_seq.ace SPTrEMBL.ace

Parse the ace files and load them into a scratch database. Compare the number of objects in each class with the production database (BrassicaDB).

            BrassicaDB   Scratch database
Peptide            758                823
Protein    758 (1,176)                823
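
The scratch-database side of a comparison like this can be cross-checked against the ace file itself. The Python sketch below counts distinct objects per class, assuming the common .ace layout in which objects are separated by blank lines and each begins with a header line such as `Peptide : "name"`. The exact header form written by sp2ace_m.pl is not shown in these notes, so the pattern used here is an assumption.

```python
import re
from collections import Counter

# Assumed header form: ClassName : "object name" (colon and quotes optional).
HEADER = re.compile(r'^(\w+)(?:\s*:\s*|\s+)"?([^"\n]+)"?')


def count_ace_objects(text):
    """Count distinct object names per class in .ace-format text."""
    seen = set()
    for paragraph in re.split(r"\n\s*\n", text):
        # Ignore blank lines and '//' comment lines inside a paragraph.
        lines = [
            line for line in paragraph.splitlines()
            if line.strip() and not line.startswith("//")
        ]
        if not lines:
            continue
        match = HEADER.match(lines[0])
        if match:
            seen.add((match.group(1), match.group(2).strip()))
    return Counter(class_name for class_name, _ in seen)
```

Calling `count_ace_objects(open("SPTrEMBL.ace").read())` would give per-class totals to set beside the figures reported by the database.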

The peptides from TrEMBL new could be a problem, since their accessions will change when they are added to TrEMBL. This is fine if the proteins are going to be re-acquired each time, but not so great if incremental builds are preferred.

For now we will go with a re-acquisition strategy, but this may change later.
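
If incremental builds were ever considered, one way to gauge how much the accessions move between releases would be to compare the accession sets of two extractions. The Python sketch below is only an illustration of that idea and is not part of the current process; the function names are hypothetical, and it takes the first entry on each record's first AC line as the primary accession.

```python
def accessions(lines):
    """Collect the primary accession of each record in a SwissPROT-format file."""
    found = set()
    expecting = True  # True until the current record's first AC line is seen
    for line in lines:
        if expecting and line.startswith("AC   "):
            found.add(line[5:].split(";")[0].strip())
            expecting = False
        elif line.startswith("//"):
            expecting = True
    return found


def compare_builds(old, new):
    """Summarise which accessions were dropped, added, or kept between builds."""
    return {
        "dropped": sorted(old - new),
        "added": sorted(new - old),
        "kept": sorted(old & new),
    }
```

A large "dropped" list for the TrEMBL new file between two downloads would support staying with full re-acquisition.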

So strip the peptide data out of the current database, and load up the new data.


Also generate a BLAST database of SPTrEMBL to use for analysis until the next rebuild.

% zcat *.dat.Z | /usr/local/blast2/sp2fasta -g - > SPTrEMBL
% /usr/local/blast2/setdb -t SPTrEMBL SPTrEMBL
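
For readers without sp2fasta to hand, the conversion it performs can be approximated as follows. This Python sketch reads SwissPROT-format lines and emits FASTA text, taking the entry name from the ID line, the first accession from the AC line, and the residues from the lines between SQ and `//`. The header layout (`>accession entry_name`) and the function name are assumptions, and may not match what sp2fasta actually writes.

```python
def to_fasta(lines, width=60):
    """Convert SwissPROT-format lines to FASTA text (header layout is assumed)."""
    out = []
    entry = acc = None
    seq = []
    in_seq = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("ID   "):
            entry = line[5:].split()[0]
        elif line.startswith("AC   ") and acc is None:
            acc = line[5:].split(";")[0].strip()
        elif line.startswith("SQ   "):
            in_seq = True  # sequence lines follow until the '//' terminator
        elif line.startswith("//"):
            residues = "".join(seq)
            out.append(f">{acc} {entry}")
            out.extend(
                residues[i:i + width] for i in range(0, len(residues), width)
            )
            entry = acc = None
            seq = []
            in_seq = False
        elif in_seq:
            # Sequence lines are written in space-separated blocks of residues.
            seq.append(line.replace(" ", ""))
    return "".join(row + "\n" for row in out)
```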