The first time a user visits, he or she is asked to complete a simple registration step. A validation code is sent to the e-mail address supplied and this is used to authenticate the user. A successful authentication is registered by means of a cookie so henceforward the user can revisit from that platform without further challenge.
Our experimental annotation pipeline takes as its input a large-scale DNA sequence in FASTA format. This is usually uploaded from the user's local filesystem, but a sequence can also be copied and pasted directly into the submission form. Several processing options are available, selected through tick boxes on the form.
Generate synteny report - overlapping chunks of the DNA sequence are used as BLASTN queries against the Arabidopsis v5/v6 genome sequence, and for each chunk the gene model with the best hit (currently taken from the v5 annotation) is reported. The output is tab-delimited, so it can be viewed in spreadsheet applications. A URL for downloading the file is sent to the user.
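The chunking step of the synteny report can be illustrated with a minimal sketch. The window size and step used by the pipeline are not stated here, so the `size` and `step` defaults below are assumptions chosen only for illustration, and `overlapping_chunks` is a hypothetical helper name rather than part of the pipeline.

```python
from typing import Iterator, Tuple


def overlapping_chunks(seq: str, size: int = 2000, step: int = 1000) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, chunk) windows of at most `size` bases.

    Consecutive windows start `step` bases apart, so they overlap by
    `size - step` bases. The defaults are illustrative assumptions.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if step > size:
        raise ValueError("step must not exceed size, or bases would be skipped")
    if not seq:
        return
    start = 0
    while True:
        yield start, seq[start:start + size]
        # Stop once the current window has reached the end of the sequence.
        if start + size >= len(seq):
            break
        start += step
```

Each chunk would then be written out as a separate BLASTN query, with the start offset kept so that hits can be mapped back to coordinates on the submitted sequence. The final chunk may be shorter than `size`.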
Scan with Brassica IGF probes - the sequence is searched via BLASTN with the set of 1300 Arabidopsis gene probes used in the Brassica IGF physical mapping project.
Scan with Brassica RFLP probes - the sequence is searched via BLASTN with genetically mapped RFLP markers.
Scan with Brassica BACends - the sequence is searched via BLASTN with the complete set of ~220k KBr BACends. Results are marked up to give information on overlaps and candidate extension clones.
Scan for microsatellites - the sequence is searched for SSRs with the msatfinder program, and Primer3 is used to generate PCR primers for candidate amplicons. The primers can then be searched directly for cross-hybridization against the entire reference sequence database using GBrowse's OligoFinder plugin.
Scan with Brassica EST assemblies - the sequence is searched via BLASTN with the 95k EST set developed in collaboration with JCVI. This comprises an oriented and annotated set of 42,642 assemblies and 51,916 singletons. The BLAST hits recovered can then be accurately re-aligned using the BLAT EST/genome alignment program.
Submit to gene prediction programs trained on Brassica - using both programs is recommended, together with the EST alignments:
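To show what an SSR scan looks for, here is a minimal regular-expression sketch that reports perfect di-, tri- and tetranucleotide repeats. It is not msatfinder's algorithm, and the minimum repeat counts in `min_repeats` are assumptions for illustration, not that program's settings; `find_ssrs` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional, Tuple


def find_ssrs(seq: str, min_repeats: Optional[Dict[int, int]] = None) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, repeat_count) for perfect tandem repeats.

    Coordinates are 0-based and half-open. `min_repeats` maps motif length
    to the minimum number of tandem copies (illustrative thresholds only).
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in min_repeats.items():
        # One copy of the motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. AA or ATAT), so each locus is reported once at its true period.
            if motif in (motif + motif)[1:-1]:
                continue
            copies = (match.end() - match.start()) // motif_len
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)
```

The flanking sequence around each reported locus is what a primer design step would work from; that step is not sketched here.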
- GlimmerHMM
- SNAP
- others may be added in due course
Individual exons of the gene models so identified are then used to search the Arabidopsis TIGR v5 proteome with BLASTX, and the best hits are reported.
Arabidopsis gene models - BLAT is used to attempt an accurate alignment of the spliced Arabidopsis gene sequences corresponding to these best hits with the Brassica genomic DNA. This is potentially useful for validating the ab initio gene model calls, particularly when combined with the Brassica EST evidence.
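Several of the options above report the best BLAST hit for each query. As a minimal sketch of that selection, the function below assumes the standard 12-column tabular BLAST output (`-outfmt 6` in BLAST+, `-m 8` in legacy blastall), in which the bit score is the last column. Which output format the pipeline uses internally is not stated here, and `best_hits` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, Tuple


def best_hits(tabular_text: str) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to (subject ID, bit score) of its highest-scoring hit.

    Assumes 12 tab-separated columns per line: query, subject, % identity,
    alignment length, mismatches, gap opens, query start/end,
    subject start/end, e-value, bit score.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        # Keep the first hit seen for a query unless a later one scores higher.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

Ranking by bit score is one reasonable choice; ranking by e-value would usually give the same best hit for a given query.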
When completed (a 200 kb BAC takes about 40 minutes to process when all the options are selected), the user is notified by e-mail and given the URL to the GBrowse view that will display the results. Some extra features have been added to the standard GBrowse display:
Mousing over a gene model opens a popup that details the Arabidopsis gene giving the best BLASTX hit, links to GO terms for that gene, and lets the user inspect a dynamic protein translation and alignment using ClustalW/Jalview.
Clicking on an EST alignment allows the user to launch a real-time ClustalW alignment with the submitted genome sequence and to inspect it graphically with the Jalview applet.
For help, comments and bug reports, please contact Martin Trick.