SOP001
From VectorBase Help System
SOP definition
Gene accuracy, gene structure inferred from protein and transcript data.
Introduction
This document describes the VectorBase gene structure predictions, with particular reference to the lines of evidence used in prediction. It does not cover the downstream analyses that provide functional annotation or the assignment of gene symbols and descriptions.
Summary of product assigned to this SOP
A gene structure prediction which is associated with VB:SOP001 has complete transcript coverage of all coding exons including co-linearity across all introns. The evidence for this structure can be a full-length cDNA or a composite of EST sequences (i.e. the clustered consensus of the EST sequences is equivalent to a full-length cDNA). The prediction may contain both 5'-UTR and 3'-UTR but this is not a requirement.
Such gene structure predictions are viewed as completely reliable.
Note that this SOP is assigned to a transcript, not to a gene. Assigning this SOP to a transcript does not preclude the existence of alternative splice forms (isoforms) of the parent gene. Annotated alternative isoforms have independent SOP assignments.
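The coverage criteria in this summary can be sketched as a small check. This is a minimal illustration, not the VectorBase implementation: the function names, the interval representation (1-based inclusive genomic coordinates) and the idea of passing the evidence as a list of aligned blocks are all assumptions made for the example. The evidence may extend into UTRs, so the check only requires that every coding exon lies inside an aligned block and that every intron between coding exons is reproduced exactly by the evidence.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 1-based inclusive genomic coordinates.
Interval = Tuple[int, int]


def introns_of(blocks: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive blocks, i.e. the implied introns."""
    ordered = sorted(blocks)
    return [
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
    ]


def fully_supported(coding_exons: List[Interval],
                    evidence_blocks: List[Interval]) -> bool:
    """True if the evidence covers every coding exon and agrees on every intron."""
    # Coverage: each coding exon must lie inside some aligned evidence block.
    covered = all(
        any(b_start <= e_start and e_end <= b_end
            for b_start, b_end in evidence_blocks)
        for e_start, e_end in coding_exons
    )
    # Co-linearity: every intron of the model must also be an intron of the
    # evidence (the evidence may contain extra introns in the UTRs).
    colinear = set(introns_of(coding_exons)) <= set(introns_of(evidence_blocks))
    return covered and colinear


coding = [(100, 200), (300, 400), (500, 560)]
cdna = [(50, 200), (300, 400), (500, 700)]      # extends into both UTRs
shifted = [(50, 200), (300, 410), (500, 700)]   # disagrees on the second intron
print(fully_supported(coding, cdna), fully_supported(coding, shifted))
```

A composite of ESTs would first have to be merged into one set of blocks before it could be passed to such a check; that step is not shown here.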
Evidence for gene structure prediction
Transcript mapping to genome
Transcript data from the self-species were mapped to the genomic sequence using the exonerate program [1], at a stringency that allows mismatches within the (presumed) exonic alignments. The exonerate output was parsed to yield a set of alignment objects for later web display and a set of partial gene structure predictions corresponding to the transcript alignments (i.e. each mapped EST produces a gene structure).
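As an illustration of how one alignment can be turned into a partial gene structure, the sketch below parses a single line of exonerate's "vulgar" output into exon intervals on the target. It rests on several assumptions: the source does not say which exonerate output format was parsed, the function name is hypothetical, the field layout follows the vulgar format as described in the exonerate manual and should be checked against the installed version, and only forward-strand targets are handled. Coordinates are kept in exonerate's 0-based, half-open form.

```python
from typing import List, Tuple

# Labels that belong to an intron in vulgar output:
# 5' splice site, 3' splice site and intron body.
INTRON_LABELS = {"5", "3", "I"}


def exons_from_vulgar(line: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Return (target_id, exon intervals) for one forward-strand vulgar line."""
    fields = line.split()
    if len(fields) < 13 or fields[0] != "vulgar:":
        raise ValueError("not a vulgar line")
    target_id, target_start, target_strand = fields[5], int(fields[6]), fields[8]
    if target_strand != "+":
        raise ValueError("this sketch only handles forward-strand targets")

    ops = fields[10:]  # repeated (label, query_length, target_length) triples
    if len(ops) % 3 != 0:
        raise ValueError("malformed vulgar triples")

    exons: List[Tuple[int, int]] = []
    pos = target_start
    exon_start = None
    for i in range(0, len(ops), 3):
        label, target_len = ops[i], int(ops[i + 2])
        if label in INTRON_LABELS:
            # Entering (or continuing) an intron closes any open exon.
            if exon_start is not None:
                exons.append((exon_start, pos))
                exon_start = None
        elif exon_start is None:
            exon_start = pos
        pos += target_len
    if exon_start is not None:
        exons.append((exon_start, pos))
    return target_id, exons


# Hypothetical alignment of an EST with one intron.
example = "vulgar: est1 0 100 + contig1 1000 1500 + 450 M 50 50 5 0 2 I 0 396 3 0 2 M 50 50"
print(exons_from_vulgar(example))
```

Each EST parsed this way yields one exon list, which matches the statement that every mapped EST produces its own partial gene structure.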
EST GeneBuilder
The Ensembl ClusterMerge algorithm was used to collapse the EST/cDNA-based alignments into a non-redundant set of transcripts with complete open reading frames (n.b. complete here means that the length of each prediction is a multiple of three, not that any restriction is enforced on the first or last codon triplet).
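The notion of "complete" used here is easy to misread, so a small sketch may help. The code below is not ClusterMerge: the collapsing step only removes exact duplicate structures as a simple stand-in, and all names and the coordinate convention (1-based inclusive exons) are assumptions for the example. The completeness test checks only that the summed exon length is a multiple of three and deliberately does not inspect the first or last codon.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (start, end), 1-based inclusive.
Exon = Tuple[int, int]


def coding_length(exons: Sequence[Exon]) -> int:
    """Total number of bases across all exons of one prediction."""
    return sum(end - start + 1 for start, end in exons)


def is_complete(exons: Sequence[Exon]) -> bool:
    """Complete in the sense used above: length is a multiple of three.

    No start or stop codon is required.
    """
    length = coding_length(exons)
    return length > 0 and length % 3 == 0


def collapse_identical(structures: List[Sequence[Exon]]) -> List[Tuple[Exon, ...]]:
    """Drop exact duplicates (a stand-in for real clustering and merging)."""
    seen = set()
    unique = []
    for structure in structures:
        key = tuple(sorted(structure))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


structures = [
    [(1, 30), (101, 160)],   # 90 bases
    [(101, 160), (1, 30)],   # same structure, different exon order
    [(1, 31), (101, 160)],   # 91 bases
]
for structure in collapse_identical(structures):
    print(structure, is_complete(structure))
```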
Mark up in VectorBase gene build
- Any similarity build structure. Predictions come from the transcript-based similarity builds or the self-species similarity builds.
Methodology to assign these SOPs to the gene structures
- Transcripts have dna_supporting_evidence that covers all constituent exons.
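The assignment rule above amounts to a per-exon test, which might look like the following. This is a sketch under stated assumptions: the dictionary layout, the identifiers in the usage example and the function names are hypothetical and do not reflect the actual VectorBase or Ensembl data model. Only the rule itself (every constituent exon needs DNA supporting evidence) comes from the text.

```python
from typing import Dict, List, Optional

SOP_TAG = "VB:SOP001"

# Hypothetical layout: exon identifier -> names of supporting DNA alignments.
ExonEvidence = Dict[str, List[str]]


def qualifies_for_sop001(exon_evidence: ExonEvidence) -> bool:
    """True if the transcript has exons and every exon has some DNA evidence."""
    return bool(exon_evidence) and all(exon_evidence.values())


def assign_sops(transcripts: Dict[str, ExonEvidence]) -> Dict[str, Optional[str]]:
    """Map each transcript identifier to the SOP tag, or None if it does not qualify."""
    return {
        transcript_id: SOP_TAG if qualifies_for_sop001(evidence) else None
        for transcript_id, evidence in transcripts.items()
    }


# Made-up identifiers, for illustration only.
transcripts = {
    "transcript_A": {"exon_1": ["est_17"], "exon_2": ["est_17", "cdna_3"]},
    "transcript_B": {"exon_1": ["est_42"], "exon_2": []},
}
print(assign_sops(transcripts))
```

Because the tag is computed per transcript, alternative isoforms of the same gene receive independent assignments.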
References
- [1] G. Slater, E. Birney. BMC Bioinformatics (2005), 6:31
