AgamP3.4

AgamP3.4 was released at VectorBase in July 2007.

The same annotation is visible at Ensembl from release 45.

Changes from previous gene set AgamP3.3

Significant differences from the previous annotation include:

  • Many more manually-appraised gene models, including most models on chromosome arm 2L.
  • Better identification of repeats (especially transposons) leading to a reduction in models that may be transposon-derived.
  • Improved handling of community-provided annotation.
  • Use of selected VectorBase transcript models from Aedes aegypti as an additional evidence source.
  • Improvements to protein-based models due to better parameterization of GeneWise.

Details of genebuild

The AgamP3.4 gene annotation was prepared by combining sets of transcript models made by different approaches.

  1. Manually-curated models (including alternative transcripts).
  2. Models built with GeneWise using Anopheles proteins (from public databases or contributed directly by Anopheles researchers) were given EST-based extensions, and merged to give a non-redundant set (allowing alternative transcripts).
  3. Models built with GeneWise using arthropod proteins (primarily Drosophila (FlyBase 4.3 set) and Aedes (AaegL1.1) entries that have EST support, plus selected Arthropoda entries from UniProt) were given EST-based extensions where possible, and merged to give a non-redundant set (not allowing alternative transcripts).
  4. EST-based models built solely from A. gambiae ESTs using the ClusterMerge algorithm.
  5. Protein-based models built with GeneWise using other Metazoa entries from UniProt were given EST-based extensions (rarely possible), and merged to give a non-redundant set (not allowing alternative transcripts).
  6. SNAP ab initio predictions that have an identifiable Pfam domain but do not overlap with repeats.

The final gene set was produced by the progressive addition of models from the different approaches, in order of priority. First, transcripts from Sets 1 and 2 were combined, giving priority to manually-curated models in cases of conflict. Set 3 genes were then added, but only where there was no overlap with a Set 1 or Set 2 model. Genes from Sets 4, 5 and 6 were similarly added in turn, only where there was no overlap with a higher-priority model.
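The progressive addition described above can be sketched in a few lines of Python. This is only an illustration, not the actual Ensembl/VectorBase genebuild code: the Model class, the overlaps and build_gene_set functions, and the example coordinates are all hypothetical. It also assumes that "conflict" and "overlap" both mean a coordinate overlap on the same sequence region, ignoring strand, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Model:
    """Hypothetical minimal gene model: a named interval on a sequence region."""
    name: str
    seq_region: str  # e.g. chromosome arm "2L"
    start: int       # 1-based, inclusive
    end: int         # inclusive


def overlaps(a: Model, b: Model) -> bool:
    """Assumption: overlap means shared coordinates on the same region (strand ignored)."""
    return a.seq_region == b.seq_region and a.start <= b.end and b.start <= a.end


def build_gene_set(models_by_set: Dict[int, List[Model]]) -> List[Model]:
    """Add models set by set (1 = highest priority, 6 = lowest).

    A model is kept only if it overlaps nothing accepted from a
    higher-priority set. Models within the same set never exclude each
    other here, so alternative transcripts from one set can coexist.
    """
    accepted: List[Model] = []
    for set_id in sorted(models_by_set):
        higher_priority = list(accepted)  # snapshot taken before adding this set
        for model in models_by_set[set_id]:
            if not any(overlaps(model, kept) for kept in higher_priority):
                accepted.append(model)
    return accepted


if __name__ == "__main__":
    # Made-up coordinates, purely for illustration.
    example = {
        1: [Model("manual_A", "2L", 100, 500)],
        2: [Model("anopheles_B", "2L", 400, 900),     # conflicts with manual_A
            Model("anopheles_C", "2L", 1000, 1500)],
        3: [Model("arthropod_D", "2L", 1400, 2000),   # overlaps anopheles_C
            Model("arthropod_E", "3R", 100, 500)],
        6: [Model("snap_F", "3R", 450, 800),          # overlaps arthropod_E
            Model("snap_G", "X", 1, 50)],
    }
    for model in build_gene_set(example):
        print(model.name)
```

Taking the snapshot of higher-priority models before each set is processed is a design choice of this sketch: it keeps models from the same set from excluding one another, so only models from earlier sets can block a new one.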

In addition, tRNA genes were predicted using the program tRNAscan-SE, and a small number of miRNA genes were predicted by homology with miRBase entries.

References:

GeneWise and its use within the Ensembl system for gene model annotation.

  • E. Birney et al., Genome Res. 2004 14:988-95
  • V. Curwen et al., Genome Res. 2004 14:942-50

EST alignment to genomes using Exonerate.

  • G. Slater et al., BMC Bioinformatics. 2005 6:31

ClusterMerge algorithm.

  • E. Eyras et al., Genome Res. 2004 14:976-87

Ab initio gene finding by SNAP.

  • I. Korf, BMC Bioinformatics. 2004 5:59

tRNAscan-SE.

Genes

  Total     Protein-coding   Other
  12,945    12,457           488

Transcripts

  Total     Protein-coding   Other
  13,621    13,133           488

Release date: 7 Jan 2007
Assembly: AgamP3

As described in Holt et al. (2002), plasmid and BAC DNA libraries were constructed with stringently size-selected PEST strain DNA. Two BAC libraries were constructed, one (ND-TAM) using DNA from whole adult male and female mosquitoes and the other (ND-1) using DNA from ovaries of PEST females collected about 24 hours after the blood meal. Plasmid libraries containing inserts of 2.5, 10 and 50 kb were constructed with DNA derived from either 330 male or 430 female mosquitoes.