What data is available for download?


The data in VectorBase is available for download in a variety of formats.

Genomic sequences

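Fasta files of contigs, scaffolds, and (where they exist) chromosomes are available for all assembled genomes in VectorBase. All sequences are soft-masked: repeats are written in lowercase and all other bases in uppercase. The assemblies are described by AGP (v2.0) files, which record how contigs are ordered and oriented within scaffolds and chromosomes.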
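To illustrate what soft-masking and the AGP files encode, here is a minimal Python sketch. It assumes a downloaded, uncompressed Fasta file and AGP file read line by line. The function names (`read_fasta`, `masked_fraction`, `parse_agp`, `object_to_component`) are hypothetical helpers, not part of any VectorBase tool. The AGP column layout follows the AGP v2.0 specification, and gap rows (component types N and U) are skipped.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs; the identifier is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked, i.e. written in lowercase."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)


def parse_agp(lines: Iterable[str]) -> List[dict]:
    """Return the component rows of an AGP v2.0 file; gap rows (N, U) are skipped."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comment/header lines such as ##agp-version
        cols = line.rstrip("\n").split("\t")
        if cols[4] in ("N", "U"):
            continue  # gap rows carry gap length/type instead of a component
        rows.append({
            "object": cols[0],
            "object_beg": int(cols[1]),
            "object_end": int(cols[2]),
            "component_id": cols[5],
            "component_beg": int(cols[6]),
            "component_end": int(cols[7]),
            "orientation": cols[8],
        })
    return rows


def object_to_component(rows: List[dict], object_id: str,
                        pos: int) -> Optional[Tuple[str, int, str]]:
    """Map a 1-based position on a scaffold/chromosome to (component, position, strand)."""
    for r in rows:
        if r["object"] == object_id and r["object_beg"] <= pos <= r["object_end"]:
            offset = pos - r["object_beg"]
            if r["orientation"] == "-":
                # reverse-oriented component: count back from its end
                return r["component_id"], r["component_end"] - offset, "-"
            # '+', and the unknown orientations '?', '0' and 'na', are treated as forward here
            return r["component_id"], r["component_beg"] + offset, "+"
    return None  # the position falls in a gap or outside the object
```

For example, `masked_fraction` applied to each scaffold gives a rough estimate of its repeat content, and `object_to_component` converts a scaffold coordinate into the contig coordinate it was built from.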

Repeat data

Repeat features annotated on assembled genomes are available in GFF3 format. For many species the custom repeat library used with RepeatMasker is also available.
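The GFF3 repeat files can be summarized with a few lines of Python. This is a sketch assuming an uncompressed GFF3 file; `parse_gff3` and `repeat_bases_by_type` are hypothetical helpers. Because the exact feature types used for repeats are not stated here, the sketch totals annotated bases per feature type rather than filtering on one particular type. Overlapping features are counted more than once.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_gff3(lines):
    """Yield GFF3 feature lines as dicts; stops at a ##FASTA section if present."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # GFF3 percent-encodes reserved characters such as ';' and '='
                attrs[unquote(key)] = unquote(value)
        yield {
            "seqid": cols[0], "source": cols[1], "type": cols[2],
            "start": int(cols[3]), "end": int(cols[4]),
            "strand": cols[6], "attributes": attrs,
        }


def repeat_bases_by_type(features):
    """Sum feature lengths per feature type (GFF3 coordinates are 1-based, inclusive)."""
    totals = defaultdict(int)
    for f in features:
        totals[f["type"]] += f["end"] - f["start"] + 1
    return dict(totals)
```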

Gene annotations

Gene sets for assembled genomes are provided in GFF3 and GTF (v2.2) formats. The sequences of the transcripts and proteins are available as Fasta files.
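In GTF (v2.2), each line's ninth column holds `key "value";` pairs, with `gene_id` and `transcript_id` linking exons to their transcript and gene. The following minimal Python sketch, assuming an uncompressed GTF file, groups exon coordinates by transcript and computes spliced transcript lengths. The helper names are hypothetical, and the attribute parser is simplified: it does not handle semicolons inside quoted values.

```python
from collections import defaultdict


def parse_gtf_attributes(field):
    """Parse 'gene_id "G1"; transcript_id "T1";' into a dict (simplified parser)."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')  # quotes are optional for numeric values
    return attrs


def transcript_exons(lines):
    """Map transcript_id -> sorted list of (start, end) exon coordinates."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # keep only exon features (CDS, start_codon, etc. are skipped)
        attrs = parse_gtf_attributes(cols[8])
        exons[attrs["transcript_id"]].append((int(cols[3]), int(cols[4])))
    return {tid: sorted(coords) for tid, coords in exons.items()}


def spliced_length(exons):
    """Total length of a transcript's exons (1-based, inclusive coordinates)."""
    return sum(end - start + 1 for start, end in exons)
```

The spliced length computed this way should match the length of the corresponding sequence in the transcript Fasta file, which makes it a quick consistency check between the two downloads.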

Transcriptomes and proteomes

Transcriptomes and proteomes are available in Fasta format for species that do not have a genome assembly, and also for some species that do.

Projection data

When assemblies are updated, genes are projected from the old assembly to the new one. This process generates a range of files (in text, Fasta, and GFF3 formats), which are described in detail in another FAQ.

Microarray data

For Anopheles gambiae and Aedes aegypti, there are tab-delimited files of gene-averaged expression summary data. These files contain the per-gene p-values and text summaries that are displayed in the Expression Browser.

Ontologies

Ontologies used in VectorBase are provided in OBO format.
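An OBO file is a sequence of stanzas such as `[Term]`, each made of `tag: value` lines, with `is_a` lines linking a term to its parents. As a rough illustration, here is a minimal Python sketch, assuming an uncompressed OBO file, that reads the `[Term]` stanzas and walks `is_a` links to collect a term's ancestors. `parse_obo_terms` and `ancestors` are hypothetical helpers; the parser is simplified and ignores escapes, qualifiers, and relationship types other than `is_a`.

```python
def parse_obo_terms(lines):
    """Return {term_id: {"name": ..., "is_a": [...]}} for the [Term] stanzas of an OBO file."""
    terms = {}
    kind, stanza = None, None  # stanza type and its tag -> [values] mapping

    def flush():
        # store the stanza just finished, if it was a [Term] with an id
        if kind == "[Term]" and stanza and "id" in stanza:
            terms[stanza["id"][0]] = {
                "name": stanza.get("name", [None])[0],
                "is_a": stanza.get("is_a", []),
            }

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        if line.startswith("[") and line.endswith("]"):
            flush()
            kind, stanza = line, {}
            continue
        if stanza is None:
            continue  # header tags before the first stanza
        tag, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.split(" !", 1)[0].strip()  # drop a trailing "! comment"
        stanza.setdefault(tag.strip(), []).append(value)
    flush()
    return terms


def ancestors(terms, term_id):
    """All terms reachable from term_id by following is_a links."""
    seen, stack = set(), list(terms.get(term_id, {}).get("is_a", []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen
```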

Comparative data

Files relating to comparative analyses are described in detail in another FAQ.