Browser (faqs)


Alignments (Image)


The top panel is similar to the gene map shown in the Region in Detail view in the Location tab.

Whole genome alignments are shown graphically in the lower panel, where the aligned blocks themselves are drawn. Note: The multi-species comparison view is a similar page, but it shows the chromosomes, scaffolds and contigs as they are in the assembly. For example, gaps in the multi-species comparison page are gaps in the genome assembly, whereas gaps in this align slice view may be gaps in the alignment.

Select the alignment at the top of this panel using the "alignment" drop-down menu. Choose a multi-species alignment across more than two species, or a pairwise alignment between two species.

The Display

Horizontal bars

Horizontal blue bars represent genomic sequence as in other Ensembl views. The filled or hollow horizontal brown bars are clickable, and represent the alignments. If a bar is filled, the forward strand of the chromosome was aligned. Hollow bars represent alignments using the reverse strand of the chromosome.

Vertical shading

The brown background is there for contrast, and white vertical stripes are gaps in the alignment. Each panel shows this unique shading for a specific species. Different colours of vertical shading represent different chromosomes in the alignment.


Arrows (triangles) indicate breaks in the alignment. Click on an arrow for more information about the break.

Ancestral sequences

These are inferred from the multiple alignments. If present, the species used to determine the ancestral sequence are listed on the blue bar. For example, a blue bar labelled Hsap, Ptro, Mmul shows an ancestor of the Homo sapiens, Pan troglodytes, and Macaca mulatta genomes.

Customise the view by adding variations, gene sets, and other features using the configure this page link at the left.

To export alignments, follow the genomic alignments link and click on export data on that page.

The image above shows a pairwise alignment between human and mouse. The mouse alignment was clicked on to show a pop-up box with information about that aligned region (on mouse chromosome 5).

Alignments (Text)


Whole genome alignments include "pairwise" sequence alignments between two species, and multi-species alignments using the genomes of more than two species.

Export alignments with the export data link in the left hand navigation column of the Genomic alignments page.

Sequence Display

Only one species is shown by default. Click on select an alignment at the top of the sequence in order to choose an alignment to view.

Chromosomes and scaffolds in the alignment are listed for each species. Sequence is shown under these coordinates. Red, highlighted nucleotides are located in exons.

Click on configure this page at the left of the view to add or change the display. Customisable options are as follows:

  • Flanking sequence - View the sequence upstream and downstream of the gene.
  • Number of base pairs per row - Define how many base pairs are shown per line.
  • Exons - Highlight exons.
  • Exons on strand - Highlight exons on the forward, reverse or both strands of the chromosome.
  • Show variations - Show all variations.
  • Line numbering - Select line numbering. Relative to this sequence starts from 1 for the region displayed, and relative to the coordinate system shows the base pair position using genomic coordinates.
  • Conservation regions - Highlight regions in which more than 75% of bases match in the alignment.
  • Codons - Display codons for start and stop translation.
  • Display pop-up information on mouseover
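The conservation option can be pictured as a per-column test on the alignment. The following is a minimal Python sketch, assuming that "match" means the most common base in a column accounts for more than 75% of the aligned rows. The function name `conserved_columns` and the example rows are hypothetical, and the browser's own calculation may differ (for example in how gaps are counted).

```python
from collections import Counter


def conserved_columns(aligned, threshold=0.75):
    """Return one boolean per alignment column.

    A column is flagged True when its most common base makes up more than
    `threshold` of the rows. Gap ('-') and unknown ('.') characters never
    count as a match, but they do count towards the number of rows.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned rows must have the same length")

    flags = []
    for column in zip(*aligned):
        bases = [b.upper() for b in column if b not in "-."]
        if not bases:
            flags.append(False)
            continue
        top_count = Counter(bases).most_common(1)[0][1]
        flags.append(top_count / len(column) > threshold)
    return flags


# Five made-up aligned rows, four columns each.
rows = ["ACGT", "ACGA", "ACTA", "AC-A", "ATGA"]
print(conserved_columns(rows))  # [True, True, False, True]
```

Counting gaps in the denominator is a design choice made here for illustration: it keeps a column with many gaps from being flagged as conserved on the strength of one or two bases.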

Note: Ancestral sequences may be turned off using configure this page.

If the sequence is not known, nucleotides are replaced by dots.

The image above shows the multiple alignments for five catarrhini primates. Conserved nucleotides have been turned on, and are shown by blue highlighting. Variations and line numbering (relative to the coordinate system) are also selected by clicking on the configure this page button.


cDNA sequence


Example page

This highly customisable sequence shows, by default, the transcript sequence (cDNA), the coding sequence underneath it, and the protein sequence in the third line. Line numbering is different for all three sequences.

Line numbering

  • First line: 1 at the start of the cDNA (i.e. the UnTranslated Region or UTR, if it is annotated)
  • Second line: 1 at the start of the coding sequence (i.e. from the first A of ATG)
  • Third line: 1 at the start of the protein sequence
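The relationship between the three numbering schemes can be sketched in a few lines of Python. This is an illustration only: the function name `cdna_to_cds_and_protein` and its parameters (`utr5_len`, the length of the annotated 5' UTR, and `cds_len`, the length of the coding sequence) are assumptions made for the example, not part of the browser.

```python
from typing import Optional, Tuple


def cdna_to_cds_and_protein(
    cdna_pos: int, utr5_len: int, cds_len: int
) -> Optional[Tuple[int, int]]:
    """Map a 1-based cDNA position to (CDS position, protein position).

    Returns None when the position lies in a UTR, where the coding and
    protein lines carry no numbering. If `cds_len` includes the stop codon,
    the stop codon maps to one past the last amino acid.
    """
    if cdna_pos < 1:
        raise ValueError("positions are 1-based")
    cds_pos = cdna_pos - utr5_len          # 1 at the first A of ATG
    if cds_pos < 1 or cds_pos > cds_len:
        return None                        # 5' or 3' UTR
    protein_pos = (cds_pos - 1) // 3 + 1   # three bases per codon
    return cds_pos, protein_pos


# With a 10 bp 5' UTR, cDNA position 11 is the first coding base.
print(cdna_to_cds_and_protein(11, utr5_len=10, cds_len=9))  # (1, 1)
print(cdna_to_cds_and_protein(14, utr5_len=10, cds_len=9))  # (4, 2)
```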

Variations are drawn along the sequence, and an IUPAC ambiguity code represents the variation at the top of the relevant nucleotide. Click any ambiguity code for more information. A table of possible codes is described here.

Highlighted nucleotides are coloured according to the key at the top of the view. Red amino acids indicate that, due to sequence variation, one or more other amino acids are possible at that position.

Sequence colouring

  • UTR or UnTranslated Region is highlighted in dark yellow.
  • Exons are indicated by alternating black and blue nucleotide sequence.
  • Codons are marked by light yellow highlight, alternating with no highlight.
  • Variations are highlighted according to the key above the sequence. Red amino acids indicate more than one possible amino acid at that position. Hover over a red amino acid to see the alternative(s).

To turn off variations, coding sequence, UTR, and other markup, use the Configure this page tool button at the left. Note that the view can be downloaded for use in Microsoft Word with the Download view as RTF button.



A zoomed-in view of a chromosome is shown, including graphical displays of known and novel genes, percent of GC repeats, and variation density. Click on the chromosome to zoom in to Region in Detail. Add your own annotation to one chromosome, or a karyotype, using the custom data link and this view, or the Karyotype view.

Each species in Ensembl has a number of statistics for its genome assembly. These statistics are also found on species-specific home pages and are calculated as follows. Some counts may only be available from the species home page.

Base Pairs per chromosome

These are pre-calculated in order to speed up page display, and stored in the seq_region table of the core database. The number is based on the assembled end position of the last seq_region in each chromosome (from the AGP file), or if there is a terminal gap it is set to the assembled end location of that terminal gap.

For the haplotype chromosomes (c6_COX etc), although there is only haplotype-specific sequence for a small region of the chromosome, the length of the seq_region is set to the full length of the chromosome including the specific haplotype (eg. c6_COX is 170899992bp long).
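As a rough illustration of the rule above, the length of each chromosome can be derived from an AGP file by taking the largest assembled end position per object. Because gap lines also carry an end position, a terminal gap is covered by the same rule. This is a sketch, not the code used to populate the seq_region table; the function name `chromosome_lengths` and the sample lines are invented for the example.

```python
def chromosome_lengths(agp_lines):
    """Return {object name: length} from tab-separated AGP lines.

    Columns 1-3 of an AGP line are the object (e.g. a chromosome), the
    object start and the object end. The length is the largest object end
    seen, whether the line describes a sequence component or a gap.
    """
    lengths = {}
    for line in agp_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        name, object_end = fields[0], int(fields[2])
        lengths[name] = max(lengths.get(name, 0), object_end)
    return lengths


# Invented example: chr2 ends in a terminal gap of 100 bp.
sample = [
    "# made-up AGP lines for illustration",
    "chr1\t1\t1000\t1\tW\tcontigA\t1\t1000\t+",
    "chr1\t1001\t1100\t2\tN\t100\tscaffold\tyes\tna",
    "chr1\t1101\t2500\t3\tW\tcontigB\t1\t1400\t+",
    "chr2\t1\t800\t1\tW\tcontigC\t1\t800\t+",
    "chr2\t801\t900\t2\tN\t100\ttelomere\tno\tna",
]
print(chromosome_lengths(sample))  # {'chr1': 2500, 'chr2': 900}
```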

Gene summaries

The numbers of genes of each type are listed below the chromosome, as follows:

Known Gene Count gives the number of known protein-coding genes that Ensembl has predicted on this chromosome. Known genes have been mapped to species-specific protein sequences already available in the public sequence databases.

Novel Gene Count gives the number of novel genes that Ensembl has predicted on this chromosome. Novel genes, although predicted on the basis of similarity to protein or cDNA sequences and/or ESTs, could not be mapped with confidence to existing entries for the same species.

Pseudogenes and non-coding (nc)RNA genes are annotated as several sub-classes of ncRNA genes. Counts per RNA gene class are available from this page.

Please note: Gene counts presented per chromosome on Ensembl Chromosome views are for only the areas shown. Gene counts for all chromosomes may not add up to the numbers presented for the whole genome on the species-specific home pages. This is due to extra-chromosomal, haplotypic sequences, which are annotated with genes but not necessarily displayed. The count differences are also due to the fact that pseudo-autosomal regions (PAR) on the human X and Y chromosomes count towards the whole-genome statistics only once.

SNP Count lists the number of variations that Ensembl has placed on this chromosome.

(Species Home Page) Base Pairs (whole assembly)

The total number of base pairs for the entire assembly is the sum of all sequences in the dna table of the core database. It is available from the species-specific home page. This includes redundant regions such as haplotypic sequences and the pseudo-autosomal region (PAR) of the Y chromosome in human, and gaps in Drosophila melanogaster. See the assembly details of each species for more information.

(Species Home Page) Golden Path

The "golden path" is the length of the reference assembly. It consists of the sum of all top-level sequences in the seq_region table, omitting any redundant regions such as haplotypes and PARs.

To add user data to this display, click on the Custom data link at the left. Upload a file such as a gff file. If you have already uploaded data to another view, you can turn this track on by clicking on the configure this page link and selecting a track in the User data menu.

Note: The display is customisable. The gene densities and variation histogram may be turned off using the Configure this page link.

Exons - Sequence view


Exons, introns and flanking sequence are shown for one transcript in the 5' to 3' direction, regardless of whether it is a forward or reverse-stranded gene.

Change flanking sequence, view all intronic sequence, and/or turn on variations by clicking on the Configure this page tool button at the left of the view. Turn off columns using the Show/hide columns button at the top of the table. Export for use in Microsoft Word using the Download view as RTF button at the left of the view.

Exons - Uppercase letters

Flanking sequence and introns - lower case letters

  • Introns are blue
  • Flanking sequence upstream and downstream to the transcript is green
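The case convention can be reproduced outside the browser with a short Python sketch. It assumes the sequence is already oriented 5' to 3' and that exon coordinates are 0-based, end-exclusive offsets into that sequence; the function name `markup_exons` is hypothetical, and the colours are not reproduced.

```python
def markup_exons(sequence, exons):
    """Return `sequence` with exon bases in upper case and everything else
    (introns and flanking sequence) in lower case.

    `exons` is a list of (start, end) pairs, 0-based and end-exclusive,
    relative to the start of `sequence`.
    """
    chars = list(sequence.lower())
    for start, end in exons:
        if not 0 <= start <= end <= len(chars):
            raise ValueError(f"exon ({start}, {end}) lies outside the sequence")
        for i in range(start, end):
            chars[i] = chars[i].upper()
    return "".join(chars)


# Two short made-up exons inside a 10 bp region.
print(markup_exons("acgtacgtac", [(2, 5), (7, 9)]))  # acGTAcgTAc
```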

Explore this variation


This page offers a top panel of information specific to the variant. Graphical icons are presented that lead you to more specific variant data, also accessible from the links at the left. The links in the left hand menu have a corresponding icon. It's your choice how to navigate through the variation displays.

Top Panel

From the top of the view, the following information can be found:

  • Variant ID - The variant can be a SNP, insertion, deletion, structural variant, or a somatic mutation. An ID from dbSNP (rs...) is used preferentially to name the variation for short variants.
  • Alleles - The allele found on the forward strand of the reference assembly is shown first. Any alternate alleles are then displayed. For example, G/C indicates that the reference allele on the forward strand is G, and a C allele has also been detected at this position. Note: Ambiguity or IUPAC codes indicate the possible alleles. See the table at the bottom of this page.
  • Location - The location (chromosome, or scaffold, and basepair) of the variation is indicated, and a link is provided to the Region in Detail view. If more than one location is available, the variant maps to multiple locations.
  • Validation Status - The status is imported from dbSNP. See the NCBI validation status descriptions. More information can be found in the dbSNP handbook.
  • Synonyms - Other databases and projects cataloguing the same variation are listed. Any names of the variation within the alternate source are shown.
  • HGVS name - If nomenclature for this variant from HGVS is available, it will be shown here. A guide to HGVS symbols can be found at the HGVS website.

    Please see the variation documentation for more information such as source of variants, and consequence types (effect on genes and transcripts).

    IUPAC Ambiguity Codes
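The standard IUPAC nucleotide codes can be applied to an allele string such as G/C with a small lookup. The Python sketch below is illustrative only: the function name `ambiguity_code` is hypothetical, and it handles single-base alleles only, returning None for anything else (for example insertions or deletions written with "-").

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
CODE_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in CODE_TO_BASES.items()}


def ambiguity_code(allele_string):
    """Return the IUPAC code for an allele string such as 'G/C'.

    Returns None if any allele is not a single A, C, G or T.
    """
    alleles = [allele.strip().upper() for allele in allele_string.split("/")]
    if any(allele not in ("A", "C", "G", "T") for allele in alleles):
        return None
    return BASES_TO_CODE[frozenset(alleles)]


print(ambiguity_code("G/C"))  # S
print(ambiguity_code("A/G"))  # R
```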

External data


Databases and projects external to VectorBase can be shown here. Click Configure this page to choose which information related to your gene of interest to view. This page uses DAS (the Distributed Annotation System) to show biological annotation from other sources. You can also upload your own data: click on Configure this page and use the Custom Data tab.

External references


Genes, transcripts, and proteins are matched to sequences and information in other biological databases. The matches are referred to as external references, or Xrefs.

Xref sources for VectorBase genes include UniProtKB, EntrezGene, PubMed & Community symbols/descriptions.

Xref sources for VectorBase transcripts (i.e. matches to transcript and protein sequences) include UniProtKB, EntrezGene, and NCBI RefSeq.

Please see the General identifiers view in the transcript tab for more IDs associated with a specific transcript and/or protein.

Gene ontology


Gene/protein tree


VectorBase gene trees are generated by the Gene Orthology/Paralogy prediction method pipeline. All homologues are determined from gene trees.

Gene trees are constructed using one representative protein for every gene in every species. (They can also be considered as protein trees).

The display shows the maximum likelihood phylogenetic tree representing the evolutionary history of genes. These trees are reconciled with a species tree, generated by TreeBeST. Internal nodes are then annotated for duplication (red boxes) or speciation (blue boxes) events.

Red squares represent duplication nodes and blue squares represent speciation nodes, giving rise to paralogues and orthologues respectively. Another class of node, ambiguous, is shown as a lighter blue square.
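The way node types translate into homology relationships can be sketched as follows: two genes are related according to the type of the deepest node they share. This Python example is a simplified illustration under that assumption, not the pipeline's implementation; the `Node` class, the function names and the gene names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    kind: str = "leaf"  # "leaf", "speciation", "duplication" or "ambiguous"
    children: List["Node"] = field(default_factory=list)


def leaf_names(node: Node) -> Set[str]:
    """Names of all genes (leaves) below this node."""
    if not node.children:
        return {node.name}
    names: Set[str] = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def last_common_ancestor(node: Node, a: str, b: str) -> Optional[Node]:
    """Deepest node whose leaves include both genes, or None."""
    if not {a, b} <= leaf_names(node):
        return None
    for child in node.children:
        deeper = last_common_ancestor(child, a, b)
        if deeper is not None:
            return deeper
    return node


def relationship(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify two genes by the type of the node where they diverge."""
    if gene_a == gene_b:
        raise ValueError("two different genes are required")
    ancestor = last_common_ancestor(root, gene_a, gene_b)
    if ancestor is None:
        raise KeyError("both genes must be leaves of the tree")
    if ancestor.kind == "speciation":
        return "orthologue"
    if ancestor.kind == "duplication":
        return "paralogue"
    return "ambiguous"


# A made-up tree: a speciation node above a duplication in one lineage.
tree = Node("root", "speciation", [
    Node("dup1", "duplication", [Node("geneA1"), Node("geneA2")]),
    Node("geneB1"),
])
print(relationship(tree, "geneA1", "geneA2"))  # paralogue
print(relationship(tree, "geneA1", "geneB1"))  # orthologue
```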

The gene of interest is highlighted in red and within-species paralogues are shown in blue, if the option to view paralogues is selected (below the tree diagram).

Taxonomy IDs refer to the NCBI Taxonomy Browser. The number at the top of pop-up menus (upon clicking on a node) corresponds to the node_id from the protein_tree_node table in the compara database.

Multiple alignment of the peptides (green bars) was made using MUSCLE. Green bars show areas of amino acid alignment, and white areas are gaps in the alignment. Dark green bars indicate consensus alignments.

Click on a node to expand a collapsed set of branches into a full tree. The consensus amino acid alignment corresponds to the consensus residues in the collapsed node, and will be expanded when the tree is expanded.

Configure this page to customise the tree. Colouring by clade can be removed.

