Web:GenomeBrowser:ContigView
From VectorBase Help System
[edit] Introduction
ContigView is the principal data visualisation tool for genome sequence annotation information. It provides a high-level view of the contig sequences that form the genome sequence assembly, and of the genes and other features that have been placed on it.
ContigView can be customized to suit your needs. More information can be added, or the displays can be simplified to make browsing faster. See the Menu Bar section for details.
The page is split into four panels representing different levels of zoom into the chromosome:
- A Chromosome panel showing the entire chromosome or, for species with pre-chromosome assemblies, a scaffold as an alternative top-level sequence block.
- An Overview panel displaying a chromosome region of up to 1 Mb.
- The main Detailed View panel showing a broad range of features.
- A Basepair View panel showing, for a small assembly region of up to 500 bases, the actual sequence, six-frame translations and restriction endonuclease recognition sites.
The red boxes in the Chromosome, Overview and Detailed View panels mark the region shown at higher magnification in the panel below. The absolute base pair location of the region displayed in Detailed View is indicated in the Navigation Bar at the top of the panel. You can use this bar to navigate along any chromosome by entering a new chromosome and location. Physical map locations may be specified directly by entering base pair coordinates, or numbers with a 'kb' or 'Mb' suffix.
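The coordinate syntax described above (plain base pairs, or a number with a 'kb' or 'Mb' suffix) can be illustrated with a small parser. This is a minimal sketch, not the browser's own code: the function name `parse_location`, its case-insensitive suffix matching and its tolerance of thousands separators are all assumptions.

```python
import re

# Hypothetical helper (not VectorBase/Ensembl code): turns a location as typed
# into the Navigation Bar into an integer base-pair coordinate.
_LOCATION = re.compile(r"^([0-9]+(?:\.[0-9]+)?)\s*(kb|mb)?$", re.IGNORECASE)
_MULTIPLIER = {"": 1, "kb": 1_000, "mb": 1_000_000}


def parse_location(text: str) -> int:
    """Return a base-pair coordinate for inputs like '250000', '250 kb' or '1.5Mb'."""
    cleaned = text.strip().replace(",", "")  # assumption: thousands separators allowed
    match = _LOCATION.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    number, suffix = match.groups()
    # Scale by the suffix (none = base pairs) and round to a whole base.
    return round(float(number) * _MULTIPLIER[(suffix or "").lower()])
```

With this sketch, `parse_location("250 kb")` and `parse_location("250000")` both return 250000.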
[edit] Chromosome
The Chromosome panel displays an ideogram of an entire chromosome together with its cytogenetic banding pattern. Maps of cytogenetic bands to the genome sequence allow only crude orientation and are not available for all species. For species whose genome sequence assemblies are still at a pre-chromosome stage, Ensembl displays other 'top-level' sequence entities such as scaffolds.
The red box illustrates the extent of the region displayed in the 'Overview' panel below and can be moved by clicking anywhere on the chromosome. The Chromosome display can be turned on or off using the plus [+] or minus [-] boxes, respectively.
[edit] Overview
The Overview panel displays a larger section of a chromosome together with its basic annotation. Usually the range is set to 1 Mb but can be smaller for species with genomes of higher density.
The panel displays the following information:
- Chromosome Bands
- Cytogenetic chromosome band(s) that map to a particular region allow for crude orientation on the genome sequence. Data sets mapping cytogenetic bands to genome sequence coordinates are currently not available for all species displayed in Ensembl.
- Scale Bar
- A scale bar illustrates the physical map coordinates this particular region falls into. Generally, Ensembl displays the genome sequence in standard notation from the p-telomere to the q-telomere. Since estimated gap sizes are included in the physical map, coordinates might change between genome sequence assembly builds.
- DNA (contigs)
- The individual contig sequences that form the genome sequence assembly in this particular region are depicted in alternating dark and light blue color. Where no blue contig is shown, this indicates a gap in the assembly.
- Markers
- The positions of Sequence Tagged Site (STS) markers are indicated in magenta directly under the contig map ideogram.
- Ensembl Genes
- Colored boxes represent automatically annotated Ensembl genes: Ensembl Known Genes, which correspond to species-specific entries in public sequence databases, are red; Ensembl Novel Genes are black; and Ensembl Pseudogenes are drawn in gray.
- ncRNA Genes
- Several classes of hand-checked RNA genes are also drawn if available for the organism-specific data set.
- Vega Genes
- Blue genes represent manually curated genes drawn from the Vega database (Ashurst et al., 2005).
- Gene Legend
- The color coding of all gene types is shown dynamically at the bottom of the Overview panel in the Gene Legend track. For additional information about the categories of genes displayed, see the Transcripts and Genes section below.
The red box marks the extent of the region displayed in the Detailed View panel below. You may click anywhere in the Overview panel to re-center the red box at that point on the contig map; the Detailed View display will change accordingly. Apart from re-centering the display, contigs and genes are not clickable in the Overview display, but they are selectable in the Detailed View panel. The Overview display can be turned on or off using the plus [+] or minus [-] button, respectively.
[edit] Detailed View
The third panel, Detailed View, shows smaller regions of chromosomes and provides more detailed insight into genome annotation. Features are annotated in tracks along the genome sequence assembly in its standard notation from the p-telomere to the q-telomere. The genomic DNA sequence is generally assembled from smaller sequence-level entities (BAC clones, whole genome sequencing scaffolds or contig sequences in general), which are represented by alternating dark and light blue blocks. Colour-coded features above the contigs are on the forward strand, while those below are on the reverse strand.
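The strand convention above can be expressed as a simple partition of features into those drawn above and below the contigs. This is a sketch only; the `Feature` record, its `strand` encoding (+1 forward, -1 reverse) and the function name are hypothetical, and features without a strand are simply left out here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    name: str
    start: int
    end: int
    strand: int  # assumed encoding: +1 = forward, -1 = reverse


def split_by_strand(features: List[Feature]) -> Tuple[List[Feature], List[Feature]]:
    """Return (above_contigs, below_contigs), each ordered by start position."""
    ordered = sorted(features, key=lambda f: f.start)
    above = [f for f in ordered if f.strand == 1]    # forward strand: drawn above
    below = [f for f in ordered if f.strand == -1]   # reverse strand: drawn below
    return above, below
```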
The entire Detailed View display panel can be turned on or off using the plus [+] or minus [-] button, respectively.
[edit] Menu Bar
The Menu Bar and the Navigation Bar on top of the Detailed View panel are the main tools to customize this display. Each pull-down menu, when opened, offers its options as check boxes. Changes take effect when you click the 'Close menu' option at the bottom of the menu. The following menus are available:
- Features
- Genes, transcripts, markers and features in general that are annotated on the genome sequence are organised into tracks. Some feature tracks are not displayed by default, but can be added. Turning off unwanted features and functions will not only make the web pages download and render faster but also make it easier to see the features of interest. Generally, Ensembl annotates a broad variety of features. Some of them may be species-specific. A more detailed description of feature tracks and the underlying data sets follows below.
- Comparative
- Ensembl calculates pairwise whole genome alignments between species. This menu provides a list of tracks that can be added to the display. Please see the Whole Genome Similarity Matches section for a detailed description of the calculation method. The following naming convention is in use:
- Species
- The species is encoded in the abbreviated form of the scientific name. (e.g. Hs for Homo sapiens, Mm for Mus musculus, ...)
- Conservation
- Ensembl compares the genomes of species pairs within broad taxonomic groups (e.g. vertebrates or arthropods) where significant homology can be expected. The conservation (cons) track annotates the results of this comparison. For closely related species pairs, the initial conservation information is then re-scored into a second level of conservation, which is annotated by the high conservation (high cons) track.
- Algorithm
- Ensembl may compare species pairs with two or more different algorithms. In these cases the algorithm is indicated in full (e.g. BLAT) or in abbreviated form (e.g. bz for BLASTz). Please see the Whole Genome Similarity Matches section for more details.
- DAS Sources
- Whereas the other feature tracks come from the underlying Ensembl databases, the Distributed Annotation System (DAS) provides a way to display data sets external to Ensembl in the genome browser. Ensembl already provides a set of pre-configured data sources, which can be added by selecting the corresponding check boxes. Other data sets can be added and configured in Ensembl DasConfView, which is available via the 'Manage sources ...' option in this menu.
- Repeats
- Ensembl characterizes and annotates several classes of repetitive sequences in repeat tracks. Options in this menu let you display individual repeat classes or all classes at once.
- Decorations
- Decorations are properties of the assembly or display, rather than features located on the assembly.
- Half-height glyphs
- It is also possible to set the display to show most of the features at half their normal height.
- Show empty tracks
- This option will cause an information message to be displayed for a feature type, even when no features of that type appear in the current view. You may prefer this to the default behavior, whereby the track is not displayed at all when there are no features to display.
- Gene legend
- Unchecking this option will remove the descriptive legend explaining the color-coding of different gene and transcript types.
- Show register lines
- Unchecking this option will remove the evenly spaced vertical lines.
- Show pop-up menus
- For most features in the Detailed View panel, extra information and links can be displayed in pop-up text windows by pointing at features.
- The pop-up menu function can be turned off entirely by unchecking this option. This may speed your browsing. You will still be able to click on a feature to go to the corresponding information page. An alternative way of using the pop-up windows is also available: you can choose to have pop-up menus appear only when you click on a feature, by checking the '... popup on click' option. However, if you plan to use Ensembl regularly it is well worth getting used to the default behavior of the pop-up menus.
- Clones
- Tile path
- The tile path track in human Ensembl shows the tiling path (i. e. the locations) of BAC clones that form the current genome sequence assembly (the "golden path"). The different colours of red, orange and gold are only used to help distinguish between clones in the display. Pink clones are still in the phase1Ac stage. The name of the clone will be displayed if there is room in the display. Clones for which fluorescence in situ hybridization (FISH) mapping information is available are marked with a black triangle in the top left corner. Where a clone is shown in outline, the mapping of the clone to the sequence assembly is problematic and the true length is not displayed. Mouse-over brings up information about a particular clone and an option to re-centre the display around its location.
- 1 Mb Clone Set
- The 1 Mb clone set has been developed as a resource to aid the identification of breakpoints in chromosome rearrangements. The clones were selected to provide a set spaced at approximately 1 Mb intervals across the entire genome. Clones for which FISH-mapping data is available are marked in the top left corner with a black triangle. Dark and light green are only used to help distinguish between clones in the display. The name of the clone will be displayed if there is room in the display. Pointing at a clone will display a pop-up window with information about the clone and a clickable link to re-center the display on the clone.
- 32k Clone Set
- This track shows clones from the human genome high-resolution BAC re-arrayed [32k clone set], mapped to the genome sequence. The 32k clone set and individual clones from it are available via the [BAC-PAC] resource.
- Export
- This pull-down menu gives several options for downloading the data represented in the Detailed View panel.
- 'Flat file', 'FASTA' and 'Image' will redirect you to an ExportView page preset to the extent of genome sequence displayed in Detailed View and to the kind of download you have requested.
- Ensembl gene list, EST gene list, Vega gene list and SNP list will redirect you to the BioMart data mining system, with the displayed region and choice of focus already selected.
- Image Size
- By default, the overall image size is set to a width of 700 pixels. This is appropriate for standard-sized screens. The 'Image size' pull-down menu on the gold bar allows you to adjust the width up to 2000 pixels.
- Help
- An additional pull-down menu on the menu bar links directly to help on configuring the Detailed View display, a description of DAS sources, this general ContigView help page, and a page for sending questions or comments to the Helpdesk.
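The species part of the Comparative track naming convention (for example Hs for Homo sapiens, Mm for Mus musculus) can be sketched as below. The helper is hypothetical and assumes the abbreviation is always the genus initial followed by the species initial; how Ensembl handles names that would produce the same two letters is not described on this page.

```python
def species_abbreviation(scientific_name: str) -> str:
    """Genus initial (upper case) + species initial (lower case), e.g. 'Hs'."""
    parts = scientific_name.split()
    if len(parts) < 2:
        raise ValueError(f"expected 'Genus species', got {scientific_name!r}")
    genus, species = parts[0], parts[1]
    return genus[0].upper() + species[0].lower()
```

Two-letter codes produced this way are not guaranteed to be unique across all species.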
[edit] Navigation Bar
The Navigation Bar and the Menu Bar on top of the Detailed View panel are the main tools to customise this display. The following navigation functions are available:
- Horizontal Scrolling
- Navigation buttons scroll the display to the left or right by 1, 2 or 5 Mb. Because the Overview panel displays 1 Mb of sequence, these buttons shift the entire display by multiples of this panel's width. Window buttons move the display by only 80% of its width to the left or right, keeping 20% of the previous view on screen to aid orientation.
- Zooming Buttons
- By default, the Detailed View panel shows a region of 1 Mb. Clicking the plus or minus button zooms in or out, respectively, by a factor of approximately 2. Zooming is possible across a range from as little as 1 bp to as much as 1 Mb.
- You can also navigate by clicking on the scale bars at the top and bottom of the Detailed View panel. This brings up a clickable pop-up menu that lets you zoom or re-centre the Detailed View display.
- Zooming Ladder
- The zooming ladder narrows or widens the field of view to a scale suitable for viewing any feature of interest. The individual steps of the ladder represent 1, 5, 10, 50, 100, 200 and 500 kb and 1 Mb of sequence. Regions larger than 1 Mb, up to an entire chromosome, are best viewed in Ensembl CytoView.
- Physical Coordinates
- The physical coordinates of the sequence region displayed in the Detailed View panel are indicated in the Navigation Bar. To move to a different chromosome or to specify a new chromosomal location in base pairs, enter numbers in the appropriate boxes and click the Refresh button.
- To specify a region between two generic features such as cytogenetic bands or STS markers, use Ensembl MapView.
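The window arithmetic behind the Navigation Bar controls can be sketched as below. The 80% window shift, the zoom factor of about 2, the 1 bp to 1 Mb zoom limits and the ladder steps come from this page; the function names, the inclusive [start, end] coordinates and the rounding are assumptions, and the sketch does not clamp the window to the chromosome ends.

```python
MIN_VIEW_BP = 1            # smallest Detailed View width stated on this page
MAX_VIEW_BP = 1_000_000    # largest Detailed View width stated on this page
LADDER_BP = [1_000, 5_000, 10_000, 50_000, 100_000, 200_000, 500_000, 1_000_000]


def width(start: int, end: int) -> int:
    """Width of an inclusive [start, end] window in base pairs."""
    return end - start + 1


def shift_window(start: int, end: int, fraction: float) -> tuple:
    """Move the window by a fraction of its width (negative = left).

    fraction=0.8 mimics a window button: 20% of the old view stays visible.
    """
    offset = round(width(start, end) * fraction)
    return start + offset, end + offset


def zoom_window(start: int, end: int, factor: float) -> tuple:
    """Zoom about the window centre; factor 2 halves the width, 0.5 doubles it."""
    new_width = round(width(start, end) / factor)
    new_width = max(MIN_VIEW_BP, min(MAX_VIEW_BP, new_width))  # enforce 1 bp..1 Mb
    centre = (start + end) // 2
    new_start = centre - (new_width - 1) // 2
    return new_start, new_start + new_width - 1


def ladder_step(feature_length: int):
    """Smallest ladder step wide enough to show the whole feature, else None."""
    for step in LADDER_BP:
        if step >= feature_length:
            return step
    return None  # over 1 Mb: this page recommends CytoView instead
```

A fixed 1, 2 or 5 Mb navigation button corresponds to adding or subtracting 1_000_000, 2_000_000 or 5_000_000 from both ends of the window.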
[edit] Feature Tracks
Feature tracks are named at the left side of the Detailed View panel. Clicking a track name links directly to a description in Ensembl HelpView. Black track names represent Ensembl-internal feature tracks, while blue names indicate tracks served via the Distributed Annotation System from external DAS sources. Tracks may be turned on or off and customized to suit your requirements. Pointing the mouse at a feature will bring up a pop-up window showing the feature identifier together with links to more detailed information whenever available. Pop-up menus can be turned off by un-checking 'show pop-up menus' in the 'Decorations' pull-down menu. A single click on most features will take you to an appropriate page with more information on that particular feature, unless the '... pop-up on click' option from the 'Decorations' menu has been selected.
- Histone Modifications Track
- This track can be chosen from the 'Features' menu. It appears collapsed by default; to see the plot, expand it by clicking on the [+].
- Enriched sites were identified by a two-state hidden Markov model (Flicek, unpublished) that incorporates replicate data and therefore accounts for both enrichment and replicate consistency.
- Raw data from tiling array experiments are normalized and displayed as simple wiggle tracks. These data are supplied to support, and give a visual reference for, the associated annotated features track. The default normalization uses the VSN (Variance Stabilization Normalization) package from [Bioconductor], which performs a generalized log transformation. At low signal, the difference of transformed values roughly equates to the difference between the control and experimental values; at high (that is, significant) signal it smoothly becomes the log ratio between the values. This minimizes anomalies arising from low-signal pairs giving high ratio scores. Current data are provided by:
- Human Chromosome X
- Data provided by Henk Stunnenberg (NCMLS).
- Mouse Chromosome 17
- Data provided by Mathew Sloane and Denise Barlow (CeMM).
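The effect of the generalized log transformation described under the Histone Modifications track can be illustrated with the inverse hyperbolic sine, which is one common form of glog. This is only an illustration, not the Bioconductor implementation: VSN fits its transformation parameters to the data, whereas here the scale constant `c` is a fixed, hypothetical value.

```python
import math


def glog(x: float, c: float) -> float:
    """Generalised log (arsinh form): ~x/c near zero, ~log(2x/c) for x >> c."""
    return math.asinh(x / c)


def glog_difference(experiment: float, control: float, c: float) -> float:
    """Difference of glog-transformed signals for one experiment/control pair."""
    return glog(experiment, c) - glog(control, c)
```

With `c=1000`, a low-signal pair such as 2 versus 1 gives about 0.001, close to (2 - 1)/1000, instead of the log ratio log(2) ≈ 0.69 that a plain ratio would imply; with `c=1`, a high-signal pair such as 10000 versus 1000 gives almost exactly log(10).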
[edit] DNA (contigs)
The DNA (contigs) track shows a representation of the genomic sequence assembly. Alternating light and dark blue blocks represent individual contig sequences in the genome sequence assembly. Small arrows near sequence identifiers represent the relative orientation of a particular contig sequence within the genome assembly in standard notation. Where no blue contig is shown, there is a gap in the assembly.
Pointing at a contig sequence representation in Detailed View will display a pop-up menu with the complete Ensembl sequence identifier (e.g. AC120349.5.1.183055) at the top. Sequence identifiers usually include an [EMBL] accession number whenever available, as well as a sequence version, a start and an end coordinate ([EMBL accession number].[sequence version].[start].[end]). Since Ensembl is designed to use several coordinate systems like 'contigs', 'clones', 'supercontigs', 'scaffolds', 'chunks' or 'chromosomes' in parallel, corresponding sequence regions in other coordinate systems will be listed. Links in the pop-up windows allow for export of the sequence region or for centring the Detailed View panel on a particular sequence region. For BAC clones, Ensembl will provide an "EMBL source file" link to the underlying sequence database record in the pop-up window.
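The identifier format described above ([EMBL accession number].[sequence version].[start].[end]) can be split apart as in this sketch. The `SeqRegionId` type and the function name are hypothetical, and the `length` property assumes inclusive, 1-based start and end coordinates.

```python
from typing import NamedTuple


class SeqRegionId(NamedTuple):
    accession: str
    version: int
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes inclusive, 1-based coordinates.
        return self.end - self.start + 1


def parse_seq_region_id(identifier: str) -> SeqRegionId:
    """Split 'ACCESSION.VERSION.START.END' into its four parts."""
    # rsplit from the right so that only the last three dots are separators.
    accession, version, start, end = identifier.rsplit(".", 3)
    return SeqRegionId(accession, int(version), int(start), int(end))
```

For the example identifier AC120349.5.1.183055, this yields accession AC120349, version 5, and a region spanning bases 1 to 183055.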
Clicking on a contig sequence representation in the Detailed View track will immediately center the display on that sequence region.
[edit] Transcripts and Genes
Several transcript types are available within the Ensembl system:
- Ensembl Transcripts
- This track shows Ensembl transcript model predictions, which result from the gene-building procedure ([Curwen et al., 2004]) in the Ensembl analysis and annotation pipeline. It is important to emphasize that all Ensembl predictions are strongly based on biological evidence imported from [UniProt] and [RefSeq] protein and cDNA sequence records. Three types of Ensembl transcript are distinguished:
- Known transcripts, which correspond closely to near-full-length species-specific cDNA and/or protein sequences already available in the public sequence databases, are shown in red.
- Novel transcripts, which cannot be matched with confidence to species-specific database entries, are shown in black. Most of the novel transcripts have been inferred from closely related species and are a result of the 'similarity build' procedure.
- Pseudogenes, which are transcripts that simply lack a coding sequence or fulfill more advanced criteria, are drawn in grey.
- Pointing at an Ensembl transcript in Detailed View will display a pop-up menu with the stable Ensembl transcript identifier (e.g. ENST12345678901) or an assigned gene symbol at the top. Additional identifiers link to GeneView, TranscriptView and ProteinView pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl ExportView pages, allowing cDNA or protein sequence export for this particular transcript in FASTA format, respectively.
- Clicking on an Ensembl transcript in the Detailed View track will directly lead to the corresponding Ensembl GeneView page.
- ncRNA
- Non-coding RNAs are identified through conserved patterns of secondary structure. We use cmsearch to search the genome with RFAM covariance models. Because covariance model searching is computationally intensive, an initial BLAST step against RFAMSEQ identifies the regions of the genome to search.
- Unlike most ncRNAs, miRNAs show very high sequence conservation across species, so we take a different approach to identify them. MicroRNA precursor sequences from miRBase are aligned to the genome using BLAST. The resulting alignments are checked to ensure that they encompass the mature miRNA sequence, and RNAfold is used to confirm that the precursor sequence can fold into a hairpin structure.
- Ensembl also includes a set of hand-checked non-coding RNA genes provided by [Sean Eddy] and Tom Jones. The [ncRNA] set, as well as a detailed [description] of the annotation methods can be obtained from [ftp.genetics.wustl.edu].
- The following non-coding RNA gene types are annotated:
- tRNA
- Nuclear transfer RNA (or pseudogene).
- Mt-tRNA
- Mitochondrially-derived tRNA pseudogenes located in nuclear genome.
- rRNA
- Ribosomal RNA (or pseudogene).
- scRNA
- Small cytoplasmic RNA (or pseudogene).
- snRNA
- Small nuclear RNA (or pseudogene).
- snoRNA
- Small nucleolar RNA (or pseudogene).
- miRNA
- microRNA precursors (or pseudogene).
- misc_RNA
- Miscellaneous other RNA, such as Xist (or pseudogene).
- EST Transcripts
- This track displays transient transcript predictions annotated by the Ensembl analysis and annotation pipeline using EST evidence alone ([Eyras et al., 2004]). You may wish to compare these predictions with those in the Ensembl transcript track.
- Pointing at an EST transcript in Detailed View will display a pop-up menu with the non-stable EST transcript identifier (e.g. ENSESTT12345678901) at the top. Additional identifiers link to EST TransView and EST ProtView pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl ExportView pages, allowing cDNA or protein sequence export for this particular transcript in [FASTA] format, respectively.
- Clicking on an EST transcript in the Detailed View track will directly lead to the corresponding Ensembl EST TransView page.
- GENSCAN
- GENSCAN tracks display transcripts predicted ab initio by the [GENSCAN] gene prediction program. GENSCAN is run on individual contigs, so that predictions do not span more than one contig.
- Pointing at a GENSCAN transcript in Detailed View will display a pop-up menu with the non-stable GENSCAN transcript identifier (e.g. GENSCAN12345678901) at the top. Additional identifiers link to GENSCAN Transcript Report and GENSCAN Protein Report pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl ExportView pages, allowing cDNA or protein sequence export for this particular transcript in [FASTA] format, respectively.
- Clicking on a GENSCAN transcript in the Detailed View track will directly lead to the corresponding GENSCAN TransView page.
- GENSCAN information is also available via flat file export by selecting 'Prediction Features' on the Flat File tab on Ensembl ExportView pages.
- SNAP
- SNAP tracks display transcripts predicted ab initio by the Semi-HMM-based Nucleic Acid Parser ([SNAP]). Like GENSCAN, it predicts transcripts solely on the basis of the underlying genomic sequence and does not take any experimental evidence into account. The SNAP track is not available for all species, but SNAP performs better than GENSCAN in some species.
- Pointing at a SNAP transcript in Detailed View will display a pop-up menu with the non-stable SNAP transcript identifier (e.g. SNAP12345678901) at the top. Additional identifiers link to SNAP TransView and SNAP ProtView pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl ExportView pages, allowing cDNA or protein sequence export for this particular transcript in [FASTA] format, respectively.
- Clicking on a SNAP transcript in the Detailed View track will directly lead to the corresponding TransView page.
- SLAM
- SLAM tracks display transcripts predicted by [SLAM], a comparative gene-prediction tool for syntenic genomic sequences. SLAM predicts gene structures for any suitably related pair of organisms (e.g. Aedes and Anopheles, or Aedes and Drosophila).
- Genefinder
- Genefinder systematically uses statistical criteria (primarily log likelihood ratios, or LLRs) to attempt to identify likely genes within a region of genomic sequence. Candidate genes are evaluated on the basis of scores that reflect their splice site, translation start site, and coding potential LLRs, and intron sizes. A dynamic programming algorithm is used to find the set of non-overlapping candidate genes (on a given strand) having the highest total score (among all such sets). [Genefinder] is an unpublished work of Colin Wilson, LaDeana Hilyer, and Phil Green. The source code is freely available for research and educational purposes.
- DAS
- Displays of gene and transcript predictions from NCBI and other groups may be available as DAS sources.
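The final step of Genefinder, choosing by dynamic programming the highest-scoring set of non-overlapping candidate genes on one strand, has the shape of the classic weighted interval scheduling problem. The sketch below implements that textbook recurrence; it is not Genefinder's source code, and it assumes each candidate already carries a single combined score (for example from the splice site, start site, coding potential and intron size terms described above) and inclusive coordinates.

```python
from bisect import bisect_right
from typing import List, Tuple

Candidate = Tuple[int, int, float]  # (start, end, score), inclusive coordinates


def best_gene_set(candidates: List[Candidate]) -> Tuple[float, List[Candidate]]:
    """Highest total score over sets of non-overlapping candidates (one strand)."""
    genes = sorted(candidates, key=lambda g: g[1])  # sort by end coordinate
    ends = [g[1] for g in genes]
    n = len(genes)
    best = [0.0] * (n + 1)  # best[i] = optimum using only the first i genes

    def compatible(i: int) -> int:
        # Number of earlier genes (among the first i-1) ending before gene i starts.
        start = genes[i - 1][0]
        return bisect_right(ends, start - 1, 0, i - 1)

    for i in range(1, n + 1):
        score = genes[i - 1][2]
        best[i] = max(best[i - 1], best[compatible(i)] + score)

    # Trace back through the table to recover the chosen genes.
    chosen = []
    i = n
    while i > 0:
        j = compatible(i)
        if best[j] + genes[i - 1][2] >= best[i - 1]:
            chosen.append(genes[i - 1])
            i = j
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen
```

Sorting by end position and using binary search keeps the running time at O(n log n) for n candidates.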
[edit] Protein Homology Evidence
- Anopheles Protein
- Anopheles protein alignments.
- Dros. Protein
- Drosophila protein alignments.
- Other Protein
- Protein alignments that are not directly associated with a particular organism.
[edit] mRNA Homology Evidence
- RNA (ALL)
- mRNA alignments, including multiple hits in cases where more than one alignment location is found with a strong match (multiple hits per mRNA).
- RNA (BEST)
- mRNA alignments; only the best matching alignment location (one hit per mRNA).
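The difference between the RNA (ALL) and RNA (BEST) tracks can be sketched as a reduction from all alignments to one per mRNA. The `Alignment` record, the assumption that a higher score means a better match, and keeping the first of equally scored hits are all illustrative choices, not the documented pipeline.

```python
from typing import Dict, Iterable, List, NamedTuple


class Alignment(NamedTuple):
    mrna_id: str
    chrom: str
    start: int
    end: int
    score: float  # assumption: higher is better


def best_hit_per_mrna(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep only the highest-scoring alignment for each mRNA (first one wins ties)."""
    best: Dict[str, Alignment] = {}
    for aln in alignments:
        current = best.get(aln.mrna_id)
        if current is None or aln.score > current.score:
            best[aln.mrna_id] = aln
    return list(best.values())
```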
[edit] EST Homology Evidence
- EST
- This track displays hits to species-specific Expressed Sequence Tags (ESTs). Please note that this does not represent the complete set of ESTs available from public sequence databases but rather a stringently filtered set. Only ESTs that are more than 97% identical to the genome over more than 90% of their length are included.
- In human and mouse Ensembl, this track shows the evidence on which EST transcripts are based. Mouse-over will show the EMBL accession number and a clickable link to the database entry; the entry can also be reached by clicking directly on the feature. A maximum of seven entries is displayed at any one position, although more entries may have been mapped to this location. Unlike the other 'evidence' tracks, the EST track shows ESTs mapped by homology to all of the genomic sequence, instead of just to predicted exon regions.
- ARRAY_MMC1_ests
- Mosquito Microarray Consortium platform #1 EST hits.
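The EST inclusion criteria above (more than 97% identity over more than 90% of the EST length) can be written as a filter. How identity and coverage are actually computed in the pipeline is not stated on this page; the sketch assumes identity is identical bases divided by aligned bases, and coverage is aligned bases divided by EST length. The `EstHit` record is hypothetical.

```python
from typing import NamedTuple


class EstHit(NamedTuple):
    est_id: str
    est_length: int      # length of the EST in bases
    aligned_length: int  # EST bases covered by the genomic alignment
    identical: int       # identical bases within the alignment


def passes_est_filter(hit: EstHit, min_identity: float = 0.97,
                      min_coverage: float = 0.90) -> bool:
    """True if identity and coverage both strictly exceed the thresholds."""
    if hit.aligned_length == 0 or hit.est_length == 0:
        return False
    identity = hit.identical / hit.aligned_length
    coverage = hit.aligned_length / hit.est_length
    return identity > min_identity and coverage > min_coverage
```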
[edit] tRNA
[edit] Eponine
[edit] First Exon Finder
[edit] Microarray Probe Sets
Ensembl annotates microarray probe sets on the genome sequences if manufacturers have disclosed the individual probe set sequences for a particular microarray. The mapping process is a two-step procedure outlined in the [Microarray Probe Set Mapping] document.
- ARRAY_EMBL_MMC2_12k_v1
- Mosquito Microarray Consortium platform #2 microarray spot locations.
- ARRAY_EMBL_MMC1_20k_v1
- Mosquito Microarray Consortium platform #1 microarray spot locations.
- Affy_Plasmodium_Anopheles
- Affymetrix Anopheles/Plasmodium Genome chip spot locations.
- ARRAY_LIV_GAMDETOX_0.25k_v1
- Anopheles detox microarray spot locations.
[edit] BLAST Evidence
- BLAST UniProtKB
- One-way BLAST hits against the UniProtKB database.
- BLAST Drosophila
- One-way BLAST hits against the Drosophila genome.
[edit] Whole Genome Similarity Matches
[edit] Markers
[edit] Quantitative Trait Loci (QTL)
[edit] CpG Islands
[edit] Regulatory Regions
[edit] Single Nucleotide Polymorphisms (SNPs)
[edit] Repeats
[edit] Tile Path
[edit] Gaps
[edit] %GC
[edit] DAS Sources
VectorBase can display custom data tracks in the Genome Browser using the DAS protocol. The following data tracks are currently available in DAS:
- A. gambiae
- AnoEST Clusters
- The approximately 215,000 expressed sequences of Anopheles gambiae were grouped into clusters using the genomic sequence as a template and associated with inferred functional annotation of the AgamP3 assembly, including the following: the corresponding VectorBase gene prediction, putative orthologous genes in other species, homology to known proteins, protein domains, associated Gene Ontology terms, and the corresponding classification into broad GO-slim functional groups (Kriventseva et al. Genome Res. 15:893-9. 2005). AnoEST is a vital resource for interpreting expression profiles derived using recently developed A. gambiae cDNA microarrays. Using these cDNA microarrays, Kriventseva et al. experimentally confirmed the expression of 7961 clusters during mosquito development. Of these, 3100 are not associated with currently predicted genes. Moreover, the clusters with confirmed expression are unbiased with respect to the current gene annotation or homology to known proteins. Consequently, we expect many as yet unconfirmed clusters to be actual A. gambiae genes.
- CEGG
- To Do
- MS_data_JHU
- 3,967 mass spectra from 16 LC-MS/MS runs of Anopheles gambiae salivary gland homogenates were searched against the Anopheles gambiae genome database. The peptide sequences identified in this study were mapped onto the genomic sequence using the distributed annotation system available at Ensembl and can be visualized in the context of all other existing annotations. The strategy described in the paper cited below can be used to correct and confirm genome annotations and to discover novel proteins in a high-throughput manner by mass spectrometry. For more information, see the paper Genome annotation of Anopheles gambiae using mass spectrometry-derived data, Kalume et al. BMC Genomics. 2005 Sep 19;6:128.
- Manual Annotation
- This track displays manual gene models generated as part of the VectorBase Manual Annotation Pipeline.
- PDB_Spice
- To Do
- ReAnoCDS
- ReAnoCDS05 was obtained by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence, and it lowers the proportion of CDSs lacking start and/or stop codons to only approximately 4%. The reannotated CDS set includes a set of 4,681 novel CDSs not represented in the Ensembl annotation but with EST support, and another set of 4,031 Ensembl-supported genes that undergo major structural and, therefore, probably functional changes in the reannotated set. The quality and accuracy of the reannotation were assessed by comparison with end sequences from 20,249 full-length cDNA clones, and evaluation of mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirms that the reannotated CDSs offer a high-quality protein database for proteomics. The full paper can be obtained at http://genomebiology.com/2006/7/3/R24
- Transposons
- To Do
- Tiling Expression
- Whole-organism RNA from 3-5-day-old adult male or female mosquitoes was hybridised to a whole-genome tiling array featuring 76,782 probes densely targeted at exons from an old Ensembl gene set (release 15) and 94,469 probes targeted at regular intervals throughout the genome where no genes were annotated at the time (zoom out to 100 kb or more to see these clearly). The 36-mer probes were designed to be unique. The mapping to the current repeat-masked VectorBase assembly used exonerate's ungapped alignment mode, and only exact matches (36 out of 36 identities) are shown. A small number of probes match more than once (see the Notes: section of the feature pop-up). Five male samples and five female samples were hybridised on five two-channel arrays, but the data here are treated as 10 independent channels. Each channel's intensity data has been median normalised (the new median is arbitrarily set to 100). Two "final" expression values are shown for each probe: the arithmetic mean of the five female signals in the positive direction (female up) and the arithmetic mean of the five male signals in the negative direction (male down). If you forget which way round these are, click on the bars for a reminder (see the TYPE: subheading). For more information on the experiment, please see the open access article or the GEO submission, or contact the VectorBase help desk. Please note that DAS tracks serving many features (like this one) will slow down your browsing, so you are advised to turn this track off when it is not needed.
- A. aegypti
- AedEST Clusters
- The same procedure as for the AnoEST track above was applied to Aedes aegypti ESTs.
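The per-probe values shown in the Tiling Expression track can be sketched as follows. The track description states only that each channel was median normalised to a new median of 100 and that the female and male means are plotted up and down; the sketch assumes multiplicative scaling and that every channel lists the same probes in the same order, and its function names are hypothetical.

```python
from statistics import mean, median
from typing import List, Tuple

TARGET_MEDIAN = 100.0  # new median per channel, as stated in the track description


def median_normalise(channel: List[float]) -> List[float]:
    """Scale one channel so its median becomes TARGET_MEDIAN (assumed multiplicative)."""
    scale = TARGET_MEDIAN / median(channel)
    return [value * scale for value in channel]


def probe_summary(female: List[List[float]],
                  male: List[List[float]]) -> List[Tuple[float, float]]:
    """Per probe: (mean female signal, minus mean male signal) after normalisation.

    Each inner list is one channel; all channels are assumed to share probe order.
    """
    female_norm = [median_normalise(ch) for ch in female]
    male_norm = [median_normalise(ch) for ch in male]
    summary = []
    for probe in range(len(female[0])):
        female_up = mean(ch[probe] for ch in female_norm)
        male_down = -mean(ch[probe] for ch in male_norm)
        summary.append((female_up, male_down))
    return summary
```

In the experiment described above, `female` and `male` would each hold five channels.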
