assembly

AatrE1

This assembly was generated using 101 bp paired-end Illumina HiSeq2000 reads generated from three libraries: a 180 bp insert ‘fragment’ library, a 1.5 kb ‘jump’ library, and a 38 kb ‘fosill’ library. Sequencing template for the fragment and jump libraries was derived from genomic DNA extracted from a single individual, which was preserved by freezing at -80 °C. Native genomic DNA was used for the fragment library and whole genome amplified DNA was used for the jump library. Template for the fosill library was generated from a pooled extraction of many individuals. Reads were assembled at the Broad Institute using the ALLPATHS-LG algorithm, with the Haploidify option enabled to address high allelic heterozygosity in the template.

Assembly name: 
AatrE1
Release date: 
Wednesday, October 16, 2013
Tags: 

EBRO

Originally isolated from Spain, isofemale selection was performed prior to genome sequencing. For more details click here.

Scaffold count: 
1 371
Scaffold N50 (bp): 
9 206 694
GenBank WGS Accession: 
AXCP00000000
GenBank WGS Version: 
AXCP00000000.1
Assembly status: 
Deprecated
Sequencing method: 
Illumina
Assembly software: 
allpaths v. R46504
Average depth of coverage: 
98.0x
Genome Analysis Of Vectorial Capacity In Major Anopheles Vectors Of Malaria Parasites
Alternative_assembly_name: 
Anop_atro_EBRO_V1
Genome Size (bp): 
224,290,125
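
The "Scaffold N50 (bp)" values listed for each assembly follow the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal sketch, assuming the scaffold lengths are already available as a list of integers; the function name `scaffold_n50` and the toy lengths are illustrative and are not taken from this page:

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with hypothetical lengths (not real AatrE1 scaffolds).
toy_lengths = [100, 80, 50, 30, 20, 10, 10]
print(scaffold_n50(toy_lengths))
```

Comparing `2 * running` with `total` keeps the calculation in integers, so no rounding is involved when the total length is odd.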

AatrE2

This assembly is a rescaffolding of the AatrE1 assembly using in situ hybridization of genomic scaffolds and synteny information supplied from a preprint of the paper "Partial-arm translocations in evolution of malaria mosquitoes revealed by high-coverage physical mapping of the Anopheles atroparvus genome", Artemov et al., BMC Genomics, manuscript submitted. In all, 56 scaffolds totaling 201 Mb (89.6% of the genome) were anchored to chromosomes, leaving 1,315 scaffolds (10.4% of the genome assembly) unmapped.
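
The anchoring figures quoted above can be cross-checked against the scaffold count and genome size listed for this assembly. A short sketch of that arithmetic; the variable names are illustrative, and the anchored length is taken as the rounded 201 Mb from the description rather than an exact base count:

```python
# Figures quoted in the AatrE2 entry.
genome_size_bp = 224_290_125   # "Genome Size (bp)"
scaffold_count = 1_371         # "Scaffold count"
anchored_bp = 201_000_000      # 201 Mb, rounded in the description
anchored_scaffolds = 56

anchored_fraction = anchored_bp / genome_size_bp
unmapped_fraction = 1 - anchored_fraction
unmapped_scaffolds = scaffold_count - anchored_scaffolds

print(f"anchored: {anchored_fraction:.1%}")
print(f"unmapped: {unmapped_fraction:.1%}")
print(f"unmapped scaffolds: {unmapped_scaffolds}")
```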
Assembly name: 
AatrE2
Release date: 
Monday, December 11, 2017
Tags: 

EBRO

Originally isolated from Spain, isofemale selection was performed prior to genome sequencing. For more details click here.

Scaffold count: 
1 371
Scaffold N50 (bp): 
9 206 694
GenBank WGS Accession: 
AXCP00000000
GenBank WGS Version: 
AXCP00000000.1
Assembly status: 
Current
Sequencing method: 
Illumina
Assembly software: 
allpaths v. R46504,GRIMM-Synteny
Average depth of coverage: 
98.0x
Genome Analysis Of Vectorial Capacity In Major Anopheles Vectors Of Malaria Parasites
Genome Size (bp): 
224,290,125

AculA1

This assembly was generated using 101 bp paired-end Illumina HiSeq2000 reads generated from two libraries: a 180 bp insert ‘fragment’ library and a 1.5 kb ‘jump’ library. Sequencing template for the fragment and jump libraries was derived from genomic DNA extracted from a single individual, which was preserved by freezing at -80 °C. Native genomic DNA was used for the fragment library and whole genome amplified DNA was used for the jump library. Reads were assembled at the Broad Institute using the ALLPATHS-LG algorithm, with the Haploidify option enabled to address high allelic heterozygosity in the template.

Assembly name: 
AculA1
Release date: 
Friday, October 11, 2013
Tags: 

A-37

Originally isolated from wild individuals collected in Iran (Ghoran village, May 2010), mosquitoes were donated by Mohammad Oshagi and Igor Sharakhov (Virginia Tech). There is no colony.

Scaffold count: 
16 162
Scaffold N50 (bp): 
22 320
GenBank WGS Accession: 
AXCM00000000
GenBank WGS Version: 
AXCM00000000.1
Assembly status: 
Current
Sequencing method: 
Illumina
Assembly software: 
allpaths v. R46449
Average depth of coverage: 
31.0x
Genome Analysis Of Vectorial Capacity In Major Anopheles Vectors Of Malaria Parasites
Alternative_assembly_name: 
Anop_culi_species_A-37_1_V1
Genome Size (bp): 
202,998,806

AdarC3

This assembly was generated by the Laboratorio Nacional de Computacao Cientifica in Petropolis, Brazil, from a whole genome shotgun assembly of Roche 454 sequences using the Celera Assembler. Changes from the previous assembly reflect an additional 60 scaffolds that were submitted to GenBank but were not in the original submission to VectorBase.

Assembly name: 
AdarC3
Release date: 
Wednesday, April 30, 2014
Tags: 

Coari

Originally isolated from Brazil. The Coari strain of An. darlingi was collected from the shores of Lake Coari, close to the city of Coari in Amazonas State. This area has had a high malaria incidence reported over the past 15 years, and entomological surveys suggest that transmission is almost exclusively via An. darlingi. Female mosquitoes were collected from the Santa Luzia Buiuçuzinho community, approximately 20 km from Coari. Gravid females were allowed to lay eggs and DNA was extracted from 4th stage larvae.
Scaffold count: 
2 221
Scaffold N50 (bp): 
115 072
GenBank WGS Accession: 
ADMH00000000
GenBank WGS Version: 
ADMH00000000.2
Assembly status: 
Current
Sequencing method: 
454
Assembly software: 
Celera assembler
Average depth of coverage: 
20x
Finishing status: 
draft
Genome Size (bp): 
136,935,538

AfarF1

This assembly was generated using 101 bp paired-end Illumina HiSeq2000 reads generated from two libraries: a 180 bp insert ‘fragment’ library and a 1.5 kb ‘jump’ library. Sequencing template for the fragment and jump libraries was derived from genomic DNA extracted from a single individual, which was preserved by freezing at -80 °C. Native genomic DNA was used for the fragment library and whole genome amplified DNA was used for the jump library. Reads were assembled at the Broad Institute using the ALLPATHS-LG algorithm, with the Haploidify option enabled to address high allelic heterozygosity in the template.

Assembly name: 
AfarF1
Release date: 
Tuesday, October 1, 2013
Tags: 

FAR1

Originally collected in 1967 from Papua New Guinea (Rabaul, East New Britain Province), isofemale selection was performed prior to genome sequencing. For more details click here.

Scaffold count: 
550
Scaffold N50 (bp): 
1 196 527
GenBank WGS Accession: 
AXCN00000000
GenBank WGS Version: 
AXCN00000000.1
Assembly status: 
Deprecated
Sequencing method: 
Illumina
Assembly software: 
allpaths v. R46320
Average depth of coverage: 
69x
Genome Analysis Of Vectorial Capacity In Major Anopheles Vectors Of Malaria Parasites
Alternative_assembly_name: 
Anop_fara_FAR1_V1
Genome Size (bp): 
180,984,331

AfarF2

Assembly name: 
AfarF2
Release date: 
Saturday, June 13, 2015
Tags: 

FAR1

Originally collected in 1967 from Papua New Guinea (Rabaul, East New Britain Province), isofemale selection was performed prior to genome sequencing. For more details click here.

Scaffold count: 
310
Scaffold N50 (bp): 
12 895 223
GenBank WGS Accession: 
AXCN00000000
GenBank WGS Version: 
AXCN00000000.1
Assembly status: 
Current
Sequencing method: 
Illumina
Assembly software: 
ALLPATHS-LG
Average depth of coverage: 
233x
Finishing status: 
draft
Genome Analysis Of Vectorial Capacity In Major Anopheles Vectors Of Malaria Parasites
Genome Size (bp): 
183,103,254

AgamP3

As described in Holt et al. (2002), plasmid and BAC DNA libraries were constructed with stringently size-selected PEST strain DNA. Two BAC libraries were constructed, one (ND-TAM) using DNA from whole adult male and female mosquitoes and the other (ND-1) using DNA from ovaries of PEST females collected about 24 hours after the blood meal. Plasmid libraries containing inserts of 2.5, 10 and 50 kb were constructed with DNA derived from either 330 male or 430 female mosquitoes. For each sex, several libraries of each insert size class were made, and these were sequenced such that there was approximately equal coverage from male and female mosquitoes in the final data set. Celera Genomics, Genoscope and TIGR contributed sequence data that collectively provided 10.2-fold coverage, assuming a genome size of 278 Mb. The whole-genome data set was assembled with the Celera assembler (MOZ1 assembly), which constituted the basis of the primary genome publication (Holt et al. 2002).

The first update to this assembly (MOZ2) involved the results of a concerted effort to correct some of the ambiguities in scaffold map locations and orientations by manual analysis of the archived BAC chromosome hybridization photographs and by the hybridization of a small number of new BAC clones selected to resolve questions of scaffold orientation. The new AGP file, an early draft of which was first displayed on the A. gambiae genome poster published in the 4 October 2002 issue of Science, formed the basis of a new annotation and gene build displayed on 1 October 2003 (MOZ2) (Mongin et al. 2004). This assembly is also 278 Mb.

In 2006, the major scaffolds were re-ordered into a new golden path file by use of additional physically mapped BAC clones combined with scaffold-to-scaffold sequence comparisons that identified some sequence overlaps. The AgamP3 assembly has a total of 80 scaffolds assigned to and ordered on the chromosome arms X, 2R, 2L, 3R and 3L, 28 of which are newly mapped or oriented. The most significant improvement in this new assembly is 24 scaffolds (8.64 Mbp) located to pericentromeric regions. However, this does not complete the centromeric region of any of the chromosomes. The new GenBank entries, CM000356-CM000360, reflect the revised 2L, 2R, 3L, 3R and X chromosome assemblies. This new assembly (AgamP3) of non-redundant ~264 Mb is still probably an overestimation of the true genome size (Sharakhova et al. 2007).

Additional notes compiled on known assembly issues from the initial VectorBase project can be found here.

New in situ Scaffold Mappings
  • Using cDNA for in situ hybridization, 15 previously unmapped scaffolds with size totaling 5.34 Mbp have been mapped to the pericentromeric regions on the chromosomes.
  • 23 scaffolds, previously mapped to chromosomes but with ambiguous direction, have been oriented.
  • Further analysis of scaffolds using in situ hybridization of BAC clones allowed us to identify 1.96 Mbp (5 scaffolds) that spanned physical gaps between scaffolds on euchromatic parts of the chromosomes.
  • Analysis of BAC end sequences has found 23 BAC clones that span the synthetic inter-scaffold gaps.
  • Unmapped scaffolds have been aligned to the chromosome assemblies in silico, identifying 144 scaffolds totaling 8.18 Mbp, that are already represented in the current Golden Path.
  • A list of the newly mapped scaffolds can be found here or on the right.
  • The list of 144 overlapping previously unmapped scaffolds can be found here or on the right.

Identification of Overlapping Scaffold Ends

Using Exonerate and Dotter, adjacent scaffolds whose ends contain overlapping regions have been identified through visual inspection. For a stretch of sequence covered by two scaffolds, we have taken one of the overlapping regions and selected it for use with our updated A. gambiae Golden Path. The overlapping sequence from the other scaffold will still be associated with the same region on the chromosome, except that it will be listed as a haplotype region instead of part of the golden path.

Using these techniques, 18 major overlaps have been identified between scaffolds mapped to chromosome arms. Based on these overlaps, approximately 3.5Mb of overlapping sequence has been removed from the current Golden Path and reclassified as haplotype region.

  • The AGP file describing the new A. gambiae assembly can be found here.
  • The list of overlapping scaffolds and the haplotype regions can be found here and on the right.
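
The overlap handling described in this section can be sketched as a simple tiling step. The text does not say which copy of an overlapping region was kept, so the sketch below assumes, purely for illustration, that the copy at the end of the left-hand scaffold stays in the golden path and the matching start of the right-hand scaffold becomes the haplotype region. The function name, tuple layout and scaffold names are hypothetical:

```python
def tile_overlapping_scaffolds(scaffolds, overlaps):
    """Split ordered scaffolds into golden-path and haplotype intervals.

    scaffolds: ordered list of (name, length_bp) along one chromosome arm.
    overlaps:  overlap length in bp between each adjacent pair of scaffolds.
    Returns (golden_path, haplotype_regions), each a list of
    (name, start, end) with 1-based inclusive coordinates.
    """
    if len(overlaps) != max(len(scaffolds) - 1, 0):
        raise ValueError("need exactly one overlap length per adjacent pair")
    golden_path, haplotype_regions = [], []
    for index, (name, length) in enumerate(scaffolds):
        # Assumption: the overlap is trimmed from the start of the right-hand scaffold.
        trimmed = overlaps[index - 1] if index > 0 else 0
        if not 0 <= trimmed < length:
            raise ValueError(f"invalid overlap for {name}")
        if trimmed:
            haplotype_regions.append((name, 1, trimmed))
        golden_path.append((name, trimmed + 1, length))
    return golden_path, haplotype_regions


# Hypothetical pair of adjacent scaffolds sharing 200 bp.
golden, haplotypes = tile_overlapping_scaffolds(
    [("scaffold_A", 1000), ("scaffold_B", 800)], [200]
)
print(golden)
print(haplotypes)
```

The trimmed interval is retained rather than discarded, which mirrors the statement that the overlapping sequence stays associated with the same chromosomal region as a haplotype region.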

Y Chromosome Scaffold Identification

The scaffolds containing Y chromosome-specific satellite DNA families were identified by in silico searches. Initially, each male-only scaffold was screened for the possible presence of satellite DNA using Tandem Repeat Finder software. Subsequently, the consensus sequence of each identified tandem repeat family was used as a query in BLASTN searches against a database made of scaffold sequences derived exclusively from male libraries and a database containing all scaffolds constituting the An. gambiae genome. The satellite DNA queries that returned the same number of hits in both databases were regarded as potentially Y-linked, and in each case their Y-linkage was experimentally confirmed using PCR and Southern blot techniques. All scaffolds harboring the Y-specific satellite sequences were treated as originating from the Y chromosome.

One scaffold (AAAB01008227), containing more complex sequences, was serendipitously discovered in a TBLASTN search of the unmapped scaffold set using as queries GenBank-derived sequences of sex-determining or male fertility-related proteins. Although sequence similarity of the scaffold to the query (GenBank accession no. B21124) was limited to a low-complexity (microsatellite) region, implying lack of homology between the query and the subject, PCR experiments confirmed Y-linkage of that scaffold.

  • Download the list of Y chromosome scaffolds here or on the right.
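
The in silico part of the satellite screen described above comes down to comparing hit counts between the two BLASTN databases. A minimal sketch, assuming the hit counts per tandem-repeat family have already been tallied into dictionaries; the function name, family names and counts are hypothetical, and the experimental confirmation by PCR and Southern blot is not modelled:

```python
def candidate_y_linked(hits_male_only, hits_all_scaffolds):
    """Return repeat families whose BLASTN hit counts match in both databases.

    Both arguments map a repeat-family name to its number of hits.
    """
    candidates = []
    for family, male_hits in sorted(hits_male_only.items()):
        all_hits = hits_all_scaffolds.get(family, 0)
        # Equal, non-zero counts are the screening criterion stated in the text.
        if male_hits > 0 and male_hits == all_hits:
            candidates.append(family)
    return candidates


# Hypothetical hit counts for three tandem-repeat families.
male_only = {"satA": 12, "satB": 7, "satC": 3}
all_scaffolds = {"satA": 12, "satB": 40, "satC": 3}
print(candidate_y_linked(male_only, all_scaffolds))
```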

Bacterial Scaffold Identification

678 unmapped scaffolds in the current A. gambiae assembly have been identified as bacterial contaminants. All unmapped scaffolds were used as queries in a BLAST search against NCBI's nr protein database. Based on the results, a scaffold was classified as a bacterial contaminant if it met the following criteria:

  • The scaffold had no match or overlap to any other A. gambiae scaffold.
  • Top hits against the scaffold were from bacterial proteins and had E-values at least five orders of magnitude higher than hits to proteins from other organisms.

The classification criteria were verified by randomly selecting, from currently mapped scaffolds, an amount of sequence equal to the total length of all the newly identified bacterial scaffolds, dividing that sequence into 678 smaller scaffolds, and then performing a BLAST search against NCBI's nr database with those new chunks. Hits were examined using the same criteria. 2 of the 678 new chunks met the criteria that would classify them as bacterial, but upon individual inspection this was due to similarity in low-complexity regions.

  • A full list of scaffolds reclassified as bacterial contaminants can be found here or on the right.
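
The two classification criteria above can be written as a small predicate. This is a sketch under stated assumptions, not the procedure actually used: it reads "five orders of magnitude higher" as "more significant", that is, the best bacterial E-value is at least 10^5 times smaller than the best E-value to any other organism. The function name, argument names and the handling of missing hits are hypothetical:

```python
import math

# Floor applied before taking logarithms, since BLAST can report an E-value of 0.0.
_MIN_EVALUE = 1e-300


def is_bacterial_contaminant(overlaps_other_scaffold, best_bacterial_evalue,
                             best_other_evalue, min_orders=5.0):
    """Apply the two criteria to one unmapped scaffold.

    best_bacterial_evalue / best_other_evalue: best (smallest) E-value among
    bacterial hits and among hits to all other organisms, or None if no hits.
    """
    # Criterion 1: no match or overlap to any other scaffold of the assembly.
    if overlaps_other_scaffold:
        return False
    if best_bacterial_evalue is None:
        return False
    # Assumption: a scaffold with bacterial hits only passes criterion 2.
    if best_other_evalue is None:
        return True
    # Criterion 2 (as interpreted here): bacterial hit better by >= min_orders.
    orders = (math.log10(max(best_other_evalue, _MIN_EVALUE))
              - math.log10(max(best_bacterial_evalue, _MIN_EVALUE)))
    return orders >= min_orders


print(is_bacterial_contaminant(False, 1e-50, 1e-10))
print(is_bacterial_contaminant(False, 1e-12, 1e-10))
```

Running the same predicate over randomly drawn chunks of mapped sequence, as in the verification step described above, gives an estimate of how often the criteria fire on genuine mosquito sequence.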
Assembly name: 
AgamP3
Release date: 
Sunday, April 1, 2012
Tags: 

PEST

The Anopheles gambiae PEST strain was chosen for genome sequencing because it had both a fixed, standard chromosomal arrangement and a sex-linked pink eye mutation that could readily be used as an indicator of cross-colony contamination. The pink eye mutation originated in a colony called A. gambiae LPE established in 1951 at the London School of Hygiene and Tropical Medicine from mosquitoes collected in Lagos, Nigeria.

Scaffold count: 
7
Scaffold N50 (bp): 
49 364 325
GenBank WGS Accession: 
AAAB00000000
GenBank WGS Version: 
AAAB00000000.1
Assembly status: 
Deprecated
Genome Size (bp): 
273,093,681

AgamP4

As described in Holt et al. (2002), plasmid and BAC DNA libraries were constructed with stringently size-selected PEST strain DNA. Two BAC libraries were constructed, one (ND-TAM) using DNA from whole adult male and female mosquitoes and the other (ND-1) using DNA from ovaries of PEST females collected about 24 hours after the blood meal. Plasmid libraries containing inserts of 2.5, 10 and 50 kb were constructed with DNA derived from either 330 male or 430 female mosquitoes. For each sex, several libraries of each insert size class were made, and these were sequenced such that there was approximately equal coverage from male and female mosquitoes in the final data set. Celera Genomics, Genoscope and TIGR contributed sequence data that collectively provided 10.2-fold coverage, assuming a genome size of 278 Mb. The whole-genome data set was assembled with the Celera assembler (MOZ1 assembly), which constituted the basis of the primary genome publication (Holt et al. 2002).

The first update to this assembly (MOZ2) involved the results of a concerted effort to correct some of the ambiguities in scaffold map locations and orientations by manual analysis of the archived BAC chromosome hybridization photographs and by the hybridization of a small number of new BAC clones selected to resolve questions of scaffold orientation. The new AGP file, an early draft of which was first displayed on the An. gambiae genome poster published in the 4 October 2002 issue of Science, formed the basis of a new annotation and gene build displayed on 1 October 2003 (MOZ2) (Mongin et al. 2004). This assembly is also 278 Mb.

In 2006, the major scaffolds were re-ordered into a new golden path file by use of additional physically mapped BAC clones combined with scaffold-to-scaffold sequence comparisons that identified some sequence overlaps. The AgamP3 assembly has a total of 80 scaffolds assigned to and ordered on the chromosome arms X, 2R, 2L, 3R and 3L, 28 of which are newly mapped or oriented. The most significant improvement in this new assembly is 24 scaffolds (8.64 Mbp) located to pericentromeric regions. However, this does not complete the centromeric region of any of the chromosomes. The new GenBank entries, CM000356-CM000360, reflect the revised 2L, 2R, 3L, 3R and X chromosome assemblies. This new assembly (AgamP3) of non-redundant ~264 Mb is still probably an overestimation of the true genome size (Sharakhova et al. 2007).

This assembly differs from the previous version, AgamP3, by the addition of the mitochondrial genome (L20934, 15,363 bp), which includes 13 protein-coding genes and 24 ncRNA genes (22 tRNA and 2 rRNA genes).

Additional notes compiled on known assembly issues from the initial VectorBase project can be found here.

New in situ Scaffold Mappings

  • Using cDNA for in situ hybridization, 15 previously unmapped scaffolds with size totaling 5.34 Mbp have been mapped to the pericentromeric regions on the chromosomes.
  • 23 scaffolds, previously mapped to chromosomes but with ambiguous direction, have been oriented.
  • Further analysis of scaffolds using in situ hybridization of BAC clones allowed us to identify 1.96 Mbp (5 scaffolds) that spanned physical gaps between scaffolds on euchromatic parts of the chromosomes.
  • Analysis of BAC end sequences has found 23 BAC clones that span the synthetic inter-scaffold gaps.
  • Unmapped scaffolds have been aligned to the chromosome assemblies in silico, identifying 144 scaffolds totaling 8.18 Mbp, that are already represented in the current Golden Path.
  • A list of the newly mapped scaffolds can be found here or on the right.
  • The list of 144 overlapping previously unmapped scaffolds can be found here or on the right.

Identification of Overlapping Scaffold Ends

Using Exonerate and Dotter, adjacent scaffolds whose ends contain overlapping regions have been identified through visual inspection. For a stretch of sequence covered by two scaffolds, we have taken one of the overlapping regions and selected it for use with our updated An. gambiae Golden Path. The overlapping sequence from the other scaffold will still be associated with the same region on the chromosome, except that it will be listed as a haplotype region instead of part of the golden path.

Using these techniques, 18 major overlaps have been identified between scaffolds mapped to chromosome arms. Based on these overlaps, approximately 3.5Mb of overlapping sequence has been removed from the current Golden Path and reclassified as haplotype region.

  • The AGP file describing the new An. gambiae assembly can be found here.
  • The list of overlapping scaffolds and the haplotype regions can be found here and on the right.

Y Chromosome Scaffold Identification

The scaffolds containing Y chromosome-specific satellite DNA families were identified by in silico searches. Initially, each male-only scaffold was screened for the possible presence of satellite DNA using Tandem Repeat Finder software. Subsequently, the consensus sequence of each identified tandem repeat family was used as a query in BLASTN searches against a database made of scaffold sequences derived exclusively from male libraries and a database containing all scaffolds constituting the An. gambiae genome. The satellite DNA queries that returned the same number of hits in both databases were regarded as potentially Y-linked, and in each case their Y-linkage was experimentally confirmed using PCR and Southern blot techniques. All scaffolds harboring the Y-specific satellite sequences were treated as originating from the Y chromosome.

One scaffold (AAAB01008227), containing more complex sequences, was serendipitously discovered in a TBLASTN search of the unmapped scaffold set using as queries GenBank-derived sequences of sex-determining or male fertility-related proteins. Although sequence similarity of the scaffold to the query (GenBank accession no. B21124) was limited to a low-complexity (microsatellite) region, implying lack of homology between the query and the subject, PCR experiments confirmed Y-linkage of that scaffold.

  • Download the list of Y chromosome scaffolds here or on the right.

Bacterial Scaffold Identification

678 unmapped scaffolds in the current An. gambiae assembly have been identified as bacterial contaminants. All unmapped scaffolds were used as queries in a BLAST search against NCBI's nr protein database. Based on the results, a scaffold was classified as a bacterial contaminant if it met the following criteria:

  • The scaffold had no match or overlap to any other An. gambiae scaffold.
  • Top hits against the scaffold were from bacterial proteins and had E-values at least five orders of magnitude higher than hits to proteins from other organisms.

The classification criteria were verified by randomly selecting, from currently mapped scaffolds, an amount of sequence equal to the total length of all the newly identified bacterial scaffolds, dividing that sequence into 678 smaller scaffolds, and then performing a BLAST search against NCBI's nr database with those new chunks. Hits were examined using the same criteria. 2 of the 678 new chunks met the criteria that would classify them as bacterial, but upon individual inspection this was due to similarity in low-complexity regions.

  • A full list of scaffolds reclassified as bacterial contaminants can be found here or on the right.

Assembly name: 
AgamP4
Release date: 
Wednesday, April 30, 2014
Tags: 

PEST

The Anopheles gambiae PEST strain was chosen for genome sequencing because it had both a fixed, standard chromosomal arrangement and a sex-linked pink eye mutation that could readily be used as an indicator of cross-colony contamination. The pink eye mutation originated in a colony called A. gambiae LPE established in 1951 at the London School of Hygiene and Tropical Medicine from mosquitoes collected in Lagos, Nigeria.

Scaffold count: 
8
Scaffold N50 (bp): 
49 364 325
GenBank WGS Accession: 
AAAB00000000
GenBank WGS Version: 
AAAB00000000.1
Assembly status: 
Current
Genome Size (bp): 
273,109,044

AmacM1

This assembly was generated using 101 bp paired-end Illumina HiSeq2000 reads generated from two libraries: a 180 bp insert ‘fragment’ library and a 1.5 kb ‘jump’ library. Sequencing template for the fragment and jump libraries was derived from genomic DNA extracted from a single individual, which was preserved in ethanol. Native genomic DNA was used for the fragment library and whole genome amplified DNA was used for the jump library. Reads were assembled at the Broad Institute using the ALLPATHS-LG algorithm, with the Haploidify option enabled to address high allelic heterozygosity in the template.

Assembly name: 
AmacM1
Release date: 
Wednesday, October 16, 2013
Tags: 

maculatus3

These individuals were collected from Kuala Lumpur and sequencing was performed on preserved females donated by Lee Han Lim. Colony not at MR4.

Scaffold count: 
47 797
Scaffold N50 (bp): 
3 841
GenBank WGS Accession: 
AXCL00000000
GenBank WGS Version: 
AXCL00000000.1
Assembly status: 
Current
Sequencing method: 
Illumina
Assembly software: 
allpaths v. R46504
Average depth of coverage: 
25.0x
Genome Analysis Of Vectorial Capacity In Major Anopheles Vectors Of Malaria Parasites
Alternative_assembly_name: 
Anop_macu_maculatus3_V1
Genome Size (bp): 
141,894,015

AmelC1

This assembly was generated using 101 bp paired-end Illumina HiSeq2000 reads generated from two libraries: a 180 bp insert ‘fragment’ library and a 1.5 kb ‘jump’ library. Sequencing template for the fragment and jump libraries was derived from genomic DNA extracted from a single individual, which was preserved by freezing at -80 °C. Native genomic DNA was used for the fragment library and whole genome amplified DNA was used for the jump library. Reads were assembled at the Broad Institute using the ALLPATHS-LG algorithm, with the Haploidify option enabled to address high allelic heterozygosity in the template.

Assembly name: 
AmelC1
Release date: 
Wednesday, October 16, 2013
Tags: 

CM1001059_A

Originally isolated from wild individuals collected in Cameroon (2.378 North, 9.828 East, Campo, 2010), mosquitoes were donated by Carlo Costantini. The colony was not subject to isofemale selection.

Scaffold count: 
20 281
Scaffold N50 (bp): 
18 014
GenBank WGS Accession: 
AXCO00000000
GenBank WGS Version: 
AXCO00000000.1
Assembly status: 
Deprecated
Sequencing method: 
Illumina
Assembly software: 
allpaths v. R46504
Average depth of coverage: 
110x
Genome Analysis Of Vectorial Capacity In Major Anopheles Vectors Of Malaria Parasites
Alternative_assembly_name: 
Anop_mela_CM1001059_A_V1
Genome Size (bp): 
227,407,517
