Anopheles stephensi released

Anopheles stephensi Liston is a major malaria vector with a geographical range from the Middle East through the Indian subcontinent and China. Throughout its natural range, Anopheles stephensi is an important vector for both Plasmodium falciparum and Plasmodium vivax. Further, Anopheles stephensi can transmit Plasmodium knowlesi, a virulent parasite that is vectored from non-human primates to humans. The strain used for this genome sequencing project is the Indian Wild Type strain originally established at the Walter Reed Army Institute of Research. It belongs to the ''type'' biological form and has a segregating 2Rb inversion.

The genome assembly presented here is preliminary. It represents a whole genome shotgun assembly of Roche 454 sequences generated in the laboratories of Zhijian Tu and Igor Sharakhov at Virginia Tech, USA. Dr. Yogesh Shouche at the National Centre for Cell Science, India, is also involved in this collaborative effort. The assembly totals 158 Mb of sequence in 33,024 contigs and 6,150 scaffolds. While this version of the assembly was being annotated by VectorBase, an improved assembly was obtained at Virginia Tech and submitted to GenBank. Future annotation refinements will be made using the new assembly.

Annotation of the Anopheles stephensi assembly was carried out by VectorBase using MAKER, informed by protein similarities, 18K ESTs, and both Illumina and Roche 454 RNA-seq transcriptomic data. The resulting gene set is preliminary and will be refined in the future.

Use of the data
As a public service to the biological research community, these data are being made available by the sequence producers before scientific publication. We elect to follow the NHGRI policy for Release and Database Deposition of Sequence Data: ''The producing laboratories intend to publish the sequence of the genome and certain large-scale analyses of the sequence in a timely manner. The sole exception to the unrestricted use of these unpublished data is that the data may not be used for the initial publication of the complete genome sequence assembly or other large-scale analyses. In this context, 'large-scale' refers to regions the size of the whole genome or individual chromosomes and examples of 'large-scale analyses' include identification of regions of evolutionary conservation across an entire genome and identification of complete sets of genomic features such as genes, repeat structures, GC content, etc. The producing laboratories will, however, be open to the possibility of collaboration on such assemblies or analyses.'' Any redistribution of the data should carry this notice.

The genome, transcript and protein sequences are now available from our VectorBase downloads page.