Contigs and singletons in the Ixodes assembly



The contigs file at VectorBase contains just the 570,637 contigs that make up the official annotated genome assembly (IscaW1). All of these are represented in the 369,492 supercontigs (note that many of these 'supercontigs' are just a single contig).

The file that describes how the contigs make up the supercontigs is also available for download from VectorBase.

A further 580,638 contigs are deposited at GenBank, unfortunately without any tag to distinguish them from the IscaW1 contigs, giving a total of 1,141,595. The ones not used in IscaW1 are all short and are considered to "represent degenerate contigs". We don't have these at VectorBase.


The singletons file available from VectorBase consists of trace reads that could not be combined into contigs. We make it available because it represents a large proportion (38%) of the total trace reads (unusually large for a WGS sequencing project). It isn't entirely clear why such a large proportion of the reads were left as singletons. A high level of polymorphism within the sequenced population of ticks is suspected to be a large part of the problem.

The advice given by the JCVI folk who did the assembly is that there may be interesting sequence, different to the sequence in IscaW1, in the singletons; but you are unlikely to find anything interesting in the short degenerate contigs omitted from IscaW1.

So if you can't find your gene of interest in the IscaW1 assembly, you may want to download and search the singletons to see if there is anything extra there. To do this, you will need plenty of storage space and a local installation of BLAST or another search program.
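
As a rough illustration of the local search described above, here is a minimal Python sketch that drives the NCBI BLAST+ command-line tools (makeblastdb and blastn). It assumes BLAST+ is installed and on your PATH. The file names, the database name and the E-value cutoff are all placeholders chosen for this example; they are not the names of the actual VectorBase downloads.

```python
import shutil
import subprocess


def build_blast_commands(singletons_fasta, query_fasta,
                         db_name="singletons_db", out_tsv="hits.tsv",
                         evalue=1e-10):
    """Return the two BLAST+ command lines: build the database, then search it.

    All file and database names are hypothetical placeholders.
    """
    # Step 1: index the singleton reads as a nucleotide BLAST database.
    make_db = ["makeblastdb", "-in", str(singletons_fasta),
               "-dbtype", "nucl", "-out", db_name]
    # Step 2: search a nucleotide query against it, writing tabular output.
    search = ["blastn", "-query", str(query_fasta), "-db", db_name,
              "-evalue", str(evalue), "-outfmt", "6", "-out", out_tsv]
    return make_db, search


def run_search(singletons_fasta, query_fasta):
    """Run both steps, stopping at the first failure."""
    for tool in ("makeblastdb", "blastn"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH; is BLAST+ installed?")
    for cmd in build_blast_commands(singletons_fasta, query_fasta):
        subprocess.run(cmd, check=True)
```

Calling run_search("singletons.fa", "my_gene.fa") would build the database and write any hits to hits.tsv. If your query is a protein sequence, tblastn (also part of BLAST+) would be the program to swap in for blastn.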

Link to downloads page

Alternatively, you can search the entire raw output of the WGS project at VectorBase by running BLAST against the traces:

WIKEL Strain, June 2007 Trace Reads

Link to BLAST page

These are the 19.3 million unassembled sequencing reads. For a sequence that is present once in the IscaW1 assembly, you should expect to find several hits in the traces (sequences in the assembly are covered by an average of about 4 trace reads).
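
To check a trace search against that expectation, you could count how many distinct reads hit each query. The sketch below assumes you have saved the results in BLAST's 12-column tabular format (query ID, subject ID and percent identity in the first three columns); whether a given search page offers that format is an assumption here. The 95% identity threshold is an arbitrary choice for illustration.

```python
import csv
import io
from collections import defaultdict


def reads_per_query(tabular_text, min_identity=95.0):
    """Count distinct trace reads hitting each query in tabular BLAST output.

    Expects tab-separated rows whose first three columns are
    query ID, subject (read) ID and percent identity.
    """
    reads = defaultdict(set)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query_id, read_id, identity = row[0], row[1], float(row[2])
        # A read may produce several alignments; the set counts it once.
        if identity >= min_identity:
            reads[query_id].add(read_id)
    return {query: len(ids) for query, ids in reads.items()}
```

A count near 4 for a query would be consistent with a single-copy sequence at the average coverage quoted above. A much higher count might point to a repeat or a multi-copy gene, though coverage varies enough that this is only a rough guide.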