How are genes projected between assembly versions?

Answer: 

Projection is based on an alignment of the two assembly versions, produced with ATAC, which provides a mapping between regions of the old and new assemblies. Each exon is projected separately using this mapping; the projected exons are then combined into projected transcripts, which in turn are combined into projected genes.
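The exon-by-exon step can be illustrated with a minimal sketch. It assumes the mapping is available as a list of ungapped aligned blocks; the `Block` and `Interval` types, their field names, and the rule that an exon must sit entirely inside one block are illustrative assumptions, not ATAC's output format or the actual pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    """One ungapped aligned region between assemblies (hypothetical structure)."""
    old_scaffold: str
    old_start: int           # 0-based, inclusive
    new_scaffold: str
    new_start: int           # 0-based, inclusive
    length: int
    same_strand: bool = True  # False if the region is reverse-complemented


@dataclass(frozen=True)
class Interval:
    scaffold: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: int  # +1 or -1


def project_exon(exon: Interval, blocks: List[Block]) -> Optional[Interval]:
    """Project an exon if it lies entirely inside one aligned block (assumed rule)."""
    for b in blocks:
        if b.old_scaffold != exon.scaffold:
            continue
        if b.old_start <= exon.start and exon.end <= b.old_start + b.length:
            off_start = exon.start - b.old_start
            off_end = exon.end - b.old_start
            if b.same_strand:
                return Interval(b.new_scaffold, b.new_start + off_start,
                                b.new_start + off_end, exon.strand)
            # Reverse-strand block: coordinates are mirrored within the block.
            block_end = b.new_start + b.length
            return Interval(b.new_scaffold, block_end - off_end,
                            block_end - off_start, -exon.strand)
    return None  # exon not covered by any single block


def project_transcript(exons: List[Interval],
                       blocks: List[Block]) -> List[Optional[Interval]]:
    """Project each exon separately; None marks an exon that did not map."""
    return [project_exon(e, blocks) for e in exons]
```

A transcript whose result list contains no `None` entries would then be a candidate for a projected transcript, subject to the checks described below.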

Sometimes, UTRs are truncated when projected to the new assembly; in these cases, only the CDS regions are projected, and the projected transcript has no UTRs (since it is not necessarily the case that a truncated UTR will be valid).

Projection can fail by:

  • Generating a translation with internal stop codons. This is most likely due to one or more nucleotide changes in the underlying assembly.
  • Mapping from one scaffold in the old assembly to multiple scaffolds (or strands) in the new assembly.
  • Mapping partially (with either truncated CDS regions or missing exons).
  • Not mapping at all.
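
The failure modes listed above can be sketched as a simple classifier. This is an illustration only: the function names, the outcome labels, the order of the checks, and the use of the standard genetic code's stop codons are assumptions, and the sketch detects partial mapping only through missing exons, not truncated CDS regions.

```python
from typing import List, Optional, Tuple

# Stop codons of the standard genetic code (assumed to apply here).
STOP_CODONS = {"TAA", "TAG", "TGA"}

# (scaffold, strand) of a projected exon, or None if the exon did not map.
Location = Optional[Tuple[str, int]]


def has_internal_stop(cds: str) -> bool:
    """True if any codon other than the last complete one is a stop codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def classify_projection(exon_locations: List[Location], projected_cds: str) -> str:
    """Return a hypothetical outcome label for one transcript."""
    if all(loc is None for loc in exon_locations):
        return "unmapped"            # not mapping at all
    if any(loc is None for loc in exon_locations):
        return "partial"             # some exons are missing
    if len(set(exon_locations)) > 1:
        return "multiple_locations"  # several scaffolds or strands
    if has_internal_stop(projected_cds):
        return "internal_stop"       # translation contains a premature stop
    return "projected"
```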

In many cases there are good reasons for a transcript failing to project, but some transcripts with good evidence can be lost. Our automated procedures try to minimise the number of such losses, but it is inevitable that some transcripts will need a small amount of manual correction. To facilitate this, we calculate statistics to show the quality and quantity of evidence for unprojected transcripts. Further, transcripts which we consider to have good evidence in the old assembly are documented.

For unprojected transcripts that map at least partially, the mappings are available in GFF3 format, and are also presented as a track in WebApollo. If a transcript maps to multiple scaffolds (or strands), the GFF3 file has a separate gene for each scaffold, using the original ID with a numeric suffix. For all unprojected transcripts there is a range of FASTA files containing the original transcript's sequence, for BLASTing or otherwise searching against the new assembly. For completeness, we provide the GFF3 and FASTA files for projected transcripts as well, but these are probably not as useful as those for unprojected transcripts.
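
As a rough illustration of the suffixed IDs, the sketch below groups a gene's mapped exons by scaffold and strand and writes one GFF3 gene line per group. The separator (`.`), the `projection` source column, and the function name are assumptions made for the example; the actual files may use a different suffix style.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (scaffold, start, end, strand), with GFF3-style 1-based inclusive coordinates.
Exon = Tuple[str, int, int, str]


def gff3_genes_per_location(gene_id: str, exons: List[Exon],
                            sep: str = ".") -> List[str]:
    """Emit one GFF3 gene line per (scaffold, strand) the exons map to."""
    groups: Dict[Tuple[str, str], List[Exon]] = defaultdict(list)
    for exon in exons:
        groups[(exon[0], exon[3])].append(exon)

    lines = []
    for n, ((scaffold, strand), members) in enumerate(sorted(groups.items()), start=1):
        start = min(e[1] for e in members)
        end = max(e[2] for e in members)
        # Add a numeric suffix only when the gene is split across locations.
        new_id = f"{gene_id}{sep}{n}" if len(groups) > 1 else gene_id
        lines.append("\t".join([scaffold, "projection", "gene", str(start),
                                str(end), ".", strand, ".", f"ID={new_id}"]))
    return lines
```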

All of the files associated with projection are available from the Downloads section.