Retrieving genes with the same domain


> I am trying to retrieve the carboxylesterases (COEs) for the Anopheles gambiae dataset but am having some problems. Previous research has identified roughly 50 COEs; however, I have only been able to retrieve 12. For example, there is an accession for COEAE1A but not for COEAE2A, which was used in the A. gambiae detox chip (David et al. 2005).

One reason you are not finding the expected number of genes is that many of the genes in the current VectorBase gene set do not yet have gene symbols assigned to them, so a search by symbol misses them. This should improve with every gene build update and with contributions from the community via the VectorBase Apollo instances and other community annotation submissions.

It looks like not all the COEs are on the detox chip, so your best bet is probably to retrieve the genes with the relevant InterPro domain. InterPro domains are assigned automatically by sequence profile matching (with HMMs). Go to the InterPro database and type "carboxylesterases" in the search box. The entry you are interested in, in this case, is IPR002018.

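A quick count of the A. gambiae genes carrying this domain gives exactly 50, in line with the number you were expecting. Filtering on the same domain in BioMart also returns 50 genes, and from there you can download sequences, promoter regions, or whatever other information you need.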
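If you prefer to script the BioMart step, the sketch below shows roughly what such a query could look like and how to count the distinct genes in the result. It only builds the query XML and parses a tab-separated result offline; it does not contact any server. The dataset name `agambiae_eg_gene`, the filter name `interpro`, and the attribute name `ensembl_gene_id` are assumptions, not confirmed names, so check them against the filter and attribute lists shown in the BioMart interface you are using. The rows in the example are placeholders, not real gene IDs.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, interpro_id, attributes):
    """Build a BioMart-style XML query that filters genes by InterPro accession.

    The dataset, filter, and attribute names are supplied by the caller;
    the filter name "interpro" used below is an assumption.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "0",        # no header row in the result
        "uniqueRows": "1",    # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    ET.SubElement(ds, "Filter", {"name": "interpro", "value": interpro_id})
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


def count_unique_genes(tsv_text):
    """Count distinct gene IDs in the first column of a TSV result.

    A gene with several transcripts appears on several rows,
    so rows are collapsed by gene ID before counting.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return len({row[0] for row in reader if row and row[0]})


if __name__ == "__main__":
    # Hypothetical dataset and attribute names; verify before use.
    xml_query = build_biomart_query(
        "agambiae_eg_gene", "IPR002018", ["ensembl_gene_id"]
    )
    print(xml_query)

    # Placeholder result: two transcripts of GENE_A and one of GENE_B.
    example = "GENE_A\tT1\nGENE_A\tT2\nGENE_B\tT3\n"
    print(count_unique_genes(example))
```

Counting distinct gene IDs rather than rows matters because a result that includes transcript or protein columns lists one row per transcript, which would inflate the total above the number of genes.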

You can also find other genes with the same domain from the page of your gene of interest. For example, on the page for AGAP005370, follow the "Transcript ID" or "Protein ID" link (in the first table on the page). Either will give you access to the "Protein Information" menu in the left-hand navigation bar. From there, click the "Domains and features" item. On the new page, the first table lists protein domains and includes a "display all genes with this domain" link.