How can I retrieve InterPro domains for a given species?


You can retrieve the InterPro data using the BioMart tool.

For example, to retrieve all the InterPro domains that have been annotated on Anopheles gambiae transcripts:

  1. In BioMart, select the VectorBase Genes database, then the Anopheles gambiae dataset
  2. Select the Attributes link in the left menu, then select the following from the Protein section:
    • InterPro ID
    • InterPro short description
    • InterPro description
  3. By default, the gene and transcript stable IDs are pre-selected for inclusion in the results; you can deselect these or select additional attributes as needed
  4. Click the Results button to see the first 10 results
  5. The full set of results can be exported in a variety of formats; use the Unique results only option to remove redundant rows

If you have particular InterPro domains of interest, you can use them to filter the BioMart results. Note, however, that the filter accepts only InterPro IDs, not descriptions. To filter by description, either retrieve a list of the associated IDs from InterPro, or follow the instructions above to retrieve all annotations and then extract the records that match the description (or keyword) of interest.
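
The second option (retrieve everything, then extract matching records) can be done with a few lines of scripting once the results have been exported. The following is a minimal sketch, assuming a tab-separated export whose column headers match the attribute names selected above; the header strings, the function name filter_by_keyword and the sample rows are illustrative assumptions, not actual BioMart output.

```python
import csv
import io

# Assumed column headers; check them against the header line of your own export.
DESCRIPTION_COLUMNS = ("InterPro short description", "InterPro description")


def filter_by_keyword(tsv_text, keyword, columns=DESCRIPTION_COLUMNS):
    """Return the rows whose description columns contain keyword (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    needle = keyword.lower()
    matches = []
    for row in reader:
        # A missing or empty column is treated as an empty string.
        if any(needle in (row.get(col) or "").lower() for col in columns):
            matches.append(row)
    return matches


# Placeholder rows for illustration only (gene and transcript IDs are made up).
SAMPLE_EXPORT = (
    "Gene stable ID\tTranscript stable ID\tInterPro ID\t"
    "InterPro short description\tInterPro description\n"
    "GENE_A\tGENE_A-RA\tIPR000719\tProt_kinase_dom\tProtein kinase domain\n"
    "GENE_B\tGENE_B-RA\tIPR001680\tWD40_rpt\tWD40 repeat\n"
    "GENE_C\tGENE_C-RA\t\t\t\n"
)

if __name__ == "__main__":
    for row in filter_by_keyword(SAMPLE_EXPORT, "kinase"):
        print(row["Gene stable ID"], row["InterPro ID"], row["InterPro description"])
```

To run this on a real export, read the file contents into a string (or adapt the function to accept an open file) and pass the keyword of interest. The InterPro IDs of the matching rows can then be used in the ID filter.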

To filter with a set of InterPro IDs, the procedure is the same as described above, with the following step between steps 1 and 2:

  • Select the Filters link in the left menu, then in the Protein Domains section:
    • Select the Limit to genes with these family or domain IDs option
    • Select InterPro ID(s) from the adjacent menu
    • Enter the InterPro IDs in the text box, or upload a file of IDs
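
Before pasting or uploading a list of IDs, it can help to tidy it. The sketch below is one possible approach, not part of BioMart: it keeps only strings in the standard InterPro accession form (IPR followed by six digits), removes duplicates, and writes one ID per line. The one-ID-per-line layout and the output file name are assumptions; check what the upload box accepts.

```python
import re

# InterPro accessions have the form IPR followed by six digits, e.g. IPR000719.
INTERPRO_ID = re.compile(r"^IPR\d{6}$")


def clean_interpro_ids(raw_ids):
    """Split raw strings into (valid unique IDs in input order, rejected entries)."""
    seen = set()
    valid = []
    rejected = []
    for raw in raw_ids:
        candidate = raw.strip().upper()
        if not candidate:
            continue  # skip blank entries
        if INTERPRO_ID.match(candidate):
            if candidate not in seen:
                seen.add(candidate)
                valid.append(candidate)
        else:
            rejected.append(raw)
    return valid, rejected


if __name__ == "__main__":
    ids, bad = clean_interpro_ids(["IPR000719", " ipr001680 ", "IPR000719", "kinase"])
    # Hypothetical output file name; one ID per line is an assumed upload format.
    with open("interpro_ids.txt", "w") as handle:
        handle.write("\n".join(ids) + "\n")
    if bad:
        print("Skipped entries that are not InterPro IDs:", bad)
```

Entries that are rejected here are typically descriptions or keywords, which the filter does not accept.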