Rules and conventions for naming molluscan genes


Paraphrasing from the Zebrafish Book, Chapter 7, “It is very important that all members of the community adopt one set of conventions in order to minimize confusion and maximize the usefulness of the nomenclature and the ease with which everyone can follow the field”.

We strongly encourage the Biomphalaria glabrata community to follow the rules of Bayne (2012), an abbreviated version of which is shown below. These rules establish a clear process and a rational naming system for molluscan genes. They are also intended to simplify automated searches and other computational analyses inside and outside VectorBase. Table 1 provides examples.

Table 1. Examples illustrating the application of the proposed convention.

Gene name (VectorBase description) | Abbreviation (VectorBase name or symbol) | Allele (VectorBase transcripts or splice variants) | Mutant | Protein name
actin | act | actA | actA1 | Actin
guanosine binding protein | gnb | gnbA | gnbA1 | Guanosine Binding Protein
superoxide dismutase 1 | sod1 | sod1A | sod1A1 | Superoxide Dismutase 1
glucose-6-phosphate dehydrogenase | g6pdh | g6pdhA | g6pdhA1 | Glucose-6-phosphate Dehydrogenase

Gene name: The name should reflect the (putative) function of the product, if known or suspected (for example, based on sequence similarity to a gene that has been identified with certainty in another organism and whose function or functions are known).

Abbreviation: An abbreviation should be provided, ideally of three letters and at most six. All letters should be lower case.

Alleles: These should be designated by the gene name followed sequentially by italicized capital letters (A, B, C, D, etc.), placed immediately after the gene name with no space or hyphen.

Mutants: Mutants are indicated by italicized superscript numeral(s) following the gene/allele name. Until consensus is reached, it is left to the discretion of the individual scientist to decide whether a variant with one or more non-synonymous substitutions is given a unique allele designation, without being marked as a mutant, or is considered a mutant of an already named allele.

Protein name: The name is the same as the full name of the gene, except that it is not italicized and its first letter is capitalized. Where a single name is used for several homologous proteins with the same functions (and encoded at distinct loci), the genes may be numbered in accordance with the protein nomenclature (e.g. Superoxide dismutase 1, Superoxide dismutase 2 and Superoxide dismutase 3).

If in doubt, please consult the complete original document:
Bayne CJ. 2012. A convention for naming molluscan genes. In: Current Topics in Genetics, vol. 5. Trivandrum, India: Research Trends. p. 45-48.
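
The abbreviated rules above can be sketched in a few lines of Python. This is only an illustration, not part of the Bayne 2012 convention or of any VectorBase tool: the function names are hypothetical, digits are accepted in abbreviations only because Table 1 lists sod1 and g6pdh, and the italic and superscript formatting cannot be represented in plain strings.

```python
import re

# Assumption: up to 6 lower-case characters; digits are tolerated because
# Table 1 contains abbreviations such as sod1 and g6pdh.
ABBREVIATION = re.compile(r"[a-z][a-z0-9]{0,5}")


def allele_symbol(abbreviation: str, allele_index: int = 0) -> str:
    """Return the abbreviation followed by a capital letter (A, B, C, ...)."""
    if not ABBREVIATION.fullmatch(abbreviation):
        raise ValueError(
            f"abbreviation must be 1-6 lower-case characters: {abbreviation!r}"
        )
    if not 0 <= allele_index < 26:
        raise ValueError("this sketch only handles alleles A to Z")
    # No space or hyphen between the abbreviation and the allele letter.
    return abbreviation + chr(ord("A") + allele_index)


def mutant_symbol(allele: str, mutant_number: int) -> str:
    """Append the mutant numeral (a superscript in formatted text)."""
    if mutant_number < 1:
        raise ValueError("mutant numbers start at 1")
    return f"{allele}{mutant_number}"


def protein_name(gene_name: str) -> str:
    """Return the full gene name with only its first letter capitalized."""
    return gene_name[:1].upper() + gene_name[1:]
```

For instance, `allele_symbol("act")` gives `actA` and `mutant_symbol("actA", 1)` gives `actA1`, matching the first row of Table 1. `protein_name` follows the written rule (first letter only), so it returns `Superoxide dismutase 1` rather than the fully capitalized form shown in the table.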

How do we name genes, transcripts and proteins in VectorBase?

We use identifiers (IDs). All VectorBase identifiers begin with 4 letters. The same numeric value is used for a gene and all its products (transcripts and proteins), and alternative products of a single gene are distinguished by suffixes ending in A, B, C, etc.

For example, for Anopheles gambiae PEST the gene identifier (gene ID) is AGAP followed by 6 digits. The transcript ID is the gene ID plus -RA, and the protein ID is the gene ID plus -PA. Alternative transcripts and proteins are -RB and -PB, and so on. For Biomphalaria glabrata BB02, gene IDs begin with BGLTMP.
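
As a rough illustration of the ID scheme described above, the following Python sketch derives transcript and protein IDs from a gene ID and splits an ID back into its parts. It assumes the Anopheles gambiae PEST layout from the text (4 letters, 6 digits, then an optional -R or -P suffix with one variant letter). The function names are hypothetical and AGAP000001 is used purely as a placeholder, not as a reference to a particular gene.

```python
import re

# Pattern modelled on the AGAP example in the text: a 4-letter prefix,
# 6 digits, and an optional -R<letter> (transcript) or -P<letter> (protein).
ID_PATTERN = re.compile(r"([A-Z]{4}\d{6})(?:-([RP])([A-Z]))?")

KINDS = {None: "gene", "R": "transcript", "P": "protein"}


def product_ids(gene_id: str, variant: str = "A") -> tuple[str, str]:
    """Return the (transcript ID, protein ID) pair for one variant of a gene."""
    return f"{gene_id}-R{variant}", f"{gene_id}-P{variant}"


def parse_id(identifier: str) -> dict:
    """Split an identifier into its gene ID, type, and variant letter."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not an ID of the assumed form: {identifier!r}")
    gene_id, kind, variant = match.groups()
    # A bare gene ID has no suffix, so kind and variant are both None.
    return {"gene_id": gene_id, "type": KINDS[kind], "variant": variant}
```

With the placeholder gene ID, `product_ids("AGAP000001", "B")` returns `("AGAP000001-RB", "AGAP000001-PB")`, and `parse_id("AGAP000001-PB")` reports a protein, variant B, of gene `AGAP000001`. Other species would need a different pattern; the sketch does not cover Biomphalaria glabrata IDs.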

Follow these links for examples of genes with and without metadata (gene description and name/symbol):