Generic Feature Format Version 3 (GFF3) file format


GFF3 files are nine-column, tab-delimited, plain text files.

  • Column 1: sequence ID
  • Column 2: source, e.g., VectorBase, Ensembl, Genbank.
  • Column 3: type of feature
  • Columns 4 & 5: start and end coordinates of the feature (1-based, with start less than or equal to end)
  • Column 6: score of the feature. E-values are used for sequence similarity features and P-values for ab initio gene prediction features.
  • Column 7: The strand of the feature. + for positive strand, - for minus strand, and . for features that are not stranded. In addition, ? can be used for features whose strandedness is relevant, but unknown.
  • Column 8: phase. For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon.
  • Column 9: A list of feature attributes in the format tag=value. Multiple tag=value pairs are separated by semicolons.
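
To make the column layout concrete, here is a minimal Python sketch that splits one GFF3 feature line into its nine columns and breaks column 9 into tag=value pairs. The function name parse_gff3_line, the column names, and the example line (sequence ID "chr1", IDs "cds0001" and "mRNA0001") are made up for illustration. The sketch ignores comment lines and escaped characters, so treat it as a starting point rather than a full parser.

```python
# Column names used here are illustrative labels for the nine GFF3 columns.
COLUMNS = ["seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes"]


def parse_gff3_line(line):
    """Split one tab-delimited GFF3 feature line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    # Columns 4 and 5 are integer coordinates.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9: semicolon-separated tag=value pairs.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            tag, _, value = pair.partition("=")
            attributes[tag] = value
    record["attributes"] = attributes
    return record


# Invented example line; "." in column 6 means no score is given.
example = "chr1\tVectorBase\tCDS\t1201\t1500\t.\t+\t0\tID=cds0001;Parent=mRNA0001"
record = parse_gff3_line(example)
print(record["type"], record["start"], record["end"], record["attributes"]["Parent"])
# prints: CDS 1201 1500 mRNA0001
```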

You can download GFF3 files from VectorBase or you can create them de novo. In the latter case, you have three options:

  • Manually, with a text editor that inserts real tab characters, e.g., Notepad++ or TextWrangler.
  • With Excel, save the file as “Tab Delimited Text (.txt)” (shown below).
  • Write a script, e.g., with Perl or Python.
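
As a rough illustration of the scripting option, the Python sketch below builds the text of a small GFF3 file from a list of features. The function name format_gff3, the dictionary keys, and the example gene are invented for this example. It assumes a "." placeholder for columns with no value and starts the file with the "##gff-version 3" header line that the GFF3 specification expects.

```python
def format_gff3(features):
    """Return GFF3 text for a list of feature dicts (hypothetical helper)."""
    lines = ["##gff-version 3"]  # version header expected at the top of the file
    for feature in features:
        # Column 9: join tag=value pairs with semicolons.
        attributes = ";".join(
            f"{tag}={value}" for tag, value in feature["attributes"].items()
        )
        columns = [
            feature["seqid"],
            feature["source"],
            feature["type"],
            str(feature["start"]),
            str(feature["end"]),
            feature.get("score", "."),  # "." when no score is available
            feature["strand"],
            feature.get("phase", "."),  # "." for features that are not CDS
            attributes,
        ]
        lines.append("\t".join(columns))  # real tab characters between columns
    return "\n".join(lines) + "\n"


# Invented example feature.
features = [
    {"seqid": "chr1", "source": "VectorBase", "type": "gene",
     "start": 1000, "end": 2000, "strand": "+",
     "attributes": {"ID": "gene0001", "Name": "example_gene"}},
]
gff3_text = format_gff3(features)
print(gff3_text, end="")
```

To save the result, write gff3_text to a plain text file, for example with open("example.gff3", "w"), and then check it with a GFF3 validator.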

For more information about the GFF3 file format go to:

This page includes an online GFF3 validator and a detailed description of each column of this file format.