Microarray experiments

We will need your raw data files (e.g. GenePix .gpr or Affymetrix .cel files) and information telling us which samples were used in which hybridizations, and with which labels. We will also need to know more about your samples (for example, which timepoint they represent).


Sample IDs and sample annotations

The file format described below is a simplified subset (Biomaterials only) of the MAGE-TAB SDRF file format. If you already have an SDRF file, or would prefer to generate one, you are welcome to submit that instead; this will speed up your ArrayExpress submission slightly.

Please prepare a tab-delimited text file with the following columns:

    Column 1 (heading 'Sample ID')
    Please create an ID for each of your samples, following the scheme lab_experiment_sample if possible. The lab_experiment part should be the same throughout the experiment. Each biological replicate should have a different sample ID (e.g. ending with 1, 2, 3...), for example: bob_mf_male1, bob_mf_male2. Use extra underscore characters if you need them. There are no hard and fast rules, but keep the IDs as short as possible, since more complete information will be held elsewhere.
    Column 2 (heading 'Material Type')
    e.g. whole_organism or organism_part (see MGED notes)
    Column 3 (heading 'Characteristics[Organism]')
    Please provide the Latin species name here
    Column 4 (heading 'Characteristics[StrainOrLine]')
    Please provide the strain name here
    Column 5...
    More 'Characteristics[xxx]' columns relevant to your experiment (e.g. Sex, Age, DevelopmentalStage, OrganismPart, DiseaseState, GeneticModification, GrowthCondition, Compound, Dose). If possible please follow the MGED guidelines [1], but since this is a rather complex document, we are happy for you to submit sample annotations using your own self-explanatory column headings.
    Last column(s) (heading 'Factor Value[xxx]')
    Where xxx is the factor you are varying in the experiment, e.g. DevelopmentalStage for a developmental study. The contents of the column can be an exact copy of the existing Characteristics column. More than one Factor Value column is needed for multi-factor experiments.

Do not forget to add your control or standard reference sample(s) to this file. If certain columns do not apply to some samples, leave them empty.
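
As an illustration only, the following Python sketch builds a small sample annotation file with the columns described above. Everything in it is an assumption made for the example: the male/female study design, the species and strain values, and the helper names sample_id, build_rows and write_table. Your own file will have different rows and probably more Characteristics columns.

```python
import csv
import sys


def sample_id(lab, experiment, sample, replicate):
    """Build an ID of the form lab_experiment_sample plus a replicate number."""
    return f"{lab}_{experiment}_{sample}{replicate}"


# Headings as described above; Sex is just the example factor used here.
HEADINGS = [
    "Sample ID",
    "Material Type",
    "Characteristics[Organism]",
    "Characteristics[StrainOrLine]",
    "Characteristics[Sex]",
    "Factor Value[Sex]",
]


def build_rows(lab="bob", experiment="mf", replicates=2):
    """Return one row per biological replicate for a made-up male/female study."""
    rows = []
    for sex in ("male", "female"):
        for rep in range(1, replicates + 1):
            rows.append([
                sample_id(lab, experiment, sex, rep),
                "whole_organism",
                "Anopheles gambiae",  # placeholder species
                "G3",                 # placeholder strain
                sex,                  # Characteristics[Sex]
                sex,                  # Factor Value[Sex] copies the Characteristics column
            ])
    return rows


def write_table(handle, headings, rows):
    """Write headings and rows as tab-delimited text, refusing duplicate IDs."""
    ids = [row[0] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("each replicate needs a different sample ID")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(headings)
    writer.writerows(rows)


if __name__ == "__main__":
    write_table(sys.stdout, HEADINGS, build_rows())
```

To save the table to a file instead of printing it, pass an open file handle (opened with newline="") to write_table in place of sys.stdout.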


Hybridization information

Please prepare a tab-delimited text file with the following columns. Each line represents one hybridization.

    Column 1 (heading 'Experimental sample')
    the sample ID (see previous section) of the experimental sample
    Column 2 (heading 'Experimental label')
    the dye used for the experimental sample (i.e. Cy5 or Cy3)
    Column 3 (heading 'Control sample')
    the sample ID (see previous section) of the control sample. Optional for single channel data.
    Column 4 (heading 'Control label')
    the dye used for the control sample (i.e. Cy5 or Cy3). Optional for single channel data.
    Column 5 (heading 'Data file')
    the filename of the raw data file for this hybridization. If you are using Imagene software and have two files per hybridization, please separate the two filenames with a comma and put them in Cy5, Cy3 order (even for dye swaps). We do not currently accept normalized or manipulated data here. However, if you would like spots to be filtered based on your own criteria, please provide suitable values in the 'Flags' column (see also 'Spot quality' below).
    Optional column (heading 'Normalized file')
    the filename of the per-hybridization normalized data file. It should follow the MAGE-TAB format guidelines for normalized data files.
    Optional column (heading 'Data matrix file')
    the filename of the (usually) per-experiment file containing one row per reporter and one or more columns per condition (e.g. with final values such as fold-changes and p-values). It should follow the MAGE-TAB format guidelines for data matrices.
    Optional column (heading 'Image file')
    We do not currently promise to load spot images into the database, but you can send them anyway if you like.

Note that in order for us to make a fully MIAME compliant ArrayExpress submission for you, you must submit either Normalized files or a Data matrix file (or both).

Note that the experimental and control samples always stay in the same columns. If you have performed dye swaps, simply swap the dyes in the two label columns.
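
The dye-swap convention can be sketched in Python as follows. This is only an example: the sample IDs and .gpr filenames are invented, the helper name hybridization_row is hypothetical, and the choice of Cy5 for the experimental sample in the unswapped case is an assumption made for the illustration, not a requirement.

```python
import csv
import sys

HEADINGS = [
    "Experimental sample",
    "Experimental label",
    "Control sample",
    "Control label",
    "Data file",
]


def hybridization_row(experimental, control, data_file, dye_swap=False):
    """Return one row; the samples keep their columns, only the labels swap."""
    # Assumption for this sketch: Cy5 labels the experimental sample unless swapped.
    exp_label, ctl_label = ("Cy3", "Cy5") if dye_swap else ("Cy5", "Cy3")
    return [experimental, exp_label, control, ctl_label, data_file]


# Invented example: one ordinary hybridization and one dye swap.
ROWS = [
    hybridization_row("bob_mf_male1", "bob_mf_female1", "hyb1.gpr"),
    hybridization_row("bob_mf_male2", "bob_mf_female2", "hyb2.gpr", dye_swap=True),
]

if __name__ == "__main__":
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADINGS)
    writer.writerows(ROWS)
```

In the second row the male sample is still listed as the experimental sample, but its label is Cy3 rather than Cy5.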


Spot quality

We do not want to serve the community gene expression data that come from poor quality spots (low intensity, scratches, etc.), so we need one of the following from you:

  1. A text file containing the IDs (one per line) of the reporters which pass your quality control filters
  2. Raw data files in which the poor quality spots are flagged (e.g. in the Flags column), plus details (sent by email) of how we should handle the flags. We currently handle simple procedures such as: "reporters which have good spots in at least M out of N hybridizations".
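
If you choose the first option, an "at least M out of N" filter could be sketched in Python as below. The function names and reporter IDs are invented for the example, and the sketch assumes you have already decided, for each reporter and hybridization, whether the spot is good; how that decision is derived from the Flags column depends on your image analysis software and is not shown.

```python
def passing_reporters(good_spots, min_good):
    """Return sorted reporter IDs with at least min_good good spots.

    good_spots maps each reporter ID to a list of booleans, one per
    hybridization (True means the spot passed quality control).
    """
    return sorted(
        reporter
        for reporter, spots in good_spots.items()
        if sum(1 for ok in spots if ok) >= min_good
    )


def format_id_list(reporter_ids):
    """Format the IDs one per line, as requested for the text file."""
    return "".join(f"{reporter}\n" for reporter in reporter_ids)


if __name__ == "__main__":
    # Invented quality calls for N = 3 hybridizations, with M = 2.
    example = {
        "reporter_A": [True, True, False],
        "reporter_B": [False, False, True],
        "reporter_C": [True, True, True],
    }
    print(format_id_list(passing_reporters(example, min_good=2)), end="")
```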

Send your files

Large attachments may fail to reach us, so please 'zip' or 'tar' your files into a single archive and send it to r.maccallum@imperial.ac.uk using Imperial College's file exchange service. The archive should contain:


  1. Sample annotation file
  2. Hybridization file
  3. Raw data files
  4. Normalized files and/or data matrix file (optional)
  5. Spot quality information
  6. Image files (optional)
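
If you prefer to build the archive with a script rather than a desktop tool, a minimal Python sketch using the standard zipfile module might look like this. The function name bundle and all filenames are hypothetical; any zip or tar tool gives an equally acceptable result.

```python
import zipfile
from pathlib import Path


def bundle(paths, archive_path):
    """Zip the given files into one archive, stored without directory names."""
    paths = [Path(p) for p in paths]
    # Check everything up front so that no partial archive is created.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing files: " + ", ".join(missing))
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=path.name)
    return Path(archive_path)
```

For example, bundle(["samples.txt", "hybridizations.txt", "hyb1.gpr"], "submission.zip") would create submission.zip from three files in the current directory. Because directory names are dropped, files in different folders that share a name would need renaming first.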

Next steps

We will perform the standard VectorBase normalization and analysis steps on your data, and prepare a preview website so that you can check we are representing your data correctly. Once your data have been published in a journal, we will make them public on VectorBase. We can also assist with ArrayExpress submission.