ExpressionData:MicroarrayExperiments
From VectorBase Help System
We will need your raw data files (e.g. GenePix .gpr files) and information on which samples were used in which hybridizations, and with which labels. We will also need to know more about your samples (for example, which timepoint they represent).
Sample IDs and sample annotations
Please prepare a tab-delimited text file (e.g. using Excel) with the following columns:
- Column 1 (heading 'sample_id')
- Please create an ID for each of your samples, following the scheme lab_experiment_sample if possible. The lab_experiment part should be the same throughout the experiment. Each biological replicate should have a different ID (e.g. ending with 1, 2, 3...), for example: bob_mf_male1, bob_mf_male2. Use extra underscore characters if you need them. There are no hard and fast rules, but keep the IDs as short as possible, since more complete information will be held elsewhere.
- Column 2 (heading 'Organism')
- Please provide the full species name in this column
- Column 3 (heading 'StrainOrLine')
- Please provide the strain name here
- Column 4 and beyond
- Sample annotations relevant to your experiment (e.g. Sex, Age, DevelopmentalStage, OrganismPart, DiseaseState, GeneticModification). If possible please follow the MGED guidelines [1], but since this is a rather complex document, we are happy for you to submit sample annotations using your own self-explanatory column headings.
Do not forget to add your control or standard reference sample(s) to this file. If certain columns do not apply to some samples, leave them empty.
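As a rough illustration of the layout described above, the following Python sketch renders a few sample records as tab-delimited text. The three required headings come from the column list above. The helper name samples_to_tsv, the organism and strain values, the Sex column and the reference sample ID bob_mf_ref are assumptions made for this example and are not part of any VectorBase tool.

```python
import csv
import io

# Required headings taken from the column list above.
REQUIRED = ["sample_id", "Organism", "StrainOrLine"]

def samples_to_tsv(samples, annotation_columns):
    """Render sample records as tab-delimited text with a header row."""
    ids = [s["sample_id"] for s in samples]
    if len(ids) != len(set(ids)):
        raise ValueError("each sample needs its own sample_id")
    header = REQUIRED + list(annotation_columns)
    buf = io.StringIO()
    # restval="" leaves a cell empty when a column does not apply to a sample.
    writer = csv.DictWriter(buf, fieldnames=header, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    for sample in samples:
        writer.writerow(sample)
    return buf.getvalue()

# Illustrative values only (hypothetical experiment).
samples = [
    {"sample_id": "bob_mf_male1", "Organism": "Anopheles gambiae",
     "StrainOrLine": "G3", "Sex": "male"},
    {"sample_id": "bob_mf_male2", "Organism": "Anopheles gambiae",
     "StrainOrLine": "G3", "Sex": "male"},
    # Reference sample: the Sex column does not apply, so it is left empty.
    {"sample_id": "bob_mf_ref", "Organism": "Anopheles gambiae",
     "StrainOrLine": "G3"},
]

print(samples_to_tsv(samples, ["Sex"]), end="")
```

The same file can of course be produced in Excel; the sketch is only meant to show the expected column order and how empty cells look in the output.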
Hybridization information
Note: two-colour data only at present.
Please prepare a tab-delimited text file (e.g. using Excel) with the following columns. Each line represents one hybridization.
- Column 1 (heading 'Experimental_sample')
- the sample ID (see previous section) of the experimental sample
- Column 2 (heading 'Experimental_label')
- the dye used for the experimental sample (i.e. Cy5 or Cy3)
- Column 3 (heading 'Control_sample')
- the sample ID (see previous section) of the control sample
- Column 4 (heading 'Control_label')
- the dye used for the control sample (i.e. Cy5 or Cy3)
- Column 5 (heading 'Data_file')
- the filename of the raw data file for this hybridization. If you are using Imagene software and have two files per hybridization, please use a comma to separate the two filenames, and put them in Cy5 Cy3 order (even for dye swaps). We do not currently accept normalized or manipulated data; however, if you would like spots to be filtered based on your own criteria, please provide suitable values in the 'Flags' column (see also Spot quality below).
- Column 6 (heading 'Image_file(optional)')
- this column heading must be present (including the '(optional)'), but you may leave the column empty. If you send us the image files, please note that we only take TIFF files (both channels in one file). If you provide images, we will try to extract and store the spot images in BASE (as JPEGs), but we will not keep the TIFF files indefinitely.
Note that the experimental and control samples are always in the same order. If you have performed dye swaps, just put the dyes in the opposite order.
EXAMPLE HYBS FILE (Note that your filenames do not need to be standardized as in this example.)
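Before sending the hybridization file, it may help to check it against the column rules above. The following Python sketch is one possible check, not the validation VectorBase itself performs. The function name check_hybs, the example filenames and the sample IDs are hypothetical, and the rule that the two samples in one hybridization must carry different dyes is an assumption inferred from the two-colour design.

```python
import csv
import io

# The six headings listed above, in order.
HYB_HEADER = ["Experimental_sample", "Experimental_label", "Control_sample",
              "Control_label", "Data_file", "Image_file(optional)"]
DYES = {"Cy3", "Cy5"}

def check_hybs(text, known_sample_ids):
    """Return a list of problems found in tab-delimited hybridization text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or rows[0] != HYB_HEADER:
        return ["header row does not match the six expected headings"]
    problems = []
    for n, row in enumerate(rows[1:], start=2):
        if not any(row):
            continue  # skip blank lines
        # Pad short rows so a missing trailing empty cell is tolerated.
        row = row + [""] * (len(HYB_HEADER) - len(row))
        exp, exp_dye, ctl, ctl_dye, data_file, _image = row[:6]
        for sid in (exp, ctl):
            if sid not in known_sample_ids:
                problems.append(f"line {n}: unknown sample ID {sid!r}")
        if exp_dye not in DYES or ctl_dye not in DYES:
            problems.append(f"line {n}: labels must be Cy3 or Cy5")
        elif exp_dye == ctl_dye:
            # Assumption: two-colour hybridizations use one dye per sample.
            problems.append(f"line {n}: both samples use {exp_dye}")
        if not data_file.strip():
            problems.append(f"line {n}: Data_file is empty")
    return problems

# Hypothetical two-line file; the second hybridization is a mistyped dye swap.
example = "\n".join([
    "\t".join(HYB_HEADER),
    "bob_mf_male1\tCy5\tbob_mf_ref\tCy3\thyb1.gpr\t",
    "bob_mf_male2\tCy3\tbob_mf_ref\tCy3\thyb2.gpr\t",
]) + "\n"

problems = check_hybs(example, {"bob_mf_male1", "bob_mf_male2", "bob_mf_ref"})
print(problems)
```

Passing the set of sample IDs from the sample annotation file as known_sample_ids catches IDs that were mistyped in one of the two files.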
Spot quality
We don't want to serve the community gene expression data that comes from poor quality spots (low intensity, scratches, etc.), so we need one of the following from you:
- A text file containing the reporter IDs (one per line) which pass your quality control filters
- Raw data files in which the poor quality spots are flagged (e.g. in the Flags column). We then need to know how to handle the flags (please send details by email). We currently handle simple rules such as: "reporters which have good spots in at least M out of N hybridisations".
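The "at least M out of N" rule quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the filter used in BASE. The function name passing_reporters and the reporter IDs are hypothetical, and the sketch assumes one spot per reporter per hybridization, with each spot already classified as good or bad.

```python
def passing_reporters(good_by_reporter, m):
    """Keep reporters that have a good spot in at least m hybridizations."""
    return sorted(rid for rid, goods in good_by_reporter.items()
                  if sum(goods) >= m)

# Hypothetical quality calls for N = 3 hybridizations (True means a good spot).
spots = {
    "reporter_A": [True, True, False],
    "reporter_B": [False, True, False],
    "reporter_C": [True, True, True],
}

kept = passing_reporters(spots, m=2)

# One reporter ID per line, as requested for the text-file option.
reporter_list = "\n".join(kept) + "\n"
print(reporter_list, end="")
```

With M = 2, reporter_B is dropped because it has a good spot in only one of the three hybridizations.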
Send your files
Large attachments may fail to reach us, so please 'zip' or 'tar' up your files into a single file and send it to r.maccallum@imperial.ac.uk using Imperial College's file exchange service.
Checklist:
- Sample annotation file
- Hybridization file
- Raw data files
- Spot quality information
- Image files (optional)
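A small Python sketch for assembling the submission is given below. It assumes nothing beyond the standard library. The function names bundle and missing_data_files and all filenames are hypothetical. The first helper zips the files into a single archive; the second checks that every filename mentioned in the Data_file column, including comma-separated Imagene pairs, is actually among the files you are about to send.

```python
import zipfile
from pathlib import Path

def bundle(paths, archive_path):
    """Write the given files into one zip archive, stored by base name."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=Path(p).name)
    return archive_path

def missing_data_files(data_file_cells, available_names):
    """List filenames named in Data_file cells that are not available."""
    wanted = []
    for cell in data_file_cells:
        # A cell may hold two comma-separated filenames (Imagene).
        wanted.extend(name.strip() for name in cell.split(",") if name.strip())
    return [name for name in wanted if name not in available_names]

# Hypothetical Data_file cells and the raw data files on disk.
cells = ["hyb1.gpr", "hyb2_cy5.txt, hyb2_cy3.txt"]
on_disk = {"hyb1.gpr", "hyb2_cy5.txt"}
print(missing_data_files(cells, on_disk))
```

Running the check before calling bundle avoids sending an archive that is missing one of the raw data files.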
Next steps
We will create an account for you in our BASE system, bulk load the data for you and perform some simple normalization and filtering. We will then send you your account details so that you can log in and do your own analyses, re-annotate your samples, add protocol information, and so on. You will be able to make the data publicly available through VectorBase at this stage if you like, or you can proceed with the ArrayExpress submission.

