Microarray Design



Reporter IDs

Please read the important information about reporter IDs (see ExpressionData:ReporterIDs) before preparing your files.


Sequence information for genome mapping

If your array design is not yet mapped to the genome, we need to know which biological sequences are spotted/printed on your array. We will map these sequences to the latest genome assembly and gene build and, with your permission, include the mappings in the next available VectorBase genome release (allowing display on contigviews, geneviews, etc.). Because of the release timescales involved, please send us the sequence information as early as possible.

Please do not create the files by hand (typing or copy and paste) unless there is absolutely no alternative. Besides wasting your time, manual editing introduces inconsistencies that can lead to lost or corrupted data. Please contact us if you need help generating the files.

We currently handle cDNA arrays, oligo arrays and PCR primer pair arrays.


cDNA arrays

Please send us a FASTA file with the following format:

 >ARRAYNAME::REPORTERID_GENBANKACC
 GCTCTGTACGTCGTCAGCTCAACTCTGCCGATCGGCCGTATCGCGCAGAACTACGTCATCGCCACCACGACGCGCGTCAA
 AGCTGCAGCGACGTACATTGCTTCAGCGTGTGCGACGAGCTGGATCGGTCGGAAGGAAACGTTGCTGCGCTCGCTGCTGG
 >ARRAYNAME::REPORTERID_GENBANKACC
 GTCGACCCACGCGTCCGACGTGTATTGTGTGTTTGTGAGGTTCACGTGTGGTGCAGTGATCAAGTTCAGCGCAAAGAGTG
 GCCGCATAGGTTCCCGGTCGTGAGCAAGCTTCCGGTAAAGGCAGTCAGCAGTTTGGCAAAACTTTCTACACACAATAACG

Note the two colons between the array name and the reporter ID, and the underscore between the reporter ID and the GenBank accession. Each sequence can be on one long line or split over several lines.

ARRAYNAME 
The full/official name of the array, used for example as the genome browser track name. Please use the ArrayExpress naming scheme: Lab_Species_ArrayName_Size_Version (e.g. EMBL_A.gambiae_MMC1_20k_v1.0)
REPORTERID 
The reporter ID, usually the clone identifier (e.g. 4A3A-AAY-C-07) if you are spotting clone-derived sequence onto the array. See ExpressionData:ReporterIDs for further information.
GENBANKACC 
The GenBank accession code of the sequence (usually an EST) obtained by sequencing the reporter (usually a clone). If you don't have a GenBank accession code, please use some other unique identifier. If you have multiple sequences per reporter (e.g. multiple EST reads per clone), give each one a separate entry in the FASTA file (i.e. each with its own '>' line). We will work out the best combined mapping of all available ESTs for each clone.
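If your clone and EST data already live in a database or spreadsheet export, a short script is safer than hand-editing. The sketch below is an illustration only: the function name `cdna_fasta` and the placeholder accessions are hypothetical. It writes records in the format above, gives each EST read its own '>' line, and wraps sequence at 80 characters (wrapping is optional).

```python
from typing import Iterable, Tuple


def cdna_fasta(array_name: str,
               entries: Iterable[Tuple[str, str, str]],
               width: int = 80) -> str:
    """Return FASTA text for a cDNA array.

    entries holds (reporter_id, genbank_acc, sequence) tuples. A clone with
    several EST reads is simply listed once per read, so each read gets its
    own '>' line.
    """
    lines = []
    for reporter_id, accession, seq in entries:
        # Two colons after the array name, one underscore before the accession.
        lines.append(f">{array_name}::{reporter_id}_{accession}")
        seq = "".join(seq.split()).upper()  # drop stray whitespace/newlines
        # Wrapping is optional: one long line would also be accepted.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder accessions (hypothetical); substitute real GenBank accessions.
    print(cdna_fasta("EMBL_A.gambiae_MMC1_20k_v1.0", [
        ("4A3A-AAY-C-07", "ACCESSION1", "gctctgtacgtcgtcagctcaactctgccg"),
        ("4A3A-AAY-C-07", "ACCESSION2", "gtcgacccacgcgtccgacgtgtattgtgt"),
    ]), end="")
```

The same clone appears twice here because it has two EST reads; we then combine the mappings of both reads.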

Next step: Array layout (below)


Oligo arrays

Please send us a FASTA file with the following format:

 >ARRAYNAME:PROBESETID:PROBEID
 AACATCCGGCTCCTGCTTCACCGGCAGCAGCACGTTCTCCCCGTCCTCCGTATCCTCGCGTTTAACGAGAGGCTGCGGGA
 >ARRAYNAME:PROBESETID:PROBEID
 AGTGAAGATGCCTGGAATCACCGTAAAGGACGTTGATCAAGACAAAGTGGTTGAGGGTGTTGCCCTTTTCCTCAAGAAGT

ARRAYNAME 
The full/official name of the array, used for example as the genome browser track name. Please use the ArrayExpress naming scheme: Lab_Species_ArrayName_Size_Version (e.g. EMBL_A.gambiae_MMC1_20k_v1.0)
PROBESETID 
An identifier for the probe set/reporter set (see below if you do not have probe sets). As with reporter IDs, please avoid using gene names or gene IDs as probe set identifiers.
PROBEID 
The ID of the individual probe; this is simply another name for the reporter ID.

If your design does not use probe sets (reporter sets), leave the PROBESETID field empty, so that the header contains two consecutive colons:

 >ARRAYNAME::PROBEID
 AACATCCGGCTCCTGCTTCACCGGCAGCAGCACGTTCTCCCCGTCCTCCGTATCCTCGCGTTTAACGAGAGGCTGCGGGA
 >ARRAYNAME::PROBEID
 AGTGAAGATGCCTGGAATCACCGTAAAGGACGTTGATCAAGACAAAGTGGTTGAGGGTGTTGCCCTTTTCCTCAAGAAGT
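Both header forms can be produced and checked with two small helper functions. This is a sketch that assumes none of the names or IDs themselves contain a colon; `oligo_header` and `parse_oligo_header` are hypothetical names, not part of any VectorBase tool.

```python
from typing import Optional, Tuple


def oligo_header(array_name: str, probe_id: str,
                 probe_set_id: Optional[str] = None) -> str:
    """Return '>ARRAYNAME:PROBESETID:PROBEID', or '>ARRAYNAME::PROBEID'
    when there is no probe set (the middle field is left empty)."""
    return f">{array_name}:{probe_set_id or ''}:{probe_id}"


def parse_oligo_header(header: str) -> Tuple[str, Optional[str], str]:
    """Split a header into (array_name, probe_set_id or None, probe_id)."""
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA header: {header!r}")
    parts = header[1:].split(":")
    # Assumption: exactly two colons, i.e. no colons inside names or IDs.
    if len(parts) != 3:
        raise ValueError(f"expected exactly two colons: {header!r}")
    array_name, probe_set_id, probe_id = parts
    return array_name, probe_set_id or None, probe_id
```

Running `parse_oligo_header` over every '>' line before submission is a cheap way to catch a missing or extra colon.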

Next step: Array layout (below)


PCR primer pairs

Where you have designed PCR primers to amplify certain genomic or gene regions, but have not sequenced the amplicons, please prepare an STS file in the format described below.

 --to be completed--

Next step: Array layout (below)


Array layout

At present we prefer .gal files (GenePix Array List [1]), but we can usually also accept other data file formats.

You will need to submit the array design/layout information for each variant/version of your array.

Please see the section below regarding reporter type annotations, as you may want to add them to the GAL file.
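Before sending a GAL file, it can help to check that it parses and that its reporter IDs match those in your FASTA file. The sketch below assumes the usual GAL layout (an `ATF` first line; a second line giving the number of optional header records and of data columns; those header records; then a tab-delimited table with `Block`, `Column`, `Row`, `Name` and `ID` columns) and assumes the `ID` column holds the reporter ID. The function names and the default `ignore` values are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_gal(text: str) -> List[Dict[str, str]]:
    """Parse GAL text into one dict per feature, keyed by column title."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if len(rows) < 3 or not rows[0] or rows[0][0] != "ATF":
        raise ValueError("not a GAL/ATF file: first line should start with 'ATF'")
    # Line 2: number of optional header records, then number of data columns.
    n_header_records = int(rows[1][0])
    titles = rows[2 + n_header_records]  # e.g. Block, Column, Row, Name, ID
    return [dict(zip(titles, row)) for row in rows[3 + n_header_records:] if row]


def ids_missing_from_fasta(features: List[Dict[str, str]],
                           fasta_ids: Iterable[str],
                           ignore: Iterable[str] = ("", "EMPTY")) -> List[str]:
    """Reporter IDs used in the GAL file but absent from the FASTA file.

    'ignore' holds IDs of spots that carry no sequence (an assumption:
    adjust it to whatever your array uses for empty or buffer spots).
    """
    known = set(fasta_ids) | set(ignore)
    return sorted({f.get("ID", "") for f in features} - known)
```

An empty result from `ids_missing_from_fasta` suggests every sequence-bearing spot on the layout has a matching FASTA record.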


Reporter type annotation

The downstream analysis tools and ArrayExpress submission protocol require that the experimental roles of the features on your array are described. That is to say: which spots are experimental, which are controls, and what type of controls they are.

Please classify the reporters on your array under the four headings listed below (see [2] for more information). Provide this information as a tab-delimited file or, preferably, as extra columns in the .gal file you submit.

  • Reporter BioSequence Type as one of
    • cDNA_clone
    • ds_oligo
    • genomic_DNA
    • PCR_amplicon
    • ss_oligo
  • Reporter BioSequence Polymer Type as one of
    • DNA
    • RNA
    • Protein
  • Reporter Group [role] as one of
    • Experimental
    • Control
  • Reporter Control Type as one of
    • control_biosequence
    • control_buffer
    • control_empty
    • control_genomic_DNA
    • control_label
    • control_reporter_size
    • control_spike_calibration
    • control_unknown_type
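A typo in a controlled term is easy to miss in a large file, so it may be worth validating the annotation before submission. The sketch below copies the allowed values from the lists above. Using the four headings as column titles, the extra "Reporter ID" column, and skipping the control type for non-control reporters are all assumptions for illustration; `check_annotation` and `to_tsv` are hypothetical names.

```python
import csv
import io
from typing import Dict, List

# Allowed values, copied from the four lists above. Using the headings as
# column titles is an assumption, not a confirmed submission format.
ALLOWED = {
    "Reporter BioSequence Type": {
        "cDNA_clone", "ds_oligo", "genomic_DNA", "PCR_amplicon", "ss_oligo"},
    "Reporter BioSequence Polymer Type": {"DNA", "RNA", "Protein"},
    "Reporter Group [role]": {"Experimental", "Control"},
    "Reporter Control Type": {
        "control_biosequence", "control_buffer", "control_empty",
        "control_genomic_DNA", "control_label", "control_reporter_size",
        "control_spike_calibration", "control_unknown_type"},
}


def check_annotation(row: Dict[str, str]) -> List[str]:
    """Return a list of problems with one reporter's annotation (empty if OK)."""
    problems = []
    for column, allowed in ALLOWED.items():
        # Assumption: a control type is only needed for control reporters.
        if (column == "Reporter Control Type"
                and row.get("Reporter Group [role]") != "Control"):
            continue
        value = row.get(column, "")
        if value not in allowed:
            problems.append(f"{column}: {value!r} is not an allowed value")
    return problems


def to_tsv(rows: List[Dict[str, str]]) -> str:
    """Write reporter IDs plus the four annotation columns as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["Reporter ID", *ALLOWED],
                            delimiter="\t", lineterminator="\n",
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)  # missing columns are written as empty fields
    return buf.getvalue()
```

The same dictionaries could instead be merged into the GAL rows as extra columns, which is the preferred option above.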


Array name

Please let us know the official name of your array (following the ArrayExpress format if possible: Lab_Species_ArrayName_Size_Version).
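A quick way to check that a name follows the five-part scheme is to split it on underscores. This sketch assumes that no single field contains an underscore of its own and, based only on the example EMBL_A.gambiae_MMC1_20k_v1.0, that the version looks like "v1.0"; `split_array_name` is a hypothetical helper.

```python
import re
from typing import Dict

# Lab_Species_ArrayName_Size_Version, e.g. EMBL_A.gambiae_MMC1_20k_v1.0
NAME_FIELDS = ("lab", "species", "array_name", "size", "version")


def split_array_name(name: str) -> Dict[str, str]:
    """Split an array name into its five fields, or raise ValueError."""
    parts = name.split("_")
    # Assumption: fields never contain underscores themselves.
    if len(parts) != len(NAME_FIELDS) or not all(parts):
        raise ValueError(f"expected Lab_Species_ArrayName_Size_Version: {name!r}")
    fields = dict(zip(NAME_FIELDS, parts))
    # Assumption: version is 'v' followed by dotted numbers, as in the example.
    if not re.fullmatch(r"v\d+(\.\d+)*", fields["version"]):
        raise ValueError(f"unexpected version field: {fields['version']!r}")
    return fields
```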


Array versions

In some circumstances we will assign or increment version numbers based on submission date rather than the true age/version of the array design. Therefore, please only label a design "version 1" if you are sure no older designs exist.