ExpressionData:MicroarrayDesign
From VectorBase Help System
Reporter IDs
Please read this important information regarding reporter IDs.
Sequence information for genome mapping
If your array design is not currently mapped to the genome, we will need to know which biological sequences are spotted/printed on your array. We will then take these sequences and map them to the latest genome assembly and gene build. With your permission, these will become an integral part of the next available VectorBase genome release (allowing display on contigviews, geneviews etc). Due to the timescales involved, it is important to get the sequence information to us as soon as possible.
Please do not create the files by hand (typing or cut and paste) unless it is absolutely the only option available. Not only does this waste your valuable time, but it introduces inconsistencies which could lead to lost or corrupted data. Please contact us if you need help.
We currently handle cDNA arrays, oligo arrays and PCR primer pair arrays.
cDNA arrays
Please send us a FASTA file with the following format:
>ARRAYNAME::REPORTERID_GENBANKACC
GCTCTGTACGTCGTCAGCTCAACTCTGCCGATCGGCCGTATCGCGCAGAACTACGTCATCGCCACCACGACGCGCGTCAA
AGCTGCAGCGACGTACATTGCTTCAGCGTGTGCGACGAGCTGGATCGGTCGGAAGGAAACGTTGCTGCGCTCGCTGCTGG
>ARRAYNAME::REPORTERID_GENBANKACC
GTCGACCCACGCGTCCGACGTGTATTGTGTGTTTGTGAGGTTCACGTGTGGTGCAGTGATCAAGTTCAGCGCAAAGAGTG
GCCGCATAGGTTCCCGGTCGTGAGCAAGCTTCCGGTAAAGGCAGTCAGCAGTTTGGCAAAACTTTCTACACACAATAACG
Note the two colons between the array name and the reporter ID, and the underscore between the reporter ID and the GenBank accession. Each sequence can be on one long line or split across several lines.
- ARRAYNAME
  - Full/official name of array - to appear as genome browser track names, for example. Please use the ArrayExpress naming scheme: Lab_Species_ArrayName_Size_Version (e.g. EMBL_A.gambiae_MMC1_20k_v1.0)
- REPORTERID
  - This is the reporter ID, usually the clone identifier (e.g. 4A3A-AAY-C-07) assuming that you are spotting clone-derived sequence onto the array. See ExpressionData:ReporterIDs for further information.
- GENBANKACC
  - This is the GenBank accession code of the sequence (usually an EST) sequenced for the reporter (usually a clone). If you don't have a GenBank accession code, please use some other unique identifier. Note that if you have multiple sequences per reporter (e.g. multiple EST reads per clone), you will need multiple separate sequence entries in the FASTA file for these reporters (i.e. each with their own '>' line). We will work out the best combined mapping of all available ESTs for each clone.
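Because hand-typed files are discouraged, the header format above is easy to generate from whatever table or database already holds your clone data. The following is only a minimal Python sketch: the function names are hypothetical, the accessions "ACC0001" and "ACC0002" are placeholders rather than real GenBank codes, and wrapping at 80 bases is a choice, not a requirement.

```python
def cdna_fasta_entry(array_name, reporter_id, accession, sequence, width=80):
    """Return one FASTA entry headed >ARRAYNAME::REPORTERID_GENBANKACC."""
    header = f">{array_name}::{reporter_id}_{accession}"
    # Drop any whitespace inside the sequence, then wrap at `width` bases.
    seq = "".join(sequence.split())
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"


def cdna_fasta(array_name, records):
    """Build FASTA text from (reporter_id, accession, sequence) tuples.

    A reporter with several sequences (e.g. multiple EST reads per clone)
    appears in several tuples and so gets several '>' entries.
    """
    return "".join(
        cdna_fasta_entry(array_name, rid, acc, seq) for rid, acc, seq in records
    )


if __name__ == "__main__":
    # Placeholder records: same clone, two hypothetical accessions.
    records = [
        ("4A3A-AAY-C-07", "ACC0001", "GCTCTGTACGTCGTCAGC"),
        ("4A3A-AAY-C-07", "ACC0002", "GTCGACCCACGCGTCCGA"),
    ]
    print(cdna_fasta("EMBL_A.gambiae_MMC1_20k_v1.0", records), end="")
```

In practice the records would come from a spreadsheet export or database query rather than a literal list.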
Next step: #Array layout
Oligo arrays
Please send us a FASTA file with the following format:
>ARRAYNAME:PROBESETID:PROBEID
AACATCCGGCTCCTGCTTCACCGGCAGCAGCACGTTCTCCCCGTCCTCCGTATCCTCGCGTTTAACGAGAGGCTGCGGGA
>ARRAYNAME:PROBESETID:PROBEID
AGTGAAGATGCCTGGAATCACCGTAAAGGACGTTGATCAAGACAAAGTGGTTGAGGGTGTTGCCCTTTTCCTCAAGAAGT
- ARRAYNAME
  - Full/official name of array - to appear as genome browser track names, for example. Please use the ArrayExpress naming scheme: Lab_Species_ArrayName_Size_Version (e.g. EMBL_A.gambiae_MMC1_20k_v1.0)
- PROBESETID
  - An identifier for the probe set/reporter set (see below if you do not have probe sets). As with reporter IDs, please avoid using gene names and IDs for the probe set identifier.
- PROBEID
  - This is just a synonym for reporter ID.
If you have not used a reporter/probe set design, please leave this part empty and use two colons:
>ARRAYNAME::PROBEID
AACATCCGGCTCCTGCTTCACCGGCAGCAGCACGTTCTCCCCGTCCTCCGTATCCTCGCGTTTAACGAGAGGCTGCGGGA
>ARRAYNAME::PROBEID
AGTGAAGATGCCTGGAATCACCGTAAAGGACGTTGATCAAGACAAAGTGGTTGAGGGTGTTGCCCTTTTCCTCAAGAAGT
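Before sending an oligo FASTA file, it may help to check that every header splits cleanly into the three fields described above. This is a rough Python sketch with a hypothetical function name; it assumes that none of the identifiers contain a colon themselves, which the format description does not state explicitly.

```python
def parse_oligo_header(line):
    """Split '>ARRAYNAME:PROBESETID:PROBEID' into its three parts.

    An empty probe set field (two adjacent colons) is returned as None.
    Assumption: identifiers never contain ':' themselves.
    """
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    parts = line[1:].strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 colon-separated fields: {line!r}")
    array_name, probe_set_id, probe_id = parts
    if not array_name or not probe_id:
        raise ValueError(f"array name and probe ID are required: {line!r}")
    return array_name, probe_set_id or None, probe_id


if __name__ == "__main__":
    print(parse_oligo_header(">ARRAYNAME:PROBESETID:PROBEID"))
    print(parse_oligo_header(">ARRAYNAME::PROBEID"))
```

A header with only one colon, or with a missing probe ID, raises a ValueError rather than being silently accepted.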
Next step: #Array layout
PCR primer pairs
Where you have designed PCR primers to amplify certain genomic or gene regions, but have not sequenced the amplicons, please prepare an STS file in the format described below.
--to be completed--
Next step: #Array layout
Array layout
At present we prefer to use .gal files (GenePix ArrayList [1]). We understand that these files can be exported from some non-GenePix software, but please let us know if you cannot produce this type of file. We may also be able to use an ADF file; see the ArrayExpress submission page for more on ADF files.
You will need to submit the array design/layout information for each variant/version of your array.
Please see the section below regarding reporter type annotations, as you may want to add them to the GAL file.
Reporter type annotation
The downstream analysis tools and ArrayExpress submission protocol require that the experimental roles of the features on your array are described. That is to say: which spots are experimental, which are controls, and what type of controls they are.
Please classify the reporters on your array using the four headings in bold below (see [2] for more information). Provide this information in a tab-delimited file or preferably as extra columns in the .gal file you submit (example .gal file).
- Reporter BioSequence Type as one of
  - cDNA_clone
  - ds_oligo
  - genomic_DNA
  - PCR_amplicon
  - ss_oligo
- Reporter BioSequence Polymer Type as one of
  - DNA
  - RNA
  - Protein
- Reporter Group [role] as one of
  - Experimental
  - Control
- Reporter Control Type as one of
  - control_biosequence
  - control_buffer
  - control_empty
  - control_genomic_DNA
  - control_label
  - control_reporter_size
  - control_spike_calibration
  - control_unknown_type
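If you supply the annotations as a separate tab-delimited file, a quick check against the allowed values listed above can catch typos before submission. The Python sketch below is illustrative only. It assumes a plain table whose column headers are exactly the four heading names, and it assumes that a control type is only expected for reporters whose group is Control; neither assumption is stated on this page. It does not parse the header block of a real .gal file.

```python
import csv
import io

# Allowed values, copied from the four headings in the list above.
ALLOWED = {
    "Reporter BioSequence Type": {
        "cDNA_clone", "ds_oligo", "genomic_DNA", "PCR_amplicon", "ss_oligo",
    },
    "Reporter BioSequence Polymer Type": {"DNA", "RNA", "Protein"},
    "Reporter Group [role]": {"Experimental", "Control"},
    "Reporter Control Type": {
        "control_biosequence", "control_buffer", "control_empty",
        "control_genomic_DNA", "control_label", "control_reporter_size",
        "control_spike_calibration", "control_unknown_type",
    },
}


def check_annotations(tsv_text):
    """Return (row_number, column, value) for every disallowed value."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        is_control = (row.get("Reporter Group [role]") or "").strip() == "Control"
        for column, allowed in ALLOWED.items():
            value = (row.get(column) or "").strip()
            # Assumption: non-control reporters may leave the control type blank.
            if column == "Reporter Control Type" and not is_control and not value:
                continue
            if value not in allowed:
                problems.append((row_number, column, value))
    return problems
```

An empty result means every row used only the listed terms; each reported tuple points at the file row and column that needs attention.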
Array name
Please let us know the official name of your array (following the ArrayExpress format if possible: Lab_Species_ArrayName_Size_Version).
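As a rough illustration of the Lab_Species_ArrayName_Size_Version pattern, the Python sketch below splits a name into its five parts. The function name is hypothetical, and the check assumes that no individual part contains an underscore, which is true of the example name used on this page but is not a documented ArrayExpress rule.

```python
FIELDS = ("lab", "species", "array_name", "size", "version")


def split_array_name(name):
    """Split Lab_Species_ArrayName_Size_Version into a dict of its parts.

    Assumption: underscores appear only as separators between the five parts.
    """
    parts = name.split("_")
    if len(parts) != len(FIELDS) or not all(parts):
        raise ValueError(f"expected 5 non-empty parts separated by '_': {name!r}")
    return dict(zip(FIELDS, parts))


if __name__ == "__main__":
    print(split_array_name("EMBL_A.gambiae_MMC1_20k_v1.0"))
```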
Array versions
In some circumstances we will assign/increment version numbers on the basis of submission date (rather than the true age/version of the array design). Therefore please only assign "version 1" if you are sure no older designs exist.
Next step: Experimental data

