GESOL

GESOL is the Gene Expression Simple Object Layer, which simplifies the way we interact with the BASE expression data management system/database. It is a Java API layer on top of the BASE API, and includes some analytical routines for generating the means, confidence intervals and other statistics shown on the website and provided in the expression data download files. We will be adding it to GitHub as soon as possible; in the meantime, if you would like to see the code, just send us a message.

Overview

GESOL enables you (for example as a web application developer) to work with microarray expression data at a much simplified level. A typical MIAME-compliant object model would look something like this:

BioSource→Sample→Extract→Labelled Extract→Hybridisation→Scan→RawBioassay

whereas GESOL uses a much reduced model, replacing the above with

Sample→Hybridisation 
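
As a rough illustration of this reduction, the sketch below maps each step of the longer chain onto one of the two reduced items. The class name and the grouping rule (everything before Hybridisation counts as "Sample") are assumptions made for the example; they are not taken from the GESOL source.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is not one of GESOL's classes.
class ReducedModelSketch {

  // The seven MIAME-style steps from the chain above.
  static final List<String> MIAME_CHAIN = Arrays.asList(
      "BioSource", "Sample", "Extract", "Labelled Extract",
      "Hybridisation", "Scan", "RawBioassay");

  // Maps a detailed step onto one of the two reduced items (assumed grouping).
  static String reduce(String miameStep) {
    int index = MIAME_CHAIN.indexOf(miameStep);
    if (index < 0) {
      throw new IllegalArgumentException("Unknown step: " + miameStep);
    }
    // Steps before Hybridisation describe the biological material.
    return index < MIAME_CHAIN.indexOf("Hybridisation") ? "Sample" : "Hybridisation";
  }

  public static void main(String[] args) {
    for (String step : MIAME_CHAIN) {
      System.out.println(step + " -> " + reduce(step));
    }
  }
}
```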

There are also objects for Microarrays, Reporters and Experiments, and that is it.

[Figure: the full GESOL object model]

The API also provides some on-the-fly statistical analysis of expression data, and produces objects that are simple to render in JSP.
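
To give a feel for the kind of summary statistics involved, here is a minimal, self-contained sketch that computes a mean and an approximate 95% confidence interval. It uses a normal approximation (1.96 standard errors), which is an assumption made for the example; the class and method names are hypothetical and this is not necessarily how GESOL computes its intervals.

```java
// Hypothetical sketch: not GESOL's actual analysis code.
class MeanCiSketch {

  // Arithmetic mean of the values.
  static double mean(double[] values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("No values");
    }
    double sum = 0.0;
    for (double v : values) {
      sum += v;
    }
    return sum / values.length;
  }

  // Sample standard deviation (n - 1 in the denominator).
  static double sampleStdDev(double[] values) {
    if (values.length < 2) {
      throw new IllegalArgumentException("Need at least two values");
    }
    double m = mean(values);
    double squares = 0.0;
    for (double v : values) {
      squares += (v - m) * (v - m);
    }
    return Math.sqrt(squares / (values.length - 1));
  }

  // Returns {lower, upper} using the normal approximation: mean +/- 1.96 * SE.
  static double[] confidenceInterval95(double[] values) {
    double m = mean(values);
    double halfWidth = 1.96 * sampleStdDev(values) / Math.sqrt(values.length);
    return new double[] {m - halfWidth, m + halfWidth};
  }

  public static void main(String[] args) {
    double[] expression = {1.0, 2.0, 3.0, 4.0, 5.0};
    double[] ci = confidenceInterval95(expression);
    System.out.println("mean=" + mean(expression) + " ci=[" + ci[0] + ", " + ci[1] + "]");
  }
}
```

For small numbers of replicates, a t-distribution quantile would be more appropriate than the fixed 1.96 used here.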

The current API includes adaptor code to retrieve data from an underlying BASE instance. However, it should be possible to write your own adaptors to fit other databases.
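
The general shape of a custom adaptor might look like the sketch below: client code depends on a small interface, and a back end other than BASE supplies the data. The interface and class names here are hypothetical and were invented for this example; consult the GESOL source for the real adaptor interfaces you would need to implement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: illustrative only, not part of the GESOL API.
interface ReporterLookup {
  // Returns a description for the named reporter, or null if it is unknown.
  String fetchDescriptionByName(String reporterName);
}

// A stand-in back end that keeps its data in memory instead of in BASE.
class InMemoryReporterLookup implements ReporterLookup {
  private final Map<String, String> descriptions = new HashMap<String, String>();

  void add(String reporterName, String description) {
    descriptions.put(reporterName, description);
  }

  public String fetchDescriptionByName(String reporterName) {
    return descriptions.get(reporterName);
  }
}

class AdaptorSketch {
  public static void main(String[] args) {
    InMemoryReporterLookup lookup = new InMemoryReporterLookup();
    lookup.add("ABC12345", "example reporter");
    // Client code sees only the interface, so the back end can be swapped.
    ReporterLookup source = lookup;
    System.out.println(source.fetchDescriptionByName("ABC12345"));
  }
}
```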

Prerequisites

  • Java 1.6 and Apache Ant

If you don't write your own adaptor code for the API then you will need the following.

  • A recent version (≥2.5) of BASE installed correctly on your machine.
  • At least one experiment fully loaded into BASE, from BioSources, Samples and Extracts right up to Hybridisations, Scans, RawBioassays and the Experiment itself. You should also have Reporters and Array Designs loaded and linked as appropriate.
  • BioSources should be annotated with at least the pertinent information (e.g. the experimental factor(s)).
  • The experiment should have at least one BioAssaySet (e.g. analysis) performed.
  • Create a new Annotation Type specifically for GESOL called "Statistical Test", which can be applied to items of type Bioassayset. For details, log into VectorBase's BASE server as guest and view the annotation type (under Administrate->Types->Annotation Types). Your annotation type needs to be of the "String" type and have the same three possible values as ours: "ANOVA", "t-test" and "Neighbour t-test".
  • Annotate the most recently created Bioassayset (last analysis step) with the relevant "Statistical Test".
  • Assign the relevant Annotation Type(s) to the Experiment (if you haven't already).
  • Make sure one BASE user (ideally a low privilege user) has access to all items in the experiment (easiest if you create a Project first and all items will go into that by default). Also make sure that the Array Design, Annotation Types and reporters are readable by that user. This is the user which will "log in" through GESOL.

Two channel data

Because BASE doesn't know which sample is "channel 1" or "experimental", we need to tell it in a roundabout way: for each Hybridisation in BASE, create an inherited annotation to the channel 1/experimental BioSource.

Compound Annotation Types

If you want to perform the GESOL on-the-fly statistics with two annotation types at the same time (e.g. DevelopmentalStage and Age), create a new annotation type called "DevelopmentalStage:Age". Do not annotate any items with it (we do not even allow anyone to do that). Just assign it to the experiment as an Experimental Factor.
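
One way to picture how a compound name could be used is shown below: the name is split on the colon and the component annotation values are joined into a single grouping label. This is only an assumed interpretation for illustration; the class and method names are hypothetical and the source does not describe how GESOL combines the values internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not GESOL's actual handling of compound types.
class CompoundFactorSketch {

  // Builds one grouping label from the component annotation values,
  // e.g. type "DevelopmentalStage:Age" gives a label like "adult:3 days".
  static String groupLabel(String compoundTypeName, Map<String, String> annotations) {
    StringBuilder label = new StringBuilder();
    for (String typeName : compoundTypeName.split(":")) {
      String value = annotations.get(typeName);
      if (value == null) {
        throw new IllegalArgumentException("Missing annotation: " + typeName);
      }
      if (label.length() > 0) {
        label.append(':');
      }
      label.append(value);
    }
    return label.toString();
  }

  public static void main(String[] args) {
    Map<String, String> annotations = new HashMap<String, String>();
    annotations.put("DevelopmentalStage", "adult");
    annotations.put("Age", "3 days");
    System.out.println(groupLabel("DevelopmentalStage:Age", annotations));
  }
}
```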

Limitations

GESOL assumes a simple linear path from BioSource to Labelled Extract. Multi-step pooling designs are not handled correctly.

GESOL uses BASE item names extensively, so the names should be unique.
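
A quick way to check a list of item names for clashes before relying on them is sketched below. It works on plain strings that you would export from BASE yourself; the class name is hypothetical and nothing here calls the BASE or GESOL APIs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: operates on names you have already exported from BASE.
class DuplicateNameCheck {

  // Returns the names that occur more than once, in sorted order.
  static Set<String> findDuplicates(List<String> names) {
    Set<String> seen = new HashSet<String>();
    Set<String> duplicates = new TreeSet<String>();
    for (String name : names) {
      // Set.add returns false when the name was already present.
      if (!seen.add(name)) {
        duplicates.add(name);
      }
    }
    return duplicates;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("sample A", "sample B", "sample A");
    System.out.println("Duplicated names: " + findDuplicates(names));
  }
}
```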

Installation

Unpack the tar file (note, a directory named 'gesol' will be created).

Edit build.xml to point to your base2 installation (replace /usr/local/base2dev with something suitable).

Run "ant compile" just to check that everything is in order before you try to write any code against it.

Usage

Assuming that your low-level database is BASE, you should be able to write some code like this:


package GesolTest;
import java.util.*;
import java.io.*;
import org.vectorbase.funcgen.gesol.*;
import org.vectorbase.funcgen.gesol.analysis.*;
import org.vectorbase.funcgen.gesol.DB.*;
import org.vectorbase.funcgen.gesol.DB.base2.*;

public class GesolTest {
  
  public static void main(String[] args) {
    // create your connection to BASE
    Base2DBAdaptor b2dba = new Base2DBAdaptor("gesoltest", "username", "password", "localhost");
    // get an adaptor for retrieving Reporter objects
    ReporterAdaptor ra = b2dba.getReporterAdaptor();
    
    // get a reporter
    String reporterId = "ABC12345";
    Reporter reporter = ra.fetchByReporterName(reporterId);
    
    // get the experiments for this reporter (not used below - just an example)
    List<Experiment> experiments = reporter.getExperiments();
    
    // get a statistical summary of all experiments with data for this reporter
    ReporterExpressionSummary res = new ReporterExpressionSummary(reporter);
    
    // print out the best (by p-value) statistic per experiment
    for (ExperimentSummary exptsum : res.getExperimentSummaries()) {
      ExperimentStatistic stat = exptsum.getBestExperimentStatistic();
      System.out.println("Reporter " + reporter.getName() + " in experiment " +
                         exptsum.getExperiment().getName() + " has best p-value " + stat.getPvalue());
      // could also print stat.getTextSummary();
    }
  }
}


The authors will be happy to provide other example code demonstrating the API.

Note that per-gene summaries require extra code to handle the reporter↔gene relationships. You can browse the various VB-prefixed classes to see how we have handled this (with an extra MySQL table containing the information). Once you have determined which reporter(s) correspond to your gene, you can produce an averaged-over-reporters statistical summary with GESOL quite simply.
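
The idea of averaging over reporters can be sketched independently of GESOL as follows. The example assumes you already have, for each reporter mapped to the gene, an array of expression values with one entry per condition. The class name, method name and data layout are all hypothetical, and this is not GESOL's own summary code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: averages per-condition values across a gene's reporters.
class ReporterAverageSketch {

  // Every array must have the same length (one value per condition).
  static double[] averageOverReporters(Map<String, double[]> valuesByReporter) {
    if (valuesByReporter.isEmpty()) {
      throw new IllegalArgumentException("No reporters supplied");
    }
    double[] sums = null;
    for (double[] values : valuesByReporter.values()) {
      if (sums == null) {
        sums = new double[values.length];
      }
      if (values.length != sums.length) {
        throw new IllegalArgumentException("Reporters have different numbers of conditions");
      }
      for (int i = 0; i < values.length; i++) {
        sums[i] += values[i];
      }
    }
    // Turn the per-condition sums into per-condition means.
    for (int i = 0; i < sums.length; i++) {
      sums[i] /= valuesByReporter.size();
    }
    return sums;
  }

  public static void main(String[] args) {
    Map<String, double[]> values = new LinkedHashMap<String, double[]>();
    values.put("reporter1", new double[] {1.0, 3.0});
    values.put("reporter2", new double[] {3.0, 5.0});
    double[] averaged = averageOverReporters(values);
    System.out.println(averaged[0] + ", " + averaged[1]);
  }
}
```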

If you use the VBSpotDataAdaptor, you can specify which Bioassayset is the "final" one by adding a "VectorBase Final" annotation to it. (The default Base2SpotDataAdaptor retrieves the most recently produced Bioassayset.)

Authors

  • Seth Redmond
  • Bob MacCallum

for VectorBase at Imperial College London.