a Whole Genome Array Approach - Jacobs University
Background
1.6 Microarray data analysis
Data analysis is an essential step in DNA microarray experiments, because these experiments
normally generate a large amount of data. These data must be adequately processed to
find statistically significant correlations, e.g. co-regulation of genes, within and between
different arrays (FIG 4).
First, the hybridisation signal intensities must be filtered and normalised. This transformation
minimises the bias arising from unequal quantities of starting RNA, differences in
labelling or detection efficiencies of the fluorescent dyes applied, and other systematic biases
(Quackenbush 2002). For an overview of different normalisation methods see Foster and
Ghazal (2003). In the next step, data mining techniques are required to
answer the biological question behind the experiment. Normally, microarray experiments are
conducted to identify genes that are either under- or over-expressed after a shift in the
experimental conditions. For example, we might be interested in genes that show elevated
expression because of a drug treatment. Such genes are most easily found by simple filtering.
If log-transformed data (method) are used for filtering, differentially expressed genes are
identified with a fixed threshold cut-off (e.g. a two-fold increase or decrease). Filtering by
absolute expression change can even be used for experiments in which there are no replicates.
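The normalisation and threshold filtering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this work: the gene names and intensities are invented, a base-2 logarithm and a global median-centring normalisation are assumed, and the function names (`log2_ratios`, `median_centre`, `fold_change_filter`) are hypothetical.

```python
import math
from statistics import median


def log2_ratios(treated, control):
    """Return log2(treated / control) per gene; intensities must be > 0."""
    return {g: math.log2(treated[g] / control[g]) for g in treated}


def median_centre(ratios):
    """Global normalisation (assumed here): shift all log ratios so that
    their median becomes 0, removing a uniform dye or RNA-quantity bias."""
    m = median(ratios.values())
    return {g: r - m for g, r in ratios.items()}


def fold_change_filter(ratios, fold=2.0):
    """Return (up, down) gene lists whose log2 ratio reaches the cut-off."""
    cutoff = math.log2(fold)  # a two-fold change is 1.0 on the log2 scale
    up = sorted(g for g, r in ratios.items() if r >= cutoff)
    down = sorted(g for g, r in ratios.items() if r <= -cutoff)
    return up, down


# Invented single-array intensities (no replicates); the treated channel
# is assumed to be roughly twice as bright overall as the control channel.
treated = {"geneA": 1600.0, "geneB": 400.0, "geneC": 100.0,
           "geneD": 440.0, "geneE": 380.0}
control = {g: 200.0 for g in treated}

ratios = log2_ratios(treated, control)
raw_up, raw_down = fold_change_filter(ratios)        # without normalisation
up, down = fold_change_filter(median_centre(ratios)) # after normalisation

print("raw:       ", raw_up, raw_down)
print("normalised:", up, down)
```

With these invented numbers, the uniform two-fold channel bias lets geneB and geneD pass the raw threshold; after median centring only geneA (up) and geneC (down) remain, which is why normalisation precedes filtering.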
However, there are also ranking methods available [t-test (Pan 2002), ANOVA (Kerr et al.
2000), Bayesian methods or the Mann-Whitney test]. All these methods produce errors (false
positives and false negatives); therefore, differential gene expression is usually confirmed by
RT-PCR or northern blots (Leung and Cavalieri 2003). If the co-regulation of
genes (or related arrays) is of interest, various clustering techniques should be considered. The basic concept
of clustering is to identify and group together similarly expressed genes and to relate
these observations to biology. The idea is that co-regulated and functionally related genes are
grouped into clusters. Frequently used grouping techniques are hierarchical clustering (Eisen
et al. 1998), k-means clustering (Soukas et al. 2000), self-organising maps (SOMs) (Kohonen
1992) and principal component analysis (PCA) (Raychaudhuri 2000) (for reviews of these methods see
Quackenbush 2002; Gollub and Sherlock 2006). No single clustering method can be
applied to all kinds of experiments. Different clustering methods applied to the same data set can
reveal unique aspects of the data (Leung and Cavalieri 2003). It is therefore advisable to
analyse the data using several methods rather than just one (Leung 2002).
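To illustrate why more than one clustering method is worth trying, the sketch below runs two of the techniques named above on the same toy data set. It is a simplified illustration under stated assumptions: the expression profiles are invented, hierarchical clustering uses average linkage with a correlation-based distance, k-means uses Euclidean distance with a deterministic farthest-point seeding chosen only for reproducibility, and all function names are hypothetical.

```python
import math


def pearson_distance(x, y):
    """1 - Pearson correlation: small when two profiles share a shape.
    Undefined for constant profiles (zero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def hierarchical(profiles, n_clusters, dist):
    """Average-linkage agglomerative clustering, stopped at n_clusters."""
    clusters = [[g] for g in profiles]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dist(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        candidates = ((i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        i, j = min(candidates,
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so i stays valid
    return sorted(sorted(c) for c in clusters)


def kmeans(profiles, k, max_iter=100):
    """Euclidean k-means with farthest-point seeding (an assumption)."""
    names = list(profiles)
    centroids = [list(profiles[names[0]])]
    while len(centroids) < k:
        far = max(names, key=lambda g: min(euclidean(profiles[g], c)
                                           for c in centroids))
        centroids.append(list(profiles[far]))
    assignment = None
    for _ in range(max_iter):
        new = {g: min(range(k),
                      key=lambda c: euclidean(profiles[g], centroids[c]))
               for g in names}
        if new == assignment:  # converged: no gene changed cluster
            break
        assignment = new
        for c in range(k):
            members = [profiles[g] for g in names if assignment[g] == c]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    groups = [[g for g in names if assignment[g] == c] for c in range(k)]
    return sorted(sorted(grp) for grp in groups if grp)


# Invented log-ratio profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [4.2, 2.9, 2.1, 1.0],
    "g5": [0.1, 0.2, 0.3, 0.4],  # same trend as g1, much smaller amplitude
}

by_shape = hierarchical(profiles, 2, pearson_distance)
by_magnitude = kmeans(profiles, 2)
print("hierarchical (correlation):", by_shape)
print("k-means (Euclidean):       ", by_magnitude)
```

On this toy data the correlation-based tree groups g5 with g1 and g2 because they share a rising trend, whereas Euclidean k-means with this seeding isolates g5 because of its low amplitude. Much of the difference comes from the distance measure rather than the algorithm itself, which is a further reason to compare several settings before interpreting clusters biologically.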