228 Kohlmann et al.<br />
those patients after a training of classification engines (9,10). The gene lists<br />
from supervised analyses can also further be interpreted in terms of biology.<br />
For all gene expression profiles, master data tables have to be maintained. In<br />
these tables, rows represent all genes for which data has been collected, and<br />
columns represent microarray experiments from individual patients. Each cell<br />
represents the measured fluorescence intensity from the corresponding target<br />
probe set on the microarray. Before analysis, the data are routinely normalized<br />
(11). This step is mandatory in the data-mining process because measured gene<br />
expression levels can only be compared appropriately after normalization. U133<br />
set microarray signal intensity values can be normalized by scaling the raw<br />
data intensities to a common target intensity using a recommended mask file<br />
(U133A/B mask file; e.g., selected global target intensity value: 5000) (see<br />
Note 9).<br />
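The scaling step described above can be sketched in a few lines. This is a minimal illustration, not the vendor implementation: it assumes that each array is summarized by a trimmed mean (the 2% trim is an assumption), and the names `scale_to_target`, `trim`, and `mask` are hypothetical. The optional boolean `mask` only stands in for the probe sets selected by a mask file.

```python
import numpy as np

def scale_to_target(raw, target=5000.0, trim=0.02, mask=None):
    """Scale each array (column) so that its trimmed mean equals `target`.

    raw  : genes x arrays matrix of raw signal intensities
    trim : fraction of values dropped at each tail (assumed value)
    mask : optional boolean vector marking the probe sets used to
           compute the scaling factor (stand-in for a mask file)
    """
    raw = np.asarray(raw, dtype=float)
    if mask is None:
        mask = np.ones(raw.shape[0], dtype=bool)
    scaled = np.empty_like(raw)
    for j in range(raw.shape[1]):
        ref = np.sort(raw[mask, j])
        k = int(len(ref) * trim)                # values dropped per tail
        trimmed_mean = ref[k:len(ref) - k].mean()
        scaled[:, j] = raw[:, j] * (target / trimmed_mean)
    return scaled

# Two arrays that differ only in overall brightness
raw = np.array([[100.0, 200.0],
                [200.0, 400.0],
                [300.0, 600.0]])
scaled = scale_to_target(raw)
```

After scaling, both columns of the toy matrix have the same values, so the difference in overall brightness between the two arrays no longer appears as a difference in expression.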
3.4.1. Identification of Differentially Expressed Genes<br />
In microarray experiments, a common goal is to detect genes that show differential<br />
expression across two or more biological conditions. Therefore, multiple<br />
hypothesis testing procedures are applied to all genes simultaneously<br />
to determine whether each one is differentially expressed. For each gene, the null hypothesis<br />
is that its expression level does not differ between the leukemia subclasses being compared.<br />
The alternative hypothesis is that the gene is differentially<br />
expressed. The analyses can be performed either between two distinct classes<br />
(pairwise comparisons; subtype A vs subtype B) or between one distinct class<br />
and all other remaining classes in a one-vs-all (OVA) approach.<br />
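The two comparison designs can be expressed as group codings over the sample labels. The following is a minimal sketch; the function names and the 1/2 group codes are illustrative assumptions and are not taken from any particular software.

```python
from itertools import combinations

def pairwise_comparisons(labels):
    """For every pair of classes (A, B), return the indices of the samples
    involved and their group codes (1 = class A, 2 = class B)."""
    classes = sorted(set(labels))
    out = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, lab in enumerate(labels) if lab in (a, b)]
        groups = [1 if labels[i] == a else 2 for i in idx]
        out[(a, b)] = (idx, groups)
    return out

def one_vs_all_comparisons(labels):
    """For every class, code its samples as 1 and all other samples as 2."""
    return {c: [1 if lab == c else 2 for lab in labels]
            for c in sorted(set(labels))}

labels = ["A", "B", "C", "A"]          # hypothetical subtype per sample
pairwise = pairwise_comparisons(labels)
ova = one_vs_all_comparisons(labels)
```

A pairwise comparison uses only the samples of the two classes involved, whereas an OVA comparison always uses every sample.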
3.4.1.1. SIGNIFICANCE ANALYSIS OF MICROARRAYS<br />
Supervised data analyses can be performed using the significance analysis<br />
of microarrays (SAM) software. SAM is a statistical technique for finding significant<br />
genes in large-scale microarray-based gene expression profiles; it<br />
correlates gene expression data with an external variable, e.g., the leukemia<br />
subclass or karyotype information. The SAM software is an add-in package for<br />
Microsoft Excel and assesses the statistical significance of changes in gene<br />
expression using repeated permutations of the data. The method was proposed by Tusher and colleagues<br />
(12). SAM identifies genes with statistically significant changes in<br />
expression by assimilating a set of gene-specific t-tests. Each gene is assigned<br />
a score on the basis of its change in gene expression relative to the standard<br />
deviation of repeated measurements for that gene. Genes with scores greater<br />
than an adjustable threshold are deemed potentially significant. The cutoff for<br />
significance is determined by the tuning parameter delta, chosen by the user<br />
based on the false discovery rate (FDR). The FDR, i.e., the percentage of genes<br />
identified by chance, is estimated by analyzing repeated permutations of<br />
the data.
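The procedure described above can be illustrated with a simplified two-class sketch. It is not the SAM software itself. It assumes a pooled-variance, t-like score with a small constant `s0` added to the denominator, which is supplied by the user here rather than estimated from the data. It also assumes that the FDR is estimated as the median number of genes called in the permuted data divided by the number called in the real data. All function and parameter names are hypothetical.

```python
import numpy as np

def d_scores(x, groups, s0=0.0):
    """Per-gene score: difference in group means relative to the gene's
    pooled standard error plus s0 (s0 > 0 avoids division by zero)."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    a, b = x[:, groups == 1], x[:, groups == 2]
    n1, n2 = a.shape[1], b.shape[1]
    ss = (((a - a.mean(1, keepdims=True)) ** 2).sum(1)
          + ((b - b.mean(1, keepdims=True)) ** 2).sum(1))
    s = np.sqrt(ss / (n1 + n2 - 2) * (1.0 / n1 + 1.0 / n2))
    return (a.mean(1) - b.mean(1)) / (s + s0)

def sam_sketch(x, groups, delta, n_perm=200, s0=0.0, seed=0):
    """Return a boolean vector of called genes and an estimated FDR."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    d = d_scores(x, groups, s0)
    d_sorted = np.sort(d)
    # Sorted scores under permuted class labels, one row per permutation
    perm = np.array([np.sort(d_scores(x, rng.permutation(groups), s0))
                     for _ in range(n_perm)])
    expected = perm.mean(axis=0)            # expected order statistics
    diff = d_sorted - expected
    # First gene, moving away from the origin, that exceeds delta
    up = np.where((expected >= 0) & (diff > delta))[0]
    lo = np.where((expected < 0) & (-diff > delta))[0]
    cut_up = d_sorted[up[0]] if up.size else np.inf
    cut_lo = d_sorted[lo[-1]] if lo.size else -np.inf
    called = (d >= cut_up) | (d <= cut_lo)
    # Genes passing the same cutoffs in permuted data are chance findings
    false_calls = ((perm >= cut_up) | (perm <= cut_lo)).sum(axis=1)
    n_called = int(called.sum())
    fdr = min(1.0, float(np.median(false_calls)) / n_called) if n_called else 0.0
    return called, fdr

# Simulated data: 200 genes, 6 + 6 samples, first 10 genes truly changed
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 12))
groups = [1] * 6 + [2] * 6
x[:10, :6] += 8.0
called, fdr = sam_sketch(x, groups, delta=1.0, n_perm=100, s0=0.1)
```

Raising `delta` makes the cutoffs stricter, so fewer genes are called and the estimated FDR generally drops. In practice the user scans several values of delta and selects one with an acceptable FDR.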