18.12.2012 Views

Myeloid Leukemia


228 Kohlmann et al.

those patients after a training of classification engines (9,10). The gene lists from supervised analyses can also be further interpreted in terms of biology. For all gene expression profiles, master data tables have to be maintained. In these tables, rows represent all genes for which data have been collected, and columns represent microarray experiments from individual patients. Each cell represents the measured fluorescence intensity from the corresponding target probe set on the microarray. Before the data are analyzed, they are routinely normalized (11). Normalization is a mandatory step in the data-mining process, because only normalized gene expression levels can be compared appropriately. U133 set microarray signal intensity values can be normalized by scaling the raw data intensities to a common target intensity using a recommended mask file (U133A/B mask file; e.g., selected global target intensity value: 5000) (see Note 9).
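The scaling step described above can be illustrated with a minimal Python sketch. It is not the vendor software's implementation. It assumes that the per-array summary is a trimmed mean (the 2% trim is an assumption), that the master data table is a plain list of rows (genes) by columns (arrays), and that the mask file is represented by a hypothetical list of booleans marking the probe sets used to compute the scaling factor.

```python
from statistics import mean


def trimmed_mean(values, trim=0.02):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def scale_to_target(table, target=5000.0, mask=None, trim=0.02):
    """Scale every array (column) to a common target intensity.

    table: list of rows (genes), each a list with one intensity per array.
    mask:  optional list of booleans (hypothetical stand-in for the mask
           file); True marks probe sets used to compute the scaling factor.
    Returns the scaled table and the per-array scaling factors.
    """
    n_rows, n_cols = len(table), len(table[0])
    if mask is None:
        mask = [True] * n_rows
    factors = []
    for j in range(n_cols):
        column = [table[i][j] for i in range(n_rows) if mask[i]]
        factors.append(target / trimmed_mean(column, trim))
    # The factor is applied to all probe sets, masked or not.
    scaled = [[row[j] * factors[j] for j in range(n_cols)] for row in table]
    return scaled, factors


# Toy table: 3 probe sets (rows) measured on 2 arrays (columns).
raw = [[100, 200], [300, 600], [200, 400]]
scaled, factors = scale_to_target(raw)
```

In this toy table the second array is uniformly twice as bright as the first, so it receives half the scaling factor and both arrays end up with identical scaled profiles.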

3.4.1. Identification of Differentially Expressed Genes

In microarray experiments, a common goal is to detect genes that show differential expression across two or more biological conditions. Therefore, multiple hypothesis testing algorithms are applied to all genes simultaneously to determine whether each one is differentially expressed. The null hypothesis is that there is no change in expression levels between the leukemia subclasses being compared. The alternative hypothesis is that there is significant differential gene expression. The analyses can be performed either between two distinct classes (pairwise comparisons; subtype A vs subtype B) or between one distinct class and all other remaining classes in a one-vs-all (OVA) approach.
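The difference between the two comparison designs lies in how the samples are grouped before testing. The following short Python sketch shows one way to derive the two groupings from a list of per-patient class annotations. The function names and the subtype labels "A", "B", and "C" are hypothetical and serve illustration only.

```python
def pairwise_labels(classes, a, b):
    """Pairwise design: keep only samples of class a or b.

    Returns the indices of the retained samples and a label for each
    (1 for class a, 0 for class b).
    """
    indices = [i for i, c in enumerate(classes) if c in (a, b)]
    labels = [1 if classes[i] == a else 0 for i in indices]
    return indices, labels


def ova_labels(classes, target):
    """One-vs-all design: keep every sample.

    Returns 1 for samples of the target class and 0 for all others.
    """
    return [1 if c == target else 0 for c in classes]


# Hypothetical subtype annotation for five patients.
classes = ["A", "B", "C", "A", "B"]
kept, ab_labels = pairwise_labels(classes, "A", "B")  # drops the "C" sample
a_vs_rest = ova_labels(classes, "A")                  # keeps all samples
```

In the pairwise design, samples from other subclasses are excluded from the test, whereas the OVA design pools them into the comparison group.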

3.4.1.1. SIGNIFICANCE ANALYSIS OF MICROARRAYS

Supervised data analyses can be performed using the significance analysis of microarrays (SAM) software. SAM is a statistical technique for finding significant genes in large-scale microarray-based gene expression profiles; it correlates gene expression data with an external variable, e.g., the leukemia subclass or karyotype information. The SAM software is an add-in package for Microsoft Excel and assesses the statistical significance of changes in gene expression using repeated permutations of the data. It was proposed by Tusher and colleagues (12). SAM identifies genes with statistically significant changes in expression by assimilating a set of gene-specific t-tests. Each gene is assigned a score on the basis of its change in gene expression relative to the standard deviation of repeated measurements for that gene. Genes with scores greater than an adjustable threshold are deemed potentially significant. The cutoff for significance is determined by the tuning parameter delta, chosen by the user based on the false discovery rate (FDR). The FDR, i.e., the percentage of genes identified by chance, is estimated by analyzing repeated permutations of the data.
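The scoring and permutation idea can be sketched in Python as follows. This is a simplified illustration and not the SAM software. It assumes a two-class comparison, a fixed small constant s0 added to the denominator (SAM selects this value from the data), and a symmetric cutoff on the absolute score in place of the delta parameter. The FDR is estimated as the mean number of genes called in permuted data divided by the number called in the real data. All function and parameter names are hypothetical.

```python
import math
import random


def d_scores(table, labels, s0=1.0):
    """Per-gene relative difference: (mean1 - mean0) / (s + s0).

    table:  list of rows (genes), one intensity per sample.
    labels: 1 or 0 for each sample (two-class design).
    s is the pooled standard error of the mean difference; the constant
    s0 keeps scores of genes with very low variance from exploding.
    """
    scores = []
    for row in table:
        g1 = [x for x, lab in zip(row, labels) if lab == 1]
        g0 = [x for x, lab in zip(row, labels) if lab == 0]
        n1, n0 = len(g1), len(g0)
        m1, m0 = sum(g1) / n1, sum(g0) / n0
        ss = sum((x - m1) ** 2 for x in g1) + sum((x - m0) ** 2 for x in g0)
        s = math.sqrt((1 / n1 + 1 / n0) * ss / (n1 + n0 - 2))
        scores.append((m1 - m0) / (s + s0))
    return scores


def estimate_fdr(table, labels, threshold, n_perm=200, s0=1.0, seed=0):
    """Return (number of genes called, estimated FDR) for |d| >= threshold."""
    rng = random.Random(seed)
    observed = d_scores(table, labels, s0)
    called = sum(abs(d) >= threshold for d in observed)
    if called == 0:
        return 0, 0.0
    shuffled = list(labels)
    false_counts = []
    for _ in range(n_perm):
        # Shuffling the class labels breaks any real association
        # between expression and class.
        rng.shuffle(shuffled)
        permuted = d_scores(table, shuffled, s0)
        false_counts.append(sum(abs(d) >= threshold for d in permuted))
    expected_false = sum(false_counts) / n_perm
    return called, min(1.0, expected_false / called)
```

Raising the threshold reduces the number of genes called and usually lowers the estimated FDR, which corresponds to the trade-off the user makes when choosing delta.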
