18.12.2012 Views

Myeloid Leukemia


232 Kohlmann et al.

unsupervised data analyses, a variation filter can be applied. This filter aims to remove probe sets that show minimal variation across the complete data set. In practice, the variance of each gene is calculated across all samples. The data matrix is then sorted by these variances, and probes with low variance are excluded from the subsequent analysis. Hierarchical clustering is the method of choice when one has little or no a priori knowledge of the complete repertoire of expected gene expression patterns. However, it provides no information about statistical significance. In contrast, using hierarchical clustering in a supervised approach helps to visualize differential gene expression of an already preselected set of genes.
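The variation filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `variance_filter`, the `keep` parameter (how many probe sets to retain), and the toy expression values are all assumptions, since the text does not state a cutoff.

```python
from statistics import variance


def variance_filter(matrix, probe_ids, keep):
    """Return the `keep` probe sets (rows) with the highest variance across samples.

    `keep` is a hypothetical parameter; the source text gives no cutoff.
    """
    # Sample variance of each probe set across all samples (columns).
    scored = [(variance(row), pid, row) for pid, row in zip(probe_ids, matrix)]
    # Sort the data matrix by variance, most variable probe sets first.
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[:keep]
    return [pid for _, pid, _ in top], [row for _, _, row in top]


# Toy data (made up): rows are probe sets, columns are samples.
expression = [
    [7.1, 7.0, 7.2, 7.1],  # nearly constant across samples
    [2.0, 9.5, 2.1, 9.8],  # strongly varying
    [5.0, 5.6, 4.4, 5.1],  # moderately varying
]
probes = ["probe_A", "probe_B", "probe_C"]

kept_ids, kept_rows = variance_filter(expression, probes, keep=2)
print(kept_ids)
```

Here the nearly constant `probe_A` is dropped, and the two more variable probe sets are passed on to the clustering step.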

3.4.4. Principal Component Analysis

The need to visualize large amounts of data in many dimensions occurs frequently in bioinformatics. Principal component analysis (PCA) is commonly used in statistics to extract the main relationships in data of high dimensionality (15). It is a useful tool for categorizing multidimensional data such as gene expression studies, because it separates the dominating features in the data set. The underlying mathematical technique is called eigenanalysis. PCA reduces the dimensionality of the data set while retaining most of the information contained therein by constructing a linear transformation matrix. This transformation matrix is composed of the most significant eigenvectors of the covariance matrix of the input matrix of feature vectors. The principal components (PCs) are the projections of the data onto the eigenvectors. These vectors give the directions in which the data cloud is stretched most. The significance of an eigenvector is defined by the variance of the data along it, which is equal to its corresponding eigenvalue. Eigenvalues indicate the amount of information the respective PCs represent. PCs corresponding to large eigenvalues represent much of the information in the data set and thus reveal much about the relations between the data points. Because the original data’s variation can be retained and explained by a smaller number of transformed variables, PCA projects the data into a new two- or three-dimensional space and may provide valuable insight into the data (Fig. 4).
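The eigenanalysis described above can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions, not the workflow used by the authors: the function name `pca`, the choice of two components, and the toy expression matrix (six samples, four genes) are hypothetical.

```python
import numpy as np


def pca(samples, n_components=2):
    """Project samples (rows) onto the leading eigenvectors of their covariance matrix."""
    X = np.asarray(samples, dtype=float)
    # Center each feature (gene) so the covariance describes variation around the mean.
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # The transformation matrix holds the most significant eigenvectors as columns.
    transform = eigenvectors[:, :n_components]
    # The principal components are the projections of the data onto those eigenvectors.
    scores = centered @ transform
    explained = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, eigenvalues, explained


# Toy data (made up): six samples in two groups, four genes each.
samples = [
    [2.0, 8.1, 1.0, 5.0],
    [2.2, 7.9, 1.1, 5.2],
    [1.9, 8.3, 0.9, 4.9],
    [6.0, 3.0, 4.1, 5.1],
    [6.3, 2.8, 3.9, 5.0],
    [5.8, 3.2, 4.0, 4.8],
]

scores, eigenvalues, explained = pca(samples, n_components=2)
print(scores)     # one row per sample, one column per principal component
print(explained)  # fraction of total variance captured by each retained PC
```

Plotting the two columns of `scores` against each other gives the kind of low-dimensional view the text refers to. The variance of each score column equals the corresponding eigenvalue, which is why large eigenvalues mark the most informative components.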

Fig. 4. (opposite page) Principal component analysis workflow. The multidimensional data are reduced by transformation to a new set of variables, the principal components (PCs). The traditional approach is to use the first few PCs, because they capture most of the variation in the original data. In the final graph, data points with similar characteristics will cluster together. Each patient’s expression pattern is represented by a color-coded sphere.
