27.12.2014 Views

4 - Central Institute of Brackishwater Aquaculture


National Workshop-cum-Training on Bioinformatics and Information Management in Aquaculture

3.3.4. Principal Component Analysis of Microarray

The objectives of principal component analysis (PCA) are to reduce the dimensionality of the data set and to identify new, meaningful underlying variables. PCA transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components. The basic idea is to find n linear combinations of the original variables that together explain as much of the variance as possible. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. PCA can also be applied when other information in addition to the actual expression levels is available (this applies to SOM and K-means methods as well).

Fig. 7 An example of principal component analysis. The two most significant principal components have been selected as the axes of the plot (Source: Hovatta et al.)
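
The transformation described in Section 3.3.4 can be illustrated with a short sketch. It is not the software used in the workshop; it is a minimal NumPy example that assumes the expression matrix has samples (hybridizations) as rows and genes as columns, and computes the components by singular value decomposition of the mean-centered matrix. The function name `pca` and the synthetic 6 x 50 matrix are hypothetical and serve only as illustration.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto the first n_components principal components.

    X is assumed to be a (samples x genes) matrix without missing values.
    Returns the sample scores, the component loadings, and the fraction of
    total variance explained by each returned component.
    """
    # Center each gene (column) so that components describe variance, not the mean.
    Xc = X - X.mean(axis=0)
    # SVD of the centered matrix: rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Variance carried by each component, and its share of the total.
    variance = s ** 2 / (X.shape[0] - 1)
    ratio = variance / variance.sum()
    # Coordinates of each sample on the leading components.
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], ratio[:n_components]


# Synthetic data (an assumption for illustration): 6 samples x 50 genes,
# driven by one hidden factor plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(6, 1))
X = latent @ rng.normal(size=(1, 50)) + 0.1 * rng.normal(size=(6, 50))

scores, components, ratio = pca(X, n_components=2)
print("fraction of variance explained:", ratio)
```

Plotting the two columns of `scores` against each other gives the kind of two-axis view shown in Fig. 7. The columns of `scores` are uncorrelated by construction, which is the property the text attributes to principal components.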

3.3.5. Correspondence Analysis<br />

Correspondence analysis (CA) is an exploratory method for studying associations between variables. It directly visualizes associations between genes and hybridizations. Unlike many other methods, CA does not require any prior choice of parameters. Like principal component analysis, it displays a low-dimensional projection of the data. In this case, however, both genes and samples can be projected onto the same space, revealing associations between them. Correspondence analysis requires an expression matrix with no missing values, so any missing values have to be imputed first. We use the k-nearest neighbors algorithm to impute missing values. The only user input in the initialization dialog is the desired number of neighbors for imputation. Genes that lie close to one another on the plot tend to have similar profiles, regardless of their absolute values. The same is true for samples. If some genes and samples lie close to one another on the plot, then these genes are likely to have high expression in the nearby samples relative to other samples that are far away on the plot. On the other hand, if a set of genes is on the opposite side of the plot from a set of samples relative to the
