4 - Central Institute of Brackishwater Aquaculture
National Workshop-cum-Training on Bioinformatics and Information Management in Aquaculture
3.3.4. Principal Component Analysis of Microarray
The objectives of principal component analysis (PCA) are to discover or reduce the
dimensionality of the data set and to identify new, meaningful underlying
variables. PCA transforms a number of (possibly) correlated variables into a
(smaller) number of uncorrelated variables called principal components. The
basic idea in PCA is to find n linearly transformed components that explain the
maximum possible amount of variance in the data. The first principal
component accounts for as much of the variability in the data as possible, and
each succeeding component accounts for as much of the remaining variability as
possible. PCA can also be applied when other information in addition to the
actual expression levels is available (this applies to SOM and K-means methods
as well).
Fig. 7 An example of principal component analysis. The two most significant
principal components have been selected as the axes of the plot (Source:
Hovatta et al.).
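The transformation described in this section can be illustrated with a minimal sketch in Python using NumPy. It is not the implementation used by any particular microarray package: the function name `pca`, the samples-by-genes layout of the matrix, and the toy data are all assumptions made for illustration. The sketch centers each gene, takes a singular value decomposition of the centered matrix, and reports the sample coordinates on the leading components together with the fraction of variance each component explains.

```python
import numpy as np


def pca(expression, n_components=2):
    """Project samples (rows) onto the leading principal components.

    `expression` is assumed to be a samples x genes matrix of
    expression values (a hypothetical layout chosen for this sketch).
    """
    X = np.asarray(expression, dtype=float)
    # Center each gene (column) so components describe variance around the mean.
    centered = X - X.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Variance carried by each component, as a fraction of the total variance.
    variances = s ** 2 / (X.shape[0] - 1)
    explained_ratio = variances / variances.sum()
    # Coordinates (scores) of each sample on the leading components.
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components], explained_ratio[:n_components]


# Toy data: 6 samples x 4 genes (made-up numbers, for illustration only).
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 1))
data = np.hstack([
    base * 2.0,                                # gene strongly tied to `base`
    base * -1.5,                               # anti-correlated gene
    base + rng.normal(scale=0.1, size=(6, 1)),  # correlated gene with noise
    rng.normal(scale=0.1, size=(6, 1)),         # mostly noise
])

scores, axes, ratio = pca(data, n_components=2)
print("fraction of variance explained:", ratio)
```

Plotting the two columns of `scores` against each other gives a plot of the kind shown in Fig. 7, with the two most significant components as axes. Because the first three toy genes are built from the same underlying signal, the first component should capture most of the variance, which is the sense in which correlated variables are replaced by a smaller number of uncorrelated ones.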
3.3.5. Correspondence Analysis<br />
Correspondence analysis (CA) is an exploratory method for studying associations between
variables. It directly visualizes associations between genes and hybridizations.
Unlike many other methods, CA does not require any prior choice of parameters.
Like principal component analysis, it displays a low-dimensional projection of the data.
However, in this case, both genes and samples can be projected onto the same
space, revealing associations between them. Correspondence analysis requires
an expression matrix with no missing values. Therefore, any missing values have
to be imputed first. We use the k-nearest neighbors algorithm to impute missing
values. The only user input in the initialization dialog is the desired number of
neighbors for imputation. Genes that lie close to one another on the plot tend to
have similar profiles, regardless of their absolute value. The same is true for
samples. If some genes and samples lie close to one another on the plot, then
these genes are likely to have a high expression in the nearby samples relative
to other samples that are far away on the plot. On the other hand, if a set of
genes is on the opposite side of the plot from a set of samples relative to the