25.12.2013 Views

CRANFIELD UNIVERSITY Eleni Anthippi Chatzimichali ...



The multivariate techniques applied in the field of chemometrics are conventionally
divided into two main categories, namely supervised and unsupervised. Unsupervised
methods “attempt to disclose naturally occurring groups and structures within the
dataset without previous knowledge of any class assignment” (Alvarez-Ordoñez and
Prieto, 2012). They chiefly focus on the discovery of patterns, trends, clusters and/or
outliers in the data, and they include techniques such as Principal Component
Analysis (PCA) and cluster analysis. On the other hand, supervised learning
algorithms “make use of a priori knowledge of classes to guide the characterisation or
classification process” (Alvarez-Ordoñez and Prieto, 2012); these algorithms generate
prediction models for regression, classification, pattern recognition, or machine
learning tasks. Characteristic examples of supervised learning include Partial Least
Squares Discriminant Analysis (PLS-DA) and Support Vector Machines (SVMs),
among many others.
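The distinction can be illustrated with a minimal Python sketch. It is not one of the methods named above: a two-cluster k-means stands in for cluster analysis, and a nearest-centroid rule stands in for a supervised classifier such as PLS-DA or an SVM. The data, the labels "A" and "B", and all function names are hypothetical.

```python
import statistics

def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two clusters without using labels."""
    centres = [min(values), max(values)]  # simple deterministic start
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Keep the old centre if a cluster happens to be empty.
        centres = [statistics.fmean(g) if g else c
                   for g, c in zip(groups, centres)]
    return [0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            for v in values]

def fit_nearest_centroid(values, labels):
    """Supervised: use the known class labels to compute one centroid per class."""
    return {
        c: statistics.fmean([v for v, l in zip(values, labels) if l == c])
        for c in sorted(set(labels))
    }

def predict(centroids, value):
    """Assign a new value to the class with the closest centroid."""
    return min(centroids, key=lambda c: abs(value - centroids[c]))

# Hypothetical measurements and class assignments.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
labels = ["A", "A", "A", "B", "B", "B"]

clusters = two_means(values)                      # labels are never consulted
centroids = fit_nearest_centroid(values, labels)  # labels guide the model
print(clusters)                  # -> [0, 0, 0, 1, 1, 1]
print(predict(centroids, 4.5))   # -> B
```

The unsupervised routine recovers the grouping from the data alone, whereas the supervised routine needs the class assignment up front and can then classify new samples.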

1.3 Data pre-treatment<br />

Nowadays, the extraction of relevant information from highly heterogeneous datasets
constitutes a major challenge (van den Berg et al., 2006). It is well established that
proper data pre-treatment, applied before any type of data analysis, is crucial for the
outcome and the interpretability of the results. Data pre-treatment can make the
difference between a useful model and no model at all. Therefore, biological data
under investigation are often scaled, centred and/or transformed. Pre-treatment
techniques are especially useful when the variables span wide and differing ranges.
In addition, pre-treatment techniques aim to minimise the influence of disturbing
factors such as measurement noise.
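As a rough illustration of centring, scaling and transformation, the sketch below applies each operation to one variable at a time. The specific choices (mean-centring, unit-variance "autoscaling" with the sample standard deviation, and a base-10 log transform) are assumptions made for the example, as are the function names and the measurements; the text above does not prescribe a particular method.

```python
import math
import statistics

def centre(column):
    """Mean-centre: subtract the mean so the variable averages to zero."""
    mean = statistics.fmean(column)
    return [x - mean for x in column]

def autoscale(column):
    """Centre, then divide by the sample standard deviation (unit variance)."""
    sd = statistics.stdev(column)
    return [x / sd for x in centre(column)]

def log_transform(column):
    """Base-10 log of strictly positive values; compresses wide ranges."""
    return [math.log10(x) for x in column]

# Hypothetical variables that differ in scale by four orders of magnitude.
abundant = [1200.0, 1500.0, 900.0, 1800.0]
trace = [0.12, 0.15, 0.09, 0.18]

scaled_abundant = autoscale(abundant)
scaled_trace = autoscale(trace)
logged = log_transform([1.0, 10.0, 100.0])
```

After autoscaling, both hypothetical variables have zero mean and unit standard deviation, so neither dominates a subsequent analysis purely because of its measurement scale.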

The choice of chemometric method strongly influences the choice of data
pre-treatment method, because different techniques focus on different aspects of
the data. For instance, clustering algorithms focus on revealing similarity and
dissimilarity patterns, whereas PCA attempts to explain the maximum variation using
a few meaningful components. Thus, a certain pre-treatment method may enhance
the results of one technique and obscure the results of another.

