08.02.2013 Views

New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...


CHAPTER 3. MATHEMATICAL MODELING AND ALGORITHMS

3.8 Extracting Fingerprints

3.8.1 What are Fingerprints?

A fingerprint, with respect to this thesis, is a set of masterpeaks of a particular set of spectra (for example, spectra of lung cancer patients of a certain age) that are not present in, or differ from those of, some control group (for example, healthy patients of the same age). That is, given an unknown spectrum S (of a patient P) and a fingerprint F of, say, a particular disease, we can judge whether the patient P is likely to have this disease just by checking the occurrence of the peaks of F in S. The following sections explain how we determine a fingerprint given two sets of spectra (e.g. cancer patients vs. healthy patients).
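The check described above can be illustrated with a minimal sketch. It assumes that a spectrum and a fingerprint are each reduced to a list of peak positions (m/z values) and that a fingerprint peak counts as present if the spectrum has a peak within a fixed tolerance. The function names, the tolerance and the example values are hypothetical and are not taken from the thesis.

```python
from bisect import bisect_left


def peak_present(sorted_mz, target_mz, tol):
    """True if the sorted peak list has a peak within tol of target_mz."""
    i = bisect_left(sorted_mz, target_mz)
    # Only the neighbours on either side of the insertion point can be closest.
    neighbours = sorted_mz[max(i - 1, 0):i + 1]
    return any(abs(mz - target_mz) <= tol for mz in neighbours)


def fingerprint_score(spectrum_mz, fingerprint_mz, tol=0.5):
    """Fraction of fingerprint peaks that occur in the spectrum."""
    if not fingerprint_mz:
        raise ValueError("fingerprint must contain at least one peak")
    sorted_mz = sorted(spectrum_mz)
    hits = sum(peak_present(sorted_mz, mz, tol) for mz in fingerprint_mz)
    return hits / len(fingerprint_mz)


# Hypothetical fingerprint F and unknown spectrum S (peak positions only).
fingerprint = [1465.2, 2021.7, 3190.4]
spectrum = [1012.3, 1465.4, 2500.0, 3190.1, 4100.9]
score = fingerprint_score(spectrum, fingerprint)
```

A score close to 1 would suggest that most peaks of F occur in S. How such a score is turned into a decision, for instance via a threshold, is left open here.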

3.8.2 Our Approach<br />

After the preprocessing steps we now have information about assigned masterpeaks from the two patient (spectra) groups. With this information we can search for patterns (fingerprints) by detecting and subsequently selecting significant features.

1. Creation of Fingerprint

   • Feature Detection: described in the previous section; detects potential features that can be used to discriminate two groups based on their properties (e.g. differences in average height, see Fig. 3.6.13).

   • Feature Selection: selection of an optimal subset of the features detected (see Figure 3.18(a)).

2. Reduce Complexity: dimensionality reduction of the fingerprint data by projecting it to a low-dimensional space (see Fig. 3.18(b)). This is done because clustering in high dimensions is usually unreliable.

3. Evaluation by clustering: clustering of the low-dimensional projections to get a performance measure. The clusters found can then be used to derive classification rules. (See also section 4.3.3.)
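Steps 2 and 3 can be sketched as follows. This excerpt does not say which projection or clustering method is used, so the sketch substitutes two common stand-ins: a projection onto the first principal component (computed by power iteration) and 2-means clustering of the projected values. All function names and peak heights are hypothetical.

```python
import random


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_component(rows, iters=200, seed=0):
    """Leading principal direction of centered rows via power iteration."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.uniform(-1, 1) for _ in range(d)]
    for _ in range(iters):
        # Apply X^T X to v without forming the covariance matrix.
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def two_means_1d(values, iters=50):
    """Plain 2-means on scalars, started from the extremes; returns 0/1 labels."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in values]
        g0 = [x for x, lab in zip(values, labels) if lab == 0]
        g1 = [x for x, lab in zip(values, labels) if lab == 1]
        if not g0 or not g1:
            break
        lo, hi = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels


def agreement(labels, truth):
    """Share of samples whose cluster matches the group, up to label swap."""
    same = sum(a == b for a, b in zip(labels, truth)) / len(truth)
    return max(same, 1 - same)


# Hypothetical heights of three selected masterpeaks per spectrum.
patients = [[9.0, 1.0, 5.1], [8.5, 1.2, 4.9], [9.4, 0.8, 5.0], [8.8, 1.1, 5.2]]
controls = [[3.0, 4.0, 5.0], [2.6, 4.3, 5.1], [3.3, 3.8, 4.8], [2.9, 4.1, 5.0]]
truth = [0] * len(patients) + [1] * len(controls)

data = center(patients + controls)
direction = first_component(data)
projected = [sum(x * v for x, v in zip(row, direction)) for row in data]
score = agreement(two_means_1d(projected), truth)
```

Here the agreement between the clusters and the known groups plays the role of the performance measure. On real data it would be estimated on spectra that were not used to build the fingerprint.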

The feature detection step identifies a set of masterpeaks that differ significantly in particular properties (e.g. height, width) between two groups of spectra.
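As an illustration of what "differ significantly" can mean for a single property, the sketch below computes Welch's t-statistic on the peak heights of each masterpeak and keeps those whose absolute value exceeds a threshold. The actual test is defined in the previous section of the thesis and is not shown in this excerpt, so Welch's statistic, the threshold and the masterpeak identifiers are assumptions made for the example.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    diff = mean(a) - mean(b)
    if se == 0:
        # Degenerate case: no spread in either group.
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / se


def detect_features(group_a, group_b, threshold=4.0):
    """Return the sorted ids of masterpeaks whose heights differ between groups.

    group_a, group_b: dict mapping masterpeak id -> list of heights.
    The threshold is a hypothetical cut-off, not a calibrated p-value.
    """
    shared = set(group_a) & set(group_b)
    return sorted(mp for mp in shared
                  if abs(welch_t(group_a[mp], group_b[mp])) >= threshold)


# Hypothetical peak heights for three masterpeaks in two groups of spectra.
patients = {
    "mp_1465": [9.0, 8.5, 9.4, 8.8],
    "mp_2021": [1.0, 1.2, 0.8, 1.1],
    "mp_3190": [5.1, 4.9, 5.0, 5.2],
}
controls = {
    "mp_1465": [3.0, 2.6, 3.3, 2.9],
    "mp_2021": [4.0, 4.3, 3.8, 4.1],
    "mp_3190": [5.0, 5.1, 4.8, 5.0],
}
detected = detect_features(patients, controls)
```

With many masterpeaks tested at once, a real analysis would also need a correction for multiple testing, which this sketch omits.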

Feature Selection<br />

Generally speaking, feature selection approaches try to find a subset of the original features of the given data. Thus, this step selects a small and efficient¹⁶ set of features from those found in the previous feature detection step. This step is also called modeling, since we build a reduced model of the (real) full feature set. By removing irrelevant and redundant features from the data, feature selection helps to improve the performance of learning models by

• Reducing the effect of the curse of dimensionality

16 The fewer features used while maintaining the same discrimination power, the more efficient the set is.
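The notion of efficiency in footnote 16 can be illustrated with a small sketch: features are dropped one at a time as long as the discrimination power does not decrease. Discrimination power is approximated here by the accuracy of a nearest-centroid rule on the same data, which is optimistic. Both this criterion and the greedy backward search are assumptions made for illustration, not the selection method of the thesis.

```python
def centroid(rows, feats):
    """Mean of the chosen feature columns over all rows."""
    return [sum(r[f] for r in rows) / len(rows) for f in feats]


def accuracy(group_a, group_b, feats):
    """Share of samples lying closer to their own group's centroid."""
    if not feats:
        return 0.0
    ca, cb = centroid(group_a, feats), centroid(group_b, feats)

    def dist(row, c):
        return sum((row[f] - x) ** 2 for f, x in zip(feats, c))

    ok = sum(dist(r, ca) < dist(r, cb) for r in group_a)
    ok += sum(dist(r, cb) < dist(r, ca) for r in group_b)
    return ok / (len(group_a) + len(group_b))


def backward_select(group_a, group_b, feats):
    """Greedily drop features while the accuracy does not get worse."""
    feats = list(feats)
    best = accuracy(group_a, group_b, feats)
    changed = True
    while changed and len(feats) > 1:
        changed = False
        for f in list(feats):
            trial = [g for g in feats if g != f]
            if accuracy(group_a, group_b, trial) >= best:
                feats = trial  # same power with fewer features
                changed = True
                break
    return feats, best


# Hypothetical heights of three detected masterpeaks (columns 0, 1, 2).
patients = [[9.0, 1.0, 5.1], [8.5, 1.2, 4.9], [9.4, 0.8, 5.0], [8.8, 1.1, 5.2]]
controls = [[3.0, 4.0, 5.0], [2.6, 4.3, 5.1], [3.3, 3.8, 4.8], [2.9, 4.1, 5.0]]
selected, power = backward_select(patients, controls, [0, 1, 2])
```

In this toy data, columns 0 and 1 are redundant with each other and column 2 carries no group information, so a single feature is enough to keep the same accuracy. Which of the two redundant features survives depends on the order in which the greedy search tries them.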
