New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
CHAPTER 3. MATHEMATICAL MODELING AND ALGORITHMS
3.8 Extracting Fingerprints
3.8.1 What are Fingerprints?
A fingerprint, with respect to this thesis, is a set of masterpeaks of a particular
set of spectra (for example, spectra of lung cancer patients of a certain age)
that are not present (or differ) in some control group (for example,
healthy patients of the same age). That is, given an unknown spectrum S (of
a patient P) and a fingerprint F of, say, a particular disease, we can judge just
by checking the occurrence of the peaks of F whether the patient P is likely
to have this disease. The following sections explain how we determine a
fingerprint given two sets of spectra (e.g. cancer patients vs. healthy patients).
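The check described above, testing whether the peaks of F occur in S, can be sketched as follows. This is a minimal illustration only: the thesis does not state the matching rule at this point, so the m/z tolerance `tol`, the representation of peaks as plain m/z values, and the function name `fingerprint_match_fraction` are all assumptions.

```python
from bisect import bisect_left


def fingerprint_match_fraction(spectrum_mz, fingerprint_mz, tol=0.5):
    """Return the fraction of fingerprint peaks that occur in the spectrum.

    A fingerprint peak counts as present if some spectrum peak lies within
    +/- tol (in m/z units) of it. `tol` is a hypothetical parameter.
    """
    if not fingerprint_mz:
        raise ValueError("fingerprint must contain at least one peak")
    peaks = sorted(spectrum_mz)
    hits = 0
    for target in fingerprint_mz:
        i = bisect_left(peaks, target)
        # Only the neighbours around the insertion point can be the closest peaks.
        candidates = peaks[max(i - 1, 0):i + 1]
        if any(abs(p - target) <= tol for p in candidates):
            hits += 1
    return hits / len(fingerprint_mz)


# Hypothetical peak positions, for illustration only.
spectrum_S = [1000.2, 1500.0, 2210.4, 3050.9]
fingerprint_F = [1000.0, 2210.0, 4000.0]
fraction = fingerprint_match_fraction(spectrum_S, fingerprint_F)
```

A high fraction would suggest that the patient's spectrum carries the fingerprint; where to place the decision threshold is a separate modeling choice that this sketch leaves open.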
3.8.2 Our Approach<br />
After the preprocessing steps we now have information about assigned masterpeaks
from the two patient (spectra) groups. With this information we
can search for patterns (fingerprints) by detecting and subsequently selecting
significant features.
1. Creation of Fingerprint
   • Feature Detection: described in the previous section; detects
     potential features that can be used to discriminate two
     groups based on their properties (e.g. differences in average height,
     see Fig. 3.6.13).
   • Feature Selection: selection of an optimal subset of the detected
     features (see Figure 3.18(a)).
2. Reduce Complexity: dimensionality reduction of the fingerprint data by
   projecting it to a low-dimensional space (see Fig. 3.18(b)).
   This is done because clustering in high dimensions is usually not reliable.
3. Evaluation by clustering: clustering of the low-dimensional projections to
   obtain a performance measure. The clusters found can then be used to
   derive classification rules. (See also section 4.3.3.)
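Steps 2 and 3 can be illustrated with a small sketch. The projection and clustering methods are not named at this point in the text, so the code below substitutes a one-dimensional principal component projection (computed by power iteration) and a two-cluster k-means as stand-ins. All function names and the toy data are hypothetical.

```python
from math import sqrt


def project_to_first_component(rows, iterations=200):
    """Project rows onto their leading principal axis (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    v = [1.0 / sqrt(d)] * d  # assumed start vector, not orthogonal to the axis
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def two_means_1d(values, iterations=50):
    """Cluster scalar values into two groups with Lloyd's algorithm."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in values]
        g0 = [x for x, lab in zip(values, labels) if lab == 0]
        g1 = [x for x, lab in zip(values, labels) if lab == 1]
        if not g0 or not g1:
            break
        new_lo, new_hi = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return labels


def cluster_agreement(labels, truth):
    """Fraction of spectra whose cluster matches their group (best labeling)."""
    agree = sum(1 for lab, t in zip(labels, truth) if lab == t)
    return max(agree, len(labels) - agree) / len(labels)


# Toy fingerprint data: one row per spectrum, one column per selected masterpeak.
data = [[10.0, 8.0, 1.0], [11.0, 9.0, 1.2], [9.5, 8.5, 0.9],
        [3.0, 2.0, 1.1], [2.5, 2.2, 1.0], [3.2, 1.8, 0.8]]
groups = [0, 0, 0, 1, 1, 1]
scores = project_to_first_component(data)
agreement = cluster_agreement(two_means_1d(scores), groups)
```

The agreement value plays the role of the performance measure in step 3: if the clusters found in the projection coincide with the two patient groups, the fingerprint discriminates well.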
The feature detection step identifies a set of masterpeaks that differ significantly
in particular properties (e.g. height, width) between two groups of
spectra.
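As an illustration of what "differ significantly" could mean for peak height, the sketch below flags masterpeaks by Welch's t statistic. This is an assumed stand-in, not necessarily the detection method of the previous section. The cutoff `threshold` and the data layout (one list of heights per spectrum, one entry per masterpeak) are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Degenerate case: no spread in either sample.
        return 0.0 if diff == 0.0 else float("inf") * (1 if diff > 0 else -1)
    return diff / se


def detect_features(group_a, group_b, threshold=3.0):
    """Return indices of masterpeaks whose mean heights differ between groups.

    group_a, group_b: lists of spectra; each spectrum is a list of heights,
    one per masterpeak, in the same order for both groups.
    threshold: hypothetical cutoff on |t|.
    """
    n_peaks = len(group_a[0])
    selected = []
    for j in range(n_peaks):
        a = [s[j] for s in group_a]
        b = [s[j] for s in group_b]
        if abs(welch_t(a, b)) >= threshold:
            selected.append(j)
    return selected


# Toy heights for two masterpeaks in four spectra per group (illustrative only).
patients = [[10.0, 5.0], [11.0, 5.2], [9.5, 4.9], [10.5, 5.1]]
controls = [[3.0, 5.1], [3.5, 4.8], [2.8, 5.3], [3.2, 5.0]]
features = detect_features(patients, controls)
```

In this toy data only the first masterpeak separates the groups, so only its index is returned. The same loop could be run on peak width or any other per-peak property.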
Feature Selection<br />
Generally speaking, feature selection approaches try to find a subset of the
original features of the given data. Thus, this step selects a small and efficient[16]
set of features from those found in the previous feature detection step. This step is also called
modeling since we build a reduced model of the (real) full feature set. By
removing irrelevant and redundant features from the data, feature selection
helps to improve the performance of learning models by
• Reducing the effect of the curse of dimensionality
[16] The fewer features used while maintaining the same discrimination power, the more efficient
the set is.