New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
4.3. STUDY RESULTS
$$d(x, \theta_i) = \| x - \theta_i \|^2 . \tag{4.3.7}$$
Since we have only discrete observations $x_t$, $t = 1, \dots, |X|$, the functional
(4.3.4) takes the form
$$\sum_{i=1}^{K} \sum_{t=1}^{|X|} \gamma_i(x_t) \, \| x_t - \theta_i \|^2 \to \min_{\Gamma(X),\Theta} . \tag{4.3.8}$$
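To make the double sum in (4.3.8) concrete, the following minimal sketch evaluates it for a tiny hand-made data set with hard (0/1) affiliations. The function names `sq_dist` and `clustering_functional`, the nested-list layout `gamma[i][t]`, and the toy data are illustrative assumptions, not part of the original text.

```python
def sq_dist(x, theta):
    """Squared Euclidean distance ||x - theta||^2, as in (4.3.7)."""
    return sum((a - b) ** 2 for a, b in zip(x, theta))


def clustering_functional(X, gamma, centers):
    """Evaluate the double sum in (4.3.8).

    X        -- list of observations x_t (tuples of floats)
    gamma    -- gamma[i][t] is the affiliation of x_t to cluster i
    centers  -- centers[i] is the cluster center theta_i
    """
    return sum(
        gamma[i][t] * sq_dist(x, centers[i])
        for i in range(len(centers))
        for t, x in enumerate(X)
    )


# Toy data (assumed): two points near the first center, one on the second.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centers = [(0.0, 1.0), (10.0, 0.0)]
gamma = [[1, 1, 0],   # affiliations to cluster 0
         [0, 0, 1]]   # affiliations to cluster 1
value = clustering_functional(X, gamma, centers)
print(value)  # 1 + 1 + 0 = 2.0
```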
The K-means algorithm iteratively minimizes the functional (4.3.8) subject to the
constraints (4.3.5)-(4.3.6), assigning the new cluster affiliations $\gamma^{(l)}(x)$ and
updating the cluster centers $\theta_i^{(l)}$ in iteration $l$ according to the following
formulas:
$$\gamma_i^{(l)}(x) = \begin{cases} 1, & i = \arg\min_{j} \| x - \theta_j^{(l-1)} \|^2, \\ 0, & \text{otherwise}, \end{cases} \tag{4.3.9}$$
$$\theta_i^{(l)} = \frac{\sum_{t=1}^{|X|} \gamma_i^{(l)}(x_t) \, x_t}{\sum_{t=1}^{|X|} \gamma_i^{(l)}(x_t)} . \tag{4.3.10}$$
Iterations (4.3.9)-(4.3.10) are repeated until the change in the value of the averaged
clustering functional no longer exceeds a predefined threshold.
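The iteration (4.3.9)-(4.3.10) with the threshold-based stopping rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the initial centers are drawn at random from the data, an empty cluster keeps its previous center, and the names `kmeans`, `tol`, `max_iter`, and `seed` are hypothetical.

```python
import random


def sq_dist(x, theta):
    """Squared Euclidean distance ||x - theta||^2, as in (4.3.7)."""
    return sum((a - b) ** 2 for a, b in zip(x, theta))


def kmeans(X, K, tol=1e-9, max_iter=100, seed=0):
    """Hard k-means on a list of tuples X; returns (labels, centers, value)."""
    rng = random.Random(seed)
    # Assumption: start from K observations picked at random.
    centers = list(rng.sample(X, K))
    prev = float("inf")
    for _ in range(max_iter):
        # (4.3.9): assign each observation to its nearest center.
        labels = [min(range(K), key=lambda i: sq_dist(x, centers[i])) for x in X]
        # (4.3.10): move each center to the mean of its assigned observations.
        updated = []
        for i in range(K):
            members = [x for x, lab in zip(X, labels) if lab == i]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centers[i])  # assumption: keep the center of an empty cluster
        centers = updated
        # Value of the functional (4.3.8) for the current assignment and centers.
        value = sum(sq_dist(x, centers[lab]) for x, lab in zip(X, labels))
        if prev - value <= tol:  # stop once the decrease is within the threshold
            break
        prev = value
    return labels, centers, value


# Toy one-dimensional data (assumed): two well-separated groups.
X = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels, centers, value = kmeans(X, K=2)
print(sorted(centers), value)
```

For this toy data every choice of two starting observations leads to the centers 0.5 and 10.5 with a functional value of 1.0; only the order of the two centers depends on the seed.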
The complexity of the k-means algorithm is $O(K \cdot |X| \cdot L)$, where $L$ is
the number of iterations. Note that a good clustering solution may be missed
because the algorithm converges to a local rather than the global
minimum of the scoring function. One way to alleviate this problem is to carry
out multiple searches from different randomly chosen starting points for the
initial cluster centers. This can be taken further by adopting a simulated
annealing strategy to avoid getting trapped in local minima of the scoring
function.
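The local-minimum problem and the multiple-restart remedy can be illustrated with a small sketch. The four-point data set, the helper names `kmeans_from` and `kmeans_restarts`, and the choice of 20 restarts are assumptions made for illustration; simulated annealing is not shown.

```python
import random


def sq_dist(x, theta):
    """Squared Euclidean distance ||x - theta||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, theta))


def kmeans_from(X, centers, tol=1e-9, max_iter=100):
    """Hard k-means from given initial centers; returns (centers, functional value)."""
    K, prev = len(centers), float("inf")
    for _ in range(max_iter):
        labels = [min(range(K), key=lambda i: sq_dist(x, centers[i])) for x in X]
        updated = []
        for i in range(K):
            members = [x for x, lab in zip(X, labels) if lab == i]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centers[i])  # assumption: keep the center of an empty cluster
        centers = updated
        value = sum(sq_dist(x, centers[lab]) for x, lab in zip(X, labels))
        if prev - value <= tol:
            break
        prev = value
    return centers, value


def kmeans_restarts(X, K, n_restarts=20, seed=0):
    """Run k-means from several random starting points and keep the best run."""
    rng = random.Random(seed)
    runs = [kmeans_from(X, rng.sample(X, K)) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1])


# Corners of a long, thin rectangle (assumed toy data).
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Both initial centers on the left edge: the run stalls in a poor local minimum.
_, stuck_value = kmeans_from(X, [(0.0, 0.0), (0.0, 1.0)])
# One initial center on each short edge: the run reaches the left/right split.
_, good_value = kmeans_from(X, [(0.0, 0.0), (10.0, 0.0)])
print(stuck_value, good_value)  # 100.0 1.0

best_centers, best_value = kmeans_restarts(X, K=2)
```

Starting with both centers on the same short edge splits the data into a top and a bottom row (value 100.0), and the iteration cannot leave that configuration. Keeping the best of several random starts makes it likely, though not guaranteed, that at least one run finds the left/right split (value 1.0).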
Fuzzy c-Means Clustering<br />
Experiments have shown that a spectrum is very unlikely to be assigned to
exactly one (sub-)cluster. In most cases a spectrum reflects a transient disease
status between two or more extrema (e.g. some stage between healthy and
fully diseased). As can be seen from (4.3.9), this cannot be represented
by the k-means algorithm, and thus geometrically overlapping clusters cannot
be resolved. This issue was addressed by Bezdek (1981), who proposed the
following modification of the averaged clustering functional (4.3.8):
$$\sum_{i=1}^{K} \sum_{t=1}^{|X|} \gamma_i^{m}(x_t) \, \| x_t - \theta_i \|^2 \to \min_{\Gamma(X),\Theta} , \tag{4.3.11}$$
where $m > 1$ is a fixed parameter called the fuzzifier (Bezdek, 1981; Bezdek
et al., 1987). Analogously to k-means, the fuzzy c-means algorithm is an
iterative procedure for the minimization of (4.3.11)