
New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...


4.3. STUDY RESULTS

$$d(x, \theta_i) = \lVert x - \theta_i \rVert^2. \tag{4.3.7}$$

Since we only have discrete observations $x_t$, $t = 1, \dots, |X|$, the functional (4.3.4) takes the form

$$\sum_{i=1}^{K} \sum_{t=1}^{|X|} \gamma_i(x_t)\, \lVert x_t - \theta_i \rVert^2 \;\to\; \min_{\Gamma(X),\,\Theta}. \tag{4.3.8}$$
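To make (4.3.7) and (4.3.8) concrete, the following Python sketch evaluates the clustering functional for given affiliations and centers. It assumes that observations are stored as tuples of floats and that the affiliations form a K x |X| table `gamma[i][t]`; the names `sq_dist` and `clustering_functional` are hypothetical and not taken from the original implementation.

```python
from typing import List, Sequence


def sq_dist(x: Sequence[float], theta: Sequence[float]) -> float:
    """Squared Euclidean distance d(x, theta) = ||x - theta||^2, as in (4.3.7)."""
    return sum((a - b) ** 2 for a, b in zip(x, theta))


def clustering_functional(
    X: List[Sequence[float]],
    gamma: List[List[float]],
    theta: List[Sequence[float]],
) -> float:
    """Value of (4.3.8): sum over i and t of gamma[i][t] * ||x_t - theta_i||^2.

    X holds the |X| observations, gamma is a K x |X| table of affiliations
    gamma_i(x_t), and theta holds the K cluster centers.
    """
    return sum(
        gamma[i][t] * sq_dist(x, theta[i])
        for i in range(len(theta))
        for t, x in enumerate(X)
    )


# Toy 1-D example: two clusters with hard (0/1) affiliations.
X = [(0.0,), (1.0,), (10.0,), (11.0,)]
theta = [(0.5,), (10.5,)]
gamma = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(clustering_functional(X, gamma, theta))  # 1.0
```

Each observation lies at squared distance 0.25 from its own center, so the functional sums to 1.0.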

The k-means algorithm minimizes the functional (4.3.8) iteratively, subject to the constraints (4.3.5) and (4.3.6): in iteration $l$ it assigns new cluster affiliations $\gamma_i^{(l)}(x)$ and updates the cluster centers $\theta_i^{(l)}$ according to the following formulas:

$$\gamma_i^{(l)}(x) = \begin{cases} 1, & i = \arg\min_{j} \lVert x - \theta_j^{(l-1)} \rVert^2, \\ 0, & \text{otherwise}, \end{cases} \tag{4.3.9}$$

$$\theta_i^{(l)} = \frac{\sum_{t=1}^{|X|} \gamma_i^{(l)}(x_t)\, x_t}{\sum_{t=1}^{|X|} \gamma_i^{(l)}(x_t)}. \tag{4.3.10}$$
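A single iteration of (4.3.9)-(4.3.10) can be sketched in plain Python as follows. This is an illustration under our own assumptions, not the author's code: ties in the arg min go to the lowest cluster index, and a cluster that receives no observations keeps its previous center, a case the formulas leave undefined because the denominator of (4.3.10) becomes zero. The name `kmeans_step` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def kmeans_step(X: List[Point], theta: List[Point]) -> Tuple[List[List[int]], List[Point]]:
    """One k-means iteration l: assignment (4.3.9), then center update (4.3.10).

    theta holds the centers theta^(l-1); returns (gamma^(l), theta^(l)).
    """
    K, n = len(theta), len(X)
    # (4.3.9): gamma[i][t] = 1 iff center i is nearest to x_t (ties go to the lowest index).
    gamma = [[0] * n for _ in range(K)]
    for t, x in enumerate(X):
        nearest = min(range(K), key=lambda i: sq_dist(x, theta[i]))
        gamma[nearest][t] = 1
    # (4.3.10): each center becomes the mean of the observations assigned to it.
    new_theta: List[Point] = []
    for i in range(K):
        n_i = sum(gamma[i])
        if n_i == 0:
            new_theta.append(theta[i])  # empty cluster: keep the old center (our assumption)
            continue
        new_theta.append(tuple(
            sum(gamma[i][t] * X[t][d] for t in range(n)) / n_i
            for d in range(len(theta[i]))
        ))
    return gamma, new_theta


X = [(0.0,), (1.0,), (10.0,), (11.0,)]
gamma, theta = kmeans_step(X, [(0.0,), (1.0,)])
print(gamma)  # [[1, 0, 0, 0], [0, 1, 1, 1]]
print(theta)  # [(0.0,), (7.333333333333333,)]
```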

Steps (4.3.9) and (4.3.10) are repeated until the change in the value of the averaged clustering functional does not exceed a predefined threshold.
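The stopping rule can be added by wrapping the two steps in a loop. In this self-contained sketch we read the "averaged" clustering functional as (4.3.8) divided by |X|; that reading, the threshold `tol`, the iteration cap `max_iter`, and the function name `kmeans` are our own illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def kmeans(X: List[Point], theta: List[Point], tol: float = 1e-9, max_iter: int = 100):
    """Iterate (4.3.9)-(4.3.10) until the averaged functional changes by at most tol.

    Returns (labels, centers, averaged functional, number of iterations run).
    """
    if max_iter < 1:
        raise ValueError("max_iter must be >= 1")
    prev = float("inf")
    for it in range(1, max_iter + 1):
        # (4.3.9): hard assignment of every x_t to its nearest current center.
        labels = [min(range(len(theta)), key=lambda i: sq_dist(x, theta[i])) for x in X]
        # (4.3.10): new centers are cluster means; an empty cluster keeps its center.
        new_theta = []
        for i, c in enumerate(theta):
            members = [x for x, lab in zip(X, labels) if lab == i]
            new_theta.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            )
        theta = new_theta
        # "Averaged" functional: (4.3.8) divided by |X| (our reading of the text).
        avg = sum(sq_dist(x, theta[lab]) for x, lab in zip(X, labels)) / len(X)
        if abs(prev - avg) <= tol:  # stopping rule: change no larger than the threshold
            break
        prev = avg
    return labels, theta, avg, it


labels, centers, avg, n_iter = kmeans([(0.0,), (1.0,), (10.0,), (11.0,)], [(0.0,), (1.0,)])
print(labels, centers, avg, n_iter)  # [0, 0, 1, 1] [(0.5,), (10.5,)] 0.25 3
```

From the poor start (0.0, 1.0) the second iteration already finds the two groups; the third iteration leaves the functional unchanged, which triggers the stopping rule.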

The complexity of the k-means algorithm is $O(K \cdot |X| \cdot L)$, where $L$ is the number of iterations. Note that the algorithm may miss a good cluster solution, because it can converge to a local rather than the global minimum of the scoring function. One way to alleviate this problem is to carry out multiple searches from different, randomly chosen initial cluster centers. This idea can be taken even further by adopting a simulated annealing strategy to avoid getting trapped in local minima of the score function.
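The multi-start remedy described above can be sketched as follows: k-means is run from several random initializations (here K distinct observations drawn with Python's `random.Random.sample`) and the run with the lowest value of (4.3.8) is kept. The function names, `n_starts`, and the seeding are illustrative assumptions; the simulated annealing variant is not shown.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def update_centers(X: List[Point], labels: List[int], theta: List[Point]) -> List[Point]:
    """(4.3.10): cluster means; an empty cluster keeps its previous center."""
    new_theta = []
    for i, c in enumerate(theta):
        members = [x for x, lab in zip(X, labels) if lab == i]
        new_theta.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else c)
    return new_theta


def kmeans(X: List[Point], theta: List[Point], tol: float = 1e-9, max_iter: int = 100):
    """Single k-means run from the given centers; returns (centers, objective (4.3.8))."""
    prev = float("inf")
    obj = prev
    for _ in range(max_iter):
        # (4.3.9): assign every observation to its nearest center.
        labels = [min(range(len(theta)), key=lambda i: sq_dist(x, theta[i])) for x in X]
        theta = update_centers(X, labels, theta)
        obj = sum(sq_dist(x, theta[lab]) for x, lab in zip(X, labels))
        if abs(prev - obj) <= tol:
            break
        prev = obj
    return theta, obj


def multi_start_kmeans(X: List[Point], K: int, n_starts: int = 10, seed: int = 0):
    """Restart k-means from n_starts random initial centers; keep the lowest objective."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        init = rng.sample(X, K)  # K distinct observations serve as initial centers
        centers, obj = kmeans(X, init)
        if best is None or obj < best[1]:
            best = (centers, obj)
    return best


X = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
centers, obj = multi_start_kmeans(X, K=3, n_starts=50)
print(sorted(centers), obj)
```

Restarting does not guarantee the global minimum; it only makes missing it less likely as `n_starts` grows. On this toy data a single start such as (0.0, 1.0, 10.0) stalls in a local minimum, while the best of the restarts is expected to recover the centers 0.5, 10.5 and 20.5.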

Fuzzy c-Means Clustering<br />

Experiments have shown that a spectrum is very unlikely to belong to exactly one (sub-)cluster. In most cases a spectrum reflects a transient disease status between two or more extremes (e.g., some stage between healthy and fully diseased). As can be seen from (4.3.9), the k-means algorithm cannot represent such intermediate states, because it assigns every spectrum to exactly one cluster; consequently, geometrically overlapping clusters cannot be resolved. Bezdek (1981) addressed this issue by proposing the following modification of the averaged clustering functional (4.3.8):

$$\sum_{i=1}^{K} \sum_{t=1}^{|X|} \gamma_i^{m}(x_t)\, \lVert x_t - \theta_i \rVert^2 \;\to\; \min_{\Gamma(X),\,\Theta}, \tag{4.3.11}$$

where $m > 1$ is a fixed parameter called the fuzzifier (Bezdek, 1981; Bezdek et al., 1987). Analogously to k-means, the fuzzy c-means algorithm is an iterative procedure for minimizing (4.3.11).
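The fuzzy c-means update equations are not reproduced in this part of the text. The sketch below therefore uses the standard alternating updates of Bezdek's formulation, with memberships $\gamma_i(x_t) = 1 / \sum_{j=1}^{K} \left(\lVert x_t - \theta_i \rVert^2 / \lVert x_t - \theta_j \rVert^2\right)^{1/(m-1)}$ and centers weighted by $\gamma_i^m(x_t)$, under the usual constraint that the memberships of each observation sum to one. Treat it as an illustration rather than the exact procedure used in the study; the handling of observations that coincide with a center, the stopping rule, and all function names are our own assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def fcm_memberships(X: List[Point], theta: List[Point], m: float) -> List[List[float]]:
    """Standard membership update; u[i][t] lies in [0, 1] and each column sums to 1."""
    K = len(theta)
    u = [[0.0] * len(X) for _ in range(K)]
    p = 1.0 / (m - 1.0)  # exponent on squared distances (2/(m-1) on plain norms)
    for t, x in enumerate(X):
        d = [sq_dist(x, c) for c in theta]
        if 0.0 in d:  # x_t coincides with a center: full membership there (assumption)
            u[d.index(0.0)][t] = 1.0
            continue
        for i in range(K):
            u[i][t] = 1.0 / sum((d[i] / d[j]) ** p for j in range(K))
    return u


def fcm_centers(X: List[Point], u: List[List[float]], theta: List[Point], m: float) -> List[Point]:
    """Center update: theta_i = sum_t u_i(x_t)^m x_t / sum_t u_i(x_t)^m."""
    centers = []
    for row, old in zip(u, theta):
        w = [ui ** m for ui in row]
        total = sum(w)
        if total == 0.0:  # degenerate case: no weight on this cluster, keep the old center
            centers.append(old)
            continue
        centers.append(tuple(sum(wt * x[k] for wt, x in zip(w, X)) / total for k in range(len(old))))
    return centers


def fuzzy_c_means(X: List[Point], theta: List[Point], m: float = 2.0,
                  tol: float = 1e-9, max_iter: int = 300):
    """Alternate both updates until the objective (4.3.11) changes by at most tol."""
    if m <= 1.0 or max_iter < 1:
        raise ValueError("need fuzzifier m > 1 and max_iter >= 1")
    prev = float("inf")
    for _ in range(max_iter):
        u = fcm_memberships(X, theta, m)
        theta = fcm_centers(X, u, theta, m)
        obj = sum(u[i][t] ** m * sq_dist(x, theta[i])
                  for i in range(len(theta)) for t, x in enumerate(X))
        if abs(prev - obj) <= tol:
            break
        prev = obj
    return u, theta


X = [(0.0,), (1.0,), (10.0,), (11.0,), (5.5,)]
u, theta = fuzzy_c_means(X, [(0.0,), (11.0,)])
print(theta)
print([round(u[0][t], 3) for t in range(len(X))])
```

Because this toy data set is symmetric about 5.5, the middle observation should end up with a membership close to 0.5 in both clusters, an intermediate state that the hard assignment (4.3.9) cannot express.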
