Fuzzy c-means

Abstract<br />

<strong>Fuzzy</strong> Clustering<br />

• Problem: To extract rules from data<br />

• Method: <strong>Fuzzy</strong> c-<strong>means</strong><br />

• Results: e.g., finding cancer cells<br />

Lecture 14<br />

Clusters<br />

• A number of similar individuals that occur together, for example:<br />

• two or more consecutive consonants or vowels in a segment of speech<br />

• a group of houses<br />

• an aggregation of stars or galaxies that appear close together in the sky and are gravitationally associated<br />

• etc.<br />

Cluster analysis<br />

• A statistical classification technique for discovering whether the individuals of a population fall into different groups by making quantitative comparisons of multiple characteristics.


Vehicle Example<br />

Vehicle Clusters<br />

Vehicle  Top speed  Colour  Air         Weight<br />
         [km/h]             resistance  [kg]<br />
-------  ---------  ------  ----------  ------<br />
V1       220        red     0.30        1300<br />
V2       230        black   0.32        1400<br />
V3       260        red     0.29        1500<br />
V4       140        gray    0.35        800<br />
V5       155        blue    0.33        950<br />
V6       130        white   0.40        600<br />
V7       100        black   0.50        3000<br />
V8       105        red     0.60        2500<br />
V9       110        gray    0.55        3500<br />

[Figure: scatter plot of Weight [kg] versus Top speed [km/h]; the vehicles form three clusters: sports cars, medium market cars, and lorries.]<br />

Terminology<br />

[Figure: the Weight [kg] vs. Top speed [km/h] plot again, annotated with the terms below.]<br />

• feature — a measured property used for comparison (e.g., weight, top speed); the axes of the plot<br />

• object or data point — one individual (e.g., one vehicle)<br />

• cluster — a group of nearby data points (e.g., the lorries)<br />

• label — the name attached to a cluster (e.g., "sports cars")<br />

• feature space — the space spanned by the features<br />

Classify cracked tiles<br />

475Hz 557Hz Ok<br />

-----+-----+---<br />

0.958 0.003 Yes<br />

1.043 0.001 Yes<br />

1.907 0.003 Yes<br />

0.780 0.002 Yes<br />

0.579 0.001 Yes<br />

0.003 0.105 No<br />

0.001 1.748 No<br />

0.014 1.839 No<br />

0.007 1.021 No<br />

0.004 0.214 No<br />

Table 1: frequency intensities for ten tiles.<br />

Tiles are made from clay moulded into the right shape, brushed, glazed, and baked. Unfortunately, the baking may produce invisible cracks. Operators can detect the cracks by hitting the tiles with a hammer; in an automated system the response is recorded with a microphone, filtered, Fourier transformed, and normalised. A small set of data is given in Table 1 (adapted from MIT, 1997).
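As a concrete starting point, the ten rows of Table 1 can be log-transformed the way the plots below do it. A minimal sketch (the variable names are illustrative, not from the lecture):

```python
import math

# The ten (475 Hz, 557 Hz) intensity pairs from Table 1,
# with a flag for whether the tile is whole (Ok = Yes).
tiles = [
    (0.958, 0.003, True),  (1.043, 0.001, True),
    (1.907, 0.003, True),  (0.780, 0.002, True),
    (0.579, 0.001, True),  (0.003, 0.105, False),
    (0.001, 1.748, False), (0.014, 1.839, False),
    (0.007, 1.021, False), (0.004, 0.214, False),
]

# Log-transform the intensities, as in the scatter plots that follow;
# the logarithm spreads out the small values so the two groups separate.
log_points = [(math.log(a), math.log(b)) for a, b, _ in tiles]
```

In the log domain the whole tiles all have a larger 475 Hz coordinate than any cracked tile, which is the separation the plots show.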


Hard c-<strong>means</strong> (HCM)<br />

(also known as k-<strong>means</strong>)<br />

[Figure: two panels of the tiles data, log(intensity) 557 Hz versus log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres.]<br />

Plot of tiles by frequencies (logarithms). The whole tiles (o) seem well separated from the cracked tiles (*). The objective is to find the two clusters.<br />


1. Place two cluster centres (x) at random.<br />

2. Assign each data point (* and o) to the nearest cluster centre (x).<br />

[Figure: two panels of the tiles data (o = whole tiles, * = cracked tiles, x = centres), before and after the centre update.]<br />

1. Compute the new centre of each class<br />

2. Move the crosses (x)<br />


Iteration 2


[Figure: tiles data, o = whole tiles, * = cracked tiles, x = centres.]<br />

Iteration 3<br />

[Figure: tiles data, o = whole tiles, * = cracked tiles, x = centres.]<br />

Iteration 4 (then stop, because no visible change)<br />

Each data point belongs to the cluster defined by the nearest centre<br />

M =<br />

0.0000 1.0000<br />

0.0000 1.0000<br />

0.0000 1.0000<br />

0.0000 1.0000<br />

0.0000 1.0000<br />

1.0000 0.0000<br />

1.0000 0.0000<br />

1.0000 0.0000<br />

1.0000 0.0000<br />

1.0000 0.0000<br />

Membership matrix M<br />

m_ik = 1 if ‖u_k − c_i‖² ≤ ‖u_k − c_j‖² for all other cluster centres j, and 0 otherwise<br />

Here m_ik is the membership of data point k in cluster i, and ‖u_k − c_i‖² is the squared distance from data point u_k to cluster centre c_i.<br />

The membership matrix M:<br />

1. The last five data points (rows) belong to the first cluster (column)<br />

2. The first five data points (rows) belong to the second cluster (column)
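The hard assignment described above can be sketched as a small hard c-means routine. This is a minimal sketch; the function names and the deterministic demo initialisation are illustrative, not from the lecture:

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))

def hard_c_means(points, c=2, centres=None, iters=100, seed=0):
    """Hard c-means (k-means): alternate nearest-centre assignment
    and centre recomputation until the centres stop moving."""
    if centres is None:
        centres = random.Random(seed).sample(points, c)  # 1. random centres
    for _ in range(iters):
        # 2. assign each data point to the nearest cluster centre
        clusters = [[] for _ in range(c)]
        for p in points:
            i = min(range(c), key=lambda j: dist2(p, centres[j]))
            clusters[i].append(p)
        # 3. compute the new centre of each class and move the centres
        new = [mean(cl) if cl else centres[i] for i, cl in enumerate(clusters)]
        if new == centres:  # 4. stop when there is no change
            break
        centres = new
    return centres, clusters
```

On well-separated data such as the tiles example, the assignment settles after a few iterations, giving exactly the 0/1 membership matrix shown above.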


c-partition<br />

∪_{i=1..c} C_i = U — all clusters C together fill the whole universe U<br />

C_i ∩ C_j = Ø for all i ≠ j — clusters do not overlap<br />

Ø ⊂ C_i ⊂ U for all i — a cluster C is never empty, and it is smaller than the whole universe U<br />

2 ≤ c ≤ K — there must be at least 2 clusters in a c-partition, and at most as many as the number of data points K<br />

Objective function<br />

Minimise the total sum of all distances:<br />

J = ∑_{i=1..c} J_i = ∑_{i=1..c} ( ∑_{k: u_k ∈ C_i} ‖u_k − c_i‖² )<br />
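The objective function J can be evaluated directly for a given hard partition. A minimal sketch (names are illustrative):

```python
def objective(clusters, centres):
    """J = sum over clusters i of the squared distances of each
    member u_k of cluster C_i to its centre c_i."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, c))  # ||u_k - c_i||^2
        for cl, c in zip(clusters, centres)
        for p in cl
    )
```

For example, a single cluster {(0, 0), (0, 2)} with centre (0, 1) gives J = 1 + 1 = 2; each centre update can only decrease this sum, which is why the iteration converges.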

<strong>Fuzzy</strong> c-<strong>means</strong><br />

One of the problems of the k-<strong>means</strong> algorithm is that it gives a hard partitioning of the data, that is to say, each point is attributed to one and only one cluster. But points on the edge of a cluster, or near another cluster, may not be "in" the cluster to the same degree as points in its centre.<br />

<strong>Fuzzy</strong> c-<strong>means</strong><br />

Therefore, in fuzzy clustering, each point does not belong to exactly one cluster, but has a degree of belonging to each cluster, as in fuzzy logic. For each point x we have a coefficient u_k(x) giving the degree of belonging to the k-th cluster. Usually, the sum of those coefficients is required to be one, so that u_k(x) can be read as a probability of belonging to a certain cluster:


<strong>Fuzzy</strong> c-<strong>means</strong><br />

With fuzzy c-<strong>means</strong>, the centroid of a cluster is computed as the mean of all points, weighted by their degree of belonging to the cluster, that is:<br />

c_k = ∑_x u_k(x)^m · x / ∑_x u_k(x)^m<br />

The degree of belonging to a certain cluster is related to the inverse of the distance to the cluster centre; the coefficients are then normalised and fuzzified with a real parameter m > 1 so that their sum is 1. So:<br />

u_k(x) = 1 / ∑_j ( d(x, c_k) / d(x, c_j) )^{2/(m−1)}<br />

<strong>Fuzzy</strong> c-<strong>means</strong><br />

For m equal to 2, this is equivalent to normalising the coefficients linearly so that their sum is 1. When m is close to 1, the cluster centre closest to the point is given much more weight than the others, and the algorithm behaves like k-<strong>means</strong>.<br />

<strong>Fuzzy</strong> c-<strong>means</strong><br />

The fuzzy c-<strong>means</strong> algorithm is very similar to the k-<strong>means</strong> algorithm:


<strong>Fuzzy</strong> c-<strong>means</strong><br />

• Choose a number of clusters.<br />

• Assign coefficients randomly to each point for being in the clusters.<br />

• Repeat until the algorithm has converged (that is, the coefficients' change between two iterations is no more than ε, the given sensitivity threshold):<br />

• Compute the centroid for each cluster, using the formula above.<br />

• For each point, compute its coefficients of being in the clusters, using the formula above.<br />

<strong>Fuzzy</strong> C-<strong>means</strong><br />

u_jk is the membership of sample j in cluster k<br />

c_k is the centroid of cluster k<br />

while changes in cluster centroids c_k<br />

% compute new memberships<br />

for k = 1,…,K do<br />

for j = 1,…,N do<br />

u_jk = f(x_j − c_k)<br />

end<br />

end<br />

% compute new cluster centroids<br />

for k = 1,…,K do<br />

% weighted mean<br />

c_k = ∑_j u_jk x_j / ∑_j u_jk<br />

end<br />

end<br />
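The pseudocode above can be fleshed out into a runnable sketch. This version uses the membership update m_ik = 1/∑_j (d_ik/d_jk)^(2/(q−1)) from the slides and membership-weighted centroids; the function and variable names are illustrative, and the m^q centroid weighting follows the standard FCM formulation:

```python
import math
import random

def fuzzy_c_means(points, c=2, q=2.0, iters=100, eps=1e-5, seed=0):
    """Fuzzy c-means: alternate centroid and membership updates until
    the memberships change by less than eps."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # start from random memberships, each row normalised to sum to 1
    M = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        M.append([v / s for v in row])
    for _ in range(iters):
        # centroids: mean of all points, weighted by membership^q
        centres = []
        for i in range(c):
            w = [M[k][i] ** q for k in range(n)]
            tw = sum(w)
            centres.append(tuple(
                sum(w[k] * points[k][d] for k in range(n)) / tw
                for d in range(dim)))
        # memberships: m_ik = 1 / sum_j (d_ik / d_jk)^(2/(q-1))
        newM = []
        for k in range(n):
            d = [max(math.dist(points[k], centres[i]), 1e-12)
                 for i in range(c)]
            newM.append([
                1.0 / sum((d[i] / d[j]) ** (2.0 / (q - 1.0)) for j in range(c))
                for i in range(c)])
        change = max(abs(newM[k][i] - M[k][i])
                     for k in range(n) for i in range(c))
        M = newM
        if change < eps:
            break
    return centres, M
```

Each row of the returned matrix M sums to 1, matching the fuzzy membership matrix shown for the tiles example.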

<strong>Fuzzy</strong> c-<strong>means</strong> (FCM)<br />

The fuzzy c-<strong>means</strong> algorithm minimises intra-cluster variance as well, but it has the same problems as k-<strong>means</strong>: the minimum found is a local minimum, and the results depend on the initial choice of weights.<br />

[Figure: tiles data, log(intensity) 557 Hz versus log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres.]<br />

Each data point belongs to two clusters to different degrees


[Figure: two panels of the tiles data (o = whole tiles, * = cracked tiles, x = centres).]<br />

1. Place two cluster centres.<br />

2. Assign a fuzzy membership to each data point depending on distance.<br />

1. Compute the new centre of each class.<br />

2. Move the crosses (x).<br />

[Figure: two panels of the tiles data (o = whole tiles, * = cracked tiles, x = centres).]<br />

Iteration 2<br />

Iteration 5


[Figure: two panels of the tiles data (o = whole tiles, * = cracked tiles, x = centres).]<br />

Iteration 10<br />

Iteration 13 (then stop, because no visible change)<br />

Each data point belongs to the two clusters to a degree<br />

M =<br />

0.0025 0.9975<br />

0.0091 0.9909<br />

0.0129 0.9871<br />

0.0001 0.9999<br />

0.0107 0.9893<br />

0.9393 0.0607<br />

0.9638 0.0362<br />

0.9574 0.0426<br />

0.9906 0.0094<br />

0.9807 0.0193<br />

<strong>Fuzzy</strong> membership matrix M<br />

m_ik = 1 / ∑_{j=1..c} ( d_ik / d_jk )^{2/(q−1)}, with d_ik = ‖u_k − c_i‖<br />

Here m_ik is point k's membership of cluster i, d_ik is the distance from point k to the current cluster centre i, d_jk is the distance from point k to the other cluster centres j, and q is the fuzziness exponent.<br />

The membership matrix M:<br />

1. The last five data points (rows) belong mostly to the first cluster (column)<br />

2. The first five data points (rows) belong mostly to the second cluster (column)


<strong>Fuzzy</strong> membership matrix M<br />

m_ik = 1 / ∑_{j=1..c} ( d_ik / d_jk )^{2/(q−1)}<br />

= 1 / [ (d_ik/d_1k)^{2/(q−1)} + (d_ik/d_2k)^{2/(q−1)} + … + (d_ik/d_ck)^{2/(q−1)} ]<br />

= ( 1 / d_ik^{2/(q−1)} ) / [ 1/d_1k^{2/(q−1)} + 1/d_2k^{2/(q−1)} + … + 1/d_ck^{2/(q−1)} ]<br />

This is the "gravitation" to cluster i relative to the total gravitation.<br />

<strong>Fuzzy</strong> Membership<br />

[Figure: membership of a test point as it moves across five data points between the cluster centres; o is with q = 1.1, * is with q = 2.]<br />
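The effect of the fuzziness exponent q can be checked numerically with the membership formula above. A small sketch (names are illustrative):

```python
def membership(d, i, q):
    """m_i = 1 / sum_j (d_i / d_j)^(2/(q-1)), for a point with
    distances d[j] to each cluster centre j."""
    return 1.0 / sum((d[i] / dj) ** (2.0 / (q - 1.0)) for dj in d)

# a test point twice as far from centre 2 as from centre 1
d = [1.0, 2.0]
soft = membership(d, 0, q=2.0)    # graded: 1 / (1 + (1/2)^2) = 0.8
sharp = membership(d, 0, q=1.1)   # q near 1: almost a hard assignment
```

With q = 2 the point keeps membership 0.8 in the nearer cluster; with q = 1.1 the exponent 2/(q−1) = 20 drives the membership to nearly 1, mirroring the o and * curves in the figure.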

<strong>Fuzzy</strong> c-partition<br />

Example: Classify cancer cells<br />

∪_{i=1..c} C_i = U — all clusters C together fill the whole universe U. Remark: the sum of memberships for a data point is 1, and the total for all points is K<br />

C_i ∩ C_j = Ø for all i ≠ j — not valid here: clusters do overlap<br />

Ø ⊂ C_i ⊂ U for all i — a cluster C is never empty, and it is smaller than the whole universe U<br />

2 ≤ c ≤ K — there must be at least 2 clusters in a c-partition, and at most as many as the number of data points K<br />
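The remark about membership sums can be verified directly on the fuzzy membership matrix M from the tiles example:

```python
# Fuzzy membership matrix M for the ten tiles (two clusters)
M = [
    (0.0025, 0.9975), (0.0091, 0.9909), (0.0129, 0.9871),
    (0.0001, 0.9999), (0.0107, 0.9893), (0.9393, 0.0607),
    (0.9638, 0.0362), (0.9574, 0.0426), (0.9906, 0.0094),
    (0.9807, 0.0193),
]

# each data point's memberships sum to 1 ...
row_sums = [sum(row) for row in M]
# ... and the grand total equals the number of points K = 10
total = sum(row_sums)
```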

Normal smear<br />

Using a small brush, cotton stick, or wooden stick, a specimen is taken from the uterine cervix and smeared onto a thin, rectangular glass plate, a slide. The purpose of the smear screening is to diagnose pre-malignant cell changes before they progress to cancer. The smear is stained using the Papanicolaou method, hence the name Pap smear. Different characteristics have different colours, easy to distinguish in a microscope. A cyto-technician performs the screening in a microscope. It is time consuming and prone to error, as each slide may contain up to 300,000 cells.<br />

Severely dysplastic smear<br />

Dysplastic cells have undergone precancerous changes. They generally have longer and darker nuclei, and they have a tendency to cling together in large clusters. Mildly dysplastic cells have enlarged and bright nuclei. Moderately dysplastic cells have larger and darker nuclei. Severely dysplastic cells have large, dark, and often oddly shaped nuclei. The cytoplasm is dark, and it is relatively small.


Possible Features<br />

Classes are nonseparable<br />

• Nucleus and cytoplasm area<br />

• Nucleus and cyto brightness<br />

• Nucleus shortest and longest diameter<br />

• Cyto shortest and longest diameter<br />

• Nucleus and cyto perimeter<br />

• Nucleus and cyto number of maxima<br />

• (...)<br />

Hard Classifier (HCM)<br />

[Figure: each cell column has a single class colour (Ok, light, moderate, or severe). A cell is either one or the other class, defined by a colour.]<br />

<strong>Fuzzy</strong> Classifier (FCM)<br />

[Figure: each cell column is split between class colours (Ok, light, moderate, severe). A cell can belong to several classes to a degree, i.e., one column may have several colours.]


Function approximation<br />

Approximation by fuzzy sets<br />

[Figure: training data and fitted curve, Output1 versus Input on [0, 1].]<br />

Curve fitting in a multi-dimensional space is also called function approximation. Learning is equivalent to finding a function that best fits the training data.<br />

[Figure: fuzzy membership functions on the input range [0, 1].]<br />

Procedure to find a model<br />

1. Acquire data<br />

2. Select structure<br />

3. Find clusters, generate model<br />

4. Validate model<br />

Conclusions<br />

• Compared to neural networks, fuzzy models can be interpreted by human beings<br />

• Applications: system identification, adaptive systems
