
Pattern Recognition Letters 26 (2005) 639–652

www.elsevier.com/locate/patrec

A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests

Kuo-Lung Wu a, Jian Yu b, Miin-Shen Yang a,*

a Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li 32023, Taiwan, ROC
b Department of Computer Science, Beijing Jiaotong University, Beijing 100044, PR China

Received 16 January 2004
Available online 28 October 2004

Abstract

Most clustering algorithms are based on a within-cluster scatter matrix with a compactness measure. In this paper we propose a novel fuzzy clustering algorithm, called the fuzzy compactness and separation (FCS) algorithm, based on a fuzzy scatter matrix, in which the FCS algorithm is derived by minimizing a compactness measure and maximizing a separation measure. The compactness is measured using a fuzzy within-cluster variation. The separation is measured using a fuzzy between-cluster variation. The proposed FCS objective function is a modification of the FS validity index proposed by Fukuyama and Sugeno and also a generalization of the fuzzy c-means (FCM). The FCS algorithm assigns a crisp boundary (cluster kernel) to each cluster such that hard memberships and fuzzy memberships can co-exist in the clustering results. Thus, FCS can be seen as a clustering algorithm that lies, in a novel sense, between the hard c-means and fuzzy c-means. The FCS optimality tests and parameter selection are also investigated. Some numerical examples are presented to demonstrate its robustness and effectiveness.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Fuzzy clustering algorithm; Scatter matrix; Within-cluster variation; Between-cluster variation; Fuzzy compactness and separation

1. Introduction

* Corresponding author. Tel.: +886 3 456 3171; fax: +886 3 456 3160.
E-mail address: msyang@math.cycu.edu.tw (M.-S. Yang).

Cluster analysis is a branch of statistical multivariate analysis and unsupervised pattern recognition. It is a method for partitioning a data set into groups such that data in the same cluster are most similar and data in different clusters are most dissimilar.

0167-8655/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.patrec.2004.09.016



Clustering applications in various areas have been well documented (Duda and Hart, 1973; Jain and Dubes, 1988; Kaufman and Rousseeuw, 1990). Among these clustering methods, the hard c-means (or k-means) and fuzzy c-means (FCM) clustering algorithms are the best known (Bezdek, 1981; Jain and Dubes, 1988; Yang, 1993). Most of these methods are based on minimizing the trace of the within-cluster scatter matrix, which can be interpreted as a compactness measure with a within-cluster variation. Because the clusters obtained using k-means and FCM are roughly spherical with similar volumes, many clustering algorithms, such as the Gustafson–Kessel (G–K) algorithm (Gustafson and Kessel, 1979), the sum of all normalized determinants (SAND) algorithm (Rouseeuw et al., 1996), the minimum scatter volume (MSV) and minimum cluster volume (MCV) algorithms (Krishnapuram and Kim, 2000), and the unsupervised fuzzy partition-optimal number of classes (UFP-ONC) algorithm (Gath and Geva, 1989), were proposed to accommodate elliptical clusters with different volumes. These algorithms are all based on a within-cluster scatter matrix with a compactness measure.

The concept of adopting a separation measure in clustering is widely used in solving cluster validity problems, for example the separation coefficient proposed by Gunderson (1978), the XB index proposed by Xie and Beni (1991), the FS index proposed by Fukuyama and Sugeno (1989), the SC index proposed by Zahid et al. (1999), and the F_HV (fuzzy hyper-volume) and P_D (partition density) indexes proposed by Gath and Geva (1989). Özdemir and Akarun (2001) proposed an inter-cluster separation (ICS) clustering algorithm that includes a separation measure in the ICS objective function. Because the trace of the between-cluster scatter matrix can be interpreted as a separation measure with a between-cluster variation, maximizing it will induce a result with well-separated clusters (Yang et al., 2003).

In this paper, we propose a novel fuzzy clustering algorithm, called the fuzzy compactness and separation (FCS) algorithm. The FCS objective function is based on a fuzzy scatter matrix. The FCS algorithm is derived by minimizing the compactness measure and simultaneously maximizing the separation measure (Yang et al., 2003). The compactness is measured using a fuzzy within-cluster scatter matrix. The separation is measured using the trace of a fuzzy between-cluster scatter matrix. In k-means, data points always have crisp membership values of zero or one. Although FCM allows the data points to have fuzzy membership values between zero and one, it does not exactly produce a zero or one for the membership values. In the proposed FCS algorithm, crisp and fuzzy membership values can co-exist. These FCS properties will be discussed. We will also show that, when the weighting exponent m is large, the FCS algorithm is more robust to noise and outliers than FCM.

A theoretical analysis of FCS will also be given. Yu et al. (2004) gave a theoretical upper bound for the weighting exponent m in FCM in which the grand sample mean x̄ is a unique optimizer of the FCM objective function. In this paper, we will show that FCS, with its distinct cluster kernel characteristic, can avoid the situation in which x̄ is a unique optimizer of the FCS objective function. We also study the optimality tests. These results will be used for parameter selection in FCS. The paper is organized as follows. In Section 2, the (crisp) scatter matrix is extended to the fuzzy scatter matrix. Some clustering algorithms based on the within-cluster scatter matrix are then reviewed. In Section 3, we propose the novel fuzzy clustering algorithm based on the fuzzy within- and between-cluster scatter matrices. Section 4 gives our theoretical analysis of the optimality tests and the FCS parameter selection. Section 5 gives the robustness properties of FCS based on the gross error sensitivity and influence function. Some numerical examples are presented in Section 6. Conclusions are made in Section 7.

2. Clustering algorithms based on a within-cluster scatter matrix

Let X = {x_1, ..., x_n} be a data set in an s-dimensional Euclidean space R^s and let c be a positive integer larger than one. A partition of X into c clusters can be presented using mutually disjoint sets X_1, ..., X_c such that X_1 ∪ ... ∪ X_c = X, or equivalently using the indicator functions μ_1, ..., μ_c such that μ_ij = μ_i(x_j) = 1 if x_j ∈ X_i and μ_ij = μ_i(x_j) = 0 if x_j ∉ X_i. Let the sample mean of the ith cluster be

$$a_i = \frac{1}{n_i}\sum_{x_j \in X_i} x_j = \frac{\sum_{j=1}^{n} \mu_{ij} x_j}{\sum_{j=1}^{n} \mu_{ij}}, \quad i = 1, \ldots, c, \tag{1}$$

where n_i is the number of data points in X_i. Let the grand mean be $\bar{x} = \sum_{j=1}^{n} x_j / n$. The total scatter matrix S_T for the data set X can then be decomposed into a within-cluster scatter matrix S_W and a between-cluster scatter matrix S_B with S_T = S_W + S_B, where

$$S_T = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - \bar{x})(x_j - \bar{x})^t = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (x_j - \bar{x})(x_j - \bar{x})^t, \tag{2}$$

$$S_W = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - a_i)(x_j - a_i)^t = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (x_j - a_i)(x_j - a_i)^t, \tag{3}$$

$$S_B = \sum_{i=1}^{c} \sum_{x_j \in X_i} (a_i - \bar{x})(a_i - \bar{x})^t = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (a_i - \bar{x})(a_i - \bar{x})^t. \tag{4}$$
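The decomposition S_T = S_W + S_B can be verified numerically. The following sketch is our own illustration (the random data and the crisp cluster labels are assumptions, not from the paper); it computes Eqs. (2)–(4) for a crisp partition:

```python
import numpy as np

# A numeric check that the crisp scatter matrices of Eqs. (2)-(4)
# satisfy S_T = S_W + S_B for any crisp partition.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))            # n = 60 points in R^2
labels = np.arange(60) % 3              # a crisp partition into c = 3 clusters

x_bar = X.mean(axis=0)                  # grand mean
S_T = (X - x_bar).T @ (X - x_bar)       # total scatter, Eq. (2)

S_W = np.zeros((2, 2))
S_B = np.zeros((2, 2))
for i in range(3):
    Xi = X[labels == i]
    a_i = Xi.mean(axis=0)               # cluster sample mean, Eq. (1)
    S_W += (Xi - a_i).T @ (Xi - a_i)    # within-cluster scatter, Eq. (3)
    S_B += len(Xi) * np.outer(a_i - x_bar, a_i - x_bar)  # between-cluster, Eq. (4)
```

The identity holds exactly (up to floating-point error) because each a_i is the sample mean of its cluster, which makes the cross terms vanish.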

Duda and Hart (1973) noted that the determinant |S_W| of a within-cluster scatter matrix could serve as a criterion function for clustering; |S_W| can be interpreted as the square of the scatter volume. On the basis of |S_W|, Rouseeuw et al. (1996) created the so-called SAND algorithm.

Let tr(A) denote the trace of a matrix A. The trace tr(S_W) of a within-cluster scatter matrix can be used to measure the compactness with a within-cluster variation, so it is reasonable to use tr(S_W) as a clustering objective function. The k-means and most other clustering algorithms are created by minimizing an objective function based on tr(S_W). Because tr(S_T) = tr(S_W) + tr(S_B) for a given data set X and tr(S_T) is independent of the cluster centers a_i, minimizing tr(S_W) is equivalent to maximizing tr(S_B). Moreover, the quantity tr(S_B) can be used as a measure of separation. Thus, building an objective function that is optimized by minimizing tr(S_W) and simultaneously maximizing tr(S_B) makes sense, and this is the main goal of this paper.

Since Zadeh (1965) introduced the fuzzy set concept, fuzzy clustering has been widely investigated (Bezdek, 1981; Yang, 1993). In fuzzy clustering, FCM is the most used clustering algorithm. Suppose that μ_ij ∈ [0,1] with $\sum_{i=1}^{c} \mu_{ij} = 1$ for all j, and that m > 1 is a given real value. The FCM update equation for the fuzzy sample mean is

$$a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m x_j}{\sum_{j=1}^{n} \mu_{ij}^m}, \quad i = 1, \ldots, c. \tag{5}$$

Thus, we define the fuzzy total scatter matrix S_FT, the fuzzy within-cluster scatter matrix S_FW and the fuzzy between-cluster scatter matrix S_FB on the basis of the fuzzy sample mean a_i as

$$S_{FT} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (x_j - \bar{x})(x_j - \bar{x})^t, \tag{6}$$

$$S_{FW} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i)(x_j - a_i)^t, \tag{7}$$

$$S_{FB} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (a_i - \bar{x})(a_i - \bar{x})^t, \tag{8}$$

where μ_ij ∈ [0,1], $\sum_{i=1}^{c} \mu_{ij} = 1$ and m > 1. We know that

$$\begin{aligned} S_{FT} &= \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i + a_i - \bar{x})(x_j - a_i + a_i - \bar{x})^t \\ &= \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \left[ (x_j - a_i)(x_j - a_i)^t + (a_i - \bar{x})(a_i - \bar{x})^t + (x_j - a_i)(a_i - \bar{x})^t + (a_i - \bar{x})(x_j - a_i)^t \right] \\ &= S_{FW} + S_{FB} + \sum_{i=1}^{c} \left( \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i) \right) (a_i - \bar{x})^t + \sum_{i=1}^{c} (a_i - \bar{x}) \left( \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i) \right)^t. \end{aligned}$$

According to Eq. (5), we have $\sum_{j=1}^{n} \mu_{ij}^m a_i = \sum_{j=1}^{n} \mu_{ij}^m x_j$, so the last two (cross) terms vanish. Thus, we obtain the property that S_FT = S_FW + S_FB when a_i is defined by Eq. (5). This property is exactly the same as S_T = S_W + S_B for the (crisp) scatter matrix.
for the (crisp) <strong>scatter</strong> <strong>matrix</strong>.<br />

Fuzzy <str<strong>on</strong>g>clustering</str<strong>on</strong>g>s including FCM(Bezdek,<br />

1981), alternative FCM(Wu and Yang, 2002;<br />

Yang et al., 2002), G–K (Gustafs<strong>on</strong> and Kessel,<br />

1979), SAND (Rouseeuw et al., 1996), MCV<br />

(Krishnapuram and Kim, 2000), and UFP-ONC<br />

(Gath and Geva, 1989), etc., are all <str<strong>on</strong>g>based</str<strong>on</strong>g> <strong>on</strong><br />

the <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> within-cluster <strong>scatter</strong> <strong>matrix</strong> S FW .<br />

It is known that the FCM clustering algorithm is created by minimizing the objective function

$$J_{FCM} = \mathrm{tr}(S_{FW}) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|x_j - a_i\|^2, \tag{9}$$

with the membership update equation

$$\mu_{ij} = \frac{\left(\|x_j - a_i\|^2\right)^{-1/(m-1)}}{\sum_{k=1}^{c} \left(\|x_j - a_k\|^2\right)^{-1/(m-1)}} \tag{10}$$

and the cluster center update Eq. (5). Although tr(S_FT) = tr(S_FW) + tr(S_FB) for a given data set X, tr(S_FT) is not a fixed constant but depends on μ_ij. Thus, minimizing tr(S_FW) does not necessarily maximize tr(S_FB). The trace tr(S_FB) of the fuzzy between-cluster scatter matrix can be interpreted as a separation measure with a between-cluster variation. A large value of tr(S_FB) will induce a clustering result with separated (distinguishable) clusters. In the next section, an algorithm that considers tr(S_FW) and tr(S_FB) simultaneously is introduced.
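The FCM iteration of Eqs. (5) and (10) can be sketched in a few lines. This is our own minimal implementation (random initialization from the data, a small guard against zero distances, and a convergence test on the centers are assumptions, not the authors' code):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-6, max_iter=300, seed=0):
    """Minimal FCM sketch: alternate Eq. (10) (memberships) and Eq. (5) (centers)."""
    rng = np.random.default_rng(seed)
    A = X[rng.choice(len(X), size=c, replace=False)]      # initial centers from data
    for _ in range(max_iter):
        d2 = ((X[None, :, :] - A[:, None, :]) ** 2).sum(-1)   # ||x_j - a_i||^2
        U = np.maximum(d2, 1e-12) ** (-1.0 / (m - 1))         # Eq. (10), unnormalized
        U /= U.sum(axis=0)                                    # sum_i mu_ij = 1
        W = U ** m
        A_new = (W @ X) / W.sum(axis=1, keepdims=True)        # Eq. (5)
        if np.abs(A_new - A).max() < eps:
            return A_new, U
        A = A_new
    return A, U

# Two well-separated 1-D clusters around 0 and 8 (our own test data).
rng = np.random.default_rng(7)
X = np.concatenate([rng.normal(0.0, 1.0, (40, 1)),
                    rng.normal(8.0, 1.0, (40, 1))])
centers, U = fcm(X, c=2)
```

On data this well separated the recovered centers land near the population means, while the memberships stay strictly inside (0, 1) — the "too fuzzy" behavior that FCS is designed to temper.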

3. A proposed fuzzy clustering algorithm based on tr(S_FW) and tr(S_FB)

Fukuyama and Sugeno (1989) and Sugeno et al. (1993) used tr(S_FW) and tr(S_FB) to create an index FS(c) for the cluster validity problem, and Pal and Bezdek (1995) gave further discussion of cluster validity for FCM. The validity index FS(c) is

$$FS(c) = \mathrm{tr}(S_{FW}) - \mathrm{tr}(S_{FB}) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|x_j - a_i\|^2 - \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|a_i - \bar{x}\|^2. \tag{11}$$

A small FS(c) value indicates a good fuzzy clustering result, with a small fuzzy within-cluster variation tr(S_FW) and a large fuzzy between-cluster variation tr(S_FB). This helps us find a good estimate of the number of clusters. It is reasonable for a clustering objective function to contain a measure of within- and between-cluster variations, such as the FS(c) index. However, no update equation for a_i can be obtained by differentiating FS(c) with respect to a_i. Thus, FS(c) cannot itself serve as a clustering objective function.
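Written directly in terms of memberships and centers, the FS index of Eq. (11) is straightforward to compute. A sketch (the function and variable names are ours):

```python
import numpy as np

def fs_index(X, U, A, m=2.0):
    """FS validity index of Eq. (11): tr(S_FW) - tr(S_FB)."""
    x_bar = X.mean(axis=0)
    W = U ** m                                                  # mu_ij^m
    d2_within = ((X[None, :, :] - A[:, None, :]) ** 2).sum(-1)  # ||x_j - a_i||^2
    d2_between = ((A - x_bar) ** 2).sum(-1, keepdims=True)      # ||a_i - x_bar||^2
    return (W * d2_within).sum() - (W * d2_between).sum()
```

In a validity study, FS(c) would be evaluated over candidate cluster numbers c, with smaller values preferred; for well-separated clusters the between-cluster term dominates and the index goes negative.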

In this section, we propose a novel fuzzy clustering objective function that is a modification of the FS(c) index. It can also be viewed as a generalization of the FCM objective function that combines fuzzy within- and between-cluster variations. Our goal is to minimize the fuzzy within-cluster variation tr(S_FW) and simultaneously maximize the fuzzy between-cluster variation tr(S_FB). We call the resulting method the fuzzy compactness and separation (FCS) algorithm, because the compactness is measured using a fuzzy within-cluster variation and the separation is measured using a fuzzy between-cluster variation. The FCS objective function J_FCS is defined as

$$J_{FCS} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|x_j - a_i\|^2 - \sum_{i=1}^{c} \sum_{j=1}^{n} g_i \mu_{ij}^m \|a_i - \bar{x}\|^2, \tag{12}$$

where g_i ≥ 0. Note that J_FCS = J_FCM when g_i = 0 and J_FCS = FS(c) when g_i = 1. Minimizing J_FCS gives the following update equations:

$$\mu_{ij} = \frac{\left(\|x_j - a_i\|^2 - g_i \|a_i - \bar{x}\|^2\right)^{-1/(m-1)}}{\sum_{k=1}^{c} \left(\|x_j - a_k\|^2 - g_k \|a_k - \bar{x}\|^2\right)^{-1/(m-1)}} \tag{13}$$

and

$$a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m x_j - g_i \sum_{j=1}^{n} \mu_{ij}^m \bar{x}}{\sum_{j=1}^{n} \mu_{ij}^m - g_i \sum_{j=1}^{n} \mu_{ij}^m}, \tag{14}$$

where the parameter g_i can be set as

$$g_i = \frac{(\beta/4) \min_{i' \neq i} \|a_i - a_{i'}\|^2}{\max_k \|a_k - \bar{x}\|^2}, \quad 0 \leq \beta \leq 1. \tag{15}$$

In fuzzy clustering, we restrict μ_ij ∈ [0,1]. Because μ_ij in Eq. (13) might be negative for some data point x_j, we impose some restrictions on it. For a given data point x_j, if ||x_j − a_i||² ≤ g_i ||a_i − x̄||², then μ_ij = 1 and μ_{i'j} = 0 for all i' ≠ i. That is, if the squared distance between a data point and the ith cluster center is smaller than g_i ||a_i − x̄||², that data point will belong exactly to the ith cluster with a membership value of one. Each cluster in FCS thus has a crisp boundary: all data points inside this boundary have a crisp membership value μ_ij ∈ {0,1}, and data points outside this boundary have fuzzy membership values μ_ij ∈ [0,1]. Each crisp boundary forms a hyperball for the corresponding cluster and can be seen as a cluster kernel. Fig. 1 shows a two-cluster data set in which each cluster contains a cluster center and a cluster kernel. The volume of each cluster kernel is decided by the term g_i ||a_i − x̄||². Data points outside the kernel will have fuzzy memberships.
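The membership rule of Eq. (13), together with the kernel restriction just described, can be sketched as follows (a minimal illustration; the names and example data are ours, and the kernels are assumed non-overlapping, as Eq. (15) guarantees):

```python
import numpy as np

def fcs_memberships(X, A, g, m=2.0):
    """Eq. (13) with the kernel restriction: mu_ij = 1 (and 0 elsewhere)
    whenever ||x_j - a_i||^2 <= g_i ||a_i - x_bar||^2."""
    x_bar = X.mean(axis=0)
    d2 = ((X[None, :, :] - A[:, None, :]) ** 2).sum(-1)   # ||x_j - a_i||^2
    kernel2 = g * ((A - x_bar) ** 2).sum(-1)              # squared kernel radii
    t = d2 - kernel2[:, None]                             # <= 0 inside the kernel
    U = np.where(t > 0, np.maximum(t, 1e-12) ** (-1.0 / (m - 1)), 0.0)
    U /= np.maximum(U.sum(axis=0), 1e-12)                 # fuzzy part of Eq. (13)
    crisp = (t <= 0).any(axis=0)                          # points inside some kernel
    U[:, crisp] = (t[:, crisp] <= 0).astype(float)        # mu_ij in {0, 1}
    return U

# Two 1-D centers at 0 and 10; with this g each kernel has squared radius 1.
X = np.array([[0.0], [0.3], [5.0], [9.7], [10.0]])
A = np.array([[0.0], [10.0]])
U = fcs_memberships(X, A, g=np.array([0.04, 0.04]))
```

Points near a center get crisp memberships while the point midway between the clusters stays fuzzy — the co-existence property described above.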

Note that Özdemir and Akarun (2002) proposed the partition index maximization (PIM) algorithm, which uses a fixed kernel volume for every cluster. In our FCS, the kernel volume differs from cluster to cluster. This FCS characteristic can capture more information from data with clusters of different volumes and shapes.

Fig. 1. Clusters obtained by FCS (showing, for each cluster, the cluster center, cluster kernel, crisp boundary and fuzzy boundary).

In the k-means algorithm, each data point has a crisp membership value μ_ij ∈ {0,1}. Although FCM allows data points to have membership values μ_ij in the interval [0,1], it produces few crisp membership values (i.e. zero or one), even when a data point is very close to one of the c cluster centers. The memberships in FCM thus seem too fuzzy. In our FCS, crisp and fuzzy membership values co-exist. Data points that fall inside any one of the c cluster kernels (i.e. close to one of the c cluster centers) have crisp memberships, and those outside the cluster kernels (i.e. far from all cluster centers) have fuzzy membership values. To guarantee that no two of these c cluster kernels overlap, g_i is chosen as in Eq. (15), where the parameter β controls the size of each kernel. Since, for all i,

$$g_i \|a_i - \bar{x}\|^2 = \frac{\beta}{4} \left( \min_{i' \neq i} \|a_i - a_{i'}\|^2 \right) \left( \frac{\|a_i - \bar{x}\|^2}{\max_k \|a_k - \bar{x}\|^2} \right) \leq \beta \min_{i' \neq i} \left( \frac{\|a_i - a_{i'}\|}{2} \right)^2 \leq \min_{i' \neq i} \left( \frac{\|a_i - a_{i'}\|}{2} \right)^2 \quad \text{for all } 0 \leq \beta \leq 1,$$

we have, for a given data point x_j with μ_ij = 1,

$$\|x_j - a_i\|^2 \leq g_i \|a_i - \bar{x}\|^2 \leq \min_{i' \neq i} \left( \frac{\|a_i - a_{i'}\|}{2} \right)^2.$$

Note that ||a_i − a_{i'}||/2 is half the distance between the cluster centers a_i and a_{i'}. Thus, Eq. (15) guarantees that no two of the c cluster kernels overlap. If β = 1.0, the FCS algorithm clusters the data set using the largest kernel for each cluster. If β = 0 (i.e. g_i = 0), the FCS algorithm clusters the data set with no cluster kernel, which is equivalent to the FCM clustering algorithm.
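The chain of inequalities above can be checked numerically. In the sketch below (random centers and a stand-in grand mean are our own assumptions), g_i from Eq. (15) always keeps the squared kernel radius below min_{i'≠i}(||a_i − a_{i'}||/2)²:

```python
import numpy as np

# With g_i from Eq. (15), the squared kernel radius g_i ||a_i - x_bar||^2
# never exceeds min_{i'}(||a_i - a_i'||/2)^2, so no two kernels can overlap.
rng = np.random.default_rng(4)
A = rng.normal(size=(4, 2))                  # 4 hypothetical cluster centers
x_bar = rng.normal(size=2)                   # stand-in for the grand mean
beta = 1.0                                   # the largest admissible kernels

D2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(D2, np.inf)                 # exclude i' = i from the minimum
min_d2 = D2.min(axis=1)                      # min_{i' != i} ||a_i - a_i'||^2
b2 = ((A - x_bar) ** 2).sum(-1)              # ||a_i - x_bar||^2
g = (beta / 4.0) * min_d2 / b2.max()         # Eq. (15)
```

The bound holds because the ratio ||a_i − x̄||² / max_k ||a_k − x̄||² is at most one, exactly as in the derivation above.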

The cluster center update Eq. (14) in FCS can be interpreted as a weighted mean of the FCM cluster center update and the grand mean x̄. We can rewrite Eq. (14) as

$$a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m x_j / \sum_{j=1}^{n} \mu_{ij}^m - g_i \bar{x}}{1 - g_i}. \tag{16}$$

The weights of the FCM cluster center and the grand mean x̄ are 1 and −g_i, respectively. To maximize

tr(S_FB), the cluster centers obtained by FCS will move away from x̄ with the corresponding weight g_i. Since g_i is a monotone increasing function of β, the cluster centers obtained by FCS with a large β value will be farther away from x̄ than the centers obtained with a small β value. The proposed FCS clustering algorithm is summarized as follows:

FCS Algorithm 1 (see also Yang et al. (2003)). Set the iteration counter ℓ = 0 and choose the initial values a_i^(0), i = 1, ..., c. Given b and ε > 0:

Step 1. Find g_i^(ℓ+1) using (15).
Step 2. Find μ_ij^(ℓ+1) using (13).
Step 3. Find a_i^(ℓ+1) using (14).

Increment ℓ; repeat until max_i ‖a_i^(ℓ+1) − a_i^(ℓ)‖ < ε.
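The iteration above can be sketched in a few lines of NumPy. Since Eq. (15) is not reproduced in this excerpt, the kernel-size update below is an assumption: we take eta_i = (b/4) min_{i'≠i} ‖a_i − a_{i'}‖², which keeps the c kernels disjoint for b ≤ 1, and set g_i = eta_i/‖a_i − x̄‖² so that eta_i = g_i ‖a_i − x̄‖²; the function and parameter names are likewise illustrative, not the authors' code.

```python
import numpy as np

def fcs(X, c, m=2.0, b=0.5, eps=1e-4, max_iter=100, init=None, seed=0):
    """Sketch of the FCS iteration (Steps 1-3); Eq. (15) is assumed, see text."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    xbar = X.mean(axis=0)
    # Initial centers a_i^(0): user-supplied or drawn from the data
    a = np.asarray(init, dtype=float) if init is not None \
        else X[rng.choice(n, size=c, replace=False)]
    for _ in range(max_iter):
        # Step 1: kernel sizes from the current centers (assumed form of Eq. (15))
        d2 = ((a[:, None, :] - a[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)
        eta = (b / 4.0) * d2.min(axis=1)
        g = eta / (((a - xbar) ** 2).sum(-1) + 1e-12)
        # Step 2: memberships (Eq. (13)); q_ij <= 0 means x_j lies inside kernel i
        q = ((X[None, :, :] - a[:, None, :]) ** 2).sum(-1) - eta[:, None]
        inside = q <= 0
        crisp = inside.any(axis=0)                 # crisp membership inside a kernel
        u = np.zeros((c, n))
        u[:, crisp] = inside[:, crisp].astype(float)
        w = np.clip(q[:, ~crisp], 1e-12, None) ** (1.0 / (1.0 - m))
        u[:, ~crisp] = w / w.sum(axis=0)           # fuzzy membership outside kernels
        # Step 3: centers (Eq. (14)/(16)): FCM-style mean pushed away from xbar
        um = u ** m
        fcm_center = um @ X / um.sum(axis=1, keepdims=True)
        a_new = (fcm_center - g[:, None] * xbar) / (1.0 - g[:, None])
        if np.abs(a_new - a).max() < eps:          # max_i ||a_i^(l+1) - a_i^(l)|| < eps
            return a_new, u
        a = a_new
    return a, u
```

On two well-separated blobs this recovers one center per blob, each pushed slightly away from the grand mean, as Eq. (16) predicts.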

We now use a simple example to illustrate the FCS properties. We draw a sample data set from the one-dimensional normal mixture (1/3)N(0,1) + (1/3)N(4,1) + (1/3)N(10,1), with three populations of means 0, 4 and 10. The histogram of this data set is shown in Fig. 2(a). Fig. 2(b) shows the FCM membership function that belongs to the mean-0 cluster. Fig. 2(c)–(f) show the FCS membership functions that belong to the mean-0 cluster for the parameter values b = 0.9, 0.5, 0.1 and 0.05. The cluster kernel volumes always decrease as b decreases. The volume of a cluster kernel can also be read off as the range over which the cluster membership function equals one. In our 3-cluster example, Fig. 2(b)–(f) draw only the membership function of the mean-0 cluster. Thus, the range with membership value one presents the volume of the mean-0 cluster kernel, and the range with membership value zero presents the volumes of the mean-4 and mean-10 cluster kernels. Crisp membership values and fuzzy membership values co-exist in FCS. When b = 0, the FCS membership values are equivalent to those of FCM, with no cluster kernel. The three cluster centers obtained using FCM and FCS are shown in Table 1. When b decreases, the cluster centers obtained by FCS move closer to the grand mean x̄ and also closer to the cluster centers obtained by FCM. These results coincide with the property shown in Eq. (15).

Fig. 2. Membership functions of FCM and FCS: (a) histogram of the sample data set; (b) FCM membership function of the mean-0 cluster; (c)–(f) FCS membership functions of the mean-0 cluster with beta = 0.9, 0.5, 0.1 and 0.05.



Table 1
Cluster centers obtained by FCS

Beta       Cluster centers
0.900      0.366    3.667    12.319
0.500      0.281    3.687    11.221
0.200      0.164    3.714    10.557
0.100      0.111    3.726    10.356
0.050      0.083    3.733    10.256
0 (FCM)    0.052    3.740    10.156

Although FCM is a popular clustering algorithm, the cluster centers it obtains will be close to the grand mean x̄ when the data set heavily overlaps. According to the FCS properties, the cluster centers obtained by FCS can be more accurate than those of FCM in some situations. This will be illustrated in Section 5. In the next section, more details about the FCS parameters are presented.

4. Optimality tests and parameter selection of FCS

In general, when the data set is clustered into c (c > 1) subsets, each subset is expected to have a prototype (or cluster center) different from the others. However, the grand sample mean x̄ of the data set is always a fixed point of the FCS algorithm. The FCS output will be x̄ with high probability if x̄ is a stable solution of the FCS iteration. To avoid such cases, we require that x̄ not be an attractor of FCS. How do we judge whether x̄ is an attractor, i.e. a stable point, of FCS? We must study the Hessian matrix of the FCS objective function (12). To simplify the calculations, substituting (13) into (12) yields (17):

$$J = \sum_{j=1}^{n}\left(\sum_{i=1}^{c}\left(\|x_j-a_i\|^2 - g_i\|a_i-\bar{x}\|^2\right)^{\frac{1}{1-m}}\right)^{1-m}. \qquad (17)$$

It can be proved that J = min_μ J_FCS. Therefore, it suffices to judge whether x̄ is an attractor or a stable point of FCS using the Hessian matrix of (17). Let us set

$$q(x_j,a_i) = \|x_j-a_i\|^2 - g_i\|a_i-\bar{x}\|^2,$$

$$S_j = \sum_{i=1}^{c} q(x_j,a_i)^{\frac{1}{1-m}}, \qquad \mu_{ij} = \frac{q(x_j,a_i)^{\frac{1}{1-m}}}{S_j}$$

and

$$h_{ij} = \mu_{ij}^m\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right].$$

We know that

$$(1)\quad \frac{\partial J}{\partial a_i} = 2\sum_{j=1}^{n}\mu_{ij}^m\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right];$$

$$(2)\quad \frac{\partial^2 J}{\partial a_i\,\partial a_k} = \frac{4m}{m-1}\sum_{j=1}^{n}(S_j)^{m-1}h_{ij}(h_{kj})^t + 2\delta_{ik}\sum_{j=1}^{n}\mu_{ij}^m(1-g_i)I_{s\times s} - \frac{4m}{m-1}\delta_{ik}\sum_{j=1}^{n}\left[q(x_j,a_i)\right]^{-1}h_{ij}\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]^t, \quad \forall i,\ \forall k. \qquad (18)$$

Therefore, the second-order term of the Taylor series expansion of (17) can be expressed as follows:

$$\sum_{i=1}^{c}\sum_{k=1}^{c}u_{a_i}^t\left(\frac{\partial^2 J}{\partial a_i\,\partial a_k}\right)u_{a_k} = \frac{4m}{m-1}\sum_{j=1}^{n}(S_j)^{m-1}\left(\sum_{i=1}^{c}u_{a_i}^t\,\mu_{ij}^m\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]\right)^2 + 2\sum_{i=1}^{c}(1-g_i)\sum_{j=1}^{n}\mu_{ij}^m\,u_{a_i}^t u_{a_i} - \frac{4m}{m-1}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^m\,u_{a_i}^t\,\frac{\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]^t}{q(x_j,a_i)}\,u_{a_i}. \qquad (19)$$



If a_i = x̄ for all i, $C_X = \sum_{j=1}^{n}\frac{(x_j-\bar{x})(x_j-\bar{x})^t}{n\|x_j-\bar{x}\|^2}$, and ‖x_j − x̄‖ > 0 for all j, then we get the following equation:

$$\sum_{i=1}^{c}\sum_{k=1}^{c}u_{a_i}^t\left(\frac{\partial^2 J}{\partial a_i\,\partial a_k}\right)u_{a_k}\bigg|_{a_i=\bar{x},\,\forall i} = \frac{4m}{c^{m+1}(m-1)}\sum_{j=1}^{n}\frac{\left(\sum_{i=1}^{c}u_{a_i}^t(x_j-\bar{x})\right)^2}{\|x_j-\bar{x}\|^2} + \frac{2n}{c^m}\sum_{i=1}^{c}(1-g_i)\,u_{a_i}^t\left[I_{s\times s}-\frac{2m}{(m-1)(1-g_i)}\,C_X\right]u_{a_i}. \qquad (20)$$

To simplify the analysis, we assume that g_i = g for all i and that q(x_j, a_k) > 0 for all j and k. In this way, we can ignore the update equations for g_i and reduce the complexity of the analysis of FCS. Thus, Eq. (19) becomes Eq. (21):

$$\sum_{i=1}^{c}\sum_{k=1}^{c}u_{a_i}^t\left(\frac{\partial^2 J}{\partial a_i\,\partial a_k}\right)u_{a_k} = \frac{4m}{m-1}\sum_{j=1}^{n}(S_j)^{m-1}\left(\sum_{i=1}^{c}u_{a_i}^t\,\mu_{ij}^m\left[(1-g)a_i-(x_j-g\bar{x})\right]\right)^2 + 2(1-g)\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^m\,u_{a_i}^t\left[I_{s\times s}-\frac{2m}{m-1}\sum_{j=1}^{n}\frac{\mu_{ij}^m\left[(a_i-x_j)-g(a_i-\bar{x})\right]\left[(a_i-x_j)-g(a_i-\bar{x})\right]^t}{(1-g)\,q(x_j,a_i)\sum_{j=1}^{n}\mu_{ij}^m}\right]u_{a_i}. \qquad (21)$$

From Eq. (21), we know that if g approaches negative infinity, then any FCS solution will be stable. This is an unacceptable result. Similarly, if g approaches positive infinity, any FCS solution will be unstable. This is also unacceptable. Therefore, we can roughly set the range for g as g < 1.

5. Robustness of FCS

A good clustering method should be robust in the sense of tolerating noise and outliers. In this section, we use the gross error sensitivity and the influence function (Huber, 1981) to show that our weighted cluster center update equation is robust to noise and outliers. Let {x_1, ..., x_n} be an observed data set of real numbers and let θ be an unknown parameter to be estimated. An M-estimator (Huber, 1981) is generated by minimizing the form

$$\sum_{j=1}^{n}\rho(x_j;\theta), \qquad (22)$$

where ρ is an arbitrary function that measures the loss between x_j and θ. Here, we are interested in a location estimate that minimizes

$$\sum_{j=1}^{n}\rho(x_j-\theta). \qquad (23)$$



The influence function of the estimator is

$$IC(x; F, \theta) = \frac{\varphi(x-\theta)}{\int \varphi'(x-\theta)\,dF_X(x)}, \qquad (25)$$

where F_X(x) denotes the distribution function of X. If the influence function of an estimator is unbounded, noise or outliers might cause trouble; equivalently, if the φ function of an estimator is unbounded, noise and outliers will cause trouble. Many important robustness measures can be derived from the influence function. One of the important measures is the gross error sensitivity γ*, defined by

$$\gamma^* = \sup_x\,|IC(x; F, \theta)|. \qquad (26)$$

This quantity can be interpreted as the worst approximate influence that the addition of an infinitesimal point mass can have on the value of the associated estimator.

Let the loss between the data point x_j and the ith cluster center a_i be

$$\rho(x_j-a_i) = \mu_{ij}^m\|x_j-a_i\|^2 - g_i\,\mu_{ij}^m\|a_i-\bar{x}\|^2 \qquad (27)$$

and

$$\varphi(x_j-a_i) = -\frac{\partial\rho(x_j-a_i)}{\partial a_i} = 2\mu_{ij}^m(x_j-a_i) + 2g_i\,\mu_{ij}^m(a_i-\bar{x}). \qquad (28)$$

By solving the equation $\sum_{j=1}^{n}\varphi(x_j-a_i)=0$, we obtain the result shown in Eq. (14). Thus, the FCS cluster center is an M-estimator with the loss function (27) and φ function (28). Note that the φ function of our estimator is a function of μ_ij^m, which depends on the fuzzifier m. We will show that the FCS cluster center is robust to noise and outliers when m is large.

For a given data set, running the FCS clustering algorithm produces the cluster centers {a_1, ..., a_c}, the parameters {g_1, ..., g_c} and the sample mean x̄. The relative influence (the φ function) of an individual observation x on the ith cluster center can be defined as

$$\varphi(x-a_i) = 2(\mu_i(x))^m\left[(x-a_i) + g_i(a_i-\bar{x})\right], \qquad (29)$$

where

$$\mu_i(x) = \frac{\left(\|x-a_i\|^2 - g_i\|a_i-\bar{x}\|^2\right)^{\frac{1}{1-m}}}{\sum_{k=1}^{c}\left(\|x-a_k\|^2 - g_k\|a_k-\bar{x}\|^2\right)^{\frac{1}{1-m}}}. \qquad (30)$$

Note that μ_i(x) = 1 if ‖x − a_i‖² ≤ g_i‖a_i − x̄‖². Suppose that x ∈ R. For an extremely large or small x, we will have ‖x − a_i‖² ≥ g_i‖a_i − x̄‖² and hence μ_i(x) ∈ (0, 1). In general, an extremely large or small x falls outside all of the c cluster kernels and has a fuzzy membership value μ_i(x) ∈ (0, 1); more precisely, μ_i(x) will be very close to 1/c, and hence μ_i(x)^m will be very close to zero when m is large. Although (x − a_i) + g_i(a_i − x̄) is a monotone increasing function of x, (μ_i(x))^m[(x − a_i) + g_i(a_i − x̄)] will be very close to zero for an extremely large or small x when m is large. Thus, the φ function of the FCS cluster center is bounded for an extremely large or small x when m tends to infinity. Therefore, for large m, only a data point x inside the ith cluster kernel with μ_i(x) = 1 has an influence on the ith FCS cluster center a_i, and a data point x that falls on the crisp boundary of the ith cluster kernel has a finite gross error sensitivity γ* < ∞. The above discussion gives a theoretical foundation for asserting that the FCS algorithm is robust to noise and outliers when m is large. The following is a simple example.

For the data set shown in Fig. 2(a), we run FCS with b = 0.1 and m = 2 to produce the sets {a_1, a_2, a_3}, {g_1, g_2, g_3} and x̄. For a given data point x, the membership function μ_1(x) (Eq. (30)) is illustrated in Fig. 3(a), and the φ function φ(x − a_1) = 2(μ_1(x))²[(x − a_1) + g_1(a_1 − x̄)] is illustrated in Fig. 3(d). When m = 2, an extremely large or small x has a large influence on a_1, as shown in Fig. 3(d). Similarly, we run FCS with m = 3 and m = 4 to obtain other sets {a_1, a_2, a_3} and {g_1, g_2, g_3}, whose μ_1(x) are illustrated in Fig. 3(b) and (c), respectively. The corresponding φ functions are illustrated in Fig. 3(e) and (f). Fig. 3 shows that the influence of an extremely large or small x becomes small as m increases. These numerical results coincide with our theoretical analysis.
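The boundedness argument can be checked numerically. The sketch below evaluates Eqs. (30) and (29) at a far-away point for increasing m; the centers, g_i values and grand mean are illustrative stand-ins for the three-population example, not the values actually fitted in the paper.

```python
import numpy as np

# Illustrative stand-ins (not the fitted values from the paper):
a = np.array([0.0, 4.0, 10.0])      # cluster centers a_1, a_2, a_3
g = np.array([0.1, 0.1, 0.1])       # kernel parameters g_i
xbar = a.mean()                     # grand mean of the three populations

def mu1(x, m):
    # Membership of x in cluster 1 (Eq. (30)); assumes q > 0 at x
    q = (x - a) ** 2 - g * (a - xbar) ** 2
    w = q ** (1.0 / (1.0 - m))
    return w[0] / w.sum()

def phi1(x, m):
    # Influence of x on a_1 (Eq. (29))
    return 2.0 * mu1(x, m) ** m * ((x - a[0]) + g[0] * (a[0] - xbar))

for m in (2, 3, 4):
    print(m, phi1(50.0, m))         # influence of a far-away point shrinks with m
```

As m grows, μ_1(x) at an extreme x flattens toward 1/c, so μ_1(x)^m (and hence the influence) collapses, matching Fig. 3(d)–(f).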

6. Numerical examples

We use the Normal-4 data set proposed by Pal and Bezdek (1995). Normal-4 is a four-dimensional data set with sample size




Fig. 3. Membership functions (Eq. (30)) and φ functions (Eq. (29)) for a given data point x when m = 2, 3 and 4.

n = 800, consisting of 200 points from each of four clusters. The population mean vectors are μ_1 = (3,0,0,0), μ_2 = (0,3,0,0), μ_3 = (0,0,3,0) and μ_4 = (0,0,0,3). The covariance matrices are Σ_i = I_4, i = 1, 2, 3, 4. We use the mean vectors as the initial values for both the FCM and FCS algorithms. We compare FCM and FCS with the mean squared error (MSE) criterion, calculated as $\sum_{i=1}^{4}\|a_i-\mu_i\|^2$. We run the FCS algorithm with all combinations of b in (0, 0.005, 0.01, 0.05, 0.1, 0.2) and m in (1.5, 2, 2.5, 3, 3.5). For 100 repeated Normal-4 data sets, the MSE values are shown in Fig. 4(a). The result for b = 0 is equivalent to FCM. As b increases, the volume of the cluster kernel grows. When the fuzzy index m is small, the results of FCS (b > 0) and FCM (b = 0) are similar. As m becomes larger, FCM gives larger MSE values than FCS. Thus, the FCS result is less sensitive to the fuzzy index m than FCM, especially for b = 0.05, 0.1 and 0.2. In general, the MSE values increase as m increases. Yu et al. (2004) gave a theoretical upper bound on m for FCM beyond which the grand sample mean x̄ is the unique optimizer. For each combination of m and b, the range between the worst and best MSE values among these 100 repeated runs is shown in Fig. 4(b). The cases of m = 3.5 and b = 0, 0.005, 0.01 are examples of the grand sample mean x̄ being the unique optimizer. The range of MSE values for FCS with b = 0.05, 0.1 and 0.2 remains insensitive to the fuzzy index m. One may argue that we could run FCM with a small m value, say 1.5 or 2, to avoid this defect of FCM. However, FCS is not only less sensitive to the fuzzy index m than FCM, but also more robust to noise and outliers than FCM when m is large. We show this robust property of FCS in the following examples.
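The drift of FCM centers toward the grand mean for large m can be reproduced with a plain FCM implementation (a sketch of the standard alternating updates, not the authors' code): on an overlapping two-cluster sample, the distance between the two fitted centers shrinks as m grows, consistent with the Yu et al. (2004) bound discussed above.

```python
import numpy as np

def fcm_centers(X, c, m, iters=200, seed=0):
    # Standard FCM alternating updates (Bezdek, 1981)
    rng = np.random.default_rng(seed)
    a = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(iters):
        d2 = ((X[None, :, :] - a[:, None, :]) ** 2).sum(-1).clip(1e-12)
        u = d2 ** (1.0 / (1.0 - m))     # FCM memberships (up to normalization)
        u /= u.sum(axis=0)
        um = u ** m
        a = um @ X / um.sum(axis=1, keepdims=True)
    return a

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 1)), rng.normal(4, 1, (200, 1))])
for m in (1.5, 2.0, 4.0, 7.0):
    a = fcm_centers(X, c=2, m=m, seed=1)
    print(m, abs(a[0, 0] - a[1, 0]))    # center separation shrinks as m grows
```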

Fig. 5 shows a two-cluster data set with unequal sample sizes. The clustering results of FCS with m = 2 and b = 0, 0.1, 0.2 are shown in Fig. 5(a)–(c). All three panels show similar results. The cluster centers are shown as solid circles. Note that the distance between the two cluster centers increases as b increases. This phenomenon can be explained by the cluster center update Eq. (16) of FCS.



Fig. 4. MSE values for different combinations of beta and m.

When m becomes larger (m = 6), the clustering results of FCS with b = 0, 0.1 and 0.2 are shown in Fig. 5(d)–(f). FCS with cluster kernels (b = 0.1 and 0.2) obtains better performance than FCM (b = 0), which clusters the data set without a cluster kernel. This shows that FCS with a suitably large m value can detect clusters of unequal sample sizes and is robust to noise. Fig. 6 shows a two-cluster data set with one outlying point at the coordinate (100, 0). When m is large (m = 6), the results of FCS with b = 0.1 and 0.2 are more robust to the outlier than those of FCM (b = 0). These robust properties of FCS can be explained using the FCS update equations. Let μ̂ = max{μ_i1, ..., μ_in} and μ'_ij = μ_ij/μ̂, j = 1, ..., n. We have

Fig. 5. FCM and FCS clustering results for unequal sample size data sets.



Fig. 6. FCM and FCS clustering results for the two-cluster data set with one outlier at the coordinate (100, 0).

$$\lim_{m\to\infty} a_i = \lim_{m\to\infty}\frac{\sum_{j=1}^{n}\mu_{ij}^m x_j - g_i\,\bar{x}\sum_{j=1}^{n}\mu_{ij}^m}{\sum_{j=1}^{n}\mu_{ij}^m - g_i\sum_{j=1}^{n}\mu_{ij}^m} = \lim_{m\to\infty}\frac{\sum_{j=1}^{n}(\mu'_{ij})^m x_j - g_i\,\bar{x}\sum_{j=1}^{n}(\mu'_{ij})^m}{\sum_{j=1}^{n}(\mu'_{ij})^m - g_i\sum_{j=1}^{n}(\mu'_{ij})^m} = \frac{\sum_{\mu'_{ij}=1} x_j - g_i\,\bar{x}\sum_{\mu'_{ij}=1}1}{(1-g_i)\sum_{\mu'_{ij}=1}1} = \frac{\sum_{\mu'_{ij}=1} x_j \big/ \sum_{\mu'_{ij}=1}1 - g_i\,\bar{x}}{1-g_i} = \frac{\sum_{\mu_{ij}=\hat{\mu}} x_j \big/ \sum_{\mu_{ij}=\hat{\mu}}1 - g_i\,\bar{x}}{1-g_i}. \qquad (31)$$

In FCM, when m is large, μ_ij = 1/c for all i, j, and hence μ_ij = μ̂ for all i, j. This is why FCM can produce results in which the sample mean x̄ is the unique optimizer when m is large. In FCS, however, the data points inside the cluster kernels have μ_ij ∈ {0, 1}, while the data points outside the cluster kernels have μ_ij ∈ (0, 1). When m is large, the cluster center update, in its limit form (31), gives (μ'_ij)^m = (μ_ij/μ̂)^m = 1 for the data points inside the ith cluster kernel and a small (μ'_ij)^m ≈ 0 for the data points outside the ith cluster kernel. Thus, when m is large, the ith cluster center update becomes the weighted mean of the sample mean of the data points inside the ith cluster kernel and the grand mean x̄. For a suitable b value, noise and outliers fall outside the cluster kernels, and their influence on the clustering results will be small when m is large. This explains the clustering results shown in Figs. 5 and 6 and coincides with the theoretical analysis in Section 5. This property also provides a way to avoid the sample mean x̄ being the unique optimizer, as occurs in FCM.
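The limit in Eq. (31) is easy to verify numerically: for a hand-picked one-dimensional membership vector (illustrative values, not from the paper), the update evaluated at a large m essentially equals the closed-form right-hand side of (31).

```python
import numpy as np

# Memberships for one cluster: 1 inside the kernel, fuzzy values outside
x    = np.array([1.8, 2.0, 2.2, 6.0, 9.0])
mu   = np.array([1.0, 1.0, 1.0, 0.4, 0.3])
g_i  = 0.2          # illustrative kernel parameter
xbar = 5.0          # illustrative grand mean

def center(m):
    # Cluster center update written as in the first line of Eq. (31)
    um = mu ** m
    return (um @ x - g_i * xbar * um.sum()) / ((1 - g_i) * um.sum())

inside = x[mu == 1.0]
limit = (inside.mean() - g_i * xbar) / (1 - g_i)   # right-hand side of Eq. (31)
print(center(60), limit)                           # nearly identical for large m
```

At m = 60 the points outside the kernel contribute (μ'_ij)^m ≈ 0, so the center is the kernel sample mean pushed away from x̄, exactly as Eq. (31) states.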

We know that when the sample mean x̄ is the unique optimizer of a fuzzy clustering algorithm, the partition coefficient (PC) (Bezdek, 1974), defined by

$$PC(c) = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^2, \qquad (32)$$

will be equal to 1/c, or, equivalently, the non-fuzzy index (NFI) (Pal and Bezdek, 1995), defined by

$$NFI(c) = 1 - \frac{c}{c-1}\left(1 - PC(c)\right), \qquad (33)$$

equals zero. Note that Dave (1996) also proposed a modification of the PC index that is equivalent to the NFI index. According to the above analysis, we hope that FCS with cluster kernels can avoid the situation in which the sample mean is the unique optimizer of the FCS objective function. Fig. 7 presents the NFI(2) values for the data set shown in Fig. 5. The NFI values of FCS with cluster kernels (b > 0) are always larger than those of FCM (b = 0), which is the case of the sample mean x̄ being the unique optimizer with NFI = 0 when m = 10 and 20. This shows that the FCS algorithm can avoid the case NFI = 0 and is more robust to noise and outliers than FCM when m is large. Because the sample mean x̄ of the data set shown in Fig. 6 is not the unique optimizer of FCM or FCS when m is large, we do not show their NFI values.

Fig. 7. NFI(2) values for the unequal sample size data set shown in Fig. 5.

Fig. 8. NFI(11) values for the normalized Vowel data set, for which both the PIM and FCS algorithms are run with the same parameter values.
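Eqs. (32) and (33) translate directly into code; the sketch below checks the two extreme cases, μ_ij = 1/c (NFI = 0) and a crisp partition (NFI = 1). The function names are illustrative.

```python
import numpy as np

def partition_coefficient(u):
    # PC (Eq. (32)); u is a c-by-n membership matrix with columns summing to 1
    return float((u ** 2).sum() / u.shape[1])

def nfi(u):
    # NFI (Eq. (33)): 0 for the maximally fuzzy partition mu_ij = 1/c, 1 for crisp
    c = u.shape[0]
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(u))

u_fuzzy = np.full((2, 4), 0.5)          # every membership equals 1/c
u_crisp = np.eye(2)[:, [0, 0, 1, 1]]    # hard 0/1 partition
print(nfi(u_fuzzy), nfi(u_crisp))       # 0.0 and 1.0
```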

Note that some of the FCS properties discussed above can also be achieved by the partition index maximization (PIM) algorithm (Özdemir and Akarun, 2002), which uses a fixed volume for all cluster kernels. The radius of each cluster kernel in PIM is defined by

$$\alpha = \delta \min_{i\neq i'}\{\|a_i-a_{i'}\|/2\}, \quad 0 \leqslant \delta \leqslant 1. \qquad (34)$$

The NFI values of PIM and FCS on the normalized Vowel data set from the UCI Machine Learning Repository (Blake and Merz, 1998) are shown in Fig. 8. Yu et al. (2004) showed that when m > 1.7787, the sample mean x̄ is the unique optimizer of FCM for the normalized Vowel data set in Blake and Merz (1998). In Fig. 8(a), when m = 1.5, both PIM and FCS with different δ and b values have NFI index values larger than 0.3. However, when m = 2, as shown in Fig. 8(b), PIM gives the same NFI values as FCM (δ = 0 or b = 0). Using the same volume for all cluster kernels does not help PIM achieve larger NFI values than FCM. In the same situation (m = 2, Fig. 8(b)), the NFI values of FCS are always larger than those of FCM and PIM. Using different cluster kernel volumes in FCS produces these merits.
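Eq. (34) can be computed as follows (a small sketch; `pim_radius` is an illustrative name, not from the PIM paper):

```python
import numpy as np

def pim_radius(centers, delta):
    # Common kernel radius in PIM (Eq. (34)), with 0 <= delta <= 1
    a = np.asarray(centers, dtype=float)
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)     # ignore the zero self-distances
    return delta * d.min() / 2.0

print(pim_radius([[0.0, 0.0], [4.0, 0.0], [10.0, 0.0]], delta=0.5))  # 1.0
```

Because the radius is tied to the single closest pair of centers, every kernel gets the same volume, which is exactly what the comparison above identifies as PIM's limitation relative to FCS.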

7. Conclusions

We proposed a novel clustering algorithm, called the FCS algorithm, which attempts to minimize the trace of the fuzzy within-cluster scatter matrix and simultaneously maximize the trace of the fuzzy between-cluster scatter matrix. Each cluster obtained by FCS has a cluster kernel. Data points that fall inside any one of the c cluster kernels have crisp memberships, while data points outside all of the cluster kernels have fuzzy memberships. The volume of each cluster kernel is decided by the parameter g_i, which is a function of b. Crisp and fuzzy memberships thus co-exist in FCS. The cluster center update equations of FCS can be interpreted as a weighted mean of the FCM cluster centers and the grand mean x̄. Numerical examples show that FCS can give more accurate parameter estimates than FCM. They also show that FCS can avoid the situation where the sample mean x̄ is the unique optimizer, as occurs in FCM, and that FCS is more robust to noise and outliers than FCM when m is large. A theoretical analysis of FCS was also presented. Overall, the proposed FCS is recommended as a good clustering algorithm when data points within the same cluster should be as compact as possible and data points in different clusters as separated as possible.

Acknowledgement

This work was supported in part by the National Science Council of Taiwan, ROC, under grant NSC-91-2118-M-033-001.

References

Bezdek, J.C., 1974. Cluster validity with fuzzy sets. J. Cybernet. 3, 58–73.

Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.

Blake, C.L., Merz, C.J., 1998. UCI repository of machine learning databases, a huge collection of artificial and real-world data sets. Available from: .

Dave, R.N., 1996. Validating fuzzy partitions obtained through c-shells clustering. Pattern Recognition Lett. 17, 613–623.

Duda, R.O., Hart, P.E., 1973. Pattern Classification and Scene Analysis. Wiley, New York.

Fukuyama, Y., Sugeno, M., 1989. A new method of choosing the number of clusters for fuzzy c-means method. In: Proceedings of the 5th Fuzzy System Symposium (in Japanese), pp. 247–250.

Gath, I., Geva, A.B., 1989. Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 11, 773–781.

Gunderson, M., 1978. Application of fuzzy ISODATA algorithms to star tracker pointing systems. In: Proceedings of the 7th Triennial World IFAC Congress, Helsinki, Finland, pp. 1319–1323.

Gustafson, D.E., Kessel, W.C., 1979. Fuzzy clustering with a fuzzy covariance matrix. In: Proceedings of the IEEE Conference on Decision and Control, San Diego, CA, pp. 761–766.

Huber, P.J., 1981. Robust Statistics. Wiley, New York.

Jain, A.K., Dubes, R.C., 1988. Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs, NJ.

Kaufman, L., Rousseeuw, P.J., 1990. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

Krishnapuram, R., Kim, J., 2000. Clustering algorithms based on volume criteria. IEEE Trans. Fuzzy Syst. 8, 228–236.

Özdemir, D., Akarun, L., 2001. Fuzzy algorithms for combined quantization and dithering. IEEE Trans. Image Processing 10 (6), 923–931.

Özdemir, D., Akarun, L., 2002. A fuzzy algorithm for color quantization of images. Pattern Recognition 35, 1785–1791.

Pal, N.R., Bezdek, J.C., 1995. On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst., 370–379.

Rousseeuw, P.J., Kaufman, L., Trauwaert, E., 1996. Fuzzy clustering using scatter matrices. Comput. Statist. Data Anal. 23, 135–151.

Sugeno, M., Yasukawa, T., 1993. A fuzzy-logic-based approach to qualitative modeling. IEEE Trans. Fuzzy Syst. 1, 7–31.

Wu, K.L., Yang, M.S., 2002. Alternative c-means clustering algorithms. Pattern Recognition 35, 2267–2278.

Xie, X.L., Beni, G., 1991. A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 13, 841–847.

Yang, M.S., 1993. A survey of fuzzy clustering. Math. Comput. Modelling 18, 1–16.

Yang, M.S., Hu, Y.J., Lin, K.C.R., Lin, C.C.L., 2002. Segmentation techniques for tissue differentiation in MRI of ophthalmology using fuzzy clustering algorithms. Magn. Reson. Imaging 20, 173–179.

Yang, M.S., Wu, K.L., Yu, J., 2003. A novel fuzzy clustering algorithm. In: Proceedings of the 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA2003), Kobe, Japan, pp. 647–652.

Yu, J., Cheng, Q., Huang, H., 2004. Analysis of the weighting exponent in the FCM. IEEE Trans. Syst. Man Cybernet. Part B 34, 634–638.

Zadeh, L.A., 1965. Fuzzy sets. Inform. Contr. 8, 338–353.

Zahid, N., Limouri, M., Essaid, A., 1999. A new cluster-validity for fuzzy clustering. Pattern Recognition 32, 1089–1097.
