Pattern Recognition Letters 26 (2005) 639–652
www.elsevier.com/locate/patrec
A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests

Kuo-Lung Wu a, Jian Yu b, Miin-Shen Yang a,*

a Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li 32023, Taiwan, ROC
b Department of Computer Science, Beijing Jiaotong University, Beijing 100044, PR China

Received 16 January 2004
Available online 28 October 2004
Abstract

Most clustering algorithms are based on a within-cluster scatter matrix with a compactness measure. In this paper we propose a novel fuzzy clustering algorithm, called fuzzy compactness and separation (FCS), based on a fuzzy scatter matrix, in which the FCS algorithm is derived by minimizing a compactness measure and maximizing a separation measure. The compactness is measured using a fuzzy within-cluster variation. The separation is measured using a fuzzy between-cluster variation. The proposed FCS objective function is a modification of the FS validity index proposed by Fukuyama and Sugeno and also a generalization of fuzzy c-means (FCM). The FCS algorithm assigns a crisp boundary (cluster kernel) to each cluster so that hard memberships and fuzzy memberships can co-exist in the clustering results. Thus, FCS can be seen as a clustering algorithm in a novel sense between hard c-means and fuzzy c-means. The FCS optimality tests and parameter selection are also investigated. Some numerical examples are presented to demonstrate its robustness and effectiveness.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Fuzzy clustering algorithm; Scatter matrix; Within-cluster variation; Between-cluster variation; Fuzzy compactness and separation
1. Introduction

* Corresponding author. Tel.: +886 3 456 3171; fax: +886 3 456 3160.
E-mail address: msyang@math.cycu.edu.tw (M.-S. Yang).
Cluster analysis is a branch of statistical multivariate analysis and unsupervised learning in pattern recognition. It is a method for partitioning a data set into groups such that data points in the same cluster are as similar as possible and data points in different clusters are as dissimilar as possible.

0167-8655/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.patrec.2004.09.016
Clustering applications in various areas have been well documented (Duda and Hart, 1973; Jain and Dubes, 1988; Kaufman and Rousseeuw, 1990). Among clustering methods, the hard c-means (or k-means) and fuzzy c-means (FCM) clustering algorithms are the best known (Bezdek, 1981; Jain and Dubes, 1988; Yang, 1993). Most of these methods are based on minimizing the trace of the within-cluster scatter matrix. The within-cluster scatter matrix trace can be interpreted as a compactness measure with a within-cluster variation. Because the clusters obtained using k-means and FCM are roughly spherical with similar volumes, many clustering algorithms, such as the Gustafson–Kessel (G–K) algorithm (Gustafson and Kessel, 1979), the sum of all normalized determinants (SAND) algorithm (Rouseeuw et al., 1996), the minimum scatter volume (MSV) and minimum cluster volume (MCV) algorithms (Krishnapuram and Kim, 2000), and the unsupervised fuzzy partition-optimal number of classes (UFP-ONC) algorithm (Gath and Geva, 1989), were proposed to accommodate elliptical clusters with different volumes. These algorithms are all based on a within-cluster scatter matrix with a compactness measure.
The concept of adopting a separation measure in clustering is widely used in solving cluster validity problems, for example the separation coefficient proposed by Gunderson (1978), the XB index proposed by Xie and Beni (1991), the FS index proposed by Fukuyama and Sugeno (1989), the SC index proposed by Zahid et al. (1999), and the F_HV (fuzzy hyper-volume) and P_D (partition density) indexes proposed by Gath and Geva (1989). Özdemir and Akarun (2001) proposed an inter-cluster separation (ICS) clustering algorithm that includes a separation measure in the ICS objective function. Because the between-cluster scatter matrix trace can be interpreted as a separation measure with a between-cluster variation, maximizing the between-cluster scatter matrix trace will induce a result with well-separated clusters (Yang et al., 2003).
In this paper, we propose a novel fuzzy clustering algorithm, called the fuzzy compactness and separation (FCS) algorithm. The FCS objective function is based on a fuzzy scatter matrix. The FCS algorithm is derived by minimizing the compactness measure and simultaneously maximizing the separation measure (Yang et al., 2003). The compactness is measured using the trace of a fuzzy within-cluster scatter matrix. The separation is measured using the trace of a fuzzy between-cluster scatter matrix. In k-means, data points always have crisp membership values of zero or one. Although FCM allows data points to have fuzzy membership values between zero and one, it does not exactly produce a zero or one for the membership values. In the proposed FCS algorithm, crisp and fuzzy membership values can co-exist. These FCS properties will be discussed. We will also show that, when the weighting exponent m is large, the FCS algorithm is more robust to noise and outliers than FCM. A theoretical analysis of FCS will be given. Yu et al. (2004) gave a theoretical upper bound for the weighting exponent m in FCM under which the grand sample mean x̄ is a unique optimizer of the FCM objective function. In this paper, we will show that FCS, with its different cluster kernel characteristic, can avoid the situation in which x̄ is a unique optimizer of the FCS objective function. We also study the optimality tests. These results will be used for FCS parameter selection. The paper is organized as follows. In Section 2, the (crisp) scatter matrix is extended to the fuzzy scatter matrix. Some clustering algorithms based on the within-cluster scatter matrix are then reviewed. In Section 3, we propose the novel fuzzy clustering algorithm based on the fuzzy within- and between-cluster scatter matrices. Section 4 gives our theoretical analysis of the optimality tests and the FCS parameter selection. Section 5 gives the robustness properties of FCS based on the gross error sensitivity and the influence function. Some numerical examples are presented in Section 6. Conclusions are made in Section 7.
2. Clustering algorithms based on a within-cluster scatter matrix
Let X = {x_1, ..., x_n} be a data set in an s-dimensional Euclidean space R^s and let c be a positive integer larger than one. A partition of X into c clusters can be presented using mutually disjoint sets X_1, ..., X_c such that X_1 ∪ ... ∪ X_c = X, or
equivalently using the indicator functions μ_1, ..., μ_c such that μ_ij = μ_i(x_j) = 1 if x_j ∈ X_i and μ_ij = μ_i(x_j) = 0 if x_j ∉ X_i. Let the sample mean of the ith cluster be

a_i = \frac{\sum_{x_j \in X_i} x_j}{n_i} = \frac{\sum_{j=1}^{n} \mu_{ij} x_j}{\sum_{j=1}^{n} \mu_{ij}}, \quad i = 1, \ldots, c, \; j = 1, \ldots, n,  (1)
where n_i is the number of data points in X_i. Let the grand mean be \bar{x} = \sum_{j=1}^{n} x_j / n. The total scatter matrix S_T for the data set X can then be decomposed into a within-cluster scatter matrix S_W and a between-cluster scatter matrix S_B with S_T = S_W + S_B, where

S_T = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - \bar{x})(x_j - \bar{x})^t = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (x_j - \bar{x})(x_j - \bar{x})^t,  (2)

S_W = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - a_i)(x_j - a_i)^t = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (x_j - a_i)(x_j - a_i)^t,  (3)

S_B = \sum_{i=1}^{c} \sum_{x_j \in X_i} (a_i - \bar{x})(a_i - \bar{x})^t = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (a_i - \bar{x})(a_i - \bar{x})^t.  (4)
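As a sanity check, the decomposition S_T = S_W + S_B in Eqs. (2)–(4) can be verified numerically. The following sketch is our own illustration (variable names and the synthetic data are not from the paper): it computes the three crisp scatter matrices for a labeled two-cluster sample and confirms the identity.

```python
import numpy as np

def scatter_matrices(X, labels, c):
    """Compute the crisp scatter matrices S_T, S_W, S_B of Eqs. (2)-(4)."""
    n, s = X.shape
    xbar = X.mean(axis=0)                      # grand mean
    S_W = np.zeros((s, s))
    S_B = np.zeros((s, s))
    for i in range(c):
        Xi = X[labels == i]
        a_i = Xi.mean(axis=0)                  # cluster sample mean, Eq. (1)
        D = Xi - a_i
        S_W += D.T @ D                         # within-cluster scatter, Eq. (3)
        S_B += len(Xi) * np.outer(a_i - xbar, a_i - xbar)  # between-cluster, Eq. (4)
    D = X - xbar
    S_T = D.T @ D                              # total scatter, Eq. (2)
    return S_T, S_W, S_B

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (30, 2))])
labels = np.array([0] * 20 + [1] * 30)
S_T, S_W, S_B = scatter_matrices(X, labels, c=2)
assert np.allclose(S_T, S_W + S_B)             # the decomposition holds
```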
Duda and Hart (1973) noted that the determinant |S_W| of the within-cluster scatter matrix can serve as a criterion function for clustering; |S_W| can be interpreted as the square of the scatter volume. On the basis of |S_W|, Rouseeuw et al. (1996) created the so-called SAND algorithm.

Let tr(A) denote the trace of a matrix A. The trace tr(S_W) of the within-cluster scatter matrix can be used to measure compactness via the within-cluster variation, so it is reasonable to use tr(S_W) as a clustering objective function. The k-means and most other clustering algorithms are created by minimizing an objective function based on tr(S_W). Because tr(S_T) = tr(S_W) + tr(S_B) for a given data set X, and tr(S_T) is independent of the cluster centers a_i, minimizing tr(S_W) is equivalent to maximizing tr(S_B). Moreover, the quantity tr(S_B) can be used as a measure of separation. Thus, building an objective function that can be optimized by minimizing tr(S_W) and simultaneously maximizing tr(S_B) makes sense, and this is the main goal of this paper.
Since Zadeh (1965) introduced the fuzzy set concept, research on fuzzy clustering has been widely pursued (Bezdek, 1981; Yang, 1993). In fuzzy clustering, FCM is the most widely used clustering algorithm. Suppose that μ_ij ∈ [0, 1] with \sum_{i=1}^{c} \mu_{ij} = 1 for all j, and that m > 1 is a given real value. The FCM update equation for the fuzzy sample mean is

a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m x_j}{\sum_{j=1}^{n} \mu_{ij}^m}, \quad i = 1, \ldots, c, \; j = 1, \ldots, n.  (5)
Thus, we define the fuzzy total scatter matrix S_FT, the fuzzy within-cluster scatter matrix S_FW and the fuzzy between-cluster scatter matrix S_FB on the basis of the fuzzy sample mean a_i as

S_{FT} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (x_j - \bar{x})(x_j - \bar{x})^t,  (6)

S_{FW} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i)(x_j - a_i)^t,  (7)

S_{FB} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (a_i - \bar{x})(a_i - \bar{x})^t,  (8)

where μ_ij ∈ [0, 1], \sum_{i=1}^{c} \mu_{ij} = 1 and m > 1. We know that
S_{FT} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i + a_i - \bar{x})(x_j - a_i + a_i - \bar{x})^t
= \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \left[(x_j - a_i)(x_j - a_i)^t + (a_i - \bar{x})(a_i - \bar{x})^t + (x_j - a_i)(a_i - \bar{x})^t + (a_i - \bar{x})(x_j - a_i)^t\right]
= S_{FW} + S_{FB} + \sum_{i=1}^{c} \left[\sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i)\right](a_i - \bar{x})^t + \sum_{i=1}^{c} (a_i - \bar{x}) \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i)^t.
According to Eq. (5), we have \sum_{j=1}^{n} \mu_{ij}^m a_i = \sum_{j=1}^{n} \mu_{ij}^m x_j. Thus, we obtain the property that S_FT = S_FW + S_FB when a_i is defined by Eq. (5). This property is exactly the same as S_T = S_W + S_B for the (crisp) scatter matrix.
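The fuzzy decomposition S_FT = S_FW + S_FB can likewise be checked numerically. The sketch below is our own illustration (not the authors' code): it builds the fuzzy scatter matrices of Eqs. (6)–(8) from random memberships and the fuzzy means of Eq. (5), and confirms that the cross terms vanish because \sum_j \mu_{ij}^m (x_j - a_i) = 0.

```python
import numpy as np

def fuzzy_scatter(X, U, m):
    """Compute S_FT, S_FW, S_FB of Eqs. (6)-(8) with fuzzy means from Eq. (5)."""
    n, s = X.shape
    c = U.shape[0]
    xbar = X.mean(axis=0)                       # grand mean
    W = U ** m                                  # mu_ij^m, shape (c, n)
    A = (W @ X) / W.sum(axis=1, keepdims=True)  # fuzzy sample means a_i, Eq. (5)
    S_FT = np.zeros((s, s)); S_FW = np.zeros((s, s)); S_FB = np.zeros((s, s))
    for i in range(c):
        for j in range(n):
            dT = X[j] - xbar
            dW = X[j] - A[i]
            dB = A[i] - xbar
            S_FT += W[i, j] * np.outer(dT, dT)  # Eq. (6)
            S_FW += W[i, j] * np.outer(dW, dW)  # Eq. (7)
            S_FB += W[i, j] * np.outer(dB, dB)  # Eq. (8)
    return S_FT, S_FW, S_FB

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
U = rng.random((3, 40))
U /= U.sum(axis=0)                              # enforce sum_i mu_ij = 1
S_FT, S_FW, S_FB = fuzzy_scatter(X, U, m=2.0)
assert np.allclose(S_FT, S_FW + S_FB)           # fuzzy decomposition holds
```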
Fuzzy clustering algorithms including FCM (Bezdek, 1981), alternative FCM (Wu and Yang, 2002; Yang et al., 2002), G–K (Gustafson and Kessel, 1979), SAND (Rouseeuw et al., 1996), MCV (Krishnapuram and Kim, 2000) and UFP-ONC (Gath and Geva, 1989) are all based on the fuzzy within-cluster scatter matrix S_FW.
It is known that the FCM clustering algorithm is created by minimizing the objective function

J_{FCM} = \mathrm{tr}(S_{FW}) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|x_j - a_i\|^2,  (9)

with the membership update equation

\mu_{ij} = \frac{(\|x_j - a_i\|^2)^{-1/(m-1)}}{\sum_{k=1}^{c} (\|x_j - a_k\|^2)^{-1/(m-1)}}  (10)
and the cluster center update Eq. (5). Although tr(S_FT) = tr(S_FW) + tr(S_FB) for a given data set X, tr(S_FT) is not a fixed constant but depends on μ_ij. Thus, minimizing tr(S_FW) does not necessarily maximize tr(S_FB). The trace tr(S_FB) of the fuzzy between-cluster scatter matrix can be interpreted as a separation measure with a between-cluster variation. A large value of tr(S_FB) will induce a clustering result with separated (distinguishable) clusters. In the next section, an algorithm that considers tr(S_FW) and tr(S_FB) simultaneously is introduced.
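Equations (5), (9) and (10) define the familiar alternating-optimization scheme of FCM. A minimal Python sketch of this iteration (our own illustration with synthetic data, not the authors' code) also exhibits the property noted earlier: FCM memberships stay strictly between zero and one.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100):
    """Minimal FCM: alternate Eq. (10) memberships and Eq. (5) centers."""
    A = X[[0, -1]].astype(float)                       # simple deterministic init
    for _ in range(iters):
        d2 = ((X[None, :, :] - A[:, None, :]) ** 2).sum(-1)  # ||x_j - a_i||^2, (c, n)
        d2 = np.maximum(d2, 1e-12)                     # guard against division by zero
        U = d2 ** (-1.0 / (m - 1.0))                   # Eq. (10), unnormalized
        U /= U.sum(axis=0)                             # normalize over clusters
        W = U ** m
        A = (W @ X) / W.sum(axis=1, keepdims=True)     # Eq. (5)
    return U, A

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])
U, A = fcm(X, c=2)
# Memberships are strictly inside (0, 1): FCM never yields exact 0/1 values.
assert U.min() > 0.0 and U.max() < 1.0
```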
3. A proposed fuzzy clustering algorithm based on tr(S_FW) and tr(S_FB)
Fukuyama and Sugeno (1989) and Sugeno et al. (1993) used tr(S_FW) and tr(S_FB) to create the index FS(c) for the cluster validity problem, and Pal and Bezdek (1995) gave further discussion of cluster validity for FCM. The validity index FS(c) was formed as

FS(c) = \mathrm{tr}(S_{FW}) - \mathrm{tr}(S_{FB}) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|x_j - a_i\|^2 - \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|a_i - \bar{x}\|^2.  (11)
A small FS(c) value indicates a good fuzzy clustering result with a small fuzzy within-cluster variation tr(S_FW) and a large fuzzy between-cluster variation tr(S_FB). This helps us find a good estimate of the number of clusters. It is reasonable for a clustering objective function to contain measures of the within- and between-cluster variations, as the FS(c) index does. However, differentiating FS(c) with respect to a_i does not yield an update equation for a_i. Thus, FS(c) cannot itself serve as a clustering objective function.
In this section, we propose a novel fuzzy clustering objective function that is a modification of the FS(c) index. It can also be viewed as a generalization of the FCM objective function that combines the fuzzy within- and between-cluster variations. Our goal is to minimize the fuzzy within-cluster variation tr(S_FW) and simultaneously maximize the fuzzy between-cluster variation tr(S_FB). We call this the fuzzy compactness and separation (FCS) algorithm, because the compactness is measured using the fuzzy within-cluster variation and the separation is measured using the fuzzy between-cluster variation. The FCS objective function J_FCS is defined as

J_{FCS} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|x_j - a_i\|^2 - \sum_{i=1}^{c} \sum_{j=1}^{n} \eta_i \mu_{ij}^m \|a_i - \bar{x}\|^2,  (12)
where η_i ≥ 0. Note that J_FCS = J_FCM when η_i = 0, and J_FCS = FS(c) when η_i = 1. By minimizing J_FCS we have the following update equations:

\mu_{ij} = \frac{(\|x_j - a_i\|^2 - \eta_i \|a_i - \bar{x}\|^2)^{-1/(m-1)}}{\sum_{k=1}^{c} (\|x_j - a_k\|^2 - \eta_k \|a_k - \bar{x}\|^2)^{-1/(m-1)}}  (13)

and

a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m x_j - \eta_i \bar{x} \sum_{j=1}^{n} \mu_{ij}^m}{\sum_{j=1}^{n} \mu_{ij}^m - \eta_i \sum_{j=1}^{n} \mu_{ij}^m},  (14)

where the parameter η_i can be set as

\eta_i = \frac{(\beta/4) \min_{i' \neq i} \|a_i - a_{i'}\|^2}{\max_k \|a_k - \bar{x}\|^2}, \quad 0 \leq \beta \leq 1.0.  (15)
In fuzzy clustering, we restrict μ_ij ∈ [0, 1]. Because μ_ij in Eq. (13) might be negative for some data point x_j, we place some restrictions on it. For a given data point x_j, if \|x_j - a_i\|^2 \leq \eta_i \|a_i - \bar{x}\|^2, then μ_ij = 1 and μ_{i'j} = 0 for all i' ≠ i. That is, a data point whose squared distance to the ith cluster center is smaller than \eta_i \|a_i - \bar{x}\|^2 belongs exactly to the ith cluster with membership value one. Each cluster in FCS thus has a crisp boundary such that all data points inside the boundary have crisp membership values μ_ij ∈ {0, 1}, while data points outside the boundary have fuzzy membership values μ_ij ∈ [0, 1]. Each crisp boundary forms a hyperball for the corresponding cluster and can be seen as a cluster kernel. Fig. 1 shows a two-cluster data set in which each cluster contains a cluster center and a cluster kernel. The volume of each cluster kernel is decided by the term \eta_i \|a_i - \bar{x}\|^2. Data points outside the kernel have fuzzy memberships. Note that Özdemir and Akarun (2002) proposed the partition index maximization (PIM) algorithm, which uses a fixed cluster kernel volume for every cluster. In our FCS, the kernel volume differs from cluster to cluster. This FCS characteristic can capture more information from data with different volumes and shapes.
Fig. 1. Clusters obtained by FCS, showing each cluster's center, cluster kernel, crisp boundary and fuzzy boundary.

In the k-means algorithm, each data point has a crisp membership value μ_ij ∈ {0, 1}. Although FCM allows data points to have membership values μ_ij in the interval [0, 1], it rarely produces crisp membership values (i.e., exactly zero or one), even when a data point is very close to one of the c cluster centers. The memberships in FCM therefore seem too fuzzy. In our FCS, crisp and fuzzy membership values co-exist. Data points that fall inside any one of the c cluster kernels (i.e., close to one of the c cluster centers) have crisp memberships, and those outside the cluster kernels (i.e., far away from all cluster centers) have fuzzy membership values. To guarantee that no two of the c cluster kernels overlap, η_i is chosen as in Eq. (15), where the parameter β controls the size of each kernel. Since, for all i,
\eta_i \|a_i - \bar{x}\|^2 = (\beta/4) \left(\min_{i' \neq i} \|a_i - a_{i'}\|^2\right) \left(\frac{\|a_i - \bar{x}\|^2}{\max_k \|a_k - \bar{x}\|^2}\right) \leq \beta \min_{i' \neq i} \left(\frac{\|a_i - a_{i'}\|}{2}\right)^2 \leq \min_{i' \neq i} \left(\frac{\|a_i - a_{i'}\|}{2}\right)^2 \quad \text{for all } 0 \leq \beta \leq 1,

we have, for a given data point x_j with μ_ij = 1,

\|x_j - a_i\|^2 \leq \eta_i \|a_i - \bar{x}\|^2 \leq \min_{i' \neq i} \left(\frac{\|a_i - a_{i'}\|}{2}\right)^2.
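This bound is easy to confirm numerically. In the sketch below (cluster centers and grand mean are hypothetical values of our own choosing), η_i from Eq. (15) with the largest allowed β keeps every squared kernel radius at or below the squared half-distance to the nearest other center, so kernels cannot overlap.

```python
import numpy as np

A = np.array([[0.0, 0.0], [4.0, 1.0], [9.0, -2.0]])    # hypothetical centers a_i
xbar = np.array([4.3, -0.3])                            # hypothetical grand mean
beta = 1.0                                              # largest allowed kernels

cd2 = ((A[:, None] - A[None, :]) ** 2).sum(-1)          # pairwise ||a_i - a_i'||^2
np.fill_diagonal(cd2, np.inf)                           # ignore i' = i
eta = (beta / 4.0) * cd2.min(axis=1) / ((A - xbar) ** 2).sum(-1).max()  # Eq. (15)
radius2 = eta * ((A - xbar) ** 2).sum(-1)               # eta_i ||a_i - xbar||^2
# Squared kernel radius never exceeds the squared half-distance bound.
assert np.all(radius2 <= cd2.min(axis=1) / 4 + 1e-12)
```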
Note that \|a_i - a_{i'}\|/2 is half the distance between the cluster centers a_i and a_{i'}. Thus, Eq. (15) guarantees that no two of the c cluster kernels overlap. If β = 1.0, the FCS algorithm clusters the data set using the largest kernel for each cluster. If β = 0 (i.e., η_i = 0), the FCS algorithm clusters the data set with no cluster kernels, which is equivalent to the FCM clustering algorithm.
The cluster center update Eq. (14) in FCS can be interpreted as a weighted mean of the FCM cluster center update and the grand mean \bar{x}. We can rewrite Eq. (14) as

a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m x_j / \sum_{j=1}^{n} \mu_{ij}^m - \eta_i \bar{x}}{1 - \eta_i}.  (16)

The weights of the FCM cluster center and the grand mean \bar{x} are 1 and -\eta_i, respectively. To maximize tr(S_FB), the cluster centers obtained by FCS
will move away from \bar{x} with the corresponding weight η_i. Since η_i is a monotone increasing function of β, the cluster centers obtained by FCS with a large β value will be farther from \bar{x} than those obtained with a small β value. The proposed FCS clustering algorithm is summarized as follows:
FCS Algorithm (see also Yang et al. (2003)). Set the iteration counter ℓ = 0 and choose initial values a_i^{(0)}, i = 1, ..., c. Given β and ε > 0:

Step 1. Find η_i^{(ℓ+1)} using Eq. (15).
Step 2. Find μ_ij^{(ℓ+1)} using Eq. (13).
Step 3. Find a_i^{(ℓ+1)} using Eq. (14).

Increment ℓ; repeat until \max_i \|a_i^{(\ell+1)} - a_i^{(\ell)}\| < ε.
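Under the stated update rules, a minimal implementation of this iteration might look as follows. This is our own sketch, not the authors' code: the crisp-kernel rule assigns membership one inside a kernel (splitting evenly in the unlikely transient case that a point falls inside two kernels), and Eq. (13) applies outside.

```python
import numpy as np

def fcs(X, c, m=2.0, beta=0.5, eps=1e-6, max_iter=200):
    """Sketch of the FCS iteration: Steps 1-3 with Eqs. (13)-(15)."""
    A = X[[0, -1]].astype(float)                              # simple deterministic init
    xbar = X.mean(axis=0)
    for _ in range(max_iter):
        # Step 1: eta_i from Eq. (15)
        cd2 = ((A[:, None] - A[None, :]) ** 2).sum(-1)
        np.fill_diagonal(cd2, np.inf)
        eta = (beta / 4.0) * cd2.min(axis=1) / ((A - xbar) ** 2).sum(-1).max()
        # Step 2: memberships from Eq. (13) with the crisp-kernel rule
        d2 = ((X[None, :, :] - A[:, None, :]) ** 2).sum(-1)   # (c, n)
        g = d2 - (eta * ((A - xbar) ** 2).sum(-1))[:, None]   # shifted distances
        U = np.zeros_like(g)
        inside = g <= 0                                       # point lies in kernel i
        crisp = inside.any(axis=0)
        U[:, crisp] = inside[:, crisp] / inside[:, crisp].sum(axis=0)
        fuzzy = ~crisp
        Uf = np.maximum(g[:, fuzzy], 1e-12) ** (-1.0 / (m - 1.0))
        U[:, fuzzy] = Uf / Uf.sum(axis=0)                     # Eq. (13)
        # Step 3: centers from Eq. (14)
        W = U ** m
        Wsum = W.sum(axis=1, keepdims=True)
        A_new = ((W @ X) - eta[:, None] * Wsum * xbar) / ((1 - eta)[:, None] * Wsum)
        done = np.abs(A_new - A).max() < eps
        A = A_new
        if done:
            break
    return U, A, eta

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])
U, A, eta = fcs(X, c=2, beta=0.05)
# Crisp and fuzzy memberships co-exist: some exact ones, some strictly in (0, 1).
assert (U == 1.0).any() and ((U > 0) & (U < 1)).any()
```

With β = 0 the kernel test never fires and the iteration reduces to FCM, matching the limiting case described above.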
We now use a simple example to illustrate the FCS properties. The data set is a sample drawn from the one-dimensional normal mixture $(1/3)N(0,1) + (1/3)N(4,1) + (1/3)N(10,1)$ with three populations of means 0, 4 and 10. The histogram of this data set is shown in Fig. 2(a). Fig. 2(b) shows the FCM membership function of the cluster with mean 0. Fig. 2(c)–(f) show the FCS membership functions of the mean 0 cluster for the parameter values $\beta$ = 0.9, 0.5, 0.1 and 0.05. The cluster kernel volumes always decrease as $\beta$ decreases. The volume of a cluster kernel can also be read off as the range over which the cluster membership function equals one. In our 3-cluster example, Fig. 2(b)–(f) draw only the membership function of the mean 0 cluster. Thus, the range with membership value one presents the volume of the mean 0 cluster kernel, and the range with membership value zero presents the volumes of the mean 4 and mean 10 cluster kernels. Crisp membership values and fuzzy membership values co-exist in FCS. When $\beta = 0$, the FCS membership values are equivalent to those of FCM, with no cluster kernels. The three cluster centers obtained using FCM and FCS are shown in Table 1. As $\beta$ decreases, the cluster centers obtained by FCS move closer to the grand mean $\bar{x}$ and also closer to the cluster centers obtained by FCM. These results coincide with the property shown in Eq. (15).
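The approach of the FCS centers to the FCM centers as $\beta$ (and hence $g_i$) shrinks can be seen directly from Eq. (16): holding the memberships fixed, the update is $a_i = (\bar{x}_w - g_i\bar{x})/(1-g_i)$ with $\bar{x}_w$ the FCM weighted mean, so the displacement from $\bar{x}_w$ grows with $g_i$. A small sketch (holding the memberships fixed is an assumption made only for illustration; in the full algorithm they are re-estimated at every iteration, and the grand-mean value used here is hypothetical):

```python
import numpy as np

# One application of the center update Eq. (16) for several values of g,
# with the fuzzy memberships held fixed for illustration.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=50)
xbar_grand = 4.0                        # hypothetical grand mean of a larger data set
u = rng.uniform(0.2, 1.0, size=50)      # fixed fuzzy memberships for one cluster
m = 2.0

um = u ** m
fcm_center = (um * X).sum() / um.sum()  # the g = 0 (FCM) case

def fcs_center(g):
    # Eq. (16): FCM weighted mean pushed away from xbar by the weight g
    return (fcm_center - g * xbar_grand) / (1.0 - g)

# displacement from the FCM center grows monotonically with g
shifts = [abs(fcs_center(g) - fcm_center) for g in (0.0, 0.05, 0.1, 0.5)]
```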
Fig. 2. Membership functions of FCM and FCS: (a) histogram of the data set; (b) FCM membership function for the mean 0 cluster; (c)–(f) FCS membership functions for the mean 0 cluster with beta = 0.9, 0.5, 0.1 and 0.05.
Table 1
Cluster centers obtained by FCS

beta       Cluster centers
0.900      -0.366    3.667    12.319
0.500      -0.281    3.687    11.221
0.200      -0.164    3.714    10.557
0.100      -0.111    3.726    10.356
0.050      -0.083    3.733    10.256
0 (FCM)     0.052    3.740    10.156
Although FCM is a popular clustering algorithm, the cluster centers it obtains are drawn toward the grand mean $\bar{x}$ when the data set heavily overlaps. According to the FCS properties, the cluster centers obtained by FCS can be more accurate than those of FCM in some situations. This will be illustrated in Section 5. In the next section, more details about the FCS parameters are presented.
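The pull of the FCM centers toward the grand mean can be checked numerically. The following sketch runs a plain FCM loop (a minimal implementation of our own, not the paper's code) on two overlapping one-dimensional populations and compares the separation between the estimated centers for a small and a large fuzzifier $m$.

```python
import numpy as np

def fcm(X, c, m, iters=100):
    """Minimal FCM: alternate membership and center updates."""
    a = np.quantile(X, np.linspace(0.1, 0.9, c), axis=0)
    for _ in range(iters):
        d2 = ((X[:, None, :] - a[None, :, :]) ** 2).sum(-1) + 1e-12
        w = d2 ** (-1.0 / (m - 1.0))
        u = w / w.sum(axis=1, keepdims=True)
        um = u ** m
        a = (um.T @ X) / um.sum(axis=0)[:, None]
    return a

rng = np.random.default_rng(0)
# two heavily overlapping populations with means 0 and 3
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])[:, None]

# separation between the two estimated centers shrinks as m grows,
# i.e. both centers are pulled toward the grand mean
sep_small_m = np.ptp(fcm(X, 2, m=1.5).ravel())
sep_large_m = np.ptp(fcm(X, 2, m=5.0).ravel())
```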
4. Optimality tests and parameter selection of FCS

In general, when the data set is clustered into c (c > 1) subsets, each subset is expected to have a different prototype (or cluster center) from the others. However, the grand sample mean $\bar{x}$ of the data set is always a fixed point of the FCS algorithm. The FCS output will be $\bar{x}$ with great probability if $\bar{x}$ is a stable solution of the FCS algorithm. To avoid such cases, we hope that $\bar{x}$ is not an attractive point of FCS. How do we judge whether $\bar{x}$ is an attractive or a stable point of FCS? The Hessian matrix of the FCS objective function (12) must be studied. To simplify the calculations, substituting (13) into (12) yields (17):
$$J = \sum_{j=1}^{n}\left(\sum_{i=1}^{c}\left(\|x_j-a_i\|^{2} - g_i\|a_i-\bar{x}\|^{2}\right)^{-\frac{1}{m-1}}\right)^{1-m}. \qquad (17)$$
It can be proved that $J = \min_{\mu} J_{FCS}$. Therefore, it suffices to judge whether $\bar{x}$ is an attractive or a stable point of FCS using the Hessian matrix of (17). Let us set

$$q(x_j; a_i) = \|x_j - a_i\|^{2} - g_i\|a_i - \bar{x}\|^{2},$$

$$S_j = \sum_{i=1}^{c} q(x_j; a_i)^{-\frac{1}{m-1}}, \qquad \mu_{ij} = \frac{q(x_j; a_i)^{-\frac{1}{m-1}}}{S_j}$$

and

$$h_{ij} = \mu_{ij}^{m}\left[(1-g_i)a_i - (x_j - g_i\bar{x})\right].$$
We know that

$$(1)\quad \frac{\partial J}{\partial a_i} = 2\sum_{j=1}^{n} \mu_{ij}^{m}\left[(1-g_i)a_i - (x_j - g_i\bar{x})\right],$$

$$(2)\quad \frac{\partial^{2} J}{\partial a_i\,\partial a_k} = \frac{4m}{m-1}\sum_{j=1}^{n}(S_j)^{m-1}\,h_{ij}(h_{kj})^{t} + 2\delta_{ik}\sum_{j=1}^{n}\mu_{ij}^{m}(1-g_i)\,I_{s\times s} - \frac{4m}{m-1}\,\delta_{ik}\sum_{j=1}^{n}\left[q(x_j; a_i)\right]^{-1} h_{ij}\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]^{t}, \quad \forall i,\ \forall j. \qquad (18)$$
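The gradient formula (1) can be verified against finite differences. The sketch below evaluates the objective (17) and the analytic gradient under the fixed $g_i = g$ simplification (an assumption made only to keep the check short), at centers chosen so that $q(x_j; a_i) > 0$ for all points; it also confirms that the grand mean, replicated c times, is always a stationary point.

```python
import numpy as np

def J(a, X, m, g):
    # objective (17) with all g_i = g; valid while q(x_j; a_i) > 0
    xbar = X.mean(0)
    q = ((X[:, None, :] - a[None]) ** 2).sum(-1) - g * ((a - xbar) ** 2).sum(-1)[None]
    return ((q ** (-1.0 / (m - 1.0))).sum(1) ** (1.0 - m)).sum()

def grad(a, X, m, g):
    # formula (1): dJ/da_i = 2 * sum_j mu_ij^m [(1-g) a_i - (x_j - g xbar)]
    xbar = X.mean(0)
    q = ((X[:, None, :] - a[None]) ** 2).sum(-1) - g * ((a - xbar) ** 2).sum(-1)[None]
    w = q ** (-1.0 / (m - 1.0))
    u = w / w.sum(1, keepdims=True)
    h = (u ** m)[:, :, None] * ((1 - g) * a[None] - (X[:, None, :] - g * xbar))
    return 2.0 * h.sum(0)

def grad_fd(a, X, m, g, eps=1e-6):
    # central finite differences, coordinate by coordinate
    out = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        ap, am = a.copy(), a.copy()
        ap[idx] += eps
        am[idx] -= eps
        out[idx] = (J(ap, X, m, g) - J(am, X, m, g)) / (2 * eps)
    return out
```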
Therefore, the second-order term of the Taylor series expansion of (17) can be expressed as follows:

$$u^{t}\left(\frac{\partial^{2} J}{\partial a_i\,\partial a_k}\right)u\bigg|_{a} = \frac{4m}{m-1}\sum_{j=1}^{n}(S_j)^{m-1}\left(\sum_{i=1}^{c}\mu_{ij}^{m}\,u_{a_i}^{t}\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]\right)^{2} + 2\sum_{i=1}^{c}(1-g_i)\sum_{j=1}^{n}\mu_{ij}^{m}\,u_{a_i}^{t}u_{a_i} - \frac{4m}{m-1}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\,\frac{u_{a_i}^{t}\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]^{t}u_{a_i}}{q(x_j; a_i)}. \qquad (19)$$
If $\forall i$, $a_i = \bar{x}$, with $C_X^{\,b} = \sum_{j=1}^{n}\frac{(x_j-\bar{x})(x_j-\bar{x})^{t}}{n\|x_j-\bar{x}\|^{2}}$ and $\forall j$, $\|x_j-\bar{x}\| > 0$, then we get the following equation:

$$u^{t}\left(\frac{\partial^{2} J}{\partial a_i\,\partial a_k}\right)u\bigg|_{a=(a_1,a_2,\ldots,a_c),\ a_i=\bar{x}\ \forall i} = \frac{4m}{c^{m}(m-1)}\sum_{j=1}^{n}\frac{\left(\sum_{i=1}^{c}u_{a_i}^{t}(x_j-\bar{x})\right)^{2}}{c\,\|x_j-\bar{x}\|^{2}} + \frac{2n}{c^{m}}\sum_{i=1}^{c}(1-g_i)\,u_{a_i}^{t}\left[I_{s\times s} - \frac{2m}{(m-1)(1-g_i)}\,C_X^{\,b}\right]u_{a_i}. \qquad (20)$$
To simplify the analysis, we assume that $\forall i$, $g_i = g$ and $\forall j, \forall k$, $q(x_j, a_k) > 0$. In this way, we can ignore the update equations for $g_i$ and reduce the complexity of the analysis of FCS. Thus, Eq. (19) turns into (21) as follows:

$$u^{t}\left(\frac{\partial^{2} J}{\partial a_i\,\partial a_k}\right)u\bigg|_{a} = \frac{4m}{m-1}\sum_{j=1}^{n}(S_j)^{m-1}\left(\sum_{i=1}^{c}\mu_{ij}^{m}\,u_{a_i}^{t}\left[(1-g)a_i-(x_j-g\bar{x})\right]\right)^{2} + 2(1-g)\sum_{i=1}^{c}\left(\sum_{j=1}^{n}\mu_{ij}^{m}\right)u_{a_i}^{t}\left[I_{s\times s} - \frac{2m}{m-1}\,\frac{\sum_{j=1}^{n}\mu_{ij}^{m}\big[(a_i-x_j)-g(a_i-\bar{x})\big]\big[(a_i-x_j)-g(a_i-\bar{x})\big]^{t}\big/\,q(x_j; a_i)}{(1-g)\sum_{j=1}^{n}\mu_{ij}^{m}}\right]u_{a_i}. \qquad (21)$$

From Eq. (21), we know that if $g$ approaches negative infinity, then any FCS solution will be stable. This is an unacceptable result. Similarly, if $g$ approaches positive infinity, any FCS solution will be unstable, which is also unacceptable. Therefore, we can roughly restrict the range of $g$ with an upper bound of 1.

5. The robustness of FCS

A good clustering method should have the ability to tolerate noise and outliers. In this section, we use the gross error sensitivity and the influence function (Huber, 1981) to show that our weighted cluster center update equation is robust to noise and outliers. Let $\{x_1,\ldots,x_n\}$ be an observed data set of real numbers and let $h$ be an unknown parameter to be estimated. An M-estimator (Huber, 1981) is generated by minimizing the form

$$\sum_{j=1}^{n}\rho(x_j; h), \qquad (22)$$

where $\rho$ is an arbitrary function that measures the loss between $x_j$ and $h$. Here, we are interested in a location estimate that minimizes

$$\sum_{j=1}^{n}\rho(x_j - h). \qquad (23)$$
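As a reminder of how the M-estimator machinery in (22)–(23) works, the quadratic loss $\rho(u) = u^2$ gives $\psi(u) = 2u$, and solving $\sum_j \psi(x_j - h) = 0$ returns the sample mean; a brute-force minimization of $\sum_j \rho(x_j - h)$ confirms this. This classical example is ours, added for illustration; it is not part of the FCS derivation.

```python
import numpy as np

# Location M-estimation with the quadratic loss rho(u) = u^2:
# minimizing sum_j rho(x_j - h) over h recovers the sample mean.
rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=200)

grid = np.linspace(x.min(), x.max(), 4001)           # candidate values of h
loss = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
h_hat = grid[np.argmin(loss)]                         # grid minimizer of the loss
```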
The influence function of such a location estimator is

$$IC(x; F, h) = \frac{\psi(x-h)}{\int \psi'(x-h)\,dF_X(x)}, \qquad (25)$$

where $F_X(x)$ denotes the distribution function of $X$. If the influence function of an estimator is unbounded, noise or outliers may cause trouble. Equivalently, if the $\psi$ function of an estimator is unbounded, noise and outliers will cause trouble. Many important robustness measures can be derived from the influence function. One of them is the gross error sensitivity $\gamma^{*}$, defined by

$$\gamma^{*} = \sup_{x}\,|IC(x; F, h)|. \qquad (26)$$

This quantity measures the worst approximate influence that the addition of an infinitesimal point mass can have on the value of the associated estimator.
Let the loss between the data point $x_j$ and the $i$th cluster center $a_i$ be

$$\rho(x_j - a_i) = \mu_{ij}^{m}\|x_j - a_i\|^{2} - g_i\,\mu_{ij}^{m}\|a_i - \bar{x}\|^{2} \qquad (27)$$

and

$$\psi(x_j - a_i) = \frac{\partial \rho(x_j - a_i)}{\partial a_i} = -2\mu_{ij}^{m}(x_j - a_i) - 2g_i\,\mu_{ij}^{m}(a_i - \bar{x}). \qquad (28)$$

By solving the equation $\sum_{j=1}^{n}\psi(x_j - a_i) = 0$, we obtain the result shown in Eq. (14). Thus, the FCS cluster center is an M-estimator with the loss function (27) and $\psi$ function (28). Note that the $\psi$ function of our estimator is a function of $\mu_{ij}^{m}$, which depends on the fuzzifier $m$. We will show that the FCS cluster center is robust to noise and outliers when $m$ is large.
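The dependence of the $\psi$ function on the fuzzifier can be checked numerically. The sketch below evaluates $\psi$ at a far-away point $x$ for two values of $m$, with the cluster centers, the common $g_i = g$ and $\bar{x}$ fixed at illustrative values (these numbers are assumptions of ours, not output of the algorithm); the influence of the extreme point shrinks as $m$ grows.

```python
import numpy as np

def mu(x, a, g, xbar, m):
    # Eq. (30), valid when x lies outside every cluster kernel (all q_k > 0)
    q = (x - a) ** 2 - g * (a - xbar) ** 2
    w = q ** (-1.0 / (m - 1.0))
    return w / w.sum()

def psi(x, a, g, xbar, m, i=0):
    # Eq. (29), with the sign convention reconstructed from Eq. (16)
    u_i = mu(x, a, g, xbar, m)[i]
    return -2.0 * u_i ** m * ((x - a[i]) + g * (a[i] - xbar))

a = np.array([0.0, 4.0, 10.0])   # illustrative cluster centers
g, xbar = 0.1, 4.67              # illustrative g_i = g and grand mean
```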
For a given data set, the FCS clustering algorithm produces the cluster centers $\{a_1,\ldots,a_c\}$, the parameters $\{g_1,\ldots,g_c\}$ and the sample mean $\bar{x}$. The relative influence (the $\psi$ function) of an individual observation $x$ on the $i$th cluster center can be defined as

$$\psi(x - a_i) = -2(\mu_i(x))^{m}\left[(x - a_i) + g_i(a_i - \bar{x})\right], \qquad (29)$$

where

$$\mu_i(x) = \frac{\left(\|x-a_i\|^{2} - g_i\|a_i-\bar{x}\|^{2}\right)^{-\frac{1}{m-1}}}{\sum_{k=1}^{c}\left(\|x-a_k\|^{2} - g_k\|a_k-\bar{x}\|^{2}\right)^{-\frac{1}{m-1}}}. \qquad (30)$$

Note that $\mu_i(x) = 1$ if $\|x-a_i\|^{2} \leqslant g_i\|a_i-\bar{x}\|^{2}$. Suppose that $x \in \mathbb{R}$. For an extremely large or small $x$, we have $\|x-a_i\|^{2} \geqslant g_i\|a_i-\bar{x}\|^{2}$ and hence $\mu_i(x) \in (0,1)$. In general, an extremely large or small $x$ falls outside every one of the c cluster kernels and has a fuzzy membership value $\mu_i(x) \in (0,1)$; more precisely, $\mu_i(x)$ will be very close to $1/c$ and hence $\mu_i(x)^{m}$ will be very close to zero when $m$ is large. Although $(x-a_i) + g_i(a_i-\bar{x})$ is a monotone increasing function of $x$, $(\mu_i(x))^{m}\left[(x-a_i) + g_i(a_i-\bar{x})\right]$ will be very close to zero for an extremely large or small $x$ when $m$ is large. Thus, the $\psi$ function of the FCS cluster center is bounded for an extremely large or small $x$ when $m$ tends to infinity. Therefore, for a large $m$, only the data points $x$ inside the $i$th cluster kernel with $\mu_i(x) = 1$ have an influence on the $i$th FCS cluster center $a_i$, and a data point $x$ that falls on the crisp boundary of the $i$th cluster has a finite gross error sensitivity $\gamma^{*} < \infty$. The above discussion gives a theoretical foundation for asserting that the FCS algorithm is robust to noise and outliers when $m$ is large. The following is a simple example.
For the data set shown in Fig. 2(a), we implement FCS with $\beta = 0.1$ and $m = 2$ to produce the sets $\{a_1, a_2, a_3\}$, $\{g_1, g_2, g_3\}$ and $\bar{x}$. For a given data point $x$, the membership function $\mu_1(x)$ (Eq. (30)) is illustrated in Fig. 3(a), and the $\psi$ function $\psi(x - a_1) = -2(\mu_1(x))^{2}\left[(x - a_1) + g_1(a_1 - \bar{x})\right]$ is illustrated in Fig. 3(d). When $m = 2$, an extremely large or small $x$ has a large influence on $a_1$, as shown in Fig. 3(d). Similarly, we implement FCS with $m = 3$ and $m = 4$ to obtain other sets $\{a_1, a_2, a_3\}$ and $\{g_1, g_2, g_3\}$, and illustrate $\mu_1(x)$ in Fig. 3(b) and (c), respectively. The corresponding $\psi$ functions are illustrated in Fig. 3(e) and (f). Fig. 3 shows that the influence of an extremely large or small $x$ becomes small when $m$ increases. These numerical results coincide with our theoretical analysis.
6. Numerical examples

We first implement the Normal-4 data set proposed by Pal and Bezdek (1995). Normal-4 is a four-dimensional data set with sample size
Fig. 3. Membership functions (Eq. (30)) and phi functions (Eq. (29)) for a given data point x when m = 2, 3 and 4 (beta = 0.1 in all panels).
n = 800, consisting of 200 points from each of four clusters. The population mean vectors are $\mu_1 = (3,0,0,0)$, $\mu_2 = (0,3,0,0)$, $\mu_3 = (0,0,3,0)$ and $\mu_4 = (0,0,0,3)$. The covariance matrices are $\Sigma_i = I_4$, $i = 1,2,3,4$. We use the mean vectors as the initial values for both the FCM and FCS algorithms. We compare FCM and FCS with the mean squared error (MSE) criterion, calculated as $\sum_{i=1}^{4}\|a_i - \mu_i\|^{2}$. We implement the FCS algorithm with different combinations of $\beta \in \{0, 0.005, 0.01, 0.05, 0.1, 0.2\}$ and $m \in \{1.5, 2, 2.5, 3, 3.5\}$. For 100 repeated Normal-4 data sets, the MSE values are shown in Fig. 4(a). The result for $\beta = 0$ is equivalent to FCM. As $\beta$ increases, the volume of the cluster kernel grows. When the fuzzy index $m$ is small, the results of FCS ($\beta > 0$) and FCM ($\beta = 0$) are similar. As $m$ becomes larger, FCM gives larger MSE values than FCS. Thus, the FCS result is less sensitive to the fuzzy index $m$ than FCM, especially for $\beta$ = 0.05, 0.1 and 0.2. In general, the MSE values increase when $m$ increases. Yu et al. (2004) gave a theoretical upper bound on $m$ for FCM beyond which the grand sample mean $\bar{x}$ is the unique optimizer. For each combination of $m$ and $\beta$, the range between the worst and best MSE values among these 100 repetitions is shown in Fig. 4(b). The cases of $m = 3.5$ and $\beta$ = 0, 0.005, 0.01 are examples in which the grand sample mean $\bar{x}$ is the unique optimizer. The ranges of MSE values for FCS with $\beta$ = 0.05, 0.1 and 0.2 are still insensitive to the fuzzy index $m$. One may argue that we can process FCM with a small $m$ value, say 1.5 or 2, to avoid the above defect of FCM. However, FCS is not only less sensitive to the fuzzy index $m$ than FCM, but also more robust to noise and outliers than FCM when $m$ is large. We show this robustness of FCS in the following examples.

Fig. 5 shows a two-cluster data set with unequal sample sizes. The clustering results of FCS with $m = 2$ and $\beta$ = 0, 0.1, 0.2 are shown in Fig. 5(a)–(c). The three figures show similar results. The cluster centers are presented by the solid circle points. Note that the distance between the two cluster centers increases as $\beta$ increases. This phenomenon can be explained by the update Eq. (16) of FCS.
Fig. 4. MSE values for different combinations of beta and m: (a) MSE; (b) range of MSE.
When $m$ becomes larger ($m = 6$), the clustering results of FCS with $\beta$ = 0, 0.1 and 0.2 are shown in Fig. 5(d)–(f). FCS with cluster kernels ($\beta$ = 0.1 and 0.2) obtains better performance than FCM ($\beta = 0$), which clusters the data set without cluster kernels. This shows that FCS with a large and suitable $m$ value can detect clusters of unequal sample sizes and is robust to noise. Fig. 6 shows a two-cluster data set with one outlying point at coordinate (100, 0). When $m$ is large ($m = 6$), the results of FCS with $\beta$ = 0.1 and 0.2 are more robust to the outlier than FCM ($\beta = 0$). These robustness properties of FCS can be explained using the FCS update equations. Let $\hat{\mu} = \max\{\mu_{i1},\ldots,\mu_{in}\}$ and $\mu'_{ij} = \mu_{ij}/\hat{\mu}$, $j = 1,\ldots,n$. We have
Fig. 5. FCM and FCS clustering results for unequal sample size data sets: (a)–(c) m = 2 with beta = 0, 0.1, 0.2; (d)–(f) m = 6 with beta = 0, 0.1, 0.2.
Fig. 6. FCM and FCS clustering results for the two-cluster data set with one outlier at coordinate (100, 0): (a) FCM, m = 6; (b) FCS, m = 6, beta = 0.1; (c) FCS, m = 6, beta = 0.2.
$$\lim_{m\to\infty}\{a_i\} = \lim_{m\to\infty}\frac{\sum_{j=1}^{n}\mu_{ij}^{m}x_j - g_i\,\bar{x}\sum_{j=1}^{n}\mu_{ij}^{m}}{\sum_{j=1}^{n}\mu_{ij}^{m} - g_i\sum_{j=1}^{n}\mu_{ij}^{m}} = \lim_{m\to\infty}\frac{\sum_{j=1}^{n}(\mu'_{ij})^{m}x_j - g_i\,\bar{x}\sum_{j=1}^{n}(\mu'_{ij})^{m}}{\sum_{j=1}^{n}(\mu'_{ij})^{m} - g_i\sum_{j=1}^{n}(\mu'_{ij})^{m}} = \frac{\sum_{\mu'_{ij}=1}x_j - g_i\,\bar{x}\sum_{\mu'_{ij}=1}1}{(1-g_i)\sum_{\mu'_{ij}=1}1} = \frac{\sum_{\mu'_{ij}=1}x_j \big/ \sum_{\mu'_{ij}=1}1 \;-\; g_i\,\bar{x}}{1-g_i} = \frac{\sum_{\mu_{ij}=\hat{\mu}}x_j \big/ \sum_{\mu_{ij}=\hat{\mu}}1 \;-\; g_i\,\bar{x}}{1-g_i}. \qquad (31)$$
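A quick numerical check of the limit (31): for a large $m$, one application of the update (16) agrees with the in-kernel sample mean formula. The data, centers and $g$ below are illustrative values of our own choosing, with a fixed common $g_i = g$.

```python
import numpy as np

# Numerical check of (31): for large m, the FCS center update reduces to
# a weighted mean of the in-kernel sample mean and the grand mean xbar.
X = np.array([-1.0, 0.0, 1.0, 9.0, 10.0, 11.0, 30.0])  # last point is an outlier
a = np.array([0.0, 10.0])
g, m = 0.2, 50.0
xbar = X.mean()

q = (X[:, None] - a[None, :]) ** 2 - g * (a - xbar) ** 2
u = np.zeros_like(q)
for j in range(len(X)):
    if (q[j] <= 0).any():
        u[j, np.argmin(q[j])] = 1.0        # crisp inside a kernel (mu-hat = 1)
    else:
        w = q[j] ** (-1.0 / (m - 1.0))
        u[j] = w / w.sum()

um = u ** m
a_direct = ((um * X[:, None]).sum(0) / um.sum(0) - g * xbar) / (1 - g)  # Eq. (16)

inside0 = u[:, 0] == 1.0                   # points inside the first kernel
a_limit0 = (X[inside0].mean() - g * xbar) / (1 - g)                     # Eq. (31)
```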
In FCM, when $m$ is large, $\mu_{ij} = 1/c$ for all $i, j$ and hence $\mu_{ij} = \hat{\mu}$ for all $i, j$. This is why FCM can produce results in which the sample mean $\bar{x}$ is the unique optimizer when $m$ is large. In FCS, however, the data points inside the cluster kernels have $\mu_{ij} \in \{0, 1\}$, while the data points outside all cluster kernels have $\mu_{ij} \in (0, 1)$. When $m$ is large, the $i$th cluster center update Eq. (31) gives a large $(\mu'_{ij})^{m} = 1$ for the data points inside the $i$th cluster kernel and a small $(\mu'_{ij})^{m} \approx 0$ for the data points outside the $i$th cluster kernel. Thus, when $m$ is large, the $i$th cluster center update is the weighted mean of the sample mean of the data points inside the $i$th cluster kernel and the grand mean $\bar{x}$, with weights 1 and $g_i$, respectively. For a suitable $\beta$ value, noise and outliers fall outside the cluster kernels and their influence on the clustering results is small when $m$ is large. This explains the clustering results shown in Figs. 5 and 6 and also coincides with the theoretical analysis in Section 5. This property also provides a way to avoid the sample mean $\bar{x}$ being the unique optimizer, as in FCM.
We know that when the sample mean $\bar{x}$ is the unique optimizer of a fuzzy clustering algorithm, the partition coefficient (PC) (Bezdek, 1974), defined by

$$PC(c) = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{2}, \qquad (32)$$

will equal $1/c$, or equivalently the non-fuzzy index (NFI) (Pal and Bezdek, 1995), defined by

$$NFI(c) = 1 - \frac{c}{c-1}\left(1 - PC(c)\right), \qquad (33)$$

will equal zero. Note that Dave (1996) also proposed a modification of the PC index which is equivalent to the NFI index. According to the above analysis, we hope that FCS with cluster kernels can avoid the situation in which the sample mean is the unique optimizer of the FCS objective function. Fig. 7 presents the NFI(2) values for the data set shown in Fig. 5.

Fig. 7. NFI(2) values for the unequal sample size data set shown in Fig. 5, for m = 10 and m = 20.

The NFI values of FCS with cluster kernels ($\beta > 0$) are always larger than the NFI values of FCM ($\beta = 0$), which is the case of the sample mean $\bar{x}$ being the unique optimizer with NFI = 0 when $m$ = 10 and 20. This shows that the FCS algorithm can avoid the case of NFI = 0 and is more robust to noise and outliers than FCM when $m$ is large. Because the sample mean $\bar{x}$ of the data set shown in Fig. 6 is not the unique optimizer of FCM or FCS when $m$ is large, we do not show its NFI values.

Fig. 8. NFI(11) values for the normalized Vowel data set, in which both the PIM and FCS algorithms are processed with the same parameter values: (a) m = 1.5; (b) m = 2.
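Both validity indices (32)–(33) are straightforward to compute from a membership matrix; a short sketch follows (the $n \times c$ layout of the matrix is our own assumption):

```python
import numpy as np

def pc(u):
    # partition coefficient, Eq. (32); u has shape (n, c)
    return (u ** 2).sum() / u.shape[0]

def nfi(u):
    # non-fuzzy index, Eq. (33)
    c = u.shape[1]
    return 1.0 - (c / (c - 1.0)) * (1.0 - pc(u))
```

A completely fuzzy partition (all memberships equal to 1/c) gives NFI = 0, and a crisp partition gives NFI = 1, which is the behavior the discussion above relies on.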
Note that some of the FCS properties discussed above can also be achieved by the partition index maximization (PIM) algorithm (Özdemir and Akarun, 2002), which uses a fixed volume for all cluster kernels. The radius of each cluster kernel in PIM is defined by

$$\alpha = \delta\,\min_{i\neq i'}\{\|a_i - a_{i'}\|/2\}, \qquad 0\leqslant\delta\leqslant 1. \qquad (34)$$

The NFI values of PIM and FCS on the normalized Vowel data set from the UCI Machine Learning Repository (Blake and Merz, 1998) are shown in Fig. 8. Yu et al. (2004) showed that when $m > 1.7787$, the sample mean $\bar{x}$ is the unique optimizer of FCM for the normalized Vowel data set in Blake and Merz (1998). In Fig. 8(a), when $m = 1.5$, both PIM and FCS with different $\delta$ and $\beta$ values have NFI index values larger than 0.3. However, when $m = 2$, as shown in Fig. 8(b), PIM gives the same NFI values as FCM ($\delta = 0$ or $\beta = 0$). Using the same volume for all cluster kernels does not help PIM achieve larger NFI values than FCM. In the same situation with $m = 2$, as shown in Fig. 8(b), the NFI values of FCS are always larger than those of FCM and PIM. Using different cluster kernel volumes in FCS produces these merits.
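For reference, Eq. (34) is a one-liner; this helper (the function name is ours) computes the common PIM kernel radius from a set of centers.

```python
import numpy as np
from itertools import combinations

def pim_radius(centers, delta):
    # Eq. (34): alpha = delta * min_{i != i'} ||a_i - a_i'|| / 2, 0 <= delta <= 1
    dmin = min(np.linalg.norm(np.asarray(p) - np.asarray(q))
               for p, q in combinations(centers, 2))
    return delta * dmin / 2.0
```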
7. Conclusions

We proposed a novel clustering algorithm, called the FCS algorithm, which attempts to minimize the trace of the fuzzy within-cluster scatter matrix and simultaneously maximize the trace of the fuzzy between-cluster scatter matrix. Each cluster obtained by FCS has a cluster kernel. Data points that fall inside any one of the c cluster kernels have crisp memberships, while those outside all of the cluster kernels have fuzzy memberships. The volume of each cluster kernel is decided by the parameter $g_i$, which is a function of $\beta$. Crisp and fuzzy memberships co-exist in FCS. The cluster center update equations in FCS can be interpreted as a weighted mean of the FCM cluster centers and the grand mean $\bar{x}$. Numerical examples show that FCS can give more accurate parameter estimates than FCM. They also show that FCS can help avoid the situation where the sample mean $\bar{x}$ is the unique optimizer of FCM, and that FCS is more robust to noise
and outliers than FCM when $m$ is large. A theoretical analysis of FCS was also presented. Overall, the proposed FCS is recommended as a good clustering algorithm when points in the same cluster should be compact and points in different clusters should be well separated.
Acknowledgement

This work was supported in part by the National Science Council of Taiwan, ROC, under grant NSC-91-2118-M-033-001.
References<br />
Bezdek, J.C., 1974. Cluster validity with <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> sets. J. Cybernet.<br />
3, 58–73.<br />
Bezdek, J.C., 1981. Pattern Recogniti<strong>on</strong> with Fuzzy Objective<br />
Functi<strong>on</strong> Algorithms. Plenum Press, New York.<br />
Blake, C.L., Merz, C.J., 1998. UCI repository of machine<br />
learning databases, a huge collecti<strong>on</strong> of artificial and realworld<br />
data sets. Available from: .<br />
Dave, R.N., 1996. Validating <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> partiti<strong>on</strong> obtained through<br />
c-shells <str<strong>on</strong>g>clustering</str<strong>on</strong>g>. Pattern Recogniti<strong>on</strong> Lett. 17, 613–623.<br />
Duda, R.O., Hart, P.E., 1973. Pattern Classificati<strong>on</strong> and Scene<br />
Analysis. Wiley, New York.<br />
Fukuyama, Y., Sugeno, M., 1989. A new method of choosing<br />
the number of clusters for <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> c-means method. In:<br />
Proceedings of the 5th Fuzzy System Symposium (in<br />
Japanese), pp. 247–250.<br />
Gath, J., Geva, A.B., 1989. Unsupervised optimal <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g><br />
<str<strong>on</strong>g>clustering</str<strong>on</strong>g>. IEEE Trans. Pattern Anal. Mach. Intell. 11,<br />
773–781.<br />
Gunders<strong>on</strong>, M., 1978. Applicati<strong>on</strong> of <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> ISODATA <str<strong>on</strong>g>algorithm</str<strong>on</strong>g>s<br />
to star tracker pointing systems. In: Proceedings of<br />
the 7th Triennial World IFCA C<strong>on</strong>g., Helsinki, Filind, pp.<br />
1319–1323.<br />
Gustafs<strong>on</strong>, D.E., Kessel, W.C., 1979. Fuzzy <str<strong>on</strong>g>clustering</str<strong>on</strong>g> with a<br />
<str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> covariance <strong>matrix</strong>. In: Proceedings of the IEEE<br />
C<strong>on</strong>ference <strong>on</strong> Decisi<strong>on</strong> C<strong>on</strong>trol, San Diego, CA, pp.<br />
761–766.<br />
Huber, P.J., 1981. Robust Statistics. Wiley, New York.<br />
Jain, A.K., Dubes, R.C., 1988. In: Algorithm for Clustering<br />
Data. Prentice-Hall, Englewood Cliffs, NJ.<br />
Kaufman, L., Rousseeuw, P.J., 1990. Finding Groups in Data:<br />
An Introducti<strong>on</strong> to Cluster Analysis. Wiley, New York.<br />
Krishnapuram, R., Kim, J., 2000. Clustering algorithms based on volume criteria. IEEE Trans. Fuzzy Syst. 8, 228–236.
Özdemir, D., Akarun, L., 2001. Fuzzy algorithms for combined quantization and dithering. IEEE Trans. Image Processing 10 (6), 923–931.
Özdemir, D., Akarun, L., 2002. A fuzzy algorithm for color quantization of images. Pattern Recognition 35, 1785–1791.
Pal, N.R., Bezdek, J.C., 1995. On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 3, 370–379.
Rousseeuw, P.J., Kaufman, L., Trauwaert, E., 1996. Fuzzy clustering using scatter matrices. Comput. Statist. Data Anal. 23, 135–151.
Sugeno, M., Yasukawa, T., 1993. A fuzzy-logic-based approach to qualitative modeling. IEEE Trans. Fuzzy Syst. 1, 7–31.
Wu, K.L., Yang, M.S., 2002. Alternative c-means clustering algorithm. Pattern Recognition 35, 2267–2278.
Xie, X.L., Beni, G., 1991. A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 13, 841–847.
Yang, M.S., 1993. A survey of fuzzy clustering. Math. Comput. Model. 18, 1–16.
Yang, M.S., Hu, Y.J., Lin, K.C.R., Lin, C.C.L., 2002. Segmentation techniques for tissue differentiation in MRI of Ophthalmology using fuzzy clustering algorithms. Magn. Reson. Imaging 20, 173–179.
Yang, M.S., Wu, K.L., Yu, J., 2003. A novel fuzzy clustering algorithm. In: Proceedings of the 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA2003), Kobe, Japan, pp. 647–652.
Yu, J., Cheng, Q., Huang, H., 2004. Analysis of the weighting exponent in the FCM. IEEE Trans. Syst. Man Cybernet. Part B 34, 634–638.
Zadeh, L.A., 1965. Fuzzy sets. Inform. Contr. 8, 338–353.
Zahid, N., Limouri, M., Essaid, A., 1999. A new cluster-validity for fuzzy clustering. Pattern Recognition 32, 1089–1097.