Pattern Recognition Letters 26 (2005) 639–652
www.elsevier.com/locate/patrec
A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests

Kuo-Lung Wu a, Jian Yu b, Miin-Shen Yang a,*

a Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li 32023, Taiwan, ROC
b Department of Computer Science, Beijing Jiaotong University, Beijing 100044, PR China

Received 16 January 2004
Available online 28 October 2004
Abstract

Most clustering algorithms are based on a within-cluster scatter matrix with a compactness measure. In this paper we propose a novel fuzzy clustering algorithm, called fuzzy compactness and separation (FCS), based on a fuzzy scatter matrix, in which the FCS algorithm is derived by minimizing a compactness measure and maximizing a separation measure. The compactness is measured using a fuzzy within-cluster variation. The separation is measured using a fuzzy between-cluster variation. The proposed FCS objective function is a modification of the FS validity index proposed by Fukuyama and Sugeno and also a generalization of fuzzy c-means (FCM). The FCS algorithm assigns a crisp boundary (cluster kernel) to each cluster so that hard memberships and fuzzy memberships can co-exist in the clustering results. Thus, FCS can be seen as a clustering algorithm in a novel sense between hard c-means and fuzzy c-means. The FCS optimality tests and parameter selection are also investigated. Some numerical examples are presented to demonstrate its robustness and effectiveness.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Fuzzy clustering algorithm; Scatter matrix; Within-cluster variation; Between-cluster variation; Fuzzy compactness and separation
1. Introduction

* Corresponding author. Tel.: +886 3 456 3171; fax: +886 3 456 3160.
E-mail address: msyang@math.cycu.edu.tw (M.-S. Yang).
Cluster analysis is a branch of statistical multivariate analysis and unsupervised learning in pattern recognition. It is a method for partitioning a data set into groups such that data points in the same cluster are as similar as possible and data points in different clusters are as dissimilar as possible.

0167-8655/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.patrec.2004.09.016
Clustering applications in various areas have been well documented (Duda and Hart, 1973; Jain and Dubes, 1988; Kaufman and Rousseeuw, 1990). Among clustering methods, the hard c-means (or k-means) and fuzzy c-means (FCM) clustering algorithms are the best known (Bezdek, 1981; Jain and Dubes, 1988; Yang, 1993). Most of these methods are based on minimizing the trace of the within-cluster scatter matrix. The within-cluster scatter matrix trace can be interpreted as a compactness measure with a within-cluster variation. Because the clusters obtained using k-means and FCM are roughly spherical with similar volumes, many clustering algorithms, such as the Gustafson–Kessel (G–K) algorithm (Gustafson and Kessel, 1979), the sum of all normalized determinants (SAND) algorithm (Rouseeuw et al., 1996), the minimum scatter volume (MSV) and minimum cluster volume (MCV) algorithms (Krishnapuram and Kim, 2000), and the unsupervised fuzzy partition-optimal number of classes (UFP-ONC) algorithm (Gath and Geva, 1989), were proposed to accommodate elliptical clusters with different volumes. These algorithms are all based on a within-cluster scatter matrix with a compactness measure.
The concept of adopting a separation measure in clustering is widely used in solving cluster validity problems, for example the separation coefficient proposed by Gunderson (1978), the XB index proposed by Xie and Beni (1991), the FS index proposed by Fukuyama and Sugeno (1989), the SC index proposed by Zahid et al. (1999), and the F_HV (fuzzy hyper-volume) and P_D (partition density) indexes proposed by Gath and Geva (1989). Özdemir and Akarun (2001) proposed an inter-cluster separation (ICS) clustering algorithm that includes a separation measure in the ICS objective function. Because the between-cluster scatter matrix trace can be interpreted as a separation measure with a between-cluster variation, maximizing the between-cluster scatter matrix trace will induce a result with well-separated clusters (Yang et al., 2003).
In this paper, we propose a novel fuzzy clustering algorithm, called the fuzzy compactness and separation (FCS) algorithm. The FCS objective function is based on a fuzzy scatter matrix. The FCS algorithm is derived by minimizing the compactness measure and simultaneously maximizing the separation measure (Yang et al., 2003). The compactness is measured using the trace of a fuzzy within-cluster scatter matrix. The separation is measured using the trace of a fuzzy between-cluster scatter matrix. In k-means, data points always have crisp membership values of zero or one. Although FCM allows data points to have fuzzy membership values between zero and one, it does not exactly produce a zero or one for the membership values. In the proposed FCS algorithm, crisp and fuzzy membership values can co-exist. These FCS properties will be discussed. We will also show that, when the weighting exponent m is large, the FCS algorithm is more robust to noise and outliers than FCM. A theoretical analysis of FCS will be given. Yu et al. (2004) gave a theoretical upper bound for the weighting exponent m in FCM under which the grand sample mean x̄ is a unique optimizer of the FCM objective function. In this paper, we will show that FCS, with its different cluster kernel characteristic, can avoid the situation in which x̄ is a unique optimizer of the FCS objective function. We also study the optimality tests. These results will be used for FCS parameter selection. The paper is organized as follows. In Section 2, the (crisp) scatter matrix is extended to the fuzzy scatter matrix. Some clustering algorithms based on the within-cluster scatter matrix are then reviewed. In Section 3, we propose the novel fuzzy clustering algorithm based on the fuzzy within- and between-cluster scatter matrices. Section 4 gives our theoretical analysis of the optimality tests and the FCS parameter selection. Section 5 gives the robustness properties of FCS based on the gross error sensitivity and the influence function. Some numerical examples are presented in Section 6. Conclusions are made in Section 7.
2. Clustering algorithms based on a within-cluster scatter matrix
Let X = {x_1, ..., x_n} be a data set in an s-dimensional Euclidean space R^s and let c be a positive integer larger than one. A partition of X into c clusters can be presented using mutually disjoint sets X_1, ..., X_c such that X_1 ∪ ... ∪ X_c = X, or
equivalently using the indicator functions μ_1, ..., μ_c such that μ_ij = μ_i(x_j) = 1 if x_j ∈ X_i and μ_ij = μ_i(x_j) = 0 if x_j ∉ X_i. Let the sample mean of the ith cluster be

a_i = \frac{\sum_{x_j \in X_i} x_j}{n_i} = \frac{\sum_{j=1}^{n} \mu_{ij} x_j}{\sum_{j=1}^{n} \mu_{ij}}, \quad i = 1, \ldots, c, \; j = 1, \ldots, n,  (1)
where n_i is the number of data points in X_i. Let the grand mean be \bar{x} = \sum_{j=1}^{n} x_j / n. The total scatter matrix S_T for the data set X can then be decomposed into a within-cluster scatter matrix S_W and a between-cluster scatter matrix S_B with S_T = S_W + S_B, where

S_T = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - \bar{x})(x_j - \bar{x})^t = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (x_j - \bar{x})(x_j - \bar{x})^t,  (2)

S_W = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - a_i)(x_j - a_i)^t = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (x_j - a_i)(x_j - a_i)^t,  (3)

S_B = \sum_{i=1}^{c} \sum_{x_j \in X_i} (a_i - \bar{x})(a_i - \bar{x})^t = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (a_i - \bar{x})(a_i - \bar{x})^t.  (4)
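As a sanity check, the decomposition S_T = S_W + S_B in Eqs. (2)–(4) can be verified numerically. The following sketch is our own illustration (variable names and the synthetic data are not from the paper): it computes the three crisp scatter matrices for a labeled two-cluster sample and confirms the identity.

```python
import numpy as np

def scatter_matrices(X, labels, c):
    """Compute the crisp scatter matrices S_T, S_W, S_B of Eqs. (2)-(4)."""
    n, s = X.shape
    xbar = X.mean(axis=0)                      # grand mean
    S_W = np.zeros((s, s))
    S_B = np.zeros((s, s))
    for i in range(c):
        Xi = X[labels == i]
        a_i = Xi.mean(axis=0)                  # cluster sample mean, Eq. (1)
        D = Xi - a_i
        S_W += D.T @ D                         # within-cluster scatter, Eq. (3)
        S_B += len(Xi) * np.outer(a_i - xbar, a_i - xbar)  # between-cluster, Eq. (4)
    D = X - xbar
    S_T = D.T @ D                              # total scatter, Eq. (2)
    return S_T, S_W, S_B

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (30, 2))])
labels = np.array([0] * 20 + [1] * 30)
S_T, S_W, S_B = scatter_matrices(X, labels, c=2)
assert np.allclose(S_T, S_W + S_B)             # the decomposition holds
```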
Duda and Hart (1973) noted that the determinant |S_W| of the within-cluster scatter matrix can serve as a criterion function for clustering; |S_W| can be interpreted as the square of the scatter volume. On the basis of |S_W|, Rouseeuw et al. (1996) created the so-called SAND algorithm.

Let tr(A) denote the trace of a matrix A. The trace tr(S_W) of the within-cluster scatter matrix can be used to measure compactness via the within-cluster variation, so it is reasonable to use tr(S_W) as a clustering objective function. The k-means and most other clustering algorithms are created by minimizing an objective function based on tr(S_W). Because tr(S_T) = tr(S_W) + tr(S_B) for a given data set X, and tr(S_T) is independent of the cluster centers a_i, minimizing tr(S_W) is equivalent to maximizing tr(S_B). Moreover, the quantity tr(S_B) can be used as a measure of separation. Thus, building an objective function that can be optimized by minimizing tr(S_W) and simultaneously maximizing tr(S_B) makes sense, and this is the main goal of this paper.
Since Zadeh (1965) introduced the fuzzy set concept, research on fuzzy clustering has been widely pursued (Bezdek, 1981; Yang, 1993). In fuzzy clustering, FCM is the most widely used clustering algorithm. Suppose that μ_ij ∈ [0, 1] with \sum_{i=1}^{c} \mu_{ij} = 1 for all j, and that m > 1 is a given real value. The FCM update equation for the fuzzy sample mean is

a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m x_j}{\sum_{j=1}^{n} \mu_{ij}^m}, \quad i = 1, \ldots, c, \; j = 1, \ldots, n.  (5)
Thus, we define the fuzzy total scatter matrix S_FT, the fuzzy within-cluster scatter matrix S_FW and the fuzzy between-cluster scatter matrix S_FB on the basis of the fuzzy sample mean a_i as

S_{FT} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (x_j - \bar{x})(x_j - \bar{x})^t,  (6)

S_{FW} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i)(x_j - a_i)^t,  (7)

S_{FB} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (a_i - \bar{x})(a_i - \bar{x})^t,  (8)

where μ_ij ∈ [0, 1], \sum_{i=1}^{c} \mu_{ij} = 1 and m > 1. We know that
S_{FT} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i + a_i - \bar{x})(x_j - a_i + a_i - \bar{x})^t
= \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \left[(x_j - a_i)(x_j - a_i)^t + (a_i - \bar{x})(a_i - \bar{x})^t + (x_j - a_i)(a_i - \bar{x})^t + (a_i - \bar{x})(x_j - a_i)^t\right]
= S_{FW} + S_{FB} + \sum_{i=1}^{c} \left[\sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i)\right](a_i - \bar{x})^t + \sum_{i=1}^{c} (a_i - \bar{x}) \sum_{j=1}^{n} \mu_{ij}^m (x_j - a_i)^t.
According to Eq. (5), we have \sum_{j=1}^{n} \mu_{ij}^m a_i = \sum_{j=1}^{n} \mu_{ij}^m x_j. Thus, we obtain the property that S_FT = S_FW + S_FB when a_i is defined by Eq. (5). This property is exactly the same as S_T = S_W + S_B for the (crisp) scatter matrix.
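The fuzzy decomposition S_FT = S_FW + S_FB can likewise be checked numerically. The sketch below is our own illustration (not the authors' code): it builds the fuzzy scatter matrices of Eqs. (6)–(8) from random memberships and the fuzzy means of Eq. (5), and confirms that the cross terms vanish because \sum_j \mu_{ij}^m (x_j - a_i) = 0.

```python
import numpy as np

def fuzzy_scatter(X, U, m):
    """Compute S_FT, S_FW, S_FB of Eqs. (6)-(8) with fuzzy means from Eq. (5)."""
    n, s = X.shape
    c = U.shape[0]
    xbar = X.mean(axis=0)                       # grand mean
    W = U ** m                                  # mu_ij^m, shape (c, n)
    A = (W @ X) / W.sum(axis=1, keepdims=True)  # fuzzy sample means a_i, Eq. (5)
    S_FT = np.zeros((s, s)); S_FW = np.zeros((s, s)); S_FB = np.zeros((s, s))
    for i in range(c):
        for j in range(n):
            dT = X[j] - xbar
            dW = X[j] - A[i]
            dB = A[i] - xbar
            S_FT += W[i, j] * np.outer(dT, dT)  # Eq. (6)
            S_FW += W[i, j] * np.outer(dW, dW)  # Eq. (7)
            S_FB += W[i, j] * np.outer(dB, dB)  # Eq. (8)
    return S_FT, S_FW, S_FB

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
U = rng.random((3, 40))
U /= U.sum(axis=0)                              # enforce sum_i mu_ij = 1
S_FT, S_FW, S_FB = fuzzy_scatter(X, U, m=2.0)
assert np.allclose(S_FT, S_FW + S_FB)           # fuzzy decomposition holds
```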
Fuzzy clustering algorithms including FCM (Bezdek, 1981), alternative FCM (Wu and Yang, 2002; Yang et al., 2002), G–K (Gustafson and Kessel, 1979), SAND (Rouseeuw et al., 1996), MCV (Krishnapuram and Kim, 2000) and UFP-ONC (Gath and Geva, 1989) are all based on the fuzzy within-cluster scatter matrix S_FW.
It is known that the FCM clustering algorithm is created by minimizing the objective function

J_{FCM} = \mathrm{tr}(S_{FW}) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|x_j - a_i\|^2,  (9)

with the membership update equation

\mu_{ij} = \frac{(\|x_j - a_i\|^2)^{-1/(m-1)}}{\sum_{k=1}^{c} (\|x_j - a_k\|^2)^{-1/(m-1)}}  (10)
and the cluster center update Eq. (5). Although tr(S_FT) = tr(S_FW) + tr(S_FB) for a given data set X, tr(S_FT) is not a fixed constant but depends on μ_ij. Thus, minimizing tr(S_FW) does not necessarily maximize tr(S_FB). The trace tr(S_FB) of the fuzzy between-cluster scatter matrix can be interpreted as a separation measure with a between-cluster variation. A large value of tr(S_FB) will induce a clustering result with separated (distinguishable) clusters. In the next section, an algorithm that considers tr(S_FW) and tr(S_FB) simultaneously is introduced.
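Equations (5), (9) and (10) define the familiar alternating-optimization scheme of FCM. A minimal Python sketch of this iteration (our own illustration with synthetic data, not the authors' code) also exhibits the property noted earlier: FCM memberships stay strictly between zero and one.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100):
    """Minimal FCM: alternate Eq. (10) memberships and Eq. (5) centers."""
    A = X[[0, -1]].astype(float)                       # simple deterministic init
    for _ in range(iters):
        d2 = ((X[None, :, :] - A[:, None, :]) ** 2).sum(-1)  # ||x_j - a_i||^2, (c, n)
        d2 = np.maximum(d2, 1e-12)                     # guard against division by zero
        U = d2 ** (-1.0 / (m - 1.0))                   # Eq. (10), unnormalized
        U /= U.sum(axis=0)                             # normalize over clusters
        W = U ** m
        A = (W @ X) / W.sum(axis=1, keepdims=True)     # Eq. (5)
    return U, A

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])
U, A = fcm(X, c=2)
# Memberships are strictly inside (0, 1): FCM never yields exact 0/1 values.
assert U.min() > 0.0 and U.max() < 1.0
```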
3. A proposed fuzzy clustering algorithm based on tr(S_FW) and tr(S_FB)
Fukuyama and Sugeno (1989) and Sugeno et al. (1993) used tr(S_FW) and tr(S_FB) to create the index FS(c) for the cluster validity problem, and Pal and Bezdek (1995) gave further discussion of cluster validity for FCM. The validity index FS(c) was formed as

FS(c) = \mathrm{tr}(S_{FW}) - \mathrm{tr}(S_{FB}) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|x_j - a_i\|^2 - \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|a_i - \bar{x}\|^2.  (11)
A small FS(c) value indicates a good fuzzy clustering result with a small fuzzy within-cluster variation tr(S_FW) and a large fuzzy between-cluster variation tr(S_FB). This helps us find a good estimate of the number of clusters. It is reasonable for a clustering objective function to contain measures of the within- and between-cluster variations, as the FS(c) index does. However, differentiating FS(c) with respect to a_i does not yield an update equation for a_i. Thus, FS(c) cannot itself serve as a clustering objective function.
In this section, we propose a novel fuzzy clustering objective function that is a modification of the FS(c) index. It can also be viewed as a generalization of the FCM objective function that combines the fuzzy within- and between-cluster variations. Our goal is to minimize the fuzzy within-cluster variation tr(S_FW) and simultaneously maximize the fuzzy between-cluster variation tr(S_FB). We call this the fuzzy compactness and separation (FCS) algorithm, because the compactness is measured using the fuzzy within-cluster variation and the separation is measured using the fuzzy between-cluster variation. The FCS objective function J_FCS is defined as

J_{FCS} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \|x_j - a_i\|^2 - \sum_{i=1}^{c} \sum_{j=1}^{n} \eta_i \mu_{ij}^m \|a_i - \bar{x}\|^2,  (12)
where η_i ≥ 0. Note that J_FCS = J_FCM when η_i = 0, and J_FCS = FS(c) when η_i = 1. By minimizing J_FCS we have the following update equations:

\mu_{ij} = \frac{(\|x_j - a_i\|^2 - \eta_i \|a_i - \bar{x}\|^2)^{-1/(m-1)}}{\sum_{k=1}^{c} (\|x_j - a_k\|^2 - \eta_k \|a_k - \bar{x}\|^2)^{-1/(m-1)}}  (13)

and

a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m x_j - \eta_i \bar{x} \sum_{j=1}^{n} \mu_{ij}^m}{\sum_{j=1}^{n} \mu_{ij}^m - \eta_i \sum_{j=1}^{n} \mu_{ij}^m},  (14)

where the parameter η_i can be set as

\eta_i = \frac{(\beta/4) \min_{i' \neq i} \|a_i - a_{i'}\|^2}{\max_k \|a_k - \bar{x}\|^2}, \quad 0 \leq \beta \leq 1.0.  (15)
In fuzzy clustering, we restrict μ_ij ∈ [0, 1]. Because μ_ij in Eq. (13) might be negative for some data point x_j, we place some restrictions on it. For a given data point x_j, if \|x_j - a_i\|^2 \leq \eta_i \|a_i - \bar{x}\|^2, then μ_ij = 1 and μ_{i'j} = 0 for all i' ≠ i. That is, a data point whose squared distance to the ith cluster center is smaller than \eta_i \|a_i - \bar{x}\|^2 belongs exactly to the ith cluster with membership value one. Each cluster in FCS thus has a crisp boundary such that all data points inside the boundary have crisp membership values μ_ij ∈ {0, 1}, while data points outside the boundary have fuzzy membership values μ_ij ∈ [0, 1]. Each crisp boundary forms a hyperball for the corresponding cluster and can be seen as a cluster kernel. Fig. 1 shows a two-cluster data set in which each cluster contains a cluster center and a cluster kernel. The volume of each cluster kernel is decided by the term \eta_i \|a_i - \bar{x}\|^2. Data points outside the kernel have fuzzy memberships. Note that Özdemir and Akarun (2002) proposed the partition index maximization (PIM) algorithm, which uses a fixed cluster kernel volume for every cluster. In our FCS, the kernel volume differs from cluster to cluster. This FCS characteristic can capture more information from data with different volumes and shapes.
Fig. 1. Clusters obtained by FCS, showing each cluster's center, cluster kernel, crisp boundary and fuzzy boundary.

In the k-means algorithm, each data point has a crisp membership value μ_ij ∈ {0, 1}. Although FCM allows data points to have membership values μ_ij in the interval [0, 1], it rarely produces crisp membership values (i.e., exactly zero or one), even when a data point is very close to one of the c cluster centers. The memberships in FCM therefore seem too fuzzy. In our FCS, crisp and fuzzy membership values co-exist. Data points that fall inside any one of the c cluster kernels (i.e., close to one of the c cluster centers) have crisp memberships, and those outside the cluster kernels (i.e., far away from all cluster centers) have fuzzy membership values. To guarantee that no two of the c cluster kernels overlap, η_i is chosen as in Eq. (15), where the parameter β controls the size of each kernel. Since, for all i,
\eta_i \|a_i - \bar{x}\|^2 = (\beta/4) \left(\min_{i' \neq i} \|a_i - a_{i'}\|^2\right) \left(\frac{\|a_i - \bar{x}\|^2}{\max_k \|a_k - \bar{x}\|^2}\right) \leq \beta \min_{i' \neq i} \left(\frac{\|a_i - a_{i'}\|}{2}\right)^2 \leq \min_{i' \neq i} \left(\frac{\|a_i - a_{i'}\|}{2}\right)^2 \quad \text{for all } 0 \leq \beta \leq 1,

we have, for a given data point x_j with μ_ij = 1,

\|x_j - a_i\|^2 \leq \eta_i \|a_i - \bar{x}\|^2 \leq \min_{i' \neq i} \left(\frac{\|a_i - a_{i'}\|}{2}\right)^2.
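This bound is easy to confirm numerically. In the sketch below (cluster centers and grand mean are hypothetical values of our own choosing), η_i from Eq. (15) with the largest allowed β keeps every squared kernel radius at or below the squared half-distance to the nearest other center, so kernels cannot overlap.

```python
import numpy as np

A = np.array([[0.0, 0.0], [4.0, 1.0], [9.0, -2.0]])    # hypothetical centers a_i
xbar = np.array([4.3, -0.3])                            # hypothetical grand mean
beta = 1.0                                              # largest allowed kernels

cd2 = ((A[:, None] - A[None, :]) ** 2).sum(-1)          # pairwise ||a_i - a_i'||^2
np.fill_diagonal(cd2, np.inf)                           # ignore i' = i
eta = (beta / 4.0) * cd2.min(axis=1) / ((A - xbar) ** 2).sum(-1).max()  # Eq. (15)
radius2 = eta * ((A - xbar) ** 2).sum(-1)               # eta_i ||a_i - xbar||^2
# Squared kernel radius never exceeds the squared half-distance bound.
assert np.all(radius2 <= cd2.min(axis=1) / 4 + 1e-12)
```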
Note that \|a_i - a_{i'}\|/2 is half the distance between the cluster centers a_i and a_{i'}. Thus, Eq. (15) guarantees that no two of the c cluster kernels overlap. If β = 1.0, the FCS algorithm clusters the data set using the largest kernel for each cluster. If β = 0 (i.e., η_i = 0), the FCS algorithm clusters the data set with no cluster kernels, which is equivalent to the FCM clustering algorithm.
The cluster center update Eq. (14) in FCS can be interpreted as a weighted mean of the FCM cluster center update and the grand mean \bar{x}. We can rewrite Eq. (14) as

a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m x_j / \sum_{j=1}^{n} \mu_{ij}^m - \eta_i \bar{x}}{1 - \eta_i}.  (16)

The weights of the FCM cluster center and the grand mean \bar{x} are 1 and -\eta_i, respectively. To maximize tr(S_FB), the cluster centers obtained by FCS
will move away from \bar{x} with the corresponding weight η_i. Since η_i is a monotone increasing function of β, the cluster centers obtained by FCS with a large β value will be farther from \bar{x} than those obtained with a small β value. The proposed FCS clustering algorithm is summarized as follows:
FCS Algorithm (see also Yang et al. (2003)). Set the iteration counter ℓ = 0 and choose initial values a_i^{(0)}, i = 1, ..., c. Given β and ε > 0:

Step 1. Find η_i^{(ℓ+1)} using Eq. (15).
Step 2. Find μ_ij^{(ℓ+1)} using Eq. (13).
Step 3. Find a_i^{(ℓ+1)} using Eq. (14).

Increment ℓ; repeat until \max_i \|a_i^{(\ell+1)} - a_i^{(\ell)}\| < ε.
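Under the stated update rules, a minimal implementation of this iteration might look as follows. This is our own sketch, not the authors' code: the crisp-kernel rule assigns membership one inside a kernel (splitting evenly in the unlikely transient case that a point falls inside two kernels), and Eq. (13) applies outside.

```python
import numpy as np

def fcs(X, c, m=2.0, beta=0.5, eps=1e-6, max_iter=200):
    """Sketch of the FCS iteration: Steps 1-3 with Eqs. (13)-(15)."""
    A = X[[0, -1]].astype(float)                              # simple deterministic init
    xbar = X.mean(axis=0)
    for _ in range(max_iter):
        # Step 1: eta_i from Eq. (15)
        cd2 = ((A[:, None] - A[None, :]) ** 2).sum(-1)
        np.fill_diagonal(cd2, np.inf)
        eta = (beta / 4.0) * cd2.min(axis=1) / ((A - xbar) ** 2).sum(-1).max()
        # Step 2: memberships from Eq. (13) with the crisp-kernel rule
        d2 = ((X[None, :, :] - A[:, None, :]) ** 2).sum(-1)   # (c, n)
        g = d2 - (eta * ((A - xbar) ** 2).sum(-1))[:, None]   # shifted distances
        U = np.zeros_like(g)
        inside = g <= 0                                       # point lies in kernel i
        crisp = inside.any(axis=0)
        U[:, crisp] = inside[:, crisp] / inside[:, crisp].sum(axis=0)
        fuzzy = ~crisp
        Uf = np.maximum(g[:, fuzzy], 1e-12) ** (-1.0 / (m - 1.0))
        U[:, fuzzy] = Uf / Uf.sum(axis=0)                     # Eq. (13)
        # Step 3: centers from Eq. (14)
        W = U ** m
        Wsum = W.sum(axis=1, keepdims=True)
        A_new = ((W @ X) - eta[:, None] * Wsum * xbar) / ((1 - eta)[:, None] * Wsum)
        done = np.abs(A_new - A).max() < eps
        A = A_new
        if done:
            break
    return U, A, eta

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])
U, A, eta = fcs(X, c=2, beta=0.05)
# Crisp and fuzzy memberships co-exist: some exact ones, some strictly in (0, 1).
assert (U == 1.0).any() and ((U > 0) & (U < 1)).any()
```

With β = 0 the kernel test never fires and the iteration reduces to FCM, matching the limiting case described above.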
We now use a simple example to illustrate the FCS properties. The data set is a sample drawn from the one-dimensional normal mixture $(1/3)N(0,1) + (1/3)N(4,1) + (1/3)N(10,1)$ with three populations of means 0, 4 and 10. The histogram of this data set is shown in Fig. 2(a). Fig. 2(b) shows the FCM membership function of the cluster with mean 0. Fig. 2(c)–(f) show the FCS membership functions of the mean 0 cluster for the parameter values $\beta$ = 0.9, 0.5, 0.1 and 0.05. The cluster kernel volumes always decrease as $\beta$ decreases. The volume of a cluster kernel can also be read off as the range over which the cluster membership function equals one. In our 3-cluster example, Fig. 2(b)–(f) draw only the membership function of the mean 0 cluster. Thus, the range with membership value one presents the volume of the mean 0 cluster kernel, and the range with membership value zero presents the volumes of the mean 4 and mean 10 cluster kernels. Crisp membership values and fuzzy membership values co-exist in FCS. When $\beta = 0$, the FCS membership values are equivalent to those of FCM, with no cluster kernels. The three cluster centers obtained using FCM and FCS are shown in Table 1. As $\beta$ decreases, the cluster centers obtained by FCS move closer to the grand mean $\bar{x}$ and also closer to the cluster centers obtained by FCM. These results coincide with the property shown in Eq. (15).
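The approach of the FCS centers to the FCM centers as $\beta$ (and hence $g_i$) shrinks can be seen directly from Eq. (16): holding the memberships fixed, the update is $a_i = (\bar{x}_w - g_i\bar{x})/(1-g_i)$ with $\bar{x}_w$ the FCM weighted mean, so the displacement from $\bar{x}_w$ grows with $g_i$. A small sketch (holding the memberships fixed is an assumption made only for illustration; in the full algorithm they are re-estimated at every iteration, and the grand-mean value used here is hypothetical):

```python
import numpy as np

# One application of the center update Eq. (16) for several values of g,
# with the fuzzy memberships held fixed for illustration.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=50)
xbar_grand = 4.0                        # hypothetical grand mean of a larger data set
u = rng.uniform(0.2, 1.0, size=50)      # fixed fuzzy memberships for one cluster
m = 2.0

um = u ** m
fcm_center = (um * X).sum() / um.sum()  # the g = 0 (FCM) case

def fcs_center(g):
    # Eq. (16): FCM weighted mean pushed away from xbar by the weight g
    return (fcm_center - g * xbar_grand) / (1.0 - g)

# displacement from the FCM center grows monotonically with g
shifts = [abs(fcs_center(g) - fcm_center) for g in (0.0, 0.05, 0.1, 0.5)]
```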
Fig. 2. Membership functions of FCM and FCS: (a) histogram of the data set; (b) FCM membership function for the mean 0 cluster; (c)–(f) FCS membership functions for the mean 0 cluster with beta = 0.9, 0.5, 0.1 and 0.05.
Table 1
Cluster centers obtained by FCS

beta       Cluster centers
0.900      -0.366    3.667    12.319
0.500      -0.281    3.687    11.221
0.200      -0.164    3.714    10.557
0.100      -0.111    3.726    10.356
0.050      -0.083    3.733    10.256
0 (FCM)     0.052    3.740    10.156
Although FCM is a popular clustering algorithm, the cluster centers it obtains are drawn toward the grand mean $\bar{x}$ when the data set heavily overlaps. According to the FCS properties, the cluster centers obtained by FCS can be more accurate than those of FCM in some situations. This will be illustrated in Section 5. In the next section, more details about the FCS parameters are presented.
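The pull of the FCM centers toward the grand mean can be checked numerically. The following sketch runs a plain FCM loop (a minimal implementation of our own, not the paper's code) on two overlapping one-dimensional populations and compares the separation between the estimated centers for a small and a large fuzzifier $m$.

```python
import numpy as np

def fcm(X, c, m, iters=100):
    """Minimal FCM: alternate membership and center updates."""
    a = np.quantile(X, np.linspace(0.1, 0.9, c), axis=0)
    for _ in range(iters):
        d2 = ((X[:, None, :] - a[None, :, :]) ** 2).sum(-1) + 1e-12
        w = d2 ** (-1.0 / (m - 1.0))
        u = w / w.sum(axis=1, keepdims=True)
        um = u ** m
        a = (um.T @ X) / um.sum(axis=0)[:, None]
    return a

rng = np.random.default_rng(0)
# two heavily overlapping populations with means 0 and 3
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])[:, None]

# separation between the two estimated centers shrinks as m grows,
# i.e. both centers are pulled toward the grand mean
sep_small_m = np.ptp(fcm(X, 2, m=1.5).ravel())
sep_large_m = np.ptp(fcm(X, 2, m=5.0).ravel())
```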
4. Optimality tests and parameter selection of FCS

In general, when the data set is clustered into c (c > 1) subsets, each subset is expected to have a different prototype (or cluster center) from the others. However, the grand sample mean $\bar{x}$ of the data set is always a fixed point of the FCS algorithm. The FCS output will be $\bar{x}$ with great probability if $\bar{x}$ is a stable solution of the FCS algorithm. To avoid such cases, we hope that $\bar{x}$ is not an attractive point of FCS. How do we judge whether $\bar{x}$ is an attractive or a stable point of FCS? The Hessian matrix of the FCS objective function (12) must be studied. To simplify the calculations, substituting (13) into (12) yields (17):
$$J = \sum_{j=1}^{n}\left(\sum_{i=1}^{c}\left(\|x_j-a_i\|^{2} - g_i\|a_i-\bar{x}\|^{2}\right)^{-\frac{1}{m-1}}\right)^{1-m}. \qquad (17)$$
It can be proved that $J = \min_{\mu} J_{FCS}$. Therefore, it suffices to judge whether $\bar{x}$ is an attractive or a stable point of FCS using the Hessian matrix of (17). Let us set

$$q(x_j; a_i) = \|x_j - a_i\|^{2} - g_i\|a_i - \bar{x}\|^{2},$$

$$S_j = \sum_{i=1}^{c} q(x_j; a_i)^{-\frac{1}{m-1}}, \qquad \mu_{ij} = \frac{q(x_j; a_i)^{-\frac{1}{m-1}}}{S_j}$$

and

$$h_{ij} = \mu_{ij}^{m}\left[(1-g_i)a_i - (x_j - g_i\bar{x})\right].$$
We know that

$$(1)\quad \frac{\partial J}{\partial a_i} = 2\sum_{j=1}^{n} \mu_{ij}^{m}\left[(1-g_i)a_i - (x_j - g_i\bar{x})\right],$$

$$(2)\quad \frac{\partial^{2} J}{\partial a_i\,\partial a_k} = \frac{4m}{m-1}\sum_{j=1}^{n}(S_j)^{m-1}\,h_{ij}(h_{kj})^{t} + 2\delta_{ik}\sum_{j=1}^{n}\mu_{ij}^{m}(1-g_i)\,I_{s\times s} - \frac{4m}{m-1}\,\delta_{ik}\sum_{j=1}^{n}\left[q(x_j; a_i)\right]^{-1} h_{ij}\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]^{t}, \quad \forall i,\ \forall j. \qquad (18)$$
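The gradient formula (1) can be verified against finite differences. The sketch below evaluates the objective (17) and the analytic gradient under the fixed $g_i = g$ simplification (an assumption made only to keep the check short), at centers chosen so that $q(x_j; a_i) > 0$ for all points; it also confirms that the grand mean, replicated c times, is always a stationary point.

```python
import numpy as np

def J(a, X, m, g):
    # objective (17) with all g_i = g; valid while q(x_j; a_i) > 0
    xbar = X.mean(0)
    q = ((X[:, None, :] - a[None]) ** 2).sum(-1) - g * ((a - xbar) ** 2).sum(-1)[None]
    return ((q ** (-1.0 / (m - 1.0))).sum(1) ** (1.0 - m)).sum()

def grad(a, X, m, g):
    # formula (1): dJ/da_i = 2 * sum_j mu_ij^m [(1-g) a_i - (x_j - g xbar)]
    xbar = X.mean(0)
    q = ((X[:, None, :] - a[None]) ** 2).sum(-1) - g * ((a - xbar) ** 2).sum(-1)[None]
    w = q ** (-1.0 / (m - 1.0))
    u = w / w.sum(1, keepdims=True)
    h = (u ** m)[:, :, None] * ((1 - g) * a[None] - (X[:, None, :] - g * xbar))
    return 2.0 * h.sum(0)

def grad_fd(a, X, m, g, eps=1e-6):
    # central finite differences, coordinate by coordinate
    out = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        ap, am = a.copy(), a.copy()
        ap[idx] += eps
        am[idx] -= eps
        out[idx] = (J(ap, X, m, g) - J(am, X, m, g)) / (2 * eps)
    return out
```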
Therefore, the second-order term of the Taylor series expansion of (17) can be expressed as follows:

$$u^{t}\left(\frac{\partial^{2} J}{\partial a_i\,\partial a_k}\right)u\bigg|_{a} = \frac{4m}{m-1}\sum_{j=1}^{n}(S_j)^{m-1}\left(\sum_{i=1}^{c}\mu_{ij}^{m}\,u_{a_i}^{t}\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]\right)^{2} + 2\sum_{i=1}^{c}(1-g_i)\sum_{j=1}^{n}\mu_{ij}^{m}\,u_{a_i}^{t}u_{a_i} - \frac{4m}{m-1}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\,\frac{u_{a_i}^{t}\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]\left[(1-g_i)a_i-(x_j-g_i\bar{x})\right]^{t}u_{a_i}}{q(x_j; a_i)}. \qquad (19)$$
If $\forall i$, $a_i = \bar{x}$, with $C_X^{\,b} = \sum_{j=1}^{n}\frac{(x_j-\bar{x})(x_j-\bar{x})^{t}}{n\|x_j-\bar{x}\|^{2}}$ and $\forall j$, $\|x_j-\bar{x}\| > 0$, then we get the following equation:

$$u^{t}\left(\frac{\partial^{2} J}{\partial a_i\,\partial a_k}\right)u\bigg|_{a=(a_1,a_2,\ldots,a_c),\ a_i=\bar{x}\ \forall i} = \frac{4m}{c^{m}(m-1)}\sum_{j=1}^{n}\frac{\left(\sum_{i=1}^{c}u_{a_i}^{t}(x_j-\bar{x})\right)^{2}}{c\,\|x_j-\bar{x}\|^{2}} + \frac{2n}{c^{m}}\sum_{i=1}^{c}(1-g_i)\,u_{a_i}^{t}\left[I_{s\times s} - \frac{2m}{(m-1)(1-g_i)}\,C_X^{\,b}\right]u_{a_i}. \qquad (20)$$
To simplify the analysis, we assume that $\forall i$, $g_i = g$ and $\forall j, \forall k$, $q(x_j, a_k) > 0$. In this way, we can ignore the update equations for $g_i$ and reduce the complexity of the analysis of FCS. Thus, Eq. (19) turns into (21) as follows:

$$u^{t}\left(\frac{\partial^{2} J}{\partial a_i\,\partial a_k}\right)u\bigg|_{a} = \frac{4m}{m-1}\sum_{j=1}^{n}(S_j)^{m-1}\left(\sum_{i=1}^{c}\mu_{ij}^{m}\,u_{a_i}^{t}\left[(1-g)a_i-(x_j-g\bar{x})\right]\right)^{2} + 2(1-g)\sum_{i=1}^{c}\left(\sum_{j=1}^{n}\mu_{ij}^{m}\right)u_{a_i}^{t}\left[I_{s\times s} - \frac{2m}{m-1}\,\frac{\sum_{j=1}^{n}\mu_{ij}^{m}\big[(a_i-x_j)-g(a_i-\bar{x})\big]\big[(a_i-x_j)-g(a_i-\bar{x})\big]^{t}\big/\,q(x_j; a_i)}{(1-g)\sum_{j=1}^{n}\mu_{ij}^{m}}\right]u_{a_i}. \qquad (21)$$

From Eq. (21), we know that if $g$ approaches negative infinity, then any FCS solution will be stable. This is an unacceptable result. Similarly, if $g$ approaches positive infinity, any FCS solution will be unstable, which is also unacceptable. Therefore, we can roughly restrict the range of $g$ with an upper bound of 1.

5. The robustness of FCS

A good clustering method should have the ability to tolerate noise and outliers. In this section, we use the gross error sensitivity and the influence function (Huber, 1981) to show that our weighted cluster center update equation is robust to noise and outliers. Let $\{x_1,\ldots,x_n\}$ be an observed data set of real numbers and let $h$ be an unknown parameter to be estimated. An M-estimator (Huber, 1981) is generated by minimizing the form

$$\sum_{j=1}^{n}\rho(x_j; h), \qquad (22)$$

where $\rho$ is an arbitrary function that measures the loss between $x_j$ and $h$. Here, we are interested in a location estimate that minimizes

$$\sum_{j=1}^{n}\rho(x_j - h). \qquad (23)$$
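As a reminder of how the M-estimator machinery in (22)–(23) works, the quadratic loss $\rho(u) = u^2$ gives $\psi(u) = 2u$, and solving $\sum_j \psi(x_j - h) = 0$ returns the sample mean; a brute-force minimization of $\sum_j \rho(x_j - h)$ confirms this. This classical example is ours, added for illustration; it is not part of the FCS derivation.

```python
import numpy as np

# Location M-estimation with the quadratic loss rho(u) = u^2:
# minimizing sum_j rho(x_j - h) over h recovers the sample mean.
rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=200)

grid = np.linspace(x.min(), x.max(), 4001)           # candidate values of h
loss = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
h_hat = grid[np.argmin(loss)]                         # grid minimizer of the loss
```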
The influence function of such a location estimator is

$$IC(x; F, h) = \frac{\psi(x-h)}{\int \psi'(x-h)\,dF_X(x)}, \qquad (25)$$

where $F_X(x)$ denotes the distribution function of $X$. If the influence function of an estimator is unbounded, noise or outliers may cause trouble. Equivalently, if the $\psi$ function of an estimator is unbounded, noise and outliers will cause trouble. Many important robustness measures can be derived from the influence function. One of them is the gross error sensitivity $\gamma^{*}$, defined by

$$\gamma^{*} = \sup_{x}\,|IC(x; F, h)|. \qquad (26)$$

This quantity measures the worst approximate influence that the addition of an infinitesimal point mass can have on the value of the associated estimator.
Let the loss between the data point $x_j$ and the $i$th cluster center $a_i$ be

$$\rho(x_j - a_i) = \mu_{ij}^{m}\|x_j - a_i\|^{2} - g_i\,\mu_{ij}^{m}\|a_i - \bar{x}\|^{2} \qquad (27)$$

and

$$\psi(x_j - a_i) = \frac{\partial \rho(x_j - a_i)}{\partial a_i} = -2\mu_{ij}^{m}(x_j - a_i) - 2g_i\,\mu_{ij}^{m}(a_i - \bar{x}). \qquad (28)$$

By solving the equation $\sum_{j=1}^{n}\psi(x_j - a_i) = 0$, we obtain the result shown in Eq. (14). Thus, the FCS cluster center is an M-estimator with the loss function (27) and $\psi$ function (28). Note that the $\psi$ function of our estimator is a function of $\mu_{ij}^{m}$, which depends on the fuzzifier $m$. We will show that the FCS cluster center is robust to noise and outliers when $m$ is large.
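The dependence of the $\psi$ function on the fuzzifier can be checked numerically. The sketch below evaluates $\psi$ at a far-away point $x$ for two values of $m$, with the cluster centers, the common $g_i = g$ and $\bar{x}$ fixed at illustrative values (these numbers are assumptions of ours, not output of the algorithm); the influence of the extreme point shrinks as $m$ grows.

```python
import numpy as np

def mu(x, a, g, xbar, m):
    # Eq. (30), valid when x lies outside every cluster kernel (all q_k > 0)
    q = (x - a) ** 2 - g * (a - xbar) ** 2
    w = q ** (-1.0 / (m - 1.0))
    return w / w.sum()

def psi(x, a, g, xbar, m, i=0):
    # Eq. (29), with the sign convention reconstructed from Eq. (16)
    u_i = mu(x, a, g, xbar, m)[i]
    return -2.0 * u_i ** m * ((x - a[i]) + g * (a[i] - xbar))

a = np.array([0.0, 4.0, 10.0])   # illustrative cluster centers
g, xbar = 0.1, 4.67              # illustrative g_i = g and grand mean
```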
For a given data set, the FCS clustering algorithm produces the cluster centers $\{a_1,\ldots,a_c\}$, the parameters $\{g_1,\ldots,g_c\}$ and the sample mean $\bar{x}$. The relative influence (the $\psi$ function) of an individual observation $x$ on the $i$th cluster center can be defined as

$$\psi(x - a_i) = -2(\mu_i(x))^{m}\left[(x - a_i) + g_i(a_i - \bar{x})\right], \qquad (29)$$

where

$$\mu_i(x) = \frac{\left(\|x-a_i\|^{2} - g_i\|a_i-\bar{x}\|^{2}\right)^{-\frac{1}{m-1}}}{\sum_{k=1}^{c}\left(\|x-a_k\|^{2} - g_k\|a_k-\bar{x}\|^{2}\right)^{-\frac{1}{m-1}}}. \qquad (30)$$

Note that $\mu_i(x) = 1$ if $\|x-a_i\|^{2} \leqslant g_i\|a_i-\bar{x}\|^{2}$. Suppose that $x \in \mathbb{R}$. For an extremely large or small $x$, we have $\|x-a_i\|^{2} \geqslant g_i\|a_i-\bar{x}\|^{2}$ and hence $\mu_i(x) \in (0,1)$. In general, an extremely large or small $x$ falls outside every one of the c cluster kernels and has a fuzzy membership value $\mu_i(x) \in (0,1)$; more precisely, $\mu_i(x)$ will be very close to $1/c$ and hence $\mu_i(x)^{m}$ will be very close to zero when $m$ is large. Although $(x-a_i) + g_i(a_i-\bar{x})$ is a monotone increasing function of $x$, $(\mu_i(x))^{m}\left[(x-a_i) + g_i(a_i-\bar{x})\right]$ will be very close to zero for an extremely large or small $x$ when $m$ is large. Thus, the $\psi$ function of the FCS cluster center is bounded for an extremely large or small $x$ when $m$ tends to infinity. Therefore, for a large $m$, only the data points $x$ inside the $i$th cluster kernel with $\mu_i(x) = 1$ have an influence on the $i$th FCS cluster center $a_i$, and a data point $x$ that falls on the crisp boundary of the $i$th cluster has a finite gross error sensitivity $\gamma^{*} < \infty$. The above discussion gives a theoretical foundation for asserting that the FCS algorithm is robust to noise and outliers when $m$ is large. The following is a simple example.
For the data set shown in Fig. 2(a), we implement FCS with $\beta = 0.1$ and $m = 2$ to produce the sets $\{a_1, a_2, a_3\}$, $\{g_1, g_2, g_3\}$ and $\bar{x}$. For a given data point $x$, the membership function $\mu_1(x)$ (Eq. (30)) is illustrated in Fig. 3(a), and the $\psi$ function $\psi(x - a_1) = -2(\mu_1(x))^{2}\left[(x - a_1) + g_1(a_1 - \bar{x})\right]$ is illustrated in Fig. 3(d). When $m = 2$, an extremely large or small $x$ has a large influence on $a_1$, as shown in Fig. 3(d). Similarly, we implement FCS with $m = 3$ and $m = 4$ to obtain other sets $\{a_1, a_2, a_3\}$ and $\{g_1, g_2, g_3\}$, and illustrate $\mu_1(x)$ in Fig. 3(b) and (c), respectively. The corresponding $\psi$ functions are illustrated in Fig. 3(e) and (f). Fig. 3 shows that the influence of an extremely large or small $x$ becomes small when $m$ increases. These numerical results coincide with our theoretical analysis.
6. Numerical examples

We first implement the Normal-4 data set proposed by Pal and Bezdek (1995). Normal-4 is a four-dimensional data set with sample size
Fig. 3. Membership functions (Eq. (30)) and phi functions (Eq. (29)) for a given data point x when m = 2, 3 and 4 (beta = 0.1 in all panels).
n = 800, consisting of 200 points from each of four clusters. The population mean vectors are $\mu_1 = (3,0,0,0)$, $\mu_2 = (0,3,0,0)$, $\mu_3 = (0,0,3,0)$ and $\mu_4 = (0,0,0,3)$. The covariance matrices are $\Sigma_i = I_4$, $i = 1,2,3,4$. We use the mean vectors as the initial values for both the FCM and FCS algorithms. We compare FCM and FCS with the mean squared error (MSE) criterion, calculated as $\sum_{i=1}^{4}\|a_i - \mu_i\|^{2}$. We implement the FCS algorithm with different combinations of $\beta \in \{0, 0.005, 0.01, 0.05, 0.1, 0.2\}$ and $m \in \{1.5, 2, 2.5, 3, 3.5\}$. For 100 repeated Normal-4 data sets, the MSE values are shown in Fig. 4(a). The result for $\beta = 0$ is equivalent to FCM. As $\beta$ increases, the volume of the cluster kernel grows. When the fuzzy index $m$ is small, the results of FCS ($\beta > 0$) and FCM ($\beta = 0$) are similar. As $m$ becomes larger, FCM gives larger MSE values than FCS. Thus, the FCS result is less sensitive to the fuzzy index $m$ than FCM, especially for $\beta$ = 0.05, 0.1 and 0.2. In general, the MSE values increase when $m$ increases. Yu et al. (2004) gave a theoretical upper bound on $m$ for FCM beyond which the grand sample mean $\bar{x}$ is the unique optimizer. For each combination of $m$ and $\beta$, the range between the worst and best MSE values among these 100 repetitions is shown in Fig. 4(b). The cases of $m = 3.5$ and $\beta$ = 0, 0.005, 0.01 are examples in which the grand sample mean $\bar{x}$ is the unique optimizer. The ranges of MSE values for FCS with $\beta$ = 0.05, 0.1 and 0.2 are still insensitive to the fuzzy index $m$. One may argue that we can process FCM with a small $m$ value, say 1.5 or 2, to avoid the above defect of FCM. However, FCS is not only less sensitive to the fuzzy index $m$ than FCM, but also more robust to noise and outliers than FCM when $m$ is large. We show this robustness of FCS in the following examples.

Fig. 5 shows a two-cluster data set with unequal sample sizes. The clustering results of FCS with $m = 2$ and $\beta$ = 0, 0.1, 0.2 are shown in Fig. 5(a)–(c). The three figures show similar results. The cluster centers are presented by the solid circle points. Note that the distance between the two cluster centers increases as $\beta$ increases. This phenomenon can be explained by the update Eq. (16) of FCS.
Fig. 4. MSE values for different combinations of beta and m: (a) MSE; (b) range of MSE.
When $m$ becomes larger ($m = 6$), the clustering results of FCS with $\beta$ = 0, 0.1 and 0.2 are shown in Fig. 5(d)–(f). FCS with cluster kernels ($\beta$ = 0.1 and 0.2) obtains better performance than FCM ($\beta = 0$), which clusters the data set without cluster kernels. This shows that FCS with a large and suitable $m$ value can detect clusters of unequal sample sizes and is robust to noise. Fig. 6 shows a two-cluster data set with one outlying point at coordinate (100, 0). When $m$ is large ($m = 6$), the results of FCS with $\beta$ = 0.1 and 0.2 are more robust to the outlier than FCM ($\beta = 0$). These robustness properties of FCS can be explained using the FCS update equations. Let $\hat{\mu} = \max\{\mu_{i1},\ldots,\mu_{in}\}$ and $\mu'_{ij} = \mu_{ij}/\hat{\mu}$, $j = 1,\ldots,n$. We have
Fig. 5. FCM and FCS clustering results for unequal sample size data sets: (a)–(c) m = 2 with beta = 0, 0.1, 0.2; (d)–(f) m = 6 with beta = 0, 0.1, 0.2.
Fig. 6. FCM and FCS clustering results for the two-cluster data set with one outlier at coordinate (100, 0): (a) FCM, m = 6; (b) FCS, m = 6, beta = 0.1; (c) FCS, m = 6, beta = 0.2.
$$\lim_{m\to\infty}\{a_i\} = \lim_{m\to\infty}\frac{\sum_{j=1}^{n}\mu_{ij}^{m}x_j - g_i\,\bar{x}\sum_{j=1}^{n}\mu_{ij}^{m}}{\sum_{j=1}^{n}\mu_{ij}^{m} - g_i\sum_{j=1}^{n}\mu_{ij}^{m}} = \lim_{m\to\infty}\frac{\sum_{j=1}^{n}(\mu'_{ij})^{m}x_j - g_i\,\bar{x}\sum_{j=1}^{n}(\mu'_{ij})^{m}}{\sum_{j=1}^{n}(\mu'_{ij})^{m} - g_i\sum_{j=1}^{n}(\mu'_{ij})^{m}} = \frac{\sum_{\mu'_{ij}=1}x_j - g_i\,\bar{x}\sum_{\mu'_{ij}=1}1}{(1-g_i)\sum_{\mu'_{ij}=1}1} = \frac{\sum_{\mu'_{ij}=1}x_j \big/ \sum_{\mu'_{ij}=1}1 \;-\; g_i\,\bar{x}}{1-g_i} = \frac{\sum_{\mu_{ij}=\hat{\mu}}x_j \big/ \sum_{\mu_{ij}=\hat{\mu}}1 \;-\; g_i\,\bar{x}}{1-g_i}. \qquad (31)$$
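A quick numerical check of the limit (31): for a large $m$, one application of the update (16) agrees with the in-kernel sample mean formula. The data, centers and $g$ below are illustrative values of our own choosing, with a fixed common $g_i = g$.

```python
import numpy as np

# Numerical check of (31): for large m, the FCS center update reduces to
# a weighted mean of the in-kernel sample mean and the grand mean xbar.
X = np.array([-1.0, 0.0, 1.0, 9.0, 10.0, 11.0, 30.0])  # last point is an outlier
a = np.array([0.0, 10.0])
g, m = 0.2, 50.0
xbar = X.mean()

q = (X[:, None] - a[None, :]) ** 2 - g * (a - xbar) ** 2
u = np.zeros_like(q)
for j in range(len(X)):
    if (q[j] <= 0).any():
        u[j, np.argmin(q[j])] = 1.0        # crisp inside a kernel (mu-hat = 1)
    else:
        w = q[j] ** (-1.0 / (m - 1.0))
        u[j] = w / w.sum()

um = u ** m
a_direct = ((um * X[:, None]).sum(0) / um.sum(0) - g * xbar) / (1 - g)  # Eq. (16)

inside0 = u[:, 0] == 1.0                   # points inside the first kernel
a_limit0 = (X[inside0].mean() - g * xbar) / (1 - g)                     # Eq. (31)
```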
In FCM, when $m$ is large, $\mu_{ij} = 1/c$ for all $i, j$ and hence $\mu_{ij} = \hat{\mu}$ for all $i, j$. This is why FCM can produce results in which the sample mean $\bar{x}$ is the unique optimizer when $m$ is large. In FCS, however, the data points inside the cluster kernels have $\mu_{ij} \in \{0, 1\}$, while the data points outside all cluster kernels have $\mu_{ij} \in (0, 1)$. When $m$ is large, the $i$th cluster center update Eq. (31) gives a large $(\mu'_{ij})^{m} = 1$ for the data points inside the $i$th cluster kernel and a small $(\mu'_{ij})^{m} \approx 0$ for the data points outside the $i$th cluster kernel. Thus, when $m$ is large, the $i$th cluster center update is the weighted mean of the sample mean of the data points inside the $i$th cluster kernel and the grand mean $\bar{x}$, with weights 1 and $g_i$, respectively. For a suitable $\beta$ value, noise and outliers fall outside the cluster kernels and their influence on the clustering results is small when $m$ is large. This explains the clustering results shown in Figs. 5 and 6 and also coincides with the theoretical analysis in Section 5. This property also provides a way to avoid the sample mean $\bar{x}$ being the unique optimizer, as in FCM.
We know that when the sample mean $\bar{x}$ is the unique optimizer of a fuzzy clustering algorithm, the partition coefficient (PC) (Bezdek, 1974), defined by

$$PC(c) = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{2}, \qquad (32)$$

will equal $1/c$, or equivalently the non-fuzzy index (NFI) (Pal and Bezdek, 1995), defined by

$$NFI(c) = 1 - \frac{c}{c-1}\left(1 - PC(c)\right), \qquad (33)$$

will equal zero. Note that Dave (1996) also proposed a modification of the PC index which is equivalent to the NFI index. According to the above analysis, we hope that FCS with cluster kernels can avoid the situation in which the sample mean is the unique optimizer of the FCS objective function. Fig. 7 presents the NFI(2) values for the data set shown in Fig. 5.

Fig. 7. NFI(2) values for the unequal sample size data set shown in Fig. 5, for m = 10 and m = 20.

The NFI values of FCS with cluster kernels ($\beta > 0$) are always larger than the NFI values of FCM ($\beta = 0$), which is the case of the sample mean $\bar{x}$ being the unique optimizer with NFI = 0 when $m$ = 10 and 20. This shows that the FCS algorithm can avoid the case of NFI = 0 and is more robust to noise and outliers than FCM when $m$ is large. Because the sample mean $\bar{x}$ of the data set shown in Fig. 6 is not the unique optimizer of FCM or FCS when $m$ is large, we do not show its NFI values.

Fig. 8. NFI(11) values for the normalized Vowel data set, in which both the PIM and FCS algorithms are processed with the same parameter values: (a) m = 1.5; (b) m = 2.
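Both validity indices (32)–(33) are straightforward to compute from a membership matrix; a short sketch follows (the $n \times c$ layout of the matrix is our own assumption):

```python
import numpy as np

def pc(u):
    # partition coefficient, Eq. (32); u has shape (n, c)
    return (u ** 2).sum() / u.shape[0]

def nfi(u):
    # non-fuzzy index, Eq. (33)
    c = u.shape[1]
    return 1.0 - (c / (c - 1.0)) * (1.0 - pc(u))
```

A completely fuzzy partition (all memberships equal to 1/c) gives NFI = 0, and a crisp partition gives NFI = 1, which is the behavior the discussion above relies on.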
Note that some of the FCS properties discussed above can also be achieved by the partition index maximization (PIM) algorithm (Özdemir and Akarun, 2002), which uses a fixed volume for all cluster kernels. The radius of each cluster kernel in PIM is defined by

$$\alpha = \delta\,\min_{i\neq i'}\{\|a_i - a_{i'}\|/2\}, \qquad 0\leqslant\delta\leqslant 1. \qquad (34)$$

The NFI values of PIM and FCS on the normalized Vowel data set from the UCI Machine Learning Repository (Blake and Merz, 1998) are shown in Fig. 8. Yu et al. (2004) showed that when $m > 1.7787$, the sample mean $\bar{x}$ is the unique optimizer of FCM for the normalized Vowel data set in Blake and Merz (1998). In Fig. 8(a), when $m = 1.5$, both PIM and FCS with different $\delta$ and $\beta$ values have NFI index values larger than 0.3. However, when $m = 2$, as shown in Fig. 8(b), PIM gives the same NFI values as FCM ($\delta = 0$ or $\beta = 0$). Using the same volume for all cluster kernels does not help PIM achieve larger NFI values than FCM. In the same situation with $m = 2$, as shown in Fig. 8(b), the NFI values of FCS are always larger than those of FCM and PIM. Using different cluster kernel volumes in FCS produces these merits.
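For reference, Eq. (34) is a one-liner; this helper (the function name is ours) computes the common PIM kernel radius from a set of centers.

```python
import numpy as np
from itertools import combinations

def pim_radius(centers, delta):
    # Eq. (34): alpha = delta * min_{i != i'} ||a_i - a_i'|| / 2, 0 <= delta <= 1
    dmin = min(np.linalg.norm(np.asarray(p) - np.asarray(q))
               for p, q in combinations(centers, 2))
    return delta * dmin / 2.0
```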
7. Conclusions

We proposed a novel clustering algorithm, called the FCS algorithm, which attempts to minimize the trace of the fuzzy within-cluster scatter matrix and simultaneously maximize the trace of the fuzzy between-cluster scatter matrix. Each cluster obtained by FCS has a cluster kernel. Data points that fall inside any one of the c cluster kernels have crisp memberships, while those outside all of the cluster kernels have fuzzy memberships. The volume of each cluster kernel is decided by the parameter $g_i$, which is a function of $\beta$. Crisp and fuzzy memberships co-exist in FCS. The cluster center update equations in FCS can be interpreted as a weighted mean of the FCM cluster centers and the grand mean $\bar{x}$. Numerical examples show that FCS can give more accurate parameter estimates than FCM. They also show that FCS can help avoid the situation where the sample mean $\bar{x}$ is the unique optimizer of FCM, and that FCS is more robust to noise
and outliers than FCM when $m$ is large. A theoretical analysis of FCS was also presented. Overall, the proposed FCS is recommended as a good clustering algorithm when points in the same cluster should be compact and points in different clusters should be well separated.
Acknowledgement

This work was supported in part by the National Science Council of Taiwan, ROC, under grant NSC-91-2118-M-033-001.
References<br />
Bezdek, J.C., 1974. Cluster validity with <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> sets. J. Cybernet.<br />
3, 58–73.<br />
Bezdek, J.C., 1981. Pattern Recogniti<strong>on</strong> with Fuzzy Objective<br />
Functi<strong>on</strong> Algorithms. Plenum Press, New York.<br />
Blake, C.L., Merz, C.J., 1998. UCI repository of machine<br />
learning databases, a huge collecti<strong>on</strong> of artificial and realworld<br />
data sets. Available from: .<br />
Dave, R.N., 1996. Validating <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> partiti<strong>on</strong> obtained through<br />
c-shells <str<strong>on</strong>g>clustering</str<strong>on</strong>g>. Pattern Recogniti<strong>on</strong> Lett. 17, 613–623.<br />
Duda, R.O., Hart, P.E., 1973. Pattern Classificati<strong>on</strong> and Scene<br />
Analysis. Wiley, New York.<br />
Fukuyama, Y., Sugeno, M., 1989. A new method of choosing<br />
the number of clusters for <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> c-means method. In:<br />
Proceedings of the 5th Fuzzy System Symposium (in<br />
Japanese), pp. 247–250.<br />
Gath, J., Geva, A.B., 1989. Unsupervised optimal <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g><br />
<str<strong>on</strong>g>clustering</str<strong>on</strong>g>. IEEE Trans. Pattern Anal. Mach. Intell. 11,<br />
773–781.<br />
Gunders<strong>on</strong>, M., 1978. Applicati<strong>on</strong> of <str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> ISODATA <str<strong>on</strong>g>algorithm</str<strong>on</strong>g>s<br />
to star tracker pointing systems. In: Proceedings of<br />
the 7th Triennial World IFCA C<strong>on</strong>g., Helsinki, Filind, pp.<br />
1319–1323.<br />
Gustafs<strong>on</strong>, D.E., Kessel, W.C., 1979. Fuzzy <str<strong>on</strong>g>clustering</str<strong>on</strong>g> with a<br />
<str<strong>on</strong>g>fuzzy</str<strong>on</strong>g> covariance <strong>matrix</strong>. In: Proceedings of the IEEE<br />
C<strong>on</strong>ference <strong>on</strong> Decisi<strong>on</strong> C<strong>on</strong>trol, San Diego, CA, pp.<br />
761–766.<br />
Huber, P.J., 1981. Robust Statistics. Wiley, New York.<br />
Jain, A.K., Dubes, R.C., 1988. In: Algorithm for Clustering<br />
Data. Prentice-Hall, Englewood Cliffs, NJ.<br />
Kaufman, L., Rousseeuw, P.J., 1990. Finding Groups in Data:<br />
An Introducti<strong>on</strong> to Cluster Analysis. Wiley, New York.<br />
Krishnapuram, R., Kim, J., 2000. Clustering algorithms based on volume criteria. IEEE Trans. Fuzzy Syst. 8, 228–236.
Özdemir, D., Akarun, L., 2001. Fuzzy algorithms for combined quantization and dithering. IEEE Trans. Image Processing 10 (6), 923–931.
Özdemir, D., Akarun, L., 2002. A fuzzy algorithm for color quantization of images. Pattern Recognition 35, 1785–1791.
Pal, N.R., Bezdek, J.C., 1995. On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 3, 370–379.
Rousseeuw, P.J., Kaufman, L., Trauwaert, E., 1996. Fuzzy clustering using scatter matrices. Comput. Statist. Data Anal. 23, 135–151.
Sugeno, M., Yasukawa, T., 1993. A fuzzy-logic-based approach to qualitative modeling. IEEE Trans. Fuzzy Syst. 1, 7–31.
Wu, K.L., Yang, M.S., 2002. Alternative c-means clustering algorithm. Pattern Recognition 35, 2267–2278.
Xie, X.L., Beni, G., 1991. A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 13, 841–847.
Yang, M.S., 1993. A survey of fuzzy clustering. Math. Comput. Model. 18, 1–16.
Yang, M.S., Hu, Y.J., Lin, K.C.R., Lin, C.C.L., 2002. Segmentation techniques for tissue differentiation in MRI of Ophthalmology using fuzzy clustering algorithms. Magn. Reson. Imaging 20, 173–179.
Yang, M.S., Wu, K.L., Yu, J., 2003. A novel fuzzy clustering algorithm. In: Proceedings of the 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA2003), Kobe, Japan, pp. 647–652.
Yu, J., Cheng, Q., Huang, H., 2004. Analysis of the weighting exponent in the FCM. IEEE Trans. Syst. Man Cybernet. Part B 34, 634–638.
Zadeh, L.A., 1965. Fuzzy sets. Inform. Contr. 8, 338–353.
Zahid, N., Limouri, M., Essaid, A., 1999. A new cluster-validity for fuzzy clustering. Pattern Recognition 32, 1089–1097.