12.01.2015 Views

Download - Academy Publisher


each sample belongs to each category. Each sample is then assigned to a category by applying the principle of maximum membership to the matrix. The algorithm clusters normally distributed data well, but it is sensitive to isolated points.
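The maximum-membership rule described above can be sketched in a few lines. This is only an illustration: the function name `assign_by_max_membership` and the example matrix are hypothetical, and the layout of the matrix (one row per cluster, one column per sample) is an assumption chosen to match the $u_{ij}$ notation used later in the paper.

```python
def assign_by_max_membership(U):
    """Return, for each sample, the index of the cluster with the largest membership.

    U[i][j] is assumed to be the membership of sample j in cluster i
    (C rows, N columns). Ties go to the lowest cluster index.
    """
    n_clusters = len(U)
    n_samples = len(U[0])
    return [max(range(n_clusters), key=lambda i: U[i][j])
            for j in range(n_samples)]


# Hypothetical membership matrix: 2 clusters, 3 samples.
U = [
    [0.9, 0.2, 0.4],
    [0.1, 0.8, 0.6],
]
labels = assign_by_max_membership(U)
print(labels)
```

Each column of `U` sums to 1, and the label of a sample is simply the row in which its column attains its maximum.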

C. Problems about the Application of FCM

In recent years, cluster analysis has become an important data mining technique. Although many clustering methods are widely applied, each has its own limitations and range of applicability. Among fuzzy clustering algorithms, FCM is the most widely used, but it has several problems. First, the number of clusters must be preset by the user; choosing an appropriate number is a precondition for precise clustering, yet that number is difficult to determine. Second, FCM is essentially a local optimization technique that uses hill climbing to search iteratively for the optimal solution, so it is particularly sensitive to initialization and easily falls into a local minimum, failing to reach the global optimal solution.
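To make the dependence on initialization concrete, here is a minimal sketch of the standard FCM alternating updates (membership update, then center update) in plain Python. It is not taken from the paper: the names `fcm` and `sq_dist`, the fuzzifier default `m=2.0`, the stopping tolerance, and the toy data are all assumptions made for illustration. The loop only ever moves downhill from the starting centers it is given, which is why the result can depend on that starting point.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm(points, init_centers, m=2.0, max_iter=100, tol=1e-9):
    """Run standard FCM updates from the given initial centers.

    Returns (centers, U, objective); U[i][j] is the membership of
    point j in cluster i.
    """
    centers = [list(c) for c in init_centers]
    n, dim = len(points), len(points[0])
    for _ in range(max_iter):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)),
        # normalised so that each sample's memberships sum to 1.
        U = [[0.0] * n for _ in centers]
        for j, p in enumerate(points):
            inv = [(1.0 / max(sq_dist(c, p), 1e-12)) ** (1.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            for i, value in enumerate(inv):
                U[i][j] = value / total
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for row in U:
            w = [u ** m for u in row]
            w_sum = sum(w)
            new_centers.append(
                [sum(wj * p[k] for wj, p in zip(w, points)) / w_sum
                 for k in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Objective evaluated with the final centers and the last memberships.
    objective = sum(U[i][j] ** m * sq_dist(centers[i], points[j])
                    for i in range(len(centers)) for j in range(n))
    return centers, U, objective


# Toy 1-D data set (hypothetical) with two obvious groups.
points = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
random.seed(0)
for trial in range(3):
    start = random.sample(points, 2)  # different initial centers per trial
    centers, U, objective = fcm(points, start)
    print(trial, start, [round(c[0], 3) for c in centers], round(objective, 4))
```

Comparing the objective values across trials is one simple way to see whether different starting centers lead to different local minima on a given data set. Note that the number of clusters still has to be supplied, here implicitly through the length of `init_centers`.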

IV. OPTIMIZATION OF FUZZY CLUSTERING ALGORITHM

In intrusion detection based on fuzzy clustering, FCM is often combined with other methods. There are many such hybrid methods, for example the combination of FCM with an adaptive immune system, the application of the average information entropy, the combination of FCM with a support vector machine, and the fuzzy genetic algorithm. The following subsections introduce recent research on how to determine the number of clusters and how to obtain the optimal solution.

A. Determination of the Number of Clusters

Many studies address the number of clusters in the FCM algorithm and the selection of the initial cluster centers, but most of them consider only one of the two problems: either determining the number of clusters or selecting the initial cluster centers. Ref. [4] introduced a method based on the average information entropy to determine the cluster number in the FCM algorithm, and used a density function to obtain the initial cluster centers.

The more reasonable the partition into clusters, the more definite the assignment of the data to a cluster, and the smaller the information entropy. Ref. [4] therefore improved the clustering algorithm by using the average information entropy as the criterion for determining the number of clusters. The average information entropy is defined as follows:

$$H(k) = -\sum_{i=1}^{C} \sum_{j=1}^{N} \left\{ \frac{u_{ij} \log_2(u_{ij}) + (1 - u_{ij}) \log_2(1 - u_{ij})}{N} \right\} \qquad (5)$$

First, define the range of the cluster number, $[C_{\min}, C_{\max}]$. In (5), $u_{ij}$ indicates the extent to which sample $j$ belongs to cluster $i$, with $u_{ij} \in [0, 1]$, $\forall i, j$. As $k$ increases from $C_{\min}$ to $C_{\max}$, it produces $C_{\max} - C_{\min} + 1$ values of $H(k)$. By the rule that the smaller the information entropy, the more definite the assignment of the data to a cluster, the smallest $H(k)$ is selected, and the corresponding cluster number $k$ is taken as the final cluster number $C$.
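The selection rule can be sketched as follows, assuming a membership matrix has already been computed for each candidate $k$ (for example by running FCM once per $k$). The function names `average_information_entropy` and `select_cluster_number`, the dictionary layout, and the example matrices are hypothetical. The convention $0 \cdot \log_2 0 = 0$ is also an assumption, adopted here so that crisp memberships do not raise a math error.

```python
import math


def average_information_entropy(U):
    """Average information entropy of a membership matrix, following (5).

    U[i][j] is the membership of sample j in cluster i. Terms with a
    zero probability are skipped (assumed convention: 0 * log2(0) = 0).
    """
    n_samples = len(U[0])
    total = 0.0
    for row in U:
        for u in row:
            for p in (u, 1.0 - u):
                if p > 0.0:
                    total += p * math.log2(p)
    return -total / n_samples


def select_cluster_number(memberships_by_k):
    """Return the k whose membership matrix has the smallest entropy."""
    return min(memberships_by_k,
               key=lambda k: average_information_entropy(memberships_by_k[k]))


# Hypothetical membership matrices for k = 2 and k = 3 on two samples.
memberships_by_k = {
    2: [[1.0, 0.0],
        [0.0, 1.0]],
    3: [[0.4, 0.3],
        [0.3, 0.3],
        [0.3, 0.4]],
}
best_k = select_cluster_number(memberships_by_k)
print(best_k)
```

A crisp partition gives an entropy of zero, while memberships close to uniform give a large value, so the rule favors the $k$ for which samples are assigned most decisively.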

Ref. [5] proposed a fuzzy C-means and support vector machine algorithm (F-CMSVM) that determines the cluster number automatically, which addresses the requirement in the fuzzy C-means algorithm (FCM) that the cluster number be predefined. It uses a support vector machine with a fuzzy membership function, and takes as that function the affiliating (membership) matrix obtained by introducing the support vector machine into the fuzzy C-means algorithm. Each input sample can therefore receive a different penalty value, which yields an optimized separating hyperplane.

The fuzzy C-means and support vector machine algorithm first supposes that the given data set can be divided into two categories ($k = 2$) and clusters it with fuzzy C-means. The resulting affiliating matrix is then used as the fuzzy membership of the fuzzy support vector machine algorithm, which is trained on the data set to obtain the support vector machine and the separating hyperplane. To test the assumption, Ref. [5] proposed a new criterion: $d_{SV}$ denotes the distance between the two categories, and $d_{S1}$ and $d_{S2}$ denote the average distances between the support vectors of $S_1$ and $S_2$, respectively, and their nearest neighbor points. If $d_{SV} \le \min(d_{S1}, d_{S2})$, the original data set cannot be separated and the assumption does not hold; otherwise the assumption holds and the original data set can be divided into at least two categories.

B. Obtaining the Global Optimal Solution

Because FCM is sensitive to initialization and therefore easily falls into a local minimum, clustering methods based on the genetic algorithm have been proposed. Such a method converges to the global optimum with higher probability, but its convergence is slower and it can suffer from premature convergence. To solve this problem, Ref. [6] used the clonal selection algorithm (CSA) to optimize the objective function of the unsupervised FCM clustering algorithm. CSA uses the mechanism of antibody cloning to construct a clone operator that combines the characteristics of evolutionary search, global search, stochastic search, and local search. Because CSA based on the clone operator is a population-based strategy, it has parallelism and randomness, so that it can obtain the global optimal solution with a
