12.01.2015 Views

Download - Academy Publisher


topic because it partitions sample data sets flexibly and can detect intrusions more objectively.

A. The Theory of FCM

Fuzzy clustering is a technique for classifying objective things: it constructs a fuzzy resemblance relation from their characteristics, their relatedness, and their similarity. According to Ref. [1], methods of fuzzy clustering analysis fall into three categories:

1) The number of categories is not fixed; the data are clustered dynamically according to different requirements.

2) The number of categories is given; the goal is to find the best way to classify the data. This method clusters on the basis of an objective function and is called the fuzzy C-means (FCM) algorithm or fuzzy ISODATA clustering.

3) In the case of significant perturbation, the data are clustered according to the fuzzy similarity matrix. This method is called fuzzy clustering based on perturbation.

The theory of fuzzy C-means clustering (FCM) [2,3]: fuzzy C-means clustering is a partition-based algorithm and an improvement on C-means. The C-means algorithm partitions the data rigidly, whereas FCM partitions them flexibly and fuzzily. FCM minimizes the within-group sum of squares and uses memberships to assign each data instance. It divides a data set $X = \{X_i \mid X_i \in R^p,\ i = 1, 2, \ldots, n\}$ of $n$ instances into $k$ categories ($1 < k < n$) and calculates the cluster center of each category so as to minimize a value function of non-similarity. The classification matrix is $U = (u_{ij}),\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k$, where $u_{ij}$ indicates the membership of data instance $X_i$ in category $j$ and satisfies the following condition:

$$\sum_{j=1}^{k} u_{ij} = 1, \quad \forall i = 1, \ldots, n. \qquad (1)$$

FCM produces a fuzzy partition: the membership values, which lie between 0 and 1, determine the categories to which each given data instance belongs. The elements of the matrix $U$ therefore take values between 0 and 1. The value function is defined as follows:

$$J_m(U, C) = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m}\, d_{ij}^{2}(X_i, C_j). \qquad (2)$$

$J_m$ can be seen as the membership-weighted sum of squared distances between each data instance and the cluster centers. In (2), $C = \{C_j \mid C_j \in R^p,\ j = 1, 2, \ldots, k\}$ is the set of cluster centers and $X_i \in R^p$ are the data instances; $u_{ij}$ is the membership of data instance $X_i$ in the cluster with center $C_j$, with values between 0 and 1; $U = \{u_{ij}\}$ is an $n \times k$ matrix; $C = [C_1, C_2, \ldots, C_k]$ is a $p \times k$ matrix whose column $C_j$ is the cluster center of fuzzy group $j$; $d_{ij}(X_i, C_j)$ is the distance between data instance $X_i$ and cluster center $C_j$; $m$ is the fuzzy coefficient ($1 \le m < \infty$); and $k$ is the number of categories fixed in advance, determined by the initial clustering. The Lagrange multiplier method yields the necessary conditions for a minimum of $J_m$:

$$u_{ij} = \frac{1}{\sum_{c=1}^{k} \left( d_{ij} / d_{ic} \right)^{2/(m-1)}}, \quad \forall i, j \qquad (3)$$

$$C_j = \left( \sum_{i=1}^{n} u_{ij}^{m} X_i \right) \Big/ \left( \sum_{i=1}^{n} u_{ij}^{m} \right), \quad \forall j \qquad (4)$$
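The step from the value function to these two conditions can be sketched as follows. This is a derivation outline under stated assumptions, not the cited authors' own presentation: the multipliers $\lambda_i$ are symbols introduced here, $m > 1$ and $d_{ij} > 0$ are assumed, and the center condition additionally assumes the Euclidean distance $d_{ij} = \lVert X_i - C_j \rVert$.

```latex
% Lagrangian of the value function under the constraint that each row of U sums to 1
% (the multipliers \lambda_i are introduced here for illustration).
L(U, C, \lambda) = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m} d_{ij}^{2}
                 + \sum_{i=1}^{n} \lambda_i \Bigl( 1 - \sum_{j=1}^{k} u_{ij} \Bigr)

% Stationarity in u_{ij} (assumes m > 1 and d_{ij} > 0):
\frac{\partial L}{\partial u_{ij}} = m\, u_{ij}^{m-1} d_{ij}^{2} - \lambda_i = 0
\;\Longrightarrow\;
u_{ij} = \Bigl( \frac{\lambda_i}{m\, d_{ij}^{2}} \Bigr)^{1/(m-1)}

% Imposing \sum_{c} u_{ic} = 1 fixes \lambda_i and gives the membership condition:
\Bigl( \frac{\lambda_i}{m} \Bigr)^{1/(m-1)} = \frac{1}{\sum_{c=1}^{k} d_{ic}^{-2/(m-1)}}
\;\Longrightarrow\;
u_{ij} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{c=1}^{k} d_{ic}^{-2/(m-1)}}
       = \frac{1}{\sum_{c=1}^{k} \left( d_{ij} / d_{ic} \right)^{2/(m-1)}}

% Stationarity in C_j, assuming d_{ij} = \lVert X_i - C_j \rVert:
\frac{\partial L}{\partial C_j} = -2 \sum_{i=1}^{n} u_{ij}^{m} (X_i - C_j) = 0
\;\Longrightarrow\;
C_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} X_i}{\sum_{i=1}^{n} u_{ij}^{m}}
```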

The parameter $m$ in the above formulas is a scalar that controls the degree of fuzziness of the classification matrix $U$: the larger $m$ is, the fuzzier the partition. If $m = 1$, the FCM algorithm degenerates into the hard C-means (HCM) clustering algorithm. FCM must iterate many times before the value function reaches its minimum.
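The influence of $m$ can be illustrated with a small sketch of the membership condition (3) for a single data instance. The function name `memberships` and the two distances are hypothetical choices made for this illustration; the distances are assumed to be strictly positive and $m > 1$.

```python
def memberships(distances, m):
    """Membership of one data instance in each cluster, following (3).

    distances: distances d_ic from the instance to every cluster center
               (assumed strictly positive); m: fuzzy coefficient, m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_c) ** exponent for d_c in distances)
            for d_j in distances]


# Hypothetical instance at distance 1 from one center and 2 from another.
distances = [1.0, 2.0]
for m in (1.1, 2.0, 5.0):
    u = memberships(distances, m)
    print(m, [round(v, 3) for v in u])
```

For $m = 2$ the memberships are 0.8 and 0.2. As $m$ approaches 1 they approach the hard assignment (1, 0), and as $m$ grows they move toward the uniform values (0.5, 0.5), which is the blurring effect described above.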

B. FCM Used in Anomaly Detection

The intrusion detection algorithm based on FCM [3]: as the discussion above shows, the FCM algorithm requires two parameters, the number of clusters C and the parameter $m$. C can be taken from the number of clusters found by the initial clustering, and it must be less than the total number of samples to be clustered. The detection optimization proceeds in the following steps:

Step1: initialize the membership matrix $U$ with random numbers between 0 and 1 that satisfy $\sum_{j=1}^{k} u_{ij} = 1,\ \forall i = 1, \ldots, n$.

Step2: use $C_j = \left( \sum_{i=1}^{n} u_{ij}^{m} X_i \right) \Big/ \left( \sum_{i=1}^{n} u_{ij}^{m} \right)$ to calculate the cluster centers $C_j,\ j = 1, \ldots, k$.

Step3: use $u_{ij} = 1 \Big/ \sum_{c=1}^{k} \left( d_{ij} / d_{ic} \right)^{2/(m-1)}$ to calculate the new membership matrix $U$.

Step4: calculate the value function $J_m(U, C) = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m}\, d_{ij}^{2}(X_i, C_j)$. If it is smaller than a given threshold, or if its change from the previous value is smaller than a given threshold, stop and output the clustering results. Otherwise, return to Step2 and continue iterating.
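The four steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of [3]: the function name `fcm`, its parameters (`eps`, `max_iter`, `seed`) and the sample points are hypothetical, the distance is assumed to be Euclidean, and only the "change in the value function" stopping test of Step4 is used.

```python
import random


def fcm(data, k, m=2.0, eps=1e-6, max_iter=100, seed=0):
    """Sketch of Step1 to Step4, assuming Euclidean distance and m > 1."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])

    # Step1: random memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])

    previous = None
    for _ in range(max_iter):
        # Step2: cluster centers as membership-weighted means of the data.
        centers = []
        for j in range(k):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([sum(w * x[d] for w, x in zip(weights, data)) / total
                            for d in range(p)])

        # Squared distances d_ij^2, floored to avoid division by zero.
        d2 = [[max(sum((x[d] - c[d]) ** 2 for d in range(p)), 1e-12)
               for c in centers] for x in data]

        # Step3: new memberships; (d_ij/d_ic)^(2/(m-1)) = (d2_ij/d2_ic)^(1/(m-1)).
        e = 1.0 / (m - 1.0)
        u = [[1.0 / sum((row[j] / row[c]) ** e for c in range(k))
              for j in range(k)] for row in d2]

        # Step4: value function and stopping test on its change.
        cost = sum(u[i][j] ** m * d2[i][j] for i in range(n) for j in range(k))
        if previous is not None and abs(previous - cost) < eps:
            break
        previous = cost

    return u, centers, cost


# Two well-separated hypothetical groups of points in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
u, centers, cost = fcm(points, k=2)
labels = [row.index(max(row)) for row in u]
print(labels)
```

Taking the largest membership in each row, as in the last lines, turns the fuzzy partition into a hard labeling. In an anomaly detection setting the membership values themselves would normally be kept, since they express how strongly each instance belongs to each cluster.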

The output of the algorithm is an $n \times k$ fuzzy partition matrix; the matrix indicates the membership of the
