12.01.2015 Views

Download - Academy Publisher


higher probability and converge at a high speed. At the same time, the algorithm considers the affinity of the immune response in order to overcome the phenomenon of premature convergence (precocity). In this paper, the improved FCM algorithm is introduced into anomaly intrusion detection

V. THE IMPROVED INTRUSION DETECTION ALGORITHM BASED ON FCM

Improving the detection efficiency of IDS is a constant focus of research. The data to be clustered include not only numerical data but also character data. For example, in the KDD Cup 1999 network dataset, each connection instance contains 41 properties, of which 3 are flag (character) properties and 38 are numerical properties. At present, most studies address only the numerical properties of the sample data. When the hybrid attributes of the sample data are considered, the usual distance applies only to the numerical data, so we use a new method to solve the problem [7]. The distance between $x_{ik}$ and $x_{jk}$, the values of objects $x_i$ and $x_j$ on the $k$-th character attribute, is defined as follows:

$$d(x_{ik}, x_{jk}) = \begin{cases} 1, & x_{ik} \neq x_{jk} \\ 0, & x_{ik} = x_{jk} \end{cases} \tag{6}$$

Suppose the object data have $p$ numerical attributes and $q$ categorical attributes. The distance between two objects can then be defined as follows:

$$d(x_i, x_j) = d_n(x'_i, x'_j) + d_c(x_i, x_j) \tag{7}$$

In (7), $d_n(x'_i, x'_j)$, $i \neq j$, is the distance over the numerical attributes, and $d_c(x_i, x_j)$ is the distance over the character attributes.
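The two distance components in (6) and (7) can be illustrated with a minimal Python sketch. Several points here are assumptions rather than part of the source: the function names `categorical_distance` and `mixed_distance` are hypothetical, the first `p` entries of each object are taken to be the numerical attributes, and the numerical part is computed as a squared Euclidean distance because the text does not spell out $d_n$. The attribute values in the usage lines are illustrative only.

```python
def categorical_distance(a, b):
    """Simple matching distance in the form of Eq. (6): 0 if equal, 1 otherwise."""
    return 0 if a == b else 1


def mixed_distance(x_i, x_j, p):
    """Distance in the form of Eq. (7): numerical part plus character part.

    Assumption: the first p entries are numerical, the remaining entries are
    character attributes, and d_n is the squared Euclidean distance.
    """
    d_n = sum((a - b) ** 2 for a, b in zip(x_i[:p], x_j[:p]))
    d_c = sum(categorical_distance(a, b) for a, b in zip(x_i[p:], x_j[p:]))
    return d_n + d_c


# Two illustrative connection records: 2 numerical and 3 character attributes.
x_i = [0.0, 1.0, "tcp", "http", "SF"]
x_j = [3.0, 5.0, "udp", "http", "SF"]
print(mixed_distance(x_i, x_j, p=2))  # 9 + 16 from the numbers, 1 mismatch -> 26.0
```

Keeping the two parts in separate sums makes it easy to swap in a different numerical distance or to weight the character part, as the objective function later does with $\lambda$.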

The objective function for datasets with hybrid attributes can then be obtained by modifying (2):

$$J_m(U, C) = \sum_{i=1}^{N} \left\{ \sum_{j=1}^{k} u_{ij}^{m} \sum_{l=1}^{p} (x'_{il} - x'_{jl})^2 + \lambda \sum_{j=1}^{k} u_{ij}^{m} \sum_{l=p+1}^{p+q} d_c(x_{il}, x_{jl}) \right\} \tag{8}$$

In (8), the weight $\lambda$ balances the two kinds of attributes in the hybrid dataset, and its value is determined by the proportion of the two kinds of properties; $m > 1$ is the fuzzy coefficient, which controls the degree of fuzziness of $U$.
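As a rough illustration of how the value function in (8) could be evaluated, the sketch below sums the weighted numerical and character costs over all samples and clusters. It rests on assumptions not stated in the source: the function name `objective` is hypothetical, the second argument of each distance is interpreted as the prototype of cluster $j$, and `U[i][j]` holds the membership of sample $i$ in cluster $j$.

```python
def objective(samples, prototypes, U, p, lam, m=2.0):
    """Value function in the form of Eq. (8).

    Assumptions: the first p entries of each row are numerical, the rest are
    character attributes, and prototypes[j] plays the role of x_j in Eq. (8).
    """
    total = 0.0
    for i, x in enumerate(samples):
        for j, c in enumerate(prototypes):
            # Numerical term: squared differences over attributes 1..p.
            d_n = sum((a - b) ** 2 for a, b in zip(x[:p], c[:p]))
            # Character term: number of mismatches over attributes p+1..p+q.
            d_c = sum(0 if a == b else 1 for a, b in zip(x[p:], c[p:]))
            total += (U[i][j] ** m) * (d_n + lam * d_c)
    return total


samples = [[0.0, "tcp"], [2.0, "udp"]]
prototypes = [[0.0, "tcp"], [2.0, "tcp"]]
U = [[1.0, 0.0], [0.5, 0.5]]
print(objective(samples, prototypes, U, p=1, lam=0.5))  # 1.25
```

With `lam=0` the character attributes are ignored and the value reduces to the usual numerical FCM cost, which shows how $\lambda$ shifts the balance between the two attribute types.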

Suppose:

$$C_i^n = \sum_{j=1}^{k} u_{ij}^{m} \sum_{l=1}^{p} (x'_{il} - x'_{jl})^2 \tag{9}$$

$$C_i^c = \lambda \sum_{j=1}^{k} u_{ij}^{m} \sum_{l=p+1}^{p+q} d_c(x_{il}, x_{jl}) \tag{10}$$

As $C_i^n$ and $C_i^c$ are nonnegative, $J_m(U, C)$ can be minimized by minimizing $C_i^n$ and $C_i^c$. At the same time, applying the Lagrange multiplier method gives:

$$u_{ij} = \left\{ \sum_{l=1}^{k} \left[ \frac{d(x_i, x_j)}{d(x_i, x_l)} \right]^{\frac{2}{m-1}} \right\}^{-1}, \quad \forall i \tag{11}$$

The process can be iterated with (9), (10) and (11); since $m > 1$, the algorithm converges.
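The membership update in (11) can be sketched in a few lines of Python. The function name `update_memberships` is hypothetical, and the input is assumed to be a precomputed table in which `dist[i][j]` is the distance from sample $i$ to cluster $j$. The handling of a zero distance (a sample that coincides with a prototype) is an added assumption, since (11) is undefined in that case.

```python
def update_memberships(dist, m=2.0):
    """Membership update in the form of Eq. (11).

    dist[i][j] is the distance d from sample i to cluster j (an assumed
    input layout). Requires m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    U = []
    for row in dist:
        zeros = [j for j, d in enumerate(row) if d == 0.0]
        if zeros:
            # Assumption: share full membership among the coinciding clusters.
            U.append([1.0 / len(zeros) if j in zeros else 0.0
                      for j in range(len(row))])
            continue
        U.append([1.0 / sum((d_j / d_l) ** exponent for d_l in row)
                  for d_j in row])
    return U


# One sample at distance 1 from cluster 0 and distance 3 from cluster 1:
# with m = 2 the memberships are approximately 0.9 and 0.1.
print(update_memberships([[1.0, 3.0]]))
```

Each row of the result sums to 1, so the constraint enforced by the Lagrange multiplier is preserved by construction.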

The improved intrusion detection algorithm based on FCM is summarized as follows [3]:

Step 1: Initialize the membership matrix $U$ with random numbers between 0 and 1 that satisfy (1).

Step 2: For the different attribute types of the data, use (9) and (10) respectively to calculate the cluster centers $C_i^n$ and $C_i^c$, $i = 1, \dots, k$.

Step 3: Use (11) to calculate the new membership matrix $U$.

Step 4: Calculate the value function according to (8). If it is smaller than a given threshold, or if its change from the previous value is smaller than a given threshold, stop and output the clustering results. Otherwise, return to Step 2 and continue iterating.
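The four steps above can be put together in a short, self-contained Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The names `dissimilarity` and `mixed_fcm` are hypothetical. The text does not give an explicit update rule for the cluster centers, so the sketch assumes the usual choice that minimizes each term of (8): a weighted mean for numerical attributes and the category with the largest summed weight for character attributes, with weights $u_{ij}^m$. The distance $d$ in (11) is taken as the square root of the per-pair cost in (8), so that the exponent $2/(m-1)$ is consistent with that cost.

```python
import math
import random


def dissimilarity(x, c, p, lam):
    """Per-pair cost of Eq. (8): squared numerical differences plus lam times
    the number of mismatching character attributes."""
    d_n = sum((a - b) ** 2 for a, b in zip(x[:p], c[:p]))
    d_c = sum(0 if a == b else 1 for a, b in zip(x[p:], c[p:]))
    return d_n + lam * d_c


def mixed_fcm(samples, p, k, lam=1.0, m=2.0, tol=1e-6, max_iter=100, seed=0):
    rng = random.Random(seed)
    n, dims = len(samples), len(samples[0])

    # Step 1: random memberships; each row is normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        U.append([u / total for u in row])

    prev_cost = None
    for _ in range(max_iter):
        # Step 2 (assumed update rule): weighted mean for numerical attributes,
        # weighted mode for character attributes, with weights u_ij^m.
        prototypes = []
        for j in range(k):
            w = [U[i][j] ** m for i in range(n)]
            proto = [sum(w[i] * samples[i][l] for i in range(n)) / sum(w)
                     for l in range(p)]
            for l in range(p, dims):
                votes = {}
                for i in range(n):
                    votes[samples[i][l]] = votes.get(samples[i][l], 0.0) + w[i]
                proto.append(max(votes, key=votes.get))
            prototypes.append(proto)

        # Distances d with d^2 equal to the per-pair cost; floored to avoid
        # division by zero when a sample coincides with a prototype.
        D = [[max(math.sqrt(dissimilarity(x, c, p, lam)), 1e-12)
              for c in prototypes] for x in samples]

        # Step 3: membership update in the form of Eq. (11).
        e = 2.0 / (m - 1.0)
        U = [[1.0 / sum((d_j / d_l) ** e for d_l in row) for d_j in row]
             for row in D]

        # Step 4: value function of Eq. (8) and the two stopping tests.
        cost = sum((U[i][j] ** m) * D[i][j] ** 2
                   for i in range(n) for j in range(k))
        if cost < tol or (prev_cost is not None and abs(prev_cost - cost) < tol):
            break
        prev_cost = cost

    return prototypes, U, cost


samples = [[0.0, 0.1, "tcp"], [0.1, 0.0, "tcp"],
           [5.0, 5.1, "udp"], [5.1, 5.0, "udp"]]
prototypes, U, cost = mixed_fcm(samples, p=2, k=2)
labels = [row.index(max(row)) for row in U]
print(labels)  # the first two records share one label, the last two the other
```

The sketch uses plain lists to stay dependency-free; for data of the size of KDD Cup 1999, an array library and a numerically safer membership update would be more practical.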

When this method is applied to the sample data, both the numeric data and the character data are taken into account. The clustering therefore analyzes the data more comprehensively, which helps to reduce both the false alarm rate and the missed alarm rate. Combined with the optimized FCM algorithm, the method can further enhance detection efficiency.

ACKNOWLEDGMENT

This work is supported by the Economic Commence Market Application Technology Foundation under Grant 2007gdecof004.

REFERENCES

[1] Theodoridis S. Pattern Recognition[M]. Second Edition, USA: Elsevier Science, 2003.

[2] Gao XB. Fuzzy Cluster Analysis and its Application[M]. Xi'an: Xidian University Press, 2004, 49–61.

[3] Yang DG. Research of the Network Intrusion Detection Based on Fuzzy Clustering[J]. Computer Science, 2005, 32(1): 86–91.

[4] Song QK, Hao M. Improved Fuzzy C-means Clustering Algorithm[J]. Journal of Harbin University of Science and Technology, 2007, 12(4): 8–10.

[5] Xiao LZ, Shao ZQ, Ma HH, Wang XY, Liu G. An Algorithm for Automatic Clustering Number Determination in Network Intrusion Detection[J]. Journal of Software, 2008, 19(8): 2140–2148.

[6] Xian JQ, Lang FH. Anomaly Detection Method Based on CSA-Based Unsupervised Fuzzy Clustering Algorithm[J]. Journal of Beijing University of Posts and Telecommunications, 2005, 28(4): 103–106.

[7] Li J, Gao XB, Jiao LC. A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numerical and Categorical Values[J]. Journal of Electronics & Information Technology, 2004, 26(8): 1203–1209.

