Download - Academy Publisher
higher probability and converge at a high speed. At the same time, the algorithm considers the affinity of the immune response in order to overcome premature convergence (precocity). The paper introduces the improved FCM algorithm into anomaly intrusion detection
V. THE IMPROVED INTRUSION DETECTION ALGORITHM BASED ON FCM
Improving the detection efficiency of an IDS is a constant focus of research. The data to be clustered contain not only numerical attributes but also character attributes. For example, in the KDD Cup 1999 network dataset, each connection instance has 41 properties, of which 3 are character (flag) properties and 38 are numerical properties. At present, most studies address only the numerical properties of the sample data.
For sample data with hybrid attributes, the usual distance applies only to the numerical data, so we use the method of [7] to handle the character attributes. For the $k$-th attribute, the distance between $x_{ik}$ and $x_{jk}$ is defined as follows:

$$
d(x_{ik}, x_{jk}) =
\begin{cases}
1, & x_{ik} \neq x_{jk} \\
0, & x_{ik} = x_{jk}
\end{cases}
\tag{6}
$$
Suppose each object has $p$ numerical attributes and $q$ categorical attributes. The distance between two objects can then be defined as follows:

$$
d(x_i, x_j) = d_n(x'_i, x'_j) + d_c(x_i, x_j), \quad i \neq j
\tag{7}
$$

In (7), $d_n(x'_i, x'_j)$, $i \neq j$, is the distance over the numerical attributes, and $d_c(x_i, x_j)$ is the distance over the character attributes.
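The distances in (6) and (7) can be illustrated with a minimal Python sketch. It assumes that the first `p` entries of each record are numerical and the rest are categorical, that $d_n$ is the squared Euclidean distance, and that an optional weight `lam` (defaulting to 1, which leaves (7) unchanged) scales the categorical part. The function names and the sample records are hypothetical and do not come from the paper.

```python
import numpy as np

def categorical_distance(a, b):
    """Eq. (6): 0 where two symbols match, 1 where they differ."""
    return (np.asarray(a) != np.asarray(b)).astype(float)

def mixed_distance(x_i, x_j, p, lam=1.0):
    """Eq. (7): numerical distance plus categorical distance.

    Assumptions (not stated in the paper): d_n is the squared Euclidean
    distance, and lam is an optional weight on the categorical part.
    """
    num_i = np.asarray(x_i[:p], dtype=float)
    num_j = np.asarray(x_j[:p], dtype=float)
    d_n = float(np.sum((num_i - num_j) ** 2))
    d_c = float(np.sum(categorical_distance(x_i[p:], x_j[p:])))
    return d_n + lam * d_c

# Two made-up connection records: 2 numerical and 2 categorical attributes.
a = [0.5, 1.0, "tcp", "http"]
b = [0.0, 1.0, "udp", "http"]
print(mixed_distance(a, b, p=2))  # 0.25 (numerical) + 1 mismatch = 1.25
```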
The objective function for hybrid-attribute datasets can be obtained by modifying (2):

$$
J_m(U, C) = \sum_{i=1}^{N} \left\{ \sum_{j=1}^{k} u_{ij}^{m} \sum_{l=1}^{p} \left( x'_{il} - x'_{jl} \right)^{2} + \lambda \sum_{j=1}^{k} u_{ij}^{m} \sum_{l=p+1}^{p+q} d_c(x_{il}, x_{jl}) \right\}
\tag{8}
$$

In (8), the weight $\lambda$ balances the two kinds of attributes in the hybrid dataset, and its value is determined by the proportion of the two kinds of properties; $m > 1$ is the fuzzy coefficient, which controls the degree of fuzziness of $U$.
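As a rough illustration of how the objective (8) could be evaluated, the sketch below uses NumPy broadcasting. It reads the second argument of each distance in (8) as the prototype of cluster $j$ and assumes the categorical attributes are already integer-coded; both are assumptions of this sketch, and the function name `objective` is hypothetical.

```python
import numpy as np

def objective(U, X_num, X_cat, C_num, C_cat, m=2.0, lam=1.0):
    """Evaluate Eq. (8) for memberships U and cluster prototypes C.

    Shapes: U (N, k), X_num (N, p), X_cat (N, q), C_num (k, p), C_cat (k, q).
    Categorical values are assumed to be integer codes.
    """
    # Squared numerical distance from every sample to every prototype: (N, k)
    num = ((X_num[:, None, :] - C_num[None, :, :]) ** 2).sum(axis=2)
    # Number of mismatching categorical attributes, as in Eq. (6): (N, k)
    cat = (X_cat[:, None, :] != C_cat[None, :, :]).sum(axis=2)
    return float(((U ** m) * (num + lam * cat)).sum())

# Two samples that coincide with the two prototypes.
X_num = np.array([[0.0], [1.0]])
X_cat = np.array([[0], [1]])
U = np.full((2, 2), 0.5)          # maximally fuzzy memberships
print(objective(U, X_num, X_cat, X_num, X_cat))  # 1.0
```

With crisp memberships (`U = np.eye(2)`) the same call returns 0.0, since every sample then belongs entirely to the prototype it coincides with.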
Suppose:

$$
C_i^n = \sum_{j=1}^{k} u_{ij}^{m} \sum_{l=1}^{p} \left( x'_{il} - x'_{jl} \right)^{2}
\tag{9}
$$

$$
C_i^c = \lambda \sum_{j=1}^{k} u_{ij}^{m} \sum_{l=p+1}^{p+q} d_c(x_{il}, x_{jl})
\tag{10}
$$

As $C_i^n$ and $C_i^c$ are nonnegative, we can minimize $C_i^n$ and $C_i^c$ in order to minimize $J_m(U, C)$. At the same time, using the Lagrange multiplier method, we obtain:

$$
u_{ij} = \left\{ \sum_{l=1}^{k} \left[ \frac{d(x_i, x_j)}{d(x_i, x_l)} \right]^{\frac{2}{m-1}} \right\}^{-1}, \quad \forall i
\tag{11}
$$
We can iterate the process with (9), (10) and (11); since $m > 1$, the algorithm converges.
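The membership update (11) can be sketched as follows, assuming a precomputed matrix `D` whose entry `D[i, j]` is the distance from sample $i$ to cluster $j$. The small `eps` guard against division by zero is an addition of this sketch, not part of the paper, and the function name is hypothetical.

```python
import numpy as np

def update_membership(D, m=2.0, eps=1e-12):
    """Eq. (11): u_ij = 1 / sum_l (D[i, j] / D[i, l]) ** (2 / (m - 1)).

    eps (an assumption of this sketch) avoids division by zero when a
    sample coincides with a cluster prototype.
    """
    D = np.maximum(np.asarray(D, dtype=float), eps)
    # ratio[i, j, l] = D[i, j] / D[i, l]
    ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

D = np.array([[1.0, 1.0],    # equally far from both clusters
              [1.0, 3.0]])   # three times closer to cluster 0
U = update_membership(D)
print(U)  # rows are approximately [0.5, 0.5] and [0.9, 0.1]
```

Each row of the result sums to 1, so a sample that is equally far from all clusters receives equal memberships, and a closer cluster receives a larger share.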
The improved intrusion detection algorithm based on FCM is summarized as follows [3]:
Step1: Initialize the membership matrix U with random numbers between 0 and 1 such that (1) is satisfied.
Step2: For the different types of attributes, use (9) and (10) respectively to calculate the cluster centers $C_i^n$ and $C_i^c$, $i = 1, \ldots, k$.
Step3: Use (11) to calculate the new membership matrix U.
Step4: Calculate the value of the objective function according to (8). If it is smaller than a given threshold, or if its change from the previous value is smaller than a given threshold, stop and output the clustering results. Otherwise, return to Step2 and continue iterating.
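Step1 to Step4 can be put together in a short Python sketch. Several choices here are assumptions rather than the paper's method: constraint (1) is taken to mean that each row of U sums to 1; the numerical centers are membership-weighted means and the categorical centers are membership-weighted modes, as in common fuzzy k-prototypes variants; and categorical values are integer-coded. The function name `mixed_fcm` and its parameters are hypothetical.

```python
import numpy as np

def mixed_fcm(X_num, X_cat, k, m=2.0, lam=1.0, j_min=0.0, tol=1e-6,
              max_iter=100, seed=0):
    """Sketch of Step1-Step4 for numerical plus integer-coded categorical data."""
    rng = np.random.default_rng(seed)
    n, q = X_cat.shape
    # Step1: random memberships, each row normalised to sum to 1
    # (assumed form of the constraint the text calls (1)).
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    prev_J = np.inf
    for _ in range(max_iter):
        W = U ** m
        # Step2 (assumption): weighted means for numerical attributes,
        # weighted modes for categorical attributes.
        C_num = (W.T @ X_num) / W.sum(axis=0)[:, None]
        C_cat = np.empty((k, q), dtype=X_cat.dtype)
        for l in range(q):
            values = np.unique(X_cat[:, l])
            # votes[v, j]: total weight in cluster j of samples with values[v]
            votes = np.array([W[X_cat[:, l] == v].sum(axis=0) for v in values])
            C_cat[:, l] = values[votes.argmax(axis=0)]
        # Squared mixed dissimilarity between every sample and every center.
        D2 = ((X_num[:, None, :] - C_num[None, :, :]) ** 2).sum(axis=2)
        D2 = D2 + lam * (X_cat[:, None, :] != C_cat[None, :, :]).sum(axis=2)
        D2 = np.maximum(D2, 1e-12)  # guard against division by zero
        # Step3: Eq. (11); D2 is d squared, so the exponent is 1 / (m - 1).
        ratio = (D2[:, :, None] / D2[:, None, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        # Step4: objective (8) and the two stopping tests.
        J = float(((U ** m) * D2).sum())
        if J < j_min or abs(prev_J - J) < tol:
            break
        prev_J = J
    return U, C_num, C_cat

# Toy data: two well-separated groups that also differ in their category.
X_num = np.array([[0.0], [0.1], [5.0], [5.1]])
X_cat = np.array([[0], [0], [1], [1]])
U, C_num, C_cat = mixed_fcm(X_num, X_cat, k=2)
labels = U.argmax(axis=1)  # hard assignment taken from the fuzzy memberships
print(labels)
```

On real intrusion data the numerical attributes would normally be scaled first, since otherwise a single large-valued attribute dominates the distance; the choice of `lam` then controls how much the categorical mismatches count relative to the scaled numerical part.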
This method considers not only the numerical data but also the character data when analyzing the sample data. Because it analyzes the data more comprehensively during clustering, it helps to reduce both the false alarm rate and the missed alarm rate. Combined with the optimized FCM algorithm, it can further enhance the detection efficiency.
ACKNOWLEDGMENT<br />
This work is supported by the Economic Commence Market Application Technology Foundation under Grant 2007gdecof004.
REFERENCES<br />
[1] Theodoridis S. Pattern Recognition[M]. Second Edition, USA: Elsevier Science, 2003.
[2] Gao XB. Fuzzy Cluster Analysis and its Application[M]. Xi'an: Xidian University Press, 2004, 49–61.
[3] Yang DG. Research of The Network Intrusion Detection Based on Fuzzy Clustering[J]. Computer Science, 2005, 32(1): 86–91.
[4] Song QK, Hao M. Improved Fuzzy C-means Clustering Algorithm[J]. Journal of Harbin University of Science and Technology, 2007, 12(4): 8–10.
[5] Xiao LZ, Shao ZQ, Ma HH, Wang XY, Liu G. An Algorithm for Automatic Clustering Number Determination in Network Intrusion Detection[J]. Journal of Software, 2008, 19(8): 2140–2148.
[6] Xian JQ, Lang FH. Anomaly Detection Method Based on CSA-Based Unsupervised Fuzzy Clustering Algorithm[J]. Journal of Beijing University of Posts and Telecommunications, 2005, 28(4): 103–106.
[7] Li J, Gao XB, Jiao LC. A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numerical and Categorical Values[J]. Journal of Electronics & Information Technology, 2004, 26(8): 1203–1209.