JOURNAL OF COMPUTERS, VOL. 8, NO. 6, JUNE 2013 1461
where $F$ represents the fitness function, $m$ is the number of output nodes, and $c_i$ and $c_x$ represent the forecast output and the desired output of the $i$-th node, respectively.
(4) Selection operation
In this step we adopt a proportional selection operator and calculate the selection probability of each individual from its fitness according to Equation (10). Individuals with a larger probability are more likely to be selected as the best individuals for the next generation of the genetic population, while those with a smaller probability are more likely to be eliminated.
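The proportional (roulette-wheel) selection described above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code, and the function names are hypothetical. It assumes Equation (10) takes the standard form $p_i = F_i / \sum_j F_j$, that larger fitness is better and that all fitness values are positive; if $F$ is an error measure, a transformed value such as its reciprocal would have to be passed instead (also an assumption).

```python
import random
from typing import List, Sequence


def selection_probabilities(fitness: Sequence[float]) -> List[float]:
    """Proportional selection: p_i = F_i / sum_j F_j (assumes every F_i > 0)."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("proportional selection needs positive fitness values")
    return [f / total for f in fitness]


def roulette_select(population: Sequence[str], fitness: Sequence[float],
                    k: int, rng: random.Random) -> List[str]:
    """Draw k individuals with replacement, individual i with probability p_i."""
    probs = selection_probabilities(fitness)
    return rng.choices(list(population), weights=probs, k=k)


if __name__ == "__main__":
    rng = random.Random(0)
    pop = ["1001", "0011", "1110", "0101"]
    fit = [4.0, 1.0, 2.0, 1.0]
    print(selection_probabilities(fit))  # [0.5, 0.125, 0.25, 0.125]
    print(roulette_select(pop, fit, k=4, rng=rng))
```

Because sampling is with replacement, a highly fit individual can appear several times in the next generation, while a weak one may still survive by chance.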
(5) Crossover operation
During crossover, two individuals are selected at random from the population and their genes are recombined to generate new, potentially better individuals.
Because this part of the optimization uses binary coding, a one-point crossover operator is applied. For a matched pair of individuals, a cross-point is chosen at random and all bits after the cross-point are swapped between the two individuals. The operation is illustrated in Figure 5: the binary string 1001 in individual A exchanges information with the binary string 0011 in individual B. The crossover produces two new individuals and increases the diversity of the population.
Figure 5. Crossover operation
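A minimal sketch of one-point crossover on bit strings follows; the function names are hypothetical. The exact cross-point used in Figure 5 cannot be recovered from the text, so the example treats 1001 and 0011 as two complete 4-bit parents and picks cross-point 2 purely for illustration.

```python
import random
from typing import Tuple


def one_point_crossover(a: str, b: str, point: int) -> Tuple[str, str]:
    """Swap all bits from index `point` to the end between two parents."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    if not 0 < point < len(a):
        raise ValueError("cross-point must lie strictly inside the string")
    return a[:point] + b[point:], b[:point] + a[point:]


def random_one_point_crossover(a: str, b: str,
                               rng: random.Random) -> Tuple[str, str]:
    """Choose the cross-point uniformly at random, then swap the tails."""
    return one_point_crossover(a, b, rng.randint(1, len(a) - 1))


if __name__ == "__main__":
    # Illustrative parents; cross-point 2 is an arbitrary choice.
    print(one_point_crossover("1001", "0011", 2))  # ('1011', '0001')
```

Each child keeps the head of one parent and the tail of the other, so no bit values are lost, only recombined.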
(6) Mutation operation
Mutation also increases the diversity of the population. Here a single-point mutation operator is used: a mutation point is chosen at random and the bit at that point is flipped, so a 0 becomes 1 and a 1 becomes 0. The principle is shown in Figure 6; two new individuals are generated by this operation.
Figure 6. Mutation operation
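Single-point mutation can be sketched the same way. The function names are hypothetical, and the mutation rate `rate` is an assumed parameter whose value is not given in this section.

```python
import random


def flip_bit(chromosome: str, point: int) -> str:
    """Single-point mutation: exchange 0 and 1 at position `point`."""
    flipped = "1" if chromosome[point] == "0" else "0"
    return chromosome[:point] + flipped + chromosome[point + 1:]


def mutate(chromosome: str, rate: float, rng: random.Random) -> str:
    """With probability `rate`, flip one randomly chosen bit."""
    if rng.random() < rate:
        return flip_bit(chromosome, rng.randrange(len(chromosome)))
    return chromosome


if __name__ == "__main__":
    print(flip_bit("1011", 1))  # 1111
```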
(7) Establishment of the New-GA-SSOM network model
After many generations of evolution, once the termination (iteration) condition is met, the output population gives the optimal solution of the problem, namely the best and most representative combination of input variables.
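Putting steps (4) to (6) together, the evolution loop might look like the sketch below. It is an illustration under stated assumptions rather than the New-GA-SSOM implementation: the termination condition is assumed to be a fixed number of generations, the population size and probabilities are arbitrary defaults, and the toy fitness (agreement with a hypothetical target mask) stands in for the SSOM-based fitness $F$, which would require training the network for every chromosome. Keeping the best chromosome seen so far is likewise an added design choice.

```python
import random
from typing import Callable, List


def evolve(fitness: Callable[[str], float], n_bits: int, pop_size: int = 20,
           generations: int = 50, p_cross: float = 0.8, p_mut: float = 0.05,
           seed: int = 0) -> str:
    """Binary GA with proportional selection, one-point crossover and
    single-point mutation, stopped after a fixed number of generations.
    `fitness` must return positive values; larger is better."""
    if pop_size % 2:
        raise ValueError("pop_size must be even so parents pair up")
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(n_bits))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # (4) proportional (roulette-wheel) selection
        parents = rng.choices(pop, weights=[fitness(c) for c in pop], k=pop_size)
        children: List[str] = []
        # (5) one-point crossover on consecutive pairs of parents
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:
                pt = rng.randint(1, n_bits - 1)
                a, b = a[:pt] + b[pt:], b[:pt] + a[pt:]
            children += [a, b]
        # (6) single-point mutation: flip one random bit
        pop = []
        for c in children:
            if rng.random() < p_mut:
                i = rng.randrange(n_bits)
                c = c[:i] + ("1" if c[i] == "0" else "0") + c[i + 1:]
            pop.append(c)
        # remember the best chromosome seen so far
        best = max(pop + [best], key=fitness)
    return best


if __name__ == "__main__":
    target = "1010110010"  # hypothetical "ideal" mask for the toy fitness
    toy_fitness = lambda c: 1 + sum(x == y for x, y in zip(c, target))
    print(evolve(toy_fitness, n_bits=len(target)))
```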
Through the above steps we obtain the optimal chromosome, which is composed of the optimal features. The set of variables encoded by the genes of the best chromosome is extracted as the final input variables, which reduces the dimension of the independent variables. The resulting neural network model is named New-GA-SSOM. We then use this model to train the network and perform intrusion detection on the KDD Cup 1999 data set.
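Assuming one gene per candidate input variable (so a 41-bit chromosome for the full KDD feature set), extracting the final input variables from the best chromosome amounts to applying it as a bit mask. The sketch below uses the first five KDD Cup 1999 feature names and a hypothetical chromosome; the function names are also hypothetical.

```python
from typing import List, Sequence


def decode_feature_mask(chromosome: str, feature_names: Sequence[str]) -> List[str]:
    """Gene i == '1' keeps input variable i; '0' drops it."""
    if len(chromosome) != len(feature_names):
        raise ValueError("one gene per candidate input variable is expected")
    return [name for gene, name in zip(chromosome, feature_names) if gene == "1"]


def reduce_columns(rows: Sequence[Sequence[float]], chromosome: str) -> List[List[float]]:
    """Project each record onto the selected input variables."""
    keep = [i for i, gene in enumerate(chromosome) if gene == "1"]
    return [[row[i] for i in keep] for row in rows]


if __name__ == "__main__":
    names = ["duration", "protocol_type", "service", "flag", "src_bytes"]
    best = "01101"  # hypothetical best chromosome
    print(decode_feature_mask(best, names))  # ['protocol_type', 'service', 'src_bytes']
    print(reduce_columns([[0, 1, 20, 3, 181]], best))  # [[1, 20, 181]]
```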
V. EXPERIMENT
The KDD Cup 1999 data set is a standard data set for intrusion detection, comprising a training data set and a test data set. The training data set contains 494 021 records and the test data set contains 311 029 records. In the KDD99 data set, each record holds the attribute values of one connection in the network data flow, and each record is labeled either as normal or as an attack with exactly one specific attack type. There are 22 attack types in the training data set, and the test data set adds 14 new attack types. All attack types fall into four major categories: Probing, Denial of Service (DoS), User-to-Root (U2R) and Remote-to-Local (R2L). Each complete TCP (Transmission Control Protocol) connection is treated as one record, whose attributes fall into four groups: time-based traffic features, host-based traffic features, content features and basic features [22, 23, 24].
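For the attack types that appear in Table I, the grouping into the four major categories can be written as a lookup table. This sketch covers only those eight attack types plus `normal`, and `categorize` is a hypothetical helper; stripping the trailing period reflects how labels such as `smurf.` are written in the raw KDD Cup 1999 files.

```python
from collections import Counter
from typing import Dict

# Major category of each attack type listed in Table I.
ATTACK_CATEGORY: Dict[str, str] = {
    "normal": "Normal",
    "back": "DoS", "neptune": "DoS", "smurf": "DoS",
    "guess_passwd": "R2L",
    "buffer_overflow": "U2R",
    "ipsweep": "Probe", "portsweep": "Probe", "satan": "Probe",
}


def categorize(label: str) -> str:
    """Map a raw KDD label such as 'smurf.' to its major category."""
    return ATTACK_CATEGORY[label.rstrip(".")]


if __name__ == "__main__":
    labels = ["normal.", "smurf.", "neptune.", "satan.", "guess_passwd."]
    print(Counter(categorize(lbl) for lbl in labels))
```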
Our experiment is based on the KDD Cup 1999 intrusion detection data set. The training set consists of 3 000 normal records and 3 000 attack records, randomly selected from the "10% KDD" subset of KDD Cup 99. The test set consists of 2 000 normal records and 2 000 attack records, randomly selected from the "Corrected KDD" subset of KDD Cup 99. The selected data are shown in Table I.
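The random selection of normal and attack records could be sketched as below. The record layout (label in the last field) is an assumption, loading the data files is omitted, and attack records are drawn uniformly without stratifying by attack type, since the text does not say how the per-type counts were chosen. The helper names and the synthetic demo records are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]  # assumed layout: the label is the last field


def is_normal(record: Record) -> bool:
    return record[-1].rstrip(".") == "normal"


def balanced_sample(records: Sequence[Record], n_normal: int, n_attack: int,
                    seed: int = 0) -> List[Record]:
    """Draw n_normal normal and n_attack attack records without replacement,
    then shuffle them together."""
    rng = random.Random(seed)
    normal = [r for r in records if is_normal(r)]
    attack = [r for r in records if not is_normal(r)]
    sample = rng.sample(normal, n_normal) + rng.sample(attack, n_attack)
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    demo = [("0", "tcp", "normal.")] * 10 + [("0", "icmp", "smurf.")] * 10
    train = balanced_sample(demo, n_normal=3, n_attack=3, seed=1)
    print(len(train))  # 6
```

With the real data one would call, for example, `balanced_sample(records, 3000, 3000)` on the parsed "10% KDD" records; `random.Random.sample` raises `ValueError` if a class has fewer records than requested.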
Each record has 41 attributes (32 continuous and 9 discrete), which are used as the SSOM input values, plus 1 attack-type label, which is used as the SSOM output value. Some attributes are numerical and some are symbolic (character) types, but SSOM can only handle numerical data. Therefore, before training, the input data must be converted to numerical form and normalized. This study uses simple substitution to replace symbols with numerical values: the protocol_type, service and flag attributes are replaced by numeric codes. For example, the three protocol types (tcp, udp and icmp) are encoded as 1, 2 and 3, and the 70 services are encoded as 1, 2, …, 70. The attack types are likewise numbered 1, 2, 3 and so on.
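The symbol substitution described above, followed by a normalization step, might look like this. The text does not state which normalization is used, so min-max scaling to [0, 1] is an assumption here, as is numbering symbols in order of first appearance (which reproduces tcp, udp, icmp → 1, 2, 3 for this example). Function names are hypothetical.

```python
from typing import Dict, List, Sequence


def build_codebook(values: Sequence[str]) -> Dict[str, int]:
    """Number distinct symbols 1, 2, 3, ... in order of first appearance."""
    codes: Dict[str, int] = {}
    for v in values:
        codes.setdefault(v, len(codes) + 1)
    return codes


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale a numeric column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


if __name__ == "__main__":
    protocol = ["tcp", "udp", "icmp", "tcp"]
    codebook = build_codebook(protocol)  # {'tcp': 1, 'udp': 2, 'icmp': 3}
    encoded = [codebook[p] for p in protocol]  # [1, 2, 3, 1]
    print(min_max_normalize(encoded))  # [0.0, 0.5, 1.0, 0.0]
```

In practice the codebook would be built once on the training data and reused for the test data, so that the same symbol always receives the same code.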
The experimental platform is a PC with an Intel Core 2 Duo 2.0 GHz CPU and 2.0 GB of memory, running the Windows XP operating system and the MATLAB 7.8.0 (R2009a) programming environment.
Based on the experimental data in Table I, training and testing are carried out with the SSOM (with randomly selected parameters), GA-SSOM and New-GA-SSOM neural networks. Depending on the number of attack-type classes to be distinguished, the experiment is carried out in the following two cases.
TABLE I. TRAINING SET AND TEST SET

| Attack class | Attack type | Training set | Test set |
|---|---|---|---|
| Normal | normal | 6000 | 3000 |
| DoS | back | 700 | 400 |
| DoS | neptune | 2700 | 1200 |
| DoS | smurf | 1600 | 800 |
| R2L | guess_passwd | 53 | 40 |
| U2R | buffer_overflow | 30 | 22 |
| Probe | ipsweep | 350 | 180 |
| Probe | portsweep | 350 | 200 |
| Probe | satan | 217 | 158 |
© 2013 ACADEMY PUBLISHER