25.01.2015 Views

Download Full Issue in PDF - Academy Publisher


JOURNAL OF COMPUTERS, VOL. 8, NO. 6, JUNE 2013 1461

where F is the fitness function, m is the number of output nodes, and c_i and c_x are, respectively, the forecast output and the desired output of the i-th node.

(4) Selection operation

In this step we adopt the proportional selection operator and calculate each individual's selection probability from its fitness according to equation (10). Individuals with a larger probability are selected to pass into the next generation of the population; those with a smaller probability are not.
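The selection step can be illustrated with a minimal Python sketch of proportional (roulette-wheel) selection. It assumes that fitness values are non-negative, that a larger value is better, and that the selection probability is the usual ratio of an individual's fitness to the population total; the function names are illustrative and do not come from the paper.

```python
import random

def selection_probabilities(fitness):
    """Return each individual's selection probability, F_i / sum(F)."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    return [f / total for f in fitness]

def roulette_select(population, fitness, n, rng=random):
    """Draw n individuals (with replacement) in proportion to fitness."""
    probs = selection_probabilities(fitness)
    return rng.choices(population, weights=probs, k=n)

# Example: an individual with zero fitness is never selected.
survivors = roulette_select(["1001", "0011"], [0.0, 5.0], 4)
```

Sampling with replacement means a fit individual may appear several times in the next generation, which is the usual behavior of this operator.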

(5) Crossover operation

During the crossover operation, two individuals are selected randomly from the population to generate new and potentially better individuals.

Because this part of the optimization uses binary coding, a one-point crossover operator is used. For a matched pair of individuals, a crossover point is selected at random and the bits after that point are swapped. The operation is illustrated in Figure 5: the binary string 1001 in individual A is exchanged with the binary string 0011 in individual B. The crossover produces two new individuals and increases the diversity of the population.

Figure 5. Crossover operation
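One-point crossover on binary strings can be sketched in Python as follows. The two 8-bit parents are hypothetical values chosen so that the swapped tails are 1001 and 0011, as in the description above; the full chromosomes of Figure 5 are not reproduced here, and the function name is illustrative.

```python
import random

def one_point_crossover(parent_a, parent_b, point=None, rng=random):
    """Swap the tails of two equal-length bit strings after `point`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2:
        raise ValueError("parents need at least two bits")
    if point is None:
        # Choose a cut so that both head and tail are non-empty.
        point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# Hypothetical parents: the tails 1001 and 0011 change places.
children = one_point_crossover("11001001", "01100011", point=4)
# children == ("11000011", "01101001")
```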

(6) Mutation operation

Mutation can also increase the diversity of the population. Here a single-point mutation operator is used: a mutation point is chosen at random and the bit at that point is flipped, so that 0 becomes 1 and 1 becomes 0. The principle is shown in Figure 6. Two new individuals are generated by this operation.

Figure 6. Mutation operation
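A minimal Python sketch of single-point mutation on a binary string is given below. The function name and the example chromosome are illustrative assumptions; the paper does not state a mutation rate here, so the sketch mutates exactly one bit per call.

```python
import random

def single_point_mutation(chromosome, point=None, rng=random):
    """Flip one bit of a binary string at a given or random position."""
    if not chromosome:
        raise ValueError("chromosome must not be empty")
    if point is None:
        point = rng.randrange(len(chromosome))
    flipped = "1" if chromosome[point] == "0" else "0"
    return chromosome[:point] + flipped + chromosome[point + 1:]

mutant = single_point_mutation("1001", point=2)
# mutant == "1011"
```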

(7) Establishment of the New-GA-SSOM network model

After many generations of evolution, when the termination condition is met, the output of the population is the optimal solution of the problem. It is the best and most representative combination of input variables.

Through the above steps we obtain the optimal chromosome, which encodes the optimal features. The set of variables encoded by the genes of the best chromosome is extracted as the final input variables, which reduces the dimension of the independent variables. The resulting neural network model is named New-GA-SSOM. We then use this model to train the network and carry out intrusion detection on the KDD Cup 1999 data set.
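The dimension reduction in this step can be sketched in Python under one assumption that is not confirmed by the text above: that each gene of the binary chromosome corresponds to one input attribute and that a gene value of 1 means the attribute is kept. The function names and the toy record are illustrative.

```python
def selected_features(chromosome):
    """Return the indices of the attributes whose gene is '1'."""
    return [i for i, gene in enumerate(chromosome) if gene == "1"]

def reduce_records(records, chromosome):
    """Keep only the selected attributes in every record."""
    keep = selected_features(chromosome)
    return [[row[i] for i in keep] for row in records]

# Toy example with 5 attributes, of which 3 are kept.
reduced = reduce_records([[5, 6, 7, 8, 9]], "10110")
# reduced == [[5, 7, 8]]
```

With the 41 KDD attributes, the chromosome would be 41 bits long under this assumption, and the reduced records would serve as network input.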

V. EXPERIMENT

The KDD Cup 1999 data set is a standard data set for intrusion detection and consists of a training data set and a testing data set. The training data set includes 494 021 records and the testing data set includes 311 029 records. In the KDD99 data set, each record holds the attribute values of one connection in the network data flow, and each record is labeled either as normal or as an attack with exactly one specific attack type. There are 22 types of attack in the training data set, and 14 additional types appear in the testing data set. All the attack types can be divided into four major categories: Probing, Denial of Service (DoS), User-to-Root (U2R) and Remote-to-Local (R2L). Each complete TCP (transmission control protocol) connection is treated as one record, which comprises four groups of attributes: time-based traffic features, host-based traffic features, content features and basic features [22, 23, 24].

Our experiment is based on the KDD Cup 1999 intrusion detection data set. The training data set is composed of 3 000 records of normal type and 3 000 records of attack type, selected randomly from the "10% KDD" data set of KDD Cup 99. The testing data set is composed of 2 000 records of normal type and 2 000 records of attack type, selected randomly from the "Corrected KDD" data set of KDD Cup 99. The selected data set is shown in Table I.
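Drawing a balanced random subset of normal and attack records, as described above, can be sketched in Python as follows. The label string "normal" and the function name are assumptions made for illustration, and the sample sizes are parameters rather than fixed values.

```python
import random

def balanced_sample(records, labels, n_normal, n_attack, rng=random):
    """Randomly pick n_normal normal records and n_attack attack records."""
    normal = [r for r, y in zip(records, labels) if y == "normal"]
    attack = [r for r, y in zip(records, labels) if y != "normal"]
    # random.sample draws without replacement and raises ValueError
    # if a class has fewer records than requested.
    return rng.sample(normal, n_normal), rng.sample(attack, n_attack)
```

Sampling without replacement keeps every selected record distinct, which matters for the rare attack types.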

Each record has 41 attributes (32 continuous and 9 discrete) used as SSOM input values, and 1 attack type label used as the SSOM output value. Some attributes are numerical and some are character types, but SSOM can only process numerical data. Therefore, before training, the input data must be converted to numerical form and normalized. This study simply substitutes numbers for the symbolic values: the protocol-type, service and flag attributes are replaced by numeric codes. For example, the three kinds of protocol-type (tcp, udp and icmp) are expressed as 1, 2, 3, and the 70 kinds of service are expressed as 1, 2, …, 70. The attack types are likewise numbered 1, 2, 3 and so on.
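The preprocessing described above can be sketched in Python. The protocol codes follow the text (tcp, udp, icmp as 1, 2, 3); the order in which other symbols receive their codes and the use of min-max scaling for normalization are assumptions, since the text does not specify them. Function names are illustrative.

```python
# Codes stated in the text for the protocol-type attribute.
PROTOCOL_CODES = {"tcp": 1, "udp": 2, "icmp": 3}

def build_codebook(values):
    """Map each distinct symbol to 1, 2, 3, ... in order of first appearance."""
    codebook = {}
    for v in values:
        if v not in codebook:
            codebook[v] = len(codebook) + 1
    return codebook

def min_max_normalize(column):
    """Scale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Encode a symbolic column, then normalize the resulting codes.
protocols = ["tcp", "udp", "tcp", "icmp"]
encoded = [PROTOCOL_CODES[p] for p in protocols]   # [1, 2, 1, 3]
scaled = min_max_normalize(encoded)                # [0.0, 0.5, 0.0, 1.0]
```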

The experimental platform is a PC with an Intel Core2 Duo 2.0 GHz CPU, 2.0 GB of memory, the Windows XP operating system and the MATLAB 7.8.0 (R2009a) programming environment.

Based on the experimental data in Table I, training and testing are carried out using the SSOM (with randomly selected parameters), GA-SSOM and New-GA-SSOM neural networks. According to the number of attack classes used for classification, the experiment is carried out in the following two cases.

TABLE I. TRAINING SET AND TEST SET

Attack class   Attack type       Training set   Test set
Normal         normal            6000           3000
DoS            back              700            400
DoS            neptune           2700           1200
DoS            smurf             1600           800
R2L            guess_passwd      53             40
U2R            buffer_overflow   30             22
Probe          ipsweep           350            180
Probe          portsweep         350            200
Probe          satan             217            158

© 2013 ACADEMY PUBLISHER
