Download Full Issue in PDF - Academy Publisher


1460 JOURNAL OF COMPUTERS, VOL. 8, NO. 6, JUNE 2013

Because we use a real-number-encoded GA, the crossover operation is arithmetic crossover, in which a linear combination of two individuals produces a new individual. The process is shown in equation (11): the $k$-th chromosome $a_k$ is crossed with the $l$-th chromosome $a_l$ at the $j$-th gene. After the crossover operation, a new pair of individuals that combine the genes of both parents is generated.

In this part we use a binary-encoded genetic algorithm to optimize the input variables and reduce their dimensionality. To do so, we first encode the individuals, initialize the population size and the number of generations, and design the fitness function. We then apply the selection, crossover, and mutation operations to generate the best individual, which is the optimal combination of independent variables. The workflow is shown in Figure 4; each part functions as follows:

$$
\begin{cases}
a_{kj} = a_{kj}\,b + a_{lj}\,(1 - b) \\
a_{lj} = a_{lj}\,b + a_{kj}\,(1 - b)
\end{cases}
\tag{11}
$$

where $b$ is a random number between 0 and 1.
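The arithmetic crossover of equation (11) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `arithmetic_crossover`, the list representation of a chromosome, and the optional `b` argument (useful for reproducible checks) are assumptions. The sketch also assumes that both right-hand sides use the parents' original gene values, which the equation does not state explicitly.

```python
import random


def arithmetic_crossover(a_k, a_l, j, b=None):
    """Apply equation (11) to gene j of two real-coded chromosomes.

    a_k, a_l: parent chromosomes as lists of floats (assumed representation).
    j: index of the gene to cross.
    b: mixing weight in [0, 1]; drawn at random when not supplied.
    Returns two new chromosomes; the parents are left unchanged.
    """
    if b is None:
        b = random.random()
    child_k, child_l = list(a_k), list(a_l)
    # Assumption: both updates read the parents' original values of gene j.
    child_k[j] = a_k[j] * b + a_l[j] * (1 - b)
    child_l[j] = a_l[j] * b + a_k[j] * (1 - b)
    return child_k, child_l
```

Under this reading, the two new gene values always sum to the same total as the two parent values, so the crossover redistributes the gene between the pair rather than shifting both in one direction.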

(6) Mutation operation

Mutation is the process in which the values of certain genes in an individual's encoded string are replaced by other gene values to form a new individual. Mutation is an auxiliary method for generating new individuals, but it is an essential step of the computation. The mutation operation determines the local search ability of the genetic algorithm.

Equation (12) shows the mutation operation used in this part. Select $a_{ij}$, the $j$-th gene of the $i$-th individual, to mutate. The mutation operation is as follows:

$$
a_{ij} =
\begin{cases}
a_{ij} + (a_{ij} - a_{\max}) \times r, & r > 0.5 \\
a_{ij} + (a_{\min} - a_{ij}) \times r, & r < 0.5
\end{cases}
\tag{12}
$$

where $a_{\max}$ is the upper bound of $a_{ij}$, $a_{\min}$ is the lower bound of $a_{ij}$, and $r$ is a random value between 0 and 1.
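A minimal Python sketch of the mutation rule in equation (12) follows. The function name `mutate_gene` and its parameters are illustrative assumptions. Two choices in the sketch are not stated in the text: the case $r = 0.5$ is assigned to the second branch, and the result is optionally clipped to $[a_{\min}, a_{\max}]$, because the first branch as written can produce a value below $a_{\min}$.

```python
import random


def mutate_gene(a_ij, a_min, a_max, r=None, clip=True):
    """Mutate one real-coded gene following equation (12).

    r: random value in [0, 1]; drawn at random when not supplied.
    clip: assumption, not part of equation (12); keeps the result in bounds.
    """
    if r is None:
        r = random.random()
    if r > 0.5:
        new_value = a_ij + (a_ij - a_max) * r
    else:
        # Equation (12) covers r < 0.5; r == 0.5 is handled here by assumption.
        new_value = a_ij + (a_min - a_ij) * r
    if clip:
        new_value = min(max(new_value, a_min), a_max)
    return new_value
```

For example, with `a_ij = 0.5` and bounds `[0, 1]`, `r = 0.75` gives `0.125` and `r = 0.25` gives `0.375`.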

Through the above steps, we obtain the optimal chromosome, which consists of the optimal learning rates $\eta_1$ and $\eta_2$ and the optimal neighborhood radius $r$. We use these optimal values to create an SSOM neural network model named GA-BP, and then use this model to carry out an intrusion detection experiment on the KDD Cup 1999 data set.

IV. OPTIMIZATION OF INPUT VARIABLE FOR DIMENSIONALITY REDUCTION

When an SSOM neural network is used to build the model, an excessive number of input variables easily causes overfitting, which leads to low precision, low detection rates, and excessive running time. It is therefore necessary to optimize the selection of input variables: remove the redundant variables and retain those that best reflect the relationship between the input and output variables in the model [21].

Binary encoding is one of the most commonly used encodings in genetic algorithms. It has the following advantages:

• Encoding and decoding are simple.

• The crossover and mutation operations are easy to implement.

• It satisfies the minimum character set encoding principle.

• The algorithm is easy to analyze theoretically with the schema theorem.

Figure 4. The optimization of input variables

(1) Data normalization

Data normalization is a preprocessing step performed before the experiment, and it is also important for reducing the dimensionality of the variables. Here, the input feature values are again normalized to [0, 1] using equation (8).
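Equation (8) is not reproduced in this part of the paper, so the sketch below assumes the common min-max form, which maps each feature column to [0, 1]. The function name `min_max_normalize` and the handling of constant columns are assumptions made for illustration only.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] with min-max normalization.

    Assumption: equation (8) is taken to be (x - min) / (max - min).
    A constant column has no spread, so it is mapped to all zeros here.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]
```

In practice the minimum and maximum would be computed on the training data and reused for the test data, so that both sets share the same scale.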

(2) The initialization of population

In this optimization process, individuals are binary-coded. As the intrusion detection data has 41 features, the coding length is set to 41, and every individual is a binary string of 41 bits. Every gene corresponds to one input feature and can only take the value 1 or 0. If a gene is 1, the input variable corresponding to that bit takes part in the final model; otherwise it does not.

A population is formed from N randomly generated individuals, and the genetic algorithm starts iterating from this population as the initial point.
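The encoding and initialization described above can be sketched as follows. The names `init_population` and `selected_features`, the `seed` parameter, and the use of 0-based feature indices are assumptions for illustration; the text only fixes the string length (41) and the meaning of each bit.

```python
import random

N_FEATURES = 41  # number of input features in the intrusion detection data


def init_population(n_individuals, n_features=N_FEATURES, seed=None):
    """Create N random binary individuals, one bit per input feature."""
    rng = random.Random(seed)  # seed is optional, for reproducible runs
    return [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(n_individuals)
    ]


def selected_features(individual):
    """Return the 0-based indices of the features whose bit is 1."""
    return [idx for idx, gene in enumerate(individual) if gene == 1]
```

A model for a given individual would then be trained only on the columns returned by `selected_features`.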

(3) Fitness function calculation

Here, the reciprocal of the absolute error between the forecast output and the desired output is chosen as the fitness function, as shown in equation (13).

When the fitness function is calculated, the learning rates $\eta_1$ and $\eta_2$ and the neighborhood radius $r$ of every individual are optimized by the genetic algorithm of Section III, so that random choices of these parameters do not affect the fitness calculation.

$$
F = \frac{1}{\sum_{i=1}^{m} (C_i - C_x)}
\tag{13}
$$
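The fitness described in the text, the reciprocal of the absolute error between forecast and desired outputs, can be sketched as below. This is an assumption-laden illustration: the function name `fitness`, the summation of per-sample absolute errors, and the small `eps` term that avoids division by zero for a perfect forecast are not taken from the paper. Training the SSOM model on the selected features is outside the sketch; its forecasts are passed in as `predicted`.

```python
def fitness(predicted, desired, eps=1e-12):
    """Reciprocal of the summed absolute error over m samples.

    predicted: forecast outputs of the model for one individual.
    desired: expected outputs for the same samples.
    eps: assumption, guards against division by zero when the error is 0.
    """
    if len(predicted) != len(desired):
        raise ValueError("predicted and desired must have the same length")
    total_abs_error = sum(abs(p - d) for p, d in zip(predicted, desired))
    return 1.0 / (total_abs_error + eps)
```

A smaller total error gives a larger fitness, so individuals whose selected features yield better forecasts are favored by selection.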

© 2013 ACADEMY PUBLISHER
