Online Boosting Based Intrusion Detection in Changing Environments

The trainer of the weak classifier, i.e. L_b in Table 1, often selects a weak classifier from a specific function space \mathcal{H} based on the criterion below:

$$ L_b(\{(x_1, y_1), \ldots, (x_N, y_N)\}, w^{(m)}) = \arg\min_{h \in \mathcal{H}} \sum_{n:\, h(x_n) \neq y_n} w_n^{(m)} \tag{5} $$

The complexity of L_b relates to the function space \mathcal{H} from which h is selected. L_o in Table 2 is an online version of L_b. In order to meet the requirement of quick online updating of the ensemble detector in intrusion detection, the online learning of the weak classifiers should be efficiently implemented.

We choose the weak classifiers to be the set of decision stumps on the individual feature dimensions, since they are very simple and can be updated online efficiently. Each weak classifier is a decision stump on one feature dimension; thus the number of weak classifiers M is fixed and equals the number of features d. These weak classifiers are learned online, and the approximated weighted classification error (4) is updated whenever a new training sample arrives.

For a categorical feature f, the set of attribute values C^f is divided into two subsets C_p^f and C_n^f with no intersection, and the decision stump takes the form

$$ h_f(x) = \begin{cases} 1 & x^f \in C_p^f \\ -1 & x^f \in C_n^f \end{cases} \tag{6} $$

where x^f indicates the attribute value of x on the feature f. To avoid the combinatorial computation of examining all possible decision stumps, the division of C^f is efficiently constructed in the following way:

$$ C_p^f = \Big\{ z : \sum_{x^f = z} I(y = 1) \geq \sum_{x^f = z} I(y = -1) \Big\}, \qquad C_n^f = \Big\{ z : \sum_{x^f = z} I(y = 1) < \sum_{x^f = z} I(y = -1) \Big\} \tag{7} $$

where z is an attribute value on the feature f in the training data set, and I(\cdot) equals 1 if the condition (\cdot) is satisfied and 0 otherwise.

For a continuous feature f, the range of attribute values is split by a threshold v, and the decision stump takes the following form:

$$ h_f(x) = \begin{cases} 1 & x^f \geq v \\ -1 & x^f < v \end{cases} \qquad \text{or} \qquad h_f(x) = \begin{cases} -1 & x^f \geq v \\ 1 & x^f < v \end{cases} \tag{8} $$

The threshold v and the choice between the above two cases are made to minimize the weighted classification error.

There is another important issue to be addressed in intrusion detection, which relates to the detection performance of the detector. We use the following measures to evaluate the performance of the detection algorithm:

detection rate:

$$ DR = \frac{N_{detected}}{N_{attack}} \times 100\% \tag{9} $$

false alarm rate:

$$ FAR = \frac{N_{false}}{N_{normal}} \times 100\% \tag{10} $$
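As a concrete illustration of how a decision stump on one continuous feature can be learned online against the weighted-error criterion of (5) with the stump form of (8), the following Python sketch accumulates the weighted error of every candidate (threshold, polarity) pair as samples arrive. The fixed threshold grid and all names are illustrative assumptions; the paper does not specify how candidate thresholds are maintained online.

```python
class OnlineStump:
    """Online decision stump for a single continuous feature (sketch).

    Maintains accumulated weighted error for each (threshold v, polarity s)
    pair, so the best stump under criterion (5) can be read off at any time.
    The fixed threshold grid is an assumption for illustration."""

    def __init__(self, thresholds):
        self.thresholds = list(thresholds)
        # accumulated weighted error per (threshold, polarity) pair
        self.err = {(v, s): 0.0 for v in self.thresholds for s in (1, -1)}

    def update(self, x_f, y, w=1.0):
        """Fold in one sample: feature value x_f, label y in {+1, -1}, weight w."""
        for v in self.thresholds:
            for s in (1, -1):
                pred = s if x_f >= v else -s   # the two polarity cases of (8)
                if pred != y:
                    self.err[(v, s)] += w

    def predict(self, x_f):
        # pick the (v, s) with minimal accumulated weighted error, as in (5)
        v, s = min(self.err, key=self.err.get)
        return s if x_f >= v else -s


# Illustrative usage: large feature values labeled +1, small ones -1.
stump = OnlineStump(thresholds=[0.0, 0.5, 1.0])
for x_f, y in [(0.9, 1), (0.8, 1), (0.1, -1), (0.2, -1)]:
    stump.update(x_f, y)
# after these updates the stump splits at v = 0.5 with polarity +1
```

Because each update touches only a fixed grid of candidates on a single feature dimension, the per-sample cost is constant, which matches the paper's requirement of quick online updating.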

where N_detected denotes the number of attacks correctly detected, N_attack denotes the total number of attacks in the data set, N_false denotes the number of normal connections that are wrongly detected as attacks, and N_normal denotes the total number of normal connections. An intrusion detection system is expected to have a high DR, to guarantee system security, and a low FAR, to reduce unnecessary human burden and maintain the normal use of the network.
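The two measures of equations (9) and (10) can be sketched directly in Python (the function names are ours, not the paper's):

```python
def detection_rate(n_detected, n_attack):
    """DR of eq. (9): percentage of attacks that are correctly detected."""
    return 100.0 * n_detected / n_attack


def false_alarm_rate(n_false, n_normal):
    """FAR of eq. (10): percentage of normal connections flagged as attacks."""
    return 100.0 * n_false / n_normal


# e.g. 90 of 100 attacks caught, 5 of 500 normal connections misflagged
dr = detection_rate(90, 100)     # 90.0 (%)
far = false_alarm_rate(5, 500)   # 1.0 (%)
```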

Unfortunately, the training data set is often unbalanced between normal and intrusion data. This brings great difficulties to the online boosting based detection method. In fact, with the identical initial weight 1/N for each training sample in online boosting, the importance of each binary class relates to the numbers of positive and negative samples in the training data. Generally, the more negative samples there are, the higher the obtained DR, with a higher FAR; the more positive samples there are, the lower the obtained DR, with a lower FAR. So it is very difficult to meet the requirements of a high DR and a low FAR at the same time. What is more practical is to strike a balance between the two requirements in the specific application.

In order to balance the requirements of DR and FAR, we introduce a parameter r \in (0, 1) in the setting of the initial weight for each training sample:

$$ w_n^{(1)} = \begin{cases} r / N_{normal} & x_n \text{ is a normal connection} \\ (1 - r) / N_{attack} & x_n \text{ is a network intrusion} \end{cases} \tag{11} $$

Through adjusting the parameter r, we can change the importance of positive and negative samples in the training process, and then get a balance between DR and FAR. The selection of r depends on the proportion of normal samples in the training data, and on the requirements on DR/FAR in the specific application.
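The class-balanced initial weights of equation (11) can be sketched as follows; the function name is ours, and the interpretation in the comments follows the discussion above:

```python
def initial_weight(label, r, n_normal, n_attack):
    """Initial sample weight of eq. (11).

    label: 'normal' or 'attack'; r in (0, 1) trades DR against FAR.
    The total weight placed on the normal class is r, and on the
    attack class 1 - r, so the weights always sum to 1."""
    if label == 'normal':
        return r / n_normal
    return (1.0 - r) / n_attack


# e.g. 100 normal connections, 10 attacks, r = 0.2
w_norm = initial_weight('normal', 0.2, 100, 10)
w_att = initial_weight('attack', 0.2, 100, 10)
```

Note that choosing r = N_normal / (N_normal + N_attack) recovers the uniform weights 1/N, while a larger r puts more emphasis on normal traffic, which (per the discussion above) lowers FAR at the cost of DR.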

The above online boosting based detection method has the following advantages in the intrusion detection task:

- the weak classifiers (6) and (8) operate only on individual feature dimensions, which avoids the difficulty caused by the large differences in value ranges across feature dimensions;
- the training of the decision stumps is simple, and the online updating can be efficiently implemented;
- the detection performance is guaranteed by the final ensemble of weak classifiers;
- a balance between DR and FAR can be achieved according to the requirements of the specific application.

Thus, the proposed online boosting based method for intrusion detection is well suited to practical use.
