
Online Boosting Based Intrusion Detection in Changing Environments

Yan-guo Wang, Weiming Hu, and Xiaoqin Zhang
National Laboratory of Pattern Recognition
Institute of Automation, Chinese Academy of Sciences
Beijing, China

Abstract - Intrusion detection is an active research field in the development of reliable networks, where many machine learning techniques are exploited to fit the specific application. Although some detection algorithms have been developed, they lack adaptability to frequently changing network environments, since they are mostly trained in batch mode. In this paper, we propose an online boosting based intrusion detection method, which has the ability of efficient online learning of new network intrusions. The detection can be performed in real-time with high detection accuracy. Experimental results show its advantage in the intrusion detection application.

Keywords: Supervised Learning, Pattern Recognition, Network Security, Intrusion Detection

1 Introduction

With the increase in network intrusions, information security becomes crucial in the development of web-based information systems, such as e-business and e-government. It provides a reliable environment for communication, information integrity and privacy protection. As a complement to intrusion prevention measures such as user authentication and encryption, intrusion detection attracts much attention from researchers and engineers.

Intrusion detection has been an active research field ever since it was first proposed by Denning [1] in 1987. It is an effective tool for protecting our systems against various types of network attack. Generally, intrusion detection systems (IDSs) can be classified into two categories: host-based IDSs and network-based IDSs. Host-based IDSs make use of the system log of the target host machines, and some rule-based detection algorithms can be derived. However, it may be too late when an intrusion is detected by host-based IDSs, as damage to the system may have already occurred. Moreover, it is difficult for host-based IDSs to detect distributed network attacks that aim to consume the system resources. Network-based IDSs perform detection at network nodes such as switches and routers. They exploit the information of separate IP packets and detect packets that are of potential harm to machines on the network. Within the architecture of network-based IDSs, the detection burden on the host machines is greatly lightened, and security measures can be taken before an attack reaches the host machines. Furthermore, network-based IDSs are able to detect distributed network attacks.

Various intrusion detection methods have been developed. A statistical method is proposed in [1], where statistical profiles for normal behaviors are constructed and used to detect anomalous behaviors as intrusions. Data mining techniques are also widely used in intrusion detection [2, 3]. The concepts of "association rules" and "frequent episodes" are introduced into the intrusion detection task to describe network activities. In [4] a distributed outlier detection architecture is constructed to detect network attacks.

Recently, machine learning methods have become popular in intrusion detection research. Various new algorithms have been introduced and studied in IDSs. For supervised learning, Bivens et al. [5] used neural networks for intrusion detection. Another algorithm with great generalization ability, the SVM, is also applied in [6, 7]. As for unsupervised learning, an effective clustering tool, the self-organizing map, attracts great attention in [8, 9, 10]. There is also much other work in this research field [11, 12].

Although many intrusion detection algorithms have been developed, there are still some problems in practical use. First, many new intrusion types appear every week, and the ability to detect them should be incorporated into the detector soon after their emergence for security reasons. To this end, most existing algorithms have to add the labeled new types of intrusion data into the training data set and retrain the whole detector from the very beginning, since they are trained in batch mode. However, the training data set for intrusion detection is often very large, and most intrusion detection algorithms have to make multiple passes over it to obtain the final detector, which yields a time-consuming retraining process; frequently retraining the detector to adapt to dynamic environments is therefore impractical. Second, the false alarm rates of some existing methods are too high for practical use. Third, the variety of attributes of network data is also a difficult issue. There are various types of attributes for network data, including both categorical and continuous ones. Furthermore, the value ranges of different attributes differ greatly, from $[0, 1]$ to $[0, 10^7]$. This brings more difficulties for many detection methods and limits their performance.

In this paper, an online boosting based method is proposed for intrusion detection. The carefully designed strategies for training weak classifiers make the learning efficient, and the detection rate balance scheme produces great detection performance for the final ensemble classifier. Furthermore, the online learning framework provides the ability of quick adaptation to changing environments, needing only one pass over the training data of the new intrusion types. The other training data do not have to be revisited, and the detector is efficiently updated online. Experimental results show that the proposed method quickly adapts to changing environments, and can perform real-time intrusion detection with high detection accuracy.

The rest of the paper is organized as follows. In Section 2 we introduce the online boosting based intrusion detection algorithm, and describe its relation to the batch boosting based detection scheme. In Section 3 some experimental results are presented which show the advantage of the proposed method. Then we draw the conclusions in the last section.

2 Online boosting based ID

2.1 Problem formulation

In the network-based IDSs, the training and detection are performed at network nodes such as switches and routers. Three groups of features are extracted from each network connection:

- basic features of individual TCP connections;
- content features within a connection suggested by domain knowledge;
- traffic features computed using a two-second time window.

The above features are commonly used in intrusion detection. The framework for constructing these features can be found in [2].

For each network connection, the feature values extracted above form a vector $\mathbf{x} = [x_1, x_2, \ldots, x_d]$, where $d$ is the number of features extracted. The label $y$ indicates the binary class of the network connection:

$$y = \begin{cases} +1 & \text{normal connection} \\ -1 & \text{network intrusion} \end{cases} \qquad (1)$$

The intrusion detection algorithm is expected to train a classifier $H$ from the labeled training data set, then use the classifier $H$ to predict the binary label $\tilde{y} = H(\mathbf{x})$ for a new network connection. Note that there are many types of intrusions, such as password guessing, port scanning, neptune, and satan, and new intrusion types constantly emerge in the network. However, they are all expected to be classified into the network intrusion class, no matter which type of intrusion they are.

2.2 Adaboost algorithm

Adaboost [13] is one of the most popular machine learning algorithms developed in recent years, and has been successfully used in many applications, such as face detection [14] and image retrieval [15].

The classical Adaboost algorithm is trained in batch mode, as shown in Table 1. Note that $N$ is the number of training samples, $M$ is the number of weak classifiers to generate, and $L_b$ is the base model learning algorithm, such as Naive Bayes or decision stumps. A sequence of weak classifiers is learned based on the evolving sampling distribution of the training data set. The final strong classifier is an ensemble of the weak classifiers, and the voting weights are derived from the classification errors of these weak classifiers.

The evolving weight $w_n^{(m)}$ plays a key role in Adaboost. It indicates the importance of the $n$-th training sample while generating the $m$-th weak classifier. The weight $w_n^{(m)}$ is updated in the following way:

$$w_n^{(m+1)} = w_n^{(m)} \times \begin{cases} \dfrac{1}{2(1-\epsilon^{(m)})} & \text{if } h^{(m)}(x_n) = y_n \\[2mm] \dfrac{1}{2\epsilon^{(m)}} & \text{if } h^{(m)}(x_n) \neq y_n \end{cases} \qquad (2)$$

The weights of samples that are wrongly classified by the current weak classifier are increased and the others decreased, so that more attention is paid to the samples that are difficult to classify while generating the next weak classifier. For example, with $\epsilon^{(m)} = 0.2$, a misclassified sample has its weight multiplied by $1/0.4 = 2.5$, while a correctly classified one is multiplied by $1/1.6 = 0.625$.

2.3 Online boosting

In many applications, the learning process needs to be performed in online mode. For example, when the training data are generated as data streams, or the size of the training data set is too large for memory resources, it is impractical to make multiple passes over the entire training data set. Training a classifier in online mode is necessary in these cases. This opens a hot research field called "online learning" or "incremental learning". Online learning means we process a training sample, then discard it after updating the classifier. There is likely some difference between two classifiers trained in batch and online modes, since the online learning algorithm can only make use of the information supplied by part of the training data set. Thus, the key issue of online learning research is how to control this difference while training a classifier online.

Table 1. Adaboost Algorithm

Input: $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, $M$, $L_b$
Initialization: $w_n^{(1)} = 1/N$, $n = 1, 2, \ldots, N$
For $m = 1, 2, \ldots, M$
    $h^{(m)} = L_b(\{(x_1, y_1), \ldots, (x_N, y_N)\}, w^{(m)})$
    Calculate the weighted error of $h^{(m)}$: $\epsilon^{(m)} = \sum_{n:\, h^{(m)}(x_n) \neq y_n} w_n^{(m)}$
    If $\epsilon^{(m)} > \frac{1}{2}$, then set $M = m - 1$ and stop loop
    endif
    Update the weights: $w_n^{(m+1)} = w_n^{(m)} \times \begin{cases} \frac{1}{2(1-\epsilon^{(m)})} & \text{if } h^{(m)}(x_n) = y_n \\ \frac{1}{2\epsilon^{(m)}} & \text{if } h^{(m)}(x_n) \neq y_n \end{cases}$
Output the final strong classifier:
    $H(x) = \mathrm{sign}\left( \sum_{m=1}^{M} h^{(m)}(x) \, \lg \frac{1 - \epsilon^{(m)}}{\epsilon^{(m)}} \right)$
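To make the loop in Table 1 concrete, here is a minimal sketch of batch Adaboost in Python. The base learner $L_b$ is stubbed out as `fit_weak_learner`, a hypothetical callable (not part of the paper) that returns any classifier approximately minimizing the weighted error; the small clamp on the error is our addition to keep the logarithm finite.

```python
import math

def adaboost(samples, labels, M, fit_weak_learner):
    """Batch Adaboost following Table 1. `fit_weak_learner` stands in for
    L_b: given (samples, labels, weights) it returns a classifier
    h(x) -> +1/-1 that roughly minimizes the weighted error."""
    N = len(samples)
    w = [1.0 / N] * N                        # uniform initial weights
    ensemble = []                            # pairs (h, voting weight)
    for _ in range(M):
        h = fit_weak_learner(samples, labels, w)
        # weighted error of the current weak classifier
        eps = sum(w[n] for n in range(N) if h(samples[n]) != labels[n])
        eps = max(eps, 1e-10)                # clamp so the log stays finite
        if eps > 0.5:                        # worse than chance: stop the loop
            break
        ensemble.append((h, math.log((1.0 - eps) / eps)))
        # raise the weights of misclassified samples, lower the others
        w = [w[n] / (2.0 * eps) if h(samples[n]) != labels[n]
             else w[n] / (2.0 * (1.0 - eps)) for n in range(N)]

    def strong_classifier(x):                # H(x): sign of the weighted vote
        score = sum(alpha * h(x) for h, alpha in ensemble)
        return 1 if score >= 0 else -1
    return strong_classifier
```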

In order to adapt Adaboost to the data stream environment, Oza proposed an online version of Adaboost in [16], and a convergence proof for the online version was also given. Recently, Grabner and Bischof [17] successfully introduced the online boosting algorithm into the computer vision field.

The detailed online boosting algorithm is presented in Table 2. Here $h$ is the set of weak classifiers to be updated online, and $L_o$ is the online base model learning algorithm. Note that in the batch Adaboost algorithm, the sum of the sample weights remains 1:

$$\sum_{n=1}^{N} w_n^{(m)} = 1, \quad m = 1, \ldots, M \qquad (3)$$

where the definition of $w_n^{(m)}$ is already given in Section 2.2. However, in online boosting, the weight $\lambda$ evolves individually for each training sample. The weight $w_n^{(m)}$ is actually a sampling weight while generating the $m$-th weak classifier. The same function is performed by the parameter $k$ in online boosting, which is randomly generated from a Poisson distribution with parameter $\lambda$. As to the weighted classification error of $h^{(m)}$, an approximation is used:

$$\epsilon^{(m)} = \frac{sw_m}{sc_m + sw_m} \qquad (4)$$

which involves only samples already seen. Moreover, the number of weak classifiers is not fixed in Adaboost, while in online boosting the number of weak classifiers is fixed beforehand, and the weak classifiers are all learned online. Although the online ensemble classifier may differ greatly from that learned in batch mode when only a few training samples have been processed, it converges statistically to the ensemble generated in batch mode as the number of training samples increases [16].

Table 2. Online Boosting Algorithm

Input: $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, $M$, $L_o$
Initialization: $sc_m = 0$, $sw_m = 0$, $m = 1, 2, \ldots, M$
For each new training sample $(x, y)$
    Initialize the weight of the current sample: $\lambda = 1$
    For $m = 1, 2, \ldots, M$
        Set $k$ according to $\mathrm{Poisson}(\lambda)$
        Do $k$ times
            $h^{(m)} = L_o((x, y), h^{(m)})$
        If $h^{(m)}(x) = y$, then
            $sc_m = sc_m + \lambda$
            $\epsilon^{(m)} = \frac{sw_m}{sc_m + sw_m}$
            $\lambda = \frac{\lambda}{2(1 - \epsilon^{(m)})}$
        else
            $sw_m = sw_m + \lambda$
            $\epsilon^{(m)} = \frac{sw_m}{sc_m + sw_m}$
            $\lambda = \frac{\lambda}{2\epsilon^{(m)}}$
        endif
Output the final strong classifier:
    $H(x) = \mathrm{sign}\left( \sum_{m=1}^{M} h^{(m)}(x) \, \lg \frac{1 - \epsilon^{(m)}}{\epsilon^{(m)}} \right)$
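For concreteness, the following is a minimal sketch of the loop in Table 2. It assumes each weak learner is an object with `update(x, y)` and `predict(x)` methods, our stand-in interface for $L_o$; `sample_poisson` is a small helper we add because Python's standard library has no Poisson sampler, and the numeric clamps are ours.

```python
import math
import random

def sample_poisson(lam):
    """Draw k ~ Poisson(lam) by inversion (Knuth's method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < threshold:
            return k
        k += 1

class OnlineBooster:
    """Online boosting following Table 2 (Oza's algorithm). `weak_learners`
    is a list of M online weak classifiers; each must expose update(x, y)
    and predict(x) -> +1/-1, standing in for the online base learner L_o."""
    def __init__(self, weak_learners):
        self.learners = weak_learners
        self.sc = [0.0] * len(weak_learners)   # weighted correct counts sc_m
        self.sw = [0.0] * len(weak_learners)   # weighted wrong counts sw_m

    def learn(self, x, y):
        lam = 1.0                              # weight of the current sample
        for m, h in enumerate(self.learners):
            for _ in range(sample_poisson(lam)):
                h.update(x, y)                 # present the sample k times
            if h.predict(x) == y:
                self.sc[m] += lam
                eps = self.sw[m] / (self.sc[m] + self.sw[m])
                lam /= 2.0 * max(1.0 - eps, 1e-10)
            else:
                self.sw[m] += lam
                eps = self.sw[m] / (self.sc[m] + self.sw[m])
                lam /= 2.0 * max(eps, 1e-10)

    def predict(self, x):
        score = 0.0
        for m, h in enumerate(self.learners):
            seen = self.sc[m] + self.sw[m]
            eps = self.sw[m] / seen if seen > 0 else 0.5
            eps = min(max(eps, 1e-6), 1.0 - 1e-6)   # keep the log finite
            score += h.predict(x) * math.log((1.0 - eps) / eps)
        return 1 if score >= 0 else -1
```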

2.4 Online boosting based intrusion detection

An intrusion detection algorithm is expected to fulfill mainly three requirements in order to be suitable for practical use:

- the detection should be performed in real-time;
- the detection accuracy should be as high as possible, which means a high detection rate to guarantee the system security, and a low false alarm rate to decrease unnecessary human burden;
- the detector should adapt quickly to the changing network environments, which implies the ability to accurately detect any new type of attack soon after its emergence.

In order to make the updating of the intrusion detector efficient, the training of the detector should not be time-consuming, which prevents the use of some complex classifiers; on the other hand, the strict requirement of detection performance makes the direct use of simple classifiers impractical. Considering the variety of attributes of network connection data, the situation is even worse. We will try to solve these difficulties in the proposed online boosting based detection method.


The trainer of the weak classifier, i.e. $L_b$ in Table 1, often selects a weak classifier from a specific function space $\mathcal{H}$ based on the criterion below:

$$L_b(\{(x_1, y_1), \ldots, (x_N, y_N)\}, w^{(m)}) = \arg\min_{h \in \mathcal{H}} \sum_{n:\, h(x_n) \neq y_n} w_n^{(m)} \qquad (5)$$

The complexity of $L_b$ relates to the function space $\mathcal{H}$ from which $h$ is selected. $L_o$ in Table 2 is an online version of $L_b$. In order to meet the requirement of quick online updating of the ensemble detector in intrusion detection, the online learning of weak classifiers should be efficiently implemented.

We choose the weak classifiers to be the set of decision stumps on each feature dimension, since they are very simple and can be updated online efficiently. Each weak classifier is a decision stump on one feature dimension, thus the number of weak classifiers $M$ is fixed and equals the number of features $d$. These weak classifiers are learned online, and the approximated weighted classification error (4) is updated when a new training sample comes.

For a categorical feature $f$, the set of attribute values $C^f$ is divided into two subsets $C_p^f$ and $C_n^f$ with no intersection, and the decision stump takes the form

$$h_f(\mathbf{x}) = \begin{cases} +1 & x_f \in C_p^f \\ -1 & x_f \in C_n^f \end{cases} \qquad (6)$$

where $x_f$ indicates the attribute value of $\mathbf{x}$ on the feature $f$. To avoid the combinatorial computation of examining all possible decision stumps, the division of $C^f$ is efficiently constructed in the following way:

$$C_p^f = \{ z : \sum_{x_f = z} \delta(y = 1) \geq \sum_{x_f = z} \delta(y = -1) \}, \quad C_n^f = \{ z : \sum_{x_f = z} \delta(y = 1) < \sum_{x_f = z} \delta(y = -1) \} \qquad (7)$$

where $z$ is an attribute value on the feature $f$ in the training data set, and $\delta(\cdot)$ equals 1 if the condition $(\cdot)$ is satisfied, and 0 otherwise.

For a continuous feature $f$, the range of attribute values is split by a threshold $v$, and the decision stump takes the following form:

$$h_f(\mathbf{x}) = \begin{cases} +1 & x_f \leq v \\ -1 & x_f > v \end{cases} \quad \text{or} \quad h_f(\mathbf{x}) = \begin{cases} -1 & x_f \leq v \\ +1 & x_f > v \end{cases} \qquad (8)$$

The threshold $v$ and the choice between the above two cases are made to minimize the weighted classification error.

There is another important issue to be addressed in intrusion detection, which relates to the detection performance of the detector. We use the following measures to evaluate the performance of the detection algorithm:

detection rate:
$$DR = \frac{N_{detected}}{N_{attack}} \times 100\% \qquad (9)$$

false alarm rate:
$$FAR = \frac{N_{false}}{N_{normal}} \times 100\% \qquad (10)$$

where $N_{detected}$ denotes the number of attacks correctly detected, $N_{attack}$ denotes the total number of attacks in the data set, $N_{false}$ denotes the number of normal connections that are wrongly detected as attacks, and $N_{normal}$ denotes the total number of normal connections. An intrusion detection system is expected to have a high DR to guarantee the system security, and a low FAR to decrease unnecessary human burden and maintain the normal use of the network.
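As a sketch of the weak learners just described, the following online decision stump handles one categorical feature in the spirit of (6) and (7). The class name and interface are ours, chosen to plug into the online boosting sketch above; a continuous-feature stump would maintain candidate thresholds analogously.

```python
from collections import defaultdict

class CategoricalStump:
    """Online decision stump on one categorical feature, following (6)
    and (7): an attribute value z goes to C_p (predict +1, normal) when
    its running count of normal samples is at least its count of
    intrusion samples."""
    def __init__(self, feature_index):
        self.f = feature_index
        self.pos = defaultdict(float)   # per-value counts of y = +1 (normal)
        self.neg = defaultdict(float)   # per-value counts of y = -1 (intrusion)

    def update(self, x, y):
        z = x[self.f]
        if y == 1:
            self.pos[z] += 1.0
        else:
            self.neg[z] += 1.0

    def predict(self, x):
        z = x[self.f]                   # an unseen value defaults to +1
        return 1 if self.pos[z] >= self.neg[z] else -1
```

With one such stump per feature dimension, the ensemble matches the $M = d$ construction above, e.g. `OnlineBooster([CategoricalStump(f) for f in range(d)])` for purely categorical data.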

Unfortunately, the training data set is often unbalanced between normal and intrusion data. This brings great difficulties to the online boosting based detection method. In fact, with the identical initial weight $\lambda = 1$ for each training sample in online boosting, the importance of each binary class relates to the number of positive and negative samples in the training data. Generally, the more the negative samples are emphasized, the higher the DR obtained, along with a higher FAR; the more the positive samples are emphasized, the lower the DR obtained, along with a lower FAR. So it is very difficult to meet the requirements of a high DR and a low FAR at the same time. What is more practical is to strike a balance between the two requirements in the specific application.

In order to balance the requirements of DR and FAR, we introduce a parameter $r \in (0, 1)$ in the setting of the initial weight for each training sample:

$$\lambda = \begin{cases} \dfrac{N_{normal} + N_{attack}}{N_{normal}} \cdot r & \text{normal connection} \\[2mm] \dfrac{N_{normal} + N_{attack}}{N_{attack}} \cdot (1 - r) & \text{network intrusion} \end{cases} \qquad (11)$$

Through adjusting the parameter $r$, we can change the importance of positive and negative samples in the training process, and thus strike a balance between DR and FAR. The selection of $r$ depends on the proportion of normal samples in the training data, and on the DR/FAR requirements of the specific application.
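A minimal sketch of the weighting scheme in (11), assuming the class counts of the training data are known beforehand (the function name is ours):

```python
def initial_weight(y, n_normal, n_attack, r=0.35):
    """Initial sample weight lambda from (11): the total weight of the
    normal class becomes proportional to r, that of the attack class
    to 1 - r."""
    total = n_normal + n_attack
    if y == 1:                              # normal connection
        return total / n_normal * r
    return total / n_attack * (1.0 - r)     # network intrusion
```

In the online boosting loop of Table 2, this value would replace the constant initialization $\lambda = 1$ for each new training sample.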

The above online boosting based detection method has the following advantages in the intrusion detection task:

- the weak classifiers (6) and (8) only operate on individual feature dimensions, which avoids the difficulty caused by the large distinction of value ranges between different feature dimensions;
- the training of the decision stumps is simple, and the online updating can be efficiently implemented;
- the detection performance is guaranteed by the final ensemble of weak classifiers;
- a balance between DR and FAR can be achieved according to the requirements of the specific application.

Thus, the proposed online boosting based method for intrusion detection seems suitable for practical use.


2.5 Computational complexity

Computational complexity is an important aspect of intrusion detection algorithms, since a fast response to network intrusions is necessary for taking subsequent security measures in time. Particularly, when a new type of attack appears, the ability to accurately detect the new attack should be quickly incorporated into the detector.

For the proposed online boosting based intrusion detection method, the learning and detection can be done in parallel. When a new sample comes, we only need to update the decision stumps and the related ensemble weights for each feature dimension, so the computational complexity of the online boosting learning is only $O(d)$ per sample, where $d$ is the number of features of each network connection. Thus the updating of the intrusion detector can be implemented very efficiently, and the detector is expected to respond quickly to new types of attack. Moreover, the detection of the ensemble classifier also has a computational complexity of $O(d)$, so it can be performed in real-time.

3 Experiments

3.1 Intrusion data set

The KDD Cup 1999 data set [18] is used in our experiments, since it is a widely used benchmark for many network-based intrusion detection algorithms. It was used for the 1998 DARPA intrusion detection evaluation program, which was prepared and managed by MIT Lincoln Labs. A network environment was set up to simulate a typical U.S. Air Force LAN, where a wide variety of intrusions were simulated as in a real military network. Nine weeks of TCP/IP connection data were collected and labeled for testing intrusion detection algorithms.

For each network connection, 41 features are extracted, including 9 categorical and 32 continuous features. Attacks in the data set fall into four main categories:

- DOS: denial-of-service;
- R2L: unauthorized access from a remote machine, e.g., password guessing;
- U2R: unauthorized access to local superuser (root) privileges;
- Probing: surveillance and other probing, e.g., port scanning.

In each of the four categories, there are many low-level attack types. Note that intrusion detection here is a binary classification, which means that all types of attack are expected to be classified into the network intrusion class.

Table 3. The KDD Cup 1999 Data Set

Categories   Training data   Test data
Normal       97278           60593
DOS          391458          223298
R2L          1126            5993
U2R          52              39
Probing      4107            2377
Others       0               18729
Total        494021          311029

The numbers of normal connections and of each category of intrusions in the training and test data sets are listed in Table 3. Note that the test data is not from the same distribution as the training data, and it includes some attack types not present in the training data. This makes the task more realistic. For more details of the intrusion data set, please refer to [18].
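For illustration only, here is a sketch of how one record of the KDD Cup 1999 data might be parsed into the $(\mathbf{x}, y)$ form of Section 2.1; it assumes the comma-separated layout of the public `kddcup.data` files, with the label (e.g. `normal.`, `neptune.`) in the last field.

```python
def parse_kdd_line(line):
    """Parse one comma-separated KDD Cup 1999 record into (x, y).
    The 41 feature values keep their type: numeric fields become floats,
    categorical fields (e.g. protocol 'tcp', service 'http') stay strings.
    y is +1 for a normal connection and -1 for any attack type, since the
    detection task is binary (Section 2.1)."""
    fields = line.strip().split(',')
    x = []
    for value in fields[:-1]:
        try:
            x.append(float(value))
        except ValueError:
            x.append(value)
    y = 1 if fields[-1] == 'normal.' else -1
    return x, y
```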

3.2 Online boosting based intrusion detection

To evaluate the detection performance on the KDD Cup 1999 data set, the intrusion detection algorithm is trained on the training data set, and then tested on the test data set.

First, we use the proposed method without the parameter $r$. A high DR of 91.2756% is obtained on the test data. However, the FAR is 8.3805%, which is a little high for intrusion detection. Note that the normal samples only occupy 20% of the training data, which makes the detector pay too much attention to the attack samples. As a result, a high FAR is obtained from the unbalanced training data.

To strike a balance between DR and FAR, the parameter $r$ is introduced as in the proposed detection method. After testing some values between 0.05 and 0.95, we finally arrive at an appropriate setting of $r = 0.35$. With this value of $r$, a much smaller FAR of 2.2374% is obtained, with a still acceptable DR of 90.1329%. This result is more suitable for the intrusion detection application.

For comparison, we also test the batch Adaboost algorithm on the intrusion data set. The detection results of the above algorithms on the test data are listed in Table 4. Note that although learned in online mode, the detection accuracy of the online boosting based method is comparable with that of Adaboost, which is learned in batch mode, and an ideally small FAR can be obtained by appropriately adjusting the parameter $r$.

Table 4. Detection Results on the Test Data

Algorithms                        DR (%)    FAR (%)
Online boosting                   91.2756   8.3805
Online boosting (with r = 0.35)   90.1329   2.2374
Adaboost                          92.6568   2.6686


To focus on some specific intrusion types, Jirapummin et al. [19] employed a hybrid neural network model to detect TCP SYN flooding and port scan attacks. In order to examine the detection ability for these specific types of attack, a smaller data set that contains only three attack types is constructed. The samples are randomly selected from the KDD Cup 1999 data set, and the numbers of samples are listed in Table 5. Detection results on the test data are presented in Table 6, including also the results given by Sarasamma et al. [10] for comparison. We can see that the specific types of intrusions are accurately detected by the online boosting based method, and a smaller FAR of 0.17% is obtained compared with the other two methods.

Table 5. A Smaller Data Set

Categories   Training data   Test data
Normal       20676           60593
Neptune      22783           58001
Satan        316             1633
Portsweep    225             354
Total        44000           120581

Table 6. Detection Results for Some Specific Attacks

Algorithms                    DR (%)        FAR (%)
Hybrid Neural Networks [19]   90-99.72      0.06-4.5
Hierarchical K-Map [10]       99.17-99.63   0.34-1.41
Proposed Method               99.55         0.17

3.3 Fast adaptation to changing environments

In the online boosting based intrusion detection method, the characteristics of a new type of attack are quickly learned in online mode, and the ability to detect the new attack is soon incorporated into the ensemble classifier.

For example, suppose two new attack types emerge, "guess-passwd" (an R2L attack) and "back" (a DOS attack), which have not been learned by the current detector. For detection algorithms learned in batch mode, such as Adaboost, the detector has to be retrained from the very beginning, making multiple passes over the entire training data including the newly emerging samples. However, in the proposed detection method, the detector can be efficiently updated online, and only one pass over the labeled intrusion data of the new attack types is needed.

The online learning processes are shown in Fig. 1 and Fig. 2. The horizontal axis indicates the sequence of samples of the new attack that are being learned, and the vertical axis indicates the class label $\tilde{y}$ predicted by the detector: $\tilde{y} = 1$ means the sample is classified as a normal connection, and $\tilde{y} = -1$ means a network intrusion. Note that the classification results are derived from the updated detector immediately after online learning of each training sample. We can see that the new types of attack are accurately detected soon after online learning on a few samples (7 samples for "guess-passwd" and 15 samples for "back").

Figure 1. Online Learning of the "guess-passwd" Attack

Figure 2. Online Learning of the "back" Attack

3.4 Comparison with some other algorithms

We also compare the proposed method with some recent progress in intrusion detection research. For example, Eskin et al. proposed a geometric framework for unsupervised anomaly detection in [20], where three methods are presented to detect anomalies in the data set. Kayacik et al. [21] performed intrusion detection based on self-organizing feature maps. Recently, a multi-layer hierarchical K-Map was proposed by Sarasamma et al. [10].

Detection results of the above methods are listed in Table 7. Note that these results are all obtained using the KDD Cup 1999 data set. We can see that the detection performance of the online boosting based method is comparable with those of the other methods, and its false alarm rate is superior to that of most existing methods. Moreover, the proposed method only needs one pass over the new types of intrusion data in order to successfully detect them. This outperforms the other methods, which have to make multiple passes over the entire training data and retrain the whole detector.

Table 7. Comparison with Some Other Algorithms

Algorithms                DR (%)        FAR (%)
Clustering [20]           93            10
K-NN [20]                 91            8
SVM [20]                  91-98         6-10
SOM [21]                  89-90.6       4.6-7.6
Hierarchical K-Map [10]   90.94-93.46   2.19-3.99
Proposed Method           90.1329       2.2374


4 Conclusions

In this paper, an efficient online boosting based intrusion detection method is proposed. The method outperforms existing intrusion detection algorithms in its fast adaptability to changing environments. The ability to detect new types of attack is quickly incorporated into the ensemble classifier without retraining the whole detector, and the detection can be performed in real-time with high accuracy. The balance parameter enables us to meet the detection performance requirements of the specific application. Experimental results show the potential of the method in the intrusion detection application.¹

¹ This work is supported in part by the National Natural Science Foundation under Grants 60520120099 and 60672040, and in part by the National 863 High-Tech R&D Program of China under Grant 2006AA01Z453.

5 References

[1] D. Denning. "An Intrusion-Detection Model". IEEE Trans. on Software Engineering, 13(2): 222-232, February 1987.

[2] W. Lee. "A Data Mining Framework for Building Intrusion Detection Models". In Proc. of IEEE Symposium on Security and Privacy, 120-132, May 1999.

[3] S. Zanero and S. M. Savaresi. "Unsupervised Learning Techniques for An Intrusion Detection System". In Proc. of ACM Symposium on Applied Computing, 2004.

[4] M. Otey, A. Ghoting, and S. Parthasarathy. "Fast Distributed Outlier Detection in Mixed Attribute Data Sets". Data Mining and Knowledge Discovery, 12(2-3): 203-228, May 2006.

[5] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts. "Network-based Intrusion Detection using Neural Networks". In Proc. of ANNIE, 2002.

[6] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo. "A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data". Data Mining for Security Applications, 2002.

[7] P. Hong, D. Zhang, and T. Wu. "An Intrusion Detection Method Based on Rough Set and SVM Algorithm". In Proc. of Int. Conf. on Communications, Circuits and Systems, 1127-1130, June 2004.

[8] H. G. Kayacik, A. Zincir-Heywood, and M. Heywood. "On the Capability of An SOM Based Intrusion Detection System". In Proc. of Int. Joint Conf. on Neural Networks, 1808-1813, 2003.

[9] K. Labib and R. Vemuri. "NSOM: A Real-time Network-based Intrusion Detection System using Self-Organizing Maps". Networks and Security, 2002.

[10] S. T. Sarasamma, Q. A. Zhu, and J. Huff. "Hierarchical Kohonenen Net for Anomaly Detection in Network Security". IEEE Trans. on Systems, Man, and Cybernetics (B), 35(2): 302-312, April 2005.

[11] A. Lazarevic, A. Ozgur, L. Ertoz, J. Srivastava, and V. Kumar. "A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection". In Proc. of SIAM Conf. on Data Mining, 2003.

[12] D. Song, M. I. Heywood, and A. N. Zincir-Heywood. "Training Genetic Programming on Half A Million Patterns: An Example From Anomaly Detection". IEEE Trans. on Evolutionary Computation, 9(3): 225-239, June 2005.

[13] Y. Freund and R. E. Schapire. "A Decision-theoretic Generalization of On-line Learning and An Application to Boosting". Journal of Computer and System Sciences, 55(1): 119-139, August 1997.

[14] P. Viola and M. Jones. "Rapid Object Detection using A Boosted Cascade of Simple Features". In Proc. of Computer Vision and Pattern Recognition, 511-518, 2001.

[15] K. Tieu and P. Viola. "Boosting Image Retrieval". International Journal of Computer Vision, 56(1-2): 17-36, 2004.

[16] N. Oza. "Online Ensemble Learning". PhD thesis, University of California, Berkeley, 2001.

[17] H. Grabner and H. Bischof. "On-line Boosting and Vision". In Proc. of Computer Vision and Pattern Recognition, 260-267, 2006.

[18] S. Stolfo et al. The Third International Knowledge Discovery and Data Mining Tools Competition, 1999.

[19] C. Jirapummin, N. Wattanapongsakorn, and P. Kanthamanon. "Hybrid Neural Networks for Intrusion Detection Systems". In Proc. of the 2002 International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2002), 928-931, July 2002.

[20] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo. "A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data". Data Mining for Security Applications, 2002.

[21] H. G. Kayacik, A. Zincir-Heywood, and M. Heywood. "On the Capability of An SOM Based Intrusion Detection System". In Proc. of Int. Joint Conf. on Neural Networks, 1808-1813, 2003.
