Online Boosting Based Intrusion Detection in Changing Environments

More documents

Recommendations

Info

The trainer of weak classifier, i.e. L in Table 1, often b selects a weak classifier from a specific function space based on the criterion below: ( m) ( m) Lb ({( x1, y1),..., ( xN , yN )}, w ) argmin h The complexity of L relates to the function space b from which h is selected. L in Table 2 is an online version of o L . In order to meet the requirement of quick online b updating of the ensemble detector in intrusion detection, the online learning of weak classifiers should be efficiently implemented. We choose the weak classifiers to be the set of decision stumps on each feature dimension, since they are very simple and can be updated online efficiently. Each weak classifier is a decision stump on one feature dimension, thus the number of weak classifiers M is fixed, which equals to the number of features d . These weak classifiers are learned online, and the approximated weighted classification error (4) is updated when a new training sample comes. For a categorical feature f , the set of attribute values f f C is divided into two subsets p f C and C with no For a continuous feature f , the range of attribute values is split by a threshold v , and the decision stump takes the following form: or The threshold v and the above two cases are chosen to minimize the weighted classification error There is another important issue to be addressed in intrusion detection, which relates to the detection performance of the detector. We use the following measures to evaluate the performance of the detection algorithm: detection rate: false alarm rate: n ( 5) intersection, and the decision stump takes the form as f f p hf (x) where x indicates the attribute value of x on the feature f f . To avoid the combinatorial computation of examining f all possible decision stumps, the division of C is efficiently constructed in the following way: where z is an attribute value on the feature f in the training data set, () equals to 1 if condition ( ) satisfies, equals to 0 otherwise. C 1 x { 1 f f n C ( 6) x f C z ( y 1) ( y 1) p x f z x f z { ( 7) f Cn ( y 1) ( y 1) x f z x f z hf (x) 1 x f v { 1 x f v N DR N N FAR N hf (x) n h f xn y n n w : ( ) detected attack false normal 1 { 1 100 % 100% x f v x f v ( 8) ( 9) ( 10) where N denotes the number of attacks correctly det ected detected, N denotes the total number of attacks in the attack data set, N denotes the number of normal connections false that are wrongly detected as attacks, and N denotes the normal total number of normal connections. An intrusion detection is expected to have a high DR to guarantee the system security, a low FAR to decrease unnecessary human burden and maintain the normal use of network. Unfortunately, the training data set is often unbalanced between normal and intrusion data. This will bring great difficulties in the online boosting based detection method. In fact, with the identical initial weight 1 for each training sample in online boosting, the importance of each binary class relates to the number of positive and negative samples in the training data. Generally, the more negative samples are concerned, the higher DR is obtained, with a higher FAR; the more positive samples are concerned, the lower DR is obtained, with a lower FAR. So it is very difficult to meet the requirement of a high DR and a low FAR at the same time. What is more practical is to make a balance between the two requirements in the specific application. In order to balance the requirements of DR and FAR, we introduce a parameter r ( 0, 1) in the setting of initial weight for each training sample: { N N N N normal normal N N normal attack attack attack r ( 1 r) normal connection network intrusion ( 11) Through adjusting the parameter r , we can change the importance of positive and negative samples in the training process, and then get a balance between DR and FAR. The selection of r depends on the proportion of normal samples in the training data, and the requirements of DR/FAR in the specific application. The above online boosting based detection method has the following advantages in the intrusion detection task: the weak classifiers (6) and (8) only operate on individual feature dimensions, which avoids the difficulty caused by the large distinction of value ranges for different feature dimensions; the training of the decision stumps is simple, and the online updating can be efficiently implemented; the detection performance is guaranteed by the final ensemble of weak classifiers; a balance between DR and FAR can be achieved according to the requirement of the specific application. Thus, the proposed online boosting based method for intrusion detection seems suitable for practical uses.
2.5 Computational complexity Computational complexity is an important aspect for intrusion detection algorithms, since a fast response to network intrusions is necessary for taking subsequent security measurements in time. Particularly, when a new type of attack appears, the ability of accurate detection of the new attack should be quickly incorporated into the detector. For the proposed online boosting based intrusion detection method, the learning and detection can be done in parallel. When a new sample comes, we only need to update the decision stumps and the related ensemble weights for each feature dimension, so the computational complexity of the online boosting learning is only O (d ) , where d is the number of features of each network connection. Thus the updating of the intrusion detector can be implemented very efficiently, and the detector is expected to response quickly to new types of attack. Moreover, the detection of the ensemble classifier also has a computational complexity of O (d ) , which can be implemented in real-time. 3 Experiments 3.1 Intrusion data set The KDD Cup 1999 data set [18] is used in our experiments, since it is a widely used benchmark data set for many network-based intrusion detection algorithms. It was used for the 1998 DARPA intrusion detection evaluation program, which was prepared and managed by MIT Lincoln Labs. A network environment was set up to simulate a typical U.S. Air Force LAN, where a wide variety of intrusions were simulated like in a real military network. Nine weeks of TCP/IP connection data were collected, and they were labeled for testing intrusion detection algorithms. For each network connection, 41 features are extracted, including 9 categorical and 32 continuous features. Attacks in the data set fall into four main categories: DOS: denial-of-service; R2L: unauthorized access from a remote machine, e.g., guessing password; U2R: unauthorized access to local superuser (root) privileges; Probing: surveillance and other probing, e.g., port scanning. In each of the four categories, there are many low level attack types. Note that the intrusion detection is a binary classification, which means that all the types of attack are expected to be classified into the network intrusion class. Table 3. The KDD Cup 1999 Data Set Categories Training data Test data Normal 97278 60593 DOS 391458 223298 R2L 1126 5993 U2R 52 39 Probing 4107 2377 Others 0 18729 Total 494021 311029 The numbers of normal connections and each category of intrusions in the training and test data sets are listed in Table 3. Note that the test data is not from the same distribution as the training data, and it includes some attack types not existing in the training data. This makes the task more realistic. For more details of the intrusion data set, please refer to [18]. 3.2 Online boosting based intrusion detection To evaluate the detection performance on the KDD Cup 1999 data set, the intrusion detection algorithm is trained on the training data set, and then tested on the test data set. First, we use the proposed method without the parameter r . A high DR of 91 . 2756% is obtained on the test data. However, the FAR is 8 . 3805% , which is a little high for intrusion detection. Note that the normal samples only occupy 20 % of the training data, which makes the detector pay too much attention to the attack samples. As a result, a high FAR is obtained from the unbalance training data. To get a balance between DR and FAR, the parameter r is introduced as in the proposed detection method. After testing of some values between 0.05 and 0.95, we finally get an appropriate setting of r 0. 35 . With this value of r , a much smaller FAR of 2 . 2374% is obtained, with a still acceptable DR of 90 . 1329% . This result is more suitable for the intrusion detection application. For comparison, we also test the batch Adaboost algorithm on the intrusion data set. The detection results of the above algorithms on the test data are listed in Table 4. Note that although learned in online mode, the detection accuracy of the online boosting based method is comparable with that of Adaboost which is learned in batch mode, and an ideal small FAR can be obtained through appropriately adjusting of the parameter r . Table 4. Detection Results on the Test Data Algorithms DR (%) FAR (%) Online boosting 91.2756 8.3805 Online boosting (with r 0. 35 ) 90.1329 2.2374 Adaboost 92.6568 2.6686
Page 1 and 2: Online Boosting Based Intrusion Det
Page 3: use of the information supplied by
Page 7: 4 Conclusions In this paper, an eff

Online Boosting Based Intrusion Detection in Changing Environments

Create successful ePaper yourself

Delete template?

Save as template?