Statistical Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation

Jungsuk Song, National Institute of Information and Communications Technology (NICT), song@nict.go.jp
Masashi Eto, National Institute of Information and Communications Technology (NICT), eto@nict.go.jp
Hiroki Takakura, Information Technology Center, Nagoya University, takakura@itc.nagoya-u.ac.jp
Yasuo Okabe, Academic Center for Computing and Media Studies, Kyoto University, okabe@i.kyoto-u.ac.jp
Daisuke Inoue, National Institute of Information and Communications Technology (NICT), dai@nict.go.jp
Koji Nakao, National Institute of Information and Communications Technology (NICT), ko-nakao@nict.go.jp

Abstract

With the rapid evolution and proliferation of botnets, large-scale cyber attacks such as DDoS and spam email campaigns are becoming increasingly dangerous and serious cyber threats. Because of this, network-based security technologies such as Network-based Intrusion Detection Systems (NIDSs), Intrusion Prevention Systems (IPSs) and firewalls have received remarkable attention as a means of defending crucial computer systems, networks and sensitive information from attackers on the Internet. In particular, there has been much effort towards high-performance NIDSs based on data mining and machine learning techniques. However, these efforts face a fatal problem: the existing evaluation dataset, the KDD Cup '99 dataset, cannot reflect current network situations and the latest attack trends, because it was generated by simulation over a virtual network more than 10 years ago. To the best of our knowledge, there is no alternative evaluation dataset. In this paper, we present a new evaluation dataset, called Kyoto 2006+, built on three years of real traffic data (Nov. 2006 to Aug. 2009) obtained from diverse types of honeypots. The Kyoto 2006+ dataset will greatly help IDS researchers obtain more practical, useful and accurate evaluation results. Furthermore, we provide detailed analysis results of the honeypot data and share our experiences so that security researchers can gain insights into the latest attack trends and the current state of the Internet.

Keywords: NIDS, Honeypot Data, Kyoto 2006+ Dataset

1. Introduction

In general, a botnet refers to a collection of infected hosts, i.e., zombie PCs or bots, and botnet herders use their botnets to launch large-scale cyber attacks such as Distributed Denial of Service (DDoS), spam email campaigns, network scanning and so on. In many cases they also rent their botnets out to third parties who want to advertise their products or attack a particular victim host. Due to the rapid evolution and proliferation of botnets, network-based security technologies such as NIDSs, IPSs and firewalls have received remarkable attention as a means of defending crucial computer systems, networks and sensitive information from attackers on the Internet.
In particular, there has been much effort towards high-performance NIDSs based on data mining and machine learning techniques [1, 2]. In the intrusion detection field, the KDD Cup '99 dataset [3] has long been used as the evaluation data for NIDSs. However, it suffers from a fatal problem: the KDD Cup '99 dataset cannot reflect current network situations or the latest attack trends, because it was generated by simulation over a virtual network more than 10 years ago. To the best of our knowledge, there is no alternative evaluation dataset for NIDSs. This is because it is quite difficult to obtain, constantly and over a long period, high-quality real traffic data that contain both normal and attack traffic. In addition, it is extremely time-consuming to label traffic data as either normal or intrusion, because security experts have to inspect every piece of traffic and classify it accurately. To make matters worse, due to privacy and competitive concerns, many organizations and researchers do not share their data with other institutions and researchers, even if they have real traffic data.
In this paper, we present a new evaluation dataset, called Kyoto 2006+, built on three years of real traffic data (Nov. 2006 to Aug. 2009). It consists of 14 statistical features derived from the KDD Cup '99 dataset as well as 10 additional features which can be used for further analysis and evaluation of NIDSs. By using the Kyoto 2006+ dataset, IDS researchers and operators are able to obtain more practical, useful and accurate evaluation results. Furthermore, we provide very detailed analysis results of the honeypot data using five criteria (i.e., SNS7160 IDS [6], ClamAV software [7], Ashula [5], source IP addresses and destination ports), and share our experiences so that security researchers can gain insights into the latest attack trends and the current state of the Internet. Our key findings are:

• about 50% of cyber attacks were launched from China, the United States and South Korea;
• the total numbers of unique IDS alerts, AV alerts, shellcodes, source IP addresses and destination ports were 290, 832, 231, 4,420,971 and 61,942, respectively;
• the average numbers of unique IDS alerts, AV alerts, shellcodes, source IP addresses and destination ports per day were 41, 5.5, 9, 5,851 and 557, respectively (see the sketch after this list);
• MSSQL StackOverflow (29%), SMB Large Return Field (17%) and Too Many SYNs for a TCP Connection (12%) accounted for about 60% of all IDS alerts;
• most AV alerts were related to Trojan, Worm, Phishing and Email;
• a single shellcode (ID 58) accounted for about 88% of all shellcodes; it exploits the vulnerability described in MS02-039 [13] or CAN-2002-0649 [9], and the corresponding malware is MS-SQL Slammer [14];
• the top 6 destination ports (i.e., 445, 22, 0, 80, 139, 1434) accounted for about 70% of all destination ports;
• 27 new shellcodes related to the Win32/Conficker.A worm were detected during its development period (from Oct. 29th 2008 to Nov. 21st 2008).
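The daily averages above are the kind of summary that can be derived directly from one-session-per-line records. The following sketch is illustrative only: it assumes a tab-separated file, and the column positions, the "no alert" markers and the file name kyoto_sessions.tsv are placeholders of our own, not the published schema of Kyoto 2006+.

# Illustrative sketch (Python): per-day unique counts from one-session-per-line records.
# The tab-separated format, column positions and "no alert" convention below are
# assumptions for illustration, not the dataset's published layout.
import csv
from collections import defaultdict

COL_START_TIME, COL_SRC_IP, COL_DST_PORT, COL_IDS_ALERT = 0, 1, 2, 3  # placeholder indices

def daily_unique_counts(path):
    src_ips = defaultdict(set)     # day -> distinct source IP addresses seen that day
    dst_ports = defaultdict(set)   # day -> distinct destination ports seen that day
    ids_alerts = defaultdict(set)  # day -> distinct IDS alert names seen that day
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            day = row[COL_START_TIME][:10]           # e.g. "2006-11-01"
            src_ips[day].add(row[COL_SRC_IP])
            dst_ports[day].add(row[COL_DST_PORT])
            if row[COL_IDS_ALERT] not in ("", "0"):  # assumed "no alert" markers
                ids_alerts[day].add(row[COL_IDS_ALERT])
    return {d: (len(src_ips[d]), len(dst_ports[d]), len(ids_alerts[d])) for d in src_ips}

if __name__ == "__main__":
    stats = daily_unique_counts("kyoto_sessions.tsv")  # hypothetical file name
    avg_ips = sum(ips for ips, _, _ in stats.values()) / max(len(stats), 1)
    print(f"average unique source IPs per day: {avg_ips:.0f}")

Averaging per-day unique counts in this way corresponds to the daily figures quoted above (e.g., 5,851 unique source IP addresses per day), whereas the totals count uniqueness over the whole three-year period.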
The rest of the paper is organized as follows. In Section 2, we briefly describe the honeypots used for obtaining real traffic data. In Section 3, we present our honeypot data and their analysis results in detail. In Section 4, we introduce the Kyoto 2006+ dataset built from the honeypot data. Finally, Section 5 gives some conclusions and future work.

2. Overview of honeypots

Table 1 shows the types of honeypots used for collecting real traffic data. We used many different types of real and virtual machines as honeypots, such as Windows machines (e.g., Windows XP SP2, fully patched Windows XP, Windows XP with no patch), Linux/Unix machines (e.g., Solaris 8, MacOS X), the dedicated honeypots introduced in [4], a network printer, home appliances (e.g., a TV set and an HDD recorder) and so on. We have also deployed SGNET honeypots [15]. These various types of honeypots were deployed on 5 different networks inside and outside of Kyoto University: 1 class A and 4 class B networks. The total number of honeypots is 348, including two black hole sensors covering 318 unused IP addresses. Most of our honeypots are rebooted immediately after a malicious outgoing packet is observed. At reboot, the HDD image is overwritten with the original one so that the honeypot returns to its original condition. Some Windows-based honeypots are, however, allowed to run for several weeks. Because we deploy an in-line IPS between these honeypots and the Internet, all detectable outgoing attacks are blocked, and we write custom signatures to detect exploit codes using Ashula [5].

Table 1. Overview of honeypots

  Type                          Number of machines
  Solaris 8 (Symantec based)    4
  Windows XP (full patch)       1
  Windows XP (no patch)         5
  Windows XP SP2                2
  Windows Vista                 1
  Windows 2000 Server           1
  MacOS X                       2 (one is a mail server)
  Printer                       2
  TV set                        1
  HDD recorder                  1
  Dedicated honeypots [4]       5
  SGNET honeypots [15]          4
  Web crawler                   1
  Black hole sensor (/24)       1
  Black hole sensor (/26)       1

We have collected all traffic data to and from our honeypots and thoroughly inspected it using three security software products, namely the SNS7160 IDS system [6], Clam AntiVirus (ClamAV) [7] and Ashula [5], a dedicated shellcode detection software, so that we can identify what happened on the networks. Currently we use only ClamAV as the antivirus engine, and its detection patterns are updated every hour. Also, since Apr. 1st 2010, we have deployed an additional IDS provided by Sourcefire, i.e., Snort; before that, we used only the SNS7160. The detailed analysis results of the honeypot data obtained with these security software products are described in Sections 3.2, 3.3 and 3.4.

On the other hand, since most of the honeypot traffic consists of attack data, we also need normal traffic data in order to build an evaluation dataset for IDSs. To generate normal traffic data, we have deployed a server on the same network as the honeypots; the server provides two main functions, a mail service and a DNS server.
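Each of the three inspection tools flags sessions independently, and a labeled evaluation dataset ultimately needs a single normal/attack decision per session. The sketch below is purely illustrative of one way such per-session detector outputs could be combined; the field names, the "empty string means no detection" convention and the label values are our assumptions, not the authors' actual labeling procedure, which is not given in this excerpt.

# Illustrative sketch (Python): combining per-session detector outputs into one label.
# All field names and label conventions here are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class SessionReport:
    ids_alert: str      # e.g. an SNS7160 signature name, "" if nothing fired
    av_alert: str       # e.g. a ClamAV detection name, "" if nothing fired
    shellcode_id: str   # e.g. an Ashula shellcode ID, "" if nothing fired

def label(report: SessionReport) -> int:
    """Return 1 for normal and -1 for attack if any detector fired (assumed convention)."""
    fired = any([report.ids_alert, report.av_alert, report.shellcode_id])
    return -1 if fired else 1

# Example: a session that triggered only the shellcode detector is labeled as an attack.
print(label(SessionReport(ids_alert="", av_alert="", shellcode_id="58")))  # -> -1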
