
April 10, 2011 Salzburg, Austria - WOMBAT project



7. CONCLUSIONS

We have described a number of large-scale datasets collected on a high-speed backbone link. These datasets were far from trivial to collect, so we have shared the challenges we faced as well as our solutions for processing data at this scale. To exemplify the analysis process, we used the Antispam dataset to concretely discuss the collection and analysis of a large-scale dataset. This included our anonymization methodology, i.e., the removal of any user-sensitive data in a way that still allows accurate traffic analysis, as well as a discussion of applying graph-theoretical techniques to the generated e-mail network. To the best of our knowledge, this e-mail network is the largest that has been used to study the characteristics of such networks. We found clear differences between the communication patterns of spam and ham traffic, which we suggest can be used both to discriminate between them at the network level and to create more complete simulation models. The described type of data collection is necessary for such analysis, since most other contemporary collection approaches either lack participants' e-mail addresses or contain no legitimate traffic. We believe that collecting large-scale datasets such as those presented in this paper is crucial for understanding the behavior of the Internet and its applications. Security research in particular needs contemporary Internet traffic in order to demonstrate the usefulness and correctness of security mechanisms and algorithms.

8. ACKNOWLEDGMENTS

This work was supported by .SE – The Internet Infrastructure Foundation, SUNET, and the Swedish Civil Contingencies Agency (MSB). The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 257007.
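The conclusions state that graph-theoretical techniques applied to the e-mail network reveal clear differences between spam and ham communication patterns. As a hedged illustration only (not the paper's actual metrics), one such structural difference can be captured by edge reciprocity on a directed sender-to-recipient graph: legitimate correspondents tend to reply to each other, while a spammer's recipients rarely do. The traffic below is invented for the example.

```python
# Sketch (not the paper's method): build a directed e-mail graph from
# (sender, recipient) pairs and compare a simple structural metric.
# Spam senders tend to show low reciprocity: few recipients ever reply.
from collections import defaultdict

def build_graph(messages):
    """messages: iterable of (sender, recipient) pairs -> adjacency sets."""
    graph = defaultdict(set)
    for sender, recipient in messages:
        graph[sender].add(recipient)
    return graph

def reciprocity(graph):
    """Fraction of edges u->v for which the reverse edge v->u also exists."""
    edges = sum(len(outs) for outs in graph.values())
    if edges == 0:
        return 0.0
    mutual = sum(1 for u, outs in graph.items()
                 for v in outs if u in graph.get(v, ()))
    return mutual / edges

# Hypothetical traffic: two users exchanging mail, one spammer blasting out.
ham = [("alice", "bob"), ("bob", "alice")]
spam = [("spammer", t) for t in ("u1", "u2", "u3", "u4")]

print(reciprocity(build_graph(ham)))         # 1.0
print(reciprocity(build_graph(spam + ham)))  # 2/6, roughly 0.33
```

On a real trace, such a metric would be computed per sender over millions of messages; the point here is only that spam and ham traffic separate on simple graph statistics.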

An Experimental Study on the Measurement of Data Sensitivity

Youngja Park, Stephen C. Gates, Wilfried Teiken, Pau-Chen Cheng
IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, USA
{young_park, scgates, wteiken, pau}@us.ibm.com

ABSTRACT

Data-centric security proposes to leverage the business value of data to determine the level of overall IT security. It has gained much enthusiasm from the security community, but has not materialized into a practical security system. In this paper, we introduce our recent work towards fine-grained data-centric security, which estimates the sensitivity of enterprise data semi-automatically. Specifically, the categories of sensitive data and their relative sensitivities are initially determined by subject matter experts (SMEs). We then apply a suite of text analytics and classification tools to automatically discover sensitive information in enterprise data, such as personally identifiable information (PII) and confidential documents, and estimate the sensitivity of individual data items. To validate the idea, we developed a proof-of-concept system that crawls all the files on a personal computer and estimates the sensitivity of individual files and the overall sensitivity level of the computer. We conducted a pilot test at a large IT company with its employees' laptops. The pilot scanned 28 different laptops, on which 2.2 million files stored in various file formats were analyzed. Specifically, the files were analyzed to determine whether they contain any pre-defined sensitive information, comprising 11 different PII types and 11 sensitive topics. In addition to the sensitivity estimation, we also conducted a risk survey to estimate the risk level of the laptops. We found that, surprisingly, 7% of the analyzed files belong to one of the eleven sensitive data categories defined by the SMEs of the company, and 37% of the files contain at least one piece of sensitive information such as an address or a person's name.
The analysis also discovered that most laptops have similar overall sensitivity levels, but a few machines have exceptionally high sensitivity. Interestingly, those few highly sensitive laptops were also most at risk of data loss and of malware infection, according to user survey responses. Furthermore, the tool produces evidence of the discovered sensitive information, including the surrounding context in the document, so that users can easily redact the sensitive information or move it to a more secure location. Thus, this system can be used as a privacy-enhancing tool as well as a security tool.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Badgers'11, 10-APR-2011, Salzburg, Austria. Copyright 2011 ACM /04/11 ...$10.00.

1. INTRODUCTION

Identity theft and large-scale data leakage incidents have grown rapidly in recent years [1]. Loss or exposure of data, especially highly sensitive data, can cause a great deal of damage, both tangible and intangible, to the enterprise. Data-centric security has been proposed to provide fine-grained security for important and sensitive data [2]. The concept has gained much enthusiasm among security researchers and practitioners alike, but it has not fully materialized into a practical security system, mostly because enterprises do not have a clear idea of where their sensitive data resides [3]. Recently, many data loss protection (DLP) solutions have been proposed to prevent sensitive information from being leaked externally [4, 5, 6].
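The abstract notes that the tool produces evidence of each discovered sensitive item together with its surrounding context, so a user can review and redact it. A minimal sketch of such context extraction is below; the pattern name and window size are illustrative assumptions, not the paper's implementation.

```python
# Illustration only: extract a context window around each sensitive match
# so a user can review and redact it. The SSN pattern is a hypothetical
# example of a PII detector, not the paper's actual rule set.
import re

def evidence(text, pattern, window=20):
    """Return (match, surrounding context) pairs for each hit of pattern."""
    results = []
    for m in re.finditer(pattern, text):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        results.append((m.group(), text[start:end]))
    return results

SSN = r"\b\d{3}-\d{2}-\d{4}\b"  # assumed PII pattern, for illustration
doc = "Employee record: SSN 123-45-6789, reviewed 2011."
for match, ctx in evidence(doc, SSN):
    print(match, "->", ctx)
```

Presenting the match inside its context, rather than the bare match alone, is what lets a user decide quickly whether a hit is a true positive worth redacting.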
The state-of-the-art technologies for DLP aim to discover sensitive information in data (e.g., for regulatory compliance requirements such as HIPAA (1) and PCI-DSS (2)), but do not have any automated mechanism to measure the value or sensitivity of individual data items. For instance, these systems treat a file with one credit card number and a file with 100 credit card numbers equally. We argue that security protection should be commensurate with the value or sensitivity level of the data, and propose a new systematic, end-to-end approach for helping large enterprises manage the security risks associated with their data. Specifically, our system aims to automatically estimate the sensitivity of enterprise data based on its content, and the risk level of the data based on its usage patterns. To validate the idea, we developed a proof-of-concept (PoC) of the system, which primarily focuses on the sensitivity estimation of unstructured text, and conducted a pilot test at a large IT company with its employees' laptops. The system scans the files on a personal computer and estimates the sensitivity of individual files and the overall sensitivity level of the computer. For risk assessment, the system estimates the risk of laptop theft and of infection by malware through a user survey. The goal is to find individual laptops which hold high levels of sensitive data, and which are at

(1) Health Insurance Portability and Accountability Act
(2) Payment Card Industry Data Security Standard
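The argument above is that a file with 100 credit card numbers should score higher than a file with one, rather than both being flagged identically. A minimal sketch of count-aware scoring follows; the regexes and weights are made-up illustrations, not the SME-assigned values or detectors used in the paper's system.

```python
# Sketch of count-aware sensitivity scoring. Unlike a binary hit/no-hit
# check, the score grows with the number of sensitive items in a file.
# Patterns and weights below are illustrative only.
import math
import re

PATTERNS = {
    # category -> (regex, relative weight); both are assumptions
    "credit_card": (r"\b\d{4}(?:[ -]?\d{4}){3}\b", 5.0),
    "phone":       (r"\b\d{3}-\d{3}-\d{4}\b",      1.0),
}

def sensitivity(text):
    """Weighted, log-damped count of sensitive matches in a file's text."""
    score = 0.0
    for regex, weight in PATTERNS.values():
        count = len(re.findall(regex, text))
        if count:
            score += weight * (1 + math.log(count))
    return score

one_card = "card: 4111 1111 1111 1111"
many_cards = "\n".join(f"card: 4111 1111 1111 11{i:02d}" for i in range(100))
assert sensitivity(many_cards) > sensitivity(one_card)  # 100 cards > 1 card
```

The log damping is one design choice among many: it keeps a single file with thousands of hits from dominating a machine's aggregate score while still ranking it well above files with isolated hits.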
