
April 10, 2011 Salzburg, Austria - WOMBAT project

PREDICT: A TRUSTED FRAMEWORK FOR SHARING DATA FOR CYBER SECURITY RESEARCH

Charlotte Scheper
RTI International
3040 Cornwallis Road
Durham, NC 37705
1-919-485-5587
cscheper@rti.org

Susanna Cantor
RTI International
3040 Cornwallis Road
Durham, NC 37705
1-919-541-7323
scantor@rti.org

Dr. Douglas Maughan
DHS Science & Technology Directorate
Washington, DC
dmaughan@dhs.gov

ABSTRACT
The Protected Repository for Defense of Infrastructure against Cyber Threats (PREDICT) has established a trusted framework for sharing real-world security-related datasets for cyber security research. In establishing PREDICT, a set of key issues for sharing these data has been addressed: providing secure, centralized access to multiple sources of data; assuring confidentiality to protect the privacy of the individuals and the security of the networks from which the data are collected; assuring data integrity to protect access to the data and ensure its proper use; and protecting proprietary information and reducing legal risks. PREDICT continues to address issues in producing and sharing datasets as it enters its second phase of development, providing more controversial data, adding data providers, and initiating international participation.

Categories and Subject Descriptors
H.2.7 [Database Administration]: Data Warehouse and Repository. H.3.4 [Systems and Software]: Distributed Systems, Information Networks.

General Terms
Management, Legal Aspects, Standardization.

Keywords
Distributed Repository, Cyber Security, Internet, PREDICT.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Badgers'11, 10-APR-2011, Salzburg, Austria
Copyright 2011 ACM /11/04…$10.00.

1. INTRODUCTION
Defensive cyber security technologies have to be improved to address the rapidly changing cyber security threat landscape. However, researchers have insufficient access to data to test their research prototypes and technology decision-makers have no data to evaluate competing products. The White House Cyberspace Policy Review [1] action plan called for providing data to the research community to use to develop tools, test theories, and identify workable solutions for cyber security. The Protected Repository for Defense of Infrastructure against Cyber Threats (PREDICT) is directly addressing this call to action.

To provide security-related datasets, PREDICT has had to address a set of key issues: providing secure, centralized access to multiple sources of data; assuring confidentiality to protect the privacy of the individuals and the security of the networks from which the data are collected; assuring data integrity to protect access to the data and ensure its proper use; and protecting proprietary information and reducing legal risks. To address these issues, PREDICT followed a three-pronged approach: develop a framework based on the data sharing models used in other domains that share sensitive data; conduct a full review of the legal context in which data collection and sharing occur; and reach out to the privacy community.

2. PREDICT REPOSITORY FRAMEWORK
Following the model of multi-site research networks [2], PREDICT is a distributed repository where multiple data providers collect and prepare data for sharing, multiple data hosts provide computing infrastructure to store the datasets and provide mechanisms to access them, and a central coordinating center (the PCC) provides a unified view of and portal [3] into the repository collection and manages the repository processes for accepting datasets and authorizing access. The PREDICT process includes sensitivity assessments of datasets to determine conditions of use; Memoranda of Agreement between the PCC and the providers, hosts, and researchers containing legally binding terms and conditions for providing and accessing data; and expert review of data requests.

As shown in the framework illustration in Figure 1, the PCC provides information (metadata) about the data collected by the Data Providers, Data Providers work with Data Hosts to store data, approved Researchers browse for datasets of interest and apply for access through the PCC, and once their dataset requests are approved by the PCC, Researchers work directly with the Data Hosts to obtain the datasets.

Figure 1. PREDICT Repository Framework

In determining whether a dataset is suitable for inclusion in the repository, the following factors are considered: Who is the provider of the data? Who owns the data? How was the data obtained (i.e., was it intercepted or is it stored data)? What are the Data Provider’s privacy policies and operating procedures? What is contained in the data? The answers to these questions determine the legal risks and impact the conditions of use.

In accepting a user into the PREDICT community and granting access to data, the following factors are considered: Who is the Researcher and does he/she have a legitimate cyber security research role? What organization is the Researcher affiliated with and will that organization sponsor the Researcher? Are the requested datasets suitable for the proposed research?

3. PREDICT LEGAL PROCESS
A thorough review of applicable laws and regulations, both federal and state, was conducted in setting up PREDICT.
As part of the process of approving datasets and data requests, a legal consultant reviews the policies, procedures, and other available documents from providers; identifies the legal relationships and agreements needed between PREDICT participants; prepares a risk chart for every dataset that identifies high-risk data fields and/or datasets and establishes requirements for high-risk fields; and works with the participants and the PCC to prepare Memoranda of Agreement (MOAs), which are legally binding within U.S. jurisdiction. While necessary to protect privacy rights and reduce the legal risk to data providers and researchers, the MOA process can be a hurdle for many researchers. Three developments are expected to increase the return on the effort required: researchers' growing understanding of the legal risks involved, planned revisions to the MOAs, and the upcoming provision of less readily available datasets.

3.1 Privacy and Legal Outreach
During the design of the framework, the PREDICT program conducted a number of outreach activities to the legal and privacy communities. Privacy advocates, including the ACLU, the Electronic Frontier Foundation (EFF), and the Center for Democracy and Technology (CDT), were briefed and their input obtained. Working with the DHS Privacy Office, a Privacy Impact Assessment (PIA) [4] was prepared and government officials, including the DHS S&T General Counsel, the DHS General Counsel, and the Department of Justice, were briefed. This outreach successfully allayed concerns and identified key issues that had to be addressed.

3.2 2010 and Beyond
PREDICT currently houses 140 datasets from five data providers. The types of data include BGP Routing Data, Blackhole Address Space Data, Internet Topology Data, IP Packet Headers, Traffic Flow Data, and VOIP Measurement Data. Collection periods for the datasets vary from hours to days to months; the sizes of the datasets vary from bytes to terabytes.
In 2010, researchers from 24 academic, 1 government, and 31 private sector organizations joined the PREDICT community, resulting in a total community of 65 academic, 12 government, and 48 private sector organizations. PREDICT continues to address issues in producing and sharing datasets by developing a draft report on guidelines for ethical principles in networking and security research, similar to the Belmont Report for human subject research, and by holding workshops on disclosure control.

In Phase II, currently scheduled to begin operation in April 2011, PREDICT will expand datasets to include more controversial data such as unsolicited bulk email, DNS data, web logs, infrastructure data, and IDS and firewall data. New data providers will be added and international participation will be piloted through affiliation with research centers that will be responsible for vetting their researchers.

In summary, PREDICT is addressing an acknowledged need by providing large-scale, real-world security-related datasets for cyber security research. Significant policy and legal issues exist in collecting and sharing security-related data. Many of these have been addressed by PREDICT, but many remain to be resolved before usable data can be provided across the entire spectrum of information security R&D activities.

4. AUTHORS
The following authors represent RTI International, 3040 Cornwallis Road, Durham, NC 37705: Charlotte Scheper, Susanna Cantor, Renee Karlsen, Sandhya Bikmal, Roger Osborn, Gary Franceschini, Craig Hollingsworth, and Al-Nisa Berry. The following author represents the DHS Science & Technology (S&T) Directorate: Dr. Douglas Maughan.

5. ACKNOWLEDGEMENT
PREDICT is funded by the Department of Homeland Security (DHS) Science and Technology (S&T) Directorate under contract number NBCHC070131.

6. REFERENCES
[1] Cyberspace Policy Review: Assuring a Trusted and Resilient Information and Communications Infrastructure. May, 2009. http://www.whitehouse.gov/assets/documents/Cyberspace_Policy_Review_final.pdf.
[2] Scheper, C. O., Cantor, S., & Karlsen, R. (2009, March). Trusted distributed repository of internet usage data for use in cyber security research. Proceedings of the Cybersecurity Applications and Technologies Conference for Homeland Security (CATCH).
[3] PREDICT Portal. https://www.predict.org.
[4] DHS Privacy Impact Assessment. February, 2008. http://www.dhs.gov/xlibrary/assets/privacy/privacy_pia_st_predict.pdf.
