April 10, 2011 Salzburg, Austria - WOMBAT project

An Architectural Solution for Data Exchange in Cooperative Network Security Research

Brian Trammell, ETH Zurich, Zurich, Switzerland, trammell@tik.ee.ethz.ch
Jan Seedorf, NEC Laboratories Europe, Heidelberg, Germany, seedorf@neclab.eu
Giuseppe Bianchi, Uni. Roma Tor Vergata, Rome, Italy, giuseppe.bianchi@uniroma2.it

1. INTRODUCTION

Science can be seen as a cycle of hypothesis, collection of experimental results, and analysis to refine or refute the original hypothesis. The desire to increase the rigor of large-scale computer and network security studies extends to all three of these steps: an improved understanding of the situation leads to better hypotheses, improved collection and sharing of data increases the scope of the studies that can be done, and improved analysis techniques lead to deeper insights and more useful results.

However, seeing this cycle as a set of discrete steps has disadvantages in the real world, especially when it comes to the collection step. Most computer security research requires data which have a significant impact on the privacy of the users of the studied systems. Protecting privacy in such studies is a matter of complying with legal obligations to protect the rights of the individuals concerned [1, 2]. Enhanced collection and centralization would indeed seem to violate the principle in European data protection legislation that “only the kind and amount of data that are functional and necessary to the specific processing purpose that is pursued” should be collected.

The more useful a data set is, the more detail it tends to have; and the more detail it has, the larger the privacy threat it represents. It is very difficult to separate utility for researchers with a legitimate use for a data set from utility for miscreants. Anonymization techniques can help here but must be used with care; for example, in the area of network traffic traces, it has been shown that anonymization of most useful data sets can be compromised by the asymmetric ease of traffic injection [3]. Other side channels may exist for similar situations. This makes data sharing as a basis for collaboration risky.

The EU FP7 integrated research project DEMONS¹ proposes an architectural solution to this problem. Instead of the centralization and dissemination of access to large-scale data sets, analysis code can be distributed to local data sets, which remain in the control of their original owners. Access control at each of these local repositories is applied to the incoming code (to ensure it is safe to run) as well as to the outgoing data (to ensure its identifiability is below an acceptable threshold). To further reduce the release of privacy-sensitive results in multi-domain scenarios, we apply secure multiparty computation to generate low-risk aggregates from multiple high-risk single-domain results. This paper gives some background on this architectural proposal, and applications to the security research use case.

¹ This work was partially supported by DEMONS, a research project supported by the European Commission under its 7th Framework Program (contract no. 257315). The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the DEMONS project or the European Commission.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. BADGERS ’11 Salzburg, Austria. Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$10.00.

2. FOUNDATIONS AND ARCHITECTURE

Our proposal is inspired by previous work in the area of programmable measurement. SC2D [7] is an exploration of the properties of such “code mobility” architectures, envisioning a modular architecture built around a standard data model. In SC2D, the smallest unit of processing is a module, and SC2D programs are built by chaining modules together. Module security is handled out of band, via code signing.

Trol [6] specifically adds the concept of privacy protection to this architectural reversal. Here, instead of modules, it provides a declarative query language similar to SQL, tailored to network measurement. This restricted language makes it possible to measure the privacy risk of the data returned from a query, and to prevent the release of data with too much identifiable information, assuming a secure implementation of the interpreter.

This last is an important point: when considering code mobility, the security of the implementation is much more important than in traditional centralize-and-process architectures. Here, implementation faults can lead not only to data disclosure but to complete compromise of the hosts providing the analysis services, which could have impacts on the hosting organization beyond the compromised data sharing application. Scriptroute [8], applied to active network measurement by untrusted parties, illustrates some design choices which can minimize the security risk of running potentially untrusted code in a restricted interpreter. It provides a language in which a small set of safe, high-level primitives can be combined, and surrounds the execution environment with a sandbox that independently places limits on the traffic sent.
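The idea of checking outgoing data against an identifiability threshold can be illustrated with a small sketch. It is neither Trol's query language nor the DEMONS access-control mechanism: it assumes a hypothetical k-anonymity-style rule in which an aggregate row is released only if enough distinct source hosts contributed to it. The names FlowRecord, MIN_CONTRIBUTORS, and release_port_counts are invented for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical threshold: minimum number of distinct hosts behind a released row.
MIN_CONTRIBUTORS = 3


@dataclass(frozen=True)
class FlowRecord:
    """Simplified flow record; the fields are assumptions made for this sketch."""
    src_host: str
    dst_port: int
    packets: int


def release_port_counts(records, min_contributors=MIN_CONTRIBUTORS):
    """Sum packets per destination port, suppressing rows that too few
    distinct hosts contributed to (and which could therefore identify them)."""
    packets = defaultdict(int)
    hosts = defaultdict(set)
    for rec in records:
        packets[rec.dst_port] += rec.packets
        hosts[rec.dst_port].add(rec.src_host)
    # Only rows that meet the threshold leave the local repository.
    return {port: total for port, total in packets.items()
            if len(hosts[port]) >= min_contributors}


local_data = [
    FlowRecord("10.0.0.1", 53, 4),
    FlowRecord("10.0.0.2", 53, 2),
    FlowRecord("10.0.0.3", 53, 7),
    FlowRecord("10.0.0.4", 6667, 9),  # a single host: withheld from the result
]
released = release_port_counts(local_data)
print(released)
```

In this sketch the raw records never leave the data owner; only the filtered aggregate does. A real deployment would need a risk measure suited to its data and threat model, so the fixed threshold used here should be read as a placeholder.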

In DEMONS, we apply these concepts to a distributed measurement system which allows dynamic composition of blocks on a network of processing nodes, by chaining the blocks together through their gates, or well-defined interfaces. Processing nodes with packet capturing hardware or other raw data sources take the place of probes in traditional monitoring infrastructures. The functionality of the blocks covers a variety of granularities, from simple and generic primitives (e.g., “count elements”) to whole algorithms (e.g., “find DNS servers which are used in botnet control based on this set of reply packets”). New blocks can also be added to nodes dynamically, though we rely on access control, trusted peers, and signed code to secure the implementation of the blocks. Access control with awareness of the semantics of each of the blocks is also applied to compositions before they are sent to the nodes, to evaluate the risk that a given composition would result in too much data being exported for a given identity, role, and purpose.

Code mobility can be applied to aggressive data reduction. Moving analysis closer to the “edge” at which data is initially collected tends to reduce both the total resource demand and the privacy risk of a given analysis. Computational, storage, and bandwidth demand is reduced by throwing away irrelevant data as soon as possible, and data which has been discarded cannot be used to infer identifying information.

In some cases, however, code mobility is not enough. Many common large-scale queries in network or security measurement deal with aggregates across multiple administrative domains. Here, the aggregate is not privacy-sensitive, but the intermediate results from each domain are. In such cases we can apply secure multiparty computation (MPC), which has recently emerged as a viable, scalable approach to secure sharing of computing tasks [4, 5].
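As a rough illustration of how an aggregate can be computed without any party seeing a single domain's intermediate result, the sketch below uses plain additive secret sharing. This is a deliberately simpler scheme swapped in for illustration, not the protocol of the frameworks cited in [4, 5]. The modulus PRIME, the functions make_shares and aggregate, and the example counts are all assumptions of this sketch.

```python
import secrets

# Public modulus for share arithmetic (an assumption; it must exceed any possible sum).
PRIME = 2**61 - 1


def make_shares(value, n_peers):
    """Split value into n_peers random shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_peers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def aggregate(domain_values, n_peers=3):
    """Return the sum of domain_values, reconstructed from per-peer partial sums."""
    # Each domain hands share j of its private value to computation peer j.
    per_domain = [make_shares(v, n_peers) for v in domain_values]
    # Each peer adds up the shares it received; individually these look random.
    partial_sums = [sum(column) % PRIME for column in zip(*per_domain)]
    # Combining the peers' partial sums reveals only the overall total.
    return sum(partial_sums) % PRIME


counts = [120, 45, 310]  # hypothetical per-domain event counts
total = aggregate(counts)
print(total)
```

No single peer learns a domain's count from the share it holds, as long as the peers do not all pool their shares. The sketch assumes honest peers and covers only summation; more elaborate operations require correspondingly more elaborate protocols.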
In MPC as implemented by the SEPIA framework used by DEMONS, input peers at each domain compute shared secrets, which represent the source data from a given domain without allowing recovery of the original data. These are then processed together with secrets from other domains by privacy peers to produce an aggregated result. The protocols presently supported by SEPIA include privacy-preserving aggregate counts, set union and intersection operations, and top-N lists.

3. APPLICATION TO LARGE-SCALE SECURITY DATA SHARING

While the main focus of the DEMONS project is the development of a network monitoring environment for operational use, we also intend to develop a platform suitable for collecting large-scale data for research applications. The architecture illustrates several principles, and we intend to apply the flexibility of the primitives provided on the DEMONS nodes to collaborative security measurement research problems. The principles advanced within DEMONS that we see as applicable to this application area are as follows:

• Mobile analysis of immobile data: Composition of analysis primitives allows safe execution of external code, which in turn allows data to be secured in a single location, reducing the risks associated with distribution of raw data sets.
• Aggressive data reduction: At each layer in a given analysis, data no longer necessary for the analysis can be aggregated down or discarded, benefiting both privacy and scalability.
• Cryptographically-protected interdomain data sharing: By applying MPC to appropriate per-domain intermediate results, interdomain aggregation is possible without requiring raw data set distribution.

4. CONCLUSION

Our proposed architecture differs somewhat from the customary workflow on large shared data sets. However, almost any study that can be done in the traditional way can be translated to the model proposed by DEMONS. For example, sequential experiments on a given data set, used to verify or compare two approaches to a given analysis, could instead be run simultaneously on a data stream, using the same nodes for observation but different nodes for computation and aggregation. In any consideration of the best methods for advancing the science of network and host security data analysis through collaboration, “how can we improve data collection and sharing” is not necessarily the only question to be answered; architectural approaches such as those advanced by DEMONS are important to consider as well.

5. REFERENCES

[1] Directive 2002/58/EC of 12 July 2002, concerning the processing of personal data and the protection of privacy in the electronic communications sector (Directive on privacy and electronic communication), O.J. L 201/37, 31 July 2002.
[2] Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, O.J. L 281, 23 November 1995.
[3] Burkhart, M., Schatzmann, D., Trammell, B., Boschi, E., and Plattner, B., “The Role of Network Trace Anonymisation under Attack”, in ACM Computer Communications Review, 40(1), pp. 6–11, January 2010.
[4] Burkhart, M., Strasser, M., Many, D., and Dimitropoulos, X., “SEPIA: Privacy-Preserving Aggregation of Multi-Domain Network Events and Statistics”, in Proceedings of the 19th USENIX Security Symposium, Washington, DC, August 2010.
[5] Duan, Y., Canny, J., and Zhan, J., “P4P: Practical Large-Scale Privacy-Preserving Distributed Computation Robust against Malicious Users”, in Proceedings of the 19th USENIX Security Symposium, Washington, DC, August 2010.
[6] Mirkovic, J., “Privacy-safe network trace sharing via secure queries”, in NDA ’08: Proceedings of the 1st ACM workshop on Network data anonymization, ACM, pp. 3–10.
[7] Mogul, J. C., and Arlitt, M., “SC2D: An Alternative to Trace Anonymization”, in MineNet ’06: Proceedings of the 2006 SIGCOMM workshop on mining network data, ACM, pp. 323–328.
[8] Spring, N., Wetherall, D., and Anderson, T., “Scriptroute: A public internet measurement facility”, in USITS ’03: 4th USENIX Symposium on Internet Technologies and Systems, pp. 225–238.
