
GUPT: Privacy Preserving Data Analysis Made Easy - Computer ...

[Figure 1: An instantiation of the Sample and Aggregate Framework [24]. The dataset is partitioned into blocks T_1, T_2, ..., T_l; the program f is run on each block; the outputs f(T_1), ..., f(T_l) are averaged, and Laplace noise is added to produce the private output.]

(See Section 4.1 for more details.) Finally, a differentially private average of the O_i's is calculated by adding Laplace noise (scaled according to the output range). This noisy final output is now differentially private. The complete algorithm is provided in Algorithm 1. Note that the choice of the number of blocks l = n^0.4 is from [24], used here for completeness. For improved choices of l, see Section 4.3.

GUPT extends the conventional SAF described above in the following ways: i) Resampling: GUPT introduces the use of data resampling to improve the experimental accuracy of SAF, without degrading the privacy guarantee; ii) Optimal block allocation: GUPT further improves experimental accuracy by finding better block sizes (as compared to the default choice of n^0.6) using the aging of sensitivity model explained in Section 3.3.

2.2 Related Work

A number of advances in differential privacy have sought to improve the accuracy of very specific types of data queries, such as linear counting queries [14], graph queries [12] and histogram analysis [11]. A recent system called PASTE [21] allows queries on time series data where the data is stored on distributed nodes and no trust is placed in the central aggregator. In contrast to PASTE, GUPT trusts the aggregator with storing all of the data, but provides a flexible system that supports many different types of data analysis programs. While systems tailored for specific tasks could potentially achieve better output accuracy, GUPT trades this for the generality of the platform.
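The sample-and-aggregate flow described above (partition into l = n^0.4 blocks, run f on each block, average the clamped outputs, add Laplace noise scaled to the output range) can be sketched as follows. This is a minimal illustration, not GUPT's implementation; the function name and partitioning details are our own.

```python
import math
import random

def sample_and_aggregate(data, f, epsilon, output_range):
    """Differentially private estimate of f(data) via sample-and-aggregate.

    Illustrative sketch: the block count l = n**0.4 follows [24], and
    `output_range` = (lo, hi) is the clamping range supplied by the analyst.
    """
    data = list(data)
    random.shuffle(data)                 # resample before partitioning
    n = len(data)
    l = max(1, int(n ** 0.4))            # number of blocks, per [24]
    blocks = [data[i::l] for i in range(l)]
    lo, hi = output_range
    # Run the (untrusted) computation on each block and clamp to the range.
    outputs = [min(max(f(block), lo), hi) for block in blocks]
    avg = sum(outputs) / l
    # Averaging l clamped outputs has sensitivity (hi - lo) / l, so Laplace
    # noise with scale (hi - lo) / (l * epsilon) yields epsilon-DP.
    scale = (hi - lo) / (l * epsilon)
    # A Laplace(scale) draw as the difference of two exponentials of mean `scale`.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return avg + noise
```

Note that the noise scale shrinks linearly in the number of blocks l, which is why the choice of l (Section 4.3) directly trades off per-block estimation error against noise magnitude.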
We show through experimental results that GUPT achieves reasonable accuracy for problems like clustering and regression, and can even perform better than existing customized systems.

Other differential privacy systems such as PINQ [16] and Airavat [22] have also attempted to operate on a wide variety of data queries. PINQ (Privacy INtegrated Queries) proposed programming constructs which enable application developers to write differentially private programs using basic functional building blocks of differential privacy (e.g., the exponential mechanism [17], noisy counts [5], etc.). It does not consider the application developer to be an adversary. It further requires developers to rewrite the application to make use of the PINQ primitives. On the other hand, Airavat was the first system that attempted to run unmodified programs in a differentially private manner. It, however, required the programs to be written for the Map-Reduce programming paradigm [4]. Further, Airavat only considers the map program to be an "untrusted" computation, while the reduce program is "trusted" to be implemented in a differentially private manner. In comparison, GUPT allows for the private analysis of a wider range of unmodified programs. GUPT also introduces techniques that allow data analysts to specify their privacy budget in units of output accuracy. Section 7.3 presents a detailed comparison of GUPT with PINQ, Airavat and the sample and aggregate framework.

Similar to iReduct [28], GUPT introduces techniques that reduce the relative error (in contrast to absolute error). Both systems use a smaller privacy budget for programs that produce larger outputs, as the relative error would be small compared to programs that generate smaller values for the same absolute error.
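The relative-error intuition can be made concrete with a small calculation (the function here is our own illustration, not GUPT's API): for the Laplace mechanism with scale b = sensitivity / ε, the expected absolute noise is b, so the privacy budget needed to hit a target relative error shrinks as the expected output magnitude grows.

```python
def epsilon_for_relative_error(sensitivity, expected_magnitude, target_rel_err):
    """Budget so that expected |Laplace noise| <= target_rel_err * magnitude.

    Illustrative only: for Lap(b), E[|noise|] = b, and the Laplace
    mechanism sets b = sensitivity / epsilon.
    """
    max_abs_noise = target_rel_err * expected_magnitude
    return sensitivity / max_abs_noise

# A query expected to output ~1000 needs far less budget than one
# outputting ~10 for the same 1% relative error:
eps_large = epsilon_for_relative_error(1.0, 1000.0, 0.01)   # 0.1
eps_small = epsilon_for_relative_error(1.0, 10.0, 0.01)     # 10.0
```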
While iReduct optimizes the distribution of privacy budget across multiple queries, GUPT matches the relative error to the privacy budget of individual queries.

3. PROBLEM SETUP

There are three logical parties:

1. The analyst/programmer, who wishes to perform aggregate data analytics over sensitive datasets. Our goal is to make GUPT easy to use for an average programmer who is not a privacy expert.

2. The data owner, who owns one or more datasets, and would like to allow analysts to perform data analytics over the datasets without compromising the privacy of users in the dataset.

3. The service provider, who hosts the GUPT service.

The separation between these parties is logical; in reality, either the data owner or a third-party cloud service provider could host GUPT.

Trust assumptions: We assume that the data owner and the service provider are trusted, and that the analyst is untrusted. In particular, the programs supplied by the analyst may act maliciously and try to leak information. GUPT defends against such attacks using the security mechanisms proposed in Section 6.

3.1 GUPT Overview

[Figure 2: Overview of GUPT's Architecture. The data owner supplies (1) the data set and (2) a privacy budget (ε); the data analyst supplies (1) the computation, (2) an accuracy goal and (3) the output range. A web frontend, dataset manager and computation manager communicate over an XML-RPC layer, and untrusted computations run inside isolated execution chambers to produce a differentially private answer.]

The building blocks of GUPT are shown in Figure 2:

• The dataset manager is a database that registers in-
