GUPT: Privacy Preserving Data Analysis Made Easy - Computer ...

on the output whose true average age is 38.5816. Initially, the experiment was run with constant privacy budgets of ɛ = 1 and ɛ = 0.3. GUPT allows the analyst to provide looser constraints such as “90% result accuracy for 90% of the results” and allocates only as much privacy budget as is required to meet these properties. In this experiment, 10% of the dataset was assumed to be completely privacy insensitive and was used to estimate ɛ given a predetermined block size. Figure 7 shows the CDF of the output accuracy both for the constant privacy budget values and for the accuracy requirement. Interestingly, the figure shows not only that GUPT meets the accuracy guarantees, but also that if the analyst were to set the privacy budget manually (as with ɛ = 1 or ɛ = 0.3), either too much or too little privacy budget would be used. The privacy budget estimation technique thus has the additional advantage of extending the lifetime of the total privacy budget for a dataset. Figure 8 shows that if the average age query were run repeatedly with the above constraints, GUPT would be able to run 2.3 times as many queries as with a constant privacy budget of ɛ = 1.

[Figure 8: Increased lifetime of total privacy budget using privacy budget allocation mechanism. Y-axis: normalized privacy budget lifetime; series: GUPT-helper constant ɛ=1, GUPT-helper variable ɛ, GUPT-helper constant ɛ=0.3.]

[Figure 9: Change in error for different block sizes. X-axis: block size (β); y-axis: normalized RMSE; series: Median ɛ=2, Median ɛ=6, Mean ɛ=2, Mean ɛ=6.]

7.2.2 Optimal Block Size Estimation

Section 4.3 shows that the estimation error decreases as the data block size increases, whereas the noise decreases as the number of blocks increases.
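Under the simplifying assumption of a basic sample-and-aggregate mechanism (n records split into ℓ disjoint blocks of size β, block outputs clamped to a range of width Δ, averaged, then perturbed with Laplace noise), the opposing trends can be written down directly. The symbols n, ℓ, Δ, b, t and ρ are introduced here for illustration only. The first line shows that the noise scale grows roughly linearly with β. The second inverts the Laplace tail to find the smallest ɛ for which the noise alone stays within ±t with probability ρ; this is one possible way (not necessarily GUPT's) to turn a goal such as “90% accuracy for 90% of the results” into a budget, and because it ignores estimation error it is only a lower bound.

```latex
\ell = \left\lfloor \frac{n}{\beta} \right\rfloor, \qquad
b = \frac{\Delta}{\ell\,\epsilon} \approx \frac{\Delta\,\beta}{n\,\epsilon}, \qquad
\sqrt{\operatorname{Var}\big[\mathrm{Lap}(b)\big]} = \sqrt{2}\,b

\Pr\big[\,|\mathrm{Lap}(b)| \le t\,\big] = 1 - e^{-t/b} \ge \rho
\quad\Longleftrightarrow\quad
\epsilon \ge \frac{\Delta\,\ln\!\big(1/(1-\rho)\big)}{\ell\,t}
```

For example, ρ = 0.9 gives ln(1/(1 − ρ)) = ln 10 ≈ 2.30, so halving the number of blocks ℓ doubles the ɛ needed for the same t.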
The optimal tradeoff point between block size and the number of data blocks differs across the queries executed on a dataset. To illustrate this tradeoff, we show results from queries executed on an Internet advertisement dataset, also from the UCI machine learning repository [7]. Figure 9 shows the normalized root mean square error (relative to the true value) in estimating the mean and median aspect ratio of advertisements shown on Internet pages, with privacy budgets ɛ of 2 and 6. In the case of the “mean” query, since the averaging operation is already performed by the sample and aggregate framework, smaller data blocks reduce the noise added to the output and thus provide more accurate results. As expected, the ideal block size is one.

                                            GUPT   PINQ   Airavat
Works with unmodified programs              Yes    No     No
Allows expressive programs                  Yes    Yes    No
Automated privacy budget allocation         Yes    No     No
Protection against privacy budget attack    Yes    No     Yes
Protection against state attack             Yes    No     No
Protection against timing attack            Yes    No     No
Table 1: Comparison of GUPT, PINQ and Airavat

For the “median” query, increasing the block size is expected to produce more accurate inputs to the averaging function. Figure 9 shows that when the “median” query is executed with ɛ = 2, the error is minimal for a block size of 10. For larger block sizes, the noise added to compensate for the reduced number of blocks has a dominating effect.
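The mean/median contrast above can be reproduced qualitatively with a small simulation. This is a minimal sketch, not GUPT's implementation: it assumes a basic sample-and-aggregate mechanism (shuffle, split into disjoint blocks of size β, run the black-box program on each block, clamp each output to an assumed range [lo, hi], average, add Laplace noise of scale (hi − lo)/(ℓ·ɛ)) and picks the candidate block size with the lowest simulated RMSE on a hypothetical privacy-insensitive sample, in the spirit of the privacy-insensitive 10% split used earlier. The function names, candidate block sizes and synthetic exponential data are illustrative assumptions.

```python
import math
import random
import statistics
from typing import Callable, Optional, Sequence

Program = Callable[[Sequence[float]], float]


def laplace(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sample_and_aggregate(data: Sequence[float], program: Program, block_size: int,
                         lo: float, hi: float, epsilon: float,
                         rng: random.Random) -> float:
    """Simplified sample-and-aggregate: per-block outputs, clamped, averaged, noised."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    num_blocks = len(shuffled) // block_size  # leftover records are dropped
    outputs = []
    for i in range(num_blocks):
        block = shuffled[i * block_size:(i + 1) * block_size]
        # Clamp each block's output into the assumed output range [lo, hi].
        outputs.append(min(max(program(block), lo), hi))
    # One record affects one block output by at most (hi - lo), so the
    # average of the block outputs has sensitivity (hi - lo) / num_blocks.
    scale = (hi - lo) / (num_blocks * epsilon)
    return statistics.fmean(outputs) + laplace(rng, scale)


def choose_block_size(public_sample: Sequence[float], program: Program,
                      candidates: Sequence[int], lo: float, hi: float,
                      epsilon: float, trials: int = 200, seed: int = 0) -> Optional[int]:
    """Return the candidate block size with the lowest simulated RMSE.

    Meant to run only on data assumed to be privacy insensitive; the program's
    output on the full sample serves as the reference "true" value.
    """
    rng = random.Random(seed)
    truth = program(public_sample)
    best, best_rmse = None, math.inf
    for beta in candidates:
        errors = [(sample_and_aggregate(public_sample, program, beta, lo, hi, epsilon, rng)
                   - truth) ** 2 for _ in range(trials)]
        rmse = math.sqrt(statistics.fmean(errors))
        if rmse < best_rmse:
            best, best_rmse = beta, rmse
    return best


# Hypothetical privacy-insensitive sample: skewed (exponential) data in [0, 100].
data_rng = random.Random(42)
public = [min(data_rng.expovariate(1 / 10), 100.0) for _ in range(1000)]

mean_beta = choose_block_size(public, statistics.fmean, [1, 10, 50], 0.0, 100.0, epsilon=2.0)
median_beta = choose_block_size(public, statistics.median, [1, 10, 50], 0.0, 100.0, epsilon=2.0)
print("mean:", mean_beta, "median:", median_beta)
```

On this skewed synthetic data one would expect the mean query to favour β = 1 (the average of equal-sized block means equals the overall mean, so only the noise term remains) and the median query to favour an intermediate block size, since single-record blocks turn the averaged "median" into a mean. The simulation spends no privacy budget only because it is assumed to run on privacy-insensitive data.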
On the other hand, when the “median” query is executed with ɛ = 6, the error continues to drop as the block size increases, because the estimation error dominates the Laplace noise (owing to the larger privacy budget). GUPT can thus significantly reduce the total error by estimating the optimal block size for the sample and aggregate framework.

7.3 Qualitative Analysis

In this section, GUPT is contrasted with both PINQ and Airavat on various fronts (see Table 1 for a summary). We also list the significant changes introduced by GUPT in order to mold the sample and aggregate framework (SAF) [24] into a practically useful one.

Unmodified programs: Because PINQ [16] is an API that provides a set of low-level data manipulation primitives, applications must be rewritten to perform all operations using these primitives. On the other hand, Airavat [22] implements the Map-Reduce programming paradigm [4] and requires the analyst to split the user’s data analysis program into an “untrusted” map program and a reduce aggregator that is “trusted” to be differentially private. In contrast, GUPT treats the complete application program as a black box, and as a result the entire application program is deemed untrusted.

Expressiveness of the program: PINQ provides a limited set of primitives for data operations. However, if the required primitives are not already available, a privacy-unaware analyst would be unable to ensure privacy for the output. Airavat also severely restricts the expressiveness of the programs that can run in its framework: (a) the “untrusted” map program is completely isolated for each data element and cannot save any global state, and (b) it restricts the number of key-value pairs generated by the mapper. Many machine learning algorithms (such as clustering and classification) require global state and complex aggregation functions.
This would be infeasible in Airavat without placing much of the logic in the “trusted” reducer program. GUPT places no restriction on the application program, and thus does not degrade the expressiveness of the program.

Privacy budget distribution: As was shown in Section
