Privacy with Prior Information

• Assume every secret pair is of the form $(s, \neg s)$ and $\Pr(s \mid \theta)/\Pr(\neg s \mid \theta) \in [1/c, c]$ for some constant $c$. This is reasonable because a secret $s$ is only worth protecting if the adversary does not know much about it in advance.
• $\epsilon$ and $\delta$ are parameters.

The algorithm works as follows:

1. set $\alpha$ to some initial noise level $\alpha_0$
2. sample $O((c+1)k) = O(k)$ databases independently from $\theta$
3. compute the output for each sample and add noise $\mathrm{Lap}(\alpha)$
4. for every pair of secrets $(s, \neg s)$
5.     find $O_s$ and $O_{\neg s}$: the outputs for samples with $s$ and $\neg s$ true
6.     for every possible output $w$
7.         denote $p_{ws} = \Pr(w \mid s, \theta)$ and $p_{w\neg s} = \Pr(w \mid \neg s, \theta)$
8.         estimate $\bar{p}_{ws}$ and $\bar{p}_{ws'}$ (writing $s'$ for $\neg s$) using the fraction of $w$ in $O_s$ and $O_{\neg s}$
9.         if $\bar{p}_{ws} > e^{\epsilon} \bar{p}_{ws'} + \delta/2$
10.            set $\alpha = 2\alpha$ and go to step 3
11.    end for
12. end for
13. use $\alpha$ as the noise level for any future query.

Before discussing the running time of the above procedure, we first show the accuracy of the estimation.

Claim 2. Set the size of the sample $k = O\left(\frac{(1+e^{\epsilon})^2}{\delta^2} \log U\right)$, where $U$ is an upper bound on the size of the output range and on the number of pairs of secrets. Let $\alpha_{\epsilon}$ and $\alpha_{\epsilon,\delta}$ be the minimum noise levels that guarantee $\epsilon$-Pufferfish and $(\epsilon, \delta)$-Pufferfish privacy. With high probability, our algorithm will output a noise level $\alpha$ such that $\alpha_{\epsilon,\delta} \le \alpha \le \alpha_{\epsilon}$.

Proof. Denote by $S$ the set of databases that satisfy secret $s$ and $\theta$, and let $\bar{S} = \theta - S$. First, given $2(c+1)k$ independent samples from $\theta$, with probability at least $1 - \frac{1}{U^{O(1)}}$, at least $k$ samples come from each of $S$ and $\bar{S}$.

Fix some output $w$ and some secret $s$. Let $X_i$ be the indicator random variable of the event that the $i$-th sample outputs $w$. Note that $E[X_i] = p_{ws}$. Let $X = \sum_{i=1}^{k} X_i$. Set

$$\Delta = \frac{\delta}{2(1+e^{\epsilon})}.$$

By Hoeffding's inequality, we have

$$\Pr(|\bar{p}_{ws} - p_{ws}| \ge \Delta) = \Pr(|X/k - p_{ws}| \ge \Delta) \le 2e^{-2k\Delta^2} \le \frac{1}{U^{O(1)}}.$$

Therefore, by a union bound, the probability that the estimated $\bar{p}_{ws}$ is more than $\Delta$ away from its true value for any $w$ and $s$ is at most $1/U$.
In other words, with high probability, all estimates are within $\Delta$ of their true values.
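The tolerance and sample size used in this bound, together with the doubling procedure they support, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the constant in `sample_size` assumes at most $U$ outputs and $U$ secret pairs (so at most $2U^2$ estimates in the union bound), noisy outputs are rounded to integers to make the output range discrete, the test of step 9 is applied in both directions of each pair, and the same sampled databases are reused after each doubling.

```python
import math
import random
from collections import Counter

def tolerance(eps, delta):
    """Delta = delta / (2 (1 + e^eps)), so that (1 + e^eps) * Delta = delta / 2."""
    return delta / (2.0 * (1.0 + math.exp(eps)))

def sample_size(eps, delta, U):
    """Smallest k with 2 exp(-2 k Delta^2) <= 1 / (2 U^3).

    Assumption: at most 2 U^2 estimates, so a union bound gives total
    failure probability at most 1 / U.
    """
    tol = tolerance(eps, delta)
    return math.ceil(math.log(4.0 * U ** 3) / (2.0 * tol ** 2))

def laplace(scale, rng):
    """Lap(scale) noise, drawn as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def violates(outs_s, outs_not_s, eps, delta):
    """Step 9, checked in both directions (an assumption of this sketch)."""
    if not outs_s or not outs_not_s:
        raise ValueError("need samples on both sides of the secret pair")
    f_s, f_n = Counter(outs_s), Counter(outs_not_s)
    bound = math.exp(eps)
    for w in set(f_s) | set(f_n):
        p_s = f_s[w] / len(outs_s)        # empirical estimate of p_ws
        p_n = f_n[w] / len(outs_not_s)    # empirical estimate of p_w(not s)
        if p_s > bound * p_n + delta / 2 or p_n > bound * p_s + delta / 2:
            return True
    return False

def calibrate(sample_db, query, secrets, eps, delta, k, alpha0, rng,
              max_doublings=40):
    """Double the Laplace scale until no secret pair fails the test."""
    dbs = [sample_db(rng) for _ in range(k)]                 # step 2
    alpha = alpha0                                           # step 1
    for _ in range(max_doublings):
        # step 3; rounding is an assumption that discretises the outputs
        outs = [round(query(db) + laplace(alpha, rng)) for db in dbs]
        failed = False
        for s in secrets:                                    # steps 4 to 12
            o_s = [w for db, w in zip(dbs, outs) if s(db)]
            o_n = [w for db, w in zip(dbs, outs) if not s(db)]
            if violates(o_s, o_n, eps, delta):
                failed = True
                break
        if not failed:
            return alpha                                     # step 13
        alpha *= 2.0                                         # step 10
    raise RuntimeError("no acceptable noise level found")

if __name__ == "__main__":
    # Toy prior: five independent fair bits; the secret is the first bit.
    rng = random.Random(0)
    toy_db = lambda r: [r.random() < 0.5 for _ in range(5)]
    print(sample_size(1.0, 0.1, 100))
    print(calibrate(toy_db, sum, [lambda db: db[0]], 1.0, 0.5, 2000, 0.125, rng))
```

The `max_doublings` cap is only a safeguard for the sketch; the procedure in the text has no such limit.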
In our algorithm, if $\alpha = \alpha_{\epsilon}$, then with high probability, for any $w$ and $(s, s')$,

$$\bar{p}_{ws} \le p_{ws} + \Delta \le e^{\epsilon} p_{ws'} + \Delta \le e^{\epsilon}(\bar{p}_{ws'} + \Delta) + \Delta = e^{\epsilon} \bar{p}_{ws'} + (1+e^{\epsilon})\Delta = e^{\epsilon} \bar{p}_{ws'} + \delta/2.$$

Therefore, our algorithm will output $\alpha = \alpha_{\epsilon}$. This means that with high probability, the output of our algorithm will not exceed $\alpha_{\epsilon}$.

On the other hand, if our algorithm outputs some $\alpha$, then with high probability, for any $w$ and $(s, s')$,

$$p_{ws} \le \bar{p}_{ws} + \Delta \le e^{\epsilon} \bar{p}_{ws'} + \delta/2 + \Delta \le e^{\epsilon}(p_{ws'} + \Delta) + \delta/2 + \Delta = e^{\epsilon} p_{ws'} + (1+e^{\epsilon})\Delta + \delta/2 = e^{\epsilon} p_{ws'} + \delta.$$

Therefore, with high probability, the noise level output by our algorithm will be at least $\alpha_{\epsilon,\delta}$.

The running time depends on:
• how fast a sample of $\theta$ that satisfies $s$ can be obtained;
• how many different possible outputs there are;
• how many different secret pairs there are.

Claim 3. Let $t_{\theta}$ be the time needed to obtain one sample from distribution $\theta$, and let $|O|$ denote the number of possible outputs. The running time of the sampling algorithm is $O\left((k(t_{\theta} + t_{output}) + |O|) \log \frac{\alpha}{\alpha_0}\right)$, where $k = O\left(\frac{(1+e^{\epsilon})^2}{\delta^2} \log U\right)$ and $t_{output}$ is the time to compute the query output given a database and perturb it using Laplace noise.

4 Future Work

The following questions are worth exploring in the future:

1. Is Induced Neighbor Privacy equivalent to Pufferfish under more general assumptions on the priors?
2. How can a database be sampled from some natural distribution $\theta$?

References

[1] Daniel Kifer and Ashwin Machanavajjhala. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD '11, pages 193–204, New York, NY, USA, 2011. ACM.
[2] Daniel Kifer and Ashwin Machanavajjhala. A rigorous and customizable framework for privacy. In Proceedings of the 31st Symposium on Principles of Database Systems, PODS '12, pages 77–88, New York, NY, USA, 2012. ACM.