
3.3 A Special Case

We now consider the following assumption on the data-generating process. Suppose each record of the database is generated according to the same distribution f, subject to the single-attribute count constraints Q. For example, f may be the disease distribution of the whole population, and the constraints may be of the following form: C(x_1) = n_1, C(x_2) = n_2, ..., C(x_k) = n_k, where x_i is some disease and C(x_i) is the count of x_i in the database. We have the following result.
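For concreteness, the following minimal Python sketch shows what a database satisfying such single-attribute count constraints looks like; the disease labels and counts are hypothetical, not taken from the paper.

```python
from collections import Counter

# Toy database: one disease value per record (hypothetical labels).
database = ["flu", "flu", "cold", "measles", "cold", "flu"]

# Single-attribute count constraints Q: C(x_i) = n_i for each disease x_i.
Q = {"flu": 3, "cold": 2, "measles": 1}

def satisfies_constraints(db, q):
    """Check that the database meets every count constraint C(x_i) = n_i."""
    counts = Counter(db)
    return all(counts[x] == n for x, n in q.items())

print(satisfies_constraints(database, Q))  # True
```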

Claim 1. If all records of the database are identically distributed under the single-attribute count constraints Q, then a mechanism M that satisfies induced neighbor privacy also satisfies Pufferfish privacy.

Proof. By Observation 1, we only need to show that the maximum flow from s to t is 1 in the graph G corresponding to our data-generating process. Suppose the count constraints are C(x_1) = n_1, ..., C(x_k) = n_k with ∑_{i=1}^{k} n_i ≤ n, where n is the total number of records. Also assume there is a pair of secrets (s_i, s_j), where s_i means record r has value x_i and s_j means r has value x_j. Define D_i and D_j similarly. The number of distinct databases in D_i is

N_i = \binom{n-1}{n_i - 1} \binom{n - n_i}{n_j} A,

where A is a function of n, n_1, ..., n_k. Similarly,

N_j = \binom{n-1}{n_i} \binom{n - n_i - 1}{n_j - 1} A.

The incoming flow to each node D ∈ D_i is 1/N_i, and the outgoing flow from each node D′ ∈ D_j is 1/N_j. For each D ∈ D_i, by changing r from x_i to x_j and changing one of the n_j records with value x_j to value x_i, we get an induced neighbor in D_j. Therefore, each D ∈ D_i has n_j induced neighbors in D_j. Similarly, each D′ ∈ D_j has n_i induced neighbors in D_i. We split the incoming flow of D evenly among its induced neighbors; that is, each edge in G from D_i to D_j carries flow 1/(N_i n_j). Since N_i/N_j = n_i/n_j, each D′ then receives a total flow of n_i/(N_i n_j) = 1/N_j. Therefore, the maximum flow is 1.

Note that if ∑_{h=1}^{k} n_h < n, then each D and D′ above represents a group of databases with the same configuration for x_1, ..., x_k.
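As a numerical sanity check on the counting argument (not part of the original proof), the sketch below verifies the flow identity n_i/(N_i n_j) = 1/N_j, equivalently N_i n_j = N_j n_i, for a few arbitrary values of n, n_i, n_j; the common factor A cancels on both sides, so it is set to 1.

```python
from math import comb

# N_i: record r is fixed to x_i, so among the remaining n-1 records we
# choose n_i - 1 with value x_i, then n_j of the remaining n - n_i with x_j.
def N_i(n, ni, nj):
    return comb(n - 1, ni - 1) * comb(n - ni, nj)

# N_j: record r is fixed to x_j, so we choose n_i records with value x_i,
# then n_j - 1 of the remaining n - n_i - 1 with value x_j.
def N_j(n, ni, nj):
    return comb(n - 1, ni) * comb(n - ni - 1, nj - 1)

# Flow identity from the proof: n_i / (N_i * n_j) == 1 / N_j.
for n, ni, nj in [(10, 3, 4), (20, 5, 5), (50, 1, 10)]:
    assert N_i(n, ni, nj) * nj == N_j(n, ni, nj) * ni
print("flow identity holds on all test cases")
```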

3.4 A Sampling-Based Mechanism

Although we have shown the equivalence of induced neighbor privacy and Pufferfish privacy in a special case, there is no instantiation of Pufferfish privacy under more general constraints, such as multi-attribute count constraints or node-degree constraints for graph data. In this section, we give a sampling-based guess-and-test mechanism that guarantees an approximate notion of Pufferfish privacy.

The mechanism simply adds Laplace noise to the output. The problem is that we do not know how much noise is enough to guarantee Pufferfish privacy. Our strategy is to first guess a noise level, and then use sampling to test whether Pufferfish privacy is satisfied at that noise level.
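As a minimal illustration of the noise-adding step (a sketch, not the paper's exact mechanism), the snippet below releases a query answer plus Laplace noise; the query answer and scale are placeholders, and choosing the scale is exactly what the guess-and-test procedure must decide.

```python
import random

def laplace(scale):
    """Sample Laplace(0, scale) as the difference of two iid exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def mechanism(query_answer, scale):
    """M(Data): release the true query answer plus Laplace noise."""
    return query_answer + laplace(scale)

print(mechanism(42, scale=2.0))  # e.g. 41.37 -- a noisy count
```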

We first give a more general definition of Pufferfish by allowing additive error.

Definition 5 ((ε, δ)-Pufferfish Privacy). Given a set of potential secrets S, a set of discriminative pairs S_pairs, a set of data-generating distributions Θ, and privacy parameters ε > 0 and δ ≥ 0, a (potentially randomized) algorithm M satisfies (ε, δ)-Pufferfish(S, S_pairs, Θ) privacy if

• for all possible outputs w,
• for all pairs (s_i, s_j) ∈ S_pairs of potential secrets,
• for all distributions θ ∈ Θ for which P(s_i | θ) ≠ 0 and P(s_j | θ) ≠ 0,

the following holds:

P(M(Data) = w | s_i, θ) ≤ e^ε P(M(Data) = w | s_j, θ) + δ    (3)
P(M(Data) = w | s_j, θ) ≤ e^ε P(M(Data) = w | s_i, θ) + δ    (4)
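To make the guess-and-test idea concrete, here is a hedged Python sketch of one way inequalities (3) and (4) could be checked by sampling; it is not the paper's exact procedure. The helpers sample_db(secret), which draws a database from θ conditioned on the given secret, and query(db) are assumed to be supplied by the caller, and the continuous noisy output is discretized into bins of width bin_width so that empirical output probabilities can be compared.

```python
import random
from math import exp

def laplace(scale):
    # Laplace(0, scale) as the difference of two iid exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def empirical_hist(sample_db, query, secret, scale, trials, bin_width):
    """Empirical distribution of M(Data) = query(Data) + Laplace(scale),
    conditioned on one secret of a discriminative pair."""
    hist = {}
    for _ in range(trials):
        w = query(sample_db(secret)) + laplace(scale)
        b = round(w / bin_width)                   # discretized output bin
        hist[b] = hist.get(b, 0.0) + 1.0 / trials
    return hist

def passes_test(h_i, h_j, eps, delta):
    """Check inequalities (3) and (4) on every observed output bin."""
    return all(
        h_i.get(b, 0.0) <= exp(eps) * h_j.get(b, 0.0) + delta
        and h_j.get(b, 0.0) <= exp(eps) * h_i.get(b, 0.0) + delta
        for b in set(h_i) | set(h_j)
    )

def guess_and_test(sample_db, query, s_i, s_j, eps, delta,
                   scale=1.0, trials=100_000, bin_width=0.5):
    """Guess a noise level and double it until the sampled test passes."""
    while True:
        h_i = empirical_hist(sample_db, query, s_i, scale, trials, bin_width)
        h_j = empirical_hist(sample_db, query, s_j, scale, trials, bin_width)
        if passes_test(h_i, h_j, eps, delta):
            return scale
        scale *= 2
```

The doubling schedule is just one simple search strategy; since the test is statistical, the resulting guarantee is itself approximate, which is the point of the (ε, δ) relaxation.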

Before describing the algorithm, we state the assumptions we make:

• the prior distribution θ is given, and a database can be sampled from it efficiently

