16.01.2014

Privacy with Prior Information

3.2 Maximum Flow

We model our question as a maximum flow problem. Suppose we want a discriminative pair $(s_i, s_j)$ to be indistinguishable, the set of constraints is $Q$, and the data generating distribution is $\theta$. Define $\mathcal{D}_i$ as the set of all databases that satisfy the constraints $Q$ and also the fact $s_i$; define $\mathcal{D}_j$ similarly for $s_j$. Define a directed graph $G = (\{s, t\} \cup V_i \cup V_j, E)$ as follows.

• For each database $D \in \mathcal{D}_i$, create a node $v \in V_i$; similarly, for each database $D \in \mathcal{D}_j$, create a node $v \in V_j$.
• There is an edge $(s, v)$ for every $v \in V_i$ and an edge $(v, t)$ for every $v \in V_j$. An edge $(v, v')$ for $v \in V_i$ and $v' \in V_j$ is created if the corresponding databases $D$ and $D'$ are induced neighbors.
• The capacity on edge $(s, v)$ is $\Pr(D_v \mid s_i, Q)$ and the capacity on edge $(v', t)$ is $\Pr(D_{v'} \mid s_j, Q)$, where $D_v \in \mathcal{D}_i$ and $D_{v'} \in \mathcal{D}_j$ are the databases corresponding to $v$ and $v'$, respectively.

Notice that $\sum_{D \in \mathcal{D}_i} \Pr(D \mid s_i, Q) = 1$ and, similarly, $\sum_{D \in \mathcal{D}_j} \Pr(D \mid s_j, Q) = 1$.
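A minimal Python sketch of this construction follows; it is an illustration under stated assumptions, not code from the paper. The databases `a`, `b`, `c`, `d`, their probabilities, and the induced-neighbor pairs are hypothetical toy inputs, since the induced-neighbor relation itself is defined elsewhere. The section does not state a capacity for the edges between $V_i$ and $V_j$; the sketch assumes capacity 1, which acts as unbounded here because the edges leaving $s$ carry at most 1 in total. The maximum flow is computed with the Edmonds-Karp algorithm over exact `Fraction` values so that the comparison with 1 is not affected by rounding.

```python
from collections import deque
from fractions import Fraction


def build_network(prob_i, prob_j, neighbors):
    """Build the capacity map of G as {node: {successor: capacity}}.

    prob_i maps each database in D_i to Pr(D | s_i, Q), prob_j does the same for D_j,
    and neighbors is a set of hypothetical (D, D') induced-neighbor pairs.
    Nodes are tagged ("i", D) / ("j", D') so that V_i and V_j stay disjoint.
    """
    cap = {"s": {}, "t": {}}
    for d, p in prob_i.items():
        cap["s"][("i", d)] = Fraction(p)  # edge (s, v) with capacity Pr(D_v | s_i, Q)
        cap[("i", d)] = {}
    for d, p in prob_j.items():
        cap[("j", d)] = {"t": Fraction(p)}  # edge (v', t) with capacity Pr(D_v' | s_j, Q)
    for d, d2 in neighbors:
        # Assumption (not stated in the text): capacity 1 on middle edges,
        # which never binds because the total flow out of s is at most 1.
        cap[("i", d)][("j", d2)] = Fraction(1)
    return cap


def max_flow(cap, source, sink):
    """Edmonds-Karp: augment along shortest residual paths found by BFS."""
    residual = {u: dict(out) for u, out in cap.items()}  # copy, cap is not mutated
    for u, out in cap.items():
        for v in out:
            residual[v].setdefault(u, Fraction(0))  # reverse residual edge
    total = Fraction(0)
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left
        # Collect the path edges, then push the bottleneck amount along them
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        total += push


# Hypothetical toy instance: D_i = {a, b}, D_j = {c, d}
neighbors = {("a", "c"), ("b", "c"), ("b", "d")}
balanced = build_network({"a": Fraction(1, 2), "b": Fraction(1, 2)},
                         {"c": Fraction(1, 2), "d": Fraction(1, 2)}, neighbors)
skewed = build_network({"a": Fraction(3, 4), "b": Fraction(1, 4)},
                       {"c": Fraction(1, 4), "d": Fraction(3, 4)}, neighbors)
print(max_flow(balanced, "s", "t"))  # 1
print(max_flow(skewed, "s", "t"))    # 1/2
```

Under the balanced probabilities the flow reaches 1, but under the skewed ones it is only 1/2 (the cut separating $\{s, a, c\}$ from the rest has capacity $1/4 + 1/4$), so for this neighbor relation the hypothesis of Observation 1, which requires a flow of 1 for every $\theta$, does not hold.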

We have the following observation.

Observation 1. If the maximum flow from $s$ to $t$ in graph $G$ equals 1 for every data generating distribution $\theta$, then induced neighbor privacy guarantees Pufferfish privacy.

Proof. By the definition of Pufferfish privacy, we need to show that Eq. (1) holds, i.e.,

$$\Pr(M(\mathrm{Data}) = w \mid s_i, \theta) \le e^{\epsilon} \Pr(M(\mathrm{Data}) = w \mid s_j, \theta).$$

The left side can be decomposed as

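$$\begin{aligned}
\Pr(M(\mathrm{Data}) = w \mid s_i, \theta) &= \sum_{D \in \mathcal{D}_i} \Pr(M(\mathrm{Data}) = w, \mathrm{Data} = D \mid s_i, \theta) \\
&= \sum_{D \in \mathcal{D}_i} \Pr(\mathrm{Data} = D \mid s_i, \theta) \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D).
\end{aligned}$$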

A similar decomposition applies to the right side. For any $D \in \mathcal{D}_i$ and $D' \in \mathcal{D}_j$ that are induced neighbors, induced neighbor privacy guarantees that $\Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D) \le e^{\epsilon} \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D')$.

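Suppose the maximum flow of $G$ is 1, and fix a flow of value 1; let $f(D, D')$ denote the amount it sends along the edge between the nodes of $D \in \mathcal{D}_i$ and $D' \in \mathcal{D}_j$. Because the capacities of the edges leaving $s$ sum to 1, as do those of the edges entering $t$, this flow saturates all of them. Hence for each $D \in \mathcal{D}_i$ there is a set of neighboring databases $N_D \subseteq \mathcal{D}_j$ (those receiving positive flow from $D$) such that each $D' \in N_D$ is an induced neighbor of $D$ and $\Pr(\mathrm{Data} = D \mid s_i, \theta) = \sum_{D' \in N_D} f(D, D')$; symmetrically, for each $D' \in \mathcal{D}_j$, $\sum_{D \in \mathcal{D}_i : D' \in N_D} f(D, D') = \Pr(\mathrm{Data} = D' \mid s_j, \theta)$. Therefore, we have

$$\begin{aligned}
\Pr(M(\mathrm{Data}) = w \mid s_i, \theta) &= \sum_{D \in \mathcal{D}_i} \Pr(\mathrm{Data} = D \mid s_i, \theta) \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D) \\
&= \sum_{D \in \mathcal{D}_i} \sum_{D' \in N_D} f(D, D') \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D) \\
&\le e^{\epsilon} \sum_{D \in \mathcal{D}_i} \sum_{D' \in N_D} f(D, D') \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D') \\
&= e^{\epsilon} \sum_{D' \in \mathcal{D}_j} \Pr(\mathrm{Data} = D' \mid s_j, \theta) \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D') \\
&= e^{\epsilon} \Pr(M(\mathrm{Data}) = w \mid s_j, \theta).
\end{aligned}$$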
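The chain of inequalities can also be checked numerically. The sketch below is hypothetical: the toy databases, the probabilities, the flow values $f(D, D')$ that a value-1 flow sends along each induced-neighbor edge, the output probabilities $\Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D)$, and $\epsilon = 0.1$ are all made up for illustration. `check_chain` (a name introduced here) asserts the two facts the argument relies on, namely that the flow out of each $D$ and into each $D'$ matches the corresponding probability and that the induced-neighbor bound holds on every edge carrying flow, and then returns the left side, the flow-weighted term $e^{\epsilon} \sum f(D, D') \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D')$, and the right side.

```python
import math


def check_chain(prob_i, prob_j, flow, out, eps):
    """Evaluate the proof's chain for one fixed output w (hypothetical helper).

    prob_i[D]     = Pr(Data = D | s_i, theta) for D in D_i
    prob_j[D2]    = Pr(Data = D2 | s_j, theta) for D2 in D_j
    flow[(D, D2)] = flow on edge (D, D2) in an s-t flow of value 1 (D, D2 induced neighbors)
    out[D]        = Pr(M(Data) = w | Data = D)
    Returns (left side, flow-weighted middle term, right side).
    """
    # A value-1 flow saturates every source and sink edge, so the flow out of D
    # and into D2 must match the corresponding probabilities.
    for d, p in prob_i.items():
        assert math.isclose(sum(f for (a, _), f in flow.items() if a == d), p)
    for d2, p in prob_j.items():
        assert math.isclose(sum(f for (_, b), f in flow.items() if b == d2), p)
    # Induced neighbor privacy, checked on every edge that carries flow.
    for d, d2 in flow:
        assert out[d] <= math.exp(eps) * out[d2]
    lhs = sum(p * out[d] for d, p in prob_i.items())
    middle = math.exp(eps) * sum(f * out[d2] for (_, d2), f in flow.items())
    rhs = math.exp(eps) * sum(p * out[d2] for d2, p in prob_j.items())
    return lhs, middle, rhs


# Hypothetical toy instance: D_i = {a, b}, D_j = {c, d}
prob_i = {"a": 0.5, "b": 0.5}
prob_j = {"c": 0.5, "d": 0.5}
flow = {("a", "c"): 0.5, ("b", "d"): 0.5}
out = {"a": 0.30, "b": 0.20, "c": 0.28, "d": 0.21}
lhs, middle, rhs = check_chain(prob_i, prob_j, flow, out, eps=0.1)
print(lhs <= middle, math.isclose(middle, rhs))  # True True
```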

Notice that the induced neighbor relation does not depend on the data generating distribution $\theta$, which makes induced neighbor privacy easy to apply in practice.
