Privacy with Prior Information
3.2 Maximum Flow
We model our question as a maximum flow problem. Suppose we want a discriminative pair $(s_i, s_j)$ to be indistinguishable, the set of constraints is $Q$, and the data generating distribution is $\theta$. Define $\mathcal{D}_i$ as the set of all databases that satisfy the constraints $Q$ and also the fact $s_i$; $\mathcal{D}_j$ is defined similarly. Define a directed graph $G = (\{s, t\} \cup V_i \cup V_j, E)$ as follows.
• For each database $D \in \mathcal{D}_i$, create a node $v \in V_i$; similarly, for each database $D \in \mathcal{D}_j$, create a node $v \in V_j$.
• There is an edge $(s, v)$ for every $v \in V_i$ and an edge $(v, t)$ for every $v \in V_j$. An edge $(v, v')$ for $v \in V_i$ and $v' \in V_j$ is created if the corresponding databases $D$ and $D'$ are induced neighbors.
• The capacity on edge $(s, v)$ is equal to $\Pr(D_v \mid s_i, Q)$ and the capacity on edge $(v', t)$ is equal to $\Pr(D_{v'} \mid s_j, Q)$, where $D_v$ and $D_{v'}$ are the databases corresponding to $v$ and $v'$ in $\mathcal{D}_i$ and $\mathcal{D}_j$ respectively.
Notice that $\sum_{D \in \mathcal{D}_i} \Pr(D \mid s_i, Q) = 1$ and similarly $\sum_{D \in \mathcal{D}_j} \Pr(D \mid s_j, Q) = 1$.
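The construction above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the toy priors, and the neighbor pairs are hypothetical, the induced-neighbor relation is passed in as a predicate rather than computed, and the middle edges are assumed to be uncapacitated because the text does not assign them a capacity. The maximum flow is computed with a plain Edmonds-Karp search.

```python
from collections import deque

def build_flow_graph(prior_i, prior_j, are_neighbors):
    """Capacity map for G. prior_i[D] = Pr(D | s_i, Q), prior_j[D] = Pr(D | s_j, Q).

    are_neighbors(D, D2) is a hypothetical predicate for "induced neighbors".
    """
    cap = {}

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {}).setdefault(u, 0.0)  # reverse (residual) edge

    for d, p in prior_i.items():
        add_edge("s", ("i", d), p)            # source edge, capacity Pr(D | s_i, Q)
    for d, p in prior_j.items():
        add_edge(("j", d), "t", p)            # sink edge, capacity Pr(D | s_j, Q)
    for d in prior_i:
        for d2 in prior_j:
            if are_neighbors(d, d2):
                # Assumption: edges between neighbors have unbounded capacity.
                add_edge(("i", d), ("j", d2), float("inf"))
    return cap

def max_flow(cap, source="s", sink="t", tol=1e-12):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    total = 0.0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > tol and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck capacity on the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        total += bottleneck

# Toy instance (all values are made up for illustration).
prior_i = {"A1": 0.5, "A2": 0.5}
prior_j = {"B1": 0.25, "B2": 0.75}
pairs = {("A1", "B1"), ("A1", "B2"), ("A2", "B2")}
flow = max_flow(build_flow_graph(prior_i, prior_j, lambda d, d2: (d, d2) in pairs))
```

Because the capacities out of `s` sum to 1, the flow can never exceed 1; it falls short of 1 exactly when some probability mass in one set cannot be routed to neighbors in the other.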
We have the following observation.
Observation 1. If the maximum flow from s to t in graph G is equal to 1 for any data generating distribution<br />
θ, then induced neighbor privacy will guarantee Pufferfish privacy.<br />
Proof. According to the definition of Pufferfish privacy, we show that Eq. (1) is true, i.e.,
$$\Pr(M(\mathrm{Data}) = w \mid s_i, \theta) \le e^{\epsilon} \Pr(M(\mathrm{Data}) = w \mid s_j, \theta).$$
The left side can be decomposed as
$$\begin{aligned}
\Pr(M(\mathrm{Data}) = w \mid s_i, \theta) &= \sum_{D \in \mathcal{D}_i} \Pr(M(\mathrm{Data}) = w, \mathrm{Data} = D \mid s_i, \theta) \\
&= \sum_{D \in \mathcal{D}_i} \Pr(\mathrm{Data} = D \mid s_i, \theta) \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D).
\end{aligned}$$
A similar decomposition applies to the right side. For any $D \in \mathcal{D}_i$ and $D' \in \mathcal{D}_j$, if $D$ and $D'$ are induced neighbors, induced neighbor privacy guarantees that $\Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D) \le e^{\epsilon} \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D')$.
Suppose the maximum flow of $G$ is 1. Then for each $D \in \mathcal{D}_i$ there is a set of neighboring databases $N_D \subseteq \mathcal{D}_j$ such that each $D' \in N_D$ is an induced neighbor of $D$ and $\Pr(\mathrm{Data} = D \mid s_i, \theta) = \sum_{D' \in N_D} \Pr(\mathrm{Data} = D' \mid s_j, \theta)$. Therefore, we have
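$$\begin{aligned}
\Pr(M(\mathrm{Data}) = w \mid s_i, \theta) &= \sum_{D \in \mathcal{D}_i} \Pr(\mathrm{Data} = D \mid s_i, \theta) \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D) \\
&\le e^{\epsilon} \sum_{D \in \mathcal{D}_i} \sum_{D' \in N_D} \Pr(\mathrm{Data} = D' \mid s_j, \theta) \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D') \\
&= e^{\epsilon} \sum_{D' \in \mathcal{D}_j} \Pr(\mathrm{Data} = D' \mid s_j, \theta) \Pr(M(\mathrm{Data}) = w \mid \mathrm{Data} = D') \\
&= e^{\epsilon} \Pr(M(\mathrm{Data}) = w \mid s_j, \theta).
\end{aligned}$$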
Notice that the induced neighbor relation does not depend on the data generating distribution $\theta$, which makes it easy to apply in practice.
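As a sanity check on the chain of inequalities in the proof, the following Python sketch verifies the final bound numerically on a small toy instance. Everything in it is an assumption made for illustration: the database names, the two priors, the mechanism's output probabilities, the value of $\epsilon$, and the explicit unit flow. The flow is treated here as a fractional assignment of probability mass between neighbors, which is one way to read the sets $N_D$ used in the proof.

```python
import math

# Toy instance; all names and numbers are illustrative assumptions.
EPS = math.log(2.0)
prior_i = {"A1": 0.5, "A2": 0.5}       # Pr(Data = D  | s_i, theta)
prior_j = {"B1": 0.25, "B2": 0.75}     # Pr(Data = D' | s_j, theta)

# A unit flow on the middle edges; every key is assumed to be a neighbor pair.
flow = {("A1", "B1"): 0.25, ("A1", "B2"): 0.25, ("A2", "B2"): 0.5}

# Pr(M(Data) = w | Data = D) for a hypothetical mechanism with two outputs.
mech = {
    "A1": {"w0": 0.6, "w1": 0.4},
    "A2": {"w0": 0.5, "w1": 0.5},
    "B1": {"w0": 0.4, "w1": 0.6},
    "B2": {"w0": 0.5, "w1": 0.5},
}

def flow_is_unit(flow, prior_i, prior_j, tol=1e-9):
    """Check that the flow saturates every source edge and every sink edge."""
    out_mass = {d: 0.0 for d in prior_i}
    in_mass = {d: 0.0 for d in prior_j}
    for (d, d2), f in flow.items():
        out_mass[d] += f
        in_mass[d2] += f
    return (all(abs(out_mass[d] - p) <= tol for d, p in prior_i.items())
            and all(abs(in_mass[d] - p) <= tol for d, p in prior_j.items()))

def neighbor_bound_holds(flow, mech, eps):
    """Per-pair guarantee: Pr(w | D) <= e^eps * Pr(w | D') on each neighbor pair."""
    return all(mech[d][w] <= math.exp(eps) * mech[d2][w]
               for (d, d2) in flow for w in mech[d])

def marginal(prior, mech, w):
    """Pr(M(Data) = w | s, theta) = sum_D Pr(Data = D | s, theta) Pr(w | D)."""
    return sum(p * mech[d][w] for d, p in prior.items())

def pufferfish_bound_holds(prior_i, prior_j, mech, eps, outputs):
    """Final inequality of the proof, checked for every output w."""
    return all(marginal(prior_i, mech, w) <= math.exp(eps) * marginal(prior_j, mech, w)
               for w in outputs)

assert flow_is_unit(flow, prior_i, prior_j)
assert neighbor_bound_holds(flow, mech, EPS)
assert pufferfish_bound_holds(prior_i, prior_j, mech, EPS, ["w0", "w1"])
```

The check covers only the direction shown in the proof, with $s_i$ on the left side, and only this one instance; it illustrates the statement and is not a substitute for the argument.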