27.07.2016 Views

Hacker Bits, August 2016

HACKER BITS is the monthly magazine that gives you the hottest technology stories crowdsourced by the readers of Hacker News. We select from the top voted stories and publish them in an easy-to-read magazine format. Get HACKER BITS delivered to your inbox every month! For more, visit https://hackerbits.com/2016-08.



Mortality vs. info disclosure, from Fredrikson et al. The red line is patient mortality.

A tradeoff between privacy and accuracy

Now obviously calculating the total number of ice cream-loving users on a system is a pretty silly example. The neat thing about DP is that the same overall approach can be applied to much more interesting functions, including complex statistical calculations like the ones used by Machine Learning algorithms. It can even be applied when many different functions are all computed over the same database.

But there's a big caveat here. Namely, while the amount of "information leakage" from a single query can be bounded by a small value, this value is not zero. Each time you query the database on some function, the total "leakage" increases and can never go down. Over time, as you make more queries, this leakage can start to add up.

This is one of the more challenging aspects of DP. It manifests in two basic ways:

1. The more information you intend to "ask" of your database, the more noise has to be injected in order to minimize the privacy leakage. This means that in DP there is generally a fundamental tradeoff between accuracy and privacy, which can be a big problem when training complex ML models.

2. Once data has been leaked, it's gone. Once you've leaked as much data as your calculations tell you is safe, you can't keep going, at least not without risking your users' privacy. At this point, the best solution may be to just destroy the database and start over, if such a thing is possible.
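The first point above can be made concrete with a small sketch. It assumes the Laplace mechanism on a counting query with sensitivity 1, and it assumes a fixed total budget is split evenly across the queries; the article does not say which mechanism or split is used, and every name here (`laplace_noise`, `per_query_scale`, `noisy_count`) is hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def per_query_scale(total_epsilon, num_queries, sensitivity=1.0):
    """Noise scale per query when total_epsilon is split evenly.

    Assumption: each query gets epsilon = total_epsilon / num_queries,
    so the Laplace scale (sensitivity / epsilon) grows with num_queries.
    """
    per_query_epsilon = total_epsilon / num_queries
    return sensitivity / per_query_epsilon


def noisy_count(true_count, scale, rng):
    """Return the true count plus Laplace noise of the given scale."""
    return true_count + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so repeated runs match
    for k in (1, 10, 100):
        scale = per_query_scale(total_epsilon=1.0, num_queries=k)
        answer = noisy_count(1000, scale, rng)
        print(f"{k} queries: noise scale {scale:.1f}, one answer {answer:.1f}")
```

Under these assumptions, asking a hundred questions instead of one makes the typical error on each answer roughly a hundred times larger for the same total budget.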

The total allowed leakage is often referred to as a "privacy budget", and it determines how many queries will be allowed (and how accurate the results will be). The basic lesson of DP is that the devil is in the budget. Set it too high, and you leak your sensitive data. Set it too low, and the answers you get might not be particularly useful.
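One way to picture a privacy budget is as a running tally that refuses further queries once it is spent. The sketch below assumes the simplest accounting rule, where the leakage of each query is a number epsilon and the epsilons just add up; real deployments may account differently, and the `PrivacyBudget` and `BudgetExhausted` names are made up for illustration.

```python
class BudgetExhausted(Exception):
    """Raised when a query would push total leakage past the budget."""


class PrivacyBudget:
    """Tracks cumulative leakage (epsilon) under simple additive accounting."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0  # only ever increases, mirroring "can never go down"

    @property
    def remaining(self):
        return self.total_epsilon - self.spent

    def charge(self, epsilon):
        """Record the cost of one query, or refuse it if the budget is spent."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon:
            raise BudgetExhausted(
                f"query needs {epsilon}, only {self.remaining:.3f} left"
            )
        self.spent += epsilon


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    for i in range(3):
        try:
            budget.charge(0.4)
            print(f"query {i + 1} allowed, spent so far: {budget.spent:.1f}")
        except BudgetExhausted as err:
            print(f"query {i + 1} refused: {err}")
```

There is deliberately no method to refund or reset the tally, since leaked information cannot be taken back.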

Now in some applications, like many of the ones on our iPhones, the lack of accuracy isn't a big deal. We're used to our phones making mistakes. But sometimes when DP is applied in complex applications, such as training Machine Learning models, this really does matter.

To give an absolutely crazy example of how big the tradeoffs can be, consider this paper by Fredrikson et al. from 2014. The authors began with a public database linking warfarin dosage outcomes to specific genetic markers. They then used ML techniques to develop a dosing model based on their database, but applied DP at various privacy budgets while training the model. Then they evaluated both the information leakage and the model's success at treating simulated "patients."

The results showed that the model's accuracy depends a lot on the privacy budget on which

