02.11.2014 Views

untangling_the_web


DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY

Protect Yourself from Search Engine Leaks

In late July 2006, AOL published on its now-defunct Research website a list of 20 to 36 million search inquiries collected over a three-month period, including identification numbers for 658,000 unnamed users. It didn't take long for some fairly bright researchers to piece together some of the information and come up with real people whose queries were released. This was possible largely because AOL kept each user's queries together in order to show the pattern of a person's searches over a period of time. "Searches by individual users are grouped together, often forming small profiles of a user's habits and interests. The files include the date and time of each inquiry and the address of the Web site the user chose to visit after searching." [216]
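To make the grouping problem concrete, here is a minimal Python sketch of how log rows that share an identification number collapse into a per-user profile. Everything in it is an assumption made for illustration: the row layout (ID, query, date/time, visited site) simply follows the description quoted above, the sample rows are invented and do not come from the released files, and `build_profiles` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime

# Invented sample rows: (user_id, query, timestamp, visited_site).
# These values are made up for illustration only.
SAMPLE_LOG = [
    (1001, "landscapers in springfield", "2006-03-01 09:15:00", "example.com"),
    (2002, "cheap flights", "2006-03-01 10:00:00", "example.org"),
    (1001, "homes sold on maple street", "2006-03-04 20:30:00", "example.net"),
    (1001, "60 single men", "2006-03-02 07:45:00", ""),
]


def build_profiles(rows):
    """Group log rows by user ID and sort each user's rows by time."""
    profiles = defaultdict(list)
    for user_id, query, stamp, site in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        profiles[user_id].append((when, query, site))
    for entries in profiles.values():
        # Sort on the timestamp only, so the order reflects search history.
        entries.sort(key=lambda entry: entry[0])
    return dict(profiles)


profiles = build_profiles(SAMPLE_LOG)

# Print the reconstructed search history for one pseudonymous user.
for when, query, site in profiles[1001]:
    print(when.isoformat(), "|", query, "|", site or "(no click)")
```

Even with a numeric ID in place of a name, the three rows for user 1001 read together suggest a town, a street, and an age bracket, which is the kind of inference the researchers were able to make.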

Why would AOL do such a thing in the first place? AOL's intention was to provide useful data to researchers performing "search research." However, the data turned out to be more "helpful" than AOL intended. If you think about it, how much effort does it take to figure out a specific user's name and location if you have three months of his or her searches? And since all the queries also included a date/time stamp and the link to the site they visited from AOL, there are other ways a site manager could use site logs to put together a profile on someone. What some truly enterprising person or group could do with this data is limited only by their imagination. Once the news came out that individuals could be identified from the database, AOL took the data off its website, but of course it was too late. Sites mirroring the database immediately popped up.

The lessons to be drawn from this episode are too many to name, but at the very least we know that what we like to think of as privacy is largely an illusion and what seems like an innocent act of "openness" and "sharing" can backfire in the worst possible way. What can you do to protect yourself against disclosures such as the one described above or from inadvertent leaks of search engine data? I have repeatedly warned people about using search services that require you to log into the site. AOL, Google, Live, and Yahoo all offer such services, which illustrate my rule of thumb: anything that adds convenience brings with it some degradation of privacy and/or security. The fact is that you are personally identifiable if you have an account with a search engine site.

But what is the risk that you can be identified from your searches if you do not have an account at a search site? In light of the AOL incident, Wired updated a January 2006 article on this topic, and some of the points they make are as follows:

216 Saul Hansell, "AOL Removes Search Data on Group of Web Users," New York Times, 8 August 2006 (archived article requires payment).

