

Legal Implications of Big Data: A Primer

By David Navetta, Esq. – ISSA member, Denver, USA Chapter

The potential uses and benefits of Big Data are endless. Unfortunately, Big Data also poses some risk to both the companies seeking to unlock its potential and the individuals whose information is now continuously being collected, combined, mined, analyzed, disclosed, and acted upon. This article explores the concept of Big Data and some of the privacy-related legal issues and risks associated with it.

By now many lawyers, security and privacy professionals, and business managers have heard of the term “Big Data,” but many may not understand exactly what it refers to, and still more likely do not know how it will impact their clients and businesses (or perhaps it already is). Big Data is everywhere (quite literally). We see it drive the creative processes used by entertainment companies to construct the perfect television series based on their customers’ specific preferences.1 We see Big Data in action when data brokers collect detailed employment information concerning 190 million persons (including salary information) and sell it to debt collectors, financial institutions, and other entities.2 Big Data is in play when retailers like Target can determine when their customers are pregnant without being told, and send them marketing materials early on in order to win business.3

1 See http://www.salon.com/2013/02/01/how_netflix_is_turning_viewers_into_puppets/.
2 See http://redtape.nbcnews.com/_news/2013/01/30/16762661-exclusive-your-employer-may-share-your-salary-and-equifax-might-sell-that-data?lite.
3 See http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?pagewanted=all&_r=0.

Big Data may also eventually help find the cure for cancer and other diseases.4

The potential uses and benefits of Big Data are endless. Unfortunately, Big Data also poses some risk to both the companies seeking to unlock its potential and the individuals whose information is now continuously being collected, combined, mined, analyzed, disclosed, and acted upon. This article explores the concept of Big Data and some of the privacy-related legal issues and risks associated with it.

What is Big Data?

To understand the legal issues associated with Big Data, it is important to understand the meaning of the term. Wikipedia (part of the Big Data phenomenon itself) defines Big Data as follows:

Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, analysis, and visualization.5

4 See http://articles.washingtonpost.com/2013-01-17/business/36384178_1_big-data-breast-cancer-cure-cancer.

While the Wikipedia definition highlights the challenges associated with large data sets and understanding the data contained in those sets, a definition by the TechAmerica Foundation also captures the opportunities associated with Big Data:

Big Data is a term that describes large volumes of high velocity, complex, and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information.6

The Foundation stresses Big Data solutions as part of its attempt to define the term: “Big Data Solutions: advanced techniques and technologies to enable the capture, storage, distribution, management and analysis of information.”

According to the TechAmerica Foundation, Big Data is characterized by three factors: volume, velocity, and variety:7

• Volume – The sheer amount of data generated, or data intensity, that must be ingested, analyzed, and managed to make decisions based on complete data analysis.
• Velocity – How fast data is being produced and changed, and the speed with which data must be received, understood, and processed.
• Variety – The rise of information coming from new sources both inside and outside the walls of the enterprise or organization, which creates integration, management, governance, and architectural pressures on IT.

While these definitions and attributes of Big Data may be helpful, they are still rather abstract. Perhaps the better question to ask is “what does Big Data mean to companies or other organizations?” Using this filter, Big Data and its use can be viewed as a business process or a supplement to existing business processes. Big Data in the business context means or encompasses the following:

• The ability of the organization to access (or potentially access) unimaginable amounts of structured and unstructured data (much more of it likely in the unstructured category), both internally and through external resources (e.g., data brokers, affiliates, or partners).

• A realization (or hope) that by capturing, structuring, and analyzing these huge volumes of data, and understanding the relationships within and between data, the company may gain valuable insights (often precise and non-obvious) that may significantly improve how the company does business.

• The need to leverage specialized tools and specialized employees (e.g., data scientists) to enable the capture, curation, storage, search, sharing, and analysis of the data in a way that is valuable to the organization.

• Analyzing and addressing the potential limitations and legal, security, and privacy risks and issues associated with the collection, analysis, and use of Big Data (and the insights derived from it).

5 See http://en.wikipedia.org/wiki/Big_data.
6 See Demystifying Big Data, http://www.techamerica.org/Docs/fileManager.cfm?f=techamerica-bigdatareport-final.pdf.
7 Ibid.

While the specific applications of Big Data analysis will vary depending on the industry, the availability of data, and the goals of a particular organization (and some of those practical applications are summarized above), many organizations will use Big Data to better understand and market to their customers (both individual and corporate).

Big Data and privacy

When it comes to consumer marketing, the potential for Big Data is enormous (and some would argue that the confluence of online marketing and Big Data represents the “Holy Grail” of marketing). Big Data can allow marketers to target customers precisely and efficiently by providing advertising and product and services offers that are specifically tailored to a particular individual, based on his or her attributes. Big Data combined with the use of mobile devices can result in offers to individuals that are highly relevant, delivered at the right time, and (with mobile and geo-location tracking) at the right place. However, one of the most significant legal challenges associated with Big Data, especially on the consumer marketing side, is privacy.

Big Data and notice/consent

In the United States, pursuant to the Fair Information Practice Principles,8 the foundation of privacy protection includes the concepts of notice/awareness and choice/consent. To satisfy the principle of notice and awareness, the data subject from whom data will be collected must be made aware of the uses to which his or her personal information will be put, and to whom such personal information will be disclosed.9 The notice is intended to allow the data subject to make an informed choice as to the collection and use of the subject’s personal information, and to consent (or not) to that collection and use.

In a Big Data world, some contend that the goals of notice/consent may be circumvented due to the complexity of the Big Data ecosystem and practical limitations related to the use of written privacy policies. For example, privacy advocates believe that in some cases a person who reads a privacy policy and agrees that his or her personal information can be collected, used, and disclosed for “marketing purposes” may not understand that such personal information may end up residing in the database of a data broker, where it can be combined and disclosed in ways not apparent in or contemplated by the privacy policy. For example, if an ecommerce vendor disclosed to a marketer that an individual customer purchased a deep fryer, such information could be combined into a profile about the individual in a database owned by a data broker. If the data broker later sells access to the database to a health insurance company, whose algorithms put people who purchase deep fryers into a high-risk category, in the world of Big Data the initial, relatively innocuous data disclosure (that was consented to) could suddenly serve as the basis to deny a person health care (or result in higher health care rates).

8 http://www.ftc.gov/reports/privacy3/fairinfo.shtm.
9 Ibid.

The problem here is twofold. First, the consumer may not understand where his or her personal information may end up, and that it could be combined with other existing profile data in a manner that reveals more about the person than contemplated at the time of disclosure. Further onward transfer and combination with yet more databases could reveal even more. Second, the data subject lacks an understanding of the interpretations, inferences, and/or deductions that may be drawn from his or her combined data using Big Data mining techniques and analytics. As such, in a Big Data world some would argue that data subjects have even less awareness and ability to provide meaningful consent.10

Big Data and access/participation

Another area of privacy concern related to Big Data deals with the principle of “access/participation.”11 This principle concerns a data subject’s ability to access his or her personal data in order to ascertain whether it is accurate and complete. This principle is necessary to allow individuals to correct inaccurate information about them.

This principle has been incorporated into the Fair Credit Reporting Act12 (FCRA), which requires credit reporting agencies to provide consumers with access to their credit reports so they can have inaccuracies corrected. In the Big Data context, satisfying the access/participation principle poses significant challenges. Except for the established and highly visible players, the general public does not know what entities may be collecting information about them and creating profiles. While data subjects may be able to identify companies to whom they have provided personal information, and may have a direct relationship with such companies, the same is not true in the case of data brokers. In most cases data subjects do not have a direct relationship with them, and these brokers typically do not receive information directly from the data subjects. Even if a consumer can identify a data broker that holds his or her profile, without a contract the consumer may have no legal recourse that would require the broker to provide access to his or her personal information. While some data brokers may be acting as “credit reporting agencies” and therefore subject to the FCRA, many take steps to avoid that status.

10 Notice and Consent in a World of Big Data, http://www.techpolicy.com/NoticeConsent-inWorldBigData.aspx.
11 Fair Information Practice Principles, http://www.ftc.gov/reports/privacy3/fairinfo.shtm.
12 Fair Credit Reporting Act (FCRA), 15 U.S.C. § 1681 et seq.

Based on concerns over access and transparency, the Federal Trade Commission (FTC) has indicated a desire to consider additional regulatory scrutiny over data brokers:

To address the invisibility of, and consumers’ lack of control over, data brokers’ collection and use of consumer information, the Commission supports targeted legislation – similar to that contained in several of the data security bills introduced in the 112th Congress – that would provide consumers with access to information about them held by a data broker. To further increase transparency, the Commission calls on data brokers that compile data for marketing purposes to explore creating a centralized website where data brokers could (1) identify themselves to consumers and describe how they collect and use consumer data, and (2) detail the access rights and other choices they provide with respect to the consumer data they maintain.13

More recently, in December 2012, the FTC launched an investigation to study the data broker industry’s collection and use of consumer information.14 Moreover, much of the privacy-related legislation proposed in Congress has included provisions related to the regulation and oversight of data brokers (although none has passed to date).15 Overall, this is an area that is ripe for an increased regulatory response and potentially federal and/or state legislation.

Big Data and Do Not Target/Do Not Collect

Another privacy-related area impacted by Big Data is the do not track (DNT) debate.16 For many in the advertising industry, do not track refers to the use of consumer data for purposes of targeted advertising. In contrast, the FTC and privacy advocates believe the concept of DNT encompasses not only targeting of individuals, but also collection of personal information from individuals (do not collect). Recent regulatory emphasis on do not collect stems in part from concerns surrounding Big Data. With the pervasive and constant collection of information about individuals from multiple sources, many data brokers are able to pinpoint a user’s identity and specific preferences without having any information traditionally considered personally identifiable information. As discussed further below, common methods for de-identifying personal information may not be effective if the unique identifier of the computer or mobile device used to access a website, when combined with specific behavioral and other data, can supply enough information to identify a person individually. This may lead to heightened regulatory scrutiny of Big Data practices, specifically where the collection and aggregation of seemingly harmless data about a person can be used to reveal sensitive information (e.g., health status, sexual orientation, and financial status).

13 See Protecting Consumer Privacy in an Era of Rapid Change, http://ftc.gov/os/2012/03/120326privacyreport.pdf.
14 See http://www.ftc.gov/opa/2012/12/databrokers.shtm.
15 See, e.g., http://www.infolawgroup.com/2010/08/articles/breach-notice/yet-another-proposed-federal-data-security-and-breach-notification-bill-senators-rockefeller-and-pryor-jump-into-the-fray/.
16 See http://www.infolawgroup.com/2012/03/articles/data-privacy-law-or-regulation/ftc-looks-to-link-donottrack-big-data-privacy-concerns-seeks-solutions/.
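To make the do-not-collect concern concrete, the snippet below is a minimal, hypothetical Python sketch; the records, field names, and chosen attributes are invented for illustration and do not describe any actual broker’s database. It shows how a device identifier, combined with a few seemingly innocuous behavioral attributes, can single out one user in a data set even though no name, email address, or other traditionally “personally identifiable” field is present.

```python
from collections import Counter

# Hypothetical clickstream records keyed by a device ID rather than a name.
# None of these fields is traditional PII, yet the combination can be identifying.
records = [
    {"device_id": "dev-a91f", "zip": "80202", "os": "iOS",     "news_site": "denverpost.com", "hour": 6},
    {"device_id": "dev-77c2", "zip": "80202", "os": "Android", "news_site": "denverpost.com", "hour": 23},
    {"device_id": "dev-d410", "zip": "10013", "os": "iOS",     "news_site": "nytimes.com",    "hour": 6},
    # ...millions more rows in a real broker database
]

def quasi_key(rec):
    """The combination of attributes an outside observer might learn about a person."""
    return (rec["zip"], rec["os"], rec["news_site"], rec["hour"])

# Count how many devices share each attribute combination.
population = Counter(quasi_key(r) for r in records)

for rec in records:
    if population[quasi_key(rec)] == 1:
        # A unique combination: anyone who knows these few facts about a person
        # (neighborhood, phone type, reading habit, time of day) can tie them to
        # this device ID -- and to everything else collected under it.
        print(f"device {rec['device_id']} is singled out by {quasi_key(rec)}")
```

The point of the sketch is only that the joint combination of attributes, not any single field, is what does the identifying.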

Anonymization and Big Data

One technique for mitigating privacy-related risks associated with Big Data is de-identification or anonymization.17 Data sets that are de-identified have had key information stripped away in order to prevent others from individually identifying the persons to whom the data set relates. This technique allows organizations to work with Big Data sets while mitigating privacy concerns, and has been used in many realms, including health care, banking and finance, and online advertising.
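As a rough illustration of the technique, the following Python sketch strips direct identifiers from a record and generalizes the remaining quasi-identifiers before the data is shared for analysis. The field list, generalization rules, and use of a salted hash are assumptions made for this example, not the prescribed method of any statute or regulator discussed below.

```python
import hashlib

# Fields treated as direct identifiers and removed outright (illustrative list only).
DIRECT_IDENTIFIERS = {"name", "email", "ssn", "phone", "street_address", "account_number"}

def de_identify(record, salt):
    """Return a copy of `record` with direct identifiers dropped and
    quasi-identifiers coarsened. A salted hash stands in for the natural key so
    records about the same person can still be linked within this data set."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Pseudonymous key: stable inside this data set, meaningless outside it.
    out["subject_key"] = hashlib.sha256((salt + record["ssn"]).encode()).hexdigest()[:16]

    # Coarsen quasi-identifiers that could be combined to re-identify someone.
    out["zip"] = record["zip"][:3] + "XX"          # 5-digit ZIP code -> ZIP3
    out["birth_year"] = record["birth_date"][:4]   # full date of birth -> year only
    del out["birth_date"]
    return out

patient = {
    "name": "Jane Doe", "ssn": "123-45-6789", "email": "jane@example.com",
    "zip": "02138", "birth_date": "1954-07-31", "diagnosis": "hypertension",
}
print(de_identify(patient, salt="rotate-this-salt"))
```

Whether removing these particular fields is enough is exactly the question the regimes discussed next answer differently; the HIPAA safe harbor, for instance, enumerates eighteen identifier types rather than leaving the list to judgment.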

In fact, many regulatory regimes recognize the concept of de-identified personal information. Under regulations promulgated pursuant to Gramm-Leach-Bliley18 (regulating the privacy and security of financial data), “personally identifiable financial information” does not include information that does not identify a consumer, “such as aggregate information or blind data that does not contain personal identifiers such as account numbers, names, or addresses.”19 The Office for Civil Rights of the Department of Health and Human Services has issued extensive guidance concerning de-identification of health data, and sets forth two methods to achieve de-identification under HIPAA: expert determination and “safe harbor” de-identification (which involves removing eighteen types of identifiers from health data).20 Under European data protection laws, to achieve legally permissible de-identification, “anonymization of data should exclude any possibility of individuals to be identified, even by combining anonymized information.”21

17 See http://en.wikipedia.org/wiki/De-identification.
18 Gramm-Leach-Bliley Act of 1999, Pub. L. No. 106-102, 113 Stat. 1338 (codified as amended in scattered sections of 12 and 15 U.S.C. (2008)).
19 See 17 CFR Part 248.
20 See Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/hhs_deid_guidance.pdf.
21 European Union Directive 95/46/EC.

However, organizations relying on de-identification to circumvent privacy issues (and liability) must proceed carefully. If de-identification is not performed properly, it may be possible to re-identify individuals in an anonymized data set. There have been several real-life instances where re-identification has occurred, and researchers have also been able to demonstrate methods for identifying individuals from data that appeared anonymous on its face.

In one infamous example, as part of a contest to create a better movie recommendation engine, Netflix released an anonymized data set containing the movie rental histories of approximately 480,000 of its customers. Researchers established that they could re-identify some of the Netflix customers at issue by accessing and analyzing publicly available information concerning movie ratings performed by such customers.22 The Netflix contest eventually led to a lawsuit23 against the company and regulatory scrutiny from the Federal Trade Commission. In another example, a researcher showed how she could re-identify persons with data in an anonymous health care database by using publicly available voter records (in this case she was able to re-identify the information of the governor of Massachusetts).24

22 See Robust De-anonymization of Large Data sets (How to Break Anonymity of the Netflix Prize Data set), http://arxiv.org/PS_cache/cs/pdf/0610/0610105v2.pdf. The Netflix contest eventually led to a lawsuit against the company and regulatory scrutiny by the Federal Trade Commission.
23 See http://www.wired.com/images_blogs/threatlevel/2009/12/doe-v-netflix.pdf.
24 See http://www.cs.duke.edu/~ashwin/pubs/BigPrivacyACMXRDS_final.pdf.
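The voter-records study described above is, at its core, a join on quasi-identifiers. The following hypothetical Python sketch (the names, records, and fields are invented and do not reproduce the actual study’s data) shows how an “anonymous” health data set that retains ZIP code, birth date, and sex can be re-identified by matching those three fields against a public voter list.

```python
# "De-identified" hospital records: names removed, quasi-identifiers retained.
health_records = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "M", "diagnosis": "cardiac arrhythmia"},
    {"zip": "02139", "birth_date": "1972-01-15", "sex": "F", "diagnosis": "asthma"},
]

# Public voter registration list: identified by design.
voter_roll = [
    {"name": "J. Example",   "zip": "02138", "birth_date": "1945-07-31", "sex": "M"},
    {"name": "A. Fictional", "zip": "02139", "birth_date": "1972-01-15", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def key(rec):
    return tuple(rec[f] for f in QUASI_IDENTIFIERS)

# Index the voter roll by the quasi-identifier combination, then join against it.
voters_by_key = {}
for voter in voter_roll:
    voters_by_key.setdefault(key(voter), []).append(voter)

for rec in health_records:
    matches = voters_by_key.get(key(rec), [])
    if len(matches) == 1:
        # Exactly one registered voter shares this ZIP/birth date/sex combination,
        # so the "anonymous" diagnosis now attaches to a named individual.
        print(f"{matches[0]['name']} -> {rec['diagnosis']}")
```

The join needs nothing more exotic than a handful of shared attributes, which is why the discussion below treats contextual “micro data” as the central re-identification risk.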


The risk of re-identification of Big Data sets using contextual “micro data” is a significant concern for organizations working with de-identified data sets. If the de-identification is not done properly, third parties with access to de-identified data sets may be able to re-identify individuals, and that re-identification could expose the individuals at issue or constitute a data breach under existing data breach notification laws,25 and could lead to litigation or regulatory scrutiny. Organizations desiring to de-identify and anonymize their data sets should consider several questions to help understand and mitigate potential privacy and organizational risks, including:

• What are the purposes, risks, and benefits of de-identifying and using or disclosing the data, and do the benefits outweigh the risks?

• Will the third parties and/or service providers at issue use any data (aggregate, de-identified, etc.) for their own purposes? Do they have any contractual rights to use the data or engage in their own aggregation or anonymization of data?

• Is the data truly anonymized? How can the company be sure? What information will be exposed if the data is re-identified? Is it worth investing effort to verify anonymization?

• What is the risk to the business if the data is re-identified? Data breach notification? Lawsuits? Regulatory investigations or actions?

Engaging in the analysis above can be very helpful in mitigating risks. However, companies need to be aware that the very nature of Big Data makes true anonymization more difficult. With reams of detailed data now available and accessible, and sophisticated algorithms that allow data mining, it is arguably easier to re-identify individuals. The analysis and combination of anonymized data sets with data sets containing identified individuals is largely unpredictable, and yet can potentially result in an organization getting into legal trouble.

25 http://en.wikipedia.org/wiki/Security_breach_notification_laws.

Conclusion

The Big Data era is upon us, and it will become increasingly common for companies to collect, data mine, and analyze large data sets in order to further their business interests. Big Data analytics is already the norm for many organizations, and this trend will only continue over time as more and more data is collected, and stronger and more predictive tools and processes are developed to understand that data. As companies rush headlong into the Big Data space, they would be wise to step back and contemplate the potential privacy implications of their activities, and consider steps to address privacy concerns. Proactively dealing with the privacy issues discussed in this article can help organizations safely leverage Big Data while still retaining customers and avoiding reputational harm, litigation, and regulatory scrutiny.

About the Author

David Navetta, Esq., CIPP/US, is one of the founding partners of Information Law Group LLP (www.infolawgroup.com). David has practiced law for over fifteen years, and focuses on technology, privacy, information security, and intellectual property law. He has previously served as a co-chair of the American Bar Association’s Information Security Committee. He has spoken and written frequently concerning technology, privacy, and data security legal issues and can be reached at dnavetta@infolawgroup.com.

©2013 ISSA • www.issa.org • editor@issa.org • All rights reserved.
