ISSA<br />
DEVELOPING AND CONNECTING<br />
CYBERSECURITY LEADERS GLOBALLY<br />
Legal Implications of<br />
Big Data<br />
A Primer<br />
By David Navetta, Esq. – ISSA member, Denver, USA Chapter<br />
The potential uses and benefits of Big Data are endless. Unfortunately, Big Data also poses<br />
some risk to both the companies seeking to unlock its potential and the individuals whose<br />
information is now continuously being collected, combined, mined, analyzed, disclosed,<br />
and acted upon. This article explores the concept of Big Data and some of the privacy-related<br />
legal issues and risks associated with it.<br />
By now many lawyers, security and privacy professionals,<br />
and business managers have heard of the term<br />
“Big Data,” but many may not understand exactly<br />
what it refers to, and still more likely do not know how it will<br />
impact their clients and businesses (or perhaps it already is).<br />
Big Data is everywhere (quite literally). We see it drive the<br />
creative processes used by entertainment companies to construct<br />
the perfect television series based on their customers’<br />
specific preferences. 1 We see Big Data in action when data<br />
brokers collect detailed employment information concerning<br />
190 million persons (including salary information) and sell it<br />
to debt collectors, financial institutions, and other entities. 2<br />
Big Data is in play when retailers like Target can determine<br />
when their customers are pregnant without being told, and send<br />
them marketing materials early on in order to win business. 3<br />
1 See http://www.salon.com/2013/02/01/how_netflix_is_turning_viewers_into_puppets/.<br />
2 See http://redtape.nbcnews.com/_news/2013/01/30/16762661-exclusive-youremployer-may-share-your-salary-and-equifax-might-sell-that-datalite.<br />
3 See http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?pagewanted=all&_r=0.<br />
Big Data may also eventually help find a cure for cancer and<br />
other diseases. 4<br />
The potential uses and benefits of Big Data are endless. Unfortunately,<br />
Big Data also poses some risk to both the companies<br />
seeking to unlock its potential and the individuals whose<br />
information is now continuously being collected, combined,<br />
mined, analyzed, disclosed, and acted upon. This article explores<br />
the concept of Big Data and some of the privacy-related<br />
legal issues and risks associated with it.<br />
What is Big Data?<br />
To understand the legal issues associated with Big Data, it is<br />
important to understand the meaning of the term. Wikipedia<br />
(part of the Big Data phenomenon itself) defines Big Data as<br />
follows:<br />
Big Data is a collection of data sets so large and complex<br />
that it becomes difficult to process using on-hand database<br />
management tools or traditional data processing applications.<br />
The challenges include capture, curation, storage,<br />
search, sharing, analysis, and visualization. 5<br />
4 See http://articles.washingtonpost.com/2013-01-17/business/36384178_1_big-databreast-cancer-cure-cancer.<br />
14 – ISSA Journal | March 2013<br />
©2013 ISSA • www.issa.org • editor@issa.org • All rights reserved.
Legal Implications of Big Data: A Primer | David Navetta<br />
While the Wikipedia definition highlights the challenges<br />
associated with large data sets and understanding the data<br />
contained in those sets, a definition by the TechAmerica<br />
Foundation also captures the opportunities associated with<br />
Big Data:<br />
Big Data is a term that describes large volumes of high velocity,<br />
complex, and variable data that require advanced<br />
techniques and technologies to enable the capture, storage,<br />
distribution, management, and analysis of the information. 6<br />
The Foundation stresses Big Data solutions as part of its attempt<br />
to define the term: Big Data Solutions: advanced techniques<br />
and technologies to enable the capture, storage, distribution,<br />
management and analysis of information.<br />
According to the TechAmerica Foundation, Big Data is<br />
characterized by three factors: volume, velocity, and variety: 7<br />
CHARACTERISTIC | DESCRIPTION<br />
Volume | The sheer amount of data generated or data intensity that must be ingested, analyzed, and managed to make decisions based on complete data analysis<br />
Velocity | How fast data is being produced and changed and the speed with which data must be received, understood, and processed<br />
Variety | The rise of information coming from new sources both inside and outside the walls of the enterprise or organization creates integration, management, governance, and architectural pressures on IT<br />
While these definitions and attributes of Big Data may be<br />
helpful, they are still rather abstract. Perhaps the better<br />
question to ask is “what does Big Data mean to companies<br />
or other organizations?” Using this filter, Big Data and its<br />
use can be viewed as a business process or a supplement to<br />
existing business processes. Big Data in the business context<br />
means or encompasses the following:<br />
• The organization’s actual or potential access to<br />
unimaginable amounts of structured and unstructured<br />
data (much more of it likely in the unstructured category)<br />
both internally and through external resources (e.g., data<br />
brokers, affiliates, or partners).<br />
• A realization (or hope) that by capturing, structuring, and<br />
analyzing these huge volumes of data, and understanding<br />
the relationships within and between data, the company<br />
may gain valuable insights (often precise and nonobvious)<br />
that may significantly improve how the company<br />
does business.<br />
• The need to leverage specialized tools and specialized employees<br />
(e.g., data scientists) to enable the capture, curation,<br />
storage, search, sharing, and analysis of the data in a<br />
way that is valuable to the organization.<br />
• The need to analyze and address the potential limitations and<br />
legal, security, and privacy risks and issues associated with the<br />
collection, analysis, and use of Big Data (and the insights<br />
derived from it).<br />
5 See http://en.wikipedia.org/wiki/Big_data.<br />
6 See Demystifying Big Data, http://www.techamerica.org/Docs/fileManager.cfm?f=techamerica-bigdatareport-final.pdf.<br />
7 Ibid.<br />
While the specific applications of Big Data analysis will vary<br />
depending on the industry, the availability of data and the<br />
goals of a particular organization (and some of those practical<br />
applications are summarized above), many organizations will<br />
use Big Data to better understand and market to their customers<br />
(both individual and corporate).<br />
Big Data and privacy<br />
When it comes to consumer marketing, the potential for Big<br />
Data is enormous (and some would argue that the confluence<br />
of online marketing and Big Data represents the “Holy Grail”<br />
of marketing). Big Data can allow marketers to target customers<br />
precisely and efficiently by providing advertising and<br />
product and service offers that are specifically tailored to a<br />
particular individual, based on his or her attributes. Big Data<br />
combined with the use of mobile devices can result in offers<br />
to individuals that are highly relevant, delivered at the right<br />
time, and (with mobile and geo-location tracking) at the right<br />
place. However, one of the most significant legal challenges<br />
associated with Big Data, especially on the consumer marketing<br />
side, is privacy.<br />
Big Data and notice/consent<br />
In the United States, pursuant to the Fair Information Practice<br />
Principles, 8 the foundation of privacy protection includes<br />
the concepts of notice/awareness and choice/consent. To satisfy<br />
the principle of notice and awareness, the data subject from<br />
whom data will be collected must be made aware of the uses<br />
to which his or her personal information will be put, and to<br />
whom such personal information will be disclosed. 9 The notice<br />
is intended to allow the data subject to make an informed<br />
choice as to the collection and use of the subject’s personal<br />
information, and to consent (or not) to that collection and use.<br />
In a Big Data world, some contend that the goals of notice/<br />
consent may be circumvented due to the complexity of the Big<br />
Data ecosystem and practical limitations related to the use of<br />
written privacy policies. For example, privacy advocates believe<br />
that in some cases, a person who reads a privacy policy<br />
and agrees that his or her personal information can be collected,<br />
used, and disclosed for “marketing purposes” may not<br />
understand that such personal information may end up residing<br />
in the database of a data broker and be combined and disclosed<br />
in ways not apparent in or contemplated by the privacy<br />
policy. For example, if an ecommerce vendor disclosed to a<br />
marketer that an individual customer purchased a deep fryer,<br />
such information could be combined into a profile about the<br />
individual in a database owned by a data broker. If the data<br />
8 http://www.ftc.gov/reports/privacy3/fairinfo.shtm.<br />
9 Ibid.<br />
broker later sells access to the database to a health insurance<br />
company, whose algorithms put people who purchase deep<br />
fryers into a high-risk category, in the world of Big Data the<br />
initial, relatively innocuous<br />
data disclosure (that was consented<br />
to) could suddenly serve<br />
as the basis to deny a person<br />
health care (or result in higher<br />
health care rates).<br />
The problem here is twofold.<br />
First, the consumer may not understand<br />
where his or her personal<br />
information may end up,<br />
and that it could be combined<br />
with other existing profile data<br />
in a manner that reveals more<br />
about the person than contemplated<br />
at the time of disclosure.<br />
Further onward transfer and combining with yet more databases<br />
could reveal even more. Second, the data subject lacks<br />
an understanding of the interpretations, inferences, and/or<br />
deductions that may be drawn from his or her combined data using<br />
Big Data mining techniques and analytics. As such, in a Big<br />
Data world some would argue that data subjects have even<br />
less awareness and ability to provide meaningful consent. 10<br />
Big Data and access/participation<br />
Another area of privacy concern related to Big Data deals<br />
with the principle of “access/participation.” 11 This principle<br />
deals with a data subject’s ability to access his or her personal<br />
data in order to ascertain whether it is accurate and complete.<br />
This principle is necessary to allow individuals to correct inaccurate<br />
information about them.<br />
This principle has been incorporated into the Fair Credit Reporting<br />
Act 12 (FCRA), which requires credit reporting agencies<br />
to provide consumers with access to their credit reports<br />
so they can have inaccuracies corrected. In the Big Data<br />
context, satisfying the access/participation principle poses<br />
significant challenges. Except for the established and highly<br />
visible players, the general public does not know what entities<br />
may be collecting information about them and creating profiles.<br />
While data subjects may be able to identify companies<br />
to whom they have provided personal information, and may<br />
have a direct relationship with such companies, the same is<br />
not true in the case of data brokers. In most cases data subjects<br />
do not have a direct relationship with them and these<br />
brokers typically do not receive information directly from<br />
the data subjects. Even if a consumer can identify a data<br />
broker that holds his or her profile, without a contract the<br />
consumer may have no legal recourse that would require the<br />
broker to provide access to his or her personal information.<br />
10 Notice and Consent in a World of Big Data, http://www.techpolicy.com/NoticeConsent-inWorldBigData.aspx.<br />
11 Fair Information Practice Principles, http://www.ftc.gov/reports/privacy3/fairinfo.shtm.<br />
12 Fair Credit Reporting Act (FCRA), 15 U.S.C. § 1681, et seq.<br />
While some data brokers may be acting as “credit reporting<br />
agencies” and are therefore subject to the FCRA, many take steps<br />
to avoid that status.<br />
Based on concerns over access and transparency, the Federal<br />
Trade Commission (FTC) has indicated a desire to consider<br />
additional regulatory scrutiny over data brokers:<br />
To address the invisibility of, and consumers’ lack of control<br />
over, data brokers’ collection and use of consumer information,<br />
the Commission supports targeted legislation<br />
– similar to that contained in several of the data security<br />
bills introduced in the 112th Congress – that would provide<br />
consumers with access to information about them<br />
held by a data broker. To further increase transparency,<br />
the Commission calls on data brokers that compile data<br />
for marketing purposes to explore creating a centralized<br />
website where data brokers could (1) identify themselves to<br />
consumers and describe how they collect and use consumer<br />
data, and (2) detail the access rights and other choices<br />
they provide with respect to the consumer data they maintain. 13<br />
More recently, in December 2012, the FTC launched an investigation<br />
to study the data broker industry’s collection and<br />
use of consumer information. 14 Moreover, much of the privacy-related<br />
legislation proposed in Congress has included provisions<br />
related to the regulation and oversight of data brokers<br />
(although none has passed to date). 15 Overall, this is an area<br />
that is ripe for an increased regulatory response and potentially<br />
federal and/or state legislation.<br />
Big Data and Do Not Target/Do Not Collect<br />
Another privacy-related area impacted by Big Data is the do<br />
not track (DNT) debate. 16 For many in the advertising industry,<br />
do not track refers to the use of consumer data for purposes<br />
of targeted advertising. In contrast, the FTC and privacy<br />
advocates believe the concept of DNT encompasses not only<br />
targeting of individuals, but also collection of personal information<br />
from individuals (do not collect). Recent regulatory<br />
emphasis on do not collect stems in part from concerns surrounding<br />
Big Data. With the pervasive and constant collection<br />
of information about individuals from multiple sources,<br />
many data brokers are able to pinpoint a user’s identity and<br />
specific preferences without having any information traditionally<br />
considered personally identifiable information. As<br />
discussed further below, common methods for de-identifying<br />
personal information may not be effective, if the unique<br />
identifier of the computer or mobile device used to access a<br />
website, when combined with specific behavioral and other<br />
data, can supply enough information to identify a person individually.<br />
This may lead to heightened regulatory scrutiny<br />
13 See Protecting Consumer Privacy in an Era of Rapid Change, http://ftc.gov/os/2012/03/120326privacyreport.pdf.<br />
14 See http://www.ftc.gov/opa/2012/12/databrokers.shtm.<br />
15 See e.g., http://www.infolawgroup.com/2010/08/articles/breach-notice/yet-anotherproposed-federal-data-security-and-breach-notification-bill-senators-rockefellerand-pryor-jump-into-the-fray/.<br />
16 See http://www.infolawgroup.com/2012/03/articles/data-privacy-law-or-regulation/ftc-looks-to-link-donottrack-big-data-privacy-concerns-seeks-solutions/.<br />
of Big Data practices, specifically where the collection and<br />
aggregation of seemingly harmless data about a person can<br />
be used to reveal sensitive information (e.g., health status,<br />
sexual orientation, and financial status).<br />
Anonymization and Big Data<br />
One technique for mitigating privacy-related risks associated<br />
with Big Data is de-identification or anonymization. 17 Data<br />
sets that are de-identified have had key information stripped<br />
away in order to prevent others from individually identifying<br />
the persons to whom the data set relates. This technique<br />
allows organizations to work with Big Data sets while mitigating<br />
privacy concerns, and has been used in many realms,<br />
including health care, banking and finance, and online advertising.<br />
In fact, many regulatory regimes recognize the concept of<br />
de-identified personal information. Under regulations promulgated<br />
pursuant to Gramm-Leach-Bliley 18 (regulating the<br />
privacy and security of financial data) “personally identifiable<br />
financial information” does not include information that<br />
does not identify a consumer “such as aggregate information<br />
or blind data that does not contain personal identifiers such<br />
as account numbers, names, or addresses.” 19 The Office for<br />
Civil Rights of the Department of Health and Human Services<br />
has issued extensive guidance concerning de-identification<br />
of health data, which sets forth two methods to achieve<br />
de-identification under HIPAA: expert determination and<br />
“safe harbor” de-identification (which involves removing<br />
eighteen types of identifiers from health data). 20 Under European<br />
data protection laws, to achieve legally permissible<br />
de-identification, “anonymization of data should exclude any<br />
possibility of individuals to be identified, even by combining<br />
anonymized information.” 21<br />
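As a rough illustration of the identifier-removal approach described above, the following Python sketch strips a set of direct identifier fields from a record. The field names and the `DIRECT_IDENTIFIERS` set are hypothetical and cover only a handful of identifier types. They are not the HIPAA safe harbor list, and removing them alone would not establish de-identification under any of the regimes discussed here.

```python
from typing import Any, Dict

# Hypothetical direct-identifier fields for illustration only.
# This is NOT the HIPAA safe harbor list of identifier types.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "email", "ssn", "account_number",
}


def strip_direct_identifiers(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without the direct-identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


# Invented sample record.
patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "email": "jane@example.com",
    "zip": "02138",
    "birth_date": "1960-01-15",
    "sex": "F",
    "diagnosis": "hypertension",
}

deidentified = strip_direct_identifiers(patient)
print(deidentified)
```

Note that the ZIP code, birth date, and sex remain in the output. Fields like these can still act as identifiers when combined with other data, which is the re-identification risk the article goes on to describe.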
However, organizations relying on de-identification to circumvent<br />
privacy issues (and liability) must proceed carefully.<br />
If de-identification is not performed properly, it may be possible<br />
to re-identify individuals in an anonymized data set.<br />
There have been several real-life instances where re-identification<br />
has occurred, and researchers have also been able to<br />
demonstrate methods for identifying individuals from data<br />
that appeared anonymous on its face.<br />
In one infamous example, as part of a contest to create a better<br />
movie recommendation engine, Netflix released an anonymized<br />
data set containing the movie rental histories of approximately<br />
480,000 of its customers. Researchers established<br />
that they could re-identify some of the Netflix customers at<br />
issue by accessing and analyzing publicly available information<br />
concerning movie ratings performed by such customers. 22<br />
17 See http://en.wikipedia.org/wiki/De-identification.<br />
18 Gramm-Leach-Bliley Act of 1999, Pub. L. No. 106-102, 113 Stat. 1338 (codified as<br />
amended in scattered sections of 12 and 15 U.S.C. (2008)).<br />
19 See 17 CFR Part 248.<br />
20 See Guidance Regarding Methods for De-identification of Protected Health<br />
Information in Accordance with the Health Insurance Portability and Accountability<br />
Act (HIPAA) Privacy Rule, http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/hhs_deid_guidance.pdf.<br />
21 European Union Directive 95/46/EC.<br />
The Netflix contest eventually led to a lawsuit 23 against<br />
the company and regulatory scrutiny from the Federal Trade<br />
Commission. In another example, a researcher showed how<br />
she could re-identify persons with data in an anonymous<br />
health care database by using publicly available voter records<br />
(in this case she was able to re-identify the information of the<br />
governor of Massachusetts). 24<br />
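The voter-record example can be sketched as a simple join on fields that appear in both data sets. The sketch below is an illustration only: the records are invented, and the choice of ZIP code, birth date, and sex as linking fields is an assumption made for the example rather than a detail taken from this article.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Fields assumed to be present in both data sets (an assumption for this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Invented "anonymized" health records: names removed, linking fields kept.
health_records = [
    {"zip": "02138", "birth_date": "1960-01-15", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1975-06-30", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1975-06-30", "sex": "M", "diagnosis": "diabetes"},
]

# Invented public voter roll: names appear alongside the same fields.
voter_roll = [
    {"name": "Jane Doe", "zip": "02138", "birth_date": "1960-01-15", "sex": "F"},
    {"name": "John Roe", "zip": "02139", "birth_date": "1975-06-30", "sex": "M"},
    {"name": "Jim Poe", "zip": "02139", "birth_date": "1975-06-30", "sex": "M"},
]


def key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build the linking key from the quasi-identifier fields."""
    return tuple(record[f] for f in QUASI_IDENTIFIERS)


def link(health: List[Dict[str, str]], voters: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (name, diagnosis) pairs where the key matches exactly one voter."""
    names_by_key = defaultdict(list)
    for voter in voters:
        names_by_key[key(voter)].append(voter["name"])

    matches = []
    for row in health:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0], row["diagnosis"]))
    return matches


reidentified = link(health_records, voter_roll)
print(reidentified)
```

In this invented data only the first health record is linked to a name, because two voters share the second combination of fields. The fewer people who share a combination, the easier the link becomes.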
22 See Robust De-anonymization of Large Data sets (How to Break Anonymity of the<br />
Netflix Prize Data set) http://arxiv.org/PS_cache/cs/pdf/0610/0610105v2.pdf. The<br />
Netflix contest eventually led to a lawsuit against the company and regulatory<br />
scrutiny by the Federal Trade Commission.<br />
23 See http://www.wired.com/images_blogs/threatlevel/2009/12/doe-v-netflix.pdf.<br />
24 See http://www.cs.duke.edu/~ashwin/pubs/BigPrivacyACMXRDS_final.pdf.<br />
The risk of re-identification of Big Data sets using contextual<br />
“micro data” is a significant concern for organizations working<br />
with de-identified data sets. If the de-identification is not<br />
done properly, third parties with access to de-identified data<br />
sets may be able to re-identify individuals, and that re-identification<br />
could expose the individuals at issue or constitute<br />
a data breach under existing data breach notification laws, 25<br />
and could lead to litigation or regulatory scrutiny. Organizations<br />
desiring to de-identify and anonymize their data sets<br />
should consider several questions to help understand and<br />
mitigate potential privacy and organizational risks, including:<br />
• What are the purposes, risks, and benefits of de-identifying<br />
and using or disclosing the data, and do the benefits<br />
outweigh the risks?<br />
• Will the third parties and/or service providers at issue use<br />
any data (aggregate, de-identified, etc.) for their own purposes?<br />
Do they have any contractual rights to use the data or engage<br />
in their own aggregation or anonymization of data?<br />
• Is the data truly anonymized? How can the company be<br />
sure? What information will be exposed if the data is re-identified?<br />
Is it worth investing effort to verify anonymization?<br />
• What is the risk to the business if the data is re-identified?<br />
Data breach notification? Lawsuits? Regulatory investigations<br />
or actions?<br />
Engaging in the analysis above can be very helpful in mitigating<br />
risks. However, companies need to be aware that the<br />
very nature of Big Data makes true anonymization more difficult.<br />
With reams of detailed data now available and accessible<br />
and sophisticated algorithms that allow data mining, it<br />
is arguably easier to re-identify individuals. The results of analyzing and<br />
combining anonymized data sets with data sets containing<br />
identified individuals are largely unpredictable, yet can<br />
potentially expose an organization to legal liability.<br />
25 http://en.wikipedia.org/wiki/Security_breach_notification_laws.<br />
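One way to probe the question “Is the data truly anonymized?” is to count how many records share each combination of potentially linkable fields. The sketch below uses a k-anonymity-style check, a technique not named in this article and introduced here only as an illustration. The field names, the sample records, and the threshold `k` are hypothetical, and passing such a check does not by itself show that data is anonymized in any legal sense.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical fields that might be linkable to outside data sets.
FIELDS = ("zip", "birth_year", "sex")


def smallest_group(records: List[Dict[str, str]], fields: Sequence[str]) -> int:
    """Size of the smallest group of records sharing the same values for `fields`."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values()) if counts else 0


def risky_records(records: List[Dict[str, str]], fields: Sequence[str], k: int) -> List[Dict[str, str]]:
    """Records whose combination of `fields` is shared by fewer than k records."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return [r for r in records if counts[tuple(r[f] for f in fields)] < k]


# Invented sample data: two records share a combination, one stands alone.
sample = [
    {"zip": "80202", "birth_year": "1970", "sex": "F"},
    {"zip": "80202", "birth_year": "1970", "sex": "F"},
    {"zip": "80203", "birth_year": "1985", "sex": "M"},
]

print(smallest_group(sample, FIELDS))
print(risky_records(sample, FIELDS, k=2))
```

A record that sits alone in its group is the easiest to link to an identified data set. Coarsening fields (for example, truncating ZIP codes) or suppressing such records raises the group size, at some cost to the usefulness of the data.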
Conclusion<br />
The Big Data era is upon us, and it will become increasingly<br />
common for companies to collect, data mine, and analyze<br />
large data sets in order to further their business interests. Big<br />
Data analytics is already the norm for many organizations,<br />
and this trend will only continue over time as more and more<br />
data is collected, and stronger and more predictive tools and<br />
processes are developed to understand that data. As companies<br />
rush headlong into the Big Data space, they would be<br />
wise to step back and contemplate the potential privacy implications<br />
of their activities, and consider steps to address<br />
privacy concerns. Proactively dealing with the privacy issues<br />
discussed in this article can help organizations safely leverage<br />
Big Data while still retaining customers and avoiding reputational<br />
harm, litigation, and regulatory scrutiny.<br />
About the Author<br />
David Navetta, Esq., CIPP/US, is one of<br />
the founding partners of Information Law<br />
Group LLP (www.infolawgroup.com). David<br />
has practiced law for over fifteen years,<br />
and focuses on technology, privacy, information<br />
security, and intellectual property<br />
law. He has previously served as a co-chair<br />
of the American Bar Association’s<br />
Information Security Committee. He has<br />
spoken and written frequently concerning technology, privacy,<br />
and data security legal issues and can be reached at dnavetta@infolawgroup.com.<br />