NCDS-Summit-2013
NCDS-Summit-2013
NCDS-Summit-2013
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Introduction<br />
Our world is undergoing a massive transformation due<br />
to the widespread availability of “big data.” Science,<br />
medicine, education, government, industry, social media,<br />
and entertainment—all are experiencing radical changes<br />
as a result of these massive data sets. Immense streams of<br />
data are routinely collected by scientists during the course<br />
of experimentation, health care providers documenting electronic<br />
medical records, educators collecting data on student<br />
performance, students manipulating large data sets as part<br />
of their curriculum, governmental agencies tracking citizen<br />
behavior in the interest of local and national security, businesses<br />
tracking online behavior and customer transactions,<br />
investors tallying the risks and rewards of financial investments,<br />
and children and adults participating in social media<br />
forums and online games. These data collections hold<br />
tremendous potential to solve problems,<br />
improve health and quality of 9000<br />
life, reform our education system, and<br />
create economic growth. Yet our ability<br />
to use these massive data sets in a<br />
8000<br />
cost-efficient and -effective manner is<br />
7000<br />
hampered by our fundamental lack of<br />
6000<br />
understanding of the science of data<br />
(Horvitz and Mitchell, 2010; Agrawal<br />
5000<br />
et al., 2011; Boyd and Crawford, 2011;<br />
Manyika et al., 2011) (Figure 1).<br />
The highly interdisciplinary field of<br />
data science can provide the solution to<br />
the big data impasse through the systematic<br />
study of data to further our understanding<br />
of the production, management,<br />
and use of data and to develop<br />
principles and theories that can guide<br />
new discoveries and innovations and<br />
thus foster economic activity and the<br />
health, education, and well-being of<br />
people around the world.<br />
Exabytes<br />
4000<br />
3000<br />
2000<br />
1000<br />
0<br />
tion, and economic opportunity.<br />
The Consortium has defined data science as the systematic<br />
study of the organization and use of digital data in order to<br />
accelerate discovery, improve critical decision-making processes,<br />
and enable a data-driven economy.<br />
The Consortium aims to serve as a vital forum to: engage<br />
a broad community of data science experts to define challenges,<br />
solutions, and visions for the future of data science;<br />
facilitate frequent, close interchange between data scientists<br />
and business leaders, funders, policy makers, and scientists<br />
in data-intensive domains; coordinate research priorities in<br />
data science; promote a focus on data science at multiple levels,<br />
including local, state, national, and international; support<br />
the development of technical, ethical, and policy standards<br />
2005 2010 2015<br />
Figure 1: Projected growth in stored data in exabytes. (Figure created from data<br />
openly provided by IDC, IDC View, 2011, http://www.emc.com/collateral/about/<br />
news/idc-emc-digital-universe-2011-infographic.pdf).<br />
“235 terabytes of data collected by the U.S. Library of<br />
Congress by April 2011” and yet “15 out of 17 sectors<br />
in the United States have more data stored per company<br />
than the U.S. Library of Congress.” —Manyika<br />
et al., McKinsey Global Institute Report, 2011<br />
The National Consortium for Data Science 1 was established<br />
in February <strong>2013</strong> as a public/private partnership of leading<br />
universities, governmental agencies, and businesses. The<br />
mission of the <strong>NCDS</strong> is to provide the foundation needed<br />
to advance the field of data science through research, educa-<br />
1. Establishing a National Consortium for Data Science, http://<br />
data2discovery.org/dev/wp-content/uploads/2012/09/<strong>NCDS</strong>-<br />
Consortium-Roadmap_July.pdf.<br />
for data; provide leadership in the advancement of data science;<br />
and cultivate the next generation of data science leaders<br />
through education and training programs in data science.<br />
Organizations that committed to the establishment of the<br />
Consortium have been designated as Founding Members<br />
(see page 18) and share the responsibility of overseeing and<br />
directing the work of the Consortium through their participation<br />
in various committees and working groups. Membership<br />
fees support the Consortium’s activities.<br />
As part of its efforts to advance data science, the Consortium<br />
established an Annual Data Science Leadership <strong>Summit</strong>.<br />
The first summit was held on April 23 and 24, <strong>2013</strong><br />
in Chapel Hill, North Carolina, and the theme was “Data<br />
to Discovery: Genomes to Health”—a theme that was chosen<br />
Data to Discovery: Genomes to Health / page 1