Five Practices to Improve the Value of Data Science
Optimizing the Effectiveness of Data Science Teams
Data science efforts are only as good as the results they deliver. In this Big Data world, executives are relying on data scientists to help them move their company forward. Here are five ways companies can improve the value of data science.
Bob E. Hayes, PhD<br />
bob@analyticsweek.com<br />
@bobehayes<br />
Bob E. Hayes, PhD, Chief Research Officer<br />
email: bob@appuri.com<br />
phone: 206.886.0893<br />
twitter: @bobehayes
Improving the Value of Data Science
The value of data is measured by what you do with it, and organizations are relying on data scientists to extract insights from their data. Many data professionals, pundits and bloggers have asserted their opinions about how companies can leverage data science to their advantage. While these experts have made good points, it’s important to note that their assertions are simply opinions that need to be verified by data. I conducted a survey of data professionals to better understand what it means to be a data scientist and how to best leverage their unique skill set to unlock the value of business data. Based on that research, I discovered a few things that can help improve the effectiveness of data scientists and the work they do.
Data Scientists and the Practice of Data Science
Data scientists have many diverse skills. While we measured 25 distinct skills across five general skill types, a factor analysis of proficiency ratings of the 25 skills resulted in three distinct skill types. These skill areas included:
1. Business (Subject Matter Expertise)
2. Technology / Programming
3. Statistics / Math
Additionally, we found that there are four distinct job roles among these data professionals:
1. Developer (e.g., developer, engineer)
2. Researcher (e.g., researcher, scientist, statistician)
3. Creative (e.g., jack of all trades, artist, hacker)
4. Business Management (e.g., leader, business person, entrepreneur)
Figure 1. Data Scientists have different skill sets. From: Investigating Data Scientists, their Skills and Team Makeup
appuri.com | 206.886.0893 | info@appuri.com | @AppuriCorp<br />
Copyright © 2016 Appuri Inc
Data professionals in different job roles have different skill sets (see Figure 1). Not surprisingly, data professionals who identified as Developers reported the highest levels of proficiency in Technology and Programming skills compared to their counterparts. Additionally, Researchers reported the highest levels of proficiency in Statistics and Math while data professionals who identified as Business Management reported the highest levels of proficiency in Business. Finally, data professionals who identified as Creative reported moderate ratings across all skill sets, suggesting they are indeed jack-of-all-trades.
Finding a data scientist who is proficient in all data science skill areas is extremely difficult (see Figure 2). Data professionals rarely possess proficiency in all five skill areas at the level needed to be successful at work. In fact, the chance of finding a data professional with expert skills in all five data science skills is akin to finding a unicorn; they simply don't exist.
Figure 2. From: Data science skills and the improbable unicorn
Five Ways to Improve the Success of Data Science Projects
Given that there are different types of data scientists with unique skill sets, how can companies leverage data scientists to extract insights from their data? Following these five practices is a good start.
1. Adopt a team approach for your data science projects<br />
We found that data professionals who worked with other data professionals who had complementary skills were more satisfied with their work than when they did not work with another data professional. For example, Business Management professionals were more satisfied with the outcome of their work when they had quantitative-minded experts on their team (e.g., Math & Modeling and Statistics) compared to when they did not have them on their team. Also, Researchers were more satisfied with their work outcome when they were paired with experts in Business and Math & Modeling. Developers were more satisfied with their work outcomes when paired with an expert in Business. Creatives’ satisfaction with their work product was not affected by the presence of other experts; this finding is likely due to the fact that Creatives are not able to contribute sufficiently to teamwork success because they are not highly proficient in any of the data skills.
2. Employ the scientific method for data-intensive projects
Scientists have been getting insight from data for centuries using the scientific method. Formally defined, the scientific method is a body of techniques for objectively investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. The scientific method includes the collection of empirical evidence, subject to specific principles of reasoning. The application of the scientific method helps us be honest with ourselves and minimizes the chances of us arriving at the wrong conclusion. The scientific method plays a critical role in understanding any data, irrespective of their size or speed or variety.
Figure 3. The scientific method is a good approach to extracting value from data.
The scientific method follows these general steps (see Figure 3):
1. Formulate a question or problem statement
2. Generate a hypothesis that is testable
3. Gather/Generate data to understand the phenomenon in question. Data can be generated through experimentation; when we can’t conduct true experiments, data are obtained through observations and measurements.
4. Analyze data to test the hypotheses / Draw conclusions
5. Communicate results to interested parties or take action (e.g., change processes) based on the conclusions. Additionally, the outcome of the scientific method can help us refine our hypotheses for further testing.
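As one way to make step 4 concrete, the sketch below runs an exact two-sided permutation test on the difference between two group means. The report does not say which statistical tests were used in the survey, so the choice of test, the function name and the ratings are all assumptions made up for illustration.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns the share of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point noise does not drop exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Hypothetical satisfaction ratings, invented for this example only.
with_partner = [6, 7, 8, 9, 10]
without_partner = [1, 2, 3, 4, 5]

p_value = exact_permutation_p_value(with_partner, without_partner)
print(f"p = {p_value:.4f}")
```

Full enumeration is only practical for small groups; with larger samples a random subset of splits is normally drawn instead.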
When I map the three data science skills against the five steps of the scientific method, it's clear why data science skills are so important in extracting insight from data (see Figure 4). Proficiency in each of the three data science skills is required to successfully implement the scientific method as a way to get insights from data. Business knowledge is necessary to help formulate the right questions, generate hypotheses, gather data and communicate results. Technology/Programming skills are needed to gather/generate data and analyze data/test hypotheses. Finally, Statistics/Math skills are necessary to gather data, analyze data/test hypotheses and communicate results.
Figure 4. You need all three data science skills when you adopt the scientific method
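The mapping described above can be written down as a small lookup table. The sketch below encodes the skill-to-step assignments exactly as stated in the text and then checks which steps a given team cannot cover. The step labels and function names are shorthand chosen for this example, not terms from the study.

```python
# The five steps of the scientific method, in order (labels are shorthand).
STEPS = [
    "formulate question",
    "generate hypothesis",
    "gather/generate data",
    "analyze data/test hypotheses",
    "communicate results",
]

# Which steps each skill area supports, as described in the text.
SKILL_TO_STEPS = {
    "Business": {
        "formulate question",
        "generate hypothesis",
        "gather/generate data",
        "communicate results",
    },
    "Technology/Programming": {
        "gather/generate data",
        "analyze data/test hypotheses",
    },
    "Statistics/Math": {
        "gather/generate data",
        "analyze data/test hypotheses",
        "communicate results",
    },
}


def skills_by_step():
    """Invert the mapping: for each step, list the skills that support it."""
    return {
        step: [skill for skill, covered in SKILL_TO_STEPS.items() if step in covered]
        for step in STEPS
    }


def uncovered_steps(team_skills):
    """Return the steps that no skill on the team supports."""
    covered = set()
    for skill in team_skills:
        covered |= SKILL_TO_STEPS.get(skill, set())
    return [step for step in STEPS if step not in covered]


print(uncovered_steps(["Technology/Programming"]))
print(uncovered_steps(["Business", "Technology/Programming", "Statistics/Math"]))
```

A team with only Technology/Programming skills leaves the question, hypothesis and communication steps uncovered, which is the practical argument for combining complementary skills.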
3. Educate all data science team members on statistics<br />
Different data science job roles require different data science skills for success. Business Managers need to be savvy in statistics, machine learning and big and distributed data. Developers need skills in product design and development, systems administration and back-end programming. Creatives need skills in math, business development and graphical models. Researchers need skills in statistics, algorithms and simulations, and product design and development.
Figure 5. Top data science skills across different data scientists include statistics. From: 10 Data Science Skills You Need to Improve Project Success
It’s clear that data science skills are necessary for successful analytics projects. While top skills varied by data science job role, it is interesting to note that skills in statistics and math dominated the top 10 drivers of project outcomes (see Figure 5). In fact, the most important data science skill for project outcomes was Data Mining and Visualization Tools; if you are a data scientist, it doesn't matter if you are a Developer or Researcher or any other kind, it would benefit you to learn tools that help you mine and visualize data. The more proficient in these tools you become, the better you will feel about the outcome of your analytics projects.
4. Integrate your data silos<br />
Data scientists’ work is fueled by the data that are available to them. Unfortunately, data are often housed in different systems, making it difficult to collect and integrate them. For example, data scientists can use Google Analytics to understand customers’ search behavior. They leverage Mixpanel to learn how customers use their applications. They rely on Marketo to track the effectiveness of different forms of communication. They rely on Salesforce to track customer interactions throughout the lifecycle. The use of these separate tools results in data silos, each one housing a particular piece of the customer puzzle. While each data silo contains important pieces of information about your customers, if you don't connect those pieces across those different data silos, you're only seeing parts of the entire customer picture.
Analyzing each data source separately is limited by the variables in each data set. To get the complete picture of your customers, you need to connect the dots across the data silos. By integrating all your data, you will be able to analyze all your data to extract deeper insights into the causes of customer churn.
Siloed data sets prevent business leaders from gaining a complete understanding of their customers. In this scenario, analytics can only be conducted within one data silo at a time, restricting the set of information (i.e., variables) that can be used to describe a given phenomenon; your analytic models are likely underspecified (not using the complete set of useful predictors), decreasing your model's predictive power / increasing your model's error. The bottom line is that you are not able to make the best prediction about your customers because you don't have all the necessary information about them.
The integration of these disparate customer data silos helps your data science team identify the interrelationships among the different pieces of customer information, including their purchasing behavior, values, interests, attitudes about your brand, interactions with your brand and more. Integrating information/facts about your customers allows you to gain an understanding about how all the variables work together (i.e., are related to each other), driving deeper customer insight about why customers churn, recommend you and buy more from you.
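As a rough illustration of what integrating silos means at the data level, the sketch below merges per-source records on a shared customer identifier so that each customer ends up with one row of variables drawn from every source. The silo names, field names, customer IDs and values are hypothetical, and the sketch assumes a common customer key already exists across systems, which in practice is often the hard part.

```python
def integrate_silos(silos):
    """Combine per-source customer records into one row per customer.

    `silos` maps a source name to {customer_id: {field: value}}.
    Fields are prefixed with the source name so their origin stays visible.
    Customers missing from a source simply lack that source's fields.
    """
    integrated = {}
    for source_name, records in silos.items():
        for customer_id, fields in records.items():
            row = integrated.setdefault(customer_id, {})
            for field, value in fields.items():
                row[f"{source_name}.{field}"] = value
    return integrated


# Hypothetical extracts from three separate systems (values invented).
silos = {
    "web_analytics": {"c1": {"searches": 12}, "c2": {"searches": 3}},
    "product_usage": {"c1": {"weekly_sessions": 5}},
    "crm": {"c1": {"support_tickets": 1}, "c2": {"support_tickets": 4}},
}

integrated = integrate_silos(silos)
for customer_id, row in sorted(integrated.items()):
    print(customer_id, row)
```

Customer c2 has no product usage record here, so the merged row makes the gap explicit rather than hiding it inside a single-silo analysis.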
5. Adopt Machine Learning Capabilities<br />
While it is true that the value of all of your data is greater than the value of each data silo taken alone, extracting that value from the combined data in your data platform can be overwhelming. As the number of variables grows, so does the number of statistical tests you are able to run. Consequently, data scientists are utilizing the power of machine learning to help them get insights from extremely large data sets.
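To give a sense of scale, the short sketch below counts the pairwise comparisons alone (for example, one correlation per pair of variables) for a few variable counts. Restricting the count to pairs is a simplifying assumption; the text does not specify which tests it has in mind, and the function name is made up for this example.

```python
from math import comb


def pairwise_test_count(n_variables):
    """Number of distinct variable pairs, i.e. n * (n - 1) / 2."""
    return comb(n_variables, 2)


for n in (10, 100, 1000):
    print(f"{n} variables -> {pairwise_test_count(n)} pairwise tests")
```

Going from 10 to 1,000 variables raises the count from 45 to 499,500 pairs, which is why exploring every relationship by hand quickly becomes impractical.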
Iterative in nature, machine learning algorithms continually learn from data. The more data they ingest, the better they get. Based on math, statistics and probability, algorithms find connections among variables that help optimize important organizational outcomes. For our customers, their outcome of interest is customer churn. We apply machine learning on their data to understand which customers are likely to churn and why. By identifying their at-risk customers, our clients are able to decrease customer churn through proactive retention management efforts.
Summary<br />
Based on a study of hundreds of data scientists, we learned a lot about how to improve the effectiveness of data scientists. The practice of data science requires proficiency in a handful of specific data skills, including business acumen, technology / programming and statistics / math. Different data professionals report vastly different proficiency levels across these skills. Because data professionals tend to specialize in only one or two skill areas, organizations have a better chance of extracting value from their data when they adopt a team approach consisting of data scientists who have complementary skill sets.
Following the scientific method in our data projects helps us keep our biases in check and minimizes the chances of us arriving at the wrong conclusion. Through trial and error, the scientific method helps us uncover the reasons why variables are related to each other and the underlying processes that drive the observed relationships.
Additionally, it is important that your data science team understands statistics and data mining. Not only do Researchers get value from understanding statistics, but other kinds of data scientists (i.e., business management, creative, developer) also benefit greatly by knowing statistics.
Finally, we recommend that businesses integrate their data silos and apply the power of machine learning to help them connect the dots across their disparate data sources. The more you know about your customers, the better you are able to meet their needs and ensure they are receiving value from your solutions.
About Appuri<br />
------<br />
Appuri provides an enterprise-grade data platform for businesses to transform their customer data into deep customer experience insights. The Appuri platform helps businesses integrate their data silos and leverage the power of machine learning to analyze billions of events to better predict the causes of customer loyalty. These insights help businesses deliver a better customer experience to decrease customer churn, acquire new customers and deepen their relationship with existing customers.
Appuri, Inc.<br />
119 Pine St., Ste. 300<br />
Seattle, WA 98101<br />
info@appuri.com<br />
sales@appuri.com<br />