10.10.2016 Views

Five Practices to Improve the Value of Data Science



Optimizing the Effectiveness of Data Science Teams

Data science efforts are only as good as the results they deliver. In this Big Data world, executives are relying on data scientists to help them move their company forward. Here are five ways companies can improve the value of data science.

Bob E. Hayes, PhD, Chief Research Officer

email: bob@appuri.com

phone: 206.886.0893

twitter: @bobehayes


Improving the Value of Data Science

The value of data is measured by what you do with it, and organizations are relying on data scientists to extract insights from their data. Many data professionals, pundits and bloggers have asserted their opinions about how companies can leverage data science to their advantage. While these experts have made good points, it’s important to note that their assertions are simply opinions that need to be verified by data. I conducted a survey of data professionals to better understand what it means to be a data scientist and how to best leverage their unique skill set to unlock the value of business data. Based on that research, I discovered a few things that can help improve the effectiveness of data scientists and the work they do.

Data Scientists and the Practice of Data Science

Data scientists have many diverse skills. While we measured 25 distinct skills across five general skill types, a factor analysis of proficiency ratings of the 25 skills resulted in three distinct skill types. These skill areas included:

1. Business (Subject Matter Expertise)
2. Technology / Programming
3. Statistics / Math

Additionally, we found that there are four distinct job roles among these data professionals:

1. Developer (e.g., developer, engineer)
2. Researcher (e.g., researcher, scientist, statistician)
3. Creative (e.g., Jack of all trades, artist, hacker)
4. Business Management (e.g., leader, business person, entrepreneur)

Figure 1. Data Scientists have different skill sets. From: Investigating Data Scientists, their Skills and Team Makeup

appuri.com | 206.886.0893 | info@appuri.com | @AppuriCorp

Copyright © 2016 Appuri Inc



Data professionals in different job roles have different skill sets (see Figure 1). Not surprisingly, data professionals who identified as Developers reported the highest levels of proficiency in Technology and Programming skills compared to their counterparts. Additionally, Researchers reported the highest levels of proficiency in Statistics and Math, while data professionals who identified as Business Management reported the highest levels of proficiency in Business. Finally, data professionals who identified as Creative reported moderate ratings across all skill sets, suggesting they are indeed jacks-of-all-trades.

Figure 2. From: Data science skills and the improbable unicorn

Finding a data scientist who is proficient in all data science skill areas is extremely difficult (see Figure 2). Data professionals rarely possess proficiency in all five skill areas at the level needed to be successful at work. In fact, the chance of finding a data professional with expert skills in all five data science skills is akin to finding a unicorn; they simply don't exist.


Five Ways to Improve the Success of Data Science Projects

Given that there are different types of data scientists with unique skill sets, how can companies leverage data scientists to extract insights from their data? Following these five practices is a good start.

1. Adopt a team approach for your data science projects

We found that data professionals who worked with colleagues who had complementary skills were more satisfied with their work than when they worked without such colleagues. For example, Business Management professionals were more satisfied with the outcome of their work when they had quantitative-minded experts on their team (e.g., Math & Modeling and Statistics) compared to when they did not. Researchers were more satisfied with their work outcomes when they were paired with experts in Business and Math & Modeling. Developers were more satisfied with their work outcomes when paired with an expert in Business. Creatives’ satisfaction with their work product was not affected by the presence of other experts; this finding is likely because Creatives are not highly proficient in any of the data skills and so are not able to contribute sufficiently to teamwork success.

2. Employ the scientific method for data-intensive projects

Scientists have been getting insight from data for centuries using the scientific method. Formally defined, the scientific method is a body of techniques for objectively investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. The scientific method includes the collection of empirical evidence, subject to specific principles of reasoning. The application of the scientific method helps us be honest with ourselves and minimizes the chances of us arriving at the wrong conclusion. The scientific method plays a critical role in understanding any data, irrespective of their size or speed or variety.

Figure 3. The scientific method is a good approach to extracting value from data.

The scientific method follows these general steps (see Figure 3):

1. Formulate a question or problem statement
2. Generate a hypothesis that is testable
3. Gather/Generate data to understand the phenomenon in question. Data can be generated through experimentation; when we can’t conduct true experiments, data are obtained through observations and measurements.
4. Analyze data to test the hypotheses / Draw conclusions
5. Communicate results to interested parties or take action (e.g., change processes) based on the conclusions. Additionally, the outcome of the scientific method can help us refine our hypotheses for further testing.
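
As a rough illustration of steps 2 through 4, the sketch below tests one hypothetical hypothesis: that customers who contacted support churn at a different rate than customers who did not. The counts are invented for illustration and do not come from the survey, and the two-proportion z-test is only one reasonable choice of test, not a method prescribed by this report.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one rate."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: churned customers out of each group (step 3).
z, p_value = two_proportion_z_test(events_a=60, n_a=400, events_b=30, n_b=400)

# Step 4: draw a conclusion at a conventional alpha of 0.05.
reject_null = p_value < 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

A rejected null hypothesis here would feed step 5: the result is communicated, and the hypothesis is refined (for example, by asking which kinds of support contact matter) for further testing.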

When I map the three data science skills against the five steps of the scientific method, it's clear why data science skills are so important in extracting insight from data (see Figure 4). Proficiency in each of the three data science skills is required to successfully implement the scientific method as a way to get insights from data. Business knowledge is necessary to help formulate the right questions, generate hypotheses, gather data and communicate results. Technology/Programming skills are needed to gather/generate data and analyze data/test hypotheses. Finally, Statistics/Math skills are necessary to gather data, analyze data/test hypotheses and communicate results.

Figure 4. You need all three data science skills when you adopt the scientific method
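
The skill-to-step mapping described in the paragraph above can be written down as a small lookup table. The sketch below encodes that mapping and reports which steps a given team cannot cover; the step labels and the `uncovered_steps` helper are illustrative names, not part of the original study.

```python
# Steps of the scientific method, as listed in the text.
STEPS = [
    "formulate question",
    "generate hypothesis",
    "gather/generate data",
    "analyze data/test hypotheses",
    "communicate results",
]

# Which steps each skill area supports, following the paragraph above.
SKILL_TO_STEPS = {
    "Business": {
        "formulate question",
        "generate hypothesis",
        "gather/generate data",
        "communicate results",
    },
    "Technology/Programming": {
        "gather/generate data",
        "analyze data/test hypotheses",
    },
    "Statistics/Math": {
        "gather/generate data",
        "analyze data/test hypotheses",
        "communicate results",
    },
}


def uncovered_steps(team_skills):
    """Return the steps that no skill on the team supports, in step order."""
    covered = set()
    for skill in team_skills:
        covered |= SKILL_TO_STEPS[skill]
    return [step for step in STEPS if step not in covered]


print(uncovered_steps(["Technology/Programming"]))
print(uncovered_steps(list(SKILL_TO_STEPS)))
```

Under this encoding, a team with only Technology/Programming skills leaves the question, hypothesis and communication steps uncovered, while a team with all three skill areas covers every step.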


3. Educate all data science team members on statistics

Different data science job roles require different data science skills for success. Business Managers need to be savvy in statistics, machine learning, and big and distributed data. Developers need skills in product design and development, systems administration and back-end programming. Creatives need skills in math, business development and graphical models. Researchers need skills in statistics, algorithms and simulations, and product design and development.

Figure 5. Top data science skills across different data scientists include statistics. From: 10 Data Science Skills You Need to Improve Project Success

It’s clear that data science skills are necessary for successful analytics projects. While top skills varied by data science job role, it is interesting to note that skills in statistics and math dominated the top 10 drivers of project outcomes (see Figure 5). In fact, the most important data science skill for project outcome was Data Mining and Visualization Tools; if you are a data scientist, it doesn't matter if you are a Developer or Researcher or any other kind, it would benefit you to learn tools that help you mine and visualize data. The more proficient in these tools you become, the better you will feel about the outcome of your analytics projects.


4. Integrate your data silos

Data scientists’ work is fueled by the data that are available to them. Unfortunately, data are often housed in different systems, making it difficult to collect and integrate them. For example, data scientists can use Google Analytics to understand customers’ search behavior. They leverage Mixpanel to learn how customers use their applications. They rely on Marketo to track the effectiveness of different forms of communication. They rely on Salesforce to track customer interactions throughout the lifecycle. The use of these separate tools results in data silos, each one housing a particular piece of the customer puzzle. While each data silo contains important pieces of information about your customers, if you don't connect those pieces across those different data silos, you're only seeing parts of the entire customer picture.

Analyzing each data source separately is limited by the variables in each data set. To get the complete picture of your customers, you need to connect the dots across the data silos. By integrating your data, you can analyze it as a whole to extract deeper insights into the causes of customer churn.
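
A minimal sketch of what integration means in practice is shown below: records from separate silos are joined on a shared customer key so that each customer ends up with one combined profile. The silo contents, field names and the `integrate` helper are hypothetical, and the sketch assumes every silo already exposes a common `customer_id`, which is often the hard part in real systems.

```python
from collections import defaultdict

# Hypothetical exports from three separate tools, each keyed by customer_id.
web_analytics = [
    {"customer_id": "c1", "searches": 14},
    {"customer_id": "c2", "searches": 3},
]
product_usage = [
    {"customer_id": "c1", "weekly_logins": 9},
    {"customer_id": "c3", "weekly_logins": 1},
]
crm = [
    {"customer_id": "c1", "support_tickets": 0},
    {"customer_id": "c2", "support_tickets": 4},
]


def integrate(**silos):
    """Outer-join silo records on customer_id into one profile per customer."""
    profiles = defaultdict(dict)
    for silo_name, records in silos.items():
        for record in records:
            customer_id = record["customer_id"]
            for field, value in record.items():
                if field != "customer_id":
                    # Prefix each field with its silo to avoid name clashes.
                    profiles[customer_id][f"{silo_name}.{field}"] = value
    return dict(profiles)


profiles = integrate(web=web_analytics, usage=product_usage, crm=crm)
for customer_id, profile in sorted(profiles.items()):
    print(customer_id, profile)
```

Customers that appear in only some silos still get a profile, with the missing fields simply absent; how to handle those gaps is a modeling decision this sketch leaves open.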

Siloed data sets prevent business leaders from gaining a complete understanding of their customers. In this scenario, analytics can only be conducted within one data silo at a time, restricting the set of information (i.e., variables) that can be used to describe a given phenomenon. As a result, your analytic models are likely underspecified (not using the complete set of useful predictors), which decreases their predictive power and increases their error. The bottom line is that you are not able to make the best prediction about your customers because you don't have all the necessary information about them.
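
The effect of an underspecified model can be sketched with ordinary least squares on a toy data set. In the example below, `logins` stands in for a variable from one silo and `tickets` for a variable from another; the values and the `churn_score` outcome are invented for illustration. The model that sees only one silo leaves far more unexplained error than the model that sees both.

```python
from statistics import mean


def centered(values):
    """Subtract the mean so the intercept drops out of the fit."""
    m = mean(values)
    return [v - m for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sse_one_predictor(x, y):
    """Sum of squared errors for the least-squares fit y ~ a + b*x."""
    xc, yc = centered(x), centered(y)
    b = dot(xc, yc) / dot(xc, xc)
    return sum((yi - b * xi) ** 2 for xi, yi in zip(xc, yc))


def sse_two_predictors(x1, x2, y):
    """Sum of squared errors for the fit y ~ a + b1*x1 + b2*x2."""
    u, v, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(u, u), dot(v, v), dot(u, v)
    s1y, s2y = dot(u, yc), dot(v, yc)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return sum((yi - b1 * ui - b2 * vi) ** 2 for ui, vi, yi in zip(u, v, yc))


# Hypothetical data: one variable from each of two silos, plus an outcome.
logins = [1, 2, 3, 4, 5, 6, 7, 8]
tickets = [5, 1, 4, 0, 3, 2, 1, 0]
churn_score = [19.5, 9.5, 15.0, 6.5, 10.5, 8.0, 5.5, 1.5]

sse_reduced = sse_one_predictor(logins, churn_score)
sse_full = sse_two_predictors(logins, tickets, churn_score)
print(f"one silo: SSE = {sse_reduced:.2f}, both silos: SSE = {sse_full:.2f}")
```

On the training data a nested model can never fit worse than its reduced version, so this comparison only shows the direction of the effect; judging real predictive power would require held-out data.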

The integration of these disparate customer data silos helps your data science team identify the interrelationships among the different pieces of customer information, including their purchasing behavior, values, interests, attitudes about your brand, interactions with your brand and more. Integrating information/facts about your customers allows you to gain an understanding about how all the variables work together (i.e., are related to each other), driving deeper customer insight about why customers churn, recommend you and buy more from you.


5. Adopt Machine Learning Capabilities

While it is true that the value of all of your data is greater than the value of each data silo taken alone, extracting that value from the combined data in your data platform can be overwhelming. As the number of variables grows, so does the number of statistical tests you are able to run. Consequently, data scientists are utilizing the power of machine learning to help them get insights from extremely large data sets.
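
To give a sense of scale, the sketch below counts how many pairwise tests are possible for a given number of variables, and how many of those would be expected to look significant by chance alone if no real relationships existed. Restricting attention to pairwise tests and using an alpha of 0.05 are assumptions made for illustration.

```python
from math import comb


def pairwise_tests(n_variables):
    """Number of distinct variable pairs, i.e. possible pairwise tests."""
    return comb(n_variables, 2)


def expected_chance_findings(n_tests, alpha=0.05):
    """Expected false positives if every null hypothesis were true."""
    return n_tests * alpha


for n in (10, 100, 1000):
    tests = pairwise_tests(n)
    print(n, tests, expected_chance_findings(tests))
```

The count grows roughly with the square of the number of variables, which is one reason manual, test-by-test analysis stops being practical on an integrated data set.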

Iterative in nature, machine learning algorithms continually learn from data. Generally, the more data they ingest, the better they get. Based on math, statistics and probability, algorithms find connections among variables that help optimize important organizational outcomes. For our customers, the outcome of interest is customer churn. We apply machine learning to their data to understand which customers are likely to churn and why. By identifying their at-risk customers, our clients are able to decrease customer churn through proactive retention management efforts.
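
As a hedged illustration of the general idea (not a description of Appuri's actual models), the sketch below fits a small logistic regression by gradient descent and uses it to score churn risk. The features, labels, learning rate and the 0.5 risk threshold are all hypothetical choices.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_logistic(rows, labels, learning_rate=0.05, epochs=5000):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            inputs = [1.0] + list(features)
            error = sigmoid(sum(w * x for w, x in zip(weights, inputs))) - label
            for i, x in enumerate(inputs):
                gradient[i] += error * x
        # Iterative update: each pass over the data nudges the weights.
        weights = [
            w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)
        ]
    return weights


def churn_risk(weights, features):
    """Predicted probability of churn for one customer."""
    inputs = [1.0] + list(features)
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))


# Hypothetical history: (weekly_logins, support_tickets) and churned (1) or not (0).
rows = [(2, 3), (1, 4), (3, 5), (1, 2), (2, 4), (9, 0), (8, 1), (7, 0), (6, 2), (8, 2)]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

weights = train_logistic(rows, labels)
at_risk = [row for row in rows if churn_risk(weights, row) > 0.5]
print("at-risk customers:", at_risk)
```

The fitted weights also hint at the "why": in this toy data, a negative weight on logins and a positive weight on support tickets would suggest that low engagement and frequent support contact go along with churn.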

Summary

Based on a study of hundreds of data scientists, we learned a lot about how to improve the effectiveness of data scientists. The practice of data science requires proficiency in a handful of specific data skills, including business acumen, technology / programming and statistics / math. Different data professionals report vastly different proficiency levels across these skills. Because data professionals tend to specialize in only one or two skill areas, organizations have a better chance of extracting value from their data when they adopt a team approach consisting of data scientists who have complementary skill sets.

Following the scientific method in our data projects helps us keep our biases in check and minimizes the chances of us arriving at the wrong conclusion. Through trial and error, the scientific method helps us uncover the reasons why variables are related to each other and the underlying processes that drive the observed relationships.


Additionally, it is important that your data science team understands statistics and data mining. Not only do Researchers get value from understanding statistics, but other kinds of data scientists (i.e., business management, creative, developer) also benefit greatly by knowing statistics.

Finally, we recommend that businesses integrate their data silos and apply the power of machine learning to help them connect the dots across their disparate data sources. The more you know about your customers, the better you are able to meet their needs and ensure they are receiving value from your solutions.

About Appuri

------<br />

Appuri provides an enterprise-grade data platform for businesses to transform their customer data into deep customer experience insights. The Appuri platform helps businesses integrate their data silos and leverage the power of machine learning to analyze billions of events to better predict the causes of customer loyalty. These insights help businesses deliver a better customer experience to decrease customer churn, acquire new customers and deepen their relationship with existing customers.

Appuri, Inc.

119 Pine St., Ste. 300

Seattle, WA 98101

info@appuri.com

sales@appuri.com
