03.04.2017

The Data Lake Survival Guide



provided by orders, contact records and so on. If you look at a set of customer data, it may yield information that is not part of any single record. As soon as you have multiple records you can calculate categories (gender, age, location, etc.). From a “business intelligence” perspective, such activities create information. When you link customer data to all the orders placed by a customer, you can generate useful information about that customer’s buying patterns.
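A minimal sketch of the two activities just described, assuming hypothetical in-memory customer and order records (the field names `customer_id`, `age_band`, `product` and `amount`, and the helpers `count_by` and `buying_patterns`, are illustrative, not from the source). Counting customers per category yields information no single record holds, and linking each customer to their orders summarises that customer's buying pattern.

```python
from collections import Counter, defaultdict

# Hypothetical customer records: each one on its own is just data.
customers = [
    {"customer_id": 1, "gender": "F", "age_band": "25-34", "location": "Leeds"},
    {"customer_id": 2, "gender": "M", "age_band": "35-44", "location": "York"},
    {"customer_id": 3, "gender": "F", "age_band": "25-34", "location": "York"},
]

# Hypothetical order records, each referring to a customer by id.
orders = [
    {"customer_id": 1, "product": "tea", "amount": 4.50},
    {"customer_id": 1, "product": "coffee", "amount": 6.00},
    {"customer_id": 3, "product": "tea", "amount": 4.50},
    {"customer_id": 1, "product": "tea", "amount": 4.50},
]

def count_by(records, field):
    """Count records per value of `field`: a category only visible across many records."""
    return Counter(r[field] for r in records)

def buying_patterns(customers, orders):
    """Link each customer to their orders and summarise count, spend and favourite product."""
    by_customer = defaultdict(list)
    for order in orders:
        by_customer[order["customer_id"]].append(order)
    patterns = {}
    for c in customers:
        placed = by_customer.get(c["customer_id"], [])
        products = Counter(o["product"] for o in placed)
        patterns[c["customer_id"]] = {
            "orders": len(placed),
            "total_spend": round(sum(o["amount"] for o in placed), 2),
            "favourite": products.most_common(1)[0][0] if products else None,
        }
    return patterns

print(count_by(customers, "age_band"))
print(buying_patterns(customers, orders))
```

In a real setting the same link would usually be a database join rather than a Python loop; the loop simply keeps the example self-contained.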

We define information to be collections of data linked together for human consumption.

In general, BI products are software products that present information, either as reports or visually on dashboards. Some BI products are interactive, enabling the user to slice and dice the information. BI tools such as spreadsheets or OLAP tools can be thought of as user workbenches for the further analysis of information. The databases and data warehouses that feed such tools store information in a semi-refined form.

Knowledge

We define knowledge to be information that has been refined to the point where it is actionable.

Consider the BI tools that simply present information for decision support. The user knows their own context and consumes the information to take an action, such as resolving an insurance claim or approving a loan. The user has the knowledge of what to do, and the BI tool assists by providing information.

In the case of the BI tools that enable data exploration, the user has some idea of what he or she needs to know, explores the information to create that knowledge and then applies it to the business context. As such, the knowledge of how to explore the data lives in the user. Such a user can accurately be described as a knowledge worker.

Knowledge can also be stored in computer systems. This is where we encounter rules-based systems and all the technology that is normally classified as AI. But knowledge manifests in computer systems in many other ways. The people who run a business create business processes to carry out particular activities. These are normally improved over time on the basis of acquired experience (feedback). They may even be fully automated, that is, converted into software and run without the need for any human intervention. This is implemented knowledge.
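The idea of implemented knowledge can be made concrete with a hypothetical example: a loan-approval guideline that staff might once have applied by hand, written down as a small rule set. The `LoanApplication` fields, the `RULES` list, the `assess` helper and all thresholds are invented for illustration; they are not from the source or from any real lending policy.

```python
from dataclasses import dataclass

@dataclass
class LoanApplication:
    # Hypothetical fields; a real application would carry far more.
    annual_income: float
    requested_amount: float
    existing_debt: float
    missed_payments_last_year: int

# Each rule pairs a human-readable reason with a predicate that must hold.
# The thresholds are illustrative assumptions only.
RULES = [
    ("income must be at least 3x the requested amount",
     lambda a: a.annual_income >= 3 * a.requested_amount),
    ("existing debt must be under 40% of income",
     lambda a: a.existing_debt < 0.4 * a.annual_income),
    ("no more than one missed payment in the last year",
     lambda a: a.missed_payments_last_year <= 1),
]

def assess(application):
    """Apply every rule; approve only if none fail, and report the reasons that failed."""
    failed = [reason for reason, holds in RULES if not holds(application)]
    return (len(failed) == 0, failed)

approved, reasons = assess(LoanApplication(60_000, 15_000, 10_000, 0))
print(approved, reasons)
```

Once a guideline lives in code like this, the decision runs without human intervention; changing the policy means changing the rule list rather than retraining staff.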

Indeed, all software, no matter what it does, can be classified as implemented knowledge. Nevertheless, within any business there will also be other knowledge: rules, procedures, guidelines and policies that are not automated and are implemented by staff.

Data analytics, or Data Science as it is now called, is the activity of trying to discover new knowledge from data by applying mathematical techniques to reveal previously unknown patterns. It is science of a kind, in the sense that the data scientist may formulate and then test hypotheses, although some brute-force techniques can discover patterns without the need to hypothesize. It is not the only way to discover new knowledge, but it can be a very powerful and rewarding route.
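As a minimal sketch of the brute-force route, assuming hypothetical shopping baskets and an illustrative `min_support` threshold: count every pair of products bought together and keep the pairs that appear in at least that fraction of baskets (their support). No hypothesis is formed in advance; the pattern falls out of exhaustive counting. This is a simplified form of the support-counting step used in frequent-itemset mining, not a method taken from the source, and the `frequent_pairs` helper is hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (fraction of baskets containing both) >= min_support."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort distinct items so each pair is counted once, in a canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}

# Hypothetical baskets; in practice these would come from order data.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "tea"],
    ["bread", "butter", "tea"],
]
print(frequent_pairs(baskets, min_support=0.5))
```

Exhaustive pair counting grows quadratically with basket size, which is why practical frequent-itemset algorithms prune candidates; the sketch trades that efficiency for clarity.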
