The Data Lake Survival Guide
provided by orders, contact records and so on. If you look at a set of customer data
it may yield information that is not part of any single record. As soon as you have
multiple records you can calculate categories (gender, age, location, etc.). From a
“business intelligence” perspective you are creating information with such activities.
When you link customer data to all the orders placed by a customer, you can generate
useful information about that customer’s buying patterns.
We define information to be: collections of data linked together for human consumption.<br />
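The linking of customer data to orders described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`customer_id`, `product`, `amount`) and the `buying_patterns` helper are hypothetical and do not come from the guide.

```python
from collections import defaultdict

# Hypothetical sample data: field names and values are illustrative only.
customers = [
    {"customer_id": 1, "name": "Ada", "location": "Leeds"},
    {"customer_id": 2, "name": "Ben", "location": "York"},
]
orders = [
    {"customer_id": 1, "product": "tea", "amount": 12.0},
    {"customer_id": 1, "product": "tea", "amount": 8.0},
    {"customer_id": 1, "product": "mugs", "amount": 15.0},
    {"customer_id": 2, "product": "coffee", "amount": 15.0},
]


def buying_patterns(customers, orders):
    """Link each customer to their orders and summarise what they buy."""
    # Group the orders by the customer who placed them.
    orders_by_customer = defaultdict(list)
    for order in orders:
        orders_by_customer[order["customer_id"]].append(order)

    summary = {}
    for customer in customers:
        linked = orders_by_customer.get(customer["customer_id"], [])
        # Total spend per product for this customer.
        spend_by_product = defaultdict(float)
        for order in linked:
            spend_by_product[order["product"]] += order["amount"]
        summary[customer["name"]] = {
            "order_count": len(linked),
            "total_spend": sum(order["amount"] for order in linked),
            # The product with the highest spend, or None if there are no orders.
            "top_product": (
                max(spend_by_product, key=spend_by_product.get)
                if spend_by_product
                else None
            ),
        }
    return summary


patterns = buying_patterns(customers, orders)
```

Neither the order count nor the top product is stored in any single customer or order record. Both appear only once the records are linked, which is the sense in which information is created.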
In general BI products are software products that present information, possibly as
reports or visually on dashboards. Some BI products are interactive, enabling the user
to slice and dice the information. BI tools such as spreadsheets or OLAP tools can be
thought of as user workbenches for the further analysis of information. The databases
and data warehouses that feed such tools store information in a semi-refined form.
Knowledge
We define knowledge to be information that has been refined to the point where it is
actionable.<br />
Consider the BI tools that simply present information for decision support. The user
knows their own context and consumes the information to take an action, such as
resolving an insurance claim or approving a loan. The user has the knowledge of what
to do and the BI tool assists by providing information.
In the case of the BI tools that enable data exploration, the user has some idea of what
he or she needs to know, explores the information to create that knowledge and then
applies it to their business context. As such the knowledge of how to explore the data
lives in the user. Such a user can accurately be described as a knowledge worker.
Knowledge can also be stored in computer systems. This is where we encounter rule-based
systems and all the technology that is normally classified as AI. But knowledge
manifests in computer systems in many other ways. The people who run a business
create business processes to carry out particular activities. These are normally
improved over time on the basis of acquired experience (feedback). They may even be
fully automated: converted into software and implemented without the need for any
human intervention. This is implemented knowledge.
Indeed, all software, no matter what it does, can be classified as implemented
knowledge. Nevertheless, within any business there will also be other knowledge:
rules, procedures, guidelines and policies that are not automated and are implemented
by staff.
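As a rough sketch of what implemented knowledge can look like, the following Python fragment encodes a loan-approval policy of the kind staff might otherwise apply by hand. Everything in it is an assumption made for illustration: the `LoanApplication` fields, the two thresholds and the `decide` function are hypothetical and are not taken from the guide or from any real lending policy.

```python
from dataclasses import dataclass


@dataclass
class LoanApplication:
    annual_income: float
    requested_amount: float
    missed_payments: int


# Hypothetical thresholds, standing in for a policy the business would set.
MAX_LOAN_TO_INCOME = 4.0
MAX_MISSED_PAYMENTS = 1


def decide(application: LoanApplication) -> str:
    """Apply the rules in order and return 'approve', 'decline' or 'refer'."""
    if application.annual_income <= 0:
        return "decline"
    if application.missed_payments > MAX_MISSED_PAYMENTS:
        return "decline"
    ratio = application.requested_amount / application.annual_income
    if ratio <= MAX_LOAN_TO_INCOME:
        return "approve"
    # Anything the rules do not settle is referred to a member of staff.
    return "refer"
```

The "refer" outcome reflects the point made above: part of the knowledge is automated, while the remainder stays with the people who apply guidelines and policies themselves.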
Data analytics, or Data Science as it has now been named, is the activity of trying to
discover new knowledge from data by applying mathematical techniques to reveal
previously unknown patterns. It is science of a kind, in the sense that the data scientist
may formulate and then test hypotheses, although there are some brute force techniques
that can discover patterns without the need to hypothesize. It is not the only way to
discover new knowledge, but it can be a very powerful and rewarding route.
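One possible illustration of a brute force technique is to count every pair of items that appears together, without first guessing which pairs matter. The guide does not name a specific method, so the sketch below is an assumption: the basket data and the `frequent_pairs` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, used only to illustrate the idea.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]


def frequent_pairs(baskets, min_count=2):
    """Count every item pair in every basket and keep the frequent ones."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


pairs = frequent_pairs(baskets)
```

No hypothesis is stated in advance here. The code enumerates all candidate pairs and lets the counts show which ones recur, at the cost of work that grows quickly with the number of items per basket.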