03.04.2017 Views

The Data Lake Survival Guide



Figure 4. Analytics and BI, Development and Implementation

[Diagram not reproduced. Panels: Analytics Development, Analytics Implementation. Elements: Data Set, Data Scientist, User, Analytic Exploration, New Knowledge, Passive Decision Support, Interactive Decision Support, Automation.]

Alternatively, the knowledge may be automatically included in an operational system, improving it in some way. The illustration does not try to elaborate on how an analytic discovery is made operational, since this varies according to context.

The Data Lake Dynamic

The fundamental assumption of the data warehouse architecture was that there needed to be a very powerful query engine (database) at the center of the data flow. It thus suggested a centralized architecture in which data first flowed to the data warehouse. It was then either used in place or distributed from there for use elsewhere. The fatal flaw of this architecture was that it did not scale out well. However, this limitation did not become apparent until a whole series of forces came into play. They were:

• The need to analyze unstructured data, both external and internal, emerged and continues to grow.

• External data sources began to multiply. Social media data was particularly prominent, but it was by no means the only source. Until recently, selling or renting data was a niche activity, but this has ceased to be the case. An expanding amount of valuable data is now bought and sold publicly.

• Traditionally, analytics applications lived in “walled gardens” served by their own data mart. However, a data lake could serve as an analytics sandbox
