The Data Lake Survival Guide
[Figure: two panels, Analytics Development and Analytics Implementation. Labeled elements: Data Set, Data Scientist, Analytic Exploration, New Knowledge, User, Passive Decision Support, Interactive Decision Support, Automation.]<br />
Figure 4. Analytics and BI, Development and Implementation<br />
Alternatively, the knowledge may be automatically included in an operational system,<br />
improving it in some way. The illustration does not try to elaborate on how an analytic<br />
discovery is made operational, since this varies according to context.<br />
The Data Lake Dynamic
The fundamental assumption of the data warehouse architecture was that there<br />
needed to be a very powerful query engine (database) at the center of the data flow.<br />
It thus suggested a centralized architecture where, first of all, data flowed to the data<br />
warehouse. It was then used in place or distributed from there for use elsewhere.<br />
The fatal flaw of this architecture was that it did not scale out well. However, this<br />
limitation did not become apparent until a whole series of forces came into play. They<br />
were:<br />
• The need to analyze unstructured data, both external and internal. The need for<br />
this continues to grow.<br />
• External data sources began to multiply. Particularly prominent among these was social<br />
media data, but it was by no means the only source. Until recently, selling or<br />
renting data was a niche activity, but this has ceased to be the case. An expanding<br />
amount of valuable data is now bought and sold publicly.<br />
• Traditionally, analytics applications lived in “walled gardens” served by their<br />
own data mart. However, a data lake could serve as an analytics sandbox<br />