03.04.2017 Views

The Data Lake Survival Guide



If we now look at Figure 10, which illustrates the complete data lake, the only two processes we have not yet discussed are data extracts and data life-cycle management.

While data lake processing can be fast, applications that require particularly high performance will need data exported to a fast data engine or database. It will probably be many years before data lake access speeds come close to those of a purpose-built database.
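As a minimal sketch of such an extract, assuming the lake data has already been read into plain Python tuples, the example below loads a small set of records into an indexed SQLite table so that a latency-sensitive application can query it directly. The table name `sales`, its columns, and the helper `export_extract` are hypothetical and chosen only for illustration; SQLite stands in for whatever fast data engine a real deployment would use.

```python
import sqlite3


def export_extract(records, db_path=":memory:"):
    """Load (customer_id, amount) rows into an indexed SQLite table.

    `records` stands in for data already read from the lake; the schema
    is a hypothetical example, not one prescribed by the guide.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE sales (customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    # The index is what gives the target database its fast lookups.
    conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
    conn.commit()
    return conn


# Hypothetical records extracted from the lake.
lake_records = [("c1", 10.0), ("c2", 5.5), ("c1", 4.5)]
conn = export_extract(lake_records)

# The application now queries the extract rather than the lake itself.
total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE customer_id = ?", ("c1",)
).fetchone()[0]
```

The design point is that the extract is a copy shaped for one workload: the lake remains the system of record, and the indexed copy can be rebuilt from it whenever needed.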

The truth is that the focus of data lake architecture is data ingest and data governance. There are many processes, all of them important, competing for the same resources.

Figure 10. The Data Lake Complete

[Figure labels. Data sources: Servers, Desktops, Mobile, Network Devices, Embedded Chips, RFID, IoT, The Cloud, OSes, VMs, Log Files, Sys Mgt Apps, ESBs, Web Services, SaaS, Business Apps, Office Apps, BI Apps, Workflow, Data Streams, Social... Processes around the DATA LAKE: Data Governance, Data Lake Mgt, Ingest, Transform & Aggregate, Archive, Data Security, Life Cycle Mgt, Real-Time Apps, Metadata Mgt, Data Cleansing, Extracts, Search & Query, BI, Visual'n & Analytics, Other Apps. Extracts go to: Databases, Data Marts, Other Apps.]

There will always be a limit to the capacity of the data lake, and governance processes naturally take priority, so it will prove necessary to replicate data to other data lakes or data marts to properly serve some applications or users.

As regards data archive, data life-cycle management can be regarded as an aspect of data governance. It is best thought of as a background process. The exact rules for whether and when data must be deleted may be influenced by business imperatives (such as regulation), but may also be determined by storage costs. Ideally, archiving will be an automatic process.
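One way to picture an automatic archive process is as a rule evaluated in the background against each dataset. The sketch below is an illustration under stated assumptions, not a policy taken from this guide: the `Dataset` fields, the `lifecycle_action` function, and the one-year and seven-year thresholds are all hypothetical. In practice the thresholds would be set by the applicable regulation and by the storage costs involved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Dataset:
    name: str
    last_accessed: date
    # Hypothetical regulatory hold: data must not be deleted before this date.
    retention_until: Optional[date] = None


def lifecycle_action(ds: Dataset, today: date,
                     archive_after_days: int = 365,
                     delete_after_days: int = 7 * 365) -> str:
    """Return 'keep', 'archive', or 'delete' for one dataset.

    The thresholds are illustrative assumptions; cheaper storage would
    argue for longer ones, and regulation may forbid deletion outright.
    """
    idle_days = (today - ds.last_accessed).days
    under_hold = ds.retention_until is not None and today <= ds.retention_until
    if idle_days >= delete_after_days and not under_hold:
        return "delete"
    if idle_days >= archive_after_days:
        return "archive"
    return "keep"


today = date(2017, 4, 3)
recent = Dataset("clickstream", last_accessed=date(2017, 1, 1))
stale = Dataset("old_logs", last_accessed=date(2015, 1, 1))
ancient = Dataset("legacy_orders", last_accessed=date(2009, 1, 1))
held = Dataset("audit_trail", last_accessed=date(2009, 1, 1),
               retention_until=date(2020, 1, 1))

actions = {ds.name: lifecycle_action(ds, today)
           for ds in (recent, stale, ancient, held)}
```

A scheduler could run a rule like this periodically over the lake's metadata catalog, which is what makes archiving a background process rather than a manual task. Note that a dataset under a regulatory hold is archived rather than deleted, however long it has been idle.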
