Do We Still Need a Data Warehouse?

Gartner Business Intelligence & Information Management Summit

June 23-24, 2015 | São Paulo, Brazil

Do We Still Need a Data Warehouse?

Donald Feinberg

This presentation, including any supporting materials, is owned by Gartner, Inc. and/or its affiliates and is for the sole use of the intended Gartner audience or other intended recipients. This presentation may contain information that is confidential, proprietary or otherwise legally protected, and it may not be further copied, distributed or publicly displayed without the express written permission of Gartner, Inc. or its affiliates.

© 2015 Gartner, Inc. and/or its affiliates. All rights reserved.


Do You Actually Know What a Data Warehouse Is?

• Consolidated, integrated, subject-oriented, time-variant data management solution.
• It never was …
– A database.
– A DBMS.
– An appliance.
– Built in.
– … done.

Things are not always what you first see, first experience, first build, first encounter.


Key Issues

1. What aspects of a data warehouse methodology are still required?
2. How does the concept of a logical data warehouse balance the needs of agility and governance?
3. What are your architectural options?


What Is a Data Warehouse?

• Architectural Construct:
– Integrated
– Time variant
– Subject orientation
– Analytic orientation
– Service level driven
– Mission-critical
• Physical Implementation:
– Centrally stored data
– Modeled
– Mixed workload
– Optimized
– Servers
– Storage


The Data Warehouse Was a Bridge to "Near" Information Shores … Now We Are in Orbit!

[Figure labels: Evolution, Craftsmanship, Adaptation]

The data warehouse crossed relatively simple information usage barriers using binary dichotomy. Go "off world." Embrace more than local topography.


You Still Need All of the Analytics Data Management SLAs

But these SLAs are mostly incompatible!

• Compromise:
– The ".AND."
– Pervasive.
– Persistent.
– Latent.
– Source "write."
– Target "write."
– Static optimization for all 5.
• Contender:
– The ".OR."
– Diverse.
– Transient.
– Native.
– Source "write."
– Target "read."
– Optimization focuses on the dominant.
• Candidate:
– ?
– Unique.
– Ambivalent.
– Contrary.
– Source "read."
– Target "read."
– Optimized by processing.

Since you must continue to meet the demand of the data warehouse, you must continue to support the data warehouse.
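
One way to read the source and target "write"/"read" labels above is as schema on write versus schema on read. That reading is an interpretation, not something the slide states outright. The minimal Python sketch below contrasts the two under that assumption. The `SCHEMA` fields, the JSON input format and both function names are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical target schema: field name -> type used to coerce the raw value.
SCHEMA = {"order_id": int, "amount": float, "region": str}


def conform(record, schema=SCHEMA):
    """Coerce one raw record to the schema; raises if a field is missing or invalid."""
    return {name: cast(record[name]) for name, cast in schema.items()}


def load_schema_on_write(raw_lines):
    """Structure is enforced once, when data is stored (fails on the first bad record)."""
    return [conform(json.loads(line)) for line in raw_lines]


def query_schema_on_read(raw_lines):
    """Raw text is kept as-is; structure is applied each time the data is read."""
    for line in raw_lines:
        try:
            yield conform(json.loads(line))
        except (KeyError, ValueError, TypeError):
            # This particular reader chooses to skip records that do not fit.
            continue
```

With schema on write, a malformed record stops the load, so every later reader sees conformed data. With schema on read, each consumer decides at query time how to treat records that do not fit.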


You Still Need Best Practices For …

• Source system identification.
• Determining whether to instantiate data.
• Optimization by end-user class.
• Optimization for data processes.
• Requirements for static reporting and "canned" analytics.
• Information life cycle.
• Security management.
• Data governance & quality.
• Data preparation.

Moving data to a central location or giving a central point of access to many data locations is the new warehouse. Moving data to a central location with less transformation actually puts that job on the user.


Funding the Need Somewhere Else Doesn't Mean It's Not a Warehouse

• Appliances are platforms.
• Cloud? IaaS! PaaS! DBaaS!
• Models are customized.
• Source system access differs.
• Software and servers are "pieces."
• Distributing cost among many in "self-service" doesn't make it disappear.

You need a different warehouse that includes the old warehouse, but you still cannot just "buy" a logical data warehouse (LDW).


"New" Big Data and "Rough" Data Volume Exceeds but Does Not Eliminate "Curated" Data!

[Chart: % of volume contribution (axis 0 to 120) for curated data (traditional), rough data (virtual) and new (big) data.]

About 20% of the data warehouse market can be considered leading implementations. By 2018, logical data warehouses in half of them will have combinations of all three of these data management approaches.


The Notion of "Fit for Purpose" Has Evolved and Is Supported in the LDW

[Chart: % of volume contribution (axis 0 to 100) by year (2005, 2010, 2015, 2020) for curated data (traditional), rough data (virtual) and new (big) data. Annotation: The law of diminishing returns.]

Curated data for analytics was differentiating in 1995. In 2015, differentiation is in rendering data in various states to various user types. Think about the value of each data type to each user, not volume!


Traditional Warehouse Practices Are Augmented by the EDW Evolution!

[Diagram, flattened in extraction. Recoverable labels follow.]

• Can the Architecture Perform as Required?
• Do the Descriptions and Categories Match?
• Use-Case Access Semantics; Taxonomy/Ontology Resolution; SLA Requirements; Auditing and Management Statistics; /=/~; DQ, MDM, Gov.; Metadata
• Locate Data; Audit Data; Optimize Data
• Repositories: Persist, Pervasive, Latency, Optimized. Schema "Write to Write"
• And/Or Federation/Virtualization: Transient, Diverse Usage, Native. Schema "Write to Read"
• And/Or Distributed Process: Comprehensive, Undefined. Schema "Read to Read"


Semantics Form a Bridge to Data Science & New Practices

[Diagram, flattened in extraction. Architecture labels: Use-Case Access Semantics; SLA Requirements; Taxonomy/Ontology Resolution; /=/~; Auditing and Management Statistics; DQ, MDM, Gov.; Metadata; "?".]

• COMPROMISE (Repositories; Veracity): 80% of Analytics Infrastructure Is the Same as It Ever Was.
• CONTENDER (And/Or Federation/Virtualization; Variability): 10% Need Flexibility and Short Duration.
• CANDIDATE (And/Or Distributed Process; High Value): 5% Converting Structure on Read and Integrated.
• SCIENCE SANDBOX: 5% Constant Experimentation and Exploration.
• 80/10/5 Rule Again!

A Semantic Tier Is How You Move Away From a Binary Design That Forces an Unnatural Dichotomy and Toward a More Flexible and Responsive Architecture.


Even the Most Advanced Users Use That Curated Repository: The Warehouse

User classes, with typical characteristics and general ratio in population:

• Apprentice (1,000): Reports, dashboards. Level 1 (L1) support, maybe L2.
• Journeyman (90): Create new reports. Needs technical assistance. L2 and L3.
• Master (5): Reliably utilizes knowledge of systems and data to explore data assets.
• Guild Master (1): Modeling theory, graph theory, mathematics, program languages.

[Other diagram labels, flattened in extraction. Semantic Interface. Example: Single-Process SME; Multiprocess SME; Information/Data Architects; Statistical Science. Frequent Simple Support; Frequent Advanced Support; Infrequent Advanced Support. Repository: Virtual; Distributed Process.]

In general, there is a disconnect. We tend to build toward either the "ease of use" user or the "advanced analysis" user, so we build twice or more. The logical data warehouse says "build once, access many."
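
The ratios on this slide (1,000 : 90 : 5 : 1) can be turned into population shares with simple arithmetic. The short Python sketch below does that. The dictionary and function names are hypothetical, and the ratios are taken as read from the slide rather than from any additional data.

```python
# Ratios as read from the slide: users in each class per one Guild Master.
USER_RATIO = {"Apprentice": 1000, "Journeyman": 90, "Master": 5, "Guild Master": 1}


def population_share(ratio):
    """Return each class's fraction of the total analyst population."""
    total = sum(ratio.values())
    return {name: count / total for name, count in ratio.items()}


shares = population_share(USER_RATIO)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these ratios, apprentices are roughly 91% of the population and guild masters under 0.1%, which is one way to see why building separately for each end of the range is costly.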


Architecture Issue No. 1: Analysts Need a Unified Data Delivery & Access Semantic

[Diagram, flattened in extraction. Labels: Semantic Interface; Integration; Embedded/Smart Analytics; Logical Data Warehouse.]

Analyst classes and ratio in population:

• Apprentice (1,000): Reports, Dashboards. L1 Support, Maybe L2.
• Journeyman (90): Create New Reports. Needs Technical Assist. L2 & L3.
• Master (5): Reliably Use Knowledge of Systems and Data to Explore Assets.
• Guild Master (1): Modeling Theory, Graph Theory, Mathematics, Program Languages.

Logical Data Warehouse:

• Compromise (Stores)
• Contender (Access)
• Candidate (Processes)


Architecture Issue No. 2: Need to Convert Discovery Into Production: The Organization!

[Diagram, flattened in extraction. Labels: Semantic Interface; Integration; LDW; BICC; A "Bridge"; Embedded/Smart Analytics; Data Science Laboratory.]

Analyst classes and ratio in population:

• Casual (1,000): Reports, Dashboards. L1 Support, Maybe L2.
• Analyst (90): Create New Reports. Needs Tech. Assist, L2 & L3.
• Miner (5): Ops. Process, Systems Analyst, Data Architect, Reliable Tech.
• Science (1): Modeling Theory, Graph Theory, Mathematics, Program Languages.

LDW:

• Compromise (Stores)
• Contender (Access)
• Candidate (Processes)

Roles: Information Architect, Systems Architect, Application Architect; Data Services Administrators; Data Modeling; Information Administrators.


Option No. 1: BI Platform as Semantic Tier

• Design Principles:
– Data scientist guild master specifies processes and identifies objects
– Business analyst journeyman utilizes and reuses objects
– Architects work with BA journeyman to "productionize" science
– Data analyst master is on special projects
• Benefits:
– Familiar to users
– Encourages discipline on business and data analysts
– Frees up commitments from the data scientist
• Risks:
– Limits sharing data to BI export process
– Challenging to add new BI and data discovery tools: must reinvent the model "wheel" each time over all 3 SLAs


Option No. 2: DMSA (DBMS) as Semantic Tier

• Design Principles:
– Business analyst journeyman assists data analyst master
– DA master works with architects and DBA team for optimization
– Data scientist guild master presents reinforcing and dissenting analytics
– Architect team works with DA and BA to prioritize and deploy
• Benefits:
– Familiar to DBAs and data architects
– Leverages discipline of analyst/DBA interaction
– Encourages workflow to include data scientist with IT and data analysts
• Risks:
– Most DBMS external calls cannot extend optimization
– Channels data access through a single "choke" point, but this can be managed


Option No. 3: Virtualization as Semantic Tier

• Design Principles:
– Data scientist and data analyst masters identify analytic data patterns and propose.
– DA and business analyst journeyman determine top contenders.
– Architecture team works with DA on metrics to determine promotion to compromise and contender.
• Benefits:
– Encourages additional technology for SLAs
– Creates semantic tier accessible by all technologies
– Permits rapid prototyping and re-modeling of data AND quick-change between SLAs
• Risks:
– Heavily reliant upon internal optimization and its limits
– Assumes knowledge of data access and processing issues
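
To make the idea of a semantic tier "accessible by all technologies" with "quick-change between SLAs" more concrete, here is a minimal Python sketch. It is not taken from the presentation and does not represent any particular virtualization product: the `SemanticTier` class, its methods and the two stand-in engines are all hypothetical. It only shows consumers asking for a logical name while the tier decides which engine answers.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
# Hypothetical engine interface: a callable that returns rows for one dataset.
Engine = Callable[[], Iterable[Row]]


class SemanticTier:
    """Maps logical dataset names to whichever engine currently serves them."""

    def __init__(self) -> None:
        self._catalog: Dict[str, Engine] = {}

    def register(self, logical_name: str, engine: Engine) -> None:
        # Re-registering a name repoints it without changing any consumer.
        self._catalog[logical_name] = engine

    def read(self, logical_name: str) -> List[Row]:
        if logical_name not in self._catalog:
            raise KeyError(f"no engine registered for {logical_name!r}")
        return list(self._catalog[logical_name]())


# Stand-ins for real engines (assumptions for illustration only).
def warehouse_sales() -> Iterable[Row]:
    """Pretends to be the existing, curated warehouse."""
    return [{"region": "BR", "revenue": 120.0}]


def federated_sales() -> Iterable[Row]:
    """Pretends to be a federated view over a source system."""
    yield {"region": "BR", "revenue": 125.0}


tier = SemanticTier()
tier.register("sales", federated_sales)   # prototype against the federated view
prototype_rows = tier.read("sales")
tier.register("sales", warehouse_sales)   # later promote to the curated repository
promoted_rows = tier.read("sales")
```

Consumers call `tier.read("sales")` in both cases. Only the registration changes when a dataset moves between approaches, which is the property the benefits list describes. The risks listed above still apply: a thin routing layer like this does nothing to optimize the underlying engines.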


Recommendations

• Keep your existing data warehouse as one engine beneath a unified access tier.
• Determine your best strategy for deploying a unified access tier based on risks and benefits.
• Use your ratio of analyst user types to determine a center of gravity for using high value.
• Gauge your rate of shift between "curated," "rough" and "new" information asset types to develop priorities and timelines.
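
As one possible way to gauge the rate of shift mentioned in the last recommendation, the Python sketch below compares each asset type's share of volume at two points in time. The share figures are made up purely for illustration and do not come from the presentation. The function name and the choice of percentage points per year as the unit are also assumptions.

```python
def rate_of_shift(earlier, later, years):
    """Percentage-point change per year in each asset type's share of volume."""
    return {name: (later[name] - earlier[name]) / years for name in earlier}


# Hypothetical shares of total volume, in percent (illustrative values only).
shares_start = {"curated": 60.0, "rough": 25.0, "new": 15.0}
shares_end = {"curated": 45.0, "rough": 30.0, "new": 25.0}

shift = rate_of_shift(shares_start, shares_end, years=3)

# The asset type whose share is moving fastest, in either direction.
fastest = max(shift, key=lambda name: abs(shift[name]))
```

An organization would substitute its own measured shares. A large absolute rate for one asset type is a signal to prioritize the architecture that serves it.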


Recommended Gartner Research

• Focus on the 'Three Vs' of Big Data Analytics: Variability, Veracity and Value. Alan D. Duncan (G00270472)
• Making Big Data Normal Begins With Self-Classifying and Self-Disciplined Users. Mark A. Beyer and Others (G00271691)
• Avoid a Big Data Warehouse Mistake by Evolving to the Logical Data Warehouse Now. Mark A. Beyer (G00252003)
• Business Drivers of Technology Decisions for Healthcare Providers, 2015. Vi Shaffer and Others (G00272361)

For more information, stop by Gartner Research Zone.
