
Do We Still Need a Data Warehouse?


Gartner Business Intelligence & Information Management Summit

23-24 June 2015 | São Paulo, Brazil

Do We Still Need a Data Warehouse?

Donald Feinberg

This presentation, including any supporting materials, is owned by Gartner, Inc. and/or its affiliates and is for the sole use of the intended Gartner audience or other intended recipients. This presentation may contain information that is confidential, proprietary or otherwise legally protected, and it may not be further copied, distributed or publicly displayed without the express written permission of Gartner, Inc. or its affiliates.

© 2015 Gartner, Inc. and/or its affiliates. All rights reserved.


Do You Actually Know What a Data Warehouse Is?

• Consolidated, integrated, subject-oriented, time-variant data management solution.
• It never was …
– A database.
– A DBMS.
– An appliance.
– Built in.
– … done.

Things are not always what you first see, first experience, first build, first encounter.


Key Issues

1. What aspects of a data warehouse methodology are still required?
2. How does the concept of a logical data warehouse balance the needs of agility and governance?
3. What are your architectural options?




What Is a Data Warehouse?

• Architectural Construct:
– Integrated
– Time variant
– Subject orientation
– Analytic orientation
– Service level driven
– Mission-critical
• Physical Implementation:
– Centrally stored data
– Modeled
– Mixed workload
– Optimized
– Servers
– Storage


The Data Warehouse Was a Bridge to "Near" Information Shores … Now We Are in Orbit!

[Figure labels: Evolution, Craftsmanship, Adaptation.]

The data warehouse crossed relatively simple information usage barriers using binary dichotomy. Go "off world." Embrace more than local topography.


You Still Need All of the Analytics Data Management SLAs

But these SLAs are mostly incompatible!

• Compromise:
– The ".AND."
– Pervasive.
– Persistent.
– Latent.
– Source "write."
– Target "write."
– Static optimization for all 5.
• Contender:
– The ".OR."
– Diverse.
– Transient.
– Native.
– Source "write."
– Target "read."
– Optimization focuses on the dominant.
• Candidate:
– ?
– Unique.
– Ambivalent.
– Contrary.
– Source "read."
– Target "read."
– Optimized by processing.

Since you must continue to meet the demand of the data warehouse, you must continue to support the data warehouse.
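The three service styles on this slide can be written down as a small data structure, which makes the schema pattern of each style explicit. This is only an illustrative sketch: the class name `ServiceStyle`, its field names and the helper `styles_with_read_target` are assumptions made for this example and do not come from the presentation. Only the trait and schema values are transcribed from the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ServiceStyle:
    """One of the three SLA styles named on the slide (hypothetical model)."""
    name: str
    traits: Tuple[str, ...]
    source_schema: str  # "write" or "read", as listed on the slide
    target_schema: str  # "write" or "read", as listed on the slide

    @property
    def schema_pattern(self) -> str:
        # Source "write" plus target "read" gives "write to read", and so on.
        return f"{self.source_schema} to {self.target_schema}"


# Values transcribed from the slide; the structure itself is an assumption.
STYLES: Tuple[ServiceStyle, ...] = (
    ServiceStyle("compromise", ("pervasive", "persistent", "latent"), "write", "write"),
    ServiceStyle("contender", ("diverse", "transient", "native"), "write", "read"),
    ServiceStyle("candidate", ("unique", "ambivalent", "contrary"), "read", "read"),
)


def styles_with_read_target(styles: Tuple[ServiceStyle, ...] = STYLES) -> List[str]:
    """Return the names of styles whose target schema is applied on read."""
    return [s.name for s in styles if s.target_schema == "read"]
```

Read this way, only the compromise style fixes the schema on both sides, which is one way to see why a single static optimization cannot serve all three.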


You Still Need Best Practices For …

• Source system identification.
• Determining whether to instantiate data.
• Optimization by end-user class.
• Optimization for data processes.
• Requirements for static reporting and "canned" analytics.
• Information life cycle.
• Security management.
• Data governance & quality.
• Data preparation.

Moving data to a central location or giving a central point of access to many data locations is the new warehouse. Moving data to a central location with less transformation actually puts that job on the user.


Funding the Need Somewhere Else Doesn't Mean It's Not a Warehouse

• Appliances are platforms.
• Cloud? IaaS! PaaS! DBaaS!
• Models are customized.
• Source system access differs.
• Software and servers are "pieces."
• Distributing cost among many in "self-service" doesn't make it disappear.

You need a different warehouse that includes the old warehouse, but you still cannot just "buy" a logical data warehouse (LDW).




"New" Big Data and "Rough" Data Volume Exceeds but Does Not Eliminate "Curated" Data!

[Chart: % of volume contribution for curated data (traditional), rough data (virtual) and new (big) data.]

About 20% of the data warehouse market can be considered leading implementations. By 2018, logical data warehouses in half of them will have combinations of all three of these data management approaches.


The Notion of "Fit for Purpose" Has Evolved and Is Supported in the LDW

[Chart: % of volume contribution, 2005 to 2020, for curated data (traditional), rough data (virtual) and new (big) data. Annotation: "The law of diminishing returns."]

Curated data for analytics was differentiating in 1995. In 2015, differentiation is in rendering data in various states to various user types. Think about the value of each data type to each user, not volume!


Traditional Warehouse Practices Are Augmented by the EDW Evolution!

[Diagram labels]
• Use-case access semantics: Can the architecture perform as required? Do the descriptions and categories match?
• SLA requirements.
• Taxonomy/ontology resolution (/=/~).
• Auditing and management statistics.
• DQ, MDM, Gov.
• Metadata.
• Locate data, audit data, optimize data.
• Repositories and/or federation/virtualization and/or distributed process:
– Repositories: persist, pervasive, latency, optimized. Schema "write to write."
– Federation/virtualization: transient, diverse usage, native. Schema "write to read."
– Distributed process: comprehensive, undefined. Schema "read to read."




Semantics Form a Bridge to Data Science & New Practices

• COMPROMISE: 80% of analytics infrastructure is the same as it ever was.
• CONTENDER: 10% need flexibility and short duration.
• CANDIDATE: 5% converting structure on read and integrated.
• SCIENCE SANDBOX: 5% constant experimentation and exploration.
• 80/10/5 rule again!

[Diagram labels: use-case access semantics; SLA requirements; taxonomy/ontology resolution (/=/~); auditing and management statistics; DQ, MDM, Gov.; metadata; repositories and/or federation/virtualization and/or distributed process; veracity; variability; high value.]

A semantic tier is how you move from a binary design that forces an unnatural dichotomy to a more flexible and responsive architecture.


Even the Most Advanced Users Use That Curated Repository: The Warehouse

User Class | Example | Typical Characteristics | General Ratio in Population
Apprentice | Single-process SME | Reports, dashboards. Level 1 (L1) support, maybe L2. | 1,000
Journeyman | Multiprocess SME | Create new reports. Needs technical assistance. L2 and L3. | 90
Master | Information/data architects | Reliably utilizes knowledge of systems and data to explore data assets. | 5
Guild Master | Statistical science | Modeling theory, graph theory, mathematics, program languages. | 1

[Diagram labels: semantic interface; repository; virtual; distributed process; frequent simple support; frequent advanced support; infrequent advanced support.]

In general, there is a disconnect. We tend to build toward either the "ease of use" user or the "advanced analysis" user, and we end up building twice or more. The logical data warehouse says "build once, access many."
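The "General Ratio in Population" figures (1,000 : 90 : 5 : 1) are easier to reason about as percentage shares. The following is a minimal arithmetic sketch; the names `USER_RATIO` and `population_share` are made up for this example, and only the four counts come from the slide.

```python
from typing import Dict

# Ratios transcribed from the slide's "General Ratio in Population" figures.
USER_RATIO: Dict[str, int] = {
    "apprentice": 1000,
    "journeyman": 90,
    "master": 5,
    "guild master": 1,
}


def population_share(ratio: Dict[str, int]) -> Dict[str, float]:
    """Convert a ratio such as 1000:90:5:1 into percentage shares."""
    total = sum(ratio.values())
    if total <= 0:
        raise ValueError("ratio must contain at least one positive count")
    return {user_class: 100.0 * count / total for user_class, count in ratio.items()}


shares = population_share(USER_RATIO)
for user_class, pct in shares.items():
    print(f"{user_class}: {pct:.2f}%")
```

With these counts, apprentices make up roughly 91% of the analyst population and guild masters well under 1%, which is the imbalance behind the "build once, access many" argument.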


Architecture Issue No. 1: Analysts Need a Unified Data Delivery & Access Semantic

• Apprentice (1,000): Reports, dashboards. L1 support, maybe L2.
• Journeyman (90): Create new reports. Needs technical assistance. L2 and L3.
• Master (5): Reliably use knowledge of systems and data to explore assets.
• Guild Master (1): Modeling theory, graph theory, mathematics, program languages.

[Diagram labels: semantic interface; integration; embedded/smart analytics; logical data warehouse with compromise (stores), contender (access) and candidate (processes).]


Architecture Issue No. 2: Need to Convert Discovery Into Production (the Organization)!

• Casual (1,000): Reports, dashboards. L1 support, maybe L2.
• Analyst (90): Create new reports. Needs technical assistance, L2 and L3.
• Miner (5): Ops. process, systems analyst, data architect, reliable tech.
• Science (1): Modeling theory, graph theory, mathematics, program languages.

[Diagram labels: LDW; BICC; semantic interface; integration; a "bridge"; embedded/smart analytics; data science laboratory; compromise (stores); contender (access); candidate (processes); information architect, systems architect, application architect; data services administrators; data modeling; information administrators.]


Option No. 1: BI Platform as Semantic Tier

• Design Principles:
– Data scientist guild master specifies processes and identifies objects
– Business analyst journeyman utilizes and reuses objects
– Architects work with BA journeyman to "productionize" science
– Data analyst master is on special projects
• Benefits:
– Familiar to users
– Encourages discipline on business and data analysts
– Frees up commitments from the data scientist
• Risks:
– Limits sharing data to BI export process
– Challenging to add new BI and data discovery tools: must reinvent the model "wheel" each time over all 3 SLAs


Option No. 2: DMSA (DBMS) as Semantic Tier

• Design Principles:
– Business analyst journeyman assists data analyst master
– DA master works with architects and DBA team for optimization
– Data scientist guild master presents reinforcing and dissenting analytics
– Architect team works with DA and BA to prioritize and deploy
• Benefits:
– Familiar to DBAs and data architects
– Leverages discipline of analyst/DBA interaction
– Encourages workflow to include data scientist with IT and data analysts
• Risks:
– Most DBMS external calls cannot extend optimization
– Channels data access through a single "choke" point, but this can be managed


Option No. 3: Virtualization as Semantic Tier

• Design Principles:
– Data scientist and data analyst masters identify and propose analytic data patterns.
– DA and business analyst journeyman determine top contenders.
– Architecture team works with DA on metrics to determine promotion to compromise and contender.
• Benefits:
– Encourages additional technology for SLAs
– Creates semantic tier accessible by all technologies
– Permits rapid prototyping and re-modeling of data AND quick-change between SLAs
• Risks:
– Heavily reliant upon internal optimization and its limits
– Assumes knowledge of data access and processing issues


Recommendations

• Keep your existing data warehouse as one engine beneath a unified access tier.
• Determine your best strategy for deploying a unified access tier based on risks and benefits.
• Use your ratio of analyst user types to determine a center of gravity for using high value.
• Gauge your rate of shift between "curated," "rough" and "new" information asset types to develop priorities and timelines.
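The first recommendation, keeping the existing warehouse as one engine beneath a unified access tier, can be pictured with a very small sketch. Everything here is hypothetical: `UnifiedAccessTier`, its methods, the dataset names and the in-memory stand-in engines are assumptions for illustration, not a product API or the presenter's design. The point is only that consumers ask for a logical name while the binding to an engine can change underneath.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Fetch = Callable[[], List[Row]]


class UnifiedAccessTier:
    """Hypothetical access tier: consumers ask for logical names only."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Tuple[str, Fetch]] = {}

    def bind(self, logical_name: str, engine: str, fetch: Fetch) -> None:
        """Attach (or re-attach) a logical dataset to an engine."""
        self._bindings[logical_name] = (engine, fetch)

    def engine_of(self, logical_name: str) -> str:
        """Report which engine currently serves a logical dataset."""
        return self._bindings[logical_name][0]

    def query(self, logical_name: str) -> List[Row]:
        """Return rows for a logical dataset, whichever engine serves it."""
        _engine, fetch = self._bindings[logical_name]
        return fetch()


# In-memory stand-ins for real engines (assumptions for this sketch).
def warehouse_sales() -> List[Row]:
    return [{"region": "south", "revenue": 120}]


def virtual_clickstream() -> List[Row]:
    return [{"page": "/home", "visits": 42}]


tier = UnifiedAccessTier()
tier.bind("sales", "repository", warehouse_sales)  # the existing warehouse
tier.bind("clickstream", "virtualization", virtual_clickstream)

# Later, a dataset can be re-bound to another engine; consumer calls stay the same.
tier.bind("clickstream", "repository", virtual_clickstream)
```

A caller that uses `tier.query("clickstream")` before and after the re-binding sees no change in its own code, which is the sense of "build once, access many" in this sketch.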


