Do We Still Need a Data Warehouse?
Gartner Business Intelligence & Information Management Summit<br />
June 23-24, 2015 | São Paulo, Brazil<br />
Do We Still Need a Data Warehouse?<br />
Donald Feinberg<br />
This presentation, including any supporting materials, is owned by Gartner, Inc. and/or its affiliates and is for the sole use of the intended Gartner audience or other intended recipients. This presentation may<br />
contain information that is confidential, proprietary or otherwise legally protected, and it may not be further copied, distributed or publicly displayed without the express written permission of Gartner, Inc. or its affiliates.<br />
© 2015 Gartner, Inc. and/or its affiliates. All rights reserved.
Do You Actually Know What a Data Warehouse Is?<br />
• Consolidated, integrated,<br />
subject-oriented, time-variant<br />
data management solution.<br />
• It never was …<br />
– A database.<br />
– A DBMS.<br />
– An appliance.<br />
– Built in.<br />
– … done.<br />
Things are not always what you first see, first experience, first build,<br />
first encounter.<br />
Key Issues<br />
1. What aspects of a data warehouse methodology are still required?<br />
2. How does the concept of a logical data warehouse balance the<br />
needs of agility and governance?<br />
3. What are your architectural options?<br />
Key Issue 1: What aspects of a data warehouse methodology are still required?<br />
What Is a Data Warehouse?<br />
• Architectural Construct:<br />
– Integrated<br />
– Time variant<br />
– Subject orientation<br />
– Analytic orientation<br />
– Service level driven<br />
– Mission-critical<br />
• Physical Implementation:<br />
– Centrally stored data<br />
– Modeled<br />
– Mixed workload<br />
– Optimized<br />
– Servers<br />
– Storage<br />
The Data Warehouse Was a Bridge to "Near" Information<br />
Shores … Now We Are in Orbit!<br />
Evolution<br />
Craftsmanship<br />
Adaptation<br />
The data warehouse crossed relatively simple information usage barriers using<br />
binary dichotomy. Go "off world." Embrace more than local topography.
You Still Need All of the Analytics Data Management SLAs<br />
But these SLAs are mostly incompatible!<br />
• Compromise:<br />
– The ".AND."<br />
– Pervasive.<br />
– Persistent.<br />
– Latent.<br />
– Source "write."<br />
– Target "write."<br />
– Static optimization<br />
for all 5.<br />
• Contender:<br />
– The ".OR."<br />
– Diverse.<br />
– Transient.<br />
– Native.<br />
– Source "write."<br />
– Target "read."<br />
– Optimization<br />
focuses on the<br />
dominant.<br />
• Candidate:<br />
– ?<br />
– Unique.<br />
– Ambivalent.<br />
– Contrary.<br />
– Source "read."<br />
– Target "read."<br />
– Optimized by<br />
processing.<br />
Since you must continue to meet the demand of the data warehouse,<br />
you must continue to support the data warehouse.
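One plausible reading of the source "write"/"read" and target "write"/"read" labels above is the familiar contrast between imposing structure when data is loaded and imposing it when data is queried. The Python sketch below illustrates that contrast only; it is not taken from the presentation, and the record layout, the `SCHEMA` mapping, and all function names are hypothetical.

```python
import json

# Hypothetical target schema: field name -> type used to coerce values.
SCHEMA = {"order_id": int, "amount": float, "region": str}

def load_with_schema_on_write(raw_rows):
    """Validate and coerce every row at load time; a bad row aborts the load."""
    table = []
    for row in raw_rows:
        record = json.loads(row)
        # A missing field raises KeyError, so the row never reaches the store.
        table.append({name: cast(record[name]) for name, cast in SCHEMA.items()})
    return table

def load_raw(raw_rows):
    """Store rows untouched; structure is deferred to the reader."""
    return list(raw_rows)

def read_with_schema(raw_store, fields):
    """Apply structure at query time, skipping rows that lack the requested fields."""
    for row in raw_store:
        record = json.loads(row)
        if all(name in record for name in fields):
            yield {name: SCHEMA[name](record[name]) for name in fields}

raw = [
    '{"order_id": "1", "amount": "19.90", "region": "BR"}',
    '{"order_id": "2", "amount": "5.00", "region": "US", "note": "extra field"}',
    '{"order_id": "3", "region": "BR"}',
]

curated = load_with_schema_on_write(raw[:2])  # the third row would raise KeyError
rough = load_raw(raw)                         # accepts every row as-is
amounts = list(read_with_schema(rough, ["order_id", "amount"]))
```

The load-time variant pays the modeling cost once and rejects anything that does not fit. The read-time variant accepts everything and leaves each reader to decide which fields matter.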
You Still Need Best Practices For …<br />
• Source system identification.<br />
• Determining whether to instantiate<br />
data.<br />
• Optimization by<br />
end-user class.<br />
• Optimization for<br />
data processes.<br />
• Requirements for static<br />
reporting and "canned"<br />
analytics.<br />
• Information life cycle.<br />
• Security management.<br />
• Data governance & quality.<br />
• Data preparation.<br />
Moving data to a central location or giving a central point of access to many data locations<br />
is the new warehouse. Moving data to a central location with less transformation actually<br />
puts that job on the user.<br />
Funding the Need Somewhere Else Doesn't Mean<br />
It's Not a Warehouse<br />
• Appliances are platforms.<br />
• Cloud? IaaS! PaaS! DBaaS!<br />
• Models are customized.<br />
• Source system access differs.<br />
• Software and servers<br />
are "pieces."<br />
• Distributing cost among many in<br />
"self-service" doesn't make it<br />
disappear.<br />
You need a different warehouse that includes the old warehouse,<br />
but you still cannot just "buy" a logical data warehouse (LDW).
Key Issue 2: How does the concept of a logical data warehouse balance the<br />
needs of agility and governance?<br />
"New" Big Data and "Rough" Data Volume Exceeds but<br />
Does Not Eliminate "Curated" Data!<br />
[Chart: % of volume contribution (y-axis, 0 to 120) for three series: curated data (traditional), rough data (virtual), and new (big) data]<br />
About 20% of the data warehouse market can be considered leading<br />
implementations. By 2018, logical data warehouses in half of them will have<br />
combinations of all three of these data management approaches.
The Notion of "Fit for Purpose" Has Evolved and Is<br />
Supported in the LDW<br />
[Chart: % of volume contribution (y-axis, 0 to 100) from 2005 to 2020 for curated data (traditional), rough data (virtual), and new (big) data; annotation: "The law of diminishing returns"]<br />
Curated data for analytics was differentiating in 1995. In 2015, differentiation is<br />
in rendering data in various states to various user types. Think about the value<br />
of each data type to each user, not volume!
Traditional Warehouse Practices Are Augmented<br />
by the EDW Evolution!<br />
[Diagram labels, grouped]<br />
• Use-Case Access<br />
• Semantics: Do the Descriptions and Categories Match?<br />
– Taxonomy/Ontology Resolution (/=/~)<br />
– DQ, MDM, Gov.<br />
– Metadata<br />
• SLA Requirements: Can the Architecture Perform as Required?<br />
– Auditing and Management Statistics<br />
• Locate Data, Audit Data, Optimize Data<br />
• Repositories And/Or Federation/Virtualization And/Or Distributed Process<br />
– Repositories: Persist, Pervasive, Latency, Optimized. Schema "Write to Write"<br />
– Federation/Virtualization: Transient, Diverse Usage, Native. Schema "Write to Read"<br />
– Distributed Process: Comprehensive, Undefined. Schema "Read to Read"
Key Issue 3: What are your architectural options?<br />
Semantics Form a Bridge to Data Science &<br />
New Practices<br />
[Diagram labels, grouped]<br />
• Use-Case Access<br />
• Semantics: Taxonomy/Ontology Resolution (/=/~); DQ, MDM, Gov.; Metadata<br />
• SLA Requirements: Auditing and Management Statistics<br />
• Veracity, Variability, High Value<br />
• COMPROMISE (Repositories): 80% of Analytics Infrastructure Is the Same as It Ever Was.<br />
• CONTENDER (Federation/Virtualization): 10% Need Flexibility and Short Duration.<br />
• CANDIDATE (Distributed Process): 5% Converting Structure on Read and Integrated.<br />
• SCIENCE SANDBOX: 5% Constant Experimentation and Exploration.<br />
• 80/10/5 Rule Again!<br />
A semantic tier is how you move away from a binary design that forces an unnatural dichotomy and create<br />
a more flexible and responsive architecture.
Even the Most Advanced Users Use That Curated<br />
Repository: The Warehouse<br />
[Table: user classes behind the semantic interface]<br />
• Apprentice (example: single-process SME; general ratio in population: 1,000)<br />
– Reports, dashboards. Level 1 (L1) support, maybe L2.<br />
• Journeyman (example: multiprocess SME; ratio: 90)<br />
– Creates new reports. Needs technical assistance. L2 and L3.<br />
• Master (example: information/data architects; ratio: 5)<br />
– Reliably utilizes knowledge of systems and data to explore data assets.<br />
• Guild Master (example: statistical science; ratio: 1)<br />
– Modeling theory, graph theory, mathematics, program languages.<br />
• Other labels on the slide: Repository (Virtual, Distributed Process); Frequent Simple Support, Frequent Advanced Support, Infrequent Advanced Support.<br />
In general, there is a disconnect. We tend to build toward either the "ease of use"<br />
user or the "advanced analysis" user, and so we build twice or more.<br />
The logical data warehouse says "build once, access many."
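To make the user-class ratio concrete, the short Python sketch below converts the 1,000 : 90 : 5 : 1 ratio from the slide into population shares. Only the four numbers come from the slide; the variable names are assumptions made for illustration.

```python
# General ratio in population quoted on the slide (users per class).
RATIO = {"Apprentice": 1000, "Journeyman": 90, "Master": 5, "Guild Master": 1}

total = sum(RATIO.values())
shares = {user_class: count / total for user_class, count in RATIO.items()}

# Print each class with its share of the whole analyst population.
for user_class, share in shares.items():
    print(f"{user_class:<13}{share:7.2%}")
```

Under this ratio, apprentices are roughly 91% of users and journeymen roughly 8%, while masters and guild masters together are about half a percent. That imbalance is why building separately for the "ease of use" and "advanced analysis" audiences duplicates effort for a very small group.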
Architecture Issue No. 1: Analysts Need a Unified Data<br />
Delivery & Access Semantic<br />
[Diagram labels, grouped]<br />
• Semantic Interface<br />
– Apprentice (1,000): Reports, Dashboards. L1 Support, Maybe L2.<br />
– Journeyman (90): Create New Reports. Needs Technical Assist. L2 & L3.<br />
– Master (5): Reliably Use Knowledge of Systems and Data to Explore Assets.<br />
– Guild Master (1): Modeling Theory, Graph Theory, Mathematics, Program Languages.<br />
• Integration<br />
• Embedded/Smart Analytics<br />
• Logical Data Warehouse: Compromise (Stores), Contender (Access), Candidate (Processes)
Architecture Issue No. 2: Need to Convert Discovery<br />
Into Production: The Organization!<br />
[Diagram labels, grouped]<br />
• Semantic Interface<br />
– Casual (1,000): Reports, Dashboards. L1 Support, Maybe L2.<br />
– Analyst (90): Create New Reports. Needs Tech. Assist, L2 & L3.<br />
– Miner (5): Ops. Process, Systems Analyst, Data Architect, Reliable Tech.<br />
– Science (1): Modeling Theory, Graph Theory, Mathematics, Program Languages.<br />
• Integration<br />
• BICC<br />
• A "Bridge": Embedded/Smart Analytics, Data Science Laboratory<br />
• LDW: Compromise (Stores), Contender (Access), Candidate (Processes)<br />
• Information Architect, Systems Architect, Application Architect<br />
• Data Services Administrators, Data Modeling, Information Administrators
Option No. 1: BI Platform as Semantic Tier<br />
• Design Principles:<br />
– Data scientist guild master<br />
specifies processes and<br />
identifies objects<br />
– Business analyst journeyman<br />
utilizes and reuses objects<br />
– Architects work with BA<br />
Journeyman to "productionize"<br />
science<br />
– Data analyst master is on<br />
special projects<br />
• Benefits:<br />
– Familiar to users<br />
– Encourages discipline on<br />
business and data analysts<br />
– Frees up commitments from<br />
the data scientist<br />
• Risks:<br />
– Limits sharing data to BI<br />
export process<br />
– Challenging to add new BI<br />
and data discovery tools —<br />
must reinvent the model<br />
"wheel" each time over<br />
all 3 SLAs<br />
Option No. 2: DMSA (DBMS) as Semantic Tier<br />
• Design Principles:<br />
– Business analyst journeyman<br />
assists data analyst master<br />
– DA master works with<br />
Architects and DBA team for<br />
optimization<br />
– Data scientist guild master<br />
presents reinforcing and<br />
dissenting analytics<br />
– Architect team works with DA<br />
and BA to prioritize and deploy<br />
• Benefits:<br />
– Familiar to DBAs and data<br />
architects<br />
– Leverages discipline of<br />
analyst/DBA interaction<br />
– Encourages workflow to<br />
include data scientist with IT<br />
and data analysts<br />
• Risks:<br />
– Most DBMS external calls<br />
cannot extend optimization<br />
– Channels data access through<br />
a single "choke" point — but<br />
can be managed<br />
Option No. 3: Virtualization as Semantic Tier<br />
• Design Principles:<br />
– Data scientist and data analyst<br />
masters identify analytic data<br />
patterns and propose.<br />
– DA and business analyst<br />
Journeyman determine top<br />
contenders.<br />
– Architecture team works with<br />
DA on metrics to determine<br />
promotion to compromise and<br />
contender.<br />
• Benefits:<br />
– Encourages additional<br />
technology for SLAs<br />
– Creates semantic tier<br />
accessible by all technologies<br />
– Permits rapid prototyping and<br />
re-modeling of data AND<br />
quick-change between SLAs<br />
• Risks:<br />
– Heavily reliant upon internal<br />
optimization and its limits<br />
– Assumes knowledge of data<br />
access and processing issues<br />
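The idea of a semantic tier that every tool can reach, with datasets promoted between SLAs underneath it, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any virtualization product: `SemanticTier`, `Binding`, the SLA labels as strings, and the two stand-in fetch functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

@dataclass
class Binding:
    """Where a logical dataset currently lives and how to fetch it."""
    sla: str  # "compromise", "contender", or "candidate"
    fetch: Callable[[], Iterable[Row]]

class SemanticTier:
    """Single access point: callers use logical names, never engine details."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Binding] = {}

    def bind(self, name: str, sla: str, fetch: Callable[[], Iterable[Row]]) -> None:
        # Re-binding an existing name models promotion between SLAs.
        self._bindings[name] = Binding(sla, fetch)

    def query(self, name: str, predicate: Callable[[Row], bool] = lambda r: True) -> List[Row]:
        return [row for row in self._bindings[name].fetch() if predicate(row)]

    def sla_of(self, name: str) -> str:
        return self._bindings[name].sla

# Stand-ins for real engines, returning invented rows.
def federated_sales() -> List[Row]:
    return [{"region": "BR", "amount": 120.0}, {"region": "US", "amount": 80.0}]

def warehouse_sales() -> List[Row]:
    return [{"region": "BR", "amount": 120.0}, {"region": "US", "amount": 80.0}]

tier = SemanticTier()
tier.bind("sales", "contender", federated_sales)   # prototype through virtualization
before = tier.query("sales", lambda r: r["region"] == "BR")

tier.bind("sales", "compromise", warehouse_sales)  # promoted to the curated repository
after = tier.query("sales", lambda r: r["region"] == "BR")
```

The caller's query is identical before and after the promotion, which is the "quick-change between SLAs" benefit listed above. The sketch ignores the listed risks entirely: a real tier has to push filters down to the engines rather than fetch everything and filter locally.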
Recommendations<br />
• Keep your existing data warehouse as one engine beneath a<br />
unified access tier.<br />
• Determine your best strategy for deploying a unified access tier<br />
based on risks and benefits.<br />
• Use your ratio of analyst user types to determine a center of<br />
gravity for using high value.<br />
• Gauge your rate of shift between "curated," "rough" and "new"<br />
information asset types to develop priorities and timelines.<br />
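One simple way to gauge the rate of shift between asset types is to track each type's share of total volume over time. The Python sketch below does this for two snapshots; the volumes are invented purely for illustration, and the function names are assumptions rather than anything from the presentation.

```python
def shares(volumes):
    """Fraction of total volume held by each asset type."""
    total = sum(volumes.values())
    return {kind: volume / total for kind, volume in volumes.items()}

def annual_shift(earlier, later, years):
    """Percentage-point change in share per year for each asset type."""
    start, end = shares(earlier), shares(later)
    return {kind: (end[kind] - start[kind]) * 100 / years for kind in start}

# Hypothetical volumes in TB, invented for this example.
v2013 = {"curated": 60, "rough": 30, "new": 10}
v2015 = {"curated": 70, "rough": 60, "new": 70}

shift = annual_shift(v2013, v2015, years=2)
```

In this made-up case the curated volume grows from 60 TB to 70 TB while its share falls by about 12.5 percentage points a year, which mirrors the earlier point that new and rough data can exceed curated data in volume without eliminating it.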
Recommended Gartner Research<br />
• Focus on the 'Three Vs' of Big Data Analytics: Variability, Veracity<br />
and Value<br />
Alan D. Duncan (G00270472)<br />
• Making Big Data Normal Begins With Self-Classifying and<br />
Self-Disciplined Users<br />
Mark A. Beyer and Others (G00271691)<br />
• Avoid a Big Data Warehouse Mistake by Evolving to the Logical<br />
Data Warehouse Now<br />
Mark A. Beyer (G00252003)<br />
• Business Drivers of Technology Decisions for Healthcare Providers, 2015<br />
Vi Shaffer and Others (G00272361)<br />