Solving Performance Issues in BI through Purpose-built Analytical ...
Solving Performance Issues in BI through Purpose-built Analytical ...
Solving Performance Issues in BI through Purpose-built Analytical ...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>Solv<strong>in</strong>g</strong> <strong>Performance</strong> <strong>Issues</strong> <strong>in</strong> <strong>BI</strong> <strong>through</strong><br />
<strong>Purpose</strong>-<strong>built</strong> <strong>Analytical</strong> Databases<br />
Silicon India <strong>BI</strong> Conference, Sept 24, 2011, Hyderabad<br />
Vivek Bhatnagar<br />
Actian Corporation<br />
1 of 9<br />
1 of 9<br />
Confidential © 2011 Actian Corporation
Agenda<br />
� Today’s Biggest Challenge <strong>in</strong> <strong>BI</strong> - <strong>Performance</strong><br />
� Common Approaches Used For Database<br />
<strong>Performance</strong><br />
� New Break<strong>through</strong>: On-Chip Comput<strong>in</strong>g/ Columnar<br />
Design<br />
� VectorWise – A Puropose-Built <strong>Analytical</strong> Database<br />
� Illustrative Use Cases
Today’s Challenges <strong>in</strong> <strong>BI</strong><br />
Make <strong>in</strong>formed decisions faster<br />
Analysis <strong>in</strong> seconds not m<strong>in</strong>utes, m<strong>in</strong>utes not hours<br />
Data explosion<br />
Collect<strong>in</strong>g more <strong>in</strong>formation, less resources<br />
>44x growth <strong>in</strong> next 10 years<br />
Exist<strong>in</strong>g Tools:<br />
Too Slow. Too complex. Too expensive.<br />
<strong>Analytical</strong> databases designed <strong>in</strong> 80s & 90s do not take<br />
advantage of today’s modern hardware
Biggest <strong>BI</strong> Challenge - SPEED<br />
What Problem will eventually drive you to replace your<br />
current data warehouse platform?<br />
1. Poor Query Response<br />
2. Can’t Support Advanced Analytics<br />
40%<br />
45%<br />
Source: TDWI Q4 2009 Best Practices Report<br />
“Gartner clients <strong>in</strong>creas<strong>in</strong>gly report performance<br />
constra<strong>in</strong>ed data warehouses dur<strong>in</strong>g <strong>in</strong>quires. Based<br />
on these <strong>in</strong>quiries, we estimate that nearly 70% of data<br />
warehouses experience performance-constra<strong>in</strong>ed<br />
issues of various types.” Source: Gartner Magic Quadrant for Data Warehouse Database<br />
Management Systems, Jan 2010
Biggest <strong>BI</strong> Challenge - SPEED<br />
Source: 2010 <strong>BI</strong> Survey 9 – World’s largest <strong>in</strong>dependent <strong>BI</strong> Survey 3093 respondents
Cubes and Speed<br />
As cubes grow <strong>in</strong> size they take longer to load and build<br />
• Process<strong>in</strong>g time might exceed batch w<strong>in</strong>dow<br />
• Difficulty manag<strong>in</strong>g large cubes<br />
• Time required to add new dimensions<br />
Source: 2010 TDWI Benchmarks
Relational Databases and Speed<br />
Limitations <strong>in</strong> SQL technology<br />
• Adhoc queries too slow<br />
• Index<strong>in</strong>g/Aggregations cost time & money<br />
• 25% average <strong>BI</strong>/DW team time used up for ma<strong>in</strong>tenance/change<br />
management<br />
Source: 2010 TDWI <strong>BI</strong> Benchmark Report
Challenges with Current State of <strong>BI</strong><br />
Bus<strong>in</strong>ess Wants<br />
Faster Answers<br />
Database<br />
Too Slow<br />
Complex, Risky<br />
& Expensive<br />
More<br />
Hardware<br />
OLAP<br />
Cubes<br />
Application<br />
Traditional Database<br />
Indices<br />
Rollups<br />
Hardware<br />
Star<br />
Schemas<br />
Tun<strong>in</strong>g<br />
Data Growth<br />
Exponential<br />
<strong>Analytical</strong><br />
Bottleneck<br />
DB only use<br />
a fraction of<br />
CPU<br />
Capability
Approaches Used for Achiev<strong>in</strong>g Database<br />
<strong>Performance</strong><br />
Parallel Process<strong>in</strong>g<br />
Optimizations for parallel process<strong>in</strong>g and m<strong>in</strong>imal data retrieval<br />
MPP<br />
Column<br />
Disk<br />
Column store with compression Proprietary hardware<br />
Decompress<br />
RAM<br />
Buffer<br />
Manager<br />
Data Warehouse Appliances<br />
Acceptable performance has been achieved by us<strong>in</strong>g more hardware<br />
or by <strong>in</strong>telligently lower<strong>in</strong>g the volume of data to be processed<br />
However, none of these approaches leverages the performance features of<br />
today’s CPUs i.e. tak<strong>in</strong>g the most out of each modern commodity CPU
New Break<strong>through</strong><br />
VectorWise <strong>Analytical</strong> Database<br />
<strong>Purpose</strong>-Built <strong>Analytical</strong><br />
Relational database for <strong>BI</strong><br />
and data analysis<br />
Runs blaz<strong>in</strong>g<br />
fast/<strong>in</strong>teractive data<br />
analysis<br />
Exploits performance<br />
potential <strong>in</strong> today’s CPUs<br />
Delivers <strong>in</strong>-memory<br />
performance without be<strong>in</strong>g<br />
memory constra<strong>in</strong>t<br />
“Game-chang<strong>in</strong>g technology.”<br />
Don Fe<strong>in</strong>berg, Gartner Group<br />
“This is def<strong>in</strong>itely a break<strong>through</strong>. It delivers<br />
faster results at lower costs.”<br />
Noel Yuhanna, Forrester Research<br />
“This <strong>in</strong>evitability puts VectorWise 4 years ahead<br />
of the competition <strong>in</strong> terms of performance – and it<br />
will rema<strong>in</strong> 4 years ahead until some competitor<br />
f<strong>in</strong>ds a way to catch up at a software level. This is<br />
unprecedented.”<br />
Rob<strong>in</strong> Bloor, The Virtual Circle
VectorWise:<br />
On Chip Comput<strong>in</strong>g/Columnar Database<br />
Time / Cycles to Process<br />
2-20 150-250 Millions<br />
Break<strong>through</strong> technology<br />
Vector Process<strong>in</strong>g<br />
On Chip Comput<strong>in</strong>g<br />
DISK<br />
40-400MB<br />
RAM<br />
2-3GB<br />
Data Processed<br />
CHIP<br />
10GB<br />
Innovations on proven techniques<br />
Updateable Column Store<br />
Automatic Compression<br />
Automatic Storage Indexes<br />
M<strong>in</strong>imize IO<br />
Parallel Process<strong>in</strong>g
VectorWise Technology<br />
� Vector process<strong>in</strong>g<br />
– Exploits super-scalar features us<strong>in</strong>g<br />
SIMD capabilities of today’s CPUs<br />
� Optimizes memory hierarchy<br />
– Maximizes use of CPU cache<br />
– Fewer requests to RAM and disk<br />
� Data Compression/De-Compression<br />
– Optimized compression enabl<strong>in</strong>g very<br />
fast de-compression for overall<br />
performance enhancement<br />
– Vectorized de-compression<br />
– Automatic compression <strong>through</strong> ultraefficient<br />
algorithms<br />
� Automatic Index<strong>in</strong>g<br />
– System generated Storage<br />
Indexes<br />
– Easy identification of candidate<br />
data blocks for queries<br />
� Integration<br />
– Standard SQL and <strong>in</strong>terfaces<br />
– Common <strong>BI</strong>/Data Integration<br />
tools
Modern CPU Instruction Capabilities<br />
� SIMD<br />
• Traditional CPU process<strong>in</strong>g: S<strong>in</strong>gle Instruction, S<strong>in</strong>gle<br />
Data (SISD)<br />
• Modern CPU process<strong>in</strong>g capabilities: S<strong>in</strong>gle Instruction,<br />
Multiple Data (SIMD)<br />
� Out-of-order execution<br />
� Chip multi-thread<strong>in</strong>g<br />
� Large L2/L3 caches<br />
� Stream<strong>in</strong>g SIMD Extensions for efficient<br />
SIMD process<strong>in</strong>g<br />
� Hardware accelerated Str<strong>in</strong>g Process<strong>in</strong>g
Vector Process<strong>in</strong>g<br />
One operation<br />
performed on one<br />
element at a time<br />
Large overheads<br />
Traditional Scalar<br />
Process<strong>in</strong>g<br />
1 x 1 = 1<br />
2 x 2 = 4<br />
3 x 3 = 9<br />
4 x 4 = 16<br />
5 x 5 = 25<br />
6 x 6 = 36<br />
7 x 7 = 49<br />
8 x 8 = 64<br />
.<br />
.<br />
.<br />
n x n = n 2<br />
Many<br />
V’s<br />
1<br />
Vector Process<strong>in</strong>g<br />
1 x 1<br />
2 x 2<br />
3 x 3<br />
4 x 4<br />
5 x 5<br />
6 x 6<br />
7 x 7<br />
8 x 8<br />
.<br />
.<br />
.<br />
n x n<br />
=<br />
1<br />
4<br />
9<br />
16<br />
25<br />
36<br />
49<br />
64<br />
.<br />
.<br />
.<br />
n 2<br />
One operation<br />
performed on a set<br />
of data at a time<br />
No overheads<br />
Process even 1.5GB<br />
per second
Process<strong>in</strong>g <strong>in</strong> Chip Cache<br />
Time / Cycles to Process<br />
2-20 150-250 Millions<br />
Us<strong>in</strong>g CPU cache is far more faster & efficient<br />
DISK<br />
40-100MB<br />
RAM<br />
2-3GB<br />
Data Processed<br />
GB/s Measure of Throughput<br />
Cycles Amount of CPU time required to<br />
process data<br />
CHIP<br />
10GB
Updateable Column Store<br />
� Only access relevant data<br />
� Enable <strong>in</strong>cremental updates efficiently<br />
• Traditionally a weakness for column-based stores<br />
Cust_Num Cust_surna<br />
me<br />
Cust_first_nam<br />
e<br />
Cust_mid_nam<br />
e<br />
Cust_DOB Cust_Sex Cust_Add_1 Cust_Addr_2 Cust_City Cust_State<br />
46328927956 Jones Steven Sean 17-JAN-1971 M 333 StKilda Rd Melbourne Vic<br />
98679975745 Smith Leonard Patrick 04-APR-1964 M Unit 12, 147<br />
Trafalgar Sqr<br />
52634346735 Rogers C<strong>in</strong>dy Carm<strong>in</strong>e 11-MAR-1980 F Belmont Rail<br />
Service<br />
346737347347 Andrews Jenny 14-SEP-1977 F Apt1, 117 West<br />
42 nd St<br />
88673477347 Cooper Sheldon Michael 30-JUN-1980 M Ingres<br />
Corporation<br />
34673447568 Kollwitz Rolf 22-DEC-1975 M IBM<br />
Headquarters<br />
99554443044 Wong Penny Lee 13-NOV-1981 F M<strong>in</strong>g On Tower<br />
1<br />
Birm<strong>in</strong>gham London<br />
421 Station St Belmont CA<br />
Level 2, 426<br />
Argello St<br />
123 Mount<br />
View Crs<br />
1777 Moa Tzu<br />
Tung Rd<br />
New York NY<br />
Redwood City CA<br />
Atlantic City PN<br />
M<strong>in</strong>g Now<br />
Prov<strong>in</strong>ce<br />
Shanghi
Optimized Compression & Fast De-Compression<br />
� Column-based compression with multiple algorithms<br />
• Automatically determ<strong>in</strong>ed by VectorWise<br />
� Vectorised decompression<br />
• Only for data process<strong>in</strong>g <strong>in</strong> CPU cache<br />
Column<br />
Disk<br />
Maximize I/O<br />
<strong>through</strong>put to chip<br />
RAM<br />
Column<br />
Buffer<br />
Manager<br />
Elim<strong>in</strong>ate slow<br />
round trip to RAM<br />
Decompress and<br />
store <strong>in</strong> chip cache<br />
Cache<br />
Decompress<br />
CPU
Storage Index<br />
� Always automatically created<br />
� Automatically ma<strong>in</strong>ta<strong>in</strong>ed<br />
� Stores m<strong>in</strong>/max value per data block<br />
� Enables database to efficiently identify candidate data blocks
VectorWise Features<br />
<strong>Performance</strong><br />
•10x-75x faster for <strong>BI</strong>,<br />
analytics & report<strong>in</strong>g<br />
• In-memory performance<br />
without memory restra<strong>in</strong>ts<br />
• Near real-time updatable<br />
database<br />
• Delivers results <strong>in</strong><br />
seconds not m<strong>in</strong>utes<br />
m<strong>in</strong>utes not hours<br />
Usage & Integration<br />
•Uses ANSI standard<br />
queries & SQL statements<br />
•Elim<strong>in</strong>ate/reduce Cubes,<br />
aggregate tables, roll ups,<br />
<strong>in</strong>dexes….<br />
•Self <strong>in</strong>dex<strong>in</strong>g & self tun<strong>in</strong>g<br />
database<br />
•Deliver <strong>BI</strong> projects faster<br />
with lower cost & risk<br />
TCO<br />
• Maximize utilization of<br />
CPUs <strong>in</strong> low cost<br />
commodity hardware<br />
• Handle tens of terabytes<br />
scale data with a s<strong>in</strong>gle<br />
server<br />
• Requires commodity<br />
hardware<br />
• Does not require MPP
VectorWise<br />
TPC-H <strong>Performance</strong> Benchmark
TPC Benchmark Results – 1TB<br />
2.5x <strong>Performance</strong><br />
<strong>Performance</strong><br />
(QphH@1TB)<br />
500,000<br />
400,000<br />
300,000<br />
200,000<br />
100,000<br />
0<br />
Price/<strong>Performance</strong><br />
US$/QphH@1TB<br />
Oracle<br />
29 April 2009<br />
123,323<br />
$20.54 US<br />
Oracle<br />
26 April 2010<br />
140,181<br />
$12.15 US<br />
Sybase IQ<br />
15 Dec 2010<br />
164,747<br />
$6.85 US<br />
Top 5 Non-Clustered Database System<br />
TPC-H 1TB Scale Factor<br />
Microsoft<br />
5 Apr 2011<br />
80 cores<br />
2 TB RAM<br />
173,961<br />
$1.37 US<br />
VectorWise<br />
3 May 2011<br />
Source: www.tpc.org / May 23, 2011<br />
TPC, TPC Benchmark, TPC-H, QppH, QthH and QphH are trademarks of the Transaction Process<strong>in</strong>g <strong>Performance</strong> Council (TPC)<br />
32 cores<br />
1 TB RAM<br />
436,788<br />
$0.88 US
VectorWise<br />
<strong>BI</strong> Tun<strong>in</strong>g & Complexity<br />
Traditional<br />
Data Mart<br />
Project<br />
VectorWise<br />
Project<br />
Requirements<br />
Assessment<br />
Requirements<br />
Assessment<br />
Design<br />
Data<br />
Schema<br />
Design<br />
Data<br />
Schema<br />
Load<br />
Data<br />
Load<br />
Data<br />
Build<br />
Reports<br />
UAT<br />
2010 TDWI <strong>BI</strong> Benchmark Report<br />
Average time to build a complex report or dashboard<br />
(20 dimensions, 12 measures, and 6 user access roles)<br />
Build<br />
Aggregates,<br />
OLAP Cubes<br />
Build<br />
Reports<br />
UAT<br />
Time to Deploy Project<br />
2008 6.7 weeks<br />
2009 6.3 weeks<br />
2010 6.6 weeks
VectorWise<br />
<strong>BI</strong> Tun<strong>in</strong>g & Complexity<br />
Traditional<br />
Architecture<br />
VectorWise<br />
Architecture<br />
End<br />
of<br />
day<br />
End<br />
of<br />
day<br />
Fast Process<strong>in</strong>g Everyday!!!<br />
Load<br />
Data<br />
Warehouse<br />
Load<br />
Data<br />
Warehouse<br />
Build<br />
Reports<br />
Build<br />
Aggregates,<br />
OLAP Cubes<br />
Build<br />
Reports<br />
Time to Deploy Project<br />
Rega<strong>in</strong>ed Time
VectorWise<br />
TPC-H Price/<strong>Performance</strong> Benchmark<br />
TCO<br />
• Spend less on <strong>in</strong>frastructure<br />
• Spend less on <strong>BI</strong> tun<strong>in</strong>g
<strong>Analytical</strong> Databases - Illustrative Use Cases<br />
Telcos/VAS Retail FSI Web 2.0<br />
Store & analyze CDR, VAS<br />
downloads & other<br />
subscriber/network data for:<br />
- Revenue assurance<br />
- Price optimization<br />
- Customer loyalty/churn<br />
- Market<strong>in</strong>g effectiveness<br />
- Service level effectiveness<br />
- Network performance<br />
Store & analyze data for:<br />
- Customer loyalty<br />
- Buy<strong>in</strong>g behavior<br />
- Market<strong>in</strong>g effectiveness<br />
- SKU level analysis<br />
Store & analyze transaction,<br />
market & customer data for:<br />
- Risk management &<br />
compliance<br />
- Quantitative analysis of<br />
f<strong>in</strong>ancial models<br />
- Claims data analysis<br />
- Fraud detection<br />
- Credit rat<strong>in</strong>g<br />
- Market<strong>in</strong>g effectiveness<br />
Store & analyze data for:<br />
- Weblog data<br />
- Onl<strong>in</strong>e behavior<br />
- Buy<strong>in</strong>g behavior<br />
- Market<strong>in</strong>g effectiveness<br />
Healthcare & Biotech Transportation Manufactur<strong>in</strong>g Government<br />
Store & analyze data for:<br />
- Patient data records<br />
- Cl<strong>in</strong>ical data analysis<br />
- Drug discovery &<br />
development analysis<br />
Store & analyze data for:<br />
- Passenger traffic data<br />
- Customer behavior<br />
- Customer loyalty<br />
- Market<strong>in</strong>g effectiveness<br />
Store & analyze data for:<br />
- Supply cha<strong>in</strong><br />
- Product quality<br />
- Strategic procurement<br />
Store & analyze data for:<br />
- Fraud detection<br />
- Cyber security<br />
- Immigration control
VectorWise – Illustrative Real-world Use<br />
Cases<br />
o F<strong>in</strong>ancial Services<br />
� Hedge fund - Risk management <strong>in</strong> position analysis<br />
� Bank - Risk management & compliance report<strong>in</strong>g, Interactive <strong>BI</strong>/Analytics Platform<br />
o Telecom<br />
� 3G operator - CDR analysis for better customer <strong>in</strong>sight and cross/upsell<br />
� BSS solutions provider - Telecom analytics<br />
o Web 2.0/Social Media<br />
� Social media portal – Analyz<strong>in</strong>g user traffic analysis for better targeted advertis<strong>in</strong>g<br />
� Freight exchange – Customer behavior analytics<br />
o Retail<br />
� Data aggregator – Customer and <strong>in</strong>fomercial analytics<br />
� Solution provider – Retail analytics<br />
o Energy<br />
o Govt.<br />
� Services provider to Utilities – Cloud-based smart meter<strong>in</strong>g solution<br />
� Tax authority – Tax compliance analysis<br />
26
More Information<br />
VectorWise L<strong>in</strong>kedIn User Group<br />
www.<strong>in</strong>gres.com/products/vectorwise<br />
Confidential © 2011 Actian Corporation
Confidential © 2011 Actian Corporation