02.08.2012 Views

Solving Performance Issues in BI through Purpose-built Analytical ...

Solving Performance Issues in BI through Purpose-built Analytical ...

Solving Performance Issues in BI through Purpose-built Analytical ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Solv<strong>in</strong>g</strong> <strong>Performance</strong> <strong>Issues</strong> <strong>in</strong> <strong>BI</strong> <strong>through</strong><br />

<strong>Purpose</strong>-<strong>built</strong> <strong>Analytical</strong> Databases<br />

Silicon India <strong>BI</strong> Conference, Sept 24, 2011, Hyderabad<br />

Vivek Bhatnagar<br />

Actian Corporation<br />

1 of 9<br />

1 of 9<br />

Confidential © 2011 Actian Corporation


Agenda<br />

� Today’s Biggest Challenge <strong>in</strong> <strong>BI</strong> - <strong>Performance</strong><br />

� Common Approaches Used For Database<br />

<strong>Performance</strong><br />

� New Break<strong>through</strong>: On-Chip Comput<strong>in</strong>g/ Columnar<br />

Design<br />

� VectorWise – A Puropose-Built <strong>Analytical</strong> Database<br />

� Illustrative Use Cases


Today’s Challenges <strong>in</strong> <strong>BI</strong><br />

Make <strong>in</strong>formed decisions faster<br />

Analysis <strong>in</strong> seconds not m<strong>in</strong>utes, m<strong>in</strong>utes not hours<br />

Data explosion<br />

Collect<strong>in</strong>g more <strong>in</strong>formation, less resources<br />

>44x growth <strong>in</strong> next 10 years<br />

Exist<strong>in</strong>g Tools:<br />

Too Slow. Too complex. Too expensive.<br />

<strong>Analytical</strong> databases designed <strong>in</strong> 80s & 90s do not take<br />

advantage of today’s modern hardware


Biggest <strong>BI</strong> Challenge - SPEED<br />

What Problem will eventually drive you to replace your<br />

current data warehouse platform?<br />

1. Poor Query Response<br />

2. Can’t Support Advanced Analytics<br />

40%<br />

45%<br />

Source: TDWI Q4 2009 Best Practices Report<br />

“Gartner clients <strong>in</strong>creas<strong>in</strong>gly report performance<br />

constra<strong>in</strong>ed data warehouses dur<strong>in</strong>g <strong>in</strong>quires. Based<br />

on these <strong>in</strong>quiries, we estimate that nearly 70% of data<br />

warehouses experience performance-constra<strong>in</strong>ed<br />

issues of various types.” Source: Gartner Magic Quadrant for Data Warehouse Database<br />

Management Systems, Jan 2010


Biggest <strong>BI</strong> Challenge - SPEED<br />

Source: 2010 <strong>BI</strong> Survey 9 – World’s largest <strong>in</strong>dependent <strong>BI</strong> Survey 3093 respondents


Cubes and Speed<br />

As cubes grow <strong>in</strong> size they take longer to load and build<br />

• Process<strong>in</strong>g time might exceed batch w<strong>in</strong>dow<br />

• Difficulty manag<strong>in</strong>g large cubes<br />

• Time required to add new dimensions<br />

Source: 2010 TDWI Benchmarks


Relational Databases and Speed<br />

Limitations <strong>in</strong> SQL technology<br />

• Adhoc queries too slow<br />

• Index<strong>in</strong>g/Aggregations cost time & money<br />

• 25% average <strong>BI</strong>/DW team time used up for ma<strong>in</strong>tenance/change<br />

management<br />

Source: 2010 TDWI <strong>BI</strong> Benchmark Report


Challenges with Current State of <strong>BI</strong><br />

Bus<strong>in</strong>ess Wants<br />

Faster Answers<br />

Database<br />

Too Slow<br />

Complex, Risky<br />

& Expensive<br />

More<br />

Hardware<br />

OLAP<br />

Cubes<br />

Application<br />

Traditional Database<br />

Indices<br />

Rollups<br />

Hardware<br />

Star<br />

Schemas<br />

Tun<strong>in</strong>g<br />

Data Growth<br />

Exponential<br />

<strong>Analytical</strong><br />

Bottleneck<br />

DB only use<br />

a fraction of<br />

CPU<br />

Capability


Approaches Used for Achiev<strong>in</strong>g Database<br />

<strong>Performance</strong><br />

Parallel Process<strong>in</strong>g<br />

Optimizations for parallel process<strong>in</strong>g and m<strong>in</strong>imal data retrieval<br />

MPP<br />

Column<br />

Disk<br />

Column store with compression Proprietary hardware<br />

Decompress<br />

RAM<br />

Buffer<br />

Manager<br />

Data Warehouse Appliances<br />

Acceptable performance has been achieved by us<strong>in</strong>g more hardware<br />

or by <strong>in</strong>telligently lower<strong>in</strong>g the volume of data to be processed<br />

However, none of these approaches leverages the performance features of<br />

today’s CPUs i.e. tak<strong>in</strong>g the most out of each modern commodity CPU


New Break<strong>through</strong><br />

VectorWise <strong>Analytical</strong> Database<br />

<strong>Purpose</strong>-Built <strong>Analytical</strong><br />

Relational database for <strong>BI</strong><br />

and data analysis<br />

Runs blaz<strong>in</strong>g<br />

fast/<strong>in</strong>teractive data<br />

analysis<br />

Exploits performance<br />

potential <strong>in</strong> today’s CPUs<br />

Delivers <strong>in</strong>-memory<br />

performance without be<strong>in</strong>g<br />

memory constra<strong>in</strong>t<br />

“Game-chang<strong>in</strong>g technology.”<br />

Don Fe<strong>in</strong>berg, Gartner Group<br />

“This is def<strong>in</strong>itely a break<strong>through</strong>. It delivers<br />

faster results at lower costs.”<br />

Noel Yuhanna, Forrester Research<br />

“This <strong>in</strong>evitability puts VectorWise 4 years ahead<br />

of the competition <strong>in</strong> terms of performance – and it<br />

will rema<strong>in</strong> 4 years ahead until some competitor<br />

f<strong>in</strong>ds a way to catch up at a software level. This is<br />

unprecedented.”<br />

Rob<strong>in</strong> Bloor, The Virtual Circle


VectorWise:<br />

On Chip Comput<strong>in</strong>g/Columnar Database<br />

Time / Cycles to Process<br />

2-20 150-250 Millions<br />

Break<strong>through</strong> technology<br />

Vector Process<strong>in</strong>g<br />

On Chip Comput<strong>in</strong>g<br />

DISK<br />

40-400MB<br />

RAM<br />

2-3GB<br />

Data Processed<br />

CHIP<br />

10GB<br />

Innovations on proven techniques<br />

Updateable Column Store<br />

Automatic Compression<br />

Automatic Storage Indexes<br />

M<strong>in</strong>imize IO<br />

Parallel Process<strong>in</strong>g


VectorWise Technology<br />

� Vector process<strong>in</strong>g<br />

– Exploits super-scalar features us<strong>in</strong>g<br />

SIMD capabilities of today’s CPUs<br />

� Optimizes memory hierarchy<br />

– Maximizes use of CPU cache<br />

– Fewer requests to RAM and disk<br />

� Data Compression/De-Compression<br />

– Optimized compression enabl<strong>in</strong>g very<br />

fast de-compression for overall<br />

performance enhancement<br />

– Vectorized de-compression<br />

– Automatic compression <strong>through</strong> ultraefficient<br />

algorithms<br />

� Automatic Index<strong>in</strong>g<br />

– System generated Storage<br />

Indexes<br />

– Easy identification of candidate<br />

data blocks for queries<br />

� Integration<br />

– Standard SQL and <strong>in</strong>terfaces<br />

– Common <strong>BI</strong>/Data Integration<br />

tools


Modern CPU Instruction Capabilities<br />

� SIMD<br />

• Traditional CPU process<strong>in</strong>g: S<strong>in</strong>gle Instruction, S<strong>in</strong>gle<br />

Data (SISD)<br />

• Modern CPU process<strong>in</strong>g capabilities: S<strong>in</strong>gle Instruction,<br />

Multiple Data (SIMD)<br />

� Out-of-order execution<br />

� Chip multi-thread<strong>in</strong>g<br />

� Large L2/L3 caches<br />

� Stream<strong>in</strong>g SIMD Extensions for efficient<br />

SIMD process<strong>in</strong>g<br />

� Hardware accelerated Str<strong>in</strong>g Process<strong>in</strong>g


Vector Process<strong>in</strong>g<br />

One operation<br />

performed on one<br />

element at a time<br />

Large overheads<br />

Traditional Scalar<br />

Process<strong>in</strong>g<br />

1 x 1 = 1<br />

2 x 2 = 4<br />

3 x 3 = 9<br />

4 x 4 = 16<br />

5 x 5 = 25<br />

6 x 6 = 36<br />

7 x 7 = 49<br />

8 x 8 = 64<br />

.<br />

.<br />

.<br />

n x n = n 2<br />

Many<br />

V’s<br />

1<br />

Vector Process<strong>in</strong>g<br />

1 x 1<br />

2 x 2<br />

3 x 3<br />

4 x 4<br />

5 x 5<br />

6 x 6<br />

7 x 7<br />

8 x 8<br />

.<br />

.<br />

.<br />

n x n<br />

=<br />

1<br />

4<br />

9<br />

16<br />

25<br />

36<br />

49<br />

64<br />

.<br />

.<br />

.<br />

n 2<br />

One operation<br />

performed on a set<br />

of data at a time<br />

No overheads<br />

Process even 1.5GB<br />

per second


Process<strong>in</strong>g <strong>in</strong> Chip Cache<br />

Time / Cycles to Process<br />

2-20 150-250 Millions<br />

Us<strong>in</strong>g CPU cache is far more faster & efficient<br />

DISK<br />

40-100MB<br />

RAM<br />

2-3GB<br />

Data Processed<br />

GB/s Measure of Throughput<br />

Cycles Amount of CPU time required to<br />

process data<br />

CHIP<br />

10GB


Updateable Column Store<br />

� Only access relevant data<br />

� Enable <strong>in</strong>cremental updates efficiently<br />

• Traditionally a weakness for column-based stores<br />

Cust_Num Cust_surna<br />

me<br />

Cust_first_nam<br />

e<br />

Cust_mid_nam<br />

e<br />

Cust_DOB Cust_Sex Cust_Add_1 Cust_Addr_2 Cust_City Cust_State<br />

46328927956 Jones Steven Sean 17-JAN-1971 M 333 StKilda Rd Melbourne Vic<br />

98679975745 Smith Leonard Patrick 04-APR-1964 M Unit 12, 147<br />

Trafalgar Sqr<br />

52634346735 Rogers C<strong>in</strong>dy Carm<strong>in</strong>e 11-MAR-1980 F Belmont Rail<br />

Service<br />

346737347347 Andrews Jenny 14-SEP-1977 F Apt1, 117 West<br />

42 nd St<br />

88673477347 Cooper Sheldon Michael 30-JUN-1980 M Ingres<br />

Corporation<br />

34673447568 Kollwitz Rolf 22-DEC-1975 M IBM<br />

Headquarters<br />

99554443044 Wong Penny Lee 13-NOV-1981 F M<strong>in</strong>g On Tower<br />

1<br />

Birm<strong>in</strong>gham London<br />

421 Station St Belmont CA<br />

Level 2, 426<br />

Argello St<br />

123 Mount<br />

View Crs<br />

1777 Moa Tzu<br />

Tung Rd<br />

New York NY<br />

Redwood City CA<br />

Atlantic City PN<br />

M<strong>in</strong>g Now<br />

Prov<strong>in</strong>ce<br />

Shanghi


Optimized Compression & Fast De-Compression<br />

� Column-based compression with multiple algorithms<br />

• Automatically determ<strong>in</strong>ed by VectorWise<br />

� Vectorised decompression<br />

• Only for data process<strong>in</strong>g <strong>in</strong> CPU cache<br />

Column<br />

Disk<br />

Maximize I/O<br />

<strong>through</strong>put to chip<br />

RAM<br />

Column<br />

Buffer<br />

Manager<br />

Elim<strong>in</strong>ate slow<br />

round trip to RAM<br />

Decompress and<br />

store <strong>in</strong> chip cache<br />

Cache<br />

Decompress<br />

CPU


Storage Index<br />

� Always automatically created<br />

� Automatically ma<strong>in</strong>ta<strong>in</strong>ed<br />

� Stores m<strong>in</strong>/max value per data block<br />

� Enables database to efficiently identify candidate data blocks


VectorWise Features<br />

<strong>Performance</strong><br />

•10x-75x faster for <strong>BI</strong>,<br />

analytics & report<strong>in</strong>g<br />

• In-memory performance<br />

without memory restra<strong>in</strong>ts<br />

• Near real-time updatable<br />

database<br />

• Delivers results <strong>in</strong><br />

seconds not m<strong>in</strong>utes<br />

m<strong>in</strong>utes not hours<br />

Usage & Integration<br />

•Uses ANSI standard<br />

queries & SQL statements<br />

•Elim<strong>in</strong>ate/reduce Cubes,<br />

aggregate tables, roll ups,<br />

<strong>in</strong>dexes….<br />

•Self <strong>in</strong>dex<strong>in</strong>g & self tun<strong>in</strong>g<br />

database<br />

•Deliver <strong>BI</strong> projects faster<br />

with lower cost & risk<br />

TCO<br />

• Maximize utilization of<br />

CPUs <strong>in</strong> low cost<br />

commodity hardware<br />

• Handle tens of terabytes<br />

scale data with a s<strong>in</strong>gle<br />

server<br />

• Requires commodity<br />

hardware<br />

• Does not require MPP


VectorWise<br />

TPC-H <strong>Performance</strong> Benchmark


TPC Benchmark Results – 1TB<br />

2.5x <strong>Performance</strong><br />

<strong>Performance</strong><br />

(QphH@1TB)<br />

500,000<br />

400,000<br />

300,000<br />

200,000<br />

100,000<br />

0<br />

Price/<strong>Performance</strong><br />

US$/QphH@1TB<br />

Oracle<br />

29 April 2009<br />

123,323<br />

$20.54 US<br />

Oracle<br />

26 April 2010<br />

140,181<br />

$12.15 US<br />

Sybase IQ<br />

15 Dec 2010<br />

164,747<br />

$6.85 US<br />

Top 5 Non-Clustered Database System<br />

TPC-H 1TB Scale Factor<br />

Microsoft<br />

5 Apr 2011<br />

80 cores<br />

2 TB RAM<br />

173,961<br />

$1.37 US<br />

VectorWise<br />

3 May 2011<br />

Source: www.tpc.org / May 23, 2011<br />

TPC, TPC Benchmark, TPC-H, QppH, QthH and QphH are trademarks of the Transaction Process<strong>in</strong>g <strong>Performance</strong> Council (TPC)<br />

32 cores<br />

1 TB RAM<br />

436,788<br />

$0.88 US


VectorWise<br />

<strong>BI</strong> Tun<strong>in</strong>g & Complexity<br />

Traditional<br />

Data Mart<br />

Project<br />

VectorWise<br />

Project<br />

Requirements<br />

Assessment<br />

Requirements<br />

Assessment<br />

Design<br />

Data<br />

Schema<br />

Design<br />

Data<br />

Schema<br />

Load<br />

Data<br />

Load<br />

Data<br />

Build<br />

Reports<br />

UAT<br />

2010 TDWI <strong>BI</strong> Benchmark Report<br />

Average time to build a complex report or dashboard<br />

(20 dimensions, 12 measures, and 6 user access roles)<br />

Build<br />

Aggregates,<br />

OLAP Cubes<br />

Build<br />

Reports<br />

UAT<br />

Time to Deploy Project<br />

2008 6.7 weeks<br />

2009 6.3 weeks<br />

2010 6.6 weeks


VectorWise<br />

<strong>BI</strong> Tun<strong>in</strong>g & Complexity<br />

Traditional<br />

Architecture<br />

VectorWise<br />

Architecture<br />

End<br />

of<br />

day<br />

End<br />

of<br />

day<br />

Fast Process<strong>in</strong>g Everyday!!!<br />

Load<br />

Data<br />

Warehouse<br />

Load<br />

Data<br />

Warehouse<br />

Build<br />

Reports<br />

Build<br />

Aggregates,<br />

OLAP Cubes<br />

Build<br />

Reports<br />

Time to Deploy Project<br />

Rega<strong>in</strong>ed Time


VectorWise<br />

TPC-H Price/<strong>Performance</strong> Benchmark<br />

TCO<br />

• Spend less on <strong>in</strong>frastructure<br />

• Spend less on <strong>BI</strong> tun<strong>in</strong>g


<strong>Analytical</strong> Databases - Illustrative Use Cases<br />

Telcos/VAS Retail FSI Web 2.0<br />

Store & analyze CDR, VAS<br />

downloads & other<br />

subscriber/network data for:<br />

- Revenue assurance<br />

- Price optimization<br />

- Customer loyalty/churn<br />

- Market<strong>in</strong>g effectiveness<br />

- Service level effectiveness<br />

- Network performance<br />

Store & analyze data for:<br />

- Customer loyalty<br />

- Buy<strong>in</strong>g behavior<br />

- Market<strong>in</strong>g effectiveness<br />

- SKU level analysis<br />

Store & analyze transaction,<br />

market & customer data for:<br />

- Risk management &<br />

compliance<br />

- Quantitative analysis of<br />

f<strong>in</strong>ancial models<br />

- Claims data analysis<br />

- Fraud detection<br />

- Credit rat<strong>in</strong>g<br />

- Market<strong>in</strong>g effectiveness<br />

Store & analyze data for:<br />

- Weblog data<br />

- Onl<strong>in</strong>e behavior<br />

- Buy<strong>in</strong>g behavior<br />

- Market<strong>in</strong>g effectiveness<br />

Healthcare & Biotech Transportation Manufactur<strong>in</strong>g Government<br />

Store & analyze data for:<br />

- Patient data records<br />

- Cl<strong>in</strong>ical data analysis<br />

- Drug discovery &<br />

development analysis<br />

Store & analyze data for:<br />

- Passenger traffic data<br />

- Customer behavior<br />

- Customer loyalty<br />

- Market<strong>in</strong>g effectiveness<br />

Store & analyze data for:<br />

- Supply cha<strong>in</strong><br />

- Product quality<br />

- Strategic procurement<br />

Store & analyze data for:<br />

- Fraud detection<br />

- Cyber security<br />

- Immigration control


VectorWise – Illustrative Real-world Use<br />

Cases<br />

o F<strong>in</strong>ancial Services<br />

� Hedge fund - Risk management <strong>in</strong> position analysis<br />

� Bank - Risk management & compliance report<strong>in</strong>g, Interactive <strong>BI</strong>/Analytics Platform<br />

o Telecom<br />

� 3G operator - CDR analysis for better customer <strong>in</strong>sight and cross/upsell<br />

� BSS solutions provider - Telecom analytics<br />

o Web 2.0/Social Media<br />

� Social media portal – Analyz<strong>in</strong>g user traffic analysis for better targeted advertis<strong>in</strong>g<br />

� Freight exchange – Customer behavior analytics<br />

o Retail<br />

� Data aggregator – Customer and <strong>in</strong>fomercial analytics<br />

� Solution provider – Retail analytics<br />

o Energy<br />

o Govt.<br />

� Services provider to Utilities – Cloud-based smart meter<strong>in</strong>g solution<br />

� Tax authority – Tax compliance analysis<br />

26


More Information<br />

VectorWise L<strong>in</strong>kedIn User Group<br />

www.<strong>in</strong>gres.com/products/vectorwise<br />

Confidential © 2011 Actian Corporation


Confidential © 2011 Actian Corporation

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!