
CLOUDERA AND PERVASIVE - Cloudera Blog


PARTNER SOLUTION BRIEF

CLOUDERA AND PERVASIVE

The Problem:

Hadoop and the Hadoop Distributed File System (HDFS) provide a reliable and scalable solution on inexpensive commodity servers. As Hadoop-based solutions have moved into production, a very real need to increase performance and reduce operational cost has begun to surface.

Hive was created to allow Hadoop users to express queries in a familiar, high-level language without having to write MapReduce programs directly. Hive and Hadoop continue to excel in scalability, handling thousands of nodes and petabytes of data.
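For a sense of what Hive saves its users from writing, the Python sketch below simulates, in a single process, the map, shuffle, and reduce steps behind a simple aggregation such as `SELECT url, COUNT(*) FROM page_views GROUP BY url`. The table and function names are hypothetical, and a real Hadoop job distributes these steps across many nodes rather than running them in one loop.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(rows: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (url, 1) for each input row."""
    for url in rows:
        yield (url, 1)


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, List[int]]]:
    """Shuffle/sort: order pairs by key and gather each key's values together."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield (key, [value for _, value in group])


def reduce_phase(grouped: Iterable[Tuple[str, List[int]]]) -> Iterator[Tuple[str, int]]:
    """Reducer: sum the 1s emitted for each key."""
    for key, values in grouped:
        yield (key, sum(values))


def group_by_count(rows: Iterable[str]) -> Dict[str, int]:
    # Hand-written equivalent of the HiveQL query
    #   SELECT url, COUNT(*) FROM page_views GROUP BY url
    # (page_views is a hypothetical table).
    return dict(reduce_phase(shuffle_phase(map_phase(rows))))


if __name__ == "__main__":
    print(group_by_count(["/home", "/docs", "/home", "/blog", "/home"]))
    # {'/blog': 1, '/docs': 1, '/home': 3}
```

With Hive, the single query line replaces all three functions.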

However, the challenge of improving performance and operational efficiency remains. Pervasive DataRush delivers a fast and easy way to extract considerably greater performance from your current cluster, so you can reduce the number of machines in the cluster or increase its workload capacity to run more queries and perform deeper analysis.

The Solution:

Pervasive DataRush is a leader in delivering increased performance with less hardware.

Pervasive DataRush and Hadoop

Adding Pervasive DataRush to Hadoop can boost performance by more than 4x. What you do with that level of performance is up to you:

• Run your Hadoop jobs on 1⁄4 of the hardware, which is both cheaper and greener
• Run more jobs and perform deeper analysis
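The one-quarter figure follows from simple arithmetic under an idealized assumption: job time scales linearly with the number of nodes, which real clusters only approximate because of I/O and coordination overhead. With a speedup S, a cluster of N nodes, and a baseline job time T, the same cluster finishes in T/S, or a cluster of N/S nodes finishes in the original time T. The 40-node cluster in the second line is a hypothetical example.

```latex
\[
T_{\text{new}} = \frac{T}{S}
\qquad\text{or}\qquad
N_{\text{new}} = \frac{N}{S} \quad \text{(keeping the original job time } T\text{)}
\]
\[
S = 4,\; N = 40 \;\Longrightarrow\; T_{\text{new}} = \frac{T}{4}
\quad\text{or}\quad N_{\text{new}} = \frac{40}{4} = 10
\]
```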

Pervasive DataRush integrates with Hadoop in several ways, depending on your needs. If you already have an established Hadoop cluster and are using MapReduce, you can call Pervasive DataRush from your Map and Reduce functions, allowing each mapper and reducer to make full use of the parallelism of multicore processors. You could see a 4x improvement in performance.
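DataRush's own API is not described in this brief, so the Python sketch below stands in for the general idea using the standard `concurrent.futures` module: one mapper's input is split into slices, the slices are processed concurrently, and the partial results are merged. The function names and the word-count workload are hypothetical.

```python
from collections import Counter
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List


def chunk(records: List[str], parts: int) -> List[List[str]]:
    """Split one mapper's input into roughly equal slices, one per worker."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_tokens(lines: Iterable[str]) -> Counter:
    """Per-slice work (hypothetical): split lines on whitespace and count words."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def parallel_map(
    records: List[str],
    workers: int = 4,
    make_executor: Callable[[int], Executor] = ThreadPoolExecutor,
) -> Counter:
    """Process the slices concurrently, then merge the partial counts."""
    total: Counter = Counter()
    with make_executor(workers) as pool:
        # pool.map yields results in slice order; Counter.update adds counts.
        for partial in pool.map(count_tokens, chunk(records, workers)):
            total.update(partial)
    return total


if __name__ == "__main__":
    split = ["a b a", "b c", "a"]
    print(dict(parallel_map(split, workers=2)))  # {'a': 3, 'b': 2, 'c': 1}
```

The thread pool keeps the sketch portable, but in CPython the global interpreter lock stops threads from running pure-Python computation on several cores at once. For CPU-bound work, pass `ProcessPoolExecutor` as `make_executor` and call `parallel_map` from under an `if __name__ == "__main__":` guard.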

Alternatively, you can use Pervasive DataRush to work directly with data in HDFS instead of using the MapReduce model. The operators and convenience methods provided by the Pervasive DataRush-Hadoop library are optimized for HDFS and take full advantage of Pervasive DataRush parallelism. With this approach, you could see up to a 13x improvement in performance.

Pervasive TurboRush for Hive

If you use Hive to interface with Hadoop, simply add Pervasive TurboRush for Hive, and your Hive queries will run faster and require fewer resources.

• No learning curve – just drop it in
• Preserve your HiveQL development investment

In normal operation, when Apache Hive receives a HiveQL query, it parses the query, creates a query execution plan as a series of MapReduce jobs, and runs the plan. Pervasive TurboRush allows Hive to generate an execution plan using efficient dataflow graphs as an alternative to MapReduce, and then executes these graphs with Pervasive DataRush distributed across the machines of the cluster. Once installed, it is completely transparent to the user and can yield substantial performance gains.
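As a rough illustration of the dataflow idea, the Python sketch below expresses a hypothetical query, `SELECT region, SUM(amount) FROM sales WHERE amount > 100 GROUP BY region`, as a small graph of chained operators through which rows stream. It runs in a single process and does not use TurboRush or DataRush; the table, column, and function names are all hypothetical. It only shows pipelining: when a plan is instead split across several MapReduce jobs, each job writes its output to HDFS before the next job reads it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, float]  # hypothetical (region, amount) row


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Source operator: stream rows one at a time."""
    yield from rows


def filter_rows(rows: Iterable[Row], min_amount: float) -> Iterator[Row]:
    """WHERE amount > min_amount, applied to each row as it flows past."""
    for region, amount in rows:
        if amount > min_amount:
            yield (region, amount)


def sum_by_region(rows: Iterable[Row]) -> Dict[str, float]:
    """Sink operator: GROUP BY region with SUM(amount)."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def run_plan(rows: Iterable[Row], min_amount: float = 100.0) -> Dict[str, float]:
    # The operators are wired directly into one graph, so rows stream from
    # scan to sink without any stage writing its full output to storage first.
    return sum_by_region(filter_rows(scan(rows), min_amount))


if __name__ == "__main__":
    sales = [("east", 150.0), ("west", 50.0), ("east", 200.0), ("west", 120.0)]
    print(run_plan(sales))  # {'east': 350.0, 'west': 120.0}
```

A distributed dataflow engine would additionally run each operator in parallel across cores and machines, which this single-process sketch does not model.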

Pervasive Software

Industries:
• Hadoop Performance and Ease of Use

Website:
http://www.pervasivedatarush.com

Company Overview

Pervasive Software (NASDAQ: PVSW) helps companies get the most out of their data investments through agile, embeddable software including on-premises and cloud-based services for data management, data integration, B2B exchange and analytics.

Product Overview

Pervasive DataRush is a leader in delivering increased performance with less hardware. Pervasive DataRush for Hadoop, available in both Community and Enterprise Editions, enables you to boost the performance of your Hadoop application by more than 4x. For Hive applications, simply add TurboRush for Hive and run the same queries faster on half of the machines. Pervasive DataRush products are easy to integrate into your existing Hadoop solution and deliver exceptional performance.

Add Pervasive DataRush and turbocharge your CDH application.



About Cloudera’s Distribution Including Apache Hadoop

Available for free download at www.cloudera.com/downloads, CDH delivers a streamlined path for putting Apache Hadoop to work solving business problems in production. Ideal for enterprises seeking a stable, tested Hadoop solution without proprietary vendor lock-in, CDH is the bridge between the insights of organizations using Hadoop in production and the continuous stream of innovations from the Apache community.

About Cloudera

Cloudera is the leading provider of Apache Hadoop-based software and services and works with customers in financial services, web, telecommunications, government and other industries. The company's products, Cloudera Enterprise and Cloudera's Distribution including Apache Hadoop, help organizations profit from all of their information. Cloudera's Distribution including Apache Hadoop is the most comprehensive Apache Hadoop-based platform in the industry. Cloudera Enterprise is the most cost-effective way to perform large-scale data storage and analysis and includes the tools, platform and support necessary to use Apache Hadoop in a production environment. For more on Cloudera, please visit www.cloudera.com.

About Pervasive

Pervasive Software (NASDAQ: PVSW) helps companies get the most out of their data investments through agile, embeddable software including on-premises and cloud-based services for data management, data integration, B2B exchange and analytics. Pervasive DataRush is an embeddable high-performance software platform for Big Data analytics and preparation applications such as claims processing, risk analysis, fraud detection, data mining, predictive analytics, sales optimization and marketing analytics. Pervasive Innovation Labs invests in exploring and creating cutting-edge solutions for the toughest data analysis and data delivery challenges. For more than two decades, Pervasive products have delivered value to tens of thousands of customers worldwide.

Cloudera, Inc. 210 Portage Avenue, Palo Alto, CA 94306 USA | 1-888-789-1488 or 1-650-362-0488 | cloudera.com

©2011 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries.
All other trademarks are the property of their respective companies. Information is subject to change without notice.
