CLOUDERA AND PERVASIVE - Cloudera Blog
PARTNER SOLUTION BRIEF<br />
CLOUDERA AND PERVASIVE<br />
The Problem:<br />
Hadoop and the Hadoop Distributed File System (HDFS) provide reliable, scalable data storage<br />
and processing on inexpensive commodity servers. As Hadoop-based solutions have migrated into<br />
production, the need to increase performance and reduce operational cost has<br />
begun to surface.<br />
Hive was created to allow Hadoop users to express queries in a familiar, high-level language<br />
without having to directly write MapReduce programs. Hive and Hadoop continue to excel in<br />
scalability, handling thousands of nodes and petabytes of data.<br />
However, there still exists the challenge of improving performance and operational efficiency.<br />
Pervasive DataRush delivers a fast and easy solution that lets you extract considerably<br />
greater performance from your current cluster, reduce the number of machines in your cluster,<br />
and increase workload capacity so that you can run more queries or perform deeper analysis.<br />
The Solution:<br />
Pervasive DataRush is a leader in delivering increased performance with less hardware.<br />
Pervasive DataRush and Hadoop<br />
Adding Pervasive DataRush to Hadoop can boost performance by more than 4x. What you do<br />
with that level of performance is up to you.<br />
• Run your Hadoop jobs on 1⁄4 of the hardware, which is both cheaper and greener<br />
• Run more jobs and perform deeper analysis<br />
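The trade-off behind these two options is simple arithmetic, sketched below in Python. The function names are hypothetical and are not part of any Pervasive or Cloudera product. The sketch assumes ideal linear scaling, meaning a job that runs 4x faster per node needs one quarter of the nodes for the same runtime; real clusters will rarely scale this cleanly.

```python
import math


def nodes_needed(current_nodes: int, speedup: float) -> int:
    """Nodes required to keep the same job runtime after a per-node speedup.

    Assumption (not from the brief): throughput scales linearly with nodes.
    """
    if current_nodes <= 0 or speedup <= 0:
        raise ValueError("current_nodes and speedup must be positive")
    # Round up: a fractional node still means one more physical machine.
    return math.ceil(current_nodes / speedup)


def jobs_per_window(current_jobs: int, speedup: float) -> int:
    """Jobs that fit in the same time window on the same hardware.

    Assumption (not from the brief): every job speeds up by the same factor.
    """
    if current_jobs < 0 or speedup <= 0:
        raise ValueError("current_jobs must be >= 0 and speedup positive")
    return math.floor(current_jobs * speedup)


if __name__ == "__main__":
    # Option 1: same workload on less hardware.
    print(nodes_needed(100, 4.0))
    # Option 2: same hardware, more work.
    print(jobs_per_window(12, 4.0))
```

With a 100-node cluster and the 4x figure quoted above, the first function gives 25 nodes, which is the "1⁄4 of the hardware" case. The second function describes the alternative of keeping the cluster and running more jobs.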
Pervasive DataRush integrates with Hadoop in several ways, depending on your needs. If you<br />
already have an established Hadoop cluster and are using MapReduce, you can call Pervasive<br />
DataRush from your Map and Reduce functions, allowing each mapper and reducer to fully use<br />
the parallelism of multicore processors. You could see a 4x improvement in performance.<br />
You can also integrate Pervasive DataRush with the data in HDFS, as an alternative to using<br />
the MapReduce model. The operators and convenience methods provided by the Pervasive<br />
DataRush-Hadoop library are optimized for HDFS. They take full advantage of Pervasive<br />
DataRush parallelism. Using this method, you could see up to 13x improvement in<br />
performance.<br />
Pervasive TurboRush for Hive<br />
If you use Hive to interface to Hadoop, simply add Pervasive TurboRush for Hive, and your<br />
Hive queries will run faster AND require fewer resources.<br />
• No learning curve – just drop it in<br />
• Preserve your HiveQL development investment<br />
In normal operation, when Apache Hive receives an SQL query, it parses the query, creates a<br />
query execution plan as a series of MapReduce tasks, and runs the plan. Pervasive TurboRush<br />
allows Hive to generate an execution plan using efficient dataflow graphs as an alternative to<br />
MapReduce. It then executes these graphs using Pervasive DataRush distributed across the<br />
machines of the cluster. Once installed, it is completely transparent to the user and can result in<br />
substantial performance gains.<br />
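To make the difference between the two kinds of plan concrete, here is a toy Python sketch. It is not TurboRush's implementation or API, and every name in it is hypothetical. It assumes a query equivalent to counting rows per key where the value is positive. The first function runs the work as separate stages that each materialize their full output, loosely mirroring a chain of MapReduce tasks. The second chains streaming operators so each row flows through the whole graph without intermediate materialization. Both return the same answer.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (key, value); a hypothetical two-column table


def staged_plan(rows: Iterable[Row]) -> Dict[str, int]:
    """Run the query as discrete stages, materializing each stage's output."""
    stage1: List[Row] = [r for r in rows if r[1] > 0]  # filter stage
    stage2: List[str] = [key for key, _ in stage1]     # projection stage
    return dict(Counter(stage2))                       # aggregation stage


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Source operator: emit rows one at a time."""
    yield from rows


def filter_positive(rows: Iterable[Row]) -> Iterator[Row]:
    """Streaming filter: pass through rows whose value is positive."""
    return (r for r in rows if r[1] > 0)


def project_key(rows: Iterable[Row]) -> Iterator[str]:
    """Streaming projection: keep only the key column."""
    return (key for key, _ in rows)


def count_by_key(keys: Iterable[str]) -> Dict[str, int]:
    """Sink operator: consume the stream and count occurrences per key."""
    return dict(Counter(keys))


def dataflow_plan(rows: Iterable[Row]) -> Dict[str, int]:
    """Run the same query as one pipeline of chained streaming operators."""
    return count_by_key(project_key(filter_positive(scan(rows))))


if __name__ == "__main__":
    table = [("a", 1), ("b", -2), ("a", 3), ("c", 5)]
    print(staged_plan(table))
    print(dataflow_plan(table))
```

The sketch runs in a single process and shows only the shape of the two plans. It says nothing about how the dataflow graphs are distributed across the machines of a cluster.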
Pervasive Software<br />
Industries:<br />
• Hadoop Performance and Ease of<br />
Use<br />
Website:<br />
http://www.pervasivedatarush.com<br />
Company Overview<br />
Pervasive Software (NASDAQ:<br />
PVSW) helps companies get the<br />
most out of their data investments<br />
through agile, embeddable software<br />
including on-premises and cloud-based<br />
services for data<br />
management, data integration, B2B<br />
exchange and analytics.<br />
Product Overview<br />
Pervasive DataRush is a leader in<br />
delivering increased performance<br />
with less hardware. Pervasive<br />
DataRush for Hadoop, available in<br />
both Community and Enterprise<br />
Editions, enables you to boost the<br />
performance of your Hadoop<br />
application by more than 4x. For<br />
Hive applications, simply add<br />
TurboRush for Hive and run the<br />
same queries faster on half of the<br />
machines. Pervasive DataRush<br />
products are easy to integrate into<br />
your existing Hadoop solution and<br />
deliver exceptional performance.<br />
Add Pervasive DataRush and<br />
turbocharge your CDH application.
About Cloudera’s Distribution Including Apache Hadoop<br />
Available for free download at www.cloudera.com/downloads, CDH delivers a streamlined path for putting Apache Hadoop to work<br />
solving business problems in production. Ideal for enterprises seeking a stable, tested Hadoop solution without proprietary vendor<br />
lock-in, CDH is the bridge between the insights of organizations using Hadoop in production and the continuous stream of innovations<br />
from the Apache community.<br />
About Cloudera<br />
Cloudera is the leading provider of Apache Hadoop-based software<br />
and services and works with customers in financial services, web,<br />
telecommunications, government and other industries. The<br />
company's products, Cloudera Enterprise and Cloudera's<br />
Distribution including Apache Hadoop, help organizations profit<br />
from all of their information. Cloudera's Distribution including<br />
Apache Hadoop is the most comprehensive Apache Hadoop-based<br />
platform in the industry. Cloudera Enterprise is the most cost-effective<br />
way to perform large-scale data storage and analysis and<br />
includes the tools, platform and support necessary to use Apache<br />
Hadoop in a production environment. For more on Cloudera, please<br />
visit www.cloudera.com.<br />
About Pervasive<br />
Pervasive Software (NASDAQ: PVSW) helps companies get the<br />
most out of their data investments through agile, embeddable<br />
software including on-premises and cloud-based services for data<br />
management, data integration, B2B exchange and analytics.<br />
Pervasive DataRush is an embeddable high-performance<br />
software platform for Big Data analytics and preparation<br />
applications such as claims processing, risk analysis, fraud<br />
detection, data mining, predictive analytics, sales optimization and<br />
marketing analytics. Pervasive Innovation Labs invests in exploring<br />
and creating cutting-edge solutions for the toughest data analysis<br />
and data delivery challenges. For more than two decades,<br />
Pervasive products have delivered value to tens of thousands of<br />
customers worldwide.<br />
Cloudera, Inc. 210 Portage Avenue, Palo Alto, CA 94306 USA | 1-888-789-1488 or 1-650-362-0488 | cloudera.com<br />
©2011 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries.<br />
All other trademarks are the property of their respective companies. Information is subject to change without notice.