

BD16_REX


Big Data in the Enterprise: Lessons Learned (Retours d’expériences Big Data en entreprise)

MAPR - COMSCORE

COMSCORE RELIABLY PROCESSES OVER 1.7 TRILLION INTERNET & MOBILE EVENTS EVERY MONTH ON MAPR

THE BUSINESS

comScore is a global leader in digital media analytics and the preferred source of digital marketing intelligence. comScore provides syndicated and custom solutions in online audience measurement, e-commerce, advertising, search, video and mobile. Advertising agencies, publishers, marketers and financial analysts rely on comScore for the industry-leading solutions needed to craft successful digital, marketing, sales, product development and trading strategies.

comScore ingests over 20 terabytes of new data on a daily basis. In order to keep up with this data, comScore uses Hadoop to process over 1.7 trillion Internet and mobile events every month. The Hadoop jobs are run every hour, day, week, month and quarter, and once they’re done, data is normalized against the comScore URL data dictionary and then batch loaded into a relational database for analysis and reporting. comScore clients and analysts generate reports from this data; these reports enable comScore clients to gain behavioral insights into their mobile and online customer base.
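The normalize-then-batch-load step described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not comScore's pipeline: the brochure does not describe the URL data dictionary, so `URL_DICTIONARY`, the host-name lookup rule, and the `normalize` and `batches` helpers are all hypothetical.

```python
from itertools import islice
from urllib.parse import urlsplit

# Hypothetical stand-in for the URL data dictionary; the source does not
# describe its real structure or lookup rules.
URL_DICTIONARY = {
    "example.com": "Example Site",
    "news.example.org": "Example News",
}


def normalize(event, dictionary):
    """Attach a dictionary entity to one raw event, keyed by its host name."""
    host = urlsplit(event["url"]).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return {**event, "entity": dictionary.get(host, "UNKNOWN")}


def batches(rows, size):
    """Yield lists of at most `size` rows, ready for a bulk insert."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


events = [
    {"url": "http://www.example.com/page", "ts": 1},
    {"url": "https://news.example.org/story?id=7", "ts": 2},
    {"url": "http://unknown.test/", "ts": 3},
]
normalized = [normalize(e, URL_DICTIONARY) for e in events]
for batch in batches(normalized, 2):
    # A real job would hand each batch to the database's bulk loader here.
    print([row["entity"] for row in batch])
```

Loading in fixed-size batches rather than row by row is the usual reason a "batch load" step exists: it amortizes per-statement overhead in the relational database.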

HADOOP REQUIREMENTS

The comScore engineering team processes a wide variety of Hadoop workloads and requires a Hadoop distribution that excels across multiple areas:

Performance: As comScore continues to expand, the Hadoop cluster needs to maintain performance integrity, deliver insights faster, and produce more with less to minimize costs.

Availability: comScore needs a Hadoop platform that provides data protection and high availability as the cluster grows in size.

Scalability: comScore’s Hadoop cluster has grown to process over 1.7 trillion events a month from across the world; in the past, comScore has seen month-over-month increases of over 100 billion events. Consequently, comScore needs a Hadoop platform that will enable it to maintain performance, ease of use and business continuity as it continues to scale.

Ease of Use: comScore needs things to just work, and operating the cluster at scale needs to be easy and intuitive.

BENEFITS

MapR has been in continuous use at comScore for over two years. MapR has demonstrated superior performance, availability, scalability, ease of use, and significant cost savings over other distributions.

Performance: Across various benchmarks, MapR executes jobs 3 to 5 times faster than other Hadoop distributions and requires substantially less hardware.

Availability: MapR protects against cluster failures and data loss with its distributed NameNode and JobTracker HA. Rolling upgrades are also now possible with MapR.

Scalability: With architectural changes made possible by its no-NameNode architecture, MapR creates more files faster, processes more data faster, and produces better streaming and random I/O results than other distributions. comScore now runs more than 20,000 jobs each day on its production MapR cluster.

Ease of Use: comScore’s Vice President of Engineering, Will Duckworth, said, “With MapR, things that should just work, just work.” This means there is a lot less for comScore to manage with MapR. One of the advantages that Duckworth cites is that everything is a data node. From his perspective, this configuration results in much better hardware utilization. With MapR, it is easy to install and manage the cluster and to get data in and out of it.

Speed: comScore is also able to use the MapR advanced capabilities to enforce parallel data allocation patterns. This enables key analyses to be performed using map-side merge-joins that have guaranteed data locality, resulting in a 10x increase in computation speed. “The specific features of MapR, such as volumes, mirroring and snapshots, have allowed us to iterate much faster,” said Michael Brown, CTO of comScore.
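To make the map-side merge-join mentioned above concrete, here is a minimal Python sketch of a generic sort-merge join. It is not MapR's API or comScore's code: `merge_join` and the sample data are hypothetical. The assumption is that both inputs are already partitioned and sorted on the same key and sit on the same node, so each map task can join its pair of partitions in one pass without a shuffle.

```python
def merge_join(left, right):
    """Inner-join two sequences of (key, value) pairs that are sorted by key.

    Both inputs are walked once, in step, so no hash table or re-sort is
    needed. Duplicate keys on either side produce every matching pair.
    """
    # Materialized for simplicity; a real map task would stream both inputs.
    left, right = list(left), list(right)
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    yield (lk, left[a][1], right[b][1])
            i, j = i_end, j_end


# Hypothetical co-located partitions, both sorted by key.
visits = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
profiles = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
for row in merge_join(visits, profiles):
    print(row)
```

The join itself is linear in the size of the two partitions plus the output; the cost avoided is the network shuffle and sort that a reduce-side join would need, which is why guaranteed data locality matters.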

ABOUT MAPR

MapR delivers on the promise of Hadoop with a proven, enterprise-grade platform that supports a broad set of mission-critical and real-time production uses. MapR brings unprecedented dependability, ease-of-use and world-record speed to Hadoop, NoSQL, database and streaming applications in one unified big data platform.

MapR is used by more than 500 customers across financial services, retail, media, healthcare, manufacturing, telecommunications and government organizations as well as by leading Fortune 100 and Web 2.0 companies. Amazon, Cisco, Google and HP are part of the broad MapR partner ecosystem. Investors include Lightspeed Venture Partners, Mayfield Fund, NEA, and Redpoint Ventures. MapR is based in San Jose, CA.

Connect with MapR on Facebook, LinkedIn, and Twitter.

Document produced by Corp Events - January 2015
