Big Data Lessons Learned in the Enterprise (Retours d’expériences Big Data en entreprise)<br />
MAPR - COMSCORE<br />
COMSCORE RELIABLY PROCESSES OVER 1.7 TRILLION INTERNET &<br />
MOBILE EVENTS EVERY MONTH ON MAPR<br />
THE BUSINESS<br />
comScore is a global leader in digital media analytics and the preferred source of digital marketing intelligence.<br />
comScore provides syndicated and custom solutions in online audience measurement, e-commerce, advertising,<br />
search, video and mobile. Advertising agencies, publishers, marketers and financial analysts rely on comScore<br />
for the industry-leading solutions needed to craft successful digital, marketing, sales, product development and<br />
trading strategies.<br />
comScore ingests over 20 terabytes of new data on a daily basis. In order to keep up with this data, comScore<br />
uses Hadoop to process over 1.7 trillion Internet and mobile events every month. The Hadoop jobs are run every<br />
hour, day, week, month and quarter, and once they’re done, data is normalized against the comScore URL data<br />
dictionary and then batch loaded into a relational database for analysis and reporting. comScore clients and analysts<br />
generate reports from this data; these reports enable comScore clients to gain behavioral insights into their<br />
mobile and online customer base.<br />
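The flow described above (process events, normalize them against a URL dictionary, then batch load the result into a relational database) can be sketched as follows. This is a minimal illustration and not comScore's implementation: the dictionary contents, the table name `property_events`, the helper names `normalize` and `batch_load`, and the use of SQLite as the relational database are all assumptions made for the example.

```python
import sqlite3
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical URL dictionary: hostname -> canonical property name.
URL_DICTIONARY = {
    "www.example.com": "Example",
    "m.example.com": "Example",
    "news.example.org": "Example News",
}


def normalize(url):
    """Map a raw event URL to its canonical property, or UNKNOWN."""
    host = urlsplit(url).hostname
    return URL_DICTIONARY.get(host, "UNKNOWN")


def batch_load(event_urls, conn):
    """Aggregate normalized events and load them in a single batch."""
    counts = Counter(normalize(url) for url in event_urls)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS property_events "
        "(property TEXT PRIMARY KEY, events INTEGER)"
    )
    # executemany sends all aggregated rows in one batch.
    conn.executemany(
        "INSERT INTO property_events (property, events) VALUES (?, ?)",
        sorted(counts.items()),
    )
    conn.commit()
    return counts


# Assumed sample events, for illustration only.
sample_events = [
    "http://www.example.com/a",
    "https://m.example.com/b",
    "http://news.example.org/",
    "http://other.test/",
]
connection = sqlite3.connect(":memory:")
batch_load(sample_events, connection)
rows = connection.execute(
    "SELECT property, events FROM property_events ORDER BY property"
).fetchall()
print(rows)
```

Reports would then be generated with ordinary SQL queries against the loaded table.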
HADOOP REQUIREMENTS<br />
The comScore engineering team processes a wide variety of Hadoop workloads and requires a Hadoop distribution<br />
that excels across multiple areas:<br />
Performance : As comScore continues to expand, the Hadoop cluster needs to maintain performance integrity,<br />
deliver insights faster, and also needs to produce more with less to minimize costs.<br />
Availability : comScore needs a Hadoop platform that provides data protection and high availability as the cluster<br />
grows in size.<br />
Scalability : comScore’s Hadoop cluster has grown to process over 1.7 trillion events a month from across the<br />
world. In the past, comScore has seen increases of over 100 billion events from one month to the next. Consequently,<br />
comScore needs a Hadoop platform that will enable them to maintain performance, ease of use and<br />
business continuity as they continue to scale.<br />
Ease of Use : comScore needs things to just work, and operating the cluster at scale needs to be easy and intuitive.<br />
BENEFITS<br />
MapR has been in continuous use at comScore for over two years. MapR has demonstrated superior performance,<br />
availability, scalability, ease of use, and significant cost savings over other distributions.<br />
Performance : Across various benchmarks, MapR executes jobs 3 to 5 times faster than other Hadoop<br />
distributions and requires substantially less hardware.<br />
Availability : MapR protects against cluster failures and data loss with its distributed NameNode and JobTracker<br />
HA. Rolling upgrades are also now possible with MapR.<br />
Scalability : With architectural changes made possible by its no-NameNode architecture, MapR creates more files faster, processes<br />
more data faster, and produces better streaming and random I/O results than other distributions. comScore<br />
now runs more than 20,000 jobs each day on its production MapR cluster.<br />
Ease of Use : comScore’s Vice President of Engineering, Will Duckworth, said, “With MapR, things that should just<br />
work, just work.” This means there is a lot less for comScore to manage with MapR. One of the advantages that<br />
Duckworth cites is that everything is a data node. This configuration results in much better hardware utilization<br />
from his perspective. MapR is easy to install and manage, and it is easy to get data in and out of the cluster.<br />
Speed : comScore is also able to use the MapR advanced capabilities to enforce parallel data allocation patterns.<br />
This enables key analyses to be performed using map-side merge-joins that have guaranteed data locality, resulting<br />
in a 10x increase in computation speed. “The specific features of MapR, such as volumes, mirroring and snapshots,<br />
have allowed us to iterate much faster,” said Michael Brown, CTO of comScore.<br />
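The map-side merge-join mentioned above can be sketched in a few lines. The idea is that when two datasets are partitioned the same way and each partition is sorted by key, matching partitions can be joined locally in one linear pass, with no shuffle across the network. The code below is a simplified, single-process illustration under those assumptions; the function names and sample data are hypothetical, and it does not use any MapR or Hadoop API.

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are both sorted by key."""
    i = j = 0
    joined = []
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Emit every pairing within the two runs.
            for _, left_value in left[i:i_end]:
                for _, right_value in right[j:j_end]:
                    joined.append((left_key, left_value, right_value))
            i, j = i_end, j_end
    return joined


def map_side_join(left_partitions, right_partitions):
    """Join co-partitioned datasets one partition pair at a time.

    Assumes partition n of each dataset holds the same key range and
    sits on the same node, so each pair is joined without a shuffle.
    """
    results = []
    for left_part, right_part in zip(left_partitions, right_partitions):
        results.extend(merge_join(left_part, right_part))
    return results


# Hypothetical data: visit counts and page categories, split into the
# same two key ranges and sorted by key within each partition.
visits = [[("a", 1), ("b", 2)], [("x", 3)]]
categories = [[("a", "home"), ("a", "news")], [("x", "video"), ("z", "shop")]]
joined_rows = map_side_join(visits, categories)
print(joined_rows)
```

Each partition pair is read once from start to finish, which is why this pattern depends on the storage layer keeping matching partitions together.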
ABOUT MAPR<br />
MapR delivers on the promise of Hadoop with a proven, enterprise-grade platform that supports a broad set of<br />
mission-critical and real-time production uses. MapR brings unprecedented dependability, ease-of-use and world-record<br />
speed to Hadoop, NoSQL, database and streaming applications in one unified big data platform.<br />
MapR is used by more than 500 customers across financial services, retail, media, healthcare, manufacturing,<br />
telecommunications and government organizations as well as by leading Fortune 100 and Web 2.0 companies.<br />
Amazon, Cisco, Google and HP are part of the broad MapR partner ecosystem. Investors include Lightspeed Venture<br />
Partners, Mayfield Fund, NEA, and Redpoint Ventures. MapR is based in San Jose, CA.<br />
Connect with MapR on Facebook, LinkedIn, and Twitter.<br />
Document produced by Corp Events - January 2015