27.01.2014 Views

Analytics for Enterprise Class Hadoop and Streaming Data

About this Book

world-class researchers, mathematicians, statisticians, and more: there’s a lot of talent of this caliber in the halls of IBM, with many working on Big Data problems. Think of Watson (famous for its winning Jeopardy! performance) as a proof point of what IBM is capable of providing. Of course, you’re going to want support for your Big Data platform, and who can provide direct-to-engineer support, around the world, in a 24×7 manner? What are you going to do with your Big Data? Analyze it! The lineage of IBM’s data analysis platforms (SPSS, Cognos, Smart Analytics Systems, Netezza, text annotators, speech-to-text, and so much more; IBM has spent over $14 billion in the last five years on analytic acquisitions alone) offers immense opportunity for year-after-year extensions to its Big Data platform.

Of course, we would be remiss not to mention how dedicated IBM is to the open source community in general. IBM has a rich heritage of supporting open source. Contributions such as Eclipse, the de facto standard integrated development environment (IDE) used in open source; Unstructured Information Management Architecture (UIMA); Apache Derby; Lucene; XQuery; SQL; and the Xerces XML processor are but a few of the too many to mention. We want to make one thing very clear: IBM is committed to Hadoop open source. In fact, Jaql (you will learn about this in Chapter 4) was donated to the open source Hadoop community by IBM. Moreover, IBM is continually working on additional technologies for potential Hadoop-related donations. Our development labs have Hadoop committers who work alongside other Hadoop committers from Facebook, LinkedIn, and more. Finally, you are likely to find one of our developers on any Hadoop forum. We believe IBM’s commitment to open source Hadoop, combined with its vast intellectual property and research around enterprise needs and analytics, delivers a true Big Data platform.

Part II, Big Data: From the Technology Perspective, starts by giving you some basics about Big Data open source technologies in Chapter 4. This chapter lays the “ground floor” with respect to open source technologies that are synonymous with Big Data, the most common being Hadoop (an Apache top-level project whose execution engine is behind the Big Data movement). You’re not going to be a Hadoop expert after reading this chapter, but you’re going to have a basis for understanding such terms as Pig, Hive, HDFS, MapReduce, and ZooKeeper, among others.
