Analytics for Enterprise Class Hadoop and Streaming Data

About this Book

world-class researchers, mathematicians, statisticians, and more: there’s a lot of talent of this caliber in the halls of IBM, much of it working on Big Data problems. Think Watson (famous for its winning Jeopardy! performance) as a proof point of what IBM is capable of providing. Of course, you’re going to want support for your Big Data platform, and who can provide direct-to-engineer support, around the world, in a 24×7 manner?

What are you going to do with your Big Data? Analyze it! The lineage of IBM’s data analysis platforms (SPSS, Cognos, Smart Analytics Systems, Netezza, text annotators, speech-to-text, and so much more; IBM has spent over $14 billion in the last five years on analytic acquisitions alone) offers immense opportunity for year-after-year extensions to its Big Data platform.

Of course, we would be remiss not to mention how dedicated IBM is to the open source community in general. IBM has a rich heritage of supporting open source. Contributions such as Eclipse (the de facto standard integrated development environment (IDE) used in open source), Unstructured Information Management Architecture (UIMA), Apache Derby, Lucene, XQuery, SQL, and the Xerces XML processor are but a few of the too many to mention.

We want to make one thing very clear: IBM is committed to Hadoop open source. In fact, Jaql (you will learn about this in Chapter 4) was donated to the open source Hadoop community by IBM. Moreover, IBM is continually working on additional technologies for potential Hadoop-related donations. Our development labs have Hadoop committers who work alongside other Hadoop committers from Facebook, LinkedIn, and more. Finally, you are likely to find one of our developers on any Hadoop forum. We believe IBM’s commitment to open source Hadoop, combined with its vast intellectual property and research around enterprise needs and analytics, delivers a true Big Data platform.
Part II, “Big Data: From the Technology Perspective,” starts by giving you some basics about Big Data open source technologies in Chapter 4. This chapter lays the “ground floor” with respect to open source technologies that are synonymous with Big Data, the most common being Hadoop (an Apache top-level project whose execution engine is behind the Big Data movement). You’re not going to be a Hadoop expert after reading this chapter, but you’re going to have a basis for understanding such terms as Pig, Hive, HDFS, MapReduce, and ZooKeeper, among others.
Chapter 5 is one of the most important chapters in this book. This chapter introduces you to the concept that splits Big Data into two key areas that only IBM seems to be talking about when defining Big Data: Big Data in motion and Big Data at rest. In this chapter, we focus on the at-rest side of the Big Data equation and IBM’s InfoSphere BigInsights (BigInsights), which is the enterprise-capable Hadoop platform from IBM. We talk about the IBM technologies we alluded to in Chapter 3, only with technical explanations and illustrations of how IBM differentiates itself with its Big Data platform.

You’ll learn about how IBM’s General Parallel File System (GPFS), synonymous with enterprise class, has been extended to participate in a Hadoop environment as GPFS shared nothing cluster (SNC). You’ll learn about how IBM’s BigInsights platform includes a text analytics toolkit with a rich annotation development environment that lets you build or customize text annotators without having to use Java or some other programming language. You’ll learn about fast data compression without GPL licensing concerns in the Hadoop world, special high-speed database connector technologies, machine learning analytics, management tooling, a flexible workload governor that provides a richer business policy–oriented management framework than the default Hadoop workload manager, security lockdown, enhancing MapReduce with intelligent adaptation, and more.

After reading this chapter, we think the questions you ask of your Big Data provider, and the capabilities you expect, will change, leading you to ask questions that prove whether your vendor actually has a real Big Data platform. We truly believe your Big Data journey needs to start with a Big Data platform: powerful analytics tooling that sits on top of world-class, enterprise-hardened, and capable technology. In Chapter 6 we finish off the book by covering the other side of the Big Data “coin”: analytics on data in motion.
Chapter 6 introduces you to IBM InfoSphere Streams (Streams), in some depth, along with examples from real clients and how they are using Streams to realize better business outcomes, make better predictions, gain a competitive advantage for their company, and even improve the health of our most fragile. We also detail how Streams works, a special streams processing language built to flatten the time it takes to write Streams applications, how it is configured, and the components of a stream (namely operators and adapters). In much the same way as BigInsights makes Hadoop enterprise-ready, we round off the
