

About This Book

platform comes with a rich set of toolkits and accelerators for text analytics and machine learning (among others), a vast array of samples, enriched tooling environments, and declarative languages that are purpose-built and optimized for the specific Big Data tasks at hand. Collectively, these efforts flatten the time to analytics, enabling our clients to monetize their data with unprecedented speed and agility. Showcasing some of these IBM Big Data platform characteristics is the focus of Part V.

Chapter 8 provides details about the Advanced Text Analytics Toolkit. Text analytics requires a different set of optimization choices than a Big Data workload that crunches through millions of files. In the Hadoop benchmark world, you see many references to grep or terasort, which stress the I/O characteristics of the cluster. Text analytics, however, is heavily dependent on CPU. The IBM Big Data platform has a declarative language called Annotated Query Language (AQL), which is part of this toolkit. Writing AQL programs is facilitated by an Eclipse plug-in that provides design-time assistance, context-sensitive help, and other rapid application development features that are associated with a typical IDE. The toolkit includes a cost-based optimizer that understands the resource requirements of text analysis and optimizes the execution plan with this consideration in mind (IBM, the company that invented SQL and its associated cost-based optimization processing engines, is doing the same for AQL).

You'll build your analytical text-based Big Data applications 50 percent faster than you would with alternative approaches, they will run up to ten times faster (that's what we found when we compared it to some alternatives), and the results you get will be more precise and complete (read Chapter 8 for more details).

Chapter 9 describes the three accelerators that IBM has delivered for specific Big Data usage patterns: machine data, social media, and call detail records (CDRs). These accelerators reflect years of client interaction and are packaged together for quick deployment. What's more, they showcase the component integration of the IBM Big Data platform, combining text analytics, Streams, and BigInsights.

The book finishes with Part VI, "Integration and Governance in a Big Data World." What's true in the relational world applies to the Big Data world as well. Chapter 10 provides a brief overview of Big Data governance, from reducing the surface area security profile of data at rest, to archiving, to