10.07.2015 Views


What Is Big Data?

2012 presidential election in Mexico turned into a Twitter veracity example with fake accounts, which polluted political discussion, introduced derogatory hash tags, and more. Spam is nothing new to folks in IT, but you need to be aware that in the Big Data world, there is also Big Spam potential, and you need a way to sift through it and figure out what data can and can’t be trusted. Of course, there are words that need to be understood in context, jargon, and more (we cover this in Chapter 8).

As previously noted, embedded within all of this noise are useful signals: the person who professes a profound disdain for her current smartphone manufacturer and starts a soliloquy about the need for a new one is expressing monetizable intent. Big Data is so vast that quality issues are a reality, and veracity is what we generally use to refer to this problem domain. The fact that one in three business leaders don’t trust the information that they use to make decisions is a strong indicator that a good Big Data platform needs to address veracity.

What About My Data Warehouse in a Big Data World?

There are pundits who insist that the traditional method of doing analytics is over. Sometimes these NoSQL (which really means Not Only SQL) pundits suggest that all warehouses will go the way of the dinosaur (ironic, considering a lot of focus surrounding NoSQL databases is about bringing SQL interfaces to the runtime). Nothing could be further from the truth. We see a number of purpose-built engines and programming models that are well suited for certain kinds of analytics. For example, Hadoop’s MapReduce programming model is better suited for some kinds of data than traditional warehouses. For this reason, as you will learn in Chapter 3, the IBM Big Data platform includes a Hadoop engine (and support for other Hadoop engines as well, such as Cloudera).
What’s more, IBM recognizes the flexibility of the programming model, so the IBM PureData System for Analytics (formerly known as Netezza) can execute MapReduce programs within a database. It’s really important in the Big Data era that you choose a platform that provides the flexibility of a purpose-built engine that’s well suited for the task at hand (the kind of analytics you are doing, the type of data you are doing it on, and so on). This platform must also allow you to seamlessly move programming
