27.01.2014

Analytics for Enterprise Class Hadoop and Streaming Data


Understanding Big Data

Indeed, some warehouses are built with a predefined corpus of questions in mind. Although such a warehouse provides some degree of freedom for query and mining, it may be constrained by what is in the schema (most unstructured data isn't found there) and often by a performance envelope that can act as a functional or operational hard limit. Again, as we'll reiterate often in this book, we are not saying that a Hadoop platform such as IBM InfoSphere BigInsights is a replacement for your warehouse; instead, it's a complement.

A Big Data platform lets you store all of the data in its native business object format and get value out of it through massive parallelism on readily available components. For your interactive navigational needs, you'll continue to pick and choose sources, cleanse that data, and keep it in warehouses. But you can get more value by analyzing more data (data that may initially even seem unrelated) to paint a more robust picture of the issue at hand. Indeed, data might sit in Hadoop for a while, and once its value is discovered and proven to be sustainable, it might migrate into the warehouse.

Wrapping It Up

We'll conclude this chapter with a gold mining analogy that articulates the points from the previous section and the Big Data opportunity that lies before you. In the "olden days" (which, for some reason, our kids think means the time when we were their age), miners could actually see nuggets or veins of gold; they clearly appreciated its value and would dig and sift near previous gold finds, hoping to strike it rich. That said, although there was more gold out there (it could have been in the hill next to them or miles away), it just wasn't visible to the naked eye, and mining became a gambling game. You dug like crazy near where gold had been found, but you had no idea whether more gold would turn up. And although history has its stories of gold rush fever, nobody mobilized millions of people to dig everywhere and anywhere.

Today's gold rush works quite differently. Gold mining is now carried out with massive capital equipment that can process millions of tons of otherwise worthless dirt. Ore grades of about 30 mg/kg (30 ppm) are usually needed before gold is visible to the naked eye; in other words, most of the gold in today's gold mines is invisible. Although there is all this gold (high-valued data) in all this dirt (low-valued data), by using the right equipment, you can economically
