10.11.2016 Views

Learning Data Mining with Python

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Chapter 12<br />

In big data, we can't load our data into memory, In many ways, this is a good<br />

definition for whether a problem is big data or not—if the data can fit in the memory<br />

on your computer, you aren't dealing <strong>with</strong> a big data problem.<br />

Application scenario and goals<br />

There are many use cases for big data, in the public and private sectors.<br />

The most common experience people have using a big-data-based system is<br />

in Internet search, such as Google. To run these systems, a search needs to be<br />

carried out over billions of websites in a fraction of a second. Doing a basic<br />

text-based search would be inadequate to deal <strong>with</strong> such a problem. Simply<br />

storing the text of all those websites is a large problem. In order to deal <strong>with</strong><br />

queries, new data structures and data mining methods need to be created and<br />

implemented specifically for this application.<br />

Big data is also used in many other scientific experiments such as the Large<br />

Hadron Collider, part of which is pictured below, that stretches over 17 kilometers<br />

and contains 150 million sensors monitoring hundreds of millions of particle<br />

collisions per second. The data from this experiment is massive, <strong>with</strong> 25 petabytes<br />

created daily, after a filtering process (if filtering was not used, there would be 150<br />

million petabytes per year). Analysis on data this big has led to amazing insights<br />

about our universe, but has been a significant engineering and analytics challenge.<br />

[ 273 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!