27.01.2014

Analytics for Enterprise Class Hadoop and Streaming Data



Understanding Big Data

Three easy steps are involved in using BigSheets to perform Big Data analysis:

1. Collect data. You can collect data from multiple sources, including crawling the Web, local files, or files on your network. Multiple protocols and formats are supported, including HTTP, HDFS, Amazon S3 Native File System (s3n), and Amazon S3 Block File System (s3). When crawling the Web, you can specify the web pages you want to crawl and the crawl depth (for instance, a crawl depth of two gathers data from the starting web page and also the pages linked from the starting page). There is also a facility for extending BigSheets with custom plug-ins for importing data. For example, you could build a plug-in to harvest Twitter data and include it in your BigSheets collections.

2. Extract and analyze data. Once you have collected your information, you can see a sample of it in the spreadsheet interface, such as that shown in Figure 5-10. At this point, you can manipulate your data using the spreadsheet-type tools available in BigSheets. For example,

Figure 5-10 Analyze data in BigSheets
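
The crawl depth setting described in step 1 can be illustrated with a small sketch. This is not how BigSheets implements its crawler; it is a plain breadth-first traversal over a made-up, in-memory link graph, written only to show the stated rule that a depth of two covers the starting page plus the pages it links to. The function name `crawl`, the `LINKS` dictionary, and the page names are all hypothetical.

```python
from collections import deque


def crawl(start, links, depth):
    """Return the pages reached from `start` within `depth` levels.

    The starting page counts as level 1, so depth=2 yields the starting
    page plus every page it links to directly.
    """
    if depth < 1:
        return []
    seen = {start}
    order = [start]
    queue = deque([(start, 1)])
    while queue:
        page, level = queue.popleft()
        if level >= depth:
            # Pages on the last level are collected but not expanded.
            continue
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, level + 1))
    return order


# Hypothetical link graph standing in for real web pages.
LINKS = {
    "home": ["products", "blog"],
    "products": ["pricing"],
    "blog": ["home", "post-1"],
}

if __name__ == "__main__":
    print(crawl("home", LINKS, 2))
```

With this graph, a depth of one returns only the starting page, and each extra level adds the pages linked from the previous level, skipping any page already collected.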
