19.08.2013 Views

Introduction and MapReduce - SNAP - Stanford University


Suppose we have a large web corpus

Look at the metadata file

Lines of the form (URL, size, date, …)

For each host, find the total number of bytes

i.e., the sum of the page sizes for all URLs from that host

Other examples:

Link analysis and graph processing

Machine Learning algorithms
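
The per-host byte count above fits the map / group-by-key / reduce pattern. The following is a minimal single-process sketch in Python, not the course's implementation. It assumes each metadata line is tab-separated with the URL first and the size in bytes second; the slide does not specify the file layout, and the function names and sample records are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def map_record(line):
    """Map: parse one metadata line and emit a (host, size) pair."""
    # Assumed layout: URL, size in bytes, date, then any further fields,
    # separated by tabs. The real metadata format may differ.
    url, size, *_ = line.rstrip("\n").split("\t")
    yield urlsplit(url).hostname, int(size)


def reduce_host(host, sizes):
    """Reduce: sum all page sizes that were grouped under one host."""
    return host, sum(sizes)


def total_bytes_per_host(lines):
    """Run map, group by key, and reduce in a single process."""
    groups = defaultdict(list)
    for line in lines:
        for host, size in map_record(line):
            groups[host].append(size)  # stand-in for the group-by-key step
    return dict(reduce_host(host, sizes) for host, sizes in groups.items())


# Hypothetical sample records, for illustration only.
sample = [
    "http://a.example/index.html\t1200\t2012-01-08",
    "http://a.example/about.html\t800\t2012-01-08",
    "http://b.example/\t500\t2012-01-07",
]
totals = total_bytes_per_host(sample)
```

In a distributed setting the grouping step would be done by the framework across machines; here a dictionary stands in for it so the map and reduce functions can be read in isolation.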

1/8/2012 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu
