Introduction and MapReduce - SNAP - Stanford University
Introduction and MapReduce - SNAP - Stanford University
Introduction and MapReduce - SNAP - Stanford University
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Suppose we have a large web corpus<br />
Look at the metadata file<br />
Lines of the form (URL, size, date, …)<br />
For each host, find the total number of bytes<br />
i.e., the sum of the page sizes for all URLs from<br />
that host<br />
Other examples:<br />
Link analysis <strong>and</strong> graph processing<br />
Machine Learning algorithms<br />
1/8/2012 Jure Leskovec, <strong>Stanford</strong> CS246: Mining Massive Datasets, http://cs246.stanford.edu 53