19.08.2013 Views

Introduction and MapReduce - SNAP - Stanford University

Introduction and MapReduce - SNAP - Stanford University

Introduction and MapReduce - SNAP - Stanford University

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Total communication<br />

cost = O(|R|+|S|+|R ⋈ S|)<br />

Elapsed communication cost = O(s)<br />

We’re going to pick k <strong>and</strong> the number of Map<br />

processes so I/O limit s is respected<br />

We put a limit s on the amount of input or output that<br />

any one process can have. s could be:<br />

What fits in main memory<br />

What fits on local disk<br />

With proper indexes, computation cost is linear<br />

in the input + output size<br />

So computation costs are like comm. costs<br />

1/9/2012 Jure Leskovec, <strong>Stanford</strong> CS246: Mining Massive Datasets, http://cs246.stanford.edu 60

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!