13.11.2012 Views

Hadoop Development - CSC

Hadoop Development - CSC

Hadoop Development - CSC

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Hadoop</strong> Architectural Overview (cont’d)<br />

• <strong>Hadoop</strong> scales linearly<br />

– Suppose 100TB of data to process. With one server and a read rate of 100MB/s, it<br />

would take:<br />

• (100 x 10^12) / (100 x 10^6) = 10^6 = 1,000,000 seconds to read the files.<br />

– With 100 servers each with a 100MB/s read rate and the equal distribution of files<br />

across all 100 servers, i.e. each server has 1TB to read, it will take:<br />

• (1 x 10^12) / (100 x 10^6) = 10^4 = 10,000 seconds<br />

• Since all 100 read the data in parallel, total read time is 10,000 seconds, i.e. 100<br />

times faster.<br />

– If have 1000 servers and equal distribution of the data across the cluster, then can<br />

read all 100TB in just 1,000 seconds, i.e. less than 20 minutes.<br />

TBSC 2009<br />

11/10/2011 12:53 PM 0725-23_TBSC 2009 15

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!