Hadoop Development - CSC
Hadoop Development - CSC
Hadoop Development - CSC
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>Hadoop</strong> Architectural Overview (cont’d)<br />
• <strong>Hadoop</strong> scales linearly<br />
– Suppose 100TB of data to process. With one server and a read rate of 100MB/s, it<br />
would take:<br />
• (100 x 10^12) / (100 x 10^6) = 10^6 = 1,000,000 seconds to read the files.<br />
– With 100 servers each with a 100MB/s read rate and the equal distribution of files<br />
across all 100 servers, i.e. each server has 1TB to read, it will take:<br />
• (1 x 10^12) / (100 x 10^6) = 10^4 = 10,000 seconds<br />
• Since all 100 read the data in parallel, total read time is 10,000 seconds, i.e. 100<br />
times faster.<br />
– If have 1000 servers and equal distribution of the data across the cluster, then can<br />
read all 100TB in just 1,000 seconds, i.e. less than 20 minutes.<br />
TBSC 2009<br />
11/10/2011 12:53 PM 0725-23_TBSC 2009 15