Data Warehousing and Analytics Infrastructure at Facebook
Hive: Optimizing Resource Utilization
Joins:
– Joins try to reduce the number of map/reduce jobs needed.
– Memory-efficient joins by streaming the largest tables.
– Map joins
    User-specified small tables stored in hash tables on the mapper
    No reducer needed
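
The map join idea above can be illustrated with a minimal Python sketch. This is not Hive's implementation; the function names and the sample `users` and `clicks` tables are hypothetical. It assumes an inner equi-join in which the small table fits in the mapper's memory, so each row of the large table is joined as it streams past and no reduce phase is needed.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload); hypothetical row shape


def build_hash_table(small_rows: Iterable[Row]) -> Dict[int, List[str]]:
    """Load the small table into an in-memory hash table keyed by join key."""
    table: Dict[int, List[str]] = {}
    for key, value in small_rows:
        table.setdefault(key, []).append(value)
    return table


def map_join(large_rows: Iterable[Row],
             small_table: Dict[int, List[str]]) -> Iterator[Tuple[int, str, str]]:
    """Stream the large table and probe the hash table; rows without a match are dropped."""
    for key, value in large_rows:
        for small_value in small_table.get(key, ()):
            yield key, value, small_value


# Hypothetical data: a small dimension table and a larger fact table.
users = [(1, "alice"), (2, "bob")]
clicks = [(1, "/home"), (2, "/photos"), (1, "/feed"), (3, "/ads")]

joined = list(map_join(clicks, build_hash_table(users)))
for row in joined:
    print(row)
```

Because every mapper holds its own copy of the hash table, the joined rows can be written directly from the map phase; the trade-off is that the approach only works while the small table fits in memory.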
Aggregations:
– Map-side partial aggregations
    Hash-based aggregates
    Serialized key/values in hash tables
– 90% speed improvement on the query
    SELECT count(1) FROM t;
– Load balancing for data skew
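
Map-side partial aggregation can be sketched in Python as follows. This is an illustration under stated assumptions rather than Hive's code: the function names and the sample `splits` are hypothetical, and each "mapper" is modeled as a function that keeps a hash table of partial counts for its input split. Only the partial results are passed on to the reduce step, which merges them.

```python
from collections import Counter
from typing import Iterable, List


def map_partial_count(split: Iterable[str]) -> Counter:
    """Mapper: aggregate counts per key in a local hash table before emitting."""
    partial: Counter = Counter()
    for key in split:
        partial[key] += 1
    return partial


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reducer: merge the per-mapper partial counts into final totals."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


# Hypothetical input: two splits of grouping keys, one per mapper.
splits: List[List[str]] = [["a", "b", "a"], ["b", "b", "c"]]

partials = [map_partial_count(split) for split in splits]
totals = reduce_counts(partials)

input_rows = sum(len(split) for split in splits)
shuffled_records = sum(len(partial) for partial in partials)
print(dict(totals), input_rows, shuffled_records)
```

In this sketch the mappers forward 4 partial records instead of 6 input rows. For a query with no grouping key, such as `SELECT count(1) FROM t;`, each mapper would forward a single partial count, which is one plausible reason such a query benefits strongly from this optimization.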