28.11.2014 Views

Data Warehousing and Analytics Infrastructure at Facebook


Hive: Optimizing Resource Utilization

Joins:
– Joins try to reduce the number of map/reduce jobs needed.
– Memory-efficient joins by streaming the largest tables.
– Map joins
    User-specified small tables stored in hash tables on the mapper
    No reducer needed
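The map join idea can be sketched outside Hive as a plain in-memory simulation. This is a minimal illustration and not Hive's implementation: the function names (`build_hash_table`, `map_join`) and the sample rows are hypothetical. The small table is loaded into a hash table once, each row of the large table is streamed past it, and joined rows are emitted directly, so no reduce phase is needed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload); a hypothetical row shape


def build_hash_table(small_table: Iterable[Row]) -> Dict[int, List[str]]:
    """Load the small table into an in-memory hash table keyed by join key."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, value in small_table:
        table[key].append(value)
    return table


def map_join(big_split: Iterable[Row],
             small_hash: Dict[int, List[str]]) -> Iterator[Tuple[int, str, str]]:
    """Stream rows of the large table and probe the hash table (inner join)."""
    for key, big_value in big_split:
        # .get() avoids inserting empty entries for keys with no match
        for small_value in small_hash.get(key, ()):
            yield (key, big_value, small_value)


# Hypothetical sample data: a small dimension table and one split of a big table
small = [(1, "US"), (2, "CA")]
big = [(1, "alice"), (2, "bob"), (1, "carol"), (3, "dave")]

joined = list(map_join(big, build_hash_table(small)))
print(joined)
```

Only the small table has to fit in memory; the large table is read one row at a time, which is what makes the approach memory efficient on the mapper.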

Aggregations:
– Map-side partial aggregations
    Hash-based aggregates
    Serialized key/values in hash tables
– 90% speed improvement on the query
    SELECT count(1) FROM t;
– Load balancing for data skew
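Map-side partial aggregation can be illustrated with a small simulation of the count query above. This is a sketch under assumptions, not Hive's code: `map_partial_count`, `reduce_counts`, and the sample splits are hypothetical names and data. Each mapper keeps a hash table of partial counts and emits one record per group instead of one record per input row; the reducer then sums the partial counts.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Partial = List[Tuple[Hashable, int]]


def map_partial_count(split: Iterable[object],
                      key_fn: Callable[[object], Hashable] = lambda row: None
                      ) -> Partial:
    """Aggregate one input split in a hash table and emit one record per group.

    With the default key_fn every row falls into a single global group,
    which mirrors a count over the whole table with no GROUP BY.
    """
    partial: Counter = Counter()
    for row in split:
        partial[key_fn(row)] += 1
    return list(partial.items())


def reduce_counts(partials: Iterable[Partial]) -> Dict[Hashable, int]:
    """Sum the partial counts produced by all mappers."""
    total: Counter = Counter()
    for partial in partials:
        for key, count in partial:
            total[key] += count
    return dict(total)


# Hypothetical input: three splits holding six rows in total
splits = [["r1", "r2", "r3"], ["r4", "r5"], ["r6"]]

partials = [map_partial_count(s) for s in splits]
shuffled_records = sum(len(p) for p in partials)  # one record per mapper
result = reduce_counts(partials)

print(shuffled_records, result)
```

In this sketch the shuffle carries three records (one per mapper) rather than six (one per row). That reduction in data sent to the reducers is the kind of saving the slide attributes to map-side aggregation.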
