25.03.2013 Views

Cracking the Coding Interview - Fooo


Solutions to Chapter 11 | System Design and Memory Limits

11.2 How would you design the data structures for a very large social network (Facebook, LinkedIn, etc.)? Describe how you would design an algorithm to show the connection, or path, between two people (e.g., Me -> Bob -> Susan -> Jason -> You).

SOLUTION

Approach:

Forget that we’re dealing with millions of users at first. Design this for the simple case.


We can construct a graph by treating every person as a node; an edge between two nodes means that the two people are friends with each other.

class Person {
    Person[] friends;
    // Other info
}

If I want to find the connection between two people, I would start with one person and do a simple breadth-first search.
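As a minimal sketch of that breadth-first search for the single-machine case: the version of Person below swaps the array for a list and adds a name field and an addFriend helper, and PathFinder.findPath is a hypothetical name. None of these come from the original text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// In-memory Person; the name field and addFriend helper are illustrative additions.
class Person {
    final String name;
    final List<Person> friends = new ArrayList<>();

    Person(String name) {
        this.name = name;
    }

    // Friendship is mutual, so the edge is added in both directions.
    void addFriend(Person other) {
        friends.add(other);
        other.friends.add(this);
    }
}

class PathFinder {
    // Returns the names along a shortest friend path from start to target,
    // or an empty list if the two people are not connected.
    static List<String> findPath(Person start, Person target) {
        // Maps each discovered person to the person we reached them from.
        // The key set doubles as the "visited" marker.
        Map<Person, Person> cameFrom = new HashMap<>();
        Queue<Person> queue = new ArrayDeque<>();
        cameFrom.put(start, null);
        queue.add(start);

        while (!queue.isEmpty()) {
            Person current = queue.remove();
            if (current == target) {
                // Follow the cameFrom links back to start, then reverse.
                List<String> path = new ArrayList<>();
                for (Person p = current; p != null; p = cameFrom.get(p)) {
                    path.add(p.name);
                }
                Collections.reverse(path);
                return path;
            }
            for (Person friend : current.friends) {
                if (!cameFrom.containsKey(friend)) {
                    cameFrom.put(friend, current);
                    queue.add(friend);
                }
            }
        }
        return new ArrayList<>();
    }
}
```

Because breadth-first search explores people in order of distance from the start, the first path it finds to the target uses the fewest hops.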

But... oh no! Millions of users!

When we deal with a service the size of Orkut or Facebook, we cannot possibly keep all of our data on one machine. That means that our simple Person data structure from above doesn’t quite work: our friends may not live on the same machine as us. Instead, we can replace our list of friends with a list of their IDs, and traverse as follows:

1. For each friend ID: int machine_index = lookupMachineForUserID(id);
2. Go to machine machine_index.
3. Person friend = lookupFriend(machine_index);
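The three steps above can be sketched with in-memory objects standing in for real servers. Only lookupMachineForUserID is a name from the text. PersonRecord, Machine, Cluster, store, lookupPerson, and friendsOf are hypothetical, and the ID-modulo placement rule is just an assumption for illustration. In a real system, the "go to machine" step would be a network call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A person stored by ID; friends are referenced by ID rather than by object.
class PersonRecord {
    final int id;
    final List<Integer> friendIds = new ArrayList<>();

    PersonRecord(int id) {
        this.id = id;
    }
}

// Stands in for one server holding a slice of the users.
class Machine {
    final Map<Integer, PersonRecord> people = new HashMap<>();
}

class Cluster {
    final List<Machine> machines = new ArrayList<>();

    Cluster(int machineCount) {
        for (int i = 0; i < machineCount; i++) {
            machines.add(new Machine());
        }
    }

    // Assumed placement rule: user ID modulo the number of machines.
    int lookupMachineForUserID(int id) {
        return Math.floorMod(id, machines.size());
    }

    void store(PersonRecord person) {
        machines.get(lookupMachineForUserID(person.id)).people.put(person.id, person);
    }

    // Steps 1-3: find the machine index, "go to" that machine, fetch the person.
    PersonRecord lookupPerson(int id) {
        int machineIndex = lookupMachineForUserID(id);
        Machine machine = machines.get(machineIndex); // a network hop in practice
        return machine.people.get(id);
    }

    // Resolves every friend ID of a person, one lookup (and one jump) per friend.
    List<PersonRecord> friendsOf(PersonRecord person) {
        List<PersonRecord> result = new ArrayList<>();
        for (int friendId : person.friendIds) {
            result.add(lookupPerson(friendId));
        }
        return result;
    }
}
```

Note that friendsOf makes one jump per friend, which is the cost the batching optimization aims to reduce.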

There are more optimizations and follow-up questions here than we could possibly discuss, but here are just a few thoughts.

Optimization: Reduce Machine Jumps

Jumping from one machine to another is expensive. Instead of randomly jumping from machine to machine with each friend, try to batch these jumps. For example, if 5 of my friends live on one machine, I should look them up all at once.
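One way to batch the jumps is to group the friend IDs by machine before fetching anything, so that each machine is visited once. This is only a sketch: FriendBatcher and groupByMachine are hypothetical names, and the ID-modulo placement rule is an assumption, not something the text specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FriendBatcher {
    // Assumed placement rule for illustration: user ID modulo machine count.
    static int lookupMachineForUserID(int id, int machineCount) {
        return Math.floorMod(id, machineCount);
    }

    // Groups friend IDs by the machine that holds them, so that each
    // machine needs to be contacted only once per traversal step.
    static Map<Integer, List<Integer>> groupByMachine(List<Integer> friendIds, int machineCount) {
        Map<Integer, List<Integer>> batches = new TreeMap<>();
        for (int id : friendIds) {
            int machine = lookupMachineForUserID(id, machineCount);
            batches.computeIfAbsent(machine, k -> new ArrayList<>()).add(id);
        }
        return batches;
    }
}
```

With this grouping, the number of jumps is the number of distinct machines holding my friends rather than the number of friends.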

Optimization: Smart Division of People and Machines

People are much more likely to be friends with people who live in the same country as them. Rather than randomly dividing people up across machines, try to divvy them up by country, city, state, etc. This will reduce the number of jumps.
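To make the effect concrete, the sketch below counts how many friendships cross a machine boundary under a given placement rule. All of the names here (User, Friendship, PlacementComparison, crossMachineEdges) are hypothetical. A real deployment would also need several machines per country, which this toy version ignores.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Minimal user with just the fields the placement rules need.
class User {
    final int id;
    final String country;

    User(int id, String country) {
        this.id = id;
        this.country = country;
    }
}

// An undirected friendship between two users.
class Friendship {
    final User a;
    final User b;

    Friendship(User a, User b) {
        this.a = a;
        this.b = b;
    }
}

class PlacementComparison {
    // Counts friendships whose two endpoints land on different machines
    // under the given placement rule (user -> machine index).
    static int crossMachineEdges(List<Friendship> friendships, ToIntFunction<User> placement) {
        int count = 0;
        for (Friendship f : friendships) {
            if (placement.applyAsInt(f.a) != placement.applyAsInt(f.b)) {
                count++;
            }
        }
        return count;
    }
}
```

Passing one rule that places users by ID and another that places them by country, over the same friendship list, shows how many jumps each division would cost.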

Question: Breadth First Search usually requires “marking” a node as visited. How do you do that in
