27.10.2014

Cracking the Coding Interview, 4 Edition - 150 Programming Interview Questions and Solutions

Solutions to Chapter 11 | System Design and Memory Limits

11.5 If you were designing a web crawler, how would you avoid getting into infinite loops?

SOLUTION

First, how does the crawler get into a loop? The answer is very simple: it re-parses a page it has already parsed. Re-parsing a page means revisiting all the links found on that page, and this continues in a circular fashion.
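
To make the failure mode concrete, here is a minimal Python sketch of a crawler that does not remember what it has parsed. The link graph `LINKS` and the function `naive_crawl` are hypothetical; `max_steps` exists only so the demo halts, because without it the two pages that link to each other would be parsed forever.

```python
# Hypothetical link graph: page -> pages it links to. "a" and "b" link to each other.
LINKS = {"a": ["b"], "b": ["a"]}


def naive_crawl(start, max_steps):
    """Crawl WITHOUT a visited set; max_steps is only a safety cap for the demo."""
    order = []
    stack = [start]
    while stack and len(order) < max_steps:
        page = stack.pop()
        order.append(page)                 # "parse" the page
        stack.extend(LINKS.get(page, []))  # re-queue every link, even already parsed ones
    return order


print(naive_crawl("a", 6))  # ['a', 'b', 'a', 'b', 'a', 'b']
```

The output keeps alternating between the same two pages; only the artificial cap stops it.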

Be careful about what the interviewer considers the “same” page. Is it the same URL, or the same content? A crawler can easily be redirected to a previously crawled page, so two different URLs may lead to one page.
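
A minimal sketch, assuming one possible notion of “same”: two URLs are the same if they match after light normalization, and two pages are the same if their content hashes identically. The helpers `normalize_url`, `content_key` and `is_new` are hypothetical names. Real crawlers usually normalize URLs more aggressively and also look for near-duplicate content, which an exact SHA-256 hash does not catch.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """One possible canonical form: lowercase scheme and host, default path '/', drop fragment."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def content_key(html):
    """Fingerprint page content so one page reached via different URLs is detected."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


seen_urls = set()
seen_content = set()


def is_new(url, html):
    """Return True (and record the page) only if neither its URL nor its content was seen."""
    u, c = normalize_url(url), content_key(html)
    if u in seen_urls or c in seen_content:
        return False
    seen_urls.add(u)
    seen_content.add(c)
    return True


print(normalize_url("HTTP://Example.com#top"))  # http://example.com/
```

Checking both keys means a redirect to an already crawled page is caught by its content even when the URL looks new.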

So how do we stop visiting an already visited page? The web is a graph-based structure, and we commonly use DFS (depth-first search) and BFS (breadth-first search) to traverse graphs. We can mark already visited pages the same way that we would in a BFS/DFS.
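
Marking visited pages can be sketched as a breadth-first crawl over an in-memory link graph. This is an illustration, not a real crawler: `bfs_crawl` and the dictionary of links are hypothetical stand-ins for fetching and parsing pages. A page is marked visited when it is first queued, so it is parsed exactly once even though `a`, `b` and `c` link to each other.

```python
from collections import deque


def bfs_crawl(start, links):
    """Breadth-first crawl that marks each page visited the first time it is queued."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # each page is parsed exactly once
        for nxt in links.get(page, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order


links = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
print(bfs_crawl("a", links))  # ['a', 'b', 'c']
```

Popping from the end of the container instead of the front gives a depth-first-style order with the same visited-set protection.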

We can easily prove that this algorithm terminates, provided the set of reachable pages is finite. Each step of the algorithm parses only a new page, never an already visited one. So if we start with N unvisited pages, every step reduces that number by 1 (from N to N-1, and so on). Therefore the algorithm stops after at most N steps.
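
The counting argument can be written out as follows, where $U_k$ (a symbol introduced here) is the number of unvisited pages after step $k$:

$$U_0 = N, \qquad U_k = U_{k-1} - 1 = N - k, \qquad U_k \ge 0 \;\Longrightarrow\; k \le N.$$

The crawl may stop earlier if it runs out of links to follow, which is why the bound is "at most" N steps.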

SUGGESTIONS AND OBSERVATIONS

»» This question has a lot of ambiguity. Ask clarifying questions!
»» Be prepared to answer questions about coverage.
»» What kind of pages will you hit with a DFS versus a BFS?
»» What will you do when your crawler runs into a honey pot that generates an infinite subgraph for you to wander about?

CareerCup.com<br />