Foundations of Data Science


4.10 Small World Graphs

In the 1960s, Stanley Milgram carried out an experiment that indicated that most pairs of individuals in the United States were connected by a short sequence of acquaintances. Milgram would ask a source individual, say in Nebraska, to start a letter on its journey to a target individual in Massachusetts. The Nebraska individual would be given basic information about the target, including his address and occupation, and asked to send the letter to someone he knew on a first-name basis who was closer to the target individual, in order to transmit the letter to the target in as few steps as possible. Each person receiving the letter would be given the same instructions. In successful experiments, it would take on average five to six steps for a letter to reach its target. This research generated the phrase “six degrees of separation” along with substantial research in social science on the interconnections between people. Surprisingly, there was no work on how to find the short paths using only local information.

In many situations, phenomena are modeled by graphs whose edges can be partitioned into local and long distance edges. We adopt a simple model of a directed graph due to Kleinberg, having local and long distance edges. Consider a 2-dimensional n×n grid where each vertex is connected to its four adjacent vertices via bidirectional local edges. In addition to these local edges, there is one long distance edge out of each vertex. The probability that the long distance edge from vertex u terminates at v, v ≠ u, is a function of the distance d(u, v) from u to v. Here distance is measured by the shortest path consisting only of local grid edges. The probability is proportional to 1/d^r(u, v) for some constant r. This gives a one-parameter family of random graphs. For r equal to zero, 1/d^0(u, v) = 1 for all u and v, and thus the end of the long distance edge at u is uniformly distributed over all vertices, independent of distance. As r increases, the expected length of the long distance edge decreases. As r approaches infinity, there are effectively no long distance edges and thus no paths shorter than the lattice path. What is interesting is that for r less than two, there are always short paths, but no local algorithm to find them. A local algorithm is an algorithm that is only allowed to remember the source, the destination, and its current location, and that can query the graph to find the long distance edge at the current location. Based on this information, it decides the next vertex on the path.
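The edge distribution described above can be made concrete with a small sketch. It assumes vertices are represented as (row, column) pairs on an n×n grid and that d(u, v) is the Manhattan distance, which equals the shortest path over local grid edges. The function names `grid_distance`, `long_distance_probs`, and `sample_long_distance_edge` are illustrative choices, not names from the text.

```python
import random


def grid_distance(u, v):
    """Shortest-path length between u and v using only local grid edges."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def long_distance_probs(u, n, r):
    """Return {v: probability} for the long distance edge out of u.

    Each vertex v != u gets weight 1 / d(u, v)**r; the weights are then
    normalized so that they sum to one.
    """
    weights = {}
    for i in range(n):
        for j in range(n):
            v = (i, j)
            if v != u:
                weights[v] = grid_distance(u, v) ** (-r)
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def sample_long_distance_edge(u, n, r, rng=random):
    """Draw the endpoint of the long distance edge out of u."""
    probs = long_distance_probs(u, n, r)
    vertices = list(probs)
    return rng.choices(vertices, weights=[probs[v] for v in vertices], k=1)[0]
```

With r = 0 every vertex other than u receives the same probability, matching the uniform case in the text; with larger r the probability mass concentrates on vertices near u.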

The difficulty is that for r < 2, the end points of the long distance edges tend to be uniformly distributed over the vertices of the grid. Although short paths exist, it is unlikely on a short path to encounter a long distance edge whose end point is close to the destination. When r equals two, there are short paths and the simple algorithm that always selects the edge that ends closest to the destination will find a short path. For r greater than two, again there is no local algorithm to find a short path. Indeed, with high probability, there are no short paths at all.
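The simple algorithm mentioned above, which always follows the edge ending closest to the destination, might be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: vertices are (row, column) pairs, boundary vertices simply have fewer than four local neighbors, and the long distance edge of a vertex is sampled when the vertex is first visited. Because each step strictly reduces the distance to the target, no vertex is visited twice, so sampling on demand is equivalent to fixing all long distance edges in advance. The names `dist`, `local_neighbors`, `long_edge`, and `greedy_route` are hypothetical.

```python
import random


def dist(u, v):
    """Lattice distance: shortest path using only local grid edges."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def local_neighbors(u, n):
    """Grid neighbors of u that lie inside the n x n grid."""
    x, y = u
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


def long_edge(u, n, r, rng):
    """Sample the long distance edge out of u, weight 1 / d(u, v)**r."""
    others = [(i, j) for i in range(n) for j in range(n) if (i, j) != u]
    weights = [dist(u, v) ** (-r) for v in others]
    return rng.choices(others, weights=weights, k=1)[0]


def greedy_route(source, target, n, r, rng):
    """Follow, at every step, the edge whose endpoint is closest to target.

    Only the target and the current vertex are used to choose the next
    vertex; the path list is kept purely for reporting.
    """
    path = [source]
    current = source
    while current != target:
        options = local_neighbors(current, n) + [long_edge(current, n, r, rng)]
        current = min(options, key=lambda v: dist(v, target))
        path.append(current)
    return path
```

A call such as `greedy_route((0, 0), (9, 9), 10, 2, random.Random(1))` returns the list of visited vertices. Averaging `len(path) - 1` over many runs for several values of r is one way to explore the behavior described in the text, although a grid this small is far too small to show the asymptotic separation between r = 2 and other values.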
