08.10.2016 Views

Foundations of Data Science


Figure 4.1: Probability of a giant component as a function of the expected number of people each person knows directly. (Vertical axis: probability of a giant component, ranging from o(1) to 1 − o(1). Horizontal axis: expected number of friends per person, with the transition occurring between 1 − ε and 1 + ε.)

chain of connections, such as A knows B, B knows C, C knows D, ..., and Y knows Z, we say that A indirectly knows Z. Thus, all people belonging to a connected component of the graph indirectly know each other. Suppose each pair of people, independently of other pairs, tosses a coin that comes up heads with probability p = d/n. If it comes up heads, they know each other; if it comes up tails, they don’t. The value of d can be interpreted as the expected number of people a single person directly knows. The question arises: how large are the sets of people who indirectly know each other?
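The coin-tossing model can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sample_gnp` and `average_degree` and the `seed` parameter are assumptions introduced here, not part of the text. It tosses one coin per pair with p = d/n and reports the average number of direct acquaintances, which should land close to d.

```python
import random


def sample_gnp(n, d, seed=None):
    """Sample an acquaintance graph on n people with p = d / n.

    Returns adjacency lists: adj[u] holds everyone u knows directly.
    (Hypothetical helper; the name and signature are illustrative.)
    """
    rng = random.Random(seed)
    p = d / n
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # One independent coin toss per pair; heads means they know each other.
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def average_degree(adj):
    """Average number of people a single person directly knows."""
    return sum(len(neighbors) for neighbors in adj) / len(adj)


if __name__ == "__main__":
    graph = sample_gnp(n=500, d=3.0, seed=0)
    print("average degree:", average_degree(graph))
```

Each person takes part in n − 1 tosses, so the exact expected degree is (n − 1)d/n, which approaches d as n grows.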

If the expected number of people each person knows is more than one, then a giant component of people, all of whom indirectly know each other, will be present, consisting of a constant fraction of all the people. On the other hand, if in expectation each person knows fewer than one person, the largest set of people who know each other indirectly is a vanishingly small fraction of the whole. Furthermore, the transition from a vanishing fraction to a constant fraction of the whole happens abruptly as d goes from slightly less than one to slightly more than one. See Figure 4.1. Note that there is no global coordination of who knows whom; each pair of individuals decides independently. Indeed, many large real-world graphs with constant average degree have a giant component. This is perhaps the most important global property of the G(n, p) model.
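The abrupt transition can be observed empirically with a small simulation. The following is a minimal sketch, assuming a hypothetical helper `largest_component_fraction` (a name not taken from the text). It samples G(n, p) with p = d/n and uses breadth-first search to measure the fraction of people in the largest connected component.

```python
import random
from collections import deque


def largest_component_fraction(n, d, seed=None):
    """Fraction of vertices in the largest component of a G(n, d/n) sample.

    (Hypothetical helper; the name and signature are illustrative.)
    """
    rng = random.Random(seed)
    p = d / n
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # independent coin toss for each pair
                adj[u].append(v)
                adj[v].append(u)

    seen = [False] * n
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search collects everyone start indirectly knows.
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        best = max(best, size)
    return best / n


if __name__ == "__main__":
    for d in (0.5, 0.9, 1.1, 2.0):
        print(d, largest_component_fraction(n=1000, d=d, seed=0))
```

For d well below one the reported fraction should be small, and for d well above one it should be a substantial share of the population. Near d = 1 the results at modest n are noisy, since the statements in the text are asymptotic in n.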

4.1.1 Degree Distribution

One of the simplest quantities to observe in a real graph is the number of vertices of given degree, called the vertex degree distribution. It is also very simple to study these distributions in G(n, p), since the degree of each vertex is the sum of n − 1 independent 0-1 random variables, each equal to one with probability p, which results in a binomial distribution. Since we will be dealing with
