08.10.2016 Views

Foundations of Data Science


4.5.2 Full Connectivity

As $p$ increases from $p = 0$, small components form. At $p = 1/n$ a giant component emerges and swallows up smaller components, starting with the larger components and ending with the isolated vertices. A single connected component forms at $p = \frac{\ln n}{n}$, at which point the graph becomes connected. We begin our development with a technical lemma.
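The progression described above can be observed on small random graphs. The following is a minimal sketch, not part of the original text: the function name `component_sizes`, the choice $n = 400$, the sample values of $p$, and the fixed seed are all illustrative assumptions. It samples $G(n, p)$ by including each edge independently with probability $p$ and tracks components with a union-find structure.

```python
import math
import random
from collections import Counter


def component_sizes(n, p, rng):
    """Sample G(n, p) and return its component sizes, largest first.

    Hypothetical helper for illustration; `rng` is a random.Random instance.
    """
    parent = list(range(n))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u in range(n):
        for v in range(u + 1, n):
            # Each edge {u, v} is present independently with probability p.
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    counts = Counter(find(v) for v in range(n))
    return sorted(counts.values(), reverse=True)


if __name__ == "__main__":
    n = 400  # assumed size, small enough to run quickly
    rng = random.Random(0)
    settings = [
        ("0.5/n", 0.5 / n),
        ("2/n", 2 / n),
        ("0.75 ln n / n", 0.75 * math.log(n) / n),
        ("2 ln n / n", 2 * math.log(n) / n),
    ]
    for label, p in settings:
        sizes = component_sizes(n, p, rng)
        print(f"p = {label}: largest = {sizes[0]}, "
              f"components = {len(sizes)}, isolated = {sizes.count(1)}")
```

The statements in this section are asymptotic, so a single run at a moderate $n$ only suggests the trend and may deviate from it.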

Lemma 4.14 The expected number of connected components of size $k$ in $G(n, p)$ is at most

$$\binom{n}{k} k^{k-2} p^{k-1} (1-p)^{kn-k^2}.$$

Proof: The probability that $k$ vertices form a connected component is the product of two probabilities. The first is the probability that the $k$ vertices are connected, and the second is the probability that there are no edges out of the component to the remainder of the graph. The first probability is at most the sum, over all spanning trees on the $k$ vertices, of the probability that all edges of that spanning tree are present. The "at most" in the lemma statement arises because $G(n, p)$ may contain more than one spanning tree on these nodes, in which case the union bound is higher than the actual probability. There are $k^{k-2}$ spanning trees on $k$ nodes. See Section 12.9.6 in the appendix. The probability that all $k - 1$ edges of one spanning tree are present is $p^{k-1}$, and the probability that there are no edges connecting the $k$ vertices to the remainder of the graph is $(1-p)^{k(n-k)}$. Thus, the probability that one particular set of $k$ vertices forms a connected component is at most $k^{k-2} p^{k-1} (1-p)^{kn-k^2}$. Since there are $\binom{n}{k}$ sets of $k$ vertices, the expected number of connected components of size $k$ is at most

$$\binom{n}{k} k^{k-2} p^{k-1} (1-p)^{kn-k^2}.$$
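As a numerical check of Lemma 4.14, the sketch below compares the bound with the exact expectation for small parameters. This is an illustration rather than part of the original argument. The exact probability that $G(k, p)$ is connected is computed with a standard recursion on the size $j$ of the component containing a fixed vertex, which is not taken from the text. The function names `lemma_bound`, `prob_connected`, and `exact_expectation` are hypothetical.

```python
from math import comb


def lemma_bound(n, k, p):
    """Upper bound of Lemma 4.14 on the expected number of size-k components."""
    return comb(n, k) * k ** (k - 2) * p ** (k - 1) * (1 - p) ** (k * n - k * k)


def prob_connected(k, p):
    """Exact probability that G(k, p) is connected.

    Assumed approach (not from the text): condition on the size j of the
    component containing a fixed vertex. That component has size j < m with
    probability C(m-1, j-1) * conn[j] * (1-p)^(j*(m-j)).
    """
    conn = [0.0, 1.0]  # conn[j] = P(G(j, p) is connected); conn[0] is unused
    for m in range(2, k + 1):
        not_conn = sum(
            comb(m - 1, j - 1) * conn[j] * (1 - p) ** (j * (m - j))
            for j in range(1, m)
        )
        conn.append(1.0 - not_conn)
    return conn[k]


def exact_expectation(n, k, p):
    """Exact expected number of connected components of size k in G(n, p)."""
    return comb(n, k) * prob_connected(k, p) * (1 - p) ** (k * (n - k))


if __name__ == "__main__":
    n, p = 20, 0.1  # assumed small example values
    for k in range(1, 11):
        print(k, exact_expectation(n, k, p), lemma_bound(n, k, p))
```

For $k = 1$ and $k = 2$ there is exactly one spanning tree, so the two quantities coincide. For larger $k$ the gap reflects the overcounting in the union bound over spanning trees.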

We now prove that for $p = \frac{1}{2}\frac{\ln n}{n}$, the giant component has absorbed all small components except for isolated vertices.

Theorem 4.15 For $p = c\frac{\ln n}{n}$ with $c > 1/2$, almost surely there are only isolated vertices and a giant component. For $c > 1$, almost surely the graph is connected.

Proof: We prove that almost surely for $c > 1/2$, there is no connected component with $k$ vertices for any $k$, $2 \le k \le n/2$. This proves the first statement of the theorem since, if there were two or more components that are not isolated vertices, they could not both have size greater than $n/2$, so at least one of them would have size between $2$ and $n/2$. The second statement, that for $c > 1$ the graph is connected, then follows from Theorem 4.6, which states that isolated vertices disappear at $c = 1$.

We now show that for $p = c\frac{\ln n}{n}$, the expected number of components of size $k$, $2 \le k \le n/2$, is less than $n^{1-2c}$, and thus for $c > 1/2$ there are no components except for isolated vertices and the giant component. Let $x_k$ be the number of connected components of size $k$. Substitute $p = c\frac{\ln n}{n}$ into $\binom{n}{k} k^{k-2} p^{k-1} (1-p)^{kn-k^2}$ and simplify using $\binom{n}{k} \le (en/k)^k$, $1 - p \le e^{-p}$, $k - 1 < k$, and $x = e^{\ln x}$ to get

$$E(x_k) \le \exp\left( \ln n + k + k \ln \ln n - 2 \ln k + k \ln c - ck \ln n + ck^2 \frac{\ln n}{n} \right).$$
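The simplification can be written out one step at a time. The following derivation is a sketch that uses only the four facts listed above. The step that replaces the exponent $k - 1$ by $k$ additionally assumes $c \ln n \ge 1$, which holds for all sufficiently large $n$ when $c$ is a fixed positive constant.

```latex
\begin{align*}
E(x_k)
  &\le \binom{n}{k} k^{k-2} p^{k-1} (1-p)^{kn-k^2} \\
  &\le \left(\frac{en}{k}\right)^{k} k^{k-2}
       \left(\frac{c \ln n}{n}\right)^{k-1}
       e^{-\frac{c \ln n}{n}\,(kn - k^2)} \\
  &= e^{k}\, n\, k^{-2}\, (c \ln n)^{k-1}\,
       e^{-ck \ln n + ck^2 \frac{\ln n}{n}} \\
  &\le e^{k}\, n\, k^{-2}\, (c \ln n)^{k}\,
       e^{-ck \ln n + ck^2 \frac{\ln n}{n}} \\
  &= \exp\left( \ln n + k + k \ln \ln n - 2 \ln k + k \ln c
       - ck \ln n + ck^2 \frac{\ln n}{n} \right).
\end{align*}
```

The second line applies $\binom{n}{k} \le (en/k)^k$ and $1 - p \le e^{-p}$, the third collects powers of $n$ and $k$, the fourth uses $k - 1 < k$, and the last rewrites each factor as $x = e^{\ln x}$.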
