08.10.2016 Views

Foundations of Data Science


Proof: Let $|S_i| = k$. For each vertex $u$ not in $S_1 \cup S_2 \cup \cdots \cup S_i$, the probability that $u$ is not in $S_{i+1}$ is $(1-p)^k$, and these events are independent. So $|S_{i+1}|$ is the sum of $n - (|S_1| + |S_2| + \cdots + |S_i|)$ independent Bernoulli random variables, each with probability

$$1 - (1-p)^k \ge 1 - e^{-ck \ln n / n}$$

of being one. Note that $n - (|S_1| + |S_2| + \cdots + |S_i|) \ge 999n/1000$. So,

$$E(|S_{i+1}|) \ge \frac{999n}{1000}\left(1 - e^{-ck \ln n / n}\right).$$

Subtracting $200k$ from each side and using $999/1000 \ge 1/2$,

$$E(|S_{i+1}|) - 200k \ge \frac{n}{2}\left(1 - e^{-ck \ln n / n} - 400\,\frac{k}{n}\right).$$

Let $\alpha = k/n$ and $f(\alpha) = 1 - e^{-c\alpha \ln n} - 400\alpha$. By differentiation, $f''(\alpha) \le 0$, so $f$ is concave and the minimum value of $f$ over the interval $[0, 1/1000]$ is attained at one of the end points. It is easy to check that both $f(0)$ and $f(1/1000)$ are greater than or equal to zero for sufficiently large $n$. Thus, $f$ is nonnegative throughout the interval, proving that $E(|S_{i+1}|) \ge 200|S_i|$. The lemma follows from Chernoff bounds.
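The endpoint check in the proof can be explored numerically. The following is a minimal Python sketch, not part of the original text: the function names `f` and `min_log_n` are hypothetical, and the values of `c` and `n` used with them are illustrative assumptions. It evaluates $f(\alpha) = 1 - e^{-c\alpha \ln n} - 400\alpha$ and computes how large $\ln n$ must be for $f(1/1000) \ge 0$, which shows that "sufficiently large $n$" depends strongly on $c$.

```python
import math


def f(alpha: float, c: float, n: float) -> float:
    """f(alpha) = 1 - exp(-c * alpha * ln n) - 400 * alpha, as in the proof."""
    return 1.0 - math.exp(-c * alpha * math.log(n)) - 400.0 * alpha


def min_log_n(c: float) -> float:
    """Smallest value of ln n for which f(1/1000) >= 0.

    f(1/1000) = 0.6 - n**(-c/1000), so the condition is
    n**(-c/1000) <= 0.6, i.e. ln n >= (1000 / c) * ln(1 / 0.6).
    """
    return (1000.0 / c) * math.log(1.0 / 0.6)


if __name__ == "__main__":
    for c in (1.0, 100.0, 1000.0):  # illustrative values only
        print(f"c = {c}: need ln n >= {min_log_n(c):.3f}")
```

Since $f(0) = 0$ for every $c$ and $n$, only the right endpoint constrains $n$; concavity then gives nonnegativity on the whole interval.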

Theorem 4.17 For $p \ge c \ln n / n$, where $c$ is a sufficiently large constant, almost surely, $G(n, p)$ has diameter $O(\ln n)$.

Proof: By Corollary 4.2, almost surely, the degree of every vertex is $\Omega(np) = \Omega(\ln n)$, which is at least $20 \ln n$ for $c$ sufficiently large. Assume this holds. So, for a fixed vertex $v$, $S_1$ as defined in Lemma 4.16 satisfies $|S_1| \ge 20 \ln n$.

Let $i_0$ be the least $i$ such that $|S_1| + |S_2| + \cdots + |S_i| > n/1000$. From Lemma 4.16 and the union bound, the probability that for some $i$, $1 \le i \le i_0 - 1$, $|S_{i+1}| < 2(|S_1| + |S_2| + \cdots + |S_i|)$ is at most $\sum_{k=20 \ln n}^{n/1000} e^{-10k} \le 1/n^4$. So, with probability at least $1 - 1/n^4$, each $|S_{i+1}|$ is at least double the sum of the sizes of the previous $S_j$'s, which implies that $i_0 + 1$ is reached in $O(\ln n)$ steps.
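The layer sets $S_i$ can be observed directly on a sampled graph. The sketch below is an illustration under stated assumptions, not the book's method: the helper names `sample_gnp` and `bfs_layer_sizes`, the seed, and the values of `n` and `c` are all hypothetical. It samples $G(n, p)$ with the Python standard library and uses breadth-first search to record $|S_0|, |S_1|, |S_2|, \ldots$ from a fixed vertex. The constants in the proof (such as $n/1000$) only become meaningful for very large $n$, so a small run shows the rapid layer growth qualitatively rather than verifying the stated bounds.

```python
import math
import random
from collections import deque


def sample_gnp(n, p, rng):
    """Sample G(n, p): each of the n*(n-1)/2 possible edges appears independently with probability p."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def bfs_layer_sizes(adj, source):
    """Return [|S_0|, |S_1|, ...], where S_i is the set of vertices at distance exactly i from source."""
    dist = [-1] * len(adj)
    dist[source] = 0
    sizes = [1]  # S_0 contains only the source
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == -1:
                dist[w] = dist[u] + 1
                if dist[w] == len(sizes):
                    sizes.append(0)  # first vertex found in a new layer
                sizes[dist[w]] += 1
                queue.append(w)
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    n, c = 500, 3.0         # illustrative values only
    p = c * math.log(n) / n
    print(bfs_layer_sizes(sample_gnp(n, p, rng), source=0))
```

The number of layers minus one is the eccentricity of the source vertex; taking the maximum over all sources in a connected sample gives the diameter that the theorem bounds.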

Consider any other vertex $w$. We wish to find a path of length $O(\ln n)$ between $v$ and $w$. By the same argument as above, the number of vertices at distance $O(\ln n)$ from $w$ is at least $n/1000$. To complete the argument, either these two sets intersect, in which case we have found a path from $v$ to $w$ of length $O(\ln n)$, or they do not intersect. In the latter case, with high probability there is some edge between them. For a pair of disjoint sets of size at least $n/1000$, the probability that none of the possible $n^2/10^6$ or more edges between them is present is at most $(1-p)^{n^2/10^6} = e^{-\Omega(n \ln n)}$. There are at most $2^{2n}$ pairs of such sets, and so the probability that there is some such pair with no edges is $e^{-\Omega(n \ln n) + O(n)} \to 0$. Note that there is no conditioning problem since we are arguing this for every pair of such sets. Consider whether such an argument would work if it were made only for the $n$ subsets of vertices that consist of the vertices at distance at most $O(\ln n)$ from a specific vertex.
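To get a feel for the final union bound, one can evaluate its logarithm, $2n \ln 2 + (n^2/10^6) \ln(1-p)$, for concrete parameters. The sketch below is an illustration only: the function name `log_union_bound` is hypothetical, it assumes $p = c \ln n / n$ exactly, and the sample values of `n` and `c` are arbitrary. A negative result means the bound $2^{2n}(1-p)^{n^2/10^6}$ is below one; a positive result means the bound is vacuous for those parameters, which illustrates why $c$ must be a sufficiently large constant.

```python
import math


def log_union_bound(n: int, c: float) -> float:
    """Natural log of 2^(2n) * (1 - p)^(n^2 / 10^6) with p = c * ln(n) / n."""
    p = c * math.log(n) / n
    if not 0.0 < p < 1.0:
        raise ValueError("p = c ln n / n must lie strictly between 0 and 1")
    log_num_pairs = 2 * n * math.log(2)              # at most 2^(2n) pairs of sets
    log_no_edge = (n**2 / 10**6) * math.log1p(-p)    # no edge between one fixed pair
    return log_num_pairs + log_no_edge


if __name__ == "__main__":
    for c in (1.0, 1e5):  # illustrative values only
        print(f"c = {c}: log of union bound = {log_union_bound(10**7, c):.4g}")
```

Working with logarithms avoids underflow, since the probabilities involved are far too small to represent directly as floating-point numbers.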
