08.10.2016 Views

Foundations of Data Science


Keep in mind that the leading terms here for large $k$ are the last two and, in fact, at $k = n$, they cancel each other, so our argument does not prove the fallacious statement for $c \ge 1$ that there is no connected component of size $n$, since there is. Let

$$f(k) = \ln n + k + k \ln\ln n - 2\ln k + k\ln c - ck\ln n + \frac{ck^2 \ln n}{n}.$$

Differentiating with respect to $k$,

$$f'(k) = 1 + \ln\ln n - \frac{2}{k} + \ln c - c\ln n + \frac{2ck\ln n}{n}$$

and

$$f''(k) = \frac{2}{k^2} + \frac{2c\ln n}{n} > 0.$$

Thus, the function $f(k)$ attains its maximum over the range $[2, n/2]$ at one of the extreme points $2$ or $n/2$. At $k = 2$, $f(2) \approx (1 - 2c)\ln n$ and at $k = n/2$, $f(n/2) \approx -\frac{c}{4} n \ln n$. So $f(k)$ is maximum at $k = 2$. For $k = 2$, $E(x_k) = e^{f(k)}$ is approximately $e^{(1-2c)\ln n} = n^{1-2c}$ and is geometrically falling as $k$ increases from 2. At some point $E(x_k)$ starts to increase but never gets above $n^{-\frac{c}{4}n}$. Thus, the expected sum of the number of components of size $k$, for $2 \le k \le n/2$, is

$$E\left(\sum_{k=2}^{n/2} x_k\right) = O(n^{1-2c}).$$

This expected number goes to zero for $c > 1/2$ and the first-moment method implies that, almost surely, there are no components of size between 2 and $n/2$. This completes the proof of Theorem 4.15.
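The endpoint argument in this proof can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `f` and `summarize` and the values $n = 10^5$, $c = 1$ are illustrative assumptions. It evaluates $f(k)$ at every integer $k$ in $[2, n/2]$, reports where the maximum falls, and prints the two endpoint values next to their leading-order approximations.

```python
import math

def f(k, n, c):
    """Exponent f(k) from the text; E(x_k) is approximated there by e^{f(k)}."""
    ln_n = math.log(n)
    return (ln_n + k + k * math.log(ln_n) - 2 * math.log(k)
            + k * math.log(c) - c * k * ln_n + c * k * k * ln_n / n)

def summarize(n, c):
    """Locate the maximizer of f over integers in [2, n/2] and report both endpoints."""
    k_max = max(range(2, n // 2 + 1), key=lambda k: f(k, n, c))
    return {
        "argmax": k_max,
        "f(2)": f(2, n, c),
        "(1-2c) ln n": (1 - 2 * c) * math.log(n),   # leading-order value at k = 2
        "f(n/2)": f(n // 2, n, c),
        "-(c/4) n ln n": -(c / 4) * n * math.log(n),  # leading-order value at k = n/2
    }

if __name__ == "__main__":
    # n and c are arbitrary illustrative choices, not values from the text.
    for name, value in summarize(n=10**5, c=1.0).items():
        print(f"{name:>14}: {value:.3f}")
```

The approximations in the text hold only to leading order in $n$. At moderate $n$ the lower-order terms, such as $2\ln\ln n$ at $k = 2$ and $\frac{n}{2}(1 + \ln\ln n)$ at $k = n/2$, are clearly visible in the printed values, so the exact and approximate figures differ noticeably even though the maximizer is still $k = 2$.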

4.5.3 Threshold for O(ln n) Diameter<br />

We now show that within a constant factor of the threshold for graph connectivity, not only is the graph connected, but its diameter is $O(\ln n)$. That is, if $p$ is $\Omega(\ln n/n)$, the diameter of $G(n, p)$ is $O(\ln n)$.

Consider a particular vertex $v$. Let $S_i$ be the set of vertices at distance $i$ from $v$. We argue that as $i$ grows, $|S_1| + |S_2| + \cdots + |S_i|$ grows by a constant factor up to a size of $n/1000$. This implies that in $O(\ln n)$ steps, at least $n/1000$ vertices are connected to $v$. Then, there is a simple argument at the end of the proof of Theorem 4.17 that a pair of $n/1000$ sized subsets, connected to two different vertices $v$ and $w$, have an edge between them.
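The layer sets in this outline can be observed directly on a sampled graph. The sketch below is an illustration under stated assumptions, not the book's construction: `sample_gnp` and `bfs_layers` are hypothetical helper names, and $n = 1000$, $c = 3$ and the random seed are arbitrary choices. It samples $G(n, p)$ with $p = c\ln n/n$, runs breadth-first search from a vertex $v$, and prints each layer size together with the cumulative size reached so far.

```python
import math
import random

def sample_gnp(n, p, rng):
    """Sample G(n, p) as adjacency lists: each pair is an edge independently with probability p."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for w in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(w)
                adj[w].append(u)
    return adj

def bfs_layers(adj, v):
    """Return [|S_1|, |S_2|, ...], where S_i is the set of vertices at distance i from v."""
    seen = [False] * len(adj)
    seen[v] = True
    frontier = [v]
    sizes = []
    while True:
        nxt = []
        for u in frontier:
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    nxt.append(w)
        if not nxt:          # no new vertices: every reachable vertex has been assigned a layer
            return sizes
        sizes.append(len(nxt))
        frontier = nxt

if __name__ == "__main__":
    n, c = 1000, 3.0                      # illustrative values, not from the text
    rng = random.Random(0)
    adj = sample_gnp(n, c * math.log(n) / n, rng)
    total = 0
    for i, size in enumerate(bfs_layers(adj, 0), start=1):
        total += size
        print(f"i={i}  |S_i|={size}  cumulative={total}")
    print(f"ln n = {math.log(n):.2f}")
```

The number of lines printed is the largest distance from $v$ to any vertex it can reach, which can be compared against $\ln n$. A single sample only illustrates the behavior; it is not evidence for the asymptotic statement.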

Lemma 4.16 Consider $G(n, p)$ for sufficiently large $n$ with $p = c \ln n/n$ for any $c > 0$. Let $S_i$ be the set of vertices at distance $i$ from some fixed vertex $v$. If $|S_1| + |S_2| + \cdots + |S_i| \le n/1000$, then

$$\text{Prob}\left(|S_{i+1}| < 2(|S_1| + |S_2| + \cdots + |S_i|)\right) \le e^{-10|S_i|}.$$
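To see why a bound of this form gives the $O(\ln n)$ step count claimed in the outline, it may help to spell out the arithmetic. The following is a sketch under two assumptions that are not part of the lemma itself: that $|S_1| \ge 1$, and that the unlikely event in the lemma does not occur at any step. The symbol $T_i$ is notation introduced here for illustration only.

```latex
% T_i is hypothetical shorthand for the cumulative layer size.
\begin{align*}
T_i &= |S_1| + |S_2| + \cdots + |S_i| \\
|S_{i+1}| \ge 2\,T_i
  &\;\Longrightarrow\; T_{i+1} = T_i + |S_{i+1}| \ge 3\,T_i \\
  &\;\Longrightarrow\; T_i \ge 3^{\,i-1}\,T_1 \ge 3^{\,i-1} \\
3^{\,i-1} > \frac{n}{1000}
  &\;\Longleftrightarrow\; i > 1 + \log_3\frac{n}{1000} = O(\ln n)
\end{align*}
```

So as long as the hypothesis $T_i \le n/1000$ holds and the growth event occurs, the cumulative size at least triples per step, and it must exceed $n/1000$ within $O(\ln n)$ steps.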
