08.10.2016 Views

Foundations of Data Science


Theorem 4.8 Let $p = d/n$ with $d > 1$.

1. There are constants $c_1$ and $c_2$ such that the probability that there is a connected component of size between $c_1 \log n$ and $c_2 n$ is at most $1/n$.

2. The number of vertices in components of size $O(\ln n)$ is almost surely at most $cn$ for some $c < 1$. Thus, with probability $1 - o(1)$, there is a connected component of size $\Omega(n)$.

3. The probability that there are two or more connected components, each of size more than $n^{2/3}$, is at most $1/n$.
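The gap described by the theorem (many tiny components, one component of linear size, and nothing in between) can be observed empirically. The following is a minimal simulation sketch, not part of the original text: the function name `component_sizes`, the choices n = 2000 and d = 2, and the seed are all illustrative assumptions, and a single sample illustrates the statement but does not prove it.

```python
import random
from collections import Counter


def component_sizes(n, d, seed=0):
    """Sample G(n, p) with p = d/n and return component sizes, largest first."""
    rng = random.Random(seed)
    p = d / n
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Include each of the n(n-1)/2 possible edges independently with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    counts = Counter(find(v) for v in range(n))
    return sorted(counts.values(), reverse=True)


# Illustrative parameters (assumptions, not taken from the text).
sizes = component_sizes(n=2000, d=2.0, seed=1)
print("largest component:", sizes[0])
print("second largest:", sizes[1])
print("number of components:", len(sizes))
```

For d > 1 one would expect the largest component to hold a constant fraction of the vertices while the second largest stays far smaller, in line with parts 1 and 3 of the theorem.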

Proof: We begin with (1). If a particular vertex $v$ is in a connected component of size $k$, then the frontier of the breadth first search process started at $v$ becomes empty for the first time at time $k$, implying that $z_k = k$. Now $z_k$ has distribution $\text{Binomial}\left(n - 1,\ 1 - (1 - d/n)^k\right)$.

Let $c_2 = \frac{d-1}{d^2}$ and assume $k < c_2 n$ and $k > c_1 \log n$ for a sufficiently large $c_1$. By truncated Taylor series expansion, it is easy to see that

$$\left(1 - \frac{d}{n}\right)^k \le 1 - \frac{kd}{n} + \frac{k^2 d^2}{2n^2}.$$
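One way to see the truncated bound is to combine $1 - x \le e^{-x}$ with $e^{-y} \le 1 - y + y^2/2$ for $y \ge 0$, taking $x = d/n$ and $y = kd/n$. As a sanity check only, the short sketch below tests the inequality numerically for one illustrative choice of parameters; the helper name `truncated_bound_holds` and the values n = 10,000 and d = 2 are assumptions made for this example.

```python
def truncated_bound_holds(n, d, k):
    """Check (1 - d/n)^k <= 1 - kd/n + k^2 d^2 / (2 n^2) for one choice of n, d, k."""
    lhs = (1 - d / n) ** k
    rhs = 1 - k * d / n + (k * d) ** 2 / (2 * n ** 2)
    return lhs <= rhs


# Illustrative parameters (assumptions, not taken from the text).
n, d = 10_000, 2.0
violations = [k for k in range(1, n + 1) if not truncated_bound_holds(n, d, k)]
print("violations:", violations)
```

A numerical check over one parameter setting is no substitute for the two-line argument above; it only guards against a sign or exponent slip when transcribing the formula.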

Thus, $E(z_k) \ge kd - \frac{k^2 d^2}{2n}$. So, if $z_k$ were to be exactly $k$, then its deviation from its mean is at least $kd - \frac{k^2 d^2}{2n} - k \ge \frac{k(d-1)}{2} \ge \Omega(k)$. So by a Chernoff bound, $\text{Prob}(z_k = k) \le e^{-\Omega(k)}$. If $k$ is greater than $c_1 \log n$ for some sufficiently large $c_1$, then $z_k = k$ with probability at most $1/n^3$, and this along with a union bound gives (1).
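The deviation bound and the final union bound are stated without their intermediate algebra. A possible way to fill in the steps is sketched below, taking $c_2 = (d-1)/d^2$. The choice of events in the union bound (all $n$ start vertices and at most $n$ values of $k$) is our reading of the argument rather than something spelled out in the text.

```latex
\begin{align*}
k < c_2 n = \frac{(d-1)\,n}{d^2}
  &\implies \frac{k^2 d^2}{2n} = k \cdot \frac{k d^2}{2n} < \frac{k(d-1)}{2}, \\
kd - \frac{k^2 d^2}{2n} - k
  &= k(d-1) - \frac{k^2 d^2}{2n}
   > k(d-1) - \frac{k(d-1)}{2} = \frac{k(d-1)}{2}, \\
\Pr\bigl[\exists\, v \text{ and } k \in (c_1 \log n,\ c_2 n) \text{ with } z_k = k \text{ in the search from } v\bigr]
  &\le n \cdot n \cdot \frac{1}{n^3} = \frac{1}{n}.
\end{align*}
```

The last line matches the $1/n$ failure probability claimed in part 1 of the theorem.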

Next consider (2). For a vertex $v$, let $cc(v)$ denote the set of vertices in the connected component containing $v$. By (1), almost surely, $cc(v)$ is a small set of size at most $c_1 \log n$ or a large set of size at least $c_2 n$ for every vertex $v$. The central part of the proof of (2), that the probability of a vertex being in a small component is strictly less than one, was established in Lemma 4.7. Let $x$ be the number of vertices in a small connected component. Lemma 4.7 implies that the expectation of the random variable $x$ is at most $c_3 n$ for some constant $c_3$ strictly less than one. But we need to show that almost surely the actual number $x$ of such vertices is at most some constant strictly less than one times $n$. For this, we use the second moment method. In this case, the proof that the variance of $x$ is $o(E^2(x))$ is easy. Let $x_i$ be the indicator random variable of the event that $cc(i)$ is small.
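The concentration that the second moment method is meant to establish can be previewed numerically by estimating the mean and variance of $x$ over several sampled graphs. The sketch below is only an illustration under stated assumptions: the helper `small_component_vertices`, the cutoff `10 * log(n)` standing in for the unspecified $c_1 \log n$, and the values of `n`, `d`, and `trials` are all hypothetical choices, not taken from the text.

```python
import math
import random
import statistics
from collections import Counter


def small_component_vertices(n, d, cutoff, rng):
    """Sample G(n, d/n) and count vertices whose component has at most `cutoff` vertices."""
    p = d / n
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)

    sizes = Counter(find(v) for v in range(n))
    return sum(s for s in sizes.values() if s <= cutoff)


# Illustrative parameters (assumptions, not taken from the text).
n, d, trials = 600, 2.0, 15
cutoff = 10 * math.log(n)  # hypothetical stand-in for c_1 log n
rng = random.Random(0)
samples = [small_component_vertices(n, d, cutoff, rng) for _ in range(trials)]
mean_x = statistics.mean(samples)
var_x = statistics.pvariance(samples)
print("mean of x / n:", mean_x / n)
print("Var(x) / E(x)^2 estimate:", var_x / mean_x ** 2)
```

If the variance really is $o(E^2(x))$, the printed ratio should be small and should shrink further as `n` grows, which is what Chebyshev's inequality needs to conclude that $x$ stays close to its mean.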

Let S and T run over all small sets. Noting that for i ≠ j, cc(i) and cc(j) either are the<br />
