Foundations of Data Science


[Figure 4.9 here: size of frontier plotted against number of search steps, with the points 0, ln n, nθ, and n marked on the horizontal axis.]

Figure 4.9: The solid curve is the expected size of the frontier. The two dashed curves indicate the high-probability range of possible values for the actual size of the frontier.

For d ≠ 1, d − 1 − ln d > 0.¹⁴ Thus the probability e^{−(d−1−ln d)i} drops off exponentially with i. For i > c ln n and sufficiently large c, the probability that the breadth first search starting from a particular vertex terminates with a component of size i is o(1/n), as long as the Poisson approximation is valid. In the range of this approximation, the probability that a breadth first search started from any vertex terminates with i > c ln n vertices is o(1). Intuitively, if the component has not stopped growing within Ω(ln n) steps, it is likely to continue to grow until it becomes much larger, and only then does the expected size of the frontier again become small. While the expected size of the frontier is large, the probability that the actual size deviates from the expected size by enough for the actual size of the frontier to be zero is vanishingly small.

For i near nθ, the absolute value of the expected size of the frontier increases linearly with |i − nθ|. Thus for the actual size of the frontier to be zero, z_i must deviate from its expected value by an amount proportional to |i − nθ|. For values of i near nθ, the binomial distribution can be approximated by a Gaussian distribution. The Gaussian falls off exponentially fast with the square of the distance from its mean: the distribution falls off proportional to e^{−k²/σ²}, where σ² is the variance and is proportional to n. Thus to have a non-vanishing probability, k must be at most √n. This implies that the giant component is in the range [nθ − √n, nθ + √n]. Thus a component is either small or in the range [nθ − √n, nθ + √n].
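The concentration of the giant component's size around nθ can be checked numerically. The sketch below assumes the standard characterization of θ as the positive solution of θ = 1 − e^{−dθ} for d > 1 (a fact from the wider chapter, not stated in this excerpt), and compares nθ against the largest component of a sampled G(n, d/n), found with a union-find structure.

```python
import math
import random
from collections import Counter

def theta(d, iters=100):
    """Fixed-point iteration for θ = 1 - e^{-dθ}, d > 1 (assumed
    characterization of the giant-component fraction)."""
    t = 0.5
    for _ in range(iters):
        t = 1 - math.exp(-d * t)
    return t

def giant_size(n, d, seed=0):
    """Largest component of a sampled G(n, p), p = d/n, via union-find."""
    rng = random.Random(seed)
    p = d / n
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)
    counts = Counter(find(v) for v in range(n))
    return max(counts.values())

n, d = 2000, 2.0
t = theta(d)
g = giant_size(n, d)
print(f"nθ = {n * t:.0f}, simulated giant = {g}, √n = {math.sqrt(n):.0f}")
```

The observed deviation |g − nθ| should be on the order of √n, matching the range [nθ − √n, nθ + √n] derived above.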

In Theorem 4.8, we prove that there is one giant component of size Ω(n) along with a number of components of size O(ln n). We first prove a technical lemma stating that the probability of a vertex being in a small component is strictly less than one, and hence there is a giant component. We refer to a connected component of size O(log n) as a small component.

¹⁴ Let f(d) = d − 1 − ln d. Then ∂f/∂d = 1 − 1/d, and ∂f/∂d < 0 for d < 1 and ∂f/∂d > 0 for d > 1. Now f(d) = 0 at d = 1 and is positive for all other d > 0.
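The footnote's claim is easy to verify numerically; the following minimal check evaluates f(d) = d − 1 − ln d at a few points on either side of d = 1.

```python
import math

def f(d):
    # Footnote 14's function: f(d) = d - 1 - ln d, defined for d > 0.
    return d - 1 - math.log(d)

# f attains its minimum value 0 at d = 1 and is strictly positive elsewhere.
for d in [0.1, 0.5, 1.0, 2.0, 10.0]:
    print(f"f({d}) = {f(d):.4f}")
```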
