08.10.2016 Views

Foundations of Data Science



Lemma 4.7 Assume $d > 1$. For any vertex $v$, the probability that $cc(v)$, the connected component containing $v$, is small (i.e., of size $O(\log n)$) is a constant strictly less than 1.

Proof: Let $p$ be the probability that $cc(v)$ is small, i.e., the probability that the breadth first search started at $v$ terminates before $c_1 \log n$ vertices are discovered. Slightly modify the breadth first search as follows: if, when exploring a vertex $u$, there are $m$ undiscovered vertices, choose the number $k$ of vertices that will be adjacent to $u$ from the $\mathrm{Binomial}(m, \frac{d}{n})$ distribution. Having picked $k$, pick one of the $\binom{m}{k}$ subsets of $k$ undiscovered vertices uniformly at random to be the set of vertices adjacent to $u$, and make the other $m - k$ vertices not adjacent to $u$. This process has the same distribution as picking each edge from $u$ independently at random to be present with probability $d/n$. As the search proceeds, $m$ decreases. If $cc(v)$ is small, then $m$, the number of undiscovered vertices, is always greater than $s = n - c_1 \log n$. Modify the process once more, picking $k$ from $\mathrm{Binomial}(s, \frac{d}{n})$ instead of from $\mathrm{Binomial}(m, \frac{d}{n})$. Let $p'$ be the probability that $cc(v)$ is small for the modified process. Clearly, $p' \geq p$, so it suffices to prove that $p'$ is a constant strictly less than one. The mean of the binomial is now $d_1 = sd/n$, which is strictly greater than one for sufficiently large $n$, since $d > 1$ and $s/n \to 1$. The probability that the modified process ends before $c_1 \log n$ vertices are discovered is at least the corresponding probability for the original process, since drawing from only $n - c_1 \log n$ vertices can only decrease the number of newly discovered vertices at each step. Modifying the process so that the newly discovered vertices are picked from a set of fixed size converts the problem to what is called a branching process. Branching processes are discussed further in Section 4.4, but we describe here everything that is needed for our analysis of the giant component.
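The modified breadth first search in the proof can be simulated by tracking counts only, because the choice of which $k$-subset becomes adjacent to $u$ does not change the size of the component. The following is a minimal sketch under that observation, not part of the original text; the function names, the threshold standing in for $c_1 \log n$, and all parameter values are hypothetical choices made for illustration.

```python
import random


def draw_binomial(trials, prob, rng):
    """Draw one sample from Binomial(trials, prob) as a sum of Bernoulli trials."""
    return sum(1 for _ in range(trials) if rng.random() < prob)


def component_size(n, d, rng):
    """Size of cc(v) as found by the modified breadth first search.

    Only counts are tracked: which k-subset of the undiscovered vertices
    becomes adjacent to u does not affect the size of the component.
    """
    undiscovered = n - 1   # m: every vertex except the start vertex v
    frontier = 1           # discovered but not yet explored
    size = 1
    while frontier > 0:
        frontier -= 1                                # explore one vertex u
        k = draw_binomial(undiscovered, d / n, rng)  # k ~ Binomial(m, d/n)
        undiscovered -= k
        frontier += k
        size += k
    return size


def estimate_small_probability(n, d, threshold, trials, seed=0):
    """Fraction of runs in which cc(v) has fewer than `threshold` vertices."""
    rng = random.Random(seed)
    small = sum(1 for _ in range(trials) if component_size(n, d, rng) < threshold)
    return small / trials


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    for d in (0.5, 2.0):
        estimate = estimate_small_probability(n=200, d=d, threshold=20, trials=200)
        print(f"d = {d}: estimated P(cc(v) small) = {estimate:.2f}")
```

For $d > 1$ the estimate should settle at a value clearly below 1, in line with the lemma, while for $d < 1$ almost every run should end with a small component.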

A branching process is a method for creating a possibly infinite random tree. There is a nonnegative integer-valued random variable $y$ that is the number of children of the node being explored. First, the root $v$ of the tree chooses a value of $y$ according to the distribution of $y$ and spawns that number of children. Each of the children independently chooses a value according to the same distribution of $y$ and spawns that many children. The process terminates when all of the vertices have spawned children. The process may go on forever. If it does terminate with a finite tree, we say that the process has become "extinct". Let $\mathrm{Binomial}(s, \frac{d}{n})$ be the distribution of $y$. Let $q$ be the probability of extinction. Then $q \geq p'$, since the breadth first search terminating with at most $c_1 \log n$ vertices is one way of becoming extinct. Let $p_i = \binom{s}{i} \left(\frac{d}{n}\right)^i \left(1 - \frac{d}{n}\right)^{s-i}$ be the probability that $y = i$, i.e., that a node spawns $i$ children. We have $\sum_{i=0}^{s} p_i = 1$ and $\sum_{i=1}^{s} i p_i = E(y) = ds/n > 1$.
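As a numerical check of the two sums above, and as a rough illustration of extinction, the sketch below computes the offspring probabilities $p_i$ for $\mathrm{Binomial}(s, \frac{d}{n})$ and simulates the branching process. It is an illustrative sketch rather than anything from the original text: the function names, the values of $n$, $d$ and $c_1$, and the node cap used to stop runs that appear to survive are all assumptions.

```python
import math
import random


def offspring_pmf(s, prob):
    """p_i = C(s, i) * prob**i * (1 - prob)**(s - i) for i = 0, ..., s."""
    return [math.comb(s, i) * prob**i * (1 - prob) ** (s - i) for i in range(s + 1)]


def dies_out(s, prob, rng, cap):
    """Run one branching process with Binomial(s, prob) offspring.

    Returns True if the process terminates. A run that reaches `cap` nodes
    while some nodes are still unexplored is treated as surviving, which is
    an approximation made to keep the simulation finite.
    """
    unexplored, total = 1, 1          # start with the root only
    while unexplored > 0:
        if total >= cap:
            return False
        unexplored -= 1
        children = sum(1 for _ in range(s) if rng.random() < prob)
        unexplored += children
        total += children
    return True


def estimate_extinction(s, prob, trials, cap=200, seed=0):
    """Fraction of simulated processes that become extinct."""
    rng = random.Random(seed)
    return sum(dies_out(s, prob, rng, cap) for _ in range(trials)) / trials


if __name__ == "__main__":
    n, d, c1 = 200, 2.0, 2.0                  # hypothetical values
    s = n - math.ceil(c1 * math.log(n))       # s = n - c1 log n, rounded to an integer
    pmf = offspring_pmf(s, d / n)
    print("sum of p_i:", sum(pmf))
    print("mean:", sum(i * p for i, p in enumerate(pmf)), "ds/n:", d * s / n)
    print("estimated extinction probability:", estimate_extinction(s, d / n, trials=200))
```

The printed sum should be 1 and the mean should equal $ds/n$ up to floating point error; the simulated extinction frequency is only an estimate of $q$ and depends on the assumed cap and number of trials.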

The depth of a tree is at most the number of nodes in the tree. Let $a_t$ be the probability that the branching process terminates at depth at most $t$. If the root $v$ has no children, then the process terminates with depth one, where the root is counted as a depth one node, which is at most $t$. If $v$ has $i$ children, the process from $v$ terminates at depth at most $t$ if and only if the $i$ subprocesses, one rooted at each child of $v$, terminate at depth $t - 1$ or less. The $i$ processes are independent, so the probability that they all terminate at depth

