08.10.2016 Views

Foundations of Data Science


edges between the two frontier sets is encountered is $(1-p)^{\Omega(n^{4/3})} \le e^{-\Omega(dn^{1/3})}$, which converges to zero. So almost surely, one of the edges is encountered and $u$ and $v$ end up in the same connected component. This argument shows that for a particular pair of vertices $u$ and $v$, the probability that they belong to different large connected components is very small. Now use the union bound to conclude that this does not happen for any of the $\binom{n}{2}$ pairs of vertices. The details are left to the reader.
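The omitted union-bound step can be sketched as follows. This is only an outline under the assumption that the per-pair failure probability is bounded as stated above; the symbol $c'$ is introduced here for illustration and stands for the positive constant hidden in the $\Omega$.

```latex
\Pr[\text{some pair } u, v \text{ lies in different large components}]
  \le \binom{n}{2}\, e^{-c' d n^{1/3}}
  \le n^{2}\, e^{-c' d n^{1/3}}
  = e^{\,2\ln n - c' d n^{1/3}}
  \longrightarrow 0 \quad \text{as } n \to \infty .
```

The exponent tends to $-\infty$ because $n^{1/3}$ grows faster than $\ln n$.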

For the $d < 1$ case, almost surely, there is no connected component of size $\Omega(\ln n)$.

Theorem 4.9 Let $p = d/n$ with $d < 1$. The probability that $G(n, p)$ has a component of size more than $c\,\frac{\ln n}{(1-d)^2}$ is at most $1/n$ for a suitable constant $c$ depending on $d$ but not on $n$.

Proof: There is a connected component of size at least $k$ containing a particular vertex $v$ only if the breadth first search started at $v$ has a nonempty frontier at all times up to $k$. Let $z_k$ be the number of discovered vertices after $k$ steps. The probability that $v$ is in a connected component of size greater than or equal to $k$ is less than or equal to $\text{Prob}(z_k > k)$. Now the distribution of $z_k$ is $\text{Binomial}\left(n-1,\, 1-(1-d/n)^k\right)$. Since $(1-d/n)^k \ge 1 - dk/n$, the mean of $\text{Binomial}\left(n-1,\, 1-(1-d/n)^k\right)$ is less than the mean of $\text{Binomial}\left(n, \frac{dk}{n}\right)$. Since $\text{Binomial}\left(n, \frac{dk}{n}\right)$ has mean $dk$, the mean of $z_k$ is at most $dk$ where $d < 1$. By a Chernoff bound, the probability that $z_k$ is greater than $k$ is at most $e^{-c_0 k}$ for some constant $c_0 > 0$. If $k \ge c \ln n$ for a suitably large $c$, then this probability is at most $1/n^2$. This bound is for a single vertex $v$. Multiplying by $n$ for the union bound completes the proof.
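The statement can be explored empirically. The following is a minimal sketch, not part of the text: it samples $G(n, d/n)$ with a union-find structure and reports the largest component size next to $\ln n/(1-d)^2$. The function name `largest_component_size` and the values of `n`, `d`, and the seed are illustrative assumptions.

```python
import math
import random


def largest_component_size(n, p, rng):
    """Sample G(n, p) and return the size of its largest connected component."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path-halving find for the union-find structure.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Include each of the n-choose-2 possible edges independently with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    # Union by size: attach the smaller tree to the larger one.
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru
                    size[ru] += size[rv]

    return max(size[find(x)] for x in range(n))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    n, d = 1000, 0.5        # illustrative values, not from the text
    largest = largest_component_size(n, d / n, rng)
    scale = math.log(n) / (1 - d) ** 2
    print(f"largest component: {largest}, ln(n)/(1-d)^2: {scale:.1f}")
```

Generating the graph edge by edge costs time quadratic in `n`, which is acceptable for a small experiment; the theorem's constant $c$ is not specified, so the printed scale is only a reference point.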

4.4 Branching Processes

As discussed in the previous section, a branching process is a method for creating a random tree. Starting with the root node, each node has a probability distribution for the number of its children. The root of the tree denotes a parent and its descendants are the children, with their descendants being the grandchildren. The children of the root are the first generation, their children the second generation, and so on. Branching processes have obvious applications in population studies and, as we saw earlier, in exploring a connected component in a random graph. Here, we give a more in-depth discussion of their properties.

We analyze a simple case of a branching process where the distribution of the number of children at each node in the tree is the same. The basic question asked is what is the probability that the tree is finite, i.e., the probability that the branching process dies out? This is called the extinction probability.
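The extinction probability can be estimated by simulation. The sketch below is an illustration under stated assumptions rather than the analysis given in the text: `offspring_pmf` is a hypothetical list whose entry `i` is the probability of having `i` children, and a run is counted as surviving once the population reaches `max_population` or lasts `max_generations` generations. Both cutoffs are simulation shortcuts, not part of the definition.

```python
import random


def dies_out(offspring_pmf, rng, max_population=500, max_generations=10_000):
    """Simulate one branching process; return True if it goes extinct.

    offspring_pmf[i] is the probability that a node has i children.
    Hitting either cutoff is treated as survival (an assumption of this sketch).
    """
    counts = range(len(offspring_pmf))
    alive = 1  # the root
    for _ in range(max_generations):
        if alive == 0:
            return True
        if alive >= max_population:
            return False
        # Every node in the current generation draws its number of children
        # independently from the same distribution.
        alive = sum(rng.choices(counts, weights=offspring_pmf, k=alive))
    return alive == 0


def estimate_extinction_probability(offspring_pmf, trials=1000, seed=0):
    """Fraction of simulated processes that die out."""
    rng = random.Random(seed)
    extinct = sum(dies_out(offspring_pmf, rng) for _ in range(trials))
    return extinct / trials


if __name__ == "__main__":
    # Hypothetical distribution: 0, 1, or 2 children with the given probabilities.
    print(estimate_extinction_probability([0.25, 0.25, 0.5]))
```

Two sanity checks: a distribution that always gives zero children goes extinct in every run, and one that always gives two children never does.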

Our analysis of the branching process will give the probability of extinction, as well as the expected size of the components conditioned on extinction. This will imply that in $G(n, \frac{d}{n})$, with $d > 1$, there is one giant component of size $\Omega(n)$, the rest of the components
