08.10.2016 Views

Foundations of Data Science

2dLYwbK

2dLYwbK

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

By Chebyshev inequality<br />

(<br />

)<br />

Prob |x − E(x)| ≥ E(x) ≤ Var(x)<br />

E 2 (x) → 0.<br />

Thus, Prob(x ≤ 0) goes to zero if Var(x) is o (E 2 (x)) .<br />

Corollary 4.4 Let x be a random variable with E(x) > 0. If<br />

then x is almost surely greater than zero.<br />

Pro<strong>of</strong>: If E(x 2 ) ≤ E 2 (x)(1 + o(1)), then<br />

E(x 2 ) ≤ E 2 (x)(1 + o(1)),<br />

V ar(x) = E(x 2 ) − E 2 (x) ≤ E 2 (x)o(1) = o(E 2 (x)).<br />

Second moment arguments are more difficult than first moment arguments since they<br />

deal with variance and without independence we do not have E(xy) = E(x)E(y). In the<br />

triangle example, dependence occurs when two triangles share a common edge. However,<br />

if p = d , there are so few triangles that almost surely no two triangles share a common<br />

n<br />

edge and the lack <strong>of</strong> statistical independence does not affect the answer. In looking for<br />

a phase transition, almost always the transition in probability <strong>of</strong> an item being present<br />

occurs when the expected number <strong>of</strong> items transitions.<br />

Threshold for graph diameter two (two degrees <strong>of</strong> separation)<br />

We now present the first example <strong>of</strong> a sharp phase transition for a property. This<br />

means that slightly increasing the edge probability p near the threshold takes us from<br />

almost surely not having the property to almost surely having it. The property is that<br />

<strong>of</strong> a random graph having diameter less than or equal to two. The diameter <strong>of</strong> a graph<br />

is the maximum length <strong>of</strong> the shortest path between a pair <strong>of</strong> nodes. In other words, the<br />

property is that every pair <strong>of</strong> nodes has “at most two degrees <strong>of</strong> separation”.<br />

The following technique for deriving the threshold for a graph having diameter two<br />

is a standard method <strong>of</strong>ten used to determine the threshold for many other objects. Let<br />

x be a random variable for the number <strong>of</strong> objects such as triangles, isolated vertices, or<br />

Hamiltonian circuits, for which we wish to determine a threshold. Then we determine<br />

the value <strong>of</strong> p, say p 0 , where the expected value <strong>of</strong> x goes from vanishingly small to unboundedly<br />

large. For p < p 0 almost surely a graph selected at random will not have a<br />

copy <strong>of</strong> the item. For p > p 0 , a second moment argument is needed to establish that the<br />

items are not concentrated on a vanishingly small fraction <strong>of</strong> the graphs and that a graph<br />

picked at random will almost surely have a copy.<br />

82

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!