08.10.2016 Views

Foundations of Data Science


$$\text{Prob}(\text{vertex has degree } k) = \binom{n-1}{k} p^k (1-p)^{n-k-1} \approx \binom{n}{k} p^k (1-p)^{n-k}.$$

The quantity $\binom{n-1}{k}$ is the number of ways of choosing $k$ edges, out of the possible $n-1$ edges, and $p^k(1-p)^{n-k-1}$ is the probability that the $k$ selected edges are present and the remaining $n-k-1$ are not. Since $n$ is large, replacing $n-1$ by $n$ does not cause much error.
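As a rough numerical check of the claim that replacing $n-1$ by $n$ costs little, the sketch below evaluates both expressions side by side. The function names and the values of `n`, `p`, and `k` are illustrative assumptions, not taken from the text. The ratio of the approximate to the exact expression works out to $n(1-p)/(n-k)$, which is close to 1 whenever $k$ and $np$ are small relative to $n$.

```python
from math import comb


def degree_pmf_exact(n: int, p: float, k: int) -> float:
    """Prob(vertex has degree k) in G(n, p): Binomial(n - 1, p) evaluated at k."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def degree_pmf_approx(n: int, p: float, k: int) -> float:
    """The same expression with n - 1 replaced by n: Binomial(n, p) evaluated at k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    n, p = 1000, 0.01  # assumed illustrative values; expected degree is about 10
    for k in (5, 10, 15, 20):
        exact = degree_pmf_exact(n, p, k)
        approx = degree_pmf_approx(n, p, k)
        print(f"k={k:2d}  exact={exact:.6e}  approx={approx:.6e}  ratio={approx / exact:.4f}")
```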

The binomial distribution falls off exponentially fast as one moves away from the mean. However, the degree distributions of graphs that appear in many applications do not exhibit such sharp drops. Rather, the degree distributions are much broader. This is often referred to as having a “heavy tail”. The term tail refers to values of a random variable far away from its mean, usually measured in number of standard deviations. Thus, although the $G(n, p)$ model is important mathematically, more complex models are needed to represent real world graphs.

Consider an airline route graph. The graph has a wide range of degrees, from degree one or two for a small city, to degree 100, or more, for a major hub. The degree distribution is not binomial. Many large graphs that arise in various applications appear to have power law degree distributions. A power law degree distribution is one in which the number of vertices having a given degree decreases as a power of the degree, as in

$$\text{Number}(\text{degree } k \text{ vertices}) = c\,\frac{n}{k^r},$$

for some small positive real $r$, often just slightly less than three. Later, we will consider a random graph model giving rise to such degree distributions.
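To make the contrast between the two tails concrete, the following sketch compares the expected number of degree-$k$ vertices under the binomial degree law of $G(n, p)$ with the power law count $c\,n/k^r$. The choices $n = 10000$, expected degree 10, $r = 2.5$, and $c = 1$ are assumptions made only for illustration; in particular $c$ is not normalized against any real data set.

```python
from math import comb


def binomial_expected_count(n: int, p: float, k: int) -> float:
    """Expected number of degree-k vertices in G(n, p): n * Binomial(n - 1, p) at k."""
    return n * comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def power_law_count(n: int, k: int, r: float, c: float = 1.0) -> float:
    """Number(degree k vertices) = c * n / k**r, with c an assumed constant."""
    return c * n / k**r


if __name__ == "__main__":
    n = 10_000
    p = 10 / (n - 1)  # assumed: expected degree of about 10
    r = 2.5           # assumed exponent, "slightly less than three"
    for k in (10, 20, 40, 80):
        b = binomial_expected_count(n, p, k)
        w = power_law_count(n, k, r)
        print(f"k={k:3d}  binomial={b:.3e}  power law={w:.3e}")
```

Under these assumed parameters the binomial count collapses within a few multiples of the mean degree, while the power law count shrinks only polynomially in $k$.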

The following theorem claims that the degree distribution of the random graph $G(n, p)$ is tightly concentrated about its expected value. That is, the probability that the degree of a vertex differs from its expected degree, $np$, by more than $\lambda\sqrt{np}$, drops off exponentially fast with $\lambda$.

Theorem 4.1 Let $v$ be a vertex of the random graph $G(n, p)$. Let $\alpha$ be a real number in $(0, \sqrt{np})$. Then

$$\text{Prob}\left(|np - \deg(v)| \ge \alpha\sqrt{np}\right) \le 3e^{-\alpha^2/8}.$$

Proof: The degree $\deg(v)$ is the sum of $n-1$ independent Bernoulli random variables, $y_1, y_2, \ldots, y_{n-1}$, where $y_i$ is the indicator variable that the $i$th edge from $v$ is present. So the theorem follows from Theorem 12.6.
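The bound in Theorem 4.1 can be compared against the exact tail of the degree distribution for small cases. The sketch below sums the Binomial$(n-1, p)$ probabilities directly; the function names and the values $n = 200$, $p = 0.1$ are assumptions chosen so that the sum is cheap to compute, and the comparison is a sanity check rather than a proof.

```python
from math import comb, exp, sqrt


def degree_tail_exact(n: int, p: float, alpha: float) -> float:
    """Exact Prob(|np - deg(v)| >= alpha * sqrt(np)) with deg(v) ~ Binomial(n - 1, p)."""
    mean = n * p
    threshold = alpha * sqrt(mean)
    total = 0.0
    for k in range(n):  # deg(v) takes values 0, 1, ..., n - 1
        if abs(mean - k) >= threshold:
            total += comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)
    return total


def theorem_bound(alpha: float) -> float:
    """Right-hand side of Theorem 4.1: 3 * exp(-alpha^2 / 8)."""
    return 3 * exp(-alpha**2 / 8)


if __name__ == "__main__":
    n, p = 200, 0.1  # assumed values; sqrt(np) is about 4.47, so alpha must stay below that
    for alpha in (1.0, 2.0, 3.0, 4.0):
        print(f"alpha={alpha}  exact={degree_tail_exact(n, p, alpha):.3e}  bound={theorem_bound(alpha):.3e}")
```

For these assumed parameters the bound is far from tight, which is consistent with its role as a convenient general estimate rather than a sharp one.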

Although the probability that the degree of a single vertex differs significantly from its expected value drops exponentially, the statement that the degree of every vertex is close to its expected value requires that $p$ is $\Omega\left(\frac{\ln n}{n}\right)$. That is, the expected degree $np$ grows at least as fast as $\ln n$.

Corollary 4.2 Suppose $\varepsilon$ is a positive constant. If $p$ is $\Omega\left(\frac{\ln n}{n\varepsilon^2}\right)$, then, almost surely, every vertex has degree in the range $(1-\varepsilon)np$ to $(1+\varepsilon)np$.
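One way to see where the $\frac{\ln n}{n\varepsilon^2}$ threshold comes from is to apply Theorem 4.1 with $\alpha = \varepsilon\sqrt{np}$ (which requires $\varepsilon < 1$) and take a union bound over all $n$ vertices, giving a failure probability of at most $3n\,e^{-\varepsilon^2 np/8}$. The sketch below evaluates that quantity for $p = c\,\frac{\ln n}{n\varepsilon^2}$; the constant `c` stands in for the one hidden by the $\Omega$ notation and its value here is an assumption.

```python
from math import exp, log


def p_for_constant(n: int, eps: float, c: float) -> float:
    """p = c * ln(n) / (n * eps^2); c is an assumed constant, and the result must be <= 1 to be a probability."""
    return c * log(n) / (n * eps**2)


def union_bound_all_vertices(n: int, p: float, eps: float) -> float:
    """Union bound 3 * n * exp(-eps^2 * n * p / 8) on Prob(some vertex has degree outside [(1-eps)np, (1+eps)np])."""
    return 3 * n * exp(-(eps**2) * n * p / 8)


if __name__ == "__main__":
    eps, c = 0.2, 16.0  # assumed values; with c = 16 the bound simplifies to 3 / n
    for n in (10**5, 10**6, 10**7):
        p = p_for_constant(n, eps, c)
        print(f"n={n:>8d}  p={p:.5f}  bound={union_bound_all_vertices(n, p, eps):.3e}")
```

With this substitution the bound equals $3n^{1-c/8}$, so any assumed $c > 8$ drives it to zero as $n$ grows.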
