Foundations of Data Science


in time polynomial in n.

A quantity called the mixing time, loosely defined as the time needed to get close to the stationary distribution, is often much smaller than the number of states. In Section 5.4, we relate the mixing time to a combinatorial notion called normalized conductance and derive upper bounds on the mixing time in several cases.

5.1 Stationary Distribution

Let p_t be the probability distribution after t steps of a random walk. Define the long-term average probability distribution a_t by

a_t = (1/t)(p_0 + p_1 + · · · + p_{t−1}).
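As a concrete illustration, the long-term average a_t can be computed by iterating the walk and averaging the distributions along the way. This is a minimal sketch; the 3-state transition matrix P below is a hypothetical example, not one from the text:

```python
import numpy as np

# Hypothetical 3-state connected Markov chain (rows sum to 1).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

p = np.array([1.0, 0.0, 0.0])  # start distribution p_0
t = 1000
avg = np.zeros(3)
for _ in range(t):
    avg += p        # accumulate p_0 + p_1 + ... + p_{t-1}
    p = p @ P       # one step of the walk: p_{s+1} = p_s P
a_t = avg / t       # long-term average distribution a_t
print(a_t)
```

For this chain, a_t settles near (0.25, 0.5, 0.25), the limit vector x discussed below.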

The fundamental theorem of Markov chains asserts that for a connected Markov chain, a_t converges to a limit probability vector x which satisfies the equations xP = x and ∑_i x_i = 1, which we can rewrite as

x[P − I, 1] = [0, 1].

We will now prove that the matrix [P − I, 1] has rank n provided the Markov chain is connected. This implies that there is a unique solution to the equations x[P − I, 1] = [0, 1]. We denote this solution by π. It has non-negative components and so is a probability vector. Since πP = π, running one step of the Markov chain starting with distribution π leaves the distribution unchanged, and hence so does running any number of steps. For this reason, π is called the stationary distribution.
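Since [P − I, 1] has rank n, the stationary distribution can be computed numerically by solving the augmented system x[P − I, 1] = [0, 1] as a least-squares problem. A minimal sketch, reusing a hypothetical 3-state chain P:

```python
import numpy as np

# Hypothetical 3-state connected Markov chain (rows sum to 1).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
n = P.shape[0]

# Augmented n x (n+1) matrix A = [P - I, 1] and right-hand side [0, 1].
A = np.hstack([P - np.eye(n), np.ones((n, 1))])
b = np.append(np.zeros(n), 1.0)

# x A = b is equivalent to A^T x^T = b^T; since A has rank n,
# least squares recovers the unique solution pi exactly.
pi, *_ = np.linalg.lstsq(A.T, b, rcond=None)
print(pi)  # the stationary distribution
```

The solution satisfies πP = π and ∑_i π_i = 1, as the theorem requires.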

Lemma 5.1 Let P be the transition probability matrix for a connected Markov chain. The n × (n + 1) matrix A = [P − I, 1] obtained by augmenting the matrix P − I with an additional column of ones has rank n.

Proof: If the rank of A = [P − I, 1] were less than n, there would be two linearly independent solutions to Ax = 0. Each row of P sums to one, so each row of P − I sums to zero. Thus (1, 0), the vector whose first n coordinates are 1 and whose last coordinate is 0, is one solution. Assume there were a second solution (x, α) perpendicular to (1, 0). Then (P − I)x + α1 = 0, so for each i, ∑_j p_ij x_j − x_i + α = 0, or x_i = ∑_j p_ij x_j + α. Each x_i is a convex combination of some of the x_j plus α. Since x is perpendicular to 1, the x_i sum to zero; if they were all equal they would all be zero, forcing α = 0 and making (x, α) the zero vector, so not all x_i can be equal. Let S = {i : x_i = max_j x_j} be the set of i for which x_i attains its maximum value. Then S̄ is not empty. Connectedness implies that there is some edge (k, l) with k ∈ S and l ∈ S̄. Thus x_k > ∑_j p_kj x_j, and therefore α must be greater than 0 in x_k = ∑_j p_kj x_j + α. A symmetric argument with the set of i for which x_i takes its minimum value implies α < 0, a contradiction, proving the lemma.
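The rank claim in Lemma 5.1 is easy to check numerically. The sketch below uses a hypothetical connected 3-state chain, and contrasts it with a disconnected chain (two isolated absorbing states), where the rank drops below n:

```python
import numpy as np

# Connected chain: Lemma 5.1 says rank([P - I, 1]) = n.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
n = P.shape[0]
A = np.hstack([P - np.eye(n), np.ones((n, 1))])
rank_connected = np.linalg.matrix_rank(A)
print(rank_connected)  # 3, so the stationary distribution is unique

# Disconnected chain: two isolated states, Q = I, so Q - I = 0.
Q = np.eye(2)
B = np.hstack([Q - np.eye(2), np.ones((2, 1))])
rank_disconnected = np.linalg.matrix_rank(B)
print(rank_disconnected)  # 1 < 2: many stationary distributions exist
```

In the disconnected case any mixture of the two absorbing states is stationary, which is exactly why the lemma requires connectedness.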

