08.10.2016 Views

Foundations of Data Science

5.3 Areas and Volumes

Computing areas and volumes is a classical problem. For many regular figures in two and three dimensions there are closed-form formulae. In Chapter 2, we saw how to compute the volume of a high-dimensional sphere by integration. For general convex sets in $d$-space, there are no closed-form formulae. Can we estimate volumes of $d$-dimensional convex sets in time that grows as a polynomial function of $d$? The MCMC method answers this question in the affirmative.

One way to estimate the area of a region is to enclose it in a rectangle and estimate the ratio of the area of the region to the area of the rectangle by picking random points in the rectangle and seeing what proportion land in the region. Such methods fail in high dimensions. Even for a sphere in high dimension, a cube enclosing the sphere has exponentially larger volume, so exponentially many samples are required to estimate the volume of the sphere.
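
The failure in high dimensions can be seen numerically. The following is a minimal sketch, assuming the region is the unit ball and the enclosing box is the cube $[-1, 1]^d$; the function names `ball_to_cube_ratio` and `hit_fraction` are illustrative and not from the text. It compares the exact volume ratio with the fraction of random cube points that land in the ball.

```python
import math
import random


def ball_to_cube_ratio(d):
    """Exact Vol(unit ball) / Vol(cube [-1, 1]^d), using the standard ball-volume formula."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return ball / 2.0 ** d


def hit_fraction(d, n_samples, rng):
    """Fraction of uniform random points in [-1, 1]^d that land in the unit ball."""
    hits = 0
    for _ in range(n_samples):
        squared_norm = sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(d))
        if squared_norm <= 1.0:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for d in (2, 5, 10, 20):
        print(d, ball_to_cube_ratio(d), hit_fraction(d, 20000, rng))
```

As $d$ grows, the exact ratio shrinks so quickly that a fixed sample budget typically records no hits at all, and the estimate carries no information.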

It turns out that the problem of estimating volumes of sets is reducible to the problem of drawing uniform random samples from sets. Suppose one wants to estimate the volume of a convex set $R$. Create a concentric series of larger and larger spheres $S_1, S_2, \ldots, S_k$ such that $S_1$ is contained in $R$ and $S_k$ contains $R$. Then

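$$\mathrm{Vol}(R) = \mathrm{Vol}(S_k \cap R) = \frac{\mathrm{Vol}(S_k \cap R)}{\mathrm{Vol}(S_{k-1} \cap R)} \cdot \frac{\mathrm{Vol}(S_{k-1} \cap R)}{\mathrm{Vol}(S_{k-2} \cap R)} \cdots \frac{\mathrm{Vol}(S_2 \cap R)}{\mathrm{Vol}(S_1 \cap R)} \cdot \mathrm{Vol}(S_1).$$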
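
Only the last factor needs no sampling: because $S_1$ lies inside $R$, $\mathrm{Vol}(S_1 \cap R) = \mathrm{Vol}(S_1)$, and the volume of a ball has a standard closed form. As a small worked instance of the product, take $k = 3$ and let $\rho_1$ denote the radius of $S_1$ (a symbol introduced here only for illustration):

```latex
\mathrm{Vol}(R)
  = \frac{\mathrm{Vol}(S_3 \cap R)}{\mathrm{Vol}(S_2 \cap R)}
    \cdot \frac{\mathrm{Vol}(S_2 \cap R)}{\mathrm{Vol}(S_1 \cap R)}
    \cdot \mathrm{Vol}(S_1),
\qquad
\mathrm{Vol}(S_1) = \frac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2} + 1\right)}\,\rho_1^{\,d}.
```

The intermediate volumes cancel in pairs, so it suffices to know each ratio and $\mathrm{Vol}(S_1)$.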

If the radius of the sphere $S_i$ is $1 + \frac{1}{d}$ times the radius of the sphere $S_{i-1}$, then the value of

$$\frac{\mathrm{Vol}(S_{k-1} \cap R)}{\mathrm{Vol}(S_{k-2} \cap R)}$$

can be estimated by rejection sampling, provided one can select points at random from a $d$-dimensional region. Since the radii of the spheres grow by a factor of $1 + \frac{1}{d}$, the volume of a sphere is $\left(1 + \frac{1}{d}\right)^d < e$ times the volume of the preceding sphere, and the number of spheres is at most

$$O\!\left(\log_{1+(1/d)} r\right) = O(d \ln r),$$

which is in particular $O(rd)$, where $r$ is the ratio of the radius of $S_k$ to the radius of $S_1$. The bound follows from $\log_{1+(1/d)} r = \ln r / \ln\!\left(1 + \frac{1}{d}\right)$ and $\ln\!\left(1 + \frac{1}{d}\right) \ge \frac{1}{2d}$.
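
The reduction can be sketched end to end in low dimension. This is a minimal illustration under stated assumptions, not the algorithm's actual sampler: uniform points from $S_i \cap R$ are drawn here by naive rejection from a bounding cube, which is only workable for small $d$ and stands in for the Markov chain sampler. All function names and the membership oracle `in_region` are hypothetical.

```python
import math
import random


def ball_volume(d, radius):
    """Standard closed-form volume of a d-dimensional ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * radius ** d


def radii_schedule(d, r_inner, r_outer):
    """Radii growing by a factor (1 + 1/d) until the outer radius is reached."""
    radii = [r_inner]
    while radii[-1] < r_outer:
        radii.append(radii[-1] * (1.0 + 1.0 / d))
    return radii


def sample_from_intersection(d, radius, in_region, rng):
    """Uniform point from (ball of given radius) intersected with R.

    Naive rejection from the cube [-radius, radius]^d: a small-d stand-in
    for the Markov chain sampler, not a polynomial-time method.
    """
    while True:
        x = [rng.uniform(-radius, radius) for _ in range(d)]
        if sum(c * c for c in x) <= radius * radius and in_region(x):
            return x


def estimate_volume(d, in_region, r_inner, r_outer, n_samples, rng):
    """Estimate Vol(R) as Vol(S_1) times a product of estimated ratios."""
    radii = radii_schedule(d, r_inner, r_outer)
    volume = ball_volume(d, radii[0])  # Vol(S_1), valid because S_1 lies inside R
    for prev, cur in zip(radii, radii[1:]):
        inside_prev = 0
        for _ in range(n_samples):
            x = sample_from_intersection(d, cur, in_region, rng)
            if sum(c * c for c in x) <= prev * prev:
                inside_prev += 1
        # inside_prev / n_samples estimates Vol(S_{i-1} & R) / Vol(S_i & R),
        # so multiply by its reciprocal.
        volume *= n_samples / inside_prev
    return volume


if __name__ == "__main__":
    def in_cube(x):
        # Example region R: the cube [-1, 1]^3, whose true volume is 8.
        return all(abs(c) <= 1.0 for c in x)

    print(estimate_volume(3, in_cube, 1.0, math.sqrt(3), 4000, random.Random(0)))
```

For the cube example the inner radius 1 gives a ball inside $R$ and the schedule stops at the first radius of at least $\sqrt{3}$, so the last ball contains $R$.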

It remains to show how to draw a uniform random sample from a $d$-dimensional set. It is at this point that we require the set to be convex so that the Markov chain technique will converge quickly to its stationary probability. To select a random sample from a $d$-dimensional convex set, impose a grid on the region and do a random walk on the grid points. At each time, pick one of the $2d$ coordinate neighbors of the current grid point, each with probability $1/(2d)$, and go to the neighbor if it is still in the set; otherwise, stay put and repeat. If the grid length in each of the $d$ coordinate directions is at most some $a$, the total number of grid points in the set is at most $a^d$. Although this is exponential in $d$, the Markov chain turns out to be rapidly mixing (the proof is beyond our scope here).
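
The walk itself is short to write down. The sketch below assumes integer grid points and a caller-supplied membership test; `grid_walk` and `in_set` are illustrative names, and the number of steps needed for the walk to be close to uniform is not addressed here.

```python
import random


def grid_walk(start, in_set, n_steps, rng):
    """Random walk on integer grid points restricted to a set.

    start  : tuple of ints, a grid point assumed to lie in the set
    in_set : membership test for grid points (hypothetical oracle)
    """
    d = len(start)
    current = tuple(start)
    for _ in range(n_steps):
        # Choosing an axis and a direction uniformly picks each of the
        # 2d coordinate neighbors with probability 1/(2d).
        axis = rng.randrange(d)
        step = rng.choice((-1, 1))
        candidate = current[:axis] + (current[axis] + step,) + current[axis + 1:]
        if in_set(candidate):
            current = candidate
        # Otherwise stay put for this step.
    return current


if __name__ == "__main__":
    def in_ball(p):
        # Example convex set: grid points within distance 10 of the origin.
        return sum(c * c for c in p) <= 100

    print(grid_walk((0, 0, 0), in_ball, 5000, random.Random(0)))
```

Because a move from one grid point to a neighbor has the same probability as the reverse move, the uniform distribution over the grid points in the set is stationary for this chain.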
