08.10.2016 Views

Foundations of Data Science

2dLYwbK

2dLYwbK

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

A<br />

B<br />

D<br />

C<br />

(a)<br />

(b)<br />

Figure 6.5: (a) shows a set <strong>of</strong> four points that can be shattered by rectangles along with<br />

some <strong>of</strong> the rectangles that shatter the set. Not every set <strong>of</strong> four points can be shattered<br />

as seen in (b). Any rectangle containing points A, B, and C must contain D. No set <strong>of</strong> five<br />

points can be shattered by rectangles with axis-parallel edges. No set <strong>of</strong> three collinear<br />

points can be shattered, since any rectangle that contains the two end points must also<br />

contain the middle point. More generally, since rectangles are convex, a set with one point<br />

inside the convex hull <strong>of</strong> the others cannot be shattered.<br />

Theorem 6.13 (Growth function sample bound) For any class H and distribution<br />

D, if a training sample S is drawn from D <strong>of</strong> size<br />

n ≥ 2 ɛ [log 2(2H[2n]) + log 2 (1/δ)]<br />

then with probability ≥ 1−δ, every h ∈ H with err D (h) ≥ ɛ has err S (h) > 0 (equivalently,<br />

every h ∈ H with err S (h) = 0 has err D (h) < ɛ).<br />

Theorem 6.14 (Growth function uniform convergence) For any class H and distribution<br />

D, if a training sample S is drawn from D <strong>of</strong> size<br />

n ≥ 8 [ln(2H[2n]) + ln(1/δ)]<br />

ɛ2 then with probability ≥ 1 − δ, every h ∈ H will have |err S (h) − err D (h)| ≤ ɛ.<br />

Theorem 6.15 (Sauer’s lemma) If VCdim(H) = d then H[n] ≤ ∑ d<br />

( n<br />

)<br />

i=0 i ≤ (<br />

en<br />

d )d .<br />

Notice that Sauer’s lemma was fairly tight in the case <strong>of</strong> axis-parallel rectangles,<br />

though in some cases it can be a bit loose. E.g., we will see that for linear separators<br />

in the plane, their VC-dimension is 3 but H[n] = O(n 2 ). An interesting feature about<br />

Sauer’s lemma is that it implies the growth function switches from taking the form 2 n to<br />

taking the form n VCdim(H) when n reaches the VC-dimension <strong>of</strong> the class H.<br />

Putting Theorems 6.13 and 6.15 together, with a little algebra we get the following<br />

corollary (a similar corollary results by combining Theorems 6.14 and 6.15):<br />

208

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!