08.10.2016 Views

Foundations of Data Science



Corollary 6.16 (VC-dimension sample bound) For any class $H$ and distribution $D$, a training sample $S$ of size

$$O\left(\frac{1}{\epsilon}\left[\mathrm{VCdim}(H)\log(1/\epsilon) + \log(1/\delta)\right]\right)$$

is sufficient to ensure that with probability $\ge 1 - \delta$, every $h \in H$ with $\mathrm{err}_D(h) \ge \epsilon$ has $\mathrm{err}_S(h) > 0$ (equivalently, every $h \in H$ with $\mathrm{err}_S(h) = 0$ has $\mathrm{err}_D(h) < \epsilon$).
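To get a feel for how the bound in Corollary 6.16 scales, the expression inside the $O(\cdot)$ can be evaluated directly. The following is a minimal sketch, not a statement of the exact sample size: the function name `vc_sample_bound`, the constant `c` (set to 1 here), and the use of natural logarithms are all assumptions made for illustration, since the corollary leaves the constant unspecified.

```python
import math

def vc_sample_bound(vc_dim: int, eps: float, delta: float, c: float = 1.0) -> int:
    """Evaluate c * (1/eps) * (vc_dim * ln(1/eps) + ln(1/delta)), rounded up.

    c stands in for the unspecified constant hidden by the O(.) notation;
    c = 1 is an assumption made for illustration only.
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    value = (c / eps) * (vc_dim * math.log(1 / eps) + math.log(1 / delta))
    return math.ceil(value)

# VC-dimension 4 (e.g. axis-parallel rectangles), eps = 0.1, delta = 0.05
print(vc_sample_bound(4, 0.1, 0.05))   # 123
# Shrinking eps by a factor of 10 grows the expression by more than 10x
print(vc_sample_bound(4, 0.01, 0.05))
```

The dependence on $\delta$ is only logarithmic, so demanding higher confidence is cheap compared with demanding a smaller error $\epsilon$.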

For any class $H$, $\mathrm{VCdim}(H) \le \log_2(|H|)$ since $H$ must have at least $2^k$ concepts in order to shatter $k$ points. Thus Corollary 6.16 is never too much worse than Theorem 6.1 and can be much better.

6.9.2 Examples: VC-Dimension and Growth Function<br />

Rectangles with axis-parallel edges<br />

As we saw above, the class of axis-parallel rectangles in the plane has VC-dimension 4 and growth function $C[n] = O(n^4)$.
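The VC-dimension claim for rectangles can be checked by brute force on small point sets. The sketch below assumes closed rectangles and relies on the observation that the bounding box of a subset is the smallest axis-parallel rectangle containing it, so a subset can be picked out exactly when its bounding box holds no other point. The function names and the diamond-shaped test set are illustrative choices, not taken from the text.

```python
from itertools import combinations

def rectangle_picks_out(points, subset):
    """True if some axis-parallel rectangle contains exactly `subset` of `points`."""
    if not subset:
        return True  # a rectangle placed away from all points picks out the empty set
    xs = [p[0] for p in subset]
    ys = [p[1] for p in subset]
    lo_x, hi_x, lo_y, hi_y = min(xs), max(xs), min(ys), max(ys)
    # The bounding box is the smallest rectangle containing the subset, so the
    # subset is realizable exactly when the box holds no point outside it.
    return not any(
        lo_x <= x <= hi_x and lo_y <= y <= hi_y
        for (x, y) in points
        if (x, y) not in subset
    )

def shattered_by_rectangles(points):
    """True if every subset of `points` is picked out by some rectangle."""
    return all(
        rectangle_picks_out(points, set(sub))
        for r in range(len(points) + 1)
        for sub in combinations(points, r)
    )

diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
print(shattered_by_rectangles(diamond))             # True
print(shattered_by_rectangles(diamond + [(0, 0)]))  # False
```

The second call fails because any rectangle containing the four diamond points also contains the center point. This checks one particular five-point set only; it does not by itself prove that no five points can be shattered.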

Intervals of the reals

Intervals on the real line can shatter any set of two points but no set of three points since the subset of the first and last points cannot be isolated. Thus, the VC-dimension of intervals is two. Also, $C[n] = O(n^2)$ since we have $O(n^2)$ choices for the left and right endpoints.
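The endpoint-counting argument can be made concrete by enumerating every subset that a closed interval picks out of $n$ distinct points. This is a small sketch under the assumption that the points are distinct; `interval_subsets` and `shattered_by_intervals` are hypothetical helper names.

```python
def interval_subsets(points):
    """All distinct subsets of `points` that some closed interval [a, b] picks out."""
    pts = sorted(points)
    picked = {frozenset()}  # an interval that misses every point gives the empty set
    # It suffices to try endpoints located at the points themselves:
    # each choice of left index i and right index j yields one contiguous run.
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            picked.add(frozenset(pts[i:j + 1]))
    return picked

def shattered_by_intervals(points):
    """True if intervals pick out all 2^n subsets of the n given points."""
    return len(interval_subsets(points)) == 2 ** len(points)

print(shattered_by_intervals([1.0, 2.0]))       # True
print(shattered_by_intervals([1.0, 2.0, 3.0]))  # False
print(len(interval_subsets(range(10))))         # 56
```

For $n$ points the count is $n(n+1)/2 + 1$, which gives 56 for $n = 10$ and is consistent with $C[n] = O(n^2)$.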

Pairs of intervals of the reals

Consider the family of pairs of intervals, where a pair of intervals is viewed as the set of points that are in at least one of the intervals, in other words, their set union. There exists a set of size four that can be shattered but no set of size five since the subset of first, third, and last point cannot be isolated. Thus, the VC-dimension of pairs of intervals is four. Also we have $C[n] = O(n^4)$.
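One way to verify these counts is to note that, for points sorted on the line, a union of two intervals picks out a subset exactly when the chosen points form at most two contiguous runs. The sketch below counts such 0/1 labelings by brute force; the helper names are hypothetical, and the points are assumed distinct.

```python
from itertools import product

def runs_of_ones(labels):
    """Number of maximal blocks of consecutive 1s in a 0/1 sequence."""
    return sum(
        1
        for i, b in enumerate(labels)
        if b == 1 and (i == 0 or labels[i - 1] == 0)
    )

def count_pair_interval_subsets(n):
    """Count subsets of n sorted distinct points realizable by a union of two intervals."""
    return sum(
        1
        for labels in product((0, 1), repeat=n)
        if runs_of_ones(labels) <= 2
    )

print(count_pair_interval_subsets(4))  # 16 = 2^4, so four points are shattered
print(count_pair_interval_subsets(5))  # 31 = 2^5 - 1, only the pattern 1,0,1,0,1 is missing
print(count_pair_interval_subsets(6))  # 57
```

The single missing labeling for five points is the "first, third, and last" subset named above, which would need three separate runs.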

Convex polygons<br />

Consider the set system of all convex polygons in the plane. For any positive integer $n$, place $n$ points on the unit circle. Any subset of the points forms the vertices of a convex polygon. Clearly that polygon will not contain any of the points not in the subset. This shows that convex polygons can shatter arbitrarily large sets, so the VC-dimension is infinite. Notice that this also implies that $C[n] = 2^n$.
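The circle construction can be checked numerically for a small $n$. The sketch below places $n$ equally spaced points on the unit circle (equal spacing is an assumption made for convenience; any distinct points on the circle would do) and confirms that the polygon on each subset of three or more points excludes every remaining point. Subsets with fewer than three points give degenerate polygons (empty set, single point, segment) and are skipped here. All function names are illustrative.

```python
import math
from itertools import combinations

def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def inside_polygon(q, verts, tol=1e-12):
    """True if q lies in the closed convex polygon with counterclockwise `verts`."""
    k = len(verts)
    return all(cross(verts[i], verts[(i + 1) % k], q) >= -tol for i in range(k))

def circle_points(n):
    """n equally spaced points on the unit circle, in counterclockwise order."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]

def polygons_isolate_all_subsets(n):
    """Check every subset of size >= 3: its polygon contains no other point."""
    pts = circle_points(n)
    for r in range(3, n + 1):
        for idx in combinations(range(n), r):  # ascending indices = CCW order
            verts = [pts[i] for i in idx]
            chosen = set(idx)
            for j in range(n):
                if j not in chosen and inside_polygon(pts[j], verts):
                    return False
    return True

print(polygons_isolate_all_subsets(7))  # True
```

Each excluded point sits on the arc beyond some edge of the polygon, so it falls strictly on the outer side of that edge, which is what the cross-product test detects.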

Half spaces in d-dimensions<br />
