08.10.2016

Foundations of Data Science


then with probability $\ge 1 - \delta$, every $h \in H$ will have $|\mathrm{err}_S(h) - \mathrm{err}_D(h)| \le \epsilon$.

Proof: This proof is identical to the proof of Theorem 6.13 except that $B^*$ is now the event that there exists a set $h \in H[S'']$ such that the error of $h$ on $S$ differs from the error of $h$ on $S'$ by more than $\epsilon/2$. We again consider the experiment where we randomly put the points in $S''$ into pairs $(a_i, b_i)$ and then flip a fair coin for each index $i$: if heads, we place $a_i$ into $S$ and $b_i$ into $S'$; otherwise we place $a_i$ into $S'$ and $b_i$ into $S$. Consider the difference between the number of mistakes $h$ makes on $S$ and the number of mistakes $h$ makes on $S'$, and observe how this difference changes as we flip coins for $i = 1, 2, \ldots, n$. Initially, the difference is zero. If $h$ makes a mistake on both or neither of $(a_i, b_i)$, then the difference does not change. Otherwise, $h$ makes a mistake on exactly one of $a_i$ or $b_i$, and with probability $1/2$ the difference increases by one and with probability $1/2$ it decreases by one.

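If there are $r \le n$ such pairs, the difference performs a random walk of $r$ steps. Since $|S| = |S'| = n$, the error rates of $h$ on $S$ and $S'$ differ by more than $\epsilon/2$ exactly when this walk ends more than $\epsilon n/2$ steps away from the origin, so we ask for the probability of that event. If $k$ of the $r$ steps are increases, the walk ends at $2k - r$; hence this is equivalent to asking: if we flip $r \le n$ fair coins, what is the probability that the number of heads differs from its expectation $r/2$ by more than $\epsilon n/4$? By Hoeffding's bound, using $r \le n$, this is at most
$$2e^{-2(\epsilon n/4)^2/r} = 2e^{-\epsilon^2 n^2/(8r)} \le 2e^{-\epsilon^2 n/8}.$$
This quantity is at most $\delta/(2H[2n])$, as desired, for $n$ as given in the theorem statement.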
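The probability estimate in this proof can be checked numerically. The Python sketch below is illustrative only: the helper names (`swap_difference`, `exact_tail`, `hoeffding_bound`) and the mistake pattern are our own assumptions, not part of the text. It runs the coin-flip swapping experiment for one fixed hypothesis, computes the exact probability that a fair $\pm 1$ walk of $r$ steps ends more than $\epsilon n/2$ from the origin, and compares both with the bound $2e^{-\epsilon^2 n/8}$.

```python
import math
import random


def swap_difference(mistakes_a, mistakes_b, rng):
    """One run of the coin-flip experiment for a fixed hypothesis h.

    mistakes_a[i] (resp. mistakes_b[i]) is 1 if h errs on a_i (resp. b_i), else 0.
    Heads puts a_i in S and b_i in S'; tails does the reverse.
    Returns (mistakes of h on S) - (mistakes of h on S').
    """
    diff = 0
    for ma, mb in zip(mistakes_a, mistakes_b):
        if rng.random() < 0.5:  # heads: a_i -> S, b_i -> S'
            diff += ma - mb
        else:                   # tails: a_i -> S', b_i -> S
            diff += mb - ma
    return diff


def exact_tail(r, threshold):
    """Exact probability that a fair +/-1 walk of r steps ends more than
    `threshold` away from the origin (k up-steps put the walk at 2k - r)."""
    hits = sum(math.comb(r, k) for k in range(r + 1) if abs(2 * k - r) > threshold)
    return hits / 2 ** r


def hoeffding_bound(n, eps):
    """The bound 2 * exp(-eps^2 * n / 8) from the proof."""
    return 2 * math.exp(-eps ** 2 * n / 8)


if __name__ == "__main__":
    n, eps = 100, 0.3
    threshold = eps * n / 2  # mistake counts must differ by more than eps*n/2
    rng = random.Random(0)
    # Assumed mistake pattern: h errs on a_i but not on b_i for every pair, so r = n.
    a, b = [1] * n, [0] * n
    trials = 5000
    far = sum(abs(swap_difference(a, b, rng)) > threshold for _ in range(trials))
    print("simulated:", far / trials)
    print("exact:    ", exact_tail(n, threshold))
    print("Hoeffding:", hoeffding_bound(n, eps))
```

With every pair contributing a step ($r = n$), the simulated and exact frequencies should agree closely, and both sit well below the Hoeffding bound, which is loose at this sample size.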

Finally, we prove Sauer’s lemma, relating the growth function to the VC-dimension.<br />

Theorem 6.15 (Sauer’s lemma) If $\mathrm{VCdim}(H) = d$, then $H[n] \le \sum_{i=0}^{d} \binom{n}{i} \le \left(\frac{en}{d}\right)^{d}$.
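Both inequalities in Theorem 6.15 are easy to evaluate. The following sketch (the helper names `binom_le` and `sauer_bound` are ours) computes the middle sum, which the proof writes as $\binom{n}{\le d}$, and the closed form $(en/d)^d$. We restrict to $1 \le d \le n$, since the closed form can fail otherwise (for example, $n = 1$, $d = 2$ gives $2 > e^2/4$).

```python
import math


def binom_le(n, d):
    """(n choose <= d) = sum_{i=0}^{d} C(n, i); math.comb(n, i) is 0 when i > n."""
    return sum(math.comb(n, i) for i in range(d + 1))


def sauer_bound(n, d):
    """Closed-form bound (e*n/d)^d from Theorem 6.15, used here only for 1 <= d <= n."""
    if not 1 <= d <= n:
        raise ValueError("closed form checked only for 1 <= d <= n")
    return (math.e * n / d) ** d


if __name__ == "__main__":
    for n, d in [(10, 3), (100, 3), (1000, 10)]:
        # growth-function bound vs. closed form vs. the trivial bound 2^n
        print(n, d, binom_le(n, d), sauer_bound(n, d), 2.0 ** n)
```

Printing a few values shows the point of the lemma: for fixed $d$ the count grows polynomially in $n$, while $2^n$ grows exponentially.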

Proof: Let $d = \mathrm{VCdim}(H)$. Our goal is to prove, for any set $S$ of $n$ points, that $|H[S]| \le \binom{n}{\le d}$, where we define $\binom{n}{\le d} = \sum_{i=0}^{d} \binom{n}{i}$; this is the number of distinct ways of choosing $d$ or fewer elements out of $n$. We will do so by induction on $n$. As a base case, our theorem is trivially true if $n \le d$, since then $\binom{n}{\le d} = 2^n$ and $S$ has only $2^n$ subsets.

As a first step in the proof, notice that
$$\binom{n}{\le d} = \binom{n-1}{\le d} + \binom{n-1}{\le d-1} \qquad (6.2)$$

because we can partition the ways of choosing $d$ or fewer items into those that do not include the first item (leaving $\le d$ to be chosen from the remainder) and those that do include the first item (leaving $\le d - 1$ to be chosen from the remainder).
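Identity (6.2) and the partition argument behind it can be verified by brute force on small cases. In this sketch (the function names are hypothetical), item 0 plays the role of the "first item": the subsets avoiding it should be exactly the at-most-$d$ subsets of the remaining $n - 1$ items, and deleting it from the others should give exactly the at-most-$(d-1)$ subsets of the rest.

```python
from itertools import combinations
from math import comb


def subsets_up_to(items, d):
    """All subsets of `items` with at most d elements, as a set of frozensets."""
    items = list(items)
    return {frozenset(c) for k in range(d + 1) for c in combinations(items, k)}


def binom_le(n, d):
    """(n choose <= d); the sum is empty (0) when d < 0."""
    return sum(comb(n, i) for i in range(d + 1))


def check_identity(n, d):
    """Brute-force check of equation (6.2) and its partition argument (n >= 1)."""
    small = subsets_up_to(range(n), d)
    rest = range(1, n)  # all items except the "first item", 0
    # Subsets that avoid the first item: exactly the <= d subsets of the rest.
    without_first = {s for s in small if 0 not in s}
    # Subsets that contain it: removing it leaves the <= d-1 subsets of the rest.
    with_first_removed = {s - {0} for s in small if 0 in s}
    partition_ok = (without_first == subsets_up_to(rest, d)
                    and with_first_removed == subsets_up_to(rest, d - 1))
    count_ok = binom_le(n, d) == binom_le(n - 1, d) + binom_le(n - 1, d - 1)
    return partition_ok and count_ok and len(small) == binom_le(n, d)


if __name__ == "__main__":
    print(all(check_identity(n, d) for n in range(1, 9) for d in range(n + 2)))
```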

Now, consider any set $S$ of $n$ points and pick some arbitrary point $x \in S$. By induction, we may assume that $|H[S \setminus \{x\}]| \le \binom{n-1}{\le d}$. So, by equation (6.2), all we need to show is that $|H[S]| - |H[S \setminus \{x\}]| \le \binom{n-1}{\le d-1}$. Thus, our problem has reduced to analyzing how many more partitions there are of $S$ than there are of $S \setminus \{x\}$ using sets in $H$.
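The reduction can be tried out on a concrete class. Purely as an assumed example, take $H$ to be the closed intervals on the real line, labeling a point positive when it lies inside the interval; this class has VC-dimension 2. The sketch below (hypothetical helper names) enumerates $H[S]$ by brute force, computes the largest subset of the sample that $H$ shatters, and checks the three inequalities used above for every choice of the removed point $x$.

```python
from itertools import combinations
from math import comb


def interval_projections(points):
    """H[S] for the assumed example class H = closed intervals [lo, hi] on the line.

    A labeling is the frozenset of sample points inside the interval. Endpoints at
    sample points suffice; the empty labeling comes from an interval missing S.
    Assumes the points are distinct.
    """
    pts = sorted(points)
    labelings = {frozenset()}
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            lo, hi = pts[i], pts[j]
            labelings.add(frozenset(p for p in pts if lo <= p <= hi))
    return labelings


def binom_le(n, d):
    """(n choose <= d); 0 when d < 0."""
    return sum(comb(n, i) for i in range(d + 1))


def vc_dim_on(points, projections):
    """Largest k such that some k-subset of `points` is shattered by H."""
    best = 0
    for k in range(1, len(points) + 1):
        if any(len(projections(sub)) == 2 ** k for sub in combinations(points, k)):
            best = k
    return best


def check_step(points, x, d):
    """The inequalities used in the inductive step, for removing point x."""
    rest = [p for p in points if p != x]
    n = len(points)
    size_s = len(interval_projections(points))
    size_rest = len(interval_projections(rest))
    return (size_s <= binom_le(n, d)
            and size_rest <= binom_le(n - 1, d)
            and size_s - size_rest <= binom_le(n - 1, d - 1))


if __name__ == "__main__":
    pts = [0.5, 1.7, 2.2, 3.9, 5.0, 6.4]  # arbitrary distinct sample points
    d = vc_dim_on(pts, interval_projections)
    print("VC-dimension on sample:", d)
    print("|H[S]| =", len(interval_projections(pts)), "bound =", binom_le(len(pts), d))
    print("all inductive steps hold:", all(check_step(pts, x, d) for x in pts))
```

For intervals the bounds are tight: $n$ distinct points admit $1 + n(n+1)/2 = \binom{n}{\le 2}$ labelings, and removing one point loses exactly $n = \binom{n-1}{\le 1}$ of them.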

If $H[S]$ is larger than $H[S \setminus \{x\}]$, it is because of pairs of sets in $H[S]$ that differ only on point $x$ and therefore collapse to the same set when $x$ is removed. For set $h \in H[S]$
