08.10.2016 Views

Foundations of Data Science


containing point $x$, define $\mathrm{twin}(h) = h \setminus \{x\}$; this may or may not belong to $H[S]$. Let $T = \{h \in H[S] : x \in h \text{ and } \mathrm{twin}(h) \in H[S]\}$. Notice $|H[S]| - |H[S \setminus \{x\}]| = |T|$. Now, what is the VC-dimension of $T$? If $d' = \mathrm{VCdim}(T)$, this means there is some set $R$ of $d'$ points in $S \setminus \{x\}$ that are shattered by $T$. By definition of $T$, all $2^{d'}$ subsets of $R$ can be extended to either include $x$, or not include $x$ and still be a set in $H[S]$. In other words, $R \cup \{x\}$ is shattered by $H$. This means $d' + 1 \le d$. Since $\mathrm{VCdim}(T) \le d - 1$, by induction we have $|T| \le \binom{n-1}{\le d-1}$ as desired.
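The counting identity $|H[S]| - |H[S \setminus \{x\}]| = |T|$ used in this step can be checked by brute force on a small case. The sketch below is only an illustration under assumptions not in the text: the concept class is taken to be intervals over five integer points (VC-dimension 2), and the helper names `restrict` and `twin_family` are hypothetical.

```python
from math import comb

def restrict(concepts, points):
    """Return H[S]: the distinct intersections of each concept with the point set."""
    pts = frozenset(points)
    return {frozenset(h) & pts for h in concepts}

def twin_family(restricted, x):
    """Return T: sets in H[S] that contain x and whose twin (the set minus x) is also in H[S]."""
    return {h for h in restricted if x in h and (h - {x}) in restricted}

def binom_at_most(n, d):
    """Sum of C(n, i) for i = 0..d, the quantity written as (n choose <= d)."""
    return sum(comb(n, i) for i in range(d + 1))

# Assumed example class: all intervals [a, b] over the points 0..4, plus the empty set.
S = list(range(5))
intervals = [frozenset(range(a, b + 1)) for a in S for b in S if a <= b]
intervals.append(frozenset())

x = 2
H_S = restrict(intervals, S)
H_S_minus_x = restrict(intervals, [p for p in S if p != x])
T = twin_family(H_S, x)

print(len(H_S), len(H_S_minus_x), len(T))
print(binom_at_most(len(S), 2), binom_at_most(len(S) - 1, 1))
```

For this particular class, both the bound on $|H[S]|$ with $d = 2$ and the bound on $|T|$ with $d - 1 = 1$ happen to hold with equality, which makes it a convenient sanity check.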

6.9.4 VC-dimension of combinations of concepts

Often one wants to create concepts out of other concepts. For example, given several linear separators, one could take their intersection to create a convex polytope. Or given several disjunctions, one might want to take their majority vote. We can use Sauer’s lemma to show that such combinations do not increase the VC-dimension of the class by too much.

Specifically, given $k$ concepts $h_1, h_2, \ldots, h_k$ and a Boolean function $f$, define the set $\mathrm{comb}_f(h_1, \ldots, h_k) = \{x \in X : f(h_1(x), \ldots, h_k(x)) = 1\}$, where $h_i(x)$ denotes the indicator for whether or not $x \in h_i$. For example, $f$ might be the AND function to take the intersection of the sets $h_i$, or $f$ might be the majority-vote function. This can be viewed as a depth-two neural network. Given a concept class $H$, a Boolean function $f$, and an integer $k$, define the new concept class $\mathrm{COMB}_{f,k}(H) = \{\mathrm{comb}_f(h_1, \ldots, h_k) : h_i \in H\}$. We can now use Sauer’s lemma to produce the following corollary.
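The definition of $\mathrm{comb}_f$ can be made concrete over a finite domain. The following is a minimal sketch, assuming concepts are represented as Python sets; the function names `comb_f`, `and_fn`, and `majority_fn`, the domain, and the three example concepts are all illustrative choices, not part of the text.

```python
def comb_f(f, concepts, domain):
    """Return {x in domain : f(h_1(x), ..., h_k(x)) = 1}, with h_i(x) the 0/1 indicator of x in h_i."""
    return {x for x in domain if f(*(int(x in h) for h in concepts)) == 1}

def and_fn(*bits):
    """AND of the indicator bits: yields the intersection of the concepts."""
    return int(all(bits))

def majority_fn(*bits):
    """Majority vote: 1 when strictly more than half of the bits are 1."""
    return int(sum(bits) > len(bits) / 2)

# Assumed toy domain and concepts (k = 3).
X = range(10)
h1 = {x for x in X if x >= 2}
h2 = {x for x in X if x <= 7}
h3 = {x for x in X if x % 2 == 0}

intersection = comb_f(and_fn, [h1, h2, h3], X)
majority = comb_f(majority_fn, [h1, h2, h3], X)
print(sorted(intersection))
print(sorted(majority))
```

With the AND function the result coincides with the ordinary set intersection `h1 & h2 & h3`; swapping in `majority_fn` changes only the combining function, not the underlying concepts.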

Corollary 6.19 If the concept class $H$ has VC-dimension $d$, then for any combination function $f$, the class $\mathrm{COMB}_{f,k}(H)$ has VC-dimension $O(kd \log(kd))$.

Proof: Let $n$ be the VC-dimension of $\mathrm{COMB}_{f,k}(H)$, so by definition there must exist a set $S$ of $n$ points shattered by $\mathrm{COMB}_{f,k}(H)$. We know by Sauer’s lemma that there are at most $n^d$ ways of partitioning the points in $S$ using sets in $H$. Since each set in $\mathrm{COMB}_{f,k}(H)$ is determined by $k$ sets in $H$, and there are at most $(n^d)^k = n^{kd}$ different $k$-tuples of such sets, this means there are at most $n^{kd}$ ways of partitioning the points using sets in $\mathrm{COMB}_{f,k}(H)$. Since $S$ is shattered, we must have $2^n \le n^{kd}$, or equivalently $n \le kd \log_2(n)$. We solve this as follows. First, assuming $n \ge 16$ we have $\log_2(n) \le \sqrt{n}$, so $n \le kd \log_2(n) \le kd \sqrt{n}$; dividing by $\sqrt{n}$ gives $\sqrt{n} \le kd$, which implies that $n \le (kd)^2$. To get the better bound, plug back into the original inequality. Since $n \le (kd)^2$, it must be that $\log_2(n) \le 2 \log_2(kd)$. Substituting $\log_2(n) \le 2 \log_2(kd)$ into $n \le kd \log_2(n)$ gives $n \le 2kd \log_2(kd)$.

This result will be useful for our discussion of Boosting in Section 6.10.
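The last step of the proof of Corollary 6.19, solving $2^n \le n^{kd}$ for $n$, can be sanity-checked numerically. The sketch below finds the largest integer $n$ satisfying the inequality by direct search and compares it with $2kd \log_2(kd)$. The function name and the sample $(k, d)$ pairs are hypothetical, and the code assumes $kd \ge 2$ so that the logarithm is positive.

```python
import math

def largest_n_satisfying(k, d):
    """Largest integer n with 2**n <= n**(k*d), using exact integer arithmetic.

    Assumes k*d >= 2.
    """
    kd = k * d
    # The proof shows any n >= 16 satisfying the inequality has n <= (kd)**2,
    # so scanning up to max(16, (kd)**2) cannot miss a solution.
    limit = max(16, kd * kd)
    best = 0
    for n in range(1, limit + 1):
        if 2**n <= n**kd:
            best = n
    return best

# Assumed sample values of (k, d), chosen only for illustration.
for k, d in [(1, 2), (2, 2), (3, 2), (5, 3), (10, 4)]:
    n_max = largest_n_satisfying(k, d)
    bound = 2 * k * d * math.log2(k * d)
    print(f"k={k:2d} d={d}  largest n={n_max:4d}  2kd*log2(kd)={bound:8.2f}")
```

Because the derivation assumes $n \ge 16$, the comparison is meaningful for solutions of at least that size; smaller solutions are bounded by 16 trivially.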

6.9.5 Other measures of complexity

VC-dimension and number of bits needed to describe a set are not the only measures of complexity one can use to derive generalization guarantees. There has been significant
