Machine Learning - DISCo


We define the optimal mistake bound for a concept class C below.

Definition: Let C be an arbitrary nonempty concept class. The optimal mistake bound for C, denoted Opt(C), is the minimum over all possible learning algorithms A of M_A(C).

$$\mathrm{Opt}(C) \equiv \min_{A \,\in\, \text{learning algorithms}} M_A(C)$$

Speaking informally, this definition states that Opt(C) is the number of mistakes made for the hardest target concept in C, using the hardest training sequence, by the best algorithm. Littlestone (1987) shows that for any concept class C, there is an interesting relationship among the optimal mistake bound for C, the bound of the HALVING algorithm, and the VC dimension of C, namely

$$VC(C) \;\le\; \mathrm{Opt}(C) \;\le\; M_{Halving}(C) \;\le\; \log_2(|C|)$$

Furthermore, there exist concept classes for which the four quantities above are exactly equal. One such concept class is the powerset C_p of any finite set of instances X. In this case, VC(C_p) = |X| = log2(|C_p|), so all four quantities must be equal. Littlestone (1987) provides examples of other concept classes for which VC(C) is strictly less than Opt(C) and for which Opt(C) is strictly less than M_Halving(C).

7.5.4 WEIGHTED-MAJORITY Algorithm

In this section we consider a generalization of the HALVING algorithm called the WEIGHTED-MAJORITY algorithm. The WEIGHTED-MAJORITY algorithm makes predictions by taking a weighted vote among a pool of prediction algorithms and learns by altering the weight associated with each prediction algorithm. These prediction algorithms can be taken to be the alternative hypotheses in H, or they can be taken to be alternative learning algorithms that themselves vary over time. All that we require of a prediction algorithm is that it predict the value of the target concept, given an instance. One interesting property of the WEIGHTED-MAJORITY algorithm is that it is able to accommodate inconsistent training data. This is because it does not eliminate a hypothesis that is found to be inconsistent with some training example, but rather reduces its weight. A second interesting property is that we can bound the number of mistakes made by WEIGHTED-MAJORITY in terms of the number of mistakes committed by the best of the pool of prediction algorithms.

The WEIGHTED-MAJORITY algorithm begins by assigning a weight of 1 to each prediction algorithm, then considers the training examples. Whenever a prediction algorithm misclassifies a new training example, its weight is decreased by multiplying it by some number β, where 0 ≤ β < 1. The exact definition of the WEIGHTED-MAJORITY algorithm is given in Table 7.1.

Notice that if β = 0 then WEIGHTED-MAJORITY is identical to the HALVING algorithm. On the other hand, if we choose some other value for β, no prediction algorithm is ever eliminated outright; an algorithm that misclassifies an example simply receives a smaller weight, and therefore a smaller vote, in the future.
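As a rough illustration of this weighted-vote-and-reweight scheme, the following Python sketch is one way it might be implemented (the predictor interface, the function name weighted_majority, and the choice to break voting ties toward the positive class are assumptions for the example, not the book's Table 7.1 verbatim).

```python
def weighted_majority(predictors, examples, beta=0.5):
    """Predict by weighted vote over a pool of predictors and multiply the
    weight of each predictor that errs by beta (0 <= beta < 1)."""
    weights = [1.0] * len(predictors)           # every algorithm starts at weight 1
    mistakes = 0
    for x, label in examples:
        votes = [p(x) for p in predictors]
        w_pos = sum(w for w, v in zip(weights, votes) if v == 1)
        w_neg = sum(w for w, v in zip(weights, votes) if v == 0)
        prediction = 1 if w_pos >= w_neg else 0     # ties broken toward 1
        if prediction != label:
            mistakes += 1
        # Inconsistent predictors are not eliminated, only down-weighted.
        weights = [w * beta if v != label else w
                   for w, v in zip(weights, votes)]
    return weights, mistakes

# Tiny demo: three fixed "experts" on integer instances; the target is x > 2.
experts = [lambda x: 1,                # always predicts positive
           lambda x: 0,                # always predicts negative
           lambda x: int(x > 2)]       # happens to match the target
data = [(x, int(x > 2)) for x in range(6)]
final_weights, errors = weighted_majority(experts, data, beta=0.5)
print(final_weights, errors)           # the matching expert keeps weight 1.0
```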
