Learning binary relations using weighted majority voting


Furthermore, for y ≥ 2,

$$\frac{4y^2 - 7y + 2}{6y^2 - 10y + 3} \;\ge\; \frac{4}{7} \;\ge\; \frac{1}{2},$$

and thus it suffices to have x ≤ ym/2, which is the case.

Finally, it can be verified that

$$f_{xx}f_{yy} - (f_{xy})^2 \;\ge\; 0$$

for y ≥ 2: written as a single fraction, this determinant has the positive denominator $16 b^2 (\ln 2)^2 (y-1)^2\, m\, (f(x,y))^4$, and its numerator can be verified to be nonnegative for y ≥ 2.

This completes the proof that f(x, y) is concave over the desired interval.
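The concavity argument above rests on the standard second-order conditions for a twice-differentiable function of two variables: f is concave on a region when f_xx ≤ 0 and f_xx f_yy - (f_xy)² ≥ 0 hold throughout it. The following is a minimal sketch of checking these conditions numerically; the stand-in function g and the test point are invented for illustration and are not the paper's f(x, y).

```python
import math

def concave_at(f, x, y, h=1e-3):
    """Check the second-order concavity conditions at (x, y) using
    central finite differences: f_xx <= 0 and f_xx*f_yy - f_xy**2 >= 0."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx <= 0 and fxx * fyy - fxy**2 >= 0

# Stand-in concave function (not the paper's f): sqrt(x) + ln(y) for x, y > 0.
g = lambda x, y: math.sqrt(x) + math.log(y)
print(concave_at(g, 3.0, 2.5))   # expected: True
```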

Notes

1. The adversary, who tries to maximize the learner's mistakes, knows the learner's algorithm and has unlimited computing power.

2. The halving algorithm is described in more detail in Section 2.

3. Throughout this paper we let lg denote the base-2 logarithm and ln the natural logarithm. Euler's constant is denoted by e.

4. The same result holds if one performs the updates after each trial.

5. A version of this second algorithm is described in detail in Figure 3. For the sake of simplicity it is parameterized by an update factor γ = 2β/(1 + β) instead of β. See Section 5 for a discussion of this algorithm, called Learn-Relation(γ).

6. We can obtain a bound of half of (1) by either letting the algorithm predict probabilistically in {0, 1} or deterministically in the interval [0, 1] (Cesa-Bianchi, Freund, Helmbold, Haussler & Schapire, 1993; Kivinen & Warmuth, 1994). In the case of probabilistic predictions this quantity is an upper bound on the expected number of mistakes. In the case of deterministic predictions in [0, 1], this quantity is an upper bound on the loss of the algorithm, where the loss is just the sum over all trials of the absolute difference between the prediction and the correct value (see the sketch following these notes).

7. Recall that g(z) = 1/(1 + 2z + 2z²) and g(∞) = 0.

8. Again, we can obtain improvements of a factor of 2 by either letting the algorithm predict probabilistically in {0, 1} or deterministically in the interval [0, 1] (Cesa-Bianchi, Freund, Helmbold, Haussler, Schapire & Warmuth, 1993; Kivinen & Warmuth, 1994).
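To make the absolute loss in note 6 concrete, here is a minimal sketch, assuming deterministic predictions in [0, 1] scored against binary outcomes; the prediction values are invented purely for illustration.

```python
def absolute_loss(predictions, outcomes):
    """Sum over all trials of |prediction - correct value| (note 6)."""
    return sum(abs(p - o) for p, o in zip(predictions, outcomes))

preds = [0.9, 0.2, 0.5, 1.0]   # deterministic predictions in [0, 1]
truth = [1, 0, 1, 1]           # correct binary values
print(round(absolute_loss(preds, truth), 2))   # 0.1 + 0.2 + 0.5 + 0.0 = 0.8
```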

References

Angluin, D. (1988). Queries and concept learning. Machine Learning, 2(4): pp. 319-342.

Barzdin, J. & Freivald, R. (1972). On the prediction of general recursive functions. Soviet Mathematics Doklady, 13: pp. 1224-1228.

Cesa-Bianchi, N., Freund, Y., Helmbold, D., Haussler, D., Schapire, R. & Warmuth, M. (1993). How to use expert advice. Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pp. 382-391.

Chen, W. (1992). Personal communication.

Goldman, S., Rivest, R. & Schapire, R. (1993). Learning binary relations and total orders. SIAM Journal on Computing, 22(5): pp. 1006-1034.

