Learning binary relations using weighted majority voting
258 GOLDMAN AND WARMUTH
$$\min_p \left\{ k_p m + 3\alpha_p + 2\sqrt{\alpha_p\bigl((n-k)\ln k + k\bigr)} + (n-k)\lg k + k\lg e \right\}$$

for our first algorithm, where the minimum is taken over all partitions $p$ of size at most $k$ with noise at most $\alpha$, and $k_p$ denotes the size and $\alpha_p$ the noise of partition $p$.
For the above tuning we needed an upper bound on both the size and the noise of the partition. If an upper bound on only one of the two is known, then the standard doubling trick can be used to guess the other. This causes only a slight increase in the mistake bound (see Cesa-Bianchi, Freund, Helmbold, Haussler, Schapire, and Warmuth (1993)).
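The doubling trick referred to here is the standard technique: run the algorithm tuned for a guessed value of the unknown quantity, and restart with twice the guess whenever the guess turns out to be too small. The toy cost accounting below (our illustration of the standard argument, not the paper's algorithm) shows why guessing costs only a constant factor more than knowing the true value in advance:

```python
def doubling_run(true_value, initial_guess=1):
    """Repeatedly run a phase tuned for `guess`, doubling until the guess
    reaches the true (unknown) value.  Each phase is charged a cost equal
    to its guess; the geometric sum of failed phases is at most the cost
    of the final successful phase, so the total is within a constant
    factor of running with the right value from the start."""
    guess, total_cost, phases = initial_guess, 0, 0
    while guess < true_value:
        total_cost += guess   # phase fails: pay for the too-small guess
        guess *= 2
        phases += 1
    total_cost += guess       # final phase, with a large-enough guess
    return total_cost, phases
```

For example, `doubling_run(100)` pays for guesses 1, 2, 4, ..., 64 before succeeding with 128, for a total cost of 255, which is less than four times the cost of knowing the value 100 up front.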
Note that in the above mistake bound there is a subtle tradeoff between the noise $\alpha_p$ and the size $k_p$ of a partition $p$.
Recall that when this first algorithm is applied in the noise-free case, it essentially matches the information-theoretic lower bound. An interesting question is whether it can be shown to be essentially optimal in the case of learning non-pure relations.
As we show in the next section (Theorem 3), when using the second algorithm for learning non-pure relations, our algorithm makes at most
$$\min_p \left\{ k_p m + \sqrt{3mn^2\lg k} + 4\alpha_p mn\Bigl(1-\frac{k}{n}\Bigr) + mn\sqrt{24\alpha m\Bigl(1-\frac{k}{n}\Bigr)\ln k}\, \right\}$$
mistakes in the worst case, where the minimum is taken over all partitions $p$, and $k_p$ denotes the size and $\alpha_p$ the noise of partition $p$, where $k_p \le k$ and $\alpha_p \le \alpha$.
5. Algorithm Two: A Polynomial-time Algorithm
In this section we analyze our second algorithm, Learn-Relation, when applied to learning non-pure relations. (Recall that Learn-Relation is shown in Figure 3.) The mistake bound of Theorem 2 obtained for this algorithm is larger than the mistake bound of the first algorithm. However, this algorithm uses only $\binom{n}{2}$ weights, as opposed to exponentially many.
We begin by giving an update that is equivalent to the one used in WMG (Littlestone & Warmuth, 1989). Recall that in WMG, if $x$ is the prediction of an input with weight $w$ and the feedback is the bit $\rho$, then $w$ is multiplied by $1 - (1-\beta)|x - \rho|$ for $\beta \in [0, 1)$. If $\beta = \gamma/(2 - \gamma)$, then for our application the update of WMG can be summarized as follows:
• If a node predicts correctly (so $|x - \rho| = 0$), its weight is not changed.

• If a node makes a prediction of $1/2$, its weight is multiplied by $1/(2 - \gamma)$.

• If a node predicts incorrectly (so $|x - \rho| = 1$), its weight is multiplied by $\gamma/(2 - \gamma)$.
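The case analysis above follows from substituting $\beta = \gamma/(2-\gamma)$ into the WMG factor $1 - (1-\beta)|x-\rho|$; for instance, a prediction of $1/2$ gives $(1+\beta)/2 = 1/(2-\gamma)$. A small sketch (the function names are ours) checks the three cases numerically:

```python
def wmg_factor(x, rho, beta):
    """Multiplicative weight update of WMG: a prediction x against
    feedback bit rho scales the weight by 1 - (1 - beta)|x - rho|."""
    return 1 - (1 - beta) * abs(x - rho)

def case_factor(x, rho, gamma):
    """The three-case summary of the same update when beta = gamma/(2-gamma)."""
    if abs(x - rho) == 0:        # correct prediction: weight unchanged
        return 1.0
    elif abs(x - rho) == 0.5:    # prediction of 1/2
        return 1 / (2 - gamma)
    else:                        # incorrect prediction (|x - rho| = 1)
        return gamma / (2 - gamma)

gamma = 0.3
beta = gamma / (2 - gamma)
for x in (0, 0.5, 1):
    for rho in (0, 1):
        assert abs(wmg_factor(x, rho, beta) - case_factor(x, rho, gamma)) < 1e-12
```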