Foundations of Data Science

sleeping experts problem.

Combining Sleeping Experts Algorithm:

Initialize each expert $h_i$ with a weight $w_i = 1$. Let $\epsilon \in (0, 1)$. For each example $x$, do the following:

1. [Make prediction] Let $H_x$ denote the set of experts $h_i$ that make a prediction on $x$, and let $w_x = \sum_{h_j \in H_x} w_j$. Choose $h_i \in H_x$ with probability $p_{ix} = w_i / w_x$ and predict $h_i(x)$.

2. [Receive feedback] Given the correct label, for each $h_i \in H_x$ let $m_{ix} = 1$ if $h_i(x)$ was incorrect, else let $m_{ix} = 0$.

3. [Update weights] For each $h_i \in H_x$, update its weight as follows:

   - Let $r_{ix} = \left( \sum_{h_j \in H_x} p_{jx} m_{jx} \right)/(1+\epsilon) - m_{ix}$.

   - Update $w_i \leftarrow w_i (1+\epsilon)^{r_{ix}}$.

   Note that $\sum_{h_j \in H_x} p_{jx} m_{jx}$ represents the algorithm's probability of making a mistake on example $x$. So, $h_i$ is rewarded for predicting correctly ($m_{ix} = 0$) especially when the algorithm had a high probability of making a mistake, and $h_i$ is penalized for predicting incorrectly ($m_{ix} = 1$) especially when the algorithm had a low probability of making a mistake.

   For each $h_i \notin H_x$, leave $w_i$ alone.
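The three steps above can be sketched in Python. This is a minimal illustration, not an implementation from the text: experts are modeled as callables that return a prediction on $x$ or `None` when asleep, and the class and method names are assumptions made for the example.

```python
import random

class CombiningSleepingExperts:
    """Sketch of the combining sleeping experts update described above.

    Experts are callables returning a prediction for x, or None when
    they abstain ("sleep"). This interface is an illustrative assumption.
    """

    def __init__(self, experts, eps=0.1):
        self.experts = experts
        self.eps = eps                   # epsilon in (0, 1)
        self.w = [1.0] * len(experts)    # initialize each w_i = 1

    def predict(self, x):
        # H_x: awake experts and their predictions on x
        awake = [(i, h(x)) for i, h in enumerate(self.experts) if h(x) is not None]
        wx = sum(self.w[i] for i, _ in awake)
        # choose h_i in H_x with probability p_ix = w_i / w_x
        r, acc = random.random() * wx, 0.0
        for i, pred in awake:
            acc += self.w[i]
            if acc >= r:
                return pred
        return awake[-1][1]

    def update(self, x, label):
        awake = [i for i, h in enumerate(self.experts) if h(x) is not None]
        if not awake:
            return
        wx = sum(self.w[i] for i in awake)
        p = {i: self.w[i] / wx for i in awake}
        m = {i: 1.0 if self.experts[i](x) != label else 0.0 for i in awake}
        # M = sum_j p_jx * m_jx: the algorithm's mistake probability on x
        M = sum(p[i] * m[i] for i in awake)
        for i in awake:
            r_ix = M / (1 + self.eps) - m[i]
            self.w[i] *= (1 + self.eps) ** r_ix
        # experts not in H_x keep their weights unchanged
```

Running `update` on a stream of labeled examples drives the weight of a frequently wrong expert down relative to a frequently correct one, exactly as the multiplicative rule in step 3 intends.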

Theorem 6.21 For any set of $n$ sleeping experts $h_1, \ldots, h_n$, and for any sequence of examples $S$, the Combining Sleeping Experts Algorithm $A$ satisfies for all $i$:
$$E\big[\text{mistakes}(A, S_i)\big] \le (1+\epsilon) \cdot \text{mistakes}(h_i, S_i) + O\!\left(\frac{\log n}{\epsilon}\right)$$
where $S_i = \{x \in S : h_i \in H_x\}$.

Proof: Consider sleeping expert $h_i$. The weight of $h_i$ after the sequence of examples $S$ is exactly:
$$w_i = (1+\epsilon)^{\sum_{x \in S_i}\left[\left(\sum_{h_j \in H_x} p_{jx} m_{jx}\right)/(1+\epsilon) \,-\, m_{ix}\right]} = (1+\epsilon)^{E[\text{mistakes}(A, S_i)]/(1+\epsilon) \,-\, \text{mistakes}(h_i, S_i)}.$$
Let $w = \sum_j w_j$. Clearly $w_i \le w$. Therefore, taking logs, we have:
$$E\big[\text{mistakes}(A, S_i)\big]/(1+\epsilon) - \text{mistakes}(h_i, S_i) \le \log_{1+\epsilon} w.$$
So, using the fact that $\log_{1+\epsilon} w = O\!\left(\frac{\log w}{\epsilon}\right)$,
$$E\big[\text{mistakes}(A, S_i)\big] \le (1+\epsilon) \cdot \text{mistakes}(h_i, S_i) + O\!\left(\frac{\log w}{\epsilon}\right).$$
Finally, since each weight starts at $1$ and the update rule never increases the total weight of the awake experts on any example, $w \le n$ throughout, which gives the claimed $O\!\left(\frac{\log n}{\epsilon}\right)$ bound.
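The gap between the $O\!\left(\frac{\log w}{\epsilon}\right)$ term in the last display and the $O\!\left(\frac{\log n}{\epsilon}\right)$ term in the theorem is closed by checking that the total weight $w$ never exceeds its initial value $n$. A sketch of that check (standard, though not spelled out in this excerpt): fix an example $x$ and write $M = \sum_{h_j \in H_x} p_{jx} m_{jx}$ for the algorithm's mistake probability on $x$.

\begin{align*}
\sum_{h_j \in H_x} w_j (1+\epsilon)^{M/(1+\epsilon) - m_{jx}}
  &= (1+\epsilon)^{M/(1+\epsilon)} \left[ \sum_{j: m_{jx}=0} w_j + \frac{1}{1+\epsilon}\sum_{j: m_{jx}=1} w_j \right] \\
  &= (1+\epsilon)^{M/(1+\epsilon)} \, w_x \left[ (1-M) + \frac{M}{1+\epsilon} \right]
   && \text{since } \textstyle\sum_{j: m_{jx}=1} w_j = M w_x \\
  &= (1+\epsilon)^{M/(1+\epsilon)} \, w_x \left( 1 - \frac{\epsilon M}{1+\epsilon} \right) \\
  &\le \left( 1 + \frac{\epsilon M}{1+\epsilon} \right)\left( 1 - \frac{\epsilon M}{1+\epsilon} \right) w_x
   && \text{since } (1+\epsilon)^z \le 1 + \epsilon z \text{ for } z \in [0,1] \\
  &= \left( 1 - \left(\tfrac{\epsilon M}{1+\epsilon}\right)^{2} \right) w_x \;\le\; w_x .
\end{align*}

Sleeping experts' weights are untouched, so $w$ never increases and $w \le n$ on every example.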
