20.03.2015 Views

The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits

The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits

The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Bandits</strong><br />

• K arms each arm i<br />

– Wins (reward 1) with probability p i<br />

– Looses (reward 0) with probability 1- p i<br />

• Exploration vs. Exploitation<br />

– Exploration is unbiased<br />

– Exploitation is biased by exploration only<br />

• Regret<br />

– Max return – Actual return

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!