Online Learning, Regret Minimization, and Game Theory - MLSS 08

More documents

Recommendations

Info

History and development (abridged) [Hannan’57, Blackwell’56]: Alg. with regret O((N/T) 1/2 ). Re-phrasing, need only T = O(N/ε 2 ) steps to get time-average regret down to ε. (will call this quantity T ε ) Optimal dependence on T (or ε). . Game-theoristsviewed #rows N as constant, not so important as T, sopretty much done.Why optimal in T?dest•Say world flips fair coin each day.•Any alg, in T days, has expected cost T/2.•But E[min(# heads,#tails)] = T/2 – O(T 1/2 ).•So, per-day gap is O(1/T 1/2 ).AlgorithmWorld – life - fate1 00 1
History and development (abridged) [Hannan’57, Blackwell’56]: Alg. with regret O((N/T) 1/2 ). Re-phrasing, need only T = O(N/ε 2 ) steps to get time-average regret down to ε. (will call this quantity T ε ) Optimal dependence on T (or ε). . Game-theoristsviewed #rows N as constant, not so important as T, sopretty much done. Learning-theory 80s-90s:“combining expert advice”.Imagine large class C of N prediction rules. Perform (nearly) as well as best f2C. f [LittlestoneittlestoneWarmuth’89]: Weighted-majority algorithmE[cost] · OPT(1+ε) ) + (log N)/ε.Regret O((log N)/T) 1/2 . T ε= O((log N)/ε 2 ). Optimal as fn of N too, plus lots of work on exactconstants, 2 nd order terms, etc. [CFHHSW93]… Extensions to bandit model (adds extra factor of N).
Page 1 and 2: Online Learning and GameTheory+On L
Page 3 and 4: Some books/references: Algorithmic
Page 5 and 6: Consider the following setting… E
Page 7 and 8: “No-regret” algorithms for repe
Page 9: Some intuition & properties of no-r
Page 13 and 14: Using “expert” adviceSay we wan
Page 15 and 16: Using “expert” adviceBut what i
Page 17 and 18: Analysis: do nearly as well as best
Page 19 and 20: Analysis• Say at time t we have f
Page 21 and 22: What if we have N options,not N pre
Page 23 and 24: Efficient implicit implementation f
Page 25 and 26: [Kalai-Vempala’03] and [Zinkevich
Page 27 and 28: Kalai-Vempala algorithm Recall setu
Page 29 and 30: Bandit setting What if alg is only
Page 31 and 32: A natural generalization This gener
Page 33 and 34: Can combine with [KV],[Z] too: Back
Page 35 and 36: Consider the following scenario…
Page 37 and 38: Game Theory terminolgy• Rows and
Page 39 and 40: Minimax-optimal strategies• Can s
Page 41 and 42: Minimax-optimal strategies• How a
Page 43 and 44: Summary of gameValue to guessergues
Page 45 and 46: Summary of gameValue to guessergues
Page 47 and 48: Interesting. The hider has a(random
Page 49 and 50: Nice proof of minimax thm• Suppos
Page 51 and 52: Can use notion of minimaxoptimality
Page 53 and 54: • Two players A and B. 3 cards: 1
Page 55 and 56: And the minimax optimalstrategies a
Page 57 and 58: Now, to General-Sum games!
Page 59 and 60: General-sum games• In general-sum
Page 61 and 62:
Nash Equilibrium• A Nash Equilibr
Page 63 and 64:
Uses• Economists use games and eq
Page 65 and 66:
NE can do strange things• Braess
Page 67 and 68:
Existence of NE• Proof will be no
Page 69 and 70:
Proof (cont)• S = {(p,q): p,q are
Page 71 and 72:
Try #1• What about f(p,q) = (p’
Page 73 and 74:
Instead we will use...• f(p,q) =
Page 75 and 76:
One more interesting game“Ultimat
Page 77 and 78:
Boosting & game theory• Suppose I
Page 79 and 80:
Stop 3: What happens ifeveryone is
Page 81 and 82:
What if everyone started using no-r
Page 83 and 84:
What can we say? If algorithms mini
Page 85 and 86:
Internal/swap-regretregret• E.g.,
Page 87 and 88:
Internal/swap-regretregret• If al
Page 89 and 90:
Internal/swap-regret,regret, contdC
Page 91 and 92:
Consider Wardrop/Roughgarden-Tardos
Page 93 and 94:
Global behavior of NR algs Interest
Page 95 and 96:
Global behavior of NR algs [B-EvenD
Page 97 and 98:
Global behavior of NR algs [B-EvenD
Page 99 and 100:
Last stop: On a Theory ofSimilarity
Page 101 and 102:
2-minute version• Suppose we are
Page 103 and 104:
Part 1: On similarityfunctions for
Page 105 and 106:
What’s s a kernel?• A kernel K
Page 107 and 108:
Moreover, generalize well if good m
Page 109 and 110:
I want to use ML to classify protei
Page 111 and 112:
Moreover, generalize well if good m
Page 113 and 114:
Goal: notion of “good similarity
Page 115 and 116:
Defn satisfying (1) and (2):• Say
Page 117 and 118:
How to use itAt least a 1-ε prob m
Page 119 and 120:
But not broad enough+ +_• Idea: w
Page 121 and 122:
Broader defn…• Ask that exists
Page 123 and 124:
And furthermoreNow, defn is broad e
Page 125 and 126:
Summary: part 1• Can develop suff
Page 127 and 128:
Part 2: Can we use thisangle to hel
Page 129 and 130:
What conditions on a similarity fun
Page 131 and 132:
Page 133 and 134:
Page 135 and 136:
Let’s s weaken our goals a bit…
Page 137 and 138:
Then you can start getting somewher
Page 139 and 140:
Example analysis for simpler versio
Page 141 and 142:
PropertyStrict SeparationProperties
Page 143 and 144:
More general conditionsWhat if only
Page 145 and 146:
Other properties• Can also relate
Page 147 and 148:
Conclusions- Natural properties (re
show all

Online Learning, Regret Minimization, and Game Theory - MLSS 08

Create successful ePaper yourself

Delete template?

Save as template?