
6.254: Game Theory with Engineering Applications                                March 13-18, 2008

Lecture 11: Learning in Games - Fictitious Play

Lecturer: Asu Ozdaglar

1 Introduction

Most economic theory relies on equilibrium analysis based on Nash equilibrium or its refinements. The traditional explanation for when and why equilibrium arises is that it results from analysis and introspection by the players in a situation where the rules of the game, the rationality of the players, and the payoff functions of the players are all common knowledge.

In this lecture, we develop an alternative explanation in which equilibrium arises as the long-run outcome of a process in which less than fully rational players grope for optimality over time. We investigate theoretical models of learning in games. Our focus will be on the process of fictitious play (and its variants), which is a widely used model of learning. In this process, agents behave as if they think they are facing a stationary, but unknown, distribution of opponents' strategies. We examine whether fictitious play is a sensible model of learning, and we characterize the asymptotic behavior of play when all players use fictitious play learning rules.

2 Fictitious Play

One of the earliest learning rules studied is fictitious play, introduced by Brown [1]. It is a "belief-based" learning rule: players form beliefs about opponent play and behave rationally with respect to these beliefs.

The model of fictitious play is given below. Here we focus on the case of two players only; see [2] for the analysis of fictitious play with more than two players.

• Two players, i = 1, 2, play the strategic form game G at times t = 1, 2, ....

• Denote the stage payoff of player i by g_i(s_i, s_{-i}).



• For t = 1, 2, ... and i = 1, 2, define the function η_i^t : S_{-i} → N, where η_i^t(s_{-i}) is the number of times player i has observed the action s_{-i} before time t. Let η_i^0(s_{-i}) represent a starting point (or fictitious past).

Example 1: Assume that S_2 = {U, D}. If η_1^0(U) = 3 and η_1^0(D) = 5, and player 2 plays U, U, D in the first three periods, then η_1^3(U) = 5 and η_1^3(D) = 6.

Each player assumes that his opponent is using a stationary mixed strategy. Players choose actions in each period (or stage) to maximize that period's expected payoff given their prediction of the distribution of their opponent's actions, which they form according to
\[
\mu_i^t(s_{-i}) = \frac{\eta_i^t(s_{-i})}{\sum_{\bar{s}_{-i} \in S_{-i}} \eta_i^t(\bar{s}_{-i})},
\]
i.e., player i forecasts player −i's strategy at time t to be the empirical frequency distribution of past play.

Given player i's belief/forecast about his opponent's play, he chooses his action at time t to maximize his payoff, i.e.,
\[
s_i^t \in \arg\max_{s_i \in S_i} g_i(s_i, \mu_i^t).
\]
This choice is myopic, i.e., players are trying to maximize current payoff without considering their future payoffs. Note that myopia is consistent with the assumption that opponents are using stationary mixed strategies.
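To make the update rule concrete, here is a minimal Python sketch of two-player fictitious play for an arbitrary bimatrix game. The function name fictitious_play, the argmax tie-breaking (lowest index wins), and the requirement of a positive fictitious past are illustrative choices of this sketch, not part of the lecture.

```python
import numpy as np

def fictitious_play(A, B, eta1, eta2, T):
    """Simulate T periods of two-player fictitious play.

    A[i, j], B[i, j]: stage payoffs of players 1 and 2 when player 1 plays
    row i and player 2 plays column j.
    eta1: player 1's initial counts over player 2's actions (fictitious past).
    eta2: player 2's initial counts over player 1's actions.
    Each count vector must have positive total mass; argmax breaks ties in
    favor of the lowest action index.
    """
    eta1 = np.asarray(eta1, dtype=float).copy()
    eta2 = np.asarray(eta2, dtype=float).copy()
    history = []
    for _ in range(T):
        mu1 = eta1 / eta1.sum()          # player 1's forecast of player 2
        mu2 = eta2 / eta2.sum()          # player 2's forecast of player 1
        s1 = int(np.argmax(A @ mu1))     # myopic best response of player 1
        s2 = int(np.argmax(mu2 @ B))     # myopic best response of player 2
        history.append((s1, s2))
        eta1[s2] += 1.0                  # player 1 observes player 2's action
        eta2[s1] += 1.0                  # player 2 observes player 1's action
    return history, eta1, eta2
```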

Example 2: Consider fictitious play of the following game:

            L        R
    U     3, 3     0, 0
    D     4, 0     1, 1

Note that this game is dominance solvable (D is a strictly dominant strategy for the row player), and the unique Nash equilibrium is (D, R). Assume that η_1^0 = (3, 0) and η_2^0 = (1, 2.5).

Period 1: We have µ_1^0 = (1, 0) and µ_2^0 = (1/3.5, 2.5/3.5), so play follows s_1^0 = D and s_2^0 = L.

Period 2: We have η_1^1 = (4, 0) and η_2^1 = (1, 3.5), so play follows s_1^1 = D and s_2^1 = R.

Period 3: We have η_1^2 = (4, 1) and η_2^2 = (1, 4.5), so play follows s_1^2 = D and s_2^2 = R.



Since D is a strictly dominant strategy for the row player, he always plays D, and µ_2^t converges to (0, 1) with probability 1. Therefore, player 2 will end up playing R.

Remark: A striking feature of fictitious play is that players do not need to know anything about their opponent's payoffs. They only form beliefs about how their opponents will play.
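As a quick check, running the sketch above on the game of Example 2 (with the same fictitious past) reproduces the periods computed there: player 1 plays D from the start and player 2 switches to R after one period. Action indices 0 and 1 stand for (U, D) and (L, R) respectively.

```python
# Example 2: rows are (U, D), columns are (L, R).
A = np.array([[3.0, 0.0],
              [4.0, 1.0]])     # player 1's payoffs
B = np.array([[3.0, 0.0],
              [0.0, 1.0]])     # player 2's payoffs

history, _, _ = fictitious_play(A, B, eta1=[3.0, 0.0], eta2=[1.0, 2.5], T=50)
print(history[:3])   # [(1, 0), (1, 1), (1, 1)]: (D, L), then (D, R) forever
```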

3 Convergence of Fictitious Play

Let {s^t} be a sequence of strategy profiles generated by fictitious play. In this section, we study the asymptotic behavior of the sequence {s^t}, i.e., the convergence properties of the sequence {s^t} as t goes to infinity.

We first define the notion of convergence to pure strategies.

Definition 1 The sequence {s^t} converges to s if there exists T such that s^t = s for all t ≥ T.

The next proposition formalizes the property that if the fictitious play sequence converges, then it must converge to a Nash equilibrium of the game.

Proposition 1 Let {s^t} be a sequence of strategy profiles generated by fictitious play.

(a) If {s^t} converges to s̄, then s̄ is a pure strategy Nash equilibrium.

(b) Suppose that for some t, s^t = s^*, where s^* is a strict Nash equilibrium of G. Then s^τ = s^* for all τ > t.

Part (a) is straightforward. We provide a proof for part (b).

Proof of part (b): Let s^t = s^*. We will show that s^{t+1} = s^*. Note that
\[
\mu_i^{t+1} = (1 - \alpha)\mu_i^t + \alpha s_{-i}^t = (1 - \alpha)\mu_i^t + \alpha s_{-i}^*,
\]
where, abusing notation, we use s_{-i}^t to denote the degenerate probability distribution, and
\[
\alpha = \frac{1}{\sum_{s_{-i}} \eta_i^t(s_{-i}) + 1}.
\]



Therefore, by the linearity of expectation, we have for all s_i ∈ S_i,
\[
g_i(s_i, \mu_i^{t+1}) = (1 - \alpha)\, g_i(s_i, \mu_i^t) + \alpha\, g_i(s_i, s_{-i}^*).
\]
Since s_i^* maximizes both terms (in view of the fact that s^* is a strict Nash equilibrium), it follows that s_i^* will be played at time t + 1. Q.E.D.

Note that the preceding notion of convergence only applies to pure strategies. We next provide an alternative notion of convergence, namely convergence of empirical distributions or beliefs.

Definition 2 The sequence {s^t} converges to σ ∈ Σ in the time-average sense if for all i and for all s_i ∈ S_i, we have
\[
\lim_{T \to \infty} \frac{[\text{number of times } s_i^t = s_i \text{ for } t \le T]}{T + 1} = \sigma_i(s_i),
\]
i.e., µ_{-i}^T(s_i) converges to σ_i(s_i) as T → ∞.

The next example illustrates convergence of the fictitious play sequence in the time-average sense.

Example 1: (Matching Pennies) Consider fictitious play of the following matching pennies game:

            H         T
    H     1, -1    -1, 1
    T    -1, 1      1, -1

Consider the following sequence of play:

     t      η_1^t     η_2^t     Play
     0      (0,0)     (0,2)     (H,H)
     1      (1,0)     (1,2)     (H,H)
     2      (2,0)     (2,2)     (H,T)
     3      (2,1)     (3,2)     (H,T)
     4      (2,2)     (4,2)     (T,T)
     5      (2,3)     (4,3)     (T,T)
     6       ...       ...      (T,H)



Play continues as (T,H), (H,H), (H,H), ..., a deterministic cycle. The time average converges to ((1/2, 1/2), (1/2, 1/2)), which is the unique Nash equilibrium.
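Using the same sketch from Section 2, one can check numerically that the empirical frequencies in matching pennies approach (1/2, 1/2) for both players even though the realized actions keep cycling. (The sketch needs a fictitious past with positive mass, so the tiny counts below stand in for the (0, 0) starting point used in the table; the long-run behavior is the same.)

```python
# Matching pennies: actions (H, T) for both players; a zero-sum game.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
B = -A

history, _, _ = fictitious_play(A, B, eta1=[0.01, 0.0], eta2=[0.0, 0.02], T=20000)
freq1 = np.bincount([s1 for s1, _ in history], minlength=2) / len(history)
freq2 = np.bincount([s2 for _, s2 in history], minlength=2) / len(history)
print(freq1, freq2)   # both approach [0.5, 0.5] as T grows
```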

The next proposition extends the basic convergence property of fictitious play, stated in Proposition 1, to mixed strategies.

Proposition 2 Suppose a fictitious play sequence {s^t} converges to σ in the time-average sense. Then σ is a Nash equilibrium of G.

Proof: Suppose {s^t} converges to σ in the time-average sense. Assume that σ is not a Nash equilibrium. Then there exist some i, s_i, s'_i ∈ S_i with σ_i(s_i) > 0 such that
\[
g_i(s_i', \sigma_{-i}) > g_i(s_i, \sigma_{-i}).
\]
Choose ε > 0 such that
\[
\epsilon < \frac{1}{2}\left[ g_i(s_i', \sigma_{-i}) - g_i(s_i, \sigma_{-i}) \right],
\]
and T sufficiently large that for all t ≥ T and all s_{-i} ∈ S_{-i}, we have
\[
\left| \mu_i^t(s_{-i}) - \sigma_{-i}(s_{-i}) \right| < \frac{\epsilon}{\max_{s \in S} g_i(s)},
\]
which is possible since µ_i^t → σ_{-i} by assumption. Then, for any t ≥ T, we have
\[
\begin{aligned}
g_i(s_i, \mu_i^t) &= \sum_{s_{-i}} g_i(s_i, s_{-i})\, \mu_i^t(s_{-i}) \\
&\le \sum_{s_{-i}} g_i(s_i, s_{-i})\, \sigma_{-i}(s_{-i}) + \epsilon \\
&< \sum_{s_{-i}} g_i(s_i', s_{-i})\, \sigma_{-i}(s_{-i}) - \epsilon \\
&\le \sum_{s_{-i}} g_i(s_i', s_{-i})\, \mu_i^t(s_{-i}) = g_i(s_i', \mu_i^t).
\end{aligned}
\]
This shows that after T, s_i is never played, implying that as t → ∞, µ_{-i}^t(s_i) → 0. But this contradicts the fact that σ_i(s_i) > 0, completing the proof. □

It is important to realize that convergence in the time-average sense is not necessarily a natural convergence notion, as illustrated by the following example.

Example 2: (Mis-coordination) Consider fictitious play of the following game:



            A        B
    A     1, 1     0, 0
    B     0, 0     1, 1

Note that this game has a unique mixed Nash equilibrium, ((1/2, 1/2), (1/2, 1/2)). Consider the following sequence of play:

     t       η_1^t         η_2^t        Play
     0      (1/2, 0)      (0, 1/2)     (A,B)
     1      (1/2, 1)      (1, 1/2)     (B,A)
     2      (3/2, 1)      (1, 3/2)     (A,B)
     3        ...           ...        (B,A)

Play continues as (A,B), (B,A), ..., again a deterministic cycle. The time average converges to ((1/2, 1/2), (1/2, 1/2)), which is a mixed strategy equilibrium of the game. But the players never successfully coordinate!
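A similar numerical check with the sketch from Section 2 makes the mis-coordination vivid: the empirical frequencies tend to (1/2, 1/2), yet the realized stage payoff is 0 in every single period.

```python
# Coordination game: actions (A, B) for both players.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
B = A.copy()

history, _, _ = fictitious_play(A, B, eta1=[0.5, 0.0], eta2=[0.0, 0.5], T=10000)
realized = [A[s1, s2] for s1, s2 in history]
print(np.mean(realized))   # 0.0: the players mis-coordinate in every period
```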

The following proposition provides some classes of games for which fictitious play converges in the time-average sense (see Fudenberg and Levine [2]; see also [9], [6], [7]).

Proposition 3 Fictitious play converges in the time-average sense for the game G under any of the following conditions:

(a) G is a two player zero-sum game.

(b) G is a two player nonzero-sum game where each player has at most two strategies.

(c) G is solvable by iterated strict dominance.

(d) G is an identical interest game, i.e., all players have the same payoff function.

(e) G is an (exact) potential game.

In Section 5, we consider a continuous time variant of fictitious play and provide the proof of convergence for two player zero-sum games.



4 Nonconvergence of Fictitious Play

The next proposition shows that there exists a game for which fictitious play does not converge.

Proposition 4 (Shapley [12]) In a modified version of the Rock-Scissors-Paper game, fictitious play does not converge.

Example 3: Shapley's Rock-Scissors-Paper game has payoffs:

            R        S        P
    R     0, 0     1, 0     0, 1
    S     0, 1     0, 0     1, 0
    P     1, 0     0, 1     0, 0

Suppose that η_1^0 = (1, 0, 0) and η_2^0 = (0, 1, 0). Then in period 0, play is (P,R). In period 1, player 1 expects R and player 2 expects S, so play is again (P,R). Play then continues to follow (P,R) until player 2 switches to S. Suppose this takes k periods. Then play follows (P,S) until player 1 switches to R. This takes βk periods (verify this), with β > 1. Play then follows (R,S) until player 2 switches to P. This takes β²k periods. And so on, with the key point being that each switch takes longer than the last.
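Running the sketch from Section 2 on Shapley's game (with the fictitious past above) shows the failure of convergence numerically: the empirical frequency of each action keeps swinging, with ever longer excursions, instead of settling at the equilibrium value 1/3.

```python
# Shapley's modified Rock-Scissors-Paper game; actions are (R, S, P).
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])   # player 1's payoffs
B = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # player 2's payoffs

history, _, _ = fictitious_play(A, B, eta1=[1.0, 0.0, 0.0], eta2=[0.0, 1.0, 0.0], T=20000)
moves1 = np.array([s1 for s1, _ in history])
for horizon in (100, 1000, 10000, 20000):
    # Empirical frequency of R for player 1: it oscillates over growing
    # horizons rather than converging to 1/3.
    print(horizon, np.mean(moves1[:horizon] == 0))
```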

Shapley's proof of the nonconvergence of fictitious play in the above game explicitly computes the time spent in each stage of the sequence. Here, we provide an alternative, insightful proof due to Monderer et al. [8]. For this we will make use of the following lemma. Define the time-average payoffs through time t as
\[
U_i^t = \frac{1}{t+1} \sum_{\tau=0}^{t} g_i(s_i^\tau, s_{-i}^\tau).
\]
Define the expected payoffs at time t as
\[
\tilde{U}_i^t = g_i(s_i^t, \mu_i^t) = \max_{s_i \in S_i} g_i(s_i, \mu_i^t).
\]



Lemma 1 Given any ε > 0 and any i ∈ I, there exists some T such that for all t ≥ T, we have
\[
\tilde{U}_i^t \ge U_i^t - \epsilon.
\]

Proof: Note that
\[
\begin{aligned}
\tilde{U}_i^t = g_i(s_i^t, \mu_i^t) &\ge g_i(s_i^{t-1}, \mu_i^t) \\
&= \frac{1}{t+1}\, g_i(s_i^{t-1}, s_{-i}^{t-1}) + \frac{t}{t+1}\, g_i(s_i^{t-1}, \mu_i^{t-1}) \\
&= \frac{1}{t+1}\, g_i(s_i^{t-1}, s_{-i}^{t-1}) + \frac{t}{t+1}\, \tilde{U}_i^{t-1}.
\end{aligned}
\]
Expanding Ũ_i^{t-1} and iterating, we obtain
\[
\tilde{U}_i^t \ge \frac{1}{t+1} \sum_{\tau=0}^{t} g_i(s_i^\tau, s_{-i}^\tau) - \frac{1}{t+1}\, g_i(s_i^t, s_{-i}^t).
\]
Choosing T such that
\[
\epsilon > \frac{1}{T+1} \max_{s \in S} g_i(s)
\]
completes the proof. □

We can now use the preceding lemma to prove Shapley's result.

Proof of Shapley's nonconvergence result: In the Rock-Scissors-Paper game above, there is a unique Nash equilibrium, with expected payoff 1/3 for both players. Therefore, if fictitious play converged in the time-average sense, then Ũ_i^t → 1/3 for i = 1, 2, implying that Ũ_1^t + Ũ_2^t → 2/3. But under fictitious play, realized payoffs always sum to 1, i.e.,
\[
U_1^t + U_2^t = 1 \quad \text{for all } t,
\]
so that the lemma would give Ũ_1^t + Ũ_2^t ≥ U_1^t + U_2^t − 2ε = 1 − 2ε for all sufficiently large t, a contradiction for small ε. This shows that fictitious play does not converge. Q.E.D.

For extensions of fictitious play (e.g., stochastic fictitious play, derivative action fictitious play) and their convergence behavior, see Fudenberg and Levine [2] and also Shamma and Arslan [11].



5 Convergence Proofs

We now consider a continuous time version of fictitious play and provide a proof of convergence for zero-sum games.

5.1 Continuous Time Fictitious Play

Let us introduce some notation to facilitate the subsequent analysis. We denote the empirical distribution of player i's play up to (but not including) time t by
\[
p_i^t(s_i) = \frac{\sum_{\tau=0}^{t-1} I\{s_i^\tau = s_i\}}{t}.
\]
We use p^t ∈ Σ to denote the product distribution formed by the p_i^t.

In continuous time fictitious play (CTFP), the empirical distributions of the players are updated in the direction of a best response to their opponents' past play:
\[
\frac{dp_i^t}{dt} \in BR_i(p_{-i}^t) - p_i^t,
\]
where
\[
BR_i(p_{-i}^t) = \arg\max_{\sigma_i \in \Sigma_i} g_i(\sigma_i, p_{-i}^t).
\]
Another variant of the CTFP is the perturbed CTFP defined by
\[
\frac{dp_i^t}{dt} = C_i(p_{-i}^t) - p_i^t,
\]
where
\[
C_i(p_{-i}^t) = \arg\max_{\sigma_i \in \Sigma_i} \left[ g_i(\sigma_i, p_{-i}^t) - V_i(\sigma_i) \right], \qquad (1)
\]
and V_i : Σ_i → R is a strictly convex function satisfying a boundary condition, which we informally state as follows: "as σ_i approaches the boundary of the probability simplex, ‖∇V_i(σ_i)‖ goes to infinity." Note that because the perturbed best response C_i is uniquely defined, the perturbed CTFP is described by a differential equation rather than a differential inclusion.
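As a concrete illustration (not from the lecture), take the entropy perturbation V_i(σ_i) = (1/λ) Σ_{s_i} σ_i(s_i) log σ_i(s_i): it is strictly convex, its gradient blows up at the boundary of the simplex, and the resulting perturbed best response C_i is the logit (softmax) map. The sketch below integrates the perturbed CTFP for matching pennies with a forward Euler step; the names perturbed_ctfp and softmax, the temperature lam, and the step size dt are all illustrative choices.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def perturbed_ctfp(M, lam=10.0, dt=0.01, steps=5000, seed=0):
    """Forward-Euler integration of the perturbed CTFP for the zero-sum game
    with payoff matrix M (player 1 maximizes p1' M p2, player 2 minimizes it).
    With the entropy perturbation, C_i is the logit map with temperature lam."""
    rng = np.random.default_rng(seed)
    p1 = rng.dirichlet(np.ones(M.shape[0]))   # arbitrary interior start
    p2 = rng.dirichlet(np.ones(M.shape[1]))
    for _ in range(steps):
        c1 = softmax(lam * (M @ p2))       # C_1(p2): perturbed best response
        c2 = softmax(lam * (-M.T @ p1))    # C_2(p1): perturbed best response
        p1 = p1 + dt * (c1 - p1)           # dp1/dt = C_1(p2) - p1
        p2 = p2 + dt * (c2 - p2)           # dp2/dt = C_2(p1) - p2
    return p1, p2

# Matching pennies: the trajectory settles near ((1/2, 1/2), (1/2, 1/2)),
# the perturbed equilibrium of this game.
M = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
print(perturbed_ctfp(M))
```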



5.2 Convergence of (perturbed) CTFP for Zero-sum Games

In this section, we provide a proof of convergence of the perturbed CTFP for two player zero-sum games. The proof presented here is due to Shamma and Arslan [10] (see also Johari [5]).

We consider a two player zero-sum game with payoff matrix M, where the payoffs are perturbed by the functions V_i (see the previous section), i.e., the payoffs are given by
\[
\Pi_1(\sigma_1, \sigma_2) = \sigma_1' M \sigma_2 - V_1(\sigma_1), \qquad (2)
\]
\[
\Pi_2(\sigma_1, \sigma_2) = -\sigma_1' M \sigma_2 - V_2(\sigma_2). \qquad (3)
\]
Let {p^t} be generated by the perturbed CTFP,
\[
\frac{dp_i^t}{dt} = C_i(p_{-i}^t) - p_i^t
\]
[cf. Eq. (1)]. We use a Lyapunov function approach to prove convergence. In particular, we consider the function
\[
W(t) = U_1(p^t) + U_2(p^t),
\]
where the functions U_i : Σ → R are defined as
\[
U_i(\sigma_i, \sigma_{-i}) = \max_{\sigma_i' \in \Sigma_i} \Pi_i(\sigma_i', \sigma_{-i}) - \Pi_i(\sigma_i, \sigma_{-i})
\]
(i.e., the function U_i gives the maximum possible payoff improvement player i can achieve by a unilateral deviation in his own mixed strategy). Note that U_i(σ) ≥ 0 for all σ ∈ Σ, and U_i(σ) = 0 for all i implies that σ is a mixed Nash equilibrium.

For the zero-sum game with payoffs (2)-(3), the function W(t) takes the form
\[
W(t) = \max_{\sigma_1' \in \Sigma_1} \Pi_1(\sigma_1', p_2^t) + \max_{\sigma_2' \in \Sigma_2} \Pi_2(p_1^t, \sigma_2') + V_1(p_1^t) + V_2(p_2^t).
\]
We will show that dW(t)/dt ≤ 0, with equality if and only if p_i^t = C_i(p_{-i}^t), showing that for all initial conditions p^0, we have
\[
\lim_{t \to \infty} \left( p_i^t - C_i(p_{-i}^t) \right) = 0, \qquad i = 1, 2.
\]
We need the following lemma.



Lemma 2 (Envelope Theorem) Let F : R^n × R^m → R be a continuously differentiable function. Let U ⊂ R^m be an open convex subset, and let u*(x) be a continuously differentiable function such that
\[
F(x, u^*(x)) = \min_{u \in U} F(x, u).
\]
Let H(x) = min_{u ∈ U} F(x, u). Then,
\[
\nabla_x H(x) = \nabla_x F(x, u^*(x)).
\]

Proof: The gradient of H(x) is given by
\[
\nabla_x H(x) = \nabla_x F(x, u^*(x)) + \nabla_u F(x, u^*(x))\, \nabla_x u^*(x) = \nabla_x F(x, u^*(x)),
\]
where we use the fact that ∇_u F(x, u*(x)) = 0 since u*(x) minimizes F(x, u). □

Using the preceding lemma, we have
\[
\begin{aligned}
\frac{d}{dt}\left[ \max_{\sigma_1' \in \Sigma_1} \Pi_1(\sigma_1', p_2^t) \right]
&= \nabla_{\sigma_2} \Pi_1(C_1(p_2^t), p_2^t)' \, \frac{dp_2^t}{dt} \\
&= C_1(p_2^t)' M \frac{dp_2^t}{dt} \\
&= C_1(p_2^t)' M \left( C_2(p_1^t) - p_2^t \right).
\end{aligned}
\]
Similarly,
\[
\frac{d}{dt}\left[ \max_{\sigma_2' \in \Sigma_2} \Pi_2(p_1^t, \sigma_2') \right]
= -\left( C_1(p_2^t) - p_1^t \right)' M C_2(p_1^t).
\]

Combining the preceding two relations, we obtain
\[
\frac{dW(t)}{dt} = -C_1(p_2^t)' M p_2^t + (p_1^t)' M C_2(p_1^t)
+ \nabla V_1(p_1^t)' \frac{dp_1^t}{dt} + \nabla V_2(p_2^t)' \frac{dp_2^t}{dt}. \qquad (4)
\]
Since C_i(p_{-i}^t) is a perturbed best response, we have
\[
C_1(p_2^t)' M p_2^t - V_1(C_1(p_2^t)) \ge (p_1^t)' M p_2^t - V_1(p_1^t),
\]
\[
-(p_1^t)' M C_2(p_1^t) - V_2(C_2(p_1^t)) \ge -(p_1^t)' M p_2^t - V_2(p_2^t),
\]



with equality if and only if C_i(p_{-i}^t) = p_i^t, i = 1, 2 (the latter claim follows from the uniqueness of the perturbed best response). Combining these relations, we have
\[
\begin{aligned}
-C_1(p_2^t)' M p_2^t + (p_1^t)' M C_2(p_1^t)
&\le \sum_i \left[ V_i(p_i^t) - V_i(C_i(p_{-i}^t)) \right] \\
&\le \sum_i \nabla V_i(p_i^t)' \left( p_i^t - C_i(p_{-i}^t) \right)
= -\sum_i \nabla V_i(p_i^t)' \frac{dp_i^t}{dt},
\end{aligned}
\]
where the second inequality follows from the convexity of V_i. The preceding relation and Eq. (4) imply that dW(t)/dt ≤ 0 for all t, with equality if and only if C_i(p_{-i}^t) = p_i^t for both players, completing the proof. □

References

[1] Brown G., "Iterative solution of games by fictitious play," in Activity Analysis of Production and Allocation, T. Koopmans, editor, John Wiley and Sons, New York, 1951.

[2] Fudenberg D. and Levine D. K., The Theory of Learning in Games, MIT Press, 1999.

[3] Hart S. and Mas-Colell A., "A simple adaptive procedure leading to correlated equilibrium," Econometrica, vol. 68, no. 5, pp. 1127-1150, 2000.

[4] Hart S., "Adaptive heuristics," Econometrica, vol. 73, no. 5, pp. 1401-1430, 2005.

[5] Johari R., Lecture Notes, 2007.

[6] Miyasawa K., "On the convergence of learning processes in a 2x2 nonzero sum game," Research Memo 33, Princeton, 1961.

[7] Monderer D. and Shapley L., "Fictitious play property for games with identical interests," Journal of Economic Theory, vol. 68, pp. 258-265, 1996.

[8] Monderer D., Samet D., and Sela A., "Belief affirming in learning processes," Mimeo, Technion, 1994.

[9] Robinson J., "An iterative method of solving a game," Annals of Mathematics, vol. 54, pp. 296-301, 1951.

[10] Shamma J. S. and Arslan G., "Unified convergence proofs of continuous-time fictitious play," IEEE Transactions on Automatic Control, pp. 1137-1142, July 2004.

[11] Shamma J. S. and Arslan G., "Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria," IEEE Transactions on Automatic Control, pp. 312-327, March 2005.

[12] Shapley L., "Some topics in two-person games," in Advances in Game Theory, M. Dresher, L. S. Shapley, and A. W. Tucker (Eds.), Princeton University Press, Princeton, 1964.

