
6.254: Game Theory with Engineering Applications
April 15, 2008
Lecturer: Asu Ozdaglar

Lecture 17: Repeated Games - II

In this lecture, we will continue our discussion of repeated games. In particular, we will cover:

• Repeated games with perfect monitoring: Nash and perfect folk theorems

• Imperfect monitoring

  – Price trigger strategies

Recall that we use \underline{v}_i to denote the minmax payoff of player i, i.e.,

    \underline{v}_i = \min_{\alpha_{-i}} \max_{\alpha_i} g_i(\alpha_i, \alpha_{-i}).

The strategy profile m^i_{-i} denotes the minmax strategy of the opponents of player i, and m^i_i denotes the best response of player i to m^i_{-i}, i.e., g_i(m^i_i, m^i_{-i}) = \underline{v}_i. The set V* ⊂ R^I denotes the set of feasible and strictly individually rational payoffs.
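As a rough numerical illustration, the minmax value can be computed directly for a finite two-player game. The sketch below restricts the opponents to pure strategies (in general the minimization is over mixed profiles α_{-i} and requires a linear program); the payoff matrix is a hypothetical prisoner's dilemma, not from the notes.

```python
def pure_minmax(payoff):
    """Pure-strategy minmax value of the row player: the column player
    picks a column to minimize the row player's best-response payoff.
    payoff[r][c] is the row player's stage payoff."""
    n_cols = len(payoff[0])
    return min(max(row[c] for row in payoff) for c in range(n_cols))

# Hypothetical prisoner's-dilemma payoffs for the row player
# (rows and columns ordered: cooperate, defect).
pd_row = [[1, -1],
          [2,  0]]
print(pure_minmax(pd_row))  # 0: the opponent defects, best reply is to defect
```

Restricting to pure strategies is only a simplification; in this game the pure and mixed minmax values coincide.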

1 Folk Theorems for Infinitely Repeated Games

The "folk theorems" for repeated games establish that if the players are patient enough, then any feasible, strictly individually rational payoff can be obtained as an equilibrium payoff. We start with the Nash folk theorem, which obtains feasible, strictly individually rational payoffs as Nash equilibria of the repeated game.

Theorem 1 (Nash folk theorem) For all v = (v_1, ..., v_I) ∈ V*, there exists some \bar{\delta} < 1 such that for all δ ≥ \bar{\delta}, there is a Nash equilibrium of the infinitely repeated game G^∞(δ) with payoffs v.



Proof: Assume first that there exists a pure action profile a = (a_1, ..., a_I) such that g_i(a) = v_i (otherwise, we need a public randomization argument; see Fudenberg and Tirole, Theorem 5.1).

Consider the following strategy for player i:

I. Play a_i in period 0, and continue to play a_i as long as either the realized action profile in the previous period was a, or it differed from a in two or more components. If exactly one player deviated, go to II.

II. If player j deviated, play m^j_i thereafter.

We next check whether player i can gain by deviating from this strategy profile. If i plays the strategy, his payoff is v_i. If i deviates from the strategy in some period t, then, denoting \bar{v}_i = \max_a g_i(a), the most that player i could get is

    (1 − δ)[v_i + δ v_i + ··· + δ^{t−1} v_i + δ^t \bar{v}_i + δ^{t+1} \underline{v}_i + δ^{t+2} \underline{v}_i + ···].

Hence, following the suggested strategy is optimal if

    v_i ≥ (1 − δ^t) v_i + δ^t (1 − δ) \bar{v}_i + δ^{t+1} \underline{v}_i
        = v_i − δ^t [v_i − (1 − δ) \bar{v}_i − δ \underline{v}_i].

It can be seen that the expression in brackets is nonnegative for any δ ≥ \bar{\delta}, where

    \bar{\delta} = \max_i \frac{\bar{v}_i − v_i}{\bar{v}_i − \underline{v}_i},

thus completing the proof. □
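The critical discount factor from the proof can be checked numerically. This is a sketch using hypothetical prisoner's-dilemma numbers (not from the notes): target payoff v_i = 1 from mutual cooperation, best deviation payoff \bar{v}_i = 2, minmax payoff \underline{v}_i = 0.

```python
# Hypothetical prisoner's-dilemma values: target v, best deviation vbar, minmax vmin.
v, vbar, vmin = 1.0, 2.0, 0.0
delta_bar = (vbar - v) / (vbar - vmin)   # critical discount factor

def deviation_payoff(delta, t):
    # Best payoff from deviating at period t, as in the proof:
    # (1 - delta^t) v + delta^t (1 - delta) vbar + delta^(t+1) vmin
    return (1 - delta**t) * v + delta**t * (1 - delta) * vbar + delta**(t + 1) * vmin

# For delta >= delta_bar, deviating at any period t is unprofitable.
assert all(deviation_payoff(0.6, t) <= v + 1e-12 for t in range(50))
# Just below delta_bar, deviating immediately (t = 0) is profitable.
assert deviation_payoff(0.4, 0) > v
print(delta_bar)  # 0.5
```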

The Nash folk theorem states that essentially any feasible, strictly individually rational payoff can be obtained as a Nash equilibrium payoff when players are patient enough. However, the corresponding strategies involve unrelenting punishments, which may be very costly for the punisher to carry out (i.e., they represent non-credible threats). Hence, the strategies used are not subgame perfect. The next example illustrates this fact.

Note that the strategy constructed in the proof of Theorem 1 is a grim trigger strategy with punishment by minmax.



Example: Consider the following stage game:

           L (q)      R (1 − q)
    U      6, 6       0, −100
    D      7, 1       0, −100

The unique NE in this game is (D, L). It can also be seen that the minmax payoffs are given by

    \underline{v}_1 = 0,  \underline{v}_2 = 1,

and the minmax strategy of player 2 against player 1 is to play R.

The Nash folk theorem says that (6, 6) is attainable as a Nash equilibrium payoff of the repeated game, but the strategies suggested in the proof require player 2 to play R in every period following a deviation. While this will hurt player 1, it will hurt player 2 a lot, and it seems unreasonable to expect her to carry out the threat.
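A quick computation confirming the minmax payoffs claimed for this example (restricting to pure strategies, which suffices here):

```python
# Stage-game payoffs from the example: rows U/D for player 1, columns L/R for player 2.
g1 = {("U", "L"): 6, ("U", "R"): 0,    ("D", "L"): 7, ("D", "R"): 0}
g2 = {("U", "L"): 6, ("U", "R"): -100, ("D", "L"): 1, ("D", "R"): -100}

rows, cols = ["U", "D"], ["L", "R"]

# v_1: player 2 picks a column to minimize player 1's best-response payoff.
v1 = min(max(g1[(r, c)] for r in rows) for c in cols)
# v_2: player 1 picks a row to minimize player 2's best-response payoff.
v2 = min(max(g2[(r, c)] for c in cols) for r in rows)
print(v1, v2)  # 0 1
```

The minimizing column for player 1's payoff is R, matching the minmax strategy stated above.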

Our next step is to obtain the payoff (6, 6) in the above example, and more generally the set of feasible and strictly individually rational payoffs, as subgame perfect equilibrium payoffs of the repeated game.

Perfect Folk Theorems: The first perfect folk theorem shows that any payoff above the static Nash payoffs can be sustained as a subgame perfect equilibrium of the repeated game. This weaker theorem is sometimes referred to as the Nash-threats folk theorem.

Theorem 2 (Friedman) Let α* be a static equilibrium of the stage game with payoffs e. Then for any feasible payoff v with v_i > e_i for all i ∈ I, there exists some \bar{\delta} < 1 such that for all δ ≥ \bar{\delta}, there is a subgame perfect equilibrium of G^∞(δ) with payoffs v.

The proof of this theorem involves constructing grim trigger strategies with punishment by the static Nash equilibrium in phase II, thus satisfying sequential rationality in subgames where some player has deviated.

This theorem raises the following questions:

• What happens if the static NE payoff vector is strictly greater than the minmax payoff vector (in some components)?

• Does the requirement of subgame perfection restrict the set of equilibrium payoffs?

The perfect folk theorem of Fudenberg and Maskin shows that this is not the case: for any feasible, strictly individually rational payoff vector, there is a range of discount factors for which that payoff vector can be obtained as a subgame perfect equilibrium.

Theorem 3 (Fudenberg and Maskin) Assume that the dimension of the set V of feasible payoffs is equal to the number of players I. Then, for any v ∈ V with v_i > \underline{v}_i for all i, there exists a discount factor \bar{\delta} < 1 such that for all δ ≥ \bar{\delta}, there is a subgame perfect equilibrium of G^∞(δ) with payoffs v.

For the proof, see Fudenberg and Tirole, Chapter 5, Theorem 5.4.

Remarks:

1. The strategies constructed to prove the above theorem involve punishments and rewards.

2. The assumption on the dimension of V ensures that each player i can be singled out for punishment in the event of a deviation. This assumption can be weakened, but cannot be completely dispensed with (see Homework 4).

3. Folk theorem for finitely repeated games:

   • If the stage game has a unique Nash equilibrium, then by backward induction, the unique subgame perfect equilibrium is to play the static Nash equilibrium in every period.

   • With multiple stage-game Nash equilibria, one can use rewards and punishments to construct other subgame perfect equilibria (see Homework 4).

4. Equilibrium concepts do very little to pin down the play by patient players!



2 Repeated Games with Imperfect Monitoring

In the previous section, we assumed that in the repeated play of the game, each player observes the actions of the others at the end of each stage. Here, we consider repeated games in which players' actions may not be directly observable. We assume that players observe only an imperfect signal of the stage game actions.

Motivational Example: Cournot competition with noisy demand (Green and Porter, 1984). Firms set output levels q_1^t, ..., q_I^t privately at time t. The level of demand is stochastic. Each firm's payoff depends on its own output and on the publicly observed market price. Firms do not observe each other's output levels. The market price depends on uncertain demand and the total output; i.e., a low market price could be due to high total output or to low demand.

We will focus on games with public information: at the end of each period, all players observe a public outcome, which is correlated with the vector of stage game actions. Each player's realized payoff depends only on his own action and the public outcome.

2.1 Model

We state the components of the model for a stage game with finite actions and a public outcome that can take finitely many values. The model extends to the general case with straightforward modifications.

• Let A_1, ..., A_I be finite action sets.

• Let y denote the publicly observed outcome, which lies in a finite set Y. Each action profile a induces a probability distribution over y ∈ Y; let π(y, a) denote the probability of outcome y under action profile a.

• Player i's realized payoff is given by r_i(a_i, y). Note that this payoff depends on the actions of the other players only through their effect on the distribution of y.

• Player i's expected stage payoff is given by

    g_i(a) = \sum_{y ∈ Y} π(y, a) r_i(a_i, y).

• A mixed strategy is α_i ∈ Δ(A_i). Payoffs are defined in the obvious way.

• The public information at the start of period t is h^t = (y^0, ..., y^{t−1}).

• We consider public strategies for player i, i.e., sequences of maps s_i^t : h^t → A_i.

• Player i's average discounted payoff when the action sequence {a^t} is played is given by

    (1 − δ) \sum_{t=0}^{∞} δ^t g_i(a^t).

2.2 Example: Noisy Prisoner's Dilemma

Public signal: p.
Actions: (a_1, a_2), where a_i ∈ {C, D}.
Payoffs:

    r_1(C, p) = 1 + p,  r_1(D, p) = 4 + p,
    r_2(C, p) = 1 + p,  r_2(D, p) = 4 + p.

Probability distribution for the public signal p:

    a_1 = a_2 = C → p = X,
    a_1 ≠ a_2 → p = X − 2,
    a_1 = a_2 = D → p = X − 4,

where X is a continuous random variable with cumulative distribution function F(x) and E[X] = 0. The payoff matrix for this game takes the following form:

              C                    D
    C    (1 + X, 1 + X)     (−1 + X, 2 + X)
    D    (2 + X, −1 + X)    (X, X)
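Since E[X] = 0, taking expectations recovers a standard prisoner's dilemma. A small sketch verifying the expected stage payoffs from the signal structure and the realized payoff functions:

```python
# Expected stage payoffs in the noisy prisoner's dilemma; the noise X drops
# out in expectation because E[X] = 0.
def price_shift(a1, a2):
    # p = X + shift: shift is 0 if both cooperate, -2 if exactly one defects,
    # -4 if both defect.
    return {2: 0, 1: -2, 0: -4}[(a1 == "C") + (a2 == "C")]

def r(own_action, p):
    # Realized payoff: r(C, p) = 1 + p, r(D, p) = 4 + p (same for both players).
    return (1 + p) if own_action == "C" else (4 + p)

def expected_payoffs(a1, a2):
    shift = price_shift(a1, a2)  # E[p] = shift, since E[X] = 0
    return r(a1, shift), r(a2, shift)

for profile in [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]:
    print(profile, expected_payoffs(*profile))
```

The output matches the payoff matrix above with X replaced by its mean: (1, 1), (−1, 2), (2, −1), (0, 0).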



2.2.1 Trigger-Price Strategy

Consider the following trigger-type strategy for the noisy prisoner's dilemma:

• (I) Play (C, C) until p ≤ p*, then go to Phase II.

• (II) Play (D, D) for T periods, then go back to Phase I.

Notice that the strategy is stationary and symmetric. Also notice that the punishment phase uses a static NE. We next show that we can choose p* and T such that the proposed strategy profile is an SPE.

Find the continuation payoffs: In Phase I, if players do not deviate, their expected payoff v satisfies

    v = (1 − δ)(1 + 0) + δ[F(p*) δ^T v + (1 − F(p*)) v].

From this equation, we obtain

    v = (1 − δ) / (1 − δ^{T+1} F(p*) − δ(1 − F(p*))).   (1)

If a player deviates, his payoff will be

    v_d = (1 − δ)(2 + 0) + δ[F(p* + 2) δ^T v + (1 − F(p* + 2)) v].   (2)

Note that deviating provides an immediate payoff gain, but increases the probability of entering Phase II. In order for the strategy to be an SPE, the expected gain from the deviation must not be positive.

Incentive compatibility constraint: v ≥ v_d, i.e.,

    v ≥ 2(1 − δ) + F(2 + p*) δ^{T+1} v + [1 − F(2 + p*)] δ v.   (3)

Substituting for v from (1), we have

    1 / (1 − δ^{T+1} F(p*) − δ(1 − F(p*))) ≥ 2 / (1 − δ^{T+1} F(2 + p*) − δ[1 − F(2 + p*)]).   (4)



Any T and p* that satisfy this constraint yield an SPE. The best possible trigger-price equilibrium strategy would be found by maximizing v subject to the incentive compatibility constraint.
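This maximization can be sketched by brute force once a distribution is fixed. The notes leave F unspecified, so the example below assumes X ~ N(0, 1) purely for illustration, and searches a coarse grid over p* and T for the pair that maximizes v from equation (1) subject to the constraint (4).

```python
from math import erf, sqrt

# Assumption for illustration only: X is standard normal, so F is the N(0,1) CDF.
def F(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def value(delta, p_star, T):
    # Equation (1): expected Phase-I payoff of the trigger-price strategy.
    return (1 - delta) / (1 - delta**(T + 1) * F(p_star) - delta * (1 - F(p_star)))

def ic_holds(delta, p_star, T):
    # Equation (4): incentive compatibility constraint v >= v_d.
    lhs = 1 / (1 - delta**(T + 1) * F(p_star) - delta * (1 - F(p_star)))
    rhs = 2 / (1 - delta**(T + 1) * F(2 + p_star) - delta * (1 - F(2 + p_star)))
    return lhs >= rhs

delta = 0.9
# Grid search: trigger prices p* in [-5, 4.9], punishment lengths T in 1..39.
best = max(((value(delta, p, T), p, T)
            for T in range(1, 40)
            for p in [i / 10 - 5 for i in range(100)]
            if ic_holds(delta, p, T)),
           default=None)
if best:
    v, p_star, T = best
    print(f"v = {v:.3f} at p* = {p_star:.1f}, T = {T}")
```

Note that v is always strictly below 1, the full-cooperation payoff: with imperfect monitoring, punishment phases occur with positive probability on the equilibrium path.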

Question: Are there other equilibria with higher payoffs?

There is an explicit characterization of the set of equilibrium payoffs using a multi-agent dynamic-programming-type operator, due to Abreu, Pearce, and Stacchetti (1990). The set of equilibrium payoffs is a fixed point of this operator, and value iteration converges to the set of equilibrium payoffs.

For more on this topic, see Fudenberg and Tirole, Section 5.5, and Abreu, Pearce, and Stacchetti (1990).

References

[1] Abreu D., Pearce D., and Stacchetti E., "Toward a theory of discounted repeated games with imperfect monitoring," Econometrica, vol. 58, pp. 1041-1064, 1990.

[2] Green E. and Porter R., "Noncooperative collusion under imperfect price information," Econometrica, vol. 52, pp. 87-100, 1984.
