6.254: Game Theory with Engineering Applications                       April 15, 2008
Lecturer: Asu Ozdaglar
Lecture 17: Repeated Games - II

In this lecture, we will continue our discussion of repeated games. In particular, we will cover:
• Repeated games with perfect monitoring - Nash and Perfect Folk Theorems
• Imperfect monitoring
  – Price trigger strategies
Recall that we use $\underline{v}_i$ to denote the minmax payoff of player $i$, i.e.,
$$\underline{v}_i = \min_{\alpha_{-i}} \max_{\alpha_i} g_i(\alpha_i, \alpha_{-i}).$$
The strategy profile $m^i_{-i}$ denotes the minmax strategy of the opponents of player $i$, and $m^i_i$ denotes the best response of player $i$ to $m^i_{-i}$, i.e., $g_i(m^i_i, m^i_{-i}) = \underline{v}_i$. The set $V^* \subset \mathbb{R}^I$ denotes the set of feasible and strictly individually rational payoffs.
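As a concrete illustration (not part of the original notes), the minmax value can be computed numerically. The sketch below computes a pure-strategy version for player 1 of a two-player game; the definition above allows the opponents to mix, so restricting to pure actions only gives an upper bound on the true minmax payoff. The payoff numbers are assumed for illustration.

```python
# Sketch: pure-strategy minmax value for player 1 in a two-player game.
# The true minmax allows mixed opponent strategies; this pure-action
# version is an upper bound (and coincides with it in simple examples).

def pure_minmax_p1(g1):
    """g1[a1][a2] = player 1's payoff.  Player 2 (the opponent) picks the
    column that minimizes player 1's best-response payoff."""
    return min(
        max(g1[a1][a2] for a1 in range(len(g1)))
        for a2 in range(len(g1[0]))
    )

# Standard prisoner's dilemma payoffs for player 1 (rows/cols: C, D),
# chosen purely for illustration.
g1_pd = [[3, 0],
         [4, 1]]
v1 = pure_minmax_p1(g1_pd)  # opponent plays D; player 1's best response gives 1
```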
1 Folk Theorems for Infinitely Repeated Games

The "folk theorems" for repeated games establish that if the players are patient enough, then any feasible and strictly individually rational payoff can be obtained as an equilibrium payoff. We start with the Nash folk theorem, which obtains feasible and strictly individually rational payoffs as Nash equilibria of the repeated game.

Theorem 1 (Nash folk theorem) For all $v = (v_1, \ldots, v_I) \in V^*$, there exists some $\bar{\delta} < 1$ such that for all $\delta \ge \bar{\delta}$, there is a Nash equilibrium of the infinitely repeated game $G^\infty(\delta)$ with payoffs $v$.
Proof: Assume first that there exists a pure action profile $a = (a_1, \ldots, a_I)$ such that $g_i(a) = v_i$ (otherwise, we need a public randomization argument; see Fudenberg and Tirole, Theorem 5.1).

Consider the following strategy for player $i$:

I. Play $a_i$ in period 0, and continue to play $a_i$ so long as either the realized action in the previous period was $a$, or the realized action in the previous period differed from $a$ in two or more components. If only one player deviated, go to II.

II. If player $j$ deviated, play $m^j_i$ thereafter.
We next check whether player $i$ can gain by deviating from this strategy profile. If $i$ follows the strategy, his payoff is $v_i$. If $i$ deviates from the strategy in some period $t$, then, denoting $\bar{v}_i = \max_a g_i(a)$, the most that player $i$ could get is
$$(1-\delta)\left[ v_i + \delta v_i + \cdots + \delta^{t-1} v_i + \delta^t \bar{v}_i + \delta^{t+1} \underline{v}_i + \delta^{t+2} \underline{v}_i + \cdots \right].$$
Hence, following the suggested strategy is optimal if
$$v_i \ge (1-\delta^t) v_i + \delta^t (1-\delta) \bar{v}_i + \delta^{t+1} \underline{v}_i,$$
which rearranges to
$$\delta^t \left[ v_i - (1-\delta) \bar{v}_i - \delta \underline{v}_i \right] \ge 0.$$
The expression in brackets is nonnegative for any $\delta \ge \bar{\delta}$, where
$$\bar{\delta} = \max_i \frac{\bar{v}_i - v_i}{\bar{v}_i - \underline{v}_i},$$
thus completing the proof. □
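The threshold $\bar{\delta}$ from the proof is easy to evaluate numerically. A minimal sketch, using a standard prisoner's dilemma with payoffs assumed for illustration (target payoff 3, best deviation payoff 4, minmax payoff 1 for each player):

```python
# delta_bar = max_i (vbar_i - v_i) / (vbar_i - vunderline_i): the patience
# threshold from the Nash folk theorem proof (illustrative numbers).

def folk_threshold(target, best_dev, minmax):
    """target: payoffs v_i to sustain; best_dev: vbar_i = max_a g_i(a);
    minmax: the minmax payoffs vunderline_i."""
    return max((b - v) / (b - m) for v, b, m in zip(target, best_dev, minmax))

# Prisoner's dilemma: sustain (3, 3); a deviator gets at most 4; minmax is 1.
delta_bar = folk_threshold(target=(3, 3), best_dev=(4, 4), minmax=(1, 1))
# Any delta >= 1/3 makes the grim trigger strategy a Nash equilibrium here.
```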
The Nash folk theorem states that essentially any payoff can be obtained as a Nash equilibrium when players are patient enough. However, the corresponding strategies involve unrelenting punishments, which may be very costly for the punisher to carry out (i.e., they represent non-credible threats). Hence, the strategies used are not subgame perfect. The next example illustrates this fact.
¹Note that the strategy constructed here is a grim trigger strategy with punishment by minmax.
Example: Consider the following stage game:

             L            R
    U      6, 6       0, −100
    D      7, 1       0, −100

The unique NE in this game is (D, L). It can also be seen that the minmax payoffs are given by
$$\underline{v}_1 = 0, \qquad \underline{v}_2 = 1,$$
and the minmax strategy against player 1 is for player 2 to play R.
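These values can be verified with a quick pure-strategy computation (an illustrative check, not part of the notes):

```python
# Payoffs of the example game (rows U, D for player 1; columns L, R for player 2).
g1 = [[6, 0],
      [7, 0]]         # player 1's payoffs
g2 = [[6, -100],
      [1, -100]]      # player 2's payoffs

# Player 1's minmax: player 2 picks the column minimizing player 1's best response.
v1 = min(max(g1[r][c] for r in range(2)) for c in range(2))  # column R gives 0
# Player 2's minmax: player 1 picks the row minimizing player 2's best response.
v2 = min(max(g2[r][c] for c in range(2)) for r in range(2))  # row D gives 1
```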
The Nash folk theorem says that (6, 6) is attainable as a Nash equilibrium payoff of the repeated game, but the strategies suggested in the proof require player 2 to play R in every period following a deviation. While this punishment hurts player 1, it hurts player 2 far more; it seems unreasonable to expect her to carry out the threat.

Our next step is to obtain the payoff (6, 6) in the above example, or more generally the set of feasible and strictly individually rational payoffs, as subgame perfect equilibrium payoffs of the repeated game.
Perfect Folk Theorems: The first perfect folk theorem shows that any payoff above the static Nash payoffs can be sustained as a subgame perfect equilibrium of the repeated game. This weaker theorem is sometimes referred to as the Nash-threats folk theorem.

Theorem 2 (Friedman) Let $\alpha^*$ be a static equilibrium of the stage game with payoffs $e$. Then for any feasible payoff $v$ with $v_i > e_i$ for all $i \in \mathcal{I}$, there exists some $\bar{\delta} < 1$ such that for all $\delta \ge \bar{\delta}$, there is a subgame perfect equilibrium of $G^\infty(\delta)$ with payoffs $v$.

The proof of this theorem involves constructing grim trigger strategies with punishment by the static Nash equilibrium in phase II, thus satisfying sequential rationality in subgames where some player has deviated.
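The limits of Theorem 2 show up in the earlier example: the unique static NE (D, L) gives payoffs e = (7, 1), so the cooperative payoff (6, 6) is not covered by Friedman's theorem even though it is feasible and strictly individually rational. A small check, using the payoffs from that example:

```python
# Friedman's theorem requires v_i > e_i for all i, where e is a static NE payoff.
e = (7, 1)        # static NE payoffs at (D, L) in the example game
v = (6, 6)        # target cooperative payoff
minmax = (0, 1)   # minmax payoffs from the example

friedman_applies = all(vi > ei for vi, ei in zip(v, e))   # fails: 6 < 7
strictly_ir = all(vi > mi for vi, mi in zip(v, minmax))   # holds: 6 > 0, 6 > 1
```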
This theorem raises the following questions:
• What happens if the static NE payoff vector is strictly greater than the minmax payoff vector (in some components)?
• Does the requirement of subgame perfection restrict the set of equilibrium payoffs?
The perfect folk theorem of Fudenberg and Maskin shows that it does not: for any feasible, strictly individually rational payoff vector, there is a range of discount factors for which that payoff vector can be obtained as a subgame perfect equilibrium.
Theorem 3 (Fudenberg and Maskin) Assume that the dimension of the set $V$ of feasible payoffs is equal to the number of players $I$. Then, for any $v \in V$ with $v_i > \underline{v}_i$ for all $i$, there exists a discount factor $\bar{\delta} < 1$ such that for all $\delta \ge \bar{\delta}$, there is a subgame perfect equilibrium of $G^\infty(\delta)$ with payoffs $v$.
See Fudenberg and Tirole, Chapter 5, Theorem 5.4.

Remarks:

1. The strategies constructed to prove the above theorem involve punishments and rewards.

2. The assumption on the dimension of $V$ ensures that each player $i$ can be singled out for punishment in the event of a deviation. This assumption could be weakened, but cannot be completely dispensed with (see Homework 4).

3. Folk theorem for finitely repeated games:
• If the stage game has a unique Nash equilibrium, then by backward induction, the unique perfect equilibrium is to play the static Nash equilibrium in every period.
• With multiple stage Nash equilibria, one can use rewards and punishments to construct other subgame perfect equilibria (see Homework 4).

4. Equilibrium concepts do very little to pin down the play by patient players!
2 Repeated Games with Imperfect Monitoring

In the previous section, we assumed that in the repeated play of the game, each player observes the actions of others at the end of each stage. Here, we consider repeated games in which players' actions may not be directly observable. We assume that players observe only an imperfect signal of the stage game actions.
Motivational Example: Cournot competition with noisy demand (Green and Porter, 1984). Firms set output levels $q_1^t, \ldots, q_I^t$ privately at time $t$. The level of demand is stochastic. Each firm's payoff depends on its own output and on the publicly observed market price. Firms do not observe each other's output levels. The market price depends on the uncertain demand and the total output; for example, a low market price could be due to either high total output or low demand.

We will focus on games with public information: at the end of each period, all players observe a public outcome, which is correlated with the vector of stage game actions. Each player's realized payoff depends only on his own action and the public outcome.
2.1 Model

We state the components of the model for a stage game with finite actions and a public outcome that can take finitely many values. The model extends to the general case with straightforward modifications.

• Let $A_1, \ldots, A_I$ be finite action sets.

• Let $y$ denote the publicly observed outcome, which lies in a finite set $Y$. Each action profile $a$ induces a probability distribution over $y \in Y$; let $\pi(y, a)$ denote the probability of outcome $y$ under action profile $a$.

• Player $i$'s realized payoff is given by $r_i(a_i, y)$. Note that this payoff depends on the actions of the other players only through their effect on the distribution of $y$.
• Player $i$'s expected stage payoff is given by
$$g_i(a) = \sum_{y \in Y} \pi(y, a)\, r_i(a_i, y).$$

• A mixed strategy is $\alpha_i \in \Delta(A_i)$. Payoffs are defined in the obvious way.

• The public information at the start of period $t$ is $h^t = (y^0, \ldots, y^{t-1})$.

• We consider public strategies for player $i$, i.e., a sequence of maps $s_i^t : h^t \to A_i$.

• Player $i$'s average discounted payoff when the action sequence $\{a^t\}$ is played is given by
$$(1-\delta) \sum_{t=0}^{\infty} \delta^t g_i(a^t).$$
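The expected stage payoff formula above can be sketched directly; the two-signal distribution and payoff numbers below are invented for illustration, not taken from the lecture:

```python
# g_i(a) = sum_y pi(y, a) * r_i(a_i, y): expected stage payoff from the
# signal distribution induced by the action profile a.

def expected_stage_payoff(pi_a, r_i, a_i):
    """pi_a: dict mapping each public outcome y to its probability under a;
    r_i(a_i, y): player i's realized payoff."""
    return sum(prob * r_i(a_i, y) for y, prob in pi_a.items())

# Hypothetical signal distribution under some profile a, and a payoff that
# happens to depend only on the signal.
pi_a = {"good": 0.7, "bad": 0.3}
r_i = lambda a_i, y: 3 if y == "good" else -1
g = expected_stage_payoff(pi_a, r_i, "C")  # 0.7*3 + 0.3*(-1) = 1.8
```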
2.2 Example: Noisy Prisoner's Dilemma

Actions: $(a_1, a_2)$, where $a_i \in \{C, D\}$.

Public signal: $p$, whose distribution depends on the action profile:
$$a_1 = a_2 = C \;\Rightarrow\; p = X, \qquad a_1 \ne a_2 \;\Rightarrow\; p = X - 2, \qquad a_1 = a_2 = D \;\Rightarrow\; p = X - 4,$$
where $X$ is a continuous random variable with cumulative distribution function $F(x)$ and $E[X] = 0$.

Payoffs:
$$r_1(C, p) = 1 + p, \quad r_1(D, p) = 4 + p, \qquad r_2(C, p) = 1 + p, \quad r_2(D, p) = 4 + p.$$

The payoff matrix for this game takes the following form:

              C                     D
    C   (1 + X, 1 + X)     (−1 + X, 2 + X)
    D   (2 + X, −1 + X)    (X, X)
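Since $E[X] = 0$, taking expectations of the matrix above collapses the game to a standard prisoner's dilemma in expected payoffs; a quick check of that reduction (illustrative code, not from the notes):

```python
# Expected payoffs in the noisy prisoner's dilemma: E[p] is 0, -2, or -4
# depending on the action profile, and r_i(C, p) = 1 + p, r_i(D, p) = 4 + p.

def expected_signal(a1, a2):
    if a1 == a2 == "C":
        return 0       # p = X, and E[X] = 0
    if a1 == a2 == "D":
        return -4      # p = X - 4
    return -2          # p = X - 2 when exactly one player defects

def expected_payoffs(a1, a2):
    p = expected_signal(a1, a2)
    r = lambda a: (1 + p) if a == "C" else (4 + p)
    return (r(a1), r(a2))
```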
2.2.1 Trigger-price Strategy

Consider the following trigger-price strategy for the noisy prisoner's dilemma:

• (I) Play (C, C) until $p \le p^*$, then go to Phase II.

• (II) Play (D, D) for $T$ periods, then go back to Phase I.

Notice that the strategy is stationary and symmetric, and that the punishment phase uses the static NE. We next show that we can choose $p^*$ and $T$ such that the proposed strategy profile is an SPE.
First, we find the continuation payoffs. In Phase I, if players do not deviate, their expected payoff $v$ satisfies
$$v = (1-\delta)(1 + 0) + \delta \left[ F(p^*) \delta^T v + \left(1 - F(p^*)\right) v \right].$$
From this equation, we obtain
$$v = \frac{1-\delta}{1 - \delta^{T+1} F(p^*) - \delta \left(1 - F(p^*)\right)}. \qquad (1)$$
If a player deviates, his payoff satisfies
$$v_d = (1-\delta)(2 + 0) + \delta \left[ F(p^* + 2) \delta^T v + \left(1 - F(p^* + 2)\right) v \right]. \qquad (2)$$
Note that deviating provides a higher immediate payoff, but increases the probability of entering Phase II. In order for the strategy to be an SPE, the expected gain from the deviation must not be positive.

Incentive Compatibility Constraint: $v \ge v_d$, i.e.,
$$v \ge (1-\delta)\, 2 + F(p^* + 2)\, \delta^{T+1} v + \left[1 - F(p^* + 2)\right] \delta v. \qquad (3)$$
Substituting for $v$ from (1), this reduces to
$$\frac{1}{1 - \delta^{T+1} F(p^*) - \delta \left(1 - F(p^*)\right)} \;\ge\; \frac{2}{1 - \delta^{T+1} F(p^* + 2) - \delta \left(1 - F(p^* + 2)\right)}. \qquad (4)$$
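Equations (1) and (4) are straightforward to evaluate for concrete parameters. The sketch below assumes (none of this is specified in the notes) that X is standard normal, with $\delta = 0.9$, $T = 5$ punishment periods, and trigger $p^* = -1$:

```python
import math

def Phi(x):
    """Standard normal CDF, used here as the distribution F of X (assumption)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

delta, T, p_star = 0.9, 5, -1.0

D1 = 1 - delta**(T + 1) * Phi(p_star) - delta * (1 - Phi(p_star))
D2 = 1 - delta**(T + 1) * Phi(p_star + 2) - delta * (1 - Phi(p_star + 2))

v = (1 - delta) / D1          # equilibrium payoff, equation (1)
ic_holds = 1 / D1 >= 2 / D2   # incentive compatibility, equation (4)
# Here v is roughly 0.63 < 1: punishment phases are occasionally triggered
# by bad demand shocks even when no one deviates, so the equilibrium payoff
# falls short of full cooperation.
```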
Any $T$ and $p^*$ that satisfy this constraint yield an SPE. The best possible trigger-price equilibrium is found by maximizing $v$ subject to the incentive compatibility constraint.
Question: Are there other equilibria with higher payoffs?

There is an explicit characterization of the set of equilibrium payoffs using a multi-agent dynamic-programming-type operator, due to Abreu, Pearce, and Stacchetti (1990). The set of equilibrium payoffs is a fixed point of this operator, and value iteration converges to the set of equilibrium payoffs.

For more on this topic, see Fudenberg and Tirole, Section 5.5, and Abreu, Pearce, and Stacchetti (1990).
References

[1] Abreu D., Pearce D., and Stacchetti E., "Toward a theory of discounted repeated games with imperfect monitoring," Econometrica, vol. 58, pp. 1041-1064, 1990.

[2] Green E. and Porter R., "Noncooperative collusion under imperfect price information," Econometrica, vol. 52, pp. 87-100, 1984.