Equilibrium paths of Nash equilibria of finitely often repeated prisoner's dilemma

160 Views Asked by At

The specific payoff matrix is not so relevant, but consider a prisoner's dilemma with payoff matrix $$ \begin{matrix} & \text{C} & \text{D} \\ \text{C} & 0,0 & -2,1 \\ \text{D} & 1,-2 & -1,-1 \\ \end{matrix} $$ and the following claim:

Claim: In the finitely often repeated prisoner's dilemma (without discounting), only $(D,D)$ is played on the equilibrium path of any Nash Equilibrium (NE).

Note that this claim is concerned with any NE, not necessarily SPNE!

Attempted proof: By induction from the last period.

Consider any Nash equilibrium. Suppose there is a player, who on the equilibrium path of that equilibrium plays $C$ in the last period. Then that player would have a profitable deviation in playing $D$ instead, which is a contradiction. Therefore $(D,D)$ must be played on the equilibrium path.

Now consider a history $h$ on the equilibrium path such that for all longer histories $h'$ on the equilibrium path, both players prescribe $D$ as their action at $h'$. Suppose one player plays $C$ after $h$. Then, since only $(D,D)$ is played on the equilibrium path after $h$, that player has a profitable deviation in playing $D$ instead.

Question: Is this proof correct? I'm mostly unsure about the last sentence. How do I know that I stay on the equilibrium path while changing my action?

Thank you!

2

There are 2 best solutions below

0
On BEST ANSWER

I found an answer to my question in Proposition 155.1 of A Course in Game Theory by Osborne and Rubinstein. Their result is more general and says that

If the payoff profile in every Nash equilibrium of the strategic game $G$ is the profile $(v_i)$ of minmax payoffs in $G$ then for any value of $T$ the outcome $(a^1 ,\ldots,a^T)$ of every Nash equilibrium of the $T$-period repeated game of $G$ has the property that $a^t$ is a Nash equilibrium of $G$ for all $t = 1,\ldots,T$.

Here's the proof for my version of the question.

Suppose the game $G$ with payoff matrix given in my question is $T$ times repeated. Let $s$ be a NE of $G^T$ and let $(a^1,\ldots,a^T)$ be its outcome path.

Let $t\in\{1,\ldots,T\}$ be maximal such that $a^t \neq (D,D)$. Suppose that player $i$ plays $U$ after $(a^1,\ldots, a^{t-1})$. Consider a modification $\tilde s_i$ of $s_i$ in which $i$ plays $D$ after $(a^1,\ldots, a^{t-1})$ and after any longer history he plays something which guarantees him at least his min-max value for the period following that history (namely $-1$). Then $i$'s payoff from $\tilde s_i$ after $(a^1,\ldots, a^{t-1})$ is bounded from below by $$ 1 + (-1)(T-t), $$ which is greater than $(-1 + (-1)(T-t))$, the payoff from $s_i$ after $(a^1,\ldots, a^{t-1})$. Since the payoffs from $s_i$ and $\tilde s_i$ in all periods before $(a^1,\ldots, a^{t-1})$ are equal, this means that $\tilde s_i$ is a profitable deviation.

1
On

Start in the last period, at time $T$. In the last period, $D$ is a dominant strategy regardless of the history of the game. So the subgame starting at $T$ has a dominant strategy equilibrium: $(D,D)$.

Then move to stage $T−1$. By backward induction, we know that at $T$, no matter what, the play will be $(D,D)$. Then given this, the subgame starting at $T−1$ (again regardless of history) also has a dominant strategy equilibrium.

With this argument, we have that there exists a unique subgame perfect Nash equilibrium $(D,D)$ at each date.