Infinite Prisoner's Dilemma


Please help me understand the idea of solving this problem.

Consider the infinitely repeated game $G( \infty, \sigma)$ with the following stage game.

$$\begin{array}{|c|c|c|} \hline &c&d \\ \hline c&(0,0)&(7,-3) \\ \hline n&(-3,7)&(4,4) \\ \hline \end{array}$$

The strategies use punishments in the form of a "forgiving trigger" with a period of length $T = 4$. I need to find equilibrium strategies and the corresponding value of the discount factor $\sigma$ (written $\delta$ below) that supports the "good" trajectory, the constant repetition of $(n, n)$, as a trajectory in a NE (SPNE).


Is my reasoning correct?

Initially, the players play $(n, n)$ until one of them deviates. The other player then switches as well, and both play $(c, c)$ for 3 periods.

By repeating (n, n) the player receives:

$$4+4\delta+4\delta^2+\cdots=\frac{4}{1-\delta}.$$

In case of deviation:

$$7+4\delta^4+4\delta^5+\cdots=7+\frac{4\delta^4}{1-\delta}.$$

Therefore, we have $$\frac{4}{1-\delta} \geq 7+\frac{4\delta^4}{1-\delta}.$$

Best answer

You should start by specifying strategies for any history. The strategies you are implicitly using are called grim trigger, and work like this: if neither I nor my opponent has previously played anything but $n$ or $d$ (including the first round, when the history is empty), cooperate and play $n$ or $d$; if anyone has previously deviated, punish by playing $c$ forever. This maximizes the pain of the punishment, and gives the best incentives to cooperate.

After any history in which someone previously deviated from $n$ and $d$ and triggered the punishment, the grim trigger strategies are a subgame perfect Nash equilibrium, because my opponent is simply threatening to play a stage Nash strategy, and I have no profitable deviation from replying with my stage Nash strategy. So there are no profitable deviations for these histories.
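The claim that $(c,c)$ is the stage Nash profile can be checked directly from the payoff table. The sketch below is only an illustration: the names `PAYOFFS`, `ROW_ACTIONS`, `COL_ACTIONS`, and `pure_nash_equilibria` are made up for this example, and the table is read with rows as player 1's actions and columns as player 2's.

```python
# Stage-game payoffs copied from the table in the question.
# Assumption: rows are player 1's actions (c, n), columns are player 2's (c, d).
PAYOFFS = {
    ("c", "c"): (0, 0),
    ("c", "d"): (7, -3),
    ("n", "c"): (-3, 7),
    ("n", "d"): (4, 4),
}
ROW_ACTIONS = ("c", "n")
COL_ACTIONS = ("c", "d")


def pure_nash_equilibria():
    """Return every profile where neither player gains by deviating alone."""
    equilibria = []
    for r in ROW_ACTIONS:
        for c in COL_ACTIONS:
            u1, u2 = PAYOFFS[(r, c)]
            # Player 1 cannot do better by switching rows against column c.
            row_ok = all(PAYOFFS[(r2, c)][0] <= u1 for r2 in ROW_ACTIONS)
            # Player 2 cannot do better by switching columns against row r.
            col_ok = all(PAYOFFS[(r, c2)][1] <= u2 for c2 in COL_ACTIONS)
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria


print(pure_nash_equilibria())  # [('c', 'c')]
```

The cooperative profile with payoffs $(4,4)$ does not appear in the list, since each player gains $7$ instead of $4$ by switching to $c$; that is why it needs the threat of punishment to be sustained.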

After any history in which everyone has previously cooperated, you get the right discounted expected payoff, $4/(1-\delta)$. But if I deviate, my discounted expected payoff is $$ 7 + \delta \cdot 0 +\delta^2 \cdot 0 + \cdots = 7 $$ because this triggers the $(c,c)$ profile forever. Your work rewards deviators by giving them 7 and then returning to cooperation, instead of punishing them with return to Nash play. Everyone would always deviate from the cooperative mode.

Then you have an SPNE if $$ \dfrac{4}{1-\delta} \ge 7, $$ that is, $4 \ge 7(1-\delta)$, or $$ \delta \ge \dfrac{3}{7}. $$ So if the players are sufficiently patient, the grim trigger strategies are a subgame perfect Nash equilibrium of the infinitely repeated game.
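As a sanity check on the $3/7 \approx 0.4286$ threshold, the two payoff streams can be summed numerically. This is a minimal sketch, not part of the original argument: `HORIZON` is an assumed truncation length standing in for the infinite sum, and all function names are hypothetical.

```python
def discounted_sum(stream, delta):
    """Discounted value of a finite payoff stream: sum of delta**t * payoff_t."""
    return sum(payoff * delta**t for t, payoff in enumerate(stream))


# Assumption: 2000 periods approximate the infinite horizon; the omitted tail
# is negligible for the discount factors tried below.
HORIZON = 2000


def cooperate_value(delta):
    """Payoff from cooperating forever: 4 in every period."""
    return discounted_sum([4] * HORIZON, delta)


def grim_deviation_value(delta):
    """Payoff from deviating once under grim trigger: 7 now, then 0 forever."""
    return discounted_sum([7] + [0] * (HORIZON - 1), delta)


def grim_trigger_holds(delta):
    """True when cooperating is at least as good as a one-shot deviation."""
    return cooperate_value(delta) >= grim_deviation_value(delta)


for delta in (0.40, 0.42, 0.43, 0.50):
    print(delta, grim_trigger_holds(delta))
```

The condition fails at $\delta = 0.42$ and holds at $\delta = 0.43$, which brackets $3/7$.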

Sorry, I only understood what you meant by "forgiving trigger" when I reread your post after doing the work above. The idea here is that if the punishment period $T$ is long enough, no one will deviate. So if someone triggers the punishment, both players play the $(c,c)$ profile for $T$ periods, then go back to cooperating. The deviator's payoff is then $$ 7 + \delta \cdot 0 + \delta^2 \cdot 0 + \cdots + \delta^T \cdot 0 + \delta^{T+1} 4 + \cdots = 7 + \delta^{T+1} \dfrac{4}{1-\delta}. $$ If you take $T\rightarrow \infty$, you get the grim trigger strategies. These strategies are an SPNE if $$ \dfrac{4}{1-\delta} \ge 7 + \delta^{T+1}\dfrac{4}{1-\delta}, $$ so that no one finds it profitable to deviate once and then return to the original profile (by the "one-shot deviation principle", checking such single deviations is enough). Rearranging gives $4(1+\delta+\cdots+\delta^T) \ge 7$, a polynomial inequality of degree $T$ in $\delta$. Because of the $\delta^{T+1}$ term, you can't really solve for a closed-form threshold for general $T$, although for a given $T$ it can be found numerically.
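For a specific $T$, the smallest $\delta$ satisfying the no-deviation condition above can be approximated by bisection. The sketch below is one way to do it, under the assumption that punishment lasts exactly $T$ periods after the deviation period; `deviation_margin` and `critical_delta` are hypothetical helper names.

```python
def deviation_margin(delta, T):
    """Cooperation value minus deviation value, scaled by (1 - delta**(T + 1)).

    4/(1-delta) >= 7 + delta**(T+1) * 4/(1-delta) is equivalent to
    4 * (1 + delta + ... + delta**T) - 7 >= 0, which is what is returned here.
    """
    return 4 * sum(delta**k for k in range(T + 1)) - 7


def critical_delta(T, tol=1e-12):
    """Smallest discount factor in (0, 1) at which cooperation is sustainable.

    The margin is increasing in delta, negative at delta = 0 (4 - 7 < 0) and
    positive at delta = 1 when T >= 1, so bisection on [0, 1] finds the root.
    """
    if T < 1:
        raise ValueError("need at least one punishment period")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if deviation_margin(mid, T) >= 0:
            hi = mid  # condition holds at mid, threshold is at or below mid
        else:
            lo = mid  # condition fails at mid, threshold is above mid
    return hi


for T in (1, 2, 3, 10, 50):
    print(T, round(critical_delta(T), 6))
```

With $T = 1$ the threshold is exactly $3/4$, and as $T$ grows it falls toward the grim trigger value $3/7$. The case $T = 3$ gives a threshold of roughly $0.45$; it matches the inequality with $\delta^4$ in the question if the question's $T = 4$ is read as the deviation period plus three punishment periods.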