I am considering the iterated prisoner's dilemma. In particular, suppose that after each game, the game is played again with probability $p$. Additionally, assume it is known that my opponent will confess on every turn once I have confessed once, but will stay mute until then.
Just to reiterate the rules, if A confesses and B does as well, both get $5$ years in jail. If one is mute but the other confesses, the one who was mute gets $10$ years in jail. If both remain mute, both get 1 year in jail.
So far, I've calculated the following. The expected payoff is
$$5 - \frac{5}{1-p}$$
if one always confesses. Additionally, if I always stayed mute it would be $$-\frac{1}{1-p}$$ Thus, the latter strategy is better if $p > 1/5$.
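To sanity-check these two formulas, here is a minimal Monte Carlo sketch in Python. It simulates the random-length game against the opponent described above (mute until I have confessed in an earlier round, then always confess) and averages the total payoff. The names `PAYOFF`, `play_once` and `estimate` are hypothetical, and the payoff of $0$ for a lone confessor is an assumption: the rules above leave it implicit, but it is what makes the always-confess formula come out.

```python
import random

# Payoff to me, in negative years of jail, indexed by (my_move, opponent_move).
# Assumption: a lone confessor goes free (0 years); the stated rules leave this implicit.
PAYOFF = {
    ("confess", "confess"): -5,
    ("mute", "confess"): -10,
    ("confess", "mute"): 0,
    ("mute", "mute"): -1,
}


def play_once(my_strategy, p, rng):
    """Total payoff of one random-length game; my_strategy(t) gives my move in round t (from 0)."""
    total = 0
    i_have_confessed = False
    t = 0
    while True:
        # The opponent confesses only if I confessed in some earlier round.
        opp_move = "confess" if i_have_confessed else "mute"
        my_move = my_strategy(t)
        total += PAYOFF[(my_move, opp_move)]
        if my_move == "confess":
            i_have_confessed = True
        # Another round is played with probability p; otherwise the game stops.
        if rng.random() >= p:
            return total
        t += 1


def estimate(my_strategy, p, n_games=50_000, seed=0):
    """Monte Carlo estimate of the expected total payoff (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    return sum(play_once(my_strategy, p, rng) for _ in range(n_games)) / n_games


def always_confess(t):
    return "confess"


def always_mute(t):
    return "mute"


p = 0.5
print("always confess:", estimate(always_confess, p), "formula:", 5 - 5 / (1 - p))
print("always mute:   ", estimate(always_mute, p), "formula:", -1 / (1 - p))
```

The estimates are random but should land close to the closed forms; increasing `n_games` tightens them.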
However, there are some pieces here I do not understand. For example, let $Y_r = 1$ if there are at least $r$ rounds of the game (and $0$ otherwise). I am expected to find the value of $E[Y_r]$ as a function of $p$ and $r$.
From intuition, it is clear that $E[Y_1] = 1$ and $E[Y_2] = p$. Since the number of rounds is geometrically distributed with mean $\frac{1}{1-p}$, I am unsure how to use this to get $E[Y_r]$ for general $r$, or how it relates to the expected number of rounds played.
Additionally, I am trying to calculate the expected payoff I would receive if my strategy were to stay mute on the first game and then confess on every subsequent game. I was thinking of writing this payoff as a linear combination of $Y_1, Y_2, Y_3, \ldots$ and using linearity of expectation to obtain my equation.
How can I properly calculate $E[Y_r]$ and the expected payoff if I remain mute on the first game and confess on every subsequent game?
Denote by $u^t_i$ the payoff obtained by Player $i$ in round $t$, and assume zero payoffs once the game stops. Since it is not mentioned, I ignore the possibility of a discount factor and let $\delta=1$.
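Number the rounds $t = 0, 1, 2, \ldots$ and let $Y_t = 1$ if round $t$ is played (and $0$ otherwise). The (random) sum of payoffs to $i$ is $$\sum_{t=0}^{+\infty} u^t_i Y_t$$ where $Y_t$ has a Bernoulli distribution with $P(Y_t = 1) = p^t$: the first round ($t=0$) is always played, and reaching round $t$ requires $t$ successive continuations, each occurring with probability $p$. In your numbering, where the first round is $r = 1$, this reads $E[Y_r] = p^{r-1}$, which matches $E[Y_1] = 1$ and $E[Y_2] = p$.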
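A quick simulation sketch of the claim $P(Y_t = 1) = p^t$, with rounds numbered from $t = 0$: draw the number of rounds $N$ by continuing with probability $p$ after each round, and record how often round $t$ is reached. Summing the indicators also recovers the expected number of rounds, $E(N) = \sum_t p^t = \frac{1}{1-p}$. The helper names are hypothetical.

```python
import random


def simulate_rounds(p, rng):
    """Number of rounds N: round 0 is always played, and each further round with probability p."""
    n = 1
    while rng.random() < p:
        n += 1
    return n


def empirical_p_round_played(t, p, n_games=50_000, seed=1):
    """Fraction of simulated games in which round t (numbered from 0) is played, i.e. N >= t + 1."""
    rng = random.Random(seed)
    hits = sum(simulate_rounds(p, rng) >= t + 1 for _ in range(n_games))
    return hits / n_games


p = 0.6
for t in range(4):
    # Empirical frequency of Y_t = 1 next to the exact value p**t.
    print(t, empirical_p_round_played(t, p), p ** t)

# Linearity of expectation: E(N) = sum_t E(Y_t) = sum_t p**t = 1 / (1 - p) (truncated at 200 terms).
print(sum(p ** t for t in range(200)), 1 / (1 - p))
```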
Therefore, by linearity of expectation, the expected payoff to $i$ is $$\sum_{t=0}^{+\infty} u^t_i E(Y_t) = \sum_{t=0}^{+\infty} u^t_i p^t$$
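The formula $\sum_t u^t_i\, p^t$ translates directly into code. A minimal sketch in Python, assuming (as in this problem) that the round-$t$ payoff is a fixed number once both strategies are fixed; `expected_payoff` and the truncation `horizon` are illustrative choices, not part of the answer above.

```python
from typing import Callable


def expected_payoff(u: Callable[[int], float], p: float, horizon: int = 10_000) -> float:
    """Approximate sum_{t>=0} u(t) * p**t by truncating after `horizon` terms.

    u(t) is the payoff in round t (rounds numbered from 0), and p**t = E(Y_t)
    is the probability that round t is played.
    """
    total = 0.0
    weight = 1.0  # holds p**t, updated incrementally
    for t in range(horizon):
        total += u(t) * weight
        weight *= p
    return total


# Example: always staying mute gives u(t) = -1 in every round.
p = 0.3
print(expected_payoff(lambda t: -1, p), -1 / (1 - p))
```

Truncation is harmless here because the terms decay geometrically for $p < 1$.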
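Now you can apply this formula to specific cases. For instance, if you stay mute in the first round ($t=0$) and then always confess, you get $-1$ in round $0$ (both mute), $0$ in round $1$ (you confess for the first time while your opponent is still mute; as in your always-confess formula, the lone confessor gets $0$), and $-5$ in every later round (both confess). Hence the expected payoff is $$-1 + 0\cdot p + \sum_{t=2}^{+\infty} (-5) p^t = -1 - \frac{5p^2}{1-p}$$ because $$\sum_{t=2}^{+\infty} p^t = \sum_{t=0}^{+\infty} p^t - 1 - p = \frac{1}{1-p} - 1 - p = \frac{p^2}{1-p}$$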
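This value can be checked numerically. The Python sketch below sums $u(t)\,p^t$ for the mute-then-confess strategy with $u(0) = -1$, $u(1) = 0$ and $u(t) = -5$ for $t \ge 2$ (the $0$ assumes a lone confessor goes free), and compares it with $-1 - \frac{5p^2}{1-p}$ and with the always-mute payoff $-\frac{1}{1-p}$. All function names are hypothetical.

```python
def round_payoff(t):
    """Payoff in round t (from 0) for: mute in round 0, confess from round 1 on, vs. the grim-trigger opponent."""
    if t == 0:
        return -1  # both mute
    if t == 1:
        return 0   # I confess first, opponent still mute (assumed: lone confessor goes free)
    return -5      # both confess from round 2 on


def mute_then_confess_series(p, horizon=10_000):
    """Truncated sum of round_payoff(t) * p**t."""
    total, weight = 0.0, 1.0
    for t in range(horizon):
        total += round_payoff(t) * weight
        weight *= p
    return total


def mute_then_confess_closed(p):
    return -1 - 5 * p ** 2 / (1 - p)


def always_mute_closed(p):
    return -1 / (1 - p)


for p in (0.1, 0.2, 0.5, 0.9):
    print(p, mute_then_confess_series(p), mute_then_confess_closed(p), always_mute_closed(p))
```

Since $-\frac{1}{1-p} = -1 - \frac{p}{1-p}$, staying mute for one round and then confessing beats always staying mute exactly when $5p^2 < p$, i.e. when $p < 1/5$, mirroring the threshold found in the question.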
Likewise, you can rationalise the other payoffs presented in your question.
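For instance, with rounds numbered from $t=0$ and taking the lone confessor's payoff to be $0$ (as the question's always-confess formula implicitly does):

$$\text{always confess:}\quad 0 + \sum_{t=1}^{+\infty} (-5)\,p^t = -\frac{5p}{1-p} = 5 - \frac{5}{1-p}$$

$$\text{always mute:}\quad \sum_{t=0}^{+\infty} (-1)\,p^t = -\frac{1}{1-p}$$

so always staying mute beats always confessing exactly when $-1 > -5p$, i.e. when $p > 1/5$.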