Various strategies exist in the iterated prisoner's dilemma. My question is what the percentage of cooperation is between two tit-for-two-tats (TF2T) players when there is noise in the environment. Noise is defined as a player intending to either cooperate or defect, but the result getting flipped with some probability. TF2T is a strategy where the player cooperates unless the other player defected on both of the two previous turns. It is well known for being more noise-tolerant than the original tit for tat (TFT). I am looking for a function that takes the noise as input and outputs the cooperation percentage. Here is what I have tried so far:
I first tried solving a similar but easier problem: the probability of two TFT players cooperating. Let N be the noise in the environment, and let P(Tc) and P(Td) be the probabilities that the TFT player cooperates and defects respectively. Also define P(CCrit) and P(DCrit) as the probabilities that the cooperate criterion and the defect criterion are satisfied. The following should be true: P(Tc) + P(Td) = 1, P(Tc) = P(CCrit) * (1-N) + P(DCrit) * N, and P(Td) = P(DCrit) * (1-N) + P(CCrit) * N. For TFT I reasoned that P(CCrit) = P(Tc) and P(DCrit) = P(Td). I did not know exactly how to solve this system nicely, but I can run simulations with a given N to obtain P(Tc), plug it into the equations, and check whether they hold. In this case they did: for 0 < N < 1, P(Tc) = 0.5.
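In case it helps others reproduce this, here is a minimal simulation sketch of two noisy TFT players (the function name and parameter defaults are my own); for any 0 < N < 1 the cooperation rate comes out near 0.5:

```python
import random

def simulate_tft(noise, rounds=200_000, seed=0):
    """Fraction of moves that end up as cooperation when two noisy TFT players meet."""
    rng = random.Random(seed)
    C, D = 0, 1
    prev1, prev2 = C, C                  # both players start by cooperating
    coop = 0
    for _ in range(rounds):
        intent1, intent2 = prev2, prev1  # TFT copies the opponent's last *actual* move
        move1 = intent1 if rng.random() > noise else 1 - intent1
        move2 = intent2 if rng.random() > noise else 1 - intent2
        coop += (move1 == C) + (move2 == C)
        prev1, prev2 = move1, move2
    return coop / (2 * rounds)
```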
Then I tried solving the probability of a TF2T player cooperating against a player that always cooperates. Here, and also for the actual TF2T vs TF2T case, the formulas for P(Tc) and P(Td) still apply, only with different criteria. I define P(Cc) = 1-N and P(Cd) = N as the probabilities that the cooperator cooperates or defects respectively. Here I reasoned that P(CCrit) = P(Cc)^2 + 2 * P(Cc) * P(Cd), i.e. the cooperator did not defect on both of the last two turns, and P(DCrit) = P(Cd)^2. Again, filling in the numbers, everything checked out.
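This case can be cross-checked the same way; the sketch below (names are mine) compares the closed form P(Tc) = (1-N^2)(1-N) + N^2 * N implied by these criteria against a direct simulation:

```python
import random

def tf2t_vs_allc(noise, rounds=200_000, seed=1):
    """Simulated cooperation rate of a noisy TF2T player against Always-Cooperate."""
    rng = random.Random(seed)
    C, D = 0, 1
    opp_prev, opp_last = C, C            # the cooperator's last two actual moves
    coop = 0
    for _ in range(rounds):
        # TF2T intends D only if the opponent defected on both of the two previous turns
        intent = D if (opp_prev == D and opp_last == D) else C
        move = intent if rng.random() > noise else 1 - intent
        coop += (move == C)
        # the cooperator always intends C, but noise may flip it
        opp_move = C if rng.random() > noise else D
        opp_prev, opp_last = opp_last, opp_move
    return coop / rounds

def closed_form(noise):
    # P(Tc) = P(CCrit)*(1-N) + P(DCrit)*N with P(CCrit) = 1-N^2 and P(DCrit) = N^2
    return (1 - noise**2) * (1 - noise) + noise**2 * noise
```

The agreement here works out because the cooperator's moves are independent across turns, so multiplying per-turn probabilities for the two-turn criteria is legitimate.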
Then TF2T vs TF2T. The formulas for P(Tc) and P(Td) are still the same, and here I reasoned that P(CCrit) = P(Tc)^2 + 2 * P(Tc) * P(Td) and P(DCrit) = P(Td)^2. But now the simulated P(Tc) is slightly higher than the value my equations predict. The difference is only 0.0015, but it is significant: I am pretty sure my calculator and simulation software are precise enough, the other cases were spot on multiple times, and I ran the simulation for 30 million repeats. I must have made a mistake somewhere in the maths, but I don't know where. Can anyone spot it?
I'll suggest a solution for TFT vs TFT, as it is simpler to write, and then explain how to extend it further. The general approach is Markov chains.
So let $p$ be the probability of doing what you actually want to do (so with probability $1-p$ you do the wrong action). Each player has two possibilities: either he played $D$ in the previous step ($1$) or didn't ($0$). Thus, the state space has $4$ possible states: $(0,0),(0,1),(1,0)$ and $(1,1)$.
Suppose the game is in $(0,0)$. According to TFT, they should now both cooperate. The probability that they both do is $p^2$, in which case the game stays in $(0,0)$. With probability $p(1-p)$ only Player 1 plays $D$ by accident, and with the same probability only Player 2 does. With probability $(1-p)^2$ both play $D$, and the new state will be $(1,1)$.
Therefore, the distribution over the next state, $x_1$, can be written as $A x_0$, where $x_0$ is the distribution of the states today (at the start of the game it is $(1,0,0,0)^t$) and $$A=\begin{pmatrix}p^2 & p(1-p) & p(1-p) & (1-p)^2 \\ p(1-p) & (1-p)^2 & p^2 & p(1-p) \\ p(1-p) & p^2 & (1-p)^2 & p(1-p) \\ (1-p)^2 & p(1-p) & p(1-p)& p^2 \end{pmatrix}$$
For $n$ large enough, the long-term distribution of the states will be $\lim\limits_{n\to \infty} A^n x_0$, which is the (normalized to sum $=1$) eigenvector of the eigenvalue $1$. You can compute it numerically for specific values of $p$ or, probably in this example, get a closed-form expression that depends on $p$. Denote this vector by $v$.
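For instance, a quick numerical check (numpy; I use a matrix power instead of an explicit eigendecomposition, and the helper name is mine):

```python
import numpy as np

def tft_transition_matrix(p):
    """Column-stochastic transition matrix over the states (0,0), (0,1), (1,0), (1,1)."""
    q = 1 - p
    return np.array([
        [p*p, p*q, p*q, q*q],
        [p*q, q*q, p*p, p*q],
        [p*q, p*p, q*q, p*q],
        [q*q, p*q, p*q, p*p],
    ])

p = 0.9                                   # accuracy; the noise is 1 - p
A = tft_transition_matrix(p)
x0 = np.array([1.0, 0.0, 0.0, 0.0])       # the game starts in (0,0): mutual cooperation
v = np.linalg.matrix_power(A, 100) @ x0   # long-run distribution over the four states
```

In fact every row and every column of $A$ sums to $1$ (the matrix is doubly stochastic), so for any $0<p<1$ the stationary distribution is uniform: $v=(1/4,1/4,1/4,1/4)^t$. Each player therefore cooperates exactly half the time, which matches the asker's observation that $P(Tc)=0.5$ independently of the noise.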
Now you can answer any question. For example, the percentage of turns after which they both cooperate is the first element of $v$ (as this state appears only after both cooperated).
For the general case and TF2T, you will have $16$ states ($4$ for each player; each of a player's states encodes his moves in the two previous turns), but the mechanism is the same.
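To make this concrete, here is a sketch of the $16$-state chain for TF2T vs TF2T, giving exactly the noise → cooperation-percentage function the question asks for (function and variable names are my own; a state records each player's two previous actual moves):

```python
import itertools
import numpy as np

C, D = 0, 1

def tf2t_coop_fraction(noise):
    """Long-run probability that a TF2T player's actual move is C against another TF2T player."""
    # State: (a_prev, a_last, b_prev, b_last) = each player's two previous actual moves.
    states = list(itertools.product([C, D], repeat=4))
    idx = {s: i for i, s in enumerate(states)}
    T = np.zeros((16, 16))                # column-stochastic transition matrix
    for s in states:
        a_prev, a_last, b_prev, b_last = s
        # TF2T intends D only if the opponent defected on both of the two previous turns
        a_intent = D if (b_prev == D and b_last == D) else C
        b_intent = D if (a_prev == D and a_last == D) else C
        for a_new in (C, D):
            for b_new in (C, D):
                pa = 1 - noise if a_new == a_intent else noise
                pb = 1 - noise if b_new == b_intent else noise
                nxt = (a_last, a_new, b_last, b_new)
                T[idx[nxt], idx[s]] += pa * pb
    # Stationary distribution: the eigenvector of T for the eigenvalue 1, normalized to sum 1.
    w, V = np.linalg.eig(T)
    v = np.real(V[:, np.argmin(np.abs(w - 1))])
    v = v / v.sum()
    # In the stationary distribution, "player 1's last actual move was C" occurs with
    # probability equal to the long-run cooperation rate.
    return sum(v[idx[s]] for s in states if s[1] == C)
```

For example, `tf2t_coop_fraction(0.5)` gives 0.5 (with noise 0.5 every actual move is a fair coin flip), and the rate rises toward 1 as the noise goes to 0. Comparing this exact value with the fixed point of the asker's equations also points at the likely source of the small discrepancy: a TF2T player's two consecutive moves are not independent, so P(CCrit) is not simply P(Tc)^2 + 2 * P(Tc) * P(Td).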