I am just talking with a chess friend on Lichess about the probability to have duplicate games in chess, and some questions arose for me.
Taken that a chess game always lasts $P$ (ply) half moves and that in every position there are always exact $N$ choices. There is a group of $N_c$ computers which play against each other. Each computer plays $N_g$ games, randomly chosing between the choices. After all those games have been played:
What is the formula to calculate the probability that
- any of these games is a duplicate of any other of these games?
- one specific of these games is a duplicate of any other of these games?
- any of the games played by one specific computer is a duplicate of any of all other games played by all computers?
(Notice, with identical i mean identical moves. Move transpositions do not count)
And further, for the question: "What is the probability that any of these games has a specific move order?" the chess friend suggested the following formula, related to the Birthday Problem. Is this the correct formula?
(I am using Python syntax, ** means ^, alias power. Here is the same as math formula using bigger numbers).
choices = 2 # choices per move
ply = 10 # amount of half moves (every game has the same)
P = choices ** ply # all possible move orders
print(f'Amount of possible move orders: {P}')
comps = 10 # amount of participating chess computers
played = 10 # amount of games every computer plays
S = int(comps / 2 * played) # all played games
print(f'Amount of games played: {S}')
from math import factorial as fac
print(f'Probability of two identical games:',
1 - fac(P) / fac(P - S) / P ** S
)
For an idealized game where each player has $N$ choices at each of $P$ opportunities, the number of unique sequences is $N^P$. Now let's look at the cases
If you're looking at a specific game and seeing if any other game matches it, there are $N_gN_c-1$ other games that could match it and a probability of $N^{-P}$ of a match, so \begin{align} P(\mathrm{match}) &= 1 - (1-N^{-P})^{N_gN_c-1} \\&\approx 1 - \exp\left(-\frac{N_gN_c-1}{N^P}\right). \end{align}
If you're looking at a specific computer's games, compared to other computer's games, we can look at the $N_g^2(N_c-1)$ pairs of games where the first game comes from the specific computer and the second comes from one of the other computers. These pairs are all independent from each other, and they each have a $N^{-P}$ chance of a match, so the probability is \begin{align} P(\mathrm{match}) &= 1- (1-N^{-P})^{N_g^2(N_c-1)} \\&\approx 1 - \exp\left[-\frac{N_g^2(N_c-1)}{N^P}\right]. \end{align}
If require them to not only match each other, but also match a particular move order, the results are different:
Note that in each case, the $N_g^2N_c^2 \ll N^P$ limit is has essentially the same probability for the any match vs specific match case, with $N^P$ replaced with $N^{2P}$. This is because, in that limit, the pairs are essentially independent, and the problem just becomes looking at the number of relevant pairs and how likely they are to match. For any match, it's $N^P$, while for specific match, it's $N^{2P}$.