Probability of duplicated games in chess

781 Views Asked by At

I am just talking with a chess friend on Lichess about the probability to have duplicate games in chess, and some questions arose for me.

Taken that a chess game always lasts $P$ (ply) half moves and that in every position there are always exact $N$ choices. There is a group of $N_c$ computers which play against each other. Each computer plays $N_g$ games, randomly chosing between the choices. After all those games have been played:

What is the formula to calculate the probability that

  • any of these games is a duplicate of any other of these games?
  • one specific of these games is a duplicate of any other of these games?
  • any of the games played by one specific computer is a duplicate of any of all other games played by all computers?

(Notice, with identical i mean identical moves. Move transpositions do not count)

And further, for the question: "What is the probability that any of these games has a specific move order?" the chess friend suggested the following formula, related to the Birthday Problem. Is this the correct formula?

(I am using Python syntax, ** means ^, alias power. Here is the same as math formula using bigger numbers).

choices = 2                 # choices per move
ply = 10                    # amount of half moves (every game has the same)
P = choices ** ply          # all possible move orders
print(f'Amount of possible move orders: {P}')

comps = 10                  # amount of participating chess computers
played = 10                 # amount of games every computer plays
S = int(comps / 2 * played) # all played games
print(f'Amount of games played: {S}')

from math import factorial as fac

print(f'Probability of two identical games:',
  1 - fac(P) / fac(P - S) / P ** S
)
2

There are 2 best solutions below

2
On BEST ANSWER

For an idealized game where each player has $N$ choices at each of $P$ opportunities, the number of unique sequences is $N^P$. Now let's look at the cases

  • For any two games matching, this is the birthday problem. There are $N_gN_c$ games to match and $N^P$ possible outcomes, so \begin{align}P(\mathrm{match})& = 1 - \frac{N^P!}{N^{PN_gN_c}(N^P-N_gN_c)!}\\&\approx 1 - \exp\left[-\frac{N_gN_c(N_gN_c-1)}{2N^P}\right],\end{align} where the approximation holds if $N_g^2N_c^2 \ll N^P$.
  • If you're looking at a specific game and seeing if any other game matches it, there are $N_gN_c-1$ other games that could match it and a probability of $N^{-P}$ of a match, so \begin{align} P(\mathrm{match}) &= 1 - (1-N^{-P})^{N_gN_c-1} \\&\approx 1 - \exp\left(-\frac{N_gN_c-1}{N^P}\right). \end{align}

  • If you're looking at a specific computer's games, compared to other computer's games, we can look at the $N_g^2(N_c-1)$ pairs of games where the first game comes from the specific computer and the second comes from one of the other computers. These pairs are all independent from each other, and they each have a $N^{-P}$ chance of a match, so the probability is \begin{align} P(\mathrm{match}) &= 1- (1-N^{-P})^{N_g^2(N_c-1)} \\&\approx 1 - \exp\left[-\frac{N_g^2(N_c-1)}{N^P}\right]. \end{align}

If require them to not only match each other, but also match a particular move order, the results are different:

  • Looking at all games, the probability is one minus the probability that no games have that move order or exactly one game has that move order, which is \begin{align} P(\mathrm{match}) & =1 - (1-N^{-P})^{N_gN_c} - N_gN_cN^{-P}(1-N^{-P})^{N_gN_c - 1} \\&= 1 - \left(1 + \frac{ N_gN_c}{N^P-1}\right)(1-N^{-P})^{N_gN_c} \\&\approx 1 - \exp\left[-\frac{N_gN_c(N_gN_c-1)}{2N^{2P}}\right]. \end{align}
  • Looking at a particular game, the probability that it and another game match that move order is the probability that that game matches times the probability that any other game matches, which is \begin{align} P(\mathrm{match}) &= N^{-P}\left[1 - (1-N^{-P})^{N_gN_c-1}\right] \\&\approx N^{-P}\left[1-\exp\left(-\frac{N_gN_c-1}{N^P}\right)\right]. \end{align}
  • In the last case, this is the probability that any of the computer's $N_g$ games match the given move order times the probability that any of the other computers' $N_g(N_c-1)$ games match the given move order, which is \begin{align} P(\mathrm{match}) & = \left[1-\left(1-N^{-P}\right)^{N_g}\right]\left[1-\left(1-N^{-P}\right)^{N_g(N_c-1)}\right] \\&\approx 1 - \exp\left[-\frac{N_g^2(N_c-1)}{N^{2P}}\right]. \end{align}

Note that in each case, the $N_g^2N_c^2 \ll N^P$ limit is has essentially the same probability for the any match vs specific match case, with $N^P$ replaced with $N^{2P}$. This is because, in that limit, the pairs are essentially independent, and the problem just becomes looking at the number of relevant pairs and how likely they are to match. For any match, it's $N^P$, while for specific match, it's $N^{2P}$.

1
On

Under your assumption that every game lasts $P$ ply there are $N^P$ games, each one equally likely. The chance that two out of a population match is part of the generalized birthday problem. The number of games to get a $50\%$ chance of a match between any two is about $\sqrt{2 \log 2 N^P}$. The chance that one specific game matches some other game is just the same (within 1 game) of the chance that a specific game is played. For a number of games $k$ small compared with $N^P$ this is about $\frac k{N^P}$