The expected outcome of a random game of chess?

6.1k Views Asked by At

Imagine a game of chess where both players generate a list of legal moves and pick one uniformly at random.

Q: What is the expected outcome for white?

  • 1 point for black checkmated, 0.5 for a draw, 0 for white checkmated. So the expected outcome is given by $$\mathrm{Pr}[\text{black checkmated}]+0.5\ \mathrm{Pr}[\text{draw}].$$
  • Neither player resigns, nor are there any draw offers or claims.

As a chess player, I'm curious if white (who plays first) has some advantage here.


I'm not expecting an exact answer to be possible. Partial results (e.g. that the expectation is >0.5) and experimental results are welcome. (The expectation is not 0 or 1, since there are possible games where white does not win and where black does not win.)

I'm guessing this has been looked at before, so I decided to ask first (rather than implement a chess engine that makes random moves and hope to find something other than "draw, draw, draw, draw, ..."). Searching for "random game of chess" lists Chess960 and other randomized variants, which is not what I want.


Technicalities:

  • En passant capturing, castling, pawn promotion, etc. all apply as usual.

  • The FIDE Laws of Chess will be updated 1 July 2014 with the following:

    9.6 If one or both of the following occur(s) then the game is drawn:

    • a. the same position has appeared, as in 9.2b, for at least five consecutive alternate moves by each player.

    • b. any consecutive series of 75 moves have been completed by each player without the movement of any pawn and without any capture. If the last move resulted in checkmate, that shall take precedence.

    This means that games of chess must be finite, and thus there is a finite number of possible games of chess.

5

There are 5 best solutions below

3
On

Update: The code below has a small, but significant oversight. I was unaware that a stalemate would not be counted the same way as a board with insufficient pieces to play and this changes the answer. @Winther has fixed the bug and reran the simulations. That said, there is still value to the code being posted so I'll leave it up for anyone else to repeat the experiments (and find more bugs!).


Slightly rephrasing your question,

Is the expected outcome for EX[white] = 1/2 in a random game?

To test this, I simulated 10^5 games using the library python-chess. The code is posted below for those wishing to repeat the numerical experiment (this takes about 4 hours on an 8-core machine). In the 100000 games, 46123 came up as wins for white and 6867 games were ties. This puts the expected value of the game at

$$ \text{EX}[white] = 0.495565 $$

Using the 2-sided normal approximation to the binomial test of a fair game, we get a p-value of 0.00511. Therefore, we can reject the null-hypothesis that the game is fair. This was surprising to me.

In other words, $\text{EX}[white]<1/2$ looks to be statistically significant, however the advantage for black is very small.

Personally, the more interesting question is the distribution of game length, hence a plot of it is included below.

import chess, random, itertools, multiprocessing
simulations = 10**5

def random_move(board):
    return random.choice(list(board.legal_moves))

def play(game_n):
    board = chess.Bitboard()
    ply = 0
    while not board.is_game_over():
        board.push( random_move(board) )
        ply += 1

    # board.turn == 0 -> White, == 1 -> Black
    return game_n, int(board.is_stalemate()), board.turn, ply

P = multiprocessing.Pool()
results = P.imap(play,xrange(simulations))

with open("results.txt",'w') as FOUT:
    for game in results:
        s = "{} {} {} {}\n".format(*game)
        FOUT.write(s)

enter image description here

There is much to be mined out of this dataset, but I am not a chess-aficionado. I'm not sure why the distribution contains two "humps", comments are welcome.

7
On

I found a bug in the code given in Hooked's answer (which means that my original reanalysis was also flawed): one also have to check for insufficient material when assessing a draw, i.e.

int(board.is_stalemate())

should be replaced with

int(board.is_insufficient_material() or board.is_stalemate())

This changes things quite a bit. The probabillity of a draw goes up quite a bit. So far with $n = 5\cdot 10^5$ samples I find

$$E[\text{Black}] \approx 0.5002$$ $$E[\text{White}] \approx 0.4998$$ $$P[\text{Draw}] \approx 84.4\%$$

A simple hypotesis test shows that with $P(\text{white})=P(\text{black})=0.078,~P(\text{draw})=0.844$ and $N=5\cdot 10^5$ samples the probabillity to get $|E[\text{Black}] - 0.5| > 0.002$ is $25\%$ so our results are perfectly consistent with $E[\text{White}]=E[\text{Black}]=0.5$. The "hump" remains, but is now easily explained: it is due to black or white winning. Either they win early or the game goes to a draw.

enter image description here
(source: folk.uio.no)

Here is one of the shortest game I found, stupid black getting matted in four moves:

enter image description here
(source: folk.uio.no)

3
On

full code and results

from queue import Queue
import chess, random, _thread
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

def outcome(board):
    if board.is_checkmate():
        if board.turn:
            return "Black"
        else:
            return "White"
    else:
        return "Draw"

#calc total moves
def moves(board):
    if board.turn:
        return board.fullmove_number * 2 - 1
    else:
        return board.fullmove_number * 2 - 2

def play(i):
    board = chess.Board()
    while not board.is_game_over():
        board.push(random.choice(list(board.legal_moves)))
    return i, outcome(board), moves(board)

def thread_wrapper(i, func, stat, q):
    def run():
        q.put(func)
        stat[i] = True
    return run

workers = 500000
status = [False for i in range(workers)]
q = Queue()
for i in range(workers):
    _thread.start_new_thread(thread_wrapper(i, play(i), status, q), tuple())

while not all(status):
    pass

results = []
while not q.empty():
    results.append(q.get())
results_df = pd.DataFrame(results, columns=['game_n', 'outcome', 'moves'])
#TODO process the results

results_df.to_csv('my_file.csv')

black = results_df.loc[results_df['outcome'] == 'Black']
white = results_df.loc[results_df['outcome'] == 'White']
draw = results_df.loc[results_df['outcome'] == 'Draw']
win = results_df.loc[results_df['outcome'] != 'Draw']

Total = len(results_df.index)
Wins = len(win.index)

PercentBlack = "Black Wins ≈ %s" % ('{0:.2%}'.format(len(black.index)/Total))
PercentWhite = "White Wins ≈ %s" % ('{0:.2%}'.format(len(white.index)/Total))
PercentDraw = "Draw ≈ %s" % ('{0:.2%}'.format(len(draw.index)/Total))
AllTitle = 'Distribution of Moves by All Outcomes (nSample = %s)' % workers

a = draw.moves
b = black.moves
c = white.moves

kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)
kdec = scipy.stats.gaussian_kde(c)

grid = np.arange(700)

#weighted kde curves
wa = kdea(grid)*(len(a)/float(len(a)+len(b)+len(c)))
wb = kdeb(grid)*(len(b)/float(len(a)+len(b)+len(c)))
wc = kdec(grid)*(len(c)/float(len(a)+len(b)+len(c)))

total = wa+wb+wc
wtotal = wb+wc

plt.figure(figsize=(10,5))
plt.plot(grid, total, lw=2, label="Total")
plt.plot(grid, wa, lw=1, label=PercentDraw)
plt.plot(grid, wb, lw=1, label=PercentBlack)
plt.plot(grid, wc, lw=1, label=PercentWhite)
plt.title(AllTitle)
plt.ylabel('Density')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()

ExpectedBlack = "EV Black Wins ≈ %s" % ('{0:.2%}'.format(len(black.index)/Wins))
ExpectedWhite = "EV White Wins ≈ %s" % ('{0:.2%}'.format(len(white.index)/Wins))
WinTitle = 'Distribution of Moves by Wins (nWins = %s)' % Wins

plt.figure(figsize=(10,5))
plt.plot(grid, wtotal, lw=2, label="Wins")
plt.plot(grid, wb, lw=1, label=ExpectedBlack)
plt.plot(grid, wc, lw=1, label=ExpectedWhite)
plt.title(WinTitle)
plt.ylabel('Density')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()

print("Most frequent moves of All:", grid[total.argmax()], round(max(total), 4), "for", Total, "games")
print("Most frequent moves of Draws:", grid[wa.argmax()], round(max(wa), 4), "for", len(draw.index), "games")
print("Most frequent moves of Wins:", grid[wtotal.argmax()], round(max(wtotal), 4), "for", Wins, "games")
print("Most frequent moves of Black wins:", grid[wb.argmax()], round(max(wb), 4), "for", len(black.index), "games")
print("Most frequent moves of White wins:", grid[wc.argmax()], round(max(wc), 4), "for", len(white.index), "games")

All results:

All Results

Non-draw results:

Results of Wins

Most frequent moves of All: 368 0.0036 for 500000 games

Most frequent moves of Draws: 370 0.0035 for 422856 games

Most frequent moves of Wins: 135 0.0008 for 77144 games

Most frequent moves of Black wins: 133 0.0004 for 38546 games

Most frequent moves of White wins: 137 0.0004 for 38598 games

1
On

@Winther, @Hooked, @yaoster I don't have enough reputation to comment, I noticed that your code assumed board.turn==1 to mean Black's turn, but please correct me if I'm wrong, it seems the opposite is true as I checked python-chess documentation (ie. board.turn==1 means White's turn).

If this is true, your analysis would suggest White has a slight advantage, which is what most people would expect?

From python-chess documentation:

chess.WHITE = True

http://python-chess.readthedocs.io/en/latest/core.html?highlight=turn#chess.WHITE

0
On

I know 350 games are not enough samples, but I will include this in case someone was curious about the draw-type distribution:

enter image description here

Note: in the case of multiple types (example: Checkmate + Fifty Move Rule) it only increased the first one in order of priority: Checkmate, Stalemate, 3-fold-rep, Insufficient Material, 50-move-rule.