I have coded two C# programs, which use two different approaches to evaluate the outcome of a certain casino-style game (casino-style in the sense that the user pays points to take a turn, and sometimes receives points as a reward depending on that turn’s result). Program 1 calculates the average profitability of the best game play decision for each possible game state, starting at a round's end and working to the beginning. The average profitability of the starting game state is equivalent to, thus can be used to infer, the average profitability of the game as a whole. Program 1 also outputs a proper strategy for the game.
Program 2 accepts a strategy as input (I use the same strategy generated by Program 1), and simulates actual beginning-to-end game play using that strategy, cycling through many iterations of the game to gather statistics. This program outputs the return rate of the game based on the simulated trials (100% being breakeven).
Desired behavior: To produce correct and non-contradictory results in Program 1’s gamePlay.value variable for the starting game state (representing the game’s profitability in points), and Program 2’s returnRate variable (representing the game’s return rate).
Specific problem: Program 1’s gamePlay.value variable for the starting game state (Colorless, Colorless, Colorless) outputs 51.025 when the user inputs the same starting parameters as those which are hard-coded into Program 2 (namely, cost = 51 and baseBet = 50). A secondary task of Program 1 is to calculate the average number of turns remaining in the round, for each possible game state. Again, by noting this value for the starting state, the average number of turns in the round as a whole is known. There are, on average, 4.246 turns per round. By multiplying this number by the cost per turn, 51, we see that the average cost per round is 216.546. Adding the 51.025 profit yields 267.571, and dividing this number by the cost per round reveals a 123.563% return rate for the game.
This is not what Program 2 calculates, even using an extremely large number of game play samples. Some output samples, each of which are the result of one million game play turns, include:
1.00978242220553 1.00976014965254 1.00977590536083 1.0098289475708 1.00979315220468
123.563% and 100.98% are very far from each other.
Code to reproduce problem:
PROGRAM 1
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace theBlindPainterAnalyzerSimplified
{
public partial class Form1 : Form
{
double cost;
double baseBet;
public Form1()
{
InitializeComponent();
}
private void calculateStrategyButton_Click(object sender, EventArgs e) // This is the primary code, analogous to main()
{
cost = Convert.ToDouble(textBox1.Text);
baseBet = Convert.ToDouble(textBox2.Text);
GameState[] gameStates = new GameState[10];
for (int h = 0; h < gameStates.Length; h++)
{
gameStates[h] = new GameState();
}
int i = 0;
foreach (var c in BuildCombinations(Enum.GetValues(typeof(OrderedColors)).Cast<OrderedColors>().Reverse().ToArray(), 3))
{
int j = 0;
foreach (var panel in c)
{
gameStates[i].colors[j] = panel;
j++;
}
i++;
}
foreach (var gameState in gameStates) // Stores the expected value of each decision from each game state
{
int numPreviousStrokes = 0;
for (int j = 0; j < gameState.colors.Length; j++)
{
if (gameState.colors[j] == OrderedColors.Purple)
{
numPreviousStrokes++;
}
else if (gameState.colors[j] == OrderedColors.Blue)
{
numPreviousStrokes += 2;
}
}
double zeroProfitability = 0;
double oneProfitability = checkProfitability(gameState, 1, gameStates);
double twoProfitability = -4;
double threeProfitability = -4;
if (numPreviousStrokes >= 2)
{
twoProfitability = checkProfitability(gameState, 2, gameStates);
}
if (numPreviousStrokes >= 3)
{
threeProfitability = checkProfitability(gameState, 3, gameStates);
}
if (zeroProfitability > oneProfitability && zeroProfitability > twoProfitability && zeroProfitability > threeProfitability)
{
gameState.optimalPlay = 0;
gameState.value = zeroProfitability;
}
else if (oneProfitability > zeroProfitability && oneProfitability > twoProfitability && oneProfitability > threeProfitability)
{
gameState.optimalPlay = 1;
gameState.value = oneProfitability;
}
else if (twoProfitability > zeroProfitability && twoProfitability > oneProfitability && twoProfitability > threeProfitability)
{
gameState.optimalPlay = 2;
gameState.value = twoProfitability;
}
else if (threeProfitability > zeroProfitability && threeProfitability > oneProfitability && threeProfitability > twoProfitability)
{
gameState.optimalPlay = 3;
gameState.value = threeProfitability;
}
else
{
MessageBox.Show("Draw!");
}
gameState.remainingTurnCount = checkRemainingTurnCount(gameState, gameStates);
richTextBox1.Text += gameState.colors[0] + "," + gameState.colors[1] + "," + gameState.colors[2] + "," + gameState.optimalPlay + "," + gameState.value + "," + gameState.remainingTurnCount + "\n";
}
}
private double checkProfitability(GameState state, int numStrokes, GameState[] gameStates) // Calculates the expected value of making a particular decision from a particular game state
{
double[] possiblePayoffs = new double[50000];
int pPIndex = 0;
double sumOfPossiblePayoffs = 0;
double averagePayoff = 0;
double payoff = -cost;
for (int i = 0; i < Math.Pow(3, numStrokes); i++)
{
int innerForIndex = i;
for (int j = 0; j < numStrokes; j++)
{
state.colors[innerForIndex % 3]++;
innerForIndex /= 3;
}
if ((int)state.colors[0] <= 2 && (int)state.colors[1] <= 2 && (int)state.colors[2] <= 2)
{
int numPurple = 0;
int numBlue = 0;
int numNonZeros = 0;
for (int panel = 0; panel <= 2; panel++)
{
if (state.colors[panel] == OrderedColors.Purple)
{
numPurple++;
}
if (state.colors[panel] == OrderedColors.Blue)
{
numBlue++;
}
}
if (numPurple != 0)
{
numNonZeros++;
}
if (numBlue != 0)
{
numNonZeros++;
}
switch (numPurple)
{
case 2:
payoff += (.7 * baseBet);
break;
case 3:
payoff += (2 * baseBet);
break;
}
switch (numBlue)
{
case 2:
payoff += baseBet;
break;
case 3:
payoff += (4.12 * baseBet);
break;
}
if (numPurple + numBlue == 3 && numPurple != 0 && numBlue != 0)
{
payoff += (4 * baseBet);
}
}
OrderedColors[] currentColors = (OrderedColors[])state.colors.Clone(); // Temporary clone of array used to find corrosponding game state
Array.Sort(currentColors);
Array.Reverse(currentColors);
foreach (var gameState in gameStates)
{
if (areArraysEqual(gameState.colors, currentColors))
{
payoff += gameState.value;
break;
}
}
possiblePayoffs[pPIndex] = payoff;
pPIndex++;
payoff = -cost;
innerForIndex = i;
for (int j = 0; j < numStrokes; j++)
{
state.colors[innerForIndex % 3]--;
innerForIndex /= 3;
}
}
for (int i = 0; i <= pPIndex; i++)
{
sumOfPossiblePayoffs += possiblePayoffs[i];
}
averagePayoff = sumOfPossiblePayoffs / pPIndex;
return averagePayoff;
}
private double checkRemainingTurnCount(GameState state, GameState[] gameStates)
{
double[] possibleTurnAverages = new double[50000];
int pTAIndex = 0;
double sumOfPossibleTurnAverages = 0;
double turns = -4;
if (state.optimalPlay == 0)
{
turns = 0;
}
else
{
for (int i = 0; i < Math.Pow(3, state.optimalPlay); i++)
{
int innerForIndex = i;
for (int j = 0; j < state.optimalPlay; j++)
{
state.colors[innerForIndex % 3]++;
innerForIndex /= 3;
}
if ((int)state.colors[0] <= 3 && (int)state.colors[1] <= 3 && (int)state.colors[2] <= 3)
{
OrderedColors[] currentColors = (OrderedColors[])state.colors.Clone(); // Temporary clone of array used to find corrosponding game state
Array.Sort(currentColors);
Array.Reverse(currentColors);
foreach (var gameState in gameStates)
{
if (areArraysEqual(gameState.colors, currentColors))
{
possibleTurnAverages[pTAIndex] = gameState.remainingTurnCount + 1;
pTAIndex++;
break;
}
}
}
else
{
possibleTurnAverages[i] = 1;
pTAIndex++;
}
innerForIndex = i;
for (int j = 0; j < state.optimalPlay; j++)
{
state.colors[innerForIndex % 3]--;
innerForIndex /= 3;
}
}
for (int i = 0; i <= pTAIndex; i++)
{
sumOfPossibleTurnAverages += possibleTurnAverages[i];
}
turns = sumOfPossibleTurnAverages / pTAIndex;
}
return turns;
}
private static IEnumerable<T[]> BuildCombinations<T>(T[] items, int itemsCountInCombination, int startIndex = 0) // Finds all possible unique game states; I did not write this code myself
{
if (itemsCountInCombination == 0)
{
yield return new T[0];
yield break;
}
for (int i = startIndex; i < items.Length; i++)
{
foreach (var combination in BuildCombinations(items, itemsCountInCombination - 1, i))
{
var c = new T[itemsCountInCombination];
c[0] = items[i];
Array.Copy(combination, 0, c, 1, combination.Length);
yield return c;
}
}
}
private bool areArraysEqual<T>(T[] array1, T[] array2)
{
if (Object.ReferenceEquals(array1, array2))
{
return true;
}
if (array1.Length != array2.Length)
{
return false;
}
for (int i = 0; i < array1.Length; i++)
{
if (array1[i].Equals(array2[i]) == false)
{
return false;
}
}
return true;
}
}
public class GameState
{
public OrderedColors[] colors = { OrderedColors.Colorless, OrderedColors.Colorless, OrderedColors.Colorless };
public int optimalPlay = -4; // The value 0 has a specific meaning
public double value = 0;
public double remainingTurnCount;
}
public enum OrderedColors
{
Colorless,
Purple,
Blue
}
}
__
PROGRAM 2
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
namespace theBlindPainterTesterContinuousSimplified
{
public partial class Form1 : Form
{
Random rn1_3 = new Random();
float cost = 51;
float baseBet = 50;
static double startScore = 100000;
static double score = startScore;
int cycledPoints = 0;
double returnRate = 0;
bool isRoundOver = false;
OrderedColors[] panelColors = { OrderedColors.Colorless, OrderedColors.Colorless, OrderedColors.Colorless };
public Form1()
{
InitializeComponent();
}
private void startButton_Click(object sender, EventArgs e)
{
string strategyFileText;
using (StreamReader strategyFile = new StreamReader("strategy.txt"))
{
strategyFileText = strategyFile.ReadToEnd();
}
string[] strats = strategyFileText.Split('\n');
for (int h = 0; h < 1000000; h++)
{
isRoundOver = false;
int selectedPlay = selectPlay(strats);
if (selectedPlay == 0)
{
endRound();
}
else if (selectedPlay >= 1 && selectedPlay <= 3)
{
score = score - cost;
cycledPoints += Convert.ToInt32(cost);
for (int i = 0; i < selectedPlay; i++)
{
if (isRoundOver == false)
{
paintPanel();
}
}
payOut();
}
else
{
MessageBox.Show(selectedPlay + " is not a valid play.");
}
}
if (startScore != 0 && cycledPoints != 0)
{
returnRate = Math.Pow(score / startScore, startScore / cycledPoints);
}
else
{
MessageBox.Show("Division by zero error.");
}
richTextBox1.Text = returnRate.ToString();
}
// Retrieves the appropriate game play for the current panel colors
private int selectPlay(string[] strats)
{
int play = -4; // The value 0 has a specific meaning
Array.Sort(panelColors);
Array.Reverse(panelColors);
foreach (string strat in strats)
{
string[] stratComponents = strat.Split(',');
int matches = 0;
for (int j = 0; j <= 2; j++)
{
if (stratComponents[j] == panelColors[j].ToString())
{
matches++;
}
}
if (matches == 3)
{
play = Convert.ToInt32(stratComponents[3]);
break;
}
}
return play;
}
// Paints a single randomly selected panel
private void paintPanel()
{
int primedPanel = rn1_3.Next(1, 4);
if (panelColors[primedPanel - 1] == OrderedColors.Colorless)
{
panelColors[primedPanel - 1] = OrderedColors.Purple;
}
else if (panelColors[primedPanel - 1] == OrderedColors.Purple)
{
panelColors[primedPanel - 1] = OrderedColors.Blue;
}
else if (panelColors[primedPanel - 1] == OrderedColors.Blue)
{
endRound();
}
}
// Adjusts score and takes action if the game is over
private void payOut()
{
int numPurple = 0;
int numBlue = 0;
int numNonZeros = 0;
for (int panel = 0; panel <= 2; panel++)
{
if (panelColors[panel] == OrderedColors.Purple)
{
numPurple++;
}
if (panelColors[panel] == OrderedColors.Blue)
{
numBlue++;
}
}
if (numPurple != 0)
{
numNonZeros++;
}
if (numBlue != 0)
{
numNonZeros++;
}
switch (numPurple)
{
case 2:
score += (.7 * baseBet);
break;
case 3:
score += (2 * baseBet);
break;
}
switch (numBlue)
{
case 2:
score += baseBet;
break;
case 3:
score += (4.12 * baseBet);
break;
}
if (numPurple + numBlue == 3 && numPurple != 0 && numBlue != 0)
{
score += (4 * baseBet);
}
}
private void endRound()
{
isRoundOver = true;
for (int panel = 0; panel <= 2; panel++)
{
panelColors[panel] = OrderedColors.Colorless;
}
}
}
public enum OrderedColors
{
Colorless,
Purple,
Blue
}
}
__
What I Have Considered: It is difficult to isolate the problem because Program 1 solves the game using backward induction, while Program 2 gathers information by playing the game start to finish many times, two fundamentally different approaches. I have spent a fairly substantial amount of time with Program 1 in the debugger, including working out the results for the last few game states on paper, and it seems to be functioning properly and to a very respectable precision, as far as I can tell. Program 2 is more difficult to do this with due to the inclusion of randomness in the game, but I have stepped through a small number of iterations and the calculations seem to me to be on point.
Can anyone clarify the reason that these two approaches to return rate calculation produce mathematically conflicting results? Is one of the approaches invalid, or am I looking at an error in the execution of one approach?
The game (this is detailed info about the game being studied, feel free to skip this section): This game consists of three objects called panels, which the user paints. Each panel starts out Colorless. At first, the user must select 1, which paints one random panel. A Colorless panel will turn Purple when painted. A Purple panel will turn Blue when painted again. A Blue panel, when painted again, will immediately cause all panels to return to Colorless, and a new round begins. Once the user has painted a panel, he/she can, on the next turn, select 1 or 2, which will paint one random panel, or two random panels, respectively. Once two units of paint have been applied, the user can select 1, 2, or 3, and the appropriate result is applied. When a new round begins, only option 1 is available until 2 and 3 are unlocked again.
Each turn costs 51 points, regardless of whether 1, 2, or 3 is selected. There is a payout table, as laid out in the code, which may award points back to the user based on the resulting state of the panels. The user can also decide to start a new round at any time, and revert all panels to Colorless, which does not have a cost. The goal of the game is to earn rather than lose points, on average.
Requested link: https://stackoverflow.com/questions/26616210/why-does-the-return-rate-of-a-game-appear-to-vary-when-output-by-two-different-p