Calculating the expected amount of matches played before winning an Alpha Pack in Rainbow Six: Siege


I am trying to solve the following problem:

The problem

A video game wants to incentivize its players to play more matches online. To that end, the system spins a roulette every time a player wins a match, and the player wins a prize with a probability ($p$) that varies based on his previous matches.

The system works as follows:

  • A new player has a 3% chance of winning the roulette. Let's call this starting probability $p_0 = 0.03$.
  • If you win a match, the system spins the roulette, and you win the prize with the current probability $p$.
  • If you win the match but don't win the prize on the roulette, your chance for the next match increases by 3% ($p_{new} = p_{old} + 0.03$).
  • If you lose the match, the system doesn't spin the roulette, but your chance for the next match increases by 2.5% ($p_{new} = p_{old} + 0.025$).
  • If you win the match AND win the roulette, your chance resets ($p_{new} = p_0$).

Assume that the probability of winning a match is independent of the number of matches played and of previous results, and call it $v$. For now, assume that $v = 0.5$.

Let $M$ be the number of matches played until winning the prize. Calculate its expected value, $E[M]$.

What I've tried so far:

My first intuition was to draw a tree diagram of the possibilities. It turned out as follows:

I think it's fairly intuitive, but I can explain it further if needed (or correct it if it has errors). $V_i$ and $D_i$ represent a victory or defeat in match number $i$, while $R_i$ means spinning and winning the roulette after match number $i$ (and $\overline{R_i}$ the opposite).

Since $M$ is a discrete random variable, its expected value is:

$E[M] = 1 \cdot P(M=1) + 2 \cdot P(M=2) + \ldots$

Looking at the tree, it turns out that:

$P(M=1) = P(V_1) \cdot P(R_1 | V_1) = 0.5 \cdot 0.03$

$P(M=2) = P(V_1) \cdot P(\overline{R_1} | V_1) \cdot P(V_2 |~ [\overline{R_1}, V_1]) \cdot P(R_2 | ~[V_2,\overline{R_1}, V_1]) + P(D_1) \cdot P(V_2 |~ D_1) \cdot P(R_2 | ~[V_2, D_1]) = 0.5 \cdot 0.97 \cdot0.5 \cdot 0.06 + 0.5 \cdot0.5 \cdot 0.055$

But I can't seem to find a general expression for $P(M=m)$, so this approach didn't work out.

In the end, I resorted to simulating 100,000 players, and the estimate turned out to be $E[M] \approx 10.3$. You can see the code for that here.
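In case that link is unavailable, here is a minimal Python sketch of such a simulation, following the rules above (function and variable names are my own):

```python
import random

def matches_until_prize(p0=0.03, v=0.5):
    """Simulate one player; return the number of matches played
    up to and including the one whose roulette spin wins the prize."""
    p, matches = p0, 0
    while True:
        matches += 1
        if random.random() < v:      # won the match: spin the roulette
            if random.random() < p:
                return matches       # won the prize: done
            p += 0.03                # won the match, lost the roulette
        else:
            p += 0.025               # lost the match: no spin
        p = min(p, 1.0)              # the chance caps at 100%

random.seed(0)
n_players = 100_000
estimate = sum(matches_until_prize() for _ in range(n_players)) / n_players
print(estimate)  # close to 10.3
```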

The histogram turned out like this.

So I know the result is roughly 10.3, but I want to know how to reach that conclusion analytically.

Any help is welcome!

P.S.: If you are curious, this system is used by the game Rainbow Six: Siege, where the prizes are cosmetic items (Alpha Packs). You can read their FAQ here.

Answer:

I'll briefly describe how you can relate this expected value with the solution to a small ($200 \times 200$) sparse linear system. I'm not optimistic that there is a "closed-form solution" for problems of this class, but it is possible to solve them exactly and at very small computational expense. I will skip many important details and attempt only to give a sketch of the technique. More information can be found in any introductory text on finite Markov chains that covers fundamental matrices.

Consider the random trajectory of a player through the prize-distribution system. We will say that the "state" of this player after $k$ rounds is $X_k$. I will consider $200$ states, numbered from $1$ to $200$:

  • State $1$: The player has won a prize.
  • States $2 \le i \le 200$: The player has not won a prize, and the current roulette probability is $5i/1000$. Not all states in this class are reachable; I have included all of them for simplicity.

In this notation, we have $X_0 = 6$ (since $p_0 = 0.03 = 5 \cdot 6 / 1000$).

Conditioned only on the value of $X_k$, we can describe the distribution of $X_{k + 1}$. (You have already done this when programming your simulation.) If $X_k = 1$, then $X_{k + 1} = 1$ as well, since the prize has already been won. If $X_k = i > 1$, then we identify three mutually exclusive events:

  • With probability $\frac 1 2$, the player loses the next round: $X_{k + 1} = \operatorname{min}(i + 5, 200)$.
  • With probability $\frac 1 2 \left(\frac{5i}{1000}\right)$, the player wins the round and wins the prize: $X_{k + 1} = 1$.
  • With probability $\frac 1 2 \left(1 - \frac{5i}{1000}\right)$, the player wins the round but does not win the prize: $X_{k + 1} = \operatorname{min}(i + 6, 200)$.
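As a concrete illustration, this transition rule can be assembled as a matrix in Python with NumPy (a sketch under the state numbering above, not the Mathematica code used later):

```python
import numpy as np

N = 200                  # state 1 = prize won; state i >= 2 has p = 5i/1000
M = np.zeros((N, N))     # M[i-1, j-1] = P(X_{k+1} = j | X_k = i)
M[0, 0] = 1.0            # state 1 is absorbing

for i in range(2, N + 1):
    p = 5 * i / 1000                              # current roulette probability
    M[i - 1, min(i + 5, N) - 1] += 0.5            # lose the match
    M[i - 1, 0]                 += 0.5 * p        # win the match and the prize
    M[i - 1, min(i + 6, N) - 1] += 0.5 * (1 - p)  # win the match, miss the prize
```

Every row sums to 1, which is a quick sanity check on the construction.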

We can record this "transition rule" in a matrix with $200$ rows and $200$ columns, putting the probability that $X_{k + 1} = j$ given $X_k = i$ at position $(i, j)$. Call this matrix $M$ (not to be confused with the random variable $M$ from the question). By the nature of the state $1$, the first row of $M$ is $[1, 0 \ldots 0]$. We will say that $M$ has the block structure

$$M = \begin{bmatrix} 1 & 0 \\ R & Q \end{bmatrix}$$

Now, we skip over some important formalism (namely, the notion of a Markov process) and claim the following: the probability that $X_{k + n} = j$ given that $X_k = i$ equals the $(i, j)$th element of $M^n$.

With this observation, it is easy to compute the probability that, after $k$ rounds, the player has won a prize: this is just the element at $(6,1)$ of $M^k$. Here's a graph of this computation for $k = 1 \ldots 30$.

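That curve can be reproduced with plain matrix powers; a self-contained NumPy sketch (rebuilding the same $M$ as above):

```python
import numpy as np

# Rebuild the 200x200 transition matrix described in the text.
N = 200
M = np.zeros((N, N))
M[0, 0] = 1.0
for i in range(2, N + 1):
    p = 5 * i / 1000
    M[i - 1, min(i + 5, N) - 1] += 0.5            # lose the match
    M[i - 1, 0]                 += 0.5 * p        # win the match and the prize
    M[i - 1, min(i + 6, N) - 1] += 0.5 * (1 - p)  # win the match, miss the prize

# P(prize won within k rounds), i.e. the (6, 1) entry of M^k (1-indexed):
Mk = np.eye(N)
won_by = []
for k in range(1, 31):
    Mk = Mk @ M
    won_by.append(Mk[5, 0])
```

The first entry is $0.5 \cdot 0.03 = 0.015$, matching $P(M = 1)$ from the question, and the sequence increases toward $1$.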

We can also use this model to answer your original question. Let $T$ be the random variable that records the number of rounds we play until we win a prize, and let $N_i$ be the number of times that we visit the state $i$. An important observation is that $T = \sum_{i = 2}^{200} N_i$. It follows by linearity of expectation that $$\mathbb{E}(T) = \mathbb{E}\left[\sum_{i = 2}^{200} N_i\right] = \sum_{i = 2}^{200} \mathbb{E}[N_i]$$

There is an interesting trick to compute $\mathbb{E}[N_i]$. Using subscripts to refer to an element of a matrix and denoting by $\{e_k\}$ the canonical basis vectors of appropriate dimension, it is possible to verify that, for all $2 \le i \le 200$, $$\mathbb{E}[N_i] = \sum_{k = 0}^\infty P(X_k = i) = \sum_{k = 0}^\infty M^k_{6, i} = \sum_{k=0}^\infty Q^k_{5, i - 1} = e_5^T \left(\sum_{k=0}^\infty Q^k\right) e_{i - 1}$$ The convergence of these series is justified because $\sum_{k = 0}^\infty Q^k$ converges to the so-called fundamental matrix of the process, $(I - Q)^{-1}$. (A similar fact holds in general for absorbing finite-state Markov chains.) Therefore, where $\mathbf{1}$ is the all-ones vector, we have $$\mathbb{E}(T) = e_5^T (I - Q)^{-1} \mathbf{1}$$ Computing this with a software package (Mathematica) gives an exact result that agrees with the value you derived from simulation.
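For readers without Mathematica, the same computation can be sketched in NumPy by solving $(I - Q)\,t = \mathbf{1}$ numerically rather than exactly (index conventions are mine: transient state $i$ maps to row $i - 2$):

```python
import numpy as np

N = 200
# Q: transitions among the transient states 2..200 only.
Q = np.zeros((N - 1, N - 1))
for i in range(2, N + 1):
    p = 5 * i / 1000
    Q[i - 2, min(i + 5, N) - 2] += 0.5            # lose the match
    Q[i - 2, min(i + 6, N) - 2] += 0.5 * (1 - p)  # win the match, miss the prize
    # the remaining mass, 0.5 * p, leads to absorption (prize won)

# Expected rounds until absorption from each transient state: t = (I - Q)^{-1} 1
t = np.linalg.solve(np.eye(N - 1) - Q, np.ones(N - 1))
print(t[6 - 2])   # start at state 6 (p_0 = 0.03); about 10.3 per the simulation
```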