I am trying to solve the following problem:
The problem
A video game tries to incentivize its players to play more matches online. To that end, the system spins a roulette every time the player wins a match, awarding a prize with a varying probability $p$ that depends on the player's previous matches.
The system works as follows:
- A new player has a 3% chance of winning the roulette. Let's call this starting probability $p_0 = 0.03$.
- If you win a match, the system spins the roulette, and you win the prize with the current probability $p$.
- If you win the match but don't win the free prize (the roulette), your chances for the next match increase by 3% ($p_{new} = p_{old} + 0.03$).
- If you lose the match, the system doesn't spin the roulette, but your chances for the next match increase by 2.5% ($p_{new} = p_{old} + 0.025$).
- If you win the match AND win the roulette, your chances are reset ($p_{new} = p_0$).
Assume that the probability of winning a match is independent of the number of matches played and of previous results; call it $v$. For now, assume that $v=0.5$.
Let $M$ be the number of matches played before winning the prize. Calculate the expected value of $M$ ($E[M]$).
What I've tried so far:
My first intuition was to create a tree diagram of the possibilities. It turned out as follows:
I think it's pretty intuitive, but I can explain it further if needed (or correct it if it has any errors). $V_i$ and $D_i$ represent a victory or defeat in match number $i$, while $R_i$ means spinning and winning the roulette after match number $i$ (and $\overline{R_i}$ the opposite).
Since $M$ is a discrete random variable, its expected value is:
$E[M] = 1 \cdot P(M=1) + 2 \cdot P(M=2) + \dots$
Looking at the tree, it turns out that:
$P(M=1) = P(V_1) \cdot P(R_1 | V_1) = 0.5 \cdot 0.03$
$P(M=2) = P(V_1) \cdot P(\overline{R_1} | V_1) \cdot P(V_2 |~ [\overline{R_1}, V_1]) \cdot P(R_2 | ~[V_2,\overline{R_1}, V_1]) + P(D_1) \cdot P(V_2 |~ D_1) \cdot P(R_2 | ~[V_2, D_1]) = 0.5 \cdot 0.97 \cdot0.5 \cdot 0.06 + 0.5 \cdot0.5 \cdot 0.055$
But I can't seem to find a generic expression for $P(M=m)$, so this approach didn't work out.
In the end, I resorted to simulating 100000 players, and the expected value turned out to be $E[M] \approx 10.3$. You can see the code for that here.
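Since the linked code isn't reproduced in this post, here is a minimal Python sketch of such a simulation (the function name, seed, and the cap of $p$ at $1$ are my own assumptions):

```python
import random

def matches_until_prize(rng, p0=0.03, v=0.5):
    """Play matches until the roulette pays out; return how many were played."""
    p = p0
    matches = 0
    while True:
        matches += 1
        if rng.random() < v:         # won the match: the roulette is spun
            if rng.random() < p:     # won the prize
                return matches
            p = min(p + 0.03, 1.0)   # won the match but not the prize: +3%
        else:
            p = min(p + 0.025, 1.0)  # lost the match: no spin, +2.5%

rng = random.Random(0)
n_players = 100_000
estimate = sum(matches_until_prize(rng) for _ in range(n_players)) / n_players
print(estimate)  # should come out close to 10.3
```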
The histogram turned out to be like this.
So, I know that the result is more or less 10.3 but I want to know the analytical way to reach that conclusion.
Any help is welcome!
P.S.: If you are curious, this system is used by the game Rainbow Six: Siege, where the prizes are cosmetics (alpha packs). You can read their FAQ here.
I'll briefly describe how you can relate this expected value to the solution of a small ($200 \times 200$) sparse linear system. I'm not optimistic that there is a "closed-form solution" for problems of this class, but it is possible to solve them exactly and at very small computational expense. I will skip many important details and attempt only to give a sketch of the technique. More information can be found in any introductory text on finite Markov chains that covers fundamental matrices.
Consider the random trajectory of a player through the prize-distribution system. We will say that the "state" of this player after $k$ rounds is $X_k$. I will consider $200$ states, numbered from $1$ to $200$:
- State $1$ means "the player has won the prize"; once there, the player stays there.
- State $i > 1$ means "the player's current roulette probability is $p = i \cdot 0.005$". Every probability the system can produce is a multiple of $0.005$ (a common divisor of the increments $0.03$ and $0.025$), and the probability is capped at state $200$, where $p = 1$.
In this notation, we have $X_0 = 6$.
Conditioned only on the value of $X_k$, we can describe the distribution of $X_{k + 1}$. (You have already done this when programming your simulation.) If $X_k = 1$, then $X_{k + 1}$ is also $1$ by our analysis. If $X_k = i > 1$, then we identify three mutually exclusive events:
- The player wins the match and the roulette (probability $v \cdot p_i$, with $p_i = i \cdot 0.005$): then $X_{k + 1} = 1$.
- The player wins the match but loses the roulette (probability $v (1 - p_i)$): then $X_{k + 1} = \min(i + 6, 200)$, since $0.03$ is six increments of $0.005$.
- The player loses the match (probability $1 - v$): then $X_{k + 1} = \min(i + 5, 200)$.
We can record this "transition rule" in a matrix with $200$ rows and $200$ columns, putting the probability that $X_{k + 1} = j$ given $X_k = i$ at position $(i, j)$. Call this matrix $M$. By the nature of the state $1$, the first row of $M$ is $[1, 0, \ldots, 0]$. We will say that $M$ has the block structure
$$M = \begin{bmatrix} 1 & 0 \\ R & Q \end{bmatrix}$$
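As a concrete illustration, here is how this matrix could be assembled with NumPy. Note the 0-based indexing is my own convention: array index $j$ corresponds to the state $j + 1$ above, so index $0$ is the absorbing "prize won" state and index $5$ is the start ($p = 0.03$).

```python
import numpy as np

def transition_matrix(v=0.5, n=200):
    # Index j (0-based) = state j + 1; state j >= 1 carries p = (j + 1) * 0.005.
    M = np.zeros((n, n))
    M[0, 0] = 1.0                               # absorbing "prize won" state
    for j in range(1, n):
        p = (j + 1) * 0.005                     # current roulette probability
        M[j, 0] += v * p                        # win match AND roulette
        M[j, min(j + 6, n - 1)] += v * (1 - p)  # win match, lose roulette: +0.03
        M[j, min(j + 5, n - 1)] += 1 - v        # lose match: +0.025
    return M

M = transition_matrix()
```

Each row sums to $1$, since the three events are mutually exclusive and exhaustive.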
Now, we skip over some important formalism (namely, the notion of a Markov process) and claim the following: the probability that $X_{k + n} = j$ given that $X_k = i$ equals the $(i, j)$th element of $M^n$.
With this observation, it is easy to compute the probability that, after $k$ rounds, the player has won a prize: this is just the element at $(6,1)$ of $M^k$. Here's a graph of this computation for $k = 1 \ldots 30$.
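Since the graph itself isn't reproduced here, a short NumPy sketch of the computation behind it (under my 0-based encoding, where the start state $6$ becomes index $5$ and the absorbing state $1$ becomes index $0$) would be:

```python
import numpy as np

n, v = 200, 0.5
M = np.zeros((n, n))
M[0, 0] = 1.0                               # index 0 = "prize won" (absorbing)
for j in range(1, n):                       # index j = roulette prob (j+1)*0.005
    p = (j + 1) * 0.005
    M[j, 0] += v * p                        # win match and roulette
    M[j, min(j + 6, n - 1)] += v * (1 - p)  # win match only: p += 0.03
    M[j, min(j + 5, n - 1)] += 1 - v        # lose match: p += 0.025

# P(prize won within k rounds) = (M^k)[start, absorbing]; start index 5.
won_by = [np.linalg.matrix_power(M, k)[5, 0] for k in range(1, 31)]
```

The first entry is $0.5 \cdot 0.03 = 0.015$, agreeing with $P(M = 1)$ from the tree, and the sequence increases toward $1$.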
We can also use this model to answer your original question. Let $T$ be the random variable that records the number of rounds we play until we win a prize, and let $N_i$ be the number of times that we visit the state $i$. An important observation is that $T = \sum_{i = 2}^{200} N_i$. It follows by linearity of expectation that $$\mathbb{E}(T) = \mathbb{E}\left[\sum_{i = 2}^{200} N_i\right] = \sum_{i = 2}^{200} \mathbb{E}[N_i]$$
There is an interesting trick to compute $\mathbb{E}[N_i]$. Using subscripts to refer to an element of a matrix and denoting by $\{e_k\}$ the canonical basis vectors of appropriate dimension, it is possible to verify that, for all $2 \le i \le 200$, $$\mathbb{E}[N_i] = \sum_{k = 0}^\infty P(X_k = i) = \sum_{k = 0}^\infty M^k_{6, i} = \sum_{k=0}^\infty Q^k_{5, i - 1} = e_5^T \left(\sum_{k=0}^\infty Q^k\right) e_{i - 1}$$ The convergence of these series is justified because $\sum_{k = 0}^\infty Q^k$ converges to the so-called fundamental matrix of the process, $(I - Q)^{-1}$. (A similar fact is true in general for finite-state Markov processes.) Therefore, where $\mathbf{1}$ is the all-ones vector, we have $$\mathbb{E}(T) = e_5^T (I - Q)^{-1} \mathbf{1} $$ Computing this with a software package (Mathematica) gives an exact result that agrees with the value you derived from simulation.
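In code (a NumPy sketch rather than Mathematica, under my 0-based encoding where $Q$ is `M[1:, 1:]` and the start state $6$ becomes row $4$ of $Q$), solving the linear system is preferable to forming the inverse explicitly:

```python
import numpy as np

n, v = 200, 0.5
M = np.zeros((n, n))
M[0, 0] = 1.0                               # index 0 = absorbing "prize won"
for j in range(1, n):                       # index j = roulette prob (j+1)*0.005
    p = (j + 1) * 0.005
    M[j, 0] += v * p                        # win match and roulette
    M[j, min(j + 6, n - 1)] += v * (1 - p)  # win match only: p += 0.03
    M[j, min(j + 5, n - 1)] += 1 - v        # lose match: p += 0.025

Q = M[1:, 1:]                               # transient-to-transient block
# E(T) = e_start^T (I - Q)^{-1} 1, i.e. one row of the solution of a linear
# system; start state 6 corresponds to row 4 after dropping the absorbing state.
expected_rounds = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))[4]
print(expected_rounds)                      # close to the 10.3 from simulation
```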