Expected stop time for biased gambler's ruin


Suppose you have \$10 at hand, and you play a game which costs \$2 each time. With probability $p=\frac{1}{3}$ you get \$5 back, and with probability $q=\frac{2}{3}$ you get nothing. You would like to play the game as many times as you can (that is, whenever you have at least \$2 at hand, you play). What is the expected number of times you can play?

Here is my attempt:

Let $E(X_t)$ be the expected money at hand after $t$ plays. We also know $E(X_0) = 10$.

From the question, we know $E(X_{t+1}) = E(X_t) - \frac{1}{3}$, so $E(X_{t+1} + \frac{t+1}{3}) = E(X_t + \frac{t}{3})$. Hence: $$E\Big(X_t + \frac{t}{3}\Big) = E(X_0) = 10$$

Let $\tau$ be the stopping time; then: $$E\Big(X_\tau + \frac{\tau}{3}\Big) = 10$$

So the expected stopping time is: $$E(\tau) = 3\big(10 - E(X_\tau)\big)$$

However, I don't know the value of $E(X_\tau)$: when the game is over, we could end with either \$1 or \$0, and I couldn't work out the corresponding probabilities.

I also tried simulating it 10,000,000 times; it gives around 28.41.
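A simulation along these lines can be sketched as follows (a minimal Python sketch; the function names are my own):

```python
import random

def play_until_broke(start=10, cost=2, payout=5, p_win=1/3, rng=random):
    """Play while at least `cost` is at hand; return the number of plays."""
    money, plays = start, 0
    while money >= cost:
        money -= cost           # pay $2 to play
        if rng.random() < p_win:
            money += payout     # win $5 with probability 1/3
        plays += 1
    return plays

def estimate_expected_plays(trials=50_000, seed=0):
    """Monte Carlo estimate of the expected number of plays."""
    rng = random.Random(seed)
    return sum(play_until_broke(rng=rng) for _ in range(trials)) / trials
```

With enough trials the estimate settles near 28.41.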

Best Answer

A lot of this comes down to what you know, as there are many moving parts underlying this. Since this is tagged 'martingale', the solution I give leans heavily on martingales and on the basics of linear recurrences.

Note: there are essentially two different ways to represent the problem. One is that you start at 10 dollars and terminate when the bankroll is $\leq 1$; the other is that you start at 0 dollars and terminate when the bankroll is $\leq -9$. There is a simple translation between the two, so I'll freely use whichever is most convenient.

i.) Let $S_n =X_1+...+ X_n$, where the $X_i$ are iid taking values $\{-2,3\}$ with probabilities $\frac{2}{3}$ and $\frac{1}{3}$ respectively
(this is the start-at-0 formulation, where the stopping rule is to terminate when the bankroll is $\leq -9$).

Let $A_n$ denote the event $S_n \gt -9$, then for $n$ large enough we have
$P\big(\text{threshold of -9 has not been crossed over n trials}\big)=P\big(A_1\cap ...\cap A_n\big)\leq P\big(A_n\big)\leq \rho^n $
for some $\rho\in(0,1)$
(Use Chernoff type bound or Azuma-Hoeffding)

This means that the tail of our stopping time random variable $T$ is bounded above by a geometric random variable. (Or: for some constant $c$ we have that $T+c$ is stochastically smaller than some geometric random variable.)
Thus $T$ is finite WP1 and has all moments. In particular $E\big[T\big]\lt \infty$
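To make the bound concrete: the increments lie in $[-2,3]$ (range $5$) with mean $\mu = -\frac{1}{3}$, so Hoeffding's inequality gives, for $n > 27$,
$$P(A_n)=P\Big(S_n - n\mu \gt \frac{n}{3}-9\Big)\leq \exp\Big(-\frac{2(n/3-9)^2}{25n}\Big)$$
and since the exponent grows like $-\frac{2n}{225}$, the right-hand side is eventually $\leq \rho^n$ for any $\rho\in\big(e^{-2/225},1\big)$.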

ii.) Applying Wald's equation we have
$E\big[S_T\big] = E\big[T\big]\cdot E\big[X\big]\implies E\big[T\big] = \frac{E\big[S_T\big]}{E\big[X\big]}$
now
$E\big[X\big] = \frac{2}{3}\cdot (-2) + \frac{1}{3}\cdot 3=-\frac{1}{3}$ and
$E\big[S_T\big]= p\cdot (-9) + (1-p)\cdot (-10)$

so we need to solve for $p$, the probability that we stop after total net losses of exactly 9
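Note that a net loss of exactly $9$ means stopping with \$1 in hand, and a net loss of $10$ means stopping with \$0, so $E(X_\tau) = p$ in the asker's notation. Substituting into Wald's identity:
$$E\big[T\big] = \frac{p\cdot(-9)+(1-p)\cdot(-10)}{-\frac{1}{3}} = 3\,(10-p)$$
which matches the asker's formula $E(\tau) = 3\big(10 - E(X_\tau)\big)$.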

iii.) Write this as a countable-state Markov chain with states $\big\{0,1,2,\dots\big\}$
(This is the start at 10 and terminate at 0 or 1 formulation)
States 0 and 1 are absorbing states, and states $\geq 2$ are transient, forming a single communicating class. What (i) tells us is that states 0 and 1 form a persistent set, i.e. absorption happens WP1 in either state 0 or 1, no matter what the starting state is. (Technically we only verified this for starting in state 10 but the transient states form a single communicating class so they all inherit this property.)

There is a transition matrix $P$ associated with this chain, and we want to find some $\mathbf x$ that satisfies $P\mathbf x = \mathbf x$

So, for state $k\geq 2$ we have a recurrence
$x_k = \frac{2}{3}x_{k-2}+\frac{1}{3}x_{k+3}$ or
$x_{k+3}=3x_k-2 x_{k-2}$

and we also have boundary conditions $x_1:=1$ and $x_0:=0$. (When there are only 2 absorbing states, the standard technique is to give a reward of 1 for one of them and a reward of 0 for the other, in effect converting absorption into a Bernoulli trial.)

thus if we solve the recurrence we will have
$P\mathbf x = \mathbf x$ where $x_0=0$ and $x_1 = 1$ and for probabilistic reasons we want to impose the additional constraint that each
$x_k\in [0,1]$

As a result, given our start in state 10, we have:
$\mathbf e_{10}^TP^n \mathbf x=\mathbf e_{10}^T\big(P^n \mathbf x\big) = \mathbf e_{10}^T \mathbf x = x_{10}$
but since $\lim_{n\to \infty}\mathbf e_{10}^TP^n = \mathbf z^T$ where $z_0 =1-p$ and $z_1=p$ and $z_i =0$ for $i\geq 2$, this means
$p =(1-p)\cdot 0 + p \cdot 1= \mathbf z^T\mathbf x=\lim_{n\to \infty}\big(\mathbf e_{10}^T P^n\big) \mathbf x = \lim_{n\to \infty}\mathbf e_{10}^T P^n \mathbf x = x_{10}$
in short $P\mathbf x = \mathbf x$ gives us a very useful martingale.

(if one is concerned about analytic subtleties relating to the above limits, there are many ways to add further justification. A very simple exercise is to show that if state $j$ is absorbing, and $x_j:=1$, and in general $x_k\in[0,1]$ for some $\mathbf x = P\mathbf x$, then $p_{k,j}^{(n)}\leq x_k$ for all n. After taking a limit of n with $j:=1$ and then repeating with $j:=0$, using this time $\mathbf x':=\mathbf 1-\mathbf x$ we see that these inequalities must in fact be met with equality, after taking the limit, in each case.)
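Before solving the recurrence in closed form, the fixed-point equations can be sanity-checked numerically by value iteration on a truncated chain (a sketch; the truncation level $N=200$ is my own choice, justified informally by the downward drift, which makes high states very unlikely to be reached):

```python
def absorption_prob(start=10, N=200, sweeps=3000):
    """Iterate x_k = (2/3) x_{k-2} + (1/3) x_{k+3} with x_0 = 0, x_1 = 1.

    States above N are clamped to x_N; since the walk drifts downward,
    the resulting truncation error at moderate states is negligible.
    """
    x = [0.0] * (N + 4)
    x[1] = 1.0
    for _ in range(sweeps):
        new = x[:]
        for k in range(2, N + 1):
            new[k] = (2 / 3) * x[k - 2] + (1 / 3) * x[k + 3]
        for k in range(N + 1, N + 4):
            new[k] = new[N]   # clamp the overflow states
        x = new
    return x[start]
```

This yields $x_{10}\approx 0.5299$, consistent with the closed-form computation in (iv).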

iv.) Solve the recurrence however you like. Its characteristic polynomial is $\lambda^5-3\lambda^2+2$, which has 5 distinct roots
https://www.wolframalpha.com/input/?i=x%5E5-3x%5E2%2B2
but $\lambda_1=1$ and $\lambda_2 \approx-0.76219$ are the only two with modulus $\leq 1$. Note that after factoring out the root at 1, the remaining polynomial is $x^4+x^3+x^2-2x-2$, which in principle is solvable exactly.

since the linear recurrence has 5 distinct roots you'd normally take it to be of the form
$x_k = y_1 \cdot \lambda_1^k+y_2 \cdot \lambda_2^k+y_3 \cdot \lambda_3^k+y_4 \cdot \lambda_4^k+y_5 \cdot \lambda_5^k$

but there are 5 distinct $\lambda_i$ and only 2 boundary conditions.

heuristic: throw out all the $\lambda_i$ with $\big \vert\lambda_i\big\vert \gt 1$, because if we kept them we couldn't ensure that $x_{k}\in[0,1]$ for all $k$: the solution would get 'too big' in modulus. Given the simplicity of the problem one could manipulate the symbols to rigorize this intuition, but a much better justification is to simply note that
$x_k = y_1 \cdot \lambda_1^k+y_2 \cdot \lambda_2^k$
gives a particular solution that satisfies our constraints, and that particular solution must in fact be unique for analytical reasons (i.e. calling on standard martingale or countable-state Markov chain limit theorems).

So solve

$ \begin{bmatrix} 1 & 1 \\ 1 & \lambda_2 \\ \end{bmatrix} \begin{bmatrix} y_1\\ y_2\\ \end{bmatrix}=\begin{bmatrix} 1 & 1 \\ \lambda_1 & \lambda_2 \\ \end{bmatrix} \begin{bmatrix} y_1\\ y_2\\ \end{bmatrix} = \begin{bmatrix} x_0\\ x_1\\ \end{bmatrix}=\begin{bmatrix} 0\\ 1\\ \end{bmatrix}$
if you do this with the numeric approximation of $\lambda_2 \approx -0.76219$ then you get
$\begin{bmatrix} y_1\\ y_2\\ \end{bmatrix} = \begin{bmatrix} 0.56748\\ -0.56748\\ \end{bmatrix}$
thus
$x_k=0.56748\cdot 1^k - 0.56748\cdot (-0.76219)^k=0.56748- 0.56748\cdot (-0.76219)^k$
which for $k:=10$ gives
$x_{10}=0.56748 - 0.56748\cdot (-0.76219)^{10}\approx 0.5299$

plugging this into (ii) gives $E\big[T\big] \approx 28.410$
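The whole of part (iv) can be reproduced numerically. Here is a small self-contained sketch (pure Python, using bisection on the quartic factor rather than a polynomial solver; variable names are my own):

```python
def q(x):
    """Quartic factor of the characteristic polynomial:
    x^5 - 3x^2 + 2 = (x - 1)(x^4 + x^3 + x^2 - 2x - 2)."""
    return x**4 + x**3 + x**2 - 2 * x - 2

# Bisect for the root lambda_2 in (-1, 0): q(-1) = 1 > 0, q(0) = -2 < 0.
lo, hi = -1.0, 0.0
for _ in range(100):
    mid = (lo + hi) / 2
    if q(lo) * q(mid) <= 0:
        hi = mid
    else:
        lo = mid
lam2 = (lo + hi) / 2          # approx -0.76219

y1 = 1 / (1 - lam2)           # boundary conditions x_0 = 0, x_1 = 1 give y_2 = -y_1
p = y1 * (1 - lam2**10)       # x_10, probability of stopping with $1
ET = 3 * (10 - p)             # Wald: E[T] = 3(10 - p), approx 28.41
```

Running this reproduces $\lambda_2$, $x_{10}$, and $E[T]$ to the precision quoted above.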