Biased random walk with unequal step size

214 Views Asked by At

Consider an asymmetric random walk $(X_n)$ in which the initial point is one ($X_0 = 1$). It increases by $a$ with a probability of 0.3, remains the same with a probability of 0.4, and decreases by $b$ with a probability of 0.3, where $a<b$ and $a,b \in \mathbb{R}$. Also, suppose this process stops once $X_n \geq \lambda$.

For example, $a=0.3$, $b=0.5$, and $\lambda=1.4$. How can I calculate the probability of this process stopping? That is, what would be the probability of $X_n$ ever getting bigger than $\lambda=1.4$? Since the probability of stopping at the second period is $0.3\cdot 0.3$, it would be greater than zero. Also, since it is supermartingale, by the maximal inequality for supermartingale,

\begin{equation*} P(\sup X_n \geq 1.4) \leq \frac{\mathbb{E} X_0 }{1.4} = \frac{1}{1.4}. \end{equation*} Thus, it would not be one. However, I would like to derive precise predictions about them, which is where I am stuck now.

I found other problems with asymmetric random walks with equal integer step sizes. However, I am unsure if I can still use the same ways in this environment. Also, would it be a completely different problem in an environment where the step sizes are irrational numbers?

3

There are 3 best solutions below

2
On BEST ANSWER

First, notice that we can replace the transition probabilities by "increases by $a$ or decreases by $b$ with equal probability".

Also, we can scale: $a=3$, $b=5$, we start at $X_0=-4$ and we "win" when we reach $X_n=0$ for some $n$.

For $z=1,2,3 \cdots$, let $g(z)$ be the probability of winning given that we are currently at $X=-z$. Then we can write the recursion

$$g(z) = \frac{g(z-3)+g(z+5)}{2} \tag 1 $$ with $g(z)=1$ for $z\le 0$. And we can assume, from your argument, that $g \to 0$ as $ z \to + \infty$.

We are interested in $g(4)$.

I'm not sure if $(1)$ can be solved in some simple closed form (I doubt it). The standard approach of postulating a candidate solution in the form $g(z)=a b ^z$ gives $b=0.87762789$ (solution of $x^8+ 1 - 2 x ^3=0$), but that would only work if we had the single boundary condition $g(0)=1$; here we also need $g(-1)=g(-2)=1$.

We can instead use $g(z)= b^z$ as an initial guess and iterate numerically using $(1)$. It indeed converges, fairly quickly. And for large $z$, it seems that indeed $g(z) \approx a b^z$ with $a \approx 0.8928175$. But for small $z$, it's other story.

enter image description here

Numerically, I get $g(4) \approx 0.5013472$

PS: some pointers in this MO question

PS2: A quite precise numerical fit:

$$g(n) \approx a b^n + c b^{2n} \sin\left( \frac{2 \pi n}{T} + d\right) $$

with $a=0.8928175$ $b=0.87762789$, $c=0.116$, $T=3.047$, $d=1.952$

0
On

Too long for a comment, this answer addresses just this sub-question:

would it be a completely different problem in an environment where the step sizes are irrational numbers?

Not sure I'd call it "completely" different but it would be quite different. E.g. the solution technique in @leonbloy's answer (recurrence with single discrete index) would not work. Single index recurrence works because some number of + steps can cancel some number of - steps. But say your steps are $+1, -e$ and the boundary is $\pi^3$. You might need to set up some 2-dimensional recurrence counting both + and - steps, and you'd need to actually calculate whether $N \times 1 - M \times e \ge \pi^3$ for various $(N,M)$ pairs. You would need both dimensions because you can never return to the same point at a later time, because there is no integer solution to $N \times 1 - M \times e = 0$. :)

Talking of irrationals remind me of this other interesting aspect when you consider the values as reals, and ask natural questions about continuous functions.

Depending on exact values, the answer can be very sensitive to small changes (and you don't even need irrationals). E.g. suppose your steps are $+1, -10000000$ and the boundary is $1.99$. Clearly, the only dominant term in $Prob(stop)$ is the event that your first 2 steps are +, so $Prob(stop) \approx 1/4$. If you just change the boundary to $2.01$ (or equivalently shrink the + step to $+0.99$) then suddenly the dominant term is first 3 steps being +, and $Prob(stop) \approx 1/8$. So $Prob(stop)$ is NOT a continuous function of the boundary.

0
On

It is maybe good to answer the question by working explicitly in the given example. Let $A=1+a=1.3$ and $B=1-b=0.5$.


So we have a discrete process $(X_n)$ with $X_0=1$ on the time line $T=\{0,1,2,3,4,\dots\}$ and the following passage probabilities and values from step $n$ to step $n+1$:

  • with probability $3/10$ the process increases by $30\%$,

    i.e. $\Bbb P(X_{n+1}/X_n=A)=\frac 3{10}=p_A=p_+$, the "$A$ case",

  • with probability $4/10$ the process continues with the same value,

    i.e. $\Bbb P(X_{n+1}/X_n=1)=\frac 4{10}=p=p_0$, the "no change case",

  • with probability $3/10$ the process decreases by $50\%$,

    i.e. $\Bbb P(X_{n+1}/X_n=B)=\frac 3{10}=p_B=p_-$, the "$B$ case".


This can be modeled by the space $\Omega=\Omega_0^T$, where $\Omega_0=\{+,0,-\}$, and a $+$ stays for the $A$ case, $0$ for the no change case, and $-$ for the $B$ case.

To have a better world for the pictures, we better work with $$ Y_n=\ln X_n\ , $$ and this process has correspondingly the characteristics:

  • with probability $3/10$ the process $Y_n$ increases by the increment $\ln A$,

    i.e. $\Bbb P(Y_{n+1}=Y_n+\ln A)=\frac 3{10}=p_A=p_+$,

  • with probability $4/10$ the process continues with the same value,

    i.e. $\Bbb P(X_{n+1}=Y_n)=\frac 4{10}=p=p_0$,

  • with probability $3/10$ the process decreases by adding the negative increment $\ln B$,

    i.e. $\Bbb P(Y_{n+1}=Y_n+\ln B)=\frac 3{10}=p_B=p_-$.


To have a better (combinatorial) control on $Y_n$ we introduce a new process $Z_n$ with values in $\Bbb R^2$, starting in $Z_0=(0,0)$.

  • Each time $Y_n$ gets $\ln A>0$ we use the right step $R=(1,0)\in\Bbb R^2$ for $Z_n$.
  • Each time $Y_n$ stays in place we use the zero step $(0,0)\in\Bbb R^2$ for $Z_n$.
  • Each time $Y_n$ gets $\ln B>0$ we use the up step $U(0,1)\in\Bbb R^2$ for $Z_n$.

Then $Y_n$ is the scalar product of $Z_n\in\Bbb R^2$ with the vector $(\ln A,\ln B)$. Let $\tau$ be the stopping time defined in terms of $X$ by the minimal point $n$ in time so that $X_n>1.4=:\lambda$. Then in terms of $Y_n$ the condition is $Y_n>\ln \lambda$. And in terms of $Z_n$ is the "pink line" $$ Z_n\cdot(\ln A,\ln B)>\ln\lambda\ . $$ We make now a picture, and draw some brownian path $\omega$ in the space of all paths till $\tau(\omega)$.

mse question 4866838

There are a lot of elements/path in $\Omega$ realizing the above geometric path, for instance: $$ 0-+00-00++0-000+0+++-0000+0-+++000+000-0++00+0+000+0+\ . $$ Here, ((the event corresponding to) a start of) a path in $\Omega$ is written as a world, instead of a tuple. (We eliminate white noise in the notation.)

The zeros in the path do not affect the displayed geometric path. So what really counts in the path is the list of the non-zero places, a $-$ can be written as an up-move, $U$, a $+$ as a right-move, $R$. This shows also that the points of exit for $Z$ are in a concrete list, $$ Z_\tau\in\{\ (2,0),\ (4,1),\ (7,2),\ (10,3),\ (12,4),\ (15,5),\ (18,6),\ \dots\ \}\ . $$ And there is no "good (arithmetic) pattern" for this set of exit points. (The second component takes all values $0,1,2,3,4,5,6,\dots$ - but the first point beyond the pink line has a complicated description in terms of the logarithms of $A,B,\lambda$.


So far we haven't done anything in the direction of solving the problem, it is only reshaped to see the "dynamic system" beyond the question. Let us do something explicitly. Aided by the computer we can calculate the probability for $Z$ to stop in each of the first few points in the list. Let us do this explicitly with bare hands for the first few points.

  • We reach $(2,0)$ with (stopped) paths - written as words - of the following shape: $0^*+0^*+$

    and the probability for this event is $$ (1+p_0+p_0^2+p_0^3+\dots)p_+(1+p_0+p_0^2+p_0^3+\dots)p_+ \\ = \frac 1{1-p_0}\cdot p_+ \frac 1{1-p_0}\cdot p_+ = \underbrace{\left(\frac {10}6\right)}_{\frac1{1-p_0}}{}^2 \cdot \underbrace{\left(\frac 3{10}\right)}_{p_+}{}^2 \cdot \underbrace{\left(\frac 3{10}\right)}_{p_-}{}^0 \\ =\left(\frac 36\right)^2\left(\frac 36\right)^0 =\bbox[lightyellow]{\ \frac 14=0.25\ } \ . $$

  • What about $(4,1)$? There are exactly two paths from $(0,0)$ to $(4,1)$ that correspond to a stop in $(4,1)$, these are $-++++$ and $+-+++$ with further insertions of subwords consisting of zeros. So we write the event which is the union of $0^*-0^*+0^*+0^*+0^*+$ and $0^*+0^*-0^*+0^*+0^*+$ . So we have two paths, there are in some order four pluses and one minus, in between we have so many $0^*$ subwords as steps, this gives the probability: $$ 2\cdot \underbrace{\left(\frac {10}6\right)}_{\frac1{1-p_0}}{}^{4+1} \cdot \underbrace{\left(\frac 3{10}\right)}_{p_+}{}^4 \cdot \underbrace{\left(\frac 3{10}\right)}_{p_-}{}^1 = 2\cdot \left(\frac 36\right)^4\left(\frac 36\right)^1 =\frac 2{2^5}=\bbox[lightyellow]{\ \frac 1{16}=0.0625\ } \ . $$

  • Here we see again, that the "dilution of time" introduced by the $0^*$ subwords can be removed from the model. We can just use only $+$ and $-$ with the adjusted probabilities. In our case $1/2$ and $1/2$. Fair case.

  • For a general point of possible exit $(x,y)$ (instead of the above $(2,0)$ and $(4,1)$ we conclude similarly, that the probability to exit through that point is $$ N(x,y)\cdot \underbrace{\left(\frac {10}6\right)}_{\frac1{1-p_0}}{}^{x+y} \cdot \underbrace{\left(\frac 3{10}\right)}_{p_+}{}^x \cdot \underbrace{\left(\frac 3{10}\right)}_{p_-}{}^y = N(x,y) \cdot \left(\frac 36\right)^x\left(\frac 36\right)^y =\bbox[lightyellow]{\ \frac{N(x,y)}{2^{x+y}}\ }\ . $$ So the wanted probability of a stopping is given by the formula: $$ \Bbb P(\tau<\infty) =\sum_{\substack{(x,y)\text{ is an element of the exit set}\\\\ \{\ (2, 0),\ (4, 1),\ (7, 2),\ (10, 3),\ (12, 4),\ (15, 5),\\ (18, 6),\ (20, 7),\ (23, 8),\ (26, 9),\ (28, 10),\ (31, 11),\\ (33, 12),\ (36, 13),\ (39, 14),\ (41, 15),\ (44, 16),\\ (47, 17),\ (49, 18),\ (52, 19),\ (55, 20),\ \dots\ \}}} \bbox[lightyellow]{\ \frac{N(x,y)}{2^{x+y}}\ }\ . $$ Here $N(x,y)$ is between $1$ (the $-^y+^x$ path) and $\binom {x+y}{x\ y}=\frac{(x+y)!}{x!y!}$ - for the last estimation count possible positions of the $+$ among the total $(x+y)$ steps. Many of them have to be rejected, but we have a valid upper bound.


The only thing we can do now is to compute for the first few values $N(x,y)$ exactly, and possibly also estimate $N(x,y)$ for the next values.

Let us illustrate what the computer does in the case of the point $(18,6)$, which is also displayed in the picture. Consider a path from $(0,0)$ to $(18,6)$ that does not leave earlier the half-plane containing the origin cut by the pink line. Such a path has exactly $18$ plus/R moves, and exactly $6$ minus/U moves. We put the $6$ minus symbols in a word, and insert before, in between, and after the $18$ plus symbols. This corresponds to a non-ordered partition of $18$ with possible zero parts in a total of seven parts, $(k_0, k_1,k_2,k_3,k_4,k_5, k_6)$. The corresponding event/word is $$+^{k_0}\ -\ +^{k_1}\ -\ +^{k_2}\ -\ +^{k_3}\ -\ +^{k_4}\ -\ +^{k_5}\ -\ +^{k_6}\ . $$ Such a word does not leave the plane at an earlier point, iff the prefix subwords obtained by a cut in front of each minus place, $+^{k_0}$, and $+^{k_0}\ -\ +^{k_1}$, and ... , do not corresponds to points beyond the pink line. So we must have $k_0<2$, $k_0+k_1<4$, $k_0+k_1+k_2<7$, $k_0+k_1+k_2+k_3<10$, $k_0+k_1+k_2+k_3+k_4<12$, and $k_0+k_1+k_2+k_3+k_4+k_5<15$. The computer may be instructed to take the partitions, which is (in its iterator) maybe harder to implement, or to use the $\binom{24}6=134596$ total cases, and check which one do not cross the pink line till the last step. In fact, the last four steps must be $+++$ (see the picture), so there are only $\binom{24-4}6=38760$ cases, but the computer will not see the difference. But we may want implement the search starting from the end from this point of view...


My code computes: $$ \begin{array}{|rl|} \hline N(2, 0) &= 1\\\hline N(4, 1) &= 2\\\hline N(7, 2) &= 7\\\hline N(10, 3) &= 37\\\hline N(12, 4) &= 231\\\hline N(15, 5) &= 1350\\\hline N(18, 6) &= 9199\\\hline N(20, 7) &= 67029\\\hline N(23, 8) &= 442313\\\hline N(26, 9) &= 3276512\\\hline \end{array} $$ So a lower bound for the probability is (if i manually worked accurately) $$ \frac 1{2^2} + \frac 2{2^5} + \frac 7{2^9} + \frac {37}{2^{13}} + \frac {231}{2^{16}} + \frac {1350}{2^{20}} + \frac {9199}{2^{24}} + \frac {67029}{2^{27}} + \frac {442313}{2^{31}} + \frac {3276512}{2^{35}} \\ = 0.3368497523479163646697998046875\ . $$ The contribution of the last two terms is $\approx 0.000205968\dots$ and $\approx 0.000095359\dots$ .

$% ( 1/2^2 + 2/2^5 + 7/2^9 + 37/2^13 + 231/2^16 + 1350/2^20+ 9199/2^24 + 67029/2^27 + 442313/2^31 + 3276512/2^35 ) $ Used sage code:

A, B, L = 13/10., 5/10., 14/10.

K = 9

xylist = []
for y in [0..K]:
    for x in [0..1000]:
        if A^x * B^y > C:
            xylist.append((x, y))
            break

for (x, y) in xylist:
    Nxy = 0    # and we increase it during the search
    P = Partitions(x + y, length=y+1)
    Q = [[entry - 1 for entry in p] for p in P]
    for q in Q:
        for sq in Permutations(q):
            ok = True    # so far
            for k in [0..y-1]:
                if A^sum(sq[:k+1]) * B^k > L:
                    ok = False
                    break
            if ok:
                Nxy += 1

    print(f"N({x}, {y}) &= {Nxy}\\\\\\hline")


Simulation:

import random
random.seed('MSE4866838')

A, B, L = 13/10., 5/10., 14/10.
R, U = vector([1, 0]), vector([0, 1])    # right, up as vectors

N = 10000    # trials
zdic = {}    # will cover the exit statistics     
for trial in range(N):
    step, crossed = 0, False
    X, Z = 1, vector([0, 0])
    while step < 5000 and not crossed:
        step += 1
        rand = random.uniform(0, 1)
        if   rand < 0.3:
            X *= B; Z += U; # print("U", end='')
        elif rand > 0.7:
            X *= A; Z += R; # print("R", end='')
            if X > L:
                # record this in the zdic
                tZ = tuple(Z)
                if tZ in zdic:    zdic[tZ] += 1
                else:             zdic[tZ]  = 1
                crossed = True
                # print()
 
zkeys = list(zdic)
zkeys.sort()
for zkey in zkeys:
    print(f"Exit {zkey} reached {zdic[zkey]} times -> prob ~ {zdic[zkey]/N.n()}")

print(f"Process stopped {sum(zdic.values())} from {N} trials  -> prob ~ {sum(zdic.values())/N.n()} .")

And this delivers:

Exit (2, 0) reached 2479 times -> prob ~ 0.247900000000000
Exit (4, 1) reached 640 times -> prob ~ 0.0640000000000000
Exit (7, 2) reached 153 times -> prob ~ 0.0153000000000000
Exit (10, 3) reached 37 times -> prob ~ 0.00370000000000000
Exit (12, 4) reached 41 times -> prob ~ 0.00410000000000000
Exit (15, 5) reached 14 times -> prob ~ 0.00140000000000000
Exit (18, 6) reached 3 times -> prob ~ 0.000300000000000000
Exit (20, 7) reached 5 times -> prob ~ 0.000500000000000000
Exit (23, 8) reached 2 times -> prob ~ 0.000200000000000000
Exit (26, 9) reached 2 times -> prob ~ 0.000200000000000000
Process stopped 3376 from 10000 trials  -> prob ~ 0.337600000000000 .

This is just a simulation, and gives only a rough orientation. But the numbers are in the range we have expected them.


More effort can be invested, but then we need a reason, more exactly the kind of the reason. There is no exact answer, and for a better approximation we need better counting methods. A structural reason may also be given, but note that a slight change of $\lambda$ would introduce new paths or deny paths counted in the result before change, so jumps may occur. Similar problems appear in discrete optimization problems, but almost nobody works in that domain any longer.