Random Walk Markov Process -- Probability of Return to the Origin

776 Views Asked by At

I'm struggling with a problem I was asked by a friend a few days ago. It goes as follows:

You start at the origin. On the first iteration you walk right with probability $p$ and left with probability $q = 1 - p$. On each subsequent iteration you walk in the same direction as the previous iteration with probability $p$ and in the opposite direction with probability $q$. What is the probability that you return to the origin in $n$ steps or fewer?

The issue is that it's difficult to account for the Markov process in the calculation. Since I want something that works for arbitrary $n$, I can't use matrices.

Say $X_i$ is the the random variable such that on the $i$-th step $$ X_i = \begin{cases} 1 &\text{if}\ i = 0\ \text{and went right}\\ -1 &\text{if}\ i = 0\ \text{and went left}\\ 1 &\text{if}\ i > 0\ \text{and went in same direction}\\ -1 &\text{if}\ i > 0\ \text{and went in opposite direction.}\end{cases} $$ Then the probability of returning to the origin is \begin{align*} P(\text{return}) &= q P(\text{return}|X_0 = -1) + p P(\text{return}|X_0 = 1)\\ &= P(\text{return}|X_0 = 1)\\ &= q + p P(\text{return}|X_1 = 1). \end{align*} I don't know where to go from here.

When constructing a Monte Carlo simulation, I first simulate $\left(X_i\right)_{i = 0}^{n-1}$. Then the cumulative product tells you whether you went left or right on the $i$-th step. From there, you can take the cumulative sum to find position. If that's ever 0, then you add to the count. It's not clear to me how one would map this onto a mathematically rigorous calculation.

This is my Python code.

import numpy as np

# Define n
n = 4

# Define p
p = 0.75

# Calculate q
q = 1 - p

# X can be either 1 or -1
vals = [1, -1]

# How many Monte Carlo simulations
sims = 10**6

# Whether same or opposite
X = np.random.choice(vals, size = (sims, n), replace = True, p = [p, q])

# Whether left or right
move = np.cumprod(X, axis = 1)

# Sum change to calculate position
position = np.cumsum(move, axis = 1)

# Which rows go through the origin
success = np.sum(position == 0, axis = 1) > 0

# Calculate fraction that go through the origin
solution = np.mean(success)

solution
2

There are 2 best solutions below

7
On BEST ANSWER

An approach without using Markov chain.

To avoid double counting we will classify the paths by the number of the steps until the first return to the origin and by the number of direction changes. Observe that the former number must be even (the same number of steps in each direction) whereas the latter number must be odd.

Accordingly the probability of the first return to the origin at $2m$-th step after $2k-1$ direction changes can be written as: $$\begin{align} P_{mk}&=\underbrace{p\big[N(m-1,k)\,p^{2m-2-(2k-1)}q^{2k-1} \big]p}_\text{first step to the right}+\underbrace{q\big[N(m-1,k)\,p^{2m-2-(2k-1)}q^{2k-1} \big]p}_\text{first step to the left}\\ \\ &=N(m-1,k)\,p^{2m-2k}q^{2k-1},\tag1 \end{align}$$ where $$N(m,k)=\frac1k\binom m{k-1}\binom{m-1}{k-1}\tag2$$ is the Narayana number.

Finally the probability in question is: $$ \sum_{m=1}^{\left\lfloor\frac n2\right\rfloor}\sum_{k=1}^m P_{mk}.\tag3 $$

1
On

One way to attack this is via generating functions. It doesn't seem to give an immediately nice answer, but maybe it's just that there isn't a particularly nice one.

Here's an outline. To simplify notation, let's start by considering just paths that go left at their first step. Let's call a "left excursion" a finite path that goes left at its first step, and returns to its starting point for the first time at the end of the path.

Any left excursion of length $2n$ can be decomposed into (i) a step left at the beginning (ii) a step right at the end, and (iii) in the middle a sequence of left excursions whose total length is $2n-2$.

Assuming $n\geq 2$, the probability of such a path can be calculated as the product of: (i) $q$ for the step left at the beginning; (ii) $p$ for the step right at the end (it must follow another right step); (iii) the product of the probabilities of the left excursions in the middle, except that the very first one should count $p$ not $q$ for its first step (it starts with a left step which follows another left step).

Let $a_n$ be the combined probability of all left excursions of length $2n$. Then $a_1=q^2$, and from the decomposition above we get that for $n\geq 2$, \begin{align*} a_n&=\sum_{m\geq 1} \sum_{\substack{k_1,\dots,k_m\geq 1\\ k_1+\dots+k_m=n-1}} qp\frac{p}{q} \prod_{i=1}^m a_{k_i} \\ &=\sum_{m\geq 1} \sum_{\substack{k_1,\dots,k_m\geq 1\\ k_1+\dots+k_m=n-1}} p^2 \prod_{i=1}^m a_{k_i}. \end{align*}

In this formula $m$ represents the number of sub-excursions in the decomposition of our excursion of length $2n$, and $2k_1, \dots, 2k_m$ are the lengths of those sub-excursions in order.

We can write the generating function $$ A(x)=\sum_{n=1}^\infty a_n x^n. $$

Using the ideas on pp. 27-28 of Flajolet and Sedgewick's book ("sequence construction") (and treating the term $n=1$ carefully) we get \begin{align*} A(x)&=p^2 x(1+A(x)+A(x)^2+\dots)+x(q^2-p^2)\\ &=p^2x\frac{1}{1-A(x)}+x(q^2-p^2). \end{align*}

Rearranging you get a quadratic equation for the quantity $A(x)$, which you solve e.g. with the quadratic formula to give $$ A(x)=\frac{1}{2}\left[ 1+x(q^2-p^2)-\sqrt{1-2x(q^2+p^2)+x^2(q^2-p^2)^2} \right]. $$

Now the right-hand side can be expanded as a power series in $x$ (e.g. using the expansion $\sqrt{1-h}=1-h/2-h^2/8-h^3/16-\dots$ via the binomial formula). Then you compare the coefficients of each power of $x$ with the left-hand side which is $a_1x + a_2x^2 +a_3x^3 +...$ to read off (in theory) any term $a_n$ that you want. I believe you get $a_1=q^2$, $a_2=p^2 q^2$, $a_3=p^4q^2 + p^2 q^4$, and so on. I don't see a way to get a nice general expression for $a_n$ (possibly someone else will!). But you could derive some asymptotics as $n\to\infty$, if that's the sort of thing you care about.

Since you want the combined probability not just of "left excursions" but also of "right excursions", what you want is actually $(1+p/q)a_n$ rather than just $a_n$.