Calculating the hitting probability using the strong Markov property


**This problem is from *Markov Chains* by Norris, Exercise 1.5.4.**

A random sequence of non-negative integers $(F_n)_{n\ge0}$ is obtained by setting $F_0=0$ and $F_1=1$ and, once $F_0,\ldots,F_n$ are known, taking $F_{n+1}$ to be either the sum or the difference of $F_{n-1}$ and $F_n$, each with probability $1/2$. Is $(F_n)_{n\ge0}$ a Markov chain?

(a) By considering the Markov chain $X_n=(F_{n-1},F_n)$, find the probability that $(F_n)_{n\ge0}$ reaches $3$ before first returning to $0$.

(b) Draw enough of the flow diagram for $(X_n)_{n\ge0}$ to establish a general pattern. Hence, using the strong Markov property, show that the hitting probability for $(1,1)$, starting from $(1,2)$, is $(3-\sqrt{5})/2$.

(c) Deduce that $(X_n)_{n\ge0}$ is transient. Show that, moreover, with probability $1$, $F_n \rightarrow \infty$ as $n \rightarrow \infty$.


My attempt for (b): From $(1,2)$ the chain moves to $(2,3)$ or to $(2,1)$, each with probability $1/2$. From $(2,1)$ we can reach $(1,1)$. I want to calculate the probability generating function using the strong Markov property, $\phi(s)=\mathbb{E}_{(1,2)}(s^{H_{(1,2)}^{(1,1)}})$, where $H_{(1,2)}^{(1,1)}=\inf \{n\geq 0\colon X_n=(1,1) \text{ starting from } (1,2)\}$. I thought that if we start in $(2,3)$ and want to reach $(1,1)$, we at least have to go through $(1,2)$ again and then from $(1,2)$ to $(1,1)$. So I believe that $\mathbb{E}_{(2,3)}(s^{H_{(2,3)}^{(1,1)}})=\phi(s)^2$, but I am not sure whether this is true.

I really need help with this exercise.

Thank you.


There are 4 answers below.

---

I've been self-studying this book recently, so I'm not entirely sure about my answer.

Part (a)

The probability is $\frac{3}{7}$.

For the chain $X_n=(F_{n-1},F_n)$, define $h_{i,j}$ to be the probability, starting from $(F_{m-1},F_m)=(i,j)$, that $(F_n)_{n\geq 0}$ reaches $3$ before returning to $0$: $$h_{i,j} = \mathbb P\big((F_n)_{n\geq 0} \textrm{ reaches 3 before returning to 0} \mid (F_{m-1},F_m)=(i,j)\big)$$

The boundary values are $h_{2,3}=h_{1,3}=1$ (the chain has reached $3$) and $h_{1,0}=0$ (the chain has returned to $0$). Since $(0,1)\to(1,1)$ with probability $1$, all we need to figure out is:

$$ h_{0,1} = h_{1,1} = \frac{1}{2}h_{1,2}+\frac{1}{2}h_{1,0}$$

$$ = \frac{1}{4}h_{2,3}+\frac{1}{4}h_{2,1}+\frac{1}{2}h_{1,0}$$

$$ = \frac{1}{4}h_{2,3}+\frac{1}{8}h_{1,3}+\frac{1}{8}h_{1,1}+\frac{1}{2}h_{1,0}$$

$$ \Rightarrow \frac{7}{8}h_{1,1} = \frac{1}{4}h_{2,3}+\frac{1}{8}h_{1,3}+\frac{1}{2}h_{1,0} = \frac{1}{4}+\frac{1}{8} = \frac{3}{8}$$

$$ \Rightarrow h_{0,1} = h_{1,1} = \frac{3}{7}$$
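The value $3/7 \approx 0.4286$ can be sanity-checked with a short Monte Carlo simulation (a sketch; note that the first value of $F$ above $2$ is necessarily $3$, so testing `cur >= 3` is safe):

```python
import random

def reaches_3_before_0(rng):
    """Run F_0 = 0, F_1 = 1 until F first hits 3 or returns to 0."""
    prev, cur = 0, 1
    while True:
        # F_{n+1} is the sum or the difference of F_{n-1} and F_n, w.p. 1/2 each.
        prev, cur = cur, (prev + cur if rng.random() < 0.5 else abs(prev - cur))
        if cur >= 3:   # first passage above 2 is exactly 3
            return True
        if cur == 0:   # returned to 0 first
            return False

rng = random.Random(0)
trials = 200_000
estimate = sum(reaches_3_before_0(rng) for _ in range(trials)) / trials
print(estimate)  # close to 3/7 ≈ 0.4286
```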

Part (b) by Dominik Schrempf is clear, but I don't see why the hitting time $H^{(1,2)}_{(2,3)}$ has the same distribution as $H^{(1,1)}_{(1,2)}$. Can this be explained by the fact that both go from a state $(a,b)$ to $(b, a+b)$?

Part (c) remains to be solved.

---

I also stumbled upon this problem. I have not solved (a) and (c) so far, but for your specific question I have an answer.

Drawing the flow diagram of the Markov chain up to 4 steps from $(1,2)$ (i.e., about $2^4$ reachable states), one notices that one step in the wrong direction (e.g., from $(1,2)$ to $(2,3)$) requires 2 steps back.

An example: after $(1,2) \rightarrow (2,3)$, it takes the 2 steps $(2,3) \rightarrow (3,1) \rightarrow (1,2)$ to get back to the initial position. This holds true for any state. With this observation in mind, we can write down the hitting probability of $(1,1)$ starting from $(1,2)$, denoted $h_{(1,2)}^{(1,1)} := h_{(1,2)}$:

$$h_{(1,2)} = \frac{1}{2} h_{(2,3)} + \frac{1}{2} h_{(2,1)},$$ and, writing $h_{(2,3)}^{(1,2)}$ for the probability of hitting $(1,2)$ from $(2,3)$, the strong Markov property gives $$h_{(2,3)} = h_{(2,3)}^{(1,2)} h_{(1,2)} = h_{(1,2)} h_{(1,2)} = h_{(1,2)}^2,$$

because the chain must pass through $(1,2)$ on its way from $(2,3)$ to $(1,1)$, and by the repeating pattern of the flow diagram $h_{(2,3)}^{(1,2)} = h_{(1,2)}$. A similar argument (conditioning on the first step from $(2,1)$) can be used to derive

$$h_{(2,1)} = \frac{1}{2} + \frac{1}{2}\, h_{(1,2)}\, h_{(2,1)} \quad\Longrightarrow\quad h_{(2,1)} = \frac{1}{2-h_{(1,2)}}.$$

Putting this together, we end up at $$0 = h_{(1,2)}^3 - 4h_{(1,2)}^2 + 4h_{(1,2)} - 1.$$

Factoring out the root $h_{(1,2)}=1$ gives $(h_{(1,2)}-1)(h_{(1,2)}^2-3h_{(1,2)}+1)=0$, with roots $1$ and $(3\pm\sqrt{5})/2$. Since a hitting probability is the minimal non-negative solution (Norris, Theorem 1.3.2), the only valid solution is $h_{(1,2)}=\frac{3-\sqrt{5}}{2}$.
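Both the cubic and the final value can be checked numerically (a sketch; the simulation truncates each run at a step cap, which introduces only a negligible bias here because the chain is transient and escapes to large values quickly):

```python
import math
import random

h = (3 - math.sqrt(5)) / 2
# The claimed value satisfies the cubic and the two one-step equations.
assert abs(h**3 - 4*h**2 + 4*h - 1) < 1e-12
assert abs(h - (0.5 * h**2 + 0.5 / (2 - h))) < 1e-12

# Monte Carlo estimate of hitting (1,1) from (1,2), capped at 200 steps.
rng = random.Random(1)

def hits_11(cap=200):
    x, y = 1, 2
    for _ in range(cap):
        x, y = (y, x + y) if rng.random() < 0.5 else (y, abs(x - y))
        if (x, y) == (1, 1):
            return True
    return False

trials = 30_000
est = sum(hits_11() for _ in range(trials)) / trials
print(est)  # close to (3 - sqrt(5))/2 ≈ 0.382
```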

---

Let $\mathcal{S}\subset \mathbb{N}^2$ be the set of states reachable from $(0,1)$ and $\mathcal{S}^*=\mathcal{S}\setminus \{(0,1),(1,0)\}$. It is not hard to see that $\mathcal{S}^*$ is just the set of all co-prime pairs. On the one hand, if $(x,y)\neq (1,1)$ are co-prime then both $(y,x+y)$ and $(y,|x-y|)$ are co-prime. On the other hand, any co-prime $(x,y)$ can be reached from $(1,1)$ through a "reverse" Euclidean algorithm.

Define the sum operator to be $s$ and difference operator to be $d$ on $\mathcal{S}$ such that $s(x,y)=(y,x+y)$ and $d(x,y)=(y,|x-y|)$. We use some natural abbreviations on the compositions of operators such as writing $s\circ d$ as $sd$ and $s\circ s$ as $s^2$.

For example, $ssdds(0,1)=ssdd(1,1)=ssd(1,0)=ss(0,1)=s(1,1)=(1,2)$.

As Dominik Schrempf mentioned, we have $dds=\mathrm{id}$, the identity function, since $dds(x,y)=dd(y,x+y)=d(x+y,x)=(x,y)$. Thus for any state $S\in\mathcal{S}^*$, we can write it in an "irreducible form" as $S=s^{a_1}(ds)^{b_1}\ldots s^{a_k}(ds)^{b_k}(1,1)$ where $a_i$s and $b_i$s are non-negative integers and for any $i\in[k]$, $a_i+b_i\geq 1$. The key observation is that, every state has a unique such representation. This is because, $s(x,y)=(y,x+y)$ and $ds(x,y)=d(y,x+y)=(x+y,x)$ and when $x,y\geq 1$, we have $x+y>y$ and $x+y>x$. Thus you can recover the "irreducible form" of any state in $\mathcal{S}^*$ step by step.

For example, given state $(7,3)$, since $7>3$, it must be a $ds$, i.e. $(7,3)=ds(3,4)$. Since $3<4$, we can only write $(3,4)=s(1,3)$, and so on. Finally, we get the unique "irreducible form" $(7,3)=(ds)s^2(ds)(1,1)$.

Now denote $s$ as $a$ and $ds$ as $b$ and we can represent each state in $\mathcal{S}^*$ by a binary string. For example, $aba=s(ds)s(1,1)=(1,4)$. The Markov chain is translated to the binary string version as follows. For any binary string $w$, $aw$ goes to $bw$ and $aaw$ with equal probability; $bw$ goes to $w$ and $abw$ with equal probability. From this view, we see the distribution of the hitting time from $aw$ to $w$ is identical for all $w$ and the same is true for $bw$.
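The encoding is easy to verify mechanically (a sketch; `decode` applies $s$ for `a` and $ds$ for `b`, rightmost letter first, matching the composition notation):

```python
def s(x, y):  # sum operator: s(x, y) = (y, x + y)
    return (y, x + y)

def d(x, y):  # difference operator: d(x, y) = (y, |x - y|)
    return (y, abs(x - y))

def decode(word):
    """Map a string over {a, b} to its state, starting from (1, 1)."""
    state = (1, 1)
    for ch in reversed(word):   # rightmost operator acts first
        state = s(*state)
        if ch == 'b':           # b stands for d∘s
            state = d(*state)
    return state

print(decode(""), decode("a"), decode("aa"), decode("b"), decode("aba"))
# (1, 1) (1, 2) (2, 3) (2, 1) (1, 4)

# Transition rule: from state aw the chain moves to bw or aaw.
assert {s(*decode("aa")), d(*decode("aa"))} == {decode("aaa"), decode("ba")}
```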

Now for Question (b), note that $(1,1)=\emptyset$, $(1,2)=s(1,1)=a$, $(2,3)=ss(1,1)=aa$, $(2,1)=ds(1,1)=b$. Let $\phi(s)=\mathbb{E}(s^{H_{a}^{\emptyset}})$ and $\psi(s)=\mathbb{E}(s^{H_{b}^{\emptyset}})$. Thus we have $$\phi(s)=\mathbb{E}(s^{H_{a}^{\emptyset}})=\frac{s}{2}(\mathbb{E}(s^{H_{b}^{\emptyset}})+\mathbb{E}(s^{H_{aa}^{\emptyset}}))=\frac{s}{2}(\psi(s)+\phi(s)^2)$$ and $$\psi(s)=\mathbb{E}(s^{H_{b}^{\emptyset}})=\frac{s}{2}(\mathbb{E}(s^{H_{\emptyset}^{\emptyset}})+\mathbb{E}(s^{H_{ab}^{\emptyset}}))=\frac{s}{2}(1+\phi(s)\psi(s)).$$

Organizing the equations, one gets $$s^2\phi(s)^3-4s\phi(s)^2+4\phi(s)-s^2=0.$$ Letting $s$ increase to $1$, the probability of hitting $(1,1)$ from $(1,2)$ is a solution of $$x^3-4x^2+4x-1=0,$$ whose roots are $1$ and $(3\pm\sqrt{5})/2$; the minimal non-negative root $x=(3-\sqrt{5})/2$ is the hitting probability.
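One way to see numerically that the minimal solution is the right one (a sketch): iterate the two PGF equations from $\phi=\psi=0$ at fixed $s$; the map is monotone, so the iterates increase to the minimal non-negative fixed point.

```python
import math

def solve(s, iters=500):
    """Iterate phi = (s/2)(psi + phi^2), psi = (s/2)(1 + phi*psi) from (0, 0)."""
    phi = psi = 0.0
    for _ in range(iters):
        # simultaneous update: right-hand sides use the previous (phi, psi)
        phi, psi = (s / 2) * (psi + phi**2), (s / 2) * (1 + phi * psi)
    return phi, psi

phi1, psi1 = solve(1.0)
print(phi1)  # ≈ (3 - sqrt(5))/2 ≈ 0.3819660
print(psi1)  # ≈ 1/(2 - phi1) = (sqrt(5) - 1)/2 ≈ 0.6180340
```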

---

(c) I can only answer the first part rigorously. However, I'll still present some ideas on the second part below.

First Part:

Problem:

Deduce that $(X_n)_{n\ge0}$ is transient.

Set up: I'll use notation similar to Alton Wang's. Let $S$ be the state space of $(X_n)_{n\ge0}$ (the set of states reachable from $(0,1)$). Let $s$ be the sum operator and $d$ the difference operator on $S$, so that $s(x,y) = (y,x+y)$ and $d(x,y) = (y,|x-y|)$. Compositions of operators are abbreviated, e.g. $s\circ d = sd$, $s \circ s = s^2$.

Lemma 1: For any $(x,y) \in S$,

$$dds(x,y) = (x,y).$$

Proof: Alton Wang has already given a proof above.

Lemma 2: For $\alpha \in \mathbb{Z}^+$,

$$ d^{\alpha}(0,1) = \begin{cases} (0,1) & \text{if } \alpha \bmod 3 = 0, \\ (1,1) & \text{if } \alpha \bmod 3 = 1, \\ (1,0) & \text{if } \alpha \bmod 3 = 2. \end{cases} $$

Proof: If $\alpha \bmod 3 = 0$, then $\alpha$ is a multiple of $3$. Since $$ d^3(0,1) = d^2(1,1) = d(1,0) = (0,1), $$ we get $d^{\alpha}(0,1) = \overbrace{d^3d^3\cdots d^3}^{\alpha/3 \text{ times}}(0,1) = (0,1)$.

If $\alpha \bmod 3 = 1$, then $\alpha - 1$ is a multiple of $3$, so $d^{\alpha}(0,1) = d\,d^{\alpha-1}(0,1) = d(0,1) = (1,1)$.

If $\alpha \bmod 3 = 2$, then $\alpha - 2$ is a multiple of $3$, so $d^{\alpha}(0,1) = d^2d^{\alpha-2}(0,1) = d^2(0,1) = d(1,1) = (1,0) \qquad \square $
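Both lemmas can be confirmed by direct computation (a sketch):

```python
def s(x, y):
    return (y, x + y)

def d(x, y):
    return (y, abs(x - y))

# Lemma 1: dds = id, spot-checked on a few pairs.
for state in [(0, 1), (1, 1), (2, 3), (3, 5), (7, 3)]:
    assert d(*d(*s(*state))) == state

# Lemma 2: d^alpha(0,1) cycles through (1,1), (1,0), (0,1) with period 3.
expected = {1: (1, 1), 2: (1, 0), 0: (0, 1)}
state = (0, 1)
for alpha in range(1, 31):
    state = d(*state)
    assert state == expected[alpha % 3]
print("Lemmas 1 and 2 verified")
```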

Lemma 3:

The state space $S$ is a communicating class (or equivalently, the Markov chain $(X_n)_{n\ge0}$ is irreducible).

Proof: First note that $(X_n)_{n\ge 0}$ is time-homogeneous, since the transition probabilities do not change.

$S$ being a communicating class means that for any $i,j \in S$ there exist $u,v$ with $P(X_u = j \mid X_0 = i) > 0$ and $P(X_v = i \mid X_0 = j) > 0$. By the definition of $S$ and the initial value $X_0 = (0,1)$, every state is reached from $(0,1)$ by some finite sequence of compositions of $s,d$, and a sequence of $n$ moves has probability $2^{-n} > 0$. So it suffices to show ($\spadesuit$): after any finite sequence of compositions of $s,d$ applied to $(0,1)$, there is a finite sequence of compositions of $s,d$ leading back to $(0,1)$. Indeed, given $\spadesuit$, for any $i,j \in S$ we can form a cyclic path $(0,1) \to i \to (0,1) \to j \to (0,1)$ consisting of finitely many $s,d$ moves; the path has positive probability (since a finite sum of finite lengths is finite and $0.5^n > 0$), and it supplies the required $u$ and $v$.

So, starting from $(0,1)$, let $\Theta$ be an arbitrary finite sequence of compositions of $s,d$. We describe a finite sequence of compositions of $s,d$, called $\Omega$, such that we can reach $(0,1)$:

  1. Collect information from $\Theta$. Read the operators of $\Theta$ in the order they are applied, keeping track of 2 counters along the way (updated at each operator):
  • $\delta$: initially $0$. On an $s$, increase $\delta$ by $2$; on a $d$, decrease $\delta$ by $1$. If $\delta = -1$, update $\alpha$, then reset $\delta = 0$.
  • $\alpha$: initially $0$. Each time $\delta$ hits $-1$, increase $\alpha$ by $1$.
  2. Now we are at the end of $\Theta$. Set $\Omega$ empty. First add $\delta$ copies of $d$ to $\Omega$. Then add $2$ more copies of $d$ if $\alpha \bmod 3 = 1$, or $1$ more copy if $\alpha \bmod 3 = 2$. Applying $\Omega$, we are back at $(0,1)$.

Since $\Theta$ is finite, we have $\delta, \alpha < \infty$, so $\Omega$ is finite.

To explain, consider the whole sequence (something akin to $\Omega\Theta(0,1)$). This construction ensures that there are 2 $d$'s following each $s$ (those $d$'s need not be immediately behind, e.g. $dddsds(0,1)$), so we may "group" them (e.g. $\color{green}{d}\color{red}{dds}\color{green}{ds}(0,1)$); this is the work of $\delta$. Also, all $d$'s that do not get grouped are counted by $\alpha$ (e.g. the $d$ in cyan is not grouped: $dds\color{cyan}{d}dds(0,1)$). Hence, we first apply Lemma 1 to make all groups vanish; all that is left are the ungrouped $d$'s (e.g. $\color{cyan}{d}\color{green}{d}\color{red}{dds}\color{green}{ds}(0,1) = \color{cyan}{d}\color{green}{dds}(0,1) = \color{cyan}{d}(0,1)$), so in general we have $\Omega\Theta(0,1) = \color{magenta}{D}d^{\alpha} (0,1) = (0,1)$, where we pick $\color{magenta}{D}$ according to Lemma 2. $\qquad \square$
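The $\delta/\alpha$ bookkeeping can be implemented and tested on random sequences (a sketch; `theta` is given as a list of `'s'`/`'d'` in the order the operators are applied):

```python
import random

def s(x, y):
    return (y, x + y)

def d(x, y):
    return (y, abs(x - y))

def make_omega(theta):
    """Build the return sequence Omega (in application order) per steps 1-2."""
    delta = alpha = 0
    for op in theta:
        if op == 's':
            delta += 2
        else:
            delta -= 1
            if delta == -1:     # an "ungrouped" d
                alpha += 1
                delta = 0
    extra = {0: 0, 1: 2, 2: 1}[alpha % 3]   # the D chosen via Lemma 2
    return ['d'] * delta + ['d'] * extra

def apply_ops(state, ops):
    for op in ops:
        state = s(*state) if op == 's' else d(*state)
    return state

rng = random.Random(0)
for _ in range(500):
    theta = [rng.choice('sd') for _ in range(rng.randrange(15))]
    assert apply_ops((0, 1), theta + make_omega(theta)) == (0, 1)
print("all 500 random sequences returned to (0, 1)")
```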

Theorem 1.5.3 (Norris, *Markov Chains*, p. 26): For a time-homogeneous discrete-time Markov chain $(X_n)_{n\ge 0}$ with state space $I$ and $T_i = \inf\{n\ge 1 : X_n = i\}$, the following dichotomy holds:
(i) if $P_i(T_i < \infty) = 1$, then $i$ is recurrent and $\sum_{n=0}^\infty p_{ii}^{(n)} = \infty$;
(ii) if $P_i(T_i < \infty) < 1$, then $i$ is transient and $\sum_{n=0}^\infty p_{ii}^{(n)} < \infty$.
In particular, every state is either transient or recurrent.

Theorem 1.5.4 (Norris, *Markov Chains*, p. 26): Let $C$ be a communicating class. Then either all states in $C$ are transient or all are recurrent.

Answer: Denote $H_i = \inf\{n\ge 0 : X_n = i\}$; by part (b), $P_{(1,2)}(H_{(1,1)} < \infty) = \frac{3-\sqrt{5}}{2}$. Also, from $(1,0)$ both moves lead to $(0,1)$, and from $(0,1)$ both moves lead to $(1,1)$, so $P_{(1,0)}(H_{(1,1)} < \infty) = 1$. Hence, by the Markov property and time homogeneity, $$ P_{(1,1)} (T_{(1,1)} < \infty) = \frac12 P_{(1,1)}(T_{(1,1)} < \infty \mid X_1 = (1,2)) + \frac12 P_{(1,1)}(T_{(1,1)} < \infty \mid X_1 = (1,0)) $$ $$ = \frac12 P_{(1,2)}(H_{(1,1)} < \infty ) + \frac12 P_{(1,0)}(H_{(1,1)} < \infty ) = \frac{3-\sqrt{5}}{4} + \frac12 = \frac{5-\sqrt{5}}{4} < 1. $$ Hence by Theorem 1.5.3, $(1,1)\in S$ is transient; by Lemma 3, $S$ is a communicating class, and by Theorem 1.5.4 all states in $S$ are transient (i.e., $(X_n)_{n\ge0}$ is transient).

Second Part: I can only provide some thoughts and incomplete numerical results in this part.

Problem:

Show that, moreover, with probability 1, $F_n \to \infty$ as $n \to \infty$.

Some thoughts: I tried to show $$ \lim_{n\to \infty} P(F_n \ge n) = 1. $$

Incomplete numerical results: below are the values of $P(F_n\ge n)$ for $n = 2,\ldots,23$ (table not reproduced here).

The run time blows up, and there doesn't seem to be a simple pattern. Proving it mathematically may be difficult; you may want to try another approach.
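For what it's worth, $P(F_n\ge n)$ can be computed exactly without enumerating the $2^n$ paths, by propagating the distribution of the pair $(F_{n-1},F_n)$ and merging duplicate states at each step (a sketch; all probabilities are dyadic, so floating point is exact in this range):

```python
from collections import defaultdict

def prob_Fn_at_least_n(max_n):
    """Return {n: P(F_n >= n)} for n = 2..max_n via exact DP on (F_{n-1}, F_n)."""
    dist = {(0, 1): 1.0}            # distribution of (F_0, F_1)
    result = {}
    for n in range(2, max_n + 1):
        new = defaultdict(float)
        for (x, y), p in dist.items():
            new[(y, x + y)] += p / 2        # sum move
            new[(y, abs(x - y))] += p / 2   # difference move
        dist = dict(new)
        result[n] = sum(p for (_, y), p in dist.items() if y >= n)
    return result

p = prob_Fn_at_least_n(18)
for n in range(2, 19):
    print(n, p[n])   # e.g. p[5] = 0.125, p[6] = 0.0625
```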