Expected number of a pattern in a random sequence stopped after appearance of another symbol for the first time


Let $X\sim p_1\delta_1+p_2\delta_2+p_3\delta_3$, so the variable $X$ takes the values $1,2,3$ with probabilities $p_1,p_2,p_3$. If I have a sequence of iid random variables $(X_i)_{i\geq 1}$ with $X_i\sim X$, what is the expected number of times I can observe the pattern "13" before seeing a 2 for the first time?

Let $N_2=\inf\{n:X_n=2\}$ and let $T_{13}$ be the variable that counts the number of times "13" appears in the sequence before $N_2$. Then

$$T_{13}=\sum_{i=1}^{N_2-2}\mathbf 1_{\{X_i=1,\,X_{i+1}=3\}}$$

I tried using the Wald identity, but that gave me $$E(T_{13}) =\left(\frac1{p_2}-2\right)\cdot\frac{p_1p_3}{p_1+p_3}$$

Now, for $p_i=1/3$ this gives $E(T_{13})=\frac{1/9}{2/3}=1/6$. But simulations show that in this case the expected value should be around $1/3$.
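A quick Monte Carlo check makes the discrepancy easy to reproduce. This is a minimal sketch; the helper name and sample size are my own choices:

```python
import random

def count_13_before_2(p, rng):
    """Count occurrences of the pattern (1,3) strictly before the first 2."""
    count, prev = 0, None
    while True:
        x = rng.choices((1, 2, 3), weights=p)[0]
        if x == 2:
            return count
        if prev == 1 and x == 3:
            count += 1
        prev = x

rng = random.Random(0)
n = 200_000
est = sum(count_13_before_2([1/3, 1/3, 1/3], rng) for _ in range(n)) / n
print(est)  # close to 1/3, not 1/6
```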

Also, the formula gives negative results for $p_2>1/2$, which is another contradiction.

Any suggestions on how to approach such a problem in a more rigorous way?

EDIT: Via a Markov chain pattern approach I got $$E(T_{13})=\frac{p_1p_3}{p_2}$$ (I had the initial conditions wrong.)

Surprisingly, using $$T_{13}=\sum_{i=1}^{N_2-1}\mathbf 1_{\{X_i=1,\,X_{i+1}=3\}}$$ gives the correct value with the Wald identity...

1 Answer

Here are a couple of approaches.

In either case, create a 4-state Markov chain with states
$\big\{1: (1),\ 2:(1,3),\ 3: (\text{"something other than 1"},3),\ 4: (2)\big\}.$
With some care, any state except 1 may be interpreted as a 'cold start' for the process, but this is easy to get confused about. In approach 1 below, the natural interpretation is to take state 3 as the 'cold start', and in approach 2, state 4:(2) is the sensible 'cold start' state. The transitions are hopefully obvious, but here is an example, with $p_1 = 0.10$, $p_2 = 0.05$, $p_3 =0.85$:

$A:=\displaystyle \left[\begin{matrix}0.1 & 0.85 & 0 & 0.05\\0.1 & 0 & 0.85 & 0.05\\0.1 & 0 & 0.85 & 0.05\\0.1 & 0 & 0.85 & 0.05\end{matrix}\right]$
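The matrix can also be built programmatically from the $p_i$; a small sketch (the function name is mine):

```python
import numpy as np

def transition_matrix(p1, p2, p3):
    # States: 1:(last symbol 1), 2:(last two symbols 1,3),
    #         3:(a 3 not preceded by a 1), 4:(last symbol 2).
    # A 3 drawn from state 1 completes the pattern (go to state 2);
    # a 3 drawn from any other state goes to state 3.
    return np.array([
        [p1, p3, 0.0, p2],
        [p1, 0.0, p3, p2],
        [p1, 0.0, p3, p2],
        [p1, 0.0, p3, p2],
    ])

A = transition_matrix(0.10, 0.05, 0.85)
print(A)
```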

approach 1
The above chain is shown in recurrent form. The first approach is to make seeing a '2' an absorbing state. The transient chain is then given by deleting row and column 4; call the resulting matrix $B$. This gives a fundamental matrix

$N^{-1} = \big(I_3-B\big)$
and
$N=\big(I_3-B\big)^{-1} = \displaystyle \left[\begin{matrix}3.0 & 2.55 & 14.45\\2.0 & 2.7 & 15.3\\2.0 & 1.7 & 16.3\end{matrix}\right]$

and we can simply read off the expected number of visits to the pattern (1,3), represented by state 2, given our 'cold start' in state 3: we visit it $1.7$ times. The above was done with explicit numbers for concreteness, but you can work with $N^{-1}$ symbolically and then obtain $N$ via the adjugate, etc.
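The numerical computation above can be reproduced in a few lines (a sketch using NumPy, with the example probabilities):

```python
import numpy as np

# Transient submatrix B: delete row and column 4 (the absorbing state (2)).
B = np.array([
    [0.10, 0.85, 0.00],
    [0.10, 0.00, 0.85],
    [0.10, 0.00, 0.85],
])
N = np.linalg.inv(np.eye(3) - B)   # fundamental matrix
visits = N[2, 1]                   # expected visits to state 2:(1,3) from state 3
print(visits)  # 1.7, matching p1*p3/p2 = 0.10*0.85/0.05
```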

approach 2
viewing this as a Renewal Reward Process:

Define an epoch by starting in state 4:(2) in the original Markov chain and ending at the next return to state 4:(2); i.e., each visit to state 4 marks a 'renewal' and a new epoch. Assign a reward of 1 each time you see the pattern (1,3) during an epoch. The renewal-reward theorem tells us
$\lim_{t\to \infty}\frac{\mathbb E[R(t)]}{t}= \frac{\mathbb E[R_1]}{\mathbb E[T_1]}$

i.) $\mathbb E[T_1]=p_2^{-1}$, since the length of an epoch is geometric with success probability $p_2$.
ii.) $\lim_{t\to \infty}\frac{\mathbb E[R(t)]}{t}=p_1\cdot p_3$. This can be found by solving the standard problem of runs, or by computing $A_\infty = \lim_{t\to \infty}A^t$
and looking at component $(2,2)$ of that matrix. A slightly cleaner interpretation: solve $\pi^T A = \pi^T$ and read off the steady-state probability of visiting state 2:(1,3).
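The stationary vector can be found by solving the linear system $\pi^T A = \pi^T$ together with $\sum_i \pi_i = 1$; a sketch with the example probabilities (the least-squares solve is just one convenient way to impose the normalization):

```python
import numpy as np

A = np.array([
    [0.10, 0.85, 0.00, 0.05],
    [0.10, 0.00, 0.85, 0.05],
    [0.10, 0.00, 0.85, 0.05],
    [0.10, 0.00, 0.85, 0.05],
])
# Stack (A^T - I) pi = 0 with the normalization row sum(pi) = 1.
M = np.vstack([A.T - np.eye(4), np.ones((1, 4))])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(M, b, rcond=None)[0]
print(pi[1])  # steady-state probability of state 2:(1,3) = p1*p3 = 0.085
```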

putting this all together

$\mathbb E[R_1] = \lim_{t\to \infty}\frac{\mathbb E[R(t)]}{t}\cdot \mathbb E[T_1] = p_1\cdot p_3\cdot p_2^{-1}$

so the expected reward (= expected number of visits) in the first epoch is $p_1\cdot p_3\cdot p_2^{-1}$, which is also the expected number of visits during an arbitrary epoch, since this is a proper renewal process.