Let
- $E$ be an at most countable Polish space and $\mathcal E$ be the Borel $\sigma$-algebra on $E$
- $X=(X_n)_{n\in\mathbb N_0}$ be a Markov chain with values in $(E,\mathcal E)$, distributions $(\operatorname P_x)_{x\in E}$ and transition matrix $$p=\left(p(x,y)\right)_{x,y\in E}=\left(\operatorname P_x\left[X_1=y\right]\right)_{x,y\in E}$$
- $\tau_x^0:=0$ and $$\tau_x^k:=\inf\left\{n>\tau_x^{k-1}:X_n=x\right\}$$ for $x\in E$ and $k\in\mathbb N$ and $$\varrho(x,y):=\operatorname P_x\left[\tau_y^1<\infty\right]\color{blue}{=\operatorname P_x\left[\exists n\in\mathbb N:X_n=y\right]}$$
Suppose $x\in E$ is a recurrent state, i.e. $$\varrho(x,x)=1\;.$$ Let $y\in E$ with $\varrho(x,y)>0$ .
Clearly, since $\varrho(x,y)>0$, there exist $n\in\mathbb N$ and a sequence $x_1,\ldots,x_n\in E$ with $x_n=y$ and $$\operatorname P_x\left[X_i=x_i\text{ for all }i\in\{1,\ldots,n\}\right]>0\;.$$ Moreover,
\begin{equation} \begin{split} 1-\varrho(x,x)&=\operatorname P_x\left[\tau_x^1=\infty\right]\\ &\ge\operatorname P_x\left[X_1=x_1,\ldots,X_n=x_n\text{ and }\tau_x^1=\infty\right]\;, \end{split}\tag 1 \end{equation}
but I don't understand why the last term is equal to $$\operatorname P_x\left[X_1=x_1,\ldots,X_n=x_n\right]\operatorname P_y\left[\tau_x^1=\infty\right]\;.\tag 2$$
Some authors state that $(2)$ holds by "the" (elementary? weak? strong?) Markov property, but honestly, I don't see it.
As another important note: they assume $x_i\ne x$ for all $i$. That might be crucial for $(2)$, but that's the next problem: I don't understand why we can choose the $x_i$ such that $x_i\ne x$. I've seen authors define $n$ to be the infimum of $$\left\{n\in\mathbb N:\operatorname P_x\left[X_n=y\right]>0\right\}\;,$$ but that doesn't seem to guarantee that $X$ doesn't return to $x$ a number of times before it goes to $y$.
EDIT:$\;\;\;$The last term in $(1)$ is equal to $$\operatorname P_x\left[\tau_x^1=\infty\mid X_1=x_1,\ldots,X_n=x_n\right]\operatorname P_x\left[X_1=x_1,\ldots,X_n=x_n\right]\;.$$ Intuitively, by the memorylessness of $X$, $\operatorname P_x\left[\tau_x^1=\infty\mid X_1=x_1,\ldots,X_n=x_n\right]$ should be equal to $\operatorname P_x\left[\tau_x^1=\infty\mid X_n=x_n\right]$ and since $x_n=y$, this probability should be equal to $\operatorname P_y\left[\tau_x^1=\infty\right]$.
How can we verify this formally?
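(As a sanity check, not a proof, identity $(2)$ can be tested numerically on a toy example. Everything below is a hypothetical setup of my own, not from the text above: a three-state chain on $\{0,1,2\}$ with $x=0$, $y=1$, $n=1$, where state $2$ is absorbing, so $\operatorname P_1\left[\tau_0^1=\infty\right]>0$ and can be computed by hand.)

```python
import random

# Hypothetical chain (my choice, not from the question): states 0, 1, 2,
# with x = 0, y = 1 and the length-1 path x_1 = 1, which avoids x = 0.
# State 2 is absorbing, so "never return to 0" is decidable in finite time.
P = {
    0: [(1, 0.6), (2, 0.4)],
    1: [(0, 0.3), (1, 0.2), (2, 0.5)],
    2: [(2, 1.0)],
}

def step(rng, state):
    """Draw the next state from the transition row of `state`."""
    u, acc = rng.random(), 0.0
    for nxt, prob in P[state]:
        acc += prob
        if u < acc:
            return nxt
    return P[state][-1][0]

def run_trial(rng):
    """Starting from x = 0: did X_1 = 1 occur and the chain never return to 0?"""
    state = step(rng, 0)
    if state != 1:
        return False
    while state not in (0, 2):  # state 2 is absorbing, so this terminates a.s.
        state = step(rng, state)
    return state == 2  # absorbed without ever revisiting 0

rng = random.Random(0)
trials = 200_000
# Monte Carlo estimate of the left-hand side P_0[X_1 = 1, tau_0^1 = infinity].
lhs = sum(run_trial(rng) for _ in range(trials)) / trials

# Right-hand side of (2): P_0[X_1 = 1] * P_1[tau_0^1 = infinity].
# Here h := P_1[never hit 0] solves h = 0.2*h + 0.5, i.e. h = 0.625.
rhs = 0.6 * 0.625
print(lhs, rhs)  # lhs should land close to rhs = 0.375
```

This only checks the factorization on one invented chain, but it makes the product structure of $(2)$ tangible.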
I believe this is what they had in mind. For brevity, set $A:=\left\{X_1=x_1,\ldots,X_n=x_n\right\}$.
\begin{align}\operatorname P_x\left[\left\{\tau_x^1=\infty\right\}\cap A\right]&=\operatorname P_x\left[\tau_x^1=\infty\mid A\right]\operatorname P_x\left[A\right]\\&\overset{\star}{=}\operatorname P_x\left[\tau_x^1=\infty\mid X_n=y\right]\operatorname P_x\left[A\right]\\&=\operatorname P_y\left[\tau_x^1=\infty\right]\operatorname P_x\left[A\right]\end{align}
where $\star$ is where the Markov property is needed. I'm not sure how to fully justify this step.
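One way the $\star$ step might be made rigorous, assuming $x_i\ne x$ for all $i$ (so that on $A$ the event $\left\{\tau_x^1=\infty\right\}$ depends only on the chain after time $n$), is via the elementary Markov property applied to the post-$n$ event $\left\{X_m\ne x\text{ for all }m\ge 1\right\}$:

```latex
\begin{align*}
\operatorname P_x\left[\left\{\tau_x^1=\infty\right\}\cap A\right]
&=\operatorname P_x\left[\left\{X_m\ne x\text{ for all }m>n\right\}\cap A\right]\\
&=\operatorname E_x\left[\mathbf 1_A\,
   \operatorname P_x\left[X_m\ne x\text{ for all }m>n\,\middle|\,\mathcal F_n\right]\right]\\
&=\operatorname E_x\left[\mathbf 1_A\,
   \operatorname P_{X_n}\left[\tau_x^1=\infty\right]\right]
 =\operatorname P_x\left[A\right]\operatorname P_y\left[\tau_x^1=\infty\right]\;,
\end{align*}
```

where the first equality uses $x_i\ne x$, the third is the Markov property, and the last uses $X_n=x_n=y$ on $A$.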
Regarding the choice of $n$: if $x_i=x$ for some $i$, why don't we just drop the initial segment of the path up to the last such visit and start the path afresh from there?
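Concretely (writing $x_0:=x$, and assuming $i$ is the largest index with $x_i=x$), the truncated path $x_{i+1},\ldots,x_n$ still ends at $y$, avoids $x$ by maximality of $i$, and has positive probability, since dropping factors that are at most $1$ can only increase a product:

```latex
\operatorname P_x\left[X_1=x_{i+1},\ldots,X_{n-i}=x_n\right]
=\prod_{j=i}^{n-1}p(x_j,x_{j+1})
\ge\prod_{j=0}^{n-1}p(x_j,x_{j+1})
=\operatorname P_x\left[X_1=x_1,\ldots,X_n=x_n\right]>0\;.
```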