An Estimator for the Conditional Transition Probability in a Markov Chain


Let $(X_t,t\in \mathbb{N}_0)$ be a time-homogeneous Markov chain with transition probabilities $p(y \mid x)$ for states $x$ and $y$. Then I think that the estimator $$\hat{p}(y\mid x)=\frac{\sum_{t=0}^n \mathbb{1}_{X_t=x,X_{t+1}=y} }{\sum_{t=0}^n \mathbb{1}_{X_t=x}}$$ converges almost surely to the true transition probability. But I am missing some arguments for that. The idea is to use something like the law of large numbers, i.e. $$\begin{align}\frac{ \sum_{t=0}^n \mathbb{1}_{X_t=x,X_{t+1}=y} }{ \sum_{t=0}^n \mathbb{1}_{X_t=x} } &=\frac{\frac{1}{n} \sum_{t=0}^n \mathbb{1}_{X_t=x,X_{t+1}=y} }{\frac{1}{n} \sum_{t=0}^n \mathbb{1}_{X_t=x} }\\ &\to \frac{ \mathbb{E}[\mathbb{1}_{X_t=x,X_{t+1}=y}] }{ \mathbb{E}[\mathbb{1}_{X_t=x}] } = \frac{P(X_t=x,X_{t+1}=y)}{P(X_t=x)}\\ &=P(X_{t+1}=y\mid X_t =x)=p(y\mid x) \end{align}$$
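As a quick numerical sanity check (just a sketch I put together; the helper `estimate_transition_matrix` and the concrete two-state chain are my own choices, not part of the question), the estimator can be computed from a simulated path and compared against the true transition matrix:

```python
import numpy as np

def estimate_transition_matrix(path, n_states):
    """hat p(y|x) = #{t : X_t = x, X_{t+1} = y} / #{t : X_t = x}."""
    counts = np.zeros((n_states, n_states))
    for x, y in zip(path[:-1], path[1:]):
        counts[x, y] += 1
    visits = counts.sum(axis=1, keepdims=True)
    # Rows for never-visited states stay zero to avoid 0/0.
    return np.divide(counts, visits, out=np.zeros_like(counts), where=visits > 0)

# Simulate an ergodic two-state chain with a known transition matrix P.
rng = np.random.default_rng(0)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
n = 100_000
path = np.empty(n, dtype=int)
path[0] = 0
for t in range(1, n):
    path[t] = rng.choice(2, p=P[path[t - 1]])

P_hat = estimate_transition_matrix(path, 2)
print(np.abs(P_hat - P).max())  # small, and shrinks as n grows
```

Empirically the maximum entry-wise error does go to zero as the path length grows, which at least supports the claim before proving it.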

But this of course doesn't quite work: a Markov chain isn't i.i.d., and the expected values are not fixed but only converge to the limiting distribution if the Markov chain is ergodic.
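A sketch of an argument that I suspect might work (assuming the chain is irreducible and positive recurrent with stationary distribution $\pi$) is to apply the ergodic theorem for Markov chains to the two sums separately:
$$\frac{1}{n}\sum_{t=0}^{n} \mathbb{1}_{X_t=x} \xrightarrow{\text{a.s.}} \pi(x), \qquad \frac{1}{n}\sum_{t=0}^{n} \mathbb{1}_{X_t=x,\,X_{t+1}=y} \xrightarrow{\text{a.s.}} \pi(x)\,p(y\mid x),$$
where the second limit would follow because the pair process $\big((X_t,X_{t+1})\big)_{t\ge 0}$ is again a Markov chain, with stationary distribution $\pi(x)\,p(y\mid x)$. Taking the ratio would then give $\hat{p}(y\mid x)\to p(y\mid x)$ almost surely whenever $\pi(x)>0$. But I am not sure this is the cleanest route.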

Do you have any recommendations on where to go from here? Should I look for generalizations of the law of large numbers, or should I try to show it another way? For some reason I think I am feeding Google the wrong keywords, because I can't find anything usable, so I would appreciate some pointers.

(Ideally a source which allows generalizing this to Markov decision processes, i.e. the state sequence itself loses the Markov property because the actions are history-dependent, but $p(y\mid x,a)$ is stationary.)
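For the MDP variant, the analogous count-based estimator (again only my sketch; the trajectory format and the helper name are made up) conditions on state–action pairs, which I would expect to be consistent as long as $p(y\mid x,a)$ is stationary and each pair $(x,a)$ is visited infinitely often, regardless of how the policy chooses actions:

```python
from collections import defaultdict

def estimate_mdp_transitions(trajectory):
    """hat p(y | x, a) from a trajectory [(x_0, a_0), (x_1, a_1), ...]:
    count transitions out of each visited state-action pair, then normalize.
    Only stationarity of p(y|x,a) is used; the policy may be history-dependent."""
    counts = defaultdict(lambda: defaultdict(int))
    for (x, a), (y, _) in zip(trajectory[:-1], trajectory[1:]):
        counts[(x, a)][y] += 1
    return {
        sa: {y: c / sum(ys.values()) for y, c in ys.items()}
        for sa, ys in counts.items()
    }

# Out of (0, "a"): two transitions to state 1, one to state 0.
demo = estimate_mdp_transitions([(0, "a"), (1, "a"), (0, "a"), (0, "a"), (1, "a")])
```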