Why does the entropy of a $(K-1)$ order Markov process converge to the entropy of the process itself as $K\to\infty$

49 Views Asked by At

I am working through a book and I just can't seem to see why the following is true:

Let $\mu$ be an ergodic process with process entropy $H$. Let $\epsilon>0$. Let $H_{K-1}=H\left(X_{K} \mid X_{1}^{K-1}\right)$ be the process entropy of the Markov chain of order $K-1$ defined by the conditional probabilities $\mu(a_1^K)/\mu(a_1^{K-1})$. Then one can find a $K$ sufficiently large, such that:

$$H_{K-1}\leq H + \epsilon$$.

Here is what what I got after some calculations:

$$H_{K-1}=-\sum_{a_{1}^{K} \in A^K} \mu(a_1^K) \log \frac{\mu(a_1^K)}{\mu(a_1^{K-1})} $$ where $A$ is a finite alphabet. And by the definition from this book, the entropy of the process $\mu$ is given by: $$H=\limsup_{n\to\infty}-\frac{1}{n}\sum_{a_1^n}\mu(a_1^n)\log{\mu(a_1^n)}$$

I got the following upper bound for the first expression: $$H_{K-1}\leq-\sum_{a_{1}^{K} \in A^K} \mu(a_1^K) \log{\mu(a_1^K)} $$

But it's missing the $1/K$, so that I could say that it converges to $H$, so it does not look promising. Any ideas would be great!

1

There are 1 best solutions below

0
On BEST ANSWER

Notice that $-\sum \mu(a_1^K) \log \mu(a_1^{K-1}) = -\sum \mu(a_1^{K-1}) \log \mu(a_1^{K-1}).$ To see this, simply observe that the first term is $\mathbb{E}_{X,Y}[f(X)]$, for $X = a_1^{K-1}, Y = a_K$. Then since the integrand doesn't depend on $Y$, you can marginalise over it, i.e., $\mathbb{E}_{X,Y}[f(X)] = \mathbb{E}_X[f(X)]$.

This means that you can write $$ H_{K-1} = - \sum \mu(a_1^K) \log \mu(a_1^K) + \sum \mu(a_1^K) \log \mu(a_1^{K-1}) = H(X_1^K) - H(X_1^{K-1})$$

Now, since we only want to show that there exists a $K$ such that $H_{K-1}$ is small, it suffices to argue that $\liminf H_{K-1} \le H$. Indeed, in this case, for any $\varepsilon > 0,$ it must hold that for infinitely many $K$ that $H_{K-1} \le H + \varepsilon,$ else $\liminf H_{K-1}$ would strictly exceed $H$.

But this follows by Stolz-Cesaro, since $$ \liminf \frac{H_{K-1}}{1} = \liminf \frac{H(X_1^K) - H(X_1^{K-1})}{K - (K-1)} \le \limsup \frac{H(X_1^K)}{K} = H.$$