Why does this compatibility in $k$ imply an entropy inequality?

43 Views Asked by At

I am working through a book and there's a step that the author makes without any explanation, but I can't seem to figure out how he got this implication. I don't think it's a hard question, if you do not want to read the whole setup scroll to the bottom, and you'll see the inequality in question. Here is the setting:

Let $A$ be a finite alphabet and $x_1^n\in A^n$ a sequence of length $n$. Then one can define a measure on $A^k$, called the circular $k$-type, for a given $k\leq n$ and fixed $x_1^n$ in the following way:

$$p_k(a_1^k|x_1^n)=\frac{| \{ i\in [1,n]:y_i^{i+k-1}=a_1^k \} |}{n},a_1^k \in A^k$$

Intuitively, twist $x_1^n$ into a loop (so that $x_1$ and $x_n$ are touching, and move a window of length $k$ around the circle and count how many times you see each length $k$ block. Divide that number by $n$ and you have a probability of that block. Trivial example, if that particular $a_1^k$ is not in $x_1^n$ then its probability is $0/n=0$.

This has the following nice property: $$ p_{k-1}(a_1^{k-1}|x_1^n)=\sum_{a_k \in A} p_k(a_1^k|x_1^n) $$

For intuition on why this is true, multiply both sides by $n$. Then the left-hand side is the number of times $a_1^{k-1}$ occurs in $x_1^n$, and the right-hand side is the number of times $a_1^{k-1}a_k$ occurs in $x_1^n$ for any $a_k$. These should clearly coincide.

Here is the step that I don't get. Then the author claim that this implies: $$H^{(k-1)}\leq H^{(i)},i\leq k-1,$$ where $H^{(k-1)},x_1^n$ is the entropy of the $(k-1)$-order Markov chain defined by $P_k(\cdot|x_1^n)$.

Now if I got the right definition here, these should be equal to: $$ {H}^{(k-1)} =-\sum_{a_{1}^{k}} \widehat{p}\left(a_{k} \mid a_{1}^{k-1}\right) \widehat{p}\left(a_{1}^{k-1}\right) \log \widehat{p}\left(a_{k} \mid a_{1}^{k-1}\right) $$ where $$ \widehat{p}\left(a_{k} \mid a_{1}^{k-1}\right)=\frac{p_{k}\left(a_{1}^{k} \mid x_{1}^{n}\right)}{\sum_{b_{k}} p_{k}\left(a_{1}^{k-1} b_{k} \mid x_{1}^{n}\right)},\widehat{p}\left(a_{1}^{k-1}\right)=\sum_{b_{k}} p_{k}\left(a_{1}^{k-1} b_{k} \mid x_{1}^{n}\right) $$

Using that nice property above, this simplifies to:

$${H}^{(k-1)} =-\sum_{a_{1}^{k}} p_{k}\left(a_{1}^{k} \mid x_{1}^{n}\right) \log \frac{p_{k}\left(a_{1}^{k} \mid x_{1}^{n}\right)}{p_{k-1}\left(a_{1}^{k-1} \mid x_{1}^{n}\right)} $$ $${H}^{(i)} =-\sum_{a_{1}^{i+1}} p_{i+1}\left(a_{1}^{i+1} \mid x_{1}^{n}\right) \log \frac{p_{i+1}\left(a_{1}^{i+1} \mid x_{1}^{n}\right)}{p_{i}\left(a_{1}^{i} \mid x_{1}^{n}\right)} ,i\leq k-1 $$

So why does $H^{(k-1)}\leq H^{(i)}$ hold?

Any ideas?