Suppose we generate the following sequence by repeatedly picking a letter from the alphabet (26 letters):
LLL EEE HHH QQQ MMM QQQ OOO TTT EEE YYY XXX GGG...
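For concreteness, such a sequence can be simulated with a short script (a sketch; the function name and the seed are my own choices, not part of the question):

```python
import random
import string

def generate_sequence(num_groups, seed=None):
    """Draw the first letter of each group uniformly from A-Z,
    then repeat it twice, giving groups of 3 identical letters."""
    rng = random.Random(seed)
    return " ".join(rng.choice(string.ascii_uppercase) * 3
                    for _ in range(num_groups))

print(generate_sequence(12, seed=0))
```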
So the first letter of every group of 3 letters is drawn independently and uniformly (each of the 26 letters has probability $\frac{1}{26}$), and the next two letters are deterministic: they are equal to the letter drawn at the first position of the group.
The entropy of the first symbol of each group seems to me to be $\log 26$ (all logarithms are base 2), since we need $\log 26$ bits to describe it, while the two letters after it have entropy $0$: they provide no new information, so they require no additional bits.
So by adding a new letter for the second group (in the example, the letter "E"), I would need $2\log 26$ bits to describe the first $6$ letters of the sequence. In general I could compress the sequence by a factor of $3$, and thus the entropy rate would be $\frac{\log 26}{3}$.
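Numerically, this back-of-the-envelope argument works out as follows (a quick sanity check, not part of the derivation): each 3-letter group costs $\log_2 26$ bits for its first symbol and nothing for the two copies, so the per-symbol rate is just above 1.5 bits.

```python
import math

bits_per_group = math.log2(26) + 0 + 0   # first symbol random, two copies free
rate = bits_per_group / 3                # bits per symbol
print(rate)                              # ~1.567 bits/symbol, vs. log2(26) ~ 4.700 uncompressed
```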
But I'm lost with the theoretical derivation of it. It says the entropy rate is defined as:
$$H(X) = \lim_{n\to\infty} \frac{1}{n}H(X_1, X_2, X_3, \dots, X_n)$$
Then it says that $H(X)$ is a measure of the entropy per symbol of the process $\{X_t\}$, and the sum of per-symbol entropies grows like:
$$\log26 + 0 + 0 + \log26 + 0 + 0+\dots+ \log26 + 0 + 0$$
Now it says this sum is squeezed between two bounds:
$$\frac{\log26}{3}n \le H(X_1, X_2, X_3, \dots, X_n) \le \frac{\log26}{3}(n+2)$$
Dividing both bounds by $n$ and letting $n \to \infty$, I see that the entropy rate is indeed $\frac{\log26}{3}$.
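The squeeze can also be checked numerically: the joint entropy is $\log_2 26$ per started group, i.e. $\lceil n/3\rceil \log_2 26$, and dividing by $n$ converges to $\frac{\log 26}{3}$ (a sketch; `joint_entropy` is my own helper name):

```python
import math

def joint_entropy(n):
    """H(X_1,...,X_n): each started group of 3 contributes log2(26) bits,
    the deterministic copies contribute 0."""
    return math.ceil(n / 3) * math.log2(26)

for n in (4, 31, 301, 3001):
    print(n, joint_entropy(n) / n)   # approaches log2(26)/3 ~ 1.567
```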
But could someone explain the intuition behind the function $H(X)$ and how these bounds are established?
Let $X_i^{1} \sim U(\{1,2,\ldots, 26\})$, i.i.d., so that $H(X_i^{1})=\log_2{26}$, where the r.v. $X_i^{1}$ represents the first symbol of the $i^{\text{th}}$ group $X_i$. Each group contains 3 symbols (except possibly the last one, which may be shorter), with the $1^{\text{st}}$ being random and the others completely deterministic.
There can be 3 cases:
$n \equiv 0 \pmod{3}$: there are $\frac{n}{3}$ complete groups; in each, the first symbol is chosen uniformly from $26$ symbols and the other two are fully determined, hence the total entropy is $H(X_1,\ldots,X_n)=\sum\limits_{i=1}^{n/3}H(X_i^{1})=\frac{n}{3}\log{26}$.
$n \equiv 1 \pmod{3}$: there are $\frac{n-1}{3}$ complete groups, plus a final group $X_{\frac{n-1}{3}+1}$ consisting of a single (random) symbol, hence the total entropy is $H(X_1,\ldots,X_n)=\sum\limits_{i=1}^{\frac{n-1}{3}}H(X_i^{1})+H\left(X_{\frac{n-1}{3}+1}^{1}\right)=\frac{n-1}{3}\log{26}+\log{26}=\frac{n+2}{3}\log{26}$.
$n \equiv 2 \pmod{3}$: there are $\frac{n-2}{3}$ complete groups, plus a final group $X_{\frac{n-2}{3}+1}$ with two symbols, of which only the first is random, hence the total entropy is $H(X_1,\ldots,X_n)=\sum\limits_{i=1}^{\frac{n-2}{3}}H(X_i^{1})+H\left(X_{\frac{n-2}{3}+1}^{1}\right)=\frac{n-2}{3}\log{26}+\log{26}=\frac{n+1}{3}\log{26}$.
Hence, in every case, $\frac{n}{3}\log{26}\leq H(X_1,\ldots,X_n) \leq \frac{n+2}{3}\log{26}$, and dividing by $n$ and taking $n\to\infty$ gives the entropy rate $\frac{\log{26}}{3}$.