Entropy of a Binary Sequence with Restrictions


Consider a length-$L$ binary sequence $e_{1:L}$ of i.i.d. symbols, where a symbol is $1$ with probability $p_{f}$ and $0$ with probability $1-p_{f}$. The entropy of such a sequence is easily found to be: \begin{equation} H(e_{1:L}) = LH_{2}(p_{f}) \end{equation} where $H_{2}(\cdot)$ is the binary entropy function.
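As a quick numerical check of this formula, here is a minimal sketch (the function names are my own) computing $L H_2(p_f)$:

```python
import math

def binary_entropy(p: float) -> float:
    """H_2(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def iid_sequence_entropy(L: int, p_f: float) -> float:
    """Entropy of L i.i.d. Bernoulli(p_f) bits: L * H_2(p_f)."""
    return L * binary_entropy(p_f)

# A fair coin carries exactly 1 bit per symbol:
print(iid_sequence_entropy(6, 0.5))  # 6.0
```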

Suppose we now make $i$ particular length-$L$ sequences impossible (e.g., with $L = 6$ and $i = 3$, the sequences '$000001$', '$001001$' and '$010101$' have probability $0$). What is the new value of the entropy $H(e_{1:L})$?


Best answer:

Edit - my previous answer wrongly assumed a fair coin.

The question is practically unanswerable as stated, because the entropy depends not only on $i$ (how many messages are suppressed) but on which ones they are - and even then the entropy would have to be computed by brute-force calculation.

An approximation for large $L$:

For large $L$, the asymptotic equipartition property (AEP) says that "most" sequences ("most" in the probability sense) lie inside the typical set $A_\epsilon^{(L)}$, which has size approximately $2^{L H_r}$ (where $H_r = h(p_f)$).

If we reduce this typical set by deleting (uniformly) a fraction $0<\alpha<1$ of its sequences, then the entropy rate becomes $$ H'_r= \frac1L \log(2^{L H_r}- \alpha 2^{L H_r})= h(p_f) +\frac1L \log(1-\alpha) \approx h(p_f) - \log(e) \frac{\alpha}{L} \tag1 $$

where the last approximation assumes $\alpha \ll 1$ (logs are in base two).

Depending on the scenario, it might be reasonable to assume that the $i$ suppressed messages are taken from the typical set, so that $i=\alpha 2^{L H_r}$, i.e. $\alpha = i \, 2^{-L h(p_f)}$.

Alternatively, we may prefer to assume that the $i$ suppressed messages are taken uniformly from the total set of $2^L$ sequences, so that $i = \alpha 2^{L}$, i.e. $\alpha = i \, 2^{-L}$.
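The approximation $(1)$ with the two choices of $\alpha$ above can be sketched as follows (a sketch under the stated assumptions; the function name is my own):

```python
import math

def entropy_rate_after_deletion(p_f, L, i, from_typical=True):
    """Approximate entropy rate h(p_f) + (1/L) * log2(1 - alpha), Eq. (1).

    alpha is the fraction of the typical set deleted, under two assumptions:
      from_typical=True  -> the i messages come from the typical set:
                            alpha = i * 2**(-L * h(p_f))
      from_typical=False -> the i messages come uniformly from all 2**L:
                            alpha = i * 2**(-L)
    """
    h = -p_f * math.log2(p_f) - (1 - p_f) * math.log2(1 - p_f)
    alpha = i * 2 ** (-L * h) if from_typical else i * 2 ** (-L)
    return h + math.log2(1 - alpha) / L

# For p_f = 0.5 the typical set is the whole set (h = 1), so both
# assumptions give the same alpha = 3/64 and the same rate:
print(entropy_rate_after_deletion(0.5, 6, 3, from_typical=True))
```

Note that for $p_f = 1/2$ the two assumptions coincide, since then $h(p_f) = 1$ and the typical set is the entire set of $2^L$ sequences.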

Second answer:

With such restrictions, the digits of the sequence are no longer i.i.d. For example, among length-$6$ binary sequences with the all-zeros string forbidden, the sixth digit is deterministic after observing "00000". So the entropy of the sequence can no longer be written as a function of only the marginal distribution of the digits. To find this entropy, you need to treat $e_{1:L}$ as a single random variable and plug the new joint probability $p(e_1,e_2,...,e_L)$ that encodes the restrictions into the definition of entropy.
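This brute-force computation is feasible for small $L$. Below is a minimal sketch (the function name is my own), under the assumption that "impossible" means the forbidden sequences get probability $0$ and the remaining probabilities are renormalized to sum to $1$:

```python
import math
from itertools import product

def exact_entropy_with_restrictions(L, p_f, forbidden):
    """Exact entropy of e_{1:L} when sequences in `forbidden` (strings of
    '0'/'1') have probability 0 and the rest are renormalized."""
    forbidden = set(forbidden)
    probs = []
    for bits in product("01", repeat=L):
        s = "".join(bits)
        if s in forbidden:
            continue
        k = s.count("1")  # number of ones -> i.i.d. probability of s
        probs.append(p_f ** k * (1 - p_f) ** (L - k))
    Z = sum(probs)  # renormalization constant
    return -sum(p / Z * math.log2(p / Z) for p in probs)

# Forbidding the three sequences from the question, with a fair coin the
# result is uniform over the 61 remaining sequences: log2(61) ~ 5.9307.
print(exact_entropy_with_restrictions(6, 0.5, {"000001", "001001", "010101"}))
```

For $p_f = 1/2$ all sequences are equiprobable, so forbidding $3$ of the $64$ leaves a uniform distribution over $61$ sequences with entropy $\log_2 61 \approx 5.93$ bits, slightly below the unrestricted $6$ bits.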