What is the entropy of a freebit?


The entropy of a binary register which has equal probabilities of the values is 1 bit. What is the entropy of a binary register whose probability distribution is unknown?

My calculation shows it should be 1/2 nat, that is $\frac1{2\ln2}$ bit.


There are 2 answers below.


If the bit is drawn from a distribution $X$, then the entropy of the bit is $H(X)$. The fact that you do not know what the probability distribution $X$ is does not change that. If you have a single bit and know nothing about the underlying distribution that produced it, then the most reasonable assumption is that the bit was drawn uniformly from the set of all values a single bit can take, $\{0,1\}$.

If you have access to $n$ bits drawn from the exact same distribution $X$, then $X$ can be identified with near certainty as $n \rightarrow \infty$. The $n$ bits can then be compressed to $nH(X)$ bits. If, on the other hand, the $n$ bits have been drawn from $n$ different distributions $X_1, X_2, ..., X_n$, then the entropy per bit is $\frac{1}{n}H(X_1,...,X_n)$ and the $n$-bit sequence can be compressed to $H(X_1,...,X_n)$ bits.
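To make the first case concrete, here is a minimal Python sketch: bits are drawn from a fixed Bernoulli source, the parameter is estimated from the samples, and the achievable compression rate $H(X)$ follows. The bias `p_true = 0.1` and the seed are arbitrary illustrative choices, not from the original discussion.

```python
import math
import random

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

random.seed(0)                         # reproducibility only
p_true = 0.1                           # hypothetical source bias (illustrative)
n = 100_000
bits = [1 if random.random() < p_true else 0 for _ in range(n)]

p_hat = sum(bits) / n                  # estimate of the Bernoulli parameter
# With n large, p_hat ≈ p_true, so the sequence compresses to about
# n * h(p_hat) ≈ n * h(0.1) ≈ 0.469 n bits.
print(p_hat, binary_entropy(p_hat))
```

With enough samples the empirical rate sits close to $H(X) = h(0.1) \approx 0.469$ bits per bit, far below the 1 bit per bit of the uniform case.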

If the encoder (i.e. the mapping used to map observed bits to codewords) must be defined before any of the bits are observed, then the minimum average compressed length achievable is $n \times \max \{ H(X_1), ..., H(X_n) \}$, where $X_1, X_2, ..., X_n$ are the $n$ possible equiprobable underlying distributions. The average compressed sequence length is determined by the worst-case distribution regardless of the true underlying distribution, even though the entropy of the true distribution might be $0$.

I would also like to point out that the assumption that the underlying probability distribution that produced the freebit is uniformly distributed over the set of all possible probability distributions is not one you can make with no other information. If you do not know what value a variable $k$ takes, the only reasonable assumption is that all possible values of $k$ are equally likely; this is not the same as saying that $k$ was drawn from an underlying distribution that was itself drawn uniformly from the set of all probability distributions.

In other words, if you are sampling $n$ bits (where $n \rightarrow \infty$) from an unknown distribution, there are $2^n$ different possible sequences you can get. A reasonable assumption would be that each of these sequences has an equal probability of occurring, rather than assuming that the underlying distribution is uniformly distributed over the set of all probability distributions.

The reason you generally can't make the latter assumption is that if the probability of the source outputting a "1" is $a$, then sampling it $n$ times with $n \rightarrow \infty$ gives a $(1 - \epsilon)$ (i.e. almost certain) chance that the resulting sequence has approximately $na$ ones; this is the law of large numbers. It is thus more appropriate to compute the probability that the underlying source outputs a "1" with probability $a$ as

(Probability that $p = a$) = (Number of $n$-bit strings with $na$ ones) $\times$ (Probability of each $n$-bit string)

(Probability that $p = a$) = (Number of $n$-bit strings with $na$ ones) $\times$ $\frac{1}{(\text{Number of $n$-bit strings})}$

\begin{equation} \text{Pr}( p = a) = \lim_{n \rightarrow \infty} \bigg({n \choose na} \times \frac{1}{2^n}\bigg) \end{equation}

For large $n$ this expression tends to $0$ for every individual value of $a$, but the distribution of the fraction of ones concentrates around $a = 0.5$: asymptotically, almost all $n$-bit strings have approximately equal numbers of ones and zeros.
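This concentration can be checked numerically: for a uniform $n$-bit string, the probability that the fraction of ones falls within a small window around $1/2$ tends to $1$, while the probability of any single exact count tends to $0$. A short sketch, where the window half-width $\epsilon = 0.05$ is an arbitrary choice:

```python
import math

def prob_near_half(n, eps=0.05):
    """P(|#ones/n - 1/2| <= eps) for a uniformly random n-bit string."""
    lo = math.floor(n * (0.5 - eps))
    hi = math.ceil(n * (0.5 + eps))
    return sum(math.comb(n, k) for k in range(lo, hi + 1)) / 2 ** n

for n in (10, 100, 1000):
    # interval probability grows toward 1 as n increases ...
    print(n, prob_near_half(n))

# ... while any single exact count, even the central one, shrinks to 0
print(math.comb(1000, 500) / 2 ** 1000)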


I don't think I agree.

We have here a random variable $X$ which follows a Bernoulli distribution with parameter $P$... which is also regarded as a random variable with its own distribution. This would correspond to a Bayesian formulation.

In this setting, we are indeed allowed to write $H(X | P=p) = h(p)$ where $h()$ is the binary entropy function.

Now, you are claiming (it seems) that the "total entropy" is obtained by averaging: $$H(X) \stackrel{?}{=} \int H(X | P=p) f_P(p) dp$$ where $f_P$ is the density of $P$. The natural question is... why?

Actually, if we regard $X, P$ as two random variables, the above average gives not the "total" (marginal) entropy, but the conditional entropy $H(X \mid P)$.

Then, your calculation only shows that $H(X|P)= \frac1{2\ln2}$ bits when $P$ is uniform on $[0,1]$.
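This value can be checked numerically: with $P$ uniform on $[0,1]$, a midpoint-rule approximation of $\int_0^1 h(p)\,dp$ reproduces $\frac{1}{2\ln 2} \approx 0.7213$ bits. A quick sketch (the number of grid points is an arbitrary choice):

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Midpoint-rule approximation of H(X|P) = integral of h(p) over [0,1]
N = 100_000
H_cond = sum(h((k + 0.5) / N) for k in range(N)) / N
print(H_cond, 1 / (2 * math.log(2)))  # both ≈ 0.7213 bits
```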

The correct way to compute $H(X)$ is to first compute the marginal $p(x) = \int p(x \mid P=p) f_P(p)\, dp$.

In particular, if $P$ is uniform on $[0,1]$, then $P(X=1)= \int_0^1p dp = \frac{1}{2}$ as expected. And hence $H(X) = 1$ bit.
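The marginal calculation can be verified the same way: integrating $p$ against the uniform density gives $P(X=1) = 1/2$, and hence $H(X) = h(1/2) = 1$ bit. A minimal sketch:

```python
import math

# With P uniform on [0,1], the marginal is P(X=1) = integral of p over [0,1] = 1/2
N = 100_000
p_x1 = sum((k + 0.5) / N for k in range(N)) / N  # midpoint rule
H_X = -p_x1 * math.log2(p_x1) - (1 - p_x1) * math.log2(1 - p_x1)
print(p_x1, H_X)  # ≈ 0.5 and 1.0 bit
```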

And, of course, $H(X) \ge H(X | P)$.