Entropy: Proving "information gain" formula: $h(x) = -\log p(x)$


We consider a discrete random variable X and want to know how much information we receive each time we observe a value of this random variable. We quantify this information as h(x), a monotonically decreasing function of the probability p(x) of the random variable X:

$$h(x) = f(p(x))~~~~~~~~(1)$$

so that observing a highly improbable value of X yields more information, while a nearly certain event yields less information.

Further, the information gained from two independent observations of the random variable X is additive. That is, if:

$$p(x_1,\,x_2)=p(x_1)p(x_2)~~~~~~~~(2)$$

then:

$$h(x_1,\,x_2) = h(x_1) + h(x_2)~~~~~~~(3)$$

Show that from (1), (2), and (3) we can infer that h(x) is defined as:

$$h(x) = - \log_2(p(x))$$

some resources:

http://en.wikipedia.org/wiki/Information_content

http://stat.yale.edu/~yw562/teaching/itlectures.pdf

The equation $h(x) = f(p(x))$ above means that the information we obtain from a specific value $x$ of the random variable $X$ is determined by its probability of occurrence $p(x)$, with the relationship given by some function $f(\cdot)$.


For two independent observations of the random variable X, the joint probability factorizes, and the additivity requirement (3) gives:

$$P(x,x) = P(x)P(x)$$ $$f(P(x)P(x)) = f(P(x)) + f(P(x))$$ $$h(x^2) = h(x) + h(x) = 2h(x)$$


Similarly, for three independent observations of the random variable X:

$$P(x,x,x) = P(x)P(x)P(x)$$ $$f(P(x)P(x)P(x)) = f(P(x)) + f(P(x)) + f(P(x))$$ $$h(x^3) = h(x) + h(x) + h(x) = 3h(x)$$


Thus, by induction, the general property for $N$ independent observations is:

$$h(x^N) = N h(x)$$


A well known property of the log function is:

$$\log(A^N) = N \log(A)$$

so we could guess that $f(\cdot)$ is a logarithmic function.

However, the requirement is that $f(\cdot)$ be monotonically decreasing, that is: a small probability yields high information, and a large probability yields low information.

If we look at the graph of our guess for $f(\cdot)$:

$$h(x) = log(p(x))$$

we see that it is a monotonically increasing function instead of monotonically decreasing.

However, not all is lost, because we can still modify the log function into a monotonically decreasing one:

$$h(x) = \log\bigg(\frac{1}{p(x)}\bigg)$$

Further, the property $h(x^N)=Nh(x)$ still holds:

$$h(x^N) = \log\bigg(\frac{1}{p^N(x)}\bigg) = \log\Bigg(\bigg(\frac{1}{p(x)}\bigg)^N\Bigg)$$

$$h(x^N) = N \log\bigg(\frac{1}{p(x)}\bigg) $$
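As a quick numerical sanity check (a sketch using only Python's standard `math` module; the function name `h` is just our notation from above), we can confirm that $\log(1/p)$ is monotonically decreasing in $p$ and additive over products of probabilities:

```python
import math

def h(p):
    # Candidate information measure: h = log(1/p), natural log here
    return math.log(1.0 / p)

# Monotonically decreasing: a rarer event carries more information
assert h(0.01) > h(0.5)

# Additive over independent events: h(p1 * p2) = h(p1) + h(p2)
p1, p2 = 0.2, 0.7
assert math.isclose(h(p1 * p2), h(p1) + h(p2))
```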

Finally, we can write our guess for $f(\cdot)$ more tidily as:

$$h(x) = \log\bigg(\frac{1}{p(x)}\bigg) = -\log p(x)$$

The base of the logarithm only changes the units of information; choosing base 2, as in the problem statement, measures the information in bits:

$$h(x) = -\log_2 p(x)$$
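To make the base-2 result concrete (a minimal sketch; `info_bits` is a hypothetical helper name), a fair coin flip carries 1 bit, one of 8 equally likely outcomes carries 3 bits, and a certain event carries none:

```python
import math

def info_bits(p):
    # Self-information in bits: h(x) = -log2 p(x)
    return -math.log2(p)

assert info_bits(0.5) == 1.0    # fair coin flip: 1 bit
assert info_bits(0.125) == 3.0  # one of 8 equally likely outcomes: 3 bits
assert info_bits(1.0) == 0.0    # a certain event carries no information
```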