Derivation of equation for self information

392 Views Asked by At

I am trying to understand how the formula I(x) = -log(p(x)) for self information was derived.

From what I have read, 2 constraints were imposed on the properties that we would like self-information to satisfy. These constraints are listed below:

I(x) < I(y) if p(x) > p(y)
I(x and y) = I(x) + I(y) if P(x and y) = p(x).p(y)

Following this we somehow find out that I(x) = -log(p(x)) satisfies the above requirements.

My exact questions are:

  • Why did we define the requirements for self-information as above?
  • How did we arrive at I(x) = -log(p(x))?
  • How do we know that I(x) = log(p(x)) uniquely satisfies the above requirements?

Reference: http://people.seas.harvard.edu/~jones/cscie129/nu_lectures/lecture2/info%20theory/Info_Theory_1.html#def

1

There are 1 best solutions below

0
On BEST ANSWER

Why did we define the requirements for self-information as above?

Just because they seem reasonable, they fit with the concept we are trying to formalize. The first one says that an event with low probability give us high information. The second one says that the information a pair of two independent events is the sum of the individual informations.

Let's state our "axioms"

  • Information of the event should depend only on its probability: $I(X)=g(p(X))$
  • $g()$ should be non-negative, continuous and decreasing: lower probability means higher information
  • Specifically $g(0)=+\infty$ and $g(1)=0$ : very improbable event has very high information; (almost) sure event has (almost) zero information.
  • If events $X,Y$ are independent, then $I(X,Y)=I(X)+I(Y)$

It's easy to see that $g(p)= - \log_b (p)$ (arbitrary base) satisfies those requirements. To prove uniqueness is a little more difficult. Sketch: Suppose we have $n$ events with probability $p$, then the above implies (by induction) $g(p^n)=n g(p)$. By considering that $g(p)=g([p^{1/m}]^m)=m g(p^{1/m})$ we deduce that $g(p^a)=a \, g(a)$ holds also for any rational $a$ - and, by continuity for any real $a$. This leads to the unique (up to the base choice) solution $g(p)= - \log_b (p)$