Doubt in Conditional Probability


I'm studying information theory from the book Information Theory, Coding and Cryptography by Ranjan Bose. I got confused at one point where they derive equation (II).

Consider a discrete random variable $X$ with the possible outcomes $x_i$, i = 1, 2, ..., n. The self information of the event $X = x_i$ is defined as $$I(x_i) = \log (\frac{1}{P(x_i)}) \tag{I}$$

Now consider two discrete random variables X and Y with possible outcomes $x_i$, i = 1, 2, ..., n and $y_j$, j = 1, 2, ..., m. Suppose we observe some outcome $Y = y_j$ and we want to determine the amount of information this event provides about the event $X = x_i$, i = 1, 2, ..., n. i.e. we want to mathematically represent the mutual information.

The Mutual Information $I(x_i;y_j)$ between $x_i$ and $y_j$ is defined as

$$I(x_i; y_j) = \log(\frac{P(x_i|y_j)}{P(x_i)}) \tag{II}$$

I don't see how they arrived at equation (II). I think it should be

$$I(x_i; y_j) = \log(\frac{1}{P(y_j) \times P(x_i|y_j)})$$ $$= \log(\frac{1}{P(x_i) \times P(y_j|x_i)})$$

But neither of the above two expressions equals equation (II). Please explain how they arrived at equation (II).

Reference : Link (Page 7)


Accepted answer:

I don't know information theory, but your source is very clear. $$I(x_i;y_j) = \log\left[\frac{P(x_i|y_j)}{P(x_i)}\right]$$ is the definition of mutual information between $x_i$ and $y_j$.

Moreover, notice that $$\frac{1}{P(y_j) P(x_i|y_j)} = \frac{1}{P(x_i,y_j)}=\frac{1}{P(x_i) P(y_j|x_i)}$$ while $$\frac{P(x_i|y_j)}{P(x_i)} =\frac{P(x_i,y_j)}{P(x_i)P(y_j)}.$$ In general $$\frac{1}{P(y_j) P(x_i|y_j)}\neq\frac{P(x_i|y_j)}{P(x_i)},$$ so $$I(x_i;y_j)\neq \log\left[\frac{1}{P(y_j) P(x_i|y_j)}\right]$$ and likewise $$I(x_i;y_j)\neq \log\left[\frac{1}{P(x_i) P(y_j|x_i)}\right],$$ contrary to what you claim.
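The inequality above is easy to check numerically. Here is a small sketch using a made-up joint distribution over two binary variables (the probability values are purely illustrative, not from the book):

```python
import math

# Hypothetical joint distribution over X in {0,1}, Y in {0,1}
P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
P_x = {x: sum(p for (xi, _), p in P_xy.items() if xi == x) for x in (0, 1)}
P_y = {y: sum(p for (_, yj), p in P_xy.items() if yj == y) for y in (0, 1)}

x, y = 0, 0
P_x_given_y = P_xy[(x, y)] / P_y[y]
P_y_given_x = P_xy[(x, y)] / P_x[x]

# Both chain-rule factorizations recover the joint probability:
assert math.isclose(P_y[y] * P_x_given_y, P_xy[(x, y)])
assert math.isclose(P_x[x] * P_y_given_x, P_xy[(x, y)])

# Pointwise mutual information, equation (II):
pmi = math.log2(P_x_given_y / P_x[x])
# Self-information of the joint outcome, the expression in the question:
joint_self_info = math.log2(1 / P_xy[(x, y)])

print(pmi)              # ~0.415 bits
print(joint_self_info)  # ~1.322 bits -- a different quantity
```

For this distribution the two quantities differ, confirming that equation (II) is not the self-information of the joint event.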

Another answer:

Equation (II) is the definition of a different concept, (pointwise) mutual information. It's not the self-information (I) of a joint outcome.

The difference between the two notations is subtle: the semicolon is part of the mutual-information notation in (II), whereas a comma separates the outcomes of a joint event when you want to apply the self-information definition (I).

$$I(X=x;Y=y) = \log\frac{P(X=x\mid Y=y)}{P(X=x)}$$

$$I(X=x,Y=y) = \log\frac{1}{P(X=x, Y=y)}$$

They are different.