I'm studying information theory from the book *Information Theory, Coding and Cryptography* by Rajan Bose. I got confused at one point where they derive equation (II).
Consider a discrete random variable $X$ with the possible outcomes $x_i$, $i = 1, 2, \ldots, n$. The self-information of the event $X = x_i$ is defined as $$I(x_i) = \log \left(\frac{1}{P(x_i)}\right) \tag{I}$$ Now consider two discrete random variables $X$ and $Y$ with possible outcomes $x_i$, $i = 1, 2, \ldots, n$ and $y_j$, $j = 1, 2, \ldots, m$. Suppose we observe some outcome $Y = y_j$ and we want to determine the amount of information this event provides about the event $X = x_i$, $i = 1, 2, \ldots, n$, i.e. we want to represent the mutual information mathematically.
The mutual information $I(x_i;y_j)$ between $x_i$ and $y_j$ is defined as $$I(x_i; y_j) = \log\left(\frac{P(x_i|y_j)}{P(x_i)}\right) \tag{II}$$
I don't see how they arrived at equation (II). I think it should be
$$I(x_i; y_j) = \log(\frac{1}{P(y_j) \times P(x_i|y_j)})$$ $$= \log(\frac{1}{P(x_i) \times P(y_j|x_i)})$$
But neither of the above two expressions equals equation (II). Please explain how they arrived at equation (II).
Reference : Link (Page 7)
I don't know information theory, but your source is very clear: $$I(x_i;y_j) = \log\left[\frac{P(x_i|y_j)}{P(x_i)}\right]$$ is the *definition* of the mutual information between $x_i$ and $y_j$. One way to motivate it from (I): observing $Y = y_j$ changes the self-information of $x_i$ from $\log\left(\frac{1}{P(x_i)}\right)$ to $\log\left(\frac{1}{P(x_i|y_j)}\right)$, and the information gained is the difference, $$I(x_i;y_j) = \log\frac{1}{P(x_i)} - \log\frac{1}{P(x_i|y_j)} = \log\frac{P(x_i|y_j)}{P(x_i)}.$$
Moreover, notice that $$\frac{1}{P(y_j) P(x_i|y_j)} = \frac{1}{P(x_i,y_j)}=\frac{1}{P(x_i) P(y_j|x_i)}$$ while $$\frac{P(x_i|y_j)}{P(x_i)} =\frac{P(x_i,y_j)}{P(x_i)P(y_j)}.$$ In general $$\frac{1}{P(y_j) P(x_i|y_j)}\neq\frac{P(x_i|y_j)}{P(x_i)},$$ and so $$I(x_i;y_j)\neq \log\left[\frac{1}{P(y_j) P(x_i|y_j)}\right]$$ and $$I(x_i;y_j)\neq \log\left[\frac{1}{P(x_i) P(y_j|x_i)}\right],$$ contrary to what you propose. In fact, your expression equals $\log\left[\frac{1}{P(x_i,y_j)}\right]$, the self-information of the *joint* event $(X = x_i, Y = y_j)$, which is a different quantity from the mutual information.
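A quick numerical sanity check makes the difference concrete. The $2\times 2$ joint distribution below is made up for illustration (it is not from the book); the script evaluates definition (II) and your proposed expression for one outcome pair and confirms they disagree, while (II) is symmetric in $x_i$ and $y_j$ as expected:

```python
# Sanity check: textbook definition (II) vs. the proposed expression,
# on a small made-up joint distribution P(X, Y).
import math

# Hypothetical joint probabilities P(x_i, y_j) for a 2x2 example.
P_xy = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
        ("x2", "y1"): 0.2, ("x2", "y2"): 0.3}
P_x = {"x1": 0.5, "x2": 0.5}   # marginal of X
P_y = {"y1": 0.6, "y2": 0.4}   # marginal of Y

xi, yj = "x1", "y1"
p_joint = P_xy[(xi, yj)]
p_x_given_y = p_joint / P_y[yj]   # P(x_i | y_j)
p_y_given_x = p_joint / P_x[xi]   # P(y_j | x_i)

# Definition (II), computed both ways -- it is symmetric in x_i and y_j:
# log[P(x|y)/P(x)] = log[P(x,y)/(P(x)P(y))] = log[P(y|x)/P(y)]
I_def = math.log2(p_x_given_y / P_x[xi])
assert math.isclose(I_def, math.log2(p_y_given_x / P_y[yj]))

# The proposed expression reduces to log[1/P(x_i, y_j)], the
# self-information of the joint event, not the mutual information.
I_proposed = math.log2(1 / (P_y[yj] * p_x_given_y))
assert math.isclose(I_proposed, math.log2(1 / p_joint))
assert not math.isclose(I_def, I_proposed)

print(round(I_def, 4), round(I_proposed, 4))
```

Here $I(x_1;y_1) = \log_2\frac{2/3}{1/2} \approx 0.415$ bits, while $\log_2\frac{1}{P(x_1,y_1)} = \log_2 2.5 \approx 1.322$ bits, so the two quantities clearly differ.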