I've only barely started to learn about Entropy and Information Theory as a part of a course I'm taking in Systems Theory / Cybernetics.
The thing is, I'm terrible at math!
Say I have a joint probability matrix as follows:
      y1   y2
    ---------
x1 | 1/2  1/4
x2 |   0  1/4
In other words:
- the probability of x1 and y1 occurring is 0.5
- the probability of x1 and y2 occurring is 0.25
- the probability of x2 and y1 occurring is 0
- the probability of x2 and y2 occurring is 0.25
Apparently, this corresponds to the following conditional probability matrix (Y conditioned by X):
      y1   y2
    ---------
x1 | 2/3  1/3
x2 |   0    1
In other words:
- If we know x1 will occur, then there is a 2/3 chance y1 will occur and a 1/3 chance y2 will occur.
- If we know x2 will occur, then there is a 100% chance that y2 will occur.
So, I have 2 questions.
- Are my interpretations correct?
- How was the second matrix derived from the first? I'm sure the formula is embarrassingly simple as all the material I'm reading seems to assume the conversion function is self-evident.
In other words, for each row, I need to maintain the ratio of each value to each other, but scale all the values so that they add to 1. (I think)
Your interpretations are correct. To obtain the second matrix, you apply the definition of conditional probability: $P(A|B)=P(A,B)/P(B)$.
For example:
$$ P(Y_1 | X_1) = \frac{P(X_1,Y_1)}{P(X_1)}$$
The numerator (joint probability) is given by the first matrix. The denominator (marginal probability) is obtained by summing the joint probability over all the values of the "other" variable (marginalization, or law of total probability).
$$P(X_1) = \sum_j P(X_1,Y_j)=\frac{1}{2}+\frac{1}{4}=\frac{3}{4}$$
Hence
$$ P(Y_1 | X_1) = \frac{1/2}{3/4}=\frac{2}{3}$$
What you say in your last paragraph is also correct (more than that, insightful): computing the conditional probabilities amounts to taking each row of the joint matrix and normalizing it so it sums to one, and dividing by the marginal is precisely that normalization.
Notice, by the way, that in the first matrix (joint), all entries must sum up to one. Instead, in the second (conditional), each row must sum up to one.
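If it helps to see the row-normalization concretely, here is a minimal Python sketch of the computation above, using exact fractions (the variable names are mine, not standard notation):

```python
from fractions import Fraction

# Joint probability matrix P(X, Y): rows are x1, x2; columns are y1, y2.
joint = [[Fraction(1, 2), Fraction(1, 4)],
         [Fraction(0),    Fraction(1, 4)]]

# Marginal P(X): sum each row over all values of Y (law of total probability).
marginals = [sum(row) for row in joint]  # [3/4, 1/4]

# Conditional P(Y | X): divide each entry by its row's marginal,
# i.e. normalize each row so it sums to 1.
conditional = [[p / m for p in row] for row, m in zip(joint, marginals)]

print(conditional)  # rows: [2/3, 1/3] and [0, 1]
```

Note that the entries of `joint` sum to one over the whole matrix, while each row of `conditional` sums to one on its own, matching the remark above.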