'Knowing' and 'Learning' a Random Variable


When using basic ideas from probability to think about information (e.g. entropy etc.), some commonly used jargon includes phrases such as:

  • Learning a discrete random variable (r.v.) $Y$
  • Knowing a discrete r.v. $X$

What does 'knowing'/'learning' really mean here?

If we 'know' $X$, do we effectively have a record of the entire probability distribution $P_{X}$? If we are 'learning' $Y$, are we sampling the random variable as many times as possible in order to construct some approximation of $P_{Y}$?

Example:

The entropy $H$ of a conditional distribution $P_{Y|X=x}$ is $H(Y|X=x)=-\sum_{y}p_{y|x}\log(p_{y|x})$. The conditional entropy $H(Y|X)$ is then the expected entropy of $P_{Y|X=x}$ under $P_{X}$, i.e. $H(Y|X)=\sum_{x}p_{x}H(Y|X=x)$. This is supposedly "how surprised we are to learn $Y$, given that we know $X$ already".
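These two formulas are easy to check numerically. Here is a minimal Python sketch; the joint distribution `p_xy` is a toy example of my own choosing, not anything from the notes:

```python
import math

# Toy joint distribution P(X, Y); the numbers are made up for illustration.
p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.2, (1, 1): 0.3}

def h(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

xs = {x for x, _ in p_xy}
ys = {y for _, y in p_xy}

# Marginal P(X)
p_x = {x: sum(p_xy[(x, y)] for y in ys) for x in xs}

# H(Y|X) = sum_x p_x * H(Y | X = x)
h_y_given_x = sum(p_x[x] * h([p_xy[(x, y)] / p_x[x] for y in ys])
                  for x in xs)
print(round(h_y_given_x, 4))         # 0.8464
```

Note that this comes out smaller than the marginal entropy $H(Y)$, as conditioning on $X$ should never increase our uncertainty about $Y$.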

For example, see slide 47 of these notes.

Best answer:

Based on the slides to which you linked, it appears that the words "know" and "learn" are being used pretty informally here. They have not been given any precise mathematical meaning in the notes (as far as I can tell), and are not being used to make any kind of rigorous argument. Instead, the words are being used to help build some intuition.

If you look at slide 31, where the definition of entropy is motivated, you will see that entropy is meant to quantify how much uncertainty we have about a random variable. The more uncertain we are about it (and, therefore, the less we know about it), the more entropy that variable possesses.
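To make that intuition concrete, here is a small sketch (the distributions are invented purely for illustration): a uniform distribution maximizes entropy, while a sharply peaked one has very little.

```python
import math

def h(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Maximum uncertainty: a fair four-sided die needs 2 bits to describe.
print(h([0.25] * 4))                 # 2.0

# A heavily biased distribution carries much less uncertainty.
print(h([0.97, 0.01, 0.01, 0.01]))

# A certain outcome has zero entropy: nothing is left to learn.
print(h([1.0]) == 0)                 # True
```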

On the other hand, if a variable depends on a second variable (or both variables depend on a third, hidden variable), then observing the value of one variable can give us information about the second variable. That is, if we observe $X$ and determine that it has a particular value, then this might give us information about the possible values of $Y$. By "knowing" something about $X$, we have "learned" something about $Y$.

For example, suppose that I deal each of us a card from a standard deck. Let $X$ be the color of my card, and $Y$ the color of your card. Before looking at either card, $$ P(X = \text{red}) = P(Y = \text{red}) = P(X = \text{black}) = P(Y = \text{black}) = \frac{1}{2}. $$ From this, entropy can be computed.

Now, suppose that I look at my card and determine that it is black. Then $$ P(X = \text{red}) = 0, \qquad\text{and}\qquad P(X = \text{black}) = 1, $$ which means that I know that my card is black. What does that tell me about your card? That is, by knowing that my card is black, what have I learned about your card? Since there are 52 cards in a deck, and half of them are red, I now know that $$ P (Y = \text{red} \mid X = \text{black} ) = \frac{26}{51} \qquad\text{and}\qquad P (Y = \text{black} \mid X = \text{black} ) = \frac{25}{51}. $$

By knowing something about my card, I have learned something about your card (i.e., it is slightly more likely to be red than black). Conditional entropy is meant to quantify this idea. Again, an actual computation will bear this out.
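That "actual computation" is short enough to sketch in Python (the helper `h` is my own shorthand for the entropy formula):

```python
import math

def h(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Before any observation: your card is red or black with probability 1/2 each.
h_y = h([0.5, 0.5])                      # exactly 1 bit

# After I see that my card is black: 26 red and 25 black cards remain.
h_y_given_black = h([26 / 51, 25 / 51])

print(h_y, h_y_given_black)
```

The second value comes out just under 1 bit: knowing my card's color removes a tiny amount of uncertainty about yours.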

Long story short: "Learn" and "know" are being used informally in the exposition. They are meant to help you build some intuition, and are not rigorously defined mathematical terms (at least, not in the cited context).