Does X|Y = X formally, in the sense of RVs?


In Cover and Thomas' "Elements of Information Theory", the joint entropy $H(X,Y)$ is defined, but the authors remark that this definition is nothing new if we consider $(X,Y)$ as a single vector-valued random variable. The book then goes on to define conditional entropy separately, but the earlier remark got me wondering whether this too could just be the entropy of a random variable $X|Y$.

But is $X|Y$ really a random variable? Much of the time when we talk about random variables, we use them to state facts about the probability distributions associated with them, which is interesting because (correct me if I am wrong) formally they are just functions from $\Omega$ to $\mathbb{R}^n$ and do not by themselves carry information about the distribution. The entropy of a random variable is really the entropy of a PMF that we associate with that random variable in our heads.
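To make this concrete, here is a small sketch in Python (the sample space and functions are hypothetical, chosen only for illustration) showing that entropy depends only on the induced PMF, not on the particular function $\Omega \to \mathbb{R}$:

```python
import math

def entropy(pmf):
    """Shannon entropy (in bits) of a PMF given as a dict value -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def pmf_of(f, omega):
    """Pushforward of the uniform measure on omega under the function f."""
    pmf = {}
    for w in omega:
        pmf[f(w)] = pmf.get(f(w), 0) + 1 / len(omega)
    return pmf

# Two DIFFERENT functions on the same sample space Omega = {0,1,2,3}
# (uniform measure) induce the same PMF, hence the same entropy.
omega = [0, 1, 2, 3]
X = lambda w: w % 2          # values 0,1,0,1
Z = lambda w: 1 if w >= 2 else 0   # values 0,0,1,1 -- a different function

print(entropy(pmf_of(X, omega)))  # 1.0
print(entropy(pmf_of(Z, omega)))  # 1.0 -- same entropy, different function
```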

That leads me to believe that $X|Y = X$ formally, in the sense of functions, and that the only distinction is that we "change the PMF of $X$". Am I making any sense, or is this interpretation wrong?

If I am correct, an additional question would be, why is it common practice to stretch the definitions in this way and talk about RVs so freely when the actual object of interest is the distribution?


There are 2 best solutions below

BEST ANSWER

$X|Y$ is not a RV, but $E(X|Y)$ is!

And conditional entropy is defined as the average entropy of the conditional distributions: $H(X|Y) = \sum_y p(y)\, H(X \mid Y = y)$, which by the chain rule equals $H(X,Y) - H(Y)$. For example, if $X,Y$ are independent, then each conditional distribution of $X$ given $Y = y$ is just the distribution of $X$, and hence $H(X|Y) = H(X)$.
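As a sanity check, the chain-rule identity $H(X|Y) = H(X,Y) - H(Y)$ can be verified numerically; this is a sketch in Python with a hypothetical joint PMF of two independent fair bits:

```python
import math

def H(pmf):
    """Shannon entropy (in bits) of a PMF given as a dict value -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Hypothetical joint PMF p(x, y) of two independent fair bits.
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Marginal PMFs obtained by summing the joint over the other variable.
pX = {x: sum(p for (x2, _), p in joint.items() if x2 == x) for x in (0, 1)}
pY = {y: sum(p for (_, y2), p in joint.items() if y2 == y) for y in (0, 1)}

H_cond = H(joint) - H(pY)   # chain rule: H(X|Y) = H(X,Y) - H(Y)
print(H_cond, H(pX))        # equal, since X and Y are independent
```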

"formally they are just functions from $\Omega$ to $\mathbb{R}^n$ and they don't carry information about the distribution" is not completely correct, because $\Omega$ has to be endowed with a probability measure, and the distribution of the RV is the pushforward of that measure. In fact, the exact $\Omega$ does not really matter, as the same distribution can be realized by many different RVs on many different sample spaces.

What $E(X|Y)$ does is reduce the "granularity" of subsets of $\Omega$. For example, two outcomes $\omega_1$ and $\omega_2$ might lead to different values of $X$, but to the same value of $E(X|Y)$.
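A tiny sketch of this coarsening, on a hypothetical four-point sample space: $X$ distinguishes all four outcomes, while $E(X|Y)$ is constant on each block where $Y$ is constant:

```python
from fractions import Fraction

# Hypothetical sample space Omega = {0,1,2,3} with the uniform measure.
omega = [0, 1, 2, 3]
X = {0: 1, 1: 2, 2: 3, 3: 4}   # X separates every outcome
Y = {0: 0, 1: 0, 2: 1, 3: 1}   # Y only sees the blocks {0,1} and {2,3}

def cond_exp(X, Y, omega):
    """E(X|Y) as a function on omega: average X over each block
    on which Y is constant (uniform measure assumed)."""
    out = {}
    for w in omega:
        block = [v for v in omega if Y[v] == Y[w]]
        out[w] = Fraction(sum(X[v] for v in block), len(block))
    return out

# E(X|Y) equals 3/2 on the block {0,1} and 7/2 on the block {2,3}:
# omega_1 = 0 and omega_2 = 1 give different X-values but the same
# E(X|Y)-value, so E(X|Y) is a coarser random variable than X.
print(cond_exp(X, Y, omega))
```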

ANSWER

I have never gotten a clear answer to this question. My intuition is the same as yours, and it appears to match the convention used in "hierarchical models" in statistics, where

$$ x \sim (X|Y)$$

is shorthand for:

$$ y \sim Y \\ x \sim (X | Y = y)$$

The latter shorthand would appear to indicate that there is a tripleable adjunction induced by "sampling", so that the conditioning operator $|$ is a monadic functor on random variables.

But I think that doesn't work out in the case of real random variables, because of measure theory.
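For what it's worth, the two-stage shorthand above is easy to demonstrate for discrete variables; this is a sketch with a hypothetical Bernoulli model (the parameters 0.5, 0.2, and 0.8 are made up for illustration):

```python
import random

# Hypothetical hierarchical model: Y ~ Bernoulli(0.5),
# and X | Y = y ~ Bernoulli(0.8 if y else 0.2).
def sample_Y():
    return random.random() < 0.5

def sample_X_given(y):
    # Sampling from the conditional distribution requires a concrete
    # value y of Y -- there is no single "distribution of X|Y" to draw
    # from until Y has been realized.
    return random.random() < (0.8 if y else 0.2)

y = sample_Y()           # y ~ Y
x = sample_X_given(y)    # x ~ (X | Y = y)
print(y, x)
```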