Why do we like sticking random variables into their own distributions?

Let $X$ be a random variable taking values in the set $S$, with distribution (mass or density function) $f(s)$. Often in statistics we are interested in the real-valued random variable $f(X)$. Here are some examples:

  • The value ${\bf E}(\log \frac{1}{f(X)})$ is called the entropy and is very important in information theory.
  • If we have a family of distributions $f(s,\theta)$ where $\theta \in \Theta$ is a parameter, then ${\bf E}[(d_{\theta} \log f(X,\theta))^2]$ is called the Fisher information metric, which shows up in the Cramér-Rao bound and other places.
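
To make the two examples above concrete, here is a minimal Python sketch. It assumes a finite outcome set for the entropy and, for the Fisher information, a Bernoulli family $f(x,\theta) = \theta^x (1-\theta)^{1-x}$; both are illustrative choices, not part of the question. The function names are hypothetical. In each case the quantity is computed literally as an expectation of a function of $f(X)$.

```python
import math
import random


def entropy_exact(p):
    """E[log(1/f(X))] for a finite pmf p, computed as a weighted sum."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)


def entropy_monte_carlo(p, n_samples=200_000, seed=0):
    """Estimate E[log(1/f(X))] by sampling X and averaging log(1/f(X))."""
    rng = random.Random(seed)
    draws = rng.choices(range(len(p)), weights=p, k=n_samples)
    return sum(math.log(1.0 / p[x]) for x in draws) / n_samples


def fisher_information_bernoulli(theta):
    """E[(d/dtheta log f(X, theta))^2] for the assumed Bernoulli family."""
    # Score function: d/dtheta log f(x, theta) = x/theta - (1 - x)/(1 - theta)
    score = {1: 1.0 / theta, 0: -1.0 / (1.0 - theta)}
    prob = {1: theta, 0: 1.0 - theta}
    return sum(prob[x] * score[x] ** 2 for x in (0, 1))


p = [0.5, 0.25, 0.125, 0.125]
exact = entropy_exact(p)            # equals 1.75 * log(2) in nats
estimate = entropy_monte_carlo(p)   # close to `exact`, up to sampling error
info = fisher_information_bernoulli(0.25)  # equals 1 / (0.25 * 0.75)
```

The Monte Carlo version is the closest to the formula as written: draw $X$, plug it into its own distribution, and average.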

These examples are enough to convince me that $f(X)$ is extremely fundamental, but without foresight, $f(X)$ seems like a strange object to consider.

Is there any way to motivate $f(X)$, where $f$ is the distribution of $X$, without saying things like "noisy channel coding theorem" or "Cramér-Rao bound"?

There is 1 solution below

OK, based on the links in Bey's comments, it seems that there is not an obvious answer to this question.

I have a few ideas which I want to record.

First, let's consider the case of a random variable $X \in S$ where $S$ is a finite set. Let $f : S \to \mathbb{R}$ be the distribution. If we write $S = \{ s_1,\dots,s_n \}$ then we can write $f = (p_1,\dots,p_n)$. The distribution of $f(X)$ is $$ \sum_{i=1}^n p_i \delta_{p_i} $$ Notice that this distribution does not depend on how we label the elements of $S$, but it remembers all the probabilities. You can think of $X \rightsquigarrow f(X)$ as forgetting the way we "coordinatized" $S$, but remembering which probabilities occurred.
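
The relabeling invariance can be checked directly. The following is a small sketch, assuming the pmf is given as a dictionary from outcomes to exact `Fraction` probabilities; the function name `law_of_f_of_X` and the example labels are hypothetical.

```python
from fractions import Fraction


def law_of_f_of_X(pmf):
    """Return the law of f(X) as a dict {value v: P(f(X) = v)}.

    Each outcome s contributes mass f(s) at the point f(s), so this is
    the measure sum_i p_i * delta_{p_i}, with equal p_i merged into one atom.
    """
    law = {}
    for p in pmf.values():
        law[p] = law.get(p, Fraction(0)) + p
    return law


pmf = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
# Same probabilities attached to different labels, in a different order.
relabeled = {"x": Fraction(1, 4), "y": Fraction(1, 2), "z": Fraction(1, 4)}

law = law_of_f_of_X(pmf)
same = law == law_of_f_of_X(relabeled)
```

Here the two outcomes with probability $1/4$ merge into a single atom of mass $1/2$ at the point $1/4$. The multiplicity is still recoverable, since the mass at a point $v$ divided by $v$ counts the outcomes with probability $v$.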

For continuous random variables things are more subtle, but you can still say the following. Let $M^d$ be a manifold and $X \in M$ a continuous random variable. The distribution of $X$ is some $d$-form $\eta \in \Gamma(\wedge^d T^*M)$ and $\eta(X)$ is a random variable taking values in the line bundle $\wedge^d T^*M$. The density function of $\eta(X)$ is the $d$-form $\eta$ supported on the subset $\eta(M) \subseteq \wedge^d T^* M$. Said another way, the distribution of $\eta(X)$ is the pushforward of $\eta$ along itself.