Does a random variable come from a probability distribution or is it vice-versa?


I am curious to know which statement is correct:

  1. a random variable comes from a probability distribution, OR
  2. a probability distribution is created by observing the behavior of a random variable.

Two solutions are given below.


Any random variable is a real-valued measurable function on a given sample space, equipped with a $\sigma$-algebra and a probability measure. Pushing the measure forward through this function induces a distribution on $\mathbb{R}$; for example, it induces a probability mass function when the image is a discrete set.
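A toy sketch of this pushforward (the two-coin-flip example here is my own illustration, not part of the answer): take $\Omega$ to be the four outcomes of two fair coin flips with the uniform measure, let $X$ count heads, and sum the measure over each preimage to get the induced pmf on $\{0, 1, 2\}$.

```python
from fractions import Fraction
from collections import defaultdict

# Omega = outcomes of two fair coin flips, each with probability 1/4.
omega_space = ["HH", "HT", "TH", "TT"]
P = Fraction(1, 4)

def X(omega):
    """The random variable: number of heads in the outcome."""
    return omega.count("H")

# Pushforward of P under X: sum the measure of each preimage X^{-1}({k}).
pmf = defaultdict(Fraction)
for omega in omega_space:
    pmf[X(omega)] += P

print(dict(pmf))  # {2: 1/4, 1: 1/2, 0: 1/4}
```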


Interesting question! The answer is that it can go both ways.

For any probability distribution $F$, i.e. a nondecreasing cadlag (right continuous with left limits) function $F: (-\infty, \infty) \to [0,1]$ satisfying

\begin{equation} \lim_{x \to -\infty} F(x) = 0, \lim_{x \to \infty} F(x) = 1, \end{equation}

there exists a random variable $X$ with $\mathbb{P}(X \leq x) = F(x)$. Specifically, this means there exists a probability space $(\Omega, \mathcal{A}, \mathbb{P})$, and a measurable function $X: \Omega \to \mathbb{R}$ with $\mathbb{P}(\{\omega: X(\omega) \leq x\}) = F(x)$ for every $x \in \mathbb{R}$. Depending on the distribution function $F$, there may be many possible probability spaces that work, but usually $\Omega$ just sits in the background, and we don't think much about it. (It turns out that the space $\Omega = [0,1]$ with the Borel sigma-field and Lebesgue measure always works, so you can assume that's where $X$ lives if you like.)
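The $\Omega = [0,1]$ construction can be made concrete with inverse-transform sampling: define $X(\omega) = F^{-1}(\omega)$, so that under Lebesgue measure $\mathbb{P}(X \leq x) = F(x)$. A numerical sketch (using the Exp(1) distribution as an arbitrary example, with $F(x) = 1 - e^{-x}$ and $F^{-1}(u) = -\log(1-u)$):

```python
import numpy as np

# Omega = [0, 1] with (approximate) Lebesgue measure, simulated by
# uniform draws; X(omega) = F^{-1}(omega) for the Exp(1) CDF.
rng = np.random.default_rng(0)
omega = rng.uniform(size=100_000)   # points of Omega = [0, 1]
X = -np.log(1.0 - omega)            # X(omega) = F^{-1}(omega)

# Check that P(X <= x) matches F(x) = 1 - exp(-x) at a few points.
for x in [0.5, 1.0, 2.0]:
    empirical = np.mean(X <= x)
    theoretical = 1.0 - np.exp(-x)
    print(f"x={x}: empirical={empirical:.3f}, F(x)={theoretical:.3f}")
```

For a distribution with flat pieces or jumps, the same idea works with the generalized inverse $F^{-1}(u) = \inf\{x : F(x) \geq u\}$.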

Conversely, if $X$ is a measurable map from a probability space $\Omega$ into $\mathbb{R}$, then the function $F_X: \mathbb{R} \to [0,1]$ defined by $F_X(x) = \mathbb{P}(\{\omega: X(\omega) \leq x\})$ is called the distribution function of $X$. One can use the axioms of a probability space to show that $F_X$ satisfies all the properties mentioned above.
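A quick numerical check of this converse direction (the six-sided die is my own choice of example): take $X$ to be a fair die roll, tabulate $F_X(x) = \mathbb{P}(X \leq x)$, and verify that the resulting function is nondecreasing with the right limits.

```python
import numpy as np

# X is a fair six-sided die: uniform measure over the six outcomes.
faces = np.array([1, 2, 3, 4, 5, 6])

def F_X(x):
    """Distribution function F_X(x) = P(X <= x)."""
    return np.mean(faces <= x)

# Evaluate F_X on a grid covering the support and beyond.
xs = np.linspace(-1, 8, 200)
values = np.array([F_X(x) for x in xs])

assert np.all(np.diff(values) >= 0)            # nondecreasing
assert values[0] == 0.0 and values[-1] == 1.0  # limits 0 and 1
print("F_X(3) =", F_X(3))  # 0.5
```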

Philosophically, what is happening? Roughly speaking, a statistician would like the first definition. The idea is: we observe data from some process, and form an empirical distribution from it. We assume there is a random variable $X$ generating the data, and we can try to understand how the empirical distribution relates to the true distribution of $X$.
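The statistician's workflow can be sketched as follows (the standard normal here is an arbitrary stand-in for the unknown data-generating process): form the empirical distribution $F_n(x) = \frac{1}{n}\#\{i : X_i \leq x\}$ from observed data and compare it to the true distribution of $X$.

```python
import numpy as np
from math import erf, sqrt

# "Observed data": n draws from an unknown process (here, standard normal).
rng = np.random.default_rng(1)
data = rng.normal(size=5000)

def empirical_cdf(data, x):
    """F_n(x) = fraction of observations at most x."""
    return np.mean(data <= x)

def normal_cdf(x):
    """True standard normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# The empirical distribution tracks the true one (Glivenko-Cantelli).
for x in [-1.0, 0.0, 1.0]:
    print(f"x={x}: F_n(x)={empirical_cdf(data, x):.3f}, "
          f"F(x)={normal_cdf(x):.3f}")
```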

For a mathematician, the other order is more typical. We define processes or variables abstractly using random variables on probability spaces, and prove theorems about them, or about the corresponding distribution functions.