When discussing various ways to model something probabilistically, many authors prefer to work with random variables rather than with probability distributions. Of course, this difference is more a matter of point of view than of actual mathematical substance - yet I am very much interested in why the random-variable point of view is preferred. Let me elaborate on this below.
It seems to me that this comes from not being fully explicit and formal when building a model - for if you were, you would see that using random variables is actually rather artificial, and that working with the probability distribution directly is much more natural.
Consider the following problem:
Suppose we have a vector $x\in\mathbb{R}^{p}$ that we interpret as
the visible attributes of an individual. For example, $x$ might represent
a loan applicant's age, gender, race, and credit history.
We consider the problem of modeling whether we should give a person
represented by $x$ a loan; let $y\in\{0,1\}$ represent the target
of this prediction, i.e. whether an individual will have defaulted
on a loan he received ($y=0$) or repaid it according to his contract
($y=1$).
To formalize this problem, we can define random variables $X$ and $Y$
that take on values $X=x$ and $Y=y$ for an individual drawn randomly
from the population of interest.
We define the true risk
\begin{equation}
r(x)=Pr(Y=1|X=x)\ \ (1).
\end{equation}
Then the problem is how to estimate this risk from data, yadda, yadda.
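As a concrete (if crude) illustration of the estimation task, here is a minimal Python sketch. The synthetic population, the binary attributes, and the ground-truth risk function are all illustrative assumptions, not part of the problem statement above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic population: p = 2 binary attributes per individual.
# The "true" risk used to generate the labels is an arbitrary assumption.
p = 2
n = 100_000
X = rng.integers(0, 2, size=(n, p))            # visible attributes x
true_r = 0.2 + 0.5 * X[:, 0] * 0.8 ** X[:, 1]  # illustrative ground-truth r(x)
Y = (rng.random(n) < true_r).astype(int)       # y = 1 means the loan was repaid

def estimate_risk(x, X, Y):
    """Empirical estimate of r(x) = Pr(Y = 1 | X = x):
    the fraction of y = 1 among samples whose attributes equal x."""
    mask = np.all(X == np.asarray(x), axis=1)
    return Y[mask].mean()

print(estimate_risk((1, 0), X, Y))  # close to the true value 0.7
```

With discrete attributes this naive matched-sample frequency works; for continuous $x$ one would need smoothing or a parametric model, which is exactly where the real modeling question begins.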
The issue I mention above is related to the formulation (not the solution or theoretical framework) of this problem. Usually the above description is all that you get!
Let us investigate how we can make it even more precise:
If we begin to be more explicit, in order to even introduce random
variables $X,Y$ we need a sample space. Because these random variables
appear in the expression (1), which explicitly is
$$
r(x)=Pr(\{\omega\in\Omega:Y(\omega)=1\}|\{\omega\in\Omega:X(\omega)=x\}),
$$
the random variables furthermore need to be defined on the same sample
space. We could pick $\Omega:=\mathbb{R}^{p}\times\{0,1\}$ as a suitable
candidate, together with a distribution $\mathcal{D}$ on $\Omega$ that models
how likely each individual is to be drawn from the population. We could then define
$X:\Omega\rightarrow\mathbb{R}^{p}$ as the projection onto the first
$p$ components and $Y:\Omega\rightarrow\{0,1\}$ as the projection
onto the last component. By doing so, we have given (1) a concrete
meaning.
But defining the random variables like this is rather cumbersome; since we already needed to introduce $\Omega$ and $\mathcal{D}$ to even talk about random variables, we could just use these two ingredients to define the true risk by
\begin{equation}
r(x)=Pr(\{\omega\in\Omega:\omega_{p+1}=1\}|\{\omega\in\Omega:\omega_{1,\ldots,p}=x\})\ \ (2),
\end{equation}
where the subscripts select coordinates of $\omega$: $\omega_{p+1}$ is the last coordinate and $\omega_{1,\ldots,p}$ are the first $p$ coordinates.
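One can check mechanically that (1) and (2) denote the same number once $X$ and $Y$ are the coordinate projections. Here is a small Python sketch; the finite toy sample space and its distribution are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sample space Omega = {0,1}^p x {0,1}: each outcome omega is a tuple
# (x_1, ..., x_p, y), drawn from an arbitrary distribution D (an assumption).
p = 2
n = 200_000
omega = rng.integers(0, 2, size=(n, p + 1))  # rows are draws of omega from D

# Random variables X, Y defined as coordinate projections, as in the text.
def X(w): return w[:, :p]   # projection onto the first p components
def Y(w): return w[:, p]    # projection onto the last component

x = np.array([1, 0])

# Pr(Y = 1 | X = x) via the random variables, as in (1) ...
cond1 = np.all(X(omega) == x, axis=1)
r1 = Y(omega)[cond1].mean()

# ... and directly via coordinates of omega, as in (2).
cond2 = np.all(omega[:, :p] == x, axis=1)
r2 = omega[cond2, p].mean()

print(r1 == r2)  # True: (1) and (2) are the same quantity
```

The two conditionals select literally the same subset of outcomes, which is the point: (2) just inlines the definitions of $X$ and $Y$.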
But somehow a formulation as in (2) is very rarely used. My question is: why does the community tend to prefer a vague way of defining random variables that, if made precise, is actually more tedious to set up (as I have just shown) than the formulation (2)?
Using the probability space might seem more natural, but random variables are more elegant because we usually do not care about the probability space. Yes, in real applications the probability space is relatively straightforward to point out, but it is not actually important. There is some quantity we care about, or several quantities, and they are somehow dependent on each other or they are not. It is these quantities and their interplay we really care about, so why not do the theoretical groundwork with a focus on exactly those quantities: random variables.
Another reason is that random variables give us an elegant way to describe events. Any event can be described as the preimage of a (usually simple) set under a random variable, and then knowledge about the random variable translates into knowledge about the event. In particular, (in)dependence of events can be treated elegantly via (in)dependent random variables.
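To make the preimage view concrete, here is a tiny sketch; the choice of two fair coin flips as the toy probability space is an assumption for illustration. The events are preimages of $\{1\}$ under $X$ and $Y$, and independence of the random variables immediately gives independence of the events:

```python
from itertools import product
from fractions import Fraction

# Toy sample space: two fair coin flips, uniform distribution (an assumption).
Omega = list(product([0, 1], repeat=2))
P = {w: Fraction(1, 4) for w in Omega}

X = lambda w: w[0]  # outcome of the first flip
Y = lambda w: w[1]  # outcome of the second flip

def prob(event):
    return sum(P[w] for w in event)

# Events described as preimages of simple sets under random variables:
A = {w for w in Omega if X(w) == 1}  # A = X^{-1}({1})
B = {w for w in Omega if Y(w) == 1}  # B = Y^{-1}({1})

# Independence of X and Y translates to independence of the events:
print(prob(A & B) == prob(A) * prob(B))  # True
```

Using exact rational arithmetic keeps the factorization an identity rather than an approximation.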