Postscript to the question below. In trying to learn from the answers below, all of which I am grateful for, I read a historical article on the origins and legacy of Kolmogorov's Grundbegriffe. This article helped me understand what basic things people were struggling with when this theory was developed: in particular, the long-term trend towards abstraction and a foundation in measure theory, and the early-days focus on the connection between the real world and the probabilistic model. I then re-read the answers and comments. I made a comment that started
We can choose $Ω=\Re$ because the domain of the distribution function is $\Re$.
This is wrong because the domain of the distribution function is not necessarily mentioned in the declaration of the probability space. I made the convention that random variables are maps $X: \Omega \rightarrow \Re$. So the domain of the distribution function is $\Re$ by my convention, but that doesn't have anything to do with the probability space.

$\Omega$ is a kind of index set. Suppose we are reasoning about the saturation of the color red in grapes. In that case we are thinking about, say, a color level in $S=[0,255)$. Nowhere in the definition of a probability space $(\Omega,\mathcal A,P)$ built to support reasoning about $S$ do we need to specify $S$. We do need to demonstrate that there is a 1-1 mapping between $\Omega$ and $S$, i.e. that $\Omega$ can enumerate $S$. Once we have "built" $(\Omega,\mathcal A,P)$, we can put it to work and re-use it for any $S$ which $\Omega$ can enumerate. The probability space $(\Omega,\mathcal A,P)$ is a kind of indexing structure. That for me is the key realization.

The key cognitive error comes from labelling $\Omega$ as the sample space and $\mathcal A$ as the event space. The common-sense meaning of those terms implies a connection with the actual samples being reasoned about, when that does not have to be the case. A far less misleading terminology would be to label $\Omega$ as the sample index space (or just index space) and $\mathcal A$ as the index set space. This kind of thing is clearly understood in programming languages: if I have an array $A$, then $(i,j)$ is an index; I don't confuse $(i,j)$ with $A[i,j]$, and I don't confuse the purpose of arrays with the purpose of array indices, although in some contexts I can identify $A[i,j]$ with $(i,j)$.
Short version of the question: How do we formally and correctly define the probability space of the reals which supports the definition of the typical/usual univariate continuous probability distributions, such as uniform and exponential?
Short restatement of the core question that I have: I am hung up on p. 3, Section 1.1B of the KPS text. They start with an unspecified probability space $(\Omega,\mathcal A,P)$. Two distinct random variables are considered, $X \sim U[a,b]$ and $Y \sim Exp(\lambda)$, each said to have a distribution function $F_V(x)=P_V((-\infty,x))=P(\{\omega \in \Omega: V(\omega)<x\})$. These are distinct and solved separately as $F_{U[a,b]}(x) = \mathcal H(x-a)\,(1-\mathcal H(x-b))\,\frac{x-a}{b-a} + \mathcal H(x-b)$ and $F_{Exp(\lambda)}(x)=\mathcal H(x)\,(1-e^{-\lambda x})$, where $\mathcal H(x) = 1$ for $x \geq 0$ and $\mathcal H(x) = 0$ for $x<0$. My key question is:
- What is a solution for the $P$ shared by $X$ and $Y$?
Note: Here are some similar questions on Math Stack Exchange
- Probability space of a Gaussian, unanswered, from 2016.
- What are the sample spaces when talking about continuous random variables?, asked 9 years ago and answered as $[0,1]$. The accepted answer starts out by saying "You can take it to be a subset of $\Re$ or, more generally, $\Re^n$." But then the solver gets to $[0,1]$.
Comment: I was mistakenly assuming that the text above was taking $\Omega=\Re$ because I saw a statement somewhere to the effect of "for purposes of discussion let's say the sample space for continuous random variables is $\Re^d$". The cited answer to the 2nd question above starts that way but then gets to $[0,1]$. So: I now understand that $[0,1]$ is the "best fit" sample space, along with Lebesgue measure. So the "right" probability space that I was looking for is the Steinhaus space $([0,1],\mathscr B([0,1]), \mu)$, where $\mu$ is the Lebesgue measure restricted to $[0,1]$. 99.999% of my confusion came from
- Not recognizing that $[0,1]$ is a "big enough" space to enumerate the domain of a continuous map into $\Re$. So it's "as good as" $\Re$.
- Making the assumption that the convention, was, somehow somewhere, to identify the sample space for $d$-dimensional continuous random variables with $\Re^d$, when the "best fit" answer is $[0,1]^d$.
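The realization above can be checked concretely: on the Steinhaus space $([0,1],\mathscr B([0,1]),\mu)$, the inverse CDFs of the uniform and exponential distributions give two random variables carried by the same measure. A minimal Python sketch (the parameter values $a=2$, $b=5$, $\lambda=0.7$ are illustrative):

```python
import math

# Steinhaus space: Omega = [0,1], Borel sets, P = Lebesgue measure on [0,1].
# Build X ~ U[a,b] and Y ~ Exp(lam) on this SAME space via inverse CDFs.
a, b, lam = 2.0, 5.0, 0.7   # illustrative parameters

def X(omega):
    # inverse CDF of U[a,b]
    return a + (b - a) * omega

def Y(omega):
    # inverse CDF of Exp(lam)
    return -math.log(1.0 - omega) / lam

def F_X(x):
    # uniform CDF: under Lebesgue P, {omega : X(omega) < x} = [0, (x-a)/(b-a))
    return min(max((x - a) / (b - a), 0.0), 1.0)

def F_Y(x):
    # exponential CDF: {omega : Y(omega) < x} = [0, 1 - exp(-lam*x)) for x >= 0
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

# Approximate P({omega : V(omega) < x}) on a fine grid standing in for
# Lebesgue measure; the SAME grid (same P) serves both X and Y.
N = 200_000
omegas = [(i + 0.5) / N for i in range(N)]
x0, y0 = 3.0, 1.0
px = sum(1 for w in omegas if X(w) < x0) / N   # ~ F_X(x0)
py = sum(1 for w in omegas if Y(w) < y0) / N   # ~ F_Y(y0)
```

So one probability measure, Lebesgue on $[0,1]$, simultaneously underlies both distributions; only the random variables differ.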
Longer version of the question:
Following this text,
Let $\Omega$ be a nonempty set, the sample space.
Let set $\mathcal F$ of subsets of $\Omega$ be a $\sigma$-algebra so that
- $\Omega \in \mathcal F$
- $\Omega \setminus F \in \mathcal F$ if $F \in \mathcal F$
- $\bigcup_{n=1}^{\infty} F_n \in \mathcal F$ if all $F_n \in \mathcal F$
Let $P: \mathcal F \rightarrow [0,1]$ be a probability measure so that
- $P(\Omega) = 1$
- $P(\Omega \setminus F) = 1-P(F)$
- $P(\bigcup_{n=1}^{\infty} F_n) = \sum_{n=1}^\infty P(F_n)$ whenever the $F_n \in \mathcal F$ are pairwise disjoint
We call the triple $(\Omega, \mathcal F, P)$ a probability space.
Suppose $X:\Omega\rightarrow \Re$. We say $X$ is a random variable if $\{\omega \in \Omega : X(\omega) \leq a\}$ is in $\mathcal F$ for every $a \in \Re$.
Then the probability distribution function $F_X : \Re \rightarrow \Re$ is defined for all $x \in \Re$ as
$$F_X(x) = P(\{\omega \in \Omega : X(\omega) < x\})$$
Note that $P$ appears unsubscripted in the definition of $F_X$: $P$ does not depend on the particular random variable $X$ whose distribution we are defining. So in that sense it should be possible for the same probability space $(\Omega, \mathcal F, P)$ to underlie the probability distribution functions of multiple distinct random variables $X$ and $Y$, $X \neq Y$.
For example, let
$$\Omega = \{0,1\}$$ $$\mathcal F = \{\emptyset, \{0\}, \{1\}, \{0,1\}\}$$ $$P = \begin{cases} \emptyset &\mapsto& 0 \\ \{0\} &\mapsto& \frac{1}{2} \\ \{1\} &\mapsto& \frac{1}{2} \\ \{0,1\} &\mapsto& 1 \end{cases}$$
Let $X,Y: \Omega\rightarrow \Re$ be random variables fully defined by
$$X = \begin{cases} 0 &\mapsto& 17 \\ 1 &\mapsto& 17 \end{cases}$$
$$Y = \begin{cases} 0 &\mapsto& 42 \\ 1 &\mapsto& 42 \end{cases}$$
Then the probability distributions of $X$ and $Y$ are
$$F_X(x) = P(\{\omega\in\Omega:X(\omega)<x\}) = \begin{cases} x \leq 17 &\mapsto& 0 \\ x > 17 &\mapsto& 1 \end{cases}$$
$$F_Y(x) = P(\{\omega\in\Omega:Y(\omega)<x\}) = \begin{cases} x \leq 42 &\mapsto& 0 \\ x > 42 &\mapsto& 1 \end{cases}$$
Clearly $X \neq Y$ and $F_X \neq F_Y$. In the above discrete example, if I understand the language correctly, there is a single probability space $(\Omega,\mathcal F,P)$ with a single probability measure $P$ which underlies or supports two distinct probability distributions $F_X$ and $F_Y$ for two distinct random variables $X$ and $Y$.
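This example can be checked mechanically from the definition $F_V(x) = P(\{\omega \in \Omega : V(\omega) < x\})$; note that the strict inequality forces $F_X(17) = 0$ (left-continuity). A small Python sketch:

```python
# The two-point probability space from the example above.
Omega = {0, 1}
P = {frozenset(): 0.0,
     frozenset({0}): 0.5,
     frozenset({1}): 0.5,
     frozenset({0, 1}): 1.0}

# Constant random variables X = 17 and Y = 42 on the same space.
X = {0: 17.0, 1: 17.0}
Y = {0: 42.0, 1: 42.0}

def F(V, x):
    # F_V(x) = P({omega in Omega : V(omega) < x})
    event = frozenset(w for w in Omega if V[w] < x)
    return P[event]
```

One `P`, two random variables, two distinct distribution functions.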
Now let $(\Omega, \mathcal F, P)$ be a probability space underlying random variables $X$ and $Y$ where:
- Random variable $X: \Omega \rightarrow \Re$ is such that $X$ has the uniform distribution $F_X: \Re \rightarrow [0,1]$ such that
$$F_X(x) = P(\{\omega\in\Omega:X(\omega)<x\}) = \begin{cases}0 &:& x < a \\ \frac{x-a}{b-a} &:& a \leq x \leq b \\ 1 &:& b < x \end{cases}$$
- Random variable $Y: \Omega \rightarrow \Re$ is such that $Y$ has the exponential distribution $F_Y: \Re \rightarrow [0,1]$ such that
$$F_Y(x) = P(\{\omega\in\Omega:Y(\omega)<x\}) = \begin{cases}0 &:& x < 0 \\ 1-e^{-\lambda x} &:& x \geq 0 \end{cases}$$
Also, per comment below, one distribution can be supported by multiple probability spaces. (The key understanding here for me is that probability space and probability distribution are separate constructions.)
My questions are (and some answers that I take from my reading of the solutions below):
Q1. Is $(\Omega, \mathcal F, P) = (\Re, \mathcal B(\Re), \mu)$, where $\mathcal B(\Re)$ is the Borel $\sigma$-algebra of the reals and $\mu$ is the Lebesgue measure, a probability space which underlies $X$ and $Y$? Answer: No; since $\mu(\Re) = \infty \neq 1$, it is not even a probability space. But the Steinhaus space $([0,1], \mathcal B([0,1]), \mu)$ works.
Q2. Is it correct to call $(\Re, \mathcal B(\Re), \mu)$ the standard probability space of the reals? Is there some other standard notation or language for the probability space underlying the usual continuous probability distributions? Answer: No, but the Steinhaus space is a standard space in the Wikipedia sense.
Q3. Is it correct to say that the notion of probability space is independent of and complementary to the notion of probability distribution, and that the notion of probability distribution is always associated with a particular random variable $X$ presented with a supporting probability space $(\Omega, \mathcal F, P)$? Answer: Kind of. One distribution can be accompanied by many probability spaces, and one probability space can be accompanied by many distributions. I'm using "accompanied" because the word "supported" may be overloaded in math. I'm looking for some compact synonym of "independent and complementary". The main thing is to demonstrate through examples that the relationship is many-to-many.
Some concepts/definitions that might help:
A probability measure on $\left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d) \right)$ is called a distribution. The triplet obtained can be called a distribution space, to distinguish it from the general probability space.
Typical distributions are built from Lebesgue measure and $\mathcal{B}(\mathbf{R}^d)$-measurable functions $h:\mathbf{R}^d\rightarrow [0,\infty) $ with $$ \int_{\mathbf{R}^d} h(x) \mu(dx) =1$$ by $$ P_h(B) = \int_B h(x) \mu(dx) $$ for all $B\in \mathcal{B}(\mathbf{R}^d)$.
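For instance, taking $d=1$ and, as an illustrative choice, the exponential density $h(x)=\lambda e^{-\lambda x}\,1_{x \geq 0}$, the value $P_h(B)$ for an interval $B$ can be approximated numerically and compared with the closed form:

```python
import math

lam = 0.7   # illustrative rate parameter

def h(x):
    # exponential density: nonnegative, integrates to 1 over R
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def P_h(lo, hi, n=100_000):
    # P_h((lo, hi)) = integral of h over (lo, hi), midpoint rule
    dx = (hi - lo) / n
    return sum(h(lo + (i + 0.5) * dx) for i in range(n)) * dx

t = 2.0
approx = P_h(0.0, t)                  # numerical P_h((0, t))
exact = 1.0 - math.exp(-lam * t)      # closed-form value
```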
An example of a distribution that cannot be built this way is Dirac's distribution concentrated at a point $x_0 \in \mathbf{R}^d$:
$$ \delta_{x_0} (B) = 1_{x_0\in B}$$ for all $B\in \mathcal{B}(\mathbf{R}^d)$.
Also, given probability space $\left(\Omega, \mathcal{F}, P\right)$ and $X:\Omega\rightarrow \mathbf{R}^d$ which is $\mathcal{F}/\mathcal{B}(\mathbf{R}^d)$-measurable, one can build a distribution $P_X$ as follows:
$$ P_X = P \circ X^{-1}, $$
usually called the distribution of $X$ (or law of $X$), which suggests that now one can focus only on the distribution space $\left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d), P_X \right)$.
Note: If $\Omega = \mathbf{R}^d, \mathcal{F} = \mathcal{B}(\mathbf{R}^d)$ and $P$ is a distribution, then taking $X$ to be the identity function, $id$, we have:
$$ P_{X} = P.$$
Note 2: Two random variables, possibly defined on different spaces, can have the same distribution (law).
If $X$ is defined on an abstract space $\left(\Omega, \mathcal{F}, P\right)$ as above, it induces distribution $ P_X$.
Then random variable $id$ defined on $\left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d), P_X \right)$ has the same distribution.
Many models rely on knowing the distribution of a random variable $X$ rather than its explicit form and the probability space on which it is defined.
Note 3: To answer Q3, I guess, we have the following facts:
A distribution space is just a particular case of probability space.
Yes, for a distribution, be it $P_h$ or Dirac type, there is always a random variable on a 'supporting' probability space that induces the same distribution: we take the probability space to be the starting distribution space itself and the random variable to be the identity function.
(Complementing Note 2) If $A,B\in \mathcal{F}$ are different events such that $P(A)=P(B)$, then $$1_A \not= 1_B,$$ but they are random variables with the same distribution, that is
$$ P_{1_A} = P_{1_B}.$$
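On the fair-coin space from the discrete example earlier (with $A=\{0\}$, $B=\{1\}$), this is easy to verify directly; a small Python sketch:

```python
# Fair-coin space: Omega = {0, 1}, P({0}) = P({1}) = 1/2.
Omega = {0, 1}
p = {0: 0.5, 1: 0.5}     # P on singletons; extends to events by additivity

A, B = {0}, {1}          # distinct events with P(A) = P(B) = 1/2

ind_A = {w: (1.0 if w in A else 0.0) for w in Omega}   # the function 1_A
ind_B = {w: (1.0 if w in B else 0.0) for w in Omega}   # the function 1_B

def law(V):
    # distribution of V as a map: value -> P(V = value)
    out = {}
    for w in Omega:
        out[V[w]] = out.get(V[w], 0.0) + p[w]
    return out
```

Here `ind_A != ind_B` as functions, yet `law(ind_A) == law(ind_B)`: both are Bernoulli(1/2).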
More generally, for a Borel-measurable map $\alpha$, the law of $\alpha \circ X$ is the pushforward of the law of $X$:
$$ P_{\alpha \circ X} = P_X \circ \alpha^{-1}. $$
Note 4: I finally realized that you are focusing on the distribution function.
A function $F:\mathbf{R}\rightarrow \mathbf{R}$ which is non-decreasing, bounded, left-continuous and for which $$\lim_{x\rightarrow -\infty} F(x) = 0$$ is called a distribution function. This definition stands on its own (no mention of measures).
The following facts can be proven.
Fact: Let $F$ be a distribution function such that $$\lim_{x\rightarrow \infty} F(x) = 1.$$ Let also $m$ be a measure on $\left((0,1), \mathcal{B}((0,1))\right)$ such that $$ m((0,x))=x $$ for all $x\in (0,1]$ (its existence can be proven). Then there is a non-decreasing function $f:(0,1) \rightarrow \mathbf{R}$ such that measure $m\circ f^{-1}$ has $F$ as distribution function, that is
$$ (m\circ f^{-1})((-\infty,x)) = F(x)$$
for all $x\in \mathbf{R}$.
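When $F$ is continuous and strictly increasing on its support, the $f$ of this Fact can be taken to be the plain inverse $F^{-1}$ (in general one uses a generalized inverse/quantile function). A sketch using the exponential distribution function, with rate $\lambda = 0.7$ chosen purely for illustration:

```python
import math

lam = 0.7   # illustrative rate

def F(x):
    # exponential distribution function, continuous and strictly
    # increasing on [0, inf)
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def f(u):
    # the non-decreasing f : (0,1) -> R of the Fact; here simply F^{-1}
    return -math.log(1.0 - u) / lam

# Since f is increasing, f^{-1}((-inf, x)) = (0, F(x)), whose m-measure
# (Lebesgue on (0,1)) is exactly F(x).  Numerical check on a grid:
N = 200_000
grid = [(i + 0.5) / N for i in range(N)]      # stand-in for m on (0,1)
x = 1.5
pushforward = sum(1 for u in grid if f(u) < x) / N  # ~ (m o f^{-1})((-inf, x))
```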
Fact 2: A measure $\mu$ on $(\mathbf{R}, \mathcal{B}(\mathbf{R}))$ is perfectly determined by its distribution function $F_\mu$ defined as $$ F_\mu(x) = \mu ((-\infty,x)) $$ for all $x\in \mathbf{R}$. That is, if two measures on $(\mathbf{R}, \mathcal{B}(\mathbf{R}))$ have the same distribution function, they coincide.
These facts suggest that specifying the triplet
$$\left(\mathbf{R}, \mathcal{B}(\mathbf{R}), m\circ f^{-1}\right)$$
for some non-decreasing $f$ or rather a distribution function $F$ (with $\lim_{x\rightarrow \infty} F(x) = 1$, for which we know such $f$ exists) is the essential step in setting up any distribution space.
For a random variable on an abstract probability space, $X:(\Omega, \mathcal{F}, P) \rightarrow (\mathbf{R}, \mathcal{B}(\mathbf{R}))$, as soon as we have $P_X$, the associated distribution, and $F_X$, its distribution function as defined in the book, we are done (we can forget about $X$, in some sense; basically replace it with the $id$ introduced in Note 2, which has the same distribution). Note that:
$$ F_X = F_{P_X} $$
with the second term defined above (in Fact 2).