Probability - Linking Continuous Distribution with Uniform Distribution

I have a question regarding a proposition from the textbook Rice, J. A., Mathematical Statistics and Data Analysis (3rd edition).

Section 2.3 Proposition C:

Let $Z=F(X)$, where $X$ is a random variable with cdf $F$; then $Z$ has a uniform distribution on $[0,1]$.
Proof: $P(Z \leq z) = P(F(X) \leq z) = P(X \leq F^{-1}(z)) = F(F^{-1}(z)) = z$

I understand the proof, but I can't grasp the intuitive meaning of this result. I have looked at similar threads on this question, but they weren't insightful. I have also tried plotting the distribution to see how the uniform distribution of $Z$ comes about.

Thank you for your time!

Answer 1:

It says that if you have a random variable $X$ with a known continuous distribution, but what you really want is a uniform random variable, then you can transform $X$ into one by applying a function to it, and that function is the cdf of $X$ itself.

The converse of this theorem is more useful: that if $U$ is a uniform random variable and $F$ is a continuous cdf, then $F^{-1}(U)$ has the distribution with cdf $F$. This is the idea of inverse transform sampling.
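Inverse transform sampling is easy to see in code. Below is a minimal Python sketch (the function name and the choice of an Exponential target are mine, purely for illustration): for $X \sim \mathrm{Exp}(\lambda)$, $F(x) = 1 - e^{-\lambda x}$, so $F^{-1}(u) = -\ln(1-u)/\lambda$, and feeding a Uniform(0,1) draw through $F^{-1}$ yields an exponential sample.

```python
import random
import math

def sample_exponential(rate=1.0):
    """Inverse transform sampling (illustrative helper): F^{-1}(u) = -ln(1-u)/rate
    maps U ~ Uniform(0,1) to an Exponential(rate) sample."""
    u = random.random()                 # U ~ Uniform(0, 1)
    return -math.log(1.0 - u) / rate    # F^{-1}(U) ~ Exp(rate)

random.seed(0)
samples = [sample_exponential(rate=2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # should be close to 1/rate = 0.5
```

The same recipe works for any distribution whose inverse cdf you can compute in closed form (or approximate numerically).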

Answer 2:

Intuitively, $F(X)$ is the quantile of a sample $X \sim F$. So, the Probability Integral Transform (the result you state) says that the quantile of a sample from any distribution is distributed uniformly on $[0,1]$.

Maybe using some specific numbers can help:

What fraction of the time is $X$ less than the $\frac{1}{4}$ quantile? $\frac{1}{4}$ of the time. So, the quantile of $X$ (i.e. $F(X)$) will be less than $\frac{1}{4}$ exactly $\frac{1}{4}$ of the time. (This is the same as saying $\mathbb{P}(Z \leq \frac{1}{4}) = \frac{1}{4}$.)

What fraction of the time is $X$ between the $\frac{1}{5}$ and $\frac{4}{5}$ quantiles? $\frac{3}{5}$ of the time. So, $X$'s quantile will be between $\frac{1}{5}$ and $\frac{4}{5}$ exactly $\frac{3}{5}$ of the time.

Similarly, for any interval $(a,b) \subset [0,1]$, what fraction of the time is $X$ between the $a$th and $b$th quantiles? $b-a$ of the time. So, $X$'s quantile will be in $(a,b)$ exactly $b-a$ of the time.

This is precisely saying that $X$'s quantile is distributed uniformly on $[0,1]$.
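The interval counts above can be checked empirically. A short Python sketch (names and the choice of $\mathrm{Exp}(1)$ are illustrative assumptions) draws $X \sim \mathrm{Exp}(1)$, applies $Z = F(X) = 1 - e^{-X}$, and verifies that $Z$ lands in an interval with frequency equal to that interval's length:

```python
import random
import math

# Draw X ~ Exp(1) and push each sample through its own cdf, Z = 1 - exp(-X).
random.seed(1)
n = 100_000
zs = [1.0 - math.exp(-random.expovariate(1.0)) for _ in range(n)]

# Frequency of Z <= 1/4 should be about 1/4; of 1/5 < Z < 4/5 about 3/5.
frac_below_quarter = sum(z <= 0.25 for z in zs) / n
frac_middle = sum(0.2 < z < 0.8 for z in zs) / n
print(frac_below_quarter)  # ≈ 0.25
print(frac_middle)         # ≈ 0.6
```

Replacing the exponential with any other continuous distribution (and its cdf) gives the same uniform behavior.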

Answer 3:

The statement as given in the OP is not true in general, unless $X$ has a continuous distribution. For example, consider $X\sim\operatorname{Bernoulli}(0,1;p=1/2)$. If $F$ denotes the (cumulative) distribution function of $X$, then $$F(x)=\frac12\mathbb{1}_{[0,1)}(x)+\mathbb{1}_{[1,\infty)}(x)$$ A simple computation shows that $$Y:=F(X)=\frac{1}{2}\mathbb{1}_{\{X=0\}}+\mathbb{1}_{\{X=1\}}$$ Thus $Y\sim\operatorname{Bernoulli}(1/2,1;p=1/2)$, which is different from the uniform distribution.
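A quick simulation (a sketch, with illustrative names) confirms the counterexample: since $F$ takes only the values $\frac12$ and $1$ on the support of $X$, the transformed variable $Y = F(X)$ takes only those two values and cannot be uniform.

```python
import random

# X ~ Bernoulli(1/2) on {0, 1}; its cdf gives F(0) = 1/2 and F(1) = 1,
# so Y = F(X) is 1/2 when X = 0 and 1 when X = 1 — a two-point distribution.
random.seed(2)
ys = [0.5 if random.random() < 0.5 else 1.0 for _ in range(10_000)]
print(sorted(set(ys)))  # only the two values 0.5 and 1.0 ever occur
```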

There is a nice result in Rüschendorf, L., On the distributional transform, Sklar's theorem, and the empirical copula process, Journal of Statistical Planning and Inference 139(11):3921-3927, 2009, that generalizes the statement in the OP from continuous real-valued random variables to general real-valued random variables. The remainder of this post is dedicated to introducing this result.

Recall that the quantile function (sometimes known as the generalized inverse) of $X$ is defined as $$Q(q)= \inf\{x\in\mathbb{R}:F(x)\geq q\} \qquad 0\leq q\leq 1 $$ Some authors use $F^{-1}$ to denote $Q$. Clearly $Q(0):=-\infty$, and $Q(1)<\infty$ iff $F(x)=1$ for some $x\in\mathbb{R}$ (some authors set $Q(0)=\inf\{x\in\mathbb{R}: F(x)>0\}$, but that is inconsequential).

Notice that while $F$ is nondecreasing and right-continuous with left limits, $Q$ (defined on $(0,1)$) is nondecreasing and left-continuous with right limits. Also, for any $x\in\mathbb{R}$ and $0<q<1$, $$F(x)\geq q\quad\text{iff}\quad Q(q)\leq x$$ From this, it follows that if $U\sim\operatorname{Unif}(0,1)$, then $Q(U)$ has the same distribution as $X$.

Define the function $$G(x;\lambda):=\mathbb{P}[X<x]+\lambda\mathbb{P}[X=x] =F(x-)+\lambda(F(x)-F(x-)), \qquad x\in\mathbb{R},\,0\leq \lambda\leq 1$$

Notice that if $X$ has continuous distribution function $F$, then $G(x;\lambda)=F(x)$ for all $\lambda$.

Theorem: Suppose $V\sim\operatorname{Unif}(0,1)$ and that $X$ and $V$ are independent. Define the random variable $$U:=G(X;V)=F(X-)+V\big(F(X)-F(X-)\big)$$ Then, $U\sim\operatorname{Unif}(0,1)$ and $X= Q(U)$ almost surely.

Continuing with the example given at the beginning of this post, observe that $$F(X-)=\frac{1}{2}\mathbb{1}_{\{X=1\}}$$ and so, $$F(X)-F(X-)=\frac{1}{2}\mathbb{1}_{\{X=0\}}+\frac{1}{2}\mathbb{1}_{\{X=1\}}$$ Hence $$U=G(X;V)=\frac12V\mathbb{1}_{\{X=0\}}+\frac{1+V}{2}\mathbb{1}_{\{X=1\}} $$ For $0\leq u\leq 1$, \begin{align} P[U\leq u]&=P[U\leq u, X=0]+P[U\leq u, X=1]=\frac12P[V\leq 2u]+\frac12P[V\leq 2u-1]\\ &=\frac12\max(0,\min(2u,1))+\frac12\max(0,\min(2u-1,1))=u \end{align}

On the other hand, $$Q(q)=\mathbb{1}_{(\frac12,1]}(q),\qquad 0<q\leq 1$$ and $$\{Q(U)\neq X\}\subset\{V=0\}$$ and $P[V=0]=0$.
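The computation in the Bernoulli example can also be checked by simulation. Here is a Python sketch (function name and sample sizes are illustrative) that draws $X \sim \operatorname{Bernoulli}(1/2)$ and $V \sim \operatorname{Unif}(0,1)$ independently, forms $U = G(X;V)$, and verifies empirically that $U$ is uniform on $[0,1]$:

```python
import random

# Distributional transform for X ~ Bernoulli(1/2):
# U = F(X-) + V*(F(X) - F(X-)) with V ~ Uniform(0,1) independent of X.
# Here F(0-) = 0 and F(0) = 1/2, while F(1-) = 1/2 and F(1) = 1,
# so U = V/2 on {X = 0} and U = (1 + V)/2 on {X = 1}.

def distributional_transform():
    x = 0 if random.random() < 0.5 else 1
    v = random.random()
    return v / 2 if x == 0 else (1 + v) / 2

random.seed(3)
n = 100_000
us = [distributional_transform() for _ in range(n)]

# Uniformity check: the fraction of U below u should be about u.
fracs = {u: sum(x <= u for x in us) / n for u in (0.25, 0.5, 0.75)}
print(fracs)  # each value ≈ its key
```

Note how the independent randomization $V$ "spreads out" the two atoms of $F(X)$ over the gaps $[0,\frac12)$ and $[\frac12,1)$, which is exactly what the continuity assumption did for free in the original proposition.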