Why is this true? (uniform distributions)


If $Z=F(X)$, then $Z$ has a uniform distribution on $[0,1]$.

I understand the proof using functions, but visually it doesn't make sense.

If $X$ has a normal distribution, then its CDF $F(X)$ looks like:

[Figure: graph of a normal CDF]

If we take $Z=F(X)$, $Z$ does not look like a uniform distribution (no flat top; it's curved here). What am I missing?


There are 2 best solutions below

BEST ANSWER

If you transform a random sample by the (continuous) CDF of its population, it becomes standard uniform.

First, I find the notation to be unfortunate and it may be confusing you. Let's say $U = F_X(X) \sim \mathsf{Unif}(0,1),$ for a continuous random variable $X.$
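With this notation the claim can be checked in one line. As a sketch, assume in addition that $F_X$ is strictly increasing on the support of $X,$ so that the inverse $F_X^{-1}$ exists (an assumption made here only to keep the argument short). Then, for $0 < u < 1,$

$$P(U \le u) = P\bigl(F_X(X) \le u\bigr) = P\bigl(X \le F_X^{-1}(u)\bigr) = F_X\bigl(F_X^{-1}(u)\bigr) = u.$$

So the CDF of $U$ is $F_U(u) = u$ on $(0,1),$ and its density is $f_U(u) = 1$ there, which is the flat top. The S-shaped curve in the question is the graph of the function $F_X,$ not the density of $U.$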

Second, let's investigate a couple of specific examples.

Example 1: Suppose $X \sim \mathsf{Beta}(2,2),$ with $f_X(x) = 6x(1-x),$ for $0<x<1,$ a parabola. Also, $F_X(x) = 3x^2 - 2x^3,$ for $0<x<1.$

We can use R statistical software to sample $n=500$ observations into a vector x from $\mathsf{Beta}(2,2).$ In R, the density function is dbeta and the CDF is pbeta (each with appropriate parameters).

set.seed(721)
x = rbeta(500, 2, 2)
u = pbeta(x, 2, 2)

The expression for u amounts to $u =3x^2 - 2x^3.$
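As a quick numerical check (a sketch, not part of the original figures), one can confirm that pbeta agrees with the closed-form CDF and run a Kolmogorov-Smirnov test of u against the standard uniform distribution. The names u_manual, max_diff and ks are introduced here only for illustration.

```r
set.seed(721)
x = rbeta(500, 2, 2)
u = pbeta(x, 2, 2)

# Closed-form CDF of Beta(2,2) on (0,1), as given in the text
u_manual = 3*x^2 - 2*x^3

# Largest discrepancy between pbeta and the formula;
# this should be at the level of floating-point rounding error
max_diff = max(abs(u - u_manual))
print(max_diff)

# Kolmogorov-Smirnov test of u against Unif(0,1);
# a small statistic (large P-value) is consistent with uniformity
ks = ks.test(u, "punif")
print(ks)
```

The test does not prove uniformity; it only shows that the transformed sample is not detectably different from a standard uniform sample.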

A histogram (blue) provides a crude estimate of the corresponding density function (red), and an empirical CDF (ECDF) plot provides an estimate of the corresponding CDF.

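[Figure: three panels for the beta sample. Histogram of x with the Beta(2,2) density; ECDF of x with the Beta(2,2) CDF; histogram of u with the Unif(0,1) density.]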

An ECDF uses the actual data, while a histogram uses binned data (with some loss of information). So ECDFs are ordinarily better estimates of CDFs than histograms are of densities. The sample size $n = 500$ is a compromise: large enough that the histograms are not too rough, yet small enough that the ECDF can (barely) be distinguished from its corresponding CDF.

As you suspected, the 'action' takes place in the ECDF plot. Below we show how nine particular points in x (out of 500) get transformed to the corresponding values in u.

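[Figure: ECDF of the beta sample, with nine sample points traced from the horizontal axis up to the curve and across to their values of u on the vertical axis.]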

Example 2: Here is a similar example in which we transform 500 points x from the distribution $\mathsf{Norm}(\mu=100,\sigma=15)$ to standard uniform by using the CDF of this normal distribution.

set.seed(2021)
x = rnorm(500, 100, 15)
u = pnorm(x, 100, 15)

[Figure: three panels for the normal sample, analogous to the first figure.]
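The "flat top" can also be seen without a plot. The following sketch counts how many of the transformed values fall in each of ten equal-width bins of $(0,1);$ if u is roughly standard uniform, each count should be near $500/10 = 50.$ The choice of ten bins and the name counts are assumptions made for illustration.

```r
set.seed(2021)
x = rnorm(500, 100, 15)
u = pnorm(x, 100, 15)

# Tabulate u in ten equal-width bins: (0,0.1], (0.1,0.2], ..., (0.9,1]
counts = table(cut(u, breaks = seq(0, 1, by = 0.1)))
print(counts)
```

The counts vary from bin to bin because of sampling variability, but they show no systematic bell shape: the curvature of the normal CDF is exactly what spreads the crowded middle of x evenly over $(0,1).$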

Note: In case you are interested in R code for the figures, here is the code for the first two. (The third uses minor modifications of the first.)

par(mfrow = c(1,3))   # three panels in one row
hdr1 = "BETA(2,2): Histogram and Density"
hist(x, prob=TRUE, col="skyblue2", main=hdr1)
 curve(dbeta(x,2,2), add=TRUE, col="red", lwd=2)   # overlay Beta(2,2) density
hdr2 = "BETA(2,2), ECDF and CDF"
plot(ecdf(x), col="blue", lty="dashed", main=hdr2)
 curve(pbeta(x,2,2), add=TRUE, col="red")           # overlay Beta(2,2) CDF
hdr3 = "UNIF(0,1): Histogram and Density"
hist(u, prob=TRUE, col="skyblue2", main=hdr3)
 curve(dunif(x), add=TRUE, col="red", lwd=2)        # overlay Unif(0,1) density
par(mfrow=c(1,1))     # restore the single-panel layout

sx = sort(x);  X = sx[seq(50,450, by=50)]
U = pbeta(X, 2,2)
plot(ecdf(x), col="blue", main="ECDF of Beta Sample")
for(i in 1:9) {
 lines(c(-.2, X[i], X[i]), c(U[i],U[i], 0), col="green2")
 }

[The R procedure curve requires the argument x, regardless of what is being plotted.]

ANOTHER ANSWER

I assume that $F$ is the cumulative distribution function associated to the given random variable $X$.

In general, the claim that $Z = F\circ X$ is uniformly distributed over $[0,1]$ is false unless we impose a condition on $F$: we require that $F$ be continuous.

The following is an obvious counter-example: Let $X$ be the constant random variable $X=0$. Then $F(x) = 0$ if $x<0$ and $F(x)=1$ if $x\geq 0$. In this case $F\circ X = F(0) = 1$ with probability 1, so $F\circ X$ cannot be uniformly distributed over $[0,1]$.
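The counter-example is easy to reproduce numerically. In this sketch, F_step is a hypothetical helper implementing the step-function CDF of the constant random variable $X=0$; applying it to a sample of $X$ yields a single value rather than anything spread over $[0,1]$.

```r
# CDF of the constant random variable X = 0: a unit step at 0
F_step = function(x) as.numeric(x >= 0)

x = rep(0, 10)     # a "sample" of size 10 from the constant X = 0
u = F_step(x)      # transform the sample by its own CDF

# Every transformed value is the same, so u is degenerate, not uniform
print(unique(u))
```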