Examples of how to transform a random variable with a non-Gaussian distribution into one with a Gaussian distribution


My question is whether there exists a method to transform a random variable with a non-Gaussian distribution into another with a Gaussian distribution.

So far I have only found that a random variable with a Birnbaum–Saunders distribution can be transformed into a Gaussian one.

I would like to obtain other examples.


2 Answers

BEST ANSWER

If you have a continuous random variable $X$ whose distribution function $F_X$ you know, then the random variable

$$Y=\sigma \Phi^{-1}[F_X(X)] + \mu \sim N(\mu,\sigma^2),$$

where $\Phi^{-1}$ is the inverse of the standard normal distribution function.

Since this is the standard way of generating draws from a Normal Distribution (since $F_X(X)\sim U(0,1)$), I wonder whether this is what you are really asking here.
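As a concrete illustration of the transform above, here is a minimal sketch in Python (the rate $\lambda = 2$, $\mu = 5$, $\sigma = 3$ are arbitrary choices for the demo), using an exponential $X$ as the non-Gaussian starting point and the standard library's NormalDist for $\Phi^{-1}$:

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
lam, mu, sigma = 2.0, 5.0, 3.0   # exponential rate; target normal parameters
nd = NormalDist()                # standard normal; inv_cdf is Phi^{-1}

# X ~ Exponential(rate = lam), so F_X(X) = 1 - exp(-lam * X) ~ Unif(0, 1)
xs = [random.expovariate(lam) for _ in range(100_000)]
ys = [sigma * nd.inv_cdf(1.0 - math.exp(-lam * x)) + mu for x in xs]

# Sample mean and sd should be close to mu = 5 and sigma = 3
print(round(mean(ys), 2), round(stdev(ys), 2))
```

The same recipe works for any continuous $X$ with a computable $F_X$; only the line computing $F_X(X)$ changes.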

ANSWER

Add 12 Uniforms. Demonstration in R:

Let $X_1, X_2, \dots, X_{12}$ be independently $\mathsf{Unif}(0,1),$ and let $Z = \sum_{i=1}^{12} X_i - 6.$ Then $Z \stackrel{aprx}{\sim}\mathsf{Norm}(0,1).$ Each $X_i$ has $E(X_i) = 1/2, Var(X_i) = 1/12,$ so $E(Z) = 0$ and $Var(Z) = 1.$ By the Central Limit Theorem, $Z$ is very nearly normal. The main flaw is that the method never produces $|Z|>6.$

In the demonstration below $m = 1000$ standard normal observations produced by this method pass a Shapiro-Wilk normality test and their histogram seems a close match to the standard normal density function.

set.seed(821);  m = 1000
z = replicate(m,  sum(runif(12))-6)
shapiro.test(z)

    Shapiro-Wilk normality test

data:  z
W = 0.99817, p-value = 0.3615   # P-value > .05, so no evidence of non-normality

hist(z, prob=T, col="skyblue2")
  curve(dnorm(x), -5, 5, add=T)

[Histogram of z with the standard normal density curve overlaid]

Box-Muller: Let $X_1, X_2$ be independently $\mathsf{Unif}(0,1).$ Then $$Z_1 = \sqrt{-2\log(X_1)}\cos(2\pi X_2)\;\;\text{and}\;\; Z_2 = \sqrt{-2\log(X_1)}\sin(2\pi X_2)$$ are independently $\mathsf{Norm}(0,1).$

Theoretically, the $Z_i$ are exactly standard normal. This method is discussed at length in Wikipedia.

A computational flaw is that the log and trig functions run into the limits of floating-point arithmetic, so depending on the software the method does not produce $|Z|$ greater than about $7.$
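A back-of-the-envelope check (in Python, assuming the generator's smallest nonzero uniform is $2^{-53},$ as for a 53-bit double-precision generator) shows where that ceiling comes from: the radius $\sqrt{-2\log(X_1)}$ is largest when $X_1$ is smallest.

```python
import math

# Smallest nonzero uniform for a 53-bit double-precision generator;
# this is an assumption about the generator, not a universal constant.
u_min = 2.0 ** -53
z_max = math.sqrt(-2.0 * math.log(u_min))  # largest radius Box-Muller can reach
print(round(z_max, 2))  # about 8.57; a 32-bit generator (u_min = 2^-32) caps near 6.66
```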

In the demonstration below $m = 1000$ standard normal observations produced by this method pass a Shapiro-Wilk normality test and their histogram seems a close match to the standard normal density function.

set.seed(822)
u1 = runif(500);  u2 = runif(500)
z1 = sqrt(-2*log(u1))*cos(2*pi*u2)
z2 = sqrt(-2*log(u1))*sin(2*pi*u2)
z = c(z1, z2)
shapiro.test(z)

        Shapiro-Wilk normality test

data:  z
W = 0.99818, p-value = 0.3672

hist(z, prob=T, col="skyblue2")
  curve(dnorm(x), -5, 5, add=T)

[Histogram of z with the standard normal density curve overlaid]

Wichura method: Let $\Phi$ denote the standard normal CDF. If $\Phi$ were expressible in closed form and were invertible, then $Z = \Phi^{-1}(U) \sim \mathsf{Norm}(0,1),$ for $U \sim \mathsf{Unif}(0,1).$

Although $\Phi$ is not expressible in closed form, Wichura (1988) constructed a rational approximation to $\Phi$ that is accurate to the limits of double-precision arithmetic, and he also found a similarly accurate approximation to $\Phi^{-1}.$ In R statistical software the function rnorm uses Wichura's inverse to generate standard normal observations from standard uniform ones produced by the Mersenne–Twister pseudorandom generator. Some technical fine-tuning aside, rnorm(10) essentially uses qnorm(runif(10)) to produce ten independent standard normal observations.

set.seed(818);  z = rnorm(1000)
shapiro.test(z)

        Shapiro-Wilk normality test

data:  z
W = 0.99877, p-value = 0.732

hist(z, prob=T, col="skyblue2")
 curve(dnorm(x), -5, 5, add=T)

[Histogram of z with the standard normal density curve overlaid]