Why take "uniformly distributed" random numbers in $[0,1]$ to get the desired random variable in the inverse transform method of synthetic data generation?


I am learning probability for ML and came across the inverse transform method for generating artificial data: it transforms random numbers (probabilities) in the range $[0,1]$ into values that follow a desired distribution, by inverting the cumulative distribution function (CDF) of that distribution. My doubt is: why must we start with uniformly distributed data? (All the blogs I referred to jump straight into how the method works instead of starting with this basic point.)

BEST ANSWER

We do it because the CDF of the uniform distribution on $[0, 1]$ is the identity function on $[0, 1]$, which simplifies the calculation considerably.

If we have data $X$ that is distributed with CDF $F$, and want to transform it to data $Y$ that has CDF $G$, then we are essentially trying to find $y$ such that

$$\begin{eqnarray} P(X \leq x) & = & P(Y \leq y) \\ F(x) & = & G(y) \\ y & = & G^{-1}(F(x)) \end{eqnarray}$$
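As a concrete sketch of the general transform $y = G^{-1}(F(x))$ (this example is mine, not from the original answer): take $X \sim \mathrm{Exp}(1)$, so $F(x) = 1 - e^{-x}$, and a target $Y \sim \mathrm{Exp}(2)$, whose inverse CDF is $G^{-1}(u) = -\ln(1-u)/2$.

```python
import math
import random

random.seed(0)

# CDF of the source distribution Exp(1)
def F(x):
    return 1.0 - math.exp(-x)

# Inverse CDF of the target distribution Exp(2)
def G_inv(u):
    return -math.log(1.0 - u) / 2.0

# Draw Exp(1) samples (here generated from uniforms for convenience)
xs = [-math.log(1.0 - random.random()) for _ in range(100_000)]

# Push each sample through G^{-1} o F to obtain Exp(2) samples
ys = [G_inv(F(x)) for x in xs]

mean_y = sum(ys) / len(ys)
print(mean_y)  # should be close to 1/2, the mean of Exp(2)
```

Note that here $G^{-1}(F(x)) = x/2$ in closed form, which is exactly what the composition computes numerically.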

The simplest case is $X \sim U(0, 1)$: then $F$ is the identity, so $F(x) = x$ and we get $y = G^{-1}(x)$ directly, with no $F$ to compute at all.
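The uniform special case is the standard inverse transform method. A minimal sketch (my own illustration, with an exponential target chosen for its closed-form inverse CDF): for $Y \sim \mathrm{Exp}(\lambda)$, $G(y) = 1 - e^{-\lambda y}$, so $G^{-1}(u) = -\ln(1-u)/\lambda$.

```python
import math
import random

random.seed(1)

lam = 3.0  # rate parameter of the target Exp(lam) distribution

def sample_exponential():
    # Since the CDF of U(0,1) is the identity, a raw uniform draw u
    # can be fed straight into the target's inverse CDF.
    u = random.random()                # uniform on [0, 1)
    return -math.log(1.0 - u) / lam    # G^{-1}(u)

ys = [sample_exponential() for _ in range(100_000)]
mean_y = sum(ys) / len(ys)
print(mean_y)  # should be close to 1/lam = 1/3
```

This is why every tutorial starts from uniforms: the uniform draw *is* already $F(x)$, so only the target's inverse CDF $G^{-1}$ ever needs to be evaluated.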