I have a few questions that I need some explanations/clarity for:
Two methods for generating a standard normal are:
Take the sum of 3 uniform random numbers and scale to have mean 0 and sd = 1. (use properties of the uniform distribution and mean to determine the required transformation).
Generate a standard uniform and then apply inverse cdf function to obtain a random variate. For each method generate 10000 random numbers and check the distribution using.
For both methods produce:
(a) Normal probability plot
(b) Mean and standard deviation
(c) The proportion of the data lying within the theoretical 2.5 and 97.5 percentiles and the 0.5 and 99.5 percentiles.
My understanding of the first question:
- Take the sum of 3 uniform random numbers and scale to have mean 0 and sd = 1. (use properties of the uniform distribution and mean to determine the required transformation).
Is that I must create a normal distribution by taking the sum of 3 random numbers with a mean of 0 and sd = 1?
To do this I used rnorm(3,mean=0,sd=1) in R. I am unsure of what it means by: (use properties of the uniform distribution and mean to determine the required transformation).
Could someone try to explain what this question means exactly? or link a resource that I can read that will help me understand?
- Generate a standard uniform and then apply inverse cdf function to obtain a random variate. For each method generate 10000 random numbers and check the distribution using.
To generate a standard uniform, I used R's runif(10000) function. I believe I should then use pnorm somehow to obtain the normal random variate? I am a little confused by this question: do I need to find the sd and mean of my newly generated distribution and then input them into pnorm()? What exactly is a normal random variate in this case?
Sorry for all the questions. I am extremely confused and I'm just trying to wrap my head around what a lot of this stuff actually means. I've tried googling most of it, but I still don't understand.
This is mainly an exercise about properties of distributions. Neither method of generating normal data makes practical sense.
(1) A sum of three independent uniform random variables is not enough. Better to sum 12 of them and subtract 6, which gives a better fit to normal and avoids the need for scaling. [See Wikipedia on Irwin-Hall distributions.]
(2) You can use the normal quantile function (inverse CDF) to convert standard uniform random variables to standard normal ones. That will work perfectly, but as a practical matter, why use qnorm(runif(10)) when you can use R to get 10 standard normal random variables with rnorm(10)?
Trying to simulate a standard normal RV by adding three uniform RVs.
A little help with (1): A standard uniform random variable $U_1$ has $E(U_1) = 1/2, Var(U_1) = 1/12.$ So the total $T$ of three independent ones has $E(T) = 3/2, Var(T) = 3/12 = 1/4, SD(T) = 1/2.$ Then $Z = 2(T - 3/2)$ has $E(Z) = 0, SD(Z) = 1.$
You can't make a normal probability plot in (1) without a sample of many observations. The instructions suggest $n = 10{,}000,$ so let's go with that. Here is a sample of $n$ of the transformed $Z_i$ from above.
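The original code for this step did not survive, so what follows is only a minimal sketch of how such a sample and its normal probability plot might be produced. The seed and the intermediate names u and tot are assumptions chosen for illustration; z is the name used later in this answer.

```r
set.seed(2024)                        # arbitrary seed (assumption), only for reproducibility
n   <- 10000
u   <- matrix(runif(3 * n), ncol = 3) # each row holds three independent standard uniforms
tot <- rowSums(u)                     # T = U1 + U2 + U3, with E(T) = 3/2 and SD(T) = 1/2
z   <- 2 * (tot - 3/2)                # Z = 2(T - 3/2), so E(Z) = 0 and SD(Z) = 1

qqnorm(z)                             # normal probability (Q-Q) plot
qqline(z, col = "red")                # reference line through the quartiles
range(z)                              # every value lies strictly inside (-3, 3)
```

Because $T$ can only take values in $(0, 3),$ the transformed $Z$ can never leave $(-3, 3),$ which is what bends the ends of the plot.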
The normal probability plot is distinctly nonlinear. Also, you can tell something is wrong because values of $Z$ are all contained in $(-3,3),$ which is certainly not true of standard normal random variables.
Here is a histogram of the simulated $Z$s along with the standard normal density curve. The fit is not so good.
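# prob=TRUE puts the histogram on the density scale so it is comparable with the curve
hist(z, prob=TRUE, col="skyblue2", main="Histogram of Poorly Simulated Std Normal Obs")
curve(dnorm(x), add=TRUE, col="red", lwd=2)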
A formal Shapiro-Wilk test detects that the $Z$s aren't normal, rejecting the null hypothesis of normality with a tiny P-value. [This procedure will accommodate only samples of size 5000 or smaller, so I used just that many.]
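A sketch of how the test might be run, assuming z is regenerated as a scaled sum of three uniforms (the seed and the name sw are illustrative assumptions, and the exact P-value will depend on the seed):

```r
set.seed(2024)                                   # arbitrary seed (assumption)
n <- 10000
z <- 2 * (rowSums(matrix(runif(3 * n), ncol = 3)) - 3/2)

sw <- shapiro.test(z[1:5000])                    # shapiro.test() accepts 3 to 5000 values
sw$statistic                                     # the W statistic
sw$p.value                                       # a small value is evidence against normality
```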
Mean and SD of the sample of $n = 10,000$ observations are very close to 0 and 1, respectively. With $10,000$ observations you can expect about 1 decimal place of accuracy. Our transformation from $T$ to $Z$ did what it was supposed to do.
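For part (b), the check might look like the following sketch, again regenerating z under an arbitrary seed (an assumption, so the printed values will differ from any particular run):

```r
set.seed(2024)                                   # arbitrary seed (assumption)
n <- 10000
z <- 2 * (rowSums(matrix(runif(3 * n), ncol = 3)) - 3/2)

mean(z)                                          # should be near 0
sd(z)                                            # should be near 1
1 / sqrt(n)                                      # standard error of the mean when SD(Z) = 1
```

The last line gives 0.01, which is a rough guide to how far the sample mean can be expected to stray from 0.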
A little more than 95% of the $Z$s fall in the interval $(-1.96, 1.96).$ Remember that the distribution of the $Z$s is short-tailed compared with standard normal. [In R, z > -1.96 & z < 1.96 is a logical vector of TRUEs and FALSEs; its mean is the proportion of its TRUEs. Also, the symbol & is the elementwise logical AND, that is, the intersection of the two conditions.]
I will leave the rest of (1) and all of (2) for you to finish. [I don't think you're supposed to use
rnorm for this problem, but to explore ways of getting a normal sample without using this obvious procedure.]
Notes: (1) There is no closed form for a normal CDF. However, for computer use there are piecewise rational approximations to the normal CDF and to the normal quantile function, accurate to about double precision. R's qnorm uses Michael Wichura's approximation to the quantile function (Algorithm AS 241); R's documentation cites an algorithm by Cody for pnorm.
(2) In the early days of computation, when almost all operations other than plain arithmetic were expensive, it was common practice to sample from the standard normal distribution by adding 12 standard uniform variates (from a pseudorandom number generator) and subtracting 6. Not sufficiently precise for modern simulation, but it served well into the 1960s.
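The sum-of-12 method in Note (2) can be sketched as follows. The seed and the name z12 are assumptions for illustration. Since each uniform has variance 1/12, the sum of 12 has variance 1 and mean 6, so subtracting 6 is the only adjustment needed.

```r
set.seed(1)                                      # arbitrary seed (assumption)
n   <- 10000
z12 <- rowSums(matrix(runif(12 * n), ncol = 12)) - 6   # sum of 12 uniforms, minus 6

mean(z12)                                        # should be near 0
sd(z12)                                          # should be near 1
range(z12)                                       # always inside (-6, 6)
mean(z12 > -1.96 & z12 < 1.96)                   # compare with the normal value 0.95
```

The values are still bounded, here by $(-6, 6),$ so the extreme tails cannot match those of a true standard normal.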