I am trying to work some things in R and I am having trouble understanding some of the instructions.
I generated $1000$ samples of size $5$ from the standard normal distribution, and I calculated the mean of the sample variance of these. Now I want to know what the sample variance of my sample of sample variances is. But I am not sure I understand really what this means, nor how to implement this in R.
Further, I am asked to overlay the histogram I generated from my sample with a histogram of the theoretical density of the sampling distribution. What does this mean? I.e., what is meant by the theoretical density of the sampling distribution of the sample variance?
I know all my samples come from the standard normal distribution, where $\sigma^{2}=1$, and I know that the sample mean $\bar{X}=(X_{1}+\dots+X_{1000})/1000$ would be $N(0,\frac{\sigma^{2}}{1000})$. Is this at all what is being referred to?
I will appreciate any help and advice. Thank you
For a random sample of size $n$ from a normal population with variance $\sigma^2$, the distribution of the sample variance $S^2$ is given by $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$. I'm guessing that you are asked to provide an illustration of this relationship using R. Consider the following simulation.
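A minimal sketch of such a simulation is given here. It is an illustration under assumptions, not necessarily the exact program used for the figures quoted in this answer: the names `m`, `n`, and `v` follow the discussion, while `dta`, `q`, the seed, and the plotting options are hypothetical choices.

```r
# Sketch only: an illustrative simulation, not the original program.
set.seed(2024)                      # hypothetical seed, for reproducibility
m <- 1000; n <- 5; sigma <- 1       # m samples of size n from N(0, sigma^2)

dta <- matrix(rnorm(m * n, mean = 0, sd = sigma), nrow = m)  # one sample per row
v <- apply(dta, 1, var)             # the m sample variances S^2
q <- (n - 1) * v / sigma^2          # scaled so that q should follow Chisq(n - 1)

mean(v)                             # mean of the sample variances, near sigma^2 = 1
var(v)                              # sample variance of the sample variances

# Histogram on the density scale, with the theoretical Chisq(n - 1) density
hist(q, freq = FALSE, breaks = 30, col = "wheat",
     main = "Simulated (n-1)S^2/sigma^2")
curve(dchisq(x, df = n - 1), add = TRUE, col = "blue", lwd = 2)
lines(density(q, from = 0), col = "darkgreen", lwd = 2)  # kernel density estimate
```

Here `var(v)` is the "sample variance of the sample of sample variances" from the question, and the blue curve is the theoretical density of the scaled sample variance.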
This may not be exactly what you are being asked for, but it may point you in the right direction. I have overlaid a density curve on the histogram. I'm not sure what kind of histogram could be superimposed.
Probably the most important message here is that the relevant chi-squared distribution has df = n-1, not df = n. You can try superimposing the density of $\chi^2(5)$ and you'll see that it doesn't fit the histogram at all well.
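One way to check the df = n-1 versus df = n point numerically, rather than by eye, is sketched below. It assumes $\sigma^2 = 1$; the seed and the names `q`, `p4`, and `p5` are hypothetical, and the Kolmogorov-Smirnov test is just one convenient goodness-of-fit check, not something the exercise requires.

```r
# Sketch only: compare the simulated values with Chisq(4) and with Chisq(5).
set.seed(1)                                   # hypothetical seed
m <- 1000; n <- 5
q <- (n - 1) * replicate(m, var(rnorm(n)))    # sigma^2 = 1, so q = 4 * S^2

mean(q)                                       # Chisq(k) has mean k: expect about 4, not 5

p4 <- ks.test(q, "pchisq", df = n - 1)$p.value   # fit against Chisq(4)
p5 <- ks.test(q, "pchisq", df = n)$p.value       # fit against Chisq(5)
c(df4 = p4, df5 = p5)                         # the df = 5 p-value should be tiny
```

A very small p-value for df = 5 alongside an unremarkable one for df = 4 tells the same story as the histogram.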
Addendum: I don't know if you know about density estimators, but for good measure, I also superimposed a density estimator (smoothed histogram) in green. For this particular simulation run the theoretical curve and the density estimator agree pretty well, but if you run the program several times you will get some cases in which the agreement isn't so good. (If you use m = 10,000, results will be more stable.)
Please let me know if you can make sense of this to finish your project. What is the variance of $\chi^2(4)$? If you don't know, look at the Wikipedia article on 'Chi-squared distribution'.
Addendum per Comment from @Quality: Because $(n-1)S^2/\sigma^2 \sim \chi^2(4)$, we have $V(4S^2) = 2(4) = 8$, so $V(S^2) = 8/16 = 1/2$. Also, `v` in the program represents $S^2$, so it is not surprising that `var(v)` returns $0.488 \approx 0.5$, within simulation error. (Because variances are on a squared-unit scale, the margin of simulation error is numerically larger for variances than for means: several additional runs of the program gave values between 0.47 and 0.59. Use `m = 10^6` for a slower run with better accuracy.)
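To see the run-to-run variability for yourself, something like the following sketch may help. It assumes $\sigma = 1$ and $n = 5$; the helper `var_of_s2`, the seed, and the choice of 20 repetitions are hypothetical. In general $V(S^2) = 2\sigma^4/(n-1)$ for normal data, which is $1/2$ here.

```r
# Sketch only: how much does var(v) move from one simulation run to the next?
set.seed(7)                                   # hypothetical seed
n <- 5

# var_of_s2 is a hypothetical helper: one run with m samples, returning var(v)
var_of_s2 <- function(m) {
  v <- apply(matrix(rnorm(m * n), nrow = m), 1, var)
  var(v)
}

runs_small <- replicate(20, var_of_s2(1000))    # 20 runs with m = 1000
runs_large <- replicate(20, var_of_s2(10000))   # 20 runs with m = 10,000

range(runs_small)                             # spread of var(v) for m = 1000
range(runs_large)                             # should be noticeably tighter
2 / (n - 1)                                   # theoretical V(S^2) when sigma = 1
```

The larger `m` is, the more tightly the simulated values of `var(v)` should cluster around the theoretical value.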