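```r
n <- 10          # sample size of each simulated data set
nsims <- 10000   # number of simulated data sets
true_ci_cov <- vector(length = nsims)  # 1 if the interval built with the true sd covers the true mean
est_ci_cov <- vector(length = nsims)   # 1 if the interval built with the estimated sd covers the true mean
for (i in 1:nsims) {
  data <- rnorm(n, mean = 1, sd = 2)   # true mean 1, true sd 2
  mean_dat <- mean(data)
  sd_dat <- sd(data)
  # nominal 95% interval using the known sd (2)
  ci_true_low <- mean_dat - 1.96 * 2 / sqrt(n)
  ci_true_high <- mean_dat + 1.96 * 2 / sqrt(n)
  # nominal 95% interval using the estimated sd
  ci_est_low <- mean_dat - 1.96 * sd_dat / sqrt(n)
  ci_est_high <- mean_dat + 1.96 * sd_dat / sqrt(n)
  # 1 if the interval contains the true mean 1, else 0
  true_ci_cov[i] <- (ci_true_high >= 1) * (ci_true_low <= 1)
  est_ci_cov[i] <- (ci_est_high >= 1) * (ci_est_low <= 1)
}
mean(true_ci_cov)
mean(est_ci_cov)
```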
I understand that the code stores values in a vector and returns the mean of that vector, but what is the significance of this? What do the values `mean(true_ci_cov)` and `mean(est_ci_cov)` represent?
In each iteration of the loop, the code draws $n$ independent normally distributed random variables $X_1,\dots,X_n$ with mean $1$ and standard deviation $2$. It then calculates $$ \text{mean_dat} = \frac{1}{n}\sum_{i=1}^nX_i\,,\quad\text{sd_dat}=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(X_i-\text{mean_dat})^2}\,, $$ where the factor $\frac{1}{n-1}$ is the one used by R's `sd()`.

Because of random noise, mean_dat fluctuates around its theoretical value $1$. To see by how much, note first that each $X_i$ has variance $4$, so that $$ \mathbb E[X_i^2]=\operatorname{Var}(X_i)+\mathbb E[X_i]^2=4+1=5\,. $$ Therefore $$ \mathbb E\Big[\text{mean_dat}\Big]=1\,, $$ and, using independence of $X_i$ and $X_j$ for $i\neq j$, $$ \mathbb E\Big[\,\text{mean_dat}^2\,\Big]=\frac{1}{n^2}\sum_{i=1}^n\underbrace{\mathbb E[X_i^2]}_{5}+\frac{1}{n^2}\sum_{i\not=j}\underbrace{\mathbb E[X_i]\mathbb E[X_j]}_{1}=\frac{5n}{n^2}+\frac{n^2-n}{n^2}=\frac{4}{n}+1\,. $$ Hence, in theory, the fluctuations of mean_dat have a standard deviation of \begin{align}\tag{1} &\sqrt{\mathbb E\Big[\text{mean_dat}^2\Big]-\mathbb E\Big[\text{mean_dat}\Big]^2}=\sqrt{\frac{4}{n}+1-1}=\frac{2}{\sqrt{n}}\,. \end{align}

Likewise, sd_dat fluctuates around the true standard deviation $2$. As an alternative to the theoretical standard deviation (1), the code also computes the standard error with the estimate sd_dat in place of $2$: $$\tag{2} \frac{\text{sd_dat}}{\sqrt{n}}\,. $$

The code then checks whether the true mean $1$ lies in the interval $\text{mean_dat}\pm1.96\cdot\frac{2}{\sqrt n}$ (resp. $\text{mean_dat}\pm1.96\cdot\frac{\text{sd_dat}}{\sqrt n}$). Equivalently, it checks whether mean_dat lies within $$ \left[1-1.96\frac{2}{\sqrt{n}},1+1.96\frac{2}{\sqrt{n}}\right]\text{ resp. } \left[1-1.96\frac{\text{sd_dat}}{\sqrt{n}},1+1.96\frac{\text{sd_dat}}{\sqrt{n}}\right]\,. $$ Each entry of true_ci_cov and est_ci_cov is therefore $1$ if the corresponding interval covers the true mean in that simulation and $0$ otherwise. Because mean_dat is normally distributed with standard deviation (1), and $1.96$ is the 97.5% quantile of the standard normal distribution, the first interval covers $1$ with probability 95%. The last two lines
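`mean(true_ci_cov)` and `mean(est_ci_cov)` are the fractions of the `nsims` simulated data sets in which the respective interval contained the true mean $1$, i.e. Monte Carlo estimates of the coverage probabilities of the two intervals. The first should be close to 0.95. The second is typically somewhat lower (roughly 0.92 for $n=10$): with sd_dat in place of $2$, the ratio $(\text{mean_dat}-1)/(\text{sd_dat}/\sqrt n)$ follows a Student $t$ distribution with $n-1$ degrees of freedom rather than a standard normal distribution, and its 97.5% quantile (about 2.262 for 9 degrees of freedom) is larger than 1.96.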
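A minimal R sketch to check these statements numerically, assuming the same setup as the question ($n=10$, true mean $1$, true sd $2$). It wraps the simulation in a hypothetical helper `coverage_sim()` (not part of the original code), records the standard deviation of the simulated means to compare with (1), and adds a third interval that uses the Student $t$ quantile `qt(0.975, n - 1)` instead of 1.96. The seed and the use of `qnorm(0.975)` in place of the literal 1.96 are my own choices.

```r
# Sketch only: coverage_sim() is a hypothetical helper mirroring the question's loop.
set.seed(1)  # arbitrary seed, for reproducibility

coverage_sim <- function(n = 10, nsims = 10000, mu = 1, sigma = 2) {
  means <- numeric(nsims)
  covered_z_true <- logical(nsims)  # z interval with the known sd (true_ci_cov)
  covered_z_est  <- logical(nsims)  # z interval with the estimated sd (est_ci_cov)
  covered_t_est  <- logical(nsims)  # t interval with the estimated sd
  z_q <- qnorm(0.975)               # about 1.96
  t_q <- qt(0.975, df = n - 1)      # about 2.262 for n = 10
  for (i in seq_len(nsims)) {
    x <- rnorm(n, mean = mu, sd = sigma)
    m <- mean(x)
    s <- sd(x)
    means[i] <- m
    # |m - mu| <= c is the same check as (m + c >= mu) & (m - c <= mu)
    covered_z_true[i] <- abs(m - mu) <= z_q * sigma / sqrt(n)
    covered_z_est[i]  <- abs(m - mu) <= z_q * s / sqrt(n)
    covered_t_est[i]  <- abs(m - mu) <= t_q * s / sqrt(n)
  }
  list(
    sd_of_means = sd(means),           # compare with formula (1)
    cov_z_true  = mean(covered_z_true),
    cov_z_est   = mean(covered_z_est),
    cov_t_est   = mean(covered_t_est)
  )
}

res <- coverage_sim()
print(unlist(res))

# Theoretical values for comparison
n <- 10
theory <- c(
  sd_of_means = 2 / sqrt(n),                            # formula (1)
  cov_z_true  = 0.95,
  cov_z_est   = 2 * pt(qnorm(0.975), df = n - 1) - 1,   # P(|T_{n-1}| <= 1.96)
  cov_t_est   = 0.95
)
print(theory)
```

With these settings, `cov_z_true` and `cov_t_est` should land near 0.95 and `cov_z_est` near 0.92, although the exact simulated numbers depend on the random draws.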