Confidence Intervals (A level)

Firstly let me apologise for asking this question on here - I see this as getting a sledgehammer to crack a nut, but I have nobody else that I can ask for advice on this topic.

I'm an A level Mathematics student, so please forgive any missing or poor notation compared with what you would normally expect.

I am looking at the specification for my exam that is coming up and these are three sections:

A) Construct symmetric confidence intervals for the mean of a normal distribution with known variance

B) Construct symmetric confidence intervals from large samples, for the mean of a normal distribution with unknown variance.

C) Construct symmetric confidence intervals from small samples, for the mean of a normal distribution with unknown variance using the t-distribution.

I believe I understand what is asked of me in A: when the variance is known, I look up the 'Z' value when calculating the interval. However, I find B and C somewhat similar. I had always thought that if the variance is unknown you would always use the 'T' value. If (in B) there is a large sample, would I just find the variance of that sample and use the 'Z' value?

If so, what is the generally accepted number for when a small sample becomes a large sample?

Thanks for any help that you can offer, I'm really grateful!

There are 2 best solutions below

Best answer

The difference between B and C is in the choice of critical value. A typical symmetric confidence interval for a location parameter has the form $$\text{point estimate} \pm \text{critical value} \times \text{standard error},$$ where in turn $$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{\text{sample size}}},$$ and $$\text{critical value} \times \text{standard error} = \text{margin of error}.$$

The choice of critical value is informed by the nature of the sampling distribution. When a normally distributed population has unknown variance, the studentized sample mean $(\bar x - \mu)/(s/\sqrt{n})$ follows a Student's $t$ distribution with $\nu = n - 1$ degrees of freedom. However, when the sample size is large, the difference between the $t$ and normal critical values is negligible; i.e., $$\lim_{\nu \to \infty} t^*_{\nu, \alpha} = z^*_{\alpha},$$ where $t^*_{\nu, \alpha}$ is the upper $\alpha$ quantile of the Student's $t$ distribution with $\nu$ degrees of freedom, and $z^*_\alpha$ is the upper $\alpha$ quantile of the standard normal distribution.

So for scenario B, you will use $z^*_{\alpha/2}$ as an approximate critical value for a two-sided CI with a large sample size; for scenario C, you will use $t^*_{\nu, \alpha/2}$ with $\nu = n - 1$.
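To see how quickly the $t$ critical value approaches the normal one, here is a rough sketch in Python using only the standard library. The helper names (`t_pdf`, `t_cdf`, `t_upper_quantile`) are my own for illustration, and the quantile is found by simple numerical integration and bisection rather than by any library routine, so treat it as a demonstration and not a reference implementation.

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_upper_quantile(alpha, nu):
    """Find t with P(T > t) = alpha by bisection (assumes 0 < alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < 1 - alpha:  # grow the bracket until it contains t
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05  # 95% two-sided interval, so alpha/2 goes in each tail
z_star = NormalDist().inv_cdf(1 - alpha / 2)
for n in (5, 10, 30, 100, 1000):
    t_star = t_upper_quantile(alpha / 2, n - 1)  # nu = n - 1 degrees of freedom
    print(f"n = {n:4d}: t* = {t_star:.4f}, z* = {z_star:.4f}")
```

Running this prints the two critical values side by side for each sample size, which makes the gap for small $n$ and the near-agreement for large $n$ easy to see.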

The difference between A and B lies in the standard deviation. In scenario A, it is presumed to be known, thus the standard error will use the population standard deviation $\sigma$ in the numerator. In scenario B, it is unknown, thus you will use the sample standard deviation, i.e. the square root of the unbiased estimator of the variance, $$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar x)^2}.$$ In practice, the factor of $n-1$ rather than $n$ makes little difference when $n$ is large, as in this case. But in scenario C, where $\sigma$ is also unknown and must be estimated from the sample, you must use $n-1$; otherwise your CI will not have the required coverage probability even if you correctly use the $t$-distribution quantile for the critical value.
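As a small illustration of how the pieces fit together, here is a sketch in Python (standard library only). The function `mean_ci` and the data are hypothetical, invented for this example; the value 2.262 is the usual tabulated 95% two-sided $t$ critical value for 9 degrees of freedom.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(sample, critical_value, sigma=None):
    """Symmetric CI for the mean: x_bar +/- critical_value * sd / sqrt(n).

    If sigma is given, it is treated as the known population standard
    deviation (scenario A). Otherwise the sample standard deviation with
    the n - 1 divisor is used (scenarios B and C).
    """
    n = len(sample)
    sd = sigma if sigma is not None else stdev(sample)  # stdev divides by n - 1
    margin = critical_value * sd / math.sqrt(n)
    x_bar = mean(sample)
    return x_bar - margin, x_bar + margin

# Hypothetical measurements, invented purely for illustration.
data = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.0]

z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
t_star = 2.262                        # 95% value for 9 d.f., from standard t tables

ci_known = mean_ci(data, z_star, sigma=0.1)  # scenario A: sigma = 0.1 assumed known
ci_t = mean_ci(data, t_star)                 # scenario C: small sample, sigma unknown
print(ci_known)
print(ci_t)
```

For scenario B you would call `mean_ci(sample, z_star)` on a large sample, leaving `sigma` unset so that $s$ is computed from the data.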

Answer

As the number of degrees of freedom grows, the $t$-distribution approaches the standard normal distribution. So yes, for a sufficiently large sample, you would use the Z-value corresponding to your desired confidence level (e.g., 1.96 for a 95% confidence interval) and multiply it by the standard error of the mean (the sample standard deviation divided by the square root of the number of observations). In C, you would use $t$ values because of the small sample size. As a very conservative rule, for sample sizes greater than 1000 you can always use Z-values; many introductory textbooks already treat samples of about 30 or more as large, so check which threshold your course uses.
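As a quick worked example with made-up numbers (a sample of $n = 100$ with mean $\bar x = 50$ and sample standard deviation $s = 10$, all hypothetical), the 95% interval in scenario B would be $$\bar x \pm z^* \frac{s}{\sqrt{n}} = 50 \pm 1.96 \times \frac{10}{\sqrt{100}} = 50 \pm 1.96,$$ i.e. roughly $(48.04, 51.96)$.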