Firstly let me apologise for asking this question on here - I see this as getting a sledgehammer to crack a nut, but I have nobody else that I can ask for advice on this topic.
I'm an A level Mathematics student so please forgive me for lack of/poor notation that you may normally come to expect/be familiar with.
I am looking at the specification for my exam that is coming up and these are three sections:
A) Construct symmetric confidence intervals for the mean of a normal distribution with known variance
B) Construct symmetric confidence intervals from large samples, for the mean of a normal distribution with unknown variance.
C) Construct symmetric confidence intervals from small samples, for the mean of a normal distribution with unknown variance using the t-distribution.
I believe I understand what is asked of me in A, as when the variance is known I look for the 'Z' value when calculating the interval. However, I find B and C somewhat similar. I had always thought that if the variance is unknown you would always use the 'T' value, however, if (in B) there is a large sample, would I just find the variance of that sample and use the 'Z' value?
If so, what is the generally accepted number for when a small sample becomes a large sample?
Thanks for any help that you can offer, I'm really grateful!
The difference between B and C is in the choice of critical value. A typical symmetric confidence interval for a location parameter has the form $$\text{point estimate} \pm \text{critical value} \times \text{standard error},$$ where in turn $$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{\text{sample size}}},$$ and $$\text{critical value} \times \text{standard error} = \text{margin of error}.$$ The choice of critical value is informed by the nature of the sampling distribution. When a normally distributed population has unknown variance, the standardised sample mean $(\bar x - \mu)/(s/\sqrt{n})$, with $s$ estimated from the sample, follows a Student's $t$ distribution with $\nu = n - 1$ degrees of freedom. However, when the sample size is large, the difference between the $t$ and normal critical values is negligible; i.e., $$\lim_{\nu \to \infty} t^*_{\nu, \alpha} = z^*_{\alpha},$$ where $t^*_{\nu, \alpha}$ is the upper $\alpha$ quantile of the Student's $t$ distribution with $\nu$ degrees of freedom, and $z^*_\alpha$ is the upper $\alpha$ quantile of the standard normal distribution. So for scenario B, you will use $z^*_{\alpha/2}$ as an approximate critical value for a two-sided CI with large sample size; for scenario C, you will use $t^*_{\nu, \alpha/2}$ with $\nu = n - 1$.
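As a rough numerical illustration of that limit, here are standard table values of the two-sided 95% critical values (so $\alpha/2 = 0.025$), rounded to three decimal places: $$t^*_{5,\,0.025} \approx 2.571, \quad t^*_{10,\,0.025} \approx 2.228, \quad t^*_{30,\,0.025} \approx 2.042, \quad t^*_{100,\,0.025} \approx 1.984, \quad z^*_{0.025} \approx 1.960.$$ Many textbooks treat $n \ge 30$ or so as "large" on this basis, but that is a rule of thumb rather than a sharp threshold, and I am only assuming your exam board uses something similar, so check what your specification or textbook says.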
The difference between A and B lies in the standard deviation. In scenario A, it is presumed to be known, thus the standard error will use the population standard deviation $\sigma$ in the numerator. In scenario B, it is unknown, thus you will use the sample standard deviation $$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar x)^2},$$ which is the square root of the unbiased estimator of the variance. In practice, the factor of $n-1$ rather than $n$ makes little difference when $n$ is large, as in this case. But in scenario C, where $\sigma$ is also unknown and must be estimated from a small sample, you must use $n-1$, otherwise your CI will not have the required coverage probability even if you correctly use the $t$-distribution quantile for the critical value.
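If it helps to see the three scenarios side by side, here is a minimal Python sketch using only the standard library. The function names `z_interval` and `t_interval` and the data are made up for illustration, and the $t$ critical value is assumed to be read from tables ($t^*_{9,\,0.025} \approx 2.262$), since the standard library has no $t$ quantile function.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_interval(xbar, sd, n, confidence=0.95):
    """Scenarios A and B: symmetric CI using a normal critical value.

    Pass the known sigma for scenario A, or the sample s for scenario B.
    """
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile
    margin = z_crit * sd / sqrt(n)                # critical value x standard error
    return xbar - margin, xbar + margin


def t_interval(xbar, s, n, t_crit):
    """Scenario C: symmetric CI using a t critical value looked up in tables
    for n - 1 degrees of freedom."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical small sample, not data from the question.
sample = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0, 5.1, 4.9]
n = len(sample)
xbar = mean(sample)
s = stdev(sample)  # statistics.stdev divides by n - 1

# Scenario A: sigma assumed known (0.2 is an invented value).
ci_known = z_interval(xbar, 0.2, n)

# Scenario C: sigma unknown, small sample, t critical value for 9 d.f.
ci_t = t_interval(xbar, s, n, t_crit=2.262)

# Using z with the same s (as in scenario B) gives a narrower interval;
# that is only a reasonable approximation when n is large.
ci_z_with_s = z_interval(xbar, s, n)

print(ci_known, ci_t, ci_z_with_s)
```

With only ten observations the $t$ interval is noticeably wider than the $z$ interval built from the same $s$, which is exactly the gap that shrinks away as $n$ grows.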