Assumption of normality while creating CI from chi-squared and t-statistic pivots?

While explaining the use of a chi-squared pivot or a t-statistic in creating a confidence interval, we were told that one of the underlying assumptions is the normality of the data. The chi-squared distribution, as I understand it, is the distribution of any sum of squares of Normal RVs.

So while using it for variance, does not the Central Limit Theorem dictate that the sample mean $\bar X$, irrespective of the distribution of the original RV, will be distributed normally, and since the population mean ($ \mu $) is a constant, $ \bar X-\mu $ will be normally distributed and hence their squares should follow a chi-squared distribution? Where is the assumption of normality? The same argument applies to a t-statistic.

Best answer

Let $X_1,\dots,X_n$ be i.i.d. $\mathcal{N}(\mu, \sigma^2)$. Then, by definition, $$ \sum_{i=1}^n\left(\frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n \, . $$

Now, in order to estimate the variance of the normal distribution, you use (a variation of) $$ S^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2, $$ thus $$ S^2 = \frac{\sigma^2}{n}\sum_{i=1}^n\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \frac{\sigma^2}{n} \chi^2_{n-1}, $$ where one degree of freedom is lost because $\mu$ is replaced by $\bar{X}$.

For the construction of the CI you use the fact that the estimator is distributed $\chi^2$ up to a constant factor, i.e., $$ P\left(\chi^2_{n-1, \alpha/2} \le \frac{1}{\sigma^2} \sum_{i=1}^n(X_i - \bar{X})^2 \le \chi^2_{n-1, 1-\alpha/2} \right) = 1- \alpha $$ or, since $\sum_{i=1}^n(X_i - \bar{X})^2 = nS^2$, the familiar form $$ P\left(\chi^2_{n-1, \alpha/2} \le \frac{n}{\sigma^2} S^2\le \chi^2_{n-1, 1-\alpha/2} \right) = 1- \alpha. $$ Since all the terms are positive, $g_1(x)=1/x$ and $g_2(x) = x^{1/2}$ are well-defined one-to-one functions on $\mathbb{R}^+$, so you get the final expression $$ P\left(\sigma \in \left[\sqrt{nS^2/\chi^2_{n-1, 1-\alpha/2}}, \sqrt{nS^2/\chi^2_{n-1, \alpha/2}} \right] \right) = 1-\alpha. $$ Note that all these manipulations are justified by the initial assumption of normality of the data $X_1,\dots, X_n$.

The same idea applies to the $t$ distribution. One of the possible ways to define it is $$ T = \frac{Z}{\sqrt{Q/n}} \sim t_n,\quad Z \sim \mathcal{N}(0,1),\,\, Q\sim \chi^2_n,\,\, Z \text{ and } Q \text{ independent}. $$ Here $\sqrt{n}(\bar{X}_n - \mu)/\sigma \sim \mathcal{N}(0,1)$ holds exactly only when $X_1,\dots,X_n$ are themselves $\mathcal{N}(\mu, \sigma^2)$. In particular, the familiar result $$ \frac{\bar{X}-\mu}{S_{n-1}/\sqrt{n}}\sim t_{n-1}, \quad S_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar{X})^2 = \frac{n}{n-1}S^2, $$ can be obtained by simple algebra using the aforementioned definitions, together with the independence of $\bar{X}$ and $S^2$ for normal samples.

Note that the CLT is not required here, as the results hold exactly for any sample size $n \ge 2$. When your original data are not normal, you can use the CLT to deduce the asymptotic normality of $\bar{X}_n$. But in this case the distribution of $\bar{X}_n$ is only approximately normal and $n$ should be large enough; as such, the use of $t$ may be redundant, as the standard normal distribution will yield practically the same results. Note also that asymptotic normality of $\bar{X}_n$ says nothing about the individual terms $(X_i - \bar{X})/\sigma$, and it is their squares that must sum to a $\chi^2$ variable, so the CLT does not rescue the $\chi^2$ pivot for the variance.
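To see how much the normality assumption matters for the $\chi^2$ interval, here is a minimal simulation sketch using only the Python standard library. It builds the interval for $\sigma$ given above and estimates how often it contains the true $\sigma$, once for normal data and once for exponential data. The function names (`chi2_quantile`, `sigma_interval`, `coverage`), the sample size, the number of repetitions and the seed are all illustrative choices, not part of the original argument. The $\chi^2$ quantiles are computed with the Wilson-Hilferty approximation, which is an assumption made here to avoid external dependencies; an exact quantile function from a statistics library could be swapped in.

```python
import math
import random
from statistics import NormalDist


def chi2_quantile(p, df):
    """Approximate chi-squared quantile (Wilson-Hilferty); not exact."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def sigma_interval(xs, alpha=0.05):
    """Chi-squared interval for sigma, with S^2 = (1/n) * sum (x_i - xbar)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # this equals n * S^2
    lo = math.sqrt(ss / chi2_quantile(1 - alpha / 2, n - 1))
    hi = math.sqrt(ss / chi2_quantile(alpha / 2, n - 1))
    return lo, hi


def coverage(draw, sigma, n=50, reps=2000, alpha=0.05, seed=0):
    """Fraction of simulated samples whose interval contains the true sigma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = sigma_interval([draw(rng) for _ in range(n)], alpha)
        if lo <= sigma <= hi:
            hits += 1
    return hits / reps


# Both distributions below have true standard deviation 1.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), sigma=1.0)
expo_cov = coverage(lambda rng: rng.expovariate(1.0), sigma=1.0)
print(f"normal data:      {normal_cov:.3f}")
print(f"exponential data: {expo_cov:.3f}")
```

Under these assumptions the coverage for normal data should come out close to the nominal $1-\alpha = 0.95$, while for exponential data it should fall clearly below it, and increasing `n` does not repair this, because the spread of $S^2$ depends on the fourth moment of the data and not only on $\sigma^2$.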