Why do you use Z statistics instead of T for inference on categorical variables?


If I'm comparing the difference between two proportions, why do I use a Z statistic instead of a T statistic? I thought T statistics were used when you don't know the true standard deviation of a population — so why doesn't that apply here, when all we have are the standard deviations of the samples taken?

Basically it comes down to hypothesis testing. If you have a sample of $n$ Bernoulli($p$) variables $X_i$ with unknown $p$, you take a null hypothesis $p = p_0$ and compare it against the sample proportion $\hat{p}=\overline{X}$, computing tail probabilities (which tail depends on the alternative hypothesis). Because the null hypothesis pins down $p$ exactly, it also pins down the variance $p_0(1-p_0)$ — for a Bernoulli variable, the mean determines the variance. So you can compute these tail probabilities exactly using the Binomial($n$, $p_0$) distribution; there is no separate variance parameter to estimate.
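A minimal sketch of this point, using only the standard library (the numbers are illustrative): under $H_0: p = p_0$, the exact tail probability comes straight from the binomial distribution, and the z statistic uses the variance $p_0(1-p_0)/n$ that $H_0$ itself determines.

```python
from math import comb, sqrt

def binom_upper_tail(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustrative data: 60 successes in n = 100 trials, testing H0: p = 0.5
n, k, p0 = 100, 60, 0.5

exact = binom_upper_tail(n, k, p0)

# The z statistic needs no estimated variance: H0 fixes it at p0*(1-p0)/n
z = (k / n - p0) / sqrt(p0 * (1 - p0) / n)

print(z)      # 2.0 for these numbers
print(exact)
```

The exact tail probability and the normal approximation disagree slightly for small $n$, but both are computable from $p_0$ alone.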

Strictly speaking, the $t$ statistic is only exact for normally distributed data, and it arises because in the normal distribution the mean and variance are independent parameters. That means assuming $\mu=\overline{X}$ alone does not let you compute tail probabilities: you need to either estimate the variance or additionally include the variance in the null hypothesis (and we rarely want to do the latter). But when we estimate the variance, the estimation error changes the distribution of the test statistic so that it is no longer normal. Using the $t$ distribution instead corrects for this error.