How to decide on the standard error for difference of standard deviations


[First illustration: the textbook's Example 2, a $z$-test for the difference of two sample standard deviations]

In this problem the null hypothesis is $\sigma_1 = \sigma_2$, so why is a common sigma value not used in the standard error? It can be found as $\sigma = ({n_{1}s_1^{2}+ n_{2}s_{2}^2})/({n_{1}+n_{2}})$.

Also, while testing for a difference of sample means, i.e. $H_0: \mu_1=\mu_2$, does that always mean that $\sigma_1=\sigma_2$? Or does that happen only when the samples are drawn from the same population? Basically, what is the difference between the null hypothesis $H_0: \mu_1=\mu_2$ and the case when the samples are drawn from the same population? Is the testing done for a significant difference of the population means or the sample means?

Also, in the test for a difference of proportions, under the null hypothesis $H_0: P_1=P_2=P$ the standard error has a single term $P$; why doesn't the same happen here for sigma? Is this $P$ the same as the one calculated from $P=(n_1p_1+n_2p_2)/(n_1+n_2)$, where $p_i$ is the sample proportion?

[Second illustration from the textbook]

EDIT: I referred to another text, which states the null hypothesis as $H_0: \sigma_1=\sigma_2$ and then glosses it as "i.e. the sample standard deviations do not differ significantly". But aren't $\sigma_1$ and $\sigma_2$ population standard deviations?




The provided solution is complete garbage. Disregard it entirely. The test statistic it uses is not normally distributed under the null hypothesis. I don't know where this text comes from, but such an elementary mistake discredits the whole text.

To test the equality of variances (and in turn, standard deviations), the usual test is the $F$-ratio test for normally distributed populations, and Levene's test when departure from normality is considered. But under no circumstances does that solution make any sense whatsoever. It looks like the author of the solution simply treated the sample standard deviations like sample means, and assumed something like Welch's $t$-test would be an appropriate test statistic. But even then, there are mistakes: first, $z$ is not appropriate because the true population standard deviation is unknown. Second, there's an extra factor of $2$ in the denominators. Third, there is a sign error in the last expression where the author uses $-$ instead of $+$. But this is all irrelevant because as I have already stated, the solution is garbage.
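The two standard approaches can be sketched in a few lines of Python. The sample data and helper names below are mine, purely for illustration; in practice one would use a ready-made routine such as `scipy.stats.levene`.

```python
import statistics

def f_ratio(x, y):
    # F statistic for H0: equal variances, using unbiased sample variances.
    # Under normality it follows an F distribution with
    # (len(x) - 1, len(y) - 1) degrees of freedom.
    return statistics.variance(x) / statistics.variance(y)

def levene_w(x, y):
    # Levene's W for two groups (centred at the mean): a one-way ANOVA
    # carried out on the absolute deviations z_ij = |x_ij - mean_i|.
    groups = [x, y]
    z = [[abs(v - statistics.fmean(g)) for v in g] for g in groups]
    zbar_i = [statistics.fmean(zi) for zi in z]
    n = sum(len(g) for g in groups)
    zbar = sum(len(zi) * zb for zi, zb in zip(z, zbar_i)) / n
    between = sum(len(zi) * (zb - zbar) ** 2 for zi, zb in zip(z, zbar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, zbar_i) for v in zi)
    return (n - 2) * between / within  # k - 1 = 1 for two groups

# Made-up illustrative samples:
x = [4.1, 5.2, 6.3, 5.0, 4.4, 5.8]
y = [3.9, 5.1, 4.8, 5.5, 4.6, 5.2, 5.0]
F = f_ratio(x, y)   # refer to the F distribution with (5, 6) df
W = levene_w(x, y)  # refer to the F distribution with (1, 11) df
```

The $F$-ratio is then compared with two-sided critical values of $F_{n_1-1,\,n_2-1}$, and Levene's $W$ with $F_{k-1,\,N-k}$; neither calculation resembles the $z$ statistic in the posted solution.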


The answer to your first question appears to be that a "common sigma value" was in fact used in the original definition of the $\ z\ $ statistic, given as $$ z=\frac{s_1-s_2}{\sigma\sqrt{\frac{1}{2n_1}+\frac{1}{2n_2}}}\ , $$ on page $20$ of Statistics for Engineers, in which the problem described under Example $2$ of your first illustration appears as Problem $6$ on page $24$. However, when you replace $\ \sigma\ $ with its estimate $\ \sqrt{\frac{n_1s_1^2+n_2s_2^2}{n_1+n_2}}\ $ you get the following estimate $\ z_e\ $ for the value of $\ z\ $: \begin{align} z_e&=\frac{s_1-s_2}{\sqrt{\left(\frac{n_1s_1^2+n_2s_2^2}{n_1+n_2}\right)\left(\frac{1}{2n_1}+\frac{1}{2n_2}\right)}}\\ &=\frac{s_1-s_2}{\sqrt{\frac{s_1^2}{2n_2}+\frac{s_2^2}{2n_1}}} \end{align} Please note the following:

  • In your question, you've omitted the square root sign from $\ \sqrt{\frac{n_1s_1^2+n_2s_2^2}{n_1+n_2}}\ $, which is a random quantity whose value is only an estimate of $\ \sigma\ $, and not equal to it (except by a very lucky coincidence).
  • In the above expression for $\ z_e\ $ the indices on $\ s\ $ and $\ n\ $ in the fractions in the denominator do not match. In the examples given in the above-cited book, this expression for $\ z_e\ $ gets replaced without explanation by $\ \frac{s_1-s_2}{\sqrt{\frac{s_1^2}{2n_1}+\frac{s_2^2}{2n_2}}}\ $, which corresponds to taking $\ \sqrt{\big(n_2s_1^2+n_1s_2^2\big)\big/\big(n_1+n_2\big)}\ $ as an estimate for $\ \sigma\ $ instead of the much more natural $\ \sqrt{\big(n_1s_1^2+n_2s_2^2\big)\big/\big(n_1+n_2\big)}\ $.
  • As heropup alludes to in the other answer (which I fully agree with), none of $\ s_1, s_2, s_1-s_2, z\ $, or either version of $\ z_e\ $, is exactly normally distributed, even for samples drawn from a normally distributed population, contrary to the presumptions of the above-cited book. Thus, any calculation of significance levels based on the presumption that $\ z_e\ $ is normally distributed is unjustified without a further large-sample argument.
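The substitution above is easy to verify numerically. Here is a quick sketch using the figures of Example $2$; the three variable names are mine, labelling the pooled-estimate form, its algebraic simplification, and the book's version with matching indices.

```python
import math

# Figures from Example 2 of the first illustration:
n1, n2 = 150, 250
s1, s2 = 14.0, 11.3

# z_e with sigma replaced by the pooled estimate sqrt((n1*s1^2 + n2*s2^2)/(n1+n2)):
sigma_hat = math.sqrt((n1 * s1**2 + n2 * s2**2) / (n1 + n2))
ze_pooled = (s1 - s2) / (sigma_hat * math.sqrt(1 / (2 * n1) + 1 / (2 * n2)))

# The simplified form -- note the swapped indices: n2 under s1^2, n1 under s2^2:
ze_simplified = (s1 - s2) / math.sqrt(s1**2 / (2 * n2) + s2**2 / (2 * n1))

# The book's (different) version, with matching indices:
ze_book = (s1 - s2) / math.sqrt(s1**2 / (2 * n1) + s2**2 / (2 * n2))
```

The first two agree exactly, as the algebra shows, while the book's version differs whenever $n_1 \ne n_2$.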

Update

If the samples are independently drawn from two homogeneous, normally distributed populations with the same variance (but possibly different means), then the true distribution of the absolute value of the statistic $\ z\ $ defined in Example $2$ of the OP's first illustration can be expressed in terms of the $F$-distribution with $\ n_1-1\ $ and $\ n_2-1\ $ degrees of freedom. This makes it possible to compute the exact $p$-value and check directly how accurate the standard-normal assumption is.

Take the case $\ n_1=150,\ n_2=250,\ s_1=14.0\ $, and $\ s_2=11.3\ $. Then $$ z=\frac{14.0-11.3}{\sqrt{\frac{(14.0)^2}{300}+\frac{(11.3)^2}{500}}}\approx2.83\ . $$ If $\ z\ $ had a standard normal distribution, this would give a $p$-value of $$ P\big(|z|>2.83\big)=2\,\Phi(-2.83)\approx0.00465\ , $$ implying that the null hypothesis could be rejected at the $0.5\%$ level of significance.
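A quick numerical check of the arithmetic above (for $u>0$, `math.erfc(u / sqrt(2))` equals $2\Phi(-u)$):

```python
import math

n1, n2 = 150, 250
s1, s2 = 14.0, 11.3

# The Example 2 statistic:
z = (s1 - s2) / math.sqrt(s1**2 / (2 * n1) + s2**2 / (2 * n2))  # about 2.83

# Two-sided p-value under the (unjustified) standard-normal assumption:
p_naive = math.erfc(abs(z) / math.sqrt(2))  # = 2*Phi(-|z|), about 0.0046
```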

To obtain the true distribution of $\ |z|\ $, rewrite $\ z\ $ as $$ z=\frac{\frac{s_1}{s_2}-1}{\sqrt{\frac{1}{2n_1}\left(\frac{s_1}{s_2}\right)^2+\frac{1}{2n_2}}}\ . $$ A little algebraic manipulation then shows that \begin{align} |z|\le x&\iff \left(1-\frac{x^2}{2n_1}\right)\left(\frac{s_1}{s_2}\right)^2-2\left(\frac{s_1}{s_2}\right)+1-\frac{x^2}{2n_2}\le0\\ &\iff\ell_\ell\le \frac{s_1}{s_2}\le\ell_u\ \ \ (\text{ provided }\ \ x^2<2n_1), \end{align} where \begin{align} \ell_\ell&=\frac{1-\sqrt{1-\left(1-\frac{x^2}{2n_1}\right)\left(1-\frac{x^2}{2n_2}\right)}}{1-\frac{x^2}{2n_1}}\ \ \text{ and}\\ \ell_u&=\frac{1+\sqrt{1-\left(1-\frac{x^2}{2n_1}\right)\left(1-\frac{x^2}{2n_2}\right)}}{1-\frac{x^2}{2n_1}}\ . \end{align}

Under the hypotheses stated in the opening paragraph of this update, the statistic $\ \frac{n_1\big(n_2-1\big)}{n_2\big(n_1-1\big)}\left(\frac{s_1}{s_2}\right)^2\ $ has an $F$-distribution with $\ n_1-1\ $ and $\ n_2-1\ $ degrees of freedom, and so \begin{align} P\big(|z|\le x\big)=&\,P\left(\frac{n_1\big(n_2-1\big)\ell_\ell^2}{n_2\big(n_1-1\big)}\le\frac{n_1\big(n_2-1\big)}{n_2\big(n_1-1\big)}\left(\frac{s_1}{s_2}\right)^2\le\frac{n_1\big(n_2-1\big)\ell_u^2}{n_2\big(n_1-1\big)}\right)\\ =&\,F_{n_1-1,n_2-1}\left(\frac{n_1\big(n_2-1\big)\ell_u^2}{n_2\big(n_1-1\big)}\right)-F_{n_1-1,n_2-1}\left(\frac{n_1\big(n_2-1\big)\ell_\ell^2}{n_2\big(n_1-1\big)}\right)\ . \end{align}

For the example immediately above, $\ \frac{n_1\big(n_2-1\big)\ell_\ell^2}{n_2\big(n_1-1\big)}\ $ and $\ \frac{n_1\big(n_2-1\big)\ell_u^2}{n_2\big(n_1-1\big)}\ $ evaluate approximately to $\ 0.668\ $ and $\ 1.539\ $ respectively, giving \begin{align} P\big(|z|\le2.83\big)&\approx F_{149,249}\big(1.539\big)-F_{149,249}\big(0.668\big)\\ &\approx0.995\ , \end{align} or $\ P\big(|z|>2.83\big)\approx0.005\ $.

Thus, with samples as large as these, the exact $p$-value turns out to be close to the $0.00465$ obtained by assuming $\ z\ $ has a standard normal distribution, so the normal approximation happens to be numerically adequate in this particular example. But only the $F$-based calculation is justified by the distribution theory, and nothing in the above-cited book establishes when the approximation is safe.
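The interval endpoints in the calculation above can be reproduced in a few lines; only the evaluation of the $F$ CDF itself (available as, e.g., `scipy.stats.f.cdf`) is omitted from this sketch.

```python
import math

n1, n2 = 150, 250
x = 2.83  # the observed |z|

a = 1 - x**2 / (2 * n1)   # coefficient of (s1/s2)^2 in the quadratic
b = 1 - x**2 / (2 * n2)   # constant-term factor
root = math.sqrt(1 - a * b)
l_lo = (1 - root) / a     # lower root, l_ell
l_hi = (1 + root) / a     # upper root, l_u

# Factor that turns (s1/s2)^2 into an F_{n1-1, n2-1} variate:
c = n1 * (n2 - 1) / (n2 * (n1 - 1))
lo, hi = c * l_lo**2, c * l_hi**2   # about 0.668 and 1.539
# True p-value = 1 - (F_cdf(hi, n1 - 1, n2 - 1) - F_cdf(lo, n1 - 1, n2 - 1))
```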