In my statistics course notes the Welch-Satterthwaite equation, as used in the derivation of the Welch test, is formulated as follows:
Suppose $S_1^2, \ldots, S_n^2$ are sample variances of $n$ samples, where the $k$-th sample is a sample of size $m_k$ from a population with distribution $N(\mu_k, \sigma_k^2)$. For any real numbers $a_1, \ldots, a_n$, the statistic $$ L = \frac{\nu}{\sum_{k=1}^{n} a_k\sigma_k^2}\sum_{k=1}^n a_kS_k^2 $$ with $$ \nu = \frac{\left(\sum_{k=1}^n a_kS_k^2\right)^2}{\sum_{k=1}^n \frac{(a_kS_k^2)^2}{m_k - 1}} $$ approximately (sic) follows a chi-squared distribution with $\nu$ degrees of freedom.
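To get a feel for what this could mean, I put together a small simulation sketch (the sample sizes, variances, and weights are arbitrary choices of mine, and the reading of 'approximately follows' as a statement involving the realized $\nu$ is my own guess): for each simulated data set I compute $L$ and its realized $\nu$, then evaluate the $\chi^2_\nu$ CDF at $L$; if the approximation is good, those CDF values should be roughly uniform on $(0, 1)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

m_k = np.array([8, 15, 30])          # sample sizes m_k (arbitrary)
sigma2 = np.array([1.0, 4.0, 0.5])   # population variances sigma_k^2 (arbitrary)
a = np.array([1.0, 1.0, 1.0])        # weights a_k (arbitrary)

n_sim = 20_000
u = np.empty(n_sim)
for i in range(n_sim):
    # unbiased sample variances S_k^2; the means mu_k do not affect S_k^2
    S2 = np.array([rng.normal(0.0, np.sqrt(s2), size=m).var(ddof=1)
                   for m, s2 in zip(m_k, sigma2)])
    nu = (a * S2).sum() ** 2 / (((a * S2) ** 2) / (m_k - 1)).sum()
    L = nu / (a * sigma2).sum() * (a * S2).sum()
    u[i] = stats.chi2.cdf(L, df=nu)

# if L ~ chi^2_nu held exactly, these would be close to 0.1, 0.2, ..., 0.9
print(np.quantile(u, np.linspace(0.1, 0.9, 9)).round(3))
```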
A quick Google search suggests that most sources follow a similar formulation. I have a few questions, though:
- How can the number of degrees of freedom of a chi-squared distribution depend on the statistics $S_i$? Shouldn't the number of degrees of freedom be a constant?
- What does 'approximately follow a distribution' mean? This is closely related to my third question:
- How can one justify this approximation? All 'proofs' I have seen on the internet seem to assume that $L$ follows a chi-squared distribution and then show that $\nu$ must have the stated value, using basic properties of expected value and variance. Even those arguments are confusing to me, since they seem to ignore the fact that the $S_i$ are themselves random variables and not known constants.
I found a link to the original papers by Welch and Satterthwaite, which I can't access freely. I wonder if there is a reasonably short answer to at least the first two of my questions. We students are not expected to be able to prove the equation, so it's not that bad if there is no readily available proof, but I'd like to at least understand the statement itself.
It has been noted in another answer that the first and second moments match between the linear combination of chi-squared variables and the Welch-Satterthwaite approximation. I think some additional perspective can be provided.
Namely, throughout this answer I will define $\nu$ slightly differently, in terms of the population variances $\sigma_k^2$ rather than the sample variances: $$ \nu = \frac{\left(\sum_{k=1}^m a_k \sigma_k^2\right)^2}{\sum_{k=1}^m \frac{(a_k \sigma_k^2)^2}{n_k - 1}} $$
Setup
Unfortunately, I will have to state some notation first:
Define $$ M := \sum_{k=1}^{m} a_k S^2_k $$ where $S^2_k$ is the unbiased sample variance for the $k$-th group of data. A well-known result is $\frac{(n_{k}-1)S_{k}^2}{\sigma_k^2} \sim \chi^2_{n_k-1} $ (e.g. related post). Thus, the defined $M$ is a linear combination of $\chi^2$ random variables. We can re-express $M$ (with $X_k \sim \chi^2_{n_k-1}$) as
$$ M = \sum_{k=1}^m \frac{a_k\sigma_k^2}{n_k-1} X_k $$ (this will be useful later).
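As a sanity check of this re-expression, here is a minimal simulation sketch (all parameter values below are made up by me): generating $M$ via actual sample variances and via scaled $\chi^2$ draws should agree in distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n_k = np.array([10, 25])        # group sizes n_k (made-up values)
sigma2 = np.array([2.0, 0.5])   # population variances (made-up)
a = np.array([1.0, 3.0])        # weights a_k (made-up)

n_sim = 100_000
# M via unbiased sample variances S_k^2
M_s2 = np.zeros(n_sim)
for n, s2, ak in zip(n_k, sigma2, a):
    samples = rng.normal(0.0, np.sqrt(s2), size=(n_sim, n))
    M_s2 += ak * samples.var(axis=1, ddof=1)
# M via scaled chi-squared rvs X_k ~ chi^2_{n_k - 1}
M_chi = sum(ak * s2 / (n - 1) * rng.chisquare(n - 1, size=n_sim)
            for n, s2, ak in zip(n_k, sigma2, a))

# means (both should be near sum_k a_k sigma_k^2 = 3.5) and variances
print(M_s2.mean(), M_chi.mean())
print(M_s2.var(), M_chi.var())
```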
The idea of the Welch-Satterthwaite approximation is to approximate $M$ with a single scaled chi-squared rv. Define a random variable $L$ (contrast this with the $L$ statistic stated in the original post) s.t.
$$ \frac{\nu L}{ \alpha } \sim \chi_\nu^2, \text{ where } \alpha := \sum^m_{k=1} a_k \sigma^2_k $$ Note, $\nu $ was already defined above, but now we can also rewrite it as
$$ \nu = \frac{\alpha^2}{\sum_{k=1}^m \frac{(a_k \sigma^2_k)^2}{n_k-1} } $$
When the Welch-Satterthwaite approximation is exactly correct (i.e. $M \stackrel{d}{=} L$)
Let's impose the condition that $\forall k, \frac{a_k \sigma_k^2}{n_k-1} = c$ where $c$ is a constant. Note that the term being set to a constant is the scaling factor of each chi-squared rv in $M$.
Scaled chi-squared rvs are gamma distributed; more precisely, $ c \cdot \chi_\nu^2 \sim \Gamma(\nu/2, 2c)$ (shape-scale parameterization). Since the scale parameters are all equal under our condition,
$$ M \sim \Gamma ( \sum_{k=1}^m \nu_k / 2 , 2c) $$
(where $\nu_k = n_k -1$ are the degrees of freedom for the individual chi-squared rvs $X_k$)
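A quick numerical check of this gamma-sum step (the scale $c$ and the degrees of freedom below are arbitrary choices of mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
c, nu_k = 0.5, np.array([3, 7, 12])   # arbitrary scale and df values
# sum of independent c * chi^2_{nu_k} draws ...
total = sum(c * rng.chisquare(v, size=50_000) for v in nu_k)
# ... compared against Gamma(sum(nu_k)/2, scale 2c) via a KS test;
# the distributions agree exactly, so no systematic rejection is expected
print(stats.kstest(total, stats.gamma(a=nu_k.sum() / 2, scale=2 * c).cdf))
```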
Using the relationship between chi-squared and gamma distributed rvs in the other direction,
$$ \frac{M}{c} \sim \chi^2_{\sum_{k=1}^m \nu_k} $$
Since, under our condition, $$ \alpha = c \cdot \sum_{k=1}^m (n_k- 1), $$ and substituting $a_k \sigma_k^2 = c\,\nu_k$ into the formula for $\nu$ gives $\sum_{k=1}^m \frac{(a_k \sigma_k^2)^2}{\nu_k} = c^2 \sum_{k=1}^m \nu_k = c\,\alpha$, we get
$$ \nu = \frac{\alpha^2}{c\,\alpha} = \frac{\alpha}{c} = \sum_{k=1}^m (n_k - 1) = \sum_{k=1}^m \nu_k, $$ which is the sum of the degrees of freedom of the individual chi-squared rvs. So the degrees of freedom match those of $\frac{\nu L}{\alpha}$.
Also note $c = \frac{\alpha}{\nu}$, so $$ \frac{M}{c} = \frac{\nu M }{\alpha} \sim \chi^2_\nu $$
which is exactly the defining property of $L$; hence $M \stackrel{d}{=} L$.
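Putting the exact case together in a simulation sketch (the parameter values are my own, chosen so that $a_k \sigma_k^2 / (n_k - 1) = c$ holds):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
c = 0.25
n_k = np.array([5, 9, 17])
nu_k = n_k - 1                      # df values 4, 8, 16
sigma2 = np.array([1.0, 2.0, 4.0])  # arbitrary variances
a = c * nu_k / sigma2               # forces a_k sigma_k^2 / nu_k = c

alpha = (a * sigma2).sum()          # = c * sum(nu_k)
nu = alpha ** 2 / (((a * sigma2) ** 2) / nu_k).sum()
print(nu, nu_k.sum())               # both 28: nu equals the summed df

# M as a linear combination of chi-squared rvs; nu*M/alpha vs chi^2_nu
M = sum(ak * s2 / vk * rng.chisquare(vk, size=100_000)
        for ak, s2, vk in zip(a, sigma2, nu_k))
# equality holds exactly here, so the KS test should not flag a mismatch
print(stats.kstest(nu * M / alpha, stats.chi2(df=nu).cdf))
```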
Bounds for $\nu$
The following bounds hold for $\nu$:
$$ \min_k{\nu_k} \lt \nu \le \sum_{k=1}^m \nu_k $$
We have already seen in what scenario the inequality on the right is an equality. It is the scenario in which the Welch-Satterthwaite approximation is exactly correct.
First the left inequality. Assume all $a_k > 0$ (as in the Welch test, where $a_k = 1/n_k$). Recall,
$$ \nu = \frac{\alpha^2}{\sum_{k=1}^m \frac{(a_k \sigma^2_k)^2}{\nu_k} } $$
Let $$ \nu^* := \min_k{\nu_k} $$ Since $\nu_k \ge \nu^*$ for all $k$, $$ \nu \ge \frac{\alpha^2}{\frac{1}{\nu^*} \sum_{k=1}^m (a_k \sigma^2_k)^2 } = \nu^* \cdot \frac{\left(\sum_{k=1}^m a_k \sigma^2_k\right)^2}{\sum_{k=1}^m (a_k \sigma^2_k)^2} \gt \nu^* $$ where the last inequality is strict (for $m \ge 2$) because expanding the squared sum produces the positive cross terms $2 a_j \sigma^2_j a_k \sigma^2_k$ in addition to $\sum_{k=1}^m (a_k \sigma^2_k)^2$.
The right inequality can be proven with Cauchy-Schwarz. Define vectors:
$$ u := ( \sqrt{\nu_1} \dots \sqrt{\nu_m}) $$
$$ v := \Big(\frac{a_1 \sigma_1^2}{\sqrt{\nu_1}} \dots \frac{a_m \sigma_m^2}{\sqrt{\nu_m}} \Big) $$
with Cauchy-Schwarz we have $$ (u \cdot v)^2 \le (u\cdot u) (v \cdot v) $$ $$ \implies \alpha^2 = \Big( \sum_{k=1}^m a_k \sigma_k^2 \Big)^2 \le \Big(\sum_{k=1}^m \nu_k \Big) \Big( \sum_{k=1}^m \frac{(a_k \sigma_k^2)^2}{\nu_k} \Big) $$ $$ \implies \nu = \frac{\alpha^2}{\sum_{k=1}^m \frac{(a_k \sigma_k^2)^2}{\nu_k}} \le \sum_{k=1}^m \nu_k $$
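These bounds are easy to probe numerically; here is a small sketch with random made-up parameters (keeping $a_k > 0$, as assumed above):

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(5):
    m = rng.integers(2, 6)                     # number of groups
    nu_k = rng.integers(2, 40, size=m)         # df values nu_k = n_k - 1
    a_sigma2 = rng.uniform(0.1, 5.0, size=m)   # products a_k * sigma_k^2
    nu = a_sigma2.sum() ** 2 / (a_sigma2 ** 2 / nu_k).sum()
    print(nu_k.min() < nu <= nu_k.sum(),       # should always be True
          round(float(nu), 2), int(nu_k.min()), int(nu_k.sum()))
```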
Conclusions
There is a scenario in which the Welch-Satterthwaite approximation is exactly correct, and there is a (not very tight) bound for the degrees of freedom. It was mentioned elsewhere that the 1st and 2nd moments match (so I do not prove this here). It was also mentioned elsewhere that the $\sigma_k^2$ terms tend to be unknown and are therefore replaced with the $S_k^2$, which is why the $\nu$ stated in my post differs from the one in the question.
It should be noted that this still does not tell us how "good" the Welch-Satterthwaite approximation is when our condition ($\forall k, \frac{a_k \sigma_k^2}{n_k-1} = c$) does not hold.