I am trying to learn more about the inverse-variance weighting method (https://en.wikipedia.org/wiki/Inverse-variance_weighting). Here is some context for my question.
Part 1:
Suppose we have random variables $X_1, X_2, \dots, X_n$. For each of these random variables we have a sample of size $n_i$ and a sample mean, giving $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n$. Suppose we "magically" knew the population variance of each sample mean, i.e. $V(\bar{x}_i) = \sigma_i^2$. That is, we assume that each $\bar{x}_i \sim N(\mu, \sigma_i^2)$. It is reasonable to assume that each $\bar{x}_i$ has a normal distribution by virtue of the Central Limit Theorem. We are interested in estimating $\mu$ with some estimator $\mu_w$.
Given this information, suppose we want to choose the estimator $\mu_w$ so that its variance $Var(\mu_w)$ is smallest. We can treat this as an optimization problem in which we minimize $Var(\mu_w)$, and solve it with the method of Lagrange multipliers:
$$ \mu_w = \frac{\sum_{i=1}^n w_i \bar{x}_i}{\sum_{i=1}^n w_i}$$
where $w_i$ represents the weights assigned to each sample mean.
To minimize the variance of the weighted mean, we can formulate the problem as follows:
$$ \begin{aligned} & \text{Minimize:} & V = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + \dots + w_n^2 \sigma_n^2 \\ & \text{Subject to:} & w_1 + w_2 + \dots + w_n = 1 \end{aligned} $$
$$ L = V - \lambda(w_1 + w_2 + \dots + w_n - 1) $$
$$ \frac{\partial L}{\partial w_i} = 2w_i\sigma_i^2 - \lambda = 0 \quad \text{for } i = 1, 2, \dots, n $$
$$ w_i = \frac{\lambda}{2\sigma_i^2} \quad \text{for } i = 1, 2, \dots, n $$
$$ \frac{\partial L}{\partial \lambda} = w_1 + w_2 + \dots + w_n - 1 = 0 $$
$$ \frac{\lambda}{2\sigma_1^2} + \frac{\lambda}{2\sigma_2^2} + \dots + \frac{\lambda}{2\sigma_n^2} - 1 = 0 $$
$$ \lambda = 2(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \dots + \frac{1}{\sigma_n^2})^{-1} $$
$$ w_i = (\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \dots + \frac{1}{\sigma_n^2})^{-1}\frac{1}{\sigma_i^2} \quad \text{for } i = 1, 2, \dots, n $$
The resulting values of $w_i$ produce the minimum variance of $\mu_w$; this is the well-known result called "inverse-variance weighting" (https://en.wikipedia.org/wiki/Inverse-variance_weighting).
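As a quick numerical sanity check on the closed-form weights above, here is a small sketch (with made-up variances $\sigma_i^2$) verifying that the inverse-variance weights sum to one and give a smaller variance for the weighted mean than, say, equal weights:

```python
import numpy as np

# Hypothetical known population variances of the sample means x_bar_i
sigma2 = np.array([1.0, 4.0, 0.25])

# Inverse-variance weights: w_i = (1/sigma_i^2) / sum_j (1/sigma_j^2)
w = (1 / sigma2) / np.sum(1 / sigma2)

# Variance of the weighted mean: sum_i w_i^2 * sigma_i^2
var_ivw = np.sum(w**2 * sigma2)                      # equals 1 / sum(1/sigma_i^2)
var_equal = np.sum((1 / len(sigma2)) ** 2 * sigma2)  # equal weights, for comparison

assert np.isclose(var_ivw, 1 / np.sum(1 / sigma2))
assert var_ivw <= var_equal
```

The final assertion reflects the optimality result derived above: no other choice of weights summing to one can beat the inverse-variance weights.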
Part 2:
Now suppose we have a more realistic situation where we have sample means $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n$ and only sample variances $s_1^2, s_2^2, \dots, s_n^2$. Let's assume each $s_i^2$ is an unbiased estimator of $\sigma_i^2$. We then have a similar problem, where we still assume that each $\bar{x}_i \sim N(\mu, \sigma_i^2)$, and we are still interested in estimating $\mu$ with some estimator $\mu_w$.
Although this situation is similar to Part 1, it is more complicated because we are dealing with $s_i^2$ instead of $\sigma_i^2$. To find the minimum-variance estimator $\mu_w$, we now need to take into account the sampling distribution of $s_i^2$. The following paper (https://www.jstor.org/stable/3001633) shows how to do this:
Each weight keeps the same form, now with $s_i^2$ in place of $\sigma_i^2$:
$$\hat{w}_i = \left(\frac{1}{s_1^2} + \frac{1}{s_2^2} + \dots + \frac{1}{s_n^2}\right)^{-1}\frac{1}{s_i^2} \quad \text{for } i = 1, 2, \dots, n $$
But the variance of $\mu_w$ based on these estimated weights is now:
$$\hat{V}(\mu_w) = \left(\sum_{i=1}^{n} \frac{1}{s_i^2}\right)^{-1} \left[ 1 + 4 \sum_{i=1}^{n} \hat{w}_i(1-\hat{w}_i) \frac{1}{n_i}\right] $$
If I understand correctly, the resulting estimator is the minimum-variance estimator based on $s_i^2$ instead of $\sigma_i^2$, and the formula above estimates its variance. (Is this correct? Clarification needed.)
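To make the Part 2 formulas concrete, here is a sketch (with made-up sample sizes and sample variances) that plugs the $s_i^2$ into the weight formula and evaluates the adjusted variance estimate above:

```python
import numpy as np

# Hypothetical per-sample summaries: sample sizes and sample variances of the means
n = np.array([10, 25, 50])
s2 = np.array([1.2, 0.8, 0.5])  # s_i^2: estimated variance of each x_bar_i

# Estimated inverse-variance weights, using s_i^2 in place of sigma_i^2
w_hat = (1 / s2) / np.sum(1 / s2)

# Naive variance of the weighted mean (treating s_i^2 as if it were sigma_i^2)
v_naive = 1 / np.sum(1 / s2)

# Adjusted variance estimate accounting for the sampling error in s_i^2
v_adj = v_naive * (1 + 4 * np.sum(w_hat * (1 - w_hat) / n))

assert v_adj >= v_naive  # the correction term is nonnegative
```

The bracketed correction factor is always at least 1, so estimating the variances inflates the variance of $\mu_w$ relative to the known-variance case of Part 1.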
Part 3:
Finally, this is where my question comes in. Suppose we have an even more realistic situation in which we have sample means $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n$ and sample variances $s_1^2, s_2^2, \dots, s_n^2$, but this time $\bar{x}_i \sim N(\mu_i, \sigma_i^2)$, i.e. the means are no longer assumed equal.
I tried to derive an estimator for Part 3 myself, where the common-mean requirement is no longer present, i.e. $\bar{x}_i \sim N(\mu_i, \sigma_i^2)$. For example, suppose we have two samples of sizes $n_1, n_2$ with sample means $\bar{x}_1, \bar{x}_2$ and sample variances $s_1^2, s_2^2$. We can write:
$$\mu_w = w_1\bar{x}_1 + w_2\bar{x}_2 \quad \text{where } w_1 + w_2 = 1 $$
$$\hat{V}(\mu_w) = \frac{\sum(x_{1i} - \mu_w)^2 + \sum(x_{2i} - \mu_w)^2}{n_1 + n_2 - 1}$$
$$\hat{V}(\mu_w) = \frac{\sum(x_{1i}^2) - 2\mu_w \sum(x_{1i}) + n_1\mu_w^2 + \sum(x_{2i}^2) - 2\mu_w \sum(x_{2i}) + n_2\mu_w^2}{n_1 + n_2 - 1}$$
After further manipulation and simplification:
$$\hat{V}(\mu_w) = \frac{(n_1-1)s_1^2 + n_1\bar{x}_1^2 + (n_2-1)s_2^2 + n_2\bar{x}_2^2 + (n_1+n_2)\mu_w^2 - 2\mu_w(n_1\bar{x}_1 + n_2\bar{x}_2)}{n_1+n_2-1} $$
Now, if we take the partial derivatives of $\hat{V}(\mu_w)$ with respect to $w_1$ and $w_2$ and set them to 0, we get $w_1 = n_1/(n_1+n_2)$ and $w_2 = n_2/(n_1+n_2)$. Furthermore, the second partial derivatives of $\hat{V}(\mu_w)$ with respect to $w_1$ and $w_2$ are both positive, indicating that these weights are indeed a minimum. The derivation can also be extended to more than two sample means.
My Question:
- I was able to derive a minimum-variance estimator in Part 3 that did not assume all means are equal; in fact, the Part 3 derivation seems to go through without any distributional assumptions at all! Looking back, the derivations in Part 1 and Part 2 also did not seem to use $\bar{x}_i \sim N(\mu, \sigma_i^2)$ anywhere. So why do Part 1 and Part 2 require that each $\bar{x}_i \sim N(\mu, \sigma_i^2)$?
- An interesting note: in Part 3, the variances do not appear in the weights $w_1$ and $w_2$ at all! Is it possible that the inverse-variance weighting scheme for unequal means (i.e. Part 3) results in weights based solely on the relative sample sizes $n_1, n_2$?
- And finally, just to reiterate one more time: in the situation where $\bar{x}_i \sim N(\mu_i, \sigma_i^2)$, is it true that the mean estimator with the lowest variance is the one I derived in Part 3?
Can someone please comment on this?
Thanks!