Estimator with smallest variance, Lagrange multiplier


I have a question and I think I'm supposed to use a Lagrange multiplier, although I haven't been taught the method, so I'm not sure whether I'm allowed to use it. The question is: Suppose that $X_i$ has mean $\mu$ and variance $\sigma_i^2$ for $i=1,\dots,n$. If $X_1, X_2,\dots,X_n$ are independent, find the vector $c$ that yields the most efficient estimator (i.e., the estimator with the smallest variance). I've been given that the unbiased estimator for $\mu$ is $\hat\mu_c=\sum_i c_iX_i$.

What I've done is find the variance of $\hat\mu_c$, which is $\sum_i c_i^2\sigma_i^2$. I'm not sure what to do from there. Any help would be great.
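As a sanity check on that variance formula, here is a small Monte Carlo sketch (the particular weights, variances, and mean are made-up illustrative values): for independent $X_i$, the sample variance of $\sum_i c_i X_i$ should match $\sum_i c_i^2\sigma_i^2$.

```python
import numpy as np

# Illustrative values (any c summing to 1 and any sigma work).
rng = np.random.default_rng(0)
c = np.array([0.5, 0.3, 0.2])        # weights, sum to 1 for unbiasedness
sigma = np.array([1.0, 2.0, 3.0])    # standard deviations sigma_i
mu = 5.0

# Draw many independent samples of (X_1, X_2, X_3) and form mu_hat.
X = rng.normal(mu, sigma, size=(200_000, 3))
mu_hat = X @ c

# Analytic variance: Var(mu_hat) = sum_i c_i^2 sigma_i^2
analytic = np.sum(c**2 * sigma**2)
print(analytic, mu_hat.var())        # the two should agree closely
```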


Let's formulate the optimization problem you are trying to solve:

$\min\limits_{c} \sum_i c_i^2\sigma_i^2$

s.t.

$\sum_i c_i = 1$ (since unbiasedness requires $E\left[\sum_i c_iX_i\right] = \mu\sum_i c_i = \mu$)


Let's vectorize this formulation. Let $c^2=(c_1^2,c_2^2,\dots,c_n^2)$ and $\sigma^2=(\sigma_1^2,\sigma_2^2,\dots,\sigma_n^2)$. Then the problem becomes:

$\min c^2\cdot \sigma^2$

$c \cdot \mathbf{1} = 1$

Therefore, all feasible solutions must lie in the hyperplane defined by $c \cdot \mathbf{1} = 1$.

From the method of Lagrange multipliers, we know that at the optimum the gradient with respect to $c$ of the objective function must be parallel to the gradient of the constraint (i.e., the objective function's level set is tangent to the constraint hyperplane):

$\nabla(c^2\cdot \sigma^2)=\lambda \nabla(c \cdot \mathbf{1})$

Specifically, we get the system of equations:

$2c_i\sigma_i^2=\lambda\;\;\;\forall i$ and

$\sum_i c_i = 1$

Solving for the $c_i$ we get:

$c_i= \frac{\lambda}{2\sigma_i^2}$. Plugging this into $\sum_i c_i = 1$ we get:

$\frac{\lambda}{2} \sum_i \frac{1}{\sigma_i^2} = 1 \implies \lambda = 2\left( \sum_i \frac{1}{\sigma_i^2}\right)^{-1}$

Thus:

$c_i =\left( \sigma_i^2 \sum_j \frac{1}{\sigma_j^2}\right)^{-1}$, which is inverse-variance weighting: the larger the variance of a variable, the less weight its observations receive in estimating $\mu$.
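To see the result numerically, here is a short sketch (the variances are made-up illustrative values): the inverse-variance weights sum to 1, achieve the minimal variance $\left(\sum_j 1/\sigma_j^2\right)^{-1}$, and beat any other valid weighting such as equal weights.

```python
import numpy as np

sigma2 = np.array([1.0, 4.0, 9.0])          # illustrative variances sigma_i^2
c_opt = (1 / sigma2) / np.sum(1 / sigma2)   # c_i = (sigma_i^2 * sum_j 1/sigma_j^2)^(-1)

def est_var(c, sigma2):
    """Variance of sum_i c_i X_i for independent X_i: sum_i c_i^2 sigma_i^2."""
    return np.sum(c**2 * sigma2)

c_equal = np.full(3, 1/3)                   # a competing unbiased weighting
print(c_opt.sum())                          # constraint: weights sum to 1
print(est_var(c_opt, sigma2))               # equals 1 / sum_j (1/sigma_j^2)
print(est_var(c_equal, sigma2))             # strictly larger
```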