Estimator for linear regression where data points have different variances


So in the case where data points have the same variance $\sigma^2$, the estimator (in normal equation form) can be written as

$$\theta=(X^TX)^{-1}X^TY$$

I'm not sure how to derive a similar formula when the data points have different variances, and thus the covariance matrix would be

$$\Sigma = diag(\sigma_1^2, \sigma_2^2, ...,\sigma_n^2)$$


For any system of linear equations
$$ X \theta = Y $$
one can show that the normal equations
$$ X^T X \theta^* = X^T Y $$
have the solution (when $X^T X$ is invertible)
$$ \theta^* = (X^T X)^{-1} X^T Y, $$
which minimizes the residual
$$ e = X\theta - Y $$
in the Euclidean norm ("least squares"):
$$ \lVert X\theta^* - Y \rVert_2 \le \lVert X\theta - Y \rVert_2 \quad \text{for all } \theta. $$
This is independent of the variance.
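As a quick numerical sanity check of the normal-equation formula (on made-up illustrative data, with NumPy's least-squares solver as the reference):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: 20 points, 3 features (not from the question)
X = rng.normal(size=(20, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=20)

# Normal-equation solution: solve (X^T X) theta = X^T Y
theta_ne = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference: np.linalg.lstsq minimizes ||X theta - Y||_2 directly
theta_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)

assert np.allclose(theta_ne, theta_ls)
```

(Solving the normal equations with `np.linalg.solve` is shown to mirror the formula; in practice `lstsq` or a QR decomposition is preferred for numerical stability.)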


What @mvw wrote is true; however, ignoring the information about the noise variances will certainly degrade the performance of your estimator.

The problem posed by @Jenny is equivalent to

\begin{equation} Y = X \theta + n, \quad n \sim \mathcal{N}(0,\Sigma), \end{equation}

where $\Sigma \overset{def}{=} diag(\sigma_1^2, \sigma_2^2, \dots,\sigma_n^2)$ is the known, deterministic noise covariance matrix and $\theta$ is an unknown deterministic parameter vector.

In that case the optimal (Minimum Variance Unbiased or MVU) solution/estimator is given by the weighted least squares (WLS) estimator:

\begin{equation} \widehat{\theta}_{WLS} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y. \end{equation}
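A small numerical sketch of this estimator (illustrative heteroscedastic data, not from the question), which also checks that WLS coincides with ordinary least squares on rescaled data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: intercept + one regressor, per-point noise variances
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma2 = rng.uniform(0.1, 5.0, size=n)        # sigma_1^2, ..., sigma_n^2
theta_true = np.array([2.0, -1.0])
Y = X @ theta_true + rng.normal(size=n) * np.sqrt(sigma2)

# WLS: theta = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y
Sigma_inv = np.diag(1.0 / sigma2)
theta_wls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)

# Equivalent: ordinary least squares after scaling each row by 1/sigma_i
w = 1.0 / np.sqrt(sigma2)
theta_white, *_ = np.linalg.lstsq(X * w[:, None], Y * w, rcond=None)

assert np.allclose(theta_wls, theta_white)
```

Note that because $\Sigma$ is diagonal, one never needs to form the full $n \times n$ matrix in practice; the row scaling above does the same job in $O(n)$ extra work.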

In the linear Gaussian case this estimator is MVU and efficient: it attains the Cramér-Rao bound.

In the general case, where the noise is not necessarily Gaussian, it can be shown that the covariance matrix of the WLS estimator is no larger (in the positive semidefinite sense) than that obtained with any other choice of weighting matrix, including the identity matrix, which corresponds to the solution @mvw suggested.
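One short way to see where the WLS form comes from, and its connection to ordinary least squares, is the standard whitening argument. Since $\Sigma$ is positive definite, multiply the model by $\Sigma^{-1/2}$:

\begin{equation} \Sigma^{-1/2} Y = \Sigma^{-1/2} X \theta + \Sigma^{-1/2} n, \quad \Sigma^{-1/2} n \sim \mathcal{N}(0, I). \end{equation}

The transformed noise has identity covariance, so ordinary least squares on $\tilde{X} = \Sigma^{-1/2} X$ and $\tilde{Y} = \Sigma^{-1/2} Y$ gives

\begin{equation} \widehat{\theta} = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T \tilde{Y} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y, \end{equation}

which is exactly the WLS estimator.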

If someone wants the proof I can send it; my email is [email protected].