I am having trouble finding where I am making a mistake...
I have to estimate the parameters a and b of the curve modeled by:
$y = a x^2 + bx$
I have to do that from K measurements of the curve; each measurement is modeled by:
$y_i = a x_i^2 + b x_i + \epsilon_i$
Where $\epsilon_i$ is a Gaussian random variable that follows $N(0,\sigma_i^2)$
After collecting a group of K measurements, I start the estimation. To do that, I am using weighted least squares to estimate my parameters:
$\Theta = \left[\begin{array}{c} a\\ b \end{array}\right]$
So, the closed-form solution tells me that:
$\hat\Theta = (\Phi^TR^{-1}\Phi)^{-1}\Phi^TR^{-1}y$
Where:
$y$ is a $[K \times 1]$ vector with the $K$ measurements of the curve.
$R$ is a $[K \times K]$ matrix given by $R = E[\epsilon \epsilon^T] = \operatorname{diag}(\sigma_1^2, \dots, \sigma_K^2)$
$\Phi$ is a $[K \times 2]$ matrix equal to $\left[\begin{array}{cc} x_1^{2} & x_1\\ \vdots & \vdots\\ x_K^{2} & x_K \end{array}\right]$
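To make the setup concrete, here is a minimal NumPy sketch of how I build these quantities (the numbers are made-up example values, not my real data):

```python
import numpy as np

# Made-up example: K = 5 measurements of y = a*x^2 + b*x with Gaussian noise.
rng = np.random.default_rng(0)
a, b = 2.0, -1.0                             # "true" parameters, unknown in practice
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])      # sample points x_1 ... x_K
sigma = np.array([0.1, 0.2, 0.1, 0.3, 0.2])  # per-measurement noise std devs
y = a * x**2 + b * x + rng.normal(0.0, sigma)

Phi = np.column_stack([x**2, x])             # [K x 2] design matrix
R = np.diag(sigma**2)                        # [K x K] diagonal error covariance
```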
The problem is that I always find
$(\Phi^TR^{-1}\Phi)$
to be a singular matrix, so I am unable to invert it and reach the final estimate.
What am I doing wrong? Have I made a mistake in the construction of the problem?
Thank you!
The regression model written in matrix notation is given by
$$y=\Phi\Theta+\epsilon.$$
We want to find the least-squares solution $\Theta=\hat{\Theta}$ that maximizes the likelihood of the multivariate Gaussian distributed errors, i.e.
$$L\propto\exp\left(-\frac{1}{2}\epsilon^TR^{-1}\epsilon\right).$$
If we view $R^{-1}$ as the metric of an inner product (since it's positive-definite), we need to minimize $\Vert\epsilon\Vert_{R^{-1}}^2\equiv\epsilon^TR^{-1}\epsilon$ by choosing our $\Theta=\hat{\Theta}$ such that the vector $y-\Phi\hat{\Theta}$ is orthogonal to $\Phi$ under the same metric $R^{-1}$. Therefore,
$$\Phi^TR^{-1}(y-\Phi\hat{\Theta})=0,$$
which leads to the least-squares formula
$$\hat{\Theta}=(\Phi^TR^{-1}\Phi)^{-1}\Phi^TR^{-1}y.$$
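As a sanity check, the formula can be evaluated directly in NumPy (with hypothetical example data; solving the linear system via `np.linalg.solve` is numerically safer than forming the inverse explicitly):

```python
import numpy as np

# Hypothetical data: K = 5 measurements of y = 2x^2 - x with known noise levels.
rng = np.random.default_rng(1)
x = np.linspace(0.5, 2.5, 5)
sigma = np.full(5, 0.05)
y = 2.0 * x**2 - 1.0 * x + rng.normal(0.0, sigma)

Phi = np.column_stack([x**2, x])   # [K x 2] design matrix
R_inv = np.diag(1.0 / sigma**2)    # inverse of the diagonal covariance

# Normal equations: (Phi^T R^{-1} Phi) theta = Phi^T R^{-1} y.
theta_hat = np.linalg.solve(Phi.T @ R_inv @ Phi, Phi.T @ R_inv @ y)
```

With distinct sample points $x_i$, the columns of $\Phi$ are linearly independent and the system is well-posed.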
So your formula is correct. This formula can even handle correlated errors if you put off-diagonal entries in $R$; the matrix $R$ must be positive-definite. Note that $\Phi^TR^{-1}\Phi$ is singular exactly when $\Phi$ lacks full column rank, e.g. when your $x_i$ take fewer than two distinct nonzero values, so first check that your sample points are sufficiently varied. If you find numerically that $\Phi^TR^{-1}\Phi$ is close to singular, use ridge regression
$$\hat{\Theta}_\lambda=(\Phi^TR^{-1}\Phi+\lambda I)^{-1}\Phi^TR^{-1}y,$$
where $\lambda>0$ is the ridge parameter.
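A minimal sketch of the regularized formula (the degenerate design below, with all $x_i$ equal, is chosen deliberately so that the unregularized matrix is singular; $\lambda$ is a made-up value):

```python
import numpy as np

# Degenerate design: all x_i equal, so Phi has rank 1 and
# Phi^T R^{-1} Phi is singular without regularization.
x = np.array([1.0, 1.0, 1.0])
sigma = np.ones(3)
y = np.array([1.1, 0.9, 1.0])

Phi = np.column_stack([x**2, x])
R_inv = np.diag(1.0 / sigma**2)
lam = 1e-6                          # ridge parameter (made-up value)

# Adding lam * I makes the matrix positive-definite, hence invertible.
A = Phi.T @ R_inv @ Phi + lam * np.eye(2)
theta_ridge = np.linalg.solve(A, Phi.T @ R_inv @ y)
```

Of course, the ridge estimate is biased; with a well-chosen design (distinct $x_i$) you can set $\lambda = 0$ and recover the plain weighted-least-squares solution.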