I've studied the least-squares method from the Calculus approach, using polynomials: for a set of data points $(x_i,y_i)$, $i=1,\dots,n$, define the function $E=\sum_{i=1}^n\bigl(y_i-f(x_i)\bigr)^2$, with $f(x;\alpha_k)$ being the polynomial whose coefficients $\alpha_k$ we're trying to find. Solve $\nabla E=0$ (the gradient taken with respect to the coefficients), and you get a system $Ma=u$, where $a$ is the column vector containing the coefficients of your polynomial. If it's a linear fit ($y=a_1 x + a_2$), then the equation looks like this:
$$ \begin{bmatrix} \sum(x_i^2) & \sum(x_i) \\ \sum(x_i) & n \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum(x_iy_i)\\ \sum(y_i) \end{bmatrix} $$ (With each sum going from $i=1$ to $n$, for clarity's sake).
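In code, assembling this system just means computing the four sums above (a quick sketch I wrote to check my sums; the function name is my own):

```python
# Assemble the 2x2 normal-equations system M [a1, a2]^T = u
# for a linear fit y = a1*x + a2, using only the sums shown above.
def normal_equations(xs, ys):
    n = len(xs)
    Sx = sum(xs)                                  # sum of x_i
    Sxx = sum(x * x for x in xs)                  # sum of x_i^2
    Sxy = sum(x * y for x, y in zip(xs, ys))      # sum of x_i*y_i
    Sy = sum(ys)                                  # sum of y_i
    M = [[Sxx, Sx], [Sx, n]]
    u = [Sxy, Sy]
    return M, u

M, u = normal_equations([1.02, 1.99, 3.03], [1.03, 2.06, 2.99])
```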
My professor's notes say that, for such a system, the error associated with each coefficient can be found with the expression: $$\sigma^2_{a_k}=(M^{-1})_{kk}$$
Which I don't understand really well. My questions are:
- Is the last equation referring to the variance of the k-th coefficient?
- If so, does that mean that I have to find the inverse of $M$, take the k-th element in the main diagonal and find its square root to find the error associated to said coefficient?
I'm asking because I'm not entirely sure I'm getting the notation right. This is what I've done so far: I've tried to perform a simple, linear regression with the least-squares method using the data:
$\begin{array}{cc} \hline \textbf{x} & \textbf{y} \\ \hline 1.02 & 1.03 \\ 1.99 & 2.06 \\ 3.03 & 2.99 \\ \hline \end{array}$
This is data I made up to approximate $y=x$, so the fit $y=ax+b$ should come out with $a\approx 1$ and $b\approx 0$. After solving the system:
$$ \begin{bmatrix} 14.18 & 6.04 \\ 6.04 & 3 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 14.21\\ 6.08 \end{bmatrix} $$
I get $a\approx 0.974153$ and $b \approx 0.065371$. Then, with my understanding of how to calculate the error associated with each coefficient and after calculating $M^{-1}$, I get $\sigma_a = \Delta a = \sqrt{0.4948...} = 0.7034...$ and $\sigma_b = \Delta b = \sqrt{2.3392...} = 1.5294...$, which doesn't seem right. Why is the error for $b$ so big? Am I making a wrong assumption about what the standard deviation means? (Maybe it's not the same as $\Delta b$?).
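To double-check the inversion, this is a small sketch (my own code, not from the professor's notes) that inverts the $2\times 2$ matrix by hand and takes the square roots of its diagonal:

```python
# Invert the 2x2 matrix M with the adjugate formula and take the
# square root of each diagonal entry of M^{-1}.
import math

M = [[14.1814, 6.04], [6.04, 3.0]]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
Minv_diag = (M[1][1] / det, M[0][0] / det)   # (M^-1)_11, (M^-1)_22
errs = tuple(math.sqrt(d) for d in Minv_diag)
```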
Thanks in advance.
No, the variance-covariance matrix of the estimators is given by $$ \hat{\sigma}^2M^{-1}, $$ where $$ \hat{\sigma}^2 = \frac{1}{3 - 2} \sum_{i=1}^3(y_i-\hat{y}_i)^2. $$ In your case $ \hat{\sigma}^2 \approx 0.0047 $ (so $\hat{\sigma} \approx 0.069$), hence the standard deviation of $\hat{b}$ is $\sqrt{\hat{\sigma}^2 (M^{-1})_{22}} \approx 0.105$ and that of $\hat{a}$ is $\sqrt{\hat{\sigma}^2 (M^{-1})_{11}} \approx 0.048$.
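A quick numeric check (a sketch using only the standard library, with my own variable names) that reproduces these numbers from your data:

```python
# Solve the normal equations, compute the residual variance
# sigma^2 = SSR/(n - p), and scale the diagonal of M^{-1} by it.
import math

xs = [1.02, 1.99, 3.03]
ys = [1.03, 2.06, 2.99]
n, p = len(xs), 2                       # 3 points, 2 parameters

Sx, Sy = sum(xs), sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

det = Sxx * n - Sx * Sx                 # det(M)
a = (n * Sxy - Sx * Sy) / det           # slope
b = (Sxx * Sy - Sx * Sxy) / det         # intercept

ssr = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
sigma2 = ssr / (n - p)                  # residual variance

sd_a = math.sqrt(sigma2 * n / det)      # sqrt(sigma^2 * (M^-1)_11)
sd_b = math.sqrt(sigma2 * Sxx / det)    # sqrt(sigma^2 * (M^-1)_22)
```

Without the $\hat{\sigma}^2$ factor, the diagonal of $M^{-1}$ only describes how the data's geometry amplifies noise; it has to be scaled by the residual variance to become a variance of the coefficients.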