Improving a least-squares solution after adding more observation points


You fit a simple linear model $y=\sum_i c_i x_i+q$ to your data $X=\{x_{ij}\}$ and $Y=\{y_j\}$ by linear least squares, and you obtain a solution $(q,c_i)$ (plus the residual variance) as a function of the $N$ observations supplied to the calculation.

How does that solution transform into the updated one $(q',c'_i)$, when we have longer data vectors $X'$ and $Y'$?

Is there any closed form expression that uses only the added points and some sort of "condensed" information saved from the previous fit?


BEST ANSWER

The least-squares fit is computed from the normal equations, which are built essentially from the variance-covariance matrix, obtained by accumulating the moments of the data up to second order.

This is the condensed form you are looking for.
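A minimal numerical sketch of this idea, assuming NumPy and a design matrix whose first column of ones carries the intercept (all names and data here are illustrative): the moment matrices $X^TX$ and $X^Ty$ are the only "condensed" state you need to keep.

```python
import numpy as np

rng = np.random.default_rng(0)

# Original data: n observations, two predictors plus an intercept column.
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)

# "Condensed" information saved from the first fit: the second-order moments.
XtX = X.T @ X
Xty = X.T @ y
beta = np.linalg.solve(XtX, Xty)

# A new batch of m observations arrives.
m = 20
Xnew = np.column_stack([np.ones(m), rng.normal(size=(m, p - 1))])
ynew = Xnew @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=m)

# Update the moments only -- the old rows are never revisited.
XtX += Xnew.T @ Xnew
Xty += Xnew.T @ ynew
beta_updated = np.linalg.solve(XtX, Xty)

# Same result as refitting from scratch on the stacked data.
X_all = np.vstack([X, Xnew])
y_all = np.concatenate([y, ynew])
beta_full, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
print(np.allclose(beta_updated, beta_full))  # True
```

The updated moments are exactly those of the pooled data, so the two solutions agree up to floating-point error.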

ANSWER

I do not know if this is what you are looking for.

Suppose that you solve the problem using the normal equations, written as a linear system. For a model with two variables, $y=c_0+c_1x_1+c_2x_2$, this means you solved $$\sum_{i=1}^ny_i=nc_0+c_1\sum_{i=1}^nx_{1i}+c_2\sum_{i=1}^nx_{2i}$$ $$\sum_{i=1}^nx_{1i}y_i=c_0\sum_{i=1}^nx_{1i}+c_1\sum_{i=1}^nx_{1i}^2+c_2\sum_{i=1}^nx_{1i}x_{2i}$$ $$\sum_{i=1}^nx_{2i}y_i=c_0\sum_{i=1}^nx_{2i}+c_1\sum_{i=1}^nx_{1i}x_{2i}+c_2\sum_{i=1}^nx_{2i}^2$$ Now, you add $p$ new data points; the equations become $$\left(\sum_{i=1}^ny_i+\sum_{i=n+1}^{n+p}y_i\right)=(n+p)c_0+c_1\left(\sum_{i=1}^nx_{1i}+\sum_{i=n+1}^{n+p}x_{1i}\right)+c_2\left(\sum_{i=1}^nx_{2i}+\sum_{i=n+1}^{n+p}x_{2i}\right)$$ $$\left(\sum_{i=1}^nx_{1i}y_i+\sum_{i=n+1}^{n+p}x_{1i}y_i\right)=c_0\left(\sum_{i=1}^nx_{1i}+\sum_{i=n+1}^{n+p}x_{1i}\right)+c_1\left(\sum_{i=1}^nx_{1i}^2+\sum_{i=n+1}^{n+p}x_{1i}^2\right)+c_2\left(\sum_{i=1}^nx_{1i}x_{2i}+\sum_{i=n+1}^{n+p}x_{1i}x_{2i}\right)$$ $$\left(\sum_{i=1}^nx_{2i}y_i+\sum_{i=n+1}^{n+p}x_{2i}y_i\right)=c_0\left(\sum_{i=1}^nx_{2i}+\sum_{i=n+1}^{n+p}x_{2i}\right)+c_1\left(\sum_{i=1}^nx_{1i}x_{2i}+\sum_{i=n+1}^{n+p}x_{1i}x_{2i}\right)+c_2\left(\sum_{i=1}^nx_{2i}^2+\sum_{i=n+1}^{n+p}x_{2i}^2\right)$$ Since you saved the sums $\sum_{i=1}^n(\cdot)$ from the first fit, you only need to compute the sums $\sum_{i=n+1}^{n+p}(\cdot)$ over the added points, which update the coefficients of the linear system.
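The bookkeeping above can be checked numerically. A minimal sketch, assuming NumPy and the two-variable model with illustrative data: the sums are additive, so adding the new-point sums to the saved ones reproduces the pooled fit exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

def normal_eq_sums(x1, x2, y):
    """Accumulate the sums appearing in the three normal equations."""
    A = np.array([
        [len(y),    x1.sum(),       x2.sum()],
        [x1.sum(),  (x1**2).sum(),  (x1*x2).sum()],
        [x2.sum(),  (x1*x2).sum(),  (x2**2).sum()],
    ])
    b = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])
    return A, b

# Old data (n points) and a new batch (p points).
x1, x2 = rng.normal(size=30), rng.normal(size=30)
y = 1.0 + 2.0*x1 - 0.5*x2 + rng.normal(scale=0.1, size=30)
u1, u2 = rng.normal(size=10), rng.normal(size=10)
v = 1.0 + 2.0*u1 - 0.5*u2 + rng.normal(scale=0.1, size=10)

A_old, b_old = normal_eq_sums(x1, x2, y)   # saved from the first fit
A_new, b_new = normal_eq_sums(u1, u2, v)   # sums over the added points only
c = np.linalg.solve(A_old + A_new, b_old + b_new)

# Check against the fit on the pooled data.
A_all, b_all = normal_eq_sums(np.r_[x1, u1], np.r_[x2, u2], np.r_[y, v])
print(np.allclose(c, np.linalg.solve(A_all, b_all)))  # True
```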

ANSWER

Here is the answer, provided that you add just one point at a time and you always store not only the previous estimate $\beta_n$ but also the corresponding square matrix $R_n$: https://stats.stackexchange.com/questions/66950/intuition-for-recursive-least-squares
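A minimal sketch of that one-point recursive update, assuming NumPy and using the Sherman-Morrison formula to keep $P=(X^TX)^{-1}$ current (variable names are illustrative, not taken from the linked post):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fit on the first n points; store beta and P = (X^T X)^{-1}.
n, d = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)
P = np.linalg.inv(X.T @ X)
beta = P @ X.T @ y

def rls_update(beta, P, x, y_new):
    """Fold one new observation (x, y_new) into the estimate.

    Sherman-Morrison updates P = (X^T X)^{-1} without refitting."""
    Px = P @ x
    k = Px / (1.0 + x @ Px)            # gain vector
    beta = beta + k * (y_new - x @ beta)
    P = P - np.outer(k, Px)
    return beta, P

# Add a single new point.
x_new = np.array([1.0, 0.3, -1.2])
y_new = x_new @ np.array([1.0, 2.0, -0.5]) + 0.05
beta, P = rls_update(beta, P, x_new, y_new)

# Matches a full least-squares fit on all n+1 points.
X_all = np.vstack([X, x_new])
y_all = np.append(y, y_new)
beta_full, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
print(np.allclose(beta, beta_full))  # True
```

Because $P$ is initialized from the exact first fit, each rank-one update reproduces the exact batch solution, not an approximation.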