I am working on a project where I use multilinear regression on many large data sets. Each data point is formatted as (time, data). Let's say I calculate a regression polynomial based on the past month of data, then a week passes and I want to come up with an updated regression polynomial. Can I use the past week's data and the old regression polynomial to come up with an updated polynomial? Would this be algorithmically faster than doing multilinear regression on the full month plus week of data?
Considering the amount of data I'm processing, any processor time I can save is essential.
For more background, I am planning on just using ordinary least squares regression. I'm not sure if this makes a difference.
Say you're using the OLS estimator $\hat\beta=(X^T X)^{-1}X^T y$, where $X\in\mathbb R^{n\times p}$. There are two ways that updating for a new point is faster than fitting the entire thing again:
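Keep the sufficient statistics $\hat\Sigma=X^T X$ and $\hat\gamma = X^T y$. For a new point with feature vector $x\in\mathbb R^p$ and response $y_{\text{new}}$, updating $\hat\gamma \leftarrow \hat\gamma + x\,y_{\text{new}}$ clearly takes $O(p)$ time, and updating $\hat\Sigma \leftarrow \hat\Sigma + xx^T$ takes $O(p^2)$. Updating $\hat\Sigma^{-1}$ is trickier: one can use the Sherman-Morrison formula for the rank-one change, which also costs $O(p^2)$. So each new point can be absorbed in $O(p^2)$ time, compared to the $O(np^2+p^3)$ cost of refitting from scratch.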
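To make this concrete, here is a minimal sketch in Python with NumPy. The names `design`, `RunningOLS`, and `sherman_morrison_update` are made up for illustration, and the synthetic quadratic data is an assumption, not your actual setup. It accumulates $X^T X$ and $X^T y$ batch by batch, and separately shows the rank-one Sherman-Morrison update of the inverse.

```python
import numpy as np

def design(t, degree=2):
    """Polynomial design matrix with columns 1, t, t^2, ... (illustrative helper)."""
    return np.vander(np.asarray(t, dtype=float), degree + 1, increasing=True)

class RunningOLS:
    """Keeps X^T X and X^T y so new rows can be folded in without old data."""

    def __init__(self, p):
        self.xtx = np.zeros((p, p))
        self.xty = np.zeros(p)

    def update(self, X_new, y_new):
        # Adding k new rows costs O(k p^2); the old rows are never revisited.
        X_new = np.atleast_2d(X_new)
        y_new = np.atleast_1d(y_new)
        self.xtx += X_new.T @ X_new
        self.xty += X_new.T @ y_new

    def coef(self):
        # Solving the p x p normal equations costs O(p^3), independent of n.
        return np.linalg.solve(self.xtx, self.xty)

def sherman_morrison_update(inv_xtx, x):
    """Return (A + x x^T)^{-1} given A^{-1}, for symmetric A, in O(p^2)."""
    v = inv_xtx @ x
    return inv_xtx - np.outer(v, v) / (1.0 + x @ v)

# Synthetic stand-in for "a month, then a week" of (time, data) points.
rng = np.random.default_rng(0)
t_month = rng.uniform(0.0, 1.0, 200)
t_week = rng.uniform(0.0, 1.0, 50)
f = lambda t: 1.0 + 2.0 * t - 3.0 * t**2
y_month = f(t_month) + 0.1 * rng.standard_normal(t_month.size)
y_week = f(t_week) + 0.1 * rng.standard_normal(t_week.size)

# Incremental fit: month first, then fold in the week.
model = RunningOLS(p=3)
model.update(design(t_month), y_month)
model.update(design(t_week), y_week)
beta_inc = model.coef()

# Reference fit on all the data at once.
X_all = design(np.concatenate([t_month, t_week]))
y_all = np.concatenate([y_month, y_week])
beta_full = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

# Rank-one inverse updates, one new row at a time.
X_month = design(t_month)
inv_inc = np.linalg.inv(X_month.T @ X_month)
for x in design(t_week):
    inv_inc = sherman_morrison_update(inv_inc, x)
inv_full = np.linalg.inv(X_all.T @ X_all)
```

Note that this only needs the stored $p\times p$ and length-$p$ summaries, not the old polynomial coefficients themselves. If $X^T X$ is badly conditioned (high-degree polynomials in raw time often are), forming the normal equations can lose accuracy, so rescaling the time variable or using a QR-based update may be worth considering.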
Suppose you're using gradient descent or one of its variants to fit the OLS model. Then, assuming the model does not drift too much over time, you may want to warm start, that is, use the last fitted coefficients as the initial point for the new fit. Starting close to the new optimum typically means fewer iterations are needed to converge.
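A rough sketch of the warm-start idea, again in Python with NumPy. The function `gd_ols`, the step-size choice, the tolerance, and the synthetic linear data are all assumptions made for the example. It fits on the month, then refits on month plus week twice, once from zeros and once from the month's coefficients, and records the iteration counts.

```python
import numpy as np

def gd_ols(X, y, beta0, tol=1e-8, max_iter=100_000):
    """Plain gradient descent on the mean squared error (illustrative only).

    Returns the fitted coefficients and the number of iterations used.
    """
    n = len(y)
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    lr = 1.0 / np.linalg.eigvalsh(X.T @ X / n)[-1]
    beta = np.asarray(beta0, dtype=float).copy()
    for it in range(max_iter):
        grad = X.T @ (X @ beta - y) / n
        if np.linalg.norm(grad) < tol:
            return beta, it
        beta -= lr * grad
    return beta, max_iter

# Synthetic data: a straight line in time, centred on zero for good conditioning.
rng = np.random.default_rng(1)
t_month = rng.uniform(-1.0, 1.0, 200)
t_week = rng.uniform(-1.0, 1.0, 50)
y_month = 2.0 - 3.0 * t_month + 0.1 * rng.standard_normal(t_month.size)
y_week = 2.0 - 3.0 * t_week + 0.1 * rng.standard_normal(t_week.size)

X_month = np.column_stack([np.ones_like(t_month), t_month])
X_all = np.column_stack([np.ones(250), np.concatenate([t_month, t_week])])
y_all = np.concatenate([y_month, y_week])

# Old model, fitted on the month only.
beta_old, _ = gd_ols(X_month, y_month, np.zeros(2))

# New fit on month + week: cold start from zeros vs. warm start from beta_old.
beta_cold, cold_iters = gd_ols(X_all, y_all, np.zeros(2))
beta_warm, warm_iters = gd_ols(X_all, y_all, beta_old)
```

Both runs converge to the same solution; the warm start just begins closer to it. Each iteration still touches all the data, so this saves iterations rather than per-iteration cost, and the saving shrinks if the underlying relationship has shifted a lot since the last fit.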