Combine linear models of different sets of data.


I'm working with a large data set D that can be partitioned into disjoint subsets D1, D2, ..., Dn. For each subset Di, I have a linear model Mi that minimizes the residual error for the data in Di.

  1. Is there any way to create a combined model M from the parameters of the Mi's such that M minimizes the residual error for the data in D = D1 + D2 + ... + Dn? I don't want to recompute the model parameters from scratch because it is time-consuming.

  2. Assume I have the linear model M for some data set D, and another model M' for a subset D' of D. Is there any way to create a reduced model M'' from the parameters of M and M' such that M'' minimizes the residual error for the data in D'' = D - D'?

Any help would be appreciated!! Thanks!


Almost the same question was asked yesterday, so I shall basically repeat myself; this is a suggestion based on similar previous experience.

Say that for three independent sets A, B and C, you fitted linear regressions $Y=a_1 + b_1 X$, $Y=a_2 + b_2 X$ and $Y=a_3 + b_3 X$; now you want to consider the data coming from the union of A, B and C.

As a first step, I would perform a multilinear regression using the model
$$Y = a + b Z + c X + d X Z$$ introducing a variable Z equal to $1$ if the data point belongs to set A, $2$ if it belongs to set B, or $3$ if it belongs to set C. The standard analysis then applies to check whether parameters $b$ and $d$ are statistically significant. If they are, the three sets behave differently, and pooling them into a single regression would not be justified.
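This pooled regression with the indicator Z can be sketched as follows. The data here are synthetic and the coefficient values are illustrative; Z is used as a numeric label (1, 2, 3) exactly as described above.

```python
import numpy as np

# Hypothetical data: X values and set labels for points from A, B and C.
rng = np.random.default_rng(0)
n = 30
X = rng.uniform(0, 10, size=3 * n)
Z = np.repeat([1.0, 2.0, 3.0], n)  # 1 for set A, 2 for set B, 3 for set C

# Simulate Y from the pooled model Y = a + b*Z + c*X + d*X*Z + noise
# (illustrative coefficient values).
a, b, c, d = 2.0, 0.5, 1.5, -0.3
Y = a + b * Z + c * X + d * X * Z + rng.normal(0, 0.1, size=3 * n)

# Design matrix for the multilinear regression with the indicator Z.
A = np.column_stack([np.ones_like(X), Z, X, X * Z])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(coef)  # estimates of (a, b, c, d)
```

To decide whether $b$ and $d$ are statistically significant you would additionally need their standard errors (e.g. from `statsmodels.OLS`, which reports them directly); `lstsq` alone only gives the point estimates.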

I am afraid you cannot avoid reworking the matrices if you need anything more than the coefficients $a$ and $b$ of the regression (for instance, their standard errors).

If you just need the coefficients $a$ and $b$ of the overall regression, the normal equations allow you to reuse the sums already computed for each per-subset regression, so the combined coefficients come at almost no extra cost.
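A minimal sketch of that idea, assuming each subset's regression kept its normal-equation sums $A_i^T A_i$ and $A_i^T y_i$ (the subset names and helper function below are illustrative). Because these sums are additive over disjoint subsets, adding them answers question 1 and subtracting them answers question 2:

```python
import numpy as np

# For each subset we keep only XtX = A_i^T A_i and Xty = A_i^T y_i, where
# A_i is the design matrix [1, x] for subset i. These are exactly the sums
# already formed when solving each subset's normal equations.
rng = np.random.default_rng(1)

def suff_stats(x, y):
    A = np.column_stack([np.ones_like(x), x])
    return A.T @ A, A.T @ y

# Synthetic disjoint subsets sharing the line Y = 1 + 2 X (illustrative).
subsets = [rng.uniform(0, 10, size=20) for _ in range(3)]
ys = [1.0 + 2.0 * x + rng.normal(0, 0.1, size=x.size) for x in subsets]
stats = [suff_stats(x, y) for x, y in zip(subsets, ys)]

# Question 1 -- pooled fit: add the per-subset sums, then solve once.
XtX = sum(s[0] for s in stats)
Xty = sum(s[1] for s in stats)
a_b = np.linalg.solve(XtX, Xty)

# Question 2 -- reduced fit: subtract the sums of the removed subset.
XtX_red = XtX - stats[0][0]
Xty_red = Xty - stats[0][1]
a_b_red = np.linalg.solve(XtX_red, Xty_red)
```

Both solves cost only a small fixed-size linear system, regardless of how large the subsets are; no pass over the raw data is repeated.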