Given a dataset containing two populations, each of which is well described (high R$^2$) by a linear relationship between two variables, how does one separate the two populations (and, incidentally, compute the line fit for each)?
This is fairly easy to do graphically - just create a scatterplot and the two lines are pretty apparent. But how does one do this algorithmically?
More generally, given a dataset containing an unknown number $n$ of populations, each of which can be fit to a line with some lower bound on R$^2$ (e.g., 0.95), how does one separate the data into the minimum number of populations satisfying the R$^2$ criterion?
If I understand correctly, you have two relations, $Y = a_1 + b_1 X$ for one population and $Y = a_2 + b_2 X$ for a second population, and you would like to merge them into a single model. If that is right, build a model $Y = a + b X + c Z$ in which $Z$ is 1 for samples belonging to the first population, 2 for samples belonging to the second, and so on.
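As a rough illustration of the model above, here is a minimal sketch that fits $Y = a + b X + c Z$ by ordinary least squares with NumPy. It assumes the population label $Z$ of every sample is already known, which the model needs. The function name `fit_with_indicator` and the synthetic data are hypothetical and only serve the example. Note that this form shares one slope $b$ across populations and lets only the intercept shift with $Z$; capturing different slopes would need an additional $X \cdot Z$ term.

```python
import numpy as np


def fit_with_indicator(x, y, z):
    """Least-squares fit of y = a + b*x + c*z; returns (a, b, c).

    z is the population label (1, 2, ...) and is assumed to be known.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    # Design matrix with columns: intercept, x, population label.
    design = np.column_stack([np.ones_like(x), x, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return tuple(coef)


# Synthetic data (an assumption for illustration): two populations with a
# common slope of 2 and intercepts that differ by 3.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
z = np.repeat([1, 2], 100)
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(scale=0.1, size=200)

a, b, c = fit_with_indicator(x, y, z)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
```

With two populations, the fitted line for the first is $Y = (a + c) + b X$ and for the second $Y = (a + 2c) + b X$.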