We are working on a wrapper that clusters observations and fits a multivariate linear regression model within each cluster. The idea is to use a goodness-of-fit statistic such as $R^2$ to move observations to other clusters whose linear models give better $R^2$ values.
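To make the idea concrete, here is a minimal sketch of that reassignment loop in Python with NumPy. It is an illustration under assumptions, not our actual implementation. The function names (`fit_ols`, `r_squared`, `clusterwise_regression`) and the parameters `k`, `n_iter` and `seed` are hypothetical. Because $R^2$ is a property of a whole cluster rather than of a single observation, the sketch swaps in a per-observation criterion: each point moves to the cluster whose current model gives it the smallest squared residual. $R^2$ is then reported per cluster afterwards.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def r_squared(X, y, beta):
    """Coefficient of determination of a fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    ss_res = np.sum((y - A @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def clusterwise_regression(X, y, k=2, n_iter=50, seed=0):
    """Alternate between fitting one OLS model per cluster and reassigning points.

    Assumption: points are reassigned by smallest squared residual,
    not by R^2 itself, since R^2 is not defined for a single observation.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    labels = rng.integers(0, k, size=n)  # random initial partition

    for _ in range(n_iter):
        betas = []
        for j in range(k):
            mask = labels == j
            # Hypothetical guard: if a cluster is too small to fit,
            # refit it on a random half of the data instead.
            if mask.sum() < p + 1:
                mask = rng.random(n) < 0.5
            betas.append(fit_ols(X[mask], y[mask]))
        B = np.array(betas)  # shape (k, p + 1)

        # Squared residual of every point under every cluster's model.
        sq_resid = (y[:, None] - A @ B.T) ** 2  # shape (n, k)
        new_labels = sq_resid.argmin(axis=1)

        if np.array_equal(new_labels, labels):
            break  # no observation moved
        labels = new_labels

    return labels, B


if __name__ == "__main__":
    # Synthetic data from two different lines (made up for illustration).
    rng = np.random.default_rng(1)
    x = rng.uniform(0, 5, size=200)
    group = rng.integers(0, 2, size=200)
    y = np.where(group == 0, 2 * x + 1, -3 * x + 10) + rng.normal(0, 0.1, size=200)
    X = x[:, None]

    labels, B = clusterwise_regression(X, y, k=2)
    for j in range(2):
        mask = labels == j
        if mask.sum() > 2:
            print(j, mask.sum(), r_squared(X[mask], y[mask], B[j]))
```

Note that in a loop like this the per-cluster $R^2$ values are computed on clusters that were chosen to fit well, so they will tend to look good even on data with no real cluster structure.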
We can't find much on using $R^2$, or other statistics of linear models, to cluster data.
My question is whether we have missed something in fundamental statistics that makes this idea terrible.