Correlation, Linear Regression, and Minimizing Cost

47 Views Asked by Bumbble Comm At 26 Mar 2026 - 5:28

If $X$ and $Y$ are random variables with correlation coefficient $\rho$, then linear regression tells us that, if we wish to minimize the mean squared error, the best linear approximation $\hat{Y}$ (resp. $\hat{X}$) of $Y$ (resp. $X$) as a function of $X$ (resp. $Y$) satisfies $$\frac{\hat{Y}-\mu_Y}{\sigma_Y} = \rho\frac{X-\mu_X}{\sigma_X},$$ $$\frac{\hat{X}-\mu_X}{\sigma_X} = \rho\frac{Y-\mu_Y}{\sigma_Y}.$$
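As a quick numerical sanity check of the two formulas above, here is a minimal sketch using NumPy on simulated data. The sample size, the coefficients `0.6` and `0.8`, and the helper names `standardize` and `ols_slope` are illustrative assumptions, not part of the question.

```python
import numpy as np

# Simulated data (assumed for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)

def standardize(v):
    """Subtract the sample mean and divide by the sample standard deviation."""
    return (v - v.mean()) / v.std()

def ols_slope(u, v):
    """Slope of the least-squares line predicting v from u (with intercept)."""
    return np.cov(u, v, ddof=0)[0, 1] / np.var(u)

zx, zy = standardize(x), standardize(y)
rho = np.corrcoef(x, y)[0, 1]

slope_y_on_x = ols_slope(zx, zy)  # regress Y on X
slope_x_on_y = ols_slope(zy, zx)  # regress X on Y

print(rho, slope_y_on_x, slope_x_on_y)
```

Both slopes should agree with the sample correlation up to floating-point error, so their product is $\rho^2$ rather than $1$.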

I want to highlight that in BOTH cases the slope (as a function of the normalized variables) is $\rho$, which reflects the fact that $\rho$ is a symmetric function of $X$ and $Y$. Now, if someone were to ask me for the equation of the line of best fit between $X$ and $Y$, WITHOUT mentioning that we want to minimize mean squared error, my naive guess would be that whatever slope the line of best fit has for $Y$ vs $X$, the line of best fit for $X$ vs $Y$ should have the reciprocal slope, that is (assuming $X$ and $Y$ are normalized for simplicity): $$\hat{Y} = aX \quad \iff \quad \hat{X} = \frac{1}{a}Y,$$ with the case $a=0$ left undetermined.

This was my first naive attempt, but after some reflection I realized why this need not be the case, both on a conceptual level and on an algebraic level. Basically, my confusion rested on the following logic: $$(\hat{Y} = \rho X \quad \& \quad \hat{Y} \approx Y) \implies Y \approx \rho X \iff X \approx \frac{1}{\rho}Y \implies \hat{X} = \frac{1}{\rho}Y,$$ and written this way it is clear that the mistake is at the $\hat{Y} \approx Y$ step: $\hat{Y}$ is only the best linear predictor of $Y$ given $X$, and it is close to $Y$ in mean square only when $|\rho|$ is close to $1$.
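To make the algebraic side of this concrete, here is a short worked sketch. It assumes $X$ and $Y$ are normalized (zero mean, unit variance) and $\rho \neq 0$, and it reuses $a$ for a candidate slope. The mean squared error of predicting $Y$ by $aX$ is $$E\big[(Y - aX)^2\big] = 1 - 2a\rho + a^2,$$ which is minimized at $a = \rho$ with minimum value $1 - \rho^2$. By symmetry, the best linear predictor of $X$ from $Y$ is $\rho Y$, with the same error. The naively inverted line $X \approx \frac{1}{\rho}Y$ instead gives $$E\Big[\big(X - \tfrac{1}{\rho}Y\big)^2\Big] = 1 - 2 + \frac{1}{\rho^2} = \frac{1-\rho^2}{\rho^2} \;\ge\; 1 - \rho^2 = E\big[(X - \rho Y)^2\big],$$ with equality only when $|\rho| = 1$. So the reciprocal slope is a valid line, but it is not the one that minimizes the mean squared error in $X$.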

Now, I'm saying all of this to finally get to the following question: is there a (natural, useful) way of defining $\hat{X}$ and $\hat{Y}$ so that they do satisfy the naive logic I presented? Perhaps by changing the cost function from mean squared error to something else, and/or by replacing the correlation coefficient by some other quantity $\rho'$ satisfying $\rho'(X,Y) = \frac{1}{\rho'(Y,X)}$?

$\textbf{tldr:}$ Can we reformulate linear regression to make $\hat{Y} = \rho X \iff \hat{X} = \frac{1}{\rho}Y$?


There is 1 best solution below

Bumbble Comm On 02 Oct 2022 - 6:09 BEST ANSWER

Changing the cost function from the sum of the squared vertical distances to the sum of the squared perpendicular distances makes the problem symmetric in the way you describe.

This is generally known as Total least squares. See the discussion on Cross Validated here.
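A minimal sketch of this idea, assuming the perpendicular-distance line is computed from the first right singular vector of the centered data (one common way to obtain a total least squares fit in two dimensions). The helper name `tls_slope` and the simulated data are illustrative assumptions.

```python
import numpy as np

def tls_slope(u, v):
    """Slope (dv/du) of the line through the centroid that minimizes
    the sum of squared perpendicular distances to the points (u, v)."""
    data = np.column_stack([u - u.mean(), v - v.mean()])
    # The first right singular vector is the direction of largest spread,
    # which is the direction of the perpendicular-distance best-fit line.
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    du, dv = vt[0]
    return dv / du

# Simulated data (assumed for illustration).
rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
y = 0.5 * x + rng.normal(size=5_000)

slope_yx = tls_slope(x, y)  # Y as a function of X
slope_xy = tls_slope(y, x)  # X as a function of Y

print(slope_yx, slope_xy, slope_yx * slope_xy)
```

Because swapping the roles of the two variables only swaps the coordinates of the fitted direction, the two slopes are reciprocals of each other. Note that this fit depends on the units of the variables: if both are first standardized, the fitted direction is $(1, \pm 1)$, so the slope is the sign of $\rho$ rather than $\rho$ itself.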
