Let's consider two variables, $x$ and $y$. Both are binary, and $x$ is always 0 when $y$ is 1, and vice versa. Now I set out to run a linear regression between them. I know the formula is
$$\beta = (x^Tx)^{-1} (x^Ty)$$
However, $x^Ty = 0$, which means $\beta$ must be 0. Yet the coefficient of linear correlation is $-1$ (which should be the same as the regression coefficient here). What am I missing?
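For concreteness, here is a minimal numeric check of the computation described above; the four-point sample (two observations with $(x,y)=(0,1)$ and two with $(1,0)$) is made up purely for illustration:

```python
import numpy as np

# Hypothetical data consistent with the setup: x is 0 exactly when y is 1.
x = np.array([0., 0., 1., 1.])
y = np.array([1., 1., 0., 0.])

# The no-intercept formula as written: beta = (x^T x)^{-1} (x^T y)
beta_no_intercept = (x @ y) / (x @ x)
print(beta_no_intercept)          # 0.0, since x^T y = 0

# The sample correlation, by contrast, is -1
print(np.corrcoef(x, y)[0, 1])    # -1.0
```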
First off, $x$ needs to contain a column of all ones in addition to the column of zeros and ones. This is because your intuition that the line should be $y=1-x$ applies to a regression with an intercept, and an intercept means there is a column of all ones in the $x$ matrix. With that design matrix it is no longer true that $x^Ty=0$: its second component is zero, but its first (the sum of the $y$'s) is not.
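Here is a rough sketch of that, reusing the made-up four-point sample from the question and stacking a column of ones next to $x$:

```python
import numpy as np

# Same hypothetical data as above, now with an intercept column of ones.
x = np.array([0., 0., 1., 1.])
y = np.array([1., 1., 0., 0.])
X = np.column_stack([np.ones_like(x), x])    # design matrix [1, x]

print(X.T @ y)                                # [2. 0.]: first component is nonzero
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)                                   # [1. -1.], i.e. the line y = 1 - x
```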
Still, wouldn't your argument apply to a regression with no intercept? Surely a slope of zero can't be the right answer there? Actually, it is. Observe that in this case the error at every point $(0,1)$ is fixed at one, since the regression line is constrained to pass through the origin. So to minimize the total squared error we should choose the slope to minimize the error at the $(1,0)$ points, and a slope of zero does exactly that! (This is one illustration of how regressions with no intercept can behave oddly.)
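And a quick check of this no-intercept claim, again with the hypothetical four-point sample: for a line $y=\beta x$ the total squared error works out to $2 + 2\beta^2$, which is smallest at $\beta = 0$:

```python
import numpy as np

# Still the same made-up data; sum of squared errors for lines y = b*x.
x = np.array([0., 0., 1., 1.])
y = np.array([1., 1., 0., 0.])

for b in [-1.0, -0.5, 0.0, 0.5]:
    sse = np.sum((y - b * x) ** 2)
    print(b, sse)   # the (0,1) points always contribute 1 each; b = 0 is best
```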