Multivariable calc problem implies that you can't draw a least squares line when variance is zero?


There is a discussion here about least squares fit and multivariable calculus:

Given a set of points $\left(x_i,y_i\right)$, and a vertical distance $d_i=y_i-\left(ax_i+b\right)$ from each point to some line $y=ax+b$, we want to find the $a$ and $b$ that minimise:

$$D=\sum_{i}{d_i^2}=\sum_{i}\left(y_i-\left(ax_i+b\right)\right)^2=\sum_{i}\left(y_i^2-2ax_iy_i-2by_i+a^2x_i^2+2abx_i+b^2\right)$$
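As a quick sanity check on the expansion, here is a small Python sketch that compares $(y-(ax+b))^2$ with the six-term expanded form at random values. The function names are purely illustrative and not from any library.

```python
import random

def squared_residual(x, y, a, b):
    # Direct form of one term: (y - (a*x + b))^2
    return (y - (a * x + b)) ** 2

def expanded_residual(x, y, a, b):
    # Expanded form, term by term: y^2 - 2axy - 2by + a^2x^2 + 2abx + b^2
    return y**2 - 2*a*x*y - 2*b*y + a**2 * x**2 + 2*a*b*x + b**2

random.seed(0)
for _ in range(1000):
    x, y, a, b = (random.uniform(-10, 10) for _ in range(4))
    # Agreement up to floating-point rounding
    assert abs(squared_residual(x, y, a, b) - expanded_residual(x, y, a, b)) < 1e-8
print("expansion agrees")
```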

To minimise $D$ with respect to $a$ and $b$, we set both partial derivatives to zero:

$$\frac{\partial D}{\partial a}=\sum_{i}\left(-2x_iy_i+2ax_i^2+2bx_i\right)=0$$

and

$$\frac{\partial D}{\partial b}=\sum_{i}\left(-2y_i+2ax_i+2b\right)=0$$
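To double-check the two partial derivatives, here is a hedged numerical sketch that compares the formulas above against central finite differences of $D$. The sample points and the trial values of $a$ and $b$ are made up purely for illustration, and the helper names (`D`, `dD_da`, `dD_db`, `central_diff`) are hypothetical.

```python
def D(xs, ys, a, b):
    # Sum of squared vertical distances d_i = y_i - (a*x_i + b)
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

def dD_da(xs, ys, a, b):
    # Formula from the question: sum(-2 x_i y_i + 2 a x_i^2 + 2 b x_i)
    return sum(-2*x*y + 2*a*x**2 + 2*b*x for x, y in zip(xs, ys))

def dD_db(xs, ys, a, b):
    # Formula from the question: sum(-2 y_i + 2 a x_i + 2 b)
    return sum(-2*y + 2*a*x + 2*b for x, y in zip(xs, ys))

def central_diff(f, t, h=1e-5):
    # Symmetric difference quotient approximating f'(t)
    return (f(t + h) - f(t - h)) / (2 * h)

# Hypothetical sample data and trial parameters
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.2, 6.8]
a, b = 0.7, -0.3

num_a = central_diff(lambda t: D(xs, ys, t, b), a)
num_b = central_diff(lambda t: D(xs, ys, a, t), b)
print(num_a, dD_da(xs, ys, a, b))
print(num_b, dD_db(xs, ys, a, b))
```

Because $D$ is quadratic in each parameter, the central difference should match the formula up to rounding error.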

Dividing through by $2$, moving the negative terms to the other side, and using $\sum_i b = nb$ for the constant term, you get:

$$ \begin{pmatrix} \sum{x_i^2} & \sum{x_i} \\ \sum{x_i} & n \\ \end{pmatrix} \cdot \begin{pmatrix} a \\ b \\ \end{pmatrix}=\begin{pmatrix} \sum{x_iy_i} \\ \sum{y_i} \\ \end{pmatrix}$$
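When this matrix is invertible, the $2\times 2$ system can be solved directly with Cramer's rule. A minimal sketch, assuming plain Python lists of numbers; `solve_normal_equations` is a hypothetical helper name, and the exact `det == 0` test is a simplification (with real data a tolerance would be more sensible).

```python
def solve_normal_equations(xs, ys):
    # Entries of the matrix [[sum x^2, sum x], [sum x, n]] and the right-hand side
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sx = sum(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("matrix is singular: no unique (a, b)")
    # Cramer's rule for [[sxx, sx], [sx, n]] (a, b)^T = (sxy, sy)^T
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b
```

For points lying exactly on $y=2x+1$, e.g. `solve_normal_equations([0, 1, 2, 3], [1, 3, 5, 7])`, it recovers $a=2$, $b=1$, while `solve_normal_equations([2.0, 2.0, 2.0], [1.0, 2.0, 3.0])` raises because the matrix is singular.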

My question is about the matrix on the left; call it $M$. From what I understand, $M$ is not invertible (and hence the system has no unique solution) exactly when $\det\left(M\right)=0$, right? For a $2\times 2$ matrix that happens when the product of the diagonal entries equals the product of the off-diagonal entries, i.e. when

$$n\sum_i{x_i^2}=\left(\sum_i{x_i}\right)^2\implies n\sum_i{x_i^2}=n^2\mu^2 \implies \sum_i{x_i^2}=n\mu^2 \implies \frac{1}{n}\sum_i{x_i^2}-\mu^2=0$$

where $\mu=\frac{1}{n}\sum_i x_i$ is the mean of the $x_i$.

But the left-hand side of the last equation is just the (population) variance $\sigma^2$ of the $x_i$, right? So is the conclusion that you can't draw a least squares line when the variance is zero?
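A small numerical check of the identity $\det(M)=n\sum_i x_i^2-\left(\sum_i x_i\right)^2=n^2\sigma^2$, using Python's `statistics.pvariance` (which divides by $n$, matching the $\sigma^2$ above). The two sample lists are made up, and `det_M` is a hypothetical helper; the second list has all $x_i$ equal, so both quantities should come out as zero.

```python
import statistics

def det_M(xs):
    # det [[sum x_i^2, sum x_i], [sum x_i, n]] = n * sum x_i^2 - (sum x_i)^2
    n = len(xs)
    return n * sum(x * x for x in xs) - sum(xs) ** 2

for xs in ([1.0, 2.0, 4.0, 7.0], [3.0, 3.0, 3.0]):
    n = len(xs)
    # pvariance divides by n, matching sigma^2 = (1/n) sum x_i^2 - mu^2
    print(xs, det_M(xs), n**2 * statistics.pvariance(xs))
```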

Have I interpreted this correctly? I would have thought that zero variance would somehow be the easiest case to model (naively, I might have expected infinite variance to cause problems instead). Have I misunderstood something here?

Thanks