My notes define the error of least squares approximation as:
$$ E=\sum_{i=1}^n(y_i-f(x_i))^2\tag1 $$
Which, for a straight line gives:
$$ f(x_i)=a+bx_i\tag2$$
$$ E=\sum_{i=1}^n(y_i-(a+bx_i))^2\tag3 $$
Which makes sense to me. The notes then claim that to minimise the error, make the derivatives of $E$ w.r.t $a$ and $b$ equal to zero. Why is this?
I think of the derivative of a function as the rate of change of that function. But why would a zero rate of change (a constant error?) necessarily give the smallest error?
Thanks
The error is not constant. You take the function $$E(a,b) = \sum_{i=1}^n(y_i-(a+bx_i))^2$$ and find the minimum of $E(a,b)$. At the minimum, the derivatives of $E$ with respect to both $a$ and $b$ are equal to $0$, because at every local minimum of a differentiable function the derivative is $0$.
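Concretely, taking the partial derivatives of $E$ from (3) and setting them to zero gives (a sketch of the computation, using the chain rule on each squared term):
$$\frac{\partial E}{\partial a}=-2\sum_{i=1}^n\bigl(y_i-(a+bx_i)\bigr)=0,\qquad \frac{\partial E}{\partial b}=-2\sum_{i=1}^n x_i\bigl(y_i-(a+bx_i)\bigr)=0$$
These are two linear equations in the unknowns $a$ and $b$ (often called the normal equations), so they have a unique solution whenever the $x_i$ are not all equal.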
The intuition behind it is this: if the derivative is not $0$ at a point, then the function is changing at that point, which means it is decreasing in some direction, so that point cannot be a minimum. At a minimum, the function goes from falling (where the derivative is negative) to rising (where it is positive), so the derivative passes from negative to positive and must equal $0$ there. All points where the derivative is $0$ are candidates for a minimum or a maximum. In this case, the single pair $(a,b)$ satisfying both equations is the only candidate, and since you know a minimum exists (while a maximum does not, because $E$ grows without bound), that candidate must necessarily be the minimum.
This does not mean that the value of the error is constant: the derivative of $E$ is not $0$ everywhere, it is only $0$ at the particular values of $a$ and $b$ that minimise the error.
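To see this numerically, here is a short sketch (with made-up data, so the specific numbers are just for illustration) that solves the two zero-derivative equations for $a$ and $b$ and then checks that both partial derivatives of $E$ really do vanish at that point:

```python
import numpy as np

# Hypothetical data points (x_i, y_i) lying roughly on a line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
n = len(x)

# Setting dE/da = 0 and dE/db = 0 gives two linear equations
# (the "normal equations"):
#   n*a          + (sum x_i)*b   = sum y_i
#   (sum x_i)*a  + (sum x_i^2)*b = sum x_i*y_i
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)

# Both partial derivatives of E should be (numerically) zero at (a, b):
dE_da = -2 * np.sum(y - (a + b * x))
dE_db = -2 * np.sum(x * (y - (a + b * x)))
print(a, b, dE_da, dE_db)
```

At any other choice of $a$ and $b$ the same two expressions are nonzero, which is exactly the sense in which the error is not constant.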