Let's assume that we have the risk function $J(a)=\|Ya-b\|^2$. If we take the gradient of this function and set it to zero, we get (dropping the constant factor of $2$): $Y^T(Ya-b)=0$.
Now the algorithm is
$a(1)$ arbitrary
$a(k+1)=a(k)-\eta(k)Y^T(Ya(k)-b)$, with learning rate $\eta(k)=1/k$
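As a sanity check, this iteration is easy to run numerically. Here is a minimal sketch (the particular $Y$, $b$, the step scale $\eta(1)=0.5$, and the iteration count are my own choices for illustration, not from the question):

```python
import numpy as np

# A small overdetermined system (example data chosen for illustration).
Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.5])

a = np.zeros(2)   # a(1) arbitrary
eta1 = 0.5        # scale chosen so that eta(1) * lambda_max(Y^T Y) < 2

# a(k+1) = a(k) - eta(k) * Y^T (Y a(k) - b), with eta(k) = eta1 / k
for k in range(1, 100001):
    a = a - (eta1 / k) * Y.T @ (Y @ a - b)

grad = Y.T @ (Y @ a - b)
print(np.linalg.norm(grad))  # small in this run
```

In this particular example the gradient norm keeps shrinking as $k$ grows, which is at least consistent with $Y^T(Ya-b)=0$ holding in the limit; of course, one numerical run proves nothing about the general case.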
So if we assume that $\lim\limits_{k\to\infty}a(k)=a$ (which, according to Duda and Hart's book, does not always hold), we can claim that:
$$\lim_{k\to\infty}\eta(k)\,Y^T(Ya(k)-b)=0$$
or
$$\lim_{k\to\infty}\frac{1}{k}\,Y^T(Ya(k)-b)=0$$
But this doesn't imply that $Y^T(Ya-b)=0$: since $a(k)\to a$, the factor $Y^T(Ya(k)-b)$ converges to the constant vector $Y^T(Ya-b)$, and dividing *any* constant vector by $k$ already sends the limit to zero. So $Y^T(Ya-b)$ can be any constant vector. Isn't that right? In other words, we can have a nonzero mean squared error when the learning rate isn't constant.
Is it possible to prove that $Y^T(Ya-b)=0$ for $\eta(k)=1/k$? This is the question.