A question about the perceptron gradient descent MSE algorithm.


Let's assume that we have the risk function $J(a)=\|Ya-b\|^2$. Taking the gradient of this function and setting it to zero, we get $Y^T(Ya-b)=0$.
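For completeness, expanding the gradient term by term (a standard matrix-calculus identity):

$$\nabla_a J(a)=\nabla_a\,(Ya-b)^T(Ya-b)=2Y^T(Ya-b),$$

so $\nabla_a J(a)=0$ is equivalent to $Y^T(Ya-b)=0$ (the constant factor 2 does not affect the root, and is typically absorbed into the learning rate).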

Now the algorithm is

$a(1)$ arbitrary

$a(k+1)=a(k)-η(k)Y^T(Ya(k)-b)$, with learning rate $η(k)=1/k$
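As a numerical sanity check, here is a minimal sketch of this iteration; the matrix $Y$, the vector $b$, and the scaling factor are arbitrary choices for illustration (not from Duda and Hart). The scaling keeps the eigenvalues of $Y^TY$ below 1, so the early steps with $\eta(1)=1$ stay stable:

```python
import numpy as np

# Hypothetical toy data: Y is scaled so the eigenvalues of Y^T Y are
# below 1, which keeps the first iterations (eta(1) = 1) from diverging.
rng = np.random.default_rng(0)
Y = 0.12 * rng.standard_normal((20, 3))
b = rng.standard_normal(20)

a = np.zeros(3)                      # a(1) arbitrary
grad0 = Y.T @ (Y @ a - b)            # initial gradient direction
for k in range(1, 100001):
    grad = Y.T @ (Y @ a - b)         # Y^T (Y a(k) - b)
    a = a - (1.0 / k) * grad         # a(k+1) = a(k) - eta(k) * grad, eta(k) = 1/k

print(np.linalg.norm(grad0), np.linalg.norm(Y.T @ (Y @ a - b)))
```

In runs like this the gradient norm keeps shrinking rather than stalling at a nonzero constant, though the decay under $\eta(k)=1/k$ is slow (a power of $1/k$ governed by the smallest eigenvalue of $Y^TY$).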

So if we assume that $\lim\limits_{k\to\infty}a(k)=a$ (which does not always hold, according to Duda and Hart's book), we can claim that:

$$\lim_{k\to\infty}( η(k)Y^T(Ya(k)-b) )=0$$

or

$$\lim_{k\to\infty}( Y^T(Ya(k)-b)/k )=0$$

So this does not imply that $Y^T(Ya-b)=0$: the limit above holds even if $Y^T(Ya-b)$ is any constant nonzero vector, since a constant divided by $k$ also tends to zero. Isn't that right? So we could end up with a nonzero mean squared error when the learning rate is not constant.
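A sketch of how this gap might be closed, under the added assumption (not stated above) that $Y^TY$ is nonsingular: writing $a^*=(Y^TY)^{-1}Y^Tb$ for the least-squares solution, so that $Y^Tb=Y^TY a^*$, the update can be rewritten as

$$a(k+1)-a^*=\left(I-\tfrac{1}{k}\,Y^TY\right)\left(a(k)-a^*\right).$$

Each eigencomponent of the error is therefore multiplied by $1-\lambda/k$ at step $k$, where $\lambda>0$ is the corresponding eigenvalue of $Y^TY$. For $k>\lambda$ we have $|1-\lambda/k|<1$, and because the harmonic series $\sum_k \lambda/k$ diverges, the product $\prod_k |1-\lambda/k|$ tends to $0$. That divergence is exactly what rules out the "any constant vector" scenario: the error would only fail to vanish if $\sum_k \eta(k)$ were finite.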

Is it possible to prove that $Y^T(Ya-b)=0$ for $\eta(k)=1/k$? This is the question.