choosing optimal scalar in gradient descent algorithm


One iteration of gradient descent (for finding the minimum of a function) moves $x^{(k)}$ in the direction of steepest descent, by an amount $\alpha_k$ chosen so that $\phi(x^{(k+1)})$ is minimal, where $\phi$ is the function we're minimizing.

$\phi(x)={1\over2}x^TAx-x^Tb$, $x^{(k+1)}=x^{(k)}+\alpha_kp^{(k)}$, where $p^{(k)}=-\nabla\phi(x^{(k)})$ is the direction of steepest descent.
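For concreteness, one such iteration can be sketched in code. The matrix $A$ and vector $b$ below are made-up illustrative values (the source gives no concrete data), and the step size uses the formula from the notes, $\alpha_k = \langle p^{(k)},p^{(k)}\rangle / \langle p^{(k)},Ap^{(k)}\rangle$:

```python
import numpy as np

# Illustrative symmetric positive definite A and vector b (made-up values)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

def phi(x):
    # phi(x) = 1/2 x^T A x - x^T b
    return 0.5 * x @ A @ x - x @ b

x = np.zeros(2)                  # x^{(0)}
p = b - A @ x                    # p^{(k)} = -grad phi(x^{(k)}) = b - A x^{(k)}
alpha = (p @ p) / (p @ A @ p)    # optimal step size (formula from the notes)
x_new = x + alpha * p            # x^{(k+1)} = x^{(k)} + alpha_k p^{(k)}

print(phi(x), phi(x_new))        # phi strictly decreases along a descent direction
```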

My question is about the $\alpha$.

$\alpha_k$ is such that (and since we chose a descent direction, this stationary point is in fact a minimum):

${d\over d\alpha}\phi(x^{(k)}+\alpha p^{(k)})=0\iff{d\over d\alpha}[{1\over2}\alpha^2p^{(k)^T}Ap^{(k)}-\alpha p^{(k)^T}b]=0$

$\iff\alpha\langle p^{(k)},Ap^{(k)}\rangle -\langle p^{(k)},b\rangle=0$, where $\langle\cdot,\cdot\rangle$ denotes the scalar product.

How is that equivalent to what I have in my notes:

$\alpha={\langle p^{(k)},p^{(k)}\rangle\over\langle p^{(k)},Ap^{(k)}\rangle }\iff\alpha\langle p^{(k)},Ap^{(k)}\rangle - \langle p^{(k)},p^{(k)}\rangle=0$ ?

I wrote $\alpha$, but it is of course the optimal $\alpha_k$ at each iteration.