Optimal step size for gradient descent on quadratic function


First, I have searched this forum but could not find a question matching mine, though some are somewhat similar.

My issue is whether or not the sign matters when I calculate the optimal step size for the gradient descent method.

For gradient descent, the update is given by eqn 1, $$x_{k+1} = x_{k} - \alpha\nabla f(x_{k}),$$ and for gradient ascent, by eqn 2, $$x_{k+1} = x_{k} + \alpha\nabla f(x_{k}),$$

where $\alpha$ is the step size.

Using exact line search on a quadratic function, I get two different signs for the optimal step size depending on whether I use eqn 1 or eqn 2.

Consider a quadratic given by $$f(X)=\frac{1}{2} X^TQX+B^TX + C,$$ where $Q$ is positive definite.

Then I end up with $$\alpha^*=-\frac{(\nabla f(x_{k}))^T\nabla f(x_{k})}{(\nabla f(x_{k}))^TQ\nabla f(x_{k})}$$ when using eqn 1, and $$\alpha^*=\frac{(\nabla f(x_{k}))^T\nabla f(x_{k})}{(\nabla f(x_{k}))^TQ\nabla f(x_{k})}$$ when using eqn 2.
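To make this concrete, here is a small numerical sketch (the specific $Q$, $B$, and starting point are made up for illustration) that computes the exact line-search step for the descent update of eqn 1 and checks that it is positive and decreases $f$:

```python
import numpy as np

# Hypothetical positive definite quadratic (numbers made up for illustration)
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric, both eigenvalues > 0
B = np.array([1.0, -1.0])

def f(x):
    # f(x) = 0.5 x^T Q x + B^T x  (the constant C does not affect alpha*)
    return 0.5 * x @ Q @ x + B @ x

def grad_f(x):
    # gradient of the quadratic: Q x + B
    return Q @ x + B

x = np.array([2.0, 2.0])
g = grad_f(x)

# Exact line search along the descent direction -g (eqn 1):
# minimizing f(x - alpha * g) over alpha gives alpha* = (g^T g) / (g^T Q g),
# which is positive since Q is positive definite.
alpha = (g @ g) / (g @ Q @ g)
x_new = x - alpha * g

print(alpha > 0)          # → True
print(f(x_new) < f(x))    # → True
```

So with the descent update of eqn 1 substituted directly, the step size that comes out is positive.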

MY QUESTION: Does the sign matter? Why do I end up with two different signs? Should I just take the absolute value of the step size? Then, if for example I am using gradient descent, I can substitute the absolute value of the optimal step size into eqn 1, $x_{k+1} = x_{k} - \alpha\nabla f(x_{k})$.

Please help!



Say you are at iteration $k$ with iterate $x_{k}$. You define your descent step $x_{k+1} = x_{k} - \alpha\nabla f(x_{k}),$ with $\alpha > 0$ (otherwise, it would be an ascent step). Substituting this into $f(X)=\frac{1}{2} X^TQX+B^TX + C$, you get a second-order polynomial in $\alpha$, say $g(\alpha)$. Since $Q$ is positive definite, the leading coefficient of $g$ is positive, so the minimum of $g$ is reached where $g'(\alpha) = 0,$ which, from your calculation, gives $$\alpha^*=\frac{(\nabla f(x_{k}))^T\nabla f(x_{k})}{(\nabla f(x_{k}))^TQ\nabla f(x_{k})}.$$ As expected, $\alpha^*>0.$
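As a sanity check of this argument, the sketch below (with a made-up positive definite $Q$, $B$, and iterate) evaluates $g(\alpha) = f(x_{k} - \alpha\nabla f(x_{k}))$ over a range of step sizes and verifies that $\alpha^*$ is positive and minimizes $g$ along the line:

```python
import numpy as np

# Hypothetical positive definite quadratic (numbers made up for illustration)
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
B = np.array([-2.0, 1.0])

def f(x):
    # f(x) = 0.5 x^T Q x + B^T x  (the constant C does not affect alpha*)
    return 0.5 * x @ Q @ x + B @ x

x = np.array([1.0, -1.0])
g = Q @ x + B                 # gradient at x_k

# g(alpha) = f(x - alpha * g) is a quadratic in alpha with positive leading
# coefficient g^T Q g, so setting g'(alpha) = 0 gives its minimum:
alpha_star = (g @ g) / (g @ Q @ g)

# alpha_star should beat every other step size along the same direction
alphas = np.linspace(0.0, 2 * alpha_star, 101)
values = [f(x - a * g) for a in alphas]

assert alpha_star > 0
assert f(x - alpha_star * g) <= min(values) + 1e-12
```

In short: with the descent update of eqn 1, the exact line-search step comes out positive, so there is no need to take an absolute value.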