Is it possible to "derive" gradient descent using Lagrangian multipliers?

131 Views Asked by At

I'm thinking about this in 1-dimension but would have thought this will generalise fairly trivially to higher dimensions.

If we're currently at $x=x_{1}$, and we have relatively easy access to $f(x_{1})$ and $f'(x_{1})$, we might argue that

$f(x_{1} + \epsilon)\simeq f(x_{1}) + f'(x_{1})\epsilon $

provided $\epsilon ^{2} < \delta$

Thus I would hoping to derive that the $\epsilon$ which max/minimises $f(x_{1}+\epsilon)$ wrt $\epsilon$, subject to the constraint $\epsilon ^{2} <\delta$, is given by the well-known gradient-descent/ascent answer which would be $\epsilon = \pm\delta \frac{f'(x_{1})}{|f'(x_{1})|}$

I was hoping this would drop out by differentiating

$\frac{\partial}{\partial \epsilon} \left(f(x_{1}) + f'(x_{1})\epsilon -\lambda (\epsilon ^{2} - \delta)\right)$

But I just get that $f'(x_{1})-2\lambda \epsilon=0$ and I'm not sure how eliminate $\lambda$.

I assume that two solutions should drop out, the maximum increase and decrease.

The above is trying to derive an intuitive and well-known result, but what about if I had relatively easy access to higher derivatives, then the result is not so obvious. I would have thought that if I could solve the above, then I could take this one step further and minimise

$f(x_{1}) + f'(x_{1})\epsilon + \frac{1}{2!}f''(x_{1})\epsilon^{2}$

subject to $|\epsilon ^{3}| < \delta$

but I'll run into the same problem of a solution for $\epsilon$ in terms of a $\lambda$ that I don't know how to eliminate.

Am I missing something in the calculation or is this method flawed?

1

There are 1 best solutions below

2
On

Hint.

Considering $\epsilon^2<\delta$ as $\epsilon^2-\delta+s^2=0$ we have

$$ L(\epsilon,\lambda,s) = f'\epsilon -\lambda(\epsilon^2-\delta+s^2) $$

then

$$ \nabla L = 0 = \cases{f'-2\lambda\epsilon\\ \epsilon^2-\delta+s^2\\ \lambda s\\ } $$

but $\epsilon = \frac 12\frac{f'}{\lambda}$ and after substitution onto the restriction we have

$$ \left(\frac 12\frac{f'}{\lambda}\right)^2-\delta+s^2=0 $$

so we have

$$ \lambda = \pm\frac 12\frac{|f'|}{\sqrt{\delta-s^2}} $$

hence we have

$$ f'\mp \frac{|f'|}{\sqrt{\delta-s^2}}\epsilon = 0 $$

or

$$ \epsilon = \pm \sqrt{\delta-s^2}\frac{f'}{|f'|} $$

now if the restriction is actuating ($s=0$) we have

$$ \epsilon = \pm \sqrt{\delta}\frac{f'}{|f'|} $$