Deriving the gradient of the Augmented Lagrangian dual


Can anyone derive the update method (the second equation of (10) in [1]) in detail, without using the proximal method? In particular, how does taking the gradient of the dual function with respect to $y$ yield $\rho(Ax^{k+1}-b)$? [1]: https://i.stack.imgur.com/O610e.png

Best Answer

For a fixed vector $u$, consider the function $r(y) := y^Tu = \sum_i y_i u_i$. Now, for a perturbation $\Delta y$ of $y$, one computes the difference $$\Delta r(y) := r(y + \Delta y) - r(y) = (y + \Delta y)^Tu - y^Tu = \Delta y^Tu, $$ so that $\Delta r(y) = \Delta y^Tu + o(\|\Delta y\|)$, with the remainder identically zero.

Thus $r$ is differentiable at every $y$, with constant gradient $\nabla r(y) = u$. This result can easily be read off any standard text on "matrix calculus", but I preferred to derive it here from "first principles" so that you can see what's really going on...
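As a sanity check, here is a minimal numerical sketch (with hypothetical random data) verifying the constant gradient by central finite differences; the vectors `u` and `y` and the dimension are arbitrary choices for illustration.

```python
import numpy as np

# Check numerically that the gradient of r(y) = y^T u is the constant vector u.
rng = np.random.default_rng(0)
u = rng.standard_normal(5)   # fixed vector u (hypothetical data)
y = rng.standard_normal(5)   # point at which we differentiate

r = lambda v: v @ u

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
grad_fd = np.array([
    (r(y + eps * e) - r(y - eps * e)) / (2 * eps)
    for e in np.eye(5)
])

print(np.allclose(grad_fd, u))  # True: the gradient is u, independent of y
```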

Now invoke this fact on your problem with $u = Ax^{k+1} - b$: with $x^{k+1}$ held fixed at the minimizer of the augmented Lagrangian, the objective depends on $y$ only through the linear term $y^T(Ax^{k+1} - b)$, so the gradient of the dual function at $y^k$ is $Ax^{k+1} - b$. To conclude, note that a dual gradient ascent step reads $$\text{new point} = \text{old point} + \text{step size} \times \text{gradient at old point}, $$ and the step size is taken equal to the penalty parameter $\rho$, i.e., $y^{k+1} = y^k + \rho\,(Ax^{k+1} - b)$.
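To see the update in action, here is a short sketch of the method of multipliers on a hypothetical toy problem, $\min_x \tfrac12\|x\|^2$ subject to $Ax = b$; the data `A`, `b` and the choice $\rho = 1$ are assumptions for illustration. For this quadratic objective the $x$-update has a closed form, and the $y$-update is exactly the dual ascent step above.

```python
import numpy as np

# Method of multipliers on: minimize (1/2)||x||^2  subject to  Ax = b.
# Problem data are hypothetical, chosen only to make the y-update concrete.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
rho = 1.0                     # penalty parameter = dual step size

n = A.shape[1]
y = np.zeros(3)
for k in range(100):
    # x-update: minimize the augmented Lagrangian over x. Setting its
    # gradient in x to zero gives (I + rho*A^T A) x = A^T (rho*b - y).
    x = np.linalg.solve(np.eye(n) + rho * A.T @ A, A.T @ (rho * b - y))
    # y-update: dual gradient ascent, y^{k+1} = y^k + rho*(A x^{k+1} - b).
    y = y + rho * (A @ x - b)

print(np.linalg.norm(A @ x - b))  # primal residual, driven toward zero
```

As the multipliers $y$ converge, the primal residual $Ax^{k+1} - b$ (which is also the dual gradient) vanishes, which is exactly what stationarity of the dual requires.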