Augmented Lagrangian


Consider the following equality constraint minimization problem:

minimize $\text{ }f(x)$

subject to $Ax=b$

Its Lagrangian is then:

$L(x,y) = f(x) + y^T(Ax-b)$

We can then use dual gradient ascent to solve this problem, though it is said to work only under fairly strong assumptions.
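For concreteness, here is a minimal sketch of dual gradient ascent on a toy instance of my own construction (the particular $f$, $A$, $b$ below are illustrative, not from the question); I take $f(x) = \frac{1}{2}\|x\|^2 + q^Tx$, which is strongly convex, so the inner minimization has a closed form:

```python
import numpy as np

# Toy instance (illustrative): f(x) = 0.5*||x||^2 + q'x, constraint Ax = b.
# For this f, argmin_x L(x, y) has the closed form x = -(q + A'y).
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
q = np.array([1.0, -1.0])

y = np.zeros(1)
alpha = 0.1                          # dual step size
for _ in range(200):
    x = -(q + A.T @ y)               # x^{k+1} = argmin_x L(x, y^k)
    y = y + alpha * (A @ x - b)      # ascend the dual: grad g(y) = Ax - b

print(x)                             # -> approximately [-0.5, 1.5], with Ax = b
```

This works here precisely because $f$ is strongly convex; the augmented Lagrangian is about coping when that fails.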

Its Augmented Lagrangian is defined as:

$L_\rho(x,y) = f(x) + y^T(Ax-b) + \frac{\rho}{2}\|Ax-b\|^2_2$

Then we can use this in the method of multipliers (Hestenes and Powell, 1969). It is said that this method is designed to robustify dual ascent.
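The method-of-multipliers iteration is $x^{k+1} = \arg\min_x L_\rho(x, y^k)$ followed by $y^{k+1} = y^k + \rho(Ax^{k+1} - b)$. A minimal runnable sketch on a toy instance of my own construction (with $f(x) = \frac{1}{2}\|x\|^2 + q^Tx$, so the $x$-minimization reduces to a linear solve):

```python
import numpy as np

# Toy instance (illustrative): f(x) = 0.5*||x||^2 + q'x, constraint Ax = b.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
q = np.array([1.0, -1.0])
rho = 1.0
n = 2

y = np.zeros(1)
for _ in range(50):
    # x^{k+1} = argmin_x L_rho(x, y^k); for this quadratic f the minimizer
    # solves (I + rho*A'A) x = -(q + A'y) + rho*A'b.
    x = np.linalg.solve(np.eye(n) + rho * A.T @ A,
                        -(q + A.T @ y) + rho * A.T @ b)
    y = y + rho * (A @ x - b)        # multiplier update with step size rho

print(x)                             # -> approximately [-0.5, 1.5], with Ax = b
```

Note that the dual step size is tied to the penalty parameter $\rho$, unlike the free step size in plain dual ascent.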

Could anyone explain to me the intuition behind Augmented Lagrangian? As far as I can see we are combining penalty function method with Lagrange multiplier. But how does this bring robustness?

P.S. Boyd says that the benefit of including the penalty term is that $g_\rho$ (the dual function associated with the augmented Lagrangian) can be shown to be differentiable under rather mild conditions on the original problem.

How does adding the extra term make the construction "more differentiable"? Or does he refer to the fact that, say, when we deal with the $\ell_1$ or nuclear norms, this extra term allows us to introduce a proximal operator?


BEST ANSWER

I'll assume $f$ is convex. (I should probably also assume $f$ is closed and proper.)

As you said, the Lagrangian is $L(x,y) = f(x) + \langle y, Ax - b \rangle$. The dual function is \begin{align*} g(y) &= \inf_x \, L(x,y) \\ &= \inf_x \left( f(x) - \langle -A^Ty, x \rangle \right) - \langle y, b \rangle \\ &= - \sup_x \left( \langle -A^Ty, x \rangle - f(x) \right) - \langle y, b \rangle \\ &= - f^*(-A^T y) - \langle y, b \rangle. \end{align*} Here $f^*$ is the convex conjugate of $f$.

The dual problem, expressed as a minimization problem, is \begin{equation*} \text{minimize} \quad f^*(-A^T y) + \langle y, b \rangle. \end{equation*}

Note that the dual function might not be differentiable! That is a key point. To guarantee that $f^*$ is differentiable, we need to assume that $f$ is strictly convex, which is often not the case. This often prevents us from simply solving the dual problem with gradient ascent.
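For a concrete example of this failure, take $f(x) = \|x\|_1$, which is convex but not strictly convex. Its conjugate is \begin{equation*} f^*(z) = \sup_x \, \langle z, x \rangle - \|x\|_1 = \begin{cases} 0 & \text{if } \|z\|_\infty \le 1, \\ +\infty & \text{otherwise,} \end{cases} \end{equation*} so the dual function $g(y) = -f^*(-A^Ty) - \langle y, b \rangle$ equals $-\langle y, b \rangle$ on the polyhedron $\{ y : \|A^Ty\|_\infty \le 1 \}$ and $-\infty$ outside it. Plain gradient ascent has no way to handle such a function.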

So what can we do if the dual function is not differentiable? How can we minimize a nondifferentiable function? One option is to use the proximal point method. The proximal point method does not require the objective function to be differentiable.

If you work out the details, you will find that the augmented Lagrangian method is what you get when you solve the dual problem with the proximal point method.
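Sketching those details: write the proximal point step on the dual (in maximization form) as \begin{align*} y^{k+1} &= \arg\max_y \; g(y) - \tfrac{1}{2\rho}\|y - y^k\|_2^2 \\ &= \arg\max_y \, \min_x \; f(x) + \langle y, Ax - b \rangle - \tfrac{1}{2\rho}\|y - y^k\|_2^2. \end{align*} Swapping the min and max (justified under convexity assumptions) and maximizing over $y$ first gives $y = y^k + \rho(Ax - b)$; substituting this back, the inner problem becomes \begin{equation*} \min_x \; f(x) + \langle y^k, Ax - b \rangle + \tfrac{\rho}{2}\|Ax - b\|_2^2 = \min_x \, L_\rho(x, y^k), \end{equation*} which recovers exactly the method-of-multipliers updates $x^{k+1} = \arg\min_x L_\rho(x, y^k)$ and $y^{k+1} = y^k + \rho(Ax^{k+1} - b)$.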

Vandenberghe's 236c notes are a good source for this material.

ANSWER

As littleO explained above, the rationale behind introducing the augmented Lagrangian can be explained via its connection with proximal point methods.

Although many researchers recommend D. Bertsekas' "Constrained Optimization and Lagrange Multiplier Methods", I personally find it quite unreadable.

This paper gives a fairly good explanation of how the augmented Lagrangian is connected with proximal point methods: http://www-2.dc.uba.ar/alio/io/pdf/claio98/paper-2.pdf

ANSWER

Concerning the differentiability of $g_\rho(y)=\min_x L_\rho(x,y)$, I think the result is a direct consequence of Danskin's Theorem (see Proposition B.25 in [Bertsekas, "Nonlinear Programming", 2nd edition]). More specifically, if $L_\rho(\cdot,y)$ is strictly convex and $L_\rho(x,\cdot)$ is differentiable, then $g_\rho(y)$ is differentiable with $\nabla g_\rho(y)= \nabla_y L_\rho(x^\star,y) = Ax^\star - b$, where $x^\star=\arg\min_x L_\rho(x,y)$.
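As a numerical sanity check of this gradient formula (on a toy instance of my own construction, with $f(x) = \frac{1}{2}\|x\|^2 + q^Tx$ so that $L_\rho(\cdot,y)$ is strictly convex), one can compare a finite-difference derivative of $g_\rho$ against $Ax^\star - b$:

```python
import numpy as np

# Toy instance (illustrative): f(x) = 0.5*||x||^2 + q'x, constraint Ax = b.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
q = np.array([1.0, -1.0])
rho = 1.0
n = 2

def g_rho(y):
    # Closed-form minimizer of L_rho(., y) for this quadratic f,
    # then the value of the augmented Lagrangian at that minimizer.
    x = np.linalg.solve(np.eye(n) + rho * A.T @ A,
                        -(q + A.T @ y) + rho * A.T @ b)
    val = (0.5 * x @ x + q @ x + y @ (A @ x - b)
           + 0.5 * rho * np.linalg.norm(A @ x - b) ** 2)
    return val, x

y = np.array([0.3])
eps = 1e-6
gp, _ = g_rho(y + eps)
gm, _ = g_rho(y - eps)
fd = (gp - gm) / (2 * eps)           # central finite difference of g_rho
_, xstar = g_rho(y)
print(fd, A @ xstar - b)             # the two should agree: grad g = A x* - b
```

The finite difference picks up no contribution from the dependence of $x^\star$ on $y$, exactly as Danskin's Theorem predicts.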