Augmented Lagrangian


Consider the following equality constraint minimization problem:

minimize $\text{ }f(x)$

subject to $Ax=b$

Its Lagrangian is then:

$L(x,y) = f(x) + y^T(Ax-b)$

We can then use dual gradient ascent to solve this problem, though it is said to work only under fairly strong assumptions.
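For concreteness, here is a minimal sketch of dual gradient ascent on a toy instance of my own construction (the particular $f$, $A$, $b$ below are illustrative, not from the question); I take $f(x) = \frac{1}{2}\|x\|^2 + q^Tx$, which is strongly convex, so the inner minimization has a closed form:

```python
import numpy as np

# Toy instance (illustrative): f(x) = 0.5*||x||^2 + q'x, constraint Ax = b.
# For this f, argmin_x L(x, y) has the closed form x = -(q + A'y).
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
q = np.array([1.0, -1.0])

y = np.zeros(1)
alpha = 0.1                          # dual step size
for _ in range(200):
    x = -(q + A.T @ y)               # x^{k+1} = argmin_x L(x, y^k)
    y = y + alpha * (A @ x - b)      # ascend the dual: grad g(y) = Ax - b

print(x)                             # -> approximately [-0.5, 1.5], with Ax = b
```

This works here precisely because $f$ is strongly convex; the augmented Lagrangian is about coping when that fails.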

Its Augmented Lagrangian is defined as:

$L_\rho(x,y) = f(x) + y^T(Ax-b) + \frac{\rho}{2}\|Ax-b\|^2_2$

Then we can use this in the method of multipliers (Hestenes and Powell, 1969). It is said that this method is designed to robustify dual ascent.
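The method-of-multipliers iteration is $x^{k+1} = \arg\min_x L_\rho(x, y^k)$ followed by $y^{k+1} = y^k + \rho(Ax^{k+1} - b)$. A minimal runnable sketch on a toy instance of my own construction (with $f(x) = \frac{1}{2}\|x\|^2 + q^Tx$, so the $x$-minimization reduces to a linear solve):

```python
import numpy as np

# Toy instance (illustrative): f(x) = 0.5*||x||^2 + q'x, constraint Ax = b.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
q = np.array([1.0, -1.0])
rho = 1.0
n = 2

y = np.zeros(1)
for _ in range(50):
    # x^{k+1} = argmin_x L_rho(x, y^k); for this quadratic f the minimizer
    # solves (I + rho*A'A) x = -(q + A'y) + rho*A'b.
    x = np.linalg.solve(np.eye(n) + rho * A.T @ A,
                        -(q + A.T @ y) + rho * A.T @ b)
    y = y + rho * (A @ x - b)        # multiplier update with step size rho

print(x)                             # -> approximately [-0.5, 1.5], with Ax = b
```

Note that the dual step size is tied to the penalty parameter $\rho$, unlike the free step size in plain dual ascent.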

Could anyone explain to me the intuition behind Augmented Lagrangian? As far as I can see we are combining penalty function method with Lagrange multiplier. But how does this bring robustness?

P.S. Boyd says that the benefit of including the penalty term is that $g_\rho$ (the dual function associated with the augmented Lagrangian) can be shown to be differentiable under rather mild conditions on the original problem.

How does adding the extra term make the construction "more differentiable"? Or does he refer to the fact that, say, when we deal with the $\ell_1$ or nuclear norms, this extra term allows us to introduce a proximal operator?


BEST ANSWER

I'll assume $f$ is convex. (I should probably also assume $f$ is closed and proper.)

As you said, the Lagrangian is $L(x,y) = f(x) + \langle y, Ax - b \rangle$. The dual function is \begin{align*} g(y) &= \inf_x \, L(x,y) \\ &= \inf_x \left( f(x) - \langle -A^Ty, x \rangle \right) - \langle y, b \rangle \\ &= - \sup_x \left( \langle -A^Ty, x \rangle - f(x) \right) - \langle y, b \rangle \\ &= - f^*(-A^T y) - \langle y, b \rangle. \end{align*} Here $f^*$ is the convex conjugate of $f$.

The dual problem, expressed as a minimization problem, is \begin{equation*} \text{minimize} \quad f^*(-A^T y) + \langle y, b \rangle. \end{equation*}

Note that the dual function might not be differentiable! That is a key point. To guarantee that $f^*$ is differentiable, we need to assume that $f$ is strictly convex, which is often not the case. This often prevents us from simply solving the dual problem with gradient ascent.
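For a concrete example of this failure, take $f(x) = \|x\|_1$, which is convex but not strictly convex. Its conjugate is \begin{equation*} f^*(z) = \sup_x \, \langle z, x \rangle - \|x\|_1 = \begin{cases} 0 & \text{if } \|z\|_\infty \le 1, \\ +\infty & \text{otherwise,} \end{cases} \end{equation*} so the dual function $g(y) = -f^*(-A^Ty) - \langle y, b \rangle$ equals $-\langle y, b \rangle$ on the polyhedron $\{ y : \|A^Ty\|_\infty \le 1 \}$ and $-\infty$ outside it. Plain gradient ascent has no way to handle such a function.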

So what can we do if the dual function is not differentiable? How can we minimize a nondifferentiable function? One option is to use the proximal point method. The proximal point method does not require the objective function to be differentiable.

If you work out the details, you will find that the augmented Lagrangian method is what you get when you solve the dual problem with the proximal point method.
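Sketching those details: write the proximal point step on the dual (in maximization form) as \begin{align*} y^{k+1} &= \arg\max_y \; g(y) - \tfrac{1}{2\rho}\|y - y^k\|_2^2 \\ &= \arg\max_y \, \min_x \; f(x) + \langle y, Ax - b \rangle - \tfrac{1}{2\rho}\|y - y^k\|_2^2. \end{align*} Swapping the min and max (justified under convexity assumptions) and maximizing over $y$ first gives $y = y^k + \rho(Ax - b)$; substituting this back, the inner problem becomes \begin{equation*} \min_x \; f(x) + \langle y^k, Ax - b \rangle + \tfrac{\rho}{2}\|Ax - b\|_2^2 = \min_x \, L_\rho(x, y^k), \end{equation*} which recovers exactly the method-of-multipliers updates $x^{k+1} = \arg\min_x L_\rho(x, y^k)$ and $y^{k+1} = y^k + \rho(Ax^{k+1} - b)$.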

Vandenberghe's 236c notes are a good source for this material.

ANSWER

As littleO explained above, the rationale behind introducing the augmented Lagrangian can be explained via its connection with proximal point methods.

Although many researchers recommend D. Bertsekas' "Constrained Optimization and Lagrange Multiplier Methods", I personally find it quite unreadable.

This paper gives a fairly good explanation of how the augmented Lagrangian is connected with proximal point methods: http://www-2.dc.uba.ar/alio/io/pdf/claio98/paper-2.pdf

ANSWER

Concerning the differentiability of $g_\rho(y)=\min_x L_\rho(x,y)$, I think the result is a direct consequence of Danskin's Theorem (see Proposition B.25 in [Bertsekas, "Nonlinear Programming", 2nd edition]). More specifically, if $L_\rho(\cdot,y)$ is strictly convex and $L_\rho(x,\cdot)$ is differentiable, then $g_\rho(y)$ is differentiable with $\nabla g_\rho(y)= \nabla_y L_\rho(x^\star,y) = Ax^\star - b$, where $x^\star=\arg\min_x L_\rho(x,y)$.
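As a numerical sanity check of this gradient formula (on a toy instance of my own construction, with $f(x) = \frac{1}{2}\|x\|^2 + q^Tx$ so that $L_\rho(\cdot,y)$ is strictly convex), one can compare a finite-difference derivative of $g_\rho$ against $Ax^\star - b$:

```python
import numpy as np

# Toy instance (illustrative): f(x) = 0.5*||x||^2 + q'x, constraint Ax = b.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
q = np.array([1.0, -1.0])
rho = 1.0
n = 2

def g_rho(y):
    # Closed-form minimizer of L_rho(., y) for this quadratic f,
    # then the value of the augmented Lagrangian at that minimizer.
    x = np.linalg.solve(np.eye(n) + rho * A.T @ A,
                        -(q + A.T @ y) + rho * A.T @ b)
    val = (0.5 * x @ x + q @ x + y @ (A @ x - b)
           + 0.5 * rho * np.linalg.norm(A @ x - b) ** 2)
    return val, x

y = np.array([0.3])
eps = 1e-6
gp, _ = g_rho(y + eps)
gm, _ = g_rho(y - eps)
fd = (gp - gm) / (2 * eps)           # central finite difference of g_rho
_, xstar = g_rho(y)
print(fd, A @ xstar - b)             # the two should agree: grad g = A x* - b
```

The finite difference picks up no contribution from the dependence of $x^\star$ on $y$, exactly as Danskin's Theorem predicts.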