In the book "Sparse and Redundant Representations", the following optimization problem is stated:
$$\min_x J(x) \quad \text{subject to} \quad b=Ax$$
then, for $J(x)=\|x\|_2^2$, the following Lagrangian is defined using Lagrange multipliers:
$$\mathcal{L}(x)= \|x\|_2^2+\lambda^T (Ax-b)$$
where $\lambda$ is the vector of Lagrange multipliers for the constraint set. The following condition is then obtained by differentiating $\mathcal{L}(x)$ with respect to $x$:
$$\frac{\partial\mathcal{L}(x)}{\partial(x)}=2x+A^T\lambda$$
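As a quick sanity check (my own, not from either book), this gradient can be compared against central finite differences on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 6                        # A is n x m, x in R^m, lambda in R^n
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)
lam = rng.standard_normal(n)
x = rng.standard_normal(m)

def lagrangian(x):
    # L(x) = ||x||_2^2 + lambda^T (A x - b)
    return x @ x + lam @ (A @ x - b)

grad_analytic = 2 * x + A.T @ lam  # the claimed gradient

# central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([(lagrangian(x + eps * e) - lagrangian(x - eps * e)) / (2 * eps)
                    for e in np.eye(m)])

assert np.allclose(grad_analytic, grad_fd, atol=1e-5)
```

The check passes with $A^T\lambda$ but fails if you replace that term with anything of the wrong shape or transpose, which previews the answer to the first question.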
- My first question is: why does the derivative contain $A^T\lambda$? Shouldn't it be $\lambda^TA$ instead?
The above optimization problem is treated differently in the book "Convex Optimization" by Stephen Boyd:
For the following optimization problem: $$\min_x f_0(x)\\\text{subject to}\quad h_i(x)=0, \quad i = 1, . . . , p$$
The Lagrangian associated with the above problem is:
$$L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^{p}\nu_ih_i(x)$$
where $\lambda_i$ is referred to as the Lagrange multiplier associated with the $i$th inequality constraint $f_i(x) \le 0$.
- My second question is: why is the Lagrangian defined differently in the two books?
Q1 (already answered by Michael Grant)
The derivative of $\langle \lambda ,A x\rangle $ with respect to $x$ is $A^T \lambda $. Maybe it becomes clearer if you look at the following:
$$ \langle \lambda , Ax \rangle = \sum_{i=1}^n \lambda_i \sum_{j=1}^m A_{ij} x_j$$
so the $j$th element of the gradient is the derivative of the above with respect to $x_j$, which is
$$ \sum_{i=1}^n \lambda_i A_{ij} = (A^T\lambda)_j$$
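This identity is easy to confirm numerically; a minimal illustration (my own example, using NumPy) that computes the double sum by hand and compares it to $A^T\lambda$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5
A = rng.standard_normal((n, m))
lam = rng.standard_normal(n)

# j-th gradient entry built directly from the sum: sum_i lambda_i * A_ij
grad_from_sum = np.array([sum(lam[i] * A[i, j] for i in range(n))
                          for j in range(m)])

assert np.allclose(grad_from_sum, A.T @ lam)  # matches (A^T lambda)_j for every j
```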
Q2: For your second question, the two Lagrangians are essentially the same (except that the first book considers only an equality constraint, while in the second I suspect both equality and inequality constraints are considered (*)). Just write:
$$ h_i(x) = (Ax -b)_i$$
and $\nu_i=\lambda_i$. Then, in the first case, you have $\lambda^T(Ax-b) = \sum_{i=1}^p \lambda_i(Ax-b)_i$, and in the second, $\nu^T(Ax-b) = \sum_{i=1}^p \nu_i(Ax-b)_i = \sum_{i=1}^p \nu_i h_i(x)$: indeed, exactly the same thing.
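If it helps, the equivalence is just the definition of a dot product; a tiny numerical check (my own example):

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 4, 6
A = rng.standard_normal((p, m))
b = rng.standard_normal(p)
x = rng.standard_normal(m)
lam = rng.standard_normal(p)   # playing the role of both lambda and nu

h = A @ x - b                  # h_i(x) = (A x - b)_i

# lambda^T (A x - b)  versus  sum_i nu_i h_i(x)
assert np.isclose(lam @ h, sum(lam[i] * h[i] for i in range(p)))
```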
(*) By the way, your edits are a bit confusing here: either (a) add the sum corresponding to the inequality constraints and keep the sentence below it, or (b) remove the sum (as you have done) together with the sentence below it and the $\lambda$ in the Lagrangian.