Derivative of the Lagrangian dual: Can I treat the optimal parameter $x^*$ in the dual $g(\lambda) = L(x^*, \lambda)$ as a constant w.r.t. $\lambda$?


Suppose I have a standard constrained optimization problem $$ \min_x f(x) \quad \text{s.t. } h_i(x) \ge 0, \quad i = 1, \dots, m, $$ where $f$ is convex and $h_1, \dots, h_m$ are concave functions $\mathbb{R}^n \to \mathbb{R}$ (so the feasible set is convex). The dual function is given as $$ g(\lambda) = \min_x L(x, \lambda) = L(x^*, \lambda), $$ with $L$ the Lagrangian. To obtain the optimal Lagrange multipliers, I need to solve $$ \lambda^* = \operatorname{argmax}_{\lambda \ge 0} \; g(\lambda). $$ I would go on and take the derivative w.r.t. $\lambda$. My question is now:

Can I treat the optimal parameter $x^*$ in the dual $g(\lambda) = L(x^*, \lambda)$ as a constant w.r.t. $\lambda$?

My first thought: no, since $x^*$ depends on $\lambda$. (If it didn't, we would already have solved the primal problem.) But I have found several examples where the derivative is taken without substituting $x^*$, treating it as a constant. So I was wondering whether that is a general rule. If it is, it would simplify the derivative a lot.

I'll give a concrete example: let's solve $$ \min_x x^2 \quad \text{s.t. } -x - 2 \le 0. $$ The resulting Lagrangian is $L(x, \lambda) = x^2 + \lambda(-x - 2)$. Taking the derivative w.r.t. $x$ and setting it to $0$ yields $2x - \lambda = 0 \implies x^* = \lambda / 2$.

The dual is therefore $$ g(\lambda) = x^{*2} + \lambda(-x^* - 2) = \frac{1}{4} \lambda^2 + \lambda\left(-\frac{\lambda}{2} - 2\right). $$ Now, when I take the "correct" derivative of the right-hand side w.r.t. $\lambda$, I get $$ g'(\lambda) = -\frac{\lambda}{2} - 2. $$ This is the same as if I treat $x^*$ as a constant on the left-hand side and differentiate, resulting in $$ -x^* - 2 = -\frac{\lambda}{2} - 2. $$
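The coincidence can be sanity-checked numerically. A minimal sketch (the helper names `x_star`, `g`, and `g_prime_fd` are mine, not from the post) compares a finite-difference derivative of $g$ with the "treat $x^*$ as constant" expression $-x^* - 2$:

```python
# Numerical check of the worked example: min x^2 s.t. -x - 2 <= 0.
def x_star(lam):
    # Minimizer of L(x, lam) = x^2 + lam*(-x - 2): from 2x - lam = 0.
    return lam / 2.0

def g(lam):
    # Dual function g(lam) = L(x*(lam), lam).
    x = x_star(lam)
    return x**2 + lam * (-x - 2)

def g_prime_fd(lam, h=1e-6):
    # Centered finite-difference approximation of g'(lam).
    return (g(lam + h) - g(lam - h)) / (2 * h)

for lam in [0.5, 1.0, 3.0]:
    envelope = -x_star(lam) - 2  # dL/dlam holding x fixed at x*
    print(lam, g_prime_fd(lam), envelope)
```

The two printed columns agree to numerical precision, matching the closed form $g'(\lambda) = -\lambda/2 - 2$.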

Why does this work? Any help is much appreciated!


Accepted answer:

In general, the Lagrangian is $$L(x, \lambda) = f(x) - \sum_i \lambda_i h_i(x),$$ and $x^*$ satisfies $f'(x^*) - \sum_i \lambda_i h_i'(x^*) = 0$. Assume $x(\lambda)$ is a function that satisfies that equation for all $\lambda$; then $$g(\lambda) = f(x(\lambda)) - \sum_i \lambda_i h_i(x(\lambda)).$$

By the chain rule and the product rule: \begin{align} g'(\lambda) &= f'(x(\lambda))x'(\lambda) - \sum_i h_i(x(\lambda)) - \sum_i \lambda_i h_i'(x(\lambda))x'(\lambda) \\ &= \left( f'(x(\lambda)) - \sum_i \lambda_i h_i'(x(\lambda)) \right)x'(\lambda) - \sum_i h_i(x(\lambda)) \\ &= -\sum_i h_i(x(\lambda)), \end{align} which is exactly what you get if you differentiate $L$ w.r.t. $\lambda$ treating $x$ as a constant and then plug in $x(\lambda)$: the term multiplying $x'(\lambda)$ vanishes because $x(\lambda)$ is a stationary point of $L$ in $x$. This is an instance of the envelope theorem.
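The identity $g'(\lambda) = -\sum_i h_i(x(\lambda))$ can also be checked numerically with more than one constraint. A small sketch, using an illustrative example of my own choosing ($f(x) = x^2$, $h_1(x) = x + 2$, $h_2(x) = 4 - x$; the names below are not from the answer):

```python
# Check dg/dlam_i = -h_i(x(lam)) for f(x) = x^2, h1(x) = x + 2, h2(x) = 4 - x.
def x_of(l1, l2):
    # Stationarity: f'(x) - l1*h1'(x) - l2*h2'(x) = 2x - l1 + l2 = 0.
    return (l1 - l2) / 2.0

def g(l1, l2):
    # Dual function g(lam) = f(x(lam)) - sum_i lam_i * h_i(x(lam)).
    x = x_of(l1, l2)
    return x**2 - l1 * (x + 2) - l2 * (4 - x)

h = 1e-6
l1, l2 = 1.0, 0.5
x = x_of(l1, l2)
# Centered finite differences in each multiplier separately.
dg_dl1 = (g(l1 + h, l2) - g(l1 - h, l2)) / (2 * h)
dg_dl2 = (g(l1, l2 + h) - g(l1, l2 - h)) / (2 * h)
print(dg_dl1, -(x + 2))  # should agree: dg/dl1 = -h1(x(lam))
print(dg_dl2, -(4 - x))  # should agree: dg/dl2 = -h2(x(lam))
```

Each partial derivative of $g$ matches $-h_i$ evaluated at $x(\lambda)$, as the derivation above predicts.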

Second answer:

In general, no, $x^*$ is not a constant and you cannot differentiate the Lagrangian as if it were.

Consider the Lagrangian $L(x, \lambda) = x^2 + 1 + \lambda(x - 2)(x - 4)$. If you take the infimum over $x$ by setting $\partial L / \partial x$ to zero, you'll find $x^* = 3\lambda/(\lambda + 1)$. Clearly $x^*$ depends on $\lambda$, so you must use the chain rule to differentiate $L$ w.r.t. $\lambda$.
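A quick numerical sketch of this example (the helper names are mine) confirms that $x^*$ genuinely varies with $\lambda$. Applying the full chain rule, the term involving $x'(\lambda)$ is multiplied by $\partial L / \partial x$, which vanishes at $x^*$, so the finite-difference derivative of $g$ still equals $\partial L / \partial \lambda$ evaluated at $x^*$:

```python
# Example: L(x, lam) = x^2 + 1 + lam*(x - 2)*(x - 4).
def x_star(lam):
    # Stationarity in x: 2x + lam*(2x - 6) = 0  =>  x = 3*lam / (1 + lam).
    return 3 * lam / (1 + lam)

def L(x, lam):
    return x**2 + 1 + lam * (x - 2) * (x - 4)

def g(lam):
    # Dual function: L evaluated at the minimizer x*(lam).
    return L(x_star(lam), lam)

lam, h = 1.0, 1e-6
x = x_star(lam)
dg = (g(lam + h) - g(lam - h)) / (2 * h)  # finite-difference g'(lam)
print(x)                       # x* = 1.5 at lam = 1: it does depend on lam
print(dg, (x - 2) * (x - 4))   # yet these agree, since dL/dx = 0 at x*
```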