How does the constraint change when the Lagrange multiplier changes?


I have a convex function $f(x)$, $x\in \mathbb{R}^n$. Consider the minimization problem $\min_x f(x)+\lambda x^Ts$, where $s$ is a positive real vector and $\lambda$ is a parameter. As $\lambda$ varies, how does the value of $x^Ts$ at the minimizer change? Is there any rule?


Best Answer

Given $\lambda$ and $s$, you want to find $x \in \mathbb{R}^n$ to minimize $$ f(x) + \lambda x^Ts $$ Assume at least one minimizer exists for each value of $\lambda$ that is considered.

If $\lambda_1<\lambda_2$, and $x_1$ minimizes for $\lambda_1$ while $x_2$ minimizes for $\lambda_2$, then indeed: $$x_1^Ts \geq x_2^Ts$$ Intuitively, since the larger $\lambda_2$ penalizes the constraint function $x^Ts$ more heavily, the optimal constraint value $x_2^Ts$ is smaller. This is easy to prove in two or three lines directly from the definition of a minimum; no convexity assumption is required. See related and/or more general statements here (Theorem III.2 page 14, Exercises IX-B.5, IX-B.6 page 41): http://ee.usc.edu/stochastic-nets/docs/network-optimization-notes.pdf
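For completeness, here is that short argument. By definition of the two minimizers, $$f(x_1)+\lambda_1 x_1^Ts \leq f(x_2)+\lambda_1 x_2^Ts, \qquad f(x_2)+\lambda_2 x_2^Ts \leq f(x_1)+\lambda_2 x_1^Ts.$$ Adding the two inequalities and cancelling $f(x_1)+f(x_2)$ from both sides gives $$(\lambda_2-\lambda_1)(x_2^Ts - x_1^Ts) \leq 0,$$ and since $\lambda_2>\lambda_1$ this forces $x_2^Ts \leq x_1^Ts$.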


More generally if $\mathcal{X}\subseteq \mathbb{R}^n$ is any set (possibly nonconvex and/or disconnected), $f:\mathcal{X}\rightarrow\mathbb{R}$, $g:\mathcal{X}\rightarrow\mathbb{R}$ are any functions (not necessarily convex or continuous) and we consider: \begin{align} &\mbox{Minimize:} \quad f(x) + \lambda g(x) \\ &\mbox{Subject to:} \quad x \in \mathcal{X} \end{align} Then the same holds: If $\lambda_1<\lambda_2$ and $x_1\in\mathcal{X}$ and $x_2 \in \mathcal{X}$ are respective minimizers (assuming minimizers exist), then $g(x_1)\geq g(x_2)$. (The proof is the same 2-3 line argument.)

As described in the notes at the above link, this monotonicity motivates a simple bisection procedure for optimization subject to one constraint, where we zero in on a good solution (with an optimized $\lambda$) exponentially fast. Whether this bisection always finds a solution, i.e. whether there exists a $\lambda$ whose optimized $x$ satisfies the desired constraint, depends on the existence of "hidden/unhidden" Pareto optimal points, and convexity becomes useful for that extended question.
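Here is a minimal sketch of that bisection idea. The quadratic $f$, the vectors `a` and `s`, the target `c`, and the bracket `[lo, hi]` are all made up for illustration; the quadratic is chosen so the inner minimization has a closed form, where in practice you would call a numerical solver instead.

```python
# Toy instance (assumed purely for illustration): f(x) = ||x - a||^2 with
# constraint function g(x) = s^T x, s > 0 as in the question.  Setting the
# gradient 2(x - a) + lam*s to zero gives the closed-form minimizer
# x(lam) = a - lam*s/2 of f(x) + lam * s^T x.
a = [3.0, 1.0]
s = [1.0, 2.0]

def argmin_for(lam):
    """Minimizer of f(x) + lam * s^T x for this toy f."""
    return [ai - lam * si / 2.0 for ai, si in zip(a, s)]

def g(x):
    """Constraint function g(x) = s^T x."""
    return sum(si * xi for si, xi in zip(s, x))

# g(x(lam)) is non-increasing in lam (the monotonicity result above),
# so bisection on lam can drive g(x(lam)) to a target value c.
c = 2.0                 # desired constraint value, g(x) <= c
lo, hi = 0.0, 100.0     # assumed bracket: g(x(lo)) >= c >= g(x(hi))
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g(argmin_for(mid)) > c:
        lo = mid        # penalty too weak: increase the multiplier
    else:
        hi = mid
lam = hi
print(lam, g(argmin_for(lam)))   # lam ≈ 1.2, g ≈ 2.0 for this instance
```

Each iteration halves the interval containing the right multiplier, which is the "exponentially fast" zeroing-in; the monotonicity result is exactly what guarantees the `if` test steers the bracket correctly.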