Optimizing a problem using Lagrange multipliers


$\newcommand{\norm}[1]{\|#1\|}$ I have the following problem: $$ \min_{w,\theta}\frac{1}{2}\norm{w-w_t}^2+\frac{1}{2}(\theta-\theta_t)^2 \text{ s.t. } w^\top(z(n-\theta)-\hat z(\hat n - \theta)) \ge 1 $$ This problem is at the core of a learning algorithm that attempts to recover a weight vector $w$ and a parameter $\theta$, given some constraints. Here there is just one constraint.


A much simpler version of this problem does not involve $\theta$: $$ \min_w\frac{1}{2}\norm{w-w_t}^2\text{ s.t. }w^\top(z-\hat z)\ge 1. $$ This simpler problem has a closed form that can be derived with Lagrange multipliers: let $$ L(w,\lambda)=\frac{1}{2}\norm{w-w_t}^2+\lambda (w^\top(z-\hat z)-1). $$ Setting $\nabla_wL=w-w_t+\lambda(z-\hat z)=0$ gives $w=w_t-\lambda(z-\hat z)$; substituting this into the active constraint $w^\top(z-\hat z)=1$ (equivalently, setting $\nabla_\lambda L=0$) yields $$ \lambda=\frac{w_t^\top(z-\hat z)-1}{\norm{z-\hat z}^2} $$ This allows you to obtain $w$: $$ w=w_t-\frac{w_t^\top(z-\hat z)-1}{\norm{z-\hat z}^2}(z-\hat z) $$
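As a sanity check, the closed form above can be verified numerically. Here is a sketch with NumPy and arbitrary made-up data (when $w_t^\top(z-\hat z)\ge 1$ the constraint is inactive and $w=w_t$; the formula below covers the active case):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
w_t = rng.standard_normal(n)
z, z_hat = rng.standard_normal(n), rng.standard_normal(n)

d = z - z_hat                      # difference vector in the constraint
lam = (w_t @ d - 1.0) / (d @ d)    # Lagrange multiplier (active-constraint case)
w = w_t - lam * d                  # closed-form solution from above

# When the constraint is active, the solution satisfies it with equality:
print(w @ d)                       # 1.0 up to floating point
```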


The problem is that when I try to do the same thing with the first problem, I get a rather complicated expression for $\lambda$: a cubic equation $a\lambda^3+b\lambda^2+c\lambda+d=0$. Then I am stuck.

Does anyone have a suggestion?

There are 2 answers below.

Answer 1

Solve the problem for fixed $\theta$ as you described; then optimise with respect to $\theta$ without constraints.
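A minimal sketch of this two-level approach, assuming NumPy/SciPy and illustrative values for the scalars $n$, $\hat n$, and $\theta_t$ (all concrete names and numbers here are made up for the example). The inner problem reuses the closed form from the question; the outer one-dimensional minimization over $\theta$ is done numerically over an assumed bracket:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
dim = 5
w_t = rng.standard_normal(dim)
theta_t = 0.5                      # illustrative previous parameter value
z, z_hat = rng.standard_normal(dim), rng.standard_normal(dim)
n_val, n_hat = 2.0, 1.5            # illustrative scalars n and n-hat

def solve_w(theta):
    """Closed-form w for fixed theta: same structure as the simpler problem."""
    d = z * (n_val - theta) - z_hat * (n_hat - theta)
    if w_t @ d >= 1.0:             # constraint inactive: w stays at w_t
        return w_t, d
    lam = (w_t @ d - 1.0) / (d @ d)
    return w_t - lam * d, d

def outer(theta):
    """Objective value once w is optimal for this fixed theta."""
    w, _ = solve_w(theta)
    return 0.5 * np.sum((w - w_t) ** 2) + 0.5 * (theta - theta_t) ** 2

res = minimize_scalar(outer, bounds=(theta_t - 5.0, theta_t + 5.0), method="bounded")
w_star, d_star = solve_w(res.x)
print(res.x, w_star @ d_star)      # w_star is feasible: w^T d >= 1
```

Note that the outer objective is continuous but only piecewise smooth (the active set changes with $\theta$), which is why a derivative-free scalar search is used here rather than setting a gradient to zero.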

Answer 2

The objective function can be written in the form

$$\frac{1}{2} \begin{bmatrix} \mathrm{w}\\ \theta\end{bmatrix}^T \begin{bmatrix} \mathrm{I}_n & 0_n\\ 0_n^T & 1\end{bmatrix} \begin{bmatrix} \mathrm{w}\\ \theta\end{bmatrix} - \begin{bmatrix} \mathrm{w}_t\\ \theta_t\end{bmatrix}^T \begin{bmatrix} \mathrm{w}\\ \theta\end{bmatrix} + \frac{1}{2} \|\mathrm{w}_t\|_2^2 + \frac{1}{2} \theta_t^2$$

The inequality constraint can be written in the form

$$\frac{1}{2} \begin{bmatrix} \mathrm{w}\\ \theta\end{bmatrix}^T \begin{bmatrix} \mathrm{O}_n & \mathrm{z} - \hat{\mathrm{z}}\\ (\mathrm{z} - \hat{\mathrm{z}})^T & 0\end{bmatrix} \begin{bmatrix} \mathrm{w}\\ \theta\end{bmatrix} - \begin{bmatrix} n \mathrm{z} - \hat{n} \hat{\mathrm{z}}\\ 0\end{bmatrix}^T \begin{bmatrix} \mathrm{w}\\ \theta\end{bmatrix} + 1 \leq 0$$

Hence, we have a quadratically constrained quadratic program (QCQP)

$$\begin{array}{ll} \text{minimize} & \frac{1}{2} \mathrm{x}^T \mathrm{P}_0 \mathrm{x} + \mathrm{q}_0^T \mathrm{x} + r_0\\ \text{subject to} & \frac{1}{2} \mathrm{x}^T \mathrm{P}_1 \mathrm{x} + \mathrm{q}_1^T \mathrm{x} + r_1 \leq 0\end{array}$$

Note that $\mathrm{P}_0$ is positive semidefinite. What about $\mathrm{P}_1$?

$$\begin{array}{rl} \det \begin{bmatrix} \lambda \mathrm{I}_n & -(\mathrm{z} - \hat{\mathrm{z}})\\ -(\mathrm{z} - \hat{\mathrm{z}})^T & \lambda\end{bmatrix} &= \lambda^n \cdot (\lambda - (\mathrm{z} - \hat{\mathrm{z}})^T \lambda^{-1} (\mathrm{z} - \hat{\mathrm{z}}))\\ &= \lambda^{n-1} \cdot (\lambda^2 - \|\mathrm{z} - \hat{\mathrm{z}}\|_2^2)\end{array}$$

Since the nonzero eigenvalues of $\mathrm{P}_1$ are $\pm \|\mathrm{z} - \hat{\mathrm{z}}\|_2$ (assuming $\mathrm{z} \neq \hat{\mathrm{z}}$), we conclude that $\mathrm{P}_1$ is indefinite and, thus, we have a non-convex QCQP.
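The eigenvalue claim is easy to check numerically; a sketch with NumPy, using random data to stand in for $\mathrm{z} - \hat{\mathrm{z}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
d = rng.standard_normal(n)         # stands in for z - z_hat

# P1 = [[O_n, d], [d^T, 0]], the Hessian block of the constraint
P1 = np.zeros((n + 1, n + 1))
P1[:n, n] = d
P1[n, :n] = d

eigs = np.sort(np.linalg.eigvalsh(P1))
# Extreme eigenvalues are -||d|| and +||d||; the rest are zero,
# so P1 is indefinite whenever d != 0.
print(eigs[0], eigs[-1])
```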