Clarification on a computation in convex optimization


This might be a question with a very simple solution, but I really don't see where this comes from. In the book *Convex Optimization* by Boyd and Vandenberghe, in the section of examples on duality, the authors consider a norm approximation problem. In particular, they write the norm minimization problem $$ \text{minimize } \|Ax - b\|$$ as \begin{equation} \begin{aligned} & \text{minimize} & & \|y\|\\ & \text{subject to} & & Ax - b = y \end{aligned} \end{equation}

From here, I know the Lagrangian would have the form $L(x,y,\nu) = \|y\| + \nu^T(Ax - b - y) = \|y\| + \nu^T(Ax - b) - \nu^T y$. Now, the authors claim that the dual problem ends up being

\begin{equation} \begin{aligned} & \text{maximize} & & b^T\nu\\ & \text{subject to} & & \|\nu\|_* \leq 1 \\ & & & A^T\nu = 0 \end{aligned} \end{equation}

where $\|\nu\|_* = \sup\{\nu^T z : \|z\| \leq 1\}$ denotes the dual norm.

I thought the objective function in the dual would be $-\nu^T b$ and the second constraint would be $\nu^T A = 0$. Can anyone give me a mathematical reason why they take the transpose of these expressions? Also, why is the sign of the objective function positive and not negative? Thanks for your help!

Best answer:

With the constraint $Ax-b=y$, you have a choice: you can write the Lagrangian term as (subtracting the right side from the left)

$\nu^{T} ((Ax - b) - y)$

or (subtracting the left side from the right side of the equation)

$\nu^{T} (y- (Ax-b))$

where switching from one version to the other simply changes the sign of $\nu$. If you use the second version, you'll end up with

$\max b^{T}\nu$

in your objective function. The dual constraints are invariant under the sign change $\nu \to -\nu$ (both $\|\nu\|_{*} \leq 1$ and $A^{T}\nu = 0$ hold for $\nu$ exactly when they hold for $-\nu$), so this doesn't change anything else in the statement of the dual problem.
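To make the computation explicit (filling in the minimization steps that the book leaves to the reader): with the second version, the Lagrangian is

\begin{equation} L(x,y,\nu) = \|y\| + \nu^{T}\bigl(y - (Ax - b)\bigr) = \|y\| + \nu^{T}y - \nu^{T}Ax + b^{T}\nu. \end{equation}

Minimizing over $x$ forces $A^{T}\nu = 0$ (otherwise the infimum over $x$ is $-\infty$), and

\begin{equation} \inf_{y}\bigl(\|y\| + \nu^{T}y\bigr) = \begin{cases} 0 & \|\nu\|_{*} \leq 1,\\ -\infty & \text{otherwise,} \end{cases} \end{equation}

since $|\nu^{T}y| \leq \|\nu\|_{*}\|y\|$. So the dual function is $g(\nu) = b^{T}\nu$ on the dual feasible set, and the dual problem is to maximize $b^{T}\nu$ subject to $\|\nu\|_{*} \leq 1$ and $A^{T}\nu = 0$.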

Keep in mind that this pattern can show up any time you have linear equality constraints in your primal problem. Different authors often make different choices, so you should be comfortable switching back and forth.

The constraints $A^{T}\nu=0$ and $\nu^{T}A=0$ are exactly the same linear constraints because $(A^{T}\nu)^{T}=\nu^{T}A$. In the first case, they're written in the form of a matrix times a column vector equal to a column vector of 0's. In the second case, they're written as a row vector times a matrix equal to a row vector of 0's. It's more conventional to write this as $A^{T}\nu=0$.
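As a quick sanity check (a minimal NumPy sketch; the shapes and random data are just for illustration), treating $\nu$ as a column vector:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # A maps R^3 -> R^4
nu = rng.standard_normal((4, 1))  # nu as a 4x1 column vector

col_form = A.T @ nu   # A^T nu: a 3x1 column vector
row_form = nu.T @ A   # nu^T A: a 1x3 row vector

# Transposing one form gives the other, so the constraints
# A^T nu = 0 and nu^T A = 0 describe exactly the same set.
print(np.allclose(col_form.T, row_form))  # prints True
```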

I've gotten similar questions from my students many times over the years; you're not alone in struggling with these kinds of issues. However, they really do boil down to simple linear algebra, and if you work at it you should become comfortable with these kinds of minor algebraic differences.
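To close the loop, the duality result itself can be checked numerically for the Euclidean norm, which equals its own dual norm. This is a minimal NumPy sketch, not from the book; the dual candidate $\nu^{\star} = (b - Ax^{\star})/\|b - Ax^{\star}\|_{2}$ comes from the standard least-squares residual construction:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Primal: minimize ||Ax - b||_2, i.e. ordinary least squares.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
primal_value = np.linalg.norm(A @ x_star - b)

# Dual candidate: the normalized residual. The normal equations give
# A^T (b - A x*) = 0, and ||nu||_2 = 1, so nu is dual feasible.
r = b - A @ x_star
nu = r / np.linalg.norm(r)

dual_value = b @ nu  # dual objective b^T nu

print(np.allclose(A.T @ nu, 0))              # dual feasibility: A^T nu = 0
print(np.isclose(primal_value, dual_value))  # the optimal values coincide
```

Since a dual feasible point matching the primal optimal value certifies optimality, this also confirms that the dual objective is $+b^{T}\nu$ (maximized), not $-\nu^{T}b$.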