General convex optimization with L1 penalty


Consider the optimization problem

$$\operatorname{minimize}_{\beta^{+}, \beta^{-} \in \mathbb{R}^{p}} f\left(\beta^{+}-\beta^{-}\right)+\lambda 1_{p}^{T}\left(\beta^{+}+\beta^{-}\right) \text {s.t. } \beta_{j}^{+} \geq 0, \beta_{j}^{-} \geq 0 \text { for all } j$$

where $f: \mathbb{R}^{p} \rightarrow \mathbb{R}$ is a convex function (not necessarily differentiable) and $\lambda>0$.

Prove that if $\left(\hat{\beta}^{+}, \hat{\beta}^{-}\right)$ is a solution to the above problem, then $\hat{\beta}_{j}^{+} \hat{\beta}_{j}^{-}=0$, that is at least one coordinate must be 0 for each $j$.

I know how to do the problem when $f$ is differentiable - it would be similar to soft thresholding. But I am not sure how to approach this when $f$ is just convex. Any help would be appreciated.

There are 2 answers below.

It follows directly from optimality.

Suppose there is some $j$ such that $\hat{\beta}_{j}^{+} >0, \hat{\beta}_{j}^{-}>0$.

Let $\beta^+(t) = \hat{\beta}^+ - t e_j$ and $\beta^-(t) = \hat{\beta}^- - t e_j$, where $e_j$ is the $j$-th standard basis vector, so only the $j$-th coordinates change.

Since $\hat{\beta}_{j}^{+}>0$ and $\hat{\beta}_{j}^{-}>0$, for small enough $t>0$ both $\beta^+(t)$ and $\beta^-(t)$ remain feasible (componentwise nonnegative).

For such $t$ we have $f(\beta^+(t)-\beta^-(t)) = f(\hat{\beta}^+-\hat{\beta}^-)$, since the $t e_j$ terms cancel in the difference, while $1_p^T (\beta^+(t)+\beta^-(t)) = 1_p^T ( \hat{\beta}^+ + \hat{\beta}^- ) - 2t$, hence a strictly lower cost, contradicting the optimality of $(\hat{\beta}^+, \hat{\beta}^-)$.

Hence at an optimum we have $\hat{\beta}_{j}^{+} \hat{\beta}_{j}^{-}=0$ for all $j$.
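The perturbation argument above is easy to check numerically. A minimal sketch, with an illustrative choice $f(\beta) = \|X\beta - y\|^2$ (any convex $f$ would do; $X$, $y$, and the point chosen here are arbitrary, not from the original):

```python
import numpy as np

# Illustrative convex f: a least-squares loss (any convex f works).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(10, 3)), rng.normal(size=10)
lam = 0.5

def cost(bp, bm):
    """Objective f(beta^+ - beta^-) + lam * 1^T (beta^+ + beta^-)."""
    beta = bp - bm
    return np.sum((X @ beta - y) ** 2) + lam * np.sum(bp + bm)

# A feasible point where both beta_0^+ > 0 and beta_0^- > 0.
bp = np.array([1.0, 0.5, 0.0])
bm = np.array([0.3, 0.0, 0.2])

# Subtract t from both coordinates at index j = 0: the difference
# beta^+ - beta^- (hence f) is unchanged, but the penalty drops by 2*lam*t.
t = 0.3
bp2, bm2 = bp.copy(), bm.copy()
bp2[0] -= t
bm2[0] -= t

print(cost(bp, bm) - cost(bp2, bm2))  # equals 2 * lam * t = 0.3
```

So the original pair cannot have been optimal, exactly as the proof claims.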

0
On

It suffices to show that for any $\beta^+\geq 0$ and $\beta^-\geq 0$ with $\beta_j^+\beta_j^-\neq 0$ for some $j$, there exists another pair $\alpha^+\geq 0$ and $\alpha^-\geq 0$ such that $\alpha_j^+\alpha_j^-=0$ for all $j$, $f(\alpha^+-\alpha^-)=f(\beta^+-\beta^-)$, and $1^\top(\alpha^++\alpha^-) < 1^\top(\beta^{+}+\beta^-)$. Since the objective function has a strictly smaller value at $(\alpha^+,\alpha^-)$, the result will follow. As a hint, I'll sketch the 1-D case; the generalization to $\mathbb{R}^p$ follows by applying the argument componentwise.

Case 1: Suppose $\beta^+\geq\beta^-$. Set $\alpha^+=\beta^+-\beta^-$ and $\alpha^-=0$. Then $f(\alpha^+-\alpha^-)=f(\beta^+-\beta^-)$ and $1^\top(\alpha^++\alpha^-)=\alpha^++\alpha^-=\beta^+-\beta^-<\beta^++\beta^-$. The inequality is strict: if $\beta^+>\beta^-$ it follows since $\beta^->0$, and if $\beta^+=\beta^-$ the LHS is $0$ while the RHS is positive (both $\beta^+$ and $\beta^-$ are strictly positive because $\beta^+\beta^-\neq 0$).

Case 2: Suppose $\beta^+<\beta^-$. Set $\alpha^+=0$ and $\alpha^-=\beta^--\beta^+$; a similar argument yields the result.
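Applied componentwise, both cases amount to replacing $(\beta^+,\beta^-)$ with the positive and negative parts of their difference. A small sketch of that reduction (the vectors here are arbitrary examples, not from the original):

```python
import numpy as np

def reduce_pair(bp, bm):
    """Map (beta^+, beta^-) to the positive/negative parts of beta^+ - beta^-.
    The difference is preserved (so f is unchanged) and, componentwise,
    alpha^+ + alpha^- = |beta^+ - beta^-| <= beta^+ + beta^-."""
    diff = bp - bm
    return np.maximum(diff, 0.0), np.maximum(-diff, 0.0)

# A pair with overlapping support at index 2 (both coordinates positive).
bp = np.array([1.0, 0.0, 0.4])
bm = np.array([0.3, 0.7, 0.4])
ap, am = reduce_pair(bp, bm)

print(np.allclose(ap - am, bp - bm))      # difference preserved
print(np.all(ap * am == 0))               # complementarity holds
print(np.sum(ap + am) < np.sum(bp + bm))  # strictly smaller L1 term
```

This makes explicit that the reduction strictly lowers the penalty whenever some coordinate has $\beta_j^+\beta_j^->0$, which is exactly the claim the proof needs.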