Consider the optimization problem
$$\operatorname{minimize}_{\beta^{+}, \beta^{-} \in \mathbb{R}^{p}} f\left(\beta^{+}-\beta^{-}\right)+\lambda 1_{p}^{T}\left(\beta^{+}+\beta^{-}\right) \text {s.t. } \beta_{j}^{+} \geq 0, \beta_{j}^{-} \geq 0 \text { for all } j$$
where $f: \mathbb{R}^{p} \rightarrow \mathbb{R}$ is a convex function (not necessarily differentiable) and $\lambda>0$.
Prove that if $\left(\hat{\beta}^{+}, \hat{\beta}^{-}\right)$ is a solution to the above problem, then $\hat{\beta}_{j}^{+} \hat{\beta}_{j}^{-}=0$; that is, at least one of $\hat{\beta}_{j}^{+}, \hat{\beta}_{j}^{-}$ must be $0$ for each $j$.
I know how to do the problem when $f$ is differentiable - it would be similar to soft thresholding. But I am not sure how to approach this when $f$ is just convex. Any help would be appreciated.
It follows directly from optimality.
Suppose there is some $j$ such that $\hat{\beta}_{j}^{+} >0, \hat{\beta}_{j}^{-}>0$.
Let $\beta^+(t) = \hat{\beta}^+ - t e_j$ and $\beta^-(t) = \hat{\beta}^- - t e_j$, where $e_j$ is the $j$-th standard basis vector; all other coordinates are unchanged.
Note that for $0 < t \le \min\left(\hat{\beta}_j^+, \hat{\beta}_j^-\right)$, both $\beta^+(t)$ and $\beta^-(t)$ are feasible.
For such $t$ we have $f\left(\beta^+(t)-\beta^-(t)\right) = f\left(\hat{\beta}^+-\hat{\beta}^-\right)$, since the $t e_j$ terms cancel, while $1_p^T \left(\beta^+(t)+\beta^-(t)\right) = 1_p^T \left( \hat{\beta}^+ + \hat{\beta}^- \right) - 2t$. Since $\lambda > 0$, the objective is strictly lower (by $2\lambda t$), contradicting the optimality of $\left(\hat{\beta}^{+}, \hat{\beta}^{-}\right)$.
Hence at an optimum we have $\hat{\beta}_{j}^{+} \hat{\beta}_{j}^{-}=0$ for all $j$.
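The argument never uses differentiability of $f$, only that the $t e_j$ terms cancel inside $f$. Here is a small numerical sanity check of the perturbation, taking $f(\beta) = \|y - X\beta\|_2^2$ purely as an illustrative choice (the $X$, $y$, and candidate point below are made up):

```python
import numpy as np

# Illustrative setup: f(beta) = ||y - X beta||^2 (any convex f works in the proof).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
lam = 0.5

def objective(bp, bm):
    """Objective of the split formulation: f(bp - bm) + lam * 1^T (bp + bm)."""
    beta = bp - bm
    return np.sum((y - X @ beta) ** 2) + lam * np.sum(bp + bm)

# A feasible candidate where coordinate j = 0 has both parts strictly positive.
bp = np.array([0.7, 0.2, 0.0, 0.3, 0.1])
bm = np.array([0.4, 0.0, 0.5, 0.0, 0.0])

# Shrink both parts at j = 0 by t <= min(bp[0], bm[0]): bp - bm is unchanged,
# so f is unchanged, while the penalty term drops by exactly 2 * lam * t.
t = 0.3
e0 = np.eye(5)[0]
drop = objective(bp, bm) - objective(bp - t * e0, bm - t * e0)
print(np.isclose(drop, 2 * lam * t))  # the cost strictly decreases by 2*lam*t
```

So any point with $\hat{\beta}_j^+ \hat{\beta}_j^- > 0$ admits a feasible strict improvement, which is exactly the contradiction used above.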