Dear optimization experts,
My apologies for asking about what is probably a well-known relation between Huber-loss-based optimization and $\ell_1$-based optimization. However, I am stuck on a 'first-principles' proof (one that does not invoke the Moreau envelope, e.g., here) showing that they are equivalent.
Problem formulation
The observation vector is \begin{align*} \mathbf{y} &= \mathbf{A}\mathbf{x} + \mathbf{z} + \mathbf{\epsilon} \\ \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} &= \begin{bmatrix} \mathbf{a}_1^T\mathbf{x} + z_1 + \epsilon_1 \\ \vdots \\ \mathbf{a}_N^T\mathbf{x} + z_N + \epsilon_N \end{bmatrix} \end{align*} where
- $\mathbf{A} = \begin{bmatrix} \mathbf{a}_1^T \\ \vdots \\ \mathbf{a}_N^T \end{bmatrix} \in \mathbb{R}^{N \times M}$ is a known matrix
- $\mathbf{x} \in \mathbb{R}^{M \times 1}$ is an unknown vector
- $\mathbf{z} = \begin{bmatrix} z_1 \\ \vdots \\ z_N \end{bmatrix} \in \mathbb{R}^{N \times 1}$ is also unknown but sparse in nature, e.g., it can be seen as a vector of outliers
- $\mathbf{\epsilon} \in \mathbb{R}^{N \times 1}$ is measurement noise, say with i.i.d. standard Gaussian entries of zero mean and unit variance, i.e., $\mathcal{N}(0,1)$.
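For concreteness, here is a small synthetic instantiation of this observation model. This is purely illustrative; the dimensions, outlier pattern, and all variable names (`A`, `x_true`, `z`, `eps`, `y`) are my own choices, not part of the question.

```python
import numpy as np

# Synthetic instance of y = A x + z + eps (illustrative sizes and values)
rng = np.random.default_rng(42)
N, M = 30, 5
A = rng.normal(size=(N, M))          # known matrix A
x_true = rng.normal(size=M)          # unknown vector x
z = np.zeros(N)                      # sparse outlier vector z
idx = rng.choice(N, size=3, replace=False)
z[idx] = 10.0 * rng.normal(size=3)   # a few large outliers
eps = rng.normal(size=N)             # N(0,1) measurement noise
y = A @ x_true + z + eps             # observations
```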
We need to prove that the following two optimization problems P$1$ and P$2$ are equivalent.
P$1$: \begin{align*} \text{minimize}_{\mathbf{x},\mathbf{z}} \quad & \lVert \mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{z} \rVert_2^2 + \lambda\lVert \mathbf{z} \rVert_1 \end{align*}
and
P$2$: \begin{align*} \text{minimize}_{\mathbf{x}} \quad & \sum_{i=1}^{N} \mathcal{H} \left( y_i - \mathbf{a}_i^T\mathbf{x} \right), \end{align*} where the Huber function $\mathcal{H}(u)$ is given by $$\mathcal{H}(u) = \begin{cases} |u|^2 & |u| \leq \frac{\lambda}{2} \\ \lambda |u| - \frac{\lambda^2}{4} & |u| > \frac{\lambda}{2} \end{cases} . $$
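Before attempting a proof, one can sanity-check the claimed equivalence numerically for a single coordinate. The sketch below (my own illustration; `lam`, the grid bounds, and the test points are arbitrary choices) brute-forces the inner minimization of P$1$ over $z$ and compares the optimal value with $\mathcal{H}$:

```python
import numpy as np

# For scalar u, compare min_z (u - z)^2 + lam*|z| (brute-force grid
# search) with the Huber function H(u) defined above.
lam = 1.5

def huber(u, lam):
    # H(u) = u^2 for |u| <= lam/2, and lam*|u| - lam^2/4 otherwise
    return np.where(np.abs(u) <= lam / 2, u**2, lam * np.abs(u) - lam**2 / 4)

z_grid = np.linspace(-10.0, 10.0, 200001)  # dense grid of candidate z values
for u in np.linspace(-3.0, 3.0, 13):
    inner = (u - z_grid) ** 2 + lam * np.abs(z_grid)  # one coordinate of P1
    assert abs(inner.min() - huber(u, lam)) < 1e-3
print("min_z (u - z)^2 + lam*|z| agrees with H(u) on the test points")
```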
My partial attempt following the suggestion in the answer below
We convert problem P$1$ into an equivalent form by plugging in the optimal solution for $\mathbf{z}$, i.e.,
\begin{align*} & \text{minimize}_{\mathbf{x},\mathbf{z}} \quad \lVert \mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{z} \rVert_2^2 + \lambda\lVert \mathbf{z} \rVert_1 \\ \equiv \ & \text{minimize}_{\mathbf{x}} \left\{ \text{minimize}_{\mathbf{z}} \quad \lVert \mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{z} \rVert_2^2 + \lambda\lVert \mathbf{z} \rVert_1 \right\} \end{align*}
Setting the subgradient with respect to $\mathbf{z}$ to zero, \begin{align} 0 & \in \frac{\partial}{\partial \mathbf{z}} \left( \lVert \mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{z} \rVert_2^2 + \lambda\lVert \mathbf{z} \rVert_1 \right) \\ \Leftrightarrow \quad 0 & \in -2 \left( \mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{z} \right) + \lambda \, \partial \lVert \mathbf{z} \rVert_1 \\ \Leftrightarrow \quad \left( \mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{z} \right) & = \frac{\lambda}{2} \mathbf{v} \end{align} for some $ \mathbf{v} \in \partial \lVert \mathbf{z} \rVert_1 $ following Ryan Tibshirani's lecture notes (slides 18-20), i.e., \begin{align} v_i \in \begin{cases} \{1\} & \text{if } z_i > 0 \\ \{-1\} & \text{if } z_i < 0 \\ [-1,1] & \text{if } z_i = 0 \\ \end{cases}. \end{align} Note the factor $2$ from differentiating the squared residual: it halves the effective threshold to $\lambda/2$ (in Tibshirani's notes the quadratic term carries a prefactor $\frac{1}{2}$, which is why the threshold there is $\lambda$). Then, the subgradient optimality reads: \begin{align} \begin{cases} y_i - \mathbf{a}_i^T\mathbf{x} - z_i = \frac{\lambda}{2} \, {\rm sign}\left(z_i\right) & \text{if } z_i \neq 0 \\ \left| y_i - \mathbf{a}_i^T\mathbf{x} \right| \leq \frac{\lambda}{2} & \text{if } z_i = 0 \end{cases} \end{align} Solving coordinatewise, the solution is 'soft thresholding' at level $\lambda/2$, $$\mathbf{z}^\star = S_{\lambda/2}\left( \mathbf{y} - \mathbf{A}\mathbf{x} \right),$$ where \begin{align} S_{\lambda/2}\left( y_i - \mathbf{a}_i^T\mathbf{x} \right) = \begin{cases} y_i - \mathbf{a}_i^T\mathbf{x} - \frac{\lambda}{2} & \text{if } y_i - \mathbf{a}_i^T\mathbf{x} > \frac{\lambda}{2} \\ 0 & \text{if } -\frac{\lambda}{2} \leq y_i - \mathbf{a}_i^T\mathbf{x} \leq \frac{\lambda}{2} \\ y_i - \mathbf{a}_i^T\mathbf{x} + \frac{\lambda}{2} & \text{if } y_i - \mathbf{a}_i^T\mathbf{x} < -\frac{\lambda}{2} \\ \end{cases} . \end{align}
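As a sanity check on this step (an illustrative sketch, not part of the proof; variable names and constants are my own), the following compares the soft-thresholding formula at level $\lambda/2$ against a brute-force grid search on the scalar problem:

```python
import numpy as np

# z* = S_{lam/2}(r) should minimize (r - z)^2 + lam*|z| in each coordinate.
lam = 1.5

def soft_threshold(r, t):
    # S_t(r) = sign(r) * max(|r| - t, 0)
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

rng = np.random.default_rng(0)
r = 2.0 * rng.normal(size=200)           # stand-ins for residuals y_i - a_i^T x
z_star = soft_threshold(r, lam / 2)      # claimed coordinatewise minimizer
z_grid = np.linspace(-8.0, 8.0, 160001)  # brute-force search grid
for ri, zi in zip(r, z_star):
    grid_min = ((ri - z_grid) ** 2 + lam * np.abs(z_grid)).min()
    assert (ri - zi) ** 2 + lam * abs(zi) <= grid_min + 1e-9
print("soft thresholding at lam/2 attains the coordinatewise minimum")
```

Replacing `lam / 2` with `lam` in the call above makes the assertion fail, which is a quick way to see that the threshold really is $\lambda/2$ and not $\lambda$.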
Now, plugging $\mathbf{z}^\star$ back in, we turn the optimization problem P$1$ into \begin{align*} & \text{minimize}_{\mathbf{x}} \left\{ \text{minimize}_{\mathbf{z}} \quad \lVert \mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{z} \rVert_2^2 + \lambda\lVert \mathbf{z} \rVert_1 \right\} \\ \equiv \ & \text{minimize}_{\mathbf{x}} \quad \lVert \mathbf{y} - \mathbf{A}\mathbf{x} - S_{\lambda/2}\left( \mathbf{y} - \mathbf{A}\mathbf{x} \right) \rVert_2^2 + \lambda\lVert S_{\lambda/2}\left( \mathbf{y} - \mathbf{A}\mathbf{x} \right) \rVert_1 . \end{align*} We now check the two regimes of $S_{\lambda/2}$ coordinatewise.
- if $\lvert y_i - \mathbf{a}_i^T\mathbf{x} \rvert \leq \frac{\lambda}{2}$, then $S_{\lambda/2}\left( y_i - \mathbf{a}_i^T\mathbf{x} \right) = 0$.
For these coordinates the objective contributes $$\lvert y_i - \mathbf{a}_i^T\mathbf{x} \rvert^2,$$ which matches the Huber penalty $\mathcal{H}\left( y_i - \mathbf{a}_i^T\mathbf{x} \right)$ in this regime. Agree?
- if $\lvert y_i - \mathbf{a}_i^T\mathbf{x} \rvert > \frac{\lambda}{2}$, then $S_{\lambda/2}\left( y_i - \mathbf{a}_i^T\mathbf{x} \right) = y_i - \mathbf{a}_i^T\mathbf{x} \mp \frac{\lambda}{2}$, and the residual becomes $y_i - \mathbf{a}_i^T\mathbf{x} - S_{\lambda/2}\left( y_i - \mathbf{a}_i^T\mathbf{x} \right) = \pm\frac{\lambda}{2}$.
For these coordinates the objective contributes $$\frac{\lambda^2}{4} + \lambda \left\lvert y_i - \mathbf{a}_i^T\mathbf{x} \mp \frac{\lambda}{2} \right\rvert = \frac{\lambda^2}{4} + \lambda \left( \lvert y_i - \mathbf{a}_i^T\mathbf{x} \rvert - \frac{\lambda}{2} \right) = \lambda \lvert y_i - \mathbf{a}_i^T\mathbf{x} \rvert - \frac{\lambda^2}{4}.$$ The sign $\mp\frac{\lambda}{2}$ is resolved by noting that $S_{\lambda/2}\left( y_i - \mathbf{a}_i^T\mathbf{x} \right)$ has the same sign as $y_i - \mathbf{a}_i^T\mathbf{x}$, so subtracting $\frac{\lambda}{2}\,{\rm sign}\left( y_i - \mathbf{a}_i^T\mathbf{x} \right)$ shrinks the magnitude by exactly $\frac{\lambda}{2}$. This is the Huber penalty in this regime, which completes the equivalence; a numerical cross-check follows.
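Putting both cases together, the identity $\lVert \mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{z}^\star \rVert_2^2 + \lambda \lVert \mathbf{z}^\star \rVert_1 = \sum_i \mathcal{H}\left( y_i - \mathbf{a}_i^T\mathbf{x} \right)$ holds for every $\mathbf{x}$, not just the minimizer. A quick cross-check on random data (sizes, seed, and names are my own illustrative choices):

```python
import numpy as np

# For ANY x, P1's objective at z = S_{lam/2}(y - A x) equals P2's
# Huber objective, so the two problems share the same minimizers in x.
lam = 1.5
rng = np.random.default_rng(1)
N, M = 30, 5
A = rng.normal(size=(N, M))
y = rng.normal(size=N)

def soft_threshold(r, t):
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

def huber(u, lam):
    return np.where(np.abs(u) <= lam / 2, u**2, lam * np.abs(u) - lam**2 / 4)

for _ in range(100):
    x = rng.normal(size=M)
    r = y - A @ x                    # residual vector
    z = soft_threshold(r, lam / 2)   # inner minimizer of P1
    p1 = np.sum((r - z) ** 2) + lam * np.sum(np.abs(z))
    p2 = np.sum(huber(r, lam))
    assert abs(p1 - p2) < 1e-9
print("P1 (with the optimal z plugged in) equals P2 at every tested x")
```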
The idea is much simpler. Use the fact that $$\min_{\mathbf{x}, \mathbf{z}} f(\mathbf{x}, \mathbf{z}) = \min_{\mathbf{x}} \left\{ \min_{\mathbf{z}} f(\mathbf{x}, \mathbf{z}) \right\}.$$ In your case, the optimal value of the inner minimization problem is exactly the Huber function of the residual.
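To spell this out for a single coordinate: writing $u = y_i - \mathbf{a}_i^T\mathbf{x}$ for the residual, the inner minimum is \begin{align*} \min_{z \in \mathbb{R}} \left\{ (u - z)^2 + \lambda \lvert z \rvert \right\} &= \begin{cases} u^2 & \text{if } \lvert u \rvert \leq \frac{\lambda}{2} \quad \left( z^\star = 0 \right) \\ \frac{\lambda^2}{4} + \lambda \left( \lvert u \rvert - \frac{\lambda}{2} \right) = \lambda \lvert u \rvert - \frac{\lambda^2}{4} & \text{if } \lvert u \rvert > \frac{\lambda}{2} \quad \left( z^\star = u - \frac{\lambda}{2} \, {\rm sign}(u) \right) \end{cases} \\ &= \mathcal{H}(u), \end{align*} and summing over $i = 1, \dots, N$ turns the inner minimum of P$1$ into exactly the objective of P$2$.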