In the sparse group lasso paper SGLpaper, they derive a condition for $\beta^{(k)}=0$ from the subgradient equation $$ \frac{1}{n} {X^{(k)}}^T\Big(y-\sum_{l=1}^m X^{(l)} \beta^{(l)}\Big) = (1-\alpha)\lambda u + \alpha \lambda v, $$ where $m$ is the number of groups. As in the linear regression model, $X$ is the $n \times p$ design matrix, $X^{(l)}$ is the submatrix of $X$ for group $l$ (an $n \times p_l$ matrix, where $p_l$ is the number of covariates in group $l$), $y$ is the $n$-vector of responses, and $\beta$ is the coefficient vector to be estimated; $\lambda \geq 0$ is the penalty parameter and $\alpha \in [0,1]$ the mixing parameter. Here $u$ and $v$ are subgradients of $\|\beta^{(k)}\|_2$ and $\|\beta^{(k)}\|_1$, respectively, evaluated at $\beta^{(k)}$: $$ u \in \{u : \|u\|_2 \leq 1\} \quad \text{if } \beta^{(k)}=0,$$ $$ v_j \in \{v_j : |v_j| \leq 1\} \quad \text{if } \beta_j^{(k)}=0.$$ They say: "With a little bit of algebra, we see that the subgradient equations are satisfied with $\beta^{(k)}=0$ if $$ \|S({X^{(k)}}^T r_{(-k)}/n,\, \alpha \lambda)\|_2 \leq (1-\alpha)\lambda,$$" where $r_{(-k)}$ is the partial residual of $y$, subtracting all group fits other than group $k$: $$r_{(-k)}=y-\sum_{l \neq k} X^{(l)} \beta^{(l)},$$ and $S(\cdot)$ is the coordinate-wise soft-thresholding operator: $$(S(z, \alpha \lambda))_j = \operatorname{sign}(z_j)(|z_j| - \alpha \lambda)_+ .$$
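For concreteness, here is a minimal numeric sketch of the paper's group-screening condition; the function names `soft_threshold` and `group_is_zero` are my own, not from the paper:

```python
import numpy as np

def soft_threshold(z, t):
    """Coordinate-wise soft thresholding: (S(z, t))_j = sign(z_j) * (|z_j| - t)_+."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def group_is_zero(X_k, r_minus_k, n, alpha, lam):
    """Check the condition for beta^(k) = 0:
    ||S(X^(k)^T r_(-k) / n, alpha*lam)||_2 <= (1 - alpha)*lam."""
    z = X_k.T @ r_minus_k / n
    return np.linalg.norm(soft_threshold(z, alpha * lam)) <= (1 - alpha) * lam
```

If this condition holds, the whole group $k$ is set to zero; otherwise its coefficients are fit.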
I tried to derive the same inequality, but I ended up with these: $$\left\|\frac{1}{n} {X^{(k)}}^T r_{(-k)} - \alpha \lambda\, [-1,1]\right\|_2 \leq (1-\alpha)\lambda $$
$$\left\|\, \left|\frac{1}{n} {X^{(k)}}^T r_{(-k)}\right| - \alpha \lambda\, \right\|_2 \leq (1-\alpha)\lambda $$ But I couldn't figure out how to arrive at the soft-thresholding operator.
I would really appreciate help with this. Thanks very much.
You already found $$\left\|\frac{1}{n} {X^{(k)}}^T r_{(-k)} - \alpha \lambda\, [-1,1]\right\|_2 \leq (1-\alpha)\lambda, \qquad (1) $$ where, for each element of the vector separately, you can choose any value in $[-1,1]$ to satisfy the inequality. Let us see how you would select this value for the $j^{th}$ component. The goal is to satisfy inequality (1), so you want the component to be as close to $0$ as possible. The value you select depends on $$\frac{1}{n}\left({X^{(k)}}^T r_{(-k)}\right)_j. \qquad (2)$$ You select $1$ (from the interval $[-1,1]$) when (2) is larger than $\alpha\lambda$, $-1$ when (2) is smaller than $-\alpha\lambda$, and (2) divided by $\alpha\lambda$ when (2) is in $[-\alpha\lambda,\alpha\lambda]$. The last displayed formula in your question does not take this last case into account, while the $(\cdot)_+$ operator does.
When (2) is larger than $\alpha\lambda$, the $j^{th}$ component equals: $$\frac{1}{n}\left({X^{(k)}}^T r_{(-k)}\right)_j-\alpha\lambda.$$ When (2) is smaller than $-\alpha\lambda$, the $j^{th}$ component equals: $$\frac{1}{n}\left({X^{(k)}}^T r_{(-k)}\right)_j+\alpha\lambda.$$ When (2) is in $[-\alpha\lambda,\alpha\lambda]$, the $j^{th}$ component equals $0$. In each of these cases, the component equals $$\operatorname{sign}\left(\frac{1}{n}\left({X^{(k)}}^T r_{(-k)}\right)_j\right)\left(\left|\frac{1}{n}\left({X^{(k)}}^T r_{(-k)}\right)_j\right|- \alpha \lambda\right)_+.$$ Note that the sign operator can be omitted, since it does not affect the value of the 2-norm in (1).
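The three-case argument above can be checked numerically: for each component $z_j$, the optimal $v_j \in [-1,1]$ is the clipped value $z_j/(\alpha\lambda)$, and the resulting residual $z_j - \alpha\lambda v_j$ coincides with the soft-thresholded value. This is an illustrative sketch, not code from the paper:

```python
import numpy as np

alpha_lam = 0.8
# Writing z_j for (1/n)(X^(k)^T r_(-k))_j; values chosen to cover all three cases.
z = np.array([1.5, -2.0, 0.3, -0.5, 0.0])

# Optimal choice of v_j in [-1, 1]: clip z_j / (alpha*lam) to the interval.
v_opt = np.clip(z / alpha_lam, -1.0, 1.0)
residual = z - alpha_lam * v_opt

# Coordinate-wise soft thresholding: sign(z_j) * (|z_j| - alpha*lam)_+.
soft = np.sign(z) * np.maximum(np.abs(z) - alpha_lam, 0.0)

print(np.allclose(residual, soft))  # prints True: the two expressions coincide
```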