Friedman, Hastie and Simon (2013) propose an algorithm for group-LASSO penalized regression, possibly involving many variables. The problem is as follows:
$\min_{\beta}\big\{ \frac{1}{2}\| Y - X \beta \|_2^2 + \lambda \sum_{k \leq p} \|\beta_{k.}\|_2\big\}$
with $Y$ an $n \times M$ matrix, $X$ an $n \times p$ matrix, and $\|a\|_2 := \sqrt{a^Ta}$. Let us also write $X_{.k}$ for the $k^{th}$ column of a matrix $X$ and $X_{k.}$ for its $k^{th}$ row.
They propose to hold all $\beta_{j.}$ with $j \neq k$ fixed and optimize over $\beta_{k.}$ alone, which gives the objective:
$\min_{\beta_{k.}}\big\{ \frac{1}{2}\| R_{-k} - X_{.k} \beta_{k.} \|_2^2 + \lambda \|\beta_{k.}\|_2\big\}$
where $R_{-k} := Y - \sum_{j \neq k} X_{.j}\beta_{j.}$. From here, we can take a subderivative with respect to $\beta_{k.}$, so that our solution $\hat{\beta}_{k.}$ satisfies:
$- X_{.k}^TR_{-k} + ||X_{.k}||_2^2 \hat{\beta}_{k.} + \lambda S(\hat{\beta}_{k.}) = 0$
where $S(a) \begin{cases} = \frac{a}{\|a\|_2} & \text{if } a \neq 0 \\ \in \{u \;\; \text{s.t.} \;\; \|u\|_2 \leq 1 \} & \text{if } a = 0 \end{cases}$
Now, I have a question. It concerns how you go from the FOC-like equation above to this equation:
$\hat{\beta}_{k.} = \frac{1}{\|X_{.k}\|_2^2} \max\Big\{0, 1 - \frac{\lambda}{\|X_{.k}^T R_{-k}\|_2} \Big\} X_{.k}^T R_{-k}$
i.e., how do you isolate the solution vector in the FOC-like equation involving the subderivative? On page 4, they call it "simple algebra."
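For what it's worth, one can at least confirm numerically that the closed-form update satisfies the FOC-like equation when $\hat{\beta}_{k.} \neq 0$. A minimal NumPy sketch with random data (all variable names are mine, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 50, 3
Xk = rng.normal(size=(n, 1))   # X_{.k}: the k-th column of X
R = rng.normal(size=(n, M))    # R_{-k}: the partial residual
lam = 0.5                      # lambda

# Closed-form blockwise update; beta_k is a 1 x M row vector
g = Xk.T @ R                   # X_{.k}^T R_{-k}, shape (1, M)
gnorm = np.linalg.norm(g)      # ||X_{.k}^T R_{-k}||_2
chi2 = float(Xk.T @ Xk)        # ||X_{.k}||_2^2
beta_k = max(0.0, 1.0 - lam / gnorm) / chi2 * g

# FOC residual for the nonzero case:
# -X_{.k}^T R_{-k} + ||X_{.k}||^2 beta + lam * beta/||beta|| = 0
foc = -g + chi2 * beta_k + lam * beta_k / np.linalg.norm(beta_k)
print(np.allclose(foc, 0))     # True
```

With this draw of data $\|X_{.k}^T R_{-k}\|_2 > \lambda$, so the update is nonzero and the subgradient reduces to $\beta/\|\beta\|_2$; the FOC residual vanishes to machine precision.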
It's important that I understand it because I need to apply the same algorithm in a slightly different context. Instead of having a penalty of the form
$\lambda ||\beta_{k.}||_2$
I have one of the form
$\lambda ||W_{k.}||_2$
where $W := Q_n \beta$. If it is of any interest, I am an economist. I know some mathematics, but I am not a mathematician by any stretch of the imagination.
Here is a sketch of the missing algebra.
Given the FOC in terms of the row vector $a^T$
$$\eqalign{ (x^Tx)a^T + \Big(\frac{\lambda}{\sqrt{a^Ta}}\Big) a^T = x^TR \\ \Bigg(x^Tx + \frac{\lambda}{\sqrt{a^Ta}}\Bigg) a^T = r^T \\ \Bigg(\chi^2 + \frac{\lambda}{\alpha}\Bigg) a^T = r^T \\ }$$ So $a$ is seen to be a scalar multiple of $r$.
Take norms of both sides and solve for the unknown scalar $\alpha$, choosing the positive root (this assumes $\rho > \lambda$): $$\eqalign{ \Bigg(\chi^2 + \frac{\lambda}{\alpha}\Bigg)^2 \alpha^2 = \rho^2 \\ \Big(\alpha\chi^2 + \lambda\Big)^2 = \rho^2 \\ \alpha = \frac{\rho-\lambda}{\chi^2} \\ }$$ When $\rho \leq \lambda$, the FOC is instead satisfied by $a = 0$ together with the subgradient $u = r/\lambda$ (which has $\|u\|_2 \leq 1$), so $\hat{\beta}_{k.} = 0$; that is where the $\max\{0,\cdot\}$ in the closed form comes from. Substituting $\alpha$ back recovers the $a$ vector in terms of the known quantities.
To relate this all to the current problem let $$\eqalign{ a^T &= \hat{\beta}_{k.},\quad &x = X_{.k},\quad &r^T = X_{.k}^TR_{-k} \\ \alpha^2 &= \|a\|^2,\quad &\chi^2 = \|x\|^2 ,\quad &\rho^2 = \|r\|^2 \\ }$$
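As a sanity check on the scalar algebra (my own script, not from the paper), one can compute $\alpha = (\rho - \lambda)/\chi^2$ from the known quantities, substitute it back to get $a$, and confirm this matches the paper's $\max\{0,\cdot\}$ formula:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 40, 2
x = rng.normal(size=(n, 1))    # X_{.k}
R = rng.normal(size=(n, M))    # R_{-k}
lam = 0.3                      # lambda

r = x.T @ R                    # r^T = X_{.k}^T R_{-k}
chi2 = float(x.T @ x)          # chi^2 = ||x||^2
rho = np.linalg.norm(r)        # rho = ||r||

# Scalar step from the sketch: alpha = ||a|| = (rho - lam) / chi2
alpha = (rho - lam) / chi2
# Substitute alpha back into (chi^2 + lam/alpha) a^T = r^T
a = r / (chi2 + lam / alpha)

# Closed-form update from the paper
beta = max(0.0, 1.0 - lam / rho) / chi2 * r

print(np.allclose(a, beta))                   # True
print(np.isclose(np.linalg.norm(a), alpha))   # True
```

The two agree, and $\|a\|$ equals the solved-for $\alpha$, as the derivation requires (here $\rho > \lambda$, so the nonzero branch applies).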