Friedman, Hastie and Simon (2013) propose an algorithm for group-LASSO penalized regression, possibly involving many variables. The problem is as follows:
$\min_{\beta}\big\{ \frac{1}{2}\| Y - X \beta \|_2^2 + \lambda \sum_{k \leq p} \|\beta_{k.}\|_2\big\}$
with $Y$ an $n \times M$ matrix, $X$ an $n \times p$ matrix, and $\|a\|_2 := \sqrt{a^Ta}$. Let us also write $X_{.k}$ for the $k^{th}$ column of a matrix $X$ and $X_{k.}$ for its $k^{th}$ row.
They propose to hold all $\beta_{j.}$ with $j \neq k$ fixed and optimize over $\beta_{k.}$ alone, which gives the objective:
$\min_{\beta_{k.}}\big\{ \frac{1}{2}\| R_{-k} - X_{.k} \beta_{k.} \|_2^2 + \lambda \|\beta_{k.}\|_2\big\}$
where $R_{-k} := Y - \sum_{j \neq k} X_{.j}\beta_{j.}$. From here, we can take a subderivative with respect to $\beta_{k.}$, so that our solution $\hat{\beta}_{k.}$ satisfies:
$- X_{.k}^TR_{-k} + ||X_{.k}||_2^2 \hat{\beta}_{k.} + \lambda S(\hat{\beta}_{k.}) = 0$
where $S(a) \begin{cases} = \frac{a}{\|a\|_2} & \text{if } a \neq 0 \\ \in \{u \;\; \text{s.t.} \;\; \|u\|_2 \leq 1 \} & \text{if } a = 0 \end{cases}$
Now, I have a question. It concerns how you go from the FOC-like equation above to this equation:
$\hat{\beta}_{k.} = \frac{1}{\|X_{.k}\|_2^2} \max\Big\{0, 1 - \frac{\lambda}{\|X_{.k}^T R_{-k}\|_2} \Big\} X_{.k}^T R_{-k}$
i.e., how do you isolate the solution vector in the FOC-like equation involving the subderivative? On page 4, they call it "simple algebra."
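For what it's worth, one can at least confirm numerically that the closed-form update satisfies the FOC-like equation when $\hat{\beta}_{k.} \neq 0$. A minimal NumPy sketch with random data (all variable names are mine, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 50, 3
Xk = rng.normal(size=(n, 1))   # X_{.k}: the k-th column of X
R = rng.normal(size=(n, M))    # R_{-k}: the partial residual
lam = 0.5                      # lambda

# Closed-form blockwise update; beta_k is a 1 x M row vector
g = Xk.T @ R                   # X_{.k}^T R_{-k}, shape (1, M)
gnorm = np.linalg.norm(g)      # ||X_{.k}^T R_{-k}||_2
chi2 = float(Xk.T @ Xk)        # ||X_{.k}||_2^2
beta_k = max(0.0, 1.0 - lam / gnorm) / chi2 * g

# FOC residual for the nonzero case:
# -X_{.k}^T R_{-k} + ||X_{.k}||^2 beta + lam * beta/||beta|| = 0
foc = -g + chi2 * beta_k + lam * beta_k / np.linalg.norm(beta_k)
print(np.allclose(foc, 0))     # True
```

With this draw of data $\|X_{.k}^T R_{-k}\|_2 > \lambda$, so the update is nonzero and the subgradient reduces to $\beta/\|\beta\|_2$; the FOC residual vanishes to machine precision.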
It's important that I understand it because I need to apply the same algorithm in a slightly different context. Instead of having a penalty of the form
$\lambda ||\beta_{k.}||_2$
I have one of the form
$\lambda ||W_{k.}||_2$
where $W := Q_n \beta$. If it is of any interest, I am an economist. I know some mathematics, but I am not a mathematician by any stretch of the imagination.
Here is a sketch of the missing algebra.
Given the FOC in terms of the row vector $a^T$
$$\eqalign{ (x^Tx)a^T + \Big(\frac{\lambda}{\sqrt{a^Ta}}\Big) a^T = x^TR \\ \Bigg(x^Tx + \frac{\lambda}{\sqrt{a^Ta}}\Bigg) a^T = r^T \\ \Bigg(\chi^2 + \frac{\lambda}{\alpha}\Bigg) a^T = r^T \\ }$$ So $a$ is seen to be a scalar multiple of $r$.
Take norms of both sides and solve for the unknown scalar $\alpha$, choosing the positive root (this assumes $\rho > \lambda$): $$\eqalign{ \Bigg(\chi^2 + \frac{\lambda}{\alpha}\Bigg)^2 \alpha^2 = \rho^2 \\ \Big(\alpha\chi^2 + \lambda\Big)^2 = \rho^2 \\ \alpha = \frac{\rho-\lambda}{\chi^2} \\ }$$ When $\rho \leq \lambda$, the FOC is instead satisfied by $a = 0$ together with the subgradient $u = r/\lambda$ (which has $\|u\|_2 \leq 1$), so $\hat{\beta}_{k.} = 0$; that is where the $\max\{0,\cdot\}$ in the closed form comes from. Substituting $\alpha$ back recovers the $a$ vector in terms of the known quantities.
To relate this all to the current problem let $$\eqalign{ a^T &= \hat{\beta}_{k.},\quad &x = X_{.k},\quad &r^T = X_{.k}^TR_{-k} \\ \alpha^2 &= \|a\|^2,\quad &\chi^2 = \|x\|^2 ,\quad &\rho^2 = \|r\|^2 \\ }$$
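As a sanity check on the scalar algebra (my own script, not from the paper), one can compute $\alpha = (\rho - \lambda)/\chi^2$ from the known quantities, substitute it back to get $a$, and confirm this matches the paper's $\max\{0,\cdot\}$ formula:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 40, 2
x = rng.normal(size=(n, 1))    # X_{.k}
R = rng.normal(size=(n, M))    # R_{-k}
lam = 0.3                      # lambda

r = x.T @ R                    # r^T = X_{.k}^T R_{-k}
chi2 = float(x.T @ x)          # chi^2 = ||x||^2
rho = np.linalg.norm(r)        # rho = ||r||

# Scalar step from the sketch: alpha = ||a|| = (rho - lam) / chi2
alpha = (rho - lam) / chi2
# Substitute alpha back into (chi^2 + lam/alpha) a^T = r^T
a = r / (chi2 + lam / alpha)

# Closed-form update from the paper
beta = max(0.0, 1.0 - lam / rho) / chi2 * r

print(np.allclose(a, beta))                   # True
print(np.isclose(np.linalg.norm(a), alpha))   # True
```

The two agree, and $\|a\|$ equals the solved-for $\alpha$, as the derivation requires (here $\rho > \lambda$, so the nonzero branch applies).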