Given the group lasso penalty defined as $$g(W)= \lambda \sum_j \|W_j\|_2,$$ is it possible to derive a smooth approximation of this group penalty? I looked at the following paper by Nesterov, but it was not easy to understand:
Smooth minimization of non-smooth functions
I would like to solve this optimization problem very efficiently: \begin{equation} \hat{W}=\operatorname*{arg\,min}_W \frac{1}{2}\|Y-XW\|^2 + \lambda \|W\|_1 + \gamma \sum_{i=1}^{k} \|W_i\|_2 \end{equation} where $W \in \mathbb{R}^{M \times T}$, $W_i$ is the $i$th row of $W$, and $Y \in \mathbb{R}^{N \times T}$. I would like to rewrite the original problem as \begin{equation} \hat{W}=\operatorname*{arg\,min}_W g(W) + \lambda \|W\|_1 + h(W), \end{equation} where $g(W) =\frac{1}{2}\|Y-XW\|^2$ and $h(W)$ is a smooth approximation of $\gamma \sum_{i=1}^{k} \|W_i\|_2$. Then $F(W)=g(W)+h(W)$ is differentiable and I can easily compute its gradient. This would let me solve the original problem $$F(W) + \lambda \|W\|_1$$ using proximal gradient descent, since the proximal operator of $\|W\|_1$ is known and can be computed very fast.
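For reference, a proximal-gradient (ISTA) iteration for $F(W) + \lambda\|W\|_1$ needs only $\nabla F$ and the soft-thresholding operator (the prox of $\lambda\|\cdot\|_1$). Below is a minimal NumPy sketch; the function names and the fixed step size $1/L$ are illustrative choices, and `grad_h` stands in for the gradient of whichever smooth approximation $h$ one picks:

```python
import numpy as np

def soft_threshold(W, t):
    # Prox of t*||.||_1: shrink every entry toward zero by t.
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def ista(X, Y, grad_h, lam, n_iter=100):
    """Proximal gradient for (1/2)||Y - XW||^2 + lam*||W||_1 + h(W),
    where grad_h returns the gradient of a smooth group-penalty surrogate h.
    (In practice the Lipschitz constant of grad_h should be added to L.)"""
    M, T = X.shape[1], Y.shape[1]
    W = np.zeros((M, T))
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the quadratic part
    for _ in range(n_iter):
        G = X.T @ (X @ W - Y) + grad_h(W)          # gradient of F = g + h
        W = soft_threshold(W - G / L, lam / L)     # prox step
    return W
```

With `grad_h` set to zero this reduces to plain ISTA for the lasso.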
For a compact convex subset $K$ of a finite-dimensional Hilbert space $X$ and a bounded linear operator $A: Z \rightarrow X$, consider the composite function $f =\sigma_K \circ A$, where $\sigma_K$ is the support function of $K$, defined by $$ \sigma_K(x) := \sup_{y \in K}\langle x, y\rangle,\; \forall x \in X. $$
Now, Nesterov has introduced a class of smooth approximations defined by
$$ f_\mu(z) := \sup_{y \in K}\left(\langle Az, y\rangle - \frac{\mu}{2}\|y\|_2^2\right), $$ for a smoothing parameter $\mu > 0$.
It's not too hard to see that the supremum is attained at $y_\mu(z) = \text{Proj}_{K}(Az/\mu)$, so that $f_\mu$ is differentiable with $\nabla f_\mu(z) = A^* \text{Proj}_{K}(Az/\mu)$, that this gradient is Lipschitz with constant $\|A\|^2/\mu$, and that $$ f_\mu(z) \le f(z) \le f_\mu(z) + \frac{\mu}{2}\sup_{y \in K}\|y\|_2^2. $$
Thus if $K$ is "simple" (in the sense that $\text{Proj}_{K}$ can be applied cheaply), you can imagine a very simple and intuitive algorithm for minimizing $f$: minimize $f_\mu$, reduce $\mu$, minimize $f_\mu$, reduce $\mu$, ... (until some desired tolerance is hit).
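As a toy sanity check (my own example, not from the paper): with $K = [-1, 1] \subset \mathbb{R}$ and $A$ the identity, $f = |\cdot|$ and $f_\mu$ is exactly the Huber function, so the sandwich bound $f_\mu \le f \le f_\mu + \mu/2$ and the projection formula for the gradient can be verified numerically:

```python
import numpy as np

def f_mu(z, mu):
    # sup_{|y|<=1} z*y - (mu/2) y^2, attained at y = clip(z/mu, -1, 1),
    # i.e. at Proj_K(z/mu) for K = [-1, 1].
    y = np.clip(z / mu, -1.0, 1.0)
    return z * y - 0.5 * mu * y ** 2

def grad_f_mu(z, mu):
    # By Danskin's theorem the gradient is the maximizer itself (A = identity).
    return np.clip(z / mu, -1.0, 1.0)
```

For $|z| \ge \mu$ this gives $f_\mu(z) = |z| - \mu/2$, and for $|z| < \mu$ it gives $z^2/(2\mu)$: the classic Huber smoothing of the absolute value.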
This technique is called Nesterov's smoothing method for functions of "max-type", and can be used in areas as diverse as game theory (solving for Nash equilibria for incomplete information sequential zero-sum games) and image processing (total-variation minimization, etc.)
Returning to the original question: Group Lasso
In the case of the group Lasso, you can take $X = \mathbb R^{p \times g}$ and $K := \{z \in X \mid \|z\|_{2,\infty} \le 1\}$, the unit ball of the $\ell_{2,\infty}$ mixed norm (each row of $z$ has $\ell_2$ norm at most $1$). This is the dual ball of the $\ell_{2,1}$ mixed norm, which is exactly what makes $\sigma_K = \|\cdot\|_{2,1}$, and with $A = \text{ identity matrix}$ you recover $f(W) = \sum_i \|W_i\|_2$. Projection onto this $K$ is simple: project each row onto the $\ell_2$ unit ball independently.
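Concretely, because the projection decouples across rows, the smoothed group penalty $h_\mu(W) = \gamma \sum_i \phi_\mu(\|W_i\|_2)$ (a row-wise Huber function) and its gradient come out in closed form. A sketch, with a function name of my own choosing:

```python
import numpy as np

def smoothed_group_penalty(W, gamma, mu):
    """Nesterov smoothing of gamma * sum_i ||W_i||_2.
    The maximizer over K projects each row of W/mu onto the l2 unit ball,
    which in closed form is y_i = W_i / max(||W_i||_2, mu).
    Returns (value, gradient)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)   # ||W_i||_2, one per row
    y = W / np.maximum(norms, mu)                      # row-wise Proj_K(W_i / mu)
    value = gamma * np.sum(W * y - 0.5 * mu * y ** 2)  # sum of row-wise Huber terms
    return value, gamma * y
```

The returned gradient is exactly what `grad_h` in a proximal-gradient loop would use; rows with $\|W_i\|_2 \ge \mu$ contribute $\|W_i\|_2 - \mu/2$ to the value, and small rows contribute $\|W_i\|_2^2/(2\mu)$.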
I hope this helps!