Strict convexity of penalized regression function


I have a quick question about a comment that R. Tibshirani makes right at the beginning of this paper. He says that the LASSO solution is unique when $\text{rank}(X)=p$ by strict convexity of the criterion. I just want to confirm whether my understanding of why the criterion is strictly convex is correct.

This amounts to showing that $f(\vec{\beta})=\frac{1}{2}||\vec{y}-X\vec{\beta}||_2^2+\lambda||\vec{\beta}||_1$ is a strictly convex function when $\text{rank}(X)=p$. First, we know that $|| \cdot ||_2^2$ is a strictly convex function and that $|| \cdot ||_1$ is a convex function. Thus for $\theta \in (0,1)$ and $x \neq y$ we know that $||\theta x + (1-\theta)y||_2^2<\theta ||x||_2^2 +(1-\theta)||y||_2^2$ and $||\theta x + (1-\theta)y||_1 \leq \theta ||x||_1 +(1-\theta)||y||_1$.
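As a quick numerical sanity check of these two inequalities (not part of the original question; the vectors and $\theta$ below are arbitrary choices of mine), one can verify that the squared Euclidean norm satisfies the strict inequality for $x \neq y$, while the $\ell_1$ norm can achieve equality, e.g. when $y$ is a positive multiple of $x$:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)  # x != y almost surely
theta = 0.3

# Squared Euclidean norm: strict inequality when x != y
lhs2 = np.sum((theta * x + (1 - theta) * y) ** 2)
rhs2 = theta * np.sum(x ** 2) + (1 - theta) * np.sum(y ** 2)
assert lhs2 < rhs2

# l1 norm: only the non-strict inequality holds in general
lhs1 = np.sum(np.abs(theta * x + (1 - theta) * y))
rhs1 = theta * np.sum(np.abs(x)) + (1 - theta) * np.sum(np.abs(y))
assert lhs1 <= rhs1

# The l1 inequality is tight for collinear points, e.g. z = 2x,
# which is why the l1 norm alone is convex but not strictly convex
z = 2 * x
lhs_t = np.sum(np.abs(theta * x + (1 - theta) * z))
rhs_t = theta * np.sum(np.abs(x)) + (1 - theta) * np.sum(np.abs(z))
assert np.isclose(lhs_t, rhs_t)
```

The last check is the reason the $\ell_1$ penalty by itself cannot supply strict convexity; it has to come from the quadratic term.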

So, let $\theta \in (0,1)$ with $\vec{\beta}_1 \neq \vec{\beta}_2$. Then we have

\begin{align} f\left(\theta\vec{\beta}_1+(1-\theta)\vec{\beta_2}\right) & = \frac{1}{2}||\vec{y}-X(\theta\vec{\beta}_1+(1-\theta)\vec{\beta}_2)||_2^2+\lambda||\theta\vec{\beta}_1+(1-\theta)\vec{\beta}_2||_1\\ & = \frac{1}{2}||\theta(\vec{y}-X\vec{\beta}_1)+(1-\theta)(\vec{y}-X\vec{\beta}_2)||_2^2+\lambda||\theta\vec{\beta}_1+(1-\theta)\vec{\beta}_2||_1. \end{align}

Now since $\text{rank}(X)=p$ ($X$ has full column rank) we know that if $\vec{\beta}_1 \neq \vec{\beta}_2$ then it must be the case that $X\vec{\beta}_1 \neq X\vec{\beta}_2$. Thus $\vec{y}-X\vec{\beta}_1 \neq \vec{y}-X\vec{\beta}_2$. By the strict convexity of the squared Euclidean norm we can therefore conclude that

$$ ||\theta(\vec{y}-X\vec{\beta}_1)+(1-\theta)(\vec{y}-X\vec{\beta}_2)||_2^2<\theta||\vec{y}-X\vec{\beta}_1||_2^2+(1-\theta)||\vec{y}-X\vec{\beta}_2||_2^2. $$

So, continuing from above and using the convexity of $|| \cdot ||_1$, we have that

\begin{align} \frac{1}{2}||\theta(\vec{y}-X\vec{\beta}_1)+(1-\theta)(\vec{y}-X\vec{\beta}_2)||_2^2+\lambda||\theta\vec{\beta}_1+(1-\theta)\vec{\beta}_2||_1 & <\frac{1}{2}(\theta||\vec{y}-X\vec{\beta}_1||_2^2+(1-\theta)||\vec{y}-X\vec{\beta}_2||_2^2)+\lambda(\theta||\vec{\beta}_1||_1+(1-\theta)||\vec{\beta}_2||_1)\\ & = \theta(\frac{1}{2}||\vec{y}-X\vec{\beta}_1||_2^2+\lambda||\vec{\beta}_1||_1)+(1-\theta)(\frac{1}{2}||\vec{y}-X\vec{\beta}_2||_2^2+\lambda||\vec{\beta}_2||_1)\\ & = \theta f(\vec{\beta}_1)+(1-\theta)f(\vec{\beta}_2) \end{align}

and we have strict convexity.
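The whole argument can also be spot-checked numerically (my own construction, not from the question: random $X$ with $n > p$, which has full column rank almost surely, and an arbitrary pair $\vec{\beta}_1 \neq \vec{\beta}_2$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 20, 5, 0.1
X = rng.normal(size=(n, p))   # full column rank almost surely
y = rng.normal(size=n)

def f(beta):
    """The LASSO criterion: 0.5 * ||y - X beta||_2^2 + lam * ||beta||_1."""
    return 0.5 * np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))

b1, b2 = rng.normal(size=p), rng.normal(size=p)
theta = 0.4

# Strict convexity: f at the convex combination lies strictly below
# the corresponding combination of the endpoint values
lhs = f(theta * b1 + (1 - theta) * b2)
rhs = theta * f(b1) + (1 - theta) * f(b2)
assert lhs < rhs
```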

Best Answer

In the Lasso we have $f(w)=g(w)+h(w)$, where $g(w)$ is the squared-error term, which is strictly convex when the columns of $X$ are linearly independent ($\text{rank}(X)=p$), and $h(w)$ is the $\ell_1$ norm, which is convex. One can show that the sum of a strictly convex function and a convex function is strictly convex.
Proof: for $w_1 \neq w_2$ and $\alpha \in (0,1)$,
\begin{equation} g(\alpha w_1 + (1-\alpha)w_2) < \alpha g(w_1) + (1-\alpha)g(w_2), \end{equation}
\begin{equation} h(\alpha w_1 + (1-\alpha)w_2) \leq \alpha h(w_1) + (1-\alpha)h(w_2). \end{equation}
Adding the two inequalities gives
\begin{equation} f(\alpha w_1 + (1-\alpha)w_2) < \alpha f(w_1) + (1-\alpha)f(w_2). \end{equation}
If you want to show strict convexity of $g(w)$ directly, note that its Hessian $\nabla^2 g(w) = X^TX$ is positive definite when $\text{rank}(X)=p$. The proof you have is also correct.
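The Hessian route in the last sentence is easy to check numerically (again my own illustrative construction): when $X$ has full column rank, all eigenvalues of $X^TX$ are strictly positive, and duplicating a column drops the smallest eigenvalue to (numerically) zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 5
X = rng.normal(size=(n, p))
H = X.T @ X  # Hessian of the squared-error term g

# Full column rank => X^T X is positive definite
assert np.linalg.matrix_rank(X) == p
assert np.all(np.linalg.eigvalsh(H) > 0)

# Duplicate a column so rank(X) < p: smallest eigenvalue of X^T X
# becomes (numerically) zero, so g is no longer strictly convex
Xd = X.copy()
Xd[:, 0] = Xd[:, 1]
Hd = Xd.T @ Xd
assert np.linalg.eigvalsh(Hd)[0] < 1e-8
```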