The compatibility constant

377 Views Asked by At

I am reading Lecture notes on sparsity by Sara van de Geer and trying to understand the idea behind the compatibility constant.

Suppose that $\beta\in\mathbb R^p$. Let $S\subset\{1,\ldots,p\}$ be an index set and let us denote by $\beta_{S}$ a $p$-vector with entries equal to zero at the indexes $j\not\in S$, i.e. $\beta_{j,S}=\beta_j1\{j\in S\}$ for $j=1,\ldots,p$. $\beta_{-S}$ denotes the vector $\beta_{S^c}$. We have $\beta=\beta_S+\beta_{-S}$ for any vector $\beta\in\mathbb R^p$. The following definition is given in the notes (Subsection 1.6).

For a constant $L>1$ and an index set $S\subset\{1,\ldots,p\}$, the compatibility constant is defined by $$ \hat\phi^2(L,S)=\min\biggl\{\frac{|S|}n\|X\beta_S-X\beta_{-S}\|_2^2:\|\beta_S\|_1=1,\ \|\beta_{-S}\|_1\le L\biggr\}, $$ where $X\in\mathbb R^{n\times p}$ is the design matrix, $\|\cdot\|_2$ is the Euclidean norm and $\|\cdot\|_1$ is the $\ell_1$-norm.

It is written that this is a notion of compatibility between the $\ell_1$-norm and the Euclidean norm. However, I do not understand what this precisely means. What is idea behing the compability constant? How does this constant relate the $\ell_1$-norm to the Euclidean norm?

They claim that the inequality (inequality (1.11) in the notes) $$ \|\hat\beta_S-\beta\|_1\le\frac{\sqrt{|S|}\|X(\hat\beta-\beta)\|_2}{n\hat\phi(L,S)} $$ follows from the definition of the compatiblity constant, where $X\in\mathbb R^{n\times p}$ is the design matrix and $\hat\beta\in\mathbb R^p$ is the Lasso estimator. Could anyone provide some details how this inequality follows from the definition?

Any help is much appreciated!

1

There are 1 best solutions below

5
On BEST ANSWER

Let me rewrite the definition of compatibility constant as it follows $$ \hat\phi^2(L,S)=\min\biggl\{\frac{|S|}n\|X\delta_S-X\delta_{-S}\|_2^2:\|\delta_S\|_1=1,\ \|\delta_{-S}\|_1\le L\biggr\}. $$ Now, notice that we have $$ \|\hat{\beta}_{-S} - \beta_{-S}\|_1 \leq L \|\hat{\beta}_{S} - \beta_{S}\|_1. $$ Denote $\hat{\delta}_S = \beta_{S} - \hat{\beta}_{S}$ and $\hat{\delta}_{-S} = \hat{\beta}_{-S} - \beta_{-S}$, therefore $$ \|\hat{\delta}_{-S}\| \leq L \|\hat{\delta}_S\|_1. $$ Introduce one more vector $\delta_S = \hat{\delta}_S / \|\hat{\delta}_S\|_1$ and $\delta_{-S} = \hat{\delta}_{-S} / \|\hat{\delta}_S\|_1$, we know

$$ \|\delta_{-S}\|_1 \leq L, \\ \|\delta_{S}\|_1 = 1, $$ hence by definition of compatibility constant we have $$ \hat\phi^2(L,S) \leq \frac{|S|}n\|X\delta_S-X\delta_{-S}\|_2^2 = \frac{|S|\|X\hat \delta_S-X\hat \delta_{-S}\|_2^2}{n\|\hat{\delta}_S\|_1^2}, $$ it is sufficient to notice that $\beta_{-S} = 0$ and $X\hat \delta_S-X\hat \delta_{-S} = X\beta_{S} - X\hat{\beta}_{S} - X\hat\beta_{-S} = X(\beta - \hat\beta)$