I am reading Lecture notes on sparsity by Sara van de Geer and trying to understand the idea behind the compatibility constant.
Suppose that $\beta\in\mathbb R^p$. Let $S\subset\{1,\ldots,p\}$ be an index set and let us denote by $\beta_{S}$ a $p$-vector with entries equal to zero at the indexes $j\not\in S$, i.e. $\beta_{j,S}=\beta_j1\{j\in S\}$ for $j=1,\ldots,p$. $\beta_{-S}$ denotes the vector $\beta_{S^c}$. We have $\beta=\beta_S+\beta_{-S}$ for any vector $\beta\in\mathbb R^p$. The following definition is given in the notes (Subsection 1.6).
For a constant $L>1$ and an index set $S\subset\{1,\ldots,p\}$, the compatibility constant is defined by $$ \hat\phi^2(L,S)=\min\biggl\{\frac{|S|}n\|X\beta_S-X\beta_{-S}\|_2^2:\|\beta_S\|_1=1,\ \|\beta_{-S}\|_1\le L\biggr\}, $$ where $X\in\mathbb R^{n\times p}$ is the design matrix, $\|\cdot\|_2$ is the Euclidean norm and $\|\cdot\|_1$ is the $\ell_1$-norm.
It is written that this is a notion of compatibility between the $\ell_1$-norm and the Euclidean norm. However, I do not understand what this precisely means. What is idea behing the compability constant? How does this constant relate the $\ell_1$-norm to the Euclidean norm?
They claim that the inequality (inequality (1.11) in the notes) $$ \|\hat\beta_S-\beta\|_1\le\frac{\sqrt{|S|}\|X(\hat\beta-\beta)\|_2}{n\hat\phi(L,S)} $$ follows from the definition of the compatiblity constant, where $X\in\mathbb R^{n\times p}$ is the design matrix and $\hat\beta\in\mathbb R^p$ is the Lasso estimator. Could anyone provide some details how this inequality follows from the definition?
Any help is much appreciated!
Let me rewrite the definition of compatibility constant as it follows $$ \hat\phi^2(L,S)=\min\biggl\{\frac{|S|}n\|X\delta_S-X\delta_{-S}\|_2^2:\|\delta_S\|_1=1,\ \|\delta_{-S}\|_1\le L\biggr\}. $$ Now, notice that we have $$ \|\hat{\beta}_{-S} - \beta_{-S}\|_1 \leq L \|\hat{\beta}_{S} - \beta_{S}\|_1. $$ Denote $\hat{\delta}_S = \beta_{S} - \hat{\beta}_{S}$ and $\hat{\delta}_{-S} = \hat{\beta}_{-S} - \beta_{-S}$, therefore $$ \|\hat{\delta}_{-S}\| \leq L \|\hat{\delta}_S\|_1. $$ Introduce one more vector $\delta_S = \hat{\delta}_S / \|\hat{\delta}_S\|_1$ and $\delta_{-S} = \hat{\delta}_{-S} / \|\hat{\delta}_S\|_1$, we know
$$ \|\delta_{-S}\|_1 \leq L, \\ \|\delta_{S}\|_1 = 1, $$ hence by definition of compatibility constant we have $$ \hat\phi^2(L,S) \leq \frac{|S|}n\|X\delta_S-X\delta_{-S}\|_2^2 = \frac{|S|\|X\hat \delta_S-X\hat \delta_{-S}\|_2^2}{n\|\hat{\delta}_S\|_1^2}, $$ it is sufficient to notice that $\beta_{-S} = 0$ and $X\hat \delta_S-X\hat \delta_{-S} = X\beta_{S} - X\hat{\beta}_{S} - X\hat\beta_{-S} = X(\beta - \hat\beta)$