I am going through a survey from Emmanuel Candes on compressed sensing [1]. The setup behind compressed sensing can be described as follows:
Consider the optimization problem in $\mathbb{R}^d$:
$$\min \|x\|_1$$ $$\text{subject to: } Ax = y$$
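For concreteness, here is a small numerical illustration of this problem (my own toy instance, not from [1]): basis pursuit can be written as a linear program by splitting $x = p - q$ with $p, q \geq 0$, so that $\|x\|_1 = \sum_i p_i + \sum_i q_i$.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: recover a sparse x from y = Ax by solving
#   min ||x||_1  subject to  Ax = y
# as an LP via the split x = p - q, p, q >= 0.
rng = np.random.default_rng(0)
d, m, s = 60, 30, 4                            # ambient dim, measurements, sparsity
x_true = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
x_true[support] = rng.standard_normal(s)
A = rng.standard_normal((m, d)) / np.sqrt(m)   # random Gaussian sensing map
y = A @ x_true

c = np.ones(2 * d)                             # objective: sum(p) + sum(q)
A_eq = np.hstack([A, -A])                      # A(p - q) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * d))
x_hat = res.x[:d] - res.x[d:]
print(np.max(np.abs(x_hat - x_true)))          # recovery error
```

With this many Gaussian measurements relative to the sparsity, the LP typically recovers $x_{\text{true}}$ exactly (up to solver tolerance).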
Let $x^*$ be a sparse feasible point whose nonzero entries are indexed by a set $T$; we would like to certify that $x^*$ is optimal. To do so, [1] introduces an "ansatz" optimization problem at the beginning of Section 4:
$$\min \|v\|_2 $$ $$\text{subject to: } v \perp \text{null}(A) \text{ and } P_Tv = \text{sign}(x^*)$$
where $T$ now denotes the subspace of vectors supported on the same index set as $x^*$, and $A$ is a linear map such that $\text{null}(A) \cap T = \{0\}$.
$P_Lu$ denotes the orthogonal projection of $u$ onto the subspace $L$, and $\text{sign}(x)_i = x_i/|x_i|$ if $x_i \neq 0$ and $0$ otherwise.
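One convenient feature of the $\ell_2$ objective is that the ansatz has a closed-form minimizer. Writing $v = A^Tw$ enforces $v \perp \text{null}(A)$, and the KKT system of $\min \|A^Tw\|_2$ subject to $A_T^Tw = \text{sign}(x^*)_T$ reduces to two small linear solves. A sketch (my own construction and notation, not verbatim from [1]):

```python
import numpy as np

# Minimum-l2 certificate for the ansatz problem.
# v = A^T w guarantees v ⟂ null(A); the equality-constrained least-norm
# problem  min ||A^T w||_2  s.t.  A_T^T w = s_T  has KKT system
#   (A A^T) w = A_T mu,   A_T^T w = s_T.
rng = np.random.default_rng(1)
d, m = 60, 30
T = np.array([3, 17, 42])                    # support of x* (toy choice)
s_T = np.array([1.0, -1.0, 1.0])             # sign(x*) on T
A = rng.standard_normal((m, d)) / np.sqrt(m)

B = A @ A.T                                   # m x m, invertible a.s. for Gaussian A
A_T = A[:, T]                                 # columns of A on the support
M = A_T.T @ np.linalg.solve(B, A_T)           # |T| x |T| Gram-type matrix
mu = np.linalg.solve(M, s_T)
w = np.linalg.solve(B, A_T @ mu)
v = A.T @ w                                   # the minimum-l2 feasible point

print(v[T])                                   # equals sign(x*)_T by construction
print(np.max(np.abs(np.delete(v, T))))        # off-support sup-norm; hope < 1
```

Here one can check directly that $v_T = A_T^T w = M\mu = s_T$, so feasibility holds exactly; whether $\|P_{T^\perp}v\|_\infty < 1$ holds is precisely what the probabilistic analysis in [1] addresses.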
[1] mentions that if there exists a feasible $v$ for the ansatz problem which also satisfies $\|P_{T^\perp} v\|_\infty < 1$, then $x^*$ is optimal for the original optimization problem. (Here $T^\perp$ is the orthogonal complement of $T$, i.e. the vectors supported off the support of $x^*$.) I think we don't need a strict inequality here, which we can see by applying the KKT conditions to check the optimality of $x^*$:
$$\mathcal{L}(x, \lambda) = \|x\|_1 + \lambda^T(Ax - y)$$
$$0 \in \partial \|x\|_1 + A^T\lambda$$ So $x^*$ is optimal if there exists a $u \in \text{range}(A^T) = \text{null}(A)^\perp$ that is also an element of the subdifferential of $\|\cdot\|_1$ at $x^*$. These are exactly the conditions of the ansatz problem together with $\|P_{T^\perp} v\|_\infty \leq 1$ (note the inequality rather than strict inequality).
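To spell out the step linking the subdifferential to the ansatz constraints, recall the standard characterization of the subdifferential of the $\ell_1$ norm at $x^*$ with support subspace $T$:

```latex
\partial \|x^*\|_1
  = \bigl\{\, u \in \mathbb{R}^d :
      P_T u = \operatorname{sign}(x^*),\;
      \|P_{T^\perp} u\|_\infty \le 1 \,\bigr\}
```

Intersecting this set with $\text{null}(A)^\perp$ gives exactly the feasible set of the ansatz problem with the non-strict bound $\|P_{T^\perp} v\|_\infty \leq 1$.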
This paper mentions that they introduce this ansatz problem since it can be studied analytically and "by minimizing its Euclidean norm we hope to make its dual norm small as well". But the dual of the Euclidean norm is also the Euclidean norm (not the $1$- or $\infty$-norm), so I'm unsure what they are really after.
Why is this ansatz problem even introduced? In particular, why minimize the $\ell_2$ norm at all? It seems we just want a feasible point.
EDIT: Reading on, this problem is introduced in order to compute a feasible point in closed form. Such a point is amenable to the analysis that follows, which shows that such a certificate exists with high probability.
[1] Candès, Emmanuel J. "Mathematics of sparsity (and a few other things)." Proceedings of the International Congress of Mathematicians, Seoul, South Korea, 2014.