Finding the unit vector minimizing the sum of the absolute values of the projections of a set of points
Consider $$ \min_{\mathbf{w} \in \mathbb{R}^d} \|\mathbf{X}^T\mathbf{w}\|_1 \qquad\text{subject to } \quad \|\mathbf{w}\|_2^2=1, $$ where $\mathbf{X}\in\mathbb{R}^{d\times m}$ is a set of $d$-dimensional points and $m > d$. How can I solve this problem?
388 Views · Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail · There are 2 solutions below.
This isn't a completely rigorous proof, but starting from Ramanujan's observation that the problem is equivalent to finding
$$\min \sum_{k = 1}^{m} \left| \sum_{j = 1}^{d} x_{j k} w_j \right|\qquad \text{subject to }\sum_{j=1}^dw_j^2=1,$$
it helps to think about it somewhat geometrically. The level sets of the objective function are polytopes in $d$-dimensional space. The level set corresponding to the minimum value touches the $(d-1)$-sphere only at points where as many of the terms $\sum_{j = 1}^{d} x_{j k} w_j$ as possible are zero, i.e. at corners of the polytope. Generically, $d-1$ of these terms equal $0$ there. Among all such candidate points, one yields the minimum.
So in order to find the minimum, choose $d-1$ columns of $\mathbf{X}$ whose dot products with $\mathbf w$ are $0$ (i.e. $\sum_{j = 1}^{d} x_{j k} w_j=0$ for $d-1$ values of $k$). Together with $\sum_{j=1}^dw_j^2=1$ this gives $d$ equations in $d$ variables, so you can solve** for $\mathbf{w}$. In fact, it is a set of $d-1$ linear equations plus the normalization constraint, so you can use whatever linear algebra technique you like (e.g. a null-space computation followed by rescaling) to solve for $\mathbf{w}$.
Since there are $\binom{m}{d-1}$ ways to choose these columns, this gives just as many candidates for $\mathbf{w}$, and the answer is the candidate that minimizes the objective function.
**: The system might not determine $\mathbf{w}$ if the chosen columns of $\mathbf{X}$ are linearly dependent, but in that case you could delete a column that is a linear combination of the others and adjust the values in those remaining columns. So no matter what, the problem can be reduced to the case where the chosen columns of $\mathbf{X}$ are linearly independent.
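The enumeration described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the answerer's code: the function name `min_l1_projection` is made up, and the null vector orthogonal to the chosen $d-1$ columns is recovered via an SVD rather than by solving the equations directly. The cost grows as $\binom{m}{d-1}$, so this is only practical for small problems.

```python
# Brute-force search over all choices of d-1 columns, as described above.
# A minimal sketch assuming NumPy; exponential in general.
import itertools
import numpy as np

def min_l1_projection(X):
    """For every choice of d-1 columns of X, find a unit vector w orthogonal
    to those columns, then keep the w with the smallest ||X^T w||_1."""
    d, m = X.shape
    best_w, best_val = None, np.inf
    for cols in itertools.combinations(range(m), d - 1):
        A = X[:, cols].T                 # (d-1) x d linear system A w = 0
        # The last right-singular vector spans the null space of A when the
        # chosen columns are linearly independent; it is already unit-norm.
        _, _, Vt = np.linalg.svd(A)
        w = Vt[-1]
        val = np.abs(X.T @ w).sum()      # objective ||X^T w||_1
        if val < best_val:
            best_val, best_w = val, w
    return best_w, best_val
```

Since the argument above says the global minimum is attained at one of these candidates, the returned value should never be beaten by any other unit vector.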
We can use a standard trick to get rid of the nondifferentiability. The objective is $\|X^Tw\|_1=\sum_j|w^Tx_j|$, and each term satisfies $|w^Tx_j| = \min\{t_j : -t_j\le w^Tx_j\le t_j\}$. So the problem is equivalent to $$\min_{t,w}\ 1^Tt\\\text{s.t. }\|w\|_2=1,\\-t\preceq X^Tw\preceq t.$$ The objective and inequality constraints are now linear; only the equality constraint $\|w\|_2=1$ is nonconvex, so the problem can be passed to a standard nonlinear programming routine (which will in general return a local optimum).
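As a concrete illustration of handing the epigraph reformulation to an off-the-shelf NLP routine, here is a sketch using SciPy's SLSQP solver. The function name `solve_epigraph` and the starting point are my own choices, and because the sphere constraint is nonconvex, the solver is only guaranteed a local optimum.

```python
# Epigraph reformulation from the answer above: minimize 1^T t over (w, t)
# subject to -t <= X^T w <= t and ||w||_2^2 = 1, solved with SciPy's SLSQP.
import numpy as np
from scipy.optimize import minimize

def solve_epigraph(X):
    d, m = X.shape
    w0 = np.ones(d) / np.sqrt(d)           # feasible starting unit vector
    t0 = np.abs(X.T @ w0)                  # matching feasible t
    z0 = np.concatenate([w0, t0])          # stacked variable z = (w, t)

    obj = lambda z: z[d:].sum()            # 1^T t
    cons = [
        # SLSQP "ineq" constraints require fun(z) >= 0:
        {"type": "ineq", "fun": lambda z: z[d:] - X.T @ z[:d]},  # t >= X^T w
        {"type": "ineq", "fun": lambda z: z[d:] + X.T @ z[:d]},  # t >= -X^T w
        # Nonconvex sphere constraint ||w||_2^2 = 1:
        {"type": "eq", "fun": lambda z: z[:d] @ z[:d] - 1.0},
    ]
    res = minimize(obj, z0, constraints=cons, method="SLSQP")
    w = res.x[:d]
    w /= np.linalg.norm(w)                 # re-project onto the sphere
    return w, float(np.abs(X.T @ w).sum())
```

Restarting from several random initial points and keeping the best result is a cheap way to hedge against bad local optima.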