How to minimize $\| x {\bf a} - {\bf b} \|_1$ without using linear programming?


Note: The following question is a generalization of a question asked earlier today.


Given vectors ${\bf a}, {\bf b} \in \mathbb R^n$, can one solve the following minimization problem without using linear programming? If so, how?

$$\begin{array}{ll} \underset{x \in {\Bbb R}}{\text{minimize}} & \| x {\bf a} - {\bf b} \|_1 \end{array}$$

If ${\bf a} = {\bf 1}_n$, one can use the median. If ${\bf a} = \begin{bmatrix} 1 & 2 & \cdots & n\end{bmatrix}^\top$, Siong showed that one can also use the median. What can one do in the general case?



I'm not sure if this counts as "without using linear programming", but it's at least relatively fast (it has runtime $O(n \log n)$).

Let $f$ be the objective function. Notice that $f(x)=\sum_{i=1}^n |a_i x - b_i|$ is piecewise linear, and also (non-strictly) convex, and so the slope of $f$ is a (non-strictly) increasing function. The minimum of $f$ will occur either on an interval where the slope is zero, or at a point where it switches from positive to negative. We can proceed as follows.

  1. Compute the points of nonlinearity $b_i/a_i$, for the indices with $a_i \neq 0$ ($O(n)$), and sort them ($O(n \log n)$). Call the sorted values $x_1, x_2, \dots, x_n$.

  2. Let $k=\left\lfloor\frac{n}{2}\right\rfloor$ and compute the slope of $f$ on the interval of linearity $[x_k,x_{k+1}]$ ($O(n)$). If this slope is positive, we're to the right of the minimum; if it's negative, we're to the left of the minimum.

  3. Perform a binary search, doing step 2) $\log n$ more times with different values of $k$ ($O(n\log n)$). Eventually you will find some $x_\ell$ such that either $f$ has slope zero on $[x_\ell,x_{\ell+1}]$, or the slope is negative on $[x_{\ell-1},x_\ell]$ but positive on $[x_\ell,x_{\ell+1}]$. Then $f(x_\ell)$ is your minimum value.

If you walked through adjacent values of $x_k$ instead of doing a binary search, you would essentially be minimizing $f$ via the simplex method, which is why I'm not totally sure this isn't linear programming. But it does seem like the binary search essentially exploits the one-dimensionality of the problem.
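The procedure above can be sketched in Python (a hypothetical `minimize_l1` helper, not from the answer itself; it returns the optimal breakpoint $x_\ell$ rather than the minimum value $f(x_\ell)$):

```python
import numpy as np

def minimize_l1(a, b):
    """Minimize f(x) = sum_i |a_i x - b_i| by binary search on the slope.

    Assumes a has at least one nonzero entry, so f is coercive.
    Runs in O(n log n): one sort plus O(log n) slope evaluations.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    nz = a != 0
    xs = np.unique(b[nz] / a[nz])          # sorted points of nonlinearity

    def slope_right(k):
        # slope of f on the open interval just to the right of xs[k]
        x = 0.5 * (xs[k] + xs[k + 1]) if k + 1 < len(xs) else xs[k] + 1.0
        return float(np.sum(a * np.sign(a * x - b)))

    # Find the smallest k with nonnegative slope to the right of xs[k];
    # the slope to the left of that breakpoint is negative, so xs[k] is optimal.
    lo, hi = 0, len(xs) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if slope_right(mid) >= 0:
            hi = mid
        else:
            lo = mid + 1
    return xs[lo]
```

As a sanity check, with ${\bf a} = {\bf 1}_n$ this reduces to the median of the $b_i$, matching the special case in the question.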


EDIT: Here are two methods.

Your function is always convex, proper, and continuous (hence lower semicontinuous). Provided $a\neq\mathbf{0}$, your function is also coercive, which guarantees the existence of a minimizer (if $a=\mathbf{0}$, the problem is trivial and every real number is a minimizer). Now that a minimizer is known to exist, let us proceed.

Method 1: The function $g\colon x\mapsto \|ax-b\|_1$ can be viewed as a linear operator applied to $x$ followed by a translated $1$-norm, i.e., $g= f\circ L$, where $L\colon\mathbb{R}\to\mathbb{R}^n\colon x\mapsto ax$ and $f=\|\cdot-b\|_1$.

The goal here is to use a product-space reformulation (e.g., as described in Section 3.1 here) together with a splitting algorithm (e.g., Douglas-Rachford / ADMM). The reformulation replaces minimizing $f\circ L$ over $\mathbb{R}$ with minimizing $f(x) + \iota_G(y) + \iota_D(x,y)$ over $\mathbb{R}^n\times\mathbb{R}^n$, where $\iota_C$ is the $0$-$\infty$ indicator function of a set $C$, $G$ is the graph of $L$, and $D$ is the diagonal subspace (see the product-space reformulation literature). To run these algorithms, you need $\textrm{prox}_f$ and the projection onto $G$.
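As a concrete illustration (derived here by a direct least-squares computation, not quoted from any reference): projecting a point $(u,v)\in\mathbb{R}\times\mathbb{R}^n$ onto the graph $G=\{(x,ax):x\in\mathbb{R}\}$ means minimizing $(x-u)^2+\|ax-v\|^2$ over $x$, which has the closed form $x^\star=(u+\langle a,v\rangle)/(1+\|a\|^2)$:

```python
import numpy as np

def project_graph(u, v, a):
    """Project (u, v) in R x R^n onto the graph G = {(x, a x) : x real}.

    Setting the derivative of (x - u)^2 + ||a x - v||^2 to zero gives
    x* = (u + <a, v>) / (1 + ||a||^2); the projection is (x*, a x*).
    """
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    x = (u + a @ v) / (1.0 + a @ a)
    return x, x * a
```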

Regarding the prox of $f$: for any $\lambda\in\left]0,+\infty\right[$, the proximity operator of $\lambda \|\cdot\|_1$ is the component-wise soft thresholder with parameter $\lambda$, which I'll call $\textrm{soft}_\lambda$. The following propositions can be found in Bauschke & Combettes' book, 2nd edition. It follows from Proposition 24.8(ii) that $\textrm{prox}_{\lambda f}(x)=b+\textrm{soft}_{\lambda}(x-b)$. Several formulae for the projection onto $G$ are available in the same book, page 540.
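In code, the soft thresholder and the resulting prox of $f=\|\cdot-b\|_1$ are one-liners (a minimal sketch; the function names are mine):

```python
import numpy as np

def soft(z, lam):
    # component-wise soft thresholding: the prox of lam * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def prox_translated_l1(z, b, lam):
    # prox of lam * ||. - b||_1, i.e. b + soft_lam(z - b)
    z = np.asarray(z, dtype=float)
    b = np.asarray(b, dtype=float)
    return b + soft(z - b, lam)
```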

Method 2: The subgradient descent algorithm (using subgradients of the objective function) is also guaranteed to work if we manage the stepsizes properly. The relevant literature vocabulary here is "subgradient projectors", and a good reference is Section 29.6 of Bauschke & Combettes' book (2nd edition).
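A minimal sketch of Method 2, assuming diminishing stepsizes $t_k = c/\sqrt{k+1}$ with an ad hoc scale $c$ (proper stepsize management is exactly what the cited section treats more carefully, and the helper name is mine):

```python
import numpy as np

def subgradient_min(a, b, x0=0.0, iters=5000):
    """Minimize f(x) = sum_i |a_i x - b_i| by the subgradient method."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    f = lambda x: float(np.abs(a * x - b).sum())
    c = 1.0 / max(float(np.abs(a).sum()), 1.0)  # ad hoc step scale (assumption)
    x = x0
    best_x, best_f = x, f(x)
    for k in range(iters):
        g = float(np.sum(a * np.sign(a * x - b)))  # a subgradient of f at x
        x -= c / np.sqrt(k + 1) * g
        if f(x) < best_f:                # track the best iterate seen so far
            best_x, best_f = x, f(x)
    return best_x, best_f
```

Since the subgradient method is not a descent method, the iterate with the smallest objective value seen so far is the one to return.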