Analytic solution of $\underset{x \in [0, 1]^m,\;\|x-a\|_1 \le \varepsilon}{\text{arg}\max}\;x^Tb$ where $a \in [0, 1]^m$ and $b \in \mathbb R^m$


Let $\varepsilon > 0$, $a \in [0, 1]^m$, and $b \in \mathbb R^m$.

Question. Is there an analytic formula for the solution of the problem $$ \underset{x \in [0, 1]^m,\;\|x-a\|_1 \le \varepsilon}{\text{arg}\max}\;x^Tb\; ? \tag{*} $$

Observations

  • If $a \in (0, 1)^m$ and $\varepsilon$ is sufficiently small, say $\varepsilon \le \min_i\min(a_i,1-a_i)$, then $B_{\ell_1}(a,\varepsilon) \subseteq [0,1]^m$ and so the above problem reduces to $\max_{x \in B_{\ell_1}(a,\varepsilon)}x^Tb = a^Tb + \varepsilon\|b\|_\infty$, attained at $x^*=a+\varepsilon\,\text{sign}(b_{i^*})e_{i^*}$, where $e_i$ denotes the $i$th standard basis vector and $i^* \in \text{arg}\max_i|b_i|$.
  • If $\varepsilon$ is sufficiently large, say $\varepsilon \ge \sum_{i=1}^m\max(a_i,1-a_i)$, then $B_{\ell_1}(a,\varepsilon) \supseteq [0,1]^m$ and so the above problem reduces to $\max_{x \in [0,1]^m}x^Tb = \sum_{1 \le i \le m}(b_i)_+$, attained at $x^* \in [0,1]^m$ given by $x^*_i = \begin{cases}1, &\mbox{ if }b_i > 0,\\0,&\mbox{ else.}\end{cases}$
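Both observations can be sanity-checked numerically by enumerating extreme points, since a linear objective over a polytope attains its maximum at a vertex: the $2m$ points $a \pm \varepsilon e_i$ for the $\ell_1$ ball, and the $2^m$ cube vertices when the $\ell_1$ constraint is inactive. A minimal sketch, assuming NumPy (the variable names are mine):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m = 5
a = rng.uniform(0.2, 0.8, m)   # interior point, so the first observation applies
b = rng.normal(size=m)

# Small epsilon: B_l1(a, eps) fits inside the cube; the maximum over the
# cross-polytope is attained at one of the 2m extreme points a +/- eps * e_i,
# so enumerating them is exact.
eps_small = float(np.min(np.minimum(a, 1 - a)))
ext = [a + s * eps_small * np.eye(m)[i] for i in range(m) for s in (+1, -1)]
val_small = max(x @ b for x in ext)
assert np.isclose(val_small, a @ b + eps_small * np.max(np.abs(b)))

# Large epsilon: the l1 constraint is inactive and the maximum over the
# cube sits at one of its 2^m vertices.
eps_large = float(m)           # ||x - a||_1 <= m always holds on [0, 1]^m
verts = (np.array(v) for v in itertools.product((0.0, 1.0), repeat=m))
val_large = max(v @ b for v in verts if np.abs(v - a).sum() <= eps_large)
assert np.isclose(val_large, np.maximum(b, 0).sum())
```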

Disclaimer: OK, I'm going to answer my own question, under the additional assumption that $b \in \{-1,1\}^m$, which is the case I really care about. The answer turns out to be quite tricky. In any case, the good news is that there are analytic formulae both for the optimal objective value and for a point at which it is attained.


Theorem. Consider the special case of problem (*) in which $b \in \{-1,1\}^m$. Let $c_j=(b_j+1)/2$ be a flag indicating whether $b_j=1$ or $b_j=-1$, and let $n_+=\sum_{j=1}^m c_j$ be the number of indices $j$ such that $b_j=1$. Finally, define $$ r := \begin{cases}\dfrac{\varepsilon}{n_+ - a^Tb} \in [0, 1),&\mbox{ if }\varepsilon < n_+ - a^Tb,\\1,&\mbox{ else.}\end{cases} $$

Then, the optimal objective value equals $\min(n_+,a^Tb+\varepsilon)$, and is attained at $x^* = (1-r)a + rc$.
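Before the proof, here is a quick numerical check of the theorem, assuming NumPy (the helper names are mine): it verifies that $x^* = (1-r)a + rc$ is feasible, attains $\min(n_+, a^Tb+\varepsilon)$, and that no randomly sampled feasible point does better.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 8
a = rng.uniform(size=m)
b = rng.choice([-1.0, 1.0], size=m)
eps = float(rng.uniform(0.0, m))

c = (b + 1) / 2                   # c_j = 1 iff b_j = 1
n_plus = float(c.sum())
gap = n_plus - a @ b              # = ||c - a||_1, budget needed to reach c
r = eps / gap if eps < gap else 1.0
x_star = (1 - r) * a + r * c
claimed = min(n_plus, a @ b + eps)

# x* is feasible and attains the claimed optimal value ...
assert np.all((0 <= x_star) & (x_star <= 1))
assert np.abs(x_star - a).sum() <= eps + 1e-12
assert np.isclose(b @ x_star, claimed)

# ... and no random feasible point beats it (sample in the cube, then
# pull back toward a so the l1 constraint holds).
for _ in range(2000):
    x = rng.uniform(size=m)
    t = np.abs(x - a).sum()
    if t > eps:
        x = a + (eps / t) * (x - a)
    assert b @ x <= claimed + 1e-9
```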

We split the proof into two parts.

Part I: Computing the optimal objective value

Using the method of Lagrange multipliers (i.e., Lagrangian duality) gives

$$ \sup_{x \in [0, 1]^m,\; \|x-a\|_1 \le \varepsilon}x^Tb = \inf_{\lambda \ge 0}\lambda\varepsilon + h(\lambda), \text{ where }h(\lambda) := \sup_{x \in [0, 1]^m} x^Tb - \lambda\|x-a\|_1. \tag{1} $$

Now, by duality of the $\ell_1$ and $\ell_\infty$ norms, one has $-\lambda\|x-a\|_1 = \inf_{\|z\|_\infty \le \lambda}z^T(x-a)$. Thus, one computes

$$ \begin{split} h(\lambda) &= \sup_{x \in [0, 1]^m}x^Tb + \inf_{\|z\|_\infty \le \lambda}z^T(x-a) = \inf_{\|z\|_\infty \le \lambda}-a^Tz + \sup_{x \in [0, 1]^m}x^T(z + b) \\ &=\inf_{\|z\|_\infty \le \lambda}\sum_{j=1}^m (z_j+b_j)_+-a_jz_j. \end{split} \tag{2} $$

This problem is separable in the coordinates of the variable $z$; the $j$th subproblem reads

$$ h_j(\lambda) = \inf_{-\lambda \le z_j \le \lambda}(z_j+b_j)_+-a_jz_j = \min(h_j^+(\lambda),h_j^-(\lambda)), $$

where $h_j^+(\lambda) := \inf_{-\lambda \le z_j \le \lambda,\;z_j \ge -b_j} (z_j+b_j)_+-a_jz_j$ and $h_j^-(\lambda) := \inf_{-\lambda \le z_j \le \lambda,\;z_j < -b_j} (z_j+b_j)_+-a_jz_j$.

Case 1: $|b_j| \le \lambda$

Here one computes $$ \begin{split} h_j^+(\lambda) &= \inf_{-\lambda \le z_j \le \lambda,\;z_j \ge -b_j} (z_j+b_j)_+-a_jz_j = \inf_{-b_j \le z_j \le \lambda} b_j + (1-a_j)z_j\\ &= b_j - (1-a_j)b_j = a_jb_j, \end{split} $$ and $ h_j^-(\lambda) = \inf_{-\lambda \le z_j \le \lambda,\;z_j < -b_j} (z_j+b_j)_+-a_jz_j = \inf_{-\lambda \le z_j < -b_j} -a_jz_j = a_jb_j$.

Thus $h_j(\lambda) = a_jb_j$, attained at $z_j=-b_j=-\text{sign}(b_j)\min(|b_j|,\lambda)$.

Case 2: $b_j > \lambda$

Here, we have $z_j > -b_j$ for all $z_j \in [-\lambda,\lambda]$ (since $-\lambda > -b_j$), and so the constraint set of $h_j^-$ is empty, giving $h_j^-(\lambda) = \infty$, while $h_j^+(\lambda) = \inf_{-\lambda \le z_j \le \lambda} b_j + (1-a_j)z_j = b_j - (1-a_j)\lambda$. Thus $h_j(\lambda)=\min(h_j^+(\lambda),h_j^-(\lambda)) = b_j - (1-a_j)\lambda$, attained at $z_j = -\lambda=-\text{sign}(b_j)\min(|b_j|,\lambda)$.

Case 3: $b_j < -\lambda$

Here we have $z_j < -b_j$ for all $z_j \in [-\lambda,\lambda]$ (since $\lambda < -b_j$), and so $h_j^+(\lambda)=\infty$ and $h_j^-(\lambda) = \inf_{-\lambda \le z_j \le \lambda} -a_jz_j=-a_j\lambda$. Thus, $h_j(\lambda) = \min(h_j^+(\lambda),h_j^-(\lambda)) = -a_j\lambda$, attained at $z_j = \lambda = -\text{sign}(b_j)\min(|b_j|,\lambda)$.

Putting things together, we then have $$ \begin{split} h(\lambda) &= -\lambda\sum_{b_j < -\lambda}a_j + \sum_{|b_j| \le \lambda} a_jb_j + \sum_{b_j > \lambda}b_j - (1-a_j)\lambda\\ &= \sum_{j=1}^m (b_j-\lambda)_+ + a_j\text{sign}(b_j)\min(|b_j|,\lambda), \end{split} $$ and so the original problem (1) reduces to the following 1D piecewise-linear optimization problem

$$ \max_{x \in [0, 1]^m,\;\|x-a\|_1\le \varepsilon}x^Tb= \inf_{\lambda \ge 0}\lambda\varepsilon + h(\lambda) = \inf_{\lambda \ge 0} \lambda\varepsilon + \sum_{j=1}^m (b_j-\lambda)_+ + a_j\text{sign}(b_j)\min(|b_j|,\lambda). $$
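This 1D dual can be checked numerically against a direct primal solver. A sketch, assuming NumPy: the primal LP is a continuous knapsack (each coordinate yields gain $|b_j|$ per unit of $\ell_1$ budget, with capacity $1-a_j$ or $a_j$ depending on the sign of $b_j$), which a greedy pass solves exactly; the dual objective is convex piecewise-linear, so its infimum over $\lambda \ge 0$ is attained at $\lambda = 0$ or at a kink $\lambda = |b_j|$.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 7
a = rng.uniform(size=m)
b = rng.normal(size=m)
eps = float(rng.uniform(0.0, m))

# Primal: continuous-knapsack greedy. Spend the l1 budget on the largest
# |b_j| first, moving x_j toward 1 if b_j > 0 and toward 0 if b_j < 0.
x = a.copy()
budget = eps
for j in np.argsort(-np.abs(b)):
    cap = (1 - a[j]) if b[j] > 0 else a[j]
    move = min(cap, budget)
    x[j] += np.sign(b[j]) * move
    budget -= move
primal = x @ b

# Dual: lambda * eps + h(lambda) is convex piecewise-linear, so its
# infimum over lambda >= 0 is attained at 0 or at a kink lambda = |b_j|.
def dual(lam):
    return lam * eps + np.sum(np.maximum(b - lam, 0.0)
                              + a * np.sign(b) * np.minimum(np.abs(b), lam))

dual_val = min(dual(lam) for lam in np.concatenate(([0.0], np.abs(b))))
assert np.isclose(primal, dual_val)   # strong LP duality
```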

Further, if $b \in \{\pm 1\}^m$, then

$$ h(\lambda) = \begin{cases}(a^Tb - n_+)\lambda + n_+,&\mbox{ if }0 \le \lambda \le 1,\\a^Tb,&\mbox{ else,}\end{cases} $$ where $n_+$ is the number of indices for which $b_j = 1$. A simple computation then gives

$$ \begin{split} \max_{x \in [0, 1]^m,\;\|x-a\|_1\le \varepsilon}x^Tb &= \inf_{\lambda \ge 0}\varepsilon\lambda + h(\lambda) = \inf_{\lambda \ge 0}\begin{cases}(\varepsilon+a^Tb - n_+)\lambda + n_+,&\mbox{ if }0 \le \lambda \le 1,\\a^Tb+\varepsilon\lambda,&\mbox{ else.}\end{cases}\\ &= \begin{cases}n_+,&\mbox{ if }\varepsilon \ge n_+-a^Tb,\\a^Tb+\varepsilon,&\mbox{ else}\end{cases}\\ &= \min(n_+,a^Tb+\varepsilon), \end{split} $$ an analytic formula.
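A small check of this specialization, assuming NumPy: for $b \in \{\pm 1\}^m$ the dual objective has its only kink at $\lambda = 1$, so evaluating it at $\lambda \in \{0, 1\}$ recovers $\min(n_+, a^Tb + \varepsilon)$ for any budget.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 6
a = rng.uniform(size=m)
b = rng.choice([-1.0, 1.0], size=m)
n_plus = float((b == 1).sum())

def dual_obj(lam, eps):
    # h(lambda) specialized to b in {-1, +1}^m, as derived above
    h = (a @ b - n_plus) * lam + n_plus if lam <= 1 else a @ b
    return eps * lam + h

# the only kink is at lambda = 1, so the infimum over lambda >= 0 is
# the smaller of the values at lambda = 0 and lambda = 1
checks = [np.isclose(min(dual_obj(0.0, e), dual_obj(1.0, e)),
                     min(n_plus, a @ b + e))
          for e in (0.1, 1.0, float(m))]
assert all(checks)
```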

Part II: Verifying the proposed solution $x^*$

Case 1: $\varepsilon \ge n_+ - a^Tb$

Feasibility. We first check that $x^*=c = ((b_j+1)/2)_{1 \le j \le m}$ (here $r=1$) is feasible. Indeed, $x^* \in \{0, 1\}^m \subseteq [0, 1]^m$ and

$$ \|x^*-a\|_1 = \sum_{j=1}^m |c_j-a_j| = \sum_{j=1}^m(c_j-a_jb_j) = n_+ - a^Tb \le \varepsilon. $$

Optimality. One computes $b^Tx^*=\sum_{j=1}^m b_jc_j = n_+$, which equals the optimal value $\min(n_+,a^Tb+\varepsilon)$ since $\varepsilon \ge n_+ - a^Tb$.

Case 2: $\varepsilon < n_+ - a^Tb$

This is a bit trickier than the previous case.

Feasibility. We have $$ x^*_j = \begin{cases}a_j + r(1-a_j)=(1-r)a_j + r \in [0, 1],&\mbox{ if }b_j = 1,\\a_j - ra_j = (1-r)a_j \in [0, 1],&\mbox{ if }b_j=-1.\end{cases} $$ Moreover, $$ \begin{split} \|x^*-a\|_1 &= \sum_{j=1}^m|x^*_j-a_j| = r\sum_{b_j=1}(1-a_j) + r\sum_{b_j=-1}a_j=r\Big(n_+-\sum_{j}a_jb_j\Big)\\ &= r(n_+ - a^Tb) = \varepsilon. \end{split} $$

Optimality. A direct computation gives $$ \begin{split} b^Tx^* &= \sum_j b_jx_j^* = a^Tb + r\sum_{b_j=1}(1-a_j)+r\sum_{b_j=-1}a_j = a^Tb + r(n_+ - a^Tb)\\ &= a^Tb + \varepsilon, \end{split} $$ which matches the optimal value $\min(n_+,a^Tb+\varepsilon) = a^Tb+\varepsilon$ computed in Part I.
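The Case 2 computations can be replayed numerically (a sketch, assuming NumPy): for random $a$, $b \in \{\pm1\}^m$, and $\varepsilon < n_+ - a^Tb$, the point $x^* = (1-r)a + rc$ lies in the cube, spends the whole $\ell_1$ budget, and attains $a^Tb + \varepsilon$.

```python
import numpy as np

rng = np.random.default_rng(4)
for _ in range(100):
    m = 5
    a = rng.uniform(size=m)
    b = rng.choice([-1.0, 1.0], size=m)
    c = (b + 1) / 2
    n_plus = float(c.sum())
    gap = n_plus - a @ b                       # = ||c - a||_1 > 0 a.s.
    eps = float(rng.uniform(0.0, 1.0)) * gap   # forces Case 2: eps < gap
    r = eps / gap
    x_star = (1 - r) * a + r * c

    assert np.all((0 <= x_star) & (x_star <= 1))      # in the cube
    assert np.isclose(np.abs(x_star - a).sum(), eps)  # budget fully spent
    assert np.isclose(b @ x_star, a @ b + eps)        # claimed optimum
```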