Are these two optimization problems equivalent to each other?


Let $\mathbf{x}=[x_1,\ldots,x_K]^T$. For a fixed vector $\mathbf{a}$, I have the following optimization problem : \begin{array}{rl} \min \limits_{\mathbf{x}} & | \mathbf{a}^T \mathbf{x} | \\ \mbox{s.t.} & x_k\ge 0, \forall k \\ & \sum_k x_k=1 \end{array} The second optimization problem is: \begin{array}{rl} \min \limits_{\mathbf{x}} & ( \mathbf{a}^T \mathbf{x})^2\\ \mbox{s.t.} & x_k\ge 0, \forall k \\ & \sum_k x_k=1 \end{array}

My question: are these two problems equivalent to each other? If yes, how can they be solved, and which one is easier to solve?



BEST ANSWER

Which is easier? Neither. Both of these models can be solved analytically, and in the exact same way. First, let's knock out the easy cases:

  1. If $a_i=0$ for some $i$, then the optimal value of either model is clearly $0$, as demonstrated by setting $x$ to be the $i$th unit vector (the vector with $1$ at position $i$ and zeros everywhere else).

  2. If $a_i>0$ and $a_j<0$ for some pair $(i,j)$, then again the optimal value is zero. Just set $$x_i=-a_j/(a_i-a_j), \quad x_j = a_i/(a_i-a_j)$$ and the other elements of $x$ to zero.

  3. The cases that remain are where $a$ is entirely positive or entirely negative. But if $a$ is negative, then substituting $-a$ for $a$ will not change the optimal value or the values of $x$ that achieve this value, thanks to the presence of the absolute value or square.

  4. So only one case remains: $a$ is a positive vector. In this case, $a^Tx$ is positive over the simplex, allowing us to drop the absolute value! \begin{array}{ll} \text{minimize} & a^T x \\ \text{subject to} & \mathbf{1}^T x = 1 \\ & x \geq 0 \end{array} We can solve this by inspection: the optimal value is $\min_i a_i$ for the first model, and $\min_i a_i^2$ for the second. If the minimizing $a_i$ is unique, then the unique solution is the $i$th unit vector. But if multiple elements of $a$ attain that minimum, any convex combination of the corresponding unit vectors is a solution.

Putting it all together, the optimal value is $\min_i |a_i|$ (first model) or $\min_i a_i^2$ (second model) if $a\succeq 0$ or $a\preceq 0$, and $0$ otherwise. There really was no need for case 1, but it was easy to see.
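The case analysis above is easy to turn into code. Here is an illustrative sketch (the function name and the convention of returning the optimal value together with one minimizer are my choices, not from the answer):

```python
import numpy as np

def minimize_abs_on_simplex(a):
    """Analytic solution of min |a^T x| over the simplex, following the
    case analysis above. Returns (optimal value, one minimizer)."""
    a = np.asarray(a, dtype=float)
    K = len(a)
    x = np.zeros(K)
    # Case 1: some a_i = 0 -> optimal value 0, unit vector at that entry.
    if np.any(a == 0):
        x[int(np.argmin(np.abs(a)))] = 1.0
        return 0.0, x
    # Case 2: mixed signs -> optimal value 0 via two coordinates.
    if a.max() > 0 and a.min() < 0:
        i, j = int(np.argmax(a)), int(np.argmin(a))   # a_i > 0 > a_j
        x[i] = -a[j] / (a[i] - a[j])
        x[j] = a[i] / (a[i] - a[j])
        return 0.0, x
    # Cases 3-4: a entirely positive or entirely negative -> unit vector
    # at the entry of smallest magnitude.
    k = int(np.argmin(np.abs(a)))
    x[k] = 1.0
    return float(np.abs(a[k])), x
```

For the second model, square the returned value; the minimizer is the same.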


Let's go for an explicit analytic construction of the set of all the solutions to your problem.

Basic notation: $e_K :=$ the column vector of $K$ ones. $\langle x, y \rangle$ denotes the inner product between two vectors $x$ and $y$. For example $\langle e_K, x\rangle$ simply amounts to summing the components of $x$.

Recall the following definitions, to be used without further explanation.

Preimage: Given sets $X$, $Y$, a function $f: X \rightarrow Y$, and $Z \subseteq Y$, the preimage of $Z$ under $f$ is defined by $f^{-1}Z := \{x \in X | f(x) \in Z\}$. This has nothing to do with function inversion.

Convex hull: Given a subset $C$ of $\mathbb{R}^K$, its convex hull is defined by $\textit{conv }C :=$ smallest convex subset of $\mathbb{R}^K$ containing $C$.

Indicator function: Given a subset $C$ of $\mathbb{R}^K$ its indicator function $i_C:\mathbb{R}^K \rightarrow (-\infty, +\infty]$ is defined by $i_C(x) = 0$ if $x \in C$; otherwise $i_C(x) = +\infty$.

Simplex: The $K$-dimensional simplex is defined by $\Delta_K := \{x \in \mathbb{R}^K|x\ge 0, \langle e_K, x\rangle = 1\}$. Equivalently, $\Delta_K = \textit{conv }\{\text{rows of the }K \times K\text{ identity matrix }I_K\}$. Each row of $I_K$ is a vertex of $\Delta_K$.

Faces of a simplex: Given a subset of indices $I \subseteq \{1,2,...,K\}$, the face of $\Delta_K$ spanned by $I$, denoted $F_K(I)$, is defined to be the convex hull of those rows of $I_K$ indexed by $I$. Trivially, $\Delta_K$ is a face of itself. Geometrically, $F_K(I)$ is a $(\#I - 1)$-dimensional simplex whose vertices are those vertices of $\Delta_K$ labelled by $I$.

Let $f:\mathbb{R}^K \rightarrow (-\infty,+\infty]$ be an extended real-valued function.

Subdifferential: $\partial f(x) := \{g \in \mathbb{R}^K| f(z) \ge f(x) + \langle g, z - x\rangle, \forall z \in \mathbb{R}^K\}$.

Fenchel-Legendre transform: $f^*(x) := \underset{z \in \mathbb{R}^K}{\text{max }}\langle x, z\rangle - f(z)$.

The optimal value of your objective under the given constraints can be conveniently written as \begin{eqnarray} v = \underset{x \in \mathbb{R}^K}{\text{min }}g(x) + f(Dx), \end{eqnarray} where $D := a^T \in \mathbb{R}^{1 \times K}$, $g := i_{\Delta_K}$ and $f:s \mapsto \frac{1}{2}s^2$ (the factor $\frac{1}{2}$ rescales the optimal value but does not change the set of minimizers). One recognizes the above problem as the primal form of the saddle-point problem \begin{eqnarray*} \underset{x \in \mathbb{R}^K}{\text{min }}\underset{s \in \mathbb{R}}{\text{max }}\langle s, Dx\rangle + g(x) - f^*(s) \end{eqnarray*} Given a dual solution $\hat{s} \in \mathbb{R}$, the entire set of primal solutions $\hat{X}$ is given by (one can check that the sufficient conditions for the strong Fenchel duality theorem hold) \begin{eqnarray*} \hat{X} = \partial g^*(-D^T\hat{s}) \cap D^{-1}\partial f^*(\hat{s}). \end{eqnarray*} For your problem, one easily computes $g^*(y) = \underset{x \in \Delta_K}{\text{max }}\langle x, y\rangle$, and so by the Bertsekas-Danskin theorem (see Proposition A.22 of Bertsekas' PhD thesis) for subdifferentials, we have \begin{eqnarray*} \begin{split} \partial g^*(-D^T\hat{s}) &= \partial g^*(-\hat{s}a) = ... \text{( some computations masked )} \\ &= F_K\left(\{1 \le i \le K | \hat{s}a_i\text{ is a minimal component of }\hat{s}a\}\right). \end{split} \end{eqnarray*} Also $\partial f^*(\hat{s}) = \{\hat{s}\}$, and so $D^{-1}\partial f^*(\hat{s}) = \{x \in \mathbb{R}^K|\langle a, x\rangle = \hat{s}\}$. Thus the set of all primal solutions is \begin{eqnarray*} \hat{X} = F_K\left(\{1 \le i \le K | \hat{s}a_i\text{ is a minimal component of }\hat{s}a\}\right) \cap \{x \in \mathbb{R}^K|\langle a, x\rangle = \hat{s}\}. \end{eqnarray*} It remains now to find a dual solution. Define $\alpha := \underset{1 \le i \le K}{\text{min }}a_i \le \beta := \underset{1 \le i \le K}{\text{max }}a_i$.
One computes \begin{eqnarray*} \begin{split} v &= \underset{x \in \Delta_K}{\text{min }}\underset{s \in \mathbb{R}}{\text{max }}s\langle a, x\rangle -\frac{1}{2}s^2 = \underset{s, \lambda \in \mathbb{R}}{\text{max }}\underset{x \ge 0}{\text{min }}\langle sa + \lambda e_K, x\rangle - \frac{1}{2}s^2 - \lambda\\ &=\underset{s, \lambda \in \mathbb{R}}{\text{max }}-\frac{1}{2}s^2 - \lambda\text{ subject to }sa + \lambda e_K \ge 0\\ &=-\underset{s, \lambda \in \mathbb{R}}{\text{min }}\frac{1}{2}s^2 + \lambda\text{ subject to }\lambda \ge -\underset{1 \le i \le K}{\text{min }}sa_i\\ &= -\underset{s \in \mathbb{R}}{\text{min }}\frac{1}{2}s^2 -\underset{1 \le i \le K}{\text{min }}sa_i = -\frac{1}{2}\underset{s \in \mathbb{R}}{\text{min }}\begin{cases}(s - \alpha)^2 - \alpha^2, &\mbox{ if }s \ge 0,\\(s - \beta)^2 - \beta^2, &\mbox{ otherwise}\end{cases} \end{split} \end{eqnarray*} Thus \begin{eqnarray*} \hat{s} = \begin{cases}\beta, &\mbox{ if }\beta < 0,\\0, &\mbox{ if }\alpha \le 0 \le \beta,\\\alpha, &\mbox{ if }\alpha > 0.\end{cases} \end{eqnarray*} Therefore the set of all solutions to the original / primal problem is \begin{eqnarray*} \hat{X} = \begin{cases}F_K\left(\{1 \le i \le K | a_i = \beta\}\right), &\mbox{ if }\beta < 0,\\ \Delta_K \cap \{x \in \mathbb{R}^K| \langle a, x\rangle = 0\}, &\mbox{ if } \alpha \le 0 \le \beta,\\ F_K\left(\{1 \le i \le K | a_i = \alpha\}\right), &\mbox{ if }\alpha > 0,\end{cases} \end{eqnarray*}

Now that you have the hammer, find the nails...

  • Case (1): $\beta < 0$. Choose any $k \in \{1,2,...,K\}$ such that $a_k = \beta$, and take $\hat{x} = k$th row of $I_K$.

  • Case (2a): $a_k = 0$ for some $k$. Take $\hat{x} = k$th row of $I_K$.

  • Case (2b): $\alpha < 0 < \beta$, and $a_k \ne 0$ for all $k$. Choose $k_1, k_2 \in \{1,2,...,K\}$ such that $a_{k_1} < 0 < a_{k_2}$, and take $\hat{x}_k = \frac{1}{{a_{k_2} - a_{k_1}}}\begin{cases}a_{k_2}, &\mbox{ if }k = k_1,\\-a_{k_1},&\mbox{ if }k = k_2,\\0, &\mbox{ otherwise.}\end{cases}$

  • Case (3): $\alpha > 0$. Choose any $k \in \{1,2,...,K\}$ such that $a_k = \alpha$, and take $\hat{x} = k$th row of $I_K$.
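The case recipes above can be checked numerically. The sketch below (function name and return convention are mine, not from the answer) builds the dual solution $\hat{s}$ and one element of $\hat{X}$ per case, and a quick loop confirms feasibility and $\langle a, \hat{x}\rangle = \hat{s}$:

```python
import numpy as np

def dual_and_primal(a):
    """Closed-form recipes from the cases above: returns the dual
    solution s_hat and one primal solution x_hat."""
    a = np.asarray(a, dtype=float)
    K = len(a)
    alpha, beta = a.min(), a.max()
    x = np.zeros(K)
    if beta < 0:                        # Case (1): s_hat = beta
        s = beta
        x[int(np.argmax(a))] = 1.0
    elif alpha > 0:                     # Case (3): s_hat = alpha
        s = alpha
        x[int(np.argmin(a))] = 1.0
    else:                               # Cases (2a)/(2b): alpha <= 0 <= beta
        s = 0.0
        zeros = np.flatnonzero(a == 0)
        if zeros.size:                  # (2a): unit vector at a zero entry
            x[zeros[0]] = 1.0
        else:                           # (2b): mix a negative and a positive
            k1, k2 = int(np.argmin(a)), int(np.argmax(a))
            x[k1] = a[k2] / (a[k2] - a[k1])
            x[k2] = -a[k1] / (a[k2] - a[k1])
    return s, x
```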


Your problem can be equivalently modeled as an LP by introducing one extra variable, say $\epsilon$, and replacing the objective with the constraint $| \mathbf{a}^T \mathbf{x} |\leq \epsilon$, which splits into two linear constraints. The problem becomes $$\begin{array}{rl} \min \limits_{\mathbf{x}, \epsilon} & \epsilon \\ \mbox{s.t.} & \mathbf{a}^T \mathbf{x} \leq \epsilon \\&- \mathbf{a}^T \mathbf{x} \leq \epsilon \\ & x_k\ge 0, \forall k \\ & \sum_k x_k=1 \end{array}$$

Now you can solve this problem using the simplex method or any other LP solver you like!
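As a sketch, the LP above can be handed to an off-the-shelf solver such as scipy.optimize.linprog; the data vector $a$ below is an arbitrary example of mine, chosen with mixed signs so the optimal value is $0$:

```python
import numpy as np
from scipy.optimize import linprog

a = np.array([2.0, -1.0, 3.0])           # example data (my choice)
K = len(a)
# Decision variables: [x_1, ..., x_K, eps]; the objective picks out eps.
c = np.zeros(K + 1)
c[-1] = 1.0
# Inequalities:  a^T x - eps <= 0   and   -a^T x - eps <= 0
A_ub = np.vstack([np.append(a, -1.0), np.append(-a, -1.0)])
b_ub = np.zeros(2)
# Equality: sum_k x_k = 1 (eps does not appear).
A_eq = np.append(np.ones(K), 0.0).reshape(1, -1)
b_eq = np.array([1.0])
# Bounds: x >= 0, eps free (it is forced nonnegative by the inequalities).
bounds = [(0, None)] * K + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
```

Since $a$ has both positive and negative entries, `res.fun` should be (numerically) zero, matching the case analysis in the accepted answer.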