Sparse $\ell_0$ solution to constrained quadratic program


Given $a_i > 0$,

$$ \begin{array}{ll} \underset{x}{\text{minimize}} & \sum\limits_i a_i x_i^2 \\ \text{subject to} & \sum\limits_i x_i = 1 \\ & x_i \ge 0 \\ & \|x\|_0 < T \end{array} $$

Obviously, if I remove the constraints, the minimum of the objective is attained at $x=0$. Also, without the cardinality constraint, the above can be solved by standard constrained quadratic programming. By $\|x\|_0 < T$ I mean that the cardinality $\operatorname{card}(x)$ (the number of nonzero entries of $x$) should be less than $T$, and I want to choose $x$ so as to obtain the smallest $f(x)$ satisfying the above constraints.

Experimental Answer:

A: I noticed that if I sort the indices by the values $a_i$ in ascending order, then the $T$ components with the smallest $a_i$ form the optimal support when we require $\operatorname{card}(x)=T$. For example, take $$f(x) = 2x_1^2+x_2^2+3x_3^2+0.5x_4^2$$ and suppose we may choose only 2 elements of $x$. Then the optimal set is $\{x_4,x_2\}$, and we can solve the constrained optimization for $g(x)=x_2^2+0.5x_4^2$ with the remaining components of $x$ set to zero.
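Claim A can be checked by brute force over all supports of size $T$, using the per-support optimal value $1/\sum_{i \in S} 1/a_i$ that follows from the closed form in claim B. A sketch in Python (the function name is my own):

```python
from itertools import combinations

def best_support(a, T):
    """Brute-force search over all supports S with |S| = T.

    For a fixed support S, the restricted problem has optimal value
    1 / sum_{i in S} 1/a_i (see the closed form in claim B), so we
    simply pick the support minimizing that value.
    """
    best_val, best_S = None, None
    for S in combinations(range(len(a)), T):
        val = 1.0 / sum(1.0 / a[i] for i in S)
        if best_val is None or val < best_val:
            best_val, best_S = val, S
    return best_val, best_S

# the example from claim A: f(x) = 2 x1^2 + x2^2 + 3 x3^2 + 0.5 x4^2, T = 2
val, S = best_support([2.0, 1.0, 3.0, 0.5], 2)
# the winning support consists of the two smallest a_i, i.e. {x_2, x_4}
```

Since $1/\sum_{i\in S} 1/a_i$ is minimized by maximizing $\sum_{i\in S} 1/a_i$, the search always returns the $T$ smallest $a_i$, in agreement with claim A.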

B: Also, I noticed that the optimal components are in inverse relation to the corresponding $a_i$, meaning that $x_i/x_j = a_j/a_i$, which leads to a closed-form solution.
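Claim B amounts to the explicit formula $x_i = (1/a_i)/\sum_j (1/a_j)$ over the chosen support. A quick sanity check in Python (the helper name is my own):

```python
def closed_form(a):
    # Claim B: x_i is proportional to 1/a_i, normalized so the x_i sum to 1.
    w = [1.0 / ai for ai in a]
    s = sum(w)
    return [wi / s for wi in w]

# the a_i of x_2 and x_4 from the example in claim A
x = closed_form([1.0, 0.5])
# ratio relation: x[0] / x[1] should equal a_2 / a_1 = 0.5 / 1.0
```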

Question: Is there a mathematical argument supporting A and B above?


Best answer:

As for A, I think the statement is trivial.

Given the answer to A, the answer to B follows from the KKT conditions for the problem. Consider $$\min_x \Big\{ \sum_i a_i x_i^2 : \sum_i x_i \geq 1,\ x_i \geq 0 \Big\}$$ where the index $i$ is restricted to the $T$ elements from A. I replaced the equality with an inequality since that simplifies the solution ($\lambda \geq 0$) but clearly does not change the optimal solution. The Lagrangian is: $$L(x) = \sum_i a_i x_i^2 - \lambda\Big(\sum_i x_i - 1\Big) - \sum_i \mu_i x_i$$ The KKT conditions are therefore: $$2a_i x_i - \lambda - \mu_i = 0 \;\forall i, \quad \mu_i x_i = 0 \;\forall i, \quad x \geq 0, \quad \lambda \geq 0, \quad \mu \geq 0, \quad \sum_i x_i = 1$$ Suppose that $x_i=0$ for some $i$. Then $\lambda+\mu_i = 0$, and since both multipliers are nonnegative, $\lambda=0$. Then, for every other index $j$, $2a_j x_j = \mu_j$; multiplying by $x_j$ and using $x_j \mu_j = 0$ gives $2a_j x_j^2 = 0$, so $x_j = 0$ for all $j$, contradicting $\sum_i x_i = 1$. So $x_i > 0$ for all $i$, and by complementarity $\mu_i = 0$ for all $i$.

The first condition now states that $2a_i x_i$ is constant for all $i$, so $2a_i x_i = 2a_j x_j$, leading to $x_i / x_j = a_j / a_i$.
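The stationarity condition is easy to verify numerically: for $x_i \propto 1/a_i$, the quantities $2a_i x_i$ are all equal to the same constant $\lambda$. A small Python check (the data are arbitrary):

```python
a = [2.0, 1.0, 4.0]

# closed-form candidate from claim B: x_i proportional to 1/a_i, normalized
x = [1.0 / ai for ai in a]
s = sum(x)
x = [xi / s for xi in x]

# KKT stationarity with mu = 0: 2 a_i x_i must be the same constant (= lambda)
grads = [2.0 * ai * xi for ai, xi in zip(a, x)]
```

Indeed $2a_i x_i = 2a_i \cdot (1/a_i)/\sum_j(1/a_j) = 2/\sum_j(1/a_j)$, independent of $i$.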

Another answer:

I'm going to offer my comment as an answer because, frankly, I'm proud to have thought of it this early in the morning ;-)

It turns out that the nonnegativity constraints on $x$ are unnecessary. Suppose that we have an $x$ with one or more negative values, but $\sum_j x_j=1$. Then, since $|x_j| \ge x_j$ always, with strict inequality when $x_j<0$, we have $$\sum_j |x_j| > \sum_j x_j = 1.$$ Now consider the vector $y_i = |x_i| / \sum_j |x_j|$. By construction, $\sum_i y_i = 1$, so it too is a feasible solution to the problem. And $$\sum_i a_i y_i^2 = \sum_i a_i \frac{x_i^2}{\big(\sum_j |x_j|\big)^2} = \frac{\sum_i a_i x_i^2}{\big(\sum_j |x_j|\big)^2} < \sum_i a_i x_i^2.$$ So $y$ has a smaller objective value than $x$, which means $x$ cannot be optimal. That doesn't mean $y$ is optimal, mind you; it just means that $x$ cannot be.
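This rescaling argument can be checked on a concrete vector with a negative entry (Python; the numbers are made up):

```python
a = [2.0, 1.0, 3.0]
x = [1.5, -0.8, 0.3]                 # sums to 1, but the second entry is negative

s = sum(abs(xi) for xi in x)         # strictly greater than 1
y = [abs(xi) / s for xi in x]        # the rescaled feasible point

def f(v):
    # the objective sum_i a_i v_i^2
    return sum(ai * vi * vi for ai, vi in zip(a, v))

# f(y) = f(x) / s^2 < f(x), so x cannot be optimal
```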

So now we have established that the problem is equivalent to $$\begin{array}{ll} \text{minimize} & \sum_i a_i x_i^2 \\ \text{subject to} & \sum_i x_i = 1 \end{array}$$ and you can proceed with LinAlg's answer by setting all $\mu_i$ to zero.