Minimum of a quadratic restricted to a hyperplane


I am trying to solve the following optimization problem $$ \begin{cases} \min f(x_1, \dots, x_n) = \sum_{i = 1}^na_ix_i^2, \quad a_i \ge 0 \\[0.4em] \!\text{s.t.}: \\ \begin{cases} x_i \ge 0, & i = 1\dots n \\[0.45em] \sum_{i = 1}^n x_i = 1 \end{cases} \end{cases} $$

In order to do so, I have tried to use the first-order Karush-Kuhn-Tucker (KKT) conditions, by first defining the functions $$ g_k(x_1, \dots, x_n) = -x_k, \quad k = 1\dots n $$ and $$ h(x_1, \dots, x_n) = \sum_{i = 1}^n x_i - 1, $$ and the corresponding Lagrangian $$ \mathcal{L}(x, \mu, \lambda) = f + \mu^Tg + \lambda h. $$

After that, I have imposed the conditions

  1. Stationarity: $$ \nabla\mathcal{L} = \nabla f + \mu^T Dg + \lambda Dh = 0 $$
  2. Dual feasibility: $$ \mu_i \ge 0, \quad i = 1\dots n $$
  3. Complementary slackness: $$ \sum_{i = 1}^n \mu_i g_i(x) = 0 $$

But without much success. I get the equations $$ \begin{cases} 2 a_i x_i - \mu_i + \lambda = 0, & i = 1 \dots n \\[0.4em] \mu_i \ge 0, & i = 1\dots n \\[0.4em] \sum_{i = 1}^n \mu_i x_i = 0 \end{cases} $$

The only idea so far that has yielded something interesting is multiplying the first equation by $x_i$ and summing over $i = 1 \dots n$, which yields $$ 2 \sum_{i = 1}^n a_i x_i^2 - \sum_{i = 1}^n{\mu_i x_i} + \sum_{i = 1}^n \lambda x_i = 0, $$ and, using $\sum_{i = 1}^n \mu_i x_i = 0$ and $\sum_{i = 1}^n x_i = 1$, can be further simplified to $$ 2 \sum_{i = 1}^n a_i x_i^2 + \lambda = 0, $$ that is, $$ \lambda = -2f. $$
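As a quick numerical sanity check of the identity $\lambda = -2f$, here is a minimal sketch (assuming NumPy; the symmetric case $a_i \equiv a$, where $x_i = 1/n$ with $\mu \equiv 0$ solves the system by symmetry, is used purely as an illustrative KKT point):

```python
import numpy as np

# Symmetric illustrative case: all a_i equal, so by symmetry the KKT point
# is x_i = 1/n with mu = 0.
a_val, n = 3.0, 4
a = np.full(n, a_val)
x = np.full(n, 1.0 / n)
mu = np.zeros(n)

# Solve 2 a_i x_i - mu_i + lambda = 0 for lambda (any index works here)
lam = -(2 * a * x - mu)[0]

f = np.sum(a * x**2)
assert np.allclose(2 * a * x - mu + lam, 0.0)  # stationarity holds for every i
assert np.isclose(lam, -2 * f)                 # the derived identity lambda = -2 f
```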

My intuition is that this problem should indeed have a minimum and that I should be able to find it using the KKT conditions. I think about the 2D case and I can imagine a paraboloid (since $a_i \ge 0$) being intersected by a plane ($\sum_{i = 1}^nx_i = 1$), which gives a parabola as a result and has a minimum.

However, I have been stuck with these equations without getting anywhere. Am I missing something, or am I just being bad with the algebra?

2 Answers

BEST ANSWER

When an optimization problem satisfies a constraint qualification, the KKT conditions are necessary conditions that an optimum must satisfy. If the problem is, in addition, convex, then the KKT conditions are also sufficient. The problem \begin{align} \min_{x\in\mathbb{R}^n}\quad&\frac{1}{2}x^\top D x\\ &-x\leq0\\ &\mathbf{1}^\top x=1 \end{align} is convex; taking $D = \operatorname{diag}(2a_1,\dots,2a_n)$ recovers your objective $f$. If we can guess the form of a solution, we can use the KKT conditions to verify it.

Intuitively, if $D$ is diagonal with $D_{ii}\geq0$ and there is some $i$ such that $D_{ii}=0$, then a solution should be of the form $x^*:=(0,\ldots,0,1,0,\ldots,0)$, where $x^*_{i^*}=1$ for $i^*={\mathrm{arg}\min_i} \; D_{ii}$ and $x^*_i=0$ otherwise.

In this case, we can confirm that $x^*$ satisfies the KKT conditions if we can find dual variables $\mu\in\mathbb{R}^n$ with $\mu\geq0$ and $\lambda\in\mathbb{R}$ such that \begin{align} Dx^*-\mu-\lambda\mathbf{1}&=0\\ \lambda\cdot(\mathbf{1}^\top x^*-1)&=0\\ \forall i: \mu_i\cdot x^*_i&=0. \end{align} For the structure of $x^*$ above, this is readily done by taking $\mu\equiv0$ and $\lambda=0$, since $(Dx^*)_i = D_{ii}x^*_i = 0$ for every $i$.
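This certificate can be checked numerically; below is a minimal sketch assuming NumPy, with an illustrative diagonal containing a zero entry:

```python
import numpy as np

# Illustrative diagonal with a zero entry: D = diag(2*a) for a = (3, 0, 5)
D = np.diag([6.0, 0.0, 10.0])
n = D.shape[0]

# Guessed solution: all mass on the index of the smallest diagonal entry
i_star = int(np.argmin(np.diag(D)))
x = np.zeros(n)
x[i_star] = 1.0

# Certifying dual variables: mu = 0, lambda = 0
mu = np.zeros(n)
lam = 0.0

assert np.all(x >= 0) and np.isclose(x.sum(), 1.0)      # primal feasibility
assert np.all(mu >= 0)                                  # dual feasibility
assert np.allclose(D @ x - mu - lam * np.ones(n), 0.0)  # stationarity
assert np.allclose(mu * x, 0.0)                         # complementary slackness
```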

On the other hand, if $D_{ii}>0$ for every $i$, then we can go through the full KKT process systematically as you've done. This will involve checking the possible cases of $\lambda=0,\lambda\neq0,\mu\equiv0,\mu\not\equiv0$.

Consider $D\succ0$. Then for $\mu\geq0\in\mathbb{R}^n$ and $\lambda\in\mathbb{R}$, the Lagrangian is $$ L(x;\mu,\lambda) := \tfrac12 x^\top Dx + \mu^\top (-x) + \lambda\cdot(1-\mathbf{1}^\top x) $$ and the KKT conditions are \begin{align} \mu&\geq0\\ \mathbf{1}^\top x&=1, \; x\geq0\\ \nabla_x L(x;\mu,\lambda)&=0=Dx - \mu - \lambda\mathbf{1} \iff x = D^{-1}(\mu + \lambda\mathbf{1}) \\ \forall i : \mu_i\cdot x_i&=0\\ \lambda\cdot(1-\mathbf{1}^\top x)&=0. \end{align}

Consider first two cases for the multiplier on the equality constraint: (1) $\lambda=0$ and (2) $\lambda\neq0$. In (1), stationarity gives $x=D^{-1}\mu$, so by complementarity $\mu^\top x=\mu^\top D^{-1}\mu=0$, which, since $D^{-1}\succ0$, implies that $\mu\equiv0$. But then stationarity of the Lagrangian gives $x\equiv0$, which does not satisfy primal feasibility. Hence we must have $\lambda\neq0$, i.e., (2).

In (2), primal feasibility gives $\mathbf{1}^\top x = \mathbf{1}^\top D^{-1}(\mu+\lambda\mathbf{1})=1$, which implies that $$ \lambda = \frac{1-\mathbf{1}^\top D^{-1}\mu}{\mathbf{1}^\top D^{-1}\mathbf{1}}. $$

Now consider the remaining two cases: (2i) $\mu\not\equiv0$ and (2ii) $\mu\equiv0$. In (2i), there is some $i^*$ such that $\mu_{i^*}>0$, so its complementary variable satisfies $x_{i^*}=0$. By stationarity of the Lagrangian, \begin{align} x_{i^*} = D_{i^*i^*}^{-1}(\mu_{i^*} + \lambda) = 0, \end{align} i.e., $\lambda = -\mu_{i^*} < 0$. But multiplying the stationarity condition by $x^\top$ and using $\mu^\top x=0$ and $\mathbf{1}^\top x=1$ gives $\lambda = x^\top Dx > 0$, since $x\neq0$ and $D\succ0$. This is a contradiction, so we must have (2ii).

In (2ii), we have $\mu\equiv0$, so that \begin{align} \lambda = \frac{1}{\mathbf{1}^\top D^{-1}\mathbf{1}}, \end{align} and we can then recover $$ x^* = \frac{1}{\mathbf{1}^\top D^{-1}\mathbf{1}} D^{-1}\mathbf{1}. $$
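As a sanity check of this closed form, the sketch below (assuming NumPy; the values of $a$ are illustrative) verifies feasibility and stationarity of $x^*$ and compares it against random feasible points:

```python
import numpy as np

# Illustrative data: a = (1, 2, 4), so D = diag(2*a) is positive definite
a = np.array([1.0, 2.0, 4.0])
D = np.diag(2 * a)
n = len(a)
ones = np.ones(n)

# Closed form from the KKT analysis: lambda = 1/(1^T D^{-1} 1), x* = lambda D^{-1} 1
Dinv_ones = ones / np.diag(D)
lam = 1.0 / Dinv_ones.sum()
x_star = lam * Dinv_ones

# x* is feasible and satisfies stationarity with mu = 0
assert np.isclose(x_star.sum(), 1.0) and np.all(x_star > 0)
assert np.allclose(D @ x_star - lam * ones, 0.0)

# x* is no worse than random feasible points sampled uniformly from the simplex
rng = np.random.default_rng(0)
f = lambda x: 0.5 * x @ D @ x
for _ in range(1000):
    y = rng.dirichlet(ones)
    assert f(x_star) <= f(y) + 1e-12
```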

ANSWER

I was able to find a nicer proof which does not rely on the Karush-Kuhn-Tucker conditions, but rather on the Cauchy-Schwarz inequality. For the case in which every $a_i > 0$, notice that \begin{equation} 1 = \left(\sum_{i = 1}^nx_i\right)^2 = \left(\sum_{i = 1}^n\frac{1}{\sqrt{a_i}}\sqrt{a_i}x_i\right)^2 \le \left(\sum_{i = 1}^n\frac{1}{a_i}\right)\left(\sum_{i = 1}^na_ix_i^2\right) = \left(\sum_{i = 1}^n\frac{1}{a_i}\right)f(x_1, \dots, x_n) \end{equation}

Equality holds if and only if \begin{equation} \left(\sqrt{a_1}x_1, \dots, \sqrt{a_n}x_n\right) = \alpha\left(\frac{1}{\sqrt{a_1}}, \dots, \frac{1}{\sqrt{a_n}}\right) \end{equation} for some $\alpha \in \mathbb{R}$, that is, \begin{equation} \mathbf{x} = \alpha\left(\frac{1}{a_1}, \dots, \frac{1}{a_n}\right). \end{equation} Since $\sum_{i = 1}^nx_i = 1$, this $\alpha$ is \begin{equation} \alpha = \frac{1}{\sum_{i = 1}^n\frac{1}{a_i}}. \end{equation}

Thus, there exists only one minimum, and it is given by \begin{equation} x^* = \frac{1}{\sum_{i = 1}^n\frac{1}{a_i}}\left(\frac{1}{a_1}, \dots, \frac{1}{a_n}\right), \quad\quad f(x^*) = \frac{1}{\sum_{i = 1}^n\frac{1}{a_i}}. \end{equation}
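A small numerical check of this formula (a sketch assuming NumPy; the coefficients are illustrative):

```python
import numpy as np

# Illustrative coefficients with every a_i > 0
a = np.array([1.0, 2.0, 4.0])
inv_sum = (1.0 / a).sum()

# Minimizer and minimum value from the Cauchy-Schwarz argument
x_star = (1.0 / a) / inv_sum
f = lambda x: np.sum(a * x**2)

assert np.isclose(x_star.sum(), 1.0)         # x* lies on the hyperplane
assert np.isclose(f(x_star), 1.0 / inv_sum)  # f(x*) = 1 / sum(1/a_i)

# The lower bound 1/inv_sum holds at random feasible points
rng = np.random.default_rng(1)
for _ in range(1000):
    y = rng.dirichlet(np.ones(len(a)))
    assert f(y) >= 1.0 / inv_sum - 1e-12
```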

Let us now consider the case in which $a_i = 0$ for some $i$. For convenience, we define the index set \begin{equation} I_0 = \left\{i \mid a_i = 0\right\}. \end{equation} In this case, the set of solutions is given by \begin{equation} \mathcal{S}^* = \left\{x^* \,\,\middle\vert\,\, \sum_{i \in I_0}x^*_i = 1,\,\,\, x^*_i = 0,\, i \not\in I_0,\,\,\, x^*_i \ge 0,\, i \in I_0\right\}. \end{equation} Clearly $0$ is a lower bound for $f$, and $f(x^*) = 0$ for all $x^* \in \mathcal{S}^*$, so every point in $\mathcal{S}^*$ minimizes $f$. On the other hand, the feasible points not in $\mathcal{S}^*$, that is, those $x$ for which $x_i > 0$ for some $i \not\in I_0$, do not minimize $f$, since then $0 < a_ix_i^2 \le f(x)$.
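This degenerate case can also be checked numerically; the sketch below (assuming NumPy; the coefficient vector is illustrative) verifies that points supported on $I_0$ attain $f = 0$ while points with mass outside $I_0$ do not:

```python
import numpy as np

# Illustrative case with zero coefficients: a = (0, 3, 0, 5)
a = np.array([0.0, 3.0, 0.0, 5.0])
I0 = np.flatnonzero(a == 0)        # indices with a_i = 0
f = lambda x: np.sum(a * x**2)

# Any feasible x supported on I0 attains the lower bound f = 0
x = np.zeros(len(a))
x[I0] = 1.0 / len(I0)              # e.g. spread the mass evenly over I0
assert np.isclose(x.sum(), 1.0) and np.isclose(f(x), 0.0)

# Putting mass outside I0 gives a strictly positive value
y = np.array([0.5, 0.25, 0.25, 0.0])
assert np.isclose(y.sum(), 1.0) and f(y) > 0.0
```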