How to minimize a quadratic form with constraint that sum of elements of vector is one?


Problem Statement

I've been trying to solve the following optimization problem

$$ \begin{array}{rl} \min \limits_{\boldsymbol{w}} & \boldsymbol{w}^\top\boldsymbol{C}\boldsymbol{w} \\ \mbox{s.t.} & \sum_{j=1}^K w_j = 1\\ \end{array} $$ Where $\boldsymbol{w}\in \mathbb{R}^K$ and $\boldsymbol{C}\in\mathbb{R}^{K\times K}$ and $\boldsymbol{C}$ is symmetric, that is $\boldsymbol{C} = \boldsymbol{C}^\top$.

My Tentative Solution

Introduce lagrange multiplier $\lambda\in\mathbb{R}$ and construct the Lagrangian using the vector $\boldsymbol{1}=(1, \ldots, 1)^\top \in\mathbb{R}^K$ $$ \mathcal{L}(\boldsymbol{w}, \lambda) = \boldsymbol{w}^\top\boldsymbol{C}\boldsymbol{w} - \lambda(\boldsymbol{1}^\top \boldsymbol{w} - 1) $$ Then take the derivative with respect to $\boldsymbol{w}$

$$ \nabla_{\boldsymbol{w}} \mathcal{L}(\boldsymbol{w}, \lambda) = 2\boldsymbol{C}\boldsymbol{w} - \lambda\boldsymbol{1} $$ However, I am not sure how to proceed from here.
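A quick numerical sanity check of this gradient formula (a NumPy sketch; the random symmetric $C$, the test point, and the multiplier value are arbitrary illustrations, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
A = rng.standard_normal((K, K))
C = (A + A.T) / 2            # random symmetric matrix
w = rng.standard_normal(K)   # arbitrary test point
lam = 0.7                    # arbitrary multiplier value
ones = np.ones(K)

def lagrangian(w):
    return w @ C @ w - lam * (ones @ w - 1)

grad = 2 * C @ w - lam * ones   # claimed gradient

# central finite differences in each coordinate
eps = 1e-6
fd = np.array([(lagrangian(w + eps * np.eye(K)[i]) -
                lagrangian(w - eps * np.eye(K)[i])) / (2 * eps)
               for i in range(K)])
assert np.allclose(grad, fd, atol=1e-5)
```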

There are 3 best solutions below

Answer 1

Let $v_1,\dots,v_n$ be a basis for ${\bf R}^n$ consisting of eigenvectors for $C$ (note – I've changed the dimension from $K$ to $n$). Let $\lambda_1,\dots,\lambda_n$ be the corresponding eigenvalues. We may assume $\lambda_1\le\cdots\le\lambda_n$.

Any $w$ in ${\bf R}^n$ can be written as $w=a_1v_1+\cdots+a_nv_n$ for some reals $a_1,\dots,a_n$. Then $w^tCw=\lambda_1a_1^2+\cdots+\lambda_na_n^2\ge\lambda_1(a_1^2+\cdots+a_n^2)$ so for a fixed value of $\|w\|$, $w^tCw$ is minimized by taking $w$ to be the appropriate multiple of $v_1$.

I now see this isn't quite what was asked for, since the problem is to fix $\sum w_i$ and not $\sum w_i^2$, but it does in any event give the general form of $w^tCw$ which should be important in any approach to the problem.
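The eigenbasis identity above, $w^tCw=\lambda_1a_1^2+\cdots+\lambda_na_n^2$, can be checked numerically (a sketch; the random symmetric matrix is just an illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
C = (A + A.T) / 2               # random symmetric matrix

lams, V = np.linalg.eigh(C)     # columns of V: orthonormal eigenvectors
w = rng.standard_normal(n)
a = V.T @ w                     # coordinates of w in the eigenbasis

# w^T C w equals the eigenvalue-weighted sum of squared coordinates
assert np.isclose(w @ C @ w, np.sum(lams * a**2))
```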

Answer 2

Construct the $w$ vector from an unconstrained vector $x$ as $$w = \frac{x}{{\tt1}^Tx}$$ then the objective function becomes $$\eqalign{ \lambda &= w^TCw = \frac{x^TCx}{x^T{\tt11}^Tx} = \frac{x^TCx}{x^TBx} \\ }$$ where $B$ is the all-ones matrix and the function is seen to be a generalized Rayleigh quotient.

For convenience define the scalars $$\eqalign{ \gamma &= x^TCx\quad\implies \frac{\partial\gamma}{\partial x} = 2Cx \\ \beta &= x^TBx\quad\implies \frac{\partial\beta}{\partial x} = 2Bx \\ }$$ Then calculate the gradient of the objective function $$\eqalign{ \lambda &= \beta^{-1}\gamma \\ \frac{\partial\lambda}{\partial x} &= \beta^{-1}\frac{\partial\gamma}{\partial x} - \gamma\beta^{-2}\frac{\partial\beta}{\partial x} \\ &= \beta^{-1}\left(\frac{\partial\gamma}{\partial x} - \lambda\frac{\partial\beta}{\partial x}\right) \\ &= 2\beta^{-1}\big(Cx - \lambda Bx\big) \\ }$$ Setting the gradient to zero yields the generalized eigenvalue equation $$Cx = \lambda Bx$$ If any such eigenvalues exist, then the minimum eigenvalue is the minimum of the objective, and the associated eigenvector is the unconstrained solution vector $x$, from which the $w$ solution is recovered as $w = \left(\frac{x}{{\tt1}^Tx}\right)$

NB: The all-ones matrix is singular so $B^{-1}$ does not exist, but if $C^{-1}$ exists then this can be used to convert the equation into a standard eigenvalue equation $$\eqalign{ A = C^{-1}B,\; \alpha=\lambda^{-1} \quad\implies Ax = \alpha x \\ }$$ which is easier to solve. Indeed, unlike the generalized equation, solutions of the standard equation are guaranteed to exist. In this case, the maximal $\alpha$ eigenvalue corresponds to the minimal $\lambda$ eigenvalue.

If the constraint were changed to $\,b^Tw=1\,$ for a given vector $b$, then the above analysis holds with the following modifications $$B=bb^T,\quad w=\frac{x}{b^Tx}$$ Finally, even if the matrix $C$ were not symmetric, it can be decomposed as the sum of a symmetric and anti-symmetric matrix $\big(C=C_s+C_a\big),\,$ where the anti-symmetric matrix contributes nothing to the objective, since $\,x^TC_ax\,$ is zero. So the above analysis holds after substituting $C_s$ for $C$.
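Assuming $C$ is positive definite (so $C^{-1}$ exists), the standard-eigenvalue route above can be sketched in NumPy; the comparison against the closed form $C^{-1}\mathbf 1/(\mathbf 1^TC^{-1}\mathbf 1)$ is an added check, not part of the original derivation:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 4
M = rng.standard_normal((K, K))
C = M @ M.T + K * np.eye(K)       # positive definite, so C^{-1} exists
ones = np.ones(K)
B = np.outer(ones, ones)          # all-ones matrix 11^T

A = np.linalg.solve(C, B)         # A = C^{-1} B, rank one
alphas, X = np.linalg.eig(A)
i = np.argmax(alphas.real)        # maximal alpha <-> minimal lambda
x = X[:, i].real
w = x / (ones @ x)                # recover the constrained solution

# compare with the closed-form minimizer C^{-1}1 / (1^T C^{-1} 1)
w_closed = np.linalg.solve(C, ones)
w_closed /= ones @ w_closed
assert np.allclose(w, w_closed)
```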

Answer 3

The right answer here is to point out that there is no particular reason for a minimum to exist.

The argument splits into 3 parts. The first addresses the case where $C$ has at least one negative eigenvalue. The second addresses the nice case where $C$ is positive definite. The final part addresses the case where $C$ is merely positive semi-definite.

1.)

Consider the centering projector
$S := I - \frac{1}{\mathbf 1^T \mathbf 1}\mathbf 1\mathbf 1^T$
lemma:
$\big(SCS\big) \not\succeq \mathbf 0\longrightarrow \text{ there is no minimum}$

proof:
Via Gram-Schmidt, decompose any feasible $\mathbf w$ into $\mathbf w = \alpha\mathbf 1 + \beta \mathbf v$, where $\alpha = \frac{1}{K}$ (forced by the constraint $\mathbf 1^T\mathbf w = 1$), $\mathbf v\perp \mathbf 1$, and $\mathbf v$ has unit length (2-norm).

To repeat the main idea: if $\big(SCS\big)$ has negative eigenvalues then this creates a problem.

As an aside, this can also be interpreted in terms of Cauchy eigenvalue interlacing. That is, the criterion implies that $C$ is problematic if it has more than one negative eigenvalue (i.e. $C$'s signature must indicate that it has at least $K-1$ non-negative eigenvalues). The following analysis also implies $C$ is problematic if the eigenvector associated with the sole negative eigenvalue is orthogonal to $\mathbf 1$. It further implies that for non-problematic $C$ with exactly one negative eigenvalue, the minimum is attained by taking $\mathbf w$ to be the eigenvector associated with that negative eigenvalue, rescaled so that $\mathbf 1^T \mathbf w = 1$.

returning to the main argument:

$f\big(\mathbf w\big) = \mathbf w^T C \mathbf w = \beta^2 \cdot \mathbf v^T C \mathbf v + \beta \cdot \alpha\cdot\text{trace}\Big(C\big(\mathbf {1v}^T + \mathbf {v1}^T \big)\Big)+ \alpha^2 \cdot \mathbf 1^T C \mathbf 1 $

The main point is, for any fixed choice of $\mathbf v$ the above reads
$f\big(\mathbf w\big) = \beta^2 \cdot \mathbf v^T C \mathbf v + \beta \cdot \text{constant}+ \text{constant}$

If $\big(SCS\big) \not\succeq \mathbf 0$, choose $\mathbf v$ to be a unit eigenvector of $\big(SCS\big)$ with negative eigenvalue; since $\mathbf v\perp \mathbf 1$ we have $S\mathbf v = \mathbf v$, hence
$\mathbf v^T \big(SCS\big) \mathbf v =\mathbf v^T C \mathbf v \lt 0$
Thus for any $r\gt 0$ we have $f\big(\mathbf w\big) \lt -r$ for large enough $\beta$, which proves no minimum exists when $\big(SCS\big) \not\succeq \mathbf 0$.

This argument is analogous to why the image of a single variable real polynomial of degree 2 looks like 'an upside down cup' when the leading coefficient is negative. Such a polynomial doesn't have a global minimum and neither does OP's problem.
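A small numerical illustration of this unboundedness (the specific $C$ is a hand-picked example with one negative eigenvalue; here $SCS$ turns out to be indefinite):

```python
import numpy as np

K = 3
C = np.diag([-1.0, 1.0, 1.0])              # one negative eigenvalue
ones = np.ones(K)
S = np.eye(K) - np.outer(ones, ones) / K   # centering projector

mu, V = np.linalg.eigh(S @ C @ S)          # eigenvalues ascending
assert mu[0] < 0                           # SCS is not PSD here
v = V[:, 0]                                # unit vector, v ⟂ 1

# walk along the feasible line w = 1/K + beta*v: f drops without bound
f = lambda w: w @ C @ w
vals = [f(ones / K + beta * v) for beta in (1, 10, 100, 1000)]
assert all(a > b for a, b in zip(vals, vals[1:]))
```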

2.) At the other extreme: if $C \succ \mathbf 0$ then there is a particularly simple solution.
$1 = 1^2 = \Big(\mathbf 1^T \mathbf w\Big)^2 = \Big(\big(\mathbf 1^T C^\frac{-1}{2}\big)\big(C^\frac{1}{2}\mathbf w\big)\Big)^2 = \Big(\big( C^\frac{-1}{2}\mathbf 1\big)^T \big(C^\frac{1}{2}\mathbf w\big)\Big)^2\leq \big(\mathbf 1^T C^{-1}\mathbf 1\big)\big(\mathbf w^T C \mathbf w\big)$
or
$\frac{1}{\mathbf 1^T C^{-1}\mathbf 1} \leq \big(\mathbf w^T C \mathbf w\big)$
by Cauchy-Schwarz, with equality iff
$C^\frac{1}{2}\mathbf w = \gamma \cdot C^\frac{-1}{2}\mathbf 1$
or
$\mathbf w = \gamma \cdot C^{-1}\mathbf 1$
and of course we set $\mathbf w$ equal to this and select $\gamma$ by the boundary condition that $\mathbf 1^T \mathbf w = 1$, i.e. this implies
$1 = \mathbf 1^T \mathbf w = \gamma \cdot \mathbf 1^T C^{-1}\mathbf 1 \longrightarrow \gamma := \frac{1}{\mathbf 1^T C^{-1}\mathbf 1}$
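This closed form is easy to verify numerically (a sketch with a random positive definite $C$; the random feasible vectors are just a spot check of the Cauchy-Schwarz lower bound):

```python
import numpy as np

rng = np.random.default_rng(4)
K = 5
M = rng.standard_normal((K, K))
C = M @ M.T + np.eye(K)            # positive definite
ones = np.ones(K)

w_star = np.linalg.solve(C, ones)
w_star /= ones @ w_star            # w* = C^{-1}1 / (1^T C^{-1} 1)
f_star = w_star @ C @ w_star

# the optimal value equals the Cauchy-Schwarz bound 1/(1^T C^{-1} 1)
bound = 1.0 / (ones @ np.linalg.solve(C, ones))
assert np.isclose(f_star, bound)

# the bound holds for random feasible w (entries summing to one)
for _ in range(100):
    w = rng.standard_normal(K)
    w /= ones @ w
    assert f_star <= w @ C @ w + 1e-12
```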

3.) In the case of singular $C \succeq \mathbf 0$
while the solution will not in general be unique, it is straightforward to construct a solution that achieves the global minimum.

first orthogonally diagonalize
$C = Q \Lambda Q^T = \sum_{i=1}^K \lambda_i \mathbf q_i\mathbf q_i^T$
and using the fact that C's eigenvectors form a basis write $\mathbf 1 = \sum_{i=1}^K x_i \mathbf q_i$

This splits into two cases:
(i)
$x_i \neq 0$ for some eigenpair $(\lambda_i =0, \mathbf q_i)$
in other words, the expansion of $\mathbf 1$ places nonzero weight on at least one eigenvector lying in the kernel of $C$.
solution
set $\mathbf w := \gamma \mathbf q_i$ where $\gamma$ is chosen such that $\mathbf 1^T \mathbf w = 1$

This gives the minimum possible value of the quadratic form, $\mathbf w^T C \mathbf w = 0$, and satisfies $\mathbf 1^T \mathbf w = 1$.

(ii)
$\mathbf 1$ is written as a linear combination of $C$'s eigenvectors all of which are (only) in the image of $C$ (equivalently: none of which are in the kernel of $C$).

solution:
observing that any component of $\mathbf w$ in the kernel of $C$ affects neither the quadratic form nor the weight constraint, we can restrict ourselves to vectors in the image of $C$

abusing notation
$C^{-1}:= Q \Lambda^{+} Q^T$
where $\Lambda^{+}$ is a diagonal matrix with diagonal entries $\frac{1}{\lambda_i}$ if $\lambda_i \neq 0$ and $=0$ otherwise.

Now re-run the Cauchy-Schwarz argument in 2.) $1 = 1^2 = \Big(\mathbf 1^T \mathbf w\Big)^2 = \Big(\big(\mathbf 1^T C^\frac{-1}{2}\big)\big(C^\frac{1}{2}\mathbf w\big)\Big)^2 = \Big(\big( C^\frac{-1}{2}\mathbf 1\big)^T \big(C^\frac{1}{2}\mathbf w\big)\Big)^2\leq \big(\mathbf 1^T C^{-1}\mathbf 1\big)\big(\mathbf w^T C \mathbf w\big)$

to be sure, notice that
$P:=C^\frac{-1}{2}C^\frac{1}{2} \neq I$ (though $P^2 = P$),
but $\mathbf 1^T P = \mathbf 1^T$, so the above result holds

and again the minimum is achieved when
$\mathbf w = \big(\frac{1}{\mathbf 1^T C^{-1}\mathbf 1}\big)\cdot C^{-1}\mathbf 1$
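A sketch of case (ii) with NumPy's pseudoinverse, using a hand-built singular PSD matrix whose kernel is orthogonal to $\mathbf 1$ (an illustrative choice, not from the answer):

```python
import numpy as np

# orthonormal eigenvectors; q0 spans ker(C) and q0 ⟂ 1, so 1 ∈ im(C)
q0 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
q1 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
q2 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
C = 2 * np.outer(q1, q1) + 3 * np.outer(q2, q2)  # eigenvalues 0, 2, 3

ones = np.ones(3)
Cpinv = np.linalg.pinv(C)          # the pseudoinverse Q Λ⁺ Qᵀ
w = Cpinv @ ones
w /= ones @ w                      # w = C⁺1 / (1ᵀ C⁺ 1)
f_min = w @ C @ w
assert np.isclose(f_min, 1.0 / (ones @ Cpinv @ ones))

# adding a kernel component keeps feasibility and the objective value,
# so the minimizer is not unique
w2 = w + 0.7 * q0
assert np.isclose(ones @ w2, 1.0) and np.isclose(w2 @ C @ w2, f_min)
```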