Are there any references for this optimization problem?


Suppose that $\mathbf{a}$ is an $N$-dimensional column vector, $\mathbf{a}=[a_1,a_2,\ldots,a_N]^{T}$, where $a_i \ge 0$ for $i\in [1,N]$, and $\mathbf{W}$ is a symmetric matrix. Is there any reference on how to solve the following optimization problem? \begin{equation} \begin{aligned} &\min_{\mathbf{a}}\quad \mathbf{a}^{T}\mathbf{W}\mathbf{a}, \\ &\text{subject to} \quad \mathbf{1}^{T}_{N} \mathbf{a}=1, \quad a_{i} \ge 0. \end{aligned} \end{equation} Here $\mathbf{1}_{N}$ is the $N$-dimensional column vector of all ones, i.e. $\mathbf{1}_{N}=[1,1,\ldots,1]^{T}$. I found a reference [1] which states that the closed-form solution to this problem is
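For concreteness, here is a minimal numerical sketch of how one might attack this problem with an off-the-shelf solver (my addition, assuming NumPy and SciPy are available; the random positive-definite $\mathbf{W}$ is only a placeholder, since the question only assumes symmetry):

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder data: a random symmetric positive-definite W (an assumption;
# the problem only requires W to be symmetric).
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
W = B @ B.T + np.eye(4)
N = W.shape[0]

def objective(a):
    return a @ W @ a  # a^T W a

# Simplex constraints: sum(a) = 1 and a_i >= 0.
constraints = [{"type": "eq", "fun": lambda a: np.sum(a) - 1.0}]
bounds = [(0.0, None)] * N
a0 = np.full(N, 1.0 / N)  # feasible starting point (simplex center)

res = minimize(objective, a0, method="SLSQP",
               bounds=bounds, constraints=constraints)
print(res.x, res.fun)
```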

\begin{equation} \begin{aligned} \mathbf{a} = \frac{\mathbf{W}^{-1} \mathbf{1}_N}{\mathbf{1}_N^{\mathrm{T}} \mathbf{W}^{-1} \mathbf{1}_N} \end{aligned} \end{equation} However, I'm confused by this: $\mathbf{W}$ is only assumed to be symmetric, so $\mathbf{W}^{-1}$ may not exist, and hence a closed-form solution of this shape is not guaranteed. Is there a better way to solve this problem?
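For context, here is where the formula in [1] presumably comes from (my own derivation, not spelled out in the paper): if $\mathbf{W}$ is invertible and the nonnegativity constraints are ignored (or happen to be inactive at the optimum), stationarity of the Lagrangian \begin{equation} \mathcal{L}(\mathbf{a},\mu) = \mathbf{a}^{T}\mathbf{W}\mathbf{a} - \mu\left(\mathbf{1}_N^{T}\mathbf{a} - 1\right) \end{equation} gives $2\mathbf{W}\mathbf{a} = \mu\,\mathbf{1}_N$, i.e. $\mathbf{a} = \tfrac{\mu}{2}\mathbf{W}^{-1}\mathbf{1}_N$, and enforcing $\mathbf{1}_N^{T}\mathbf{a}=1$ fixes $\tfrac{\mu}{2} = 1/(\mathbf{1}_N^{T}\mathbf{W}^{-1}\mathbf{1}_N)$. Note that this uses only the equality constraint, so the resulting $\mathbf{a}$ may still violate $a_i \ge 0$.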

[1] Zhao, Xiaochuan, and Ali H. Sayed. "Clustering via diffusion adaptation over networks." 2012 3rd International Workshop on Cognitive Information Processing (CIP). IEEE, 2012.

=========================== Update ===========================

Thanks to @cdalitz for the reply; you are right. Let me describe how I now think this problem can be approached.

In my opinion, solving this problem is complicated whether or not $\mathbf{W}$ is invertible.

  1. When $\mathbf{W}$ is not invertible, then, as @cdalitz said, the solution is not unique. I am not sure whether it can be written in terms of a pseudo-inverse, but in this case I think a good approach is (projected) gradient descent; see the sketch after this list.
  2. When $\mathbf{W}$ is invertible, computing its inverse can still be expensive. I noticed that the paper [1] also provides an approximate solution obtained by discarding the off-diagonal elements of $\mathbf{W}$. I believe this is done both to guarantee that the resulting matrix is invertible and to avoid inverting the full $\mathbf{W}$.
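To make point 1 concrete, here is a minimal sketch of projected gradient descent for this problem (my addition, assuming NumPy; the simplex projection is the standard sorting-based algorithm of Duchi et al., not something taken from [1]):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * j > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def projected_gradient(W, steps=5000):
    N = W.shape[0]
    a = np.full(N, 1.0 / N)                        # feasible start
    lr = 1.0 / (2.0 * np.linalg.norm(W, 2) + 1e-12)  # step from the Lipschitz
                                                     # constant 2*||W|| of the gradient
    for _ in range(steps):
        a = project_simplex(a - lr * 2.0 * W @ a)  # grad of a^T W a is 2 W a
    return a
```

Note that when $\mathbf{W}$ is indefinite the problem is non-convex, so this iteration is only guaranteed to reach a stationary point, not necessarily the global minimum.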

There are 2 answers below

BEST ANSWER

If the matrix $W$ has rank less than $N-1$, the solution will not be unique. Consider e.g. the case $$W=\left(\begin{array}{ccc}0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1\end{array}\right)$$ The solution is any vector $\vec{a}=(a_1,a_2,0)^T$ with $a_1+a_2=1$, of which there are infinitely many. For rank $N-1$, I cannot immediately construct a similar example, but I guess the solution is ambiguous in this case, too.
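A quick numerical check of this example (my addition, using NumPy): every feasible vector $(t, 1-t, 0)^T$ attains the same objective value $0$.

```python
import numpy as np

W = np.diag([0.0, 0.0, 1.0])
for t in (0.0, 0.3, 0.7, 1.0):
    a = np.array([t, 1.0 - t, 0.0])  # feasible: a >= 0 and sum(a) = 1
    print(t, a @ W @ a)              # prints 0.0 for every t
```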

ANSWER

Linear algebra tells us there is a basis $\mathscr E = (\mathbf e_1, \mathbf e_2, \ldots, \mathbf e_n)$ in which the matrix of $W,$ in block matrix form, is

$$\operatorname{Mat}_{\mathscr E}(W) = \left(\begin{array}{c|c|c}\mathbf I_r & \mathbf 0_{r,s} & \mathbf 0_{r, n-r-s}\\ \hline \mathbf 0_{s,r} & -\mathbf I_s & \mathbf 0_{s,n-r-s}\\ \hline \mathbf 0_{n-r-s,r} & \mathbf 0_{n-r-s,s} & \mathbf 0_{n-r-s} \end{array}\right)$$

(Here $\mathbf I_k$ denotes the $k\times k$ identity block and $\mathbf 0_{p,q}$ the $p\times q$ zero block, to avoid a clash with the all-ones vector $\mathbf 1_n$ used below.)

This basis can be found through any number of numerical diagonalization algorithms. In this basis, where $\mathbf a = a_1 \mathbf e_1 + a_2 \mathbf e_2 + \cdots + a_n \mathbf e_n,$ the objective function is

$$f(\mathbf a) = \mathbf a^\prime \mathbf W \mathbf a = \sum_{i=1}^r a_i^2 - \sum_{j=1}^s a_{r+j}^2$$
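One concrete way to compute such a basis (my sketch, assuming NumPy; the answer does not prescribe a particular algorithm) is to take the symmetric eigendecomposition $W = Q\,\operatorname{diag}(\lambda)\,Q^T$ and rescale each eigenvector with a nonzero eigenvalue by $1/\sqrt{|\lambda_i|}$:

```python
import numpy as np

def inertia_basis(W, tol=1e-10):
    """Return E whose columns satisfy E.T @ W @ E ~ diag(+1,...,-1,...,0,...)."""
    lam, Q = np.linalg.eigh(W)           # eigendecomposition of symmetric W
    scale = np.ones_like(lam)
    nz = np.abs(lam) > tol               # indices of nonzero eigenvalues
    scale[nz] = 1.0 / np.sqrt(np.abs(lam[nz]))
    return Q * scale                     # scale the i-th eigenvector column
```

The columns of $E$ then play the role of $\mathbf e_1, \ldots, \mathbf e_n$ (after reordering so the $+1$, $-1$, and $0$ blocks are contiguous), with $r$ and $s$ equal to the numbers of positive and negative eigenvalues, respectively.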

and the constraint $\mathbf 1_n^\prime \mathbf a = 1$ will be expressed in the form

$$\mathbf \lambda^\prime \mathbf a = \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n = 1$$

for some coefficients $\lambda_i$ not all of which are zero.
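Concretely (my addition): if $E$ is the change-of-basis matrix whose columns are $\mathbf e_1, \ldots, \mathbf e_n$, then the coordinate vector $\mathbf a$ in the new basis represents the original vector $E\mathbf a$, so the constraint becomes $\mathbf 1_n^\prime E \mathbf a = 1$, i.e. $\mathbf \lambda = E^\prime \mathbf 1_n$. Its entries cannot all vanish because $E$ is invertible and $\mathbf 1_n \neq \mathbf 0$.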

The rest is straightforward, because it is now clear how to minimize $f.$

You don't even need to use a Lagrange multiplier. Instead, consider any arbitrary real numbers $u$ and $v.$ Minimize the sum of the squares of the first $r$ variables subject to the constraint $\lambda_1 a_1 + \cdots + \lambda_r a_r = u.$ (There's a simple formula for this.) Separately, maximize the sum of the squares of the next $s$ variables subject to $\lambda_{r+1} a_{r+1} + \cdots + \lambda_{r+s} a_{r+s} = v.$ (Similar formula.) Provided you can find $a_{r+s+1}, \ldots, a_{n}$ for which $\lambda_{r+s+1}a_{r+s+1} + \cdots + \lambda_n a_n = 1 - u - v,$ you have a potential solution. This reduces your problem to the two variables $(u,v).$ The details depend on the pattern of zero coefficients in $\mathbf \lambda,$ so I won't belabor this point.
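For reference, the "simple formula" alluded to above is the standard least-norm solution (my addition): minimizing $\sum_{i=1}^r a_i^2$ subject to $\lambda_1 a_1 + \cdots + \lambda_r a_r = u$ yields

$$a_i = \frac{\lambda_i\, u}{\lambda_1^2 + \cdots + \lambda_r^2}, \qquad \min \sum_{i=1}^r a_i^2 = \frac{u^2}{\lambda_1^2 + \cdots + \lambda_r^2},$$

obtained by projecting the origin onto the constraint hyperplane (equivalently, by Cauchy-Schwarz). One caveat worth flagging: the maximization of the middle block is unbounded above when $s \ge 2$ unless further constraints (e.g. the original nonnegativity conditions, rewritten in the new coordinates) are kept in play.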