Can I show a matrix optimisation has a unique minimum by showing it is convex?


Consider the following equation $$ \mathbf{P}(\mathbf{A}\circ\mathbf{A})= \hat{\mathbf{P}} $$

where $\mathbf{P}$, $\mathbf{A}$ and $\hat{\mathbf{P}}$ are all square matrices with positive elements. I want to solve for $\mathbf{A}$. Because of the Schur (Hadamard) product, I don't think I can do so analytically (though if I could, that would resolve all of my problems). The Schur product is there because the matrix $\mathbf{A}\circ\mathbf{A} = \tilde{\mathbf{A}}$ needs to have all positive elements. Given these constraints, I've decided to try to find $\mathbf{A}$ by gradient descent, minimising the Frobenius norm of the difference between the LHS and RHS of the equation, i.e. seeking a point where $$ \frac{d}{d\mathbf{A}}\|\mathbf{P}(\mathbf{A}\circ\mathbf{A}) - \hat{\mathbf{P}}\|_{F} = 0 $$
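For concreteness, here is a minimal numpy sketch of the objective being minimised; the matrix size, the random data and the name `objective` are illustrative assumptions, not part of the question:

```python
import numpy as np

def objective(A, P, P_hat):
    """Frobenius norm of P(A∘A) - P_hat: the quantity to minimise."""
    return np.linalg.norm(P @ (A * A) - P_hat, ord="fro")

# Illustrative data: n x n matrices with positive entries (assumption).
rng = np.random.default_rng(0)
n = 4
P = rng.uniform(0.1, 1.0, (n, n))
A_true = rng.uniform(0.1, 1.0, (n, n))
P_hat = P @ (A_true * A_true)

print(objective(A_true, P, P_hat))  # 0.0 at an exact solution
```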

I would like to know if it's possible to show that this optimisation has a unique minimum. It seems to me that this could be done by showing the function is convex everywhere, but in the context of this matrix equation I'm not convinced I know how to do that. Would showing that the function is convex in each individual element of $\mathbf{A}$ do the trick?

Otherwise, I believe the typical argument is that a function is convex if its second derivative is positive, but in this case would I have to show that the second derivative is positive in every element? Or something like the second derivative (the Hessian) being positive definite?

There are 2 solutions below.

Solution 1

Let's define $F(A)=P(A\odot A)-\hat{P}$. Using http://www.matrixcalculus.org, we can see that:

$$\frac{\partial}{\partial A}\|F(A)\|_F = \frac{2\,\big(P^TF(A)\big)\odot A}{\|F(A)\|_F}$$

Setting this to zero is what you need to solve. If you can ensure that no element of $A$ is zero, then it simplifies to $P^TF(A)=0$, or $F(A)=0$ if $P$ is invertible, which is not really helpful. However, it provides you with the gradient, and a gradient descent should be doable.
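As a sketch of that gradient descent (the step size, iteration count, starting point and name `gradient_descent` are my assumptions, not part of the answer):

```python
import numpy as np

def gradient_descent(P, P_hat, steps=20_000, lr=1e-2, tol=1e-10, seed=0):
    """Minimise ||P(A∘A) - P_hat||_F using the gradient derived above:
    grad = 2 (P^T F(A)) ∘ A / ||F(A)||_F, with F(A) = P(A∘A) - P_hat."""
    n = P.shape[0]
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.1, 1.0, (n, n))   # arbitrary positive start (assumption)
    for _ in range(steps):
        F = P @ (A * A) - P_hat
        norm = np.linalg.norm(F, ord="fro")
        if norm < tol:                  # F(A) ≈ 0: done
            break
        A -= lr * 2.0 * (P.T @ F) * A / norm
    return A
```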

Solution 2

Define the matrix variables $$\eqalign{ X &= A\odot A \\ Y &= \hat P \\ M &= I\otimes P \\ }$$ where $I,P,X,Y\in {\mathbb R}^{n\times n}\;$ and $\;(\odot,\otimes)$ denote the Hadamard and Kronecker products respectively.

Now vectorize the equation $$\eqalign{ \def\o{{\tt1}} \def\vc{\operatorname{vec}} \vc(Y) &= (I\otimes P)\,\vc(X) \\ y &= M\,x \\ }$$ This can be formulated as a Non-Negative Least-Squares (NNLS) problem $$\eqalign{ &\min_x &\|Mx-y\|^2_2 \\ &{\rm s.t.}\;\;&x_k\ge 0 \\ }$$ which can be solved by a variety of methods, e.g. interior point method or projected subgradients.
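As a sketch, this vectorized formulation maps directly onto an off-the-shelf solver; here is one possible version using `scipy.optimize.nnls` (the solver choice and the helper name `solve_via_nnls` are assumptions on my part):

```python
import numpy as np
from scipy.optimize import nnls

def solve_via_nnls(P, P_hat):
    """Solve min ||Mx - y||_2 s.t. x >= 0 with M = I ⊗ P, y = vec(P_hat),
    then recover A as the elementwise square root of X."""
    n = P.shape[0]
    M = np.kron(np.eye(n), P)          # I ⊗ P
    y = P_hat.flatten(order="F")       # vec() stacks columns, hence order="F"
    x, _residual = nnls(M, y)          # non-negative least squares
    X = x.reshape((n, n), order="F")   # undo the vectorization
    return np.sqrt(X)
```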

One extremely simple method is the sequential coordinate-wise algorithm.

After initialization $$\eqalign{ x&= 0, \quad w &= -M^Ty \\ }$$ an inner iteration runs through all components $(k=\o:n^2)$ of the $x$ vector $$\eqalign{ \def\a{\beta} \def\e{\varepsilon} q &= M^TM\e_k \\ \a &= \min\!\left(x_k,\; \frac{w_k}{q_k}\right) \\ x &= x - \a\e_k \\ w &= w - \a q \\ }$$ where $\e_k$ is the $k^{th}$ basis vector. This iteration is repeated until convergence is obtained.

Afterwards the $A$ matrix can be recovered using an elementwise square root $$\eqalign{ A &= \operatorname{reshape}\big({\sqrt x},\, n,n\big) \\ }$$ where the reshape is column-major, matching the column-stacking $\operatorname{vec}$ convention.
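A direct transcription of the iteration above into numpy might look as follows; the sweep count, the convergence test and the name `sca_nnls` are assumptions:

```python
import numpy as np

def sca_nnls(M, y, sweeps=500, tol=1e-12):
    """Sequential coordinate-wise algorithm for min ||Mx - y||^2, x >= 0.
    Assumes no column of M is zero, so the diagonal of M^T M is positive."""
    H = M.T @ M
    x = np.zeros(M.shape[1])
    w = -M.T @ y                       # gradient of the quadratic at x = 0
    for _ in range(sweeps):
        x_old = x.copy()
        for k in range(x.size):
            q = H[:, k]                # M^T M e_k
            beta = min(x[k], w[k] / q[k])
            x[k] -= beta
            w -= beta * q
        if np.linalg.norm(x - x_old) < tol:   # converged (assumed test)
            break
    return x
```

With $M=I\otimes P$ and $y=\operatorname{vec}(\hat P)$ built as in the NNLS sketch above, the final matrix is `np.sqrt(sca_nnls(M, y)).reshape((n, n), order="F")`.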

Update

The algorithm can also be applied to the matrix form of the problem $$\eqalign{ &\min_X &\|PX-Y\|^2_F \\ &{\rm s.t.}\;\;&X_{ij}\ge 0 \\ }$$ Initialize $$\eqalign{ X = 0,\quad W = -P^TY \\ }$$ and iterate over all components $\,(i=\o:n,\;\;j=\o:n)\,$ of the $X$ matrix $$\eqalign{ Q &= P^TP\,\e_i\e_j^T \\ \a &= \min\!\left(X_{ij},\; \frac{W_{ij}}{Q_{ij}}\right) \\ X &= X - \a\,\e_i\e_j^T \\ W &= W - \a Q \\ }$$ until convergence. Then extract the elementwise square root $$A=\sqrt X$$
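A matching numpy sketch of this matrix-form iteration (again, the sweep count, tolerance and name `sca_matrix` are assumptions):

```python
import numpy as np

def sca_matrix(P, Y, sweeps=500, tol=1e-12):
    """Sequential coordinate-wise algorithm for min ||PX - Y||_F^2, X >= 0,
    followed by the elementwise square root A = sqrt(X)."""
    n = P.shape[0]
    H = P.T @ P        # diagonal is positive when P has positive entries
    X = np.zeros((n, n))
    W = -P.T @ Y       # gradient of the quadratic at X = 0
    for _ in range(sweeps):
        X_old = X.copy()
        for i in range(n):
            for j in range(n):
                # Q = (P^T P) e_i e_j^T is zero except column j, which is H[:, i]
                beta = min(X[i, j], W[i, j] / H[i, i])
                X[i, j] -= beta
                W[:, j] -= beta * H[:, i]
        if np.linalg.norm(X - X_old) < tol:
            break
    return np.sqrt(X)
```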