Best rank-$1$ approximation of matrix with condition.

240 Views Asked by At

Let $A\in\mathbb R^{m\times n}$ be a real matrix. For any $x,y\in\mathbb R^m$, we write $x\leq y$ if $x_i\leq y_i$ for $i=1,\dots,m$. For any matrix $B$, $\| B \|_F$ is the Frobenius norm and is defined by $\| B \|_F^2 := \operatorname{Tr}\left(B^T B\right)$, recall that the trace has the cyclic property $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$ and that the trace of a $1 \times 1$ matrix is the only value in that matrix.

I am trying to find a numerical method to solve the following optimization problem

\begin{align*} \min_{\substack{u\in\mathbb R^m,v\in\mathbb R^n\\ 0\leq Av}} \left\| A - u v^T \right\|_F^2 \end{align*}

For practical purpose, we have $m\ll n$, for instance $m$ would be of the order of hundreds and $n$ would typically be more than millions. I am expecting that, similarly to the PCA iterative computation procedure, we could get a runtime of order $O(m^2n)$. To a lesser extent I am also interested in anything related to the case $n<m$.

This is trying to find the best rank-$1$ approximation of $A$ with some constraint on the rank-$1$ approximation. This problem is not convex but is biconvex on a convex subset of $\mathbb R^m\times \mathbb R^n$, therefore my first idea was to alternate between optimization of $u$ and $v$. Observe that we can write

\begin{align*} \| A - uv^T \|_F^2 &= \|A\|_F^2 + \mathrm{Tr} (vu^Tuv^T) - 2 \mathrm{Tr}(v u^T A)\\ &= \|A\|_F^2 + v^Tvu^Tu - 2 u^T A v\\ &= \| A \|_F^2+\|u\|^2\cdot \|v\|^2 - 2 u^T A v \end{align*}

Differentiating w.r.t. $u$ gives $2\|v\|^2u-2Av$ and w.r.t. $v$ gives $2\|u\|^2 v - 2 A^T u$. For $v$ we need to compute the Lagragian and the KKT conditions (which are sufficient for a fixed $u$ but not in general), they are (with $\mu\in\mathbb R^m$ being the vector of Lagrange multipliers : \begin{cases} 2\|u\|^2 v - 2 A^T u-A^T\mu=0\\ 0\leq Av\\ 0\leq \mu\\ \mu^TAv = 0 \end{cases}

So it is necessary that we have $\| v \|^2u=Av$ and the previous conditions. I am not able to solve the system of KKT conditions for $v$, but solving it's Wolfe dual problem might be easier, it is given by \begin{align*} \max_{v,\mu} \| u \|^2\cdot \| v\|^2-2 u^TAv-\mu^TAv \end{align*} under the constraint $2\| u\|^2\cdot v-2A^Tu-A^T\mu=0$ and $\mu\geq 0$, solving for $v$ and then replacing in the optimization expression gives after some simplifications \begin{align*} \max_{\mu\geq 0} -(2u+\mu)^TM(2u+\mu) \end{align*}

Therefore solving for $\mu$ is equivalent to solving $\min_{\mu\geq 0} (2u+\mu)M(2u+\mu)$. I made a second post about this here because it is much simpler and this post is begining to be dirty, I will clean up everything when a solution is found.

Let $A^\dagger\in\mathbb R^{n\times m}$ be the Moore-Penrose pseudoinverse of $A$, in particular $A^\dagger A$ and $A A^\dagger$ are the respective projections onto the span of $A^T$ and $A$.

Here are some facts I could obtain :


Fact 1: In the solution $u,v$ to the original problem, $uv^T = 0$ if and only if $$\operatorname{span}(A) \cap [0,\infty[^m = \{ 0\}$$

Proof : If $uv^T\neq0$, then by the KKT conditions $0\leq u=\frac{Av}{\|\ v\|^2}\in\mathrm{span}(A)\cap [0,\infty[^m$, but $u\neq 0$.

If $0\neq a\in \mathrm{span}(A)\cap[0,\infty[^m$, then $b=A^\dagger a \neq 0$ and \begin{align*} \left\| A-\frac{ab^T}{\|b\|^2} \right\|_F^2 &= \| A \|_F^2+\frac{\|a\|^2\cdot \|b\|^2}{\| b\|^4} - 2 \frac{a^T A A^\dagger a}{\| b\|^2} \\ &=\| A \|_F^2+\frac{\|a\|^2}{\| b\|^2} - 2 \frac{\| a\|^2}{\| b\|^2} \\ &=\| A \|_F^2 - \frac{\| a\|^2}{\| b\|^2} \\ &<\| A \|_F^2 \end{align*} Therefore $uv^T\neq 0$.


Fact 2: If $uv^T\neq 0$ then $u\in \mathrm{span}(A)$ and $v\in\mathrm{span}(A^T)$, furthermore $u=A\frac{v}{\| v\|^2}$ and $v = A^T\frac{2u+\mu}{2\|u\|^2}$.

Proof : Trivial from KKT conditions.


Fact 3: If $a$, $\sigma$, $b$ is a singular triple of $A$ with $0\leq a$, then $u=a$ and $v=\sigma b$ satisfies all KKT conditions.

Proof : $2\| v\|^2u-2Av = 2\sigma^2 a- 2 \sigma^2a=0$, select $\mu=0$ to get $2\|u\|^2 v - 2A^Tu=2\sigma b-2\sigma b=0$, $0\leq \sigma^2 u = Av$.

2

There are 2 best solutions below

0
On

You could try using https://www.geno-project.org with the query

parameters 
    matrix A
variables
    vector u, v
min
    norm2(A - u * v')^2
st
    A*v >= vector(0)

Their library https://github.com/slaue/genosolver/ automatically generates a GPU capable solver based on L-BFGS-B.


2
On

First of all, $n$ is almost totally irrelevant. You have $m$ vectors $a_i$ (rows of $A$) and you are searching for a vector $v$ that has non-negative scalar products with all $a_j$ and maximizes the sum of squares of the projections of $a_j$ to the direction of $v$. Clearly, such $v$ is in the linear span of $a_i$, so only the Gram matrix of $a_i$ matters for finding the coefficients of the appropriate linear combination. Finding it takes time $m^2n$, indeed, but from there on, you work in $\mathbb R^m$, so the life becomes much easier.

Second, consider the simplest non-trivial case when you have 2 unit vectors in the upper half-plane in $\mathbb R^2$ making the angles $\alpha$ and $\pi-\alpha$ with the positive $x$-semiaxis for small $\alpha$. Then the admissible directions $v$ will be those making the angle $\theta$, $|\theta|<\alpha$ with the vertical semiaxis and the largest sum of the squares of scalar products will be at the endpoints of the corresponding arc on the unit circle ($\sin^2 2\alpha+0>2\sin^2\alpha$), so, if you change the length of one of the vector slightly, you'll have two local maxima, one of which will be better than the other. If you use some Lagrange multiplier or descent related technique, you may easily end up in the wrong corner and get stuck there. In higher dimension, your problem is finding the furthest (with respect to the metric given by the quadratic form $\sum_i(a_i,v)^2$) from the origin point in the intersection of the unit ball with the cone given by the inequalities $(a_i,v)\ge 0$. So, you are searching for the maximum of a quadratic form over a fairly irregular convex set with the possibility of having a lot of local maxima that have little to do with the global one. This is an ugly problem in general and the only approach I can offer is to use something like the differential evolution and pray that it would end up at a decently high value.