How to optimize a non-linear least squares energy with respect to the non-zeros of a sparse matrix?


I'd like to minimize an energy of the form:

$E(G) = \|\underbrace{X - Y G^T L G B}_{f(G)}\|_F^2$

where $X,Y,B$ are dense matrices and $G,L$ are sparse matrices ($G^TLG$ is also sparse), $\|M\|_F^2$ computes the squared Frobenius norm of the matrix $M$.

I'd like to compute quantities such as $\partial f/\partial G$ (or $\partial^2 E/\partial G^2$) to use the Gauss-Newton (or Newton) method, but I only care about changes to $G$ that maintain its sparsity pattern. That is, suppose $G$ is a function of the non-zero values collected in a vector $g$ via, say, $G = \mathop{sparse}(i,j,g)$, where $i,j$ are lists of subscript indices corresponding to the values in $g$. Then what I really want is $\partial f/\partial g$.
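For concreteness, the $g \leftrightarrow G$ correspondence can be set up in SciPy (my choice of library; the tiny pattern below is made up for illustration):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical small instance: the pattern (i, j) and nonzero values g define G.
i = np.array([0, 1, 2])          # row indices of G's nonzeros
j = np.array([0, 1, 0])          # column indices
g = np.array([1.0, 4.0, 2.0])    # the free variables of the optimization

# G = sparse(i, j, g), as in the question
G = sp.coo_matrix((g, (i, j)), shape=(3, 2)).tocsr()

# Recover g from G at the same pattern (valid when (i, j) has no duplicates,
# since coo_matrix sums duplicate entries).
g_back = np.asarray(G[i, j]).ravel()
```

The optimization variable is then the flat vector `g`, and `G` is rebuilt from it whenever the energy or its derivatives are evaluated.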

Eventually I'm programming this and would like to avoid computing $\partial f/\partial G$ as a dense matrix/tensor.

Is there a nice (reduced) expression for $\partial f/\partial g$?

If not, what's the best way to use automatic differentiation to conduct this optimization? I'm having trouble formulating this in the right way to use the libraries I found.
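One way to sidestep the matrix calculus entirely is to parameterize the energy by the nonzero vector $g$ itself and let an autodiff library differentiate through the scatter of $g$ into $G$. A minimal sketch with PyTorch (my choice of library; the sizes are made up, and a dense $L$ stands in for the sparse one):

```python
import torch

torch.set_default_dtype(torch.float64)
torch.manual_seed(0)

m, n, p, q = 5, 4, 3, 6
i = torch.tensor([0, 1, 2, 3, 4, 0])   # row indices of G's nonzeros
j = torch.tensor([0, 1, 2, 3, 0, 2])   # column indices
X = torch.randn(p, q); Y = torch.randn(p, n); B = torch.randn(n, q)
L = torch.randn(m, m)                  # dense stand-in for the sparse L

g = torch.randn(i.numel(), requires_grad=True)

# Scatter g onto the sparsity pattern; autodiff sees g -> G -> E.
G = torch.zeros(m, n).index_put((i, j), g)
E = (X - Y @ G.T @ L @ G @ B).pow(2).sum()
E.backward()                           # g.grad now holds dE/dg
```

This only differentiates with respect to the pattern entries, since $G$ is a fixed scatter of `g`; the cost is that `G` is stored densely here, which is fine for moderate sizes but defeats the purpose at scale.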

1 Answer

Consider the standard vectorization operation applied to the sparse matrix $G$, and define a projection matrix $P$ which omits the zero elements to recover what you've denoted as the $g$-vector. So we have $$\eqalign{ g_{vec} &= {\rm vec}(G) = P^Tg \cr g &= P\,g_{vec} = PP^Tg \cr I &= PP^T \cr }$$ As a concrete example of the projection matrix, consider $$\eqalign{ G &= \pmatrix{1&4\cr 0&0},\quad g_{vec}=\pmatrix{1\cr 0\cr 4\cr 0},\quad g=\pmatrix{1\cr 4} \cr P &= \pmatrix{1&0&0&0\cr0&0&1&0},\quad P^T=\pmatrix{1&0\cr 0&0\cr 0&1\cr 0&0} \cr }$$

Define the residual matrix $$F = (YG^TLGB - X) = -f$$ Write the energy in terms of $F$ and find its differential in terms of the differential $dg$, using the Frobenius product $A:B = {\rm Tr}(A^TB)$ and the identity ${\rm vec}(ACB) = (B^T\otimes A)\,{\rm vec}(C)$: $$\eqalign{ {\mathcal E} &= \|F\|^2_F = {\rm Tr}(F^TF) = F:F \cr d{\mathcal E} &= 2F:dF \cr &= 2F:\big(Y\,dG^T\,LGB+YG^TL\,dG\,B\big) \cr &= 2\big(LGBF^TY + L^TGY^TFB^T\big):dG \cr &= 2{\,\rm vec}\big(LGBF^TY + L^TGY^TFB^T\big):{\rm vec}(dG) \cr &= 2\Big((Y^TFB^T\otimes L)\,g_{vec} + (BF^TY\otimes L^T)\,g_{vec}\Big):dg_{vec} \cr &= 2\Big(Y^TFB^T\otimes L + BF^TY\otimes L^T\Big)P^Tg:P^T\,dg \cr &= 2P\Big(Y^TFB^T\otimes L + BF^TY\otimes L^T\Big)P^Tg:dg \cr }$$ So the gradient is $$\frac{\partial {\mathcal E}}{\partial g} = 2P\Big(Y^TFB^T\otimes L+BF^TY\otimes L^T\Big)P^Tg$$

Note that in practice you never need to form the Kronecker products: since $(Y^TFB^T\otimes L)\,{\rm vec}(G) = {\rm vec}(LGBF^TY)$, the gradient is simply the entries of the matrix $2\big(LGBF^TY + L^TGY^TFB^T\big)$ read off at the sparsity pattern of $G$.

Finding the Hessian is probably not worth the additional effort. Just use a gradient-based method, e.g. Polak-Ribière, Barzilai-Borwein, etc.
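As a sanity check on the gradient formula, here is a minimal NumPy/SciPy sketch (the shapes and helper names are made up for illustration). It evaluates the gradient as the entries of $2(LGBF^TY + L^TGY^TFB^T)$ restricted to the sparsity pattern, which is equivalent to the Kronecker expression above without ever forming the Kronecker products:

```python
import numpy as np
import scipy.sparse as sp

def energy(g, i, j, shape, X, Y, L, B):
    """E(g) = ||X - Y G^T L G B||_F^2 with G = sparse(i, j, g)."""
    G = sp.coo_matrix((g, (i, j)), shape=shape).tocsr()
    S = (G.T @ (L @ G)).toarray()      # G^T L G (sparse product, densified here)
    R = X - Y @ S @ B
    return float(np.sum(R * R))

def gradient(g, i, j, shape, X, Y, L, B):
    """dE/dg: entries of 2(L G B F^T Y + L^T G Y^T F B^T) at the pattern (i, j)."""
    G = sp.coo_matrix((g, (i, j)), shape=shape).tocsr()
    S = (G.T @ (L @ G)).toarray()
    F = Y @ S @ B - X                  # residual, F = -f
    M = 2.0 * (np.asarray(L @ (G @ (B @ F.T @ Y)))
               + np.asarray(L.T @ (G @ (Y.T @ F @ B.T))))
    return np.asarray(M[i, j]).ravel() # restrict to the sparsity pattern
```

Every product here is either sparse-times-sparse or sparse-times-dense, so the dense intermediate $M$ has only the shape of $G$; nothing of Kronecker size is ever built.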