Minimizing Frobenius norm involving inverse


I am looking for methods to solve the following minimization. Let $A\in \mathbb{R}^{n\times k}$, $B\in \mathbb{R}^{n\times m}$, $C\in \mathbb{R}^{m\times m}$ and $E\in \mathbb{R}^{m\times k}$, where $n>m>k$. Additionally, $C$ is positive definite. $$G(d)=||A-B(C+\text{diag}(d))^{-1}E||^2_F$$ Is there any hope of finding a positive minimizer $d\geq 0$ of $G$ (and is it unique)?

Best answer

$ \def\R#1{{\mathbb R}^{#1}} \def\o{{\tt1}} \def\g{\gamma} \def\t{\times} \def\bR#1{\big(#1\big)} \def\BR#1{\Big(#1\Big)} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\mt{\mapsto} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} $Let's adopt a naming convention in which an uppercase letter denotes a matrix, a lowercase letter a vector, and a Greek letter a scalar. Under this scheme, the only variable that needs to be renamed is $\,G\to\g$.

Also, to avoid confusion with differential operations, let's rename $d\to x.$

To enforce positivity, construct $x$ from an unconstrained vector $w$ $$\eqalign{ x &= \exp(w) \qiq x > 0 \\ dx &= x\odot dw \\ }$$ where $\exp(w)$ is applied elementwise and $\odot$ denotes the elementwise (Hadamard) product.
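A quick numerical sanity check of this reparameterization (a numpy sketch; all variable names here are mine): the elementwise exponential is strictly positive, and its differential agrees with $dx = x\odot dw$ to first order.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4
w = rng.standard_normal(m)

# x = exp(w) applied elementwise is strictly positive for any real w.
x = np.exp(w)
assert np.all(x > 0)

# The differential dx = x (elementwise-times) dw, checked against a small finite step.
dw = 1e-6 * rng.standard_normal(m)
dx_analytic = x * dw
dx_numeric = np.exp(w + dw) - x
assert np.allclose(dx_analytic, dx_numeric, rtol=1e-4)
```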

For typing convenience, define the auxiliary variables $$\eqalign{ X &= \Diag x &\qiq dX = \Diag{dx} \\ & &\qiq dx = X\,dw \\ F &= \LR{C+X}^{-\o} &\qiq dF = -F\,dX\,F \\ M &= \LR{BFE-A} &\qiq dM = B\:dF\,E \\ }$$ and the Frobenius $(:)$ product $$\eqalign{ Z,Y&\in\R{m\t n},\quad P\in\R{m\t p},\quad Q\in\R{p\t n} \\ Z:Y &= \trace{Z^TY} \\ Y:Y &= \frob{Y}^2 \\ Z:Y &= Y:Z \;=\; Y^T:Z^T \\ \LR{PQ}:Y &= P:\LR{YQ^T} \;=\; Q:\LR{P^TY} \\ \\ }$$
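The Frobenius-product rules above are easy to verify numerically. A short numpy sketch (the helper name `fprod` is mine):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 4, 6, 3
Y = rng.standard_normal((m, n))
Z = rng.standard_normal((m, n))
P = rng.standard_normal((m, p))
Q = rng.standard_normal((p, n))

def fprod(Za, Ya):
    """Frobenius product Z:Y = Tr(Z^T Y)."""
    return np.trace(Za.T @ Ya)

assert np.isclose(fprod(Y, Y), np.linalg.norm(Y, 'fro') ** 2)  # Y:Y = ||Y||_F^2
assert np.isclose(fprod(Z, Y), fprod(Y, Z))                    # Z:Y = Y:Z
assert np.isclose(fprod(P @ Q, Y), fprod(P, Y @ Q.T))          # (PQ):Y = P:(YQ^T)
assert np.isclose(fprod(P @ Q, Y), fprod(Q, P.T @ Y))          # (PQ):Y = Q:(P^T Y)
```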


Write the objective function (scaled by $\tfrac12$ for convenience; this does not change the minimizer) using the above notation and calculate the gradient $$\eqalign{ \g &= \tfrac12\LR{M:M} \\ d\g &= M:dM \\ &= M:\LR{B\:dF\,E} \\ &= \LR{B^TME^T}:dF \\ &= \LR{B^TME^T}:\LR{-F\,dX\,F} \\ &= -\LR{F^TB^TME^TF^T}:\Diag{X\,dw} \\ &= -X\diag{F^TB^TME^TF^T}:dw \\ \grad{\g}w &= -X\diag{F^TB^TME^TF^T} \;\doteq\; g(w) \\ }$$ In principle, you can set $g(w)=0$ and solve for the optimal $w$ vector, but the expression is complicated enough that an analytic solution is unlikely.
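Before trusting this gradient in a solver, it is worth checking it against finite differences. A self-contained numpy sketch (the problem data and function names are mine; $C$ is built symmetric positive definite so that $F^T=F$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 6, 4, 2                    # n > m > k as in the question
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, m))
E = rng.standard_normal((m, k))
R = rng.standard_normal((m, m))
C = R @ R.T + m * np.eye(m)          # symmetric positive definite

def gamma(w):
    """Objective (with the 1/2 factor) as a function of w, where x = exp(w)."""
    F = np.linalg.inv(C + np.diag(np.exp(w)))
    return 0.5 * np.sum((B @ F @ E - A) ** 2)

def grad(w):
    """Analytic gradient g(w) = -X diag(F B^T M E^T F)."""
    x = np.exp(w)
    F = np.linalg.inv(C + np.diag(x))
    M = B @ F @ E - A
    return -x * np.diag(F @ B.T @ M @ E.T @ F)

# Central-difference check of the analytic gradient.
w = rng.standard_normal(m)
h = 1e-6
g_num = np.array([(gamma(w + h * e) - gamma(w - h * e)) / (2 * h)
                  for e in np.eye(m)])
assert np.allclose(grad(w), g_num, rtol=1e-4, atol=1e-6)
```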

However, you can use a gradient descent algorithm $$\eqalign{ g_k &= g(w_k) \\ w_{k+\o} &= w_k - \alpha_k\,g_k \\ k &= k + \o \\ }$$ (for a suitable step size $\alpha_k>0$) to find a numerical solution for $w$, then calculate $\;x=\exp(w)$
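A minimal numpy sketch of this descent loop, assuming a backtracking (Armijo) line search in place of a hand-tuned $\alpha_k$ (all names and problem data here are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 6, 4, 2
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, m))
E = rng.standard_normal((m, k))
R = rng.standard_normal((m, m))
C = R @ R.T + m * np.eye(m)          # symmetric positive definite

def gamma_and_grad(w):
    """Objective (with the 1/2 factor) and its gradient with respect to w."""
    x = np.exp(w)
    F = np.linalg.inv(C + np.diag(x))
    M = B @ F @ E - A
    return 0.5 * np.sum(M ** 2), -x * np.diag(F @ B.T @ M @ E.T @ F)

w = np.zeros(m)
val0, _ = gamma_and_grad(w)
for _ in range(500):
    val, g = gamma_and_grad(w)
    # Backtracking (Armijo) line search chooses alpha_k automatically.
    alpha = 1.0
    while gamma_and_grad(w - alpha * g)[0] > val - 1e-4 * alpha * (g @ g):
        alpha *= 0.5
        if alpha < 1e-12:
            break
    w = np.clip(w - alpha * g, -30, 30)   # clip to keep exp(w) finite

x = np.exp(w)                             # the (strictly) positive vector d
```

The clip is a pragmatic safeguard: if the minimizer pushes some $x_i$ toward $0$ or $\infty$, the corresponding $w_i$ diverges and `exp` would overflow.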

Update

If gradient descent converges too slowly, you can calculate the Hessian and use Newton's method.

Define the variables (using the fact that $F^T=F$) $$\eqalign{ \def\h{\odot} \def\Y{Y^\delta} Y &= \LR{FB^TME^TF} \\ y &= \diag Y \\ \Y &= \Diag y \;\equiv\; \LR{I_m\odot Y} \,\ne\, Y \\ }$$ Use the identity $$ \diag{B\,\Diag{x}\,R} \;\equiv\; \LR{R^T\h B}x $$ to obtain $$\small\eqalign{ dY &= \c{dF}B^TME^TF^T + FB^T\c{dM}E^TF + FB^TME^T\c{dF} \\ &= -\CLR{F\,dX\,F}B^TME^TF - FB^T\CLR{BF\,dX\,FE}E^TF - FB^TME^T\CLR{F\,dX\,F} \\ dy &= \diag{dY} \\ &= -\LR{FEM^TBF\h F}dx - \LR{FEE^TF\h FB^TBF}dx - \LR{F\h FB^TME^TF}dx \\ }$$ and use this to calculate the Hessian $$\small\eqalign{ g &= -Xy \;\equiv\; -\Y x \\ dg &= -X\,dy \;-\; \Y\,dx \\ &= +X\LR{F\h FEM^TBF + FEE^TF\h FB^TBF +F\h FB^TME^TF}dx -\Y\,dx \\ &= X\LR{F\h FEM^TBF+FEE^TF\h FB^TBF +F\h FB^TME^TF}X\,dw-\Y X\,dw \\ \grad gw &= X\BR{ F\h\LR{Y^T+Y} \:+\: \LR{FEE^TF}\h\LR{FB^TBF} }X - \Y X \;\doteq\; H(w) \\ }$$ The Newton iteration converges rapidly (quadratically) $$\eqalign{ g_k &= g(w_k) \\ H_k &= H(w_k) \\ w_{k+\o} &= w_k - H_k^{-\o} g_k \\ k &= k + \o \\ }$$
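A numpy sketch of the Hessian formula above, checked column-by-column against finite differences of the analytic gradient, followed by a single Newton step (names and problem data are mine; in practice Newton's method needs safeguards when $H$ is not positive definite far from the optimum):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 6, 4, 2
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, m))
E = rng.standard_normal((m, k))
Rm = rng.standard_normal((m, m))
C = Rm @ Rm.T + m * np.eye(m)        # symmetric positive definite, so F^T = F

def grad_and_hess(w):
    """g(w) and H(w) as derived above; numpy's * is the Hadamard product."""
    x = np.exp(w)
    F = np.linalg.inv(C + np.diag(x))
    M = B @ F @ E - A
    Y = F @ B.T @ M @ E.T @ F
    X = np.diag(x)
    g = -x * np.diag(Y)
    H = X @ (F * (Y + Y.T) + (F @ E @ E.T @ F) * (F @ B.T @ B @ F)) @ X \
        - np.diag(np.diag(Y)) @ X
    return g, H

# Finite-difference check: column j of H should equal the derivative of g
# with respect to w_j.
w = rng.standard_normal(m)
g0, H = grad_and_hess(w)
h = 1e-6
H_num = np.column_stack([(grad_and_hess(w + h * e)[0]
                          - grad_and_hess(w - h * e)[0]) / (2 * h)
                         for e in np.eye(m)])
assert np.allclose(H, H_num, rtol=1e-4, atol=1e-6)

# One Newton step; iterate to convergence in practice.
w_next = w - np.linalg.solve(H, g0)
```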