How to represent the Fréchet derivative as a matrix equation without expanding $F(X+H)$?


To preface the question that I actually want answered: I've read the paper by Nicholas J. Higham that computes the square root of a matrix via Newton's method. He uses the function $F(X) = X^2 - A$, where $X$ is the unknown matrix and $A$ is the matrix whose square root is being calculated. He then computes the Fréchet derivative of this function (assuming the matrices belong to $\mathbb{C}^{n \times n}$). He does this simply by computing $F(X+H) = (X+H)^2 - A$, expanding the terms, and equating them to the expansion $$F(X+H) = X^2 - A + F^\prime(X).H + H^2,$$ where $H$ is a matrix of small perturbations, so that the second-order term $H^2$ can be neglected. From the expansion of $F(X+H)$ he obtains $F^\prime(X).H = X.H + H.X$. Now to my question.
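As a sanity check, here is a minimal NumPy sketch (the matrices `A`, `X`, `H` are random test data, purely illustrative) confirming that the Fréchet derivative of $F(X)=X^2-A$ acts as $F'(X).H = XH + HX$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

def F(X):
    return X @ X - A

t = 1e-6                              # size of the perturbation t*H
fd = (F(X + t * H) - F(X)) / t        # finite-difference directional derivative
exact = X @ H + H @ X                 # claimed Frechet derivative

# The difference is the neglected second-order term, t * H @ H -- O(t) small
print(np.max(np.abs(fd - exact)))
```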

I have a matrix equation which mixes matrix products and Hadamard products in some of its terms, say an equation of the form $$F(X) = X + (X.A)\odot X.$$ I am allowed to take a Fréchet derivative of this. If I perform an expansion as Higham did in his paper, some terms in the Fréchet derivative will still carry the Hadamard product, since its properties carry over. However, if I instead compute the Fréchet derivative as a Jacobian, the terms become linearised and this mixed (matrix/Hadamard) operator no longer appears explicitly. Moreover, computing the Jacobian entrywise leads to a 4-dimensional tensor, and I have no idea how that could be represented as any sort of matrix equation. Any help here would be appreciated. (To make things a bit simpler, in my case the matrices contain only real numbers.)
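For concreteness, here is a small NumPy sketch (random test matrices, names illustrative only) of why the entrywise Jacobian $\partial F_{ij}/\partial X_{kl}$ of such an $F$ is naturally a 4-dimensional object:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

def F(X):
    # F(X) = X + (X A) o X, mixing a matrix product with a Hadamard product
    return X + (X @ A) * X

# Entrywise Jacobian dF_ij / dX_kl by finite differences: a 4-D tensor
t = 1e-6
T = np.empty((n, n, n, n))
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n))
        E[k, l] = 1.0
        T[:, :, k, l] = (F(X + t * E) - F(X)) / t

print(T.shape)    # (3, 3, 3, 3)
```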



Best Answer

The differential can always be written in terms of the gradient $$\eqalign{ dF &= F':dX \\ }$$ It can also be calculated via first-order perturbation of the functional form $$\eqalign{ F &= X + X\odot(XA) \\ dF &= dX + dX\odot(XA) + X\odot(dX\;A) \\ }$$ Equating the expressions and changing $\,dX\to H\,$ yields $$\eqalign{ \def\LR#1{\left(#1\right)} \def\vc{\operatorname{vec}} \def\Diag{\operatorname{Diag}} \def\D{\operatorname{Diag}} \def\A{A^T\otimes I} F':H &= H + (XA)\odot H + X\odot(HA) \\ }$$
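A quick numerical check of this formula, sketched in NumPy (random test matrices; the names `X`, `A`, `H` are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

def F(X):
    # F(X) = X + X o (XA), with o the Hadamard (elementwise) product
    return X + X * (X @ A)

t = 1e-6
fd = (F(X + t * H) - F(X)) / t             # finite-difference directional derivative
exact = H + (X @ A) * H + X * (H @ A)      # F':H from the perturbation calculation

# Agrees up to the dropped second-order term t * H o (H A) -- O(t) small
print(np.max(np.abs(fd - exact)))
```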

Update

First, vectorize individual terms $$\eqalign{ h &= \vc(H),\qquad f=\vc(F),\qquad x=\vc(X) \\ B &= (\A),\quad \vc(HA) = Bh,\quad \vc(F':H) = Jh \\ }$$ then the whole equation $$\eqalign{ Jh &= h + (Bx)\odot h + x\odot(Bh) \\ }$$ Replace Hadamard products with diagonal matrices $$\eqalign{ Jh &= I\cdot h + \D(Bx)\cdot h + \D(x)\cdot Bh \\ &= \LR{I + \D(Bx) + \D(x)\cdot B} h \\ }$$ and isolate the Jacobian $$\eqalign{ \def\E{{\cal E}} \def\M{{\mathbb M}} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} J &= I + \D(Bx) + \D(x)\cdot B \\\\ }$$
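Again as a numerical sketch (NumPy, random test matrices). Here `vec` is the column-major vectorization, which is the convention under which $\operatorname{vec}(HA) = (A^T\otimes I)\operatorname{vec}(H)$ holds:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

vec = lambda M: M.ravel(order="F")          # column-major vec
B = np.kron(A.T, np.eye(n))                 # B = A^T (x) I, so vec(HA) = B @ vec(H)
x = vec(X)
h = vec(H)

J = np.eye(n * n) + np.diag(B @ x) + np.diag(x) @ B   # J = I + Diag(Bx) + Diag(x) B

lhs = J @ h
rhs = vec(H + (X @ A) * H + X * (H @ A))    # vec(F':H)
print(np.max(np.abs(lhs - rhs)))            # differences at roundoff level
```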


If you are comfortable with tensors, then you can use the $6^{th}$ order $\M$ tensor from this post and the $4^{th}$ order identity tensor $\E$ from this post to calculate the tensor-valued gradient like so $$\eqalign{ dF &= dX + dX\odot(XA) + X\odot(dX\;A) \\ &= \E:dX + \LR{XA}:\M:dX + X:\M:\LR{\E\cdot A^T:dX} \\ &= \LR{\E + \LR{XA}:\M + X:\M\cdot A^T} : dX \\ \grad{F}{X} &= \E + \LR{XA}:\M + X:\M\cdot A^T \\ }$$ The elements of this tensor are identical to those of the $J$ matrix, but reshaped from an $(n^2\times n^2)$ matrix into an $(n\times n\times n\times n)$ tensor.
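This reshaping claim can be checked numerically (NumPy sketch, random test matrices): building $J$, reshaping it column-major into an $(n\times n\times n\times n)$ array, and double-contracting against $H$ reproduces $F':H$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

vec = lambda M: M.ravel(order="F")
B = np.kron(A.T, np.eye(n))
x = vec(X)
J = np.eye(n * n) + np.diag(B @ x) + np.diag(x) @ B

# Reshape the (n^2 x n^2) Jacobian into the (n x n x n x n) gradient tensor;
# order="F" matches the column-major vec convention used above
T = J.reshape((n, n, n, n), order="F")

# Double contraction T : H reproduces the directional derivative F':H
out = np.einsum("ijkl,kl->ij", T, H)
exact = H + (X @ A) * H + X * (H @ A)
print(np.max(np.abs(out - exact)))          # differences at roundoff level
```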

So the gradient requires ${\cal O}(n^4)$ storage whether it is represented as a matrix or a tensor. When $n$ is large, this will be a problem.
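One common way around the storage cost (a sketch, assuming SciPy is available; the matrices are random test data) is to never form $J$ explicitly: the action $H \mapsto F':H$ costs only two $n\times n$ matrix products and ${\cal O}(n^2)$ storage, so $J$ can be wrapped as a matrix-free `LinearOperator` and handed to an iterative solver such as GMRES:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

def apply_J(h):
    # Apply J to vec(H) using F':H directly, without ever forming J
    H = h.reshape((n, n), order="F")
    out = H + (X @ A) * H + X * (H @ A)
    return out.ravel(order="F")

Jop = LinearOperator((n * n, n * n), matvec=apply_J)

rhs = rng.standard_normal(n * n)
h, info = gmres(Jop, rhs, restart=n * n)    # solve J h = rhs matrix-free
print(info, np.linalg.norm(apply_J(h) - rhs))
```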