Derivative tensor of $\frac{Axx^\top A}{x^\top AA x}$ with $A$ symmetric positive semi-definite


Let $A$ be an $n\times n$ symmetric, positive semi-definite matrix. I also have the following elements:

  • Function $f(x) = -\frac{1}{2}x^\top A x$.
  • Gradient $\nabla f(x) = -Ax$ and Jacobian $J_f(x) = -x^\top A$.
  • Hessian $H_f(x) = -A$.
  • Define the function $N:\mathbb{R}^n\to\mathbb{R}^{n\times n}$ by $$ N(x) = \frac{\nabla f(x) \nabla f(x)^\top}{\nabla f(x)^\top \nabla f(x)} = \frac{Axx^\top A}{x^\top AA x} $$

How can I write the derivative of $N$ with respect to $x$ without using index notation?

The derivative will be a third-order tensor; ultimately I would like to write down the result in a form that I can code in Python.
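For reference, evaluating $N$ itself is straightforward; a minimal NumPy sketch (the helper name `N` is mine):

```python
import numpy as np

def N(A, x):
    """N(x) = (A x x^T A) / (x^T A^2 x) for symmetric A."""
    w = A @ x                        # w = Ax = -grad f(x)
    return np.outer(w, w) / (w @ w)  # x^T A^2 x = w^T w since A = A^T
```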

There are 3 answers below.

BEST ANSWER

$ \newcommand\diff[1]{\left\{#1\right\}} \newcommand\Diff[1]{\Bigl\{#1\Bigr\}} \newcommand\trans[1]{#1^{\mathrm T}} \newcommand\R{\mathbb R} $

I hate the usual $D$ notation for total derivatives; it always creates a mess that's hard to read. Instead I'm going to use the following notation: $$ \diff{f}_x = \diff{f(x)}_x = Df(x), $$$$ \diff{f}^h_x = \diff{f(x)}^h_x = Df(x)(h). $$ We will use the following facts:

  1. The derivative of an expression is the sum of the derivatives of its parts: $$ \diff{F(x,x)}^h_x = \diff{F(\dot x, x)}^h_{\dot x} + \diff{F(x, \dot x)}^h_{\dot x}, $$ where the notation indicates that the undotted $x$ is held constant; or rather, that only the dotted $x$ is varied. (This is actually a consequence of the chain rule.)
  2. The chain rule says the derivative of a composition is the composition of the derivatives: $$ \diff{F\circ G}_x = \diff{F}_{G(x)}\circ\diff{G}_x. $$ In particular, if $f : \R \to \R$ and $g : \R^n \to \R$ then $$ \diff{f\circ g}^h_x = f'\bigl(g(x)\bigr)\diff{g}^h_x. $$
  3. The derivative of a linear function $L$ is itself: $$ \diff{L(x)}^h_x = L(h). $$
  4. (1) and (3) together give us the product rule: if $\bullet$ is any bilinear operation, then $$ \Diff{F(x)\bullet G(x)}^h_{x} = \Diff{F(\dot x)\bullet G(x)}^h_{\dot x} + \Diff{F(x)\bullet G(\dot x)}^h_{\dot x} = \diff{F}^h_x\bullet G(x) + F(x)\bullet\diff{G}^h_x. $$ The last equality follows since e.g. the map $Y \mapsto Y\bullet G(x)$ is linear in $Y$. This extends just as you would expect to multilinear operations.
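As an aside, the bracket $\diff{f}^h_x = Df(x)(h)$ is exactly a forward-mode directional derivative (a Jacobian-vector product), so every step below can be checked numerically. A minimal JAX sketch, assuming JAX is available:

```python
import jax
import jax.numpy as jnp

def N(x, A):
    """N(x) = (A x x^T A) / (x^T A^2 x)."""
    w = A @ x
    return jnp.outer(w, w) / (w @ w)

key = jax.random.PRNGKey(0)
M = jax.random.normal(key, (4, 4))
A = M @ M.T                                   # symmetric PSD
x = jnp.array([1.0, 2.0, 3.0, 4.0])
h = jnp.array([0.5, -1.0, 0.0, 2.0])

# jax.jvp returns (N(x), DN(x)(h)); the tangent output is {N}^h_x
Nx, dNh = jax.jvp(lambda v: N(v, A), (x,), (h,))
```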

These facts let us easily evaluate the desired derivative: $$\begin{aligned} \diff{\frac{Ax\trans xA}{\trans xA^2x}}^h_x &= \frac{A\diff{x\trans x}^h_xA}{\trans xA^2x} + Ax\trans xA\diff{\frac1{\trans xA^2x}}^h_x \\ &= \frac{A(h\trans x + x\trans h)A}{\trans xA^2x} - \frac{Ax\trans xA}{(\trans xA^2x)^2}\diff{\trans xA^2x}^h_x \\ &= \frac{A(h\trans x + x\trans h)A}{\trans xA^2x} - \frac{Ax\trans xA}{(\trans xA^2x)^2}(\trans hA^2x + \trans xA^2h) \\ &= \frac1{\trans xA^2x}A(h\trans x + x\trans h)A - 2\frac{\trans hA^2x}{(\trans xA^2x)^2}Ax\trans xA. \end{aligned}$$ The last line follows because $\trans hA^2x = \trans xA^2 h$, as $A^2$ is symmetric.
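Translating the last line into NumPy is direct; a sketch (the names `N` and `dN` are mine, and the finite-difference comparison is only a sanity check):

```python
import numpy as np

def N(A, x):
    w = A @ x
    return np.outer(w, w) / (w @ w)

def dN(A, x, h):
    """DN(x)(h), the last line of the derivation above."""
    w = A @ x
    beta = w @ w                               # x^T A^2 x
    sym = A @ (np.outer(h, x) + np.outer(x, h)) @ A
    return sym / beta - 2 * ((A @ h) @ w) / beta**2 * np.outer(w, w)

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T                                    # symmetric PSD
x, h, t = rng.standard_normal(5), rng.standard_normal(5), 1e-7
fd = (N(A, x + t * h) - N(A, x)) / t           # forward difference
print(np.max(np.abs(dN(A, x, h) - fd)))        # small: O(t) discretization error
```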

ANSWER

Noting that $A$ is symmetric, one has $$ x^\top A^2y=y^\top A^2x\in\mathbb{R},\quad \forall x,y\in\mathbb{R}^n. $$ Also $Axy^\top A=(Ax)(Ay)^\top$. So, for $h\in\mathbb{R}^n$, \begin{eqnarray} DN(x)(h)&=&\lim_{t\to0}\frac{N(x+th)-N(x)}{t}\\ &=&\lim_{t\to0}\frac1t\bigg[\frac{A(x+th)(x+th)^\top A}{(x+th)^\top A^2 (x+th)}-\frac{Axx^\top A}{x^\top A^2 x}\bigg]\\ &=&\lim_{t\to0}\frac1t\,\frac{A(x+th)(x+th)^\top A\,x^\top A^2x-Axx^\top A\,(x+th)^\top A^2 (x+th)}{(x+th)^\top A^2 (x+th)\,x^\top A^2 x}\\ &=&\lim_{t\to0}\frac1t\,\frac{A(xx^\top+txh^\top+thx^\top+t^2hh^\top)A\,x^\top A^2x-Axx^\top A\,(x^\top A^2x+2tx^\top A^2h+t^2h^\top A^2h)}{(x+th)^\top A^2 (x+th)\,x^\top A^2 x}\\ &=&\lim_{t\to0}\frac1t\,\frac{A(txh^\top+thx^\top+t^2hh^\top)A\,x^\top A^2x-Axx^\top A\,(2tx^\top A^2h+t^2h^\top A^2h)}{(x+th)^\top A^2 (x+th)\,x^\top A^2 x}\\ &=&\lim_{t\to0}\frac{A(xh^\top+hx^\top+thh^\top)A\,x^\top A^2x-Axx^\top A\,(2x^\top A^2h+th^\top A^2h)}{(x+th)^\top A^2 (x+th)\,x^\top A^2 x}\\ &=&\frac{A(xh^\top+hx^\top)A\,x^\top A^2x-2Axx^\top A\,(x^\top A^2h)}{(x^\top A^2 x)^2}\\ &=&\frac{x^\top A^2x\,(Axh^\top A+Ahx^\top A)-2Axx^\top A\,(x^\top A^2h)}{(x^\top A^2 x)^2}\\ &=&\frac{x^\top A^2x\,(Ax(Ah)^\top+Ah(Ax)^\top)-2Axx^\top A\,(x^\top A^2h)}{(x^\top A^2 x)^2}. \end{eqnarray}
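The last line maps directly to NumPy via $w = Ax$, since $x^\top A^2x = w^\top w$ and $x^\top A^2h = w^\top(Ah)$; a sketch (the name `dN_limit_form` is mine):

```python
import numpy as np

def dN_limit_form(A, x, h):
    """DN(x)(h) in the (Ax)(Ah)^T form derived above."""
    w, Ah = A @ x, A @ h
    beta = w @ w                               # x^T A^2 x
    return (beta * (np.outer(w, Ah) + np.outer(Ah, w))
            - 2 * (w @ Ah) * np.outer(w, w)) / beta**2
```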

ANSWER

$ \def\b{\beta}\def\d{\delta} \def\o{{\tt1}}\def\p{\partial} \def\E{{\cal E}}\def\F{{\cal F}}\def\G{{\cal G}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $The independent fourth-order isotropic tensors are simply permutations of the dyadic product of Kronecker deltas $$\eqalign{ \d_{ij}\d_{kl} &= \G_{ijkl} = \F_{iljk} = \E_{ikjl} \\ }$$ but they are very handy for rearranging matrix-vector products $$\eqalign{ yx^TM &= (y\G M^T)\,x \\ Mxy^T &= (M\E y)\,x \\ }$$

For ease of typing, define the variables $$\eqalign{ w &= Ax &\qiq dw=A\,dx \\ B &= ww^T &\qiq dB = {w\,dw^T+dw\,w^T} \\ \b &= w\cdot w &\qiq d\b = (2w)\cdot dw \\ }$$

Write your matrix-valued function in terms of these, then calculate its differential and gradient. $$\eqalign{ N &= \b^{-1}B \\ dN &= \b^{-1}dB - B\b^{-2}(d\b) \\ &= \b^{-1}{w\,dw^T} + \b^{-1}{dw\,w^T} - \b^{-1}N\LR{2w\cdot dw} \\ \b\:dN &= {w\,dx^T}A + A\,{dx\,w^T} - \LR{2N\star Aw}\cdot dx \\ &= \LR{w\G A + A\E w - 2N\star Aw}\cdot dx \\ \grad{N}{x} &= \frac{w\star A + A\E w - 2N\star(Aw)}{\b} \\ }$$

Note that juxtaposition implies a dot product (i.e. $w\G A=w\cdot\G\cdot A$) and $\star$ denotes the dyadic product. For example, the $\G$ tensor can be written in terms of the identity matrix $I$ as $$\G = I\star I$$ or the $B$ matrix in terms of the $w$ vector as $$B = w\star w$$

I know you said that you didn't want to use index notation, but I think you are just being stubborn, because it clarifies things tremendously (and is easy to code in Python): $$\eqalign{ f &= Aw = A^2x \\ \grad{N_{ij}}{x_k} &= \frac{A_{jk}w_i+A_{ik}w_j-2N_{ij}f_k}{\b} \\ }$$
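Taken at face value, the index formula is a few `np.einsum` calls; a sketch (the name `grad_N` is mine):

```python
import numpy as np

def grad_N(A, x):
    """Third-order tensor dN[i,j,k] = (A[j,k]w[i] + A[i,k]w[j] - 2N[i,j]f[k]) / beta."""
    w = A @ x                                  # w = Ax
    f = A @ w                                  # f = A^2 x
    beta = w @ w                               # beta = w . w = x^T A^2 x
    N = np.outer(w, w) / beta
    return (np.einsum('i,jk->ijk', w, A)
            + np.einsum('j,ik->ijk', w, A)
            - 2 * np.einsum('ij,k->ijk', N, f)) / beta

# contracting the last index with a direction h recovers DN(x)(h):
# np.einsum('ijk,k->ij', grad_N(A, x), h)
```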