Derivative of a matrix with respect to a matrix


If $\mathbf x$ and $\mathbf y$ are matrices such that the function $\mathbf z=\mathbf x^n \mathbf y^m $ is well defined, how can I find the derivatives $$ \frac{\partial \mathbf z}{\partial \mathbf x} \qquad \frac{\partial \mathbf z}{\partial \mathbf y} $$

From an example in Quantum Mechanics I know that if $\mathbf z =\mathbf x^2 \mathbf y$ then $$ \frac{\partial \mathbf z}{\partial \mathbf x} =\mathbf{xy+ yx} $$

But where can I find a proof of this result? More generally: where can I find references about this kind of calculus with matrices?


Added after the answers.

My reference is the historical paper of Born and Jordan on the foundations of matrix mechanics (it can be found at https://archive.org/details/SourcesOfQuantumMechanics). There, in paragraph 2 (p. 282 of the reference), the ''symbolic differentiation'' of a matrix function is defined and, as an example, the following is given: $$ \mathbf{ y=x_1^n x_2^m \qquad \frac{\partial y}{\partial x_1}=x_1^{n-1}x_2^m+x_1^{n-2}x_2^mx_1+\cdots +x_2^mx_1^{n-1} } $$

so it seems totally different from the answers. Is it wrong, or does it depend only on a different definition of the derivative?
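One way to probe this numerically: the Born–Jordan symbolic derivative is (as far as I understand it) the gradient of the scalar $\operatorname{Tr}(x_1^n x_2^m)$ with respect to $x_1$, transposed. A quick finite-difference check in numpy (dimensions, seed, and the helper `mpow` are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 3, 2, 4
x1 = rng.standard_normal((d, d))
x2 = rng.standard_normal((d, d))

def mpow(a, k):
    return np.linalg.matrix_power(a, k)

# Born-Jordan symbolic derivative of y = x1^n x2^m with respect to x1:
# sum_{k=0}^{n-1} x1^{n-1-k} x2^m x1^k
bj = sum(mpow(x1, n - 1 - k) @ mpow(x2, m) @ mpow(x1, k) for k in range(n))

# Finite-difference gradient of the scalar Tr(x1^n x2^m) w.r.t. the
# entries of x1, computed by central differences
eps = 1e-6
f = lambda a: np.trace(mpow(a, n) @ mpow(x2, m))
g = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        e = np.zeros((d, d)); e[i, j] = eps
        g[i, j] = (f(x1 + e) - f(x1 - e)) / (2 * eps)

# The Born-Jordan derivative agrees with the trace gradient, transposed
print(np.allclose(bj, g.T, atol=1e-4))
```

If this identification is right, the two notions are not in conflict: Born and Jordan differentiate a scalar (the trace), while the answers below differentiate the matrix-valued function itself.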

My problem is to justify the result of Born and Jordan, which is essential to prove the canonical commutation relation of Quantum Mechanics.

If Born's definition is so different from the ''standard'' mathematical definition, how can we motivate such a difference?


There are 3 answers below.

Answer (score 8):

The best suggestion is to work with the definition of the directional derivative. If $f(\mathbf x,\mathbf y) = \mathbf x^2\mathbf y$ and we change $\mathbf x$ in the direction $\mathbf v$, we have \begin{align*} D_{(\mathbf v,\mathbf 0)}f(\mathbf x,\mathbf y) &= \lim_{t\to 0} \frac{f(\mathbf x + t\mathbf v,\mathbf y) - f(\mathbf x,\mathbf y)}t = \lim_{t\to 0}\frac{(\mathbf x+t\mathbf v)^2\mathbf y - \mathbf x^2\mathbf y}t \\ &= \lim_{t\to 0}\frac{(\mathbf x^2 + t\mathbf x\mathbf v + t\mathbf v\mathbf x+ t^2\mathbf v^2)\mathbf y - \mathbf x^2\mathbf y}t = (\mathbf x\mathbf v + \mathbf v\mathbf x)\mathbf y, \end{align*} so your formula is incorrect (for starters, there's no way $\mathbf y$ can end up on the other side of $\mathbf x$). More importantly, there's no way to write the correct formula as a single matrix multiplying the direction $\mathbf v$, because $\mathbf v$ gets intertwined.

Now if you think of $\mathbf x = [x_{ij}]$ and you want to compute $\partial f/\partial x_{ij}$, then you apply what I wrote with $\mathbf v = E_{ij}$, the matrix with a $1$ in the $ij$-position and $0$'s elsewhere.
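Both claims in this answer can be verified with finite differences; here is a minimal numpy sketch (dimensions and seed are illustrative) checking that the directional derivative of $f(\mathbf x,\mathbf y)=\mathbf x^2\mathbf y$ in the direction $\mathbf v$ is $(\mathbf x\mathbf v+\mathbf v\mathbf x)\mathbf y$, and that taking $\mathbf v=E_{ij}$ picks out the partial with respect to the single entry $x_{ij}$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
x = rng.standard_normal((d, d))
y = rng.standard_normal((d, d))
v = rng.standard_normal((d, d))

f = lambda a, b: a @ a @ b

# Central finite difference for the directional derivative D_{(v,0)} f
t = 1e-6
num = (f(x + t * v, y) - f(x - t * v, y)) / (2 * t)

# Closed form from the limit computation: (xv + vx) y
exact = (x @ v + v @ x) @ y
print(np.allclose(num, exact, atol=1e-4))

# Partial w.r.t. a single entry x_{ij}: use the basis matrix E_{ij}
i, j = 1, 2
E = np.zeros((d, d)); E[i, j] = 1.0
partial = (x @ E + E @ x) @ y
num_partial = (f(x + t * E, y) - f(x - t * E, y)) / (2 * t)
print(np.allclose(num_partial, partial, atol=1e-4))
```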

Answer (score 3):

Another approach is to guess & verify symbolically:

In the following, $x=(x_1,x_2)$, where $x_1,x_2$ are matrices, and similarly, $h=(h_1,h_2)$.

Using the example in the question, let $f(x)=x_1^nx_2^m$. Then $$f(x+h)-f(x)=(x_1+h_1)^n(x_2+h_2)^m-x_1^nx_2^m,$$ and a little work shows that $$f(x+h)-f(x)=\Big(\sum_{k=0}^{n-1} x_1^k h_1 x_1^{n-k-1}\Big) x_2^m + x_1^n \Big(\sum_{k=0}^{m-1} x_2^k h_2 x_2^{m-k-1}\Big) + O(\|h\|^2),$$ from which we see that $$Df(x)h = \Big(\sum_{k=0}^{n-1} x_1^k h_1 x_1^{n-k-1}\Big) x_2^m + x_1^n \Big(\sum_{k=0}^{m-1} x_2^k h_2 x_2^{m-k-1}\Big).$$

So, with $f(x)=x_1^2x_2$, we have $Df(x)h = (h_1x_1+x_1h_1)x_2+x_1^2 h_2$.
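The guessed formula can itself be verified symbolically-by-sampling: compare $Df(x)h$ against a central finite difference of $f$ along $h=(h_1,h_2)$. A minimal numpy check for the general $f(x)=x_1^n x_2^m$ (dimensions and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, d = 3, 2, 4
mp = np.linalg.matrix_power
x1, x2 = rng.standard_normal((2, d, d))
h1, h2 = rng.standard_normal((2, d, d))

f = lambda a, b: mp(a, n) @ mp(b, m)

# Central finite difference of f along h = (h1, h2)
t = 1e-6
num = (f(x1 + t * h1, x2 + t * h2) - f(x1 - t * h1, x2 - t * h2)) / (2 * t)

# Formula from the answer:
# Df(x)h = (sum_k x1^k h1 x1^{n-k-1}) x2^m + x1^n (sum_k x2^k h2 x2^{m-k-1})
exact = (sum(mp(x1, k) @ h1 @ mp(x1, n - k - 1) for k in range(n)) @ mp(x2, m)
         + mp(x1, n) @ sum(mp(x2, k) @ h2 @ mp(x2, m - k - 1) for k in range(m)))
print(np.allclose(num, exact, atol=1e-3))
```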

Answer (score 0):

$ \def\o{{\tt1}}\def\d{\delta}\def\p{\partial} \def\G{{\cal E}} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)}\def\BR#1{\L\{#1\R\}} \def\vec#1{\operatorname{vec}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\E{\color{blue}{\G}} $Since $\BR{x,y,z}$ are $2^{nd}$order tensors (i.e. matrices), the gradients $\BR{\grad{z}{x},\grad{z}{y}}$ are $4^{th}$order tensors.

The best way to handle this is to use index notation.

Another approach is to introduce a special isotropic $4^{th}$order tensor $\E$, whose components can be written using Kronecker deltas as
$$\E_{ijk\ell} = \d_{ik}\d_{j\ell} = \begin{cases} \o\quad{\rm if}\;\;i=k{\;\;\rm and\;\;}j=\ell \\ 0\quad{\rm otherwise} \\ \end{cases}$$ This tensor can also be defined as the gradient of a matrix with respect to itself, i.e. $$\E = \grad{X}{X}$$ The tensor has some interesting algebraic properties $$\eqalign{ &A\cdot B\cdot C^T = \LR{A\cdot\E\cdot C}:B \\ &A = \E:A = A:\E \\ }$$ where the single and double contraction products are defined as $$\eqalign{ \LR{A\cdot\E\cdot C}_{ijk\ell} &= \sum_{p=1}^P\sum_{q=1}^Q A_{i\c{p}}\,\E_{\c{p}jk\c{q}}\,C_{\c{q}\ell} \\ \LR{\E:A}_{ij} &= \sum_{p=1}^P\sum_{q=1}^Q \E_{ij\c{pq}}\,A_{\c{pq}} \\ }$$ Now calculate the differential and gradient of $z$ with respect to $x$. $$\eqalign{ z &= x^n\cdot y^m \\ dz &= \c{dx^n}\cdot y^m \\ &= \c{\sum_{j=1}^n x^{n-j}\cdot dx\cdot x^{j-1}}\cdot y^m \\ &= {\sum_{j=1}^n x^{n-j}\cdot\E\cdot\LR{x^{j-1}y^m}^T}:dx \\ \grad{z}{x} &= {\sum_{j=1}^n x^{n-j}\cdot\E\cdot\LR{x^{j-1}y^m}^T} \\ }$$ A similar calculation yields the gradient with respect to $y$. $$\eqalign{ \grad{z}{y} &= {\sum_{k=1}^m x^n\cdot y^{m-k}\cdot\E\cdot\LR{y^{k-1}}^T} \\ }$$ Using the standard basis matrices $E_{ij}$ (mentioned in Ted Shifrin's answer), you can extract an expression for the component-wise matrix-valued gradients $$\eqalign{ \grad{z}{y_{ij}} &= \LR{\grad{z}{y}}:E_{ij} \\ &= \sum_{k=1}^m x^n\cdot y^{m-k}\cdot E_{ij}\cdot {y^{k-1}} \\ }$$
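The index gymnastics above translate directly into `einsum` calls, which makes the result easy to sanity-check numerically. A sketch in numpy (dimensions and seed are illustrative): build the isotropic tensor ${\cal E}$, assemble the fourth-order gradient $\partial z/\partial x$, and verify that its double contraction with a perturbation $dx$ reproduces $dz=\sum_{j=1}^n x^{n-j}\,dx\,x^{j-1}y^m$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, d = 3, 2, 3
x, y, dx = rng.standard_normal((3, d, d))

mp = np.linalg.matrix_power
I = np.eye(d)
# Isotropic 4th-order tensor E_{ijkl} = delta_{ik} delta_{jl}
E = np.einsum('ik,jl->ijkl', I, I)

ym = mp(y, m)
# grad z/x = sum_j x^{n-j} . E . (x^{j-1} y^m)^T, with single contractions
# on the first and last indices of E
G = sum(np.einsum('ip,pjkq,ql->ijkl', mp(x, n - j), E, (mp(x, j - 1) @ ym).T)
        for j in range(1, n + 1))

# Double contraction G : dx should reproduce dz = sum_j x^{n-j} dx x^{j-1} y^m
lhs = np.einsum('ijpq,pq->ij', G, dx)
rhs = sum(mp(x, n - j) @ dx @ mp(x, j - 1) @ ym for j in range(1, n + 1))
print(np.allclose(lhs, rhs))
```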