Differentiation of a matrix expression with respect to a vector


I am interested in the expression:

$$ \frac{d}{d\bf{p}} \left[\left(D_1 +P_1 \right)^{-1}\left(P_1\bf{x} \right) \right] $$

where $\bf{p}, \bf{x} $ are $S \times 1$ vectors and $D_1,P_1$ are $S \times S$ matrices. In this case, only the entries of $P_1$ directly depend on $\bf{p}$ so I was thinking of writing:

$$ \frac{d}{d\bf{p}} \left[\left(D_1 +P_1 \right)^{-1}\left(P_1\bf{x} \right) \right] = \left(D_1 +P_1 \right)^{-1} \frac{dP_1}{d\bf{p}} \left(D_1 +P_1 \right)^{-1} \left(P_1\bf{x} \right) + \left(D_1 +P_1 \right)^{-1} \frac{dP_1\bf{x}}{d\bf{p}}. $$

I don't have much confidence that this is correct though. I don't even know if it makes total sense, since $P_1 \bf{x}$ is a vector and $\frac{dP_1}{d\bf{p}}$ is a tensor? Any help is appreciated.


There are 2 solutions below.

BEST ANSWER

Generally speaking, when the derivatives involved are tensors of higher order than matrices, you can no longer write the derivative purely in terms of matrix multiplications; you have to use tensor contractions/einsums instead. A good resource is the computer algebra tool http://www.matrixcalculus.org/,

which gives us

$$ \frac{\partial \left( \mathrm{inv}(D+P)\cdot P\cdot x \right)}{\partial P} = x^\top \otimes \mathrm{inv}(D+P)-(\mathrm{inv}(D+P)\cdot P\cdot x)^\top \otimes \mathrm{inv}(D+P) $$
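This Kronecker-product Jacobian can be sanity-checked numerically. The sketch below is a minimal example (the sizes, the random seed, and the well-conditioned choice of $D$ are illustrative assumptions, not from the original): it builds the Jacobian with respect to $\mathrm{vec}(P)$ in column-major order and compares it against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
S = 4
D = 3 * np.eye(S)                 # kept well-conditioned so D + P is safely invertible
P = 0.5 * rng.normal(size=(S, S))
x = rng.normal(size=S)

inv = np.linalg.inv(D + P)

# vec(A X B) = (B^T kron A) vec(X) with column-major vec, so the Jacobian
# of f = inv(D+P) P x with respect to vec(P) is:
J = np.kron(x, inv) - np.kron(inv @ P @ x, inv)   # shape (S, S*S)

def f(Pmat):
    return np.linalg.inv(D + Pmat) @ Pmat @ x

# central finite differences, also in column-major (order='F') vectorisation
eps = 1e-6
J_fd = np.zeros((S, S * S))
for k in range(S * S):
    dvec = np.zeros(S * S)
    dvec[k] = eps
    dP = dvec.reshape(S, S, order='F')
    J_fd[:, k] = (f(P + dP) - f(P - dP)) / (2 * eps)

assert np.allclose(J, J_fd, atol=1e-6)
```

Note that `np.kron(x, inv)` with a 1-D `x` produces exactly the row-of-blocks matrix $x^\top \otimes \mathrm{inv}(D+P)$, matching the site's vectorisation convention.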

So, by the chain rule with $P = P(p)$, we have

$$\begin{aligned} \frac{\partial (D+P)^{-1} P x}{\partial P}\frac{\partial P}{\partial p} &= \Big(x^\top \otimes (D+P)^{-1}-((D+P)^{-1} P x)^\top \otimes (D+P)^{-1}\Big)\frac{\partial P}{\partial p} \\&= (D+P)^{-1}\color{red}{\cdot}\frac{\partial P}{\partial p}\color{green}{\cdot}x \;-\; (D+P)^{-1}\color{red}{\cdot}\frac{\partial P}{\partial p}\color{green}{\cdot}(D+P)^{-1} P x \end{aligned}$$

since matrixcalculus.org uses the convention $AXB = (B^\top\otimes A)\cdot X$. Moreover, note that

$$\begin{aligned} \frac{\partial f(p)}{\partial p} = \bigg(\frac{\partial \big((D+P)^{-1} P x\big)_i}{\partial p_m}\bigg)_{im} = \sum_{jk}\bigg(\frac{\partial \big((D+P)^{-1} P x\big)_i}{\partial P_{jk}}\bigg)_{i,jk} \bigg(\frac{\partial P_{jk}}{\partial p_m}\bigg)_{jk,m} \end{aligned}$$

In the above,

$$ A\color{red}{\cdot}\frac{\partial P}{\partial p}\color{green}{\cdot}v = \Bigg( \sum_{jk} A_{ij}\frac{\partial P_{jk}}{\partial p_m}v_k\Bigg) _{im} $$
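This contraction is exactly an einsum. As a concrete check (the sizes and the random arrays `A`, `G`, `v` are illustrative stand-ins for $A$, $\partial P/\partial p$, and $v$), the einsum form agrees with the explicit index sum:

```python
import numpy as np

rng = np.random.default_rng(2)
S, M = 3, 5                       # S states, M parameters; sizes are arbitrary
A = rng.normal(size=(S, S))
G = rng.normal(size=(S, S, M))    # G[j,k,m] stands for dP_jk / dp_m
v = rng.normal(size=S)

# einsum form of (A . dP/dp . v)_{im} = sum_{jk} A_ij G_jkm v_k
out = np.einsum('ij,jkm,k->im', A, G, v)

# explicit-loop reference implementing the same index sum
ref = np.zeros((S, M))
for i in range(S):
    for m in range(M):
        ref[i, m] = sum(A[i, j] * G[j, k, m] * v[k]
                        for j in range(S) for k in range(S))

assert np.allclose(out, ref)
```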

SECOND ANSWER

$ \def\p{\partial} \def\A{{\cal A}}\def\B{{\cal B}} \def\G{{\cal G}}\def\H{{\cal H}} \def\X{{\cal X}}\def\Y{{\cal Y}}\def\Z{{\cal Z}} \def\LR#1{\Big(#1\Big)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $The single and double contraction products of two arbitrary tensors are defined as

$$\eqalign{ \Y &= \A\cdot\B \quad&\iff\quad \Y_{ijkqrs} = \sum_{\c{\ell}=1}^L \A_{ijk\c{\ell}}\B_{\c{\ell}qrs} \\ \Z &= \A:\B \quad&\iff\quad \Z_{ijrs} = \sum_{\c{k}=1}^K\sum_{\c{\ell}=1}^L \A_{ij\c{k\ell}}\B_{\c{k\ell}rs} \\ }$$

while their dyadic product is defined as

$$\eqalign{ \X = \A\star\B \quad\iff\quad \X_{ijk\ell pqrs} = \A_{ijk\ell}\B_{pqrs} \quad\quad \\ }$$

These products can also be used to rearrange simpler matrix-vector equations, e.g.

$$A\cdot B\cdot x = \LR{A\star x}:B$$

For typing convenience, omit all subscripts and define the matrix and tensor variables

$$\eqalign{ C &= \LR{D+P}^{-1} \quad&\implies\quad dC = -C\,dP\,C \\ \H &= {C\star x - C\star CPx} \\ }$$

Write your function in terms of these new variables, then calculate its differential.

$$\eqalign{ f &= CPx \\ df &= C\,dP\,x + dC\,Px \\ &= C\,dP\,x - C\,dP\,CPx \\ &= \LR{C\star x - C\star CPx}:dP \\ &= \H:dP \\ }$$

You didn't tell us anything about the function $P=P(p),\;$ so I'll assume that you don't need any help calculating its gradient

$$\G = \grad{P}{p} \quad\implies\quad dP = \G\cdot dp$$

Knowledge of $\G$ allows us to finish calculating the gradient of $f$

$$\eqalign{ df &= \H:\G\cdot dp \\ \grad fp &= \H:\G \\\\ }$$
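The ${\cal H}:{\cal G}$ recipe can be checked numerically. The sketch below assumes a linear parametrisation $P(p) = P_0 + \sum_m p_m B_m$ (so ${\cal G}$ is a constant tensor); $P_0$, the scale factors, and all variable names are illustrative choices, since the original post never specifies $P(p)$.

```python
import numpy as np

rng = np.random.default_rng(1)
S, M = 4, 3
D = 2 * np.eye(S)
x = rng.normal(size=S)

# assumed linear parametrisation: P(p) = P0 + sum_m p_m B_m,
# so G[j,k,m] = dP_jk / dp_m is a constant third-order tensor
P0 = 0.1 * rng.normal(size=(S, S))
G = 0.1 * rng.normal(size=(S, S, M))

def P_of(p):
    return P0 + np.einsum('jkm,m->jk', G, p)

def f(p):
    P = P_of(p)
    return np.linalg.inv(D + P) @ P @ x

p = rng.normal(size=M)
P = P_of(p)
C = np.linalg.inv(D + P)

# H = C (star) x - C (star) (C P x), a third-order tensor H[i,j,k]
H = np.einsum('ij,k->ijk', C, x) - np.einsum('ij,k->ijk', C, C @ P @ x)

# grad f = H : G, contracting the (j,k) index pair
grad = np.einsum('ijk,jkm->im', H, G)

# central finite differences in p
eps = 1e-6
fd = np.column_stack([(f(p + eps * e) - f(p - eps * e)) / (2 * eps)
                      for e in np.eye(M)])
assert np.allclose(grad, fd, atol=1e-6)
```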


That is the best that can be done in light of the fact that you refuse to describe the relationship between $p$ and $P.\;$ However, I suspect that the relationship is something like

$$\eqalign{ P &= {\rm Diag}(p) \\ P &= ap^T + pb^T \\ P &= App^T+ \big(p^Tp\big)B \\ p &= {\rm vec}(P) \\ }$$

in which case the gradient $\LR{\grad fp}$ will be a simple matrix which can be calculated without the need for higher-order tensors or dyadic products or any of the other nonsense which is required by the blind application of the chain rule to matrix calculus problems.
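For instance, in the ${\rm Diag}(p)$ case the tensor machinery does collapse: $df = C\,dP\,(x - CPx)$ with $dP = \mathrm{Diag}(dp)$, and since $\mathrm{Diag}(dp)\,v = \mathrm{diag}(v)\,dp$, the gradient is the plain matrix $C\,\mathrm{diag}(x - CPx)$. A quick numerical sketch (the sizes, the choice $D = I$, and the range of $p$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
S = 4
D = np.eye(S)
x = rng.normal(size=S)
p = rng.uniform(0.5, 1.5, size=S)   # keeps D + Diag(p) safely invertible

def f(p):
    P = np.diag(p)
    return np.linalg.inv(D + P) @ P @ x

P = np.diag(p)
C = np.linalg.inv(D + P)

# closed-form gradient for P = Diag(p): df = C diag(x - C P x) dp
J = C @ np.diag(x - C @ P @ x)

# central finite differences
eps = 1e-6
J_fd = np.column_stack([(f(p + eps * e) - f(p - eps * e)) / (2 * eps)
                        for e in np.eye(S)])
assert np.allclose(J, J_fd, atol=1e-6)
```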