Proof of the derivative of $b^T X^T X c$ in the Matrix Cookbook, equation (77)

I am studying matrix derivatives from the Matrix Cookbook, and I am having trouble taking the derivative of $b^T X^T X c$ with respect to $X$.

Also, if anyone can point me to good resources to learn from, that would be great. The Matrix Cookbook is a good reference, but without proofs I don't feel I am learning much from it.

The identity I am trying to prove is as follows:

$$ \frac{\partial \ \boldsymbol{b^T X^T X c}}{\partial \boldsymbol{X}} =\boldsymbol{X}(\boldsymbol{bc^T +cb^T}) $$
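Before proving the identity, you can sanity-check it numerically. The sketch below is plain Python with no NumPy, so it has no dependencies. The helper names (`transpose`, `matmul`, `cookbook_gradient`, `finite_difference_gradient`) are illustrative and do not come from the Cookbook. It compares the closed form $X(bc^T + cb^T)$ with central finite differences for a random $4\times 3$ matrix $X$, storing $b$ and $c$ as column matrices. It assumes the Cookbook's layout convention, where $\partial f/\partial X$ has the same shape as $X$ and its $(i,j)$ entry is $\partial f/\partial X_{ij}$.

```python
import random

def transpose(A):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of list-of-lists matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def f(X, b, c):
    """Scalar b^T X^T X c; b and c are stored as n-by-1 column matrices."""
    return matmul(matmul(transpose(b), transpose(X)), matmul(X, c))[0][0]

def cookbook_gradient(X, b, c):
    """Closed form X (b c^T + c b^T) from equation (77)."""
    bcT = matmul(b, transpose(c))
    n = len(bcT)
    S = [[bcT[i][j] + bcT[j][i] for j in range(n)] for i in range(n)]  # b c^T + c b^T
    return matmul(X, S)

def finite_difference_gradient(X, b, c, h=1e-6):
    """Central difference (f(X + h E_ij) - f(X - h E_ij)) / (2h) for every entry (i, j)."""
    m, n = len(X), len(X[0])
    grad = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            grad[i][j] = (f(Xp, b, c) - f(Xm, b, c)) / (2 * h)
    return grad

random.seed(0)
m, n = 4, 3  # X is m-by-n, so b and c have length n
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
b = [[random.uniform(-1, 1)] for _ in range(n)]
c = [[random.uniform(-1, 1)] for _ in range(n)]

exact = cookbook_gradient(X, b, c)
approx = finite_difference_gradient(X, b, c)
max_err = max(abs(e - a) for row_e, row_a in zip(exact, approx) for e, a in zip(row_e, row_a))
print(f"max |closed form - finite difference| = {max_err:.2e}")
```

$f$ is quadratic in $X$, and central differences are exact for quadratics. The printed error should therefore sit at rounding level, not at the usual $O(h^2)$.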

I tried to use the chain rule (I am unsure whether, with matrices, there is a rule for which comes first, the inner or the outer derivative):

$$ \frac{\partial \ \boldsymbol{b^T X^T X c}}{\partial \boldsymbol{X}} = \frac{\partial \boldsymbol{b^T X^T X c}}{\partial \boldsymbol{X^T X}} \frac{\partial \boldsymbol{X^T X}}{\partial \boldsymbol{X}} $$

Then we know,

$$ \frac{\partial \boldsymbol{b^T X^T X c}}{\partial \boldsymbol{X^T X}} = \boldsymbol{b c^T} $$
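This step treats $Y = X^TX$ as an ordinary, unconstrained matrix. Written out entrywise, the claim takes one line:

$$ \boldsymbol{b^T Y c} = \sum_{k,l} b_k Y_{kl} c_l \qquad\implies\qquad \frac{\partial\, \boldsymbol{b^T Y c}}{\partial Y_{kl}} = b_k c_l = \big(\boldsymbol{bc^T}\big)_{kl} $$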

So then the total expression becomes

$$ \frac{\partial \ \boldsymbol{b^T X^T X c}}{\partial \boldsymbol{X}} = \boldsymbol{b c^T} \frac{\partial \boldsymbol{X^T X}}{\partial \boldsymbol{X}} $$

But I am unsure how to calculate that partial derivative, and it doesn't look right anyhow. If anyone can provide a proof or some references, that would be nice. Thanks!
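The route above can be completed by applying the chain rule entrywise. Take $U = X^TX$ and $G = \partial f/\partial U = bc^T$. Then $\partial U/\partial X$ is a four-index array, and the chain rule sums $G_{kl}\,\partial U_{kl}/\partial X_{ij}$ over both $k$ and $l$. That is a double contraction, not an ordinary matrix product, so the question of which factor comes first does not arise. A sketch using the Kronecker delta $\delta$:

$$\eqalign{
U_{kl} &= \sum_{p} X_{pk}X_{pl}
\qquad\implies\qquad
\frac{\partial U_{kl}}{\partial X_{ij}} = \delta_{kj}X_{il} + \delta_{lj}X_{ik} \\
\frac{\partial f}{\partial X_{ij}} &= \sum_{k,l} G_{kl}\,\frac{\partial U_{kl}}{\partial X_{ij}}
= \sum_{l} G_{jl}X_{il} + \sum_{k} G_{kj}X_{ik}
= \big(XG^T\big)_{ij} + \big(XG\big)_{ij} \\
}$$

Therefore $\partial f/\partial X = X(G+G^T) = X(bc^T+cb^T)$, which is the identity from equation (77).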

Update: as a reference to go with the accepted answer, the chain rule for the matrix derivative of a scalar function $f(U)$ is:

$$ \frac{\partial f(U)}{\partial X_{ij}} = \operatorname{Tr}\left[\left(\frac{\partial f(U)}{\partial U}\right)^T \frac{\partial U}{\partial X_{ij}}\right] $$

This is equation (213) in the Matrix Cookbook.
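You can also evaluate the entrywise chain rule $\partial f/\partial X_{ij} = \operatorname{Tr}\left[(\partial f/\partial U)^T\,\partial U/\partial X_{ij}\right]$ directly. The pure-Python sketch below takes $f(U) = b^TUc$ and $U = X^TX$, so:

- $\partial f/\partial U = G = bc^T$;
- $\partial U/\partial X_{ij} = E_{ij}^TX + X^TE_{ij}$, where $E_{ij}$ is the single-entry matrix.

It then compares the result with the closed form $X(G+G^T)$. The helper names (`unit`, `add`, `chain_rule_gradient`) are illustrative, not from the Cookbook.

```python
import random

def transpose(A):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of list-of-lists matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def add(A, B):
    """Entrywise sum of two same-shaped matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def unit(m, n, i, j):
    """Single-entry m-by-n matrix E_ij: 1 at (i, j), 0 elsewhere."""
    E = [[0.0] * n for _ in range(m)]
    E[i][j] = 1.0
    return E

def chain_rule_gradient(X, G):
    """df/dX_ij = Tr(G^T dU/dX_ij) for U = X^T X, where G = df/dU."""
    m, n = len(X), len(X[0])
    Xt = transpose(X)
    grad = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            E = unit(m, n, i, j)
            dU = add(matmul(transpose(E), X), matmul(Xt, E))  # dU/dX_ij = E^T X + X^T E
            grad[i][j] = trace(matmul(transpose(G), dU))
    return grad

random.seed(1)
m, n = 3, 2
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
b = [[random.gauss(0, 1)] for _ in range(n)]
c = [[random.gauss(0, 1)] for _ in range(n)]
G = matmul(b, transpose(c))  # df/dU = b c^T for f(U) = b^T U c

via_chain_rule = chain_rule_gradient(X, G)
closed_form = matmul(X, add(G, transpose(G)))  # X (b c^T + c b^T)
max_diff = max(abs(p - q) for rp, rq in zip(via_chain_rule, closed_form) for p, q in zip(rp, rq))
print(f"max difference = {max_diff:.2e}")
```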

Best answer:

$ \def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\vec#1{\operatorname{vec}\LR{#1}} \def\diag#1{\operatorname{diag}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} $Use the known gradient to write the differential of the function, which can then be expanded and rearranged to recover the desired gradient.

$$\eqalign{ \grad{f}{\LR{X^TX}} &= G \\ df &= G:d\LR{X^TX} \\ &= G:\LR{X^TdX+dX^TX} \\ &= {XG}:dX + {XG^T}:dX \\ &= X\LR{G+G^T}:dX \\ \grad{f}{X} &= X\LR{G+G^T} \\\\ }$$

For $f = b^TX^TXc$ the known gradient is $G = bc^T$, so $\grad{f}{X} = X\LR{bc^T+cb^T}$.
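$f$ is quadratic in $X$, so the differential step discards only the second-order term. Expanding $f(X+dX)$ exactly gives $f(X+dX) - f(X) = X(G+G^T):dX + G:(dX^TdX)$. The pure-Python sketch below checks this with a deliberately non-infinitesimal $dX$, taking $G = bc^T$ as in the question. The helpers `frob`, `matmul`, and `transpose` are illustrative names.

```python
import random

def transpose(A):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of list-of-lists matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def frob(A, B):
    """Frobenius inner product A:B = sum_ij A_ij B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def f(X, G):
    """f = G:(X^T X); with G = b c^T this equals b^T X^T X c."""
    return frob(G, matmul(transpose(X), X))

random.seed(2)
m, n = 4, 3
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
dX = [[0.1 * random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]  # deliberately not tiny
b = [[random.uniform(-1, 1)] for _ in range(n)]
c = [[random.uniform(-1, 1)] for _ in range(n)]
G = matmul(b, transpose(c))

Gsym = [[G[i][j] + G[j][i] for j in range(n)] for i in range(n)]  # G + G^T
linear = frob(matmul(X, Gsym), dX)              # X(G + G^T):dX, the part the answer keeps
quadratic = frob(G, matmul(transpose(dX), dX))  # G:(dX^T dX), the part the differential drops
X_new = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]
increment = f(X_new, G) - f(X, G)

residual = increment - (linear + quadratic)
print(f"increment = {increment:.6f}, linear = {linear:.6f}, residual = {residual:.1e}")
```

If you drop the `quadratic` term, the mismatch equals $b^TdX^TdX\,c$, which shrinks like $\|dX\|^2$. That is the sense in which $X(G+G^T)$ is the gradient.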


In the preceding, a colon is used to denote the matrix inner product, i.e. $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$
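Both lines of the definition can be checked numerically. This is a pure-Python sketch; the helper names `frob` and `trace` are illustrative.

```python
import random

def transpose(A):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of list-of-lists matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def frob(A, B):
    """A:B as the entrywise double sum over i and j."""
    return sum(A[i][j] * B[i][j] for i in range(len(A)) for j in range(len(A[0])))

random.seed(3)
m, n = 3, 4
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]

double_sum = frob(A, B)
via_trace = trace(matmul(transpose(A), B))      # Tr(A^T B), the trace of an n-by-n matrix
norm_sq = sum(a * a for row in A for a in row)  # ||A||_F^2
print(double_sum, via_trace, frob(A, A), norm_sq)
```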

The properties of the underlying trace function allow the terms in such a product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= CB^T:A = A^TC:B \\\\ }$$
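The rearrangement rules can be spot-checked on random matrices. In this pure-Python sketch (helper names are illustrative), `P` and `Q` stand in for the same-shaped $A$ and $B$ of the first two rules. The third rule uses an $m\times k$ matrix $A$, a $k\times n$ matrix $B$, and an $m\times n$ matrix $C$, so every product is defined.

```python
import random

def transpose(A):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of list-of-lists matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def frob(A, B):
    """Frobenius inner product A:B (A and B must have the same shape)."""
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "shape mismatch"
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def rand(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(4)
P, Q = rand(3, 4), rand(3, 4)                 # same shape, for the first two rules
A, B, C = rand(3, 5), rand(5, 2), rand(3, 2)  # C has the shape of AB

results = {
    "P:Q = Q:P": (frob(P, Q), frob(Q, P)),
    "P:Q = P^T:Q^T": (frob(P, Q), frob(transpose(P), transpose(Q))),
    "C:AB = CB^T:A": (frob(C, matmul(A, B)), frob(matmul(C, transpose(B)), A)),
    "C:AB = A^TC:B": (frob(C, matmul(A, B)), frob(matmul(transpose(A), C), B)),
}
for name, (lhs, rhs) in results.items():
    print(f"{name}: {lhs:+.6f} vs {rhs:+.6f}")
```

The shape assertion in `frob` is a deliberate guard. The colon product is only defined for matrices of equal size, and each rearrangement has to preserve that.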