Derivative of Nested Matrix Quadratic Form

I have two real matrices, $\mathbf{A} \in \mathbb{R}^{k \times d}$ and $\mathbf{B} \in \mathbb{R}^{d \times d}$, where $k \leq d$ and $\mathbf{B}$ is symmetric. I also have two vectors $\mathbf{c},\mathbf{d} \in \mathbb{R}^d$. My question is: what is the gradient of the following expression with respect to $\mathbf{A}$? $$ (\mathbf{A} \mathbf{c})^\top (\mathbf{A} \mathbf{B} \mathbf{A}^\top)^{-1} (\mathbf{A} \mathbf{d}) $$

An observation:

I know from the Matrix Cookbook that if I replace the matrix $(\mathbf{A} \mathbf{B} \mathbf{A}^\top)^{-1}$ with a matrix $\mathbf{E} \in \mathbb{R}^{k \times k}$ that does not depend on $\mathbf{A}$, then: $$ \nabla_\mathbf{A} (\mathbf{A} \mathbf{c})^\top \mathbf{E} (\mathbf{A} \mathbf{d}) = \mathbf{E}^\top \mathbf{A} \mathbf{c} \mathbf{d}^\top + \mathbf{E} \mathbf{A} \mathbf{d} \mathbf{c}^\top $$

where $\nabla_\mathbf{A}$ signifies the gradient with respect to $\mathbf{A}$. So it seems I'm just missing a chain-rule step.
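As a sanity check on the fixed-$\mathbf{E}$ identity above, here is a minimal NumPy sketch that compares it with a central finite-difference gradient. The sizes, the random test data, the name `n` (standing in for the dimension $d$, to avoid clashing with the vector $\mathbf{d}$) and the helper `numerical_gradient` are all illustrative assumptions, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 2, 4  # n plays the role of the dimension d in the question

A = rng.standard_normal((k, n))
E = rng.standard_normal((k, k))  # constant matrix, deliberately not symmetric
c = rng.standard_normal(n)
d = rng.standard_normal(n)


def g(A):
    """(A c)^T E (A d) with E held fixed."""
    return (A @ c) @ E @ (A @ d)


def numerical_gradient(func, A, h=1e-6):
    """Central finite differences, one entry of A at a time."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            step = np.zeros_like(A)
            step[i, j] = h
            grad[i, j] = (func(A + step) - func(A - step)) / (2 * h)
    return grad


# Closed form from the question: E^T A c d^T + E A d c^T
grad_formula = E.T @ np.outer(A @ c, d) + E @ np.outer(A @ d, c)
grad_numeric = numerical_gradient(g, A)

print("max abs difference:", np.max(np.abs(grad_formula - grad_numeric)))
```

The two gradients should agree up to finite-difference round-off, which confirms the identity only for a constant $\mathbf{E}$.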

Thank you very much for any insights about this.

Best answer

For convenience define $$\eqalign{ x &= Ac &(dx &= dA\,c) \cr y &= Ad &(dy &= dA\,d) \cr E &= ABA^T \,\,&(dE &= dA\,BA^T + AB\,dA^T) \cr }$$

Then write the function in terms of the Frobenius inner product, denoted by a colon, $X:Y = \operatorname{tr}(X^TY)$, and find its differential using $dE^{-1} = -E^{-1}\,dE\,E^{-1}$ $$\eqalign{ f &= x\,y^T:E^{-1} \cr\cr df &= dx\,y^T:E^{-1} + x\,dy^T:E^{-1} + x\,y^T:dE^{-1} \cr &= dA\,c\,y^T:E^{-1} + x\,(dA\,d)^T:E^{-1} - x\,y^T:E^{-1}\,dE\,E^{-1} \cr &= E^{-1}y\,c^T:dA + E^{-T}x\,d^T:dA - E^{-T}x\,y^TE^{-T}:(dA\,BA^T + AB\,dA^T) \cr &= \Big(E^{-1}y\,c^T + E^{-T}x\,d^T - E^{-T}x\,y^TE^{-T}AB^T - E^{-1}y\,x^TE^{-1}AB\Big):dA \cr\cr }$$ Since $df=(\frac{\partial f}{\partial A}:dA),\,$ the gradient must be $$\eqalign{ \frac{\partial f}{\partial A} &= E^{-1}y\,c^T + E^{-T}x\,d^T - E^{-T}x\,y^TE^{-T}AB^T - E^{-1}y\,x^TE^{-1}AB \cr\cr &= E^{-1}\Big\{y\,c^T + x\,d^T - (x\,y^T + y\,x^T)E^{-1}AB\Big\} \cr\cr &= (ABA^T)^{-1}A\,\Big\{d\,c^T + c\,d^T - (c\,d^T + d\,c^T)\,A^T(ABA^T)^{-1}AB\Big\} \cr }$$ where the symmetry of $\,B$ and $E\,$ was utilized to simplify the final expression.

Notice that the positive terms correspond to what you found in the Matrix Cookbook. The negative terms account for the dependency of $E$ on $A$.
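If you want to convince yourself numerically, a minimal NumPy sketch along the following lines compares the closed-form gradient with central finite differences. The sizes, the random test data, the name `n` (standing in for the dimension $d$) and the helpers `gradient_parts` and `numerical_gradient` are illustrative assumptions. $B$ is built symmetric positive definite here only so that $ABA^T$ is guaranteed to be invertible; the derivation itself needs only symmetry and invertibility.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 2, 4  # n plays the role of the dimension d in the question

A = rng.standard_normal((k, n))
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)  # symmetric positive definite (assumption for the test)
c = rng.standard_normal(n)
d = rng.standard_normal(n)


def f(A):
    """(A c)^T (A B A^T)^{-1} (A d)."""
    x, y = A @ c, A @ d
    E = A @ B @ A.T
    return x @ np.linalg.solve(E, y)


def gradient_parts(A):
    """Return the positive (Cookbook) and negative (correction) terms."""
    x, y = A @ c, A @ d
    E_inv = np.linalg.inv(A @ B @ A.T)
    positive = E_inv @ (np.outer(y, c) + np.outer(x, d))
    negative = E_inv @ (np.outer(x, y) + np.outer(y, x)) @ E_inv @ A @ B
    return positive, negative


def numerical_gradient(func, A, h=1e-6):
    """Central finite differences, one entry of A at a time."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            step = np.zeros_like(A)
            step[i, j] = h
            grad[i, j] = (func(A + step) - func(A - step)) / (2 * h)
    return grad


positive, negative = gradient_parts(A)
grad_numeric = numerical_gradient(f, A)

print("full formula error: ", np.max(np.abs(positive - negative - grad_numeric)))
print("Cookbook part error:", np.max(np.abs(positive - grad_numeric)))
```

The first error should be at round-off level, while the second should be visibly larger, since the positive terms alone ignore the dependence of $E$ on $A$.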