How to find the gradient $\nabla_W \left( u^T_1 W \left( W E W^T + \lambda I \right)^{-1} W^{T} u_2 \right)$?

I am struggling to find the gradient

$$\nabla_W \left( u^T_1 W \left( W E W^T + \lambda I \right)^{-1} W^{T} u_2 \right)$$

where $I \in \mathbb{R}^{n\times n}$ is the identity matrix, $\lambda > 0$, $E \in \mathbb{R}^{n\times n}$ is a symmetric matrix, and $u_1, u_2$ are vectors. I tried the Matrix Cookbook, but the closest match I could find is Eq. 127, which does not quite have the form of this expression. I would be grateful for any help.

Best answer

$ \def\l{\left} \def\r{\right} \def\lr#1{\l(#1\r)} \def\s#1{\operatorname{Sym}\lr{#1}} \def\t#1{\operatorname{Tr}\lr{#1}} \def\p{{\partial}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $First, define the trace/Frobenius product and the $\tt{Sym()}$ function
$$\eqalign{ X:Y &= \sum_{i=1}^m\sum_{j=1}^n X_{ij}Y_{ij} \;=\; \t{XY^T} \\ \s{X} &= \frac 12\lr{X+X^T} }$$
For typing convenience, define the symmetric matrix variables
$$\eqalign{ A &= \s{u_1u_2^T} &\implies\quad A=A^T =\s{A} \\ B &= \lr{WEW^T+\lambda I}^{-1} &\implies\quad B=B^T = \s{B} \\ & &\implies\quad dB = -2B\,\s{dW\,EW^T} B \\ }$$
Here $dB$ follows from $dB = -B\,d\lr{B^{-1}}B$ with $d\lr{B^{-1}} = dW\,EW^T + WE\,dW^T = 2\,\s{dW\,EW^T}$, which uses $E=E^T$. Write the function using the above definitions.
Then calculate its differential and gradient. Since $WBW^T$ is symmetric, $u_1^TWBW^Tu_2 = u_1u_2^T:WBW^T = A:WBW^T$, so
$$\eqalign{ \phi &= A:WBW^T \\ d\phi &= A:dW\,BW^T + A:WB\,dW^T + A:W\,\c{dB}\,W^T \\ &= AWB:dW + BW^TA:dW^T + A:W\,\c{\l(-2B\,\s{dW\,EW^T} B\r)}\,W^T \\ &= AWB:dW + AWB :dW - 2BW^TAWB:\s{dW\,EW^T} \\ &= 2AWB:dW - 2BW^TAWB:dW\,EW^T \\ &= 2\l(AWB - BW^TAWBWE\r):dW \\ \grad{\phi}{W} &= 2\l(AWB - BW^TAWBWE\r) }$$
The $\tt{Sym()}$ can be dropped in the fifth line because $BW^TAWB$ is itself symmetric.
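As a sanity check, the closed-form gradient can be compared against central finite differences. The following is only a minimal NumPy sketch, not part of the derivation: the helper names (`phi`, `grad_phi`, `numerical_grad`), the size $n=4$, the random test data, and the choice of a positive semidefinite $E$ (made here so that $WEW^T+\lambda I$ is guaranteed invertible) are all assumptions for illustration.

```python
import numpy as np

def phi(W, E, lam, u1, u2):
    """phi(W) = u1^T W (W E W^T + lam*I)^{-1} W^T u2."""
    n = W.shape[0]
    B = np.linalg.inv(W @ E @ W.T + lam * np.eye(n))
    return u1 @ W @ B @ W.T @ u2

def grad_phi(W, E, lam, u1, u2):
    """Closed form from the derivation: 2 (A W B - B W^T A W B W E)."""
    n = W.shape[0]
    A = 0.5 * (np.outer(u1, u2) + np.outer(u2, u1))  # A = Sym(u1 u2^T)
    B = np.linalg.inv(W @ E @ W.T + lam * np.eye(n))
    AWB = A @ W @ B
    return 2.0 * (AWB - B @ W.T @ AWB @ W @ E)

def numerical_grad(f, W, h=1e-6):
    """Entrywise central differences: G[i, j] ~ d f / d W[i, j]."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            dW = np.zeros_like(W)
            dW[i, j] = h
            G[i, j] = (f(W + dW) - f(W - dW)) / (2 * h)
    return G

# Hypothetical test data (assumption: E is symmetric PSD, lam > 0).
rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
E = M @ M.T
lam = 0.5
u1, u2 = rng.standard_normal(n), rng.standard_normal(n)

G_closed = grad_phi(W, E, lam, u1, u2)
G_num = numerical_grad(lambda X: phi(X, E, lam, u1, u2), W)
max_abs_err = np.max(np.abs(G_closed - G_num))
print(max_abs_err)  # expected to be small if the formula is right
```

If the two gradients disagree by more than finite-difference noise, the usual suspects are a missing transpose or a non-symmetric $E$, since the derivation relies on $E=E^T$.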


Note that the transpose and cyclic properties of the trace function allow the terms in a Frobenius product to be rearranged in several different ways, e.g.
$$\eqalign{ X:Y &= Y:X \;=\; Y^T:X^T \\ XY:Z &= X:ZY^T = Y:X^TZ \\ }$$
The Frobenius product also interacts nicely with the $\tt{Sym()}$ function
$$\eqalign{ X:\s{Y} &= \s{X}:Y \\ }$$
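The rearrangement rules for the Frobenius product are easy to spot-check numerically. This is a small NumPy sketch on random matrices; the helper names `frob` and `sym` and the matrix shapes are assumptions chosen for illustration, and a passing check on random data is evidence, not a proof.

```python
import numpy as np

def frob(X, Y):
    """Frobenius product X:Y = sum_ij X_ij * Y_ij = Tr(X Y^T)."""
    return np.sum(X * Y)

def sym(X):
    """Sym(X) = (X + X^T) / 2 for a square matrix X."""
    return 0.5 * (X + X.T)

rng = np.random.default_rng(1)
m, k, n = 3, 4, 5
X = rng.standard_normal((m, k))
Y = rng.standard_normal((k, n))
Z = rng.standard_normal((m, n))

# XY:Z = X:ZY^T = Y:X^T Z
lhs = frob(X @ Y, Z)
cyclic_ok = np.isclose(lhs, frob(X, Z @ Y.T)) and np.isclose(lhs, frob(Y, X.T @ Z))

# Square matrices for the trace, transpose, and Sym identities
P = rng.standard_normal((n, n))
Q = rng.standard_normal((n, n))
trace_form_ok = np.isclose(frob(P, Q), np.trace(P @ Q.T))    # X:Y = Tr(X Y^T)
transpose_ok = np.isclose(frob(P, Q), frob(Q.T, P.T))        # X:Y = Y^T:X^T
sym_ok = np.isclose(frob(P, sym(Q)), frob(sym(P), Q))        # X:Sym(Y) = Sym(X):Y

print(trace_form_ok, transpose_ok, sym_ok, cyclic_ok)
```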