How to differentiate this matrix expression?


I have come across the following expression and want to know how to differentiate it with respect to the matrix $\mathbf{\Theta}$:

$$\frac{\partial\,\text{trace}\left(\left(\mathbf{\Theta}^T\mathbf{S}_W\mathbf{\Theta}\right)^{-1}\mathbf{\Theta}^T\mathbf{S}_B\mathbf{\Theta}\right)}{\partial\mathbf{\Theta}}.$$

One possible result is listed below, but I don't know how it is derived:

$$-2\mathbf{S}_W\mathbf{\Theta}\left(\mathbf{\Theta}^T\mathbf{S}_W\mathbf{\Theta}\right)^{-1}\left(\mathbf{\Theta}^T\mathbf{S}_B\mathbf{\Theta}\right)\left(\mathbf{\Theta}^T\mathbf{S}_W\mathbf{\Theta}\right)^{-1}+2\mathbf{S}_B\mathbf{\Theta}\left(\mathbf{\Theta}^T\mathbf{S}_W\mathbf{\Theta}\right)^{-1}.$$

Can you help explain it?


There are 2 best solutions below


Let us assume that all matrices in the above expression are square. We can write
$$ F = \frac{\partial}{\partial \theta}\left[\text{tr}\left((\theta^T\,S_w\,\theta)^{-1} \theta^T\,S_B\,\theta\right)\right] = \frac{\partial}{\partial \theta}\left[\text{tr}(W B)\right] $$
where
$$ W := (\theta^T\,S_w\,\theta)^{-1} \quad \text{and} \quad B := \theta^T\,S_B\,\theta \,. $$

Define $A := WB$. Using the chain rule,
$$ \frac{\partial}{\partial \theta}\left[\text{tr}(A)\right] = \frac{\partial\,\text{tr}(A)}{\partial A}\cdot\frac{\partial A}{\partial \theta} = I\cdot\frac{\partial A}{\partial \theta} \,. $$
Here $I$ is the identity matrix and the ($\cdot$) in the above equation is interpreted as
$$ \left[I\cdot\frac{\partial A}{\partial \theta}\right]_{ij} = \sum_m \sum_n I_{mn}\,\frac{\partial A_{mn}}{\partial \theta_{ij}} \,. $$

Next we expand $\partial A/\partial \theta$:
$$ \frac{\partial A}{\partial \theta} = \frac{\partial A}{\partial W}\cdot\frac{\partial W}{\partial \theta} + \frac{\partial A}{\partial B}\cdot\frac{\partial B}{\partial \theta} \,. $$
It is easier to work with indices at this stage. Then,
$$ \left[\frac{\partial A}{\partial W}\right]_{ijkl} = \frac{\partial A_{ij}}{\partial W_{kl}} = \sum_p\frac{\partial W_{ip}}{\partial W_{kl}}\,B_{pj} = \sum_p I_{ik}\,I_{pl}\,B_{pj} = I_{ik}\,B_{lj} $$
and
$$ \left[\frac{\partial A}{\partial B}\right]_{ijkl} = \frac{\partial A_{ij}}{\partial B_{kl}} = \sum_p W_{ip}\frac{\partial B_{pj}}{\partial B_{kl}} = \sum_p W_{ip}\,I_{pk}\,I_{jl} = W_{ik}\,I_{jl} \,. $$

Now we need $\partial W/\partial \theta$. Define
$$ C := \theta^T\,S_w\,\theta \,, $$
so that $W = C^{-1}$. Then
$$ \frac{\partial W}{\partial \theta} = \frac{\partial C^{-1}}{\partial C}\cdot\frac{\partial C}{\partial \theta} \,. $$
The derivative of the inverse is given by
$$ \frac{\partial C^{-1}_{mn}}{\partial C_{ij}} = - C^{-1}_{mi}\,C^{-1}_{jn} $$
and, writing $S_{kl}$ for the entries of $S_w$,
$$ \begin{align}
\frac{\partial C_{ij}}{\partial \theta_{pq}} &= \frac{\partial}{\partial \theta_{pq}}(\theta^T\,S_w\,\theta)_{ij} = \sum_k \sum_l\frac{\partial \theta^T_{ik}}{\partial \theta_{pq}} S_{kl}\theta_{lj} + \sum_k\sum_l\theta^T_{ik} S_{kl} \frac{\partial \theta_{lj}}{\partial \theta_{pq}} \\
& = \sum_k \sum_l I_{kp} I_{iq} S_{kl}\theta_{lj} + \sum_k\sum_l \theta_{ki} S_{kl} I_{lp} I_{jq}\\
& = \sum_l I_{iq} S_{pl}\theta_{lj} + \sum_k \theta_{ki} S_{kp} I_{jq} \,.
\end{align} $$
Therefore,
$$ \begin{align}
\frac{\partial W_{mn}}{\partial \theta_{pq}} &= \sum_i\sum_j\frac{\partial C^{-1}_{mn}}{\partial C_{ij}}\frac{\partial C_{ij}}{\partial \theta_{pq}} = \sum_i\sum_j (- C^{-1}_{mi}\,C^{-1}_{jn})\left(\sum_l I_{iq} S_{pl}\theta_{lj} + \sum_k \theta_{ki} S_{kp} I_{jq}\right) \\
& = -\sum_j\sum_l C^{-1}_{mq}C^{-1}_{jn} S_{pl}\theta_{lj} -\sum_i\sum_k C^{-1}_{mi}C^{-1}_{qn} \theta_{ki}S_{kp} \,.
\end{align} $$

Next we calculate
$$ \begin{align}
\sum_m\sum_n\frac{\partial A_{rs}}{\partial W_{mn}}\frac{\partial W_{mn}}{\partial \theta_{pq}} & = \sum_m\sum_n (I_{rm} B_{ns})\left[-\sum_j\sum_l C^{-1}_{mq}C^{-1}_{jn} S_{pl}\theta_{lj} -\sum_i\sum_k C^{-1}_{mi}C^{-1}_{qn} \theta_{ki}S_{kp}\right] \\
& = -\sum_n \sum_j\sum_l B_{ns} C^{-1}_{rq}C^{-1}_{jn} S_{pl}\theta_{lj} -\sum_n \sum_i\sum_k B_{ns} C^{-1}_{ri}C^{-1}_{qn} \theta_{ki}S_{kp} \,.
\end{align} $$
The last stage is the product with $I$,
$$ \sum_r\sum_s \sum_m\sum_n I_{rs}\frac{\partial A_{rs}}{\partial W_{mn}}\frac{\partial W_{mn}}{\partial \theta_{pq}} \,, $$
which is
$$ -\sum_r \sum_n \sum_j\sum_l B_{nr} C^{-1}_{rq}C^{-1}_{jn} S_{pl}\theta_{lj} - \sum_r\sum_n \sum_i\sum_k B_{nr} C^{-1}_{ri}C^{-1}_{qn} \theta_{ki}S_{kp} \,. $$
The first sum is the $(p,q)$ entry of $S_w\theta C^{-1}BC^{-1}$, and the second is the $(q,p)$ entry of $C^{-1}BC^{-1}\theta^TS_w$. In compact form,
$$ T_1 := I\cdot\frac{\partial A}{\partial W}\cdot\frac{\partial W}{\partial \theta} = -S_w \theta C^{-1} B C^{-1} - S_w^T \theta C^{-T} B^T C^{-T} \,. $$
If $S_w$ and $S_B$ are symmetric, then $C$ and $B$ are symmetric and we have
$$ T_1 = -2S_w \theta C^{-1} B C^{-1} = -2 S_w \theta (\theta^T\,S_w\,\theta)^{-1} (\theta^T\,S_B\,\theta) (\theta^T\,S_w\,\theta)^{-1} \,. $$

Next, repeat the process for the remaining term
$$ T_2 := I\cdot\frac{\partial A}{\partial B}\cdot\frac{\partial B}{\partial \theta}\,.$$
The expression you seek is the sum $T = T_1 + T_2$.
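
For completeness, here is a sketch of how the remaining term $T_2$ might be worked out with the same index conventions. This is not part of the derivation above; it assumes $W = C^{-1}$ as defined there and writes $(S_B)_{kl}$ for the entries of $S_B$.

$$ \begin{align}
\frac{\partial B_{mn}}{\partial \theta_{pq}} &= \sum_l I_{mq}\,(S_B)_{pl}\,\theta_{ln} + \sum_k \theta_{km}\,(S_B)_{kp}\,I_{nq} \\
\sum_r\sum_s\sum_m\sum_n I_{rs}\,\frac{\partial A_{rs}}{\partial B_{mn}}\frac{\partial B_{mn}}{\partial \theta_{pq}} &= \sum_m\sum_n W_{nm}\,\frac{\partial B_{mn}}{\partial \theta_{pq}} \\
&= \sum_n\sum_l (S_B)_{pl}\,\theta_{ln}\,W_{nq} + \sum_m\sum_k W_{qm}\,\theta_{km}\,(S_B)_{kp} \\
T_2 &= S_B\,\theta\,C^{-1} + S_B^T\,\theta\,C^{-T}
\end{align} $$

If $S_w$ and $S_B$ are symmetric, this reduces to $T_2 = 2\,S_B\,\theta\,(\theta^T S_w\,\theta)^{-1}$, which is the second term of the expression in the question.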


For convenience, define the quantities $W=\Theta^TS_W\Theta$ and $B=\Theta^TS_B\Theta$, for which the differentials are
$$\eqalign{
dW &= d\Theta^TS_W\Theta \,+\, \Theta^TS_W\,d\Theta \cr
dB &= d\Theta^TS_B\Theta \,+\, \Theta^TS_B\,d\Theta \cr
}$$
The function and its differential can be written in terms of the Frobenius product, $X:Y=\text{tr}(X^TY)$, of these quantities. (Strictly, $\text{tr}(W^{-1}B)=B^T:W^{-1}$; this equals $B:W^{-1}$ whenever $B$ or $W$ is symmetric, which is assumed here.)
$$\eqalign{
f &= B:W^{-1} \cr
\cr
df &= W^{-1}:dB \,+\, B:dW^{-1} \cr
&= W^{-1}:dB \,-\, B:W^{-1}\,dW\,W^{-1} \cr
&= W^{-1}:dB \,-\, W^{-T}BW^{-T}:dW \cr
&= W^{-1}:(d\Theta^TS_B\Theta + \Theta^TS_B\,d\Theta) \,-\, W^{-T}BW^{-T}:(d\Theta^TS_W\Theta + \Theta^TS_W\,d\Theta) \cr
&= W^{-1}:d\Theta^TS_B\Theta + W^{-1}:\Theta^TS_B\,d\Theta - W^{-T}BW^{-T}:d\Theta^TS_W\Theta - W^{-T}BW^{-T}:\Theta^TS_W\,d\Theta \cr
&= d\Theta:S_B\Theta W^{-T} + S_B^T\Theta W^{-1}:\,d\Theta - d\Theta:S_W\Theta W^{-1}B^TW^{-1} - S_W^T\Theta W^{-T}BW^{-T}:d\Theta \cr
&= (S_B\Theta W^{-T} + S_B^T\Theta W^{-1} - S_W\Theta W^{-1}B^TW^{-1} - S_W^T\Theta W^{-T}BW^{-T}) : d\Theta \cr
}$$
Since $df=\left(\frac {\partial f} {\partial \Theta}\right):d\Theta$, the derivative is seen to be
$$\eqalign{
\frac {\partial f} {\partial \Theta} &= S_B\Theta W^{-T} + S_B^T\Theta W^{-1} - S_W\Theta W^{-1}B^TW^{-1} - S_W^T\Theta W^{-T}BW^{-T} \cr
}$$
Now if ($S_B,S_W$) are symmetric, then ($B,W$) are as well, and the derivative can be simplified to
$$\eqalign{
\frac {\partial f} {\partial \Theta} &= 2\,S_B\Theta W^{-1} - 2\,S_W\Theta W^{-1}BW^{-1} \cr
&= 2\,S_B\Theta(\Theta^TS_W\Theta)^{-1} \,-\, 2\,S_W\Theta (\Theta^TS_W\Theta)^{-1}(\Theta^TS_B\Theta)(\Theta^TS_W\Theta)^{-1} \cr
}$$
which is the result you were questioning.
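
One way to gain confidence in the symmetric-case formula is to compare it against a central finite-difference approximation. The following is only a minimal sketch using NumPy: the function names, the matrix sizes, the random test matrices and the step size `eps` are all assumptions made for illustration, not anything taken from the question.

```python
import numpy as np

def objective(theta, S_W, S_B):
    """f(Theta) = tr((Theta^T S_W Theta)^{-1} Theta^T S_B Theta)."""
    W = theta.T @ S_W @ theta
    B = theta.T @ S_B @ theta
    # solve(W, B) computes W^{-1} B without forming the inverse explicitly
    return np.trace(np.linalg.solve(W, B))

def analytic_gradient(theta, S_W, S_B):
    """Closed form derived above; valid when S_W and S_B are symmetric."""
    W = theta.T @ S_W @ theta
    B = theta.T @ S_B @ theta
    W_inv = np.linalg.inv(W)
    return 2 * S_B @ theta @ W_inv - 2 * S_W @ theta @ W_inv @ B @ W_inv

def numerical_gradient(theta, S_W, S_B, eps=1e-6):
    """Entry-wise central differences; eps is an illustrative choice."""
    grad = np.zeros_like(theta)
    for i in range(theta.shape[0]):
        for j in range(theta.shape[1]):
            step = np.zeros_like(theta)
            step[i, j] = eps
            f_plus = objective(theta + step, S_W, S_B)
            f_minus = objective(theta - step, S_W, S_B)
            grad[i, j] = (f_plus - f_minus) / (2 * eps)
    return grad

# Hypothetical test data: random symmetric matrices, S_W positive definite
rng = np.random.default_rng(0)
n, k = 5, 3
M = rng.standard_normal((n, n))
S_W = M @ M.T + n * np.eye(n)
N = rng.standard_normal((n, n))
S_B = N @ N.T
theta = rng.standard_normal((n, k))

max_err = np.max(np.abs(analytic_gradient(theta, S_W, S_B)
                        - numerical_gradient(theta, S_W, S_B)))
print("max abs difference:", max_err)
```

If the formula is right, the printed difference should be tiny compared with the size of the gradient entries. Note that $\Theta$ is taken to be rectangular here, which the differential argument allows as long as $\Theta^TS_W\Theta$ is invertible.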