How to work out this matrix differentiation?


I am trying to minimize a function $A$ with respect to $W$, so I am looking for its gradient:

$$ A = \ln \det ( WW^T + \sigma^2I)$$

So according to the chain rule I found

$$ \frac{\partial A}{\partial W_{ij}} = tr\left(\left(\frac{\partial g(U)}{\partial U}\right)^T \cdot \frac{\partial U}{\partial W_{ij}}\right) $$

Where

$$ U = WW^T + \sigma^2 I, \qquad g(U) = \ln \det U $$

I found also that

$$ \frac{\partial \ln \det U}{\partial U} = tr(U^{-1}\partial U) $$

Since $\partial U$ with respect to $U$ should just be a matrix full of ones, call it $S$, this gives

$$ \frac{\partial \ln \det U}{\partial U} = tr(U^{-1}S) $$

And also

$$ \frac{\partial U}{\partial W_{ij}} = \frac{\partial (WW^T + \sigma^2 I)}{\partial W_{ij}} = \frac{\partial (WW^T)}{\partial W_{ij}} $$

Which I found is

$$ \frac{\partial WW^T}{\partial W_{ij}} = WJ^{ji} + J^{ij} W^T $$

So putting it all together

$$ \frac{\partial A}{\partial W_{ij}} = tr(tr(U^{-1}S)^T \cdot (WJ^{ji} + J^{ij} W^T)) $$

(The transpose can be dropped as our function is scalar.)

However, this result does not agree with a simple numerical derivative. Why is this?

Best answer

Rewrite the function in terms of the trace and find its differential:

$$\eqalign{ A &= \log\det(WW^T+\sigma^2 I) \cr &= {\rm tr}\,\log(WW^T+\sigma^2 I) \cr\cr dA &= (WW^T+\sigma^2 I)^{-T}:d(WW^T) \cr &= (WW^T+\sigma^2 I)^{-T}:2\,{\rm sym}(dW\,W^T) \cr &= 2\,{\rm sym}\big((WW^T+\sigma^2 I)^{-1}\big):dW\,W^T \cr &= 2\,(WW^T+\sigma^2 I)^{-1}W:dW \cr }$$

Since $dA=\frac{\partial A}{\partial W}:dW$, the gradient must be

$$ \frac{\partial A}{\partial W} = 2\,(WW^T+\sigma^2 I)^{-1}W $$

The derivation above uses the Frobenius product (:) and the sym() function, defined for matrices $B$ and $M$ as

$$\eqalign{ {\rm sym}(M) &= \frac{1}{2}(M+M^T) \cr B:M &= {\rm tr}(B^TM) \cr }$$
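Since the question compares against a numerical derivative, the gradient $2\,(WW^T+\sigma^2 I)^{-1}W$ can be checked the same way. The following is only a minimal sketch in Python with NumPy: the function names, the $4\times 3$ size of $W$, the value $\sigma = 0.5$, and the step size `eps` are all illustrative assumptions, not part of the original problem.

```python
import numpy as np

def log_det_objective(W, sigma):
    """A(W) = ln det(W W^T + sigma^2 I), evaluated with slogdet for stability."""
    n = W.shape[0]
    U = W @ W.T + sigma**2 * np.eye(n)
    _, logdet = np.linalg.slogdet(U)  # U is positive definite, so the sign is +1
    return logdet

def analytic_gradient(W, sigma):
    """Closed form from the answer: 2 (W W^T + sigma^2 I)^{-1} W."""
    n = W.shape[0]
    U = W @ W.T + sigma**2 * np.eye(n)
    return 2.0 * np.linalg.solve(U, W)  # solves U X = W instead of forming U^{-1}

def numerical_gradient(W, sigma, eps=1e-6):
    """Central finite differences, perturbing one entry W_ij at a time."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            step = np.zeros_like(W)
            step[i, j] = eps
            grad[i, j] = (log_det_objective(W + step, sigma)
                          - log_det_objective(W - step, sigma)) / (2 * eps)
    return grad

# Hypothetical test data: a random 4x3 matrix W and sigma = 0.5.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
sigma = 0.5

max_abs_error = np.max(np.abs(analytic_gradient(W, sigma)
                              - numerical_gradient(W, sigma)))
print("largest entrywise difference:", max_abs_error)
```

If the two gradients agree, the printed difference should be at the level of finite-difference noise. The same check applied entry by entry to $tr\big(U^{-1}(WJ^{ji} + J^{ij}W^T)\big)$ with $U = WW^T+\sigma^2 I$ should give $2\,(U^{-1}W)_{ij}$, which is the component form of the result above.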