Derivative of a matrix: Outer product chain rule

1.9k Views Asked by At

I ran into a seemingly simple matrix calculus question that I can't seem to find the solution to.

Suppose I have the following matrices: $X_{(t \times n)}, V_{(n \times m)}$, and $\Phi_{(t\times m)} = f(XV)$ for some differentiable function $f$, which is applied element-wise to the argument $XV$.

I would like to calculate $\frac{\partial}{\partial V} \|1^T\Phi\|_2^2$, which I expanded to the outer product (hopefully correctly) as $\frac{\partial}{\partial V} 1^T \Phi\Phi^T 1 = \frac{\partial}{\partial V} 1^T f(XV) f(XV)^T 1^T$.

The Matrix Cookbook states that $\frac{d}{dx} \|x\|_2^2 = \frac{d}{dx} \|x^Tx\|_2 = 2x$. However, I'm not 100% certain I can use this in my case.

So far I have that $\frac{\partial}{\partial V} 1^T f(XV) f(XV)^T 1 = 2X^T[f(XV) \circ f^\prime(XV)]$ but my gradient checker (gradest in Matlab) is saying this is incorrect. I've been stuck on this all day, can anyone help?

I'm trying to figure out a vectorized solution (not involving for loop summations) since this piece of code will be called iteratively for optimization.

Edit: I've confirmed that $\frac{d}{d\Phi} \|1^T \Phi \|_2^2 = 2 \cdot 1 1^T \Phi$.

3

There are 3 best solutions below

0
On BEST ANSWER

I finally figured out (and verified with gradest) the correct solution in vectorized form: $$ \frac{d}{dV} \|1^T f(XV)\|_2^2 = 2 X^T \left[(1 1^T \Phi) \circ f^\prime(XV)\right], $$ where $\circ$ denotes the Hadamard product.

0
On

One way of approaching the problem is to go to index notation (with summation over repeated indices).

First, note that $$ \mathbf{v}\cdot\mathbf{\Phi} \equiv v_i\,\Phi_{ij} $$ Therefore, $$ ||\mathbf{v}\cdot\mathbf{\Phi}||_2^2 =(\mathbf{v}\cdot\mathbf{\Phi})\cdot(\mathbf{v}\cdot\mathbf{\Phi}) \equiv (v_m\,\Phi_{mp})(v_n\,\Phi_{kp}) = v_m\,v_n\,\Phi_{mp}\,\Phi_{np} $$ and $$ \begin{align} \frac{\partial}{\partial\mathbf{V}}(||\mathbf{v}\cdot\mathbf{\Phi}||_2^2) \equiv & \frac{\partial}{\partial V_{ij}}(v_m\,v_n\,\Phi_{mp}\,\Phi_{np}) = v_m\,v_n\,\frac{\partial}{\partial V_{ij}}(\Phi_{mp}\,\Phi_{np}) \\ = & v_m\,v_n\left[\frac{\partial \Phi_{mp}}{\partial V_{ij}}\,\Phi_{np} + \Phi_{mp}\,\frac{\partial \Phi_{np}}{\partial V_{ij}}\right] \end{align} $$ Also $$ \Phi_{ij} = f(X_{mp}\,V_{pq}) =: f_{ij}(A_{mq})\,. $$ Therefore, $$ \frac{\partial \Phi_{ij}}{\partial V_{kl}} = \frac{\partial f_{ij}}{A_{mq}}\,\frac{\partial A_{mq}}{\partial V_{kl}} = \frac{\partial f_{ij}}{A_{mq}}\left[\frac{\partial X_{mp}}{\partial V_{kl}}\,V_{pq} + X_{mp}\,\frac{\partial V_{pq}}{\partial V_{kl}}\right] $$ Getting these into an efficient vector form will depend on the matrices you are dealing with.

0
On

Let $Y=1^T\Phi$, then the problem is to find the derivative of the function $\,L=\|Y\|_F^2$

Better yet, using the Frobenius product, the function can be written as $\,L=Y:Y$

Start by taking the differential $$\eqalign{ dL &= 2\,Y:dY \cr &= 2\,1^T\Phi:1^Td\Phi \cr &= 2\,11^T\Phi:d\Phi \cr &= 2\,11^T\Phi:\Phi'\circ d(XV) \cr &= 2\,(11^T\Phi)\circ\Phi':d(XV) \cr &= 2\,(11^T\Phi)\circ\Phi':X\,dV \cr &= 2\,X^T[(11^T\Phi)\circ\Phi']:dV \cr }$$ Since $dL = \big(\frac {\partial L} {\partial V}\big):dV\,\,$ the derivative must be $$\eqalign{ \frac {\partial L} {\partial V} &= 2\,X^T[(11^T\Phi)\circ\Phi'] \cr }$$ This is the same result as @legomygrego, but with the step-by-step details. The only property which might be new to some readers is the mutual commutivity of the Frobenius and Hadamard products $$\eqalign{ A:B &= B:A \cr A\circ B &= B\circ A \cr A\circ B:C &= A:B\circ C \cr }$$