How to solve for differentials to find Jacobian in system of equations?


I was reading the paper Input Convex Neural Networks and couldn't understand part of the derivation of Proposition 3 (Section G of the supplementary material). I have put an image of the section below:

link to image

The authors describe using differentials to solve for the desired gradients. In the first part, for equations (35) to (37), they describe a trick of replacing $dh$ with $I$. However, when computing the Jacobian with respect to $G$ that trick clearly does not work, and the paper does not explain how to extend it to this case. I tried reading up on matrix differentials, but could not follow the derivation. Does anyone know how to get from equation (38) to equation (39)?

BEST ANSWER

The paper cleverly defines a vector $$\eqalign{ c=\begin{bmatrix}c_y\\c_\lambda\\c_t\end{bmatrix} = -M^{-1}\begin{bmatrix}\frac{\partial\ell}{\partial y}\\0\\0\end{bmatrix} }$$ Other than the fact that it's symmetric, the details of the $M$ matrix are not important.
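As a numeric sanity check of the $c$-vector trick, the sketch below builds a random symmetric, well-conditioned stand-in for $M$ (the block sizes and the construction of $M$ are assumptions for illustration, not the paper's actual KKT matrix) and verifies that $-\big[\frac{\partial\ell}{\partial y};0;0\big]^TM^{-1}\big[0;dh;0\big] = c_\lambda^Tdh$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                      # assumed sizes of the y- and lambda-blocks
N = n + m + 1                    # total size: y, lambda, t blocks

A = rng.standard_normal((N, N))
M = A + A.T + N * np.eye(N)      # symmetric, diagonally shifted to be invertible
g = rng.standard_normal(n)       # stands in for dl/dy

# c = -M^{-1} [g; 0; 0]
c = -np.linalg.solve(M, np.concatenate([g, np.zeros(m + 1)]))
c_y, c_lam, c_t = c[:n], c[n:n + m], c[n + m:]

# Left side: -[g; 0; 0]^T M^{-1} [0; dh; 0]
dh = rng.standard_normal(m)
rhs_vec = np.concatenate([np.zeros(n), dh, np.zeros(1)])
lhs = -np.concatenate([g, np.zeros(m + 1)]) @ np.linalg.solve(M, rhs_vec)

# Right side: c_lambda^T dh
rhs = c_lam @ dh
assert np.isclose(lhs, rhs)
```

Note that the symmetry of $M$ is exactly what lets us pull $M^{-1}$ onto the left factor, i.e. $\big[g;0;0\big]^TM^{-1} = \big(M^{-1}\big[g;0;0\big]\big)^T$.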

The $\,c\,$ vector is used to simplify calculations, like the following for equation $(37)$
$$\eqalign{ \bigg(\frac{\partial\ell}{\partial y}\bigg)^Tdy &= \begin{bmatrix} \Big(\frac{\partial\ell}{\partial y}\Big)^T &0&0\end{bmatrix}\begin{bmatrix}dy\\d\lambda\\dt\end{bmatrix} \cr &= -\begin{bmatrix} \Big(\frac{\partial\ell}{\partial y}\Big)^T &0&0\end{bmatrix}M^{-1}\begin{bmatrix}0\\dh\\0\end{bmatrix} \cr &= c^T\begin{bmatrix}0\\dh\\0\end{bmatrix} = c_\lambda^Tdh = c_\lambda:dh \cr \bigg(\frac{\partial\ell}{\partial y}\bigg)^T\frac{\partial y}{\partial h} &= c_\lambda \cr\cr }$$

The calculation to obtain equation $(39)$ is similar, using terms involving $dG$ instead of $dh$ (the step $c_y^TdG^T\lambda = \lambda^TdG\,c_y$ is just the transpose of a scalar)
$$\eqalign{ \bigg(\frac{\partial\ell}{\partial y}\bigg)^Tdy &= \begin{bmatrix} \Big(\frac{\partial\ell}{\partial y}\Big)^T &0&0\end{bmatrix}\begin{bmatrix}dy\\d\lambda\\dt\end{bmatrix} \cr &= -\begin{bmatrix} \Big(\frac{\partial\ell}{\partial y}\Big)^T &0&0\end{bmatrix}M^{-1}\begin{bmatrix}dG^T\lambda\\dG\,y\\0\end{bmatrix} \cr &= c^T\begin{bmatrix}dG^T\lambda\\dG\,y\\0\end{bmatrix} \cr\cr &= c_y^TdG^T\lambda + c_\lambda^TdG\,y \cr &= \lambda^TdG\,c_y + c_\lambda^TdG\,y \cr &= \Big(\lambda c_y^T + c_\lambda y^T\Big):dG \cr \bigg(\frac{\partial\ell}{\partial y}\bigg)^T\frac{\partial y}{\partial G} &= \lambda c_y^T + c_\lambda y^T \cr\cr }$$

NB: The authors use a different layout convention for gradients, which is why mine are transposed compared to those in the paper.
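The final rearrangement into a Frobenius product, $c_y^TdG^T\lambda + c_\lambda^TdG\,y = \big(\lambda c_y^T + c_\lambda y^T\big):dG$, can be checked numerically with random vectors and matrices (the shapes below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3                      # G is m x n, so the perturbation dG is too
c_y, y = rng.standard_normal(n), rng.standard_normal(n)
c_lam, lam = rng.standard_normal(m), rng.standard_normal(m)
dG = rng.standard_normal((m, n))

# c_y^T dG^T lambda + c_lambda^T dG y
lhs = c_y @ dG.T @ lam + c_lam @ dG @ y

# (lambda c_y^T + c_lambda y^T) : dG, where A : B = sum_ij A_ij B_ij
grad = np.outer(lam, c_y) + np.outer(c_lam, y)
rhs = np.sum(grad * dG)
assert np.isclose(lhs, rhs)
```

Since this holds for every perturbation $dG$, the matrix `grad` is the gradient $\lambda c_y^T + c_\lambda y^T$ of equation $(39)$.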

Also, I've used subscripts for the components of the $c$ vector; using superscripts is just ugly.