Is there a formula for differentiating a nonlinear function by a matrix?

I'm struggling with matrix notation for representing the derivative of a nonlinear function by a matrix. Specifically, I'm calculating a gradient. I have:

$\quad \frac{\partial}{\partial \mathbf{W}} \phi ( \mathbf{W} \vec{x} )^T \vec{\beta}$

Where, say, $\vec{x}$ is $n \times 1$, $\vec{\beta}$ is $m \times 1$, and $\mathbf{W}$ is $m \times n$. To simplify the question, let's say $\vec{x}$ and $\vec{\beta}$ are constant vectors.

What has me stuck is $\phi(u)$ - a nonlinear transformation of its argument vector. (For my purpose it is the sigmoid function $\frac{1}{1 + e^{-u}}$). I can calculate this gradient exhaustively, but is there a shortcut that has a clean representation in matrix notation?

For example, if the problem were simply:

$\quad \frac{\partial}{\partial \mathbf{W}} (\mathbf{W} \vec{x})^T \vec{\beta}$

Then I could do this very neatly:

$\quad \frac{\partial}{\partial \mathbf{W}} (\mathbf{W} \vec{x})^T \vec{\beta} = \frac{\partial}{\partial \mathbf{W}} \vec{x}^T \mathbf{W}^T \vec{\beta} = \vec{\beta} \vec{x}^T$

Best answer

What you need to know is the "trick" for finding the derivative of a scalar function applied element-wise to a matrix argument. Assume that you have a scalar function $S(x)$ whose derivative is known to be $S'(x)$. When you apply this element-wise to a matrix, the differential is $$\eqalign{ dS({\bf X}) &= S'({\bf X})\circ d{\bf X} \cr }$$ where $\circ$ denotes the Hadamard (element-wise) product.
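The element-wise rule above can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and using `tanh` as an arbitrary stand-in for $S$ (the names `S`, `S_prime`, `actual`, and `predicted` are illustrative, not from the question). It compares the actual change $S({\bf X}+d{\bf X})-S({\bf X})$ with the predicted $S'({\bf X})\circ d{\bf X}$ for a small perturbation.

```python
import numpy as np

def S(X):
    """Example scalar function applied element-wise (tanh is an arbitrary choice)."""
    return np.tanh(X)

def S_prime(X):
    """Known derivative of S, also applied element-wise."""
    return 1.0 - np.tanh(X) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
dX = 1e-6 * rng.normal(size=(3, 4))  # small perturbation of X

actual = S(X + dX) - S(X)    # true first-order change in S
predicted = S_prime(X) * dX  # Hadamard product: `*` is element-wise in NumPy

# The two should agree up to second-order terms in dX.
max_error = np.max(np.abs(actual - predicted))
print(max_error)
```

The discrepancy should be on the order of the squared perturbation size, which is what a first-order differential predicts.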

For the Logistic function, the derivative is known to be: $\,\,\,\sigma' = \sigma - \sigma^2$.
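For completeness, this identity follows from differentiating the definition directly. Here is a short derivation for scalar $u$:
$$\eqalign{ \sigma(u) &= \frac{1}{1+e^{-u}} \cr \sigma'(u) &= \frac{e^{-u}}{(1+e^{-u})^2} \cr &= \frac{1}{1+e^{-u}}\cdot\frac{e^{-u}}{1+e^{-u}} \cr &= \sigma(u)\,\big(1-\sigma(u)\big) \cr &= \sigma - \sigma^2 \cr }$$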


Now let's rewrite your objective in terms of the Logistic function and the Frobenius product (denoted by a colon, ${\bf A}:{\bf B}={\rm tr}({\bf A}^T{\bf B})$), then find its differential. Here ${\bf b}$ stands for your $\vec{\beta}$ and $\sigma$ is shorthand for $\sigma({\bf Wx})$. The steps use two rearrangement rules: ${\bf a}:({\bf b}\circ{\bf c})=({\bf a}\circ{\bf b}):{\bf c}$ and ${\bf a}:(d{\bf W}\,{\bf x})={\bf a}\,{\bf x}^T:d{\bf W}$.
$$\eqalign{ f &= \sigma({\bf Wx})^T{\bf b} \cr &= \sigma^T{\bf b} \cr &= {\bf b}:\sigma \cr\cr df &= {\bf b}:d\sigma \cr &= {\bf b}:(\sigma'\circ d({\bf Wx})) \cr &= ({\bf b}\circ\sigma'):(d{\bf W}\,{\bf x}) \cr &= ({\bf b}\circ\sigma')\,{\bf x}^T:d{\bf W} \cr &= ({\bf b}\circ\sigma-{\bf b}\circ\sigma\circ\sigma)\,{\bf x}^T:d{\bf W} \cr }$$ Since $df=(\frac{\partial f}{\partial {\bf W}}:d{\bf W}),\,$ the gradient is $$\eqalign{ \frac{\partial f}{\partial {\bf W}} &= ({\bf b}\circ\sigma-{\bf b}\circ\sigma\circ\sigma)\,{\bf x}^T \cr }$$

In the case that the scalar function is the identity function, i.e. $S(x)=x$, the derivative is unity, $S'(x)=1$. When applied element-wise to a matrix argument, the result is a matrix of all ones, which just happens to be the identity element for the Hadamard product. So $({\bf b}\circ\sigma')$ would be replaced by $({\bf b}\circ{\bf 1}={\bf b})$ in the differential, yielding a gradient of $$\eqalign{ \frac{\partial f}{\partial {\bf W}} &= {\bf b}\,{\bf x}^T \cr }$$ which is the result that you already knew.
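Both gradients can be checked against finite differences. The sketch below assumes NumPy and represents ${\bf x}$ and ${\bf b}$ as 1-D arrays rather than column vectors. The helper names (`sigmoid`, `f`, `grad_f`, `numeric_grad`) are illustrative and not part of the question. It compares the closed-form gradient with a central-difference estimate, for both the logistic case and the identity case.

```python
import numpy as np

def sigmoid(u):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-u))

def f(W, x, b):
    """Objective f = sigma(W x)^T b, a scalar."""
    return sigmoid(W @ x) @ b

def grad_f(W, x, b):
    """Closed-form gradient (b o (sigma - sigma o sigma)) x^T, shape m x n."""
    s = sigmoid(W @ x)
    return np.outer(b * (s - s * s), x)

def numeric_grad(func, W, eps=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            G[i, j] = (func(W + E) - func(W - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n = 3, 4                    # W is m x n, x has length n, b has length m
W = rng.normal(size=(m, n))
x = rng.normal(size=n)
b = rng.normal(size=m)

# Logistic case: closed form vs. finite differences.
sigmoid_error = np.max(np.abs(grad_f(W, x, b) - numeric_grad(lambda V: f(V, x, b), W)))

# Identity case S(u) = u: the gradient should reduce to b x^T.
identity_error = np.max(np.abs(np.outer(b, x) - numeric_grad(lambda V: (V @ x) @ b, W)))

print(sigmoid_error, identity_error)
```

Both errors should be tiny relative to the gradient entries. A check like this is a cheap way to catch transposition or ordering mistakes when deriving matrix gradients by hand.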