Derivative of the derivative of a neural network w.r.t. itself


I'm trying to find the following derivative:

\begin{equation} \frac{\partial^2 f}{\partial f\,\partial x} \end{equation} where $f$ is a neural network and $x$ is an input. To be concrete, say $f$ is a 2-layer neural net with 1-D input and 1-D output. Then we can write $f$ as: $$ f(x) = \tanh(xW_1 + b_1)W_2 + b $$ I derived a formula for $\frac{\partial f}{\partial x}$ and got the following: $$ \frac{\partial f}{\partial x} = W_2^T\times (W_1^T\circ \tanh'(xW_1 + b_1)) $$ where $\circ$ is the element-wise (Hadamard) product. Now I need to take this derivative w.r.t. $f$ itself. Any ideas on how to proceed from here?
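As a sanity check on the hand-derived $\frac{\partial f}{\partial x}$, here is a minimal scalar sketch (hidden width 1, so all weights are plain numbers; the weight values are arbitrary placeholders, not from the question) that compares the formula against a central finite difference:

```python
import math

# Scalar instance of the 2-layer net: f(x) = tanh(x*W1 + b1)*W2 + b
W1, b1, W2, b = 0.7, -0.2, 1.3, 0.5  # arbitrary example weights

def f(x):
    return math.tanh(x * W1 + b1) * W2 + b

def df_dx(x):
    # Hand-derived gradient: W2 * W1 * tanh'(x*W1 + b1),
    # with tanh'(u) = 1 - tanh(u)**2
    u = x * W1 + b1
    return W2 * W1 * (1.0 - math.tanh(u) ** 2)

# Central finite-difference check of the closed form
x0, h = 0.4, 1e-6
fd = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(abs(fd - df_dx(x0)) < 1e-6)  # True
```

The same check works in the vector case by comparing each component of the analytic Jacobian against a perturbed forward pass.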


Edit: As @NinadMunshi pointed out, I intend to find the following derivative after a change of variables: $$ z = \tanh(xW_1 + b_1)W_2 + b\\ g = \frac{\partial f}{\partial x} = W_2^T\times (W_1^T\circ \tanh'(xW_1 + b_1))\\ \frac{\partial g}{\partial z} = ? $$
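In the 1-D case, $z$ and $g$ are both scalar functions of the single input $x$, so one way to read $\frac{\partial g}{\partial z}$ is as the ratio of their derivatives along $x$. A minimal numerical sketch of that reading (weights are arbitrary placeholders, not from the question):

```python
import math

# Scalar change of variables: z = tanh(x*W1 + b1)*W2 + b,  g = dz/dx.
# Since z and g are both functions of the single scalar x,
# dg/dz = (dg/dx) / (dz/dx), estimated here by central differences.
W1, b1, W2, b = 0.7, -0.2, 1.3, 0.5  # arbitrary example weights

def u(x):
    return x * W1 + b1

def z(x):
    return math.tanh(u(x)) * W2 + b

def g(x):
    # analytic dz/dx in the scalar case
    return W1 * W2 * (1.0 - math.tanh(u(x)) ** 2)

x0, h = 0.4, 1e-6
dg_dx = (g(x0 + h) - g(x0 - h)) / (2 * h)
dz_dx = (z(x0 + h) - z(x0 - h)) / (2 * h)
print(dg_dx / dz_dx)  # numerical estimate of dg/dz at x0
```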

Answer:

$\textbf{Hint}$: Consider the case where $W_i,b_i\in\Bbb{R}$. Then the solution is simply

$$\frac{\partial^2 f}{\partial f\,\partial x} = \frac{\partial g}{\partial z} = \frac{\mathrm{d}g/\mathrm{d}x}{\mathrm{d}z/\mathrm{d}x} = -2W_1\tanh(W_1x+b_1)$$

(Note that the $W_2$ factor cancels in the ratio.) Now, how would you adapt this answer from scalars to vector functions that act component-wise?
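A quick finite-difference sanity check of the scalar hint (weights are arbitrary placeholders; note that the $W_2$ factor cancels out of the ratio):

```python
import math

# Scalar hint check: with z = tanh(W1*x + b1)*W2 + b and
# g = dz/dx = W1*W2*(1 - tanh(W1*x + b1)**2), the ratio
# (dg/dx)/(dz/dx) collapses to -2*W1*tanh(W1*x + b1).
W1, b1, W2, b = 0.7, -0.2, 1.3, 0.5  # arbitrary example weights

def z(x):
    return math.tanh(W1 * x + b1) * W2 + b

def g(x):
    return W1 * W2 * (1.0 - math.tanh(W1 * x + b1) ** 2)

x0, h = 0.4, 1e-6
numerical = (g(x0 + h) - g(x0 - h)) / (z(x0 + h) - z(x0 - h))
closed_form = -2.0 * W1 * math.tanh(W1 * x0 + b1)
print(abs(numerical - closed_form) < 1e-6)  # True
```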