I would like to calculate the Jacobian and Hessian matrices of a feed-forward neural network output with respect to a given input vector, $I$:
$$A=W_n \times tansig(W_{n-1} \times ... \times tansig(W_1 \times I + B_1)+ ... +B_{n-1})+B_n$$ where
- $I$ is the input vector
- $W_i$ is the weight matrix of layer $i$
- $B_i$ is the bias vector of layer $i$
- $tansig$ is the activation function: $tansig(x) = \frac{2}{1 + e^{-2x}}-1$
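To make the setup concrete, here is a minimal numpy sketch of the forward pass (the layer sizes, weights, and input below are arbitrary illustrative choices; note that $tansig$ is numerically identical to $\tanh$):

```python
import numpy as np

def tansig(x):
    # tansig(x) = 2 / (1 + exp(-2x)) - 1, which equals tanh(x)
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def forward(Ws, Bs, I):
    """Forward pass: tansig on every layer except the last (linear) one."""
    a = I
    for W, B in zip(Ws[:-1], Bs[:-1]):
        a = tansig(W @ a + B)
    return Ws[-1] @ a + Bs[-1]

# Example: 3 inputs -> 5 hidden units -> 2 outputs (arbitrary sizes)
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((5, 3)), rng.standard_normal((2, 5))]
Bs = [rng.standard_normal(5), rng.standard_normal(2)]
I = rng.standard_normal(3)
A = forward(Ws, Bs, I)
```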
By applying the chain rule, we calculate the Jacobian matrix as follows:
Let $f_1 = tansig(W_1 \times I + B_1)$
$f_2 = tansig(W_2 \times f_1 + B_2)$
$...$
$f_{n-1} = tansig(W_{n-1} \times f_{n-2} + B_{n-1})$
$$ \to A = W_n \times f_{n-1}(f_{n-2} ... (f_1)...)+B_n$$ $$ \to Jacobian(A) = W_n \times \frac{\partial f_{n-1}}{\partial f_{n-2}} \times \frac{\partial f_{n-2}}{\partial f_{n-3}} \times ... \times \frac{\partial f_{1}}{\partial I}$$ The derivative of $f_i$ with respect to $f_{i-1}$ is: $$ \frac{\partial f_i}{\partial f_{i-1}} = diag\bigl(dtansig(W_i \times f_{i-1} + B_i)\bigr) \times W_i$$ where $dtansig$ is the first derivative of the activation $tansig$: $$dtansig(x) = \frac{4e^{-2x}}{(1 + e^{-2x})^2} = 1 - tansig(x)^2$$
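As a sanity check, the derivative of the standard $tansig$ (i.e. $\tanh$) satisfies $dtansig(x) = 1 - tansig(x)^2$, which can be verified against a central finite difference (the sample points and step size below are arbitrary choices):

```python
import numpy as np

def tansig(x):
    # standard tansig, identical to tanh
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def dtansig(x):
    # 4 e^{-2x} / (1 + e^{-2x})^2  ==  1 - tansig(x)^2
    return 1.0 - tansig(x) ** 2

x = np.linspace(-3.0, 3.0, 61)
h = 1e-6
# central finite difference of tansig
fd = (tansig(x + h) - tansig(x - h)) / (2.0 * h)
```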
Substituting the derivative of each $f_i$ into the Jacobian matrix, we have:
$$ \to Jacobian(A) = W_n \times diag\bigl(dtansig(W_{n-1} \times f_{n-2} + B_{n-1})\bigr) \times W_{n-1} \times ...\times diag\bigl(dtansig(W_1 \times I + B_1)\bigr) \times W_1$$
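A numerical sketch of this Jacobian product, checked column by column against finite differences of the forward pass (the network sizes and weights are arbitrary illustrative choices):

```python
import numpy as np

def tansig(x):
    return np.tanh(x)  # tansig(x) = 2/(1+e^{-2x}) - 1 = tanh(x)

def dtansig(x):
    return 1.0 - np.tanh(x) ** 2

def forward(Ws, Bs, I):
    a = I
    for W, B in zip(Ws[:-1], Bs[:-1]):
        a = tansig(W @ a + B)
    return Ws[-1] @ a + Bs[-1]

def jacobian(Ws, Bs, I):
    # J = W_n diag(dtansig(z_{n-1})) W_{n-1} ... diag(dtansig(z_1)) W_1
    a = I
    J = np.eye(len(I))
    for W, B in zip(Ws[:-1], Bs[:-1]):
        z = W @ a + B
        J = np.diag(dtansig(z)) @ W @ J
        a = tansig(z)
    return Ws[-1] @ J

rng = np.random.default_rng(1)
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((4, 4)),
      rng.standard_normal((2, 4))]
Bs = [rng.standard_normal(4), rng.standard_normal(4), rng.standard_normal(2)]
I = rng.standard_normal(3)

J = jacobian(Ws, Bs, I)

# Finite-difference check, one input coordinate at a time
h = 1e-6
J_fd = np.column_stack([
    (forward(Ws, Bs, I + h * e) - forward(Ws, Bs, I - h * e)) / (2.0 * h)
    for e in np.eye(3)])
```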
Now, I am having a very hard time deriving $Hessian(A)$. With your knowledge and expertise, could you please help me find the Hessian matrix of the given neural network output, $A$?
Thank you very much!
Disclaimer: I am giving it a try, but I may have made some mistakes.
First of all,
$\frac{d\,tansig(x)}{dx} = 1 - T^2$, where $T = tansig(x)$.
This is because $tansig(x) = \frac{2e^{2x}-1-e^{2x}}{1+e^{2x}} = \frac{e^{2x}-1}{1+e^{2x}}$,
and so $\frac{d\,tansig(x)}{dx} = \frac{4e^{2x}}{(1+e^{2x})^2} = (1-T)(1+T) = 1 - T^2$.
So, W' = $\frac{dloss}{dW}$ = (dout*$(1-T^2)$).dot(X.T),
where,
'dout' is the gradient flowing backwards (I use numpy notation here: '*' means elementwise multiplication, A.dot(B) means matrix multiplication, and X.T is the transpose of X),
and T = tansig(WX+b)
from this we can get,
$\frac{d(W')}{dW}$ = ((dout*$(-2T)(1-T^2)$).dot(X.T)).dot(X.T); since T is tansig(WX+b), we pick up another (.).dot(X.T) here.
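A minimal numpy sketch of the gradient expression above for a single layer, using the standard tansig/tanh derivative $1-T^2$ and checked against a finite difference (the layer sizes and data are hypothetical, and the loss is a toy sum of outputs, so `dout` is all ones):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 3))
b = rng.standard_normal((4, 1))
X = rng.standard_normal((3, 5))   # 5 samples, hypothetical data

def loss(W):
    # toy scalar loss: sum of the layer's outputs
    return np.tanh(W @ X + b).sum()

T = np.tanh(W @ X + b)
dout = np.ones_like(T)                 # dloss/dT for the sum-loss
dW = (dout * (1.0 - T ** 2)).dot(X.T)  # W' = (dout * (1 - T^2)).dot(X.T)

# finite-difference check of one entry of W'
h = 1e-6
E = np.zeros_like(W)
E[0, 0] = h
fd = (loss(W + E) - loss(W - E)) / (2.0 * h)
```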
Hope it helps.