I have to compute the following mixed second derivative:
$$ \partial _{x_i} \nabla_W \sigma(f(W,x))$$
where $W = (W_1, W_2, \dots, W_L)$ is the tuple of weight matrices, $f(W,x)$ is a *linear* neural network, i.e. $f(W,x) = xW_1W_2 \cdots W_L$, $x_i$ is the $i$-th entry of the vector $x$, and $\sigma$ is the softmax activation function,
$$\sigma(x)_i = \frac {e^{x_i}}{\sum_j e^{x_j}}.$$
I know that $\frac{\partial \sigma}{\partial x}$ is the matrix $J(x)$ with entries $J_{i,j}(x) = \sigma(x)_i(\delta_{i,j} - \sigma(x)_j)$, hence
$$\nabla_W \sigma(f(W,x)) = J(f(W,x))\cdot x $$
First of all, is it right?
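To sanity-check this at least in the single-matrix case $L = 1$, I wrote the small `jax` script below. The dimensions and the encoding of "$J \cdot x$" as the third-order tensor with entries $G_{abc} = J_{ac}\,x_b$ are my own guesses, not something I found in a reference:

```python
import jax
import jax.numpy as jnp

d, k = 4, 3                                   # small arbitrary dimensions
x = jax.random.normal(jax.random.PRNGKey(0), (d,))
W = jax.random.normal(jax.random.PRNGKey(1), (d, k))

def f(W, x):
    # single-matrix case L = 1: f(W, x) = x W
    return x @ W

def softmax_jac(z):
    # J_{ij}(z) = sigma(z)_i (delta_{ij} - sigma(z)_j)
    s = jax.nn.softmax(z)
    return jnp.diag(s) - jnp.outer(s, s)

# ground truth by autodiff: shape (k, d, k), entries d sigma_a / d W_{bc}
G = jax.jacobian(lambda W: jax.nn.softmax(f(W, x)))(W)

# my claimed closed form "J(f(W, x)) . x", read as G_{abc} = J_{ac} x_b
J = softmax_jac(f(W, x))
G_claim = jnp.einsum('ac,b->abc', J, x)

print(jnp.allclose(G, G_claim, atol=1e-5))
```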
So now: how can I compute
$$\partial_{x_i}\left[J(f(W,x))\cdot x\right]?$$
Writing $z = f(W,x)$ for ease of notation, my guess would be to compute the matrix $A = \partial_z J(z)$ s.t.
$$A_{i,j} = \begin{cases} \sigma(z)_i\,(1-\sigma(z)_i)\,(1-2\sigma(z)_i) & \text{if } i = j \\ \sigma(z)_i\,\sigma(z)_j\,(\sigma(z)_i + \sigma(z)_j) & \text{if } i \neq j \end{cases}$$
where $A$ is the second derivative of the softmax function.
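To test this guess, I would compare it against autodiff. Here I read $A$ entrywise as $A_{i,j} = \partial^2 \sigma(z)_i / \partial z_j^2$, i.e. the slice $T_{i,j,j}$ of the full third-order derivative tensor $T$; that reading is my own assumption:

```python
import jax
import jax.numpy as jnp

z = jax.random.normal(jax.random.PRNGKey(0), (4,))
s = jax.nn.softmax(z)
n = z.shape[0]

# my guessed closed form for A
A_guess = jnp.where(
    jnp.eye(n, dtype=bool),
    s * (1 - s) * (1 - 2 * s),                    # i == j
    jnp.outer(s, s) * (s[:, None] + s[None, :]),  # i != j
)

# autodiff ground truth: T[i, j, m] = d^2 sigma_i / (dz_j dz_m)
T = jax.jacfwd(jax.jacrev(jax.nn.softmax))(z)
A_true = jnp.einsum('ijj->ij', T)                 # the slice d^2 sigma_i / dz_j^2

print(jnp.max(jnp.abs(A_guess - A_true)))
```

If the printed maximum deviation is not close to zero, then my closed form for $A$ is wrong.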
With this $A$, the product rule would give
$$\partial_{x_i}\left[J(z)\cdot x\right] = J(z)\cdot e_i + \left(A\cdot \partial_{x_i}z\right)\cdot x,$$
where $e_i = \partial_{x_i}x$ is the $i$-th standard basis vector.
I don't know whether this is right, and, even if it is, I don't know how to proceed from here. Does anyone have suggestions?
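In the meantime, here is the numerical check I would use for this last identity, again with `jax` in the $L = 1$ case and with the same tensor encoding of "$J \cdot x$" as above (the coordinate index $i = 2$ is arbitrary):

```python
import jax
import jax.numpy as jnp

d, k = 4, 3
x = jax.random.normal(jax.random.PRNGKey(0), (d,))
W = jax.random.normal(jax.random.PRNGKey(1), (d, k))
i = 2                                           # which x-coordinate to differentiate

def grad_W(x):
    # the (k, d, k) tensor G_{abc} = J(f(W, x))_{ac} x_b from the first check
    J = jax.jacobian(jax.nn.softmax)(x @ W)
    return jnp.einsum('ac,b->abc', J, x)

# left-hand side by autodiff: d G / d x_i
lhs = jax.jacobian(grad_W)(x)[..., i]

# right-hand side from the product rule
z = x @ W
J = jax.jacobian(jax.nn.softmax)(z)
T = jax.jacfwd(jax.jacrev(jax.nn.softmax))(z)   # T[a, c, m] = d^2 sigma_a / (dz_c dz_m)
dz = W[i]                                       # dz_m / dx_i = W_{im}, since z = x W
e_i = jnp.zeros(d).at[i].set(1.0)               # d x / d x_i
dJ = jnp.einsum('acm,m->ac', T, dz)             # chain rule: d J_{ac} / d x_i
rhs = dJ[:, None, :] * x[None, :, None] + jnp.einsum('ac,b->abc', J, e_i)

print(jnp.allclose(lhs, rhs, atol=1e-5))
```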
Thank you very much!!