Directional Derivative of Softmax

I'm trying to solve an exercise about computing derivatives involving the softmax function, but I've gotten stuck.

I have a deep neural network $f(W,x)$, where $W$ denotes the weights, and a fixed direction vector $\vec W = \nabla_W \frac{1}{2} \| f(W,x_i)-y\|$, where $x_i$ is a fixed data point. With $\sigma$ denoting the softmax function, the final output of the network is $\sigma(f(W,x))$.
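To make the setup concrete, here is a minimal sketch of how I compute $\vec W$ (in JAX; the two-layer toy network `f`, its shapes, and the values of $x_i$ and $y$ are placeholders for my actual model):

```python
import jax
import jax.numpy as jnp

# Toy stand-in for the real network f(W, x); W is a flat parameter vector here.
def f(W, x):
    W1 = W[:20].reshape(5, 4)   # assumed layer shapes, purely illustrative
    W2 = W[20:].reshape(3, 5)
    return W2 @ jnp.tanh(W1 @ x)

def loss(W, x, y):
    return 0.5 * jnp.linalg.norm(f(W, x) - y)

key = jax.random.PRNGKey(0)
W  = jax.random.normal(key, (35,))      # 20 + 15 parameters
xi = jnp.array([1.0, -0.5, 0.3, 0.8])   # the fixed point x_i
y  = jnp.array([0.0, 1.0, 0.0])

# The fixed direction vec(W) = ∇_W (1/2)‖f(W, x_i) − y‖
W_vec = jax.grad(loss)(W, xi, y)
```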

I have to compute the norm of the following derivative:

$$\partial_{\vec W} \big(\sigma(f(W,x_i)) - \sigma(f(W,x_i+r))\big),$$

where $r$ is a very small vector. Is there a formula that allows me to compute this?
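For reference, the softmax Jacobian itself has a closed form, which is what enters when expanding $\nabla_x\,\sigma(f(W,x))$ by the chain rule in my attempt below:

$$\frac{\partial \sigma_i(z)}{\partial z_j} = \sigma_i(z)\big(\delta_{ij} - \sigma_j(z)\big), \qquad J_\sigma(z) = \operatorname{diag}(\sigma(z)) - \sigma(z)\,\sigma(z)^\top,$$

so that $\partial_x\,\sigma(f(W,x)) = J_\sigma(f(W,x))\,\partial_x f(W,x)$.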

Here is my attempt so far, but I get stuck:

$$\partial_{\vec W} \big(\sigma(f(W,x)) - \sigma(f(W,x+r))\big) \approx \partial_{\vec W}\,\big\langle \nabla_x\,\sigma(f(W,x)),\, r\big\rangle = \big\langle \vec W,\ \nabla_W\,\big\langle \nabla_x\,\sigma(f(W,x)),\, r\big\rangle\big\rangle$$

$$= \sum_i r_i\, \big\langle \nabla_{\vec W}\,\sigma(f(W,x)),\ \partial_{x_i} \nabla_{\vec W}\,\sigma(f(W,x)) \big\rangle. \tag{1}$$
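To sanity-check (1), I evaluated both sides numerically, continuing the toy setup above: the exact directional derivative via a Jacobian-vector product, and the mixed-derivative expression via a nested JVP (note the sign: $\sigma(f(W,x)) - \sigma(f(W,x+r)) \approx -\langle \nabla_x\,\sigma(f(W,x)),\, r\rangle$ to first order in $r$):

```python
def g(W, x):
    return jax.nn.softmax(f(W, x))   # σ(f(W, x)), the network's final output

r = 1e-3 * jax.random.normal(jax.random.PRNGKey(1), xi.shape)  # small perturbation

# Exact directional derivative ∂_{vec(W)} of the difference, via forward mode:
diff = lambda W_: g(W_, xi) - g(W_, xi + r)
_, dD = jax.jvp(diff, (W,), (W_vec,))

# Mixed derivative: ∂ in x along r first, then ∂ in W along vec(W):
dir_x = lambda W_: jax.jvp(lambda x_: g(W_, x_), (xi,), (r,))[1]
_, mixed = jax.jvp(dir_x, (W,), (W_vec,))

print(jnp.linalg.norm(dD))          # the norm I am after
print(jnp.linalg.norm(dD + mixed))  # ≈ 0, since dD ≈ -mixed to first order in r
```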

I thought about proceeding by applying Cauchy–Schwarz both to the norm and to the expected value, but I'm not sure how to keep going.
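Concretely, writing (1) as $\sum_i r_i\,\langle a,\, b_i\rangle$ with $a = \nabla_{\vec W}\,\sigma(f(W,x))$ and $b_i = \partial_{x_i}\nabla_{\vec W}\,\sigma(f(W,x))$, the Cauchy–Schwarz step I have in mind would give

$$\Big|\sum_i r_i\,\langle a,\, b_i\rangle\Big| \;\le\; \|a\| \sum_i |r_i|\,\|b_i\| \;\le\; \|a\|\,\|r\|\,\Big(\sum_i \|b_i\|^2\Big)^{1/2},$$

but this still leaves the mixed second-derivative factors $\|b_i\|$, which I don't know how to control.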

How can I estimate the norm of (1)? I'd really appreciate any help, thank you!