Recently, an improvement to the differentiable neural computer (DNC) was proposed in the paper *Improving Differentiable Neural Computers through Memory Masking, De-allocation, and Link Distribution Sharpness Control*.
On page 4, equation (2) defines an activation function for a vector $\textbf{d} \in [0,1]^N$:
$$S(\textbf{d}, s)_i=\frac{(\textbf{d}_i)^s}{\sum_j{(\textbf{d}_j)^s}}$$
where the power $s \in [0, \infty)$.
Could someone help me find its derivative with respect to $\textbf{d}_i$ and with respect to $s$? It is probably similar to the softmax derivative.
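For concreteness, here is a minimal NumPy sketch of the function (the name `sharpen` is my own, not from the paper), showing that $s=1$ gives plain normalization while larger $s$ concentrates mass on the largest entry:

```python
import numpy as np

def sharpen(d, s):
    """Normalized sharpening: S(d, s)_i = d_i^s / sum_j d_j^s (eq. 2)."""
    p = d ** s
    return p / p.sum()

d = np.array([0.2, 0.5, 0.9])
print(sharpen(d, 1.0))  # plain normalization: d / d.sum()
print(sharpen(d, 5.0))  # larger s pushes the distribution toward argmax
```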
I think you will find
$$\begin{array}{ccl} \dfrac{\partial}{\partial \mathbf{d}_i} S(\mathbf{d}, s)_i & =&\dfrac{s}{\mathbf{d}_i} \left(S(\mathbf{d}, s)_i - S(\mathbf{d}, s)_i^2 \right) \\ \dfrac{\partial}{\partial \mathbf{d}_k} S(\mathbf{d}, s)_i & =&-\dfrac{s}{\mathbf{d}_k} S(\mathbf{d}, s)_i S(\mathbf{d}, s)_k \qquad\text{ for } k \not=i \\ \dfrac{\partial}{\partial s} S(\mathbf{d}, s)_i & =& S(\mathbf{d}, s)_i\left(\ln\left(\mathbf{d}_i\right) -\dfrac{\sum_j \mathbf{d}_j^s \ln\left(\mathbf{d}_j\right)}{\sum_j \mathbf{d}_j^s } \right) \\ \end{array}$$
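You can verify these formulas numerically with a central finite-difference check. This is just a sketch (function names `sharpen`, `grad_d`, `grad_s` are mine, not from the paper):

```python
import numpy as np

def sharpen(d, s):
    """S(d, s)_i = d_i^s / sum_j d_j^s."""
    p = d ** s
    return p / p.sum()

def grad_d(d, s):
    """Analytic Jacobian J[i, k] = dS_i / dd_k from the formulas above."""
    S = sharpen(d, s)
    J = -np.outer(S, s * S / d)          # off-diagonal: -(s/d_k) S_i S_k
    J[np.diag_indices_from(J)] = (s / d) * (S - S**2)  # diagonal term
    return J

def grad_s(d, s):
    """Analytic dS_i / ds."""
    S, p = sharpen(d, s), d ** s
    return S * (np.log(d) - (p * np.log(d)).sum() / p.sum())

rng = np.random.default_rng(0)
d = rng.uniform(0.1, 1.0, size=5)
s, eps = 3.0, 1e-6

# numerical Jacobian w.r.t. d via central differences
J_num = np.empty((5, 5))
for k in range(5):
    dp, dm = d.copy(), d.copy()
    dp[k] += eps
    dm[k] -= eps
    J_num[:, k] = (sharpen(dp, s) - sharpen(dm, s)) / (2 * eps)

assert np.allclose(grad_d(d, s), J_num, atol=1e-5)
# numerical derivative w.r.t. s
dS_ds_num = (sharpen(d, s + eps) - sharpen(d, s - eps)) / (2 * eps)
assert np.allclose(grad_s(d, s), dS_ds_num, atol=1e-5)
print("gradient check passed")
```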