Recently, an improvement to the differentiable neural computer (DNC) was proposed in the paper *Improving Differentiable Neural Computers through Memory Masking, De-allocation, and Link Distribution Sharpness Control*.
On page 4, equation (2) defines an activation function for a vector $\textbf{d} \in [0,1]^N$:
$$S(\textbf{d}, s)_i=\frac{(\textbf{d}_i)^s}{\sum_j{(\textbf{d}_j)^s}}$$
where the power $s \in [0, \infty)$.
Could someone help me find its derivative with respect to $\textbf{d}_i$ and with respect to $s$? It is probably similar to the softmax derivative.
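For concreteness, here is a minimal NumPy sketch of the function (the name `sharpen` is my own, not from the paper), showing that $s=1$ gives plain normalization while larger $s$ concentrates mass on the largest entry:

```python
import numpy as np

def sharpen(d, s):
    """Normalized sharpening: S(d, s)_i = d_i^s / sum_j d_j^s (eq. 2)."""
    p = d ** s
    return p / p.sum()

d = np.array([0.2, 0.5, 0.9])
print(sharpen(d, 1.0))  # plain normalization: d / d.sum()
print(sharpen(d, 5.0))  # larger s pushes the distribution toward argmax
```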
I think you will find
$$\begin{array}{ccl} \dfrac{\partial}{\partial \mathbf{d}_i} S(\mathbf{d}, s)_i & =&\dfrac{s}{\mathbf{d}_i} \left(S(\mathbf{d}, s)_i - S(\mathbf{d}, s)_i^2 \right) \\ \dfrac{\partial}{\partial \mathbf{d}_k} S(\mathbf{d}, s)_i & =&-\dfrac{s}{\mathbf{d}_k} S(\mathbf{d}, s)_i S(\mathbf{d}, s)_k \qquad\text{ for } k \not=i \\ \dfrac{\partial}{\partial s} S(\mathbf{d}, s)_i & =& S(\mathbf{d}, s)_i\left(\ln\left(\mathbf{d}_i\right) -\dfrac{\sum_j \mathbf{d}_j^s \ln\left(\mathbf{d}_j\right)}{\sum_j \mathbf{d}_j^s } \right) \\ \end{array}$$
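You can verify these formulas numerically with a central finite-difference check. This is just a sketch (function names `sharpen`, `grad_d`, `grad_s` are mine, not from the paper):

```python
import numpy as np

def sharpen(d, s):
    """S(d, s)_i = d_i^s / sum_j d_j^s."""
    p = d ** s
    return p / p.sum()

def grad_d(d, s):
    """Analytic Jacobian J[i, k] = dS_i / dd_k from the formulas above."""
    S = sharpen(d, s)
    J = -np.outer(S, s * S / d)          # off-diagonal: -(s/d_k) S_i S_k
    J[np.diag_indices_from(J)] = (s / d) * (S - S**2)  # diagonal term
    return J

def grad_s(d, s):
    """Analytic dS_i / ds."""
    S, p = sharpen(d, s), d ** s
    return S * (np.log(d) - (p * np.log(d)).sum() / p.sum())

rng = np.random.default_rng(0)
d = rng.uniform(0.1, 1.0, size=5)
s, eps = 3.0, 1e-6

# numerical Jacobian w.r.t. d via central differences
J_num = np.empty((5, 5))
for k in range(5):
    dp, dm = d.copy(), d.copy()
    dp[k] += eps
    dm[k] -= eps
    J_num[:, k] = (sharpen(dp, s) - sharpen(dm, s)) / (2 * eps)

assert np.allclose(grad_d(d, s), J_num, atol=1e-5)
# numerical derivative w.r.t. s
dS_ds_num = (sharpen(d, s + eps) - sharpen(d, s - eps)) / (2 * eps)
assert np.allclose(grad_s(d, s), dS_ds_num, atol=1e-5)
print("gradient check passed")
```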