Derivative of the following function (similar to Softmax)


I am having a hell of a time trying to differentiate the following function with respect to $x$. Do you have any suggestions?

$f(x) = \frac{ w(i)^x}{ \sum\limits_{j} w(j)^x }$

where $w$ is a vector. Basically, I don't get how to handle the vector in the denominator. Any help would be appreciated.

Thanks!

Also follow up:

$g(\hat{x}) = \sum\limits_{i} a \cdot \hat{x}(i)$

What would be the derivative with respect to $\hat{x}$?

Again, thanks so much. I come from a CS background, so I'm still trying to wrap my head around the calculus of neural networks.


There are 3 answers below.


Note the following:

$ \frac{d}{dx} \left( \sum_{j=1}^{n} w(j)^x \right)$ = $\sum_{j=1}^{n} \left( \frac{d}{dx} \left( w(j)^x \right) \right)$ = $\sum_{j=1}^{n} w(j)^x \log(w(j))$

(Note that $w(j)^x$ is an exponential in $x$, not a power function of $x$, so the power rule does not apply here.)

You can differentiate the function with either the product rule or the quotient rule - use the above when you do.
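As a quick sanity check on the derivative of the denominator, the analytic formula $\frac{d}{dx}\sum_j w(j)^x = \sum_j w(j)^x \log(w(j))$ can be compared against a central finite difference. This is a minimal sketch; the values of `w` and `x` are hypothetical examples, and all entries of `w` are assumed positive so the logarithm is defined.

```python
import numpy as np

# Hypothetical example values; w must be elementwise positive.
w = np.array([0.5, 1.5, 2.0, 3.0])
x = 1.3

def h(x):
    """Denominator h(x) = sum_j w(j)^x."""
    return np.sum(w ** x)

# Analytic derivative: sum_j w(j)^x * log(w(j)).
analytic = np.sum(w ** x * np.log(w))

# Central finite difference approximation of h'(x).
eps = 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)

print(analytic, numeric)
```

The two printed values should agree to several decimal places.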


Note that $f(x) = \frac{g(x)}{h(x)}$, where $g(x) = w(i)^x$ and $h(x) = \sum_j w(j)^x$. You can compute $f'(x)$ using the quotient rule.

The key step is that \begin{align*} h'(x) &= \sum_j \frac{d}{dx} w(j)^x \\ &= \sum_j w(j)^x \log(w(j)). \end{align*}

(To compute the derivative of $y(x) = c^x$, where $c > 0$, you can note that $y(x) = e^{x \log c}$ and then use the chain rule, which yields \begin{align} y'(x) &= e^{x \log c} \log c \\ &= c^x \log c. \end{align})
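Putting the pieces together, the quotient rule gives $f'(x) = \frac{g'(x)h(x) - g(x)h'(x)}{h(x)^2}$ with $g'(x) = w(i)^x \log(w(i))$ and $h'(x)$ as above. The following sketch checks this against a finite difference; `w`, `i`, and `x` are hypothetical example values, with `w` assumed elementwise positive.

```python
import numpy as np

# Hypothetical example values; w must be elementwise positive.
w = np.array([0.5, 1.5, 2.0, 3.0])
i, x = 1, 0.7

def f(x):
    """f(x) = w(i)^x / sum_j w(j)^x."""
    return w[i] ** x / np.sum(w ** x)

# Quotient rule: f' = (g' h - g h') / h^2.
g, h = w[i] ** x, np.sum(w ** x)
g_prime = w[i] ** x * np.log(w[i])
h_prime = np.sum(w ** x * np.log(w))
analytic = (g_prime * h - g * h_prime) / h ** 2

# Central finite difference approximation of f'(x).
eps = 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)

print(analytic, numeric)
```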


As a warm-up exercise, consider the vector-valued function $$g=w^x=\exp(x\log(w))$$where exp() and log() are applied elementwise.

The differential of this function is $$\eqalign{ dg &= g\circ\log(w)\,dx \cr &= (g\circ b)\,dx \cr }$$where $\circ$ represents the Hadamard (elementwise) product and $b=\log(w)$.


Let's write your function in terms of $g$, then find its differential and derivative $$\eqalign{ f &= \frac{g}{g^T1} \cr df &= \frac{dg}{g^T1} - \frac{g\,(dg^T1)}{(g^T1)^2} \cr &= \frac{dg-f\,(dg^T1)}{g^T1} \cr &= \frac{g\circ b-f\,(g\circ b)^T1}{g^T1}\,dx \cr &= (f\circ b-ff^Tb)\,dx \cr\cr \frac{\partial f}{\partial x} &= \big({\rm Diag}(f)-ff^T\big)\,b \cr &= \big({\rm Diag}(f)-ff^T\big)\,\log(w) \cr }$$ If you just want the $i^{th}$ component of the vector result, multiply by the appropriate Cartesian basis vector $$\eqalign{ \frac{\partial f_i}{\partial x} &= e_i^T\,\big({\rm Diag}(f)-ff^T\big)\,\log(w) \cr }$$
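The Jacobian ${\rm Diag}(f)-ff^T$ is the familiar softmax Jacobian, here applied to $b=\log(w)$. The sketch below checks the vector result $\frac{\partial f}{\partial x} = \big({\rm Diag}(f)-ff^T\big)\log(w)$ componentwise against a finite difference; `w` and `x` are hypothetical example values, with `w` assumed elementwise positive.

```python
import numpy as np

# Hypothetical example values; w must be elementwise positive.
w = np.array([0.5, 1.5, 2.0, 3.0])
x = 0.9

def f(x):
    """Vector-valued f(x) = w^x / (1^T w^x), elementwise power."""
    p = w ** x
    return p / p.sum()

# Analytic derivative: (Diag(f) - f f^T) log(w).
fx = f(x)
analytic = (np.diag(fx) - np.outer(fx, fx)) @ np.log(w)

# Central finite difference of each component of f.
eps = 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)

print(analytic, numeric)
```

Picking off the $i$-th entry of `analytic` corresponds to multiplying by the basis vector $e_i^T$ as in the last line above.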