How to find the gradient of $f(x) = \frac{\exp(x)}{1^T\exp(x)}$, where $x$ is a vector, $1$ is the all-ones vector, and $\exp(x)$ is applied componentwise?


How to find the gradient of $$f(x) = \frac{\exp(x)}{1^T\exp(x)},$$ where $x \in \mathbb{R}^n$ is a vector, $1$ is the all-ones vector, and $\exp(x)$ is applied componentwise?


I think I am not computing it correctly, since the gradient (the Jacobian of $f$) should be an $n \times n$ matrix.

My attempt:

Let $\alpha(x) = 1^T\exp(x)$, so that \begin{align} f(x) &= \frac{\exp(x)}{1^T\exp(x)} = \alpha(x)^{-1} \exp(x) \end{align}

I started with the differential, but then got confused about how to proceed, since I get something strange: \begin{align} df(x) &= d\alpha(x)^{-1} \exp(x) + \alpha(x)^{-1} \ d\exp(x)\\ &\stackrel{??}{=} -\alpha(x)^{-2} \exp(x) dx \exp(x) + \alpha(x)^{-1} \exp(x) dx \end{align}

Any suggestions on how to proceed?


There are 2 best solutions below

0
On BEST ANSWER

Define the variables $$\eqalign{ e &= \exp(x), \qquad &E = {\rm Diag}(e), \qquad &de = e\odot dx &= E\,dx \\ \alpha &={\tt1}^Te,\qquad&&d\alpha={\tt1}^TE\,dx &= e^Tdx \\ }$$ where $\odot$ denotes the componentwise (Hadamard) product, which arises when differentiating componentwise functions. It is often eliminated in favor of multiplication by a diagonal matrix, as in $e\odot dx = E\,dx$.

Write the function using these variables. $$\eqalign{ f &= \alpha^{-1}e,\quad\quad F = {\rm Diag}(f) \\ }$$ Then calculate its differential and gradient. $$\eqalign{ df &= \alpha^{-1}de + e\,d\alpha^{-1} \\ &= \alpha^{-1}de - e\,\alpha^{-2}d\alpha \\ &= \alpha^{-1}E\,dx - \alpha^{-2}e(e^Tdx) \\ &= \left(F - ff^T\right)dx \\ \frac{\partial f}{\partial x} &= F - ff^T \\ }$$ The fourth line uses $\alpha^{-1}E = {\rm Diag}(\alpha^{-1}e) = F$ and $\alpha^{-2}ee^T = (\alpha^{-1}e)(\alpha^{-1}e)^T = ff^T$. The result is an $n\times n$ matrix, as expected.
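As a sanity check, the formula $F - ff^T$ can be compared numerically against a finite-difference approximation. The following is a minimal sketch in plain Python, not part of the derivation above: the function names (`softmax`, `softmax_jacobian`, `finite_difference_jacobian`), the test point, and the step size `h` are all illustrative assumptions. Subtracting `max(x)` before exponentiating is an extra numerical-stability step; it does not change $f$, because the common factor cancels in the ratio.

```python
import math

def softmax(x):
    """Return exp(x) / (1^T exp(x)) for a list of floats x."""
    m = max(x)  # assumption: shift by max(x) to avoid overflow; f is unchanged
    e = [math.exp(xi - m) for xi in x]
    alpha = sum(e)  # alpha = 1^T e (for the shifted exponentials)
    return [ei / alpha for ei in e]

def softmax_jacobian(x):
    """Return the n x n matrix Diag(f) - f f^T as nested lists."""
    f = softmax(x)
    n = len(f)
    return [[(f[i] if i == j else 0.0) - f[i] * f[j] for j in range(n)]
            for i in range(n)]

def finite_difference_jacobian(x, h=1e-6):
    """Central-difference approximation of d f_i / d x_j."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        fp, fm = softmax(xp), softmax(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x = [0.5, -1.0, 2.0]  # arbitrary test point
J_exact = softmax_jacobian(x)
J_approx = finite_difference_jacobian(x)

# Largest entrywise discrepancy between the formula and the approximation
max_err = max(abs(J_exact[i][j] - J_approx[i][j])
              for i in range(len(x)) for j in range(len(x)))
print(max_err)
```

The printed discrepancy should be tiny compared with the Jacobian entries themselves. Each row of `J_exact` also sums to (numerically) zero, which reflects the fact that ${\tt1}^Tf = 1$ for every $x$.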

1
On

Denote the components of $f$ as $f_i$, i.e.,

$$f(x) = \begin{pmatrix} f_1(x)\\\vdots\\ f_n(x)\end{pmatrix}$$

Take the derivative of each component w.r.t. each input,

$$\frac{\partial f_i}{\partial x_j} = \partial_{x_j} (\exp(x_i)/1^T\exp(x))$$

Break it into two cases: $i=j$ and $i\neq j$.

$i=j$ case:

$$\partial_{x_i}\left(\frac{\exp(x_i)}{\sum\limits_{k=1}^n\exp(x_k)}\right) = \frac{\exp(x_i)}{\sum\limits_{k=1}^n\exp(x_k)}\left(1 - \frac{\exp(x_i)}{\sum\limits_{k=1}^n\exp(x_k)}\right)=f_i(1-f_i)$$

$i\neq j$ case:

$$\partial_{x_j}\left(\frac{\exp(x_i)}{\sum\limits_{k=1}^n\exp(x_k)}\right) =\left(\frac{-\exp(x_j)\exp(x_i)}{\left(\sum\limits_{k=1}^n \exp(x_k)\right)^2}\right)=-f_jf_i$$
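The two cases can be collected into a single expression. The Kronecker delta $\delta_{ij}$ (equal to $1$ if $i=j$ and $0$ otherwise) is notation introduced here for compactness, not something used above:

$$\frac{\partial f_i}{\partial x_j} = f_i\left(\delta_{ij} - f_j\right), \qquad\text{so that}\qquad \frac{\partial f}{\partial x} = {\rm Diag}(f) - ff^T.$$

The diagonal entries are $f_i(1-f_i)$ and the off-diagonal entries are $-f_if_j$, so this $n\times n$ matrix agrees with the result $F - ff^T$ obtained from the differential in the other answer.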