Smooth approximation of maximum using softmax?


Look at the Wikipedia page for the softmax function (section "Smooth approximation of maximum"): https://en.wikipedia.org/wiki/Softmax_function

It says that the following is a smooth approximation to the maximum: $$ \mathcal{S}_{\alpha}\left(\left\{x_i\right\}_{i=1}^{n}\right) = \frac{\sum_{i=1}^{n}x_i e^{\alpha x_i}}{\sum_{i=1}^{n}e^{\alpha x_i}} $$

  • Is it an approximation to the Softmax?

    • If so, Softmax is already smooth; why do we create another smooth approximation?

    • If so, how do we derive it from Softmax?

  • I don't see why this might be better than Softmax for gradient descent updates.


This is a smooth approximation of the maximum function, not of softmax:

$$ \max\{x_1,\dots, x_n\} $$

where $\alpha$ controls the "softness" of the maximum. To derive it from softmax: $\mathcal{S}_{\alpha}$ is the average of the $x_i$ weighted by $\operatorname{softmax}(\alpha x)$, i.e. $\mathcal{S}_{\alpha} = \sum_i \operatorname{softmax}(\alpha x)_i \, x_i$, and as $\alpha \to \infty$ the weights concentrate on the largest $x_i$. A detailed explanation is available here: http://www.johndcook.com/blog/2010/01/13/soft-maximum/
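A minimal numerical sketch of this (assuming NumPy; the function name `soft_max_approx` is my own) shows the weighted average approaching the true maximum as $\alpha$ grows:

```python
import numpy as np

def soft_max_approx(x, alpha):
    """Smooth approximation of max(x): average of x weighted by softmax(alpha * x)."""
    x = np.asarray(x, dtype=float)
    # Subtract max(x) before exponentiating for numerical stability;
    # the common factor cancels in the normalization, so the weights are unchanged.
    w = np.exp(alpha * (x - x.max()))
    w /= w.sum()
    return float(np.dot(w, x))

x = [1.0, 2.0, 3.0]
for alpha in (1.0, 10.0, 100.0):
    print(alpha, soft_max_approx(x, alpha))  # approaches max(x) = 3 as alpha grows
```

For small $\alpha$ the result sits strictly below the true maximum (it is a convex combination of the $x_i$), and increasing $\alpha$ tightens the approximation.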

The soft maximum is better than the hard maximum for gradient-based optimization because it is a smooth function, while $\max$ is not smooth and does not have a well-defined gradient everywhere (for instance, at points where two arguments tie).
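To illustrate, here is a sketch (NumPy assumed; function names are my own) that evaluates the closed-form gradient $\partial \mathcal{S}_{\alpha}/\partial x_j = w_j\,(1 + \alpha (x_j - \mathcal{S}_{\alpha}))$, where $w = \operatorname{softmax}(\alpha x)$, at a tie point where the plain $\max$ has no unique gradient, and checks it against finite differences:

```python
import numpy as np

def soft_max_approx(x, alpha):
    """Smooth max: average of x weighted by softmax(alpha * x)."""
    x = np.asarray(x, dtype=float)
    w = np.exp(alpha * (x - x.max()))
    w /= w.sum()
    return float(np.dot(w, x))

def soft_max_grad(x, alpha):
    """Closed-form gradient: dS/dx_j = w_j * (1 + alpha * (x_j - S))."""
    x = np.asarray(x, dtype=float)
    w = np.exp(alpha * (x - x.max()))
    w /= w.sum()
    s = np.dot(w, x)
    return w * (1.0 + alpha * (x - s))

# At x[0] == x[1], plain max is not differentiable; the soft maximum still is.
x = np.array([2.0, 2.0, 1.0])
eps = 1e-6
num = np.array([
    (soft_max_approx(x + eps * e, 5.0) - soft_max_approx(x - eps * e, 5.0)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(num - soft_max_grad(x, 5.0))))  # near machine precision
```

A convenient sanity check on the formula: the gradient components always sum to 1, since $\sum_j w_j = 1$ and $\sum_j w_j (x_j - \mathcal{S}_{\alpha}) = 0$.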