What happens to the softmax function when the input is multiplied by a very large scalar?


The softmax function is defined as $S: \mathbb{R}^n \to \mathbb{R}^n$, where $S(x)_i= \frac{e^{x_i}}{\sum_j e^{x_j}}$. Now consider multiplying $x$ by a scalar $c$, so that $S(cx)_i=\frac{e^{cx_i}}{\sum_j e^{cx_j}}$. What happens when $c$ gets arbitrarily large?

Best answer

For any fraction $a/b$ you know that $(a/b)^c = a^c/b^c$, and that increasing the denominator of a positive fraction makes it smaller. You also know that $(a_1 +a_2 + \cdots +a_n)^c \geq a_1^c + a_2^c + \cdots + a_n^c$ when all the $a_i \geq 0$ and $c \geq 1$. Putting these together with $e^x > 0 \ \forall x\in {\mathbb R}$, you get for $c \geq 1$: $$ \left( \frac{e^{x_i}}{\sum_j e^{x_j}} \right)^c = \frac{e^{cx_i}}{(\sum_j e^{x_j})^c} \leq \frac{e^{cx_i}}{\sum_j e^{cx_j}} $$ In other words, $\left(S(x)_i\right)^c \leq S(cx)_i$.
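The inequality above can be spot-checked numerically. This is only a sketch, not a proof: it assumes NumPy is available, and the helper names `softmax` and `inequality_holds` are made up for this illustration.

```python
import numpy as np

def softmax(x):
    """Softmax of a 1-D array, shifted by the max for numerical stability."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())  # subtracting a constant does not change the result
    return e / e.sum()

def inequality_holds(x, c, tol=1e-12):
    """Check S(x)_i ** c <= S(c x)_i for every i (hypothetical helper)."""
    lhs = softmax(x) ** c
    rhs = softmax(c * np.asarray(x, dtype=float))
    return bool(np.all(lhs <= rhs + tol))

# Try a few random inputs and a few values of c >= 1.
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.normal(size=4)
    for c in (1, 2, 5, 20):
        assert inequality_holds(x, c)
```

The small tolerance `tol` only absorbs floating-point rounding, which matters at $c = 1$ where the two sides are equal.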

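Now notice that the numerator of the softmax function also appears as one of the terms in the denominator. You can thus divide numerator and denominator by $e^{cx_i}$ to get $$S(cx)_i = \frac{1}{1+ \sum_{j\not= i} e^{c(x_j-x_i)}} $$ As $c$ increases, each term $e^{c(x_j-x_i)}$ tends to $0$ if $x_j \lt x_i$, stays equal to $1$ if $x_j = x_i$, and grows without bound if $x_j > x_i$. So $S(cx)_i$ tends to $1$ if $x_j \lt x_i$ for all $j\not=i$, and tends to $0$ if any $x_j > x_i$. If $x_i$ is the maximum but is shared by $k$ entries in total, $S(cx)_i$ tends to $1/k$. In short, for large $c$ the softmax approaches an indicator of the largest entry (a "hard" argmax), split evenly among ties.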
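To see the limiting behaviour concretely, here is a minimal numerical sketch. It assumes NumPy; the function name `softmax` and the example vectors are chosen only for illustration. The maximum is subtracted before exponentiating so that large $c$ does not overflow, which leaves the softmax value unchanged.

```python
import numpy as np

def softmax(x):
    """Softmax of a 1-D array, shifted by the max for numerical stability."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())  # same value as exp(x) / sum(exp(x)), but no overflow
    return e / e.sum()

# A vector with a unique largest entry: the mass moves onto that entry.
x = np.array([1.0, 2.0, 3.0])
for c in (1, 10, 100, 1000):
    print(c, softmax(c * x))

# A vector whose maximum is shared by two entries: the mass splits evenly.
tied = np.array([3.0, 1.0, 3.0])
print(softmax(1000 * tied))
```

As $c$ grows, the printed vectors for `x` move toward $(0, 0, 1)$, while the tied example moves toward $(1/2, 0, 1/2)$.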