For a vector $x \in \mathbb{R}^d$, the scaled softmax $S:\mathbb{R}^d\times \mathbb{R}\rightarrow \mathbb{R}^d$ is defined as
$$ S(x;c)_i = \frac{e^{c\cdot x_i}}{\sum_{k=1}^{d} e^{c\cdot x_k}} $$
where $c$ is a scaling constant (the ordinary softmax is the case $c = 1$).
Since $e^x$ is increasing and unbounded, as $c$ grows $S(x;c)$ puts more and more of its weight on the maximum entry. As $c \rightarrow \infty$, $S(x;c)$ should tend to a one-hot vector with a 1 at the position of the maximum element (assuming the maximum is unique). This is my intuition, but how do I prove it?
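To see the intuition numerically before proving it, here is a quick sketch (the vector `x` is an arbitrary example; NumPy is assumed):

```python
import numpy as np

def softmax(x, c=1.0):
    """Scaled softmax S(x; c). Subtracting max(x) before exponentiating
    does not change the value but keeps the exponentials finite."""
    e = np.exp(c * (x - np.max(x)))
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
for c in (1.0, 10.0, 100.0):
    # As c grows, the distribution concentrates on the largest entry.
    print(c, softmax(x, c))
```

For large $c$ the printed vector is numerically indistinguishable from the one-hot vector $(0, 0, 1)$.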
Suppose $x \in \mathbb{R}^n$, and assume the maximum entry of $x$ is attained at a unique index (otherwise the limit spreads the mass evenly over the maximizers rather than being one-hot). The unscaled softmax is
$$ S(x;1)_i = \frac{e^{x_i}}{\sum_{k=1}^{n} e^{x_k}} $$
and scaling by the constant $c$ gives
$$ S(x;c)_i = \frac{e^{c x_i}}{\sum_{k=1}^{n} e^{c x_k}} $$
Let $\hat x = \max_i x_i$ and multiply the numerator and denominator of $S(x;c)_i$ by $e^{-\hat x c}$:
$$\lim_{c \to \infty} S(x;c)_i = \lim_{c \to \infty} \frac{e^{-c(\hat x - x_i)}}{\sum_{k=1}^{n} e^{-c(\hat x - x_k)}}$$
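As an aside, multiplying top and bottom by $e^{-\hat x c}$ is exactly the max-subtraction trick used to evaluate softmax stably in floating point; a minimal sketch of the difference (NumPy assumed, `x` an illustrative example):

```python
import numpy as np

def softmax_naive(x):
    # Direct translation of the formula: overflows for large entries.
    e = np.exp(x)
    return e / e.sum()

def softmax_stable(x):
    # Multiply numerator and denominator by exp(-max(x)): the largest
    # exponent becomes 0, so np.exp never overflows.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
# softmax_naive(x) produces inf/inf = nan; softmax_stable(x) is finite.
```

Because softmax is invariant under shifting all entries by a constant, the stable version returns the same values as the naive one whenever the latter does not overflow.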
Notice that $\Delta_i = \hat x - x_i > 0$ if $x_i \neq \hat x$, and $\Delta_i = 0$ if $x_i = \hat x$, which by the uniqueness assumption happens for exactly one index; that index contributes the single term $e^{0} = 1$ to the denominator.
$$\implies \lim_{c \to \infty} S(x;c)_i = \begin{cases} \lim_{c \to \infty} \dfrac{e^{-c\Delta_i}}{\left(\sum_{x_k \neq \hat x} e^{-c\Delta_k}\right) + 1} & \text{if } x_i \neq \hat x\\[8pt] \lim_{c \to \infty} \dfrac{1}{\left(\sum_{x_k \neq \hat x} e^{-c\Delta_k}\right) + 1} & \text{if } x_i = \hat x \end{cases} $$
Every term $e^{-c\Delta_k}$ with $\Delta_k > 0$ tends to $0$ as $c \to \infty$, so
$$ S(x;c)_i \to \begin{cases} \frac{0}{1} = 0 & \text{if } x_i \neq \hat x\\ \frac{1}{1} = 1 & \text{if } x_i = \hat x \end{cases}$$
Therefore $S(x;c)$ converges to the one-hot argmax vector as the scaling constant $c \to \infty$.