In machine learning, the sigmoid function is used to model probabilities when maximizing the likelihood. [Right?]
$$a(x) = \frac{1}{1+e^{-x}}\quad\text{(sigmoid function)}$$
This gives the probability of success. It is used when there are only two classes to classify, and in that case it is said to be equivalent to the softmax function.
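As a quick illustration (a minimal Python sketch of my own, not part of the original question; the name `sigmoid` is my choice):

```python
import math

def sigmoid(x):
    """a(x) = 1 / (1 + e^(-x)): maps any real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# A score of 0 gives probability 0.5; large positive scores approach 1.
print(sigmoid(0.0))  # 0.5
print(sigmoid(4.0))  # ~0.982
```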
Before defining the softmax function, suppose there is some linear function that gives every class a score based on the inputs; let's call it $L$.
$$p(\text{class}[i]) = \frac{e^{L(i)}}{e^{L(1)}+e^{L(2)}+e^{L(3)}+\cdots+e^{L(n)}}$$
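In code, that definition might look like this (a sketch; `scores` stands for the score list $L(1),\dots,L(n)$):

```python
import math

def softmax(scores):
    """p(class i) = e^(L(i)) / sum over j of e^(L(j)), for a list of class scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three probabilities, largest for the highest score
print(sum(probs))  # 1.0 up to floating-point error
```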
Now I'm struggling to prove that the two are the same.
Softmax function: if the linear scores are $Z_1, Z_2, \dots, Z_n$, then $$p(\text{class}[i]) = \frac{e^{Z_i}}{e^{Z_1}+e^{Z_2}+\cdots+e^{Z_n}}$$
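Here is a numerical sanity check of what I understand the claim to be (a sketch, assuming the equivalence means that two-class softmax over scores $Z_1, Z_2$ equals the sigmoid of the score difference $Z_1 - Z_2$; the specific scores are arbitrary test values):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two-class case: softmax of (z1, z2) versus sigmoid of the score difference.
z1, z2 = 1.7, -0.4
p_softmax = softmax([z1, z2])[0]
p_sigmoid = sigmoid(z1 - z2)
print(p_softmax, p_sigmoid)  # the two values agree numerically
```

Numerically the values match, but I am after the algebraic proof.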