Extension of binary classification to multi-class classification


Multi-class classification is a generalization of binary classification, the setting handled by logistic regression. In the binary case, an input should be mapped to either $0$ or $1$. Logistic regression therefore converts the output of a neural network, $\hat{y}$, into a probability of the positive class using the sigmoid function $$\sigma(\hat{y})=\frac{e^{\hat{y}}}{1+ e^{\hat{y}}}\tag{1}$$ where $\hat{y} \in \mathbb{R}$ is the output of the network.

On the other hand, multi-class classification decides among $n$ classes using the softmax function, defined componentwise as

$$\text{softmax}(\hat{\textbf{y}})_i=\frac{e^{\hat{\textbf{y}}_i}}{\sum_{j=1}^{n}e^{\hat{\textbf{y}}_j}}\tag{2}$$ where $\hat{\textbf{y}} \in \mathbb{R}^n$ is the output of the network.
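To make the two definitions concrete, here is a minimal numerical sketch of $(1)$ and $(2)$ (the NumPy implementation and the helper names `sigmoid`/`softmax` are just for illustration):

```python
import numpy as np

def sigmoid(y_hat):
    # Equation (1): maps a single real number to a value in (0, 1).
    return np.exp(y_hat) / (1.0 + np.exp(y_hat))

def softmax(y_hat):
    # Equation (2): maps a vector in R^n to a probability vector that sums to 1.
    exp_y = np.exp(y_hat)
    return exp_y / exp_y.sum()

print(sigmoid(1.3))                        # ~0.786
print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]
```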

Question: How can one algebraically manipulate $(1)$ to obtain $(2)$, or vice versa? Starting from $(1)$, how does one get rid of the $1$ in the denominator? Conversely, starting from $(2)$ with $n=2$, how does the $1$ in the denominator arise?

Accepted answer:

softmax is a function from $\mathbb{R}^n \to \mathbb{R}^n$ and gives "probabilities" for each of $n$ alternatives.

$\sigma$ is a function from $\mathbb{R} \to \mathbb{R}$ and gives "probability" for one alternative. But of course, it is really choosing between two alternatives. It's just that, if there are only two alternatives, giving only one value is sufficient because probabilities must sum to $1$. I.e., if $\sigma(y)$ is the probability for one choice, then the other choice must have probability $1 - \sigma(y) = \sigma(-y)$.
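The identity $1-\sigma(y)=\sigma(-y)$ used here follows directly from $(1)$:

$$1-\sigma(y)=1-\frac{e^{y}}{1+e^{y}}=\frac{1}{1+e^{y}}=\frac{e^{-y}}{e^{-y}+1}=\sigma(-y).$$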

Once you see that, here is the "equivalence" (square brackets $[~]$ denote an array or vector):

$$\begin{aligned} \text{softmax}([\,y/2,\ -y/2\,]) &= \left[\frac{e^{y/2}}{e^{y/2}+e^{-y/2}},\ \frac{e^{-y/2}}{e^{y/2}+e^{-y/2}}\right] \\ &= \left[\frac{e^{y}}{e^{y}+1},\ \frac{e^{-y}}{e^{-y}+1}\right] \\ &= [\,\sigma(y),\ \sigma(-y)\,] \end{aligned}$$
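Incidentally, the split into $[y/2, -y/2]$ is just one choice: softmax does not change when the same constant is subtracted from every entry, so starting from $[y, 0]$ instead produces the $1$ in the denominator of $(1)$ directly:

$$\text{softmax}([\,y,\ 0\,])=\left[\frac{e^{y}}{e^{y}+1},\ \frac{1}{e^{y}+1}\right]=[\,\sigma(y),\ 1-\sigma(y)\,].$$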

So for $n=2$, softmax really is the same as $\sigma$, except with $y$ rescaled.
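A quick numerical check of this equivalence (a sketch; the `sigmoid`/`softmax` helpers are the illustrative ones defined above):

```python
import numpy as np

def sigmoid(y_hat):
    return np.exp(y_hat) / (1.0 + np.exp(y_hat))

def softmax(y_hat):
    exp_y = np.exp(y_hat)
    return exp_y / exp_y.sum()

y = 1.7
lhs = softmax(np.array([y / 2, -y / 2]))   # softmax([y/2, -y/2])
rhs = np.array([sigmoid(y), sigmoid(-y)])  # [sigma(y), sigma(-y)]
print(lhs, rhs)                            # both ~[0.8455, 0.1545]
print(np.allclose(lhs, rhs))               # True
```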