In this video the speaker says "we could interpret the activation of a neuron as estimating the probability that some input $\mathbf{x}$ belongs to the class one".
I get that the sigmoid function $\sigma(x)$ takes values in $(0,1)$, but it isn't itself a probability distribution; it's the CDF of the logistic distribution. I'm a bit confused about that, because it's not too rare to see something like
\begin{align*} p(y = 1\mid x) &= \frac{1}{1+\exp(-x)} \\ &= \sigma(x) \end{align*}
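To make the setup concrete, here is a minimal sketch (the function name and sample inputs are my own) showing that $\sigma$ maps every real input into $(0,1)$, which is the minimal requirement for reading its output as a probability:

```python
import math

def sigmoid(x):
    # Numerically stable logistic function: maps any real x into (0, 1).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Strictly between 0 and 1 for every input, with sigmoid(0) = 0.5,
# even though sigma itself is the CDF of the logistic distribution,
# not a probability mass function.
for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x))
```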
My question is whether somebody could explain to me why this "estimate" indeed makes sense.
We have (as noted in comments) $$ \operatorname{logit}(p) = \log \frac p {1-p} \quad \text{for } 0<p<1. $$ This approaches $+\infty$ as $p\uparrow1$ and approaches $-\infty$ as $p\downarrow0$.
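A quick numerical sketch of this (function names are mine) checks that $\operatorname{logit}$ inverts the logistic function and is unbounded near the endpoints:

```python
import math

def sigmoid(x):
    # Logistic function: inverse of logit.
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    # Log-odds: defined for 0 < p < 1, unbounded in both directions.
    return math.log(p / (1.0 - p))

# logit is the inverse of the logistic (sigmoid) function.
for p in (0.1, 0.5, 0.9):
    assert abs(sigmoid(logit(p)) - p) < 1e-12

print(logit(0.999))   # large positive, growing toward +inf as p -> 1
print(logit(0.001))   # large negative, going toward -inf as p -> 0
```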
We will observe many independent copies of a random variable $Y$ that is always equal to $0$ or $1$.
Let $H$ be some hypothesis under which the probability that $Y=1$ is larger than it is under $\text{not } H$, i.e. $$ \Pr(Y=1\mid H) > \Pr(Y=1\mid \text{not } H). $$ Then each time we observe $Y=1$, the posterior probability of $H$ increases, and each time we observe $Y=0$, it decreases. Now observe the identity $$ \operatorname{logit} \Pr(H\mid Y=1) = \operatorname{logit} \Pr(H) + \log \frac{\Pr(Y=1\mid H)}{\Pr(Y=1 \mid \text{not } H)}. $$ Thus every time we see $Y=1$, the logit of the probability of the hypothesis increases by the SAME amount. And every time we see $Y=0$, the logit decreases by a fixed amount as well (though not, in general, the same amount by which it increases when we see $Y=1$).
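The identity above is just Bayes' rule written on the log-odds scale, and it can be verified numerically. The numbers below are hypothetical, chosen only for the check:

```python
import math

def logit(p):
    # Log-odds of a probability p in (0, 1).
    return math.log(p / (1.0 - p))

# Hypothetical prior and likelihoods, picked only to illustrate.
prior_H = 0.3
p1_given_H = 0.8      # Pr(Y=1 | H)
p1_given_notH = 0.4   # Pr(Y=1 | not H)

# Posterior via Bayes' rule after observing Y = 1.
posterior_H = (p1_given_H * prior_H) / (
    p1_given_H * prior_H + p1_given_notH * (1.0 - prior_H))

# The identity: logit of the posterior equals the logit of the prior
# plus the log likelihood ratio -- a fixed additive update per observation.
lhs = logit(posterior_H)
rhs = logit(prior_H) + math.log(p1_given_H / p1_given_notH)
print(lhs, rhs)
```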
Since the logit function is the inverse of the logistic function, the model $$ \operatorname{logit}\Pr(Y=1) = \alpha + \beta x $$ says that every increase of $1$ unit in the variable you called $x$ adds the same fixed amount $\beta$ to the logit of the probability. In other words, $x$ acts on the logit exactly the way a running count of observations of $Y=1$ does in the Bayesian updating above: each unit of $x$ is a fixed additive piece of evidence on the log-odds scale.
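A short sketch of this (the coefficients $\alpha$ and $\beta$ below are arbitrary, chosen only to illustrate): the probabilities change nonlinearly in $x$, but on the logit scale each unit step in $x$ shifts the log-odds by exactly $\beta$.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for the model logit Pr(Y=1) = alpha + beta*x.
alpha, beta = -1.0, 0.5

probs = [sigmoid(alpha + beta * x) for x in range(5)]

# Each unit step in x shifts the logit by exactly beta,
# even though the probabilities themselves change nonlinearly.
for p_lo, p_hi in zip(probs, probs[1:]):
    assert abs((logit(p_hi) - logit(p_lo)) - beta) < 1e-12

print(probs)
```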
More later maybe$\,\ldots$