I've been trying to understand the origin of the logistic function in logistic regression: $$\Pr(Y=1|x;\theta)=\frac{1}{1+e^{-\theta x}}$$ I was led to believe that one could somehow arrive at this purely from the Binomial Distribution and a Maximum Likelihood type argument, but I can't quite see it. It seems that if one considers the Binomial Distribution as a member of the Exponential Family of distributions, the logit function arises as the "natural parameter", but I'm not quite sure of the meaning or consequences of this.
To summarize:
- Is the logistic function "optimal" in some mathematical way, or is it just a convenient function? If it is optimal, can one derive it from a Maximum Likelihood formulation?
- How (if at all) is all this connected to the Exponential Family?
I've searched around quite a lot, as well as tried to derive this myself, but so far no dice.
Any ideas?
Writing the probability mass function of a Bernoulli random variable $X$ with parameter $\pi$ (let's simplify things here) and then introducing the exponential function, \begin{eqnarray*} p \left( x ; \pi \right) & = & \pi^x \left( 1 - \pi \right)^{1 - x}\\ & = & \exp \left( x \log \left( \pi \right) + \left( 1 - x \right) \log \left( 1 - \pi \right) \right)\\ & = & \exp \left( x \log \left( \frac{\pi}{1 - \pi} \right) + \log \left( 1 - \pi \right) \right) \end{eqnarray*} From the above, we see that this is in the form of an exponential family with sufficient statistic $x$ and natural parameter $\log \left( \frac{\pi}{1 - \pi} \right)$ (the remaining $\log \left( 1 - \pi \right)$ is just the normalizing term, which does not depend on $x$).
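As a minimal numeric sanity check (plain Python, function names `bernoulli_pmf` and `bernoulli_expfam` are mine, purely illustrative), the standard product form and the exponential-family form of the Bernoulli pmf agree for all $x \in \{0, 1\}$:

```python
import math

def bernoulli_pmf(x, pi):
    """Standard form: pi^x * (1 - pi)^(1 - x)."""
    return pi**x * (1 - pi)**(1 - x)

def bernoulli_expfam(x, pi):
    """Exponential-family form: exp(x * theta + log(1 - pi)),
    where theta = log(pi / (1 - pi)) is the natural parameter (the logit)."""
    theta = math.log(pi / (1 - pi))
    return math.exp(x * theta + math.log(1 - pi))

# The two forms match to floating-point precision for any pi in (0, 1).
for pi in (0.1, 0.5, 0.9):
    for x in (0, 1):
        assert abs(bernoulli_pmf(x, pi) - bernoulli_expfam(x, pi)) < 1e-12
```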
It is natural to write this in canonical form by making the transformation $\theta = \log \left( \frac{\pi}{1 - \pi} \right)$ in order to get something of the form $p \left( x ; \theta \right) = \exp \left( x \theta - c \left( \theta \right) \right)$. This transformation is the logit function. Inverting it, you obtain the logistic transformation $$ \pi = \frac{1}{1 + \exp \left( - \theta \right)} $$ Regarding your two questions, as far as I understand the issues: the logistic function arises from the Bernoulli distribution, which is as natural as you can get. In that framework, the linear form $x \theta$ has the natural decomposition into sufficient statistic and parameter. That is, taking the distribution of $n$ independent copies of $X$, the sample mean is a sufficient statistic for estimating $\theta$, and you do not need to go beyond it for the maximum likelihood estimator. I think this is the link you were looking for.
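The two points above can be sketched numerically (function names `logit` and `logistic` are mine; the data `xs` is an arbitrary illustrative sample): logit and logistic are inverses, and for $n$ Bernoulli draws the log-likelihood in $\theta$ is $\left(\sum_i x_i\right)\theta - n \log\left(1 + e^{\theta}\right)$, so it depends on the data only through the sum (equivalently the mean), and the MLE satisfies $\hat{\pi} = \bar{x}$, i.e. $\hat{\theta} = \operatorname{logit}(\bar{x})$:

```python
import math

def logit(pi):
    """Natural parameter of the Bernoulli: log-odds of pi."""
    return math.log(pi / (1 - pi))

def logistic(theta):
    """Inverse of the logit: maps the natural parameter back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-theta))

# logit and logistic are inverses of each other.
for pi in (0.2, 0.5, 0.8):
    assert abs(logistic(logit(pi)) - pi) < 1e-12

# For n i.i.d. Bernoulli draws, the log-likelihood in theta is
#   sum(x_i) * theta - n * log(1 + exp(theta)),
# which depends on the data only through sum(x_i): the mean is sufficient.
# Setting its derivative to zero gives pi_hat = mean(x).
xs = [1, 0, 1, 1, 0, 1, 0, 1]          # illustrative sample
pi_hat = sum(xs) / len(xs)             # MLE of pi is the sample mean
theta_hat = logit(pi_hat)              # MLE of theta, by invariance of the MLE
assert abs(logistic(theta_hat) - pi_hat) < 1e-12
```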