Related to: Show posterior probability takes the form of the logistic function
I basically want to derive the sigmoid function from conditional and total probabilities.
In other words, I want to prove that:
$$ p(y=1 \mid x) = \frac{1}{1 + \exp(-(\beta_0 + \beta_1x))} $$
Given:
$X \mid Y=y_k \sim N(\mu_k, \sigma_x^2)$, namely, $$P(X \mid Y= y_k) = \frac{1}{\sqrt{2\pi\sigma_x^2}}\exp\left(-\frac{1}{2\sigma_x^2}(x-\mu_k)^2\right)$$
where $\beta_0$ and $\beta_1$ are weights, $y \in \{ 0, 1 \}$, and
$\mathbf{x}$ is a vector containing a single feature (independent variable), following the explanation found on the 7th page of this. Also see the
snapshot.
I have managed to obtain what its author also obtained for the argument of the exponential function within the fraction:
$$ p(y=1 \mid x) = \frac{1}{1 + \exp(a)}, \qquad
a=\ln\left( \frac{1-p(y=1)}{p(y=1)} \right) + \frac{\mu_0-\mu_1}{\sigma^2_x}\,x+\frac{\mu_1^2-\mu_0^2}{2\sigma^2_x}$$
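For reference, these are the intermediate steps I followed, using Bayes' rule and the law of total probability, with a shared variance $\sigma_x^2$ for both classes:

$$
\begin{align}
p(y=1 \mid x)
  &= \frac{p(x \mid y=1)\,p(y=1)}{p(x \mid y=1)\,p(y=1) + p(x \mid y=0)\,p(y=0)} \\
  &= \frac{1}{1 + \dfrac{p(x \mid y=0)\,p(y=0)}{p(x \mid y=1)\,p(y=1)}}
   = \frac{1}{1 + \exp(a)}, \\
a &= \ln\frac{p(y=0)}{p(y=1)}
   + \frac{(x-\mu_1)^2 - (x-\mu_0)^2}{2\sigma_x^2}
   = \ln\frac{1-p(y=1)}{p(y=1)}
   + \frac{\mu_0-\mu_1}{\sigma_x^2}\,x
   + \frac{\mu_1^2-\mu_0^2}{2\sigma_x^2}.
\end{align}
$$

Note that the $x^2$ terms cancel only because the two class-conditional variances are assumed equal.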
- Why is it assumed that $\frac{\mu_0-\mu_1}{\sigma^2_x}= \beta_1$?
- What happens to $\frac{\mu_1^2-\mu_0^2}{2\sigma^2_x}$? Is it equal to $\beta_0$? If so, why?
- What happens to $\ln\left(\frac{1-p(y=1)}{p(y=1)}\right)$? Is it equal to $0$? If so, why?
- Why is the variance taken over all $x$ values? Why not split it into two variances: one over the $x$ values where $y=0$ and one over the $x$ values where $y=1$?
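As a numerical sanity check (a sketch with made-up values for $\mu_0$, $\mu_1$, $\sigma_x^2$, and the prior, and assuming the shared-variance setup above), I verified that the Bayes posterior coincides with a sigmoid whose weights are read off from the expression for $a$:

```python
import numpy as np

# Class-conditional Gaussians with a SHARED variance (the key assumption)
mu0, mu1, sigma2 = -1.0, 2.0, 1.5   # illustrative values, not from the source
p1 = 0.3                            # prior p(y=1); p(y=0) = 1 - p1

def gauss(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = np.linspace(-5, 5, 201)

# Posterior p(y=1 | x) via Bayes' rule + total probability
posterior = gauss(x, mu1, sigma2) * p1 / (
    gauss(x, mu1, sigma2) * p1 + gauss(x, mu0, sigma2) * (1 - p1)
)

# Sigmoid with beta_0, beta_1 obtained from a = -(beta_0 + beta_1 * x)
beta1 = (mu1 - mu0) / sigma2
beta0 = (mu0**2 - mu1**2) / (2 * sigma2) - np.log((1 - p1) / p1)
sigmoid = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

print(np.max(np.abs(posterior - sigmoid)))  # effectively zero
```

So the identification of the weights seems forced by matching coefficients, but I would like to see the reasoning spelled out.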
I'd be grateful if anyone answering the above four questions could go beyond a yes/no answer and provide a mathematical walkthrough. Many thanks.