Derive $ \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x))} $ from conditional and total probabilities


Related to: Show posterior probability takes the form of the logistic function

I basically want to derive the sigmoid function from conditional and total probabilities.

In other words, I want to prove that:

$$ p(y=1 \mid x) = \frac{1}{1 + \exp(-(\beta_0 + \beta_1x))} $$

Given: $X \mid Y=y_k \sim N(\mu_k, \sigma_x^2)$, namely, $$P(X=x \mid Y=y_k) = \frac{1}{\sqrt{2\pi\sigma_x^2}}\exp\left(-\frac{1}{2\sigma_x^2}(x-\mu_k)^2\right)$$
where $\beta_0$ and $\beta_1$ are weights, $y \in \{ 0, 1 \}$, and $x$ is a single feature (independent variable), following the explanation found on the 7th page of this. Also see the snapshot.
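
Spelled out, the starting point is Bayes' theorem combined with the law of total probability:

$$ p(y=1 \mid x) = \frac{P(x \mid y=1)\,p(y=1)}{P(x \mid y=1)\,p(y=1) + P(x \mid y=0)\,p(y=0)} $$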

I have managed to obtain the same expression the author obtains for the argument of the exponential function within the fraction:
$$ p(y=1 \mid x) = \frac{1}{1 + \exp(a)}, \qquad a=\ln\left( \frac{1-p(y=1)}{p(y=1)} \right) + \frac{\mu_0-\mu_1}{\sigma_x^2}\,x+\frac{\mu_1^2-\mu_0^2}{2\sigma_x^2} $$
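
For reference, the intermediate steps are: divide the numerator and denominator of the Bayes expression above by $P(x \mid y=1)\,p(y=1)$,

$$ p(y=1 \mid x) = \frac{1}{1 + \dfrac{P(x \mid y=0)\,p(y=0)}{P(x \mid y=1)\,p(y=1)}} = \frac{1}{1+\exp(a)}, \qquad a = \ln\left(\frac{P(x \mid y=0)\,p(y=0)}{P(x \mid y=1)\,p(y=1)}\right), $$

and then substitute the two Gaussian densities; their common factor $\frac{1}{\sqrt{2\pi\sigma_x^2}}$ cancels, leaving

$$ a = \ln\left(\frac{1-p(y=1)}{p(y=1)}\right) + \frac{(x-\mu_1)^2 - (x-\mu_0)^2}{2\sigma_x^2} = \ln\left(\frac{1-p(y=1)}{p(y=1)}\right) + \frac{\mu_0-\mu_1}{\sigma_x^2}\,x + \frac{\mu_1^2-\mu_0^2}{2\sigma_x^2}. $$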

  1. Why is it assumed that $\frac{\mu_0-\mu_1}{\sigma_x^2} = \beta_1$?
  2. What happens to $\frac{\mu_1^2-\mu_0^2}{2\sigma_x^2}$? Is it equal to $\beta_0$? If so, why?
  3. What happens to $\ln\left(\frac{1-p(y=1)}{p(y=1)}\right)$? Is it equal to $0$? If so, why?
  4. Why is the variance that of all $x$ values? Why not separate it into two variances: one for the $x$ values where $y=0$ and another for the $x$ values where $y=1$?

I'd be grateful if whoever answers the above four questions would not treat them merely as yes/no questions but would also provide a mathematical walkthrough. Many thanks.


There is 1 solution below.

BEST ANSWER
  1. Using $p$ for probabilities and $P$ for probability densities, $$\frac{p(Y=1\mid x)}{p(Y=0\mid x)}=\frac{P(X=x\mid Y=1)\,p(Y=1)}{P(X=x\mid Y=0)\,p(Y=0)}\propto\exp\left(-\frac{(x-\mu_1)^2-(x-\mu_0)^2}{2\sigma_x^2}\right)\propto\exp\left(\frac{(\mu_1-\mu_0)x}{\sigma_x^2}\right).$$ With the definition $\beta_1:=\frac{\mu_1-\mu_0}{\sigma_x^2}$, a constant $\beta_0$ exists for which $$\frac{p(Y=1\mid x)}{p(Y=0\mid x)}=\exp(\beta_0+\beta_1x)\implies p(Y=1\mid x)=\frac{1}{1+\exp(-(\beta_0+\beta_1x))}.$$ (By the way, you have a sign error in obtaining $\beta_1$ by inspection from $a$: since $a=-(\beta_0+\beta_1x)$, it is $\beta_1=\frac{\mu_1-\mu_0}{\sigma_x^2}$, not $\frac{\mu_0-\mu_1}{\sigma_x^2}$.) Introducing the symbol $\beta_1$ is just a handy abbreviation; a fully expanded version of this identification is given after point 4 below.
  2. We have $a=-(\beta_0+\beta_1x)$, so matching the constant terms gives $\beta_0=\frac{\mu_0^2-\mu_1^2}{2\sigma_x^2}-\ln\frac{1-p(y=1)}{p(y=1)}$. Which leads us into your next question...
  3. No, not in general: $\ln\frac{1-p(y=1)}{p(y=1)}=0$ exactly when $p(Y=1)=\frac12$, i.e. when the two class priors are equal. So your inferences in questions 2 and 3 are both equivalent to that assertion.
  4. The rationale is that varying $Y$ only shifts the distribution of $X$ without also scaling it. (Were this not so, the square terms in the exponent would no longer cancel, and the odds would be a Gaussian function of $X$ rather than an exponential one.) The aim is to use the value of $X$ to update our Bayesian probability distribution over the two values of $Y$, by comparing the two conditional densities $P(x\mid Y=0)$ and $P(x\mid Y=1)$ under an assumption of equal variances (homoscedasticity). This is related to the homoscedasticity assumed in some tests of same-means null hypotheses. (Heteroscedasticity would require us to estimate the relative variances as part of such a test.)
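
To tie this together, here is the identification promised in point 1, written directly in terms of the log-odds (the same computation, fully expanded):

$$\ln\frac{p(Y=1\mid x)}{p(Y=0\mid x)}=\ln\frac{p(Y=1)}{1-p(Y=1)}+\frac{(x-\mu_0)^2-(x-\mu_1)^2}{2\sigma_x^2}=\underbrace{\ln\frac{p(Y=1)}{1-p(Y=1)}+\frac{\mu_0^2-\mu_1^2}{2\sigma_x^2}}_{\beta_0}+\underbrace{\frac{\mu_1-\mu_0}{\sigma_x^2}}_{\beta_1}\,x.$$

As a sanity check with illustrative numbers of my own choosing (not from the question): taking $\mu_0=0$, $\mu_1=1$, $\sigma_x^2=1$ and equal priors $p(Y=1)=\frac12$ gives $\beta_1=1$ and $\beta_0=-\frac12$, so $p(Y=1\mid x)=\frac{1}{1+\exp\left(-(x-\frac12)\right)}$, which crosses $\frac12$ exactly at the midpoint $x=\frac12$ between the two class means, as equal priors require.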