I'm trying to deeply understand the math behind logistic regression. I think I'm almost there, but there is one little piece of the puzzle I haven't been able to fill in:
We assume a linear relationship between the predictor variables and the log-odds of the event that $Y=1$... $$l = \log_b\frac{p}{1-p} = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots$$
- Why can we assume that the log-odds are a linear combination of the input variables?
- Why can we assume that $p$ is the probability of $Y=1$ rather than $Y=0$ (since the choice is binary)?
I think all of these predictive models are fascinating. I just got into machine learning, and since I recently finished school, I thought it'd be cool to keep my math skills fresh by digging deeper into the math behind the models.
I'll answer your first bullet point. (I'm sure a previous question, possibly on another SE site, asked this before, but I can't find it today.)
If $x$ is Normally distributed within each class, with the $Y=0$ and $Y=1$ populations sharing a covariance matrix but having different means, then Bayes's theorem implies that the log-odds $l = \log_b\frac{p}{1-p}$ is a linear function of $x$; equivalently, $\frac{p}{1-p}=b^l$, with the coefficients of $l$ determined by the difference in means and the common covariance. Exchanging $p$ with $1-p$ simply negates $l$, which is equivalent to $b\mapsto\frac1b$; so the choice of which outcome to label $Y=1$ only flips the signs of the coefficients.
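You can check this numerically. Below is a minimal sketch in one dimension, with hypothetical parameters I chose for illustration: two Normal class-conditionals sharing a variance. Computing the posterior log-odds directly from Bayes's theorem, the result is linear in $x$ with slope $(\mu_1-\mu_0)/\sigma^2$:

```python
import numpy as np

# Hypothetical parameters: two 1-D Gaussian class-conditionals with a
# shared variance (the equal-covariance assumption above).
mu0, mu1, sigma = 0.0, 2.0, 1.5   # class means and common std dev
pi1 = 0.3                         # prior P(Y = 1)

def log_odds(x):
    """Natural-log odds log[P(Y=1|x) / P(Y=0|x)] via Bayes's theorem."""
    # Normal log-densities; the shared 1/(sigma*sqrt(2*pi)) factor cancels.
    ll1 = -(x - mu1) ** 2 / (2 * sigma ** 2)
    ll0 = -(x - mu0) ** 2 / (2 * sigma ** 2)
    return np.log(pi1 / (1 - pi1)) + ll1 - ll0

xs = np.linspace(-5, 5, 101)
lo = log_odds(xs)

# Linearity check: the slope between consecutive points is constant
# and matches the closed form (mu1 - mu0) / sigma**2.
slopes = np.diff(lo) / np.diff(xs)
print(np.allclose(slopes, (mu1 - mu0) / sigma ** 2))  # prints True
```

Swapping the roles of the two classes (exchanging `mu0` and `mu1`, and replacing `pi1` with `1 - pi1`) negates every value of `log_odds`, which is the sign-flip described above. If the variances differ, a quadratic term in $x$ survives and the log-odds are no longer linear.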