This is a basic question (I think). I am trying to grasp the idea behind this example, where we define a "logistic function" and use that to work towards the maximum likelihood estimate (MLE).
We have a sample of $n$ people, with recorded data $(x_1,y_1),\ldots,(x_n,y_n)$, where $x_i$ is the BMI and $y_i\in\{0,1\}$ indicates whether the individual has some disease. We model the $x_i$ as non-random, and the $y_i$ as observed values of random variables $Y_i$, where $$Y_i\sim\mathrm{Bernoulli}\big( f(x_i;\beta_0,\beta_1)\big).$$ Here, $f$ is a logistic function $$f(x;\beta_0,\beta_1)=\frac{\exp(\beta_0+\beta_1x)}{1+\exp(\beta_0+\beta_1x)}.$$
We then observe that if $\beta_0+\beta_1x_i$ is "very positive" then $Y_i$ is "likely to be $1$", and if $\beta_0+\beta_1x_i$ is "very negative" then $Y_i$ is "likely to be $0$".
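To see this behaviour concretely, here is a minimal sketch of $f$ evaluated at a few BMI values. The parameter values $\beta_0=-10$, $\beta_1=0.4$ are made up purely for illustration:

```python
import math

def logistic(x, b0, b1):
    """Logistic function f(x; b0, b1) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    t = b0 + b1 * x
    return math.exp(t) / (1.0 + math.exp(t))

# With the made-up values b0 = -10, b1 = 0.4, the probability climbs
# from near 0 to near 1 as x increases; it equals exactly 0.5 at x = 25,
# where b0 + b1*x = 0.
for x in [15, 25, 35, 45]:
    print(x, logistic(x, -10, 0.4))
```

The output is always strictly between $0$ and $1$, and increases monotonically in $x$ whenever $\beta_1>0$, which is exactly the "very positive $\Rightarrow$ likely $1$" behaviour described above.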
My question is, what is a "logistic function" in this context, and why is it useful? I have a feeling that $f$ is supposed to describe the way BMI affects the probability of this disease. However, assuming that the relationship is in the form $\frac{\exp\cdot}{1+\exp\cdot}$, and depends linearly on precisely two parameters $\beta_0$ and $\beta_1$, seems like a huge leap, and one that needs to be justified somehow.
The rest of the example is copied below, just in case it is relevant to my question (I do not think it is).
Observe that $$1-f(x_i;\beta_0,\beta_1)=\frac{1}{1+\exp(\beta_0+\beta_1x_i)},$$ so $$\frac{P_{\beta_0,\beta_1}(Y_i=1)}{P_{\beta_0,\beta_1}(Y_i=0)}=\frac{f(x_i;\beta_0,\beta_1)}{1-f(x_i;\beta_0,\beta_1)}=\exp(\beta_0+\beta_1x_i).$$ The likelihood function is \begin{align} L(\beta_0,\beta_1) &= P_{\beta_0,\beta_1}(Y_1=y_1,\ldots,Y_n=y_n)\\ &=\prod_i P_{\beta_0,\beta_1}(Y_i=y_i)\\ &=\prod_i f(x_i;\beta_0,\beta_1)^{y_i}\big(1-f(x_i;\beta_0,\beta_1)\big)^{1-y_i}. \end{align} In this case, the MLE has to be computed numerically. If $\hat{\beta}_1$ is large and positive and the model is appropriate, then recording a large $x_{n+1}$ indicates a high probability that $Y_{n+1}$ is $1$. This does not imply a causal relationship.
Probabilities are restricted to the interval $[0,1]$, so modelling them directly with linear regression can give unacceptable results: an extrapolated linear fit can suggest values anywhere in $(-\infty, +\infty)$.
Meanwhile, suppose you want to say that doubling the probability of some outcome from $1\%$ to $2\%$ is roughly as substantial an effect as doubling the probability from $0.1\%$ to $0.2\%$: this might suggest taking the logarithm of the probability.
But you cannot do the same thing to an original probability of $98\%$, since doubling it would lead to a nonsensical $196\%$ probability. It might make more sense to say the equivalent change is going to $99\%$, i.e. halving the probability of the outcome *not* happening.
One approach to deal with this is to take the odds $\dfrac{p}{1-p}$, which makes a $1\%$ probability have odds of $\dfrac{1}{99}$ and a $99\%$ probability have odds of $\dfrac{99}{1}$, i.e. reciprocals, and then take the logarithm of the odds (sometimes called the logit function). Log-odds of outcomes can take any value in $(-\infty, +\infty)$ so long as the event has a probability which is neither $0$ nor $1$.
You could then do linear regression with these log-odds, and might get results of the form $\hat y = \beta_0+\beta_1 x$. But you presumably now want to turn this back into probabilities. You can use $y=\log\left(\dfrac{p}{1-p}\right) \implies p=\dfrac{\exp(y)}{1+\exp(y)}$, and so you get your logistic expression $$\hat p=\dfrac{\exp(\beta_0+\beta_1 x)}{1+\exp(\beta_0+\beta_1 x)}.$$
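The logit and its inverse can be sketched in a few lines. The function names `logit` and `inv_logit` and the example probabilities are just illustrative choices:

```python
import math

def logit(p):
    """Log-odds: log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Inverse map: sends any real y back to a probability in (0, 1)."""
    return math.exp(y) / (1.0 + math.exp(y))

# Odds of 1% and 99% are reciprocals, so their log-odds are negatives
# of each other, reflecting the symmetry described above.
print(logit(0.01), logit(0.99))

# The round trip recovers the original probability (up to floating point).
print(inv_logit(logit(0.25)))
```

This round trip is exactly the step in the text: regress on log-odds, then apply the inverse map to land back in $(0,1)$.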