In the context of a Bernoulli distribution, what exactly is the relationship between the linear predictor and the mean of the distribution function?


A Bernoulli distribution can be expressed as

$$ {\displaystyle f(k;p)=p^{k}(1-p)^{1-k}\quad {\text{for }}k\in \{0,1\}} $$

Its mean is $p$. If we let $p = 0.7$, the Bernoulli distribution can be expressed as $$ {\displaystyle f(k;0.7)=0.7^{k}(1-0.7)^{1-k}\quad {\text{for }}k\in \{0,1\}} \tag{1} $$ per Wikipedia.

The link function provides the relationship between the linear predictor and the mean of the distribution function.

For the Bernoulli distribution above, the mean is $p = 0.7$.

The logit function has the form

$${\displaystyle \operatorname {logit} (p)=\log \left({\frac {p}{1-p}}\right)}$$

This equation $$ {\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{1-\mu }}\right)\,\!} $$ comes from the "Common distributions with typical uses and canonical link functions" table on Wikipedia.

For the case of a Bernoulli distribution with $p = 0.7$:

$$\mathbf {X} {\boldsymbol {\beta }} = \log \left({\dfrac {p}{1-p}}\right)$$

The mean function $$ {\displaystyle \mu ={\frac {\exp(\mathbf {X} {\boldsymbol {\beta }})}{1+\exp(\mathbf {X} {\boldsymbol {\beta }})}}={\frac {1}{1+\exp(-\mathbf {X} {\boldsymbol {\beta }})}}\,\!} \tag{2} $$

also comes from the same table on Wikipedia.

For the case of a Bernoulli distribution with $p = 0.7$:

$$ {\displaystyle p ={\frac {\exp(\mathbf {X} {\boldsymbol {\beta }})}{1+\exp(\mathbf {X} {\boldsymbol {\beta }})}}={\frac {1}{1+\exp(-\mathbf {X} {\boldsymbol {\beta }})}}\,\!} \tag{3} $$
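Numerically, this round trip through the link can be checked directly (a quick sketch in Python; the variable names are mine):

```python
import math

p = 0.7

# Link function: the linear predictor equals logit(p)
xb = math.log(p / (1 - p))              # ≈ 0.8473

# Mean function (Equation 2): the inverse link applied to the linear predictor
mu = math.exp(xb) / (1 + math.exp(xb))

print(xb, mu)                           # 0.8472..., 0.7
```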

Does this mean that Equation 2 would output 0.7 for a Bernoulli distribution parameterized by Equation 1? If yes, what is the detailed procedure?


There is 1 solution below.


First, to clear things up:

If the response is $Y \sim \text{Normal}(\mu, \sigma^{2})$, we can model how the mean $\text{E}(Y) = \mu$ depends on some variables with the equation: $$ \mu = \mathbf{x}^{T} \cdot \mathbf{\beta} $$ and because the right-hand side is a linear function of $\mathbf{\beta}$: $$ \mu = \beta_{0} + \beta_{1} \cdot x_{1} + \dots + \beta_{m} \cdot x_{m} $$ we call this linear regression.
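For concreteness, here is a minimal sketch of that normal-response case in Python/NumPy (the simulated data and the true coefficients $(2.0, 0.5)$ are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y ~ Normal(beta0 + beta1 * x, sigma^2) with beta = (2.0, 0.5)
n = 1000
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)

# Design matrix: a column of ones for the intercept plus the covariate
X = np.column_stack([np.ones(n), x])

# Least-squares estimate of beta (also the MLE under normality)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately [2.0, 0.5]
```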

Now if the response is $Y \sim \text{Bernoulli}(\pi)$, we can model how the mean $\text{E}(Y) = \pi$ depends on some variables with the equation: $$ \pi = \frac{\exp(\mathbf{x}^{T} \cdot \mathbf{\beta})}{1 + \exp(\mathbf{x}^{T} \cdot \mathbf{\beta})} $$ and because the right-hand side is the (standard) logistic function, or sigmoid: $$ S(x) = \frac{\exp(x)}{1 + \exp(x)} $$ we call this logistic regression. The inverse of $S(x)$ is the function $$ \text{logit}(x) = \ln\left( \frac{x}{1 - x} \right) $$ called the logit, because when $x = \text{Pr}(A)$ is the probability of some event $A$, this function returns the logarithm of the odds for $A$.
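A small sketch in Python makes this inverse relationship concrete (the function names `S` and `logit` mirror the notation above):

```python
import math

def S(x):
    """Standard logistic (sigmoid) function."""
    return math.exp(x) / (1 + math.exp(x))

def logit(p):
    """Log-odds of p; the inverse of S on (0, 1)."""
    return math.log(p / (1 - p))

print(S(logit(0.7)))   # 0.7, up to floating-point error
print(logit(S(1.5)))   # 1.5, up to floating-point error
```

In particular, `S(logit(0.7))` recovers 0.7, which is exactly the round trip your question asks about.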

Now in this case, when the response is a Bernoulli random variable, the logit links the mean to the linear predictor $\mathbf{x}^{T} \cdot \mathbf{\beta}$: $$ \text{logit}(\pi) = \text{logit}(S(\mathbf{x}^{T} \cdot \mathbf{\beta})) = \mathbf{x}^{T} \cdot \mathbf{\beta} $$ and we say that the logit is the link function for a Bernoulli response.

Now to answer your question:

Let's say we have only one explanatory random variable $X$ and we would like to know how the mean $\pi$ depends on $X$: $$ \pi(X) = \text{Pr}(Y = 1 | X) $$ Using logistic regression: $$ \pi(X) = S(\mathbf{x}^{T} \cdot \mathbf{\beta}) $$ the function $S(\mathbf{x}^{T} \cdot \mathbf{\beta})$ will return an estimate of $\pi$ given a value $x$ of $X$ and an estimate of $\mathbf{\beta}$.

To give you a detailed procedure, let's simplify and say we have no explanatory variables ($m = 0$), so $\mathbf{x}^{T} \cdot \mathbf{\beta}$ is just $\beta_{0}$ and the model becomes: $$ \pi = S(\beta_{0}) = \frac{\exp(\beta_{0})}{1 + \exp(\beta_{0})} $$

To estimate $\beta_{0}$ we can use the maximum likelihood (MLE) method. Using the Bernoulli probability mass function for a sample of size $n$, we get the likelihood in terms of $\pi$: $$ \mathcal{L}(\pi | \mathbf{y}) = \prod_{i = 1}^{n} \pi^{y_{i}} (1 - \pi)^{1 - y_{i}} \qquad \text{where } y_{i} \sim \text{iid Bernoulli}(\pi) $$ and, taking our model into account, the likelihood in terms of $\beta_{0}$: $$ \mathcal{L}(\beta_{0} | \mathbf{y}) = \prod_{i = 1}^{n} \left(\frac{\exp(\beta_{0})}{1 + \exp(\beta_{0})}\right)^{y_{i}} \left(1 - \frac{\exp(\beta_{0})}{1 + \exp(\beta_{0})}\right)^{1 - y_{i}} $$

Using the standard procedure of taking the logarithm to get the log-likelihood, differentiating with respect to $\beta_{0}$, and equating to zero, we finally get the estimate: $$ \hat{\beta}_{0} = \ln\left(\frac{\frac{1}{n}\sum_{i=1}^{n} {y_{i}}}{1 - \frac{1}{n}\sum_{i=1}^{n} {y_{i}}}\right) $$ which is the logit of the sample mean $\frac{1}{n}\sum_{i=1}^{n} {y_{i}}$: $$ \hat{\beta}_{0} = \text{logit} \left( \frac{1}{n}\sum_{i=1}^{n} {y_{i}} \right) $$ so the logistic function evaluated at $\hat{\beta}_{0}$ returns $$ S(\hat{\beta}_{0}) = \frac{1}{n}\sum_{i=1}^{n} {y_{i}} $$ which is the maximum likelihood estimator of $\pi$.
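A minimal numeric illustration of this procedure in Python/NumPy (the simulated sample from $\text{Bernoulli}(0.7)$ is my own choice, matching the value in your question):

```python
import numpy as np

rng = np.random.default_rng(42)

# Intercept-only setting (m = 0): a sample from Bernoulli(pi) with pi = 0.7
y = rng.binomial(1, 0.7, size=1000)
y_bar = y.mean()

# Closed-form MLE derived above: beta0_hat = logit(sample mean)
beta0_hat = np.log(y_bar / (1 - y_bar))

# Plugging beta0_hat back into the logistic function recovers the sample mean
pi_hat = np.exp(beta0_hat) / (1 + np.exp(beta0_hat))

print(beta0_hat, pi_hat, y_bar)  # pi_hat equals y_bar, both near 0.7
```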

To answer your comment:

When using this linear model: $$ \text{logit}(\pi) = \mathbf{x}^{T} \cdot \mathbf{\beta} $$ we assume that the logarithm of the odds of $Y = 1$ is equal to $\mathbf{x}^{T} \cdot \mathbf{\beta}$ for some unknown parameter vector $\mathbf{\beta}$.

Because the standard logistic function $S$ is the inverse of the logit function, the formula: $$ S(\text{logit}(\pi)) = \pi $$ holds. In other words: the logistic function $S$ maps the log-odds $\text{logit}(\pi)$ to the probability $\pi$.

In practice, the linear model $\mathbf{x}^{T} \cdot \mathbf{\beta}$ is only an approximation of $\text{logit}(\pi)$, and we can only estimate $\mathbf{\beta}$ from a sample of values $y_{i}$ of the Bernoulli response and the corresponding values $x_{i}$ of the explanatory variables to get $\mathbf{\hat{\beta}}$. So the formula: $$ \pi \approx \frac{\exp(\mathbf{x}^{T} \cdot \mathbf{\hat{\beta}})}{1 + \exp(\mathbf{x}^{T} \cdot \mathbf{\hat{\beta}})} $$ holds only approximately, and the right-hand side is called an estimator $\hat{\pi}$ of $\pi$.
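As a sketch of how $\mathbf{\hat{\beta}}$ might be obtained with one explanatory variable (plain gradient ascent on the log-likelihood in NumPy; the simulated data, the true coefficients $(-1, 2)$, and the learning rate are my own assumptions, and real software typically uses more refined optimizers such as iteratively reweighted least squares):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: logit(pi(x)) = -1 + 2 * x for a single explanatory variable
n = 5000
x = rng.normal(0, 1, n)
eta = -1.0 + 2.0 * x
pi = 1 / (1 + np.exp(-eta))
y = rng.binomial(1, pi)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])

# Maximize the log-likelihood by plain gradient ascent
beta_hat = np.zeros(2)
lr = 0.1
for _ in range(2000):
    p_hat = 1 / (1 + np.exp(-X @ beta_hat))
    grad = X.T @ (y - p_hat) / n   # gradient of the mean log-likelihood
    beta_hat += lr * grad

print(beta_hat)  # approximately [-1.0, 2.0]
```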

Also, if we have at least one explanatory variable $X$, so that the vector $\mathbf{x}$ is not just the constant $1$ ($\mathbf{x} \neq 1$), then $\pi$ in this equation represents the conditional mean: $$ \pi = \text{E}(Y | X) = \text{Pr}(Y = 1 | X) $$ and not the unconditional mean $\text{E}(Y)$.