Is Wilks' Theorem (the LRT is asymptotically chi-squared-distributed) not applicable to logistic regression because of no absolute continuity?

Wilks' Theorem is given in the source below as Theorem 12.4.2, p. 515. It is part of the chapter "Quadratic Mean Differentiable Families" where parametric families $\{P_{\theta}, \theta \in \Omega\}$ are considered and $\Omega$ is assumed to be an open subset of $\mathbb{R}^k$. In the beginning of the chapter (p. 492), it is assumed that each $P_{\theta}$ is absolutely continuous with respect to a $\sigma$-finite measure $\mu$, and set $p_{\theta} = dP_{\theta}/d\mu(x)$.

In logistic regression, the unknown parameter is usually called $\beta$ instead of $\theta$, and $n$ observations of the form $(X_i,Y_i)$ are considered, where $X_i \in \mathbb{R}^k$ and $Y_i \in \{0,1\}$. Until now, I had been assuming that the requirement that each $P_{\theta}$ be absolutely continuous is fulfilled in logistic regression with

$$p_{\beta} = \frac{e^{X_i^T \beta Y_i}}{1 + e^{X_i^T \beta}}.$$

However, I started to wonder whether $p_{\beta}$ is really a probability density function, and realized that it is not, since it does not integrate to $1$.

I noticed that $p_{\beta}$ comes from assuming that $Y_i$ is Bernoulli($p_i$)-distributed with $p_i = \frac{e^{t_i}}{1 + e^{t_i}}$, $t_i = X_i^T\beta$, and then we have that

$$p_{\beta}(Y_i) = \frac{dP_{\beta}}{d\mu}(Y_i) = p_i^{Y_i}(1-p_i)^{1-Y_i} = \frac{e^{X_i^T\beta Y_i}}{1 + e^{X_i^T \beta}}.$$

So I've realized that it is a probability mass function, not a probability density function, and the distribution under consideration is thus not absolutely continuous.

Does that mean that Wilks' Theorem is not applicable to logistic regression?

Source: E.L. Lehmann and J. P. Romano, "Testing Statistical Hypotheses", Springer Science+Business Media, 2008. It is freely accessible here: https://sites.stat.washington.edu/jaw/COURSES/580s/582/HO/Lehmann_and_Romano-TestingStatisticalHypotheses.pdf

On BEST ANSWER

No, the fact that your formula is a probability mass function does not rule out the use of Wilks's theorem here.

The easy-to-overlook boilerplate language "$P_\theta$ is absolutely continuous with respect to a $\sigma$-finite measure $\mu$, and set $p_\theta=dP_\theta/d\mu(x)$" is your friend here. In this case one usually picks counting measure for $\mu$, which assigns mass $1$ to each integer. It is not a probability measure, but it is a $\sigma$-finite measure. You are right that Bernoulli random variables are discrete, but their distributions are absolutely continuous with respect to counting measure, and it is this property, $P_\theta\ll\mu$, that makes $p_\theta$ a density function with respect to $\mu$.
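
As a quick check in the question's notation, write $t_i = X_i^T\beta$, hold $X_i$ fixed, and take $\mu$ to be counting measure. Since $P_\beta$ puts mass only on $\{0,1\}$, the integral against $\mu$ reduces to a two-term sum:

$$\int p_\beta \, d\mu = \sum_{y \in \{0,1\}} \frac{e^{t_i y}}{1 + e^{t_i}} = \frac{1}{1 + e^{t_i}} + \frac{e^{t_i}}{1 + e^{t_i}} = 1.$$

So $p_\beta$ does integrate to $1$ once the integral is taken with respect to $\mu$ rather than Lebesgue measure.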

Your regression setup does not, however, match the iid hypotheses of the theorem stated in L&R, so you are not out of the woods yet. If you are willing to go to a "covariates" model, where the $(X_i,Y_i)$ are iid, and the marginal distribution of the $X_i$ is known, then Wilks's theorem would apply, but you lose the strict interpretation as a logistic regression.
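
For a rough numerical illustration of the covariates model, here is a minimal simulation sketch. It is not a proof, and nothing in it comes from L&R: the function names (`fit_logistic`, `lrt_statistic`, `simulate_lrt`), the standard normal covariate, the intercept $0.3$, and the sample sizes are all assumptions chosen for illustration. It draws iid pairs $(X_i, Y_i)$ with the slope equal to zero, fits the full and the intercept-only logistic models by Newton's method, and collects the likelihood ratio statistics.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Maximize the logistic log-likelihood by Newton's method.

    Returns the fitted coefficients and the maximized log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    t = X @ beta
    loglik = np.sum(y * t - np.log1p(np.exp(t)))
    return beta, loglik

def lrt_statistic(X, y, null_cols):
    """2 * (loglik_full - loglik_null); the null model keeps only null_cols."""
    _, ll_full = fit_logistic(X, y)
    _, ll_null = fit_logistic(X[:, null_cols], y)
    return 2.0 * (ll_full - ll_null)

def simulate_lrt(n=500, reps=400, seed=0):
    """Simulate LRT statistics for H0: slope = 0 when H0 is true."""
    rng = np.random.default_rng(seed)
    p_null = 1.0 / (1.0 + np.exp(-0.3))  # assumed intercept 0.3, slope 0
    stats = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)                     # assumed known marginal of X
        X = np.column_stack([np.ones(n), x])
        y = rng.binomial(1, p_null, size=n).astype(float)
        stats[r] = lrt_statistic(X, y, [0])
    return stats

stats = simulate_lrt()
# If the chi-squared(1) approximation is reasonable, the mean should be
# near 1 and the rejection rate at the 5% cutoff 3.841 should be near 0.05.
print("mean of LRT statistics:", stats.mean())
print("fraction above 3.841:", np.mean(stats > 3.841))
```

With one parameter under test, the reference distribution is $\chi^2_1$, which has mean $1$ and $0.95$ quantile of approximately $3.841$, so those two printed summaries give a crude sanity check on the approximation.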