Does logistic regression not fulfill an inequality required for Wilks' Theorem or am I missing something?


The required inequality:

Wilks' Theorem is given in the source below as Theorem 12.4.2, p. 515. Before stating the inequality, some definitions are needed:

Let $Z_1, \dots, Z_n$ be i.i.d. according to q.m.d. family $\{P_{\theta}, \theta \in \Omega \}$ with derivative $\eta(z,\theta)$ and $\Omega$ is an open subset of $\mathbb{R}^k$. Assume each $P_{\theta}$ is absolutely continuous with respect to a $\sigma$-finite measure $\mu$, and $p_{\theta}(z) = dP_{\theta}(z)/d\mu(z).$

Define the score function $\tilde{\eta}(z,\theta)$ by

$$\tilde{\eta}(z,\theta) = \frac{2 \eta (z,\theta)}{p_{\theta}^{1/2}(z)}$$

if $p_{\theta}(z) > 0$ and $\tilde{\eta}(z,\theta) = 0$ otherwise.

Consider testing the simple null hypothesis $\theta = \theta_0.$

Now, here's the inequality that seems to not hold:

For $\theta$ in a neighborhood of $\theta_0$ and a (measurable) function $M(z)$ with $\mathbb{E}_{\theta_0}[M(Z_i)] < \infty$, $$ |\log p_{\theta}(z) - \log p_{\theta_0}(z) - (\theta - \theta_0)^T \tilde{\eta}(z, \theta_0)| \leq M(z) |\theta - \theta_0|^2.$$
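To get a feel for what this condition asserts, here is a purely illustrative check in a different, simpler family (not logistic regression): for $N(\theta, 1)$ the remainder is exactly $\frac{1}{2}(\theta - \theta_0)^2$, so $M(z) \equiv \frac{1}{2}$ works. A minimal Python sketch (all function names are mine, not from the source):

```python
import math

def log_pdf(z, theta):
    # log density of N(theta, 1)
    return -0.5 * (z - theta) ** 2 - 0.5 * math.log(2 * math.pi)

def remainder(z, theta, theta0):
    # |log p_theta(z) - log p_theta0(z) - (theta - theta0) * score(z, theta0)|
    score = z - theta0  # score of N(theta, 1) at theta0
    return abs(log_pdf(z, theta) - log_pdf(z, theta0) - (theta - theta0) * score)

# For this family the remainder equals (theta - theta0)^2 / 2 exactly,
# so the constant function M(z) = 1/2 satisfies the inequality.
for z in (-3.0, 0.0, 2.5):
    for theta in (0.1, -0.4, 1.0):
        assert abs(remainder(z, theta, 0.0) - 0.5 * theta ** 2) < 1e-12
```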

The above translated to logistic regression:

In logistic regression, we consider $n$ observations of the form $(Y_i, X_i)$ where $Y_i \in \{0,1\}$ and $X_i \in \mathbb{R}^k$ (treated as fixed). The parameter $\theta$ is $\beta \in \mathbb{R}^k$, and

$$p_{\beta}(y) = \frac{e^{X_i^T \beta Y_i}}{1+e^{X_i^T \beta}}.$$
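As a quick sanity check on this pmf, one can code it directly and verify that the probabilities over $y \in \{0,1\}$ sum to one (Python sketch for the scalar case $k = 1$; names are mine):

```python
import math

def p_beta(y, x, beta):
    # P(Y = y | X = x) in logistic regression, scalar x and beta (k = 1)
    return math.exp(x * beta * y) / (1 + math.exp(x * beta))

# p(0) = 1 / (1 + e^{x*beta}) and p(1) = e^{x*beta} / (1 + e^{x*beta}) sum to 1
x, beta = 7.0, 0.3
assert abs(p_beta(0, x, beta) + p_beta(1, x, beta) - 1.0) < 1e-12
```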

One can check (using Theorem 12.2.2 in the source below) that this is a q.m.d. family with $\eta(y, \beta) = \frac{\dot{p}_{\beta}(y)}{2 p_{\beta}^{1/2}(y)}$. That means that

$$\tilde{\eta}(y, \beta_0) = \frac{2 \eta (y,\beta_0)}{p_{\beta_0}^{1/2}(y)} = \frac{\dot{p}_{\beta_0}(y)}{p_{\beta_0}(y)}.$$

Consider the special case where $\beta_0 = 0$, $k = 1$ and $n = 1$.

Then,

$$p_{\beta}(y) = \frac{e^{X_1 \beta Y_1}}{1 + e^{X_1 \beta}}$$

and

$$\dot{p}_{\beta}(y) = \frac{X_1 e^{X_1 \beta Y_1}(Y_1 + (Y_1 -1)e^{X_1 \beta})}{(1+e^{X_1 \beta})^2}$$

and

$$\tilde{\eta}(y, \beta_0) = X_1 Y_1 - \frac{X_1}{2}.$$
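This score value at $\beta_0 = 0$ can be verified numerically by differentiating $\log p_{\beta}(y) = X_1 \beta Y_1 - \log(1 + e^{X_1 \beta})$ with a finite difference (Python sketch; names are mine):

```python
import math

def log_p(y, x, beta):
    # log pmf: x * beta * y - log(1 + exp(x * beta))
    return x * beta * y - math.log(1 + math.exp(x * beta))

def score_fd(y, x, h=1e-6):
    # central finite difference of beta -> log p at beta_0 = 0
    return (log_p(y, x, h) - log_p(y, x, -h)) / (2 * h)

# compare against the closed form X_1 * Y_1 - X_1 / 2 for both values of y
x = 7.0
for y in (0, 1):
    assert abs(score_fd(y, x) - (x * y - x / 2)) < 1e-4
```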

Thus, the inequality is

\begin{align*} &\left| \log\left(\frac{e^{X_1 \beta Y_1}}{1 + e^{X_1 \beta}}\right) - \log\left(\frac{1}{2}\right) - \beta \left(X_1 Y_1 - \frac{X_1}{2}\right)\right| \leq M(y) \beta^2 \\ \iff & \left| X_1 \beta Y_1 - \log(1 + e^{X_1 \beta}) + \log(2) - \beta \left(X_1 Y_1 - \frac{X_1}{2}\right)\right| \leq M(y) \beta^2 \end{align*}

Why I think the inequality does not hold:

The above condition raises the question of what $\mathbb{E}_{\beta_0}[M(Y_i)]$ is and when it can be finite. Since $p_{\beta}$ is a probability mass function on the finite set $\{0,1\}$, it seems that one can take $M(y) = C$ for some constant $C$ (see my previous question here; I am not sure whether that is really the case, though).

The inequality would need to hold for any $X_1 \in \mathbb{R}$ and $Y_1 \in \{0,1\}$. So, if it holds, then it should in particular hold for the specific example $X_1 = 7$, $Y_1 = 1$. Thus, it would need to hold that

$$\frac{| \frac{7}{2} \beta + \log(\frac{2}{1 + e^{7 \beta}}) |}{\beta^2} \leq C. $$

Plotting the function $f(x) = \frac{| \frac{7}{2} x + \log(\frac{2}{1 + e^{7 x}}) |}{x^2}$ (or calculating $\lim_{x \rightarrow 0} f(x)$) seems to reveal that it goes to infinity as $x$ approaches $0$, and thus that the inequality does not hold.