How does the triangle inequality yield a step of a proof?


The step of the proof is: Fix any $c > 0$ and define

$$\epsilon_{n,c} := \underset{|h| \leq c}{\sup} | \log(L_{n,h}) - [\langle h,Z_n \rangle - \frac{1}{2} \langle h, I(\theta_0)h \rangle ] | .$$

Then, by the triangle inequality,

$$2 \log(L_{n,\hat{h}_n}) \leq 2 [\langle \hat{h}_n, Z_n \rangle - \frac{1}{2} \langle \hat{h}_n, I(\theta_0) \hat{h}_n \rangle + \epsilon_{n,c}],$$

if $|\hat{h}_n| \leq c$.

This is a step of the proof of Wilks' Theorem (Theorem 12.4.2, p. 525 in the source stated below). How does the triangle inequality yield it?

Idea: By definition of $\epsilon_{n,c}$, it holds that

$$\epsilon_{n,c} \geq | \log(L_{n,\hat{h}_n}) - [\langle \hat{h}_n,Z_n \rangle - \frac{1}{2} \langle \hat{h}_n, I(\theta_0)\hat{h}_n \rangle ] |$$

whenever $|\hat{h}_n| \leq c$, since $\hat{h}_n$ is then one of the points over which the supremum is taken.

Now, by the triangle inequality, it holds that

$$| \log(L_{n,\hat{h}_n}) - [\langle \hat{h}_n,Z_n \rangle - \frac{1}{2} \langle \hat{h}_n, I(\theta_0)\hat{h}_n \rangle ] | \leq |\log(L_{n,\hat{h}_n})| + |\langle \hat{h}_n,Z_n \rangle - \frac{1}{2} \langle \hat{h}_n, I(\theta_0)\hat{h}_n \rangle|$$

and, by the reverse triangle inequality, it holds that

$$| \log(L_{n,\hat{h}_n}) - [\langle \hat{h}_n,Z_n \rangle - \frac{1}{2} \langle \hat{h}_n, I(\theta_0)\hat{h}_n \rangle ] | \geq \Bigl| \, | \log(L_{n,\hat{h}_n})| - |\langle \hat{h}_n,Z_n \rangle - \frac{1}{2} \langle \hat{h}_n, I(\theta_0)\hat{h}_n \rangle | \, \Bigr|.$$

However, I don't see how either of those yield

$$\log(L_{n,\hat{h}_n}) \leq \langle \hat{h}_n, Z_n \rangle - \frac{1}{2} \langle \hat{h}_n, I(\theta_0) \hat{h}_n \rangle + \epsilon_{n,c}$$

which would yield the step of the proof.

I believe you don't need the definitions of the variables to answer my question; however, here they are for completeness:

Let $X_1, \dots, X_n$ be i.i.d. according to a q.m.d. (quadratic mean differentiable) family $\{P_{\theta}, \theta \in \Omega \}$ with derivative $\eta(x,\theta)$ and $\Omega$ is an open subset of $\mathbb{R}^k$. Assume each $P_{\theta}$ is absolutely continuous with respect to a $\sigma$-finite measure $\mu$, and set $p_{\theta}(x) = dP_{\theta}(x)/d\mu(x).$ Suppose the Fisher information matrix $I(\theta_0)$ is positive definite. Define the likelihood function $L_n(\cdot)$ by

$$L_n(\theta) = \prod_{i=1}^n p_{\theta}(X_i).$$

Define the score function $\tilde{\eta}(x,\theta)$ by

$$\tilde{\eta}(x,\theta) = \frac{2 \eta (x,\theta)}{p_{\theta}^{1/2}(x)}$$

if $p_{\theta}(x) > 0$ and $\tilde{\eta}(x,\theta) = 0$ otherwise. Also, define the normalized score vector $Z_n$ by

$$Z_n = Z_{n,\theta_0} = n^{-1/2} \sum_{i=1}^n \tilde{\eta}(X_i,\theta_0).$$

Fix $\theta_0$ and consider the likelihood ratio $L_{n,h}$ defined by

$$L_{n,h} = \frac{L_n(\theta_0 + hn^{-1/2})}{L_n(\theta_0)} = \prod_{i=1}^n \frac{p_{\theta_0 + h n^{-1/2}}(X_i)}{p_{\theta_0}(X_i)}.$$

Suppose $\hat{\theta}_n$ is an efficient estimator for $\theta$ assuming $\theta \in \Omega$. Define the likelihood ratio $R_n = L_n(\hat{\theta}_n)/L_n(\theta_0)$.

Define $\hat{h}_n := n^{1/2}(\hat{\theta}_n - \theta_0)$ so that $2 \log(R_n) = 2 \log(L_{n,\hat{h}_n})$.
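To make the definitions concrete, here is a minimal numerical sketch using an assumed $N(\theta, 1)$ location family (not the general q.m.d. setting of the theorem). For this family the score is $\tilde{\eta}(x,\theta) = x - \theta$, $I(\theta_0) = 1$, and the expansion $\log(L_{n,h}) = \langle h, Z_n \rangle - \frac{1}{2}\langle h, I(\theta_0) h \rangle$ is exact, so $\epsilon_{n,c} = 0$ for every $c$:

```python
import numpy as np

# Assumed concrete family: X_i ~ N(theta, 1), so the score is
# tilde_eta(x, theta) = x - theta and the Fisher information is I(theta_0) = 1.
rng = np.random.default_rng(0)
n, theta0 = 1000, 0.0
x = rng.normal(theta0, 1.0, size=n)

def log_L_nh(h):
    """log of the likelihood ratio L_{n,h} = L_n(theta0 + h/sqrt(n)) / L_n(theta0)."""
    theta = theta0 + h / np.sqrt(n)
    # Log-density differences for the N(theta, 1) family, summed over the sample.
    return np.sum(-0.5 * (x - theta) ** 2 + 0.5 * (x - theta0) ** 2)

# Normalized score vector Z_n = n^{-1/2} * sum_i tilde_eta(X_i, theta0).
Z_n = np.sum(x - theta0) / np.sqrt(n)

# For the normal location family the LAN expansion is exact:
# log L_{n,h} = h * Z_n - h**2 / 2, hence epsilon_{n,c} = 0.
for h in [-2.0, -0.5, 0.3, 1.7]:
    assert abs(log_L_nh(h) - (h * Z_n - 0.5 * h ** 2)) < 1e-8
```

In the general q.m.d. case the expansion only holds up to the error $\epsilon_{n,c}$, which is what the step of the proof controls.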

Source: E.L. Lehmann and J. P. Romano, Testing Statistical Hypotheses, Springer Science+Business Media, 2008. It is freely accessible here: https://sites.stat.washington.edu/jaw/COURSES/580s/582/HO/Lehmann_and_Romano-TestingStatisticalHypotheses.pdf


Best answer:

Ignoring all the specific notation, the quantity $\epsilon_{n,c}$ has a definition of the form $$\epsilon = \sup_h\left\vert a(h)-b(h)\right\vert,$$ where $a(h)$ and $b(h)$ are two other quantities depending on $h$.

In these terms, the next inequality in the proof is (up to multiplication by 2) of the form $$a(\hat{h})\leq b(\hat{h})+\epsilon,$$ with $\hat{h}$ one of the possible parameters in the supremum.

I don't really see the relevance of the triangle inequality, since this type of estimate is more direct: \begin{align*} a(\hat{h})&=b(\hat{h})+(a(\hat{h})-b(\hat{h})) \\&\leq b(\hat{h}) + \left\vert a(\hat{h})-b(\hat{h})\right\vert \\&\leq b(\hat{h}) + \sup_h\left\vert a(h)-b(h)\right\vert \\&= b(\hat{h})+\epsilon, \end{align*} using only the basic properties of the absolute value and supremum.