Proof of Theorem 4.16 from Mathematical Statistics by Jun Shao (Second Edition, Section 4.5.1, p.287)


First I would like to state the theorem; it reads as follows. Let $X_{1}, \dotsc, X_{n}$ be i.i.d. from a p.d.f. $f_{\theta}$ w.r.t. a $\sigma$-finite measure $\nu$ on $(\mathcal{R},\mathcal{B})$, where $\mathcal{R}$ denotes the real line and $\mathcal{B}$ the Borel $\sigma$-field on $\mathcal{R}$. Let $\theta \in \Theta$, where $\Theta$ is an open set in $\mathcal{R}^k$. Suppose that for any $x$ in the range of $X_{1}$, $f_{\theta}(x)$ is twice continuously differentiable in $\theta$ and satisfies $$\frac{\partial}{\partial \theta}\int\psi_{\theta}(x)\,d\nu = \int\frac{\partial}{\partial \theta}\psi_{\theta}(x)\,d\nu$$ for $\psi_{\theta}(x) = f_{\theta}(x)$ and $\psi_{\theta}(x) = \frac{\partial}{\partial \theta}f_{\theta}(x)$; that the Fisher information matrix $$ I_{1}(\theta) = E\left\{\frac{\partial}{\partial \theta}\log f_{\theta}(X_{1})\left[\frac{\partial}{\partial \theta}\log f_{\theta}(X_{1})\right]^{\mathsf{T}}\right\}$$ is positive definite; and that for any given $\theta \in \Theta$ there exist a positive number $c_{\theta}$ and a positive function $h_{\theta}$ such that $E\left[h_{\theta}(X_{1})\right] < \infty$ and $$\sup_{\gamma: \lVert\gamma-\theta\rVert<c_{\theta}}\left\lVert\frac{\partial^2\log f_{\gamma}(x)}{\partial \gamma\,\partial{\gamma}^{\mathsf{T}}}\right\rVert \leq h_{\theta}(x)$$ for all $x$ in the range of $X_{1}$, where $\lVert A \rVert = \sqrt{\operatorname{tr}\left(A^{\mathsf{T}}A\right)}$ for any matrix $A$.

Let $\hat{\theta}_{n}$ be an estimator of $\theta$ (based on $X_{1}, \dotsc, X_{n}$) and suppose that $$ \left[V_{n}(\theta)\right]^{-\frac{1}{2}}\left(\hat{\theta}_{n}-\theta\right) \overset{d}{\to}\mathcal{N}_{k}(0, I_{k}),$$ where $V_{n}(\theta) = \frac{V(\theta)}{n}$ for some positive definite matrix $V(\theta)$, $I_{k}$ is the $k\times k$ identity matrix, and $\overset{d}{\to}$ denotes convergence in distribution. For every $n$, let $I_{n}(\theta)$ be the Fisher information about $\theta$ contained in $X_{1}, \dotsc, X_{n}$. Then there exists a $\Theta_{0} \subset \Theta$ with Lebesgue measure $0$ such that $$V_{n}(\theta) \geq \left[I_{n}(\theta)\right]^{-1}$$ (in the sense that the difference is nonnegative definite) for every $\theta \notin \Theta_{0}$.
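To make the statement concrete, here is a quick sanity check of my own (it is not part of Shao's text): take the univariate location family $f_{\theta}(x) = (2\pi)^{-\frac{1}{2}}e^{-(x-\theta)^{2}/2}$ with $\theta \in \mathcal{R}$. Then $$\frac{\partial}{\partial \theta}\log f_{\theta}(x) = x-\theta, \qquad \frac{\partial^{2}}{\partial \gamma^{2}}\log f_{\gamma}(x) \equiv -1, \qquad I_{1}(\theta) = E\left[(X_{1}-\theta)^{2}\right] = 1,$$ so the domination condition holds with $h_{\theta}(x) \equiv 1$ and $I_{n}(\theta) = nI_{1}(\theta) = n$. For $\hat{\theta}_{n} = \bar{X}_{n}$ we have $\sqrt{n}\left(\bar{X}_{n}-\theta\right) \overset{d}{\to}\mathcal{N}(0,1)$, i.e. $V_{n}(\theta) = \frac{1}{n} = \left[I_{n}(\theta)\right]^{-1}$, so in this example the bound of the theorem is attained with equality (and one may take $\Theta_{0} = \emptyset$).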

The proof of the univariate case can be summarized as follows. We take a sequence of realizations $x = (x_{1},\dotsc,x_{n})$, set $\theta_{n} = \theta + n^{-\frac{1}{2}}\in\Theta$, and define $$K_{n}(x,\theta) = \frac{\log l(\theta_{n}) - \log l(\theta) + \frac{1}{2}I_{1}(\theta)}{\left[I_{1}(\theta)\right]^{\frac{1}{2}}},$$ where $l(\cdot)$ denotes the likelihood function. Under the assumptions of the theorem, the first result is that $$K_{n}(X,\theta)\overset{d}{\to} \mathcal{N}(0,1).$$

Next, let $P_{\theta_{n}}$ (or $P_{\theta}$) be the distribution of the sequence $X = (X_{1},\dotsc,X_{n})$ under the assumption that $X_{1}$ has p.d.f. $f_{\theta_{n}}$ (or $f_{\theta}$), and define $g_{n}(\theta) = \left\vert P_{\theta}\left(\hat{\theta}_{n} \leq \theta\right)-\frac{1}{2}\right\vert$. The next result is that there exist a subsequence $\{n_{k}\}$ and a $\Theta_{0} \subset \Theta$ with Lebesgue measure zero such that $$\lim_{k \to \infty}g_{n_k}(\theta_{n_{k}}) = 0, \quad \theta \notin \Theta_{0}.$$

Now, letting $\Phi$ denote the standard normal c.d.f. and assuming that $\theta \notin \Theta_{0}$, it is shown in the next step that for $t>\left[I_{1}(\theta)\right]^{\frac{1}{2}}$, $$P_{\theta_{n}}\left(K_{n}(X,\theta)\leq t\right) \overset{n \to \infty}{\to} \Phi\left(t-\left[I_{1}(\theta)\right]^{\frac{1}{2}}\right).$$ This last result and the fact that $$\lim_{k \to \infty}g_{n_k}(\theta_{n_{k}}) = 0, \quad \theta \notin \Theta_{0},$$ then imply that there is a subsequence $\{n_{j}\}$ such that for $j = 1,2,\dotsc,$ $$P_{\theta_{n_j}}\left(\hat{\theta}_{n_j} \leq \theta_{n_j}\right) < P_{\theta_{n_j}}\left(K_{n_j}(X,\theta) \leq t\right).$$

The author further concludes that this last inequality and the Neyman-Pearson lemma imply that for $j = 1,2,\dotsc,$ $$P_{\theta}\left(\hat{\theta}_{n_j} \leq \theta_{n_j}\right) < P_{\theta}\left(K_{n_j}(X,\theta) \leq t\right).$$
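(As an aside, here is how I read the step that produces the second-to-last inequality; the book states it without detail. Along the subsequence $\{n_{k}\}$ we have $$P_{\theta_{n_{k}}}\left(\hat{\theta}_{n_{k}} \leq \theta_{n_{k}}\right) \to \frac{1}{2}, \qquad\text{while}\qquad P_{\theta_{n_{k}}}\left(K_{n_{k}}(X,\theta) \leq t\right) \to \Phi\left(t-\left[I_{1}(\theta)\right]^{\frac{1}{2}}\right) > \frac{1}{2},$$ the last inequality holding because $t > \left[I_{1}(\theta)\right]^{\frac{1}{2}}$. Hence the strict inequality between the two probabilities holds for all sufficiently large $k$, and $\{n_{j}\}$ can be taken to be that tail of $\{n_{k}\}$.)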

Now, finally, here comes my question: how can we use the Neyman-Pearson lemma to get from the second-to-last inequality to the last one?