Although the framework below is somewhat tedious, my questions at the end of the post should be rather standard:
Framework
I am trying to understand the proof of the asymptotic normality of the maximum likelihood estimator in a GLM setting, which is given in the book *From Finite Sample to Asymptotic Methods in Statistics* by de Lima, Singer and Sen in Section 10.5, and I have some problems following their arguments.
For the exponential family of distributions they derived \begin{align*} -\frac{\partial^2 }{\partial \beta \partial \beta'} \log L_n(\beta) & = \sum_{i=1}^n \{g'(\mu_i(\beta))\}^{-2}[v_i(\beta)]^{-1} x_ix_i'\\ &\quad + \sum_{i=1}^n(Y_i-\mu_i(\beta))\left\{\frac{g''(\mu_i(\beta))}{[g'(\mu_i(\beta))]^2}+\frac{b'''(h(x_i'\beta))}{[g'(\mu_i(\beta))]^2[v_i(\beta)]^3}\right\}x_ix_i' \\ & = I_n(\beta)+r_n(\beta), \end{align*} where $\log L_n(\beta)$ is the log-likelihood and $$I_n(\beta) =\sum_{i=1}^n \{g'(\mu_i(\beta))\}^{-2}[v_i(\beta)]^{-1} x_ix_i'=\mathbb{E}\left(-\frac{\partial^2 }{\partial \beta \partial \beta'} \log L_n(\beta) \right) $$ is the information matrix.
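As a quick numerical sanity check of this decomposition (not from the book; the Poisson model, the data, and the finite-difference Hessian are my own choices), note that for a Poisson GLM with log link (a canonical link) the weight $\{g'(\mu_i)\}^{-2}[v_i]^{-1}$ reduces to $\mu_i$ and the remainder $r_n(\beta)$ vanishes, so the negative Hessian of the log-likelihood should coincide with $I_n(\beta)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 200, 3
X = rng.normal(size=(n, q))
beta = np.array([0.5, -0.3, 0.2])
mu = np.exp(X @ beta)            # log link: mu_i = exp(x_i' beta)
y = rng.poisson(mu)

def loglik(b):
    eta = X @ b
    return np.sum(y * eta - np.exp(eta))   # Poisson log-likelihood (dropping log y_i!)

def neg_hessian_fd(b, h=1e-5):
    """Negative Hessian of the log-likelihood via central finite differences."""
    H = np.zeros((q, q))
    for j in range(q):
        for k in range(q):
            ej, ek = np.eye(q)[j] * h, np.eye(q)[k] * h
            H[j, k] = (loglik(b + ej + ek) - loglik(b + ej - ek)
                       - loglik(b - ej + ek) + loglik(b - ej - ek)) / (4 * h**2)
    return -H

# Expected information: I_n = sum_i {g'(mu_i)}^{-2} [v_i]^{-1} x_i x_i'.
# For the log link, g'(mu) = 1/mu and v_i = mu_i, so the weight is mu_i.
I_n = (X * mu[:, None]).T @ X

print(np.max(np.abs(neg_hessian_fd(beta) - I_n)))  # small: r_n(beta) = 0 for a canonical link
```

For a non-canonical link (e.g. probit in a binomial model) the observed and expected information differ, and the difference is exactly the $r_n(\beta)$ term above.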
They then note that $r_n(\beta)$ is a sum of independent centered random variables with finite variances $v_i(\mu_i(\beta))$ and nonstochastic matrix coefficients $$G_i := \left\{\frac{g''(\mu_i(\beta))}{[g'(\mu_i(\beta))]^2}+\frac{b'''(h(x_i'\beta))}{[g'(\mu_i(\beta))]^2[v_i(\beta)]^3}\right\}x_ix_i'.$$
Among other things they assume (I already corrected some obvious mistakes):
- (10.5.31) $$\lim_{n\to \infty}\frac{1}{n^2}\sum_{i=1}^n v_i(\mu_i(\beta))\operatorname{trace}(G_iG_i')=0$$
Define $B(\delta)= \{\beta^*\in \mathbb{R}^q : \|\beta^*-\beta\|<\delta\}$, $w_{1i}(\beta):= \{g'(\mu_i(\beta))\}^{-2}[v_i(\beta)]^{-1}$ and
$$w_{2i}(\beta):=\mu_i(\beta)\left\{\frac{g''(\mu_i(\beta))}{[g'(\mu_i(\beta))]^2}+\frac{b'''(h(x_i'\beta))}{[g'(\mu_i(\beta))]^2[v_i(\beta)]^3}\right\}.$$
We then assume that, as $\delta\to 0$:
- (10.5.33) $$\sup_{\beta^*\in B(\delta)}n^{-1}\sum_{i=1}^n \| [w_{ki}(\beta^*)-w_{ki}(\beta)] x_ix_i'\| \to 0, \quad k=1,2$$
- (10.5.34) $$\mathbb{E}\left\{\sup_{\beta^*\in B(\delta)}n^{-1}\sum_{i=1}^n|Y_i|\,\|[w_{2i}(\beta^*)-w_{2i}(\beta)] x_ix_i'\|\right\} \to 0$$
Questions:
- They say that it follows from (10.5.31) and the Chebyshev inequality that $$n^{-1}r_n(\beta) = o_p(1).$$ How can I see this?
- In their proof, at (10.5.38), they claim that it follows from (10.5.33) and (10.5.34) that $$\sup_{\beta^* \in B(K/\sqrt{n})} \left\| \frac{1}{n} \frac{\partial^2 }{\partial \beta \partial \beta'} \log L_n(\beta)\Big|_{\beta^*} - \frac{1}{n}\frac{\partial^2 }{\partial \beta \partial \beta'} \log L_n(\beta)\Big|_{\beta}\right\|=o_p(1).$$ I am not able to see this either.
Any help is greatly appreciated.
I repeat/generalize/simplify your first question to make it easier for others to respond:
You want to show that, with $y_i := Y_i-\mu_i(\beta)$ and the random matrix $y_i^* := y_i G_i$, you have $$\frac{1}{n}\sum_{i=1}^n y_i^* = o_p(1).$$
Unfortunately I am not really familiar with the techniques used to prove such statements in the random-matrix case, but the trace term in your assumption suggests working with the Frobenius norm.
Edit: actually, the proof of the first statement is a lot easier than expected. Note that we have $\mathbb{E}(\frac{1}{n}\sum_{i=1}^n y_i^*) = 0$. Choose arbitrary $1\leq k\leq q$ and $1\leq l\leq q$ and look at the $(k,l)$-th entry of the matrix $\frac{1}{n}\sum_{i=1}^n y_i^*$, i.e. look at $$z_{kl,n}= \frac{1}{n}\sum_{i=1}^n y_i (G_i)_{kl}.$$ We then have $\mathbb{E}(z_{kl,n})=0$ and, by the independence of the $y_i$, $$\mathrm{Var}(z_{kl,n}) = \frac{1}{n^2}\sum_{i=1}^n v_i(\mu_i(\beta)) (G_i)_{kl}^2.$$
But obviously we have $$v_i(\mu_i(\beta)) (G_i)_{kl}^2\leq v_i(\mu_i(\beta)) \|G_i\|_F^2 = v_i(\mu_i(\beta)) \operatorname{trace}(G_iG_i')$$ and hence $$\mathrm{Var}(z_{kl,n}) \leq \frac{1}{n^2} \sum_{i=1}^n v_i(\mu_i(\beta)) \operatorname{trace}(G_iG_i').$$ Now choose any $\varepsilon>0$; by the Markov inequality we have $$P(|z_{kl,n}|>\varepsilon) = P(|z_{kl,n}|^2>\varepsilon^2) \leq \frac{\mathrm{Var}(z_{kl,n})}{\varepsilon^2}\leq \frac{\frac{1}{n^2} \sum_{i=1}^n v_i(\mu_i(\beta)) \operatorname{trace}(G_iG_i')}{\varepsilon^2}\to 0,$$ where the convergence is exactly assumption (10.5.31).
The assertion now follows since $k$, $l$ and $\varepsilon>0$ were arbitrary, and since there are only finitely many entries $(k,l)$.
This should answer your first question.
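As a small Monte Carlo illustration of the argument (my own toy setup, not from the book): with centered $y_i$ of variance $v_i$ and fixed matrices $G_i$, the entries of $\frac{1}{n}\sum_i y_i G_i$ and the Chebyshev bound $\frac{1}{n^2}\sum_i v_i\operatorname{trace}(G_iG_i')$ both shrink as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
q = 2

def max_entry(n):
    """Simulate (1/n) * sum_i y_i G_i for centered y_i and fixed matrices G_i."""
    v = 1.0 + rng.random(n)                    # variances v_i, bounded away from 0 and infinity
    y = rng.normal(scale=np.sqrt(v))           # centered, Var(y_i) = v_i
    G = rng.normal(size=(n, q, q))             # stand-ins for the nonstochastic coefficients G_i
    S = np.einsum('i,ijk->jk', y, G) / n       # (1/n) sum_i y_i G_i
    # Chebyshev bound for each entry: Var(z_kl) <= (1/n^2) sum_i v_i trace(G_i G_i')
    bound = np.sum(v * np.einsum('ijk,ijk->i', G, G)) / n**2
    return np.abs(S).max(), bound

for n in [100, 10_000, 1_000_000]:
    m, b = max_entry(n)
    print(n, m, b)   # both tend to shrink as n grows, consistent with n^{-1} r_n = o_p(1)
```

Here the bound is of order $1/n$ (since the $v_i$ and $\|G_i\|_F^2$ are stochastically bounded), which mirrors how (10.5.31) forces every entry of $n^{-1}r_n(\beta)$ to zero in probability.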