Covariance, matrix inverse, and second order derivatives of log-likelihood


On page 35 of the book Analysis of Survival Data by D.R. Cox and D. Oakes, I see the following idea: the observed information matrix is defined as the matrix of minus the second derivatives of $l,$ the log-likelihood of the parameters. (We are working with a parametric family of distributions, and want to perform hypothesis tests on these parameters $\phi$ (which is a vector) based on experiments.) Write $v_{\omega \omega}$ for the leading submatrix of the inverse of the observed information matrix, where $\omega$ denotes the vector formed by the components of $\phi$ of particular interest, and the submatrix is extracted at the rows/columns corresponding to these entries.

The book then says: $v_{\omega \omega}$ can be regarded as the estimated covariance matrix of $\hat \omega,$ the maximum-likelihood estimate of $\omega.$

Why is this the case?
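(To make the objects in the question concrete, here is a minimal numerical sketch of the observed information matrix and $v_{\omega \omega}$. The normal model, the sample size, and all parameter values are assumptions chosen purely for illustration, not taken from the book.)

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(1.0, 2.0, size=500)   # hypothetical data set

def loglik(phi):
    """Normal log-likelihood; phi = (mu, sigma^2). Illustrative model only."""
    mu, sig2 = phi
    return -0.5 * np.sum(np.log(2 * np.pi * sig2) + (z - mu) ** 2 / sig2)

# MLE of (mu, sigma^2); note the MLE of the variance uses 1/n, not 1/(n-1).
phi_hat = np.array([z.mean(), ((z - z.mean()) ** 2).mean()])

# Observed information: minus the Hessian of the log-likelihood at phi_hat,
# computed here by central finite differences.
h = 1e-4
d = len(phi_hat)
info = np.empty((d, d))
for i in range(d):
    for j in range(d):
        e_i, e_j = np.eye(d)[i] * h, np.eye(d)[j] * h
        info[i, j] = -(loglik(phi_hat + e_i + e_j) - loglik(phi_hat + e_i - e_j)
                       - loglik(phi_hat - e_i + e_j) + loglik(phi_hat - e_i - e_j)) / (4 * h * h)

v = np.linalg.inv(info)   # inverse of the observed information matrix
# If omega = mu is the component of interest, the leading 1x1 submatrix
# v[0, 0] is v_{omega omega}: the estimated variance of mu_hat,
# which for this model works out to sig2_hat / n.
print(v[0, 0])
```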

Accepted answer:

The basic logic is to first derive the asymptotic distribution of the maximum-likelihood estimator, and then use the so-called 'information equality'.

Suppose we have $n$ i.i.d. samples $\left\{z_{i}\right\}_{1\leq i\leq n}$, each drawn from a true distribution with pdf $g\left(z_{i}\right)$. As mentioned, we are working with a parametric family of distributions; further assume $g\left(\cdot\right)=f\left(\cdot,\phi_{0}\right)$, where the function $f$ is known to us and $\phi$ takes values in a compact set with $\phi_{0}$ an interior point.

We derive the MLE from a moment-condition perspective. Since $f\left(\cdot,\phi_{0}\right)$ is a pdf, we have $$\int f\left(z,\phi_{0}\right) dz=1.$$ Differentiating both sides w.r.t. $\phi$ gives $$\int\frac{\partial f\left(z,\phi_{0}\right)}{\partial \phi}dz=0,$$ and multiplying and dividing the integrand by $f\left(z,\phi_{0}\right)$ gives (we can restrict the domain of integration so that the denominator never becomes $0$; this does not affect the identity) $$\int\frac{\partial \log f\left(z,\phi_{0}\right)}{\partial \phi}f\left(z,\phi_{0}\right)dz=0, \tag{$\ast$}$$ which says (all expectations are taken w.r.t. the true probability measure with density $f\left(z,\phi_{0}\right)$) $$\mathbb{E}\left[\frac{\partial\log f\left(z,\phi_{0}\right)}{\partial \phi}\right]=0.$$ Applying the usual sample-analogue idea, we choose $\hat{\phi}$ such that $$\frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f\left(z_{i},\hat{\phi}\right)}{\partial \phi}=0;$$ clearly, this is the same as the first-order condition of the standard MLE objective function. Assuming twice continuous differentiability of the log-likelihood, the (element-wise) mean-value theorem gives $$\frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f\left(z_{i},\phi_{0}\right)}{\partial \phi}+\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^{2}\log f\left(z_{i},\bar{\phi}\right)}{\partial \phi\partial \phi^{\top}}\right)\left(\hat{\phi}-\phi_{0}\right)=0,$$ where $\bar{\phi}$ lies 'between' $\hat{\phi}$ and $\phi_{0}$.
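The moment condition and its sample analogue can be checked numerically. The sketch below uses an exponential model $f(z,\lambda)=\lambda e^{-\lambda z}$, for which the score is $1/\lambda - z$ and the sample first-order condition solves to $\hat{\lambda}=1/\bar{z}$; the rate $\lambda_0=2$ and the sample size are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
lam0 = 2.0                      # true rate parameter (assumed for the example)
z = rng.exponential(1 / lam0, size=200_000)

# Score of the exponential log-density log f(z; lam) = log(lam) - lam * z,
# evaluated at the truth: d/dlam = 1/lam - z.
score_at_truth = 1 / lam0 - z
print(score_at_truth.mean())    # sample analogue of E[score] = 0

# The sample moment equation (1/n) sum_i (1/lam - z_i) = 0 solves to
# lam_hat = 1 / mean(z), which is exactly the MLE first-order condition.
lam_hat = 1 / z.mean()
print(lam_hat)                  # close to lam0
```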
Multiplying by $\sqrt{n}$ and solving for $\sqrt{n}\left(\hat{\phi}-\phi_{0}\right)$ gives $$\sqrt{n}\left(\hat{\phi}-\phi_{0}\right)=-\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^{2}\log f\left(z_{i},\bar{\phi}\right)}{\partial \phi\partial \phi^{\top}}\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\partial \log f\left(z_{i},\phi_{0}\right)}{\partial \phi}.$$ Let $H=\mathbb{E}\left[\frac{\partial^{2}\log f\left(z_{i},{\phi}_{0}\right)}{\partial \phi\partial \phi^{\top}}\right]$ denote the negative information matrix, and let $V=\mathbb{E}\left[\frac{\partial \log f\left(z_{i},\phi_{0}\right)}{\partial \phi}\frac{\partial \log f\left(z_{i},\phi_{0}\right)}{\partial \phi^{\top}}\right]$ denote the variance matrix of the score. A combination of the LLN, the continuous mapping theorem, and the CLT gives $$\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^{2}\log f\left(z_{i},\bar{\phi}\right)}{\partial \phi\partial \phi^{\top}}\right)^{-1}\overset{\mathbb{P}}{\to}H^{-1} \quad \text{ and }\quad \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\partial \log f\left(z_{i},\phi_{0}\right)}{\partial \phi}\overset{d}{\to}\mathcal{N}\left(0,V\right),$$ hence by Slutsky's theorem, $$\sqrt{n}\left(\hat{\phi}-\phi_{0}\right)\overset{d}{\to}\mathcal{N}\left(0,H^{-1}VH^{-1}\right).$$ What remains is to prove $H^{-1}VH^{-1}=-H^{-1}$; we do so by showing $H=-V$ (the 'information equality'). Recall equation ($\ast$); differentiating both sides w.r.t. $\phi^{\top}$ gives $$\int\left(\frac{\partial^{2}\log f\left(z,\phi_{0}\right)}{\partial \phi\partial\phi^{\top}}f\left(z,\phi_{0}\right)+\frac{\partial \log f\left(z,\phi_{0}\right)}{\partial \phi}\frac{\partial \log f\left(z,\phi_{0}\right)}{\partial \phi^{\top}}f\left(z,\phi_{0}\right)\right)dz=0,$$ which is exactly $H+V=0$, as required. Consequently $\hat{\phi}$ is approximately $\mathcal{N}\left(\phi_{0},-H^{-1}/n\right)$ in large samples; since the observed information matrix consistently estimates $-nH$, its inverse estimates the covariance matrix of $\hat{\phi}$, and extracting the $(\omega,\omega)$ submatrix gives $v_{\omega\omega}$ as the estimated covariance matrix of $\hat{\omega}$.
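The conclusion can be verified by simulation: the sampling covariance of the MLE across repeated samples should match the inverse information. The sketch below uses a normal model with $\phi=(\mu,\sigma^2)$, where the information matrix is $n\,\mathrm{diag}\!\left(1/\sigma^2,\,1/(2\sigma^4)\right)$; the true values, sample size, and replication count are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, sig2_0 = 1.0, 4.0          # hypothetical true (mu, sigma^2)
n, reps = 400, 5000

# Monte Carlo sampling distribution of the MLE phi_hat = (mu_hat, sig2_hat).
mles = np.empty((reps, 2))
for r in range(reps):
    z = rng.normal(mu0, np.sqrt(sig2_0), size=n)
    mu_hat = z.mean()
    sig2_hat = ((z - mu_hat) ** 2).mean()   # MLE of the variance uses 1/n
    mles[r] = mu_hat, sig2_hat

emp_cov = np.cov(mles.T)        # empirical covariance of the MLE across reps

# Information matrix of the normal model, evaluated at the truth:
# minus the expected Hessian of the log-likelihood times n.
obs_info = n * np.diag([1 / sig2_0, 1 / (2 * sig2_0**2)])
v = np.linalg.inv(obs_info)     # estimated covariance matrix of phi_hat

# If omega = mu, the leading submatrix v[0, 0] is v_{omega omega},
# and it should be close to the simulated Var(mu_hat) = sig2_0 / n.
print(emp_cov[0, 0], v[0, 0])
print(emp_cov[1, 1], v[1, 1])
```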