I know that the Fisher matrix is easily obtained from the Hessian matrix $I\left(\hat{\beta}\right)=-H\left(\hat{\beta}\right)$
Why is the covariance variance matrix the inverse of the Fisher information matrix?
Let $l(z,\theta) = \ln p_{\theta}(z)$. As you said, $$ I(\theta) = -\mathbb{E}_{\theta}\left(\nabla^2_{\theta} l(Z,\theta)\right). $$ Calculating the Hessian, you have $$ \begin{align} \nabla^2_{\theta} l(z,\theta) &= \nabla_{\theta}\left(\dfrac{1}{p_\theta(z)}\nabla_\theta p_\theta (z) \right)\\ &= -\dfrac{\nabla_\theta p_\theta(z)\,\nabla_\theta p_\theta(z)^T}{p_\theta(z)^2} + \dfrac{\nabla^2 _{\theta}p_\theta(z)}{p_\theta(z)}. \end{align} $$ The first term on the right-hand side satisfies $$ -\dfrac{\nabla_\theta p_\theta(z)\,\nabla_\theta p_\theta(z)^T}{p_\theta(z)^2} = -\nabla_\theta \ln p_\theta (z)\,\nabla_\theta \ln p_\theta (z)^T, $$ hence $$ \begin{align} I(\theta) &= -\mathbb{E}_{\theta}\left(\nabla^2_{\theta} l(Z,\theta)\right)\\ &= \mathbb{E}_{\theta}\left(\nabla_\theta \ln p_\theta (Z)\,\nabla_\theta \ln p_\theta (Z)^T\right) - \mathbb{E}_{\theta}\left(\dfrac{\nabla^2 _{\theta}p_\theta(Z)}{p_\theta(Z)}\right)\\ &= \operatorname{Cov}\left(\nabla_\theta \ln p_\theta (Z)\right) - \int\dfrac{\nabla^2 _{\theta}p_\theta(z)}{p_\theta(z)}\,p_\theta(z)\,dz \\ &= \operatorname{Cov}\left(\nabla_\theta \ln p_\theta (Z)\right) - \int\nabla^2 _{\theta}p_\theta(z)\,dz \\ &\overset{*}{=} \operatorname{Cov}\left(\nabla_\theta \ln p_\theta (Z)\right) - \nabla^2_{\theta}\int p_\theta(z)\,dz \\ &= \operatorname{Cov}\left(\nabla_\theta \ln p_\theta (Z)\right) - \nabla^2_{\theta}(1) \\ &= \operatorname{Cov}\left(\nabla_\theta \ln p_\theta (Z)\right) + 0, \end{align} $$
where in $*$ you must use some regularity condition that allows differentiation under the integral sign with respect to $\theta$. Note also that identifying $\mathbb{E}_{\theta}\left(\nabla_\theta \ln p_\theta (Z)\,\nabla_\theta \ln p_\theta (Z)^T\right)$ with the covariance of the score uses the fact that the score has zero mean, $\mathbb{E}_{\theta}\left(\nabla_\theta \ln p_\theta (Z)\right) = 0$, which follows from the same differentiation-under-the-integral argument applied to $\int p_\theta(z)\,dz = 1$.
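As a quick numerical sanity check of the identity $\operatorname{Cov}(\text{score}) = -\mathbb{E}(\text{Hessian})$, here is a small Monte Carlo sketch using a Bernoulli model (my own choice of example; for $Z \sim \text{Bernoulli}(p)$ the Fisher information is $1/(p(1-p))$ in closed form):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                   # true parameter
z = rng.binomial(1, p, size=200_000)      # samples Z ~ Bernoulli(p)

# Per-observation score and Hessian of l(z, p) = z*ln(p) + (1-z)*ln(1-p)
score = z / p - (1 - z) / (1 - p)
hessian = -z / p**2 - (1 - z) / (1 - p)**2

fisher_exact = 1.0 / (p * (1 - p))        # closed form: 1/0.21 ≈ 4.7619

print(np.mean(score))      # ≈ 0: the score has zero mean
print(np.var(score))       # ≈ 4.76: covariance of the score
print(-np.mean(hessian))   # ≈ 4.76: minus the expected Hessian
print(fisher_exact)
```

All three quantities agree up to Monte Carlo error, illustrating both the zero-mean property of the score and the equality of the two expressions for $I(\theta)$.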
Edit:
I think I didn't quite answer your question. Actually, we can show that (under some assumptions) the maximum likelihood estimator (MLE) $\hat{\theta}_n$ is asymptotically normal with mean $\theta_p$ (the parameter you are estimating) and covariance matrix $I(\theta_p)^{-1}$. That is, $$\sqrt{n}(\hat{\theta}_n - \theta_p) \overset{d}{\rightarrow} N(0,I(\theta_p)^{-1}), $$
where $\overset{d}{\rightarrow}$ stands for convergence in distribution.
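This asymptotic statement can also be checked by simulation (again a sketch of my own, using the Bernoulli model, where the MLE is the sample mean and $I(p)^{-1} = p(1-p)$):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 500, 20_000

# MLE of a Bernoulli parameter from n observations is the sample mean;
# repeat the experiment `reps` times to estimate its sampling distribution.
theta_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)
scaled = np.sqrt(n) * (theta_hat - p)

inv_fisher = p * (1 - p)      # I(p)^{-1} = 0.21 for p = 0.3
print(np.var(scaled))         # ≈ 0.21, matching I(p)^{-1}
print(inv_fisher)
```

The empirical variance of $\sqrt{n}(\hat{\theta}_n - p)$ matches $I(p)^{-1}$, as the asymptotic normality result predicts.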
For some references, I suggest 'Asymptotic Statistics' by Aad van der Vaart and 'Theoretical Statistics: Topics for a Core Course' by R.W. Keener.