I am struggling to understand the relationship between the Fisher Information and the Variance.
So far, what I understand:
- Given a specific choice of Probability Distribution Function, the partial derivative (with respect to the parameters) of the Natural Logarithm of the corresponding Likelihood Function is called the Score Function
- If we square the Score Function and take its Expected Value, we get the Fisher Information (note: since the Score Function has Expected Value zero, this is also the Variance of the Score; when there are multiple parameters, the Fisher Information will be a Matrix)
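To make the definitions above concrete, here is a small numerical sketch for a Bernoulli(p) model (my own illustrative choice, not from any particular source): the Score is d/dp log L = x/p - (1-x)/(1-p), and averaging its square over simulated data should approximate the closed-form Fisher Information 1/(p(1-p)).

```python
import random

# Sketch for a Bernoulli(p) model (an assumed, illustrative example):
#   log L(p; x) = x*log(p) + (1-x)*log(1-p)
#   Score:              d/dp log L = x/p - (1-x)/(1-p)
#   Fisher Information: E[Score^2] = 1/(p*(1-p))

def score(x, p):
    """Score Function of a single Bernoulli observation x at parameter p."""
    return x / p - (1 - x) / (1 - p)

def monte_carlo_fisher_info(p, n=200_000, seed=0):
    """Approximate E[Score^2] by averaging over simulated Bernoulli draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        total += score(x, p) ** 2
    return total / n

p = 0.3
exact = 1 / (p * (1 - p))            # closed-form Fisher Information
approx = monte_carlo_fisher_info(p)  # Monte Carlo estimate of E[Score^2]
print(exact, approx)                 # the two should nearly agree
```

The agreement between the two numbers is exactly the "squared Score, in expectation" definition at work.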
Now, the important result from the above, is that apparently:
- The Inverse of the Fisher Information is equal to the (asymptotic) Variance of the Maximum Likelihood Estimator (the Negative sign only enters if you compute the Fisher Information from the Second Derivative of the Log-Likelihood, i.e. as the Negative of the Expected Hessian)
As an example, suppose you successfully evaluate the Fisher Information for all parameters of a Probability Distribution Function with "p" parameters - you obtain a "p x p" Matrix. If you can take the Inverse of this Matrix, its diagonal components will contain the (asymptotic) Variance Formulae for the Maximum Likelihood Estimates of each of these parameters.
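As a sketch of the "p x p" case, consider a Normal(mu, sigma^2) model, a standard two-parameter example whose per-observation Fisher Information Matrix is known in closed form: diag(1/sigma^2, 1/(2*sigma^4)). Inverting it and dividing by the sample size n yields the familiar asymptotic variances of the Maximum Likelihood Estimators on the diagonal.

```python
# Sketch: Normal(mu, sigma^2), a two-parameter family with known
# per-observation Fisher Information Matrix
#   I(mu, sigma^2) = [[1/sigma^2, 0], [0, 1/(2*sigma^4)]]
# Inverting and dividing by n gives the asymptotic Variances of the MLEs.

def fisher_info_normal(sigma2):
    """Per-observation Fisher Information Matrix of Normal(mu, sigma^2)."""
    return [[1.0 / sigma2, 0.0],
            [0.0, 1.0 / (2.0 * sigma2 ** 2)]]

def invert_2x2(m):
    """Explicit 2x2 matrix inverse via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

sigma2, n = 4.0, 100
inv = invert_2x2(fisher_info_normal(sigma2))
var_mu_hat = inv[0][0] / n       # sigma^2 / n     -> 0.04
var_sigma2_hat = inv[1][1] / n   # 2*sigma^4 / n   -> 0.32
print(var_mu_hat, var_sigma2_hat)
```

Here the diagonal entries of the inverted Matrix, scaled by 1/n, recover the textbook asymptotic variances sigma^2/n (for the mean) and 2*sigma^4/n (for the variance parameter).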
This seems to be a very important fact, likely very useful for calculating variance estimates for any probability distribution - but I am not sure why it is true. I tried consulting different references online (e.g. videos, university lecture notes), but I could not find a source that demonstrates why this result holds.
Can someone please walk me through the math behind why the Inverse of the Fisher Information is equal to the Variance? Is there a proof for this?
Thanks!
In my opinion, the Cramér-Rao Inequality (https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound) might be sufficient for proving the result in question. In your question, you outline the definitions of the Likelihood Function and the Fisher Information. Although it might not be a rigorous proof of your exact statement, this inequality shows the inverse relationship between the Fisher Information and the Variance.
The only caveat I can think of is that the Cramér-Rao Inequality provides a lower bound on the Variance rather than the actual Variance (the bound is attained, at least asymptotically, by the Maximum Likelihood Estimator). I am not sure if this subtlety is of importance to you - but nonetheless, this could be a good start. Thus, you might want to look into a proof of the Cramér-Rao Inequality (of which there are plenty - e.g. https://gregorygundersen.com/blog/2019/11/27/proof-crlb/) and indirectly prove a version of this result.
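To see numerically that the bound can actually be attained (a sketch of my own, not taken from the linked proof): for Bernoulli(p), the MLE is the sample mean, and its Variance p(1-p)/n equals the Cramér-Rao bound 1/(n*I(p)) exactly, since the per-observation Fisher Information is I(p) = 1/(p(1-p)).

```python
import random

# Sketch: for Bernoulli(p), Var(p_hat) = p*(1-p)/n, which coincides
# exactly with the Cramer-Rao bound 1/(n*I(p)), where I(p) = 1/(p*(1-p)).

def simulate_mle_variance(p, n, reps=20_000, seed=1):
    """Empirical Variance of the Bernoulli MLE (the sample mean) over reps datasets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(successes / n)
    mean = sum(estimates) / reps
    return sum((e - mean) ** 2 for e in estimates) / reps

p, n = 0.3, 50
crlb = (p * (1 - p)) / n              # Cramer-Rao bound = 1/(n*I(p))
empirical = simulate_mle_variance(p, n)
print(crlb, empirical)                # the two should nearly agree
```

So in this case the "lower bound vs. actual Variance" subtlety disappears, which is why the Inverse Fisher Information is routinely used as the variance estimate for Maximum Likelihood fits.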