Fisher Information Matrix in machine learning


In recent weeks I have been reading machine learning papers that use Fisher information theory. Given a parameter set $\Theta \subseteq \Bbb R^d$, I have always defined the Fisher information of a statistical model $\mathcal{M}_{\Theta} = \{p_{\theta}(y|x): \theta\in\Theta\}$ as $$ F(\theta) = \mathbb{E}_{p_{\theta}(x,y)}\big[(\nabla \log p_{\theta}(x, y))^{\otimes 2}\big]. $$

Surprisingly, I have found works using another definition. Given an unknown distribution $p(x,y)$ and a parametric model $\mathcal{M}_{\Theta}$, the Fisher information is defined as $$ F(\theta) = \mathbb{E}_{p(x,y)}\big[(\nabla \log p_{\theta}(x, y))^{\otimes 2}\big]. $$ The two definitions differ in the distribution the expectation is taken over: the model's own distribution $p_{\theta}(x,y)$ in the first, and the true data distribution $p(x,y)$ in the second. While the first definition is standard and there is a huge mathematical literature on it, the second one is a bit mysterious to me.
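To make the difference concrete, here is a minimal numerical sketch (my own illustration, not from any of the papers): a 1-D logistic model $p_\theta(y=1|x) = \sigma(\theta x)$, where the first (model) Fisher averages the squared score over labels drawn from $p_\theta(y|x)$ itself, while the second (sometimes called the "empirical Fisher") averages it over observed labels drawn from a different, misspecified distribution. The specific data-generating parameters are arbitrary assumptions.

```python
# Comparing the two Fisher definitions for a toy 1-D logistic model
# p_theta(y=1|x) = sigmoid(theta * x). Purely illustrative assumptions:
# inputs ~ N(0,1), model parameter theta = 0.5, true labels generated
# with a different parameter (1.5) to mimic model misspecification.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.5
x = rng.normal(size=100_000)                  # samples from p(x)
p1 = 1.0 / (1.0 + np.exp(-theta * x))         # p_theta(y=1 | x)

# Score of log p_theta(y|x) w.r.t. theta is (y - p1) * x for y in {0, 1}.
# Definition 1: expectation of score^2 with y ~ p_theta(.|x), averaged over x.
fisher_model = np.mean(
    p1 * ((1.0 - p1) * x) ** 2 + (1.0 - p1) * ((0.0 - p1) * x) ** 2
)

# Definition 2: expectation under the (here misspecified) data distribution:
# labels are drawn with parameter 1.5, not the model's theta = 0.5.
y = (rng.random(x.size) < 1.0 / (1.0 + np.exp(-1.5 * x))).astype(float)
fisher_empirical = np.mean(((y - p1) * x) ** 2)

print(fisher_model, fisher_empirical)
```

When the model is well specified ($p = p_\theta$) the two quantities coincide in expectation; under misspecification, as above, they generally do not.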

Questions

  • Could someone help me understand the meaning of this second definition?
  • Do you know where, and by whom, this definition was introduced?

Thanks!