Question about Fisher information matrix


Consider a family $S$ of probability density functions on $X$, i.e. functions $p:X\to\mathbb{R}$ such that $p(x)\geq 0$ and $\int_{X}p(x)dx=1$. Suppose each element of $S$ may be parameterized by real-valued variables $[\xi^1,\cdots,\xi^n]$, that is, $$S=\{p(x;\xi):\xi=[\xi^1,\cdots,\xi^n]\in E\subseteq \mathbb{R}^n\}$$ where $x\in X$. $S$ is known as a statistical manifold of dimension $n$.

Given a point $\xi\in E$, the Fisher information matrix of $S$ at $\xi$ is the $n\times n$ matrix $G(\xi)=[g_{ij}(\xi)]$ whose $(i,j)$th entry $g_{ij}$ is defined by $$g_{ij}(\xi)=E_{\xi}[\partial_{i}l_{\xi}\,\partial_{j}l_{\xi}]=\int \partial_{i}l(x;\xi)\,\partial_{j}l(x;\xi)\,p(x;\xi)dx,$$ where $\partial_{i}=\frac{\partial}{\partial \xi^i}$, $l_{\xi}=l(x;\xi)=\log p(x;\xi)$, and $E_{\xi}$ denotes the expectation with respect to the distribution $p_{\xi}$.
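As a concrete sanity check (not part of the original question), the defining integral can be evaluated numerically for a simple one-parameter family: the normal densities $N(\xi,1)$ with unknown mean $\xi$, whose Fisher information is known to be the constant $1$. A minimal sketch:

```python
import numpy as np

def normal_pdf(x, xi):
    """Density of N(xi, 1)."""
    return np.exp(-(x - xi) ** 2 / 2.0) / np.sqrt(2 * np.pi)

def trapezoid(y, x):
    """Composite trapezoid rule (avoids NumPy-version differences in trapz)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def fisher_info_score(xi, h=1e-5):
    """g(xi) = ∫ (∂_ξ log p)^2 p dx, with the score computed by a
    central finite difference in the parameter ξ."""
    x = np.linspace(xi - 10, xi + 10, 20001)   # truncated integration grid
    p = normal_pdf(x, xi)
    dl = (np.log(normal_pdf(x, xi + h)) - np.log(normal_pdf(x, xi - h))) / (2 * h)
    return trapezoid(dl * dl * p, x)

print(fisher_info_score(0.3))   # ≈ 1.0, matching the known value for N(ξ, 1)
```

Here the score $\partial_\xi \log p = x-\xi$, so $g(\xi)=E[(x-\xi)^2]=1$ for every $\xi$, and the quadrature reproduces this.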

My Questions: (1) After this, the author says that $g_{ij}$ can also be written as $g_{ij}(\xi)=-E_{\xi}[\partial_{i}\partial_{j}l_{\xi}]$. I am not sure how we get this, but I think we need to use the following property of the statistical model: $$\int\partial_{i}p(x;\xi)dx=\partial_{i}\int p(x;\xi)dx=0.$$

(2) Then the author gives one more representation of $g_{ij}$: $$g_{ij}(\xi)=4\int\partial_{i}\sqrt{p(x;\xi)}\,\partial_{j}\sqrt{p(x;\xi)}\,dx.$$ For this I have no idea, but I guess he is using the function $\sqrt{p(x;\xi)}$ instead of $\log p(x;\xi)$.

EDIT: Solution for (2): Let's start with \begin{gather*} 4\int\partial_{i}\sqrt{p(x;\xi)}\,\partial_{j}\sqrt{p(x;\xi)}\,dx=4\int\frac{\partial_{i}p(x;\xi)}{2\sqrt{p(x;\xi)}}\cdot\frac{\partial_{j}p(x;\xi)}{2\sqrt{p(x;\xi)}}\,dx\\ =\int\frac{\partial_{i}p(x;\xi)\,\partial_{j}p(x;\xi)}{p(x;\xi)}dx = \int\partial_{i}l_{\xi}\,\partial_{j}l_{\xi}\,p(x;\xi)dx=E_{\xi}[\partial_{i}l_{\xi}\partial_{j}l_{\xi}]=g_{ij}(\xi), \end{gather*} where the third equality uses $\partial_{i}l_{\xi}=\partial_{i}p(x;\xi)/p(x;\xi)$.
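The equality derived above can also be checked numerically. A minimal sketch, again using the $N(\xi,1)$ family (Fisher information $1$) as a test case and evaluating $4\int \partial_i\sqrt{p}\,\partial_j\sqrt{p}\,dx$ directly:

```python
import numpy as np

def normal_pdf(x, xi):
    """Density of N(xi, 1)."""
    return np.exp(-(x - xi) ** 2 / 2.0) / np.sqrt(2 * np.pi)

def trapezoid(y, x):
    """Composite trapezoid rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def fisher_info_sqrt(xi, h=1e-5):
    """g(xi) = 4 ∫ (∂_ξ √p)^2 dx, with ∂_ξ √p computed by a
    central finite difference in ξ."""
    x = np.linspace(xi - 10, xi + 10, 20001)
    dsqrt = (np.sqrt(normal_pdf(x, xi + h))
             - np.sqrt(normal_pdf(x, xi - h))) / (2 * h)
    return 4.0 * trapezoid(dsqrt * dsqrt, x)

print(fisher_info_sqrt(0.3))   # ≈ 1.0, agreeing with the score-based formula
```

Analytically, $\partial_\xi\sqrt{p}=\tfrac{x-\xi}{2}\sqrt{p}$ here, so $4\int(\partial_\xi\sqrt{p})^2dx=\int(x-\xi)^2p\,dx=1$, as the quadrature confirms.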

Best Answer

Observe that $$ \partial_i \partial_j l(x;\xi) = \partial_i \frac{\partial_j p(x;\xi)}{p(x;\xi)} = \frac{\partial_i \partial_j p(x;\xi)}{p(x;\xi)} - \frac{\partial_i p(x;\xi) \cdot \partial_j p(x;\xi)}{p(x;\xi)^2} = \frac{\partial_i \partial_j p(x;\xi)}{p(x;\xi)} - \partial_i l(x;\xi) \cdot \partial_j l(x;\xi). $$ So, $$\mathbb{E}[- \partial_i \partial_j l(x;\xi) ] = \mathbb{E}[\partial_i l(x;\xi)\, \partial_j l(x;\xi)] - \int \partial_i \partial_j p(x;\xi) \,\mathrm{d}x,$$ and we need to argue that the final term is $0$. Under sufficient smoothness conditions, we can write $$ \int \partial_i \partial_j p(x;\xi) \,\mathrm{d}x = \partial_i \int \partial_j p(x;\xi) \,\mathrm{d}x = \partial_i \partial_j \int p(x;\xi) \,\mathrm{d}x.$$ But $\int p(x;\xi) \,\mathrm{d}x = 1$ is constant as a function of $\xi$, and so the expression is indeed $0$.
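The identity $g_{ij}(\xi)=-\mathbb{E}_\xi[\partial_i\partial_j l_\xi]$ can be spot-checked numerically as well. A minimal sketch on the $N(\xi,1)$ family, where $\partial_\xi^2 \log p = -1$ identically, so the negated expected Hessian is exactly $1$:

```python
import numpy as np

def normal_pdf(x, xi):
    """Density of N(xi, 1)."""
    return np.exp(-(x - xi) ** 2 / 2.0) / np.sqrt(2 * np.pi)

def trapezoid(y, x):
    """Composite trapezoid rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def fisher_info_hessian(xi, h=1e-4):
    """g(xi) = -∫ (∂²_ξ log p) p dx, with the second derivative in ξ
    computed by a central second difference."""
    x = np.linspace(xi - 10, xi + 10, 20001)
    p = normal_pdf(x, xi)
    l = lambda t: np.log(normal_pdf(x, t))
    d2l = (l(xi + h) - 2 * l(xi) + l(xi - h)) / h ** 2
    return -trapezoid(d2l * p, x)

print(fisher_info_hessian(0.3))   # ≈ 1.0, matching the other two formulas
```

All three representations (score product, negated expected Hessian, and the $4\int\partial_i\sqrt{p}\,\partial_j\sqrt{p}\,dx$ form) give the same value on this family, as the equalities above predict.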

The precise smoothness condition needed here comes from Leibniz's rule for differentiating under the integral sign: $p(x;\xi)$, $\partial_j p(x;\xi)$, and $\partial_i \partial_j p(x;\xi)$ should all be continuous in $(x,\xi)$. This is typically assumed, so the statement of (1) should read something like "if $p$ is appropriately smooth, then $g_{ij} = - \mathbb{E}_{\xi}[\partial_i\partial_j l_\xi]$."