Let $f_\theta({\bf x})$ be the probability density of a vector ${\bf x}$ with $N$ elements, conditioned on a parameter $\theta$. I do not know the explicit form of $f_\theta$, but I can sample reasonably efficiently from $f_\theta$ for any given $\theta$.
I wish to estimate the Fisher information for the distribution $f_\theta$ with respect to the parameter $\theta$, namely,
$$ I(\theta) = \mathbb{E}_\theta \left\{ \left[ \frac{\partial}{\partial \theta}\ln f_\theta({\bf x}) \right]^2 \right\} $$
I would like to do that with a number of samples of ${\bf x}$ that does not grow exponentially in $N$. The issue is that I do not know how to handle the derivative, since I cannot evaluate $f_\theta$ itself. Does anyone know a good way to do this?
My best idea so far is to
- pick two parameters $\theta_1$ and $\theta_2$ that are close to each other;
- draw a finite number of samples of ${\bf x}$ from both $f_{\theta_1}$ and $f_{\theta_2}$;
- use the samples to estimate the Kullback-Leibler divergence between $f_{\theta_1}$ and $f_{\theta_2}$;
- use the relationship between the KL divergence and the Fisher information to estimate the Fisher information.
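The procedure above can be sketched as follows. This is purely an illustrative sketch under my own assumptions: it uses the second-order expansion $D_{\mathrm{KL}}(f_\theta \,\|\, f_{\theta+\epsilon}) \approx \tfrac{\epsilon^2}{2} I(\theta)$, a 1-nearest-neighbor KL estimator in the style of Wang, Kulkarni, and Verdú, and a toy Gaussian sampler; the names `knn_kl_divergence`, `fisher_from_kl`, and `gaussian_sampler` are all hypothetical, not part of the question.

```python
import numpy as np

def knn_kl_divergence(p_samples, q_samples):
    """1-NN estimate of D_KL(P || Q) from samples (Wang-Kulkarni-Verdu style).

    p_samples: (n, d) array drawn from P; q_samples: (m, d) array drawn from Q.
    Brute-force distances for clarity; a k-d tree would scale better.
    """
    n, d = p_samples.shape
    m = q_samples.shape[0]
    # rho_i: distance from each P-sample to its nearest *other* P-sample.
    dpp = np.linalg.norm(p_samples[:, None, :] - p_samples[None, :, :], axis=-1)
    np.fill_diagonal(dpp, np.inf)
    rho = dpp.min(axis=1)
    # nu_i: distance from each P-sample to its nearest Q-sample.
    dpq = np.linalg.norm(p_samples[:, None, :] - q_samples[None, :, :], axis=-1)
    nu = dpq.min(axis=1)
    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))

def fisher_from_kl(sampler, theta, eps=0.5, n=1000, rng=None):
    """Estimate I(theta) via  D_KL(f_theta || f_{theta+eps}) ~ eps^2/2 * I(theta)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    p = sampler(theta, n, rng)
    q = sampler(theta + eps, n, rng)
    return 2.0 * knn_kl_divergence(p, q) / eps**2

# Toy check (hypothetical model): x ~ N(theta * 1, I_d) has I(theta) = d exactly.
def gaussian_sampler(theta, n, rng, d=2):
    return theta + rng.standard_normal((n, d))

estimate = fisher_from_kl(gaussian_sampler, theta=0.0)
```

The point of the sketch is that the nearest-neighbor estimator needs only samples, not histograms, which is exactly what the binning problem rules out; whether its bias and variance behave well enough in high dimension is the open part of the question.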
However, I'm not yet sure that the third step gets rid of the dimensionality problem. The core difficulty is that I cannot build histograms of $f_\theta({\bf x})$ in a reasonable time as $N$ increases, and I need these histograms to estimate the KL divergence.
Edit: I found this, which might offer some answers.
Edit 2: I think it can be done when $f_\theta$ describes a Markov chain for ${\bf x}$. In that case the value of $f_\theta({\bf x})$ can be computed efficiently for each sample using hidden-Markov-model machinery, and the KL divergence can then be estimated efficiently. I'll let you know if it works out.
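In fact, once $\ln f_\theta({\bf x})$ is computable per sample, the KL detour may be unnecessary: one can estimate the score $\partial_\theta \ln f_\theta({\bf x})$ by a central finite difference and average its square. A minimal sketch under my own assumptions, using a toy *observed* two-state chain (flip probability $\theta$) rather than a true HMM forward pass; `sample_chain`, `log_likelihood`, and `fisher_via_score` are hypothetical names:

```python
import numpy as np

def sample_chain(theta, T, rng):
    """Toy two-state chain: at each step the state flips with probability theta."""
    x = np.empty(T, dtype=int)
    x[0] = rng.integers(2)
    flips = rng.random(T - 1) < theta
    for t in range(1, T):
        x[t] = x[t - 1] ^ int(flips[t - 1])
    return x

def log_likelihood(x, theta):
    """ln f_theta(x): uniform start, then a Bernoulli(theta) flip per step."""
    n_flips = np.count_nonzero(x[1:] != x[:-1])
    n_stays = len(x) - 1 - n_flips
    return np.log(0.5) + n_flips * np.log(theta) + n_stays * np.log(1 - theta)

def fisher_via_score(theta, n_chains=500, T=50, eps=1e-4, seed=0):
    """I(theta) = E[(d/dtheta ln f_theta)^2]; score via central differences."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_chains)
    for i in range(n_chains):
        x = sample_chain(theta, T, rng)
        scores[i] = (log_likelihood(x, theta + eps)
                     - log_likelihood(x, theta - eps)) / (2 * eps)
    return np.mean(scores**2)

# For this toy chain, I(theta) = (T - 1) / (theta * (1 - theta)) analytically,
# i.e. about 233.3 for theta = 0.3 and T = 50.
est = fisher_via_score(0.3)
```

The sample count here is governed by the variance of the squared score, not by $N$, which is the behavior the question is after.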
Edit 3: It appears that the use of derivatives can be avoided provided that the probability density belongs to an exponential family with respect to the parameter $\theta$.
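Concretely, if $f_\theta({\bf x}) = h({\bf x})\exp\{\theta\, T({\bf x}) - A(\theta)\}$ with $\theta$ the natural parameter, the score is $T({\bf x}) - A'(\theta)$, so $I(\theta) = \mathrm{Var}_\theta[T({\bf x})]$ and the sample variance of the sufficient statistic suffices. A minimal sketch (the function name and the $\mathcal{N}(\theta, 1)$ toy check are my own, not from the question):

```python
import numpy as np

def fisher_exp_family(samples, sufficient_stat):
    """For f_theta(x) = h(x) exp(theta * T(x) - A(theta)) with natural
    parameter theta, I(theta) = Var_theta[T(x)]: no derivative of the
    density is needed, only samples and the sufficient statistic T."""
    t_values = np.array([sufficient_stat(x) for x in samples])
    return t_values.var(ddof=1)

# Toy check: N(theta, 1) has natural parameter theta and T(x) = x,
# so I(theta) = Var(x) = 1 for every theta.
rng = np.random.default_rng(0)
xs = rng.normal(loc=1.5, scale=1.0, size=20_000)
est = fisher_exp_family(xs, lambda x: x)
```

Note this relies on knowing the sufficient statistic $T$ even though $A(\theta)$ and the normalization can stay unknown, and the sample count is again set by the variance of $T$, not by $N$.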