Empirical Fisher Information but with unknown true parameters and distribution?


I am not sure if I am asking this correctly. I am working on using Fisher information to examine the information in a model (say a neural network, for simplicity).

What I know is that the definition of Fisher Information is

$I(\theta^*)=\operatorname{Var}\!\left[\left.\frac{\partial}{\partial \theta}\log p(Y\mid X,\theta)\right|_{\theta=\theta^*}\right]$

Conceptually, I understand that the Fisher information is the variance of the derivative of the log-likelihood (the score). But what are $\theta$ and the distribution here? If the Fisher information is the variance evaluated at the TRUE $\theta^*$, how can we know the true parameters? And in the classification case, how can we also know the true distribution?
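To make the definition concrete for myself, here is a toy check (my own sketch, not from any paper) for a Bernoulli$(p)$ model, where the closed form $I(p)=1/(p(1-p))$ is known, so the "true" $\theta^*$ is available by construction:

```python
import numpy as np

# Toy check of the definition: Fisher information as the variance of the
# score, for Bernoulli(p) where the closed form is I(p) = 1 / (p(1-p)).
rng = np.random.default_rng(0)
p_true = 0.3
y = rng.binomial(1, p_true, size=200_000)

# Score: d/dp log p(y|p) = y/p - (1-y)/(1-p), evaluated at the true p.
score = y / p_true - (1 - y) / (1 - p_true)

mc_fisher = score.var()               # Monte-Carlo variance of the score
exact = 1.0 / (p_true * (1 - p_true))  # closed-form Fisher information
print(mc_fisher, exact)
```

Here the two numbers agree closely, but only because I simulated from the model at the true $p$, which is exactly what I cannot do for a real neural network.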

What I have is

  1. A neural network model with some trainable parameters
  2. It is a classification task; the last layer is a softmax and the loss function is cross-entropy
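My current understanding (which I would like confirmed) is that people plug the *estimated* parameters $\hat{\theta}$ into the definition and average per-example outer products of the score, giving the so-called "empirical Fisher". A minimal sketch of that, using a tiny logistic model in place of my network (all names here are illustrative, not from any specific library):

```python
import numpy as np

# Sketch: "empirical Fisher" at estimated parameters w_hat, assuming a
# binary logistic model as a stand-in for a softmax network.
rng = np.random.default_rng(1)
n, d = 500, 3
X = rng.normal(size=(n, d))
w_hat = 0.1 * rng.normal(size=d)   # pretend these are the trained weights

logits = X @ w_hat
probs = 1.0 / (1.0 + np.exp(-logits))
y = rng.binomial(1, probs)         # labels (here sampled from the model)

# Per-example score of log p(y|x, w) for logistic regression: (y - p) * x.
scores = (y - probs)[:, None] * X  # shape (n, d)

# Empirical Fisher: average outer product of the per-example scores,
# evaluated at w_hat instead of the unknown true w*.
F_hat = scores.T @ scores / n
print(F_hat.shape)
```

If I understand correctly, the result is a symmetric positive semi-definite $d \times d$ matrix, and the approximation is in replacing $\theta^*$ with $\hat{\theta}$ (and, depending on the variant, the model's distribution with the observed labels).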

If we knew the true parameters, we wouldn't even have to train the model. So I don't understand how people compute the Fisher information matrix in research.

Thanks!