I'm reading about the derivation of the Bayesian information criterion (page 216), and in the proof it is stated as fact that:
$$\int(\theta-\hat{\theta})\exp\left\{-\frac{n}{2}(\theta-\hat{\theta})^TJ(\hat{\theta})(\theta-\hat{\theta})\right\}\,d\theta=0,$$
where $\hat{\theta}$ is the maximum likelihood estimator for the model, $n$ is the number of data points, and:
$$J(\hat{\theta})=\left.-\frac{1}{n}\frac{\partial^2\ell(\theta)}{\partial\theta\partial\theta^T}\right\rvert_{\theta=\hat{\theta}}=\left.-\frac{1}{n}\frac{\partial^2\log f(x_n|\theta)}{\partial\theta\partial\theta^T}\right\rvert_{\theta=\hat{\theta}}.$$
Question: Why is the equality true? Proof or references?
P.S. If you need more details, let me know. The authors use a Laplace approximation and a Taylor series in this derivation. You can find a picture of the derivation here.
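As a sanity check, here is a minimal numerical sketch of the identity in one dimension (the values of $n$, $J$, and $\hat\theta$ below are arbitrary illustrative choices, not from the book):

```python
# Numerically check the 1-D version of the identity:
#   integral of (theta - theta_hat) * exp{-(n/2) J (theta - theta_hat)^2} d(theta) = 0.
import numpy as np
from scipy.integrate import quad

n, J, theta_hat = 50, 2.0, 1.3  # arbitrary sample size, observed information, MLE

def integrand(theta):
    return (theta - theta_hat) * np.exp(-0.5 * n * J * (theta - theta_hat) ** 2)

# Integrate over a wide window around theta_hat (the integrand is negligible outside).
value, abserr = quad(integrand, theta_hat - 10, theta_hat + 10)
print(value)  # approximately 0: the integrand is odd about theta_hat
```

The result is zero to within quadrature error, which is what the displayed equality claims.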
Here's an informal proof (but I bet my statistics lecturer would love it ...):
Since $\hat \theta$ maximizes $\ln f$, it is a critical point of $\ln f$, and a Taylor expansion at $\hat \theta$ yields $$\ln f( \theta) = \ln f(\hat \theta)+\frac 12 (\theta - \hat \theta)^TH(\ln f)(\hat\theta)(\theta - \hat \theta)+o(\|\theta - \hat \theta\|^2)$$
where $H(\ln f)(\hat\theta)$ is the Hessian of $\ln f$ at $\hat\theta$.
With your notation, $$ -\frac{n}{2}(\theta-\hat{\theta})^TJ(\hat{\theta})(\theta-\hat{\theta}) = \frac 12 (\theta - \hat \theta)^TH(\ln f)(\hat\theta)(\theta - \hat \theta), $$ hence the """estimate""" (note the quotation marks):
$$\begin{align}\int(\theta-\hat{\theta})\exp\left\{-\frac{n}{2}(\theta-\hat{\theta})^TJ(\hat{\theta})(\theta-\hat{\theta})\right\}\,d\theta &= \int (\theta-\hat{\theta})e^{\ln f(\theta)-\ln f(\hat\theta)}d\theta \\ &=\frac{1}{f(\hat\theta)}\int (\theta-\hat{\theta}) f(\theta)d\theta\\ &=\frac{E_{\theta}(\theta-\hat\theta)}{f(\hat\theta)}\\ &=0 \end{align} $$
The last equality follows from consistency of the MLE (for $n$ large, and under additional regularity assumptions).
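In fact, the displayed equality holds exactly, with no Taylor expansion or consistency argument at all: substituting $u = \theta - \hat\theta$ gives

$$\int(\theta-\hat{\theta})\exp\left\{-\frac{n}{2}(\theta-\hat{\theta})^TJ(\hat{\theta})(\theta-\hat{\theta})\right\}\,d\theta = \int u\,\exp\left\{-\frac{n}{2}u^TJ(\hat{\theta})u\right\}\,du = 0,$$

because the integrand is odd in $u$ (the exponential factor is even, while $u$ is odd), and the integral converges whenever $J(\hat\theta)$ is positive definite. Equivalently, up to a normalizing constant this is the first moment of a mean-zero $\mathcal N\!\left(0, (nJ(\hat\theta))^{-1}\right)$ density, which is zero.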