Given two continuous probability distributions $p(x)$ and $q(x)$, the Kullback–Leibler divergence measures the information lost when $q(x)$ is used to approximate $p(x)$: \begin{equation*} KL (p(x)\,||\, q(x)) = \int_{\mathbb{R}} p(x) \ln \frac{p(x)}{q(x)}\, dx \end{equation*}
A distribution in the exponential family can be written as \begin{equation*} p_\theta(x) = \frac{1}{Z(\theta)} \exp(\theta^T \phi (x)) \end{equation*} where $\phi(x)$ is the vector of natural (sufficient) statistics of $x$ and $Z(\theta) = \int \exp(\theta^T \phi(x))\, dx$ is the normalising constant.
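One identity that seems relevant here (assuming differentiation under the integral sign is justified) is that the gradient of the log-normaliser gives the expected natural statistics:
\begin{equation*}
\nabla_\theta \ln Z(\theta) = \frac{1}{Z(\theta)} \int \phi(x) \exp(\theta^T \phi(x))\, dx = E_{p_\theta(x)}[\phi(x)].
\end{equation*}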
My goal is to prove the following theorem for the special case of the normal-gamma distribution $$(X,T) \sim \mathrm{NormalGamma}(\mu, \lambda,\alpha,\beta)$$ with pdf: $$ f(x,t;\mu,\lambda,\alpha,\beta) = \frac{\beta^\alpha \sqrt{\lambda}}{\Gamma(\alpha)\sqrt{2\pi}} t^{\alpha-\frac{1}{2}} e^{-\beta t} \exp\left(-\frac{\lambda t(x-\mu)^2}{2} \right). $$ Theorem: the exponential-family distribution $p_{\theta^*}$ which minimises the Kullback–Leibler divergence $KL(p\,||\,p_\theta)$ from the normal-gamma distribution $p$ satisfies \begin{equation*} E_{p_{\theta^*}(x)}[\phi(x)] = E_{p(x)}[\phi(x)] \end{equation*}
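To make the exponential-family form of the normal-gamma pdf concrete, here is a small Python sketch. The identification $\phi(x,t) = (\ln t,\; t,\; tx,\; tx^2)$ with natural parameters $\theta = (\alpha-\tfrac{1}{2},\; -\beta-\tfrac{\lambda\mu^2}{2},\; \lambda\mu,\; -\tfrac{\lambda}{2})$ is my own reading off of the pdf after expanding the square, not taken from a reference, and the function names are made up for illustration:

```python
import math

def normal_gamma_pdf(x, t, mu, lam, alpha, beta):
    """Normal-gamma density evaluated directly from its standard form."""
    log_norm = (alpha * math.log(beta) + 0.5 * math.log(lam)
                - math.lgamma(alpha) - 0.5 * math.log(2 * math.pi))
    log_kernel = (alpha - 0.5) * math.log(t) - beta * t - lam * t * (x - mu) ** 2 / 2
    return math.exp(log_norm + log_kernel)

def normal_gamma_expfam(x, t, mu, lam, alpha, beta):
    """Same density via exp(theta^T phi(x,t)) / Z(theta)."""
    # natural parameters, obtained by expanding -lam*t*(x-mu)^2/2
    theta = (alpha - 0.5, -beta - lam * mu ** 2 / 2, lam * mu, -lam / 2)
    # natural statistics
    phi = (math.log(t), t, t * x, t * x ** 2)
    # log-normaliser ln Z expressed through the original parameters
    log_Z = (math.lgamma(alpha) + 0.5 * math.log(2 * math.pi)
             - alpha * math.log(beta) - 0.5 * math.log(lam))
    return math.exp(sum(a * b for a, b in zip(theta, phi)) - log_Z)
```

Evaluating both functions at the same point, e.g. `normal_gamma_pdf(0.3, 1.7, 0.5, 2.0, 3.0, 1.5)` versus `normal_gamma_expfam(0.3, 1.7, 0.5, 2.0, 3.0, 1.5)`, should give matching values if the identification is right.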
This amounts to finding $\theta^*$ such that $\nabla_\theta f(\theta^*) = 0$, where $f(\theta) = KL(p\,||\,p_\theta)$; I then need the Hessian to prove that $\theta^*$ is actually a minimum. My problem is that I was not able to carry out these derivations successfully. Could anyone help me, please?
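For context, here is how I believe the objective expands (a sketch of my attempt, not a verified derivation). Since the entropy of $p$ does not depend on $\theta$,
\begin{align*}
f(\theta) = KL(p\,||\,p_\theta)
  &= \int p(x) \ln p(x)\, dx - \int p(x) \ln p_\theta(x)\, dx \\
  &= -H(p) + \ln Z(\theta) - \theta^T E_{p(x)}[\phi(x)],
\end{align*}
so the gradient should be
\begin{equation*}
\nabla_\theta f(\theta) = \nabla_\theta \ln Z(\theta) - E_{p(x)}[\phi(x)] = E_{p_\theta(x)}[\phi(x)] - E_{p(x)}[\phi(x)],
\end{equation*}
which vanishes exactly at the moment-matching condition of the theorem. I expect the Hessian to be $\nabla_\theta^2 \ln Z(\theta) = \mathrm{Cov}_{p_\theta}[\phi(x)] \succeq 0$, but I am unsure how to justify these steps rigorously.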