What is the name of this distribution?


This is a somewhat long question but I want to make sure that you understand the context properly. Please bear with me.

I'm reading chapter 10 of Bishop's Pattern Recognition and Machine Learning and I'm stuck on "10.1.3 Example: The univariate Gaussian". In it, he defines the following likelihood function of the data with respect to the parameters of a Gaussian with mean $\mu$ and precision $\tau$.

\begin{equation} \tag{1} p(\mathcal{D}|\mu, \tau) = (\frac{\tau}{2\pi})^{N/2}\mathrm{exp}\{-\frac{\tau}{2}\sum_{n=1}^N (x_n - \mu)^2\} \end{equation}

He also introduces conjugate prior distributions for $\mu$ and $\tau$:

\begin{equation} \tag{2} p(\mu|\tau) = \mathcal{N}(\mu|\mu_0, (\lambda_0\tau)^{-1}) \end{equation}
\begin{equation} \tag{3} p(\tau) = \mathrm{Gam}(\tau|a_0, b_0). \end{equation}

He then seeks to approximate the posterior $p(\mu, \tau | \mathcal{D})$ by factorized variational approximation, which means he assumes that the posterior can be expressed as:

\begin{equation} \label{1}\tag{4} q(\mu, \tau) = q_{\mu}(\mu)q_{\tau}(\tau) \end{equation}

Note that the true posterior cannot be factorized this way.

He then goes on to find that, for the optimal choices of $q_{\mu}(\mu)$ and $q_{\tau}(\tau)$,

\begin{equation} \tag{5} q_{\mu}(\mu) = \mathcal{N}(\mu | \mu_N, \lambda^{-1}_N) \end{equation}
with
\begin{equation} \tag{6} \mu_N = \frac{\lambda_0 \mu_0 + N\overline{x}}{\lambda_0 + N} \end{equation}
\begin{equation} \tag{7} \lambda_N = (\lambda_0 + N)\mathbb{E}[\tau] \end{equation}
and
\begin{equation} \tag{8} q_{\tau}(\tau) = \mathrm{Gam}(\tau|a_N, b_N) \end{equation}
with
\begin{equation} \tag{9} a_N = a_0 + \frac{N}{2} \end{equation}
\begin{equation} \tag{10} b_N = b_0 + \frac{1}{2}\mathbb{E}_{\mu}[\sum^N_{n=1}(x_n - \mu)^2 + \lambda_0(\mu - \mu_0)^2]. \end{equation}

He then suggests initializing $\mathbb{E}[\tau]$ to some random number and using it to compute $q_{\mu}(\mu)$, and then using that to re-calculate $q_{\tau}(\tau)$. This should be done until convergence.
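The alternating updates can be sketched in Python as follows. This is only a minimal illustration of (6), (7), (9) and (10), not Bishop's own code: the function name, the default hyperparameters, the starting value for $\mathbb{E}[\tau]$ and the stopping tolerance are all assumptions made for the example. The expectation in (10) is expanded using $\mathbb{E}[\mu] = \mu_N$ and $\mathbb{E}[\mu^2] = \mu_N^2 + \lambda_N^{-1}$, which follow from (5), and $\mathbb{E}[\tau] = a_N / b_N$ is the mean of the Gamma distribution in (8).

```python
import numpy as np


def fit_factorized_gaussian(x, mu0=0.0, lambda0=1.0, a0=1.0, b0=1.0,
                            e_tau=1.0, tol=1e-10, max_iter=1000):
    """Hypothetical helper: iterate the updates (6), (7), (9), (10).

    All default values are illustrative assumptions, not taken from the book.
    """
    x = np.asarray(x, dtype=float)
    n = x.size

    # (6) and (9) do not depend on the other factor, so they are fixed.
    mu_n = (lambda0 * mu0 + n * x.mean()) / (lambda0 + n)
    # a_N exactly as quoted in (9). Keeping the tau^{1/2} factor of the
    # prior (2) in the derivation would give a0 + (n + 1) / 2 instead.
    a_n = a0 + n / 2.0

    for _ in range(max_iter):
        # (7): precision of q_mu given the current E[tau].
        lambda_n = (lambda0 + n) * e_tau

        # Moments of q_mu = N(mu | mu_n, 1 / lambda_n).
        e_mu = mu_n
        e_mu2 = mu_n ** 2 + 1.0 / lambda_n

        # (10): expand the squares and take expectations term by term.
        b_n = b0 + 0.5 * (
            np.sum(x ** 2) - 2.0 * e_mu * np.sum(x) + n * e_mu2
            + lambda0 * (e_mu2 - 2.0 * mu0 * e_mu + mu0 ** 2)
        )

        # Mean of Gam(tau | a_n, b_n).
        new_e_tau = a_n / b_n
        converged = abs(new_e_tau - e_tau) < tol
        e_tau = new_e_tau
        if converged:
            break

    lambda_n = (lambda0 + n) * e_tau
    return mu_n, lambda_n, a_n, b_n


mu_s, lambda_s, a_s, b_s = fit_factorized_gaussian([1.0, 2.0, 3.0, 4.0])
```

Only $\lambda_N$ and $b_N$ actually change between iterations, because $\mu_N$ and $a_N$ depend on the data and the prior alone.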

Now let's say I carried out the optimization and converged at some values for $\mu_N$, $\lambda_N$, $a_N$ and $b_N$, which I refer to as $\mu_*$, $\lambda_*$, $a_*$ and $b_*$. Carrying out the multiplication of the two distributions in (\ref{1}) gives me

\begin{align} \label{2} \tag{11} q(\mu, \tau) &= \frac{b_*^{a_*}}{\Gamma(a_*)}\tau^{a_* - 1}\mathrm{exp}\{-b_*\tau\}\frac{1}{(2\pi\lambda_*^{-1})^{1/2}}\mathrm{exp}\{-\frac{\lambda_*}{2}(\mu-\mu_*)^2\}\\ &= \frac{b_*^{a_*}\tau^{a_* - 1}\lambda_*^{1/2}}{\Gamma(a_*)(2\pi)^{1/2}}\mathrm{exp}\{-\frac{\lambda_*}{2}(\mu - \mu_*)^2 - b_*\tau\} \end{align} where all symbols except $\mu$ and $\tau$ are constants. We can then write:

\begin{equation} \tag{12} q(\mu, \tau) = C\tau^{a_*-1}\mathrm{exp}\{-\frac{\lambda_*}{2}(\mu - \mu_*)^2 - b_*\tau\} \end{equation} with $C = \frac{b_*^{a_*}\lambda_*^{1/2}}{\Gamma(a_*)(2\pi)^{1/2}}$.

(\ref{2}) should be an approximation of the posterior distribution $p(\mu, \tau|\mathcal{D})$. I'm fairly sure that (\ref{2}) is correctly computed. My question, if this is the case, is: what is this type of distribution called? It looks most similar to a Normal-Gamma distribution, but it is still not exactly the same, for example because the exponent on the $\tau$ factor outside the exponential is different.

Best answer

(11) is a Normal distribution for $\mu$ and a Gamma distribution for $\tau$, and as the variables are independent (because (11) is separable) no-one's bothered giving it its own name as a multivariate distribution.
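
For comparison, here is a sketch of the Normal-Gamma density written in the same notation as (2) and (3). The generic parameters $m$, $\lambda$, $a$ and $b$ are symbols assumed here for illustration only:

\begin{equation} \mathcal{N}(\mu|m, (\lambda\tau)^{-1})\,\mathrm{Gam}(\tau|a, b) = \frac{b^{a}\lambda^{1/2}}{\Gamma(a)(2\pi)^{1/2}}\tau^{a - 1/2}\mathrm{exp}\{-b\tau - \frac{\lambda\tau}{2}(\mu - m)^2\} \end{equation}

In the Normal-Gamma case the precision of the Gaussian factor is $\lambda\tau$, so $\tau$ multiplies the quadratic in $\mu$ and the Gaussian normalizer contributes an extra $\tau^{1/2}$, which gives the exponent $a - 1/2$. In the factorized $q(\mu, \tau)$ the Gaussian precision is the constant $\lambda_*$, so no such factor appears and the exponent on $\tau$ stays at $a_* - 1$. That accounts for the difference in exponents noticed in the question.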