Gaussian-Wishart marginalization over precision matrix

I am trying to integrate a Gaussian-Wishart distribution over the precision parameter. According to Bishop's PRML book (as well as Wikipedia, etc.), this should give rise to a multivariate t-distribution. However, I could not find the steps for this anywhere, and the PRML book seems to treat it as a simple extension of the Gaussian-Gamma integration (which yields a Student's t-distribution).

I tried deriving this result myself, but I am hitting some barriers. I was wondering if anyone can help me (or point me to a resource that shows the steps).

We want to integrate out the precision of a Gaussian-Wishart distribution:

$\DeclareMathOperator{\Tr}{Tr}$ $\int \mathcal{N}(\mu | \mu_0, (\beta \Lambda)^{-1}) \mathcal{W}(\Lambda, W, v) d\Lambda$

$= \int \frac{|\beta \Lambda|^{1/2}}{(2\pi)^{D/2}} \exp(-\frac{1}{2} (\mu - \mu_0)^\intercal \beta \Lambda (\mu - \mu_0)) B(W, v) |\Lambda|^{(v-D-1)/2}\exp(-\frac{1}{2} \Tr(W^{-1} \Lambda)) d\Lambda$

Here we simply expanded the definitions of the two distributions; $B(W,v)$ is the normalization constant of the Wishart distribution. The strategy now is to collect all terms that depend on $\Lambda$. These will turn out to form another (unnormalized) Wishart density, whose integral is the inverse of its normalization constant.

$= \int \frac{\beta^{D/2} |\Lambda|^{1/2}}{(2\pi)^{D/2}} \exp(-\frac{\beta}{2} \Tr((\mu - \mu_0)(\mu - \mu_0)^\intercal \Lambda)) B(W, v) |\Lambda|^{(v-D-1)/2}\exp(-\frac{1}{2} \Tr(W^{-1} \Lambda)) d\Lambda$

$= \int \big(\frac{\beta}{2\pi}\big)^{D/2} B(W, v) |\Lambda|^{1/2 + (v-D-1)/2} \exp(-\frac{\beta}{2} \Tr((\mu - \mu_0)(\mu - \mu_0)^\intercal \Lambda) -\frac{1}{2} \Tr(W^{-1} \Lambda)) d\Lambda$

$= \big(\frac{\beta}{2\pi}\big)^{D/2} B(W, v) \int |\Lambda|^{(v+1-D-1)/2} \exp(-\frac{1}{2} \Tr\big((\beta(\mu - \mu_0)(\mu - \mu_0)^\intercal +W^{-1}) \Lambda\big)) d\Lambda$

Here we first used the trace trick, $x^\intercal \Lambda x = \Tr(x x^\intercal \Lambda)$, and then collected all terms that depend on $\Lambda$. We notice that the integrand is an unnormalized Wishart density, so its integral is the inverse of the normalization constant of the Wishart $\mathcal{W}(\Lambda, (\beta(\mu - \mu_0)(\mu - \mu_0)^\intercal +W^{-1})^{-1}, v+1)$:

$= \big(\frac{\beta}{2\pi}\big)^{D/2} \frac{B(W, v)}{B((\beta(\mu - \mu_0)(\mu - \mu_0)^\intercal +W^{-1})^{-1}, v+1)}$

We now plug in the definition of the normalization constants:

$= \big(\frac{\beta}{2\pi}\big)^{D/2} |W|^{-v/2} \Big(2^{vD/2}\pi^{D(D-1)/4} \prod_{i=1}^D{\Gamma\big(\frac{v+1-i}{2} \big)} \Big)^{-1} |\beta(\mu - \mu_0)(\mu - \mu_0)^\intercal +W^{-1}|^{-(v+1)/2} \Big(2^{(v+1)D/2}\pi^{D(D-1)/4} \prod_{i=1}^D{\Gamma\big(\frac{v+2-i}{2} \big)} \Big)$
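
As a sanity check on the normalization constant used above, here is a minimal Python sketch that compares a hand-built Wishart log-density, using $B(W, v)$ as written, against `scipy.stats.wishart.logpdf`. The helper name `log_wishart_B` and the random test matrices are made up for illustration and are not part of the derivation.

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import wishart


def log_wishart_B(W, v):
    """Log of B(W, v) = |W|^{-v/2} (2^{vD/2} pi^{D(D-1)/4} prod_i Gamma((v+1-i)/2))^{-1}.

    multigammaln(v/2, D) equals log(pi^{D(D-1)/4} prod_i Gamma((v+1-i)/2)).
    """
    D = W.shape[0]
    _, logdet_W = np.linalg.slogdet(W)
    return -0.5 * v * logdet_W - 0.5 * v * D * np.log(2.0) - multigammaln(0.5 * v, D)


# Hypothetical test values: any symmetric positive definite W and Lambda will do.
rng = np.random.default_rng(0)
D, v = 3, 5.0
M = rng.standard_normal((D, D))
W = M @ M.T + D * np.eye(D)
N = rng.standard_normal((D, D))
Lam = N @ N.T + np.eye(D)

# log W(Lambda | W, v) = log B + (v-D-1)/2 * log|Lambda| - 1/2 * Tr(W^{-1} Lambda)
_, logdet_Lam = np.linalg.slogdet(Lam)
manual = (log_wishart_B(W, v)
          + 0.5 * (v - D - 1) * logdet_Lam
          - 0.5 * np.trace(np.linalg.solve(W, Lam)))
reference = wishart.logpdf(Lam, df=v, scale=W)

print(np.isclose(manual, reference))
```

If the two values agree, the expression for $B(W, v)$ matches the parameterization SciPy uses for the Wishart (scale matrix $W$, degrees of freedom $v$).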

There are some easy simplifications on variable $v$, but I am stuck with how to simplify the terms involving W. I would appreciate any suggestions on how to proceed (or a link to a resource that shows this derivation). Thanks!

There are 1 best solutions below

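I found the solution in the book "Optimal Statistical Decisions" (Morris H. DeGroot). What was missing was the following relation, the matrix determinant lemma, which holds for any invertible matrix $A$ and column vector $v$ (here $v$ is an arbitrary vector, not the degrees of freedom):

$ |A + vv^\intercal| = |A|(1 + v^\intercal A^{-1} v) $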
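
The relation is easy to check numerically. A small sketch with NumPy, using a random symmetric positive definite `A` and a random vector `u` (both invented for the check; `u` plays the role of the vector in the lemma):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
M = rng.standard_normal((D, D))
A = M @ M.T + D * np.eye(D)   # symmetric positive definite, hence invertible
u = rng.standard_normal(D)    # arbitrary vector

lhs = np.linalg.det(A + np.outer(u, u))
rhs = np.linalg.det(A) * (1.0 + u @ np.linalg.solve(A, u))

print(np.isclose(lhs, rhs))
```

In the derivation the lemma is applied with $A = W^{-1}$ and the vector $\sqrt{\beta}(\mu - \mu_0)$.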

We can also ignore any terms that don't depend on $\mu$, as they only contribute to the normalization constant of the multivariate t-distribution.

So we have the following:

$ \propto |\beta(\mu - \mu_0)(\mu - \mu_0)^\intercal +W^{-1}|^{-(v+1)/2} $

$ \propto |W^{-1}|^{-(v+1)/2} (1 +\beta(\mu - \mu_0)^\intercal W (\mu - \mu_0))^{-(v+1)/2} $

$ \propto (1+\beta(\mu - \mu_0)^\intercal W (\mu - \mu_0))^{-(v+1)/2}$

We recognize this as an unnormalized multivariate t-distribution. We just need a change of variables for the degrees of freedom:

$(v' + D) = v + 1 \Rightarrow v' = v - D + 1$

$ \propto (1+\frac{v'}{v'}\beta(\mu - \mu_0)^\intercal W (\mu - \mu_0))^{-(v'+D)/2}$

$ \propto (1+\frac{1}{v'}(\mu - \mu_0)^\intercal(\beta(v-D+1) W) (\mu - \mu_0))^{-(v'+D)/2}$

Therefore obtaining a multivariate t-distribution with:

$\text{St}(\mu | \mu_0, \beta(v-D+1) W, v-D+1)$

($\beta(v-D+1) W$ is the precision matrix; D is the dimensionality)
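
The final result can also be checked numerically, including the normalization constant. The sketch below evaluates the closed-form ratio $\big(\frac{\beta}{2\pi}\big)^{D/2} B(W, v) / B(\cdot, v+1)$ from the question and compares it with `scipy.stats.multivariate_t`, which is parameterized by a scale matrix (the inverse of the precision above). The function names and test values are made up for illustration.

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import multivariate_t


def log_wishart_B(W, v):
    """Log of the Wishart normalization constant B(W, v)."""
    D = W.shape[0]
    _, logdet_W = np.linalg.slogdet(W)
    return -0.5 * v * logdet_W - 0.5 * v * D * np.log(2.0) - multigammaln(0.5 * v, D)


def log_marginal(mu, mu0, beta, W, v):
    """Log of (beta / 2 pi)^{D/2} * B(W, v) / B(W_post, v + 1)."""
    D = W.shape[0]
    diff = mu - mu0
    # Scale matrix of the Wishart that appears under the integral.
    W_post = np.linalg.inv(beta * np.outer(diff, diff) + np.linalg.inv(W))
    return (0.5 * D * np.log(beta / (2.0 * np.pi))
            + log_wishart_B(W, v)
            - log_wishart_B(W_post, v + 1.0))


# Hypothetical test values.
rng = np.random.default_rng(2)
D, v, beta = 3, 6.0, 0.7
M = rng.standard_normal((D, D))
W = M @ M.T + D * np.eye(D)
mu0 = rng.standard_normal(D)
mu = rng.standard_normal(D)

df = v - D + 1.0                    # degrees of freedom v' = v - D + 1
precision = beta * df * W           # precision matrix of the t-distribution
shape = np.linalg.inv(precision)
shape = 0.5 * (shape + shape.T)     # remove round-off asymmetry

closed_form = log_marginal(mu, mu0, beta, W, v)
student = multivariate_t.logpdf(mu, loc=mu0, shape=shape, df=df)

print(np.isclose(closed_form, student))
```

Agreement of the two log-densities at arbitrary points supports both the precision $\beta(v-D+1) W$ and the degrees of freedom $v-D+1$.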