I'm studying this paper and I have some difficulty understanding the result expressed by equation $(14)$, which says, in simplified notation, that \begin{equation}\prod_{i=1}^n \mathcal{N}\left(y_i; Cx, X\right)\propto \mathcal{N}\left(\bar{y};Cx, \frac{X}{n}\right)\,\mathcal{LW}\left(\bar{Y}; n-1, X\right)\tag{1}\end{equation} where:
- $y_i\in\mathbb{R}^2$, $x\in\mathbb{R}^4$ are column vectors, $X$ is a square matrix $\in\mathbb{R}^{2\times2}$
- $C\in\mathbb{R}^{2\times4}$ is a constant matrix with the following structure \begin{equation}C\triangleq\begin{bmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\end{bmatrix}\end{equation}
- $\bar{y}\triangleq \frac{1}{n} \sum_{i=1}^n y_i$ is the mean of the ensemble $\{y_i\}_{i=1}^n$, and $\bar{Y}\triangleq \sum_{i=1}^n (y_i-\bar{y})(y_i-\bar{y})'$ is the spread of the ensemble $\{y_i\}_{i=1}^n$
- $\mathcal{N}(z; \mu, \Sigma)$ denotes the generic $m\triangleq\dim(z)$-variate Gaussian distribution with mean and covariance $\mu\in\mathbb{R}^m$, $\Sigma\in\mathbb{R}^{m\times m}$, $\Sigma>0$ \begin{equation}\mathcal{N}(z; \mu, \Sigma)\triangleq \left(\frac{1}{\sqrt{2\pi}}\right)^{m}\,\frac{1}{\sqrt{|\Sigma|}}\exp\left(-\frac{1}{2}(z-\mu)'\Sigma^{-1}(z-\mu)\right)\end{equation}
- $\mathcal{LW}\left(\bar{Y}; n-1, X\right)\triangleq |X|^{-\frac{n-1}{2}}\operatorname{etr}\left(-\frac{1}{2}\bar{Y}X^{-1}\right)$, where $\operatorname{etr}(\cdot)$ is the exponential trace function, i.e. $\operatorname{etr}(M)\triangleq\exp(\operatorname{trace}(M))$ where $\operatorname{trace}(M)$ is the trace ($\equiv$ sum of the diagonal elements) of the generic matrix $M$.
Derivation of relation $(1)$
I agree with relation $(1)$. To prove it, I do not follow the approach used in [1] (because I don't understand it) but a second approach, suggested in this subsequent paper. First of all, by the properties of the exponential and since $\dim(y_i)=2$, we have \begin{equation}\begin{aligned}\prod_{i=1}^n \mathcal{N}\left(y_i; Cx, X\right) &= \prod_{i=1}^n \left(\frac{1}{\sqrt{2\pi}}\right)^{2}\,\frac{1}{\sqrt{|X|}} \exp\left(-\frac{1}{2}(y_i-Cx)'X^{-1}(y_i-Cx)\right)\\ &=\left(\frac{1}{2\pi}\right)^n |X|^{-\frac{n}{2}}\exp\left\{-\frac{1}{2}\sum_{i=1}^n\left[(y_i-Cx)'X^{-1}(y_i-Cx)\right]\right\} \end{aligned}\end{equation} Now the second paper suggests using the following identity (which can be checked with some algebraic manipulation) \begin{equation}\sum_{i=1}^n\left[(y_i-Cx)'X^{-1}(y_i-Cx)\right]=\operatorname{trace}\left\{\left[n(\bar{y}-Cx)(\bar{y}-Cx)'+\bar{Y}\right]X^{-1}\right\}\end{equation} Thus, taking into account the linearity of the trace operator, \begin{equation}\begin{aligned}\prod_{i=1}^n \mathcal{N}\left(y_i; Cx, X\right) &=\left(\frac{1}{2\pi}\right)^n |X|^{-\frac{n}{2}} \exp\left\{-\frac{1}{2}\operatorname{trace}\left\{\left[n(\bar{y}-Cx)(\bar{y}-Cx)'+\bar{Y}\right]X^{-1}\right\}\right\}\\ &=\left(\frac{1}{2\pi}\right)^n |X|^{-\frac{n}{2}} \exp\left\{\operatorname{trace}\left[-\frac{1}{2}(\bar{y}-Cx)(\bar{y}-Cx)'\left(\frac{X}{n}\right)^{-1}\right]+\operatorname{trace}\left[-\frac{1}{2}\bar{Y}X^{-1}\right]\right\}\\ &=\left(\frac{1}{2\pi}\right)^n |X|^{-\frac{n}{2}}\operatorname{etr}\left[-\frac{1}{2}(\bar{y}-Cx)(\bar{y}-Cx)'\left(\frac{X}{n}\right)^{-1}\right]\operatorname{etr}\left[-\frac{1}{2}\bar{Y}X^{-1}\right]\\ &=\left(\frac{1}{2\pi}\right) |X|^{-\frac{1}{2}}\operatorname{etr}\left[-\frac{1}{2}(\bar{y}-Cx)(\bar{y}-Cx)'\left(\frac{X}{n}\right)^{-1}\right]\left(\frac{1}{2\pi}\right)^{n-1} |X|^{-\frac{n-1}{2}}\operatorname{etr}\left[-\frac{1}{2}\bar{Y}X^{-1}\right] \end{aligned}\end{equation} Now it is easy to see that the
first three factors define a bivariate Gaussian density up to the constant factor $1/n$ (it is sufficient to recall the general property $\operatorname{trace}(aa'A^{-1})=a'A^{-1}a$, where $a$ is a column vector and $A$ is a symmetric positive definite matrix, and to note that $\left|X/n\right|^{-\frac{1}{2}}=n\,|X|^{-\frac{1}{2}}$ for a $2\times2$ matrix), while the last two factors are the definition of $\mathcal{LW}$. So, as claimed, \begin{equation}\begin{aligned} \prod_{i=1}^n \mathcal{N}\left(y_i; Cx, X\right) &= \frac{1}{n}\,\mathcal{N}\left(\bar{y};Cx, \frac{X}{n}\right)\,\left(\frac{1}{2\pi}\right)^{n-1}\mathcal{LW}\left(\bar{Y}; n-1, X\right)\\ &\propto \mathcal{N}\left(\bar{y};Cx, \frac{X}{n}\right)\,\mathcal{LW}\left(\bar{Y}; n-1, X\right) \end{aligned}\end{equation} where the constant factor $\frac{1}{n}(2\pi)^{-(n-1)}$ is absorbed by the proportionality sign $\propto$.
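The derivation above can also be checked numerically. The following is only a minimal sketch, assuming NumPy is available; the helper names `gauss_pdf`, `lw` and `check` are hypothetical, and the values of $x$, $X$ and $y_i$ are random test data, not taken from either paper. It verifies the trace identity and confirms that the ratio between the product of Gaussians and $\mathcal{N}(\bar{y};Cx,X/n)\,\mathcal{LW}(\bar{Y};n-1,X)$ is a constant that does not depend on the data.

```python
import numpy as np

def gauss_pdf(z, mu, Sigma):
    """Multivariate Gaussian density N(z; mu, Sigma)."""
    m = z.shape[0]
    diff = z - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    return (2 * np.pi) ** (-m / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

def lw(Ybar, nu, X):
    """LW(Ybar; nu, X) = |X|^(-nu/2) * etr(-Ybar X^{-1} / 2), as defined in the question."""
    return np.linalg.det(X) ** (-nu / 2) * np.exp(-0.5 * np.trace(Ybar @ np.linalg.inv(X)))

def check(seed, n=5):
    """Return both sides of the trace identity and the ratio LHS / RHS of relation (1)."""
    rng = np.random.default_rng(seed)
    C = np.hstack([np.eye(2), np.zeros((2, 2))])  # the 2x4 selection matrix C
    x = rng.normal(size=4)                        # random test state (assumption)
    A = rng.normal(size=(2, 2))
    X = A @ A.T + np.eye(2)                       # random symmetric positive definite X
    y = rng.normal(size=(n, 2))                   # rows are the y_i (random test data)

    ybar = y.mean(axis=0)
    D = y - ybar
    Ybar = D.T @ D                                # sum_i (y_i - ybar)(y_i - ybar)'

    Xinv = np.linalg.inv(X)
    e = ybar - C @ x
    lhs_sum = sum((yi - C @ x) @ Xinv @ (yi - C @ x) for yi in y)
    rhs_sum = np.trace((n * np.outer(e, e) + Ybar) @ Xinv)

    lhs = np.prod([gauss_pdf(yi, C @ x, X) for yi in y])
    rhs = gauss_pdf(ybar, C @ x, X / n) * lw(Ybar, n - 1, X)
    return lhs_sum, rhs_sum, lhs / rhs

for seed in range(3):
    s1, s2, ratio = check(seed)
    print(np.isclose(s1, s2), ratio)
```

For every seed the two sums agree and the printed ratio is the same number, namely $(2\pi)^{-(n-1)}/n$; the extra $1/n$ comes from $|X/n|^{-1/2}=n|X|^{-1/2}$ in the bivariate case.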
My question
Given the previous derivation, I'm pretty sure that the definition of $\mathcal{LW}$ is correct. The problem arises when the author of [1] gives an interpretation of $\mathcal{LW}$, which seems incorrect to me.
The author defines the following two matrix-variate densities:
- Wishart: if $B$ is a random, square, positive definite, symmetric matrix $\in\mathbb{R}^{d\times d}$, then it is Wishart-distributed with parameters $a$ (scalar) and $A$ ($d\times d$ matrix) iff its density is \begin{equation}\mathcal{W}\left(B;a,A\right)\triangleq \frac{1}{Z}|B|^{\frac{a-d-1}{2}}\operatorname{etr}\left(-\frac{1}{2}A^{-1}B\right)\end{equation} where $Z$ is a suitable normalizing factor;
- inverse-Wishart: if $B$ is a random, square, positive definite, symmetric matrix $\in\mathbb{R}^{d\times d}$, then it is inverse-Wishart-distributed with parameters $a$ (scalar) and $A$ ($d\times d$ matrix) iff its density is \begin{equation}\mathcal{IW}\left(B;a,A\right)\triangleq \frac{1}{Z}|B|^{-\frac{a}{2}}\operatorname{etr}\left(-\frac{1}{2} B^{-1} A\right) \end{equation} where $Z$ is a suitable normalizing factor.
The author of [1] states that $\mathcal{LW}$ is proportional to a Wishart in $\bar{Y}$ with scalar parameter $n-1$ (and, as a consequence, with matrix parameter $X$), so if I have understood the point correctly, \begin{equation}\begin{aligned} \mathcal{LW}\left(\bar{Y}; n-1, X\right)&\triangleq |X|^{-\frac{n-1}{2}} \operatorname{etr}\left(-\frac{1}{2}\bar{Y}X^{-1}\right)\\ &\propto\mathcal{W}(\bar{Y}; n-1, X)=\frac{1}{Z}|\bar{Y}|^{\frac{(n-1)-2-1}{2}}\operatorname{etr}\left(-\frac{1}{2}X^{-1}\bar{Y}\right)\\ &=\frac{1}{Z}|\bar{Y}|^{\frac{n-4}{2}}\operatorname{etr}\left(-\frac{1}{2}X^{-1}\bar{Y}\right) \end{aligned}\end{equation} I cannot see why this is true: the definition of $\mathcal{LW}$ does not contain the factor $|\bar{Y}|^{\frac{n-4}{2}}$, which is not a constant scaling factor (and thus cannot be absorbed by the symbol $\propto$) because it depends on $\bar{Y}$. On the other hand, from my perspective, it is natural to view $\mathcal{LW}$ as an inverse-Wishart in $X$ with scalar parameter $n-1$ and matrix parameter $\bar{Y}$. I strongly suspect that I'm wrong, because both [1] and [2] (and more recent papers) agree that $\mathcal{LW}$ is Wishart in $\bar{Y}$.
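To spell out the inverse-Wishart reading I have in mind: assuming the determinant in the definition of $\mathcal{IW}$ is that of its first argument $B$, substituting $B=X$, $a=n-1$, $A=\bar{Y}$ and using the cyclic property $\operatorname{trace}(X^{-1}\bar{Y})=\operatorname{trace}(\bar{Y}X^{-1})$ gives \begin{equation}\mathcal{IW}\left(X;n-1,\bar{Y}\right)=\frac{1}{Z}|X|^{-\frac{n-1}{2}}\operatorname{etr}\left(-\frac{1}{2}X^{-1}\bar{Y}\right)=\frac{1}{Z}\,\mathcal{LW}\left(\bar{Y}; n-1, X\right)\end{equation} where $Z$, being the normalizing factor of a density in $X$, does not depend on $X$. So, under this parameterization and viewed as a function of $X$, $\mathcal{LW}$ matches the inverse-Wishart form with no leftover factor, whereas the Wishart form in $\bar{Y}$ leaves the factor $|\bar{Y}|^{\frac{n-4}{2}}$ unaccounted for.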