Given:
- the random variable $\mathbf{X} \sim \mathcal{N}(\mathbf{c}, \mathbf{\Gamma})$ where $\mathbf{X} \in \mathbb{R}^L$
- its affine transformation $\mathbf{Y} = \mathbf{A} \mathbf{X} + \mathbf{b} + \mathbf{E}$, where $\mathbf{Y} \in \mathbb{R}^D$, $\mathbf{E} \sim \mathcal{N} (\mathbf{0}, \mathbf{\Sigma})$ is independent of $\mathbf{X}$, $\mathbf{A} \in \mathbb{R}^{D \times L}$, and $\mathbf{b} \in \mathbb{R}^D$
I want to compute the joint distribution of $\mathbf{X}$ and $\mathbf{Y}$.
To do that, I am computing the following:
$$\begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} \mathbf{c} \\ \mathbf{A c + b} \end{bmatrix}, \begin{bmatrix} \mathbf{\Gamma} & \mathbf{R_{X,Y}} \\ \mathbf{R_{Y,X}} & \mathbf{\Sigma + A \Gamma} \mathbf{A}^T \end{bmatrix} \right)$$
Here I have computed the variance of $\mathbf{Y}$ using the properties of affine transformations. However, I am having trouble computing $\mathbf{R_{X,Y}}$ ( $= \mathbf{R_{Y,X}}^T$ ).
Can someone help me, or at least give a hint on how to solve it using the following formula?
$$\mathbf{R_{X,Y}} = \mathbb{E}[(\mathbf{X} - \mathbb{E}[\mathbf{X}])(\mathbf{Y} - \mathbb{E}[\mathbf{Y}])^T]$$
I will need it later to compute the conditional distribution of $\mathbf{X}$ given $\mathbf{Y}$ using well-known formulas (e.g. the conditional distribution of a Gaussian process).
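For concreteness, here is a small simulation of the model above — a sketch with assumed dimensions $L = 3$, $D = 2$ and arbitrary (randomly generated) parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
L, D = 3, 2  # assumed dimensions, for illustration only

c = rng.normal(size=L)                 # mean of X
G = rng.normal(size=(L, L))
Gamma = G @ G.T + L * np.eye(L)        # a symmetric positive-definite covariance for X
A = rng.normal(size=(D, L))
b = rng.normal(size=D)
S = rng.normal(size=(D, D))
Sigma = S @ S.T + D * np.eye(D)        # a symmetric positive-definite covariance for E

# one draw from the model: X ~ N(c, Gamma), E ~ N(0, Sigma), Y = A X + b + E
X = rng.multivariate_normal(c, Gamma)
E = rng.multivariate_normal(np.zeros(D), Sigma)
Y = A @ X + b + E
```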
First, note the dimensions: $\mathbf{X}$ is $L \times 1$ and $\mathbf{Y}$ is $D \times 1$. Hence $\mathbf{X}^{*} = \mathbf{X} - \mathbb{E}[\mathbf{X}]$ is a column vector and $\mathbf{Y}^{*T} = (\mathbf{Y} - \mathbb{E}[\mathbf{Y}])^T$ is a row vector.
I guess you would have no problem computing these: $\mathbf{X}^{*}$ and $\mathbf{Y}^{*T}$ are simply the mean-centered versions of their "non-star" counterparts.
Then one has $\mathbf{Z}=\mathbf{X}^{*} \mathbf{Y}^{*T}$, which is an $L \times D$ matrix. Here as well, I guess you would have no problem computing this outer product. But what you must understand is that each of the $LD$ entries of $\mathbf{R_{X,Y}} = \mathbb{E}[\mathbf{Z}]$, i.e. $\mathbb{E}[z_{ij}]$, is rarely (not to say never) computable as such, since it relies on the distribution of $z_{ij}$. Yes, I am talking about the distribution of each entry $z_{ij}$.
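The outer product $\mathbf{Z}=\mathbf{X}^{*} \mathbf{Y}^{*T}$ is a one-liner in, say, NumPy (the vectors below are arbitrary placeholders with $L = 3$, $D = 2$):

```python
import numpy as np

x_star = np.array([1.0, -2.0, 0.5])   # a centered X, L = 3
y_star = np.array([0.3, 2.0])         # a centered Y, D = 2

# outer product: Z[i, j] = x_star[i] * y_star[j], an L x D matrix
Z = np.outer(x_star, y_star)
```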
One way to get these $LD$ expectations resorts to resampling methods, which means repeating the calculation of $\mathbf{Z}$ many times, say $m$ times.
Let's call $\mathbf{Z}_r$, for $r=1,\dots,m$, the counterpart of $\mathbf{Z}$ based on the $r$th of the $m$ resampled versions of $\mathbf{X}$ and $\mathbf{Y}$, i.e. on $\mathbf{X}_r$ and $\mathbf{Y}_r$. Once you have the $m$ resampled versions of $\mathbf{Z}$, you can build the $L \times D$ matrix $\mathbf{R_{X,Y}} = \mathbb{E}[\mathbf{Z}] \approx [m^{-1} \sum_{r=1}^m z_{ij,r}]$.
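A sketch of this resampling estimate, assuming we can draw fresh samples from the model (the parameters below are arbitrary). Note that in this particular linear-Gaussian model the exact answer is also available in closed form: since $\mathbf{Y} - \mathbb{E}[\mathbf{Y}] = \mathbf{A}(\mathbf{X}-\mathbf{c}) + \mathbf{E}$ and $\mathbf{E}$ is independent of $\mathbf{X}$, one gets $\mathbf{R_{X,Y}} = \mathbf{\Gamma}\mathbf{A}^T$, which the resampling estimate should approach as $m$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
L, D, m = 3, 2, 200_000            # assumed dimensions and number of resamples

c = np.array([1.0, -1.0, 0.5])
Gamma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])            # covariance of X
A = np.array([[1.0, 0.0, 2.0],
              [0.5, -1.0, 0.0]])
b = np.array([0.2, -0.3])
Sigma = np.array([[1.0, 0.1],
                  [0.1, 0.8]])                 # covariance of E

# m resampled versions of X and Y (one per row)
X = rng.multivariate_normal(c, Gamma, size=m)            # (m, L)
E = rng.multivariate_normal(np.zeros(D), Sigma, size=m)  # (m, D)
Y = X @ A.T + b + E                                      # (m, D)

# R_{X,Y} = E[Z] estimated as the average of the m outer products Z_r
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
R_hat = Xc.T @ Yc / m                                    # (L, D)

R_exact = Gamma @ A.T   # closed-form cross-covariance in this linear-Gaussian model
print(np.max(np.abs(R_hat - R_exact)))   # small, and shrinks as m grows
```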