Transform probability density from low to high dimension with Jacobian

  • $\vec{X}$ is a vector of random variables in $\mathbb{R}^{n}$
  • $\rho_{\vec{X}}(\vec{x})$ is probability density of $\vec{X}$.
  • $\vec{y} = f(\vec{x})$ is a vector in $\mathbb{R}^{m}$
  • $\vec{Y} = f(\vec{X})$ is a vector of random variables in $\mathbb{R}^{m}$
  • $\rho_{\vec{Y}}(\vec{y})$ is probability density of $\vec{Y}$.
  • $m \gt n$

The goal is to find $\rho_{\vec{Y}}(\vec{y})$ given $\rho_{\vec{X}}(\vec{x})$ and $f(\vec{x})$.


When does the following trick work?

$\rho_{\vec{Y}}(\vec{y}) d\vec{y} = \rho_{\vec{X}}(\vec{x}) d\vec{x}$

$\rho_{\vec{Y}}(\vec{y}) \left\lvert \det(J) \right\rvert = \rho_{\vec{X}}(\vec{x})$

For $n = 2, m = 4$, $$ J = \left[ \begin{matrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & 1 & 0 \\ a_{41} & a_{42} & 0 & 1 \\ \end{matrix} \right]$$

where:

  • $ a_{ij} = \frac{\partial{y_i}}{\partial{x_j}} $
  • The lower-right corner of $J$ is an identity block
  • $\det$ is determinant
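For this block structure, the identity block makes the determinant collapse to the determinant of the upper-left $n \times n$ block. A quick numerical sanity check (a sketch assuming NumPy, with made-up values for the $a_{ij}$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))  # a_ij = dy_i/dx_j, arbitrary example values
J = np.zeros((4, 4))
J[:, :2] = A                     # the a_ij block fills the first two columns
J[2:, 2:] = np.eye(2)            # identity block in the lower right

# det(J) reduces to the determinant of the upper-left 2x2 block
assert np.isclose(np.linalg.det(J), np.linalg.det(A[:2, :2]))
```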

The idea behind the trick is to augment the transform from:

$\vec{x} \rightarrow \vec{y}$

to

$(x_{1}, x_{2}, y_{3}, y_{4}) \rightarrow (y_{1}, y_{2}, y_{3}, y_{4})$

and then integrate the extra dimensions out.


What if $\frac{\partial y_{1}}{\partial y_{3}} \ne 0$, so that $J$ does not have this block structure?

If the trick doesn't always work, what is the general method to do this?


Update: added an example.

Example (unit-simplex transform, a.k.a. stick-breaking transform)

Let $\vec{x} = (x_1, x_2, x_3, \ldots, x_{K})$ be a point on the probability simplex:

  • $0 \le x_{k} \le 1$ for $k = 1, \ldots, K$
  • $x_1 + x_2 + \cdots + x_{K} = 1$

The transformation $\vec{z} = (z_{1}, \ldots, z_{K-1}) = f^{-1}(\vec{x})$ maps from the probability simplex to a hypercube:

  • $z_{l} = \log(x_{l}) - \log(x_{l+1} + \cdots + x_{K})$ for $l = 1, \ldots, K-1$

For ease of derivation, I define the following:

  • Let ${b_{l} = [1 + \exp(z_{l})]^{-1}}$

  • Let $c_{l} = 1 - b_{l} = [1 + \exp(-z_{l})]^{-1}$

Then, $$ x_{m} = \begin{cases} c_{1} & \text{if } m = 1 \\ \prod_{l=1}^{K-1} b_{l} & \text{if } m = K \\ c_{m} \prod_{l=1}^{m-1} b_{l} & \text{otherwise} \end{cases}$$
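As a sanity check on these formulas, the following sketch (assuming NumPy) maps a random $\vec{z}$ to $\vec{x}$ via $b_{l}$ and $c_{l}$, then confirms that $\vec{x}$ lies on the simplex and that the inverse map recovers $\vec{z}$:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5
z = rng.standard_normal(K - 1)
b = 1.0 / (1.0 + np.exp(z))   # b_l = [1 + exp(z_l)]^{-1}
c = 1.0 - b                   # c_l = [1 + exp(-z_l)]^{-1}

# build x from the case formula: x_1 = c_1, x_m = c_m * b_1...b_{m-1}, x_K = b_1...b_{K-1}
x = np.empty(K)
prod = 1.0
for m in range(K - 1):
    x[m] = c[m] * prod
    prod *= b[m]
x[K - 1] = prod

assert np.isclose(x.sum(), 1.0)          # x lies on the probability simplex
assert np.all((x >= 0) & (x <= 1))

# recover z_l = log(x_l) - log(x_{l+1} + ... + x_K)
tails = np.cumsum(x[::-1])[::-1]         # tails[l] = x_l + ... + x_K (0-indexed)
z_rec = np.log(x[:-1]) - np.log(tails[1:])
assert np.allclose(z_rec, z)
```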

For $K = 3$,

$$ J = \left[ \begin{matrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 1 \\ \end{matrix} \right]$$

where $a_{ij} = \frac{\partial x_{i}}{\partial z_j}$

Since $x_{m}$ depends only on $z_{1}, \ldots, z_{m}$, $J$ is lower triangular; in particular $a_{12} = 0$, so $\det(J) = a_{11}a_{22}$.

For higher $K$, the Jacobian stays lower triangular, so only the diagonal terms contribute to the determinant.

$\log[\det(J)] = \sum_{l=1}^{K-1} \left[ (K-l)\log b_{l} + \log c_{l} \right]$
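The closed form $\sum_{l=1}^{K-1}\left[(K-l)\log b_{l} + \log c_{l}\right]$ for $\log\det(J)$ can be checked against a finite-difference Jacobian (a sketch assuming NumPy; note that with $b_{l}, c_{l} < 1$ the log-determinant is negative):

```python
import numpy as np

def simplex_x(z):
    """Inverse transform: unconstrained z in R^{K-1} -> point x on the K-simplex."""
    K = z.size + 1
    b = 1.0 / (1.0 + np.exp(z))
    c = 1.0 - b
    x = np.empty(K)
    prod = 1.0
    for m in range(K - 1):
        x[m] = c[m] * prod
        prod *= b[m]
    x[K - 1] = prod
    return x

rng = np.random.default_rng(2)
K = 4
z = rng.standard_normal(K - 1)
b = 1.0 / (1.0 + np.exp(z))
c = 1.0 - b

# closed form: log det J = sum_{l=1}^{K-1} [(K - l) log b_l + log c_l]
l = np.arange(1, K)
logdet_closed = np.sum((K - l) * np.log(b) + np.log(c))

# central-difference Jacobian of (x_1, ..., x_{K-1}) w.r.t. z
eps = 1e-6
J = np.empty((K - 1, K - 1))
for j in range(K - 1):
    zp, zm = z.copy(), z.copy()
    zp[j] += eps
    zm[j] -= eps
    J[:, j] = (simplex_x(zp)[:-1] - simplex_x(zm)[:-1]) / (2 * eps)

assert np.isclose(np.log(np.linalg.det(J)), logdet_closed, atol=1e-5)
```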

By testing over many randomly selected values, I verified that the above expression matches the one in the section "Absolute Jacobian Determinant of the Unit-Simplex Inverse Transform" of the Stan reference manual, which notes that the Jacobian is triangular.

But neither the manual nor Betancourt 2010 gives the exact form of the Jacobian ...

Accepted answer:

If $f$ is differentiable as you assume, then its image $f(\mathbb R^n)$ will be supported on an (at most $n$-dimensional) submanifold of $\mathbb R^m$. Since the ($m$-dimensional) Lebesgue measure of this submanifold will equal zero, $\vec Y$ will never possess a probability density (with respect to the Lebesgue measure).

As far as I know, there is no general method to address this problem, since probability measures (or random variables) supported on submanifolds of $\mathbb R^m$ are complicated for exactly the above reason. In particular cases, solutions can be found which describe the distribution of $\vec Y$ in other ways than through a probability density.

Update: In your example, I do not see what $f$, $\vec Y$, $\rho_{\vec X}$ and $\rho_{\vec Y}$ are supposed to be.

Second answer:

I also originally believed there was no efficient way to deal with this problem. However, by accident, I came across a paper on the topic. If the transformation is injective, there is some hope. The key idea is preservation of volume: form the $m \times n$ Jacobian
$$ J_f = \frac{\partial \vec{Y}}{\partial \vec{X}} $$ If the matrix has full column rank, its volume is $$ \operatorname{vol} J_f = \sqrt{\det\left(J_f^{T} J_f\right)} $$

So the density function (taken with respect to the surface measure on the image of $f$) satisfies $$ \rho_Y \cdot \operatorname{vol} \: J_f = \rho_X $$
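The matrix volume has a familiar geometric meaning for a linear map; a small check (a sketch assuming NumPy) for $m = 3$, $n = 2$, where $\operatorname{vol} J_f$ is the area of the parallelogram spanned by the columns:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 2))  # Jacobian of a hypothetical linear map R^2 -> R^3

vol = np.sqrt(np.linalg.det(A.T @ A))  # matrix volume sqrt(det(J^T J))

# for m = 3, n = 2 this is the area of the parallelogram spanned by the
# columns of A, i.e. the norm of their cross product
assert np.isclose(vol, np.linalg.norm(np.cross(A[:, 0], A[:, 1])))
```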

Ben-Israel, Adi, The change-of-variables formula using matrix volume, SIAM J. Matrix Anal. Appl. 21, No. 1, 300-312 (1999). ZBL0955.26007.