- $\vec{X}$ is a vector of random variables in $\mathbb{R}^{n}$
- $\rho_{\vec{X}}(\vec{x})$ is the probability density of $\vec{X}$.
- $\vec{y} = f(\vec{x})$ is a vector in $\mathbb{R}^{m}$
- $\vec{Y} = f(\vec{X})$ is a vector of random variables in $\mathbb{R}^{m}$
- $\rho_{\vec{Y}}(\vec{y})$ is the probability density of $\vec{Y}$.
- $m \gt n$
The goal is to find $\rho_{\vec{Y}}(\vec{y})$ given $\rho_{\vec{X}}(\vec{x})$ and $f(\vec{x})$.
When does the following trick work?
$\rho_{\vec{Y}}(\vec{y}) d\vec{y} = \rho_{\vec{X}}(\vec{x}) d\vec{x}$
$\rho_{\vec{Y}}(\vec{y}) \left\lvert \det(J) \right\rvert = \rho_{\vec{X}}(\vec{x})$
For $n = 2, m = 4$, $$ J = \left[ \begin{matrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & 1 & 0 \\ a_{41} & a_{42} & 0 & 1 \\ \end{matrix} \right]$$
where:
- $ a_{ij} = \frac{\partial{y_i}}{\partial{x_j}} $
- The lower-right corner of $J$ is a $2 \times 2$ identity block
- $\det$ is the determinant
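A quick numerical sketch of why this block structure matters: with zeros in the upper-right and an identity block in the lower-right, $\det(J)$ reduces to the determinant of the upper-left $n \times n$ block (the entries $a_{ij}$ below are arbitrary random numbers standing in for the actual partials):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary stand-ins for the partials a_ij = dy_i/dx_j (a 4x2 matrix).
A = rng.standard_normal((4, 2))

# Augmented 4x4 Jacobian: zeros in the upper-right, identity in the lower-right.
J = np.zeros((4, 4))
J[:, :2] = A
J[2:, 2:] = np.eye(2)

# J is block lower triangular, so det(J) equals the determinant of the
# top-left 2x2 block.
assert np.isclose(np.linalg.det(J), np.linalg.det(A[:2, :2]))
```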
The explanation of the trick is that it augments the transform from
$\vec{x} \rightarrow \vec{y}$
to
$(x_{1}, x_{2}, y_{3}, y_{4}) \rightarrow (y_{1}, y_{2}, y_{3}, y_{4})$
and then integrates the extra dimensions out.
What if $\frac{\partial y_{1}(\vec{x})}{\partial{y_3}(\vec{x})} \ne 0$?
If the trick doesn't always work, what is the general method to do this?
Update: added an example.
Example (Unit simplex transform or Stick breaking transform)
Let $\vec{x} = (x_1, x_2, x_3, \ldots, x_{K})$ be a point on the probability simplex:
- $0 \le x_{k} \le 1$ for $k = 1, \ldots, K$
- $x_1 + x_2 + x_3 + \cdots + x_{K} = 1$
The transformation $\vec{z} = (z_{1}, \ldots, z_{K-1}) = f^{-1}(\vec{x})$ maps from the probability simplex to a hypercube:
- $z_{l} = \log(x_{l}) - \log(x_{l+1} + \cdots + x_{K})$
For ease of derivation, I define the following:
Let ${b_{l} = [1 + \exp(z_{l})]^{-1}}$
Let $c_{l} = 1 - b_{l} = [1 + \exp(-z_{l})]^{-1}$
Then, $$ x_{m} = \begin{cases} c_{1} & \text{if } m = 1 \\ \prod_{l=1}^{K-1} b_{l} & \text{if } m = K \\ c_{m} \prod_{l=1}^{m-1} b_{l} & \text{otherwise} \end{cases}$$
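As a sanity check on these formulas, here is a sketch (numpy, with my own function names) of the inverse map $\vec{x} \to \vec{z}$ from the definition of $z_l$ and the forward map $\vec{z} \to \vec{x}$ from the cases above, verified to round-trip:

```python
import numpy as np

def simplex_to_z(x):
    # z_l = log(x_l) - log(x_{l+1} + ... + x_K)
    tails = np.cumsum(x[::-1])[::-1]   # tails[l] = x_l + ... + x_K
    return np.log(x[:-1]) - np.log(tails[1:])

def z_to_simplex(z):
    b = 1.0 / (1.0 + np.exp(z))        # b_l = [1 + exp(z_l)]^{-1}
    c = 1.0 - b                        # c_l = [1 + exp(-z_l)]^{-1}
    x = np.empty(len(z) + 1)
    stick = 1.0                        # running product of the b_l
    for l in range(len(z)):
        x[l] = c[l] * stick            # x_m = c_m * prod_{l<m} b_l
        stick *= b[l]
    x[-1] = stick                      # x_K = prod_{l=1}^{K-1} b_l
    return x

rng = np.random.default_rng(1)
x = rng.dirichlet(np.ones(5))          # random point on the simplex, K = 5
assert np.allclose(z_to_simplex(simplex_to_z(x)), x)
assert np.isclose(z_to_simplex(simplex_to_z(x)).sum(), 1.0)
```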
For $K = 3$,
$$ J = \left[ \begin{matrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 1 \\ \end{matrix} \right]$$
where $a_{ij} = \frac{\partial x_{i}}{\partial z_j}$
Since $a_{12} = 0$, $J$ is lower triangular, so $\det(J) = a_{11}a_{22}$.
For higher $K$, the Jacobian remains lower triangular, so only the diagonal terms contribute to the determinant.
Each diagonal entry is $a_{mm} = c_{m} b_{m} \prod_{l=1}^{m-1} b_{l}$, so collecting powers of each $b_{l}$ gives
$$\log[\det(J)] = \sum_{l=1}^{K-1} \left[ (K-l)\log(b_{l}) + \log(c_{l}) \right]$$
Testing at many randomly selected points, I verified that the above expression matches the one in the section "Absolute Jacobian Determinant of the Unit-Simplex Inverse Transform" of the Stan reference manual, which notes that the Jacobian is triangular.
But neither the manual nor Betancourt 2010 gives the exact form of the Jacobian ...
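The numerical check described above can be sketched as follows (my own code, assuming numpy): it compares $\sum_{l=1}^{K-1}\left[(K-l)\log b_{l} + \log c_{l}\right]$ against the log-determinant of a central-difference Jacobian of the map $\vec{z} \to (x_1, \ldots, x_{K-1})$, which is all that matters since the augmented row only appends a $1$ to the diagonal:

```python
import numpy as np

def x_first(z):
    # First K-1 components of the stick-breaking map z -> x.
    b = 1.0 / (1.0 + np.exp(z))
    c = 1.0 - b
    # x_m = c_m * prod_{l<m} b_l, with an empty product for m = 1.
    return c * np.concatenate(([1.0], np.cumprod(b[:-1])))

def log_det_closed_form(z):
    K = len(z) + 1
    b = 1.0 / (1.0 + np.exp(z))
    c = 1.0 - b
    l = np.arange(1, K)                # l = 1, ..., K-1
    return np.sum((K - l) * np.log(b) + np.log(c))

def numerical_jacobian(f, z, h=1e-6):
    # Central-difference approximation of J[i, j] = dx_i/dz_j.
    n = len(z)
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(z + e) - f(z - e)) / (2 * h)
    return J

rng = np.random.default_rng(2)
z = rng.standard_normal(4)             # K = 5
sign, logdet = np.linalg.slogdet(numerical_jacobian(x_first, z))
assert sign > 0                        # every diagonal factor is positive
assert np.isclose(logdet, log_det_closed_form(z), atol=1e-5)
```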
If $f$ is differentiable, as you assume, then its image $f(\mathbb R^n)$ is supported on a submanifold of $\mathbb R^m$ of dimension at most $n$. Since the ($m$-dimensional) Lebesgue measure of this submanifold is zero, $\vec Y$ can never possess a probability density (with respect to the Lebesgue measure).
As far as I know, there is no general method to address this problem, since probability measures (or random variables) supported on submanifolds of $\mathbb R^m$ are complicated for exactly the above reason. In particular cases, solutions can be found which describe the distribution of $\vec Y$ in other ways than through a probability density.
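To make the measure-zero point concrete, here is a toy case of my own (not from the question) with $n = 1$, $m = 2$, $f(x) = (\cos x, \sin x)$: every sample of $\vec Y$ lands exactly on the unit circle, a one-dimensional submanifold with zero Lebesgue area in $\mathbb R^2$, so no two-dimensional density can exist.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(10_000)         # X takes values in R^1
y = np.stack([np.cos(x), np.sin(x)])    # Y = f(X) takes values in R^2

# All samples satisfy y_1^2 + y_2^2 = 1 exactly (up to floating point),
# i.e. the distribution of Y is concentrated on a set of zero area.
assert np.allclose(y[0]**2 + y[1]**2, 1.0)
```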
Update: In your example, I do not see what $f$, $\vec Y$, $\rho_{\vec X}$ and $\rho_{\vec Y}$ are supposed to be.