I am stuck trying to derive an expression in the context of Bayesian inference for a model inverse problem.
Specifically, I am considering a dynamical system of the form $X(t)=\Phi(\theta,t)$ and assuming that data exist for the distribution of $X(t)$ at specific time points, i.e. from experiments. I am trying to infer the distribution of $\theta$ such that the simulated outputs best match the distribution of the available data.
I denote the probability density of $X$ given the data as $P(X|D)$. I am also assuming that we have a prior density for the parameters, denoted by $P(\theta)$. By forward simulating, we get the corresponding prior density of the simulated outputs, $P(X)$, irrespective of the measured data. My goal is to infer the "posterior" distribution of the parameters, denoted $P(\theta|D)$.
My derivation is as follows:
I consider the function $X(t)=\Phi(\theta,t)$ only at $n$ specific time points, thus obtaining a finite-dimensional vector $X \in \mathbb{R}^n$, and I neglect the time dependence in what follows.
Writing the joint density of $X$ and $\theta$ as $P(X,\theta)$, I obtain
$P(\theta|D) = \int P(\theta|X)\,P(X|D)\, dX$, which assumes that $\theta$ and $D$ are conditionally independent given $X$, i.e. $P(\theta|X,D) = P(\theta|X)$. I then use Bayes' law to rewrite $P(\theta|X) = \frac{P(X|\theta)P(\theta)}{P(X)}$.
Since $X=\Phi(\theta)$ is deterministic, I set $P(X|\theta) = \delta(X - \Phi(\theta))$ and the integral becomes \begin{eqnarray} P(\theta|D) &=& \int \delta(X - \Phi(\theta)) \frac{P(\theta) P(X|D)}{P(X)} dX \\ &=& P(\theta)\frac{P(X=\Phi(\theta)|D)}{P(X=\Phi(\theta))} \tag{1} \label{eq1} \end{eqnarray}
This expression seems plausible, as it has the form of an importance-weighting correction: if we generate samples from the prior $P(\theta)$, forward simulate, and weight each sample by the ratio $\frac{P(X=\Phi(\theta)|D)}{P(X=\Phi(\theta))}$, then the resulting weighted distribution of model outputs should have the target p.d.f. $P(X|D)$.
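To convince myself of this, I ran a small numerical sanity check of the weighting scheme with a toy invertible model. All concrete choices here (the cubic map, the Gaussian prior, and the Gaussian "data" density) are hypothetical, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy invertible model (hypothetical choice): X = Phi(theta) = theta**3,
# with prior theta ~ N(0, 1) and "data" density P(X|D) = N(2, 0.5**2).
def phi(theta):
    return theta**3

def prior_pdf(theta):
    return np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)

def data_pdf(x):
    return np.exp(-0.5 * ((x - 2.0) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))

def prior_predictive_pdf(x):
    # For this invertible toy model P(X) is available in closed form via the
    # change-of-variables rule: p_X(x) = p_theta(Phi^{-1}(x)) |d Phi^{-1}/dx|.
    # In practice it would have to be estimated from the simulated outputs
    # (e.g. by a kernel density estimate).
    t = np.cbrt(x)
    return prior_pdf(t) / (3.0 * t**2)

# Sample from the prior, forward simulate, and weight each sample by the
# ratio P(X = Phi(theta) | D) / P(X = Phi(theta)).
theta = rng.standard_normal(50_000)
x = phi(theta)
w = data_pdf(x) / prior_predictive_pdf(x)
w /= w.sum()

# The weighted outputs should now follow the target density P(X|D):
print(np.sum(w * x))  # weighted mean, should be close to 2.0
```

In my runs the weighted mean of the simulated outputs does come out close to $2$, the mean of the target density, which supports the importance-weighting reading of (1).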
My confusion arises when trying to rewrite the denominator $P(X=\Phi(\theta))$. Using the same logic as above, \begin{eqnarray} P(X=\Phi(\theta)) &=& \int P(X=\Phi(\theta)|\theta')P(\theta')d\theta'\\ &=& \int \delta(\Phi(\theta)-\Phi(\theta')) P(\theta')d\theta'\\ &=& \int_\Omega P(\theta')d\theta' \end{eqnarray} where $\Omega = \left\{\theta': \Phi(\theta')=\Phi(\theta)\right\}$.
Is it possible to write the prior density of the transformed variable $X$ as the integral of the prior density over the set $\Omega$ of all points in parameter space that produce the same model output? How do we express the density of a function of a random variable in terms of the density of that variable when the function is not invertible?
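For intuition, I also compared, for a simple non-invertible map, the empirical density of $X$ against the sum-over-preimages formula $p_X(x) = \sum_i p_\theta(\theta_i)/|\Phi'(\theta_i)|$, $\theta_i \in \Omega$, that I have seen quoted for piecewise-invertible transformations. The quadratic map and Gaussian prior below are hypothetical choices, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Non-invertible toy map (hypothetical choice): X = Phi(theta) = theta**2,
# theta ~ N(0, 1), so Omega = {sqrt(x), -sqrt(x)} for each x > 0.
theta = rng.standard_normal(500_000)
x = theta**2

def prior_pdf(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

def branch_sum_pdf(x):
    # Piecewise change of variables: sum over the preimages of x, each term
    # divided by |Phi'(theta_i)| = 2*sqrt(x) -- NOT the plain integral of
    # the prior over Omega.
    r = np.sqrt(x)
    return (prior_pdf(r) + prior_pdf(-r)) / (2.0 * r)

# Compare the empirical density of the simulated X against the formula.
for g in [0.5, 1.0, 2.0]:
    h = 0.05
    empirical = np.mean((x > g - h) & (x < g + h)) / (2 * h)
    print(g, empirical, branch_sum_pdf(g))
```

The histogram of the simulated $X$ agrees with the branch-sum formula, which already suggests that a Jacobian factor enters rather than the plain integral of the prior over $\Omega$, but I do not see how to reconcile this with my derivation above.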
What happens to the integral when the model is invertible, so that there is a unique solution $\theta = \Phi^{-1}(X)$ for a given $X$? Is it then possible to argue that the ratio $P(\theta)/\int_\Omega P(\theta')\, d\theta' \rightarrow 1$?
It seems like the correct expression for $P(\theta|D)$ in the invertible case should be
$P(\theta|D) = P(\Phi(\theta)|D)\, |\det J_\Phi(\theta)|$ for the Jacobian $J_\Phi = [\partial\Phi/\partial\theta]$, based on the change-of-variables rule. But this does not seem to be consistent with the integral expression \eqref{eq1}.
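As a sanity check on the change-of-variables rule itself (again with a hypothetical cubic model, chosen only for illustration), I verified numerically that if $X \sim P(X|D)$, then $\theta = \Phi^{-1}(X)$ has density $P(\Phi(\theta)|D)\,|\det J_\Phi(\theta)|$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical invertible model Phi(theta) = theta**3, with the "data"
# density P(X|D) = N(2, 0.5**2).  Sampling X ~ P(X|D) and mapping back
# through theta = Phi^{-1}(X) = cbrt(X) should yield samples whose density
# is P(Phi(theta)|D) * |dPhi/dtheta|.
def data_pdf(x):
    return np.exp(-0.5 * ((x - 2.0) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))

def pulled_back_pdf(theta):
    return data_pdf(theta**3) * np.abs(3.0 * theta**2)  # |det J_Phi(theta)|

x = rng.normal(2.0, 0.5, size=500_000)
theta = np.cbrt(x)

# Compare the empirical density of theta against the formula at a few points.
for t in [1.1, 1.26, 1.4]:
    h = 0.01
    empirical = np.mean((theta > t - h) & (theta < t + h)) / (2 * h)
    print(t, empirical, pulled_back_pdf(t))
```

The empirical density of $\theta$ matches the pulled-back formula at the test points, so the change-of-variables rule itself is not where my confusion lies.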
Ultimately I am trying to find the correct expression for $P(\theta|D)$ in both the invertible and the non-invertible case. Any help in better understanding this is greatly appreciated!