Assume a non-linear relation $\mathbf{Y} = f(\mathbf{X})$ between the random variables $\mathbf{Y}\sim p_Y$, taking values $\mathbf{y} \in \mathbb{R}^M$, and $\mathbf{X}\sim p_X$, taking values $\mathbf{x} \in \mathbb{R}^N$, with $M\leq N$. My question is about the "inverse" problem described below.
Direct problem - If we know the PDF $p_X$, then the PDF $p_Y$ is formally given by
$$p_Y(\mathbf{y}) = \int \delta^M(f(\mathbf{x})-\mathbf{y} ) \, p_X(\mathbf{x}) \, d^Nx$$
In general, this expression cannot be handled with analytical techniques. However, we can sample some values $\mathbf{x}_i\sim p_X$: the scatter of the $f(\mathbf{x}_i)$ values already allows us to probe $p_Y$.
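For example, a minimal Monte Carlo sketch of this "sampling" strategy (the Gaussian $p_X$ and the map $f$ below are purely illustrative choices, not part of the question):

```python
import numpy as np

# Illustrative assumptions: p_X = standard normal on R^2 (N = 2),
# f maps R^2 -> R (M = 1). Neither is prescribed by the question.
def f(x):
    return np.sum(x**2, axis=-1)

rng = np.random.default_rng(0)
x_samples = rng.standard_normal((100_000, 2))   # x_i ~ p_X
y_samples = f(x_samples)                        # the f(x_i) probe p_Y

# A histogram (or KDE) of the f(x_i) is a Monte Carlo estimate of p_Y.
hist, edges = np.histogram(y_samples, bins=100, density=True)
```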
Inverse problem - The PDF $p_Y$ and the map $f$ are given. We want to estimate $p_X$. Unfortunately, the formal expression
$$ p_X(\mathbf{x}) = \int \delta^N(f^{-1}(\mathbf{y})-\mathbf{x} ) \, p_Y(\mathbf{y}) \, d^My $$
is useless: contrary to the previous case, it does not allow us to come up with a practical strategy (i.e. a "sampling" strategy like the one above). The expression does not even make sense since $M\leq N$, and we do not know the (potentially multivalued) map $f^{-1}$. Only if $f$ is a bijective, differentiable function may we use the change-of-variables formula but, again, we have the practical problem that $f^{-1}$ is not analytically known (see e.g. this, this, this and this question).
Does this inverse problem have a "name"? Of course, it is not always a well-posed problem: e.g. if $f(\mathbf{x})=\mathbf{y}_0$, where $\mathbf{y}_0$ is a constant vector, then $p_Y(\mathbf{y})= \delta^M(\mathbf{y}-\mathbf{y}_0)$ regardless of $p_X$, so knowing $f$ and $p_Y$ tells us nothing about $p_X$. However, in most cases, the knowledge of $f$ and $p_Y$ should allow us to "know something" about $p_X$.
Is there any practical strategy/approach to tackle it? Maybe a Bayesian inference approach, where samples $\mathbf{y}_i$ distributed according to the known $p_Y$ are treated as the "data" and we infer $p_X$? Or maybe a maximum entropy approach, where we maximize our ignorance about $p_X$ while accounting for the constraints coming from knowledge of $p_Y$ and $f$?
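To make the first suggestion concrete, here is a hedged sketch (not a general recipe): restrict $p_X$ to a parametric family, push simulated samples through $f$, and tune the parameters so that the simulated $f(\mathbf{x}_i)$ match draws from the known $p_Y$. All specifics below (the map $f$, the Gaussian family, the chi-square stand-in for $p_Y$, the sorted-sample discrepancy) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def f(x):                                   # illustrative map, M = 1, N = 2
    return np.sum(x**2, axis=-1)

# Stand-in for the known p_Y: chi-square with 2 dof, i.e. the exact
# push forward of a standard normal p_X under this particular f.
y_data = np.sort(rng.chisquare(df=2, size=5000))

# Candidate family for p_X: iid N(mu, sigma^2) components. Common random
# numbers (fixed base draws) keep the objective deterministic.
base = rng.standard_normal((5000, 2))

def discrepancy(theta):
    mu, log_sigma = theta
    y_sim = np.sort(f(mu + np.exp(log_sigma) * base))
    return np.mean(np.abs(y_sim - y_data))  # 1-D sorted-sample (Wasserstein-like) distance

result = minimize(discrepancy, x0=[0.5, 0.5], method="Nelder-Mead")
print(result.x)   # expected to be near (mu, log_sigma) = (0, 0) up to Monte Carlo error
```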
Reference: A few days after posting the question I found this interesting reference that is very promising: D. Sanz-Alonso et al. Inverse Problems and Data Assimilation, available on arXiv.
In Measure Theory (advanced probability theory), your direct problem is known as the push forward and the inverse could probably be considered the pullback.
As highlighted here (https://mathoverflow.net/questions/122704/pullback-measures), “To define pullbacks of measures we need some additional data, because otherwise one would be able to obtain a canonical measure on an arbitrary measurable space M by pulling back the canonical measure on the point along the unique map M→pt.”
In other words, you need some kind of way of highlighting the relative importance of various points within $f^{-1}(y)$ for any $y$.
In probability theory terms, if we have some prior distribution $p_X$ and some known final distribution $p_Y$, then we want a distribution $q_X$ that pushes forward to $p_Y$ and that, for any given value of $f(x)$, has the same relative density within the fiber $f^{-1}(f(x))$ as $p_X$. The simplest way to do this is by rescaling $p_X$. We can define a weight function $w(y)$ as the ratio of $p_Y$ to the push forward of $p_X$ (technically the Radon-Nikodym derivative), so that: $$w(y) \int \delta^M(f(x)-y)\, p_X(x) \, d^Nx = p_Y(y)$$ Then, we can just rescale $p_X$: $$q_X(x)= p_X(x)\, w(f(x))$$
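A numerical sketch of this reweighting at the density level (everything concrete here is assumed for illustration: the Gaussian prior $p_X$, the map $f$, and the chi-square target $p_Y$; the push forward of $p_X$ is estimated with a KDE since it is rarely available in closed form):

```python
import numpy as np
from scipy.stats import gaussian_kde, chi2

rng = np.random.default_rng(2)

def f(x):                                    # illustrative map, M = 1, N = 2
    return np.sum(x**2, axis=-1)

# Prior p_X (illustrative): standard normal on R^2; its push forward under f is chi2(2).
x = rng.standard_normal((10_000, 2))
y = f(x)

push_pdf = gaussian_kde(y)                   # KDE estimate of the push forward of p_X
p_Y = chi2(df=3).pdf                         # known target p_Y (illustrative): chi2(3)

# w(y) = p_Y(y) / (push forward of p_X)(y); then q_X(x) ∝ p_X(x) w(f(x)).
w = p_Y(y) / push_pdf(y)
w /= w.sum()                                 # self-normalized importance weights

# The weighted sample (x_i, w_i) represents q_X; it pushes forward to p_Y,
# so e.g. the weighted mean of f(x_i) should be close to E[chi2(3)] = 3.
print(np.sum(w * y))
```

The weighted sample is only informative where the push forward of $p_X$ already puts mass; where it does not, the weights blow up and the estimate degrades, which is one concrete face of the ill-posedness mentioned in the question.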
Using capital letters for the corresponding measures, and writing the ratio as a Radon-Nikodym derivative, this is fairly natural in measure-theoretic notation: $$Q_X(A)=\int_A \frac{d P_Y}{d(P_X \circ f^{-1})}(f(x))\, dP_X(x)$$
This is nice because $\frac{dQ_X}{dP_X}(x)=\frac{d P_Y}{d(P_X \circ f^{-1})}(f(x))$, i.e. the relative density of $Q_X$ with respect to $P_X$ is fixed whenever $f(x)$ is fixed. Also, changing variables and cancelling terms gives $Q_X(f^{-1}(B))=P_Y(B)$, so $Q_X$ has the correct push forward distribution on $Y$.
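Spelling out that last step via the change-of-variables formula for push forward measures: $$Q_X(f^{-1}(B))=\int_{f^{-1}(B)} \frac{d P_Y}{d(P_X \circ f^{-1})}(f(x))\, dP_X(x) = \int_{B} \frac{d P_Y}{d(P_X \circ f^{-1})}(y)\, d(P_X \circ f^{-1})(y) = P_Y(B).$$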