I know that Artificial Neural Networks are universal function approximators. In that sense, they could in principle be used to learn any probability density function we give them.
But I was wondering whether there exists a model that restricts the approximation of arbitrary functions to the approximation of arbitrary continuous probability distributions. Such a model would need to produce a valid probability distribution by design. This seems important for learning complex, high-dimensional probability distributions from data.
It seems to me that Gaussian mixture models might be universal distribution approximators in the limit of an infinite number of components. Is there a more powerful model, one that benefits from a principle similar to the one that makes the multi-layer approach of ANNs so capable at modelling very complex data with few parameters?
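To make the mixture intuition concrete, here is a toy sketch (with hand-picked, hypothetical parameters, not fitted ones) showing that a convex combination of Gaussian components is itself a valid, possibly multimodal density:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gmm_pdf(x, weights, mus, sigmas):
    """Mixture density: a convex combination of valid pdfs is again a valid pdf."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

# Two components; the weights sum to 1, so the mixture integrates to 1.
weights, mus, sigmas = [0.3, 0.7], [-2.0, 1.5], [0.5, 1.0]
xs = np.linspace(-10.0, 10.0, 200001)
p = gmm_pdf(xs, weights, mus, sigmas)

dx = xs[1] - xs[0]
print((p * dx).sum())  # ≈ 1.0 (numerical check of normalization)
```

Each new component adds parameters additively, though, which is exactly why I wonder whether a compositional (multi-layer) construction could be more parameter-efficient.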
Or, are there operators on probability density functions, besides addition, that transform a pdf into another valid pdf and that can be composed to create arbitrarily complex probability distributions? I also don't know what such operators would be called, if they exist.
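To illustrate what I mean by such an operator: the one example I can write down myself is the change-of-variables formula, which pushes a base density through a smooth invertible map and yields another valid pdf, and such maps compose. A toy sketch, assuming a standard normal base density and the hand-picked map g(z) = exp(z) (so the result should be the lognormal density); the function names here are my own:

```python
import numpy as np

def normal_pdf(z):
    """Standard normal base density."""
    return np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)

def pushforward_pdf(x, inverse, inverse_deriv, base_pdf=normal_pdf):
    """Change of variables: if x = g(z) with g smooth and invertible, then
    p_x(x) = p_z(g^{-1}(x)) * |d g^{-1}/dx| is again a valid pdf."""
    return base_pdf(inverse(x)) * np.abs(inverse_deriv(x))

# g(z) = exp(z), so g^{-1}(x) = log(x) and d g^{-1}/dx = 1/x.
xs = np.linspace(1e-6, 50.0, 500001)
p = pushforward_pdf(xs, np.log, lambda x: 1.0 / x)

dx = xs[1] - xs[0]
print((p * dx).sum())  # ≈ 1.0: the pushforward is still normalized
```

Stacking several such invertible maps would compose their Jacobian factors, which is the kind of layered construction I am asking about.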
I am preferably looking for differentiable models and operators, as I imagine potential machine learning applications for this.
For context, I was experimenting with a component-wise non-decreasing neural network squashed by a sigmoid to model the cumulative distribution function, which can be differentiated once to obtain a valid pdf by construction. Unfortunately, the result seems to be a very unstable model: I know it is capable of being multimodal, but it falls back to what looks like a badly fitted Gaussian shape nearly every time. Furthermore, even scaling it to two dimensions seems to be too much for this kind of model. I think this is because the multidimensional cumulative distribution function is already hard to learn by itself in this setup, so learning densities this way is nearly impossible.
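For reference, a minimal one-dimensional sketch of that construction, with hand-set (hypothetical) non-negative weights rather than trained ones. The positive linear term a*x is an extra assumption I add in this sketch so the sigmoid's limits are exactly 0 and 1; the pdf is the analytic derivative of the CDF and is non-negative by construction:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Monotone "CDF network": F(x) = sigmoid(a*x + sum_j v_j * tanh(w_j*x + b_j))
# with a > 0 and v_j, w_j >= 0, so F is non-decreasing with limits 0 and 1.
a = 0.1
v = np.array([3.0, 3.0])   # output weights, constrained >= 0
w = np.array([2.0, 2.0])   # input weights, constrained >= 0
b = np.array([4.0, -3.0])  # biases place the two transition regions apart

def cdf(x):
    x = np.asarray(x, dtype=float)
    return sigmoid(a * x + np.tanh(np.multiply.outer(x, w) + b) @ v)

def pdf(x):
    """Analytic derivative of cdf(x): a valid density by construction."""
    x = np.asarray(x, dtype=float)
    t = np.tanh(np.multiply.outer(x, w) + b)
    s = sigmoid(a * x + t @ v)
    return s * (1.0 - s) * (a + (1.0 - t ** 2) @ (v * w))

xs = np.linspace(-60.0, 60.0, 600001)
p = pdf(xs)
dx = xs[1] - xs[0]
print((p * dx).sum())  # ≈ 1.0, and this particular pdf is bimodal
```

With these hand-set weights the density really is bimodal, so the representation can express multimodality; the instability I describe only shows up when the weights are learned from data rather than fixed.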