I know this question is quite vague, but I need some indication. I have a problem where I have a probability distribution $\mu$ on $\mathbb R^d$ and I want to find a differentiable function $f:\mathbb R^d\to\mathbb R^d$ minimizing $$F(f):=\int\frac12\|f\|^2+\nabla\cdot f\:{\rm d}\mu$$ (the divergence is applied componentwise). I only need to know $f$ on a finite grid $G\subseteq\mathbb R^d$ of equidistant points. It can be assumed that $\mu$ is supported on the bounding box of this grid. I cannot evaluate $\mu$ itself, since it is unknown to me; I only have i.i.d. samples from it.
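To make my setting concrete: with i.i.d. samples $X_1,\dots,X_n\sim\mu$, I assume the only computable quantity is the empirical version of $F$,
$$F_n(f):=\frac1n\sum_{i=1}^n\Big(\frac12\|f(X_i)\|^2+(\nabla\cdot f)(X_i)\Big),$$
which is an unbiased Monte Carlo estimator of $F(f)$ for each fixed $f$.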
How is such a problem usually solved? I guess it will be important to fix a function space (a Banach space) on which $F$ is formally defined before carrying out the minimization.
Gradient descent cannot be applied here directly (since the integral against $\mu$ is unavailable), but maybe "stochastic" gradient descent can? I've actually never touched that topic before (though I'm completely familiar with MCMC and stochastic analysis, so it should be easy for me to pick up). However, I only find articles which seem to be written for practitioners, and for some reason they are hard to follow for someone who is used to working in a mathematically rigorous way.
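To illustrate what I have in mind, here is a minimal stochastic-gradient sketch under two simplifying assumptions of mine: $f$ is restricted to the linear ansatz $f(x)=Ax+b$ (so $\nabla\cdot f=\operatorname{tr}A$ is constant), and the unknown $\mu$ is replaced by a standard Gaussian purely to have test samples. The minibatch objective is then $\frac1m\sum_i\frac12\|AX_i+b\|^2+\operatorname{tr}A$, whose gradients are computed in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 20000
# Stand-in for the unknown mu: i.i.d. samples from N(0, I) (assumption for testing)
samples = rng.standard_normal((n, d))

# Linear ansatz f(x) = A x + b, for which div f = tr(A) is constant in x
A = np.zeros((d, d))
b = np.zeros(d)

lr, batch = 0.05, 256
for step in range(2000):
    idx = rng.integers(0, n, size=batch)
    X = samples[idx]                       # minibatch, shape (batch, d)
    R = X @ A.T + b                        # f(X_i) for the batch, shape (batch, d)
    # Gradients of (1/m) sum_i 1/2 ||A X_i + b||^2 + tr(A):
    grad_A = R.T @ X / batch + np.eye(d)   # (1/m) sum_i f(X_i) X_i^T + I
    grad_b = R.mean(axis=0)                # (1/m) sum_i f(X_i)
    A -= lr * grad_A
    b -= lr * grad_b

print(A, b)
```

For $\mu=N(0,I)$ the population objective reduces to $\frac12\|A\|_F^2+\frac12\|b\|^2+\operatorname{tr}A$, so the iterates should approach $A=-I$, $b=0$; this is only a sanity check of the stochastic-gradient mechanics, not of the general function-space formulation I am asking about.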
Any advice is highly appreciated!