Assume I want to find the minimum of some function $f(x)$, but I can only compute $$f(x) + \varepsilon,$$ where $\varepsilon$ is a random variable with mean 0. With enough computational power I can shrink the variance of $\varepsilon$, but that is costly.
Standard gradient-descent algorithms do not work, because numeric finite-difference approximations of the gradient become very inaccurate (and unstable) in the presence of noise.
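A quick numerical illustration of why finite differences break down (the objective `f` here is a hypothetical stand-in, chosen only for demonstration): the noise term gets divided by the small step $2h$, so the standard deviation of the gradient estimate blows up as $h$ shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """True (unknown) objective: smooth, minimum at x = 0."""
    return x ** 2

def f_noisy(x, sigma):
    """What we can actually evaluate: f(x) plus zero-mean noise."""
    return f(x) + rng.normal(0.0, sigma)

def central_diff(func, x, h):
    """Central finite-difference approximation of the derivative at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

x, h = 1.0, 1e-4
exact = central_diff(f, x, h)                           # noise-free: ~2.0
noisy = central_diff(lambda t: f_noisy(t, 0.1), x, h)   # noise amplified by 1/(2h)

# The noise contributes an error of order sigma / (2h) = 0.1 / 2e-4 = 500,
# which swamps the true gradient of 2.
```

Shrinking $h$ reduces the truncation error but amplifies the noise error, so no step size rescues the estimate unless the variance of $\varepsilon$ is also driven down.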
My parameter space is fairly small, so grid search is an option, but I assume there are better ways. What are standard approaches to such problems?
Background: in each function evaluation I have to simulate several repetitions of a random process. I am using this for a simulated-method-of-moments estimation.
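Averaging repetitions is exactly the cost/variance trade-off mentioned above: with $n$ i.i.d. repetitions, the standard deviation of $\varepsilon$ shrinks like $1/\sqrt{n}$. A toy sketch (the simulator here is a hypothetical stand-in, not the actual moments simulation):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_once(x):
    """One repetition of the random process (hypothetical stand-in)."""
    return x ** 2 + rng.normal(0.0, 1.0)

def f_hat(x, n_reps):
    """Monte-Carlo estimate of f(x); error std scales like 1/sqrt(n_reps)."""
    return np.mean([simulate_once(x) for _ in range(n_reps)])

# Compare the typical estimation error with 1 vs. 100 repetitions
# (true value at x = 2.0 is 4.0); the error should drop ~10x.
errs_small = [abs(f_hat(2.0, 1) - 4.0) for _ in range(200)]
errs_large = [abs(f_hat(2.0, 100) - 4.0) for _ in range(200)]
```

Quadrupling the repetitions only halves the noise, which is why simply brute-forcing the variance down gets expensive fast.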
For those coming here via a search:
I found two workable solutions.
One is model-based (Bayesian) optimization (in R, use mlrMBO). But it wasn't working very well for me: it was slow and hard to adapt to my application.
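mlrMBO implements model-based optimization: fit a surrogate model to the noisy evaluations collected so far, then pick the next point by an acquisition rule that balances exploration and exploitation. A minimal self-contained Python sketch of the same idea, with a hypothetical noisy objective, a Gaussian-process surrogate, and a lower-confidence-bound acquisition over a small grid (illustrative only, not mlrMBO's actual algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

def f_noisy(x):
    """Noisy objective (hypothetical): mean is minimized at x = 0.3."""
    return (x - 0.3) ** 2 + rng.normal(0.0, 0.05)

def rbf(a, b, length=0.2):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(X, y, Xs, noise=0.05 ** 2):
    """GP posterior mean/std at test points Xs given noisy data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

grid = np.linspace(0.0, 1.0, 101)       # small parameter space: a grid is fine
X = list(rng.uniform(0.0, 1.0, 3))      # a few random initial evaluations
y = [f_noisy(x) for x in X]

for _ in range(20):                     # model-based optimization loop
    mu, sd = gp_posterior(np.array(X), np.array(y), grid)
    # Lower confidence bound: prefer points with a low predicted mean
    # or high model uncertainty.
    x_next = grid[np.argmin(mu - sd)]
    X.append(x_next)
    y.append(f_noisy(x_next))

mu, _ = gp_posterior(np.array(X), np.array(y), grid)
x_best = grid[np.argmin(mu)]            # minimizer of the posterior mean
```

The key point for noisy objectives is to report the minimizer of the surrogate's posterior *mean* rather than the best single observed value, since the latter is biased toward lucky noise draws.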