Hopefully this question is on topic for the mathematics community (rather than the statistics one), since the optimization method I am using relies on a statistical model. I am building an algorithm for constrained optimization, i.e., a problem like
$$\min_x f(x) \quad \text{such that} \quad c(x)\leq 0,$$ where $f(x)$ is scalar valued. I don't have any gradient information about $f(\cdot)$ or $c(\cdot)$ (so this is the derivative-free optimization case), and in fact I don't even know the analytical forms of $f(\cdot)$ or $c(\cdot)$; I only have some inputs $x$ and their corresponding outputs $f(x)$ and $c(x)$. These kinds of problems arise in the field of computer modeling and are known as black-box optimization problems. So you could think of this as a regression problem: I know some inputs and outputs, I fit a response surface for each of $f(\cdot)$ and $c(\cdot)$, and I try to minimize the true function based on the predictive response surfaces I have built (hopefully I am not omitting too many details).
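To make the setup concrete, here is a rough sketch of the surrogate-fitting step (not my actual model; the black-box values and the choice of a Gaussian-process surrogate via scikit-learn are just hypothetical stand-ins):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical black-box data: sampled inputs X with outputs f(X) and c(X).
# In practice these would come from running the expensive computer model.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(20, 2))   # 20 sampled inputs in R^2
f_vals = np.sum(X**2, axis=1)              # stand-in objective values
c_vals = 1.0 - X[:, 0]                     # stand-in constraint values

# Fit one response surface (surrogate) per black-box function.
surrogate_f = GaussianProcessRegressor().fit(X, f_vals)
surrogate_c = GaussianProcessRegressor().fit(X, c_vals)

# The surrogates predict (with uncertainty) at unsampled points.
x_new = np.array([[0.5, -0.5]])
f_mean, f_std = surrogate_f.predict(x_new, return_std=True)
c_mean, c_std = surrogate_c.predict(x_new, return_std=True)
print(f_mean, f_std, c_mean, c_std)
```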
Basically, my procedure follows this paper: Filter Methods, except that, as I said, I am in the derivative-free optimization setting and I assume I don't know the exact forms of $f$ and $c$. I could of course follow what the paper describes, but I am not using a trust region to solve the subproblem of picking new points; instead, I am using a statistical model that picks the "next best point" based on a predictive criterion.
So it's an iterative procedure where I gather my data (the inputs and outputs) over time, and so far the procedure works empirically. However, I now want to prove theoretically that my algorithm converges to a (possibly local) solution, and that is where I am stuck. I don't have much intuition about how to prove this, so any suggestions or citations would be very much appreciated.
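For reference, the overall loop looks roughly like the sketch below. The acquisition criterion shown (a lower confidence bound on $f$ plus a penalty on predicted constraint violation) is only a hypothetical stand-in for my actual predictive criterion, and the black-box functions are placeholders:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def black_box_f(x):   # placeholder for the true, expensive objective
    return float(np.sum(x**2))

def black_box_c(x):   # placeholder for the true constraint; c(x) <= 0 is feasible
    return float(1.0 - x[0])

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(10, 2))            # initial design
f_vals = np.array([black_box_f(x) for x in X])
c_vals = np.array([black_box_c(x) for x in X])

for iteration in range(20):
    # Refit the response surfaces to all data gathered so far.
    gp_f = GaussianProcessRegressor().fit(X, f_vals)
    gp_c = GaussianProcessRegressor().fit(X, c_vals)

    # Candidate set over the bounded input space; a real implementation
    # would optimize the criterion rather than sample it.
    candidates = rng.uniform(-2.0, 2.0, size=(500, 2))
    mu_f, sd_f = gp_f.predict(candidates, return_std=True)
    mu_c, _ = gp_c.predict(candidates, return_std=True)

    # "Next best point": lower confidence bound on f, heavily penalizing
    # points the constraint surrogate predicts to be infeasible.
    criterion = (mu_f - 2.0 * sd_f) + 1e3 * np.maximum(mu_c, 0.0)
    x_next = candidates[np.argmin(criterion)]

    # Evaluate the true black boxes at x_next and grow the data set.
    X = np.vstack([X, x_next])
    f_vals = np.append(f_vals, black_box_f(x_next))
    c_vals = np.append(c_vals, black_box_c(x_next))

feasible = c_vals <= 0
if feasible.any():
    print("best feasible objective found:", f_vals[feasible].min())
```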
Here are some of my assumptions so far:
The input space $\mathcal{X}\subset\mathbb{R}^d$ is a known, bounded, convex region such that $f:\mathcal{X}\rightarrow\mathbb{R}$ denotes a scalar-valued objective function and $c:\mathcal{X}\rightarrow\mathbb{R}^m$ denotes a vector of constraint functions.
Other than that I have no other assumptions, although I am willing to assume that $f$ and $c$ are lower bounded, continuous, and twice differentiable (so basically nice smooth functions).
The following is what I would do in such a case.
If you know $f(x)$ for a certain set of $x\in{\bf X}$, then you can construct an interpolation polynomial for which the error can be estimated. Similarly for $c(x)$.
Denote by $P^f_n(x)$ a polynomial such that $\|P^f_n - f\|\to 0$ as $n\to\infty$, and similarly by $P^c_n(x)$ a polynomial such that $\|P^c_n - c\|\to 0$.
Now approximate $$f^*=\min_{x:\,c(x)\leq 0} f(x)$$ using $$P_n^*=\min_{x:\,P^c_n(x)\leq 0} P^f_n(x).$$
Analyze the error $|f^*-P_n^*|$ and hope to show that it $\to0$ as $n\to\infty$.
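As a quick numerical illustration of this scheme in one dimension (with hypothetical smooth stand-ins for $f$ and $c$, and Chebyshev interpolation as one concrete choice of $P_n$):

```python
import numpy as np

# Smooth stand-ins for the unknown f and c on [-1, 1].
f = lambda x: np.exp(x) * np.sin(3 * x)
c = lambda x: 0.25 - x**2                      # feasible region: |x| >= 1/2

grid = np.linspace(-1.0, 1.0, 100001)          # fine grid for global minima
f_star = f(grid[c(grid) <= 0]).min()           # reference value of f*

for n in (3, 6, 9, 12, 15):
    # Interpolate f and c at n+1 Chebyshev nodes (a well-conditioned choice).
    nodes = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))
    Pf = np.polyfit(nodes, f(nodes), n)        # coefficients of P_n^f
    Pc = np.polyfit(nodes, c(nodes), n)        # coefficients of P_n^c

    # P_n^* = min P_n^f over {x : P_n^c(x) <= 0}, again on the grid.
    P_star = np.polyval(Pf, grid[np.polyval(Pc, grid) <= 0]).min()
    print(n, abs(f_star - P_star))             # |f* - P_n^*| should -> 0
```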
In order to make such an analysis you have to put some assumptions on your functions $f(x)$ and $c(x)$: for example, some level of smoothness, membership in some function space, existence of an expansion in some basis, etc. If you succeed in proving something under such assumptions, that does not mean the method fails without them; it is simply the only case you can prove so far. Later you can try to prove the result for another, hopefully larger, class of functions.
Another option is to work with finite differences instead of derivatives, which is in some sense similar to the approach described above, but may be slightly less general.
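For instance, a central-difference gradient estimate (a standard construction, sketched here on a known test function) could stand in for the missing derivatives:

```python
import numpy as np

def central_diff_grad(fun, x, h=1e-5):
    """Central-difference estimate of the gradient of a scalar black box."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (fun(x + e) - fun(x - e)) / (2 * h)   # O(h^2) error
    return grad

# Example on a known function: the gradient of sum(x^2) at (1, 2) is (2, 4).
print(central_diff_grad(lambda x: np.sum(x**2), [1.0, 2.0]))
```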