First of all, apologies for the title; I will change it when I can think of something better.
Here is my problem: I have a "black-box" function. I give it an input and it gives me an output, but I have no idea how it works. Presumably it is immensely complicated.
I have already obtained a huge dataset (or training set, if you are ML oriented). This dataset consists of $(\vec{x}_i, y_i)$ pairs, where $\vec{x}_i\in\mathbb{R}^n$ is the input and $y_i\in\mathbb{R}$ is the output.
I have already found the input $\vec{x}_i$ that gives me the global minimum.
Now I know the minimum, so problem solved, right? Well, not quite. I want to do something different: I want to "create a map" of the areas of the search space (the input space) that are most likely to give good results. I'm looking for areas, not points.
I was thinking of dividing the space into regions and seeing what happens in each, but I have no concrete idea how to proceed.
Any ideas of how to approach the problem?
Let me rephrase: given the dataset, I want to understand which areas are good and which are bad!
Without making some kind of assumption about the function, you can't do any better than testing a load of random points and reporting their results. (The same holds for finding the optimum itself; see the No Free Lunch theorem.)
In practice, almost all real-world problems allow some weak assumptions to be made about the shape of the function, usually smoothness. If you accept that, you can fit your favourite simple parametric function to the points you have already evaluated, or use a Gaussian Process (GP) to model the surface. A GP lets you encode a fairly weak smoothness assumption and then gives you an estimate of your belief about the function values everywhere in the space. It will also suggest good places to test next in order to reduce the uncertainty at interesting and uncertain locations.
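A minimal sketch of that idea (my own illustration, not from the post): fit a GP to the $(\vec{x}_i, y_i)$ pairs, evaluate the posterior mean and standard deviation on a grid, and flag the low-mean cells as the "good" areas of the map. The toy `black_box` function, the grid resolution, and the 10% threshold are all assumptions chosen for demonstration; I use scikit-learn's `GaussianProcessRegressor` here, but GPy (linked below) works along the same lines.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Stand-in for the real black box (n = 2 here so the "map" is easy to picture).
def black_box(X):
    return np.sum((X - 0.3) ** 2, axis=1)

# The dataset of (x_i, y_i) pairs the question already has.
X_train = rng.uniform(0.0, 1.0, size=(200, 2))
y_train = black_box(X_train)

# Weak smoothness assumption: RBF kernel plus a small noise term.
gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-5),
    normalize_y=True,
)
gp.fit(X_train, y_train)

# Posterior over a grid: low mean = promising area (for minimisation),
# high std = uncertain area worth sampling next.
axes = np.linspace(0.0, 1.0, 50)
grid = np.stack(np.meshgrid(axes, axes), axis=-1).reshape(-1, 2)
mean, std = gp.predict(grid, return_std=True)

# The "map": flag, say, the best 10% of the space by predicted value.
threshold = np.quantile(mean, 0.1)
good_mask = mean <= threshold
```

`good_mask` is then a region-level answer rather than a single point: contiguous runs of flagged grid cells are the areas most likely to give good results, and `std` tells you where the map is still unreliable.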
If you happen to know anything else about your function, think about how to include that in your assumptions too. For example, if you know it extracts the 3rd binary digit of each $\vec{x}$ and uses it in some non-linear way, you will get much better results by modelling that explicitly, perhaps as an add-on to the GP model, than by using a basic GP alone.
For a nice GP Python tool, try https://github.com/SheffieldML/GPy .