I have an unknown function $f:\mathbb{R}^2\rightarrow \mathbb{R}^2$ for which I'm determining a first order Taylor approximation through a non-linear optimization process in six variables (the coefficients of the Taylor approximation).
To do this I am maximising a similarity function $g:\mathbb{R}^6\rightarrow \mathbb{R}$ such that $0\le g(\bar{x}) \le 1$, where $\bar{x}$ is the vector of Taylor coefficients that I'm searching for.
I have no problems with the optimization itself, but I would like to inspect the shape of $g$ to learn more about the properties of the function.
More specifically, I would like to determine quantitatively how "close" I need to be to the global maximiser $\bar{x}^*$ for a descent-type algorithm to converge to the global maximiser rather than to a local maximum. Or conversely: how frequent are local maxima around the global maximum?
So my question is: does anyone know of a good way to visualize the shape of $g$? For a function $\mathbb{R}^2 \rightarrow \mathbb{R}$ I would just show the function value as height on a surface over the parameters, but this won't work for $g$, as I can't really see in 6 dimensions. :/
Visualizing a function of one variable in a graph is straightforward. For more variables we could use a series of graphs, colours or animations, but this quickly gets confusing, and since space, time and colour are perceived differently, it matters which variable is assigned to which visual channel. For example, we can easily distinguish the wavy $\sin(x)$ from the triangular $\arcsin(\sin(x))$ when we see them in an x-y graph, but not when they are shown as a colour gradient or as movement.
The other problem is computational. For a function $f:\mathbb{R} \rightarrow \mathbb{R}$, if we want to compare the value at the origin with its nearest neighbours, we need to compute $f(-1), f(0), f(1)$. A function $f:\mathbb{R}^2 \rightarrow \mathbb{R}$ requires 9 points $$\begin{matrix}(-1,1) & (0,1) & (1,1)\\ (-1,0) & (0,0) & (1,0)\\ (-1,-1) & (0,-1) & (1,-1)\end{matrix}$$ and for a 6-dimensional function we need to compute $3^6=729$ points. Even then, this gives us only a little information about the gradient and tells us nothing about the local extrema or about the behaviour for small and large arguments.
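To make the growth concrete, here is a minimal Python sketch that enumerates the origin together with its nearest grid neighbours, i.e. all points with coordinates in $\{-1, 0, 1\}$. The helper name `neighbour_grid` is made up for this illustration.

```python
from itertools import product

def neighbour_grid(dim):
    """Return all points whose coordinates lie in {-1, 0, 1}.

    This is the origin plus its nearest grid neighbours; there are 3**dim of them.
    """
    return list(product((-1, 0, 1), repeat=dim))

# Number of function evaluations needed just to look at the immediate neighbourhood.
for dim in (1, 2, 6):
    print(dim, len(neighbour_grid(dim)))
```

For `dim = 6` the list already has 729 entries, and every one of them costs a function evaluation.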
Of course we could make a series of plots by uniformly sampling the entire space at $N$ coordinates for each of the $D$ dimensions, but while this would give us a good idea of the function, we would need to compute $N^D$ points, which quickly gets out of hand. Instead we can look at the behaviour of the function around one specific point $\vec x_0$ by evaluating the function along lines through $\vec x_0$ parallel to the unit vectors $\vec e_n$: $$f_n(t) = f(\vec x_0 + t\vec e_n)$$
Here is an example of a $\mathbb{R}^4 \rightarrow \mathbb{R}$ function evaluated along the 4 main axes and all 12 combinations of 2 different axes
There are 16 plots with 100 points each. A uniform sampling with the same 1600 evaluations would give us a hypercube with a resolution of a mere $\sqrt[4]{1600} \approx 6$ points per axis.
From this we can already see that the function is unbounded, that there is one local minimum around the origin, that the third coefficient introduces oscillations which can be responsible for many local minima, and so on.
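The line slices described above can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: `example_f` is a made-up $\mathbb{R}^4 \rightarrow \mathbb{R}$ function (not the one from the plots), and the 12 two-axis directions are read as the diagonals $\vec e_i + \vec e_j$ and $\vec e_i - \vec e_j$ for $i < j$, which is only one possible interpretation.

```python
import math
from itertools import combinations

def line_slice(f, x0, direction, t_values):
    """Evaluate f along the line x0 + t * direction for each t in t_values."""
    return [f([x + t * d for x, d in zip(x0, direction)]) for t in t_values]

def slice_directions(dim):
    """Unit axes e_i plus the diagonals e_i + e_j and e_i - e_j for i < j.

    The choice of diagonals is an assumption about what a "combination
    of 2 different axes" means; for dim = 4 it yields 4 + 12 = 16 directions.
    """
    axes = [[1.0 if k == i else 0.0 for k in range(dim)] for i in range(dim)]
    diagonals = []
    for i, j in combinations(range(dim), 2):
        for sign in (1.0, -1.0):
            d = [0.0] * dim
            d[i], d[j] = 1.0, sign
            diagonals.append(d)
    return axes + diagonals

def example_f(x):
    """Hypothetical test function with an oscillating third coordinate."""
    return x[0] ** 2 + x[1] ** 2 + math.sin(5.0 * x[2]) + x[3] ** 2 - x[0] * x[3]

x0 = [0.0, 0.0, 0.0, 0.0]
t_values = [-2.0 + 4.0 * k / 99 for k in range(100)]  # 100 samples on [-2, 2]
directions = slice_directions(4)
curves = [line_slice(example_f, x0, d, t_values) for d in directions]

# Same evaluation budget spent on a uniform grid instead: points per axis.
total = len(curves) * len(t_values)
print(len(directions), total, round(total ** 0.25, 2))
```

Each entry of `curves` can be plotted against `t_values` as an ordinary one-dimensional graph. The cost grows linearly with the number of directions, not exponentially with the dimension.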
One can also minimize the function numerically and look at the intermediate results to see whether there are any trends or correlations. Imagine walking from the top of Mount Everest to the bottom of the Mariana Trench without ever moving upwards, and then looking at the recorded GPS route. You will probably notice that the latitude mostly stays constant and the longitude has a clear trend until about halfway through, where the behaviour switches because you need to take a detour through Japan to avoid other ocean ridges. This can give you a rough idea of whether the function has big monotone areas or many local extrema.
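As a rough illustration of recording such a route, here is a minimal Python sketch of a descent that never moves upwards and stores every intermediate point. It uses a plain finite-difference gradient with step halving, which is an assumption made for simplicity and not necessarily the optimizer you would use in practice; `example_f` and the starting point are made up.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad

def descend_and_record(f, x0, step=1.0, max_iter=200, min_step=1e-10):
    """Follow the negative gradient, accepting only steps that lower f.

    Returns the list of visited points (the "GPS route").
    """
    x, fx = list(x0), f(x0)
    path = [list(x)]
    for _ in range(max_iter):
        g = numerical_gradient(f, x)
        s = step
        while s > min_step:
            candidate = [xi - s * gi for xi, gi in zip(x, g)]
            f_candidate = f(candidate)
            if f_candidate < fx:
                break  # found a strictly downhill step
            s *= 0.5  # otherwise shorten the step and retry
        else:
            return path  # no downhill step found: stop at a local minimum
        x, fx = candidate, f_candidate
        path.append(list(x))
    return path

def example_f(x):
    """Hypothetical smooth test function with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

path = descend_and_record(example_f, [5.0, 5.0])
traces = list(zip(*path))            # one sequence per coordinate
values = [example_f(p) for p in path]  # function value along the route
print(len(path), path[-1], values[-1])
```

Plotting each entry of `traces` and `values` against the iteration number shows which coordinates drift steadily, which stay put, and where the behaviour switches.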