I want to explain a disadvantage of gradient descent: the gradient itself doesn't tell us how far we are from the local/global minimum.
Say we have two functions that intersect at a point where both have the same gradient, but the point is nearer to an optimum of one function than to an optimum of the other. After a gradient descent step of the same size, one function would approach its minimum much better than the other. An example would be a pair of functions like the ones here.
For that, I am looking for two rather simple 1) functions $f: \mathbb{R}^2 \rightarrow \mathbb{R}$ and $g: \mathbb{R}^2 \rightarrow \mathbb{R}$ that have an 2) intersection point $f(a,b) = g(a,b)$ with the 3) same gradient $\nabla f(a,b) = \nabla g(a,b)$ at that point $(a,b)$, but where the point has 4) different distances to the two optima.
I couldn't think of a way to construct such functions. Can somebody give me a hint on how to approach this problem?
Let $z=(x,y)^T$, and consider the two quadratic functions defined as follows:
$$ f(z) = a + b^Tz + \frac{1}{2}(z-z_0)^TC_f(z-z_0) $$ $$ g(z) = a + b^Tz + \frac{1}{2}(z-z_0)^TC_g(z-z_0), $$
where the parameters $a$, $b$, $C_f$ and $C_g$ are respectively of dimension $1 \times 1$, $2\times 1$, $2 \times 2$ and $2 \times 2$. The corresponding gradients are respectively
$$ \frac{\partial f}{\partial z}(z) = b + C_f(z-z_0) $$ $$ \frac{\partial g}{\partial z}(z) = b + C_g(z-z_0) $$
So we have as required, at $z=z_0$, the equalities: $$ f(z_0)=g(z_0)= a+b^Tz_0 $$ $$ \frac{\partial f}{\partial z}(z_0) = \frac{\partial g}{\partial z}(z_0) = b $$
Then, by a suitable choice of $C_f$ and $C_g$, the minimum of $f$ can be made closer to the common starting point $z_0$ than the minimum of $g$. Assuming $C_f$ and $C_g$ are positive definite (so that the stationary points are indeed minima), the minimizers of $f$ and $g$ are respectively: $$z^*_f = z_0 - C_f^{-1}b, $$ $$z^*_g = z_0 - C_g^{-1}b. $$ Take for instance $C_f^{-1} = \frac{1}{2}I_2$ and $C_g^{-1} = 100I_2$: then $\lVert z^*_f - z_0\rVert = \lVert b\rVert/2$, while $\lVert z^*_g - z_0\rVert = 100\lVert b\rVert$.
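A quick numerical sketch of the construction above, using NumPy. The specific values of $z_0$, $b$ and $a$ are arbitrary choices of mine; only $C_f^{-1} = \frac{1}{2}I_2$ and $C_g^{-1} = 100I_2$ come from the answer:

```python
import numpy as np

# Arbitrary example parameters (any z0, b, a work).
z0 = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])
a = 0.5

# C_f^{-1} = (1/2) I  =>  C_f = 2 I ;  C_g^{-1} = 100 I  =>  C_g = (1/100) I
Cf = 2.0 * np.eye(2)
Cg = 0.01 * np.eye(2)

def f(z):
    return a + b @ z + 0.5 * (z - z0) @ Cf @ (z - z0)

def g(z):
    return a + b @ z + 0.5 * (z - z0) @ Cg @ (z - z0)

def grad_f(z):
    return b + Cf @ (z - z0)

def grad_g(z):
    return b + Cg @ (z - z0)

# At z0 both the values and the gradients coincide:
assert np.isclose(f(z0), g(z0))
assert np.allclose(grad_f(z0), grad_g(z0))

# Yet the minimizers z* = z0 - C^{-1} b lie at very different distances:
zf = z0 - np.linalg.solve(Cf, b)
zg = z0 - np.linalg.solve(Cg, b)
print(np.linalg.norm(zf - z0))  # ||b||/2   ~ 1.58
print(np.linalg.norm(zg - z0))  # 100 ||b|| ~ 316.23
```

So a gradient step of the same size from $z_0$ covers half the distance to $z^*_f$ but only a tiny fraction of the distance to $z^*_g$, which is exactly the effect the question asks to illustrate.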