My textbook, Algorithms for Optimization, by Kochenderfer and Wheeler, says the following:
A point can also be at a local minimum if it has a zero derivative and the second derivative is merely nonnegative:
- $f'(x^*) = 0$, the first-order necessary condition (FONC)
- $f''(x^*) \ge 0$, the second-order necessary condition (SONC)
These conditions are referred to as necessary because all local minima obey these two rules. Unfortunately, not all points with a zero derivative and a zero second derivative are local minima, as demonstrated in figure 1.7.
The first necessary condition can be derived using the Taylor expansion about our candidate point $x^*$:
$$f(x^* + h) = f(x^*) + hf'(x^*) + O(h^2)$$
$$f(x^* - h) = f(x^*) - hf'(x^*) + O(h^2)$$
$$f(x^* + h) \ge f(x^*) \Rightarrow hf'(x^*) \ge 0$$
$$f(x^* - h) \ge f(x^*) \Rightarrow hf'(x^*) \le 0$$
$$\Rightarrow f'(x^*) = 0$$
Appendix C states the Taylor expansion about $a$ as
$$f(x) \approx f(a) + f'(a)(x - a) + \dfrac{1}{2} f''(a)(x - a)^2$$
So if we want the Taylor expansion about our candidate point $x^*$, as the textbook states, then we have
$$f(x) \approx f(x^*) + f'(x^*)(x - x^*) + \dfrac{1}{2} f''(x^*)(x - x^*)^2,$$
which is not $f(x^* + h) = f(x^*) + hf'(x^*) + O(h^2)$.
Or if we set $x = x^* + h$, we get
$$f(x^* + h) \approx f(a) + f'(a)(x^* + h - a) + \dfrac{1}{2} f''(a)(x^* + h - a)^2,$$
which is not $f(x^* + h) = f(x^*) + hf'(x^*) + O(h^2)$.
Or if we set $x = x^* + h$ and take the Taylor expansion about $h$ (rather than $x^*$ as was stated), we get
$$f(x^* + h) \approx f(h) + f'(h)(x^*) + \dfrac{1}{2} f''(h)(x^*)^2,$$
which is not $f(x^* + h) = f(x^*) + hf'(x^*) + O(h^2)$.
So I'm confused as to how the authors' result makes sense. I would greatly appreciate it if someone would take the time to clarify this.
Have you tried $x = x^* + h$, $a = x^*$?
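Spelling that out (a sketch of the substitution, using the Appendix C expansion quoted in the question):

```latex
% Substitute x = x^* + h and a = x^* into
%   f(x) \approx f(a) + f'(a)(x - a) + \tfrac{1}{2} f''(a)(x - a)^2.
% Every occurrence of (x - a) becomes (x^* + h) - x^* = h:
f(x^* + h) \approx f(x^*) + f'(x^*)\, h + \tfrac{1}{2} f''(x^*)\, h^2
% The quadratic term is O(h^2), so this is exactly the book's
f(x^* + h) = f(x^*) + h f'(x^*) + O(h^2)
% and replacing h by -h gives the second line of the derivation:
f(x^* - h) = f(x^*) - h f'(x^*) + O(h^2)
```

The key point is that the expansion is *about* $a = x^*$, while $x$ itself is the nearby point $x^* + h$; the book's $h$ plays the role of $x - a$, which is why your second attempt (with a generic $a$) and your third attempt (expanding "about $h$") don't reduce to the book's formula.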