$O(h^3)$ in the second-order approximation for $f(\mathbf{x}^*)$


I am currently studying the textbook Algorithms for Optimization by Mykel J. Kochenderfer and Tim A. Wheeler. Section 1.6.2 (Multivariate) says the following:

The following conditions are necessary for $\mathbf{x}$ to be at a local minimum of $f$:

  1. $\nabla f(\mathbf{x}) = 0$, the first-order necessary condition (FONC)

  2. $\nabla^2 f(\mathbf{x})$ is positive semidefinite (for a review of this definition, see appendix C.6), the second-order necessary condition (SONC)

The FONC and SONC are generalizations of the univariate case. The FONC tells us that the function is not changing at $\mathbf{x}$. Figure 1.8 shows examples of multivariate functions where the FONC is satisfied. The SONC tells us that $\mathbf{x}$ is in a bowl.

The FONC and SONC can be obtained from a simple analysis. In order for $\mathbf{x}^*$ to be at a local minimum, it must be smaller than those values around it:

$$f(\mathbf{x}^*) \le f(\mathbf{x}^* + h \mathbf{y}) \iff f(\mathbf{x}^* + h\mathbf{y}) - f(\mathbf{x}^*) \ge 0 \tag{1.14}$$

If we write the second-order approximation for $f(\mathbf{x}^*)$, we get:

$$ f(\mathbf{x}^* + h \mathbf{y}) = f(\mathbf{x}^*) + h \nabla f(\mathbf{x}^*)^T \mathbf{y} + \dfrac{1}{2} h^2 \mathbf{y}^T \nabla^2 f(\mathbf{x}^*)\mathbf{y} + O(h^3) \tag{1.15}$$

I'm wondering where the $O(h^3)$ term in (1.15) comes from. I cannot see why it would be there algebraically.

I would appreciate it if someone would please take the time to clarify this.
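For concreteness, here is a quick numerical sketch (my own, not from the book) that compares $f(\mathbf{x}^* + h\mathbf{y})$ against the right-hand side of (1.15) for a simple two-variable function with hand-computed gradient and Hessian. If the expansion is correct, the leftover error should shrink like $h^3$, i.e. drop by roughly a factor of $8$ each time $h$ is halved:

```python
import math

# Hypothetical test function (my choice, not from the book):
# f(x, y) = exp(x) + x*y**2, with hand-computed gradient and Hessian.
def f(x, y):
    return math.exp(x) + x * y**2

def grad(x, y):
    return (math.exp(x) + y**2, 2 * x * y)

def hess(x, y):
    # [[f_xx, f_xy], [f_yx, f_yy]]
    return ((math.exp(x), 2 * y), (2 * y, 2 * x))

def second_order(x, y, h, u, v):
    # Right-hand side of (1.15) without the O(h^3) term,
    # expanded about (x, y) in direction (u, v).
    gx, gy = grad(x, y)
    (a, b), (c, d) = hess(x, y)
    quad = a * u * u + (b + c) * u * v + d * v * v
    return f(x, y) + h * (gx * u + gy * v) + 0.5 * h * h * quad

x0, y0 = 0.3, -0.7   # expansion point, x* in the book's notation
u, v = 0.6, 0.8      # direction, y in the book's notation
for h in (0.1, 0.05, 0.025):
    err = abs(f(x0 + h * u, y0 + h * v) - second_order(x0, y0, h, u, v))
    print(f"h = {h:5.3f}   error = {err:.3e}   error/h^3 = {err / h**3:.4f}")
```

Each halving of $h$ cuts the error by about $2^3 = 8$, and error$/h^3$ settles toward a constant, which is exactly what the $O(h^3)$ term asserts.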


There are 3 answers below.

Accepted answer:

The term $O(h^3)$ means that the estimation error is locally bounded by a constant times $|h|^3$. For example, the second-order Taylor estimation of $f(x)=e^x$ at $x=2$ is $g(x) = e^2(0.5x^2 - x + 1)$, so $f(x) = g(x) + O((x-2)^3)$ as $x\to 2$. The higher the power in the error term, the more rapidly the error goes to $0$ as $x\to 2$. Note that this bound is local: it does not hold on the entirety of $\mathbb{R}$.
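A short numerical check of this example (my own sketch): the gap between $e^x$ and $g(x)$ should shrink like $|x-2|^3$, so dividing the error by $|x-2|^3$ should approach a constant as $x \to 2$:

```python
import math

# Numerical check of the example above (my own sketch): g is the
# second-order Taylor polynomial of exp(x) about x = 2, so the error
# exp(x) - g(x) should shrink like |x - 2|**3 as x -> 2.
def g(x):
    return math.exp(2) * (0.5 * x**2 - x + 1)

for d in (0.1, 0.05, 0.025):
    x = 2 + d
    err = abs(math.exp(x) - g(x))
    print(f"|x-2| = {d:5.3f}   error = {err:.3e}   error/|x-2|^3 = {err / d**3:.4f}")
```

The ratio tends to $e^2/6 \approx 1.23$, the coefficient of the leading $(x-2)^3/3!$ term in the Taylor remainder.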

Answer 2:

You don't need the third-order approximation. What you need is the mean value theorem, applied first to the original function and then to its derivative. If $g$ is a function of one variable with $g'(0)=0$, then
$$ g(1)-g(0)=g'(t), \quad 0<t<1 $$
$$ = g'(t)-g'(0) $$
$$ = t\,g''(\tau), \quad 0<\tau<t<1. $$
If $\phi$ is a function of $n$ variables, $P_0$ is a point in $\mathbb{R}^n$, $v$ is a vector with $n$ components, and we define $\omega$ by
$$ \omega(t)=\phi(P_0+tv), $$
then
$$ \omega'(t)=v\cdot\nabla\phi(P_0+tv). $$
Thus if $f$ is a function of $n$ variables and we define
$$ g(t)=f(P_0+tv), $$
we have
$$ g'(t)=v\cdot\nabla f(P_0+tv) $$
and
$$ g''(t)=v\,\operatorname{Hess}f(P_0+tv)\,v^T. $$
So if we take a point $P_1$ and set $v=P_1-P_0$, we have
$$ f(P_1)-f(P_0) = g(1)-g(0) = t\,v\,\operatorname{Hess}f(P_0+\tau v)\,v^T, \quad 0<\tau<t<1. $$
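A quick numerical sanity check (my own sketch, with a made-up $f$) of the chain-rule identity $g'(t)=v\cdot\nabla f(P_0+tv)$ used above, comparing the analytic value against a central finite difference:

```python
import math

# Sanity check of g'(t) = v . grad f(P0 + t v), using a made-up
# two-variable function f(x, y) = sin(x) * y**2 (my choice).
def f(x, y):
    return math.sin(x) * y**2

def grad(x, y):
    return (math.cos(x) * y**2, 2 * math.sin(x) * y)

P0 = (0.4, 1.1)
v = (0.7, -0.3)

def g(t):
    return f(P0[0] + t * v[0], P0[1] + t * v[1])

t, eps = 0.5, 1e-6
numeric = (g(t + eps) - g(t - eps)) / (2 * eps)   # central difference for g'(t)
gx, gy = grad(P0[0] + t * v[0], P0[1] + t * v[1])
analytic = v[0] * gx + v[1] * gy
print(numeric, analytic)   # the two values should agree closely
```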

Answer 3:

Here's a bit more on local minima and second-/third-order approximations in several variables, extending my first answer by one more derivative. First, for showing that something is a 'loose' local min, i.e. $f(P_1) \ge f(P_0)$, knowing that a derivative of some order is non-negative isn't all that helpful, because being close to something non-negative is no guarantee of being non-negative. The situation is much better in the case of a strict local min, because being close enough to something positive does guarantee being positive.

For a function $g$ of one variable with $g'(0)=0$, we have
$$ g(1)=g(0)+\frac{1}{2!}\,g''(0)+\frac{1}{3!}\,g'''(t), \quad 0<t<1. $$
As in the first answer, we extend this to a function $f$ of $n$ variables by taking two points $P_0$ and $P_1$, defining $v=P_1-P_0=[v_1,\dots,v_n]$ (the vector obtained by subtracting the coordinates of $P_0$ from those of $P_1$) and defining $g(t)=f(P_0+tv)$, which gives
$$ f(P_1)=f(P_0)+\frac{1}{2!}\,v\,\operatorname{Hess}f(P_0)\,v^T+\frac{1}{3!}\sum_{1\le i,j,k \le n} f_{ijk}(P_0+tv)\,v_i v_j v_k, \quad 0<t<1. $$
The problem with the last term, which we'll call a cubic expression (the term 'cubic form' having another meaning in the literature), is that I haven't been able to find much in the literature on such expressions. There are three possible explanations: (i) my literature search has been inadequate, (ii) not much can be done with such expressions, or (iii) the topic would be a good area for research. In any event, it seems difficult to use this expression to determine whether $f$ has a local min at $P_0$. The only use I have found for such cubic expressions is in finding the torsion at a point on a space curve defined as the intersection of two surfaces $F(x,y,z)=0$ and $G(x,y,z)=0$, which seems not very closely related to local extrema.
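As a small numerical illustration (my own sketch, not from the answer): take $f(x,y)=e^x+e^y-x-y$, which has a critical point at the origin where its Hessian is the identity. The error left over after the quadratic term should then scale like $h^3$, matching the cubic remainder above:

```python
import math

# Illustration (my own, not from the answer): f(x, y) = exp(x) + exp(y) - x - y
# has a critical point at the origin, where its Hessian is the identity.
# The error left after the quadratic term should scale like h**3.
def f(x, y):
    return math.exp(x) + math.exp(y) - x - y

u, w = 1.0, 0.5   # direction v = (u, w)
for h in (0.1, 0.05, 0.025):
    quad = f(0, 0) + 0.5 * h * h * (u * u + w * w)   # Hessian = identity at 0
    err = abs(f(h * u, h * w) - quad)
    print(f"h = {h:5.3f}   error/h^3 = {err / h**3:.4f}")
```

The ratio tends to $(u^3+w^3)/6 = 0.1875$, the leading coefficient of the cubic remainder for this particular $f$ and direction.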