Use Mean Value Theorem to show $f(y) = f(x) + \nabla f(x)^T(y-x) + \int\limits_0^1 t(y-x)^T\nabla^2 f(x+\xi (y-x))^T(y-x) dt$

1k Views Asked by At

Claim: Given a $C^2$, convex function $f$ and vectors $x,y \in \mathbb{R}^n, t \in [0,1]$ Suppose that

$$f(y) = f(x) + \nabla f(x)^T(y-x) + \int\limits_0^1 (\nabla f(x+t(y-x))^T-\nabla f(x))^T(y-x) dt$$

Then

$$f(y) = f(x) + \nabla f(x)^T(y-x) + \int\limits_0^1 t(y-x)^T\nabla^2 f(x+\xi (y-x))^T(y-x) dt$$

where $0\leq \xi \leq t$

I think the proof relies on the mean value theorem

Specifically, $$\nabla f(x+t(y-x))-\nabla f(x) = \nabla^2 f(x+\xi (y-x))^T(x+t(y-x)-x) = \nabla^2 f(x+\xi (y-x))^Tt(y-x) $$

While this has structural similarity compared to the first order mean value theorem, I am not sure how this can be proved and searching the literature up and down I just could not find a so called higher dimensional mean value theorem. Typing in mean value theorem + hessian returns no good results.

Can someone please provide a reference to a higher order, generalized, mean value theorem which involves the gradient and the Hessian and was used in this proof?

1

There are 1 best solutions below

1
On BEST ANSWER

Let us define the function $h:[0, 1] \rightarrow \mathbb{R}$ given by \begin{align} h(t)= \nabla f(x+t(y-x))^T(y-x). \end{align} Assuming $f \in C^2$ implies $h$ is continuously differentiable. For any fixed $t \in (0, 1]$, we apply the mean value theorem to see there exists $t^\ast \in (0, t)$ such that \begin{align} th'(t^\ast) =&\ h(t) - h(0)\\ =&\ \nabla f(x+t(y-x))^T(y-x)- \nabla f(x)^T(y-x)\\ =&\ [\nabla f(x+t(y-x))-\nabla f(x)]^T(y-x). \end{align}

Next, let us compute $h'(t)$. Observe \begin{align} h'(t) =&\ \frac{d}{dt}\left[\nabla f(x+t(y-x))^T(y-x) \right]\\ =&\ \frac{d}{dt}\left[\sum^n_{i=1} \frac{\partial f(x+t(y-x))}{\partial x_i} (y_i-x_i) \right]\\ =&\ \sum^n_{i=1} \frac{d}{dt}\frac{\partial f(x+t(y-x))}{\partial x_i} (y_i-x_i) \\ =&\ \sum^n_{i=1}\sum^n_{j=1} \frac{\partial^2f(x+t(y-x))}{\partial x_i \partial x_j} (y_j-x_j)(y_i-x_i)\\ =&\ (y-x)^T\nabla^2f(x+t(x-y))(y-x). \end{align} Now, combine everything will yield the desired result.

Remark: The Hessian matrix is symmetric which means $H^T = H$. Also, this result doesn't require $f$ to be convex.