Why is $\lim_{t \to 0} \frac{f(x + t(y-x)) - f(x)}{t} = \nabla f(x)^T\cdot (y-x)$?


What I want to understand

I am trying to understand why the following holds:

$$\lim_{t \to 0} \frac{f(x + t(y-x)) - f(x)}{t} = \nabla f(x)^T\cdot (y-x)$$

with $f: \mathbb{R}^p \to \mathbb{R}$, $x,y \in \mathbb{R}^p$, and $t \in \mathbb{R}^+$.

Also I am using the following definition of the gradient:

$\nabla{f(x)} = \begin{bmatrix} \frac{\partial f}{\partial x_1}(x) & \frac{\partial f}{\partial x_2}(x) & \dots & \frac{\partial f}{\partial x_p}(x) \end{bmatrix}^\intercal$

with $x_i$ being the $i$-th element of $x$.

Why I want to understand it

It is used when proving that a convex function always lies above its tangent line (see p. 5-6 of this example if you want)

What I know

The solution probably has something to do with the definition of the derivative as the limit of a difference quotient:

$$\lim_{\Delta z \to 0} \frac{g(z + \Delta z) - g(z)}{\Delta z} = g'(z)$$

with $z, \Delta z \in \mathbb{R}$, and $g: \mathbb{R} \to \mathbb{R}$.

However, I do not understand exactly how (or whether) I can get from this definition to the higher-dimensional limit at the top of my question, or what additional information I need.

Thank you very much!

Answer

I'm adding this as an answer so as to not have too many comments.

Thanks for editing your question several times. It now (as far as I can tell) makes perfect mathematical sense, although there is still a missing link: you did not give a definition of the gradient.

I know I'm being very annoying, but there is a reason for that: the limit in your question can actually itself be taken as a definition of the gradient, via what is called the Gâteaux derivative of a function (see https://en.wikipedia.org/wiki/Gateaux_derivative).

Any answer to your question needs to know which definition you are considering, of course.

Best,

EDIT :

Thanks for adding the gradient definition. Note that your definition can be rephrased as: $$ (\nabla f(x))_i = \nabla f(x)^T\cdot e_i = \frac{\partial f}{\partial x_i}(x) = \left.\frac{d}{dt} f(x+t e_i)\right|_{t=0} = \lim_{t \to 0} \frac{f(x+t e_i) - f(x)}{t} $$ where $(e_i)$ is the basis in which you defined the coordinates.
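If it helps, the coordinate-direction identity above can be checked numerically. The following is only a rough sketch: the test function `f`, its hand-computed gradient `grad_f`, the helper `difference_quotient`, the point `x` and the step `t` are all made up for illustration and do not come from your question.

```python
import math

def f(x):
    # Hypothetical smooth test function f: R^3 -> R (an assumption, not from the question).
    return x[0] ** 2 + 3.0 * x[0] * x[1] + math.sin(x[2])

def grad_f(x):
    # Partial derivatives of the test function above, computed by hand.
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0], math.cos(x[2])]

def difference_quotient(func, x, direction, t):
    # (func(x + t * direction) - func(x)) / t for one fixed small t.
    shifted = [xi + t * di for xi, di in zip(x, direction)]
    return (func(shifted) - func(x)) / t

x = [1.0, -2.0, 0.5]
t = 1e-6
p = len(x)

# Standard basis vectors e_1, ..., e_p of R^p.
basis = [[1.0 if j == i else 0.0 for j in range(p)] for i in range(p)]

quotients = [difference_quotient(f, x, e_i, t) for e_i in basis]
gradient = grad_f(x)

for i in range(p):
    # The two printed numbers should agree up to an error of order t.
    print(i, quotients[i], gradient[i])
```

A fixed small `t` only approximates the limit, so the two columns agree up to a small error rather than exactly.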

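So at least when $y-x$ is one of the basis vectors, you have your answer. Now if you write $z = y-x = \sum_i z_i e_i$ and apply the chain rule to $t \mapsto f(x+tz)$, you will get your result. Note that the chain rule requires $f$ to be differentiable at $x$; the mere existence of the partial derivatives is not enough.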
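Spelled out, the chain-rule step could look as follows. This is a sketch assuming $f$ is differentiable at $x$; the one-variable helper $g$ is introduced here only for illustration. Set

$$ g(t) := f(x + tz), \qquad z = y - x = \sum_{i=1}^p z_i e_i, $$

so that $g: \mathbb{R} \to \mathbb{R}$ and the one-dimensional definition of the derivative applies to it:

$$ \lim_{t \to 0} \frac{f(x + t(y-x)) - f(x)}{t} = \lim_{t \to 0} \frac{g(t) - g(0)}{t} = g'(0). $$

By the chain rule, with $\frac{d}{dt}(x_i + t z_i) = z_i$ for each coordinate,

$$ g'(0) = \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x)\, z_i = \nabla f(x)^T \cdot z = \nabla f(x)^T \cdot (y-x). $$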