$\frac{d}{dt} f(x + tu) \mid_{t=0} = \langle \nabla f(x), u \rangle$


Given a differentiable function $f\colon \mathbb R^n \to \mathbb R$, we have $\frac{d}{dt} f(x + tu) \mid_{t=0} = \langle \nabla f(x), u \rangle$.

Here is a proof, but I have some questions.

Write $x(t) = x + t u = (x_1+tu_1,\ldots, x_n+tu_n)$ and apply the chain rule: $$\frac{d}{dt} f(x(t)) = \sum_{k=1}^n \frac{\partial}{\partial x_k} f(x(t)) \frac{d}{dt}(x_k + t u_k) = \sum_{k=1}^n \frac{\partial}{\partial x_k} f(x(t))u_k = \nabla f(x+tu) \cdot u.$$

I think the use of $x(t) = x + t u = (x_1+tu_1,\ldots, x_n+tu_n)$ is confusing, since the chain rule step contains $\sum_{k=1}^n \frac{\partial}{\partial x_k} f(x(t)) \frac{d}{dt}(x_k + t u_k)$. I think $\partial x_k$ should be read as $\partial (x_k +tu_k)$. Am I correct?

Does the last equality hold? Is $\sum_{k=1}^n \frac{\partial}{\partial x_k} f(x(t))u_k$ a number while $\nabla f(x+tu) \cdot u$ is a vector?

How do we use the information $t=0$?


There are 2 best solutions below


The symbol $\partial_{x_k}f$ denotes the partial derivative of $f$ with respect to the $k$-th coordinate $x_k$. It should not be read as $\partial_{x_k+tu_k}f$.

In the expression $\nabla f(x+tu) \cdot u$ you have a scalar product, so the result is a number.

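The evaluation at $t=0$ is performed at the end: the chain rule gives $\frac{d}{dt}f(x(t)) = \nabla f(x(t)) \cdot u$ for every $t$, and since $x(0) = x$, setting $t=0$ yields $\frac{d}{dt}f(x(t))\big|_{t=0} = \nabla f(x(0)) \cdot u = \langle \nabla f(x), u \rangle$.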
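If it helps, the identity can be checked numerically. The following is only a sketch: the test function `f`, its hand-computed gradient `grad_f`, the helper `derivative_along_line`, and the sample point and direction are all made up for illustration, and the derivative in $t$ is approximated by a central difference rather than computed exactly.

```python
import math

def f(x):
    # Hypothetical test function on R^2: f(x1, x2) = x1^2 * x2 + sin(x2)
    return x[0] ** 2 * x[1] + math.sin(x[1])

def grad_f(x):
    # Gradient of the f above, computed by hand
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def derivative_along_line(f, x, u, h=1e-6):
    # Central difference approximation of d/dt f(x + t u) at t = 0
    forward = [xi + h * ui for xi, ui in zip(x, u)]
    backward = [xi - h * ui for xi, ui in zip(x, u)]
    return (f(forward) - f(backward)) / (2 * h)

x = [1.0, 2.0]   # arbitrary base point
u = [3.0, -1.0]  # arbitrary direction (not normalized)

lhs = derivative_along_line(f, x, u)
# The scalar product sums over k, so the result is a single number, not a vector
rhs = sum(g * uk for g, uk in zip(grad_f(x), u))
print(lhs, rhs)
```

The two printed values should agree up to the error of the finite difference, and `rhs` is a plain float, which matches the remark that $\nabla f(x+tu) \cdot u$ is a number.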


Gibbs says in a comment: "The point is that you are endowing $\mathbb{R}^n$ with Cartesian coordinates $(x_1,\dots,x_n)$. Each partial derivative $\partial_{x_k}$ is just a symbol denoting differentiation with respect to the $k$-th coordinate."

This is absolutely right. The point is hard to explain and endlessly confusing, for a combination of two reasons: 1. the conventional notation is poor, and 2. a lot of textbooks are poor.

If you have a function $f$ defined on $\mathbb{R}^n$, then it takes as its input an ordered list of $n$ real numbers (i.e. a vector in $\mathbb{R}^n$). It is then possible to form $n$ new functions $D_1f,\dots,D_nf$, called the partial derivatives of $f$. You obtain $D_if$ by the formula $$ (D_if)(x) = \lim_{h \to 0} \frac{f(x + he_i) - f(x)}{h}, $$ where $e_i$ is the $i$-th standard basis vector.
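To make the "new function" viewpoint concrete, here is a rough sketch in which `partial` takes a function and a slot index and returns another function. The names `partial`, `D_i_f`, and the example `f` are hypothetical, the index is 0-based (so `partial(f, 0)` plays the role of $D_1f$), and the limit is replaced by the difference quotient at a small fixed `h`, so the values are only approximate.

```python
def partial(f, i, h=1e-6):
    """Approximate D_i f: differentiate with respect to the i-th input slot (0-based)."""
    def D_i_f(x):
        shifted = list(x)
        shifted[i] += h  # this is x + h * e_i
        return (f(shifted) - f(x)) / h
    return D_i_f

def f(x):
    # Hypothetical example: f(a, b) = a^2 * b
    return x[0] ** 2 * x[1]

D1f = partial(f, 0)  # derivative in the first slot
D2f = partial(f, 1)  # derivative in the second slot

p = [3.0, 5.0]
print(D1f(p), D2f(p))
```

At `p = [3.0, 5.0]` the exact values are $2ab = 30$ and $a^2 = 9$, so the printed numbers should be close to those. Note that `D1f` never needs to know what the slots are called; it can be evaluated at any point, including one of the form $x + tu$.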

This is an unusual way to say it, but: when you take a partial derivative, it is with respect to one of the positions in which you can input a number to $f$.

Confusion arises because computing derivatives is often taught as a thing that you do to an expression, i.e. you have some expression for $f(x + tu,y)$ like $$ f(x+tu,y) = x^2 + tu^3 - y $$ and then you "take" a partial derivative by differentiating both sides. This ultimately leads to confusion of the form you are asking about when it comes to the chain rule.
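For comparison, here is how the chain rule from the question might be written in the $D_k$ notation, where the helper name $g$ is introduced only for this sketch. Set $g(t) = f(x + tu)$ for fixed $x, u \in \mathbb{R}^n$. Then $$ g'(t) = \sum_{k=1}^n (D_kf)(x+tu)\, \frac{d}{dt}(x_k + tu_k) = \sum_{k=1}^n (D_kf)(x+tu)\, u_k, $$ and evaluating at $t = 0$ gives $$ g'(0) = \sum_{k=1}^n (D_kf)(x)\, u_k = \langle \nabla f(x), u \rangle. $$ Written this way, each $D_kf$ is first formed as a function in its own right and only then evaluated at the point $x + tu$; nothing is ever differentiated "with respect to $x_k + tu_k$".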