I am confused about a definition of directional derivatives I found in the MIT Deep Learning book.
> The directional derivative in direction $u$ (a unit vector) is the slope of the function $f$ in direction $u$. In other words, the directional derivative is the derivative of the function $f(x + \alpha u)$ with respect to $\alpha$, evaluated at $\alpha = 0$. Using the chain rule, we can see that $\frac{\partial}{\partial \alpha} f(x + \alpha u)$ evaluates to $u^T \nabla_{x} f(x)$ when $\alpha = 0$.
I understand the first sentence but I don't understand what they are trying to say after that.
What I know:
- If $f: \mathbb{R}^n \to \mathbb{R}$, the directional derivative of $f$ at $x_0$ in the direction of the vector $v$ is defined as the limit $$D_v f(x_0) = \lim_{h\to 0}\frac{f(x_0+hv)-f(x_0)}{h}.$$
- Some people use $\frac{\partial f}{\partial x}$ to denote $f'(x)$.
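To convince myself that the limit definition behaves the way I expect, I ran a small finite-difference check (the function $f$, the point $x_0$, and the direction $v$ below are my own toy choices, not from the book):

```python
import numpy as np

# Toy choices of my own: f(x) = x1^2 + 3*x2, a point x0, and a unit direction v
def f(x):
    return x[0] ** 2 + 3 * x[1]

x0 = np.array([1.0, 2.0])
v = np.array([3.0, 4.0]) / 5.0           # unit vector: (3/5)^2 + (4/5)^2 = 1

h = 1e-6
quotient = (f(x0 + h * v) - f(x0)) / h   # finite-difference version of the limit

grad = np.array([2 * x0[0], 3.0])        # gradient of f at x0, computed by hand
print(quotient, grad @ v)                # both should be close to 3.6
```

The finite-difference quotient does match the dot product of the hand-computed gradient with $v$, which is the identity I am trying to understand.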
What I am confused about:
> the derivative of the function $f(x + \alpha u)$ with respect to $\alpha$
To me, this sentence means that we define another function $g(\alpha) = f(x + \alpha u)$ and then compute the derivative of $g$ with respect to $\alpha$:
$$ g'(\alpha) = \lim_{h\to 0} \frac{g(\alpha + h) - g(\alpha)}{h} = \lim_{h\to 0} \frac{f(x + (\alpha + h) u) - f(x + \alpha u)}{h} $$
But I don't see how this is equivalent to the definition of $D_v f(x_0)$ above.
Where does the $u^T$ come from, and what does $\nabla_{x} f(x)$ even mean? Is it just the partial derivative with respect to $x$?
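My best guess is that $\nabla_{x} f(x)$ stands for the gradient of $f$, i.e. the column vector of all the partial derivatives (so that $u^T \nabla_x f(x)$ is a dot product), but I am not sure:

$$ \nabla_x f(x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)^T $$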
I only know the chain rule written in terms of single-variable functions, $(f(g(x)))' = f'(g(x))\,g'(x)$, so I don't understand which chain rule they are applying there.
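For what it's worth, I did check the book's identity symbolically with SymPy on a toy function of my own choosing, and it holds, so my question is purely about the reasoning behind it:

```python
import sympy as sp

# Toy function of my own choosing: f(x1, x2) = x1^2 * x2 + sin(x2)
x1, x2, alpha = sp.symbols('x1 x2 alpha')
u1, u2 = sp.Rational(3, 5), sp.Rational(4, 5)  # a unit direction

def f(p, q):
    return p**2 * q + sp.sin(q)

# Left side: d/d(alpha) of f(x + alpha*u), evaluated at alpha = 0
lhs = sp.diff(f(x1 + alpha * u1, x2 + alpha * u2), alpha).subs(alpha, 0)

# Right side: u^T grad f(x), i.e. the dot product of u with the partials
rhs = u1 * sp.diff(f(x1, x2), x1) + u2 * sp.diff(f(x1, x2), x2)

print(sp.simplify(lhs - rhs))  # prints 0
```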
Please be patient with me, I am just trying to understand how to reason about directional derivatives.