Why $\frac{\partial}{\partial\alpha}f(\mathbf{x} + \alpha\mathbf{u})$ evaluates to $\mathbf{u}^T\nabla_{\mathbf{x}}f(\mathbf{x})$


In Deep Learning (page 85) it is stated that:

Using the chain rule, we can see that $\frac{\partial}{\partial\alpha}f(\mathbf{x} + \alpha\mathbf{u})$ evaluates to $\mathbf{u}^T\nabla_{\mathbf{x}}f(\mathbf{x})$ when $\alpha=0$.

While I think I understand that $\frac{\partial}{\partial\alpha}f(\mathbf{x} + \alpha\mathbf{u})$ is the directional derivative of $f(\mathbf{x})$ in the direction of $\mathbf{u}$, I still don't see how to carry out the derivation using the chain rule.

Also, does $\alpha=0$ mean that we are taking an infinitesimal step in the direction of $\mathbf{u}$?


There are 2 best solutions below

2
On BEST ANSWER

Suppose we are in $\mathbb R^2$. The generalisation to $\mathbb R^n$ is immediate. Write

$$f({\bf x}+\alpha{\bf u}) = f(x_1+\alpha u_1,x_2+\alpha u_2).$$

By the chain rule, $$\frac{\partial}{\partial \alpha}f({\bf x}+\alpha{\bf u}) = \frac{\partial f}{\partial x_1}({\bf x}+\alpha{\bf u}) \frac{\partial(x_1+\alpha u_1)}{\partial \alpha} + \frac{\partial f}{\partial x_2}({\bf x}+\alpha{\bf u}) \frac{\partial(x_2+\alpha u_2)}{\partial \alpha} $$

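$$= u_1 \frac{\partial f}{\partial x_1}({\bf x}+\alpha{\bf u}) + u_2 \frac{\partial f}{\partial x_2}({\bf x}+\alpha{\bf u}) = {\bf u}^T \nabla_{\bf x} f({\bf x}+\alpha{\bf u}). $$

Setting $\alpha = 0$ then gives ${\bf u}^T \nabla_{\bf x} f({\bf x})$, which is the expression stated in the book.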
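If it helps, the identity can also be checked numerically. The following is only a minimal sketch: the test function `f`, the point `x`, the direction `u`, and the helper names are all assumptions chosen for illustration, not anything from the book. It compares a central finite difference of $\alpha \mapsto f({\bf x}+\alpha{\bf u})$ with ${\bf u}^T \nabla_{\bf x} f({\bf x}+\alpha{\bf u})$.

```python
import math


def f(x1, x2):
    # Hypothetical smooth test function, chosen only for illustration.
    return x1**2 * x2 + math.sin(x2)


def grad_f(x1, x2):
    # Analytic gradient of the test function above.
    return (2 * x1 * x2, x1**2 + math.cos(x2))


def slope_along_line(x, u, alpha, eps=1e-6):
    # Central finite difference of g(a) = f(x + a * u) at a = alpha.
    def g(a):
        return f(x[0] + a * u[0], x[1] + a * u[1])

    return (g(alpha + eps) - g(alpha - eps)) / (2 * eps)


def chain_rule_value(x, u, alpha):
    # u^T grad f, with the gradient evaluated at the shifted point x + alpha * u.
    p = (x[0] + alpha * u[0], x[1] + alpha * u[1])
    g1, g2 = grad_f(*p)
    return u[0] * g1 + u[1] * g2


x = (1.0, 2.0)
u = (0.6, 0.8)
for alpha in (0.0, 0.5):
    # The two printed values should agree up to finite-difference error.
    print(alpha, slope_along_line(x, u, alpha), chain_rule_value(x, u, alpha))
```

The row for `alpha = 0.0` is the case from the book: there the gradient is evaluated at ${\bf x}$ itself.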

0
On

You have to see the expression $f(\mathbf{x} + \alpha \mathbf{u})$ as a composition of two functions, $\mathbf{h}:\mathbb{R} \to \mathbb{R}^n$ and the given $f:\mathbb{R}^n \to \mathbb{R}$. If we define $\mathbf{h}(\alpha) = \mathbf{x} + \alpha \mathbf{u}$, then the chain rule gives

$$\frac{\partial}{\partial\alpha} (f\circ \mathbf{h})\Big|_{\alpha = 0} = \nabla_{\mathbf{h}(0)}f\cdot \frac{\partial \mathbf{h}}{\partial\alpha}\Big|_{\alpha = 0}.$$

Here $\nabla_{\mathbf{h}(0)}f$ denotes the gradient of $f$ evaluated at the point $\mathbf{h}(0) = \mathbf{x}$, so $\nabla_{\mathbf{h}(0)}f = \nabla_{\mathbf{x}}f(\mathbf{x})$. Moreover, $\frac{\partial \mathbf{h}}{\partial\alpha}\Big|_{\alpha = 0} = \mathbf{u}$, and the dot product can be expressed either as $(\nabla_{\mathbf{x}}f)^T\mathbf{u}$ or as $\mathbf{u}^T\nabla_{\mathbf{x}}f$.
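As a concrete check of the composition view, take the illustrative choice $f(\mathbf{x}) = x_1^2 x_2$ on $\mathbb{R}^2$ (this particular $f$ is an assumption made for the example, not something from the book). Then

$$(f\circ\mathbf{h})(\alpha) = (x_1+\alpha u_1)^2\,(x_2+\alpha u_2),$$

and differentiating directly with respect to $\alpha$ gives

$$\frac{\partial}{\partial\alpha}(f\circ\mathbf{h})(\alpha) = 2u_1(x_1+\alpha u_1)(x_2+\alpha u_2) + u_2(x_1+\alpha u_1)^2.$$

At $\alpha = 0$ this is $2u_1x_1x_2 + u_2x_1^2$, which agrees with $\mathbf{u}^T\nabla_{\mathbf{x}}f$ since here $\nabla_{\mathbf{x}}f = (2x_1x_2,\; x_1^2)^T$.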