Getting intuition for the coordinates of a gradient vector.


Let's say I have some function $f(x, y)$ whose gradient at $(x_0, y_0)$ is $$\nabla f(x_0, y_0) = \langle 4, 1\rangle$$

In this 3Blue1Brown video, Grant says something akin to

the change in $f$ is $4\times$ more sensitive to changes in $x_0$ than it is to changes in $y_0$.

when discussing the gradient of a cost function in his explanation of neural networks. Why does this hold?

It seems intuitive to me that changing $x_0$ would result in a larger change in $f$ than changing $y_0$, since the gradient more closely lines up with the $x$-axis, but why is it exactly $4\times$?



Best answer:

The assertion$$\nabla f(x_0,y_0)=(4,1)$$means that, near $(x_0,y_0)$, $f(x,y)$ behaves like $f(x_0,y_0)+4(x-x_0)+(y-y_0)$. So, near $(x_0,y_0)$, a small change in the value of $x$ is multiplied by approximately $4$, whereas a small change in the value of $y$ produces a change in $f$ of approximately the same size. So, yes: near $(x_0,y_0)$, $f$ is $4$ times more sensitive to changes in $x$ than to changes in $y$.
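You can check this numerically. As a minimal sketch, take the hypothetical function $f(x,y)=4x+y$, whose gradient is $(4,1)$ everywhere, and compare finite-difference quotients in each direction:

```python
# Hypothetical example: f(x, y) = 4*x + y has gradient (4, 1) at every point,
# matching the gradient in the question.
def f(x, y):
    return 4 * x + y

x0, y0 = 2.0, 3.0   # any base point works for this f
h = 1e-6            # small step

# Finite-difference approximations of the two partial derivatives
df_dx = (f(x0 + h, y0) - f(x0, y0)) / h  # ~ 4
df_dy = (f(x0, y0 + h) - f(x0, y0)) / h  # ~ 1

# The same small step moves f about 4 times as much in x as in y
print(df_dx, df_dy, df_dx / df_dy)
```

The ratio of the two quotients is the $4\times$ sensitivity the answer describes: equal-sized nudges in $x$ and $y$ change $f$ in the ratio of the gradient's components.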

This would still hold if $\nabla f(x_0,y_0)=(4t,t)$, for some $t\ne0$.

Second answer:

The gradient vector of a function $f:\mathbb{R}^n\to\mathbb{R}$ at a point $x=(x_1,\dotsc,x_n)$ has the partial derivatives $\frac{\partial f}{\partial x_j}$ as its entries. The $j$-th partial derivative precisely measures the rate of change of $f$ in the $j$-th coordinate direction (with respect to the standard basis): it is defined as the derivative at $t=0$ of the function $\mathbb{R}\to\mathbb{R},\ t\mapsto f(x+te_j)$, where $e_j$ is the $j$-th standard basis vector. In your case, the partial derivative in the direction of $x$ is $4$ and the one in the direction of $y$ is $1$, implying that the rate of change in the direction of $x$ is $4$ times the rate of change in the direction of $y$.
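The definition above can also be sketched numerically. Assuming the hypothetical function $f(x,y)=x^2+y$, whose gradient at $(2,y_0)$ is $(4,1)$, a generic finite-difference `partial` helper implements $t\mapsto f(x+te_j)$ directly:

```python
# Sketch of the definition: the j-th partial derivative is the ordinary
# derivative of t -> f(x + t*e_j) at t = 0, approximated by a difference
# quotient. f(x, y) = x**2 + y is a hypothetical choice whose gradient
# at (2, y0) is (4, 1), as in the question.
def f(p):
    x, y = p
    return x**2 + y

def partial(f, p, j, h=1e-6):
    # Step by h along the j-th standard basis vector e_j
    q = list(p)
    q[j] += h
    return (f(q) - f(p)) / h

p = (2.0, 5.0)
print(partial(f, p, 0), partial(f, p, 1))  # ~ 4 and ~ 1
```

Each call probes $f$ along one basis direction only, which is exactly what the partial derivative in that direction measures.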