Interpretation of Gradient of $f(x,y)=x$


Let $f: \mathbb R^2 \to \mathbb R,\ (x,y) \mapsto x$.

Now the graph of $f(x,y)=x$ looks like this:

The gradient is: $\nabla f(x,y) =\begin{pmatrix} 1 \\ 0 \end{pmatrix}$

I'm trying to get a proper understanding here. I always thought of the gradient at a specific point $(x_0,y_0)$ as pointing in the direction of the steepest ascent.

I think I have a hard time grasping the phrase "in the direction of the steepest ascent". That's not literally a vector in 3D, right? It'd be a vector in the x-y-plane, right?

So when thinking about the gradient, I shouldn't really think about the graph but the codomain? Because I can't really make sense of it otherwise.

Best answer:

The gradient is a vector in 2D ($\nabla f=(\partial_x f, \partial_y f)$). Take a point $x$ in $\mathbb{R}^2$ (here $x$ denotes a point of the plane, not just its first coordinate) where $\nabla f(x) \neq 0$, and take the unit vector $v:=\nabla f(x)/|\nabla f (x)|$. Then, among all points $x+u$ with $u \in \mathbb{R}^2$, $|u|=1$, the (linearized) function $f$ attains its maximum value at $x+v$.

That is what 'direction of steepest ascent' means: a direction (a unit vector) such that, if you move from $x$ along it, the linearisation of $f$ takes its maximum value.

So the gradient has to be 2D: you add it to $x$, a point in 2D, and it is the collection of the two partial derivatives.
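
The maximality claim can be checked with the Cauchy-Schwarz inequality. This is only a sketch, assuming $\nabla f(x) \neq 0$ and writing $g(y)=f(x)+\nabla f(x)\cdot(y-x)$ for the linearization of $f$ at $x$. For any unit vector $u$,

$$
g(x+u)-f(x)=\nabla f(x)\cdot u \;\le\; |\nabla f(x)|\,|u| = |\nabla f(x)|,
$$

with equality exactly when $u$ is a positive multiple of $\nabla f(x)$, that is, when $u=v=\nabla f(x)/|\nabla f(x)|$. For the function in the question, $\nabla f=(1,0)$, so with $u=(\cos\theta,\sin\theta)$ we get

$$
\nabla f\cdot u=\cos\theta,
$$

which is largest at $\theta=0$, i.e. for $u=(1,0)$: a vector lying in the x-y-plane.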


By 'linearized $f$' I mean the function $g(y)=f(x)+\nabla f(x) \cdot (y-x)$, the best first-order approximation of $f$ near $x$. Luckily for you, in your case the linearization is the function itself. But suppose we had a stranger-looking $f$.

We use this function because we are considering a local property of $f$, namely the 'instantaneous' rate of change of $f$ at $x$ when moving in some direction. If we looked at the change of $f$ (not $g$) at points at distance $1$ from $x$, we would lose control of the local behaviour of $f$ near $x$, since $f$ can behave very differently at distant points. By looking at the linearized version, we are sure we are looking at something that resembles $f$ at our point of interest and that carries that local information out to distant points as well.
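
To see why the linearization matters, here is a small numerical sketch. The function $f(x,y)=x+5y^2$ and the helper `best_direction` are made up for illustration and do not come from the question. At the origin the gradient is $(1,0)$, yet on the circle of radius $1$ the function itself is largest in a quite different direction; only the linearization (or $f$ on a very small circle) picks out the gradient direction.

```python
import math

def f(x, y):
    # Hypothetical "stranger-looking" example, not the f from the question.
    return x + 5.0 * y**2

def grad_f(x, y):
    # Partial derivatives of the example above, computed by hand.
    return (1.0, 10.0 * y)

def best_direction(func, x0, y0, radius, samples=3600):
    """Scan unit vectors u and return the one maximizing func at (x0, y0) + radius*u."""
    best_u, best_val = None, -math.inf
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        u = (math.cos(t), math.sin(t))
        val = func(x0 + radius * u[0], y0 + radius * u[1])
        if val > best_val:
            best_u, best_val = u, val
    return best_u

x0, y0 = 0.0, 0.0
gx, gy = grad_f(x0, y0)

def g(x, y):
    # Linearization of f at (x0, y0).
    return f(x0, y0) + gx * (x - x0) + gy * (y - y0)

u_lin = best_direction(g, x0, y0, 1.0)    # linearization, distance 1
u_far = best_direction(f, x0, y0, 1.0)    # f itself, distance 1
u_near = best_direction(f, x0, y0, 1e-3)  # f itself, very small distance

print("gradient direction:      ", (gx, gy))
print("best u for g, radius 1:  ", u_lin)
print("best u for f, radius 1:  ", u_far)
print("best u for f, radius 1e-3:", u_near)
```

On the unit circle the $5y^2$ term dominates, so the best direction for $f$ itself points mostly along the $y$-axis, while the linearization and the small-radius scan both agree with the gradient direction.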

But why do we look at points at distance $1$ from $x$? Because this is how one defines a direction (a unit vector), but also because if you allowed $y$ to be as far from $x$ as you like, you could get an ascent as big or as small as you want. So we restrict ourselves to unit vectors to end up with a well-posed question.

Another answer:

It is the direction of steepest ascent, expressed as a vector in the function's domain.

In other words, it tells you in which direction your input $(x,y)$ has to move to make the value $f(x,y)$ increase most quickly.
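
As a quick illustration for the function in the question, the sketch below moves the input a short distance in three directions of the x-y-plane and compares the change in $f$. The starting point and step size are arbitrary choices made for this example.

```python
def f(x, y):
    # The function from the question: f(x, y) = x.
    return x

grad = (1.0, 0.0)   # gradient of f, the same at every point
p = (2.0, 3.0)      # arbitrary starting point in the domain
step = 0.5          # arbitrary step length

def change(direction):
    """Change in f when the input moves from p by step * direction."""
    q = (p[0] + step * direction[0], p[1] + step * direction[1])
    return f(*q) - f(*p)

along = change(grad)                    # move along the gradient
across = change((0.0, 1.0))             # move perpendicular to it
against = change((-grad[0], -grad[1]))  # move opposite to it

print(along, across, against)
```

Moving along the gradient increases $f$, moving perpendicular to it leaves $f$ unchanged, and moving against it decreases $f$. All three moves happen in the domain, not on the graph.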