Is it wrong? "The gradient gives the steepest upward slope, so to minimize the loss we go the opposite way"


I am not good at mathematics and have been learning ML from Udacity.

In one of its tutorial videos, the tutor says (I am summarizing):

The gradient gives the direction of the steepest upward slope, so in order to minimize the loss we go the opposite way.

Here is the video starting from time 3:00: https://youtu.be/9ILiZwbi9dA?t=179

My question is: the direction of steepest descent is not necessarily the exact opposite of the gradient (the direction of steepest ascent). Since the surface is 3D, if the gradient points at 0 degrees, that doesn't necessarily mean that 180 degrees is the direction of steepest descent. So I assume his statement is wrong?

Could someone correct me (in plain language that even a layman can understand, thanks) if I got something wrong?

Thanks

update

After reading John's answer, my understanding is:

At the particular point P we can draw a tangent line A. Let's assume that the 0-degree direction of A is the fastest direction for ascending, but the actual fastest direction for descending at point P is 150 degrees. Since the tangent line can ONLY have 2 directions, 0 and 180 degrees, we can only take the 180-degree direction of line A as our next direction for descending.

Is my understanding correct?


There are 2 best solutions below

7
On BEST ANSWER

When a function of two variables, $f(x,y),$ has a gradient at a particular point $(x_1,y_1)$, it implies that when you graph the function in three dimensions using the equation $z = f(x,y)$, there is a unique tangent plane to that graph at the point $P_1 = (x_1,y_1,f(x_1,y_1))$ that provides a very good approximation of the function $f$ near $(x_1,y_1)$.

In particular, if you travel along the surface of the three-dimensional graph in any direction from $P_1,$ your path initially is tangent to a straight line through $P_1$ and that line lies exactly in the tangent plane.
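
A compact way to see what the tangent plane implies for slopes, sketched with the standard directional-derivative formula (the unit vector $u$ and the angle $\theta$ are notation introduced here for illustration, and the gradient is assumed to be nonzero):

$$D_u f(x_1,y_1) = \nabla f(x_1,y_1)\cdot u = \lVert \nabla f(x_1,y_1)\rVert \cos\theta,$$

where $\theta$ is the angle between the direction $u$ and the gradient. Since $\cos\theta$ reaches its largest value $+1$ only at $\theta = 0$ and its smallest value $-1$ only at $\theta = 180$ degrees, the steepest ascent is along the gradient and the steepest descent is exactly opposite to it.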

It is certainly possible to construct a function such that the steepest increase from some point $(x_1,y_1)$ in the $x,y$ plane is in the direction $0$ degrees from the positive $x$ axis direction, but the steepest decrease is in the direction $150$ degrees. For example, for $(x_1,y_1) = (0,0)$ you could define $f(x,y) = \frac14 x$ everywhere in quadrants I, III, and IV in the plane, but in quadrant II (which is where the $150$-degree direction lies) you make a deep V-shaped groove or trough in the graph of the function, centered along the line that goes through the point $(0,0)$ at a $150$-degree angle, so that if you travel along the center of the groove starting at $(0,0)$ you decrease $f(x,y)$ faster than if you just follow the plane in the direction $180$ degrees. But the fastest increase is still in the direction exactly along the $x$ axis, $0$ degrees.

Such a function exists, even a continuous function like that exists, but it has no gradient at $(0,0),$ because the groove means that no matter how small a neighborhood you pick around $(0,0),$ there are always function values in the groove that are too far below the plane that all the function values in quadrants I, III, and IV lie on for that plane to be a tangent plane.
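
To make the grooved example concrete, here is a minimal Python sketch of one such function. The specific groove shape is an assumption made for illustration: a V-shaped cut centred on the 150-degree ray, spanning 130 to 170 degrees, with hypothetical parameters `HALF_WIDTH` and `DEPTH`. It samples the slope from $(0,0)$ in each whole-degree direction and reports where the slope is largest and smallest.

```python
import math

GROOVE_ANGLE = math.radians(150)          # centre line of the groove
HALF_WIDTH = math.tan(math.radians(20))   # assumed: groove spans 130..170 degrees
DEPTH = 1.0                               # assumed: extra drop per unit along the centre

def f(x, y):
    """The plane z = x/4 with a V-shaped groove cut along the 150-degree ray."""
    plane = x / 4
    dx, dy = math.cos(GROOVE_ANGLE), math.sin(GROOVE_ANGLE)
    along = x * dx + y * dy               # distance travelled along the groove centre
    across = abs(-x * dy + y * dx)        # distance sideways from the groove centre
    cut = DEPTH * (along - across / HALF_WIDTH)
    return plane - max(0.0, cut)          # no cut outside the wedge, so f is continuous

def rate(deg, r=1e-3):
    """Slope of f when leaving (0, 0) in the direction `deg` degrees."""
    t = math.radians(deg)
    return (f(r * math.cos(t), r * math.sin(t)) - f(0, 0)) / r

rates = [rate(d) for d in range(360)]
steepest_up = max(range(360), key=rates.__getitem__)
steepest_down = min(range(360), key=rates.__getitem__)
print(steepest_up, steepest_down)
```

With these assumed parameters the sampled steepest ascent is at 0 degrees and the sampled steepest descent is at 150 degrees, so the two are not opposite. That is only possible because the kink at the groove leaves the function without a gradient at $(0,0)$.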

You can have a direction of steepest descent that is along a different line than the direction of steepest ascent. You can have a gradient. You can have neither of these things. But you cannot have both.

An analogy in a one-variable function is the absolute value function, $g(x) = \lvert x\rvert.$ We are told that the derivative of a function always tells us the direction in which the function increases; and if the derivative tells us the function increases when we increase $x$, then the function decreases when we decrease $x.$ But at $x = 0,$ the function $g(x)$ increases in both directions. How can that be? It is made possible by the fact that $g(x)$ has no derivative at $x = 0.$
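
The absolute value case can be checked numerically with one-sided difference quotients. This is a small sketch; the step size `h` is an arbitrary choice made here.

```python
def g(x):
    return abs(x)

h = 1e-6  # arbitrary small step

# Slope measured by stepping to the right of 0, and by stepping to the left of 0.
right_slope = (g(0 + h) - g(0)) / h
left_slope = (g(0 - h) - g(0)) / (-h)

print(right_slope, left_slope)
```

The two one-sided slopes disagree ($+1$ from the right, $-1$ from the left), which is exactly what it means for $g$ to have no derivative at $x = 0.$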

1
On

What the speaker said was correct, "locally", which means that if you stand on a reasonably smooth part of a mountain, there's some path through the place you're standing -- call it $P$ -- that "climbs up fastest", right? One that gains altitude as fast as possible. If you drew that path on the ground in white paint, at the point $P$, you could draw a line tangent to the path. Going uphill along the tangent is the best thing you could do if you wanted to instantaneously gain altitude; going downhill along that tangent is the best thing you could do to instantaneously LOSE altitude. And because the tangent line is a straight line, those two ideal directions are 180 degrees apart.

Now the path that you've drawn in white paint may wander to the left and right a bit, so the best possible direction at $P$ may be different from the best possible direction at $P'$, where $P'$ is 10 feet away from $P$. But if you're right AT the point $P$, those two "fastest rise" and "fastest fall" directions are in fact opposite.

One very important part of that claim is that the place on the mountainside where you're standing is "reasonably smooth". But it turns out that in the mathematical details, "having a gradient" is exactly the condition needed to be "reasonably smooth".
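
For a smooth "mountain", the claim that the fastest-rise and fastest-fall directions at $P$ are opposite can be checked by brute force. The sketch below uses a hypothetical smooth function and a hypothetical point `P`, both chosen here only for illustration, and measures the slope in every whole-degree direction.

```python
import math

def f(x, y):
    # Hypothetical smooth example; any differentiable function would do.
    return x**2 + 3 * y

P = (1.0, 2.0)  # hypothetical point where we stand

def rate(deg, r=1e-6):
    """Approximate slope of f when leaving P in the direction `deg` degrees."""
    t = math.radians(deg)
    return (f(P[0] + r * math.cos(t), P[1] + r * math.sin(t)) - f(*P)) / r

rates = [rate(d) for d in range(360)]
up = max(range(360), key=rates.__getitem__)     # sampled direction of fastest rise
down = min(range(360), key=rates.__getitem__)   # sampled direction of fastest fall
print(up, down, (down - up) % 360)
```

Here the gradient at `P` is $(2, 3)$, which points at about 56.3 degrees, so the sampled fastest rise is at 56 degrees and the sampled fastest fall is at 236 degrees: 180 degrees apart, as the answer says.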