I am not good at mathematics and have been learning ML from Udacity.
In one of its tutorial videos, the tutor says (this is my own short summary):
the gradient gives the direction of steepest ascent, so to minimize the loss we go in the opposite direction.
Here is the video starting from time 3:00: https://youtu.be/9ILiZwbi9dA?t=179
My question is: the direction of steepest descent is not necessarily the exact opposite of the direction of the gradient (steepest ascent). Since the surface is in 3D, if the gradient points at 0 degrees, that doesn't necessarily mean that 180 degrees is the direction of steepest descent. So I assume his statement is wrong?
Could someone correct me (in plain language that even a layman can understand, thanks) if I got something wrong?
Thanks
Update:
After reading John's answer, my understanding is:
At a particular point P we can draw a tangent line A. Let's assume that the 0-degree direction of A is the fastest direction for ascending, but the actual fastest direction for descending at point P is 150 degrees. Since the tangent line can ONLY have 2 directions, 0 and 180 degrees, we can only take the 180-degree direction of line A as our next direction for descending.
Is my understanding correct?
When a function of two variables, $f(x,y),$ has a gradient at a particular point $(x_1,y_1)$, it implies that when you graph the function in three dimensions using the equation $z = f(x,y)$, there is a unique tangent plane to that graph at the point $P_1 = (x_1,y_1,f(x_1,y_1))$ that provides a very good approximation of the function $f$ near $(x_1,y_1)$.
In particular, if you travel along the surface of the three-dimensional graph in any direction from $P_1,$ your path initially is tangent to a straight line through $P_1$ and that line lies exactly in the tangent plane.
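To spell out why the tangent plane forces the two directions to be opposite, here is a short worked computation. The symbols $a$, $b$, $\theta$, and $\theta_0$ are notation introduced here for illustration, not taken from the video. Write the gradient at $(x_1,y_1)$ as $(a,b)$ and assume it is not zero. Walking on the tangent plane in the direction that makes an angle $\theta$ with the positive $x$ axis, the slope is

$$
a\cos\theta + b\sin\theta = \sqrt{a^2+b^2}\,\cos(\theta - \theta_0),
$$

where $\theta_0$ is the direction in which the gradient points. The cosine is largest ($+1$) at $\theta = \theta_0$ and smallest ($-1$) at $\theta = \theta_0 + 180^\circ$, so on a plane the steepest descent is always exactly opposite the steepest ascent.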
It is certainly possible to construct a function such that the steepest increase from some point $(x_1,y_1)$ in the $x,y$ plane is in the direction $0$ degrees from the positive $x$ axis direction, but the steepest decrease is in the direction $150$ degrees. For example, taking $(x_1,y_1) = (0,0),$ you could define $f(x,y) = \frac14 x$ everywhere in quadrants I, III, and IV of the plane, but in quadrant II (which contains the $150$-degree direction) you make a deep V-shaped groove or trough in the graph of the function, centered along the ray that leaves the point $(0,0)$ at a $150$-degree angle, so that if you travel along the center of the groove starting at $(0,0)$ you decrease $f(x,y)$ faster than if you just follow the plane in the direction $180$ degrees. But the fastest increase is still in the direction exactly along the positive $x$ axis, $0$ degrees.
Such a function exists, and even a continuous function like that exists, but it can never have a gradient at $(0,0),$ because the groove means that no matter how small a neighborhood you pick around $(0,0),$ there are always function values in the groove that lie too far below the plane containing all the function values in quadrants I, III, and IV for that plane to be a good approximation.
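If it helps to see this numerically, here is a minimal Python sketch of one possible grooved function. The exact shape of the groove (the constants `HALF_WIDTH` and `STEEPNESS`, and the helper names `plane`, `grooved`, `slope`, and `steepest`) are assumptions made up for this illustration; only the $\frac14 x$ plane and the $150$-degree groove direction come from the description above.

```python
import math

GROOVE_DEG = 150.0   # direction of the groove's center line (from the text)
HALF_WIDTH = 0.5     # assumed: groove half-width per unit distance along the center line
STEEPNESS = 1.0      # assumed: how quickly the groove drops below the plane

def plane(x, y):
    """The ungrooved surface z = x/4, which has a gradient everywhere."""
    return 0.25 * x

def grooved(x, y):
    """The same plane with a V-shaped groove cut along the 150-degree ray."""
    a = math.radians(GROOVE_DEG)
    along = x * math.cos(a) + y * math.sin(a)          # distance along the center line
    across = abs(-x * math.sin(a) + y * math.cos(a))   # distance away from the center line
    depth = STEEPNESS * max(0.0, HALF_WIDTH * along - across)
    return 0.25 * x - depth

def slope(f, deg, t=1e-3):
    """Slope of f when leaving (0, 0) in the direction `deg` degrees."""
    r = math.radians(deg)
    return (f(t * math.cos(r), t * math.sin(r)) - f(0.0, 0.0)) / t

def steepest(f):
    """Return (direction of steepest ascent, direction of steepest descent) in whole degrees."""
    slopes = {d: slope(f, d) for d in range(360)}
    return max(slopes, key=slopes.get), min(slopes, key=slopes.get)

print(steepest(plane))    # plain plane: ascent and descent are opposite
print(steepest(grooved))  # grooved surface: they are not opposite
# With a gradient, slope(d) + slope(d + 180) would be zero for every d.
print(slope(grooved, 150) + slope(grooved, 330))
```

For the plain plane the two reported directions should be $0$ and $180$ degrees; for the grooved surface they should be $0$ and $150$ degrees. The last line is the telltale sign that no gradient exists at $(0,0)$: the slopes in two opposite directions fail to cancel.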
You can have a direction of steepest descent that is along a different line than the direction of steepest ascent. You can have a gradient. You can have neither of these things. But you cannot have both.
An analogy in a one-variable function is the absolute value function, $g(x) = \lvert x\rvert.$ We are told that the derivative of a function always tells us the direction in which the function increases; and if the derivative tells us the function increases when we increase $x$, then the function decreases when we decrease $x.$ But at $x = 0,$ the function $g(x)$ increases in both directions. How can that be? It is made possible by the fact that $g(x)$ has no derivative at $x = 0.$
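The one-variable case can be checked the same way. This is only a rough sketch: the helper name `one_sided_slopes` and the step size `h` are choices made for this illustration.

```python
def one_sided_slopes(g, x, h=1e-6):
    """Approximate the slope of g just to the left and just to the right of x."""
    left = (g(x) - g(x - h)) / h
    right = (g(x + h) - g(x)) / h
    return left, right

# At x = 2 the two slopes agree, so the derivative exists there.
print(one_sided_slopes(abs, 2.0))
# At x = 0 the two slopes disagree, so there is no derivative.
print(one_sided_slopes(abs, 0.0))
```

At $x = 2$ both values should come out close to $1$. At $x = 0$ the left slope should be close to $-1$ and the right slope close to $+1$, which is exactly the disagreement that rules out a derivative there.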