Zig-zag Behavior of the Gradient Descent method.

It is often said that the steepest descent method zig-zags, meaning that the search directions of two successive iterations are orthogonal to each other. I don't understand why this has to happen, even for a simple function such as $f(x,y) = x^2 + y^2$. The negative gradient is $(-2x, -2y)^T$. Starting from $(10,0)^T$, the search direction is $(-20,0)^T$. With step size $\alpha = 0.1$, the next point is $(x,y)^T = (10,0)^T + 0.1\,(-20,0)^T = (8,0)^T$, where the direction is $(-16,0)^T$. This is parallel to the previous direction, not orthogonal to it. It seems to me that we are just moving left along the x-axis and will eventually approach the minimum at $(0,0)^T$.
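To make the arithmetic above easy to check, here is a minimal Python sketch of the iteration described in the question. The function names (`neg_grad`, `fixed_step_descent`, `dot`) are my own and purely illustrative, and the sketch assumes the fixed step size $\alpha = 0.1$ from the question, not an exact line search.

```python
def neg_grad(p):
    """Negative gradient of f(x, y) = x^2 + y^2 at point p."""
    x, y = p
    return (-2.0 * x, -2.0 * y)


def fixed_step_descent(start, alpha, steps):
    """Run gradient descent with a constant step size alpha.

    Returns the list of visited points and the list of search
    directions used at each step.
    """
    points = [start]
    directions = []
    p = start
    for _ in range(steps):
        d = neg_grad(p)
        directions.append(d)
        # Move from p along direction d by the fixed step alpha.
        p = (p[0] + alpha * d[0], p[1] + alpha * d[1])
        points.append(p)
    return points, directions


def dot(u, v):
    """Dot product of two 2D vectors."""
    return u[0] * v[0] + u[1] * v[1]


# Values taken from the question: start (10, 0), alpha = 0.1.
points, directions = fixed_step_descent((10.0, 0.0), 0.1, 3)

for p, d in zip(points, directions):
    print("point", p, "direction", d)

# A dot product of zero would mean orthogonal directions.
# Here the first two directions are (-20, 0) and (-16, 0).
print("dot of first two directions:", dot(directions[0], directions[1]))
```

With these numbers the first two directions have dot product 320, so they are parallel, which matches the observation in the question that every step stays on the x-axis.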