I am learning ML and gradient descent and was wondering about something. To find the next step we differentiate with respect to each theta in turn. But aren't there cases where the partial derivative w.r.t. theta_0 tells us that a step to the right brings us to a lower point, and the partial derivative w.r.t. theta_1 tells us that a step forward brings us to a lower point, yet when we update both simultaneously (one step right and one step forward) we find we are actually at a higher point than before? Because while (1, 0) and (0, 1) are lower than (0, 0), (1, 1) is actually higher.
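Here is a quick numerical sketch of what I mean (the quadratic f and the learning rates are just values I made up for illustration):

```python
# Toy cost where each axis-aligned unit step from (0, 0) lowers f, but taking
# both steps at once overshoots:
# f(1, 0) < f(0, 0) and f(0, 1) < f(0, 0), yet f(1, 1) > f(0, 0).
def f(t0, t1):
    return (t0 + t1 - 0.8) ** 2

def grad(t0, t1):
    g = 2 * (t0 + t1 - 0.8)  # the partial happens to be the same for both thetas
    return g, g

print(f(0, 0), f(1, 0), f(0, 1), f(1, 1))  # approx. 0.64, 0.04, 0.04, 1.44

# Simultaneous gradient-descent update from (0, 0) with two learning rates:
for lr in (0.625, 0.1):
    g0, g1 = grad(0.0, 0.0)
    t0, t1 = -lr * g0, -lr * g1
    print(f"lr={lr}: new point ({t0:.2f}, {t1:.2f}), f = {f(t0, t1):.4f}")
# lr=0.625 jumps exactly to (1, 1), where f = 1.44 is *higher* than f(0, 0) = 0.64;
# lr=0.1 only moves to (0.16, 0.16), where f = 0.2304 is lower.
```

So the simultaneous update really can land uphill, but seemingly only when the step is too big for the curvature?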
Or imagine you are on a surface like this:

and you are at the point (0, 0). Now it seems like both partial derivatives are zero, even though there are clearly lower points nearby. Does the algorithm fail us in this case?
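If it helps, here is a concrete stand-in for that picture (I'm assuming it shows something saddle-like, e.g. f(x, y) = x*y, where both axis-aligned partials vanish at the origin):

```python
# Saddle-like surface: both partial derivatives are zero at (0, 0), so the
# update goes nowhere, yet strictly lower points exist right next door.
def f(x, y):
    return x * y

def grad(x, y):
    return y, x  # df/dx = y and df/dy = x, so both vanish at the origin

print(grad(0.0, 0.0))  # (0.0, 0.0) -> the gradient step does not move at all
print(f(0.0, 0.0))     # 0.0
print(f(0.1, -0.1))    # -0.01, a lower point along the diagonal
```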
Is my understanding of partial derivatives incorrect, or are these issues simply compensated for by the fact that in gradient descent we usually repeat the algorithm from several different starting points, and/or that real cost surfaces are rarely that complex?
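By "repeat from different starting points" I mean a restart loop like this hypothetical sketch (the double-well f and all the constants are made up):

```python
import random

# Hypothetical restart loop: run plain gradient descent from several random
# starting points and keep the best finishing point. This double-well f has
# two local minima; only starts that fall into the left basin reach the
# global one near x = -1.04.
def f(x):
    return (x**2 - 1) ** 2 + 0.3 * x

def grad(x):
    return 4 * x * (x**2 - 1) + 0.3

def descend(x, lr=0.01, steps=500):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

random.seed(0)
finishes = [descend(random.uniform(-2, 2)) for _ in range(10)]
best = min(finishes, key=f)
print(best, f(best))  # should be near x = -1.04 with f around -0.31
```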
Sorry if this is too noob a question, and thanks.