Machine Learning gradient descent clarification


I was watching Andrew Ng's CS229 ML lectures on YouTube and I noticed something when he was explaining gradient descent using contour plots.

[contour plot from the lecture]

He shows what $\theta$ gets updated to at each iteration of gradient descent, and says the update always moves in a direction orthogonal to the contour rings. In the image, the optimal value for $\theta_0$ is around $25$, and we start at around $30$. He explains that in linear regression the cost function looks like a bowl and has a single global minimum. Given how we update $\theta_0$, how is it possible that in the first iteration $\theta_0$ moves further away from the minimum (from $30$ to $>30$)? If we update $\theta_0$ based on the gradient, how can it move away from its optimal value?
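To make the behavior I'm asking about concrete, here is a minimal sketch I put together (my own toy quadratic cost with a cross term, not the lecture's linear-regression cost): every step follows the negative gradient and the total cost keeps decreasing, yet $\theta_0$ initially moves *away* from its optimal value of $0$ before turning back.

```python
import numpy as np

# Toy cost (my own example): a positive-definite quadratic with a cross
# term, f(theta) = theta0^2 + theta1^2 + 1.8*theta0*theta1. Minimum at (0, 0).
A = np.array([[2.0, 1.8],
              [1.8, 2.0]])  # Hessian, so f(theta) = 0.5 * theta @ A @ theta

def cost(theta):
    return 0.5 * theta @ A @ theta

def grad(theta):
    return A @ theta

theta = np.array([1.0, -5.0])  # theta0 starts at 1; its optimum is 0
alpha = 0.05                   # learning rate, small enough for convergence

history = [theta.copy()]
for _ in range(2000):
    theta = theta - alpha * grad(theta)  # plain gradient descent step
    history.append(theta.copy())

history = np.array(history)
print("theta0 after step 1:", history[1, 0])   # larger than 1: moved away
print("max theta0 reached: ", history[:, 0].max())
print("final theta0:       ", history[-1, 0])  # back near 0 in the end
```

At the starting point the partial derivative $\partial f / \partial \theta_0 = 2\theta_0 + 1.8\theta_1$ is negative because of the large negative $\theta_1$, so the step pushes $\theta_0$ upward even though its optimum lies below. The full gradient is orthogonal to the contour through the current point, not pointed at the minimum, which seems related to what the plot shows.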