Gradient descent / nonlinear optimization intuition needed


Hi all. I'm taking an introductory AI class, and we're using the gradient descent algorithm to find the set of thetas (variable coefficients) that minimizes the cost of fitting a regression line. In the attached image (provided by the course; please ignore the markings from lecture) we're shown that, to optimize all of the theta values, each one is updated during each iteration of the gradient descent algorithm, and each should converge to its optimal value.
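For concreteness, here is the update rule as I understand it from the slide (applied to every theta_j simultaneously on each iteration):

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}$$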

While I can implement this in Octave, here's what I don't understand: with multiple thetas being optimized, if one (or more) of them has already reached its optimal value while the algorithm continues to optimize the others, how does the already-optimized theta avoid being moved away from that value? Take theta1 in the slide: suppose it is at its optimal value while theta2 is still being optimized. In a given iteration there is still a length-m vector of error values (h(x) - y). Multiply that, row by row, by each row's x value for theta1, sum, and scale, and you get a value that is subtracted from the current theta1. So when a theta is optimized but the algorithm continues, what makes that theta 'stay put'? Hope that makes sense. Thanks in advance.

[Image: course slide showing the gradient descent update rule for the thetas]

Brief notation notes: alpha = a scalar learning rate that adjusts the degree of each update, m = the number of examples in the data set, h_theta(x) = the hypothesis (each row's variable values multiplied by the current thetas), y = the target variable in each row.
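To make the question concrete, here is a minimal NumPy sketch of the loop I'm describing (Python rather than my actual Octave code; the toy data, the 0.1 learning rate, and the iteration count are just placeholders):

```python
import numpy as np

# Toy data: m = 4 examples. X has a bias column of ones plus one
# feature, so theta = [theta0, theta1]. y was generated by
# y = 1 + 2*x, so the optimum should be theta = [1, 2].
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

theta = np.zeros(2)   # start both thetas at 0
alpha = 0.1           # the scalar "alpha" from the notes above
m = len(y)

for _ in range(5000):
    error = X @ theta - y             # h_theta(x) - y for each row, length m
    gradient = (X.T @ error) / m      # one partial derivative per theta_j
    theta = theta - alpha * gradient  # all thetas updated simultaneously

print(theta)  # ends up very close to [1.0, 2.0]
```

My question in terms of this code: once, say, `theta[1]` hits its optimal value, `error` is generally still nonzero because `theta[0]` is not yet optimal, so `gradient[1]` is nonzero and `theta[1]` gets nudged anyway.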