I have the error function
$E(\mathbf{w}) = \dfrac{1}{2} \sum\limits_{n = 1}^N \{ y(x_n, \mathbf{w}) - t_n \}^2$,
where
$y(x, \mathbf{w}) = w_0 + w_1x + w_2x^2 + \dots + w_Mx^M = \sum\limits_{j = 0}^M w_j x^j$
This is (half) the sum of squared errors between the prediction $y(x_n, \mathbf{w})$ at each data point $x_n$ and the corresponding target value $t_n$.
By substitution, we have
$E(\mathbf{w}) = \dfrac{1}{2} \sum\limits_{n = 1}^N \left( \sum\limits_{j = 0}^M w_jx^j_n - t_n \right)^2$
To find the minimum of the error function, we set each partial derivative equal to $0$:
$\dfrac{\partial{}}{\partial{}w_i} E(\mathbf{w}) = \sum\limits_{n = 1}^N x_n^i \left( \sum\limits_{j = 0}^M w_jx_n^j - t_n \right) = 0$
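Rearranging, this is a system of $M + 1$ linear equations in the coefficients $w_j$ (the normal equations):

$\sum\limits_{j = 0}^M A_{ij} w_j = T_i$, where $A_{ij} = \sum\limits_{n = 1}^N x_n^{i + j}$ and $T_i = \sum\limits_{n = 1}^N x_n^{i} t_n$.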
The error function is always non-negative. However, that alone does not rule out multiple critical points, right? In that case, how do we know that we are solving for the global minimum and not merely a local minimum? My textbook says to minimise the error function, but that only makes sense if there is a global minimum, right?
So when we set the derivatives equal to $0$, as was done above, which critical point are we actually solving for?
I've become very confused thinking about this, so I'd appreciate any help and explanations to clear this up.
This is an ordinary linear least squares problem; see https://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)#Computation.
Because the optimization problem is convex (convex quadratic objective function with no constraints), every stationary point is a global minimum.
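To see the convexity directly, compute the second derivatives of your error function. Writing $\Phi$ for the design matrix with entries $\Phi_{nj} = x_n^j$,

$\dfrac{\partial^2 E}{\partial w_i \partial w_k} = \sum\limits_{n = 1}^N x_n^{i + k} = (\Phi^\mathsf{T} \Phi)_{ik}$,

and for any vector $\mathbf{v}$ we have $\mathbf{v}^\mathsf{T} \Phi^\mathsf{T} \Phi \mathbf{v} = \lVert \Phi \mathbf{v} \rVert^2 \ge 0$, so the Hessian is positive semidefinite and $E$ is convex everywhere.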
If the problem is non-degenerate (the design matrix has full rank), there is a unique global minimizer; otherwise there are infinitely many solutions, all attaining the same global minimum objective value. Widely available specialized software solves such problems robustly. See https://stats.stackexchange.com/questions/160179/do-we-need-gradient-descent-to-find-the-coefficients-of-a-linear-regression-mode/164164#164164
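For concreteness, here is a minimal NumPy sketch of the polynomial fit above (the toy data are made up for illustration): build the Vandermonde design matrix $\Phi_{nj} = x_n^j$ and solve the least squares problem, once via an SVD-based solver and once via the normal equations.

```python
import numpy as np

# Toy data: noisy samples of a sinusoid (hypothetical example data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

M = 3  # polynomial degree
# Design matrix Phi with Phi[n, j] = x_n ** j for j = 0, ..., M.
Phi = np.vander(x, M + 1, increasing=True)

# Solve min_w ||Phi w - t||^2. lstsq uses an SVD, so it is robust
# even when Phi is rank-deficient (it then returns the minimum-norm
# solution among all global minimizers).
w, residuals, rank, sv = np.linalg.lstsq(Phi, t, rcond=None)

# For full-rank Phi, solving the normal equations Phi^T Phi w = Phi^T t
# gives the same unique global minimizer.
w_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
assert np.allclose(w, w_normal)
```

Both routes land on the same coefficients here because the 20 distinct sample points make the Vandermonde matrix full rank; in the degenerate case only the SVD route is safe.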