Gradient descent's cost function: Mean Squared Error vs. Sum of Squared Errors


In many introductory machine learning textbooks and online resources, the cost function optimized with gradient descent to fit a linear regression model is the Mean Squared Error (MSE), defined as $$MSE=\frac{1}{n}\sum_i (x_i -\hat{x}_i)^2$$ (often multiplied by 1/2 for convenience when differentiating). But why use the MSE, and get stuck with the 1/n factor, rather than the Sum of Squared Errors (SSE, also known as the Residual Sum of Squares or SSR), namely $$SSE=\sum_i (x_i -\hat{x}_i)^2\,?$$
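To make the comparison concrete, here is a minimal sketch (the data, learning rates, and function names are all illustrative, not from any particular textbook) showing that gradient descent on the MSE and on the SSE find the same minimizer for a simple one-parameter linear model: the SSE gradient is just $n$ times the MSE gradient, so only the learning rate needs rescaling.

```python
import numpy as np

# Synthetic one-parameter regression problem: y ≈ w * x with true w = 3.
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(scale=0.1, size=n)

def descend(grad_fn, lr, steps=1000):
    """Plain gradient descent on a scalar parameter w."""
    w = 0.0
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

# Gradient of MSE = (1/n) * sum_i (y_i - w*x_i)^2 with respect to w.
grad_mse = lambda w: (-2.0 / n) * np.sum((y - w * x) * x)

# Gradient of SSE = sum_i (y_i - w*x_i)^2 with respect to w (n times larger).
grad_sse = lambda w: -2.0 * np.sum((y - w * x) * x)

w_mse = descend(grad_mse, lr=0.1)
w_sse = descend(grad_sse, lr=0.1 / n)  # rescale lr to absorb the factor of n

print(w_mse, w_sse)  # both land on (essentially) the same w
```

Because multiplying a cost function by a positive constant does not move its minimizer, MSE and SSE define the same optimal parameters; the 1/n only changes the gradient's scale, which in practice is absorbed into the learning rate.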