Gradient descent's cost function: Mean Squared Error vs. Sum of Squared Errors


In many introductory machine learning textbooks and online resources, the cost function optimized with gradient descent to fit a linear regression model is the Mean Squared Error (MSE), defined as $$MSE=\frac{1}{n}\sum_i (x_i -\hat{x}_i)^2$$ (often multiplied by 1/2 for convenience when differentiating). But why use the MSE, and get stuck with the 1/n factor, rather than the Sum of Squared Errors (SSE, also known as the Residual Sum of Squares or SSR), namely $$SSE=\sum_i (x_i -\hat{x}_i)^2\,?$$
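To make the comparison concrete, here is a minimal sketch (the data, learning rates, and function names are all illustrative, not from any particular textbook) showing that gradient descent on the MSE and on the SSE find the same minimizer for a simple one-parameter linear model: the SSE gradient is just $n$ times the MSE gradient, so only the learning rate needs rescaling.

```python
import numpy as np

# Synthetic one-parameter regression problem: y ≈ w * x with true w = 3.
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(scale=0.1, size=n)

def descend(grad_fn, lr, steps=1000):
    """Plain gradient descent on a scalar parameter w."""
    w = 0.0
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

# Gradient of MSE = (1/n) * sum_i (y_i - w*x_i)^2 with respect to w.
grad_mse = lambda w: (-2.0 / n) * np.sum((y - w * x) * x)

# Gradient of SSE = sum_i (y_i - w*x_i)^2 with respect to w (n times larger).
grad_sse = lambda w: -2.0 * np.sum((y - w * x) * x)

w_mse = descend(grad_mse, lr=0.1)
w_sse = descend(grad_sse, lr=0.1 / n)  # rescale lr to absorb the factor of n

print(w_mse, w_sse)  # both land on (essentially) the same w
```

Because multiplying a cost function by a positive constant does not move its minimizer, MSE and SSE define the same optimal parameters; the 1/n only changes the gradient's scale, which in practice is absorbed into the learning rate.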