Why divide the regularization factor by the size of the dataset?

Asked by Cheshie at 01 Jul 2025 - 2:34

Suppose I'm trying to minimize a cost function:

$$ J(\theta) = \frac {1} {2m} \sum _{i = 1}^ m (h_\theta (x^{(i)}) - y^{(i)})^2 $$
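For concreteness, here is a minimal NumPy sketch of this cost. It assumes a linear hypothesis $h_\theta(x) = x^\top \theta$ and a design matrix `X` of shape `(m, n+1)` whose first column is all ones (the bias feature). Neither assumption is stated in the question, and the name `cost` is hypothetical.

```python
import numpy as np

def cost(theta, X, y):
    """Unregularized cost J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2.

    Assumption (not from the question): h_theta(x) = x @ theta, with X of
    shape (m, n+1) whose first column is all ones.
    """
    m = len(y)
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for every i
    return residuals @ residuals / (2 * m)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(cost(np.array([0.0, 1.0]), X, y))  # perfect fit, so the cost is 0.0
print(cost(np.zeros(2), X, y))           # (1 + 4 + 9) / (2 * 3) = 14/6
```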

Adding regularization, as seen here, we get:

$$ J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right] $$
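The sketch below extends the previous one with the penalty term, again under assumed conventions: a linear hypothesis, a leading column of ones in `X`, and the bias $\theta_0$ left unpenalized (the penalty sum starts at index 1). The helper `ridge_minimizer` is hypothetical. It solves the normal equations of the bracketed expression, $(X^\top X + \lambda L)\,\theta = X^\top y$, where $L$ is the identity with $L_{00} = 0$. Because $\frac{1}{2m}$ is a positive constant multiplying the whole bracket, it scales $J$ but does not move its minimizer for a fixed $\lambda$.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = 1/(2m) * [sum_i (h(x_i) - y_i)^2 + lam * sum_{j>=1} theta_j^2].

    Assumptions (not from the question): h_theta(x) = x @ theta, X has a
    leading column of ones, and theta[0] (the bias) is not penalized.
    """
    m = len(y)
    residuals = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)   # lambda * sum of theta_1..theta_n squared
    return (residuals @ residuals + penalty) / (2 * m)

def ridge_minimizer(X, y, lam):
    """Hypothetical helper: closed-form argmin of the bracketed expression.

    Setting the gradient 2 X^T (X theta - y) + 2 lam L theta to zero gives
    (X^T X + lam L) theta = X^T y. The outer 1/(2m) factor does not change
    where the minimum is, so this is also the argmin of regularized_cost.
    """
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                            # leave the bias unpenalized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 3.0, 5.0, 4.0])
theta_star = ridge_minimizer(X, y, lam=1.0)
print(theta_star, regularized_cost(theta_star, X, y, lam=1.0))
```

With `lam=0` the helper reduces to ordinary least squares, which is a quick sanity check on the derivation.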

My question is: why is the regularization factor also divided by $2m$?