I'm working through the introductory material of a Machine Learning course (Stanford's, to be precise), and I've noticed that in the lecture notes by Stanford's Andrew Ng (see page 4), the cost function $J(\theta)$ is defined as:
$$ J(\theta) = {1\over 2} \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)})^2 $$
Comparing this with what I've found elsewhere, in particular this online posting of a batch gradient descent implementation by H Kong, the definition there is:
$$ J(\theta) = {1\over 2m} \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)})^2 $$
The key difference is that the latter definition multiplies the RHS of the former expression by $1\over m$.
So, given these two different definitions of the cost function, both of which claim to be the least-squares cost function, how do I determine which one to adopt in my own work? What formal criteria are there for choosing between these two, or any other potential variation of the cost function, for that matter?
Also, what is the significance of including that $1\over m$ factor in the latter definition?
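To make the comparison concrete, here is a small sketch I put together (the toy data and function names are my own, not from either source). It evaluates both versions of $J(\theta)$ over a grid for a one-parameter model $h_\theta(x) = \theta x$, to check whether the $1\over m$ factor changes which $\theta$ minimizes the cost:

```python
import numpy as np

# Toy data for a single-feature linear model h_theta(x) = theta * x
# (these values are made up purely for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
m = len(x)

def cost_ng(theta):
    # Ng's notes: J(theta) = (1/2) * sum of squared errors
    return 0.5 * np.sum((theta * x - y) ** 2)

def cost_mean(theta):
    # The 1/m variant: J(theta) = (1/(2m)) * sum of squared errors
    return cost_ng(theta) / m

# Grid search over theta; both costs are minimized at the same point,
# since dividing by the constant m rescales J without moving its argmin.
thetas = np.linspace(0.0, 4.0, 401)
best_ng = thetas[np.argmin([cost_ng(t) for t in thetas])]
best_mean = thetas[np.argmin([cost_mean(t) for t in thetas])]
print(best_ng, best_mean)
```

As far as I can tell, both versions report the same minimizing $\theta$, which suggests the $1\over m$ only rescales the cost (and hence the gradient magnitude), rather than changing the solution; I'd still like to understand the formal reasons for preferring one form.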