I am doing the Coursera Machine Learning course and there is something I am struggling with: gradient descent applied to linear regression.
So basically given the definition of the cost function (Mean Squared Error): $J(\theta) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left ( \hat{y}_{i}- y_{i} \right)^2 = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x_{i}) - y_{i} \right)^2$
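(A quick numerical sketch of this cost function, on a made-up dataset that lies exactly on the line $y = 1 + 2x$, so the cost at the true parameters is zero; the data and $\theta$ values are illustrative only:)

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Mean squared error J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = len(x)
    h = theta0 + theta1 * x          # hypothesis h_theta(x) = theta0 + theta1 * x
    return np.sum((h - y) ** 2) / (2 * m)

# Toy data lying exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

print(cost(1.0, 2.0, x, y))   # zero at the true parameters
print(cost(0.0, 0.0, x, y))   # positive anywhere else
```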
And the basic steps below (in a 1D case):
Repeat until convergence
- $temp0 = \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
- $temp1 = \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
- $\theta_0 = temp0$
- $\theta_1 = temp1$
At some point it is shown that:
- $\theta_0 = \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i})$
- $\theta_1 = \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right)$
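(In code, the two update rules above, including the simultaneous `temp0`/`temp1` assignment from the pseudocode, might look like this sketch; the data, learning rate and iteration count are made up:)

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        err = (theta0 + theta1 * x) - y               # h_theta(x_i) - y_i for all i
        temp0 = theta0 - alpha * np.sum(err) / m      # update using dJ/dtheta0
        temp1 = theta1 - alpha * np.sum(err * x) / m  # update using dJ/dtheta1
        theta0, theta1 = temp0, temp1                 # simultaneous update
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                  # data on the line y = 1 + 2x
print(gradient_descent(x, y))      # converges to roughly (1, 2)
```

Note that both `temp` values are computed from the *old* $\theta_0, \theta_1$ before either is overwritten, exactly as the pseudocode requires.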
With a bit of explanation, but based on a single training example ($m = 1$):
$\frac{\partial}{\partial \theta_j} J(\theta)=\frac{\partial}{\partial \theta_j} \frac{1}{2} (h_{\theta}(x) - y)^2$
$\Leftrightarrow 2 \frac{1}{2} (h_{\theta}(x) - y) \frac{\partial}{\partial \theta_j} (h_{\theta}(x) - y) = (h_{\theta}(x) - y) \frac{\partial}{\partial \theta_j}(\sum_{i=0}^{n} \theta_i x_i - y )$
$\Rightarrow (h_{\theta}(x) - y) x_j$
I don't really see how we get from step 1 to step 2. Any ideas?
I feel there is something missing, even if we substitute $h_\theta(x)=\theta_0 + \theta_1 x$. I mean, the partial derivative is taken with respect to $\theta_j$...
This is the chain rule of differentiation being applied. If you have something like $f(g(\theta_j))$, its (ordinary) derivative with respect to $\theta_j$ is $f'(g(\theta_j))g'(\theta_j)$, where $f'$ and $g'$ are the derivatives of $f$ and $g$.
The same applies here, where $f(t) = \frac{1}{2}t^2$ and $g(\theta_j) = h_{\theta_0, \dots, \theta_j, \dots, \theta_n}(x) - y$, with all the other parameters $\theta_{i\neq j}$ treated as fixed constants. It's just a little bit weird since $h$ is not written explicitly as a function of $\theta$.
We have $f'(t) = t$ and $g'(\theta_j) =\frac{\partial}{\partial \theta_j} (h_\theta(x) - y)$
Applying the chain rule:
$\frac{\partial}{\partial \theta_j} \frac{1}{2}(h_\theta(x) - y)^2 = f'(g(\theta_j))g'(\theta_j)\\ = g(\theta_j)g'(\theta_j)\\ = (h_\theta(x) - y)\frac{\partial}{\partial \theta_j} (h_\theta(x) - y)$