I'm currently taking Andrew Ng's course, and at one point he shows the partial derivatives of the cost function $\frac{1}{2m}\sum_{i=1}^{m}(H_\Theta(x^i)-y^i)^2$ with respect to both $\Theta_0$ and $\Theta_1$, but I couldn't wrap my mind around it. I would like to see a step-by-step derivation for both $\Theta$s.
The hypothesis function is defined as $H_\Theta(x)=\Theta_0+\Theta_1x$
And the partial derivatives are
For $\Theta_0$
$\frac{1}{m}\sum_{i=1}^{m}(H_\Theta(x^i)-y^i)$
For $\Theta_1$
$\frac{1}{m}\sum_{i=1}^{m}(H_\Theta(x^i)-y^i)x^i$
Suppose I have $u = x^2 + 1$ and $f(x)=u^2=(x^2+1)^2$. By the chain rule, the derivative is:
$\frac{df}{dx}= \frac{df}{du} * \frac{du}{dx}= 2(x^2+1) * 2x$
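As a quick sanity check (not part of the original argument), this SymPy snippet confirms that differentiating $(x^2+1)^2$ directly gives the same result as the chain-rule product $2(x^2+1)\cdot 2x$:

```python
import sympy as sp

x = sp.symbols('x')
u = x**2 + 1          # inner function
f = u**2              # f(x) = (x^2 + 1)^2

# Differentiate f directly, then compare with the chain-rule form
df_dx = sp.diff(f, x)
expected = 2 * (x**2 + 1) * 2 * x   # df/du * du/dx
assert sp.simplify(df_dx - expected) == 0
```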
Do the same with the machine learning problem: let $f = \frac{1}{2m}\sum_{i=1}^{m}(H_\Theta(x^i)-y^i)^2$ and focus on the inner function of a single term, $u = H_\Theta(x^i)-y^i$. Then:

$\frac{df}{du} = 2(H_\Theta(x^i)-y^i)$

Since $u = \Theta_0 + \Theta_1 x^i - y^i$:

For $\Theta_0$: $\frac{\partial u}{\partial \Theta_0} = 1$

For $\Theta_1$: $\frac{\partial u}{\partial \Theta_1} = x^i$

Finally, the factor $2$ from the chain rule cancels with the $2$ in $\frac{1}{2m}$, leaving $\frac{1}{m}$.
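To see that the resulting formulas are right, here is a small numeric check on a made-up dataset (the data values and $\Theta$ values are arbitrary, chosen only for illustration): the analytic gradients from the derivation are compared against central finite differences of the cost.

```python
import numpy as np

# Hypothetical small dataset and parameter values (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.2, 7.9])
theta0, theta1 = 0.5, 1.5
m = len(x)

def cost(t0, t1):
    # J = (1/2m) * sum (H(x) - y)^2
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * m)

# Analytic gradients from the derivation above
err = theta0 + theta1 * x - y
g0 = np.sum(err) / m          # dJ/dtheta0 = (1/m) sum (H(x^i) - y^i)
g1 = np.sum(err * x) / m      # dJ/dtheta1 = (1/m) sum (H(x^i) - y^i) x^i

# Central finite differences for comparison
eps = 1e-6
g0_num = (cost(theta0 + eps, theta1) - cost(theta0 - eps, theta1)) / (2 * eps)
g1_num = (cost(theta0, theta1 + eps) - cost(theta0, theta1 - eps)) / (2 * eps)
assert abs(g0 - g0_num) < 1e-6 and abs(g1 - g1_num) < 1e-6
```

If the $2$ did not cancel with the $2m$ (i.e. if you dropped the $\frac{1}{m}$ or kept the stray factor of $2$), the assertions would fail, which is a handy way to catch sign and scaling mistakes in hand-derived gradients.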