Partial derivative of MSE cost function in Linear Regression?

I'm confused by the multiple representations of the partial derivatives of the linear regression cost function.

This is the MSE cost function of linear regression, where $h_\theta(x) = \theta_0+\theta_1x$.

\begin{aligned}J(\theta_0,\theta_1) &= \frac{1}{m}\displaystyle\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})^2\\J(\theta_0,\theta_1) &= \frac{1}{m}\displaystyle\sum_{i=1}^m(\theta_0 + \theta_1x^{(i)} - y^{(i)})^2\end{aligned}

Are these the correct partial derivatives of the above MSE cost function with respect to $\theta_1$ and $\theta_0$? If there is a mistake, please correct me.

\begin{aligned}\frac{dJ}{d\theta_1} &= \frac{-2}{m}\displaystyle\sum_{i=1}^m(x^{(i)}).(\theta_0 + \theta_1x^{(i)} - y^{(i)})\\ \frac{dJ}{d\theta_0} &= \frac{-2}{m}\displaystyle\sum_{i=1}^m(\theta_0 + \theta_1x^{(i)} - y^{(i)})\end{aligned}

Best answer

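The derivatives are almost correct, but the leading minus sign should be a plus sign. By the chain rule, the inner term $\theta_0 + \theta_1x^{(i)} - y^{(i)}$ has derivative $+1$ with respect to $\theta_0$ and $+x^{(i)}$ with respect to $\theta_1$, so no minus sign arises. A factor of $-1$ appears only if we write the residual the other way around and differentiate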

$$J = \dfrac{1}{m}\sum_{i=1}^m\left[y_i-\theta_0-\theta_1 x_i\right]^2$$

If we calculate the partial derivatives we obtain

$$\dfrac{\partial J}{\partial \theta_0}=\frac{2}{m}\sum_{i=1}^{m}[y_i-\theta_0-\theta_1x_i]\cdot\left[-1 \right]$$

$$\dfrac{\partial J}{\partial \theta_1}=\frac{2}{m}\sum_{i=1}^{m}[y_i-\theta_0-\theta_1x_i]\cdot\left[-x_i \right]$$
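As a sanity check on the sign, the analytic partial derivatives can be compared against finite differences of $J$. The following is a minimal sketch in plain Python; the function names (`cost`, `gradient`, `numerical_gradient`) and the four data points are made up for illustration and do not come from the question.

```python
def cost(theta0, theta1, xs, ys):
    """MSE cost J = (1/m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / m


def gradient(theta0, theta1, xs, ys):
    """Analytic partials with a plus sign: (2/m) * sum(residual * inner derivative)."""
    m = len(xs)
    # Residual written as prediction minus target, as in the question.
    residuals = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d_theta0 = 2.0 / m * sum(residuals)
    d_theta1 = 2.0 / m * sum(r * x for r, x in zip(residuals, xs))
    return d_theta0, d_theta1


def numerical_gradient(theta0, theta1, xs, ys, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    d0 = (cost(theta0 + eps, theta1, xs, ys)
          - cost(theta0 - eps, theta1, xs, ys)) / (2 * eps)
    d1 = (cost(theta0, theta1 + eps, xs, ys)
          - cost(theta0, theta1 - eps, xs, ys)) / (2 * eps)
    return d0, d1


# Hypothetical illustration data (not from the original post).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 5.0]

analytic = gradient(0.5, -1.0, xs, ys)
numeric = numerical_gradient(0.5, -1.0, xs, ys)
print("analytic:", analytic)
print("numeric: ", numeric)
```

If the two printed pairs agree, the plus-sign formulas match the slope of $J$ at that point; with a leading $-2/m$ they would have the opposite sign.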

In order to find the extremum of the cost function $J$ (we seek to minimize it) we need to set these partial derivatives equal to $0$:

$$\dfrac{\partial J}{\partial \theta_0}=\frac{2}{m}\sum_{i=1}^{m}[y_i-\theta_0-\theta_1x_i]\cdot\left[-1 \right]=0$$

$$\implies \sum_{i=1}^{m}[y_i-\theta_0-\theta_1x_i]=0$$

$$\dfrac{\partial J}{\partial \theta_1}=\frac{2}{m}\sum_{i=1}^{m}[y_i-\theta_0-\theta_1x_i]\cdot\left[-x_i \right]=0$$

$$\implies \sum_{i=1}^{m}[y_i-\theta_0-\theta_1x_i]\cdot\left[x_i\right] = 0.$$

Dividing both equations by $-2/m$ gives the two conditions above. Had the factor been $+2/m$, we would divide by $2/m$ and still obtain the same equations. Since the equations that we need to solve are identical, their solutions are identical as well.
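To illustrate that the sign convention does not affect the minimizer, the two conditions can be solved in closed form and the residual sums checked directly. This is a rough sketch assuming the standard closed-form solution of the two linear equations for simple linear regression; `fit_line` and the data are hypothetical names and values chosen for the example.

```python
def fit_line(xs, ys):
    """Solve sum(y - t0 - t1*x) = 0 and sum((y - t0 - t1*x) * x) = 0 for (t0, t1)."""
    m = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = m * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; the slope is not determined")
    theta1 = (m * sxy - sx * sy) / denom
    theta0 = (sy - theta1 * sx) / m
    return theta0, theta1


# Hypothetical illustration data (not from the original post).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 5.0]

theta0, theta1 = fit_line(xs, ys)

# Both conditions should hold up to floating-point rounding.
sum_residuals = sum(y - theta0 - theta1 * x for x, y in zip(xs, ys))
sum_residuals_x = sum((y - theta0 - theta1 * x) * x for x, y in zip(xs, ys))

print("theta0, theta1:", theta0, theta1)
print("sum of residuals:", sum_residuals)
print("sum of residuals * x:", sum_residuals_x)
```

Because both sums are zero at the fitted parameters, multiplying them by $2/m$ or by $-2/m$ gives zero either way, which is why the two sign conventions lead to the same solution.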