How to calculate the partial derivatives of m and b using a cost function?


I've been diving into some beginner calculus because of my interest in machine learning, mostly as a hobby. That said, I'm having trouble understanding how the slope and y-intercept of a linear regression are derived in an example from the book *Essential Math for Data Science*.

# points is a list of (x, y) samples with .x and .y attributes;
# a minimal stand-in so the snippet runs:
from collections import namedtuple
Point = namedtuple("Point", ["x", "y"])
points = [Point(1, 5), Point(2, 7), Point(3, 9)]

m = 0.0  # slope
b = 0.0  # y-intercept

L = .002  # learning rate

iterations = 100_000

n = float(len(points))  # sample count (unused here: the gradients below are sums, not means)

for i in range(iterations):
    # partial derivative of the squared-error cost with respect to m
    D_m = sum(2 * p.x * ((m * p.x + b) - p.y) for p in points)

    # partial derivative of the squared-error cost with respect to b
    D_b = sum(2 * ((m * p.x + b) - p.y) for p in points)

    m -= L * D_m
    b -= L * D_b

print("y = {0}x + {1}".format(m, b))

From what I know of the power rule, if we have a function of two variables and want the partial derivative with respect to one of them, we differentiate with respect to that variable while treating the other as a constant, whose derivative is 0.

f(x, y) = x^2 + y^3

∂f/∂x = 2x + 0 = 2x

∂f/∂y = 0 + 3y^2 = 3y^2
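The partials of f(x, y) = x^2 + y^3 can be sanity-checked numerically with central finite differences, holding one variable fixed while nudging the other (the sample point (2, 3) is arbitrary):

```python
# numerical check of the partials of f(x, y) = x**2 + y**3
def f(x, y):
    return x**2 + y**3

h = 1e-6          # small step for the finite difference
x0, y0 = 2.0, 3.0  # arbitrary point to check at

# central differences: vary one variable, hold the other constant
df_dx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)  # should be close to 2*x0 = 4
df_dy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)  # should be close to 3*y0**2 = 27

print(df_dx, df_dy)
```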

For the code above, why do we multiply by x twice in D_m but only once in D_b?


The cost function is
$$\phi(m,b) = \sum_n \left[ (b + m x_n) - y_n \right]^2 .$$
By the chain rule, the partial derivative with respect to $m$ is
$$ \frac{\partial \phi}{\partial m} \doteq \phi_m = 2 \sum_n x_n \left[ (b + m x_n) - y_n \right], $$
because the inner derivative of $(b + m x_n) - y_n$ with respect to $m$ is $x_n$. With respect to $b$ the inner derivative is just $1$, so
$$ \phi_b = 2 \sum_n \left[ (b + m x_n) - y_n \right], $$
which is why $x_n$ appears in $D_m$ but not in $D_b$. Gradient descent for the linear regression problem then applies the updates
$$ m_{k+1} = m_k - \eta \, \phi_m(m_k, b_k), \qquad b_{k+1} = b_k - \eta \, \phi_b(m_k, b_k), $$
where $\eta$ is the learning rate. I guess now you can clearly identify which part of the code these equations belong to.
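As a check that the analytic gradients match the cost function, one can compare them against numerical finite differences of $\phi$; a minimal sketch, with made-up sample points and an arbitrary test point $(m_0, b_0)$:

```python
# compare the analytic partials phi_m, phi_b against numerical differences of the cost
points = [(1.0, 5.0), (2.0, 7.0), (3.0, 9.5)]  # made-up (x, y) samples

def cost(m, b):
    # phi(m, b) = sum_n [(b + m*x_n) - y_n]**2
    return sum(((b + m * x) - y) ** 2 for x, y in points)

def phi_m(m, b):
    # analytic partial wrt m: 2 * sum_n x_n * [(b + m*x_n) - y_n]  (the code's D_m)
    return sum(2 * x * ((m * x + b) - y) for x, y in points)

def phi_b(m, b):
    # analytic partial wrt b: 2 * sum_n [(b + m*x_n) - y_n]  (the code's D_b)
    return sum(2 * ((m * x + b) - y) for x, y in points)

h = 1e-6
m0, b0 = 0.3, 0.1  # arbitrary point at which to compare

num_m = (cost(m0 + h, b0) - cost(m0 - h, b0)) / (2 * h)
num_b = (cost(m0, b0 + h) - cost(m0, b0 - h)) / (2 * h)

print(num_m, phi_m(m0, b0))  # the two columns should agree to several decimals
print(num_b, phi_b(m0, b0))
```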