I've been diving into some beginner calculus because of my interest in machine learning, mostly as a hobby. That said, I'm having trouble understanding how the slope and y-intercept of a linear regression are derived, based on an example in the book Essential Math for Data Science.
m = 0.0  # slope
b = 0.0  # y-intercept
L = 0.002  # learning rate
iterations = 100_000
n = float(len(points))  # points is a list of objects with .x and .y attributes

for i in range(iterations):
    # Partial derivatives of the sum-of-squared-errors cost
    D_m = sum(2 * p.x * ((m * p.x + b) - p.y) for p in points)
    D_b = sum(2 * ((m * p.x + b) - p.y) for p in points)
    m -= L * D_m
    b -= L * D_b

print("y = {0}x + {1}".format(m, b))
From what I know of the power rule, when we take the partial derivative of a function of two variables, we differentiate with respect to each variable separately, treating the other as a constant whose derivative is 0. For example:

f(x, y) = x^2 + y^3
df/dx, wrt x = 2x + 0
df/dy, wrt y = 3y^2 + 0
For the code above, why do we multiply by x twice in D_m but only once in D_b?
The cost function is $$\phi(m,b) = \sum_n \left[ (b+mx_n)-y_n \right]^2 $$ By the chain rule, the gradient wrt $m$ is $$ \frac{\partial \phi}{\partial m} \doteq \phi_m = 2 \sum_n x_n \left[ (b+mx_n)-y_n \right] $$ where the extra factor $x_n$ comes from differentiating the inner term $(b+mx_n)$ with respect to $m$. Differentiating with respect to $b$, the inner derivative is $1$ instead of $x_n$, so $$ \frac{\partial \phi}{\partial b} \doteq \phi_b = 2 \sum_n \left[ (b+mx_n)-y_n \right] $$ That is why $x$ appears twice in D_m (once from the prediction $mx_n$, once from the chain rule) but only once in D_b. Gradient descent to solve the linear regression problem then applies the updates $$ m_{k+1} = m_{k} - \eta \ \phi_m (m_{k}, b_{k}), \qquad b_{k+1} = b_{k} - \eta \ \phi_b (m_{k}, b_{k}) $$ where $\eta$ is the learning rate. Now you can clearly identify which part of the code these equations belong to: $\phi_m$ is D_m, $\phi_b$ is D_b, and $\eta$ is L.
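If you want to convince yourself the analytic derivatives are right, here is a minimal sketch that checks them against a central finite-difference approximation of the cost. The sample points are made up for illustration, not taken from the book:

```python
# Sanity check: compare the analytic gradients phi_m and phi_b
# against a numerical (central finite-difference) approximation.
# The (x, y) points below are hypothetical example data.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]

def cost(m, b):
    # phi(m, b) = sum_n [(b + m*x_n) - y_n]^2
    return sum(((m * x + b) - y) ** 2 for x, y in points)

def grad(m, b):
    # Analytic partials: the chain rule brings down x_n for m, and 1 for b.
    d_m = sum(2 * x * ((m * x + b) - y) for x, y in points)
    d_b = sum(2 * ((m * x + b) - y) for x, y in points)
    return d_m, d_b

def numeric_grad(m, b, h=1e-6):
    # Central differences: (phi(p + h) - phi(p - h)) / (2h)
    d_m = (cost(m + h, b) - cost(m - h, b)) / (2 * h)
    d_b = (cost(m, b + h) - cost(m, b - h)) / (2 * h)
    return d_m, d_b

m, b = 0.5, 0.5
print(grad(m, b))          # analytic gradient
print(numeric_grad(m, b))  # numerical approximation, should agree closely
```

The two printed pairs should match to several decimal places, which confirms that D_m and D_b in the original code are the correct partial derivatives of the squared-error cost.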