Linear regression using gradient descent: is the whole weight vector updated with the same number?


I'm using gradient descent with mean squared error (MSE) as the loss function to do linear regression. Take a look at the equations first (reconstructed here from the original image):

$$\hat{y}_i = W \cdot x_i + b \tag{1}$$

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 \tag{2}$$

$$\frac{\partial L}{\partial W} = \frac{2}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)x_i, \qquad \frac{\partial L}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right) \tag{3}$$

$$W \leftarrow W - \alpha\,\frac{\partial L}{\partial W}, \qquad b \leftarrow b - \alpha\,\frac{\partial L}{\partial b} \tag{4}$$

As you can see in eq. 1, the prediction uses a bias term $b$ and a weight vector $W$. Eq. 2 is the error function (MSE), while eq. 3 gives the partial derivatives used to update the weights (eq. 4). My question is: should all the weights in the weight vector be updated by the same number each iteration? It seems like eq. 3 should return a single number, not a vector.


2 Answers

BEST ANSWER

The other answers may well be correct, but I did not understand them.

Looking at the left-hand equation in eq. 3 of the original post, $x_i$ is a vector and not a scalar, which makes $\partial L/\partial W$ a vector as well; this is what I had misunderstood at first. $\alpha$ remains a scalar, but the weights in the $W$ vector are updated by different amounts each iteration.
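To make this concrete, here is a minimal NumPy sketch of eqs. 1-4 (the data, the true weights, and the learning rate are made-up values for illustration). Note that `dW` is a length-$p$ vector, so each component of `W` moves by a different amount:

```python
import numpy as np

# Synthetic data (assumed shapes: X is n-by-p, y has length n).
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
true_W = np.array([1.0, -2.0, 0.5])
y = X @ true_W + 4.0          # true bias is 4.0

W = np.zeros(p)
b = 0.0
alpha = 0.1                   # scalar learning rate

for _ in range(500):
    y_hat = X @ W + b                     # eq. 1, vectorized over all samples
    residual = y_hat - y
    dW = (2.0 / n) * X.T @ residual       # eq. 3: a length-p VECTOR, one entry per weight
    db = (2.0 / n) * residual.sum()       # eq. 3: a scalar for the bias
    W -= alpha * dW                       # eq. 4: each weight gets its own step
    b -= alpha * db

print(W, b)  # should approach [1, -2, 0.5] and 4.0
```

Printing `dW` inside the loop shows that its entries differ, which is exactly why the weights diverge from each other over iterations even though they all start at zero.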


Equations are usually written in this form when you're working in a neural-network kind of setting, where the bias term is also a vector.

In the case of linear regression, since the bias term is a single scalar, a more intuitive way to look at these equations is to rewrite the summation as $\sum_{j=0}^{p}W_jx_{ij}$, treating $x_{i0}$ as 1 and $b$ as $W_0$.

Programmatically speaking, this translates to padding your data matrix with a column of 1's to the left. You'll find this being done often in a bunch of tutorials that walk you through linear regression!

This gets rid of the second gradient equation and reduces it to only updating W with each iteration. Hope this helps :)
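The padding trick above can be sketched as follows (the data and learning rate are illustrative assumptions, not from the original post). After prepending a column of ones, the first component of `W` plays the role of the bias and a single update rule suffices:

```python
import numpy as np

# Synthetic data for illustration.
rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 4.0   # true weights and a bias of 4.0

X_aug = np.hstack([np.ones((n, 1)), X])    # set x_{i0} = 1 for every sample
W = np.zeros(p + 1)                        # W[0] now plays the role of b
alpha = 0.1

for _ in range(500):
    residual = X_aug @ W - y
    W -= alpha * (2.0 / n) * X_aug.T @ residual   # one update, no separate bias step

print(W)  # W[0] should approach the bias, W[1:] the weights
```

This is behaviorally identical to keeping `b` separate; it just folds the second gradient equation into the first.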