Vectorizing Regularization in Linear Regression


I am wondering if someone could elaborate on the vectorized partial derivative of the MSE cost function. While writing code, I noticed that the partial derivative terms my class was outputting seemed wrong. I used the following formulas:

$$\frac{\partial J(\theta)}{\partial \theta}=\frac{1}{m}X^T(X\theta-y) \in \mathbb{R}^{n+1}$$ $$\theta := \theta-\alpha \frac{\partial J(\theta)}{\partial \theta}$$
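In code, the update I implemented looks like the following sketch (the dataset here is a small made-up example just to illustrate the vectorized step, not the data from my worked example):

```python
import numpy as np

# Small made-up dataset: m = 3 samples, intercept column plus one feature
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
m = X.shape[0]

alpha = 0.1
theta = np.zeros(2)

for _ in range(5000):
    grad = (1 / m) * X.T @ (X @ theta - y)  # (1/m) X^T (X theta - y)
    theta = theta - alpha * grad            # gradient-descent update

print(theta)  # should approach the least-squares solution
```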

Below, I have attached my work through the first iteration, where $\theta_0$ increases despite already being at an ideal value ($0$). Could someone explain where my error is: either in the assumption that $J(\theta)$ always decreases, or in my math/formulas? Thanks.

[Image: my hand-worked first gradient-descent iteration]

There is 1 answer below.

BEST ANSWER

The first coordinate need not stay at $0$, and in this case it will not, since that coordinate of the gradient is non-zero. It will still converge back to $0$ if the step size is chosen carefully.

We have $\theta_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

$\theta_1 = \theta_0 - \alpha\begin{bmatrix} -\frac52 \\ -\frac{15}2 \end{bmatrix}$

In general, absorbing the constant factor $\frac{1}{m}$ into the step size $\alpha$,

\begin{align}\theta_{k+1} &= \theta_k -\alpha X^T(X\theta_k - y) \\ &=(I-\alpha X^TX)\theta_k + \alpha X^Ty \\ &= \begin{bmatrix} 1-4\alpha & -10\alpha \\ -10\alpha & 1-30\alpha \end{bmatrix} \theta_k +\alpha \begin{bmatrix} 10 \\ 30\end{bmatrix}\end{align}
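As a sanity check (assuming, as the iteration matrix above implies, $X^TX = \begin{bmatrix} 4 & 10 \\ 10 & 30 \end{bmatrix}$ and $X^Ty = \begin{bmatrix} 10 \\ 30 \end{bmatrix}$, since $X$ and $y$ themselves are not shown), the fixed point of this iteration is the least-squares solution, whose first coordinate is indeed $0$:

```python
import numpy as np

# Values implied by the iteration matrix above (an assumption, since
# X and y themselves are not shown)
XtX = np.array([[4.0, 10.0], [10.0, 30.0]])
Xty = np.array([10.0, 30.0])

# The fixed point theta* of theta <- (I - alpha*XtX) theta + alpha*Xty
# satisfies XtX @ theta* = Xty, i.e. the normal equations.
theta_star = np.linalg.solve(XtX, Xty)
print(theta_star)  # theta* = [0, 1]
```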

We have to choose the step size carefully so that the iteration converges. Let me pick $\alpha = 0.01$, for which the spectral radius of the iteration matrix is less than $1$:

octave:1> alpha = 0.01
alpha =  0.010000
octave:2> eig([1-4*alpha, -10*alpha; -10*alpha, 1-30*alpha])
ans =

   0.66599
   0.99401

Using the following Python code, we can see the progress:

import numpy as np

alpha = 0.01
# Iteration matrix I - alpha*X^T X and offset alpha*X^T y from above
A = np.array([[1 - 4*alpha, -10*alpha], [-10*alpha, 1 - 30*alpha]])
b = np.array([10*alpha, 30*alpha])
theta = np.array([0.0, 0.0])

for i in range(1000):
    theta = A @ theta + b  # one gradient-descent step
    if i % 100 == 0:
        print(theta)

and we get the following output:

[0.1 0.3]
[0.16620987 0.94346837]
[0.09116498 0.96899279]
[0.05000337 0.98299276]
[0.02742651 0.99067164]
[0.01504325 0.99488346]
[0.00825112 0.99719361]
[0.00452568 0.99846072]
[0.00248231 0.99915571]
[0.00136153 0.99953691]
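The iterates approach the fixed point $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and the tail of the convergence is governed by the dominant eigenvalue $\approx 0.994$ computed above: each block of $100$ iterations shrinks the error by roughly $0.994^{100} \approx 0.55$, which matches the ratio of successive printed errors (e.g. $0.00136/0.00248 \approx 0.55$). A quick check of that rate:

```python
import numpy as np

# Dominant eigenvalue of the iteration matrix governs the convergence rate:
# every 100 iterations the error shrinks by about lam**100.
alpha = 0.01
A = np.array([[1 - 4*alpha, -10*alpha], [-10*alpha, 1 - 30*alpha]])
lam = max(abs(np.linalg.eigvals(A)))
print(lam)       # ~0.99401, matching the Octave output above
print(lam**100)  # ~0.55: error-shrink factor per 100 iterations
```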