Do gradient descent and the normal equation give the same answer?


I tried to fit a linear regression model using both approaches, and they gave me two completely different answers.

My sample data set was:

df <- data.frame(c(1,5,6),c(3,5,6),c(4,6,8))

Here's the R code I used to implement gradient descent; perhaps someone can point out the error to me:

lm_gradient_descent2 <- function(df, learning_rate, y_col = length(df), scale = TRUE) {

  n_features <- length(df)  # number of columns in the data set, response included

  # mean normalization; note this scales the response column too, so the
  # parameters returned are in standardized units
  if (scale == TRUE) {
    for (i in 1:n_features) {
      df[, i] <- (df[, i] - mean(df[, i])) / sd(df[, i])
    }
  }

  y_data <- df[, y_col]
  df[, y_col] <- NULL
  par <- rep(1, n_features)
  df <- merge(1, df)  # prepend a column of ones for the intercept term
  data_mat <- data.matrix(df)

  # temp_arr stores the new parameter values so that all of them are updated
  # simultaneously from the same current `par`
  temp_arr <- rep(0, n_features)
  diff <- 1
  while (diff > 1e-7) {
    for (i in 1:n_features) {
      temp_arr[i] <- par[i] - learning_rate * sum((data_mat %*% par - y_data) * df[, i]) / length(y_data)
    }
    diff <- max(abs(par - temp_arr))  # largest change across all parameters, not just the first
    par <- temp_arr
  }

  return(par)
}
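One detail about the scaling step that may matter here: when the predictors are standardized before gradient descent, the fitted parameters come back in standardized units and have to be mapped back before they can be compared with the normal equation on the raw data. A minimal self-contained sketch of this (my own vectorized version, not the function above, with a fixed step size and iteration count that I picked for this tiny data set; the column names `x1`, `x2` are just my labels):

```r
# Standardize the two predictors, run vectorized gradient descent,
# then map the fitted parameters back to the original units.
x <- as.matrix(data.frame(x1 = c(1, 5, 6), x2 = c(3, 5, 6)))
y <- c(4, 6, 8)
mu <- colMeans(x)
s  <- apply(x, 2, sd)
Z  <- cbind(1, sweep(sweep(x, 2, mu), 2, s, "/"))  # intercept column + z-scores

theta <- rep(0, ncol(Z))
alpha <- 0.5  # step size chosen for this standardized 3-point data set
for (i in 1:10000) {
  grad  <- t(Z) %*% (Z %*% theta - y) / length(y)  # gradient of mean squared error
  theta <- theta - alpha * grad
}

# Convert back to the scale of the raw data
b <- theta[-1] / s
a <- theta[1] - sum(theta[-1] * mu / s)
c(a, b)  # now comparable with the normal equation on the raw design matrix
```

Only after that back-conversion should the two approaches be expected to agree.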

When I used Excel's built-in regression to check, it gave the same answer as the normal equation approach, so my guess is that something is wrong with my gradient descent calculation.
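For reference, here is the normal-equation fit I am comparing against, computed directly on the raw (unscaled) sample data, with a cross-check against R's built-in `lm()` (the column names `x1`, `x2`, `y` are just my labels for the three columns):

```r
# Normal equation on the raw sample data
df <- data.frame(x1 = c(1, 5, 6), x2 = c(3, 5, 6), y = c(4, 6, 8))
X <- cbind(1, as.matrix(df[, c("x1", "x2")]))  # design matrix with intercept column
y <- df$y
theta <- solve(t(X) %*% X, t(X) %*% y)  # (X'X)^{-1} X'y
theta
coef(lm(y ~ x1 + x2, data = df))  # built-in least squares, same coefficients
```

With three observations and three parameters the fit is exact: intercept -4 and slopes -1 and 3.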