Understanding gradient descent learning rate for curve fitting


For a piece of code I'm writing, I'm trying to implement gradient descent to fit data to a formula. To turn the dataset and the formula into a function I can minimize, I use the sum of squared residuals. I approximate the derivative numerically, using dy/dx with a delta of 1e-4. To keep complexity low, I initially chose gradient descent over Newton's method or Levenberg-Marquardt, mainly to avoid matrix inversion.
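The derivative approximation described here can be sketched as a small helper. This is a hypothetical Python version (my code isn't shown here, so the names below are mine):

```python
# A minimal sketch (names are mine) of the forward-difference
# derivative approximation described above, with delta = 1e-4.
def numeric_partial(f, params, i, delta=1e-4):
    """Approximate the partial derivative of f w.r.t. params[i]."""
    bumped = list(params)
    bumped[i] += delta
    return (f(bumped) - f(params)) / delta

# Example with a known gradient: f(a, b) = a^2 + 3b, grad at (2, 1) is (4, 3).
f = lambda p: p[0]**2 + 3*p[1]
grad = [numeric_partial(f, [2.0, 1.0], i) for i in range(2)]
```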

The formula I'd like to fit is $y(x) = a + (b \cdot x) + (c \cdot x^2)$

My example dataset looks like this:

X        Y
19.666   -0.695
27.777   -1.050
38.9027  -2.075

From this dataset I derived the formula: $$ y(a,b,c) = (-0.695-(a+(19.666 \cdot b)+(19.666^2 \cdot c)))^2 + \\ (-1.050-(a+(27.777 \cdot b)+(27.777^2 \cdot c)))^2 + \\ (-2.075-(a+(38.9027 \cdot b)+(38.9027^2 \cdot c)))^2 $$
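That expanded sum of squares translates directly into code; here is a minimal Python sketch (the name `lsq` is mine). Evaluated at the starting point $(0,0,0)$ it gives $0.695^2 + 1.050^2 + 2.075^2 = 5.89115$:

```python
# Sum-of-squares objective for y(x) = a + b*x + c*x^2 over the
# three data points above (the name lsq is mine).
xs = [19.666, 27.777, 38.9027]
ys = [-0.695, -1.050, -2.075]

def lsq(params):
    a, b, c = params
    return sum((y - (a + b*x + c*x*x))**2 for x, y in zip(xs, ys))

print(lsq([0.0, 0.0, 0.0]))  # ≈ 5.89115
```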

I'm starting my gradient descent at $(0,0,0)$. During the iterations, I notice that the chosen learning rate appears to be much too big for parameter c, but more appropriate for parameters a and b. With a learning rate of 1e-5, I'm seeing this output:

A            B             C            Lsq-y     StepA        StepB        StepC
7.6403       246.684       8713.64      5.89115   7.6403e-5    0.00246684   0.0871364
-7.6403e-5   -0.00246684   -0.0871364   22245.2   -0.0045501   -0.149888    5.1814

Because the Lsq-y value increases, I stop execution at this point: the iteration is diverging.

I'm wondering whether gradient descent is an appropriate method here, and whether there are any errors in my approach. I've played around with the learning rate, but haven't found one that converges within, say, one million iterations.
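Putting the pieces together, a minimal hypothetical version of the loop (Python; all names are mine) reproduces the behaviour: with a learning rate of 1e-5, the first step in c overshoots so badly that the objective jumps from about 5.9 into the tens of thousands, and each following step makes it worse.

```python
# Minimal gradient-descent loop as described in the question
# (names are mine): forward-difference gradient, learning rate 1e-5.
xs = [19.666, 27.777, 38.9027]
ys = [-0.695, -1.050, -2.075]

def lsq(p):
    a, b, c = p
    return sum((y - (a + b*x + c*x*x))**2 for x, y in zip(xs, ys))

def grad(p, delta=1e-4):
    base = lsq(p)
    g = []
    for i in range(len(p)):
        q = list(p)
        q[i] += delta
        g.append((lsq(q) - base) / delta)
    return g

p, lr = [0.0, 0.0, 0.0], 1e-5
history = [lsq(p)]          # starts near 5.89115
for _ in range(2):
    g = grad(p)
    p = [pi - lr*gi for pi, gi in zip(p, g)]
    history.append(lsq(p))
# The partial derivative w.r.t. c (~8700 at the start) is roughly a
# thousand times larger than the one w.r.t. a, so a learning rate
# small enough for c makes a and b crawl, and vice versa.
```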

1 Answer
There are at least a few possible approaches to this problem. If you do not need to write all of your own code from scratch, a straightforward approach is to use R's polyreg package. The line below the "(Intercept)" header in the output shows the fitted coefficients.

#Written in R version 4.2.1, polyreg version 0.8.0
require(polyreg)
x = c(19.666, 27.777, 38.9027)   #data points used for fit
y = c(-.695, -1.050, -2.075)
df = data.frame(x, y)

polyFit(df, deg = 2)   #polynomial regression function
## getPoly time: 0
## lm() time: 0.02
## Call:
## lm(formula = y ~ ., data = plm.xy)
## Coefficients:
## (Intercept)     `I(x^1)`     `I(x^2)`
##   -1.207575     0.075505    -0.002514

#code to plot the points and polynomial fit
xx = seq(0, 50, .1)
nf = function(x) -1.207575 + 0.075505*x - 0.002514*x^2
plot(x, y)
lines(xx, nf(xx))

(plot of the three data points with the fitted quadratic curve)

If you wish to compute the polynomial fit in your own code, you can implement polynomial regression directly. One rendition of this approach is given in Numerical Methods for Engineers by Chapra and Canale, 7th ed., 2015, p. 473; it is likely described in other texts on numerical methods as well.
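For that do-it-yourself route, a quadratic least-squares fit reduces to solving the 3×3 normal equations $A^T A \beta = A^T y$. Here is a sketch in Python/NumPy rather than R (an assumption on my part; note that with three points and three coefficients the "fit" interpolates the data exactly, which is why it reproduces polyFit's coefficients):

```python
import numpy as np

x = np.array([19.666, 27.777, 38.9027])
y = np.array([-0.695, -1.050, -2.075])

# Design matrix with columns [1, x, x^2]
A = np.vander(x, 3, increasing=True)

# Solve the normal equations (A^T A) beta = A^T y for beta = [a, b, c]
beta = np.linalg.solve(A.T @ A, A.T @ y)
print(beta)  # approximately [-1.207575, 0.075505, -0.002514]
```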