For a piece of code I'm writing, I'm trying to implement gradient descent. I want to use gradient descent to fit data to a formula. To get from a dataset and the formula to a function I can minimize, I use sum of squares. To calculate the derivative of the function, I approximate using dy/dx with a delta of 1e-4. To keep complexity low, I initially chose gradient descent over Newton or Levenberg-Maquardt. Mainly to get around matrix inversion.
The formula I'd like to fit is $y(x) = a + (b \cdot x) + (c \cdot x^2)$
My example dataset looks like this:
| X | Y |
|---|---|
| 19.666 | -0.695 |
| 27.777 | -1.050 |
| 38.9027 | -2.075 |
From this dataset I derived the formula: $$ y(a,b,c) = (-0.695-(a+(19.666 \cdot b)+(19.666^2 \cdot c)))^2 + \\ (-1.050-(a+(27.777 \cdot b)+(27.777^2 \cdot c)))^2 + \\ (-2.075-(a+(38.9027 \cdot b)+(38.9027^2 \cdot c)))^2 $$
I'm starting my gradient descent at $(0,0,0)$. During iterations, I notice that the chosen learning rate appears to be much to big for parameter c, but more appropriate for parameters a and b. With a learning rate of 1e-5, I'm seeing this output:
| A | B | C | Lsq-y | StepA | Step B | Step C |
|---|---|---|---|---|---|---|
| 7.6403 | 246.684 | 8713.64 | 5.89115 | 7.6403e-5 | 0.00246684 | 0.0871364 |
| -7.6403e-5 | -0.00246684 | -0.0871364 | 22245.2 | -0.0045501 | -0.149888 | 5.1814 |
Because the Lsq-y value increases, I stop the execution at this point. I'm diverging.
I'm wondering if gradient descent is the appropriate method, if there are any errors in my approach? I've played around with the learning rate, but haven't found a learning rate that works within say 1 million iterations.
There are at least a few possible approaches to this problem. If you do not require writing all of your own code from the bottom up, a straightforward approach is to use R's polyreg extension. The line after the word "(Intercept)" shows the coefficients given by the output.
If you wish to calculate the polynomial fit with your own code, you can write your own polynomial regression. One rendition of this approach is given in Numerical Methods for Engineers by Chapra and Canale, 7th ed., 2015, page 473. It is likely also described in other texts on numerical methods.