Nonlinear approximation with a squared-exponential function


I have collected a few measurements ($x$ and $y$ coordinates) and would like to know how to fit the data with this function: $$ y = a \cdot e ^ {b (x + c) ^ 2} $$

I have already done a linear regression, but I am interested in a regression with this exponential function. It must somehow be possible to reduce the exponential model to a linear or polynomial case and determine the parameters. With the ordinary exponential function, $$ y=a \cdot e ^ {bx} $$ regression is possible; I have tested that. But how do I proceed with the squared exponential?



Based on $n$ data points $(x_i,y_i)$, you want to adjust parameters $a,b,c$ of the model $$y = a \, e ^ {b \,(x + c) ^ 2}$$ As you noticed, the problem is $c$.

So, in a preliminary step, fix $c$ at a given value and define $t_i=(x_i+c)^2$. So $y=a e^{b t}$ and taking logarithms $$\log(y)=\log(a)+ bt=\alpha +b t$$ So, define $z_i=\log(y_i)$ and perform a linear regression which gives, for this value of $c$, $\alpha$ and $b$ and then $a=e^\alpha$. With these, compute the $y_i^{calc}$ and the sum of squares $$SSQ(c)=\sum_{i=1}^n(y_i^{calc}-y_i)^2$$

Now, vary $c$ and plot $SSQ(c)$ and try to find a place where $SSQ(c)$ is close to a minimum. At that approximate value, you then know $a,b$ and you are ready for a nonlinear regression since you now have good estimates for each of the three parameters you are looking for.

If you have no nonlinear regression tool, just continue the procedure using smaller and smaller steps for parameter $c$.

You can do all of that using Excel.
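Alternatively, the whole scan can be scripted. Below is a minimal Python sketch of the procedure (NumPy assumed), using the example data set given further down in this answer; for each trial value of $c$ it performs the linearized fit and evaluates $SSQ(c)$ on the untransformed data:

```python
import numpy as np

# Example data set used later in this answer.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([630, 360, 230, 170, 140, 120, 130], dtype=float)

def fit_for_c(c):
    """Fix c, set t = (x+c)^2 and z = log(y), fit z = alpha + b*t,
    then return SSQ(c) computed on the original (untransformed) y."""
    t = (x + c) ** 2
    b, alpha = np.polyfit(t, np.log(y), 1)   # slope b, intercept alpha
    a = np.exp(alpha)
    y_calc = a * np.exp(b * t)
    return np.sum((y_calc - y) ** 2), a, b

# Scan c on a fine grid and keep the value that minimizes SSQ(c).
cs = np.arange(-5.0, -1.0, 0.001)
best_c = min(cs, key=lambda c: fit_for_c(c)[0])
_, a, b = fit_for_c(best_c)
print(best_c, a, b)
```

The grid bounds and step here are arbitrary choices; in practice one would start coarse and refine around the minimum, exactly as described above.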

Edit

You can get an approximate solution if you select from your data three points in arithmetic progression that is to say $x_1$, $x_2=x_1+d$, $x_3=x_1+2d$. Using $$\log(y_i)=\log(a)+b(x_i+c)^2$$ then $$\log(y_2)-\log(y_1)=b d (2 c+d+2 x_1)$$ $$\log(y_3)-\log(y_1)=4 b d (c+d+x_1)$$ $$k=\frac{\log(y_3)-\log(y_1) } {\log(y_2)-\log(y_1) }=\frac{4 (c+d+x_1)}{2 c+d+2 x_1}\implies c=d \left(\frac{1}{k-2}-\frac{1}{2}\right)-x_1$$
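This three-point shortcut is a one-liner in code. A small Python sketch, applied to the end points and middle point of the example data set below:

```python
import math

def c_from_three_points(x1, d, y1, y2, y3):
    """Estimate c from three data points in arithmetic progression:
    x2 = x1 + d, x3 = x1 + 2d."""
    k = (math.log(y3) - math.log(y1)) / (math.log(y2) - math.log(y1))
    return d * (1.0 / (k - 2.0) - 0.5) - x1

# x = 0.5, 2.0, 3.5 with y = 630, 170, 130 from the example data set.
c = c_from_three_points(x1=0.5, d=1.5, y1=630, y2=170, y3=130)
print(c)  # ≈ -3.1363
```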

For illustration purposes, let us use the following data set $$\left( \begin{array}{cc} x & y \\ 0.5 & 630 \\ 1.0 & 360 \\ 1.5 & 230 \\ 2.0 & 170 \\ 2.5 & 140 \\ 3.0 & 120 \\ 3.5 & 130 \end{array} \right)$$

Using $x_1=0.5$, $x_2=2.0$ and $x_3=3.5$ (that is to say the end points and the one in the middle), we get $c \approx -3.1363$. For this value, the linear regression leads to $\alpha=4.82282$ and $b=0.233356$, so $a=124.315$.

Using these starting guesses, a full nonlinear regression would lead to the following results $$\begin{array}{clclclclc} \text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\ a & 124.485 & 2.2108 & \{117.450,131.521\} \\ b & 0.234862 & 0.0077 & \{0.210415,0.259309\} \\ c & -3.12729 & 0.0502 & \{-3.28701,-2.96756\} \\ \end{array}$$
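If SciPy is available, this final nonlinear regression can be done with `scipy.optimize.curve_fit`, seeded with the estimates from the linearized fit. A sketch using the same example data:

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([630, 360, 230, 170, 140, 120, 130], dtype=float)

def model(x, a, b, c):
    return a * np.exp(b * (x + c) ** 2)

# Starting guesses from the preliminary linearized fit above.
p0 = [124.315, 0.233356, -3.1363]
popt, pcov = curve_fit(model, x, y, p0=p0)
a, b, c = popt
perr = np.sqrt(np.diag(pcov))   # standard errors of the estimates
print(a, b, c)  # ≈ 124.485, 0.234862, -3.12729
```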

Edit after JJacquelin's answer

When we use the traditional least-squares approach, what we minimize is $$SSQ_1=\sum_{i=1}^n \left(y_i^{calc}-y_i^{exp} \right)^2$$ If you make a logarithmic transform, what is minimized is $$SSQ_2=\sum_{i=1}^n \left(\log(y_i^{calc})-\log(y_i^{exp}) \right)^2$$ which is not the same thing, since what is measured is $y$ and not $\log(y)$; so, if you want to be rigorous, after the first step you need to perform a nonlinear regression.

We can show that using the logarithmic transform is almost equivalent to minimizing $$SSQ_3=\sum_{i=1}^n \left(\frac{y_i^{calc}-y_i^{exp} } {y_i^{exp} }\right)^2$$ since $$r_i=\log(y_i^{calc})-\log(y_i^{exp})=\log\left(\frac{y_i^{calc}}{y_i^{exp}} \right)=\log\left(1+\frac{y_i^{calc}-y_i^{exp}}{y_i^{exp}} \right)$$ and, if the relative errors are small, $$r_i \approx \frac{y_i^{calc}-y_i^{exp}}{y_i^{exp}} \implies SSQ_2 \approx SSQ_3 $$ Minimizing $SSQ_3$ gives $a=124.117$, $b=0.23266$, $c=-3.14126$, which are very close to the numbers given by JJacquelin.
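As a sanity check, $SSQ_3$ can be minimized directly: passing `sigma=y` to `curve_fit` makes it minimize the sum of squared *relative* residuals. A sketch (SciPy assumed, same example data):

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([630, 360, 230, 170, 140, 120, 130], dtype=float)

def model(x, a, b, c):
    return a * np.exp(b * (x + c) ** 2)

# With sigma=y, curve_fit minimizes sum(((y_calc - y)/y)^2) = SSQ_3.
popt, _ = curve_fit(model, x, y, p0=[124.0, 0.23, -3.13], sigma=y)
a, b, c = popt
print(a, b, c)  # ≈ 124.117, 0.23266, -3.14126
```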

On the MathWorld page about least-squares fitting of an exponential function, it is suggested to minimize $$SSQ_4=\sum_{i=1}^n y_i^{exp}\left(\log(y_i^{calc})-\log(y_i^{exp}) \right)^2$$ which does not make the problem more difficult. Using the same data, we get $a=124.422$, $b=0.233604$, $c=-3.13423$.


$$\text{Fitting of}\quad y=ae^{b(x+c)^2}\quad \text{to data :}$$ $$ (x_1,y_1)\:,\: (x_2,y_2)\:,\: … \:,\: (x_k,y_k) \:,\: … \:,\: (x_n,y_n).$$

A very simple calculation:

$$\left(\begin{matrix} C_1 \\ C_2 \\ C_3 \end{matrix}\right)= \left(\begin{matrix} \displaystyle\sum_{k=1}^n x_k^4 & \displaystyle\sum_{k=1}^n x_k^3 & \displaystyle\sum_{k=1}^n x_k^2 \\ \displaystyle\sum_{k=1}^n x_k^3 & \displaystyle\sum_{k=1}^n x_k^2 & \displaystyle\sum_{k=1}^n x_k \\ \displaystyle\sum_{k=1}^n x_k^2 & \displaystyle\sum_{k=1}^n x_k & \displaystyle\sum_{k=1}^n 1 \end{matrix}\right)^{-1} \left(\begin{matrix} \displaystyle\sum_{k=1}^n x_k^2\ln(y_k) \\ \displaystyle\sum_{k=1}^n x_k\ln(y_k) \\ \displaystyle\sum_{k=1}^n \ln(y_k) \end{matrix}\right) $$ $$a=e^{\left(C_3-\frac{C_2^2}{4C_1} \right)}$$ $$b=C_1$$ $$c=\frac{C_2}{2C_1}$$ EXAMPLE :

From Claude Leibovici's data : $\quad\left( \begin{array}{cc} x & y \\ 0.5 & 630 \\ 1.0 & 360 \\ 1.5 & 230 \\ 2.0 & 170 \\ 2.5 & 140 \\ 3.0 & 120 \\ 3.5 & 130 \end{array} \right)$

The result of the above method is : $\quad \begin{cases} a=124.249 \\ b=0.232567 \\ c=-3.140717 \end{cases} \quad$

very close to C. Leibovici's results: $\quad \begin{cases} a=124.485 \\ b=0.234862\\ c=-3.12729 \end{cases}$ .
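For reference, the $3\times 3$ normal equations above are exactly those of an ordinary quadratic least-squares fit of $\ln(y)$ against $x$, since $\ln(y)=b(x+c)^2+\ln(a)=C_1 x^2+C_2 x+C_3$. So the whole method is a few lines of Python; NumPy's `polyfit` solves the same least-squares problem:

```python
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([630, 360, 230, 170, 140, 120, 130], dtype=float)

# ln(y) = C1*x^2 + C2*x + C3 is linear in the coefficients,
# so a degree-2 polynomial fit of ln(y) recovers C1, C2, C3.
C1, C2, C3 = np.polyfit(x, np.log(y), 2)

b = C1
c = C2 / (2 * C1)
a = np.exp(C3 - C2**2 / (4 * C1))
print(a, b, c)  # ≈ 124.249, 0.232567, -3.140717
```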

Edit after C.Leibovici's comments

The change of fitting criterion suggested in https://mathworld.wolfram.com/LeastSquaresFittingExponential.html

in fact amounts to a weighted regression with weight $\frac{y}{\ln(y)}$.

With this trick, the least-squares regression on the relative errors becomes approximately (but not exactly) a least-squares regression on $y$ itself.

$$\left(\begin{matrix} C_1 \\ C_2 \\ C_3 \end{matrix}\right)= \left(\begin{matrix} \displaystyle\sum_{k=1}^n x_k^4 \frac{y_k}{\ln(y_k)} & \displaystyle\sum_{k=1}^n x_k^3 \frac{y_k}{\ln(y_k)} & \displaystyle\sum_{k=1}^n x_k^2 \frac{y_k}{\ln(y_k)} \\ \displaystyle\sum_{k=1}^n x_k^3 \frac{y_k}{\ln(y_k)} & \displaystyle\sum_{k=1}^n x_k^2 \frac{y_k}{\ln(y_k)} & \displaystyle\sum_{k=1}^n x_k \frac{y_k}{\ln(y_k)} \\ \displaystyle\sum_{k=1}^n x_k^2 \frac{y_k}{\ln(y_k)} & \displaystyle\sum_{k=1}^n x_k\frac{y_k}{\ln(y_k)} & \displaystyle\sum_{k=1}^n \frac{y_k}{\ln(y_k)} \end{matrix}\right)^{-1} \left(\begin{matrix} \displaystyle\sum_{k=1}^n x_k^2 y_k \\ \displaystyle\sum_{k=1}^n x_k y_k \\ \displaystyle\sum_{k=1}^n y_k \end{matrix}\right) $$ $$a=e^{\left(C_3-\frac{C_2^2}{4C_1} \right)}\quad;\quad b=C_1\quad;\quad c=\frac{C_2}{2C_1}$$

Using the same data, we get $a=124.386$, $b=0.233374$, $c=-3.13566$.
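A sketch of this weighted variant in Python (NumPy assumed), solving the weighted normal equations above directly; note that with weight $w_k=y_k/\ln(y_k)$ the right-hand side $\sum w_k\,x_k^i \ln(y_k)$ simplifies to $\sum x_k^i\,y_k$, exactly as in the matrix equation:

```python
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([630, 360, 230, 170, 140, 120, 130], dtype=float)

w = y / np.log(y)        # weights y/ln(y)
A = np.vander(x, 3)      # design matrix with columns x^2, x, 1
z = np.log(y)

# Weighted normal equations: (A^T W A) C = A^T W z, W = diag(w).
AtW = A.T * w
C1, C2, C3 = np.linalg.solve(AtW @ A, AtW @ z)

b = C1
c = C2 / (2 * C1)
a = np.exp(C3 - C2**2 / (4 * C1))
print(a, b, c)  # ≈ 124.386, 0.233374, -3.13566
```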
