Nelder Mead function fit error estimation via surface fit?

833 Views Asked by At

I'm using Nelder-Mead's simplex-downhill algorithm to fit 3 parameters (a,b,c) of a non-linear function (2d input in 1d output). $$ f(x,y)=\frac{a\cdot (b+x)^2}{(2+y\cdot c)^2} $$

Now I would like to estimate the parameter errors that the found optimum has.

Nelder and Mead proposed in their paper from 1965 to use the final simplex, add all middle points beetween two points and do a quadratic surface fit (Appendix of: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.5920&rep=rep1&type=pdf ).

This fit can then be expressed as $$ y=a_0+\sum_i a_ix_i + \sum_{ij} b_{ij}x_ix_j $$ with $$ \begin{align} a_0&=y_0\\ a_i&=2y_{0i}-(y_i+3y_0)/2,&&i=1,...,n\\ b_{ii}&=2(y_i+y_0-2y_{0i}),&&i=1,...,n\\ b_{ij}&=2(y_{ij}+y_0-y_{0i}-y_{0j}),&&i\neq j \end{align} $$

with this the variance-covariance matrix can be calculated, whose diagonal elements yield the parameter errors.

  • Isn't it a chicken/egg story that I need to do a fit to estimate errors of my fit?
  • Isn't the error of the surface fit important for my original fit error?
  • Doing a surface fit for the 3 parameters needs a1-a3,b11-b33 = 12 new fit parameter to estimate the error of the original 3 fit parameters?

I also thought about changing each calibrant within it's error and redoing the fit. 6 calibrants would require the algorithm to run 1+2*6=13 times instead of 1, though... changing them in all combinations would be 1+6!= 721 times... Even if the simplex is faster when the last optimum is used as a new starting point, this would still drastically increase the run-time.

Should I do another simplex-downhill for the surface-fit and ignore it's errors or is there a better way to error estimate for my fitted parameters?

1

There are 1 best solutions below

0
On

After reflecting about the paper for a few hours, I think I finally understand the procedure...

Procedure

  1. you take your simplex points $p_i,i=0,1,...,N$ and make a new coordinate system with them so you get $x_i,i=0,i=0,1,...,N$ with $x_0=(0,...,0)$
    • since you have N+1 points in Dimension N you can make 3 axes with one point as the systems origin
    • the new system is scaled so that all points are 1 unit away from center ($p_1(a,b,c)\to x_1(1,0,0)$)
  2. create middle-points between your simplex points.
    • They always are between two simplex points and their index indicates that (simplex 1 & 2 → middlepoint 12)
    • Their position is not important. Important is only their value (for what your simplex was optimized) and index. The value $y$ is the average value of the forming two simplex points.
  3. do a quadratic surface fit.
    • $y=a_0+a'x+x'bx$
    • \begin{align} a_0&=y_0\\ a_i&=2y_{0i}-(y_i+3y_0)/2,&&i=1,...,n\\ b_{ii}&=2(y_i+y_0-2y_{0i}),&&i=1,...,n\\ b_{ij}&=2(y_{ij}+y_0-y_{0i}-y_{0j}),&&i\neq j \end{align}
    • these formulas are correct in the transformed coordinate system. In here, B is the information matrix. This means, $a_0$ and $a$ are not important and can be neglected. This also means only the quadratic terms influence the parameter error.
  4. now we transform B into our coordinate system:
    • $Q=\left(\begin{matrix}\uparrow&\uparrow&\\p_1-p_0&p_2-p_0&\cdots\\\downarrow&\downarrow\end{matrix}\right)$ of the original points
    • ${Q^{-1}}'HQ^{-1}$ is the information matrix in the original coordinate system, so $C=QH^{-1}Q'$ is the covariance-matrix.
  5. squareroot of the diagonal entries of C is the parameter error.

Example

As a simple example of application we have a 2D-System with final simplex points

  • $p_0=(1,2),~p_1=(2,3),~p_2=(3,1)$ and $y_0=2,~y_1=1,~y_2=3$.
  • Now we imagine our new coordinate system with our middle point values of $y_{01}=1.5,~y_{02}=2.5,~y_{12}=2$.
  • calculate b: $b_{11}=2(1+2-2*1.5)=0,~b_{22}=2(3+2-2*2.5)=0,~b_{12}=2(1.5+2-1-2)=1$
  • $\Rightarrow B=\left(\begin{matrix}0&1\\1&0\end{matrix}\right)$
  • Transform into original system: $Q=\left(\begin{matrix}1&2\\1&-1\end{matrix}\right)$
  • B^{-1} via pseudo-inverse $B = U\Sigma V^{H},~~\Rightarrow B^{-1}=\left(\begin{matrix}0&1\\1&0\end{matrix}\right)$
  • $C=QB^{-1}Q'=\left(\begin{matrix}1&2\\1&-1\end{matrix}\right)\cdot\left(\begin{matrix}0&1\\1&0\end{matrix}\right)\cdot\left(\begin{matrix}1&1\\2&-1\end{matrix}\right)=\left(\begin{matrix}1&2\\1&-1\end{matrix}\right)\cdot\left(\begin{matrix}2&-1\\1&1\end{matrix}\right)=\left(\begin{matrix}4&1\\1&-2\end{matrix}\right)$

So the the best point of the simplex (downhill) was $p_1$ with $y_1=1$ at position $(2\pm2 ,3\pm\sqrt2)$

In reality the qualtiy values should be much closer together. The diagonal-elements of H seem to be 0 by default. In the end this procedure seem to assume, that a stronger curve in the quadratic fit (higher values in B) would result in a smaller confidence interval .

I hope I finally understood all the parts necessary to do an error estimation for my simplex fits. ^^