How to approach curve fitting?


I'd like to fit an approximating curve to a set of points like the following four: (1, 3.5), (2, 4.3), (3, 7.2), (4, 8).


I heard that the MATLAB function pinv() can solve this, but I don't know how to use it.

Would you please let me know how to use it and how to find an equation for the approximating curve?


There are 2 best solutions below

9
On BEST ANSWER

If you want a polynomial fit, you can use the function polyfit. In your case you would use

x=1:4;y=[3.5,4.3,7.2,8];c=polyfit(x,y,d)

where d is the degree of the desired polynomial (where $d \in \{ 0,1,\dots,n-1 \}$ and $n$ is the number of data points). polyfit then gives the coefficients $a_d,a_{d-1},\dots,a_0$ of the polynomial $a_d x^d + \dots + a_0$, in that order. So if you had d=2 and a desired evaluation point x0, then you could evaluate this way:

y0=c(1)*x0^2+c(2)*x0+c(3)

There are at least two better ways to do it (Horner's method is one; a vectorized version of the above is another), but it's fine to do it the simple way in a small case like this.
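As a minimal sketch of the vectorized alternative, the built-in polyval can do the evaluation for a scalar or a whole vector of points. The degree d = 2 and the evaluation point x0 = 2.5 are values picked here purely for illustration.

```matlab
x = 1:4;
y = [3.5, 4.3, 7.2, 8];
d = 2;                       % degree, chosen for illustration
c = polyfit(x, y, d);        % coefficients, highest power first

x0 = 2.5;                    % hypothetical evaluation point
y0_manual = c(1)*x0^2 + c(2)*x0 + c(3);   % the simple way from above
y0 = polyval(c, x0);         % same value via the built-in evaluator

yfit = polyval(c, x);        % vectorized: fitted values at all data points
```

Since polyval takes the coefficient vector in the same order that polyfit returns it, the same two lines work unchanged for any degree.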

More generally, suppose you have a model of the form $y=\sum_{j=1}^m c_j f_j(x)$ for known functions $f_j$ and unknown coefficients $c_j$, and data points $(x_1,y_1),\dots,(x_n,y_n)$. Then you can assemble the (usually overdetermined, $n>m$) linear system given by $y_i=\sum_{j=1}^m c_j f_j(x_i)$ for $i=1,\dots,n$. This has the form $Ac=y$, where $A$ is the $n\times m$ matrix with $a_{ij}=f_j(x_i)$. In MATLAB this system (overdetermined or uniquely determined) can be solved, in the least squares sense when overdetermined, with the backslash operator as

c=A\y

(If the system is underdetermined, this will still give a result, but you typically want to do something different in the underdetermined case.)
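A small sketch of this general recipe for the four data points in the question might look as follows. The basis (a constant and $x$) is only an illustrative choice, and the cell array f is a name introduced here. Note that y has to be a column vector for the backslash solve.

```matlab
x = (1:4)';                          % abscissae as a column vector
y = [3.5; 4.3; 7.2; 8];              % data as a column vector

% Illustrative basis functions (an assumption): f_1(x) = 1, f_2(x) = x
f = {@(t) ones(size(t)), @(t) t};

A = zeros(numel(x), numel(f));
for j = 1:numel(f)
    A(:, j) = f{j}(x);               % a_ij = f_j(x_i)
end

c = A \ y;                           % least squares coefficients
r = A*c - y;                         % residual vector
```

Swapping in other functions (for example @(t) exp(-t)) only changes the cell array; the assembly loop and the solve stay the same.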

0
On

Problem statement

Fit the given data set with a sequence of polynomial fits: $$ y(x) = a_{0} + a_{1} x + \dots + a_{d} x^{d} $$ Indicate where we can solve the linear system with the normal equations, and when we must rely exclusively upon the pseudoinverse.

Linear System

The linear system for the polynomial with highest order $d$ is $$ \begin{align} \mathbf{A} a &= y \\ \left[ \begin{array}{cccc} 1 & x_{1} & \cdots & x_{1}^{d} \\ 1 & x_{2} & \cdots & x_{2}^{d} \\ 1 & x_{3} & \cdots & x_{3}^{d} \\ 1 & x_{4} & \cdots & x_{4}^{d} \\ \end{array} \right] \left[ \begin{array}{c} a_{0} \\ a_{1} \\ \vdots \\ a_{d} \\ \end{array} \right] &= \left[ \begin{array}{c} y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \end{array} \right] \end{align} $$

Least squares solution

The least squares solution is defined as $$ a_{LS} = \left\{ a \in \mathbb{R}^{d+1} \colon \lVert \mathbf{A} a - y \rVert_{2}^{2} \text{ is minimized} \right\} $$

The residual error vector is $$ r = \mathbf{A} a - y. $$ The total error, the quantity minimized, is $r^{2} = r\cdot r$.

Summary of results

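The total error demonstrates typical behavior. Increasing the order of fit initially reduces the error. Then it plateaus. (In exact arithmetic the minimum cannot increase when terms are added, although rounding in ill-conditioned systems may make the computed error rise.)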

For these data, the cubic fit provides the best combination of total error and computational cost.

r2
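The total error for the orders that polyfit handles directly ($d = 0,\dots,3$) can be tabulated with a short loop. This is only a sketch using polyfit and polyval; the vector name r2 is introduced here.

```matlab
x = 1:4;
y = [3.5, 4.3, 7.2, 8];

r2 = zeros(1, 4);                % total error for d = 0, 1, 2, 3
for d = 0:3
    c = polyfit(x, y, d);        % least squares coefficients for degree d
    r = polyval(c, x) - y;       % residual vector r = A*a - y
    r2(d+1) = r * r';            % total error r.r
end
disp(r2)
```

Working the sums by hand gives 14.33 for $d=0$ and 0.882 for $d=1$. The quadratic term vanishes for these data, so $d=2$ repeats 0.882, and the cubic passes through all four points, so its total error is zero up to rounding.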

The amplitudes for each order are collected in the table below.

$$ \begin{array}{clllllll} d & a_{0} & a_{1} & a_{2} & a_{3} & a_{4} & a_{5} & a_{6} \\ 0 & 5.75 \\ 1 & 1.65 & \phantom{-}1.64 \\ 2 & 1.65 & \phantom{-}1.64 & \phantom{-}0 \\ \color{blue}{3} & \color{blue}{9} & \color{blue}{-10.05} & \phantom{-}\color{blue}{5.25} & \color{blue}{-0.7} \\ 4 & 4.04271 & \phantom{-}0.277692 & -1.97938 & \phantom{-}1.36554 & -0.206554 \\ 5 & 2.57594 & \phantom{-}1.38279 & -0.0545302 & -0.868031 & \phantom{-}0.545109 & -0.0812778 \\ 6 & 1.89099 & \phantom{-}1.41016 & \phantom{-}0.686033 & -0.172893 & -0.615848 & \phantom{-}0.350573 & -0.0490168 \\ \end{array} $$

Lower order fits: $d\le 3$

For $d=0$, the amplitude is given by the mean of the data $a_{0} = \bar{y}$.

For $d=1,2,3$, the normal equations $$ \mathbf{A}^{T} \mathbf{A} a = \mathbf{A}^{T} y $$ can be solved as $$ a_{LS} = \left( \mathbf{A}^{T} \mathbf{A} \right)^{-1} \mathbf{A}^{T} y $$
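For the cubic case, the normal equations can be set up and solved in a few lines. The sketch below assumes MATLAB R2016b or later for the implicit expansion in x .^ (0:d), and it solves the system with backslash rather than forming the inverse explicitly, which is a choice made here and not something the derivation above requires.

```matlab
x = (1:4)';
y = [3.5; 4.3; 7.2; 8];
d = 3;

A = x .^ (0:d);                  % columns 1, x, x^2, x^3 (R2016b or later)

a_ne = (A' * A) \ (A' * y);      % solve the normal equations A'A a = A'y
a_bs = A \ y;                    % direct least squares solve, for comparison

disp([a_ne, a_bs])               % ordered a_0, a_1, a_2, a_3
```

Both columns should agree with the $d=3$ row of the table (9, -10.05, 5.25, -0.7) up to rounding. Note that the ordering is lowest power first, the opposite of what polyfit returns.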

Higher order fits: $d\ge 4$

For $d\ge4$, the product matrix $\mathbf{A}^{T} \mathbf{A}$ is $(d+1)\times(d+1)$ but has rank at most 4, so it is rank deficient and cannot be inverted. The solution requires the pseudoinverse. The least squares minimizers form the affine space $$ a_{LS} = \mathbf{A}^{+}y + \left( \mathbf{I}_{d+1} - \mathbf{A}^{+} \mathbf{A} \right) z, \quad z\in \mathbb{R}^{d+1} $$
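This is where pinv from the question comes in. A minimal sketch for $d=4$ follows; the vector z is an arbitrary choice made here for illustration, and x .^ (0:d) again assumes R2016b or later. The product pinv(A)*y is the member of the affine space with $z=0$, which is the minimizer of smallest Euclidean norm.

```matlab
x = (1:4)';
y = [3.5; 4.3; 7.2; 8];
d = 4;

A = x .^ (0:d);                       % 4-by-5, more unknowns than data points

Ap = pinv(A);                         % Moore-Penrose pseudoinverse
a_min = Ap * y;                       % minimum-norm least squares solution

z = ones(d+1, 1);                     % arbitrary vector (illustrative)
a_other = a_min + (eye(d+1) - Ap*A) * z;   % another least squares minimizer

res_min   = norm(A*a_min - y);        % both residuals are zero up to rounding
res_other = norm(A*a_other - y);
```

Here a_min should reproduce the $d=4$ row of the table to the digits shown, and norm(a_min) is no larger than norm(a_other) for any choice of z.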

Sequence of plots

sequence