Using a generalized logistic curve to create a mathematical model from data


[Data table from the image:]

$$\begin{array}{c|ccccccccccc} t & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\\hline N & 18 & 33 & 56 & 90 & 130 & 170 & 203 & 225 & 239 & 247 & 251 \end{array}$$

The first row is time and the second row is the height of a plant. We need to use a generalized logistic curve to model the behavior of the plant. The equation of the logistic curve is: $$N = \frac{N_*}{1+(N_*/N_0-1)e^{-a_0t}}$$ where $N_*$ is the maximum height of the plant, in other words the supremum of the logistic function.

$a_0$ determines how fast the function increases. My question is the following:

I know how to determine $N_*$: we just pick a number that is above $251$ and less than $251 + (251 - 247)$. But how exactly do I determine $a_0$? I am quite lost here. My idea is to calculate an $a_0$ for every pair of time and height, with the equation $a_0 = \frac{1}{t}\ln(\frac{N}{N_0})$. In this equation $N$ would be a difference of heights; so, say, between $18$ and $33$ the $N$ for time $1$ would be $33 - 18$. Is the process then to create a table of $\sum_{k = 1}^{10}(10 - k)$ difference entries, taking successive values of $N_0$ until we run out of data from the primary table in the picture (so until $N_0$ is the $9$th entry)? Then we can use the weighted formula $a_0 = \frac{\sum_l a_{0l}\, t_l}{\sum_l t_l}$, where $l$ runs over the entries.
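
As a quick per-point sanity check (a sketch, not the difference-table procedure above; it assumes $N_* = 252$ and $N_0 = 18$ and uses the heights from the table), one can solve the model for $a_0$ at each data point:

```python
import math

# data from the table: t = 0..10, heights y
t_data = list(range(11))
y_data = [18, 33, 56, 90, 130, 170, 203, 225, 239, 247, 251]
N_star, N0 = 252.0, 18.0  # assumed supremum and initial height

# Solving N = N*/(1 + (N*/N0 - 1) e^(-a0 t)) for a0 gives
#   a0 = (1/t) * ln[ (N*/N0 - 1) / (N*/N - 1) ]
a0_estimates = [
    math.log((N_star / N0 - 1) / (N_star / y - 1)) / t
    for t, y in zip(t_data[1:], y_data[1:])  # skip t = 0
]
```

For this data the estimates cluster around $0.65$ to $0.7$, drifting higher for the last points (where $N$ gets close to the assumed $N_*$), which already suggests a single $a_0$ can fit the data reasonably well.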

Is my procedure for creating a logistic model out of the provided data correct?

There are 4 answers below.

Best answer

Method 1: One degree of freedom

$\newcommand{\eqd}{\triangleq}$ $\newcommand{\brp}[1]{{\left(#1\right)}}$ $\newcommand{\brs}[1]{{\left[#1\right]}}$ $\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}$ $\newcommand{\pderiv}[2]{{\frac{\partial#1}{\partial#2} }}$ Have you considered using the Method of Least Squares to find $a_0$? For that method you can define an error function $$e(t_n)\eqd N(t_n)-y_n\eqd \frac{N_{*}}{1+\brs{N_{*}/N_0-1}e^{-a_0t_n}}-y_n$$ and a cost function (the error cost) as the norm squared of $e(t_n)$: $$cost(a_0)\eqd\norm{e}^2\eqd\sum_{n=0}^{n=10}e^2(t_n)$$ You want to find the $a_0$ that minimizes that cost; that is, you want to find where (with respect to $a_0$) the cost function "goes to the lowest point". Suppose you set $N_0\eqd N(0)=18$ and $N_*\eqd252$ $\ldots$ [plot of $cost(a_0)$ versus $a_0$] From the plot, it appears that $cost(a_0)$ is minimized around $a_0=0.65$.

To get a more accurate optimal $a_0$, we can differentiate $cost(a_0)$ with respect to $a_0$ and set the expression to $0$. To do that, this (lemma) may come in handy: $$\begin{align} \boxed{N'(t)} &\eqd \pderiv{}{a_0}N(t) \\&\eqd \pderiv{}{a_0}\brs{\frac{N_{*}}{1+\brp{\frac{N_*}{N_0}-1}e^{-a_0t}}} && \text{by definition of $N(t)$} \\&= \frac{0-N_*\brs{\frac{N_*}{N_0}-1}e^{-a_0t}(-t)}{\brp{1+\brs{\frac{N_*}{N_0}-1}e^{-a_0t}}^2} && \text{by Quotient Rule} \\&= \frac{N_*^2}{\brp{1+\brs{\frac{N_*}{N_0}-1}e^{-a_0t}}^2} \brs{\frac{1}{N_0}-\frac{1}{N_*}}te^{-a_0t} \\&\eqd \boxed{\brs{\frac{1}{N_0}-\frac{1}{N_*}}N^2(t)te^{-a_0t}} && \text{by definition of $N(t)$} \end{align}$$ Then$\ldots$ $$\begin{align} \boxed{0}&= \frac{1}{2\brp{\frac{1}{N_0}-\frac{1}{N_*}}}\cdot0 \\&=\frac{1}{2\brp{\frac{1}{N_0}-\frac{1}{N_*}}}\pderiv{}{a_0}\norm{e}^2 \\&\eqd \frac{1}{2\brp{\frac{1}{N_0}-\frac{1}{N_*}}}\pderiv{}{a_0}\sum_{n=0}^{n=10}e^2(t_n) && \text{by definition of $\norm{\cdot}$} \\&\eqd \frac{1}{2\brp{\frac{1}{N_0}-\frac{1}{N_*}}}\pderiv{}{a_0}\sum_{n=0}^{n=10}\brs{N(t_n)-y_n}^2 && \text{by definition of $e$} \\&= \frac{1}{2\brp{\frac{1}{N_0}-\frac{1}{N_*}}}\sum_{n=0}^{n=10}2\brs{N(t_n)-y_n}N'(t_n) && \text{by Chain Rule} \\&= \frac{1}{2\brp{\frac{1}{N_0}-\frac{1}{N_*}}}\sum_{n=0}^{n=10}2\brs{N(t_n)-y_n}\brs{\frac{1}{N_0}-\frac{1}{N_*}}N^2(t_n)t_ne^{-a_0t_n} && \text{by (lemma)} \\&= \boxed{\sum_{n=0}^{n=10}N^2(t_n)\brs{N(t_n)-y_n}t_ne^{-a_0t_n}} \\&\eqd Dcost(a_0) && \text{(call the sum $Dcost(a_0)$)} \end{align}$$ Plotting $Dcost$ with respect to $a_0$, it appears that $Dcost(a_0)$ crosses $0$ at around $a_0=0.66$: [plot of $Dcost(a_0)$ versus $a_0$]
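
The derivation above implies the identity $\pderiv{}{a_0}cost(a_0)=2\brp{\frac{1}{N_0}-\frac{1}{N_*}}Dcost(a_0)$, which can be checked against a central finite difference. A small Python sketch of that check (the R code for the actual fit follows below):

```python
import math

tn = list(range(11))
yn = [18, 33, 56, 90, 130, 170, 203, 225, 239, 247, 251]
N0, Ns = 18.0, 252.0  # N0 = N(0), Ns = N*

def N(t, a0):
    # logistic curve for the fixed N0, N*
    return Ns / (1 + (Ns / N0 - 1) * math.exp(-a0 * t))

def cost(a0):
    # sum of squared errors over the 11 data points
    return sum((N(t, a0) - y) ** 2 for t, y in zip(tn, yn))

def Dcost(a0):
    # the boxed sum from the derivation above
    return sum(N(t, a0) ** 2 * (N(t, a0) - y) * t * math.exp(-a0 * t)
               for t, y in zip(tn, yn))

a0, h = 0.5, 1e-6
numeric = (cost(a0 + h) - cost(a0 - h)) / (2 * h)  # central difference
analytic = 2 * (1 / N0 - 1 / Ns) * Dcost(a0)       # from the identity
```

At $a_0=0.5$ (left of the minimum near $0.66$) the derivative is negative, and the two values agree to several digits.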

The uniroot function from the R stats package indicates that $Dcost(a_0)$ crosses $0$ at $a_0=0.6631183$ with estim.prec=6.103516e-05.

Using $N_0\eqd18$, $N_*\eqd252$, and $a_0\eqd0.6631183$, $N(t)$ seems to fit the 11 data points fairly well ($cost(0.6631183)=31.32307$) $\ldots$ [plot of the fitted $N(t)$ with the data points]

Some R code supporting Reproducible Research:

#---------------------------------------
# packages
#---------------------------------------
#install.packages("stats");
 require(stats);
 rm(list=objects());

#---------------------------------------
# Data
#---------------------------------------
 tn = c(0:10)
 yn = c(18,33,56,90,130,170,203,225,239,247,251)
 t  = seq( from=min(tn), to=max(tn), length=1000 )

#---------------------------------------
# Estimate Function N(t)
#---------------------------------------
 N0=18
 Nh=252
 N = function(t,a) Nh / (1 + (Nh/N0-1)*exp(-a*t))

#---------------------------------------
# Cost Function
#---------------------------------------
 cost = function(x,tn,yn) 
 {
  #sum((N(tn,x)-yn)^2)
   summ = 0;
   for (i in c(1:11))
   { 
     summ = summ + (N(tn[i],x)-yn[i])^2
   }
   result = summ
 }

#---------------------------------------
# Partial derivative of Cost Function
#---------------------------------------
 Dcost = function(x) 
 {
   summ = 0;
   for (i in c(1:11))
   { 
     summ = summ + (N(tn[i],x))^2 * (N(tn[i],x)-yn[i]) * tn[i]*exp(-x*tn[i])
   }
   result = summ
 }

#---------------------------------------
# Find 0 crossing
#---------------------------------------
 x=seq(from=0.3, to=0.9, length=1000)
 aRoot = uniroot( Dcost, c(0.3, 0.9) )
 a0 = aRoot$root

#---------------------------------------
# Graphics
#---------------------------------------
 colors = c( "red" , "blue", "orange", "green"    );
 traces = c( "N(t)", "data", "cost"  , "Dcost"    ); 
 plot ( t , N(t, a0)      , col=colors[1], lwd=2, type='l', xlab="t", ylab="y", ylim=c(0,max(yn)+10) ) 
 lines( tn, yn            , col=colors[2], lwd=3, type='p' )
 lines( t , Dcost(t)      , col=colors[4], lwd=2, type='l' )
 lines( t , cost(t,tn,yn) , col=colors[3], lwd=2, type='l' )
 legend("topleft", legend=traces, col=colors, lwd=3, lty=1:1)
 grid()

The answer by @DanielJ.Greenhoe is very complete, but it goes along with the initial premise that you simply pick $N_*$ in a compatible way. However, there is no reason not to perform the minimisation over both parameters $(N_*, a_0)$ instead of just $a_0$.

This leads to the set of parameters $N_* = 255.844$, $a_0 = 0.65392$.
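
That joint minimisation can be sketched, for example, with scipy.optimize.curve_fit in Python (an illustration, not the code used for the numbers above), keeping $N_0 = 18$ fixed and fitting $(N_*, a_0)$ by least squares:

```python
import numpy as np
from scipy.optimize import curve_fit

tn = np.arange(11.0)
yn = np.array([18, 33, 56, 90, 130, 170, 203, 225, 239, 247, 251], dtype=float)
N0 = 18.0  # kept fixed, as in the initial premise

def N(t, Ns, a0):
    # logistic curve with free parameters Ns = N* and a0
    return Ns / (1 + (Ns / N0 - 1) * np.exp(-a0 * t))

# least-squares fit of (N*, a0), started near the one-parameter solution
(Ns_hat, a0_hat), _ = curve_fit(N, tn, yn, p0=(252.0, 0.6))
```

Since curve_fit minimises the same sum of squared errors, it should land on essentially the same $(N_*, a_0)$ as above.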

[plot of the two-parameter fit with the data]


Method 2: Two degrees of freedom

$\newcommand{\eqd}{\triangleq}$ $\newcommand{\brp}[1]{{\left(#1\right)}}$ $\newcommand{\brs}[1]{{\left[#1\right]}}$ $\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}$ $\newcommand{\pderiv}[2]{{\frac{\partial#1}{\partial#2} }}$ $\newcommand{\opair}[2]{\left( #1,#2\right)}$ $\newcommand{\R}{\Bbb{R}}$

Method 1 gave an expression for $\pderiv{}{a_0}N(t)$; here is a similar lemma for $\pderiv{}{N_*}N(t)$: $$\begin{align*} \boxed{\pderiv{}{N_*}N(t)} &\eqd \pderiv{}{N_* }\brs{\frac{N_*}{1+\brp{\frac{N_*}{N_0}-1}e^{-a_0 t}}} && \text{by definition of $N(t)$} \\&= \frac{\brs{1+\brp{\frac{N_*}{N_0}-1}e^{-a_0 t}} - N_*\brs{\frac{1}{N_0}}e^{-a_0 t}} {\brp{1+\brs{\frac{N_*}{N_0}-1}e^{-a_0 t}}^2} && \text{by Quotient Rule} \\&= \frac{1 - e^{-a_0 t}} {\brp{1 + \brs{\frac{N_*}{N_0}-1}e^{-a_0 t}}^2} \\&= \brs{\frac{1 - e^{-a_0 t}} {N_*^2}} \brs{\frac{N_*} {1 + \brs{\frac{N_*}{N_0}-1}e^{-a_0 t}}}^2 \\&\eqd \boxed{\brs{\frac{1 - e^{-a_0 t}}{N_*^2}} N^2(t)} && \text{by definition of $N(t)$} \end{align*} $$ Then$\ldots$ $$\begin{align*} \boxed{0} &= \frac{N_*^2}{2}\cdot0 \\&=\frac{N_*^2}{2} \pderiv{}{N_*}\norm{e}^2 \\&\eqd \frac{N_*^2}{2} \pderiv{}{N_*}\sum_{n=0}^{n=10}e^2(t_n) && \text{by definition of $\norm{\cdot}$} \\&\eqd \frac{N_*^2}{2} \pderiv{}{N_*}\sum_{n=0}^{n=10}\brs{N(t_n)-y_n}^2 && \text{by definition of $e$} \\&= \frac{N_*^2}{2} \sum_{n=0}^{n=10}2\brs{N(t_n)-y_n}\pderiv{}{N_*}N(t_n) && \text{by Chain Rule} \\&= N_*^2 \sum_{n=0}^{n=10} \brs{N(t_n)-y_n} \brs{\frac{1 - e^{-a_0 t_n}}{N_*^2}} N^2(t_n) && \text{by lemma} \\&= \boxed{\sum_{n=0}^{n=10} N^2(t_n) \brs{N(t_n)-y_n} \brs{1 - e^{-a_0 t_n} }} \end{align*}$$

So now we have two equations in two unknowns ($N_*$ and $a_0$): $\pderiv{}{N_*}\norm{e}^2=0$ and $\pderiv{}{a_0}\norm{e}^2=0$ (from Method 1). You can use a multi-root solver to find the solution $\opair{N_*}{a_0}$ to the equation pair $\opair{\pderiv{}{N_*}\norm{e}^2=0}{\pderiv{}{a_0}\norm{e}^2=0}$. The result using multiroot from the R package rootSolve is $\opair{N_*}{a_0}=\opair{255.8436595023}{0.6539203544}$. This solution is "better" than the Method 1 solution in the sense that it results in a lower cost: $$\begin{array}{|c|rcl|} \text{Method} & \multicolumn{3}{l|}{cost(N_0,N_*,a_0)} \\\hline \\1 & cost(18,252,0.6631183) &=& 31.32307 \\2 & cost(18,255.8436595023,0.6539203544) &=& 0.7140973 \end{array}$$
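
As a numerical sanity check (a Python sketch, independent of the R code below), plugging the reported $\opair{N_*}{a_0}$ into the two stationarity sums should give values near zero, and the cost should come out near $0.714$:

```python
import math

tn = list(range(11))
yn = [18, 33, 56, 90, 130, 170, 203, 225, 239, 247, 251]
N0, Ns, a0 = 18.0, 255.8436595023, 0.6539203544  # Method 2 solution

def N(t):
    return Ns / (1 + (Ns / N0 - 1) * math.exp(-a0 * t))

cost = sum((N(t) - y) ** 2 for t, y in zip(tn, yn))
# the two stationarity sums (gradients w.r.t. a0 and N*, up to constants)
grad_a0 = sum(N(t) ** 2 * (N(t) - y) * t * math.exp(-a0 * t)
              for t, y in zip(tn, yn))
grad_Ns = sum(N(t) ** 2 * (N(t) - y) * (1 - math.exp(-a0 * t))
              for t, y in zip(tn, yn))
```

The individual terms of each sum are on the order of $10^4$, so sums this close to zero confirm the reported root.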

Some R code supporting Reproducible Research:

 require(stats);
 require(R.utils);
 require(rootSolve);
 rm(list=objects());

#---------------------------------------
# Data
#---------------------------------------
 tdata = c(0:10)
 ydata = c(18,33,56,90,130,170,203,225,239,247,251)
 t     = seq( from=min(tdata), to=max(tdata), length=1000 )

#---------------------------------------
# Estimate Function N(t)
#---------------------------------------
 N0 = ydata[1]
 N = function(t,N0,Nh,a0)
 {
   result = Nh / ( 1 + (Nh/N0-1)*exp(-a0*t) )
 }

#---------------------------------------
# Cost Function
#---------------------------------------
 cost = function(N0,Nh,a0)
 {
   summ = 0;
   for (i in c(1:length(tdata)))
   {
     summ = summ + ( N(tdata[i],N0,Nh,a0) - ydata[i] )^2
   }
   result = summ
 }

#---------------------------------------
# Partial derivative with respect to a0 of Cost Function
#---------------------------------------
 Pcosta0 = function(N0, Nh, a0)
 {
   summ = 0;
   for (i in c(1:length(tdata)))
   {
     summ = summ + ( N(tdata[i],N0,Nh,a0) )^2 *
                   ( N(tdata[i],N0,Nh,a0) - ydata[i] ) *
                   ( tdata[i] * exp(-a0*tdata[i]) )
   }
   result = summ
 }

#---------------------------------------
# Partial derivative with respect to Nh of Cost Function
#---------------------------------------
 PcostNh = function(N0, Nh, a0)
 {
   summ = 0;
   for (i in c(1:length(tdata)))
   {
     summ = summ + ( 1 - exp(-a0*tdata[i]) ) *
                   ( N(tdata[i],N0, Nh, a0) )^2 *
                   ( N(tdata[i],N0, Nh, a0)-ydata[i] )
   }
   result = summ
 }

#---------------------------------------
# Partial derivative vector of cost
#---------------------------------------
Pcost = function(x)
{
   N0 = 18
   Nh = x[1]
   a0 = x[2]
   F1 = Pcosta0( N0, Nh, a0 );
   F2 = PcostNh( N0, Nh, a0 );
   result = c(F1, F2);
}

#---------------------------------------
# Calculate roots
#---------------------------------------
 Roots = multiroot( f=Pcost, start=c(ydata[11-1],0.4) );
 Nh = Roots$root[1]
 a0 = Roots$root[2]

#---------------------------------------
# Display
#---------------------------------------
 printf("(N0, Nh, a0) = (%.2f, %.10f, %.10f) with estim.precis=%.2e\n", N0, Nh, a0, Roots$estim.precis )
 colors = c( "red" , "blue" );
 traces = c( "N(t)", "data" );
 plot ( t , N(t, N0, Nh, a0), col=colors[1], lwd=2, type='l', xlab="t", ylab="y", ylim=c(0,max(ydata)+10) )
 lines( tdata, ydata        , col=colors[2], lwd=5, type='p' )
 legend("topleft", legend=traces, col=colors, lwd=3, lty=1:1)
 grid()

Method 3: Three degrees of freedom

$\newcommand{\eqd}{\triangleq}$ $\newcommand{\brp}[1]{{\left(#1\right)}}$ $\newcommand{\brs}[1]{{\left[#1\right]}}$ $\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}$ $\newcommand{\pderiv}[2]{{\frac{\partial#1}{\partial#2} }}$ $\newcommand{\opair}[2]{\left( #1,#2\right)}$ $\newcommand{\otriple}[3]{\left( #1,#2,#3\right)}$ $\newcommand{\R}{\Bbb{R}}$ $\newcommand{\esth}{N}$

Method 1 optimized with respect to $a_0$ and Method 2 with respect to $\opair{N_*}{a_0}$. Both methods assume $N_0\eqd y_0 \eqd 18$. However, in data modeling it is often assumed that the measured data is the actual true value being measured plus some measurement noise. As such, it is highly unlikely that the true height of the plant at time $t=0$ is exactly $y_0=18.000\ldots$; rather, it is $18+n(0)$, where $n(0)$ is a random noise variable at time $t=0$.

Method 1 gave a lemma for $\pderiv{}{a_0}N(t)$ and Method 2 for $\pderiv{}{N_*}N(t)$; here is a similar lemma for $\pderiv{}{N_0}N(t)$: $$\begin{align*} \boxed{\pderiv{}{N_0}\esth(t)} &\eqd \pderiv{}{N_0 }\brs{\frac{N_*}{1+\brp{\frac{N_*}{N_0}-1}e^{-a_0 t}}} && \text{by definition of $\esth(t)$} \\&= \frac{ - N_*\brp{\frac{-N_*}{N_0^2}}e^{-a_0 t}} {\brs{1+\brp{\frac{N_*}{N_0}-1}e^{-a_0 t}}^2} && \text{by Quotient Rule} \\&= \brs{ \frac{e^{-a_0 t}}{N_0^2}} \brs{\frac{N_*}{1+\brs{\frac{N_*}{N_0}-1}e^{-a_0 t}}}^2 \\&= \brs{ \frac{e^{-a_0 t}}{N_0^2}} \esth^2(t) && \text{by definition of $\esth(t)$} \end{align*}$$

$$\begin{align*} \boxed{0} &= \frac{N_0^2}{2}\cdot0 \\&=\frac{N_0^2}{2} \pderiv{}{N_0}\norm{e}^2 \\&\eqd \frac{N_0^2}{2} \pderiv{}{N_0}\sum_{n=0}^{n=10}e^2(t_n) && \text{by definition of $\norm{\cdot}$} \\&\eqd \frac{N_0^2}{2} \pderiv{}{N_0}\sum_{n=0}^{n=10}\brs{\esth(t_n)-y_n}^2 && \text{by definition of $e$} \\&= \frac{N_0^2}{2} \sum_{n=0}^{n=10}2\brs{\esth(t_n)-y_n} \pderiv{}{N_0}\esth(t_n) && \text{by Chain Rule} \\&= \frac{N_0^2}{2} \sum_{n=0}^{n=10}2\brs{\esth(t_n)-y_n} \brs{ \frac{e^{-a_0 t_n}}{N_0^2}}\esth^2(t_n) && \text{by lemma} \\&= \boxed{\sum_{n=0}^{n=10} \esth^2(t_n)\brs{\esth(t_n)-y_n} e^{-a_0 t_n} } \end{align*}$$
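
The $\pderiv{}{N_0}$ lemma can be checked with a central finite difference; in this Python sketch the analytic value uses $\brs{e^{-a_0 t}/N_0^2}\,\esth^2(t)$ (note there is no $1/N_*$ factor, since the quotient-rule numerator is $N_*^2 e^{-a_0 t}/N_0^2$):

```python
import math

def N(t, N0, Ns, a0):
    # logistic curve with all three parameters free
    return Ns / (1 + (Ns / N0 - 1) * math.exp(-a0 * t))

N0, Ns, a0, t, h = 18.0, 252.0, 0.65, 4.0, 1e-6  # arbitrary test point

# central difference in N0 versus the lemma's closed form
numeric = (N(t, N0 + h, Ns, a0) - N(t, N0 - h, Ns, a0)) / (2 * h)
analytic = math.exp(-a0 * t) / N0 ** 2 * N(t, N0, Ns, a0) ** 2
```

The two values agree to several digits, which also confirms that the boxed stationarity sum (and the R function PcostN0 below) is correct as written.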

So now we have a triple of equations $\otriple{\pderiv{}{N_*}\norm{e}^2=0}{\pderiv{}{N_0}\norm{e}^2=0}{\pderiv{}{a_0}\norm{e}^2=0}$ in the triple of variables $\otriple{N_*}{N_0}{a_0}$. The result using multiroot from the R package rootSolve is $\otriple{N_0}{N_*}{a_0}=\otriple{18.1994673377}{256.0554740324}{0.6508899900}$. This solution is "better" than both Methods 1 and 2 in the sense that it results in a lower cost: $$\begin{array}{|c|rcl|} \text{Method} & \multicolumn{3}{l|}{cost(N_0,N_*,a_0)} \\\hline \\1 & cost(18,252,0.6631183) &=& 31.32307 \\2 & cost(18,255.8436595023,0.6539203544) &=& 0.7140973 \\3 & cost(18.1994673377, 256.0554740324, 0.6508899900) &=& 0.4533383 \end{array}$$

[plot of the three-parameter fit with the data]

Some R code supporting Reproducible Research:

#---------------------------------------
# packages
#---------------------------------------
#install.packages("stats");
#install.packages("R.utils");
#install.packages("rootSolve");
 require(stats);
 require(R.utils);
 require(rootSolve);
 rm(list=objects());

#---------------------------------------
# Data
#---------------------------------------
 tdata = c(0:10)
 ydata = c(18,33,56,90,130,170,203,225,239,247,251)
 t     = seq( from=min(tdata), to=max(tdata), length=1000 )

#---------------------------------------
# Estimate Function N(t)
#---------------------------------------
 N0 = ydata[1]
 N = function(t,N0,Nh,a0)
 {
   result = Nh / ( 1 + (Nh/N0-1)*exp(-a0*t) )
 }

#---------------------------------------
# Cost Function
#---------------------------------------
 cost = function(N0,Nh,a0)
 {
   summ = 0;
   for (i in c(1:length(tdata)))
   {
     summ = summ + ( N(tdata[i],N0,Nh,a0) - ydata[i] )^2
   }
   result = summ
 }

#---------------------------------------
# Partial derivative with respect to a0 of Cost Function
#---------------------------------------
 Pcosta0 = function(N0, Nh, a0)
 {
   summ = 0;
   for (i in c(1:length(tdata)))
   {
     summ = summ + ( N(tdata[i],N0,Nh,a0) )^2 *
                   ( N(tdata[i],N0,Nh,a0) - ydata[i] ) *
                   ( tdata[i] * exp(-a0*tdata[i]) )
   }
   result = summ
 }

#---------------------------------------
# Partial derivative with respect to Nh of Cost Function
#---------------------------------------
 PcostNh = function(N0, Nh, a0)
 {
   summ = 0;
   for (i in c(1:length(tdata)))
   {
     summ = summ + ( 1 - exp(-a0*tdata[i]) ) *
                   ( N(tdata[i],N0, Nh, a0) )^2 *
                   ( N(tdata[i],N0, Nh, a0) - ydata[i] )
   }
   result = summ
 }

#---------------------------------------
# Partial derivative with respect to N0 of Cost Function
#---------------------------------------
 PcostN0 = function(N0, Nh, a0)
 {
   summ = 0;
   for (i in c(1:length(tdata)))
   {
     summ = summ + ( exp(-a0*tdata[i]) ) *
                   ( N(tdata[i],N0, Nh, a0) )^2 *
                   ( N(tdata[i],N0, Nh, a0) - ydata[i] )
   }
   result = summ
 }

#---------------------------------------
# Partial derivative vector of cost
#---------------------------------------
Pcost = function(x)
{
   N0 = x[1]
   Nh = x[2]
   a0 = x[3]
   F1 = Pcosta0( N0, Nh, a0 );
   F2 = PcostNh( N0, Nh, a0 );
   F3 = PcostN0( N0, Nh, a0 );
   result = c(F1, F2, F3);
}

#---------------------------------------
# Calculate roots
#---------------------------------------
 Roots = multiroot( f=Pcost, start=c(ydata[1], ydata[11], 0.6) );
 N0 = Roots$root[1]
 Nh = Roots$root[2]
 a0 = Roots$root[3]

#---------------------------------------
# Display
#---------------------------------------
 printf("(N0, Nh, a0) = (%.10f, %.10f, %.10f) with estim.precis=%.2e\n", N0, Nh, a0, Roots$estim.precis )
 colors = c( "red" , "blue" );
 traces = c( "N(t)", "data" );
 plot ( t , N(t, N0, Nh, a0), col=colors[1], lwd=2, type='l', xlab="t", ylab="y", ylim=c(0,max(ydata)+10) )
 lines( tdata, ydata        , col=colors[2], lwd=5, type='p' )
 legend("topleft", legend=traces, col=colors, lwd=3, lty=1:1)
 grid()