Choosing the best trendline option for biological data?


I see that MS Excel has several trendline options: linear, logarithmic, polynomial, exponential, and power functions. What is the basis or logic for selecting among these functions for biological data? For example, I'm interested in understanding how the abundance of transcripts or proteins changes over a time course, and my data fit a polynomial trendline. How can I compare different samples using this option?

If Excel is not a good option, how can I do this in R?


There are 2 best solutions below


This depends on the shape of the relationship in your data. See the plot below. The upper left graph depicts a linear relationship, so a linear function suits that data best. The upper right graph, however, is clearly not linear, and a linear fit would be poor in this case; here we would need an exponential function to fit the data properly. The same reasoning applies to the polynomial and logarithmic cases (bottom two graphs).

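[Figure: four scatter plots in a 2x2 grid: linear (upper left), exponential (upper right), polynomial (lower left), logarithmic (lower right)]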

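Here's the R code to create the above graph:

set.seed(1)  # make the simulated data reproducible
x <- runif(100, 1, 10)
# Simulate four responses, each with a different underlying shape plus noise
y <- 3*x + rnorm(100, 0, 1)
z <- 0.3*exp(x) + rnorm(100, 0, 1)
a <- 1.1*x^5 - 13*x^4 + 8*x^3 - 12*x^2 - 19*x + 1 + rnorm(100, 0, 1)
b <- log(x) + rnorm(100, 0, 0.05)
# Draw the four scatter plots in a 2x2 grid (filled row by row)
par(mfrow = c(2, 2))
plot(x, y, main = "Linear Relationship")
plot(x, z, main = "Exponential Relationship")
plot(x, a, main = "Polynomial Relationship")
plot(x, b, main = "Logarithmic Relationship")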

In practice, you will have gathered data and your task is to model the relationship. First, plot your data to see what shape the relationship takes. Once you have decided which type of fit is appropriate, you can pick one of several functions in R to estimate the model from the data. For example, lm() fits linear models, which includes any model that is linear in its coefficients, such as a polynomial in x or a regression on log(x).
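As a minimal sketch of that workflow, the following fits three candidate shapes to the same simulated response with lm() and compares them by AIC. The data and the names fit_linear, fit_quad, fit_log, comparison, and best are made up for illustration; with real data you would substitute your own time and abundance vectors. All three models use the same untransformed response, which is what makes their AIC values comparable.

```r
# Simulated data with a logarithmic trend (an assumption for illustration)
set.seed(42)
x <- runif(100, 1, 10)
y <- log(x) + rnorm(100, 0, 0.05)

# Three candidate models, all linear in their coefficients
fit_linear <- lm(y ~ x)            # straight line
fit_quad   <- lm(y ~ poly(x, 2))   # second-degree polynomial
fit_log    <- lm(y ~ log(x))       # logarithmic trend

# AIC() accepts several fitted models and returns one row per model;
# lower AIC indicates a better trade-off between fit and complexity
comparison <- AIC(fit_linear, fit_quad, fit_log)
print(comparison)

# Name of the candidate with the lowest AIC
best <- rownames(comparison)[which.min(comparison$AIC)]
print(best)
```

An exponential or power trend of the kind Excel offers is not linear in its coefficients; in R it could be fitted with nls(), which needs starting values, so it is left out of this sketch.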


Check out "model selection" or "variable selection". I am not sure whether it can be performed automatically in Excel. What you need is mainly a selection criterion (such as AIC, BIC, Mallows' $C_p$, or adjusted $R^2$) and perhaps a search method (forward, backward, or stepwise). If you have only a small number of candidate models, you can fit all of them using packages like $\texttt{leaps}$ in $\texttt{R}$. Another option is regularization-based selection, i.e., LASSO and elastic net ($\texttt{glmnet}$ in $\texttt{R}$), which are estimation procedures that select features as part of the estimation itself.
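To illustrate the stepwise idea without extra packages, here is a small sketch using step() from base R, which searches by AIC. This swaps in step() for the packages named above, and the simulated data and the candidate terms in full_scope are assumptions chosen for the example, not a recommendation for any particular data set.

```r
# Simulated data with a quadratic trend (an assumption for illustration)
set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 0.5 * x - 0.3 * x^2 + rnorm(100, 0, 1)

# Start from the intercept-only model and let step() add or drop terms
null_model <- lm(y ~ 1)
full_scope <- y ~ x + I(x^2) + I(x^3) + log(x)  # hypothetical candidate terms

# direction = "both" is stepwise selection; trace = 0 silences the log
selected <- step(null_model, scope = full_scope, direction = "both", trace = 0)

# Inspect which terms were kept and how the AIC changed
print(formula(selected))
print(AIC(null_model, selected))
```

Stepwise search is greedy, so the selected model can depend on the starting point and the candidate terms offered; it is worth checking the result against a plot of the data.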