What does ln() accomplish on a regression input?


I have gotten interested in forecasting using linear/nonlinear regression, particularly using Facebook's Prophet library for R/Python. It makes forecasting on a time-series input pretty straightforward.

However, one thing I don't fully understand in the "Quick Start" tutorial is why a natural logarithm is applied to the inputted values before giving it to the model, like so:

import numpy as np
from prophet import Prophet

# np.log() uses base e (the natural logarithm)
df['y'] = np.log(df['y'])

# fit the model on the log-transformed series
m = Prophet()
m.fit(df)

I somewhat remember logarithms from my high school/college math days, but none of my teachers ever made it clear why Euler's number was useful, much less why it's used as the base of a logarithm.

This led me down a rabbit hole to explore natural logarithms, because I am starting to see them everywhere. Interestingly, Prophet will accept an input regardless of whether ln() was applied, and it will produce somewhat similar curves. I'm guessing the forecasted output has to be exponentiated via $e^x$ to be meaningful as well.
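As the question guesses, $e^x$ (i.e. `np.exp`) is the exact inverse of `np.log`, so a forecast made on the log scale is mapped back to the original scale by exponentiating it. A minimal round-trip sketch (plain numpy, no Prophet required):

```python
import numpy as np

# Original series values (sample data, not from Prophet)
y = np.array([29.0, 3.0, 100.0, 10.0])

log_y = np.log(y)       # transform applied before fitting
y_back = np.exp(log_y)  # exp() undoes the log on the (forecasted) values
```

In a real Prophet workflow, the same `np.exp` would be applied to the `yhat` column of the forecast DataFrame to get predictions back on the original scale.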

What I want to know is what do natural logarithms accomplish in the context of linear regression inputs? For instance, here are two simple Excel charts plotting a series with y and ln(y). The charts look kind of similar, but what effect did ln(y) have?

And why does Prophet choose to take inputs and produce outputs that apparently need to be exponentiated via $e^x$ to be meaningful?

x   y       ln(y)
1   29      3.36729583
2   3       1.098612289
3   100     4.605170186
4   3       1.098612289
5   10      2.302585093
6   11      2.397895273
7   9       2.197224577
8   49      3.891820298
9   97      4.574710979
10  33      3.496507561
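The ln(y) column in the table above is just `np.log` applied elementwise to y, which can be verified directly:

```python
import numpy as np

# y values from the table
y = np.array([29, 3, 100, 3, 10, 11, 9, 49, 97, 33], dtype=float)

# Natural log (base e), matching the ln(y) column
ln_y = np.log(y)
```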

[chart: f(x)]

[chart: ln(f(x))]


There are 2 answers below.

Accepted answer:

"Linear regression" is a technique for finding the straight line that best fits a given set of data points $(x,y)$. It's the right technique to use if the data points actually lie near some line, which is likely to be the case if there is some underlying reason to expect linearity. Regression finds the line that fits best. For a time series, $x$ will be time and $y$ the value you measure and then want to predict using the line.

But suppose your data are about the population of some biological system as time goes on. Then you would not expect those points to lie on or near a straight line; you'd expect some kind of exponential growth, expressed as $p = c e^{rt}$, where $p$ is population, $t$ is time, and $c$ and $r$ are constants. So your data points will lie near a curve like that, and you'd like to know the best values for the constants. Fortunately, linear regression comes to the rescue if you are clever. If you take the (natural) logarithms of the values of $p$, then since $\ln p = \ln c + rt$, the resulting data points will lie near a line, which you can find with linear regression. The slope of that line is $r$ and its intercept is $\ln c$, so from them you can get the constants you want for the exponential best fit.
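The fitting trick described above can be sketched in a few lines of numpy. The constants `c_true` and `r_true` and the noise-free data are made up for illustration; `np.polyfit` with degree 1 is the linear regression:

```python
import numpy as np

# Simulated exponential-growth data: p = c * e^(r t), constants chosen for the demo
t = np.arange(1, 21, dtype=float)
c_true, r_true = 5.0, 0.3
p = c_true * np.exp(r_true * t)

# Linear regression on (t, ln p): slope estimates r, intercept estimates ln c
r_hat, ln_c_hat = np.polyfit(t, np.log(p), 1)
c_hat = np.exp(ln_c_hat)
```

With noise-free data the recovered constants match the true ones; with noisy data they would be least-squares estimates instead.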

The authors of Prophet anticipated that some folks would want to use regression this way, so they show you how to input the logarithms of the measured values.

For the spiky data in your question neither linear regression nor finding the exponential best fit will be much use for prediction.

Second answer:
  1. $\ln( \cdot)$ is a concave transformation, hence if the distribution of your $y$s is right-skewed, it will "flatten" it and thus make it more suitable for linear regression with classic assumptions.

  2. Models of a kind $\log (y_i) = \beta_0 + \beta_1x_i$ are sometimes used when we are interested in a relative (percentage) change in $y$ as a a result of small (infinitesimal) changes in $x$, namely $$ \frac{\partial}{\partial x} (\ln y) = \frac{1}{y}\frac{\partial y}{\partial x} = \beta_1 $$ or $$ \frac{\partial y}{y} = \beta_1\partial x, $$ i.e., $\beta_1$ is the percentage change (estimated by $\hat{\beta_{1}}\times100\% $) of $y$ when $x$ changes by one unit ($\partial x \approx \Delta x = 1$).