Help finding the best lower bound function for a data set, based on the natural logarithm


I am trying to find a lower bound function for a data set I have, and I am struggling with it. In the graph below, the data are shown in blue and my lower bound function in red.

[Figure: data (blue) and trial-and-error inverse-logarithm lower bound (red)]

The data are bounded between $0$ and $1$ and look a bit like a curve built from the reciprocal of the natural logarithm ($1/\ln$), but with some noise at the beginning. My approach (by trial and error in Excel) was the following (the red line):

$f(n) = 0,\quad n \in [4, 9000]$

$f(n) = 0.98-\frac{1}{\ln(n-9000)},\quad n > 9000$
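A minimal Python sketch of the red curve, plus a hypothetical helper `violations` that lists the data points falling below a candidate bound. It assumes integer $n$. The logarithm is undefined at $n = 9000$ and zero at $n = 9001$, so this sketch treats $n \le 9001$ as part of the flat zero segment; that guard is my assumption, not part of the original formula.

```python
import math

def red_bound(n):
    """Trial-and-error bound from the question, for integer n >= 4."""
    if n - 9000 <= 1:
        # Assumption: treat n <= 9001 as the flat zero segment, because
        # ln(n - 9000) is undefined at n = 9000 and zero at n = 9001.
        return 0.0
    return 0.98 - 1.0 / math.log(n - 9000)

def violations(bound, ns, ys):
    """Return the data points (n, y) that lie strictly below the bound."""
    return [(n, y) for n, y in zip(ns, ys) if y < bound(n)]
```

Running `violations(red_bound, ns, ys)` on the real data would list exactly the points where the bound is "broken".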

This lets me follow the data closely, but I had to leave out the initial segment $[4, 9000]$: the data grow more gradually (and more noisily) there than my $f(n)$ does, so the bound is "broken" for some $n$ in that segment.

In the worst case, I could apply the same approach piecewise: split the remaining initial segment $n \in [4, 9000]$ into two, three or more segments $s_i$, and give each one its own lower bound $f_i$ based on the reciprocal logarithm but adapted to that segment.

I would very much appreciate any ideas on how to build a closer lower bound using a single function, like the one I have drawn in black.

Ideally, I would like to get this closer function from the natural logarithm or a similar approach, rather than from a "pure" polynomial interpolation.

In other words, I am looking for a "parametrized" lower bound function based on the natural logarithm (or something similar) rather than a "pure" parametrized polynomial interpolation. Any solution is welcome, but I would especially appreciate one based on the natural logarithm. Thank you!
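One way to get a single, parametrized logarithm-based function that is a true lower bound is to fit a model that is linear in its parameters, such as $a + b/\ln(n - c)$ with the shift $c$ fixed by hand, by ordinary least squares, and then move the fitted curve down by its largest violation so that no point lies below it. This is a sketch in plain Python; the model form and the name `fit_log_lower_bound` are my assumptions, not something from the question.

```python
import math

def fit_log_lower_bound(ns, ys, c=0.0):
    """Fit y ~ a + b / ln(n - c) by ordinary least squares, then shift the
    curve down until no data point lies below it. Requires n - c > 1."""
    xs = [1.0 / math.log(n - c) for n in ns]
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residuals of a fit with an intercept sum to zero, so the smallest
    # residual is <= 0; adding it lowers the curve onto the worst point.
    shift = min(y - (a + b * x) for x, y in zip(xs, ys))
    return a + shift, b
```

With $c = 9000$ this is the same family as the red curve $0.98 - 1/\ln(n - 9000)$, but with $a$ and $b$ fitted instead of chosen by trial and error (only points with $n - c > 1$ can be used). Because the shift is set by the single worst point, one outlier can pull the whole curve down.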

Best answer

To my eye, the noise is about $\pm(10\text{ to }20)\%$ of $(1 - \text{the curve})$. I would be tempted to smooth the data, fit a curve, and then lower it by a "three sigma" amount to produce a curve that lies below "almost all" the points. It seems to me that this would represent the data better. Is that acceptable? The scare quotes indicate that a few points would fall below the resulting curve, so it would not be a strict lower bound; that may or may not be acceptable for your application. If you need a curve that is truly a lower bound, it will sit farther from the "by eye" fit.
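A minimal sketch of this smooth, fit, and lower-by-three-sigma recipe in plain Python. The curve family $a + b/\ln n$, the moving-average window, and the function names are illustrative assumptions. For simplicity it uses one constant $\sigma$, whereas the noise estimate above suggests the spread scales with $(1 - \text{the curve})$, so a relative residual could be a better choice.

```python
import math
import statistics

def moving_average(ys, window=5):
    """Centered moving average; the window shrinks at both ends.
    Assumes the points are sorted by n and roughly evenly spaced."""
    half = window // 2
    out = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half), min(len(ys), i + half + 1)
        out.append(sum(ys[lo:hi]) / (hi - lo))
    return out

def three_sigma_curve(ns, ys, window=5, k=3.0):
    """Smooth the data, fit y ~ a + b / ln(n) to the smoothed values, then
    lower the fit by k standard deviations of the raw residuals.
    A few points may still fall below the returned curve."""
    xs = [1.0 / math.log(n) for n in ns]
    smooth = moving_average(ys, window)
    xbar = statistics.fmean(xs)
    sbar = statistics.fmean(smooth)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (s - sbar) for x, s in zip(xs, smooth))
    b = sxy / sxx
    a = sbar - b * xbar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = statistics.pstdev(resid)
    return lambda n: a + b / math.log(n) - k * sigma
```

Compared with shifting down to the single worst point, the $3\sigma$ offset is less sensitive to one outlier, at the cost of no longer being a guaranteed bound.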

Another answer

Well, I finally found a better combination, as follows:

[Figure: final lower bound]

The initial segments are just simple polynomial approximations (constants and one quadratic). For the last segment I found a better option: a combination of the natural logarithm and $\frac{1}{n}$. This could give other people ideas on how to attack this kind of problem, so I am adding it to the answers.

$f(n) = 0,\quad n < 1918$

$f(n) = 0.63\,\frac{n^2-n}{3300^2-3300},\quad 1918 \le n < 3300$

$f(n) = 0.63,\quad 3300 \le n < 6000$

$f(n) = 0.71,\quad 6000 \le n < 9000$

$f(n) = \frac{4.95-\frac{1}{n}+\frac{\ln n}{2.7}}{10},\quad n\ge9000$
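For anyone wanting to reuse this, here is a minimal Python transcription of the piecewise bound above; the function name `final_lower_bound` is just for illustration. The branches are tested in order, so each condition only applies once the previous ones have failed.

```python
import math

def final_lower_bound(n):
    """Piecewise lower bound from this answer, evaluated top to bottom."""
    if n < 1918:
        return 0.0
    if n < 3300:
        # Quadratic ramp that reaches exactly 0.63 at n = 3300.
        return 0.63 * (n * n - n) / (3300 * 3300 - 3300)
    if n < 6000:
        return 0.63
    if n < 9000:
        return 0.71
    # Last segment: natural logarithm plus a 1/n correction.
    return (4.95 - 1.0 / n + math.log(n) / 2.7) / 10.0
```

Two things are worth checking against the data: the bound jumps from $0.71$ to about $0.832$ at $n = 9000$, and since $\ln n$ grows without limit, the last piece exceeds $1$ once $n$ is above roughly $8.3 \times 10^5$, so for data bounded by $1$ it can only serve as a lower bound below that point.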