Frontier Equation - Fit a polynomial to the top of a data set


How can I fit a polynomial to an empirical data set such that it fits the "top" of the data, i.e. for every value of x, the output of the function is greater than the largest y at that x? At the same time, the fit should be as low as possible, so that it hugs the data. An example of what I'm referring to is seen in the image here:

enter image description here


There are 1 best solutions below


I am going to assume some things, since your question is missing some key pieces of information. First, I am going to assume that all the points are in $\mathbb{R}^2$, and denote them by $\{ (x_i, y_i) \}_{i=1}^m$. Second, I am going to assume that you have no additional requirements on the "shape" of the polynomial you want (e.g. convexity, concavity, ...).

For now, I will look at the standard polynomial basis, and thus assume that the decision variables are the coefficients of $p(x) = b + \sum_{j=1}^n a_j x^j$.

First, the polynomial should bound all the points from above. That is, $p(x_i) \geq y_i$ for every $i = 1, \dots, m$, or, explicitly: $$ b + a_1 x_i + a_2 x_i^2 + \dots + a_n x_i^n \geq y_i $$

Second, the polynomial should be "tight" over the set, which means that its $y$ coordinates should be small. Assume that $[\ell, u]$ is the smallest interval containing all $x_i$, and minimize a function which measures the "tightness" over this interval. One example is the total area under the polynomial, that is: $$ f_1(b, a_1, \dots, a_n)= \int_{\ell}^u p(x) dx $$ Another example is the highest $y$ coordinate the polynomial attains on the interval, that is: $$ f_2(b, a_1, \dots, a_n)= \max_{x \in [\ell, u]} p(x) $$

Both are convex functions of the coefficients. However, $f_1$ is easy to deal with, since it is a linear function of the coefficients. On the other hand, to the best of my knowledge, $f_2$ is intractable if the degree of the polynomial is greater than 4, and is very complicated to deal with anyway. You may also devise your own measures of "tightness".
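To see why $f_1$ is linear in the coefficients, integrate the polynomial term by term over $[\ell, u]$ (this is just the worked-out form of the definition above, in the standard basis): $$ f_1(b, a_1, \dots, a_n) = \int_{\ell}^u \Big( b + \sum_{j=1}^n a_j x^j \Big) dx = (u - \ell)\, b + \sum_{j=1}^n \frac{u^{j+1} - \ell^{j+1}}{j+1}\, a_j $$ Each coefficient is multiplied by a constant that depends only on the interval, so $f_1$ is a linear objective.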

The polynomial is found by solving the following optimization problem: $$ \begin{aligned} \min_{b,a_1, \dots, a_n} &\quad f(b, a_1, \dots, a_n) \\ \text{s.t.} &\quad b + a_1 x_i + a_2 x_i^2 + \dots + a_n x_i^n \geq y_i, \quad i = 1, \dots, m \end{aligned} $$ where $f$ is the measure of "tightness" you choose. Choosing $f_1$ above yields a linear program, for which many solvers are readily available.
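As a rough illustration, the linear program with the area objective $f_1$ could be set up as follows. This is only a sketch: the function name `fit_upper_polynomial` is made up for this example, and SciPy's `linprog` is just one of many possible solvers. It assumes the degree is small relative to the number of distinct $x_i$; otherwise the problem can be unbounded, in which case the sketch raises an error.

```python
import numpy as np
from scipy.optimize import linprog


def fit_upper_polynomial(x, y, degree):
    """Return [b, a_1, ..., a_n] of a polynomial lying on or above all points.

    Hypothetical helper for illustration; minimizes the area objective f_1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = x.min(), x.max()  # smallest interval containing all x_i

    # Objective f_1: the integral of p over [lo, hi] is linear in the
    # coefficients; x^j gets weight (hi^(j+1) - lo^(j+1)) / (j+1).
    powers = np.arange(degree + 1)
    cost = (hi ** (powers + 1) - lo ** (powers + 1)) / (powers + 1)

    # Constraints p(x_i) >= y_i, rewritten as -V @ coeffs <= -y for linprog.
    V = np.vander(x, degree + 1, increasing=True)

    # Coefficients are free in sign; linprog defaults to nonnegative variables.
    result = linprog(cost, A_ub=-V, b_ub=-y, bounds=(None, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x
```

The returned array is ordered from the constant term upward, so `np.polynomial.polynomial.polyval(x, coeffs)` evaluates the fitted polynomial.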

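From a numerical perspective, I would choose a different polynomial basis. Any book on polynomial approximation suggests several such bases (Chebyshev and Legendre polynomials are common choices). In that case, your polynomial becomes $$ p(x) = \sum_{j=0}^n a_j \phi_j(x) $$ where $\phi_j$ are the basis polynomials. Its coefficients, which are the decision variables, become $a_0, \dots, a_n$. The constraints $p(x_i) \geq y_i$ and the objective $f_1$ are still linear in these coefficients, so the problem remains a linear program.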
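A sketch of the same linear program in a different basis is shown below. The Chebyshev basis is my own choice for illustration (the text above does not prescribe a particular one), and `fit_upper_chebyshev` is a made-up name. It assumes the data contains at least two distinct $x$ values, so that the interval can be mapped onto $[-1, 1]$.

```python
import numpy as np
from numpy.polynomial import Chebyshev
from numpy.polynomial import chebyshev as cheb
from scipy.optimize import linprog


def fit_upper_chebyshev(x, y, degree):
    """Fit an upper-bounding polynomial in the Chebyshev basis (illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = x.min(), x.max()

    # Map the data interval [lo, hi] onto [-1, 1], the natural domain of T_j.
    t = (2.0 * x - (lo + hi)) / (hi - lo)

    # Column j holds T_j(t_i); this replaces the monomial Vandermonde matrix.
    A = cheb.chebvander(t, degree)

    # Integral of T_j over [-1, 1]: 2 / (1 - j^2) for even j, 0 for odd j.
    # The change of variable only adds the positive factor (hi - lo) / 2,
    # which does not change the minimizer, so it is omitted.
    cost = np.zeros(degree + 1)
    even = np.arange(0, degree + 1, 2)
    cost[even] = 2.0 / (1.0 - even ** 2)

    # Same constraints as before: p(x_i) >= y_i, with free-sign coefficients.
    result = linprog(cost, A_ub=-A, b_ub=-y, bounds=(None, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)

    # The domain argument applies the same affine map when p is evaluated.
    return Chebyshev(result.x, domain=[lo, hi])
```

The returned object can be called directly on the original $x$ values, e.g. `p = fit_upper_chebyshev(x, y, 5)` followed by `p(x)`.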