Previously asked on Stats.SE
We have $N$ samples of an unknown function $f(x)$ on the finite interval $[a, b]$. The samples are corrupted by white noise of known variance. We want to approximate $f(x)$ by a polynomial $P_m(x)$ of degree $m$. It is known, however, that the original function $f(x)$ is smooth in some sense. For example, suppose we know that $f(x)$ is Lipschitz-continuous on $[a, b]$ with a known constant $K$. Is there a standard procedure for selecting the best-fitting polynomial $P_m(x)$ subject to this smoothness requirement?
I am not bound to Lipschitz continuity; I am happy to switch to another measure of smoothness if it is more convenient for this problem. However, the smoothness measure should be of minimax type: the bound must hold at the worst point of the function, not merely on average (as with an $L^2$ norm).
Note: The question is about how to solve this problem in practice. Namely:
- Is there an analytical solution?
- Is there a known algorithm used in practice to solve this problem?
- Can you recommend an existing library that performs this sort of fitting?
This can be done with standard linear-regression tools. Let the polynomial be given by $p(x) = \sum_{k=0}^m w_k x^k$. We use the fact that for continuously differentiable functions, the Lipschitz constant equals the supremum norm of the derivative. Hence, we need an upper bound on $\|p'\|_{L^{\infty}}$ over $[a,b]$ in terms of the coefficients.
Towards this end, it is useful to expand $p'$ in terms of Chebyshev polynomials of the first kind, since $\|T_k\|_{L^\infty([-1,1])} = 1$ for all $k$ (rescale $[a,b]$ to $[-1,1]$ first). Writing $p'(x) = \sum_k w_k T_k(x)$, where the $w_k$ now denote the Chebyshev coefficients of $p'$, we have:
$$\Big\|\sum_{k} w_k T_k\Big\|_{L^\infty([-1,+1])} \le \sum_{k}|w_k|\cdot\|T_k\|_{L^\infty([-1,+1])} = \sum_{k}|w_k| = \|w\|_1$$
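As a quick sanity check, this bound can be verified numerically with NumPy's Chebyshev utilities (the coefficients below are arbitrary, chosen only for illustration):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)
w = rng.normal(size=6)          # arbitrary Chebyshev coefficients w_0..w_5

# Evaluate sum_k w_k T_k(x) on a dense grid of [-1, 1]
x = np.linspace(-1.0, 1.0, 10_001)
p = C.chebval(x, w)

sup_norm = np.max(np.abs(p))    # approximates ||sum_k w_k T_k||_inf
l1_bound = np.sum(np.abs(w))    # ||w||_1

print(sup_norm <= l1_bound)     # True: the sup norm never exceeds ||w||_1
```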
Thus, you can try to solve your problem with a cleverly constructed LASSO method, using the linear model
$$ \hat y(x, w) = b +\sum_{k=0}^{m-1} w_k \int T_k(x)\,dx = b + \sum_{k=0}^{m-1}w_k\cdot \underbrace{\tfrac{1}{2} \Big({\tfrac {T_{k+1}(x)}{k+1}}-{\tfrac {T_{k-1}(x)}{k-1}}\Big)}_{=:\phi_k(x)} = \phi(x)^T w + b $$

(The general antiderivative formula degenerates for the two lowest terms; use $\phi_0(x) = T_1(x) = x$ and $\phi_1(x) = T_2(x)/4$.)
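In practice you need not apply the antiderivative formula by hand: NumPy's `chebint` computes the Chebyshev coefficients of $\int T_k$, handling the $k = 0, 1$ special cases automatically. A minimal sketch of building the feature map $\phi$ (the helper name `design_matrix` is mine):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def design_matrix(x, m):
    """Columns are phi_k(x) = antiderivative of T_k, for k = 0..m-1."""
    cols = []
    for k in range(m):
        e_k = np.zeros(k + 1)
        e_k[k] = 1.0                 # coefficient vector of T_k
        phi_k = C.chebint(e_k)       # Chebyshev coefficients of ∫ T_k dx
        cols.append(C.chebval(x, phi_k))
    return np.column_stack(cols)

x = np.linspace(-1.0, 1.0, 5)
Phi = design_matrix(x, 4)
print(Phi.shape)                     # (5, 4); first column is phi_0(x) = x
```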
$$(1)\qquad \min_{w,\,b}\ \tfrac{1}{2N}\|\Phi w + b\mathbf{1} - y \|_2^2 + \lambda \|w\|_1$$

where $\Phi$ is the $N \times m$ design matrix with entries $\Phi_{ik} = \phi_k(x_i)$.
Then, since $\phi_k' = T_k$, you are guaranteed that $\operatorname{Lip}(\hat y)=\|\hat y'\|_{L^\infty([-1,+1])} = \|\sum_{k=0}^{m-1}\hat w_k T_k\|_{L^\infty([-1,+1])} \le \|\hat w\|_1$. The degree of Lipschitz continuity is controlled by $\lambda$. By solving the whole Lasso path (i.e. over a range of values of $\lambda$), you can find the best fit that satisfies your constraint. This functionality is provided, for example, in scikit-learn: Lasso and Elastic Net, Lasso Path.
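A minimal end-to-end sketch with scikit-learn's `lasso_path` (the target function, noise level, and budget `K` are assumptions for illustration; the intercept $b$ is absorbed by centering):

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(1)

# Noisy samples of a 1-Lipschitz function on [-1, 1] (illustrative choice)
N = 200
x = np.sort(rng.uniform(-1.0, 1.0, N))
y = np.abs(x) + rng.normal(0.0, 0.05, N)

# Features phi_k(x_i): antiderivative of T_k evaluated at the sample points
m = 10
Phi = np.column_stack(
    [C.chebval(x, C.chebint(np.eye(m)[k])) for k in range(m)]
)

# Center to absorb the intercept b, then solve the whole Lasso path
Phi_c = Phi - Phi.mean(axis=0)
y_c = y - y.mean()
alphas, coefs, _ = lasso_path(Phi_c, y_c)   # coefs has shape (m, n_alphas)

# ||w||_1 bounds Lip(y_hat); keep the least-regularized fit within budget
K = 1.5                                     # assumed smoothness budget
l1_norms = np.abs(coefs).sum(axis=0)
valid = np.flatnonzero(l1_norms <= K)       # alphas decrease along the path
w_hat = coefs[:, valid[-1]]
b_hat = y.mean() - Phi.mean(axis=0) @ w_hat

print(np.abs(w_hat).sum())                  # guaranteed <= K by construction
```

The selection step exploits the fact that `lasso_path` returns solutions from the largest $\lambda$ (all-zero coefficients, $\|w\|_1 = 0$) down to the smallest, so a fit within the budget always exists.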