Flexible Basis Functions


What are the most flexible basis functions?

The most commonly used basis is the polynomial basis $\{1, x, x^2, \dots\}$.

However, when the function is complex, it seems that we need a high-order polynomial to model it accurately. This greatly increases the number of parameters in our regression matrix that we need to estimate, so more data is needed. For instance, in linear regression, if we use a polynomial of degree 6 then the expectation becomes $$f^{t}(x)\boldsymbol\beta=\beta_{1}+\beta_{2}x+\beta_{3}x^2+\beta_{4}x^3+\beta_{5}x^4+\beta_{6}x^5+\beta_{7}x^6$$

This means that 7 coefficients $\boldsymbol\beta$ need to be estimated.

What alternative basis functions require fewer parameters while offering flexibility similar to high-order polynomials?
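To make the parameter count concrete, here is a minimal sketch (the sample points are made up for illustration) of the degree-6 design matrix: each row is the basis expansion $(1, x, \dots, x^6)$ of one input, so the model has 7 coefficients regardless of how many observations we collect.

```python
# Sketch: design matrix for a degree-6 polynomial basis.
# Each row is (1, x, x^2, ..., x^6), so the regression has
# 7 parameters beta_1..beta_7 no matter how many samples there are.

def polynomial_row(x, degree):
    """Basis expansion of a single input x: (1, x, ..., x^degree)."""
    return [x ** j for j in range(degree + 1)]

xs = [0.0, 0.5, 1.0, 1.5, 2.0]          # illustrative sample points
design = [polynomial_row(x, 6) for x in xs]

print(len(design), "rows,", len(design[0]), "columns")  # 5 rows, 7 columns
```

The column count, not the row count, is what drives the number of parameters to estimate, which is why high-degree polynomials become data-hungry.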

Accepted answer:

Basically, the answer provided by g g is comprehensive. I would just like to illustrate the issue of model selection. Let's say your data was generated by some periodic model, e.g., $y=\beta_0 + \beta_1\sin(x) + \epsilon$. In this case, using your strategy, i.e., fitting a polynomial, is legitimate but probably not optimal. As you know from the Taylor expansion (e.g., at $0$), you will need a fairly high-order polynomial just to get a good fit over two cycles, namely $\hat{y}=b_0 + \sum_{n=0}^{6}b_n x^{2n+1}$ on $[-2\pi, 2\pi]$. But suppose your data are scattered over a much larger range; then you will need a polynomial of much higher order to get a good fit. Instead, estimating some basic periodic structure such as $y=\beta_0 + \beta_1\sin(x) + \beta_2\cos(x)$ will almost immediately lead you to the best model.
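As a minimal sketch of this point (not part of the original answer; the coefficients, sample grid, and noise-free data are chosen purely for illustration), fitting the three-parameter trigonometric basis $(1, \sin x, \cos x)$ by ordinary least squares recovers a periodic signal over a wide range where a low-order polynomial would struggle:

```python
import math

def lstsq(X, y):
    """Solve the normal equations X^T X beta = X^T y by Gaussian
    elimination with partial pivoting (enough for small bases)."""
    n = len(X[0])
    A = [[sum(row[r] * row[c] for row in X) for c in range(n)] for r in range(n)]
    b = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * n
    for r in reversed(range(n)):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, n))) / A[r][r]
    return beta

# Noise-free data from y = 1 + 2*sin(x) on the wide range [-10, 10].
xs = [-10 + 0.5 * i for i in range(41)]
ys = [1 + 2 * math.sin(x) for x in xs]

# Trigonometric basis (1, sin x, cos x): only 3 parameters.
X_trig = [[1.0, math.sin(x), math.cos(x)] for x in xs]
beta = lstsq(X_trig, ys)
print(beta)  # essentially (1, 2, 0): the true structure is recovered exactly
```

Because the basis matches the data-generating structure, three parameters suffice where a polynomial of comparable accuracy on this range would need many more.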

The bottom line is that it is worth having some theoretical background on the phenomenon the data describe. In that case, instead of random guessing, you can immediately select from a range of plausible models ("basis functions"). Without some background knowledge, or at least intuition (perhaps from scatter plots), there is, to the best of my knowledge, no golden rule for choosing the optimal set of basis functions to fit the data.

Second answer:

Your question, as posed, is natural but has no straightforward answer due to its generality. Rephrased, you are asking: Is there a finite-dimensional space of functions, say of dimension 3 or 7, which has the smallest approximation/regression error for all possible functions? The answer is clearly: "No".

No matter what your specific basis functions are, there will always be functions that are approximated particularly badly by this space. This is inevitable, since you are approximating functions from a possibly infinite-dimensional space by functions from a low-dimensional one.

To improve on this negative answer, you need to have additional knowledge about properties of the function you would like to approximate and specifics about the approximation problem. This is why regression or machine learning is such a huge field.

Two specific examples of such properties: polynomials are always infinitely differentiable and never have compact support. So if your particular function happens to be piecewise linear with compact support, polynomials are indeed very likely a bad choice, and there are better alternatives.
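To illustrate the compact-support example (a hypothetical sketch; the "hat" target and the knot locations are chosen for illustration), a piecewise-linear function with compact support is reproduced *exactly* by just three hinge (truncated-linear / linear-spline) basis functions, something no polynomial of any degree can do:

```python
# Sketch: a compactly supported piecewise-linear "hat" function is an
# exact combination of three hinge basis functions max(0, x - k),
# with knots at -1, 0, 1.

def relu(t):
    """Hinge basis function max(0, t)."""
    return max(0.0, t)

def hat(x):
    """Target: 1 - |x| on [-1, 1], zero outside (compact support)."""
    return max(0.0, 1.0 - abs(x))

def hinge_model(x):
    # coefficients (1, -2, 1) on hinges at knots -1, 0, 1
    return relu(x + 1) - 2 * relu(x) + relu(x - 1)

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    assert abs(hat(x) - hinge_model(x)) < 1e-12
```

Three parameters give a zero-error fit here, whereas any polynomial matching the flat regions would have to be identically zero.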

If, on the other hand, you know your function is twice differentiable, the fact that it then has a second-order Taylor expansion might justify an approximation by quadratic polynomials.

Beyond properties of the target function, your notion of approximation (i.e., which distance or likelihood) and even the way you gather samples (i.i.d. or on a grid) will influence the choice of approximation procedure and basis functions.

To dig deeper have a look at Chapter 5 of The Elements of Statistical Learning.