How does one derive Radial Basis Function (RBF) Networks as the smoothest interpolation of points?


I was reading/watching Caltech's ML course, and it said that one could derive the Gaussian RBF predictor as the solution to the smoothest-interpolation problem that minimizes squared loss; i.e., one can derive the predictor/interpolator:

$$ f(x) = \sum^{K}_{k=1} c_k \exp( -\beta_k \| x - w_k \|^2 )$$

from the empirical risk minimizer with a smoothness regularizer:

$$ f^* = \arg \min_f \mathcal{E}_S(f), \qquad \mathcal{E}_S(f) = \sum^{N}_{n=1} (f(x_n) - y_n)^2 + \lambda R(f) $$

with the regularizer

$$ R(f) = \sum^{\infty}_{k=0} a_k \int^{\infty}_{-\infty} \left( \frac{d^k f}{d x^k} \right)^2 dx$$
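For what it's worth, here is the rough shape of the argument as I understand it from the classical regularization-theory literature (e.g. Girosi, Jones & Poggio), assuming the specific weights $a_k = \frac{\sigma^{2k}}{k!\,2^k}$ (my assumption; the course did not state them). By Parseval, each term $\int (d^k f / dx^k)^2 \, dx$ equals (up to a constant) $\int \omega^{2k} |\hat f(\omega)|^2 \, d\omega$, so the whole regularizer collapses into a single multiplicative factor in the Fourier domain:

$$ R(f) \;\propto\; \sum^{\infty}_{k=0} \frac{\sigma^{2k}}{k!\,2^k} \int \omega^{2k} |\hat f(\omega)|^2 \, d\omega \;=\; \int e^{\sigma^2 \omega^2 / 2} \, |\hat f(\omega)|^2 \, d\omega, $$

since $\sum_k \frac{(\sigma^2 \omega^2 / 2)^k}{k!} = e^{\sigma^2 \omega^2 / 2}$. The Green's function $G$ of the associated differential operator then satisfies $\hat G(\omega) = e^{-\sigma^2 \omega^2 / 2}$, whose inverse Fourier transform is a Gaussian, $G(x) \propto e^{-x^2/(2\sigma^2)}$, and the minimizer takes the form $f(x) = \sum_n c_n G(x - x_n)$. What I'm missing is the rigorous derivation connecting these steps.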

Unfortunately, they do not show the derivation. I was wondering if someone could show how minimizing the empirical risk with that regularizer yields the RBF predictor above. In particular, I am very interested in the exact mathematical details, and if there is any maths I need to learn, I am motivated to learn it to understand this derivation.

I believe they mentioned that this regularizer was for some simplified case (I'm not sure which one), but I would be interested to start off with the simple version of this derivation (I believe it uses only 1-D calculus?) and then generalize as needed. As a first suggestion, the generalization might be the answer to the question Why does minimizing $H[f] =\sum^{N}_{i=1}(y_i-f(x_i))^2+\lambda \| Pf \|^2 $ lead to a solution of the form $ f(x) =\sum^N_{i=1}c_iG(x; x_i)+p(x)$?