I am learning about fast multiple methods (e.g. the black box FMM described here), and I am told that "smooth" or "well-behaved" kernels are low rank. This low rank allows them to be compressed, which can dramatically accelerate computation, as in the FMM.
To test this, I made three kernels:
$$k_1(x,y) = \frac{1}{1+\|x-y\|^2}\\ k_2(x,y) = \frac{1}{\|x-y\|^2}\\ k_3(x,y) = e^{-\|x-y\|^2/\sigma}$$
Clearly, $k_1$ and $k_3$ are well behaved, whereas $k_2$ blows up.
I generated 500 equidistant points on an interval, evaluated the kernel at these points, computed SVD of the resulting kernel matrix, and plotted the singular values, giving the expected result:
Kernel 1 and Kernel 3 are indeed approximately low rank (their spectrum decays fast), whereas Kernel 2 does not.
Indeed, the rank-10 reconstruction of kernel 1 and kernel 3 are fairly reasonable, where as kernel 2 is a disaster:
Relative error for 1/(1+dist): 0.253009, 1/(dist): 0.999517, gaussian: 0.012810
My question is why does the "smoothness" of the kernel affect its rank? I think of rank as the dimension of the range of the matrix--how is that affected by smoothness of the kernel?
Matlab code for this demonstration:
bmin = -50; nboxes = 500; bmax = 50;
spacing = (bmax-bmin)/nboxes;
interpolation_nodes = bmin:spacing:bmax;
distmatrix = squareform(pdist(interpolation_nodes'));
kernel1 = 1./(1+distmatrix);
kernel2 = 1./(.0001+distmatrix.^2);
kernel3 = exp(-distmatrix./mean(distmatrix(:)));
subplot(1,3,1); [U1 S1 V1] = svd(kernel1); plot(diag(S1), 'linewidth',4); ylim([0,max(S1(:))]); title('1/(1+r^2)')
subplot(1,3,2);[U2 S2 V2] = svd(kernel2); plot(diag(S2), 'linewidth',4);ylim([0,max(S2(:))]); title('1/(r^2)')
subplot(1,3,3);[U3 S3 V3] = svd(kernel3); plot(diag(S3), 'linewidth',4);ylim([0,max(S3(:))]);title('e^{-r^2/sigma}')
error1 = norm(U1(:,1:10)*S1(1:10,1:10)*V1(:,1:10)' - kernel1)/norm(kernel1);
error2 = norm(U2(:,1:10)*S2(1:10,1:10)*V2(:,1:10)' - kernel2)/norm(kernel2);
error3 = norm(U3(:,1:10)*S3(1:10,1:10)*V3(:,1:10)' - kernel3)/norm(kernel3);
fprintf('Relative error for 1/(1+dist): %f, 1/(dist): %f, gaussian: %f\n', error1,error2,error3);