Uncertainty Principle in Kernel-based Interpolation


If one wants to interpolate or reconstruct a function $f:\Omega\to\mathbb{R},\,\Omega\subset\mathbb{R}^d,\, d>1$, on a finite set $X_n:=\{x_1,\ldots,x_n\}\subset\Omega$ using translates of a symmetric positive definite kernel $K:\Omega\times\Omega\to\mathbb{R}$, i.e. the interpolant $s_f$ has the representation $$s_f(x)=\sum_{j=1}^n c_j K(x,x_j)$$ with $c=(c_1,\ldots,c_n)^T\in\mathbb{R}^n$, one has to solve the linear system $$A_{K,X_n}\, c = f_X,$$ where $f_X=(f(x_1),\ldots,f(x_n))^T\in\mathbb{R}^n$ and $A_{K,X_n}$ is the interpolation matrix \begin{equation} A_{K,X_n}:=(K(x_j,x_k))_{1\le j,k\le n}=\begin{pmatrix}K(x_1,x_1)&\ldots& K(x_1,x_n)\\ \vdots & \ddots & \vdots\\ K(x_n,x_1)&\ldots&K(x_n,x_n)\end{pmatrix}\in\mathbb{R}^{n\times n}, \end{equation} which is symmetric and positive definite. This matrix can become ill-conditioned quite easily, especially when two points in $X_{n}$ lie close together: the corresponding columns of $A_{K,X_n}$ are then almost linearly dependent, so the matrix is close to singular.
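A small numerical sketch of this effect (the Gaussian kernel and the point sets are illustrative assumptions, not part of the question): assembling $A_{K,X_n}$ for well-separated points versus a set with two nearly coincident points shows how quickly the condition number blows up.

```python
import numpy as np

# Illustrative choice: Gaussian (RBF) kernel K(x, y) = exp(-eps^2 |x - y|^2)
def gauss_kernel(x, y, eps=1.0):
    return np.exp(-eps**2 * (x - y) ** 2)

def interp_matrix(X, kernel):
    # A_{K,X_n} = (K(x_j, x_k))_{j,k}
    n = len(X)
    return np.array([[kernel(X[j], X[k]) for k in range(n)] for j in range(n)])

X_good = np.array([0.0, 0.5, 1.0])     # well-separated points
X_bad  = np.array([0.0, 0.5, 0.5001])  # two nearly coincident points

A_good = interp_matrix(X_good, gauss_kernel)
A_bad  = interp_matrix(X_bad, gauss_kernel)

print(np.linalg.cond(A_good))  # moderate
print(np.linalg.cond(A_bad))   # huge: two columns are almost identical

# Solving A c = f_X for some sample data f
f = np.sin(np.pi * X_good)
c = np.linalg.solve(A_good, f)
```

The last two lines correspond to the linear system above: the coefficients $c$ reproduce the data $f_X$ exactly at the nodes.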

Furthermore, Robert Schaback proved a trade-off (or uncertainty) principle in an article from 1993, which goes as follows. It holds that

\begin{equation}\label{eq1}\tag{1} \min_{1\le j\le n}P^2_{K,X_n\setminus\{x_j\}}(x_j)\ge \lambda_{\min}(A_{K,X_n}), \end{equation}

where $\lambda_{\min}(A_{K,X_n})$ is the smallest eigenvalue of the interpolation matrix and $P_{K,X_n}(x)$ is the so-called power function, which also gives an upper bound on the interpolation error $$|f(x)-s_f(x)|\le C_f\cdot P_{K,X_n}(x).$$ The power function is given by $$P_{K,X_n}^2(x) = K(x,x)-T(x)\, A_{K,X_n}^{-1}\, T(x)^T,\quad x\in\Omega,$$ where $T(x)=(K(x,x_1),\ldots,K(x,x_n))$.
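Inequality \eqref{eq1} can be checked numerically. The following sketch (a Gaussian kernel on equispaced 1-D points is an assumed, illustrative setup) evaluates $P^2_{K,X_n\setminus\{x_j\}}(x_j)$ for every left-out point and compares the minimum against $\lambda_{\min}(A_{K,X_n})$.

```python
import numpy as np

# Illustrative kernel choice: Gaussian with shape parameter eps
def K(x, y, eps=5.0):
    return np.exp(-eps**2 * (x - y) ** 2)

def gram(X):
    return K(X[:, None], X[None, :])  # A_{K,X}

def power2(x, X):
    # P^2_{K,X}(x) = K(x,x) - T(x) A^{-1} T(x)^T,  T(x) = (K(x,x_1),...,K(x,x_n))
    T = K(x, X)
    return K(x, x) - T @ np.linalg.solve(gram(X), T)

X = np.linspace(0.0, 1.0, 6)
lam_min = np.linalg.eigvalsh(gram(X))[0]  # smallest eigenvalue of A_{K,X_n}

# leave out x_j, evaluate the power function of the remaining points at x_j
vals = [power2(X[j], np.delete(X, j)) for j in range(len(X))]
print(min(vals), lam_min)  # min_j P^2 >= lambda_min, as (1) predicts
```

Note that `np.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix in ascending order, so index `0` is indeed $\lambda_{\min}$.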

I was already able to understand the proof of \eqref{eq1} on my own; my problem is the interpretation. Many publications mention this principle and summarize the inequality in words such as

\begin{equation} \textit{“One can’t simultaneously achieve high accuracy and numerical stability”} \end{equation} or \begin{equation} \textit{“This implies that in settings where the power function still is small after one point is left out, the kernel matrix must be ill-conditioned.”} \end{equation}

The condition number of $A_{K,X_n}$ is given by the ratio of the largest to the smallest eigenvalue of $A_{K,X_n}$, but \eqref{eq1} only mentions the smallest eigenvalue. If the power function gets small, the smallest eigenvalue gets small. But what if the largest eigenvalue gets small too? Only the ratio matters, right? So how can one say that the condition number of $A_{K,X_n}$ grows when the power function gets small?

Best answer:

Usually, the kernel function $K$ should be normalized (so that $K(x,x)=1$). In any case, the largest eigenvalue of your matrix is at least as large as every diagonal entry:

$$\lambda_{1}\ge \max_i (K(x_i, x_i))$$

This is because $\lambda_1 = \sup_{\|v\|_2 = 1} v A v^T \ge \max_i e_i A e_i^T = \max_i K(x_i, x_i)$, where $e_i$ denotes the $i$-th standard basis vector.
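This Rayleigh-quotient bound is easy to verify numerically; the following sketch uses an assumed Gaussian kernel and random points purely for illustration.

```python
import numpy as np

# lambda_1 >= max_i K(x_i, x_i): the Rayleigh quotient at e_i equals A[i, i]
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 8)
A = np.exp(-((X[:, None] - X[None, :]) ** 2) / 0.1)  # illustrative Gaussian Gram matrix

lam_max = np.linalg.eigvalsh(A)[-1]  # largest eigenvalue (ascending order)
print(lam_max, A.diagonal().max())   # lam_max dominates every diagonal entry
```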

When $K$ has been normalized and the distribution of the $x_i$ is not too pathological, this diagonal value will not be small (at least with high probability).

For instance, if the $x_i$ are uniformly distributed, then $$\max_i K(x_i,x_i)\ge \frac{1}{n}\,\mathrm{tr}(A) \approx \frac{1}{|\Omega|}\int_{\Omega} K(x, x)\,dx > 0.$$

It is still possible, however, for the largest eigenvalue to be small when the samples are very concentrated.
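For a normalized kernel, though, this cannot happen: every diagonal entry is $1$, so $\lambda_{\max}\ge 1$ no matter how the points cluster, and a shrinking $\lambda_{\min}$ alone forces the condition number to blow up. A sketch (assuming a Gaussian kernel, which is normalized, and points clustered at shrinking scale $h$):

```python
import numpy as np

# Normalized Gaussian kernel: K(x, x) = 1, hence lambda_max >= 1 always
def gram(X, eps=2.0):
    return np.exp(-eps**2 * (X[:, None] - X[None, :]) ** 2)

conds, lmaxs = [], []
for h in [0.2, 0.1, 0.05]:
    X = np.array([0.0, h, 2 * h, 3 * h])  # four points clustered at scale h
    w = np.linalg.eigvalsh(gram(X))       # eigenvalues in ascending order
    lmaxs.append(w[-1])
    conds.append(w[-1] / w[0])
    print(h, w[-1], conds[-1])            # lambda_max stays >= 1; ratio grows
```

As $h$ shrinks, $\lambda_{\max}$ stays bounded below by $1$ while $\lambda_{\min}\to 0$, which is exactly why the smallest eigenvalue alone controls the condition number in the normalized setting.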