Approximation of a new kernel by a linear combination of previous kernels


In the thesis by Knutsen, page 25, a kernel linear independence test is explained:

Knutsen, Sverre. "Gaussian processes for online system identification and control of a quadrotor." (2019).

which is in turn based on the paper by Csató and Opper:

Csató, Lehel, and Manfred Opper. "Sparse on-line Gaussian processes." Neural Computation 14.3 (2002): 641-668.

The author presents equation 2.41, $$ k({\mathbf{x}},{\mathbf{x}}_{t+1} ) \approx \sum_{i=1}^{N_{BV}} \alpha_i k({\mathbf{x}},{\mathbf{x}}_{i} ), $$

where $$ \boldsymbol{\alpha}=\mathbf{K}^{-1}\mathbf{y}, $$ with $\mathbf{K}$ the covariance matrix built from the kernel function, outputs $\mathbf{y}=[\mathbf{y}_1,\dots,\mathbf{y}_{N_{BV}}]$, and inputs $\mathbf{X}=[\mathbf{x}_1,\dots,\mathbf{x}_{N_{BV}}]$.

The equation states that the kernel function $k(\mathbf{x},\mathbf{x}_{t+1})$ at a new sample point $\mathbf{x}_{t+1}$ can be approximated by a linear combination of the old kernels in the function space. How did the author derive this approximation?
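For concreteness, here is a small numerical sketch I wrote myself (it is not code from either reference, and the choice of an RBF kernel and the specific points are my own assumptions). It illustrates the least-squares reading of the approximation: if one picks $\boldsymbol{\alpha}$ to minimize the residual of $k(\cdot,\mathbf{x}_{t+1}) - \sum_i \alpha_i k(\cdot,\mathbf{x}_i)$ in the RKHS norm, the solution is $\boldsymbol{\alpha} = \mathbf{K}^{-1}\mathbf{k}_{t+1}$ with $\mathbf{k}_{t+1} = [k(\mathbf{x}_1,\mathbf{x}_{t+1}),\dots,k(\mathbf{x}_{N_{BV}},\mathbf{x}_{t+1})]^\top$, which differs from the $\mathbf{K}^{-1}\mathbf{y}$ quoted above, so part of my question is how these two are reconciled.

```python
import numpy as np

# Sketch only: RBF kernel and an RKHS least-squares projection of
# k(., x_new) onto span{k(., x_i)}; kernel choice and data are assumptions.
def rbf(a, b, ell=1.0):
    # Squared-exponential kernel matrix between row-stacked points a and b
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ell ** 2)

rng = np.random.default_rng(0)
X_bv = rng.uniform(-2, 2, size=(8, 1))   # basis vectors x_1..x_{N_BV}
x_new = np.array([[0.3]])                # new sample point x_{t+1}

K = rbf(X_bv, X_bv)                      # K_ij = k(x_i, x_j)
k_new = rbf(X_bv, x_new)[:, 0]           # k_i = k(x_i, x_{t+1})
alpha = np.linalg.solve(K, k_new)        # alpha = K^{-1} k_{t+1}

# Compare k(x, x_new) with sum_i alpha_i k(x, x_i) on a test grid
X_test = np.linspace(-2, 2, 50)[:, None]
exact = rbf(X_test, x_new)[:, 0]
approx = rbf(X_test, X_bv) @ alpha
print("max abs error:", np.max(np.abs(exact - approx)))
```

By construction the approximation is exact when evaluated at the basis points themselves ($\mathbf{K}\boldsymbol{\alpha} = \mathbf{k}_{t+1}$); the printed error measures how well it extrapolates elsewhere.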