Verbal Interpretation of this Integral (Kernels for Statistical ML Theory)


In this paper, which studies the use of kernel methods to devise a (machine) learning algorithm that is more robust to distribution shifts of the input, the authors write equation 5 below, defining a feature map for a probability distribution over the input space:

$\Phi(P_X)=\int_{\mathcal X} k'_X(x,\cdot)\,dP_X(x)$

where $\mathcal X$ is the input space, $P_X$ is a probability distribution over $\mathcal X$, and $k'_X$ is a kernel on $\mathcal X$.

How is the '$\cdot$' supposed to be interpreted in the kernel function inside the integral? And what exactly does it mean to have our "variable of integration" be $P_X(x)$? Basically, what does this integral mean in words?

1 Answer


$k'_X$ is a function of two inputs. The '$\cdot$' marks the second input as being left open: $k'_X(x,\cdot)$ denotes the one-variable function $y \mapsto k'_X(x,y)$. The integral is then taken for each fixed value of that second argument, so $\Phi(P_X)$ is itself a function on $\mathcal X$, not a number.
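To make this concrete, here is a minimal numerical sketch of the empirical version of $\Phi(P_X)$: with samples $x_1,\dots,x_n \sim P_X$, the integral is approximated by the average $\frac1n\sum_i k'_X(x_i,\cdot)$, which is again a function of the open '$\cdot$' slot. The Gaussian RBF kernel and the standard-normal choice of $P_X$ below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * (x - y)^2) on the real line."""
    return np.exp(-gamma * (x - y) ** 2)

def mean_embedding(samples, gamma=1.0):
    """Empirical estimate of Phi(P) = ∫ k(x, ·) dP(x):
    average the one-variable functions k(x_i, ·) over samples x_i ~ P.
    The returned object is itself a function of one variable (the '·' slot)."""
    def phi(y):
        return np.mean([rbf_kernel(x, y, gamma) for x in samples])
    return phi

rng = np.random.default_rng(0)
samples = rng.normal(size=500)   # draws from P = N(0, 1), an illustrative choice
phi = mean_embedding(samples)
print(phi(0.0))                  # the embedding evaluated at y = 0
```

Note that `phi` is the approximation to $\Phi(P_X)$; evaluating it at different points $y$ fills in the '$\cdot$' with concrete values.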

The integral written is in the Lebesgue sense, not the Riemann sense. Given a probability space $(X,\mathcal F, \mu)$ and a measurable simple function $\varphi: X\to \mathbb R$, written $\varphi(x) = \sum_{j=1}^n a_j 1_{A_j}(x)$ with the $A_j$ disjoint and $\mathcal F$-measurable, we define $\int_X \varphi(x)\, d\mu(x) = \sum_{j=1}^n a_j\mu(A_j)$.

For an arbitrary non-negative measurable function $f:X\to \mathbb R_{\geq 0}$, it can be shown that there exists a sequence of simple functions $\varphi_n$ with $\varphi_n(x) \uparrow f(x)$ for every $x\in X$, and we define $\int_X f(x)\,d\mu(x) = \sup_{\varphi \leq f}\int_X \varphi(x)\,d\mu(x)$, the supremum taken over simple $\varphi$.

For arbitrary measurable $f$, we define $\int_X f(x)\,d\mu(x) = \int_X f^+(x)\,d\mu(x) - \int_X f^-(x)\,d\mu(x)$, where $f^+(x) = \max\{0,f(x)\}$ and $f^-(x) = \max\{0,-f(x)\}$; this is well defined provided at least one of the two integrals is finite, and $f$ is called integrable when both are. In your equation, $\mu = P_X$, so integrating against $dP_X(x)$ means averaging the integrand over $x$ drawn according to the distribution $P_X$.
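The simple-function step above can be computed directly. The sketch below, a toy example not tied to the paper, integrates $\varphi = 2\cdot 1_{[0,1/2)} + 5\cdot 1_{[1/2,3/4)}$ against the uniform probability measure on $[0,1]$ by summing $a_j\,\mu(A_j)$:

```python
# Integrate a simple function phi = sum_j a_j * 1_{A_j} against a measure mu:
# ∫ phi dmu = sum_j a_j * mu(A_j).  Here mu is the uniform probability
# measure on [0, 1], so mu([l, r)) = length of [l, r) ∩ [0, 1].

def mu(interval):
    l, r = interval
    return max(0.0, min(r, 1.0) - max(l, 0.0))

# phi = 2 on [0, 1/2), 5 on [1/2, 3/4), 0 elsewhere
values_and_sets = [(2.0, (0.0, 0.5)), (5.0, (0.5, 0.75))]

integral = sum(a * mu(A) for a, A in values_and_sets)
print(integral)  # 2*0.5 + 5*0.25 = 2.25
```

The supremum over such simple functions is exactly how the Lebesgue integral of a general non-negative $f$ is built up.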

More details can be found in any textbook on measure-theoretic probability.