What is the difference between these two kernel definitions?


I am reading my draft and David Haussler's report "Convolution Kernels on Discrete Structures", UCSC-CRL-99-10.

My draft

enter image description here

and the other document

enter image description here

enter image description here

The terminology seems to differ. The other document is from a Computer Science department, so I cannot trust it 100% at the moment. They call one instance of $\Phi$ a kernel, $K$. I call the kernel $\sigma$. It seems that I can also take a series of "kernels" and call it only one kernel.

The term "convolution kernel" caught my eye. I think the kernel of the Wigner-Ville distribution is one of them. Is it?

Why are they taking a series of "kernels"? My interpretation may be wrong. What is the difference between the two kernel definitions?


There is 1 best solution below


Your first document considers the specific Wigner-Ville "kernel", described (at least heuristically) by a formula; convergence is an issue. Any (bilinear...) map on pairs of functions (heuristically) written in such a fashion, with something else in place of the $x(t-\tau)\,x(t+\tau)$ etc. in the W-V kernel, is usually called a "kernel map", and/or the thing replacing the W-V's $x(t-\tau)\,x(t+\tau)$ is called "the kernel".

Your first document also mentions the space of tempered distributions "in two variables", $S'(\mathbb R^{2n})$, as a/the space of "kernels". And, indeed, Schwartz' Kernel Theorem shows that any continuous linear map $T:S(\mathbb R^n)\to S'(\mathbb R^n)$ is given by $T(f)(g)=K(f\otimes g)$ for some $K\in S'(\mathbb R^{2n})$. And this gives a way to tweak the W-V kernel, etc. Thinking of tempered distributions as generalized functions, we might write $K(x,y)$ to specify $K$ itself, and imagine integration against $f(x)\,g(y)$.
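
As a purely formal sketch (a general tempered distribution has no pointwise values, so the integral sign is only suggestive), the pairing can be pictured as

$$T(f)(g)=K(f\otimes g)\ \text{``}=\text{''}\ \int_{\mathbb R^n}\int_{\mathbb R^n} K(x,y)\,f(x)\,g(y)\,dx\,dy .$$

For instance, the formal kernel $K(x,y)=\delta(x-y)$ gives $T(f)(g)=\int_{\mathbb R^n} f(x)\,g(x)\,dx$, so that $T(f)$ is just $f$ itself, viewed as a distribution.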

In general, a "convolution kernel" $K(x,y)$ is a two-variable function of the special form $K(x,y)=F(x-y)$. That is, some kernels can be made from one-variable functions by convolution, but not all kernels arise this way.
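
A minimal discrete sketch of this, assuming finite sequences in place of functions (the names `F`, `f`, `kernel_matrix` and `apply_kernel` are illustrative and not from either document): applying the two-variable kernel $K(i,j)=F(i-j)$ to $f$ gives the same result as convolving $F$ with $f$.

```python
def convolve(F, f):
    """Full discrete convolution: (F * f)[i] = sum_j F[i - j] * f[j]."""
    out = [0.0] * (len(F) + len(f) - 1)
    for i, Fi in enumerate(F):
        for j, fj in enumerate(f):
            out[i + j] += Fi * fj
    return out


def kernel_matrix(F, n_cols):
    """Two-variable kernel K[i][j] = F[i - j], with F treated as zero
    outside its index range."""
    n_rows = len(F) + n_cols - 1
    return [
        [F[i - j] if 0 <= i - j < len(F) else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]


def apply_kernel(K, f):
    """(Tf)[i] = sum_j K[i][j] * f[j]."""
    return [sum(Kij * fj for Kij, fj in zip(row, f)) for row in K]


# Illustrative data (assumed, not from the source)
F = [1.0, 2.0, 3.0]
f = [4.0, 5.0]

K = kernel_matrix(F, len(f))
via_kernel = apply_kernel(K, f)
via_convolution = convolve(F, f)
```

Here `K` is constant along its diagonals, which is the discrete counterpart of depending only on $x-y$. A kernel matrix without that property is not a convolution kernel.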

Your "other source" is addressing just a very special class of "kernels", made in a special way, and is a bit sloppy about notation and terminology. But it is still a special case of the far more general idea that a kernel is a distribution of some kind:

The notation and imprecise language easily give the wrong impression about what the $\Phi_n$'s are, in the notation of that "other" doc. It would be more accurate to take the $\Phi_n$'s to be an orthonormal basis for some Hilbert space, and write $\Phi(x,y)=\sum_n \Phi_n(x)\Phi_n(y)$. Yes, it is possible to interpret the latter expression as some sort of inner product, in the $n$ variable, but that is very misleading, and irrelevant. Then the corresponding operator is $Tf(y)=\sum_n \langle f,\Phi_n\rangle\cdot \Phi_n(y)$.
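
A small finite-dimensional sketch of this construction, assuming vectors in $\mathbb R^3$ stand in for functions and a two-element orthonormal family stands in for the $\Phi_n$ (all names and data here are illustrative): the kernel $\Phi(x,y)=\sum_n \Phi_n(x)\Phi_n(y)$ and the expansion $\sum_n \langle f,\Phi_n\rangle\Phi_n$ define the same operator.

```python
import math


def inner(u, v):
    """Real inner product <u, v> = sum_x u[x] * v[x]."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical orthonormal family in R^3, playing the role of the Phi_n
s = 1.0 / math.sqrt(2.0)
phis = [[s, s, 0.0], [0.0, 0.0, 1.0]]


def kernel(phis):
    """Kernel matrix K[x][y] = sum_n Phi_n[x] * Phi_n[y]."""
    d = len(phis[0])
    return [[sum(p[x] * p[y] for p in phis) for y in range(d)] for x in range(d)]


def apply_via_kernel(K, f):
    """(Tf)[y] = sum_x K[x][y] * f[x]."""
    d = len(f)
    return [sum(K[x][y] * f[x] for x in range(d)) for y in range(d)]


def apply_via_expansion(phis, f):
    """(Tf)[y] = sum_n <f, Phi_n> * Phi_n[y]."""
    out = [0.0] * len(f)
    for p in phis:
        c = inner(f, p)
        for y in range(len(f)):
            out[y] += c * p[y]
    return out


f = [1.0, 3.0, -2.0]  # illustrative input
K = kernel(phis)
Tf_kernel = apply_via_kernel(K, f)
Tf_expansion = apply_via_expansion(phis, f)
```

In this toy case the family spans only a plane in $\mathbb R^3$, so the operator is the orthogonal projection onto that plane.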

One of the features of this kind of operator is the positivity $\langle Tf,f\rangle\ge 0$ for all $f$. Also, it is symmetric in the sense that $\Phi(y,x)=\Phi(x,y)$, though with pointwise convergence issues we might prefer to say $\langle Tf,g\rangle=\langle f,Tg\rangle$ for all $f,g$ (keeping in mind that the scalars are real, not complex).
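
Both properties can be read off formally from the expansion $Tf=\sum_n \langle f,\Phi_n\rangle\,\Phi_n$, assuming real scalars and that the sums converge:

$$\langle Tf,g\rangle=\sum_n \langle f,\Phi_n\rangle\,\langle \Phi_n,g\rangle=\langle f,Tg\rangle,\qquad \langle Tf,f\rangle=\sum_n \langle f,\Phi_n\rangle^2\ \ge\ 0 .$$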