I see the terms *kernel* and *distribution* used, as far as I can tell, interchangeably: within a single publication the phrases "a Gaussian kernel" and "a Gaussian distribution" appear synonymous to me.
However, it is possible that some nuance is missing from this comparison.
Neither Meaning of "kernel" nor What does kernel mean gives a definite answer, and What are the most overloaded words in mathematics highlights that terms in mathematics often have non-unique meanings, especially across fields.
So is there a formal or otherwise distinction between a kernel and a distribution?
Or is a kernel just any symmetric function that integrates to 1? i.e.
$$K(-u) = K(u) \quad \text{and} \quad \int_{-\infty}^{\infty} K(u)\,\mathrm{d}u = 1$$
Notably, if $K(-u) = K(u)$ is required, then any asymmetric ("tailed") density (e.g. Weibull, Gamma, etc.) is not a kernel?
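To make the two proposed properties concrete, here is a rough numeric check (my own illustration, not from any of the linked posts): the standard normal density satisfies both symmetry and normalization, while the standard exponential density, a perfectly valid density, fails the symmetry requirement.

```python
import numpy as np

# Grid wide enough that both densities are numerically zero at the ends.
u = np.linspace(-20, 20, 400001)
h = u[1] - u[0]

# Gaussian kernel = standard normal density.
gauss = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
print(np.allclose(gauss, gauss[::-1]))      # symmetric: K(-u) = K(u) -> True
print(abs(gauss.sum() * h - 1) < 1e-6)      # integrates to 1 -> True

# Standard exponential density: integrates to 1,
# but is supported on [0, inf), hence not symmetric about 0.
expo = np.where(u >= 0, np.exp(-u), 0.0)
print(abs(expo.sum() * h - 1) < 1e-3)       # integrates to 1 (up to the jump at 0) -> True
print(np.allclose(expo, expo[::-1]))        # not symmetric -> False
```

So under the symmetry requirement, the exponential (and likewise Gamma, Weibull) is a density but not a kernel.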
Further befuddlement stems from articles like this one, which states that:
kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.
Such phrasing again treats the two notions as interchangeable and, to me, implies that if a kernel estimator estimates a probability density function, then a bona fide kernel is itself a probability density function.
If kernels are not probability distributions, what is a good, accessible resource to clarify my confusion?
It is probably too late to answer this, but since I saw it unanswered, I think I can help.
The term *distribution* refers to the theoretical, generally unknown function that describes the behavior of a random variable; Normal, Gamma, and Weibull are all well-known distributions. In statistics and probability, kernels are used as building blocks to estimate a distribution.
A Gaussian kernel and a Gaussian distribution are two different things. The Gaussian distribution has density $$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right).$$
The kernel density estimator is defined as
\begin{eqnarray} \hat{f}(x)=\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-X_{i}}{h}\right), \end{eqnarray}
where $X_{i}$, $i=1,2,\ldots,n$, is your sample. Note the hat above $f$: this is simply a way of estimating the true and generally unknown density $f$, which could be a Gaussian density, a Weibull density, or anything else.

Now regarding $K(\cdot)$: this is the so-called kernel function, and it is typically taken to be a density itself. For example, you can define $K(\cdot)$ to be the standard normal density; this choice is called the Gaussian kernel. But you can also use other choices, such as the triangular kernel, the Epanechnikov kernel, etc. Using the kernel density estimator shown above, even if you define $K(\cdot)$ to be a Gaussian kernel, you can very efficiently estimate bimodal or even multimodal distributions.
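The last point is worth seeing in action. Here is a minimal sketch (my own toy implementation of the estimator $\hat{f}$ above, not from any particular library) using a Gaussian kernel on a bimodal sample: even though every summand is a unimodal bell curve, the estimate $\hat{f}$ comes out bimodal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Bimodal sample: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
sample = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

def kde(x, data, h):
    """f_hat(x) = (1/(n*h)) * sum_i K((x - X_i)/h) with Gaussian K."""
    u = (x[:, None] - data[None, :]) / h
    K = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density
    return K.sum(axis=1) / (len(data) * h)

x = np.linspace(-5, 5, 201)
f_hat = kde(x, sample, h=0.3)  # h is the bandwidth, chosen by hand here

# The estimate has two peaks (near -2 and +2) with a valley near 0,
# even though the Gaussian kernel itself is unimodal.
valley = f_hat[np.argmin(np.abs(x))]
print(f_hat[x < 0].max() > 2 * valley)  # True
print(f_hat[x > 0].max() > 2 * valley)  # True
```

Note also that because each kernel term is a density, $\hat{f}$ itself integrates to 1, so it is a legitimate density estimate.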