In Fourier analysis, when I look at the theorems and useful results derived using summability kernels and convolution, I find myself thinking:
"OK, I guess it works that way, but what is the intuition behind all those complicated-looking definitions?"
To a beginner like me, the three properties of a summability kernel and the definition of convolution do not look intuitive at all. I wonder, "Why should I even care about such things?"
I see how they are used to derive good results, but I wonder how people arrived at those definitions in the first place.
Can anyone explain, please?
Given that this topic is best treated in a full course or book, I'll just give a few pointers:
A first way of looking at Fourier analysis is through the "harmonic analysis" lens (Wikipedia): you have a "basis" of functions $(e_k)_k$ (in classical Fourier analysis, $e_k(x) = e^{ikx}$) and you "decompose" a generic function $f$ along your basis through some "projection" operation $(f,e_k)$ (in Fourier analysis, $\hat f(k) = (f,e_k) = \int f(x) \overline{e_k(x)}dx$). When your basis is well chosen, the decomposition reveals some information about $f$ and, even better, you can reconstruct $f$ from the $(f,e_k)$ terms (for the Fourier transform, it looks like $\hat{\hat f} \sim f$, up to a reflection and a normalizing constant). One remarkable example of "revealed information" (or correspondence) is that the higher the regularity of $f$, the faster $\hat f(k)$ decays to zero as $|k| \to \infty$ (Wikipedia).
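To make the "regularity versus decay" correspondence concrete, here is a small numerical sketch. It assumes the normalization $\hat f(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}dx$ and approximates the integral with a midpoint rule; the two test functions (a square wave, which jumps, and a triangle wave $|x|$, which is continuous with a corner) and all function names are my own illustrative choices.

```python
import cmath
import math

def fourier_coefficient(f, k, n_samples=4096):
    """Approximate (1/2pi) * integral of f(x) e^{-ikx} over [-pi, pi] (midpoint rule)."""
    h = 2 * math.pi / n_samples
    total = 0j
    for j in range(n_samples):
        x = -math.pi + (j + 0.5) * h  # midpoint of the j-th cell
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

def square_wave(x):
    """Discontinuous example: sign of x on [-pi, pi]."""
    return 1.0 if x >= 0 else -1.0

def triangle_wave(x):
    """Continuous example with a corner: |x| on [-pi, pi]."""
    return abs(x)

# For odd k, the square wave coefficients decay like 1/k and the
# triangle wave coefficients like 1/k^2, so the rescaled columns
# below should stay roughly constant.
for k in (1, 3, 5, 11, 21):
    c_square = abs(fourier_coefficient(square_wave, k))
    c_triangle = abs(fourier_coefficient(triangle_wave, k))
    print(f"k={k:2d}  k*|c_square|={k * c_square:.4f}  k^2*|c_triangle|={k * k * c_triangle:.4f}")
```

Both rescaled columns should hover near $2/\pi$, which is what the exact coefficients give for odd $k$: one extra degree of smoothness buys one extra power of $1/k$.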
A second way is to view the Fourier partial sums as a convolution (Wikipedia). Since the integral is linear, the $n$-th partial sum of the Fourier series reads (dropping the normalizing constant $1/2\pi$) $S_n[f](x) = \sum_{k=-n}^n \hat f(k)e^{ikx} = \sum_{k=-n}^n e^{ikx}\int f(t)e^{-ikt}dt = \int f(t) \sum_{k=-n}^n e^{ik(x-t)}dt = (f \star D_n)(x)$, where $D_n(u) = \sum_{k=-n}^n e^{iku}$ (the whole sum inside the integral) is the Dirichlet kernel.
So linearly decomposing over a basis is equivalent to convolving with some function. Both angles are interesting and reveal something about harmonic analysis.
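The identity "partial sum = convolution with the Dirichlet kernel" can be checked numerically. The sketch below assumes the $\frac{1}{2\pi}$ normalization for both the coefficients and the convolution, uses the same midpoint grid for every integral, and takes $f(x) = |x|$ as an arbitrary example; the function names are illustrative only.

```python
import cmath
import math

N_SAMPLES = 2048
H = 2 * math.pi / N_SAMPLES
GRID = [-math.pi + (j + 0.5) * H for j in range(N_SAMPLES)]  # midpoints on [-pi, pi]

def f(x):
    """Example function (an arbitrary choice for this sketch)."""
    return abs(x)

def coefficient(k):
    """hat f(k) = (1/2pi) * integral of f(t) e^{-ikt} dt, midpoint rule."""
    return sum(f(t) * cmath.exp(-1j * k * t) for t in GRID) * H / (2 * math.pi)

def dirichlet_kernel(u, n):
    """D_n(u) = sum_{k=-n}^{n} e^{iku} = 1 + 2 * sum_{k=1}^{n} cos(ku)."""
    return 1.0 + 2.0 * sum(math.cos(k * u) for k in range(1, n + 1))

def partial_sum_from_coefficients(x, n):
    """S_n[f](x) computed as sum over k of hat f(k) e^{ikx}."""
    return sum(coefficient(k) * cmath.exp(1j * k * x) for k in range(-n, n + 1)).real

def partial_sum_from_convolution(x, n):
    """S_n[f](x) computed as (1/2pi) * integral of f(t) D_n(x - t) dt."""
    return sum(f(t) * dirichlet_kernel(x - t, n) for t in GRID) * H / (2 * math.pi)

for x in (0.3, 1.0, -2.0):
    a = partial_sum_from_coefficients(x, 5)
    b = partial_sum_from_convolution(x, 5)
    print(f"x={x:5.2f}  via coefficients={a:.10f}  via convolution={b:.10f}")
```

The two columns agree up to rounding, because going from one to the other only exchanges a finite sum with the (discretized) integral.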
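The beauty of kernels $K_n$ such as the Fejér kernel is that they have a fixed (normalized) integral of $1$ and "converge" towards the Dirac delta function, which is just another way of saying that $f \star K_n \to f$ (uniformly, when $f$ is continuous). The Dirichlet kernel also has integral $1$, but it changes sign and its $L^1$ norm grows like $\log n$, so $f \star D_n \to f$ pointwise only under extra regularity assumptions on $f$; this is why it fails to be a summability kernel, and it is the kind of behavior the three properties are there to exclude. Convolution as a way to smooth functions, or to approximate a generic function by smooth functions, is a very general and important idea that reaches far beyond harmonic analysis. You can for instance use it to prove the Weierstrass approximation theorem by building a polynomial approximation of the Dirac delta (see for instance this link).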
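Here is a rough numerical look at what "fixed integral of $1$" means for the two kernels, and at one way they differ. It assumes the closed forms $D_n(u) = \sin((n+\tfrac12)u)/\sin(u/2)$ and $F_n(u) = \frac{1}{n+1}\big(\sin(\tfrac{(n+1)u}{2})/\sin(\tfrac u2)\big)^2$, with integrals normalized by $\frac{1}{2\pi}$; the grid size and helper names are illustrative.

```python
import math

N_SAMPLES = 4096
H = 2 * math.pi / N_SAMPLES
# Midpoints of a uniform grid on [-pi, pi]; u = 0 is never a sample point,
# so the closed forms below never divide by zero.
GRID = [-math.pi + (j + 0.5) * H for j in range(N_SAMPLES)]

def dirichlet_kernel(u, n):
    """D_n(u) = sum_{k=-n}^{n} e^{iku} = sin((n + 1/2) u) / sin(u / 2)."""
    return math.sin((n + 0.5) * u) / math.sin(u / 2)

def fejer_kernel(u, n):
    """F_n(u) = average of D_0, ..., D_n = (sin((n + 1) u / 2) / sin(u / 2))^2 / (n + 1)."""
    return (math.sin((n + 1) * u / 2) / math.sin(u / 2)) ** 2 / (n + 1)

def mean_over_period(g):
    """Approximate (1/2pi) * integral of g over [-pi, pi] (midpoint rule)."""
    return sum(g(u) for u in GRID) * H / (2 * math.pi)

for n in (4, 16, 64):
    d_mean = mean_over_period(lambda u: dirichlet_kernel(u, n))
    d_abs = mean_over_period(lambda u: abs(dirichlet_kernel(u, n)))
    f_mean = mean_over_period(lambda u: fejer_kernel(u, n))
    print(f"n={n:2d}  mean D_n={d_mean:.4f}  mean |D_n|={d_abs:.4f}  mean F_n={f_mean:.4f}")
```

Both kernels keep a normalized integral of $1$, but the mean of $|D_n|$ keeps growing slowly with $n$, while $F_n \ge 0$ so its mean absolute value stays at $1$. That bounded $L^1$ norm is one of the three summability-kernel properties.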
How these definitions appeared in history is probably not the most important part; rather, try to see how they borrow concepts from different parts of mathematics and illuminate each other. Think of harmonic decomposition as inspired by geometric projections and linear algebra (how you can decompose a vector $x$ on a basis $e_i$ with coordinates $x_i = (x,e_i)$). Convolution is quite a natural idea in many domains (think of how you smooth an image by averaging each pixel with its neighbors)...
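The pixel-averaging picture is literally a discrete convolution. Below is a minimal one-dimensional sketch (a row of pixels rather than a full image), assuming zero padding at the borders; `convolve_same` is a hypothetical helper written for this illustration, not a library function.

```python
def convolve_same(signal, kernel):
    """Centered discrete convolution, out[i] = sum_j signal[i - (j - half)] * kernel[j].

    Values outside the signal are treated as zero (zero padding).
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - (j - half)
            if 0 <= idx < len(signal):
                acc += signal[idx] * weight
        out.append(acc)
    return out

# Box kernel: each output value is the average of a pixel and its two neighbors.
box = [1 / 3, 1 / 3, 1 / 3]

# A single bright pixel in a dark row.
spike = [0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0]

smoothed = convolve_same(spike, box)
print(smoothed)
```

The spike is spread evenly over three neighboring entries while the total "mass" is preserved, because the kernel weights are nonnegative and sum to $1$. Those are the discrete analogues of the summability-kernel conditions.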