I have always thought of a "convolution" as a mathematical operation between two functions. For example, if I have the function $f$ and the function $g$, I can take the convolution of $f$ with $g$ and thus obtain a new "convolution function". Similarly, I can also evaluate the integral of this newly obtained convolution function.
For some reason, I keep seeing the "convolution integral" repeatedly mentioned in probability. I could understand that someone might be interested in taking the convolution of two probability density functions and then integrating the resulting convolution function - but apart from this, I would never have imagined convolution integrals being particularly relevant in probability.
For example, in time inhomogeneous Markov Chains, convolution integrals are apparently used to find out:
- the time dependent probability of being in any state within the Markov Chain
- the time dependent probability of transitioning between any two states within the Markov Chain
My Question: Can anyone please explain to me why convolution integrals are so relevant in statistics and probability theory, and why they might be required in the settings above?
Thanks!

The convolution integral appears a lot in probability because independence is a central concept in probability theory, and the distribution of a sum of independent random variables is the convolution of the individual distributions. If the distributions have densities, then the density of the sum is the convolution integral of the individual densities. Note that it is crucial to have independence in order to conclude that the distribution of the sum is a convolution. There is a nice proof of the full central limit theorem (for null arrays) based on convolution semigroups in Feller's book; perhaps you can take a look at that.
Added: Take real-valued random variables $X$ and $Y$, independent of each other. To make things simple, let's assume $X$ has a continuous density $f$ and $Y$ has a continuous density $g$. Then
$$
\begin{aligned}
P(X+Y \leq z) &= \iint_{x+y \leq z} f(x)\,g(y)\,dx\,dy \\
&= \int \left( \int_{-\infty}^{z-y} f(x)\,dx \right) g(y)\,dy \\
&= \int \left( \int_{-\infty}^{z} f(u-y)\,du \right) g(y)\,dy \\
&= \int_{-\infty}^{z} \left( \int f(u-y)\,g(y)\,dy \right) du.
\end{aligned}
$$
The first equality is by independence. Hence $X+Y$ has density $\int f(u-y)\,g(y)\,dy$, which is exactly the convolution of $f$ and $g$. There is a corresponding statement when $X$ and $Y$ don't have densities.
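As a concrete numerical sketch of this formula (my own illustration, not part of the original argument): take $X, Y$ independent and Exp(1)-distributed, so $f(t) = g(t) = e^{-t}$ for $t \geq 0$. Their sum is known to be Gamma(2,1)-distributed with density $t e^{-t}$, and a discretized convolution of the exponential density with itself should reproduce that.

```python
import numpy as np

dt = 0.001
t = np.arange(0.0, 10.0, dt)

f = np.exp(-t)                 # Exp(1) density on t >= 0
conv = np.convolve(f, f) * dt  # Riemann-sum convolution; index k ~ time k * dt

k = int(round(2.0 / dt))       # grid index for t = 2
exact = 2.0 * np.exp(-2.0)     # Gamma(2,1) density t * exp(-t) at t = 2
print(conv[k], exact)          # the two values should agree closely
```

The same discretization works for any pair of densities supported on a bounded grid; only the closed-form comparison value is specific to the exponential example.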
You also asked for an example. I am sorry that I don't quite get the example you give, but let me give you another one. Consider a 1d Brownian motion started from $0$ (or a symmetric random walk, if you are not familiar with Brownian motion), and let $T_a$ denote its hitting time of a level $a>0$ (the first time it hits $a$). Suppose $T_a$ has density $f_{a}(x)$. Then I claim that $f_{a_{2}}(x)$ is the convolution of $f_{a_{1}}(x)$ and $f_{a_{2}-a_{1}}(x)$ for $a_{2}>a_{1}$.
To see this, note that Brownian motion in 1d is recurrent, so it must hit $a_{1}$; moreover, to hit $a_{2}$, it must first hit $a_{1}$. By the strong Markov property, the Brownian motion starts afresh at time $T_{a_{1}}$, and starting from $a_{1}$, hitting $a_{2}$ is the same as starting from $0$ and hitting $a_{2}-a_{1}$. Here the independence of $T_{a_{1}}$ and $T_{a_{2}} - T_{a_{1}}$ also follows from the strong Markov property, which is certainly nontrivial. So sometimes independence comes in disguise, and not every instance of independence is as simple as independent coin flips. I guess that's why you were surprised.
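This claim can also be checked numerically (my own sketch, using the known closed form): the first-passage density of standard Brownian motion to level $a$ is the Lévy density $f_a(t) = \frac{a}{\sqrt{2\pi t^3}} e^{-a^2/(2t)}$. Convolving $f_1$ with itself on a grid should reproduce $f_2$, matching the claim with $a_1 = 1$, $a_2 = 2$.

```python
import numpy as np

dt = 0.002
t = np.arange(dt, 8.0, dt)  # start at dt: the density formula is singular at t = 0

def hitting_density(a, t):
    # First-passage density of standard BM from 0 to level a > 0 (Levy density)
    return a / np.sqrt(2 * np.pi * t**3) * np.exp(-a**2 / (2 * t))

f1 = hitting_density(1.0, t)

# Riemann-sum convolution of f1 with itself;
# since the grid starts at dt, index k corresponds to time (k + 2) * dt.
conv = np.convolve(f1, f1) * dt

k = int(round(3.0 / dt)) - 2             # grid index for t = 3
print(conv[k], hitting_density(2.0, 3.0))  # should agree closely
```

The agreement reflects the stability of the Lévy distribution under convolution, which is exactly the $f_{a_1} * f_{a_2 - a_1} = f_{a_2}$ property claimed above.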