I am watching the (fantastic) SLAM lectures of Claus Brenner, where he introduces the Bayes filter (Kalman filter, particle filter, histogram filter).
He says that the prediction step involves a convolution of distributions and the correction step a multiplication of distributions (link). $$\text{prediction: }p(x)=\sum_y p(x\mid y)p(y)$$ $$\text{correction: }p(x\mid z)=\alpha p(z\mid x)p(x)$$ My problem is with the convolution. It makes sense the way he derives it, but I cannot make the connection to the standard definition of convolution as given, e.g., on Wikipedia:
$$p(Z=z) = \sum_{k=-\infty}^\infty p(X=k)p(Y=z-k)$$
Is this a mistake in the video, or am I missing something? It just looks like the law of total probability.
As you said, given certain assumptions, the prediction step can be thought of as a convolution in the classic sense. For example, suppose $p(x\mid y)=\mathcal N(x;y,\Sigma)$, i.e. the conditional probability of $x$ given $y$ is a Gaussian distribution centered at $y$ with a fixed covariance. This is not a crazy assumption: if $y$ is the true position and $x$ is the sensor reading of that position, the noise (emission distribution) may well be spatially invariant.
In that case, the transformation from $p(y)$ to $p(x)$ is exactly a convolution with a Gaussian kernel: $$ p(x)=\int_{-\infty}^{\infty} p(x\mid y=k)\,p(y=k)\,dk\\ =\int_{-\infty}^{\infty} \mathcal N(x;k,\Sigma)\,p(y=k)\,dk\\ =\int_{-\infty}^{\infty} \mathcal N(x-k;0,\Sigma)\,p(y=k)\,dk $$ In this form you can see the correspondence to the convolution formula you quoted: the density $p(y)$ plays the role of your $p(X)$, and the centered normal distribution $\mathcal N(\,\cdot\,;0,\Sigma)$ plays the role of $p(Y)$.
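This correspondence is easy to check numerically. Below is a minimal sketch (the grid, prior, and noise level are my own illustrative choices, not from the lecture): it evaluates the total-probability sum directly and compares it against `np.convolve` with a centered Gaussian kernel.

```python
import numpy as np

# Discretized sketch of the prediction step (all values are illustrative).
grid = np.linspace(-10.0, 10.0, 401)   # common grid for x and y
dk = grid[1] - grid[0]

def gauss(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

p_y = gauss(grid - 1.0, 0.5)           # some prior p(y)
sigma = 1.0                            # std of p(x|y) = N(x; y, sigma^2)

# Law of total probability: p(x) = sum_k p(x | y=k) p(y=k) dk
p_x_sum = np.array([np.sum(gauss(x - grid, sigma) * p_y) * dk for x in grid])

# The same thing, computed as a convolution with the centered kernel N(.; 0, sigma^2)
kernel = gauss(grid, sigma)
p_x_conv = np.convolve(p_y, kernel, mode="same") * dk

print(np.max(np.abs(p_x_sum - p_x_conv)))  # agreement up to floating point
```

With an odd-length grid that is symmetric about zero, `mode="same"` keeps the kernel's center aligned with $x-k=0$, so the two computations match index by index.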
More generally, if the probability density $p(x\mid y)$ is not of the form $g(x-y)$ (translation invariant), you can still conceptualize the prediction step as a convolution with a spatially varying kernel.
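One way to picture the spatially varying case is as a matrix–vector product: the sum $p(x)=\sum_y p(x\mid y)p(y)$ becomes multiplication by a transition matrix whose columns are the (changing) kernels. A sketch, with a made-up state-dependent noise model:

```python
import numpy as np

grid = np.linspace(-5.0, 5.0, 201)
dk = grid[1] - grid[0]

def gauss(u, s):
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Hypothetical state-dependent noise: std grows with |y|, so there is
# no single translation-invariant kernel g(x - y).
def cond(x, y):
    return gauss(x - y, 0.3 + 0.2 * np.abs(y))

T = cond(grid[:, None], grid[None, :])  # T[i, j] = p(x_i | y_j)
p_y = gauss(grid - 1.0, 0.5)            # prior p(y)
p_x = (T @ p_y) * dk                    # prediction: p(x) = sum_j p(x|y_j) p(y_j) dk
```

Each column of `T` is the kernel for one value of $y$; in the translation-invariant case all columns would be shifted copies of each other and the product reduces to an ordinary convolution.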
BTW, at the implementation level you may not be able to integrate over $(-\infty,\infty)$, and $x,y$ may be discretized, both of which make the correspondence to an exact convolution only approximate.
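For example, on a finite grid the implicit zero-padding in `np.convolve` lets probability mass "leak" off the edges, so the predicted density no longer integrates to one (grid and numbers below are illustrative):

```python
import numpy as np

grid = np.linspace(-5.0, 5.0, 201)
dk = grid[1] - grid[0]

def gauss(u, s):
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

kernel = gauss(grid, 1.0)  # centered kernel N(.; 0, 1)

# Prior well inside the grid: the predicted density still integrates to ~1.
p_mid = gauss(grid - 0.0, 0.5)
pred_mid = np.convolve(p_mid, kernel, mode="same") * dk

# Prior near the grid edge: part of the convolved mass falls off the grid.
p_edge = gauss(grid - 4.0, 0.5)
pred_edge = np.convolve(p_edge, kernel, mode="same") * dk

print(np.sum(pred_mid) * dk)   # close to 1
print(np.sum(pred_edge) * dk)  # noticeably below 1
```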