I was reading Yoshua Bengio's [book][1] on convolutional neural networks, and it has a small section that explains convolution in the context of estimating the location of a spaceship with a laser and removing noise from the measurements by taking a weighted average. However, I am not sure I understood it fully.
Suppose we are tracking the location of a spaceship with a laser sensor. The sensor provides a single output $x(t)$, the position of the spaceship at time $t$, which might be noisy. To reduce the noise we take an average of the positions we read, but a weighted average with weights $w(a)$ (because, for example, we want to give more weight to recent measurements, since they might be more relevant). We therefore build a new function $s(t)$ that provides a smoothed estimate of the position of the spaceship:
$$ s(t) = \int x(a) w(t-a) da$$
I know that this operation is called convolution (denoted $s(t) = (x * w)(t)$), but it doesn't really make sense to me in this context of a weighted average, and I wanted to confirm/check my understanding (also, $w$ is zero for values in the future).
The main issue I have is that the indices of $x(a)$ and $w(t-a)$ do not match, which is not intuitive to me. If we are taking a weighted average and want a smooth estimate of $x(t)$ at time $t$, why wouldn't we compute:
$$ s(t) = \int^t_{-\infty} x(a) w(a) da $$
which to me intuitively means: take a weighted average from the beginning of time up to the current time $t$. But that is not what we do; instead we use some other funny integral (the convolution) that makes little sense to me.
I did try to understand it better by considering the individual infinitesimal terms of the integral, but the indices still didn't make sense to me.
For example:
Consider the weight of the most recent measurement in the integral, at $a = t$. At this time we have $w(t-a) = w(0)$ and $x(a) = x(t)$, so the term that contributes to our infinitesimal summation is:
$$ x(a) w(t-a) = x(t) w(0)$$
but the indices $t$ and $0$ don't really match, which confused me. Does someone understand this better, and why the convolution is the appropriate thing to use? What is the meaning of $w(0)$? I am pretty sure $w(a_-)$ (where $a_-$ denotes negative values) is simply defined to be zero, to make sure the summation over the future is zero.
Is it that $w(0)$ is the weight of the most recent measurement, no matter what the time is? That way it is weighted the same for every value of $t$ we might be interested in?
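To make the index matching concrete, here is a small discrete sketch of the sum $s[t] = \sum_a x[a]\,w[t-a]$ (the values of `x` and `w` are made up for illustration; none of this code is from the book):

```python
# Discrete analogue of s(t) = sum_a x[a] * w[t - a].
# w[0] weights the most recent sample, w[1] the one before it, etc.
x = [1.0, 2.0, 4.0, 8.0]   # measurements x[0..3] (made-up values)
w = [0.5, 0.3, 0.2]        # w[0] = weight of the newest sample

def smooth(x, w, t):
    """Weighted sum of the samples up to time t: sum_a x[a] * w[t - a]."""
    return sum(x[a] * w[t - a] for a in range(t + 1) if t - a < len(w))

# At t = 3 the term a = 3 contributes x[3] * w[0]; at t = 2 the term
# a = 2 contributes x[2] * w[0]. Whatever t is, w[0] always multiplies
# the newest measurement, w[1] the previous one, and so on.
s3 = smooth(x, w, 3)   # = 8*0.5 + 4*0.3 + 2*0.2 = 5.6
```

Restricting `t - a` to `range(len(w))` is the discrete version of defining $w$ to be zero for negative arguments (the future) and, here, beyond the window length.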
Maybe you got it already, but in case it helps, here is what I think:
Note: All the plots shown were made with the great demo on this website.
The main advantage of using convolution as a way of averaging a function $x(t)$ with a weight function $w(t)$ is that you can "select" the time at which you want $w(t)$ to take action.
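Concretely (a standard change of variables, not part of the original answer), substituting $b = t - a$ makes this explicit:

$$ s(t) = \int x(a)\, w(t-a)\, da = \int x(t-b)\, w(b)\, db, $$

so $w(0)$ always multiplies $x(t)$, the current measurement, $w(1)$ multiplies $x(t-1)$, the measurement one unit of time ago, and so on: the argument of $w$ is the *age* of the measurement, which is why the same weight profile applies at every $t$.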
For example, suppose that $w(t)$ and $x(t)$ are the following time-dependent functions:
In this case, if we use this averaging (or more precisely, a scaled weighted average of $x(t)$, since we are not dividing the result by $\int w(t)\,dt$):
$$s(t)=\int_{-\infty}^tx(a)\,w(a)\,da$$
we would only apply $w(t)$ in the region $t\in[0,1]$, because at any other time $w(t)=0$.
But if we make use of convolution:
$$s(t)=\int_{-\infty}^tx(a)\,w(t-a)\,da$$
we can get this scaled (up to the factor $\int w(t)\,dt$) weighted average at any time $t$ we want. I believe this is why the author states that we can "weight more recent measurements". Following the example of the previous image, this is what we get for the convolution:
Note: I recommend visiting the website mentioned above and playing with the demo. It helps a lot to visualize what I tried to explain.
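The "slide the weight window to any time $t$" idea can also be checked numerically. Below is a minimal sketch (not from the book or the demo site; the signal, noise level, and window length are all made-up choices) using `np.convolve`, which computes exactly $s[n] = \sum_a x[a]\,w[n-a]$:

```python
import numpy as np

# Noisy positions x(t) and a short uniform weight window w.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
x = np.sin(t) + 0.3 * rng.standard_normal(t.size)  # noisy measurements
w = np.ones(10) / 10                               # weights summing to 1

# np.convolve computes s[n] = sum_a x[a] * w[n - a]: the same weight
# window is "slid" to every time n, so the smoothing acts everywhere.
s = np.convolve(x, w, mode="same")

# By contrast, the un-shifted integral of x(a) * w(a) would use w only
# where w itself is nonzero, so it could not smooth at later times.
```

Because `w` sums to 1, `s` is a genuine weighted average at each `n` (away from the edges), and the noise standard deviation drops by roughly $\sqrt{10}$.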