I am reading the MIT Deep Learning textbook's description of convolutional neural networks and they introduce the mathematical operation of convolution as follows:
Suppose we are tracking the location of a spaceship with a laser sensor. Our laser sensor provides a single output x(t), the position of the spaceship at time t. Both x and t are real valued, that is, we can get a different reading from the laser sensor at any instant in time. Now suppose that our laser sensor is somewhat noisy. To obtain a less noisy estimate of the spaceship’s position, we would like to average several measurements. Of course, more recent measurements are more relevant, so we will want this to be a weighted average that gives more weight to recent measurements. We can do this with a weighting function w(a), where a is the age of a measurement. If we apply such a weighted average operation at every moment, we obtain a new function s providing a smoothed estimate of the position of the spaceship: $$s(t)=\int x(a)w(t-a)da$$
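To make the book's integral concrete for myself, here is a minimal discrete sketch of the same idea (not from the book): noisy samples $x[n]$ smoothed by a weighting function $w[a]$ that decays with the age $a$ of a measurement, which is exactly a discrete convolution $s[n]=\sum_a x[a]\,w[n-a]$. The signal and weights are made up for illustration.

```python
import numpy as np

# Discrete analogue of s(t) = \int x(a) w(t-a) da:
# s[n] = sum_a x[a] * w[n-a], which is np.convolve(x, w).
rng = np.random.default_rng(0)
t = np.arange(100)
x = np.sin(0.1 * t) + 0.3 * rng.standard_normal(100)  # noisy position readings

# Weighting function w(a): more weight to recent measurements (small age a).
ages = np.arange(10)
w = np.exp(-0.5 * ages)
w /= w.sum()  # normalize so s is a weighted average of recent readings

# Truncate the full convolution so s[n] is the smoothed estimate at time n.
s = np.convolve(x, w)[: len(x)]
```

(The first few entries of `s` are partial sums, since fewer than `len(w)` past measurements exist there; that edge effect is beside the point of the question.)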
Wouldn't it make way more sense to define the integrand as $x(t-a)w(a)$ in this case, since we want the position function to have some reference of the time value for which we are trying to approximate position? Is this simply a typo in the book, or am I misunderstanding?
The correct way to write the convolution is with a definite integral: $$s(t)=\int_{-\infty}^\infty x(a)w(t-a)\,da$$ Now make the change of variables $b=t-a$, so that $a=t-b$ and $db=-da$; the limits of integration swap order: $$s(t)=\int_{\infty}^{-\infty}x(t-b)w(b)(-db)=\int_{-\infty}^{\infty}x(t-b)w(b)\,db$$ So the two forms are equal: convolution is commutative. The book's version is not a typo; it is the same function with the time shift $t-a$ placed on $w$ instead of on $x$.
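You can also check the commutativity numerically. A quick sketch with arbitrary made-up sequences, using NumPy's discrete convolution:

```python
import numpy as np

# Commutativity of convolution: x * w == w * x.
rng = np.random.default_rng(1)
x = rng.standard_normal(50)  # arbitrary "signal"
w = rng.standard_normal(8)   # arbitrary "weights"

# np.convolve computes sum_a x[a] * w[n-a]; swapping the arguments
# gives the same sequence, mirroring the change of variables above.
assert np.allclose(np.convolve(x, w), np.convolve(w, x))
```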