Why do we use $x - y$ rather than $x + y$ in the definition of the convolution? Is it just convention? (If we are thinking of convolutions as weighted averages, for instance against "good kernels," it should make no difference.)
Why $(f * g) (x) = \int f(y) g(x - y) dy$ rather than $(f * g) (x) = \int f(y) g(x + y) dy$?
Edit: I'm finding it really hard to choose a best answer. There are at least three very good ones here.
Intuitively, and abusing the notation a bit, you can consider the convolution as
$$ (f*g)(x) = \int_{p+q=x} f(p)g(q) $$
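This makes it clear that $f*g = g*f$, since the constraint $p+q=x$ is symmetric in $p$ and $q$. On the other hand, with your alternative definition we would get $$ (f*'g)(x) = \int_{q-p=x} f(p)g(q) $$ and therefore $(f*'g)(x) = (g*'f)(-x)$, because swapping $f$ and $g$ turns the constraint $q-p=x$ into $p-q=x$. The operation is then no longer commutative, which is untidy for no good reason.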
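If it helps to see this concretely, here is a minimal numerical sketch of the same point for the discrete analogue (sums over integers instead of integrals). The function names `conv_minus`, `conv_plus`, and `reflect`, and the sample sequences `f` and `g`, are made up for illustration; finitely supported functions on the integers are stored as dictionaries from index to value.

```python
from collections import defaultdict


def conv_minus(f, g):
    """Discrete (f * g)(x) = sum_y f(y) g(x - y), i.e. a sum over pairs with p + q = x."""
    out = defaultdict(int)
    for p, fp in f.items():
        for q, gq in g.items():
            out[p + q] += fp * gq
    return dict(out)


def conv_plus(f, g):
    """Discrete (f *' g)(x) = sum_y f(y) g(x + y), i.e. a sum over pairs with q - p = x."""
    out = defaultdict(int)
    for p, fp in f.items():
        for q, gq in g.items():
            out[q - p] += fp * gq
    return dict(out)


def reflect(h):
    """Return the function x -> h(-x)."""
    return {-x: v for x, v in h.items()}


# Two arbitrary finitely supported sequences (illustrative values only).
f = {0: 1, 1: 2, 2: 3}
g = {0: 4, 1: 5}

# The x - y convention is commutative.
print(conv_minus(f, g) == conv_minus(g, f))

# The x + y convention is not commutative in general ...
print(conv_plus(f, g) == conv_plus(g, f))

# ... but swapping the arguments reflects the result: (f *' g)(x) = (g *' f)(-x).
print(conv_plus(f, g) == reflect(conv_plus(g, f)))
```

The two functions differ only in the index they accumulate into (`p + q` versus `q - p`), which is exactly the difference between the two constraint sets in the integrals above.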