In the Wikipedia article, the convolution is defined as $$(g*f)(x,y):=\sum_{dx=-a}^a\sum_{dy=-b}^b\omega(dx,dy)f(x-dx,y-dy).\tag1$$ The image $f$ is obviously understood as a function on $[-a,a]\times[-b,b]$. Now, regardless of what we insert for $x$ and $y$ in $(1)$, the sum in $(1)$ will always contain terms where $f$ is evaluated outside $[-a,a]\times[-b,b]$.
So, my question is: How can this be fixed? How is $g\ast f$ defined in practice?
There is an edge handling section, but what I don't get is that the problem I've described above does not only occur at the edges. It occurs regardless of what we insert for $x$ and $y$. Maybe (please let me know if I'm right) this is because one usually assumes that the support of $\omega$ is very small ... Because otherwise, something like "wrapping around" the values of $f$ should yield very weird results.
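That equation doesn't describe the convolution of two images ($g * f$); it describes the convolution of a kernel $\omega$ with an image $f$, yielding a different image ($g = \omega * f$). Notice that you miscopied the left-hand side a bit. The sums run over the support of the kernel, $[-a,a]\times[-b,b]$, not over the domain of the image, which is normally much larger. So the sum only reaches outside the image when $(x,y)$ lies within $a$ (or $b$) pixels of the border.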
And yes, kernels typically have small support, often having only a $3\times 3$ non-zero block, and rarely larger than $5\times 5$.
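To make the small-support point concrete, here is a minimal NumPy sketch of my own (not taken from the Wikipedia article) that evaluates the sum directly. The function name `convolve_kernel` and its `mode` parameter are hypothetical; the edge handling is simply delegated to `np.pad`, and the test image and averaging kernel are made up for illustration.

```python
import numpy as np

def convolve_kernel(f, omega, mode="constant"):
    """Evaluate g(x, y) = sum_{dx, dy} omega(dx, dy) * f(x - dx, y - dy).

    f     : 2-D image of shape (H, W)
    omega : 2-D kernel of shape (2a+1, 2b+1); omega[a + dx, b + dy] stores omega(dx, dy)
    mode  : np.pad mode used for edge handling
            ("constant" = zero-padding, "wrap" = periodic, "edge" = extend)
    """
    a, b = omega.shape[0] // 2, omega.shape[1] // 2
    H, W = f.shape
    # Pad by the kernel half-widths so that f(x - dx, y - dy) is always defined.
    padded = np.pad(f, ((a, a), (b, b)), mode=mode)
    g = np.zeros((H, W), dtype=float)
    for dx in range(-a, a + 1):
        for dy in range(-b, b + 1):
            # f(x - dx, y - dy) lives at padded[x - dx + a, y - dy + b].
            shifted = padded[a - dx : a - dx + H, b - dy : b - dy + W]
            g += omega[a + dx, b + dy] * shifted
    return g

# Illustrative data: a 6x6 image and a 3x3 averaging kernel (a = b = 1).
f = np.arange(36, dtype=float).reshape(6, 6)
omega = np.full((3, 3), 1 / 9)

g_zero = convolve_kernel(f, omega, mode="constant")
g_wrap = convolve_kernel(f, omega, mode="wrap")

# Away from the border the edge-handling choice is irrelevant.
print(np.allclose(g_zero[1:-1, 1:-1], g_wrap[1:-1, 1:-1]))  # True
```

With a $3\times 3$ kernel only the outermost ring of pixels depends on how the image is extended, which is why edge handling is treated as a border issue rather than a problem with the definition.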
You can convolve two images, using periodicity or zero-padding as you suggest, but the result gets rather "information theoretical" and doesn't look anything like either of the original images, at least to the average human eye.
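If you want to try this yourself, the following is a rough sketch of both options using the convolution theorem and NumPy's FFT routines. The function names `circular_convolve` and `linear_convolve` are my own and hypothetical, and the random arrays merely stand in for real images.

```python
import numpy as np

def circular_convolve(f, h):
    """Periodic ("wrap-around") convolution of two images of equal shape."""
    if f.shape != h.shape:
        raise ValueError("periodic convolution needs images of equal shape")
    return np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)))

def linear_convolve(f, h):
    """Zero-padded convolution; the output has shape (H1 + H2 - 1, W1 + W2 - 1)."""
    shape = (f.shape[0] + h.shape[0] - 1, f.shape[1] + h.shape[1] - 1)
    # fft2 with s=shape zero-pads both inputs to the full output size.
    return np.real(np.fft.ifft2(np.fft.fft2(f, s=shape) * np.fft.fft2(h, s=shape)))

# Stand-in "images" (random data, for illustration only).
rng = np.random.default_rng(0)
img1 = rng.random((4, 5))
img2 = rng.random((4, 5))

periodic = circular_convolve(img1, img2)   # same shape as the inputs
padded = linear_convolve(img1, img2)       # shape (7, 9)
```

In both cases every output pixel mixes contributions from the whole of both inputs, which is consistent with the result not resembling either image.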