I recently came across a question that shook my understanding of convolution. What I don't get is the relation between the continuous convolution integral and its discrete counterpart.
What I learned in school was:
$\int_{a}^{b} f(x) dx = \lim_{\Delta x\to 0} \sum_{i=1}^{n} f(x_i) \cdot \Delta x_i$
Why is the discrete version of the convolution integral for two sequences $f$ and $g$ defined as $(f * g)(n) = \sum_{k} f(k) \cdot g(n-k)$ rather than as some kind of "area under the curve", $\sum_{k} f(k) \cdot g(n-k) \cdot \Delta x_k$?
So why can't I say for a given $t$: $\int_{0}^{t} h(\tau) d\tau = \lim_{\Delta \tau\to 0} \sum_{i=1}^{n} h(\tau_i) \cdot \Delta \tau_i$ with $h(\tau_i) = f(\tau_i) \cdot g(t - \tau_i)$?
In fact, the Convolution Integral between two functions $f(t)$ and $g(t)$ is defined as
$$(f*g)(t)=\int_{-\infty}^\infty f(\tau)g(t-\tau)\,d\tau$$
If both $f(t)$ and $g(t)$ are causal functions (i.e., $f(t)$ is causal if $f(t)=0$ for $t<0$), then
$$\begin{align} (f*g)(t)&=\int_{0}^t f(\tau)g(t-\tau)\,d\tau\\\\ &=\lim_{\max_{i\in[1,n]}(\Delta \tau_i)\to 0}\sum_{i=1}^n f(\tau_i)g(t-\tau_i)\Delta \tau_i\tag1 \end{align}$$
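To see the limit in $(1)$ at work, here is a minimal numerical sketch. The test functions $f(t)=e^{-t}$ and $g(t)=e^{-2t}$ for $t\ge 0$, the uniform partition, the left-endpoint sample points, and the helper name `riemann_convolution` are all my own choices for illustration, not part of the question. For this pair the integral can be done by hand, $(f*g)(t)=e^{-t}-e^{-2t}$, so the sum can be compared against it.

```python
import numpy as np

def riemann_convolution(f, g, t, n):
    """Left-endpoint Riemann sum for the causal convolution integral at time t.

    Uses a uniform partition of [0, t] into n pieces (an assumption made
    for simplicity; equation (1) allows non-uniform partitions too).
    """
    dtau = t / n
    tau = np.arange(n) * dtau              # sample points tau_i in [0, t)
    return float(np.sum(f(tau) * g(t - tau)) * dtau)

# Illustrative causal test functions, only ever evaluated for arguments >= 0.
f = lambda x: np.exp(-x)
g = lambda x: np.exp(-2.0 * x)
exact = lambda t: np.exp(-t) - np.exp(-2.0 * t)   # closed form for this pair

t = 1.5
for n in (10, 100, 1000, 10000):
    approx = riemann_convolution(f, g, t, n)
    print(n, approx, abs(approx - exact(t)))
```

The printed error should shrink as `n` grows, which is exactly the statement $\max_i \Delta\tau_i \to 0$ in $(1)$. Note that the factor `dtau` is essential here: without it the sum grows with `n` instead of converging.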
For the discrete case, we sample $f(t)$ and $g(t)$ at integer values of $t$. So, in $(1)$, we set $t=n$.
Moreover, we define $f(t)$ and $g(t)$ in terms of a train of Dirac Deltas $\delta(t-k)$ with weights $f(k)$ and $g(k)$ so that $f(t)=\sum_{m=1}^n f(m)\delta(t-m)$, and $g(t)=\sum_{\ell=1}^n g(\ell)\delta(t-\ell)$ in the integral in $(1)$. Proceeding, we find that
$$\begin{align} (f*g)(n)&=\int_{0}^n f(\tau)g(n-\tau)\,d\tau\\\\ &=\int_{0}^n \sum_{m=1}^n f(m)\delta(\tau-m)\sum_{\ell=1}^n g(\ell)\delta(n-\tau-\ell)\,d\tau\\\\ &=\sum_{m=1}^n f(m)\sum_{\ell=1}^n g(\ell)\underbrace{\int_0^n \delta(\tau-m)\delta(n-\tau-\ell)\,d\tau}_{=1\,\,\text{for}\,\,\ell=n-m\,\,\text{and otherwise}\,\,=0}\\\\ &=\sum_{m=1}^n f(m)g(n-m) \end{align}$$
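As a sanity check on the final sum, here is a short sketch that evaluates $\sum_m f(m)g(n-m)$ directly and compares it with NumPy's `np.convolve`. The sample values and the helper name `discrete_convolution` are made up for illustration. The sequences are indexed from $0$ here rather than from $1$ as in the derivation, and terms whose index falls outside a sequence are treated as zero.

```python
import numpy as np

def discrete_convolution(f, g):
    """Direct evaluation of (f*g)(n) = sum_k f(k) g(n-k).

    f and g are finite sequences indexed from 0; any term whose index
    n - k falls outside g is skipped, i.e. treated as zero.
    """
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for k in range(len(f)):
            if 0 <= n - k < len(g):
                out[n] += f[k] * g[n - k]
    return out

# Arbitrary sample weights, chosen only for the demonstration.
f_samples = np.array([1.0, 2.0, 3.0])
g_samples = np.array([0.5, -1.0, 4.0, 2.0])

direct = discrete_convolution(f_samples, g_samples)
library = np.convolve(f_samples, g_samples)   # default mode is 'full'

print(direct)
print(np.allclose(direct, library))
```

No $\Delta\tau$ appears anywhere in this computation: the weights are multiplied and summed as they are, which is the point of the derivation above.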