So it's known that $\sum_n \delta(x-nT) = \frac{1}{T}\sum_m e^{2\pi imx/T}$. This can be proven by expressing the left-hand side as a Fourier series and finding $c_m$. But it's just mind-boggling that this is the case, is it not? Why would one expect this to happen?
If I were to try to plot a crazy number of the right-hand terms, i.e. $\cos(2\pi m x/T) + i \sin(2\pi m x/T)$, would the results actually converge towards something that has value $T$ at all integer multiples of $T$ and zero otherwise? I find it extremely hard to believe. How do you interpret this weird expression? Is there any way to justify it besides the formal proof?
This is not actually true pointwise, although you can gain some basic intuition from the fact that every term in the series is $1$ when $x=kT$. Indeed, for any given $x$, the partial sums do not converge as $n \to \infty$. (Ignoring the factor $1/T$, the $n$th symmetric partial sum is $$ \sum_{m=-n}^{n} e^{2\pi i m x/T} = \frac{\sin{((2n+1)y)}}{\sin{y}}, $$ where $y=\pi x/T$: it is clear that as $n$ increases, this becomes a curve that wiggles between $\pm 1/\sin{y}$ increasingly often, so no convergence occurs, with the possible exception of a few points.)
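To see the non-convergence concretely, here is a small Python sketch (standard library only) that compares the symmetric partial sum with the closed form $\sin((2n+1)y)/\sin y$, $y = \pi x/T$. The function names, the choice $T = 1$, and the sample point $x = 0.3$ are illustrative assumptions, not part of the question.

```python
import cmath
import math

def partial_sum(x, n, T=1.0):
    """Sum of exp(2*pi*i*m*x/T) for m = -n..n (the 1/T prefactor is left out)."""
    return sum(cmath.exp(2j * math.pi * m * x / T) for m in range(-n, n + 1))

def dirichlet_closed_form(x, n, T=1.0):
    """Closed form sin((2n+1)y)/sin(y) with y = pi*x/T; only valid when sin(y) != 0."""
    y = math.pi * x / T
    return math.sin((2 * n + 1) * y) / math.sin(y)

x = 0.3  # assumed sample point, not an integer multiple of T = 1
for n in (5, 50, 500, 501, 502):
    s = partial_sum(x, n)
    # The direct sum is real up to rounding; compare its real part with the closed form.
    print(n, round(s.real, 6), round(dirichlet_closed_form(x, n), 6))
```

Running it for consecutive values of $n$ shows the value at a fixed $x$ jumping around inside the band $\pm 1/\sin y$ instead of settling down, while at $x = 0$ the partial sum is simply $2n+1$.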
I suggest the following: it is easy to show that for any function that decays fast enough for the sum to converge, $$ \sum_{n \in \mathbb{Z}} f(x-nT) = \sum_{k \in \mathbb{Z}} \tilde{f}(k)e^{2\pi i k x/T}, $$ where $\tilde{f}$ is an appropriate definition of the Fourier transform (in particular, in this case $\tilde{f}(k) = \frac{1}{T}\int_{-\infty}^{\infty} f(x)e^{-2\pi i k x/T} \, dx$); this is called the Poisson summation formula (PSF), and is basically equivalent to the formula you have written, with certain caveats. Now choose something whose Fourier transform we understand, like $f(x) = e^{-\pi (x/a)^2}/a$, which has nice decay and Fourier transform $\tilde{f}(k) = e^{-\pi k^2 a^2/T^2}/T$. The PSF then gives $$ \sum_{n \in \mathbb{Z}} \frac{1}{a} e^{-\pi ((x-nT)/a)^2} = \frac{1}{T}\sum_{k \in \mathbb{Z}} e^{-\pi k^2 a^2/T^2} e^{2\pi i kx/T} \tag{1} $$
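The Gaussian identity can be checked numerically. The sketch below (standard library only) evaluates both sides with truncated sums, using the coefficient $e^{-\pi k^2 a^2/T^2}/T$ that follows from the stated transform convention. The names `lhs` and `rhs`, the truncation limits, and the test values of $x$, $a$, $T$ are assumptions made for illustration.

```python
import math

def lhs(x, a, T=1.0, n_max=50):
    """Truncated sum of Gaussians (1/a) * exp(-pi*((x - n*T)/a)^2) over n = -n_max..n_max."""
    return sum(math.exp(-math.pi * ((x - n * T) / a) ** 2) / a
               for n in range(-n_max, n_max + 1))

def rhs(x, a, T=1.0, k_max=200):
    """Truncated Fourier side; the k and -k terms are paired into a real cosine."""
    total = 1.0  # k = 0 term
    for k in range(1, k_max + 1):
        total += 2.0 * math.exp(-math.pi * (k * a / T) ** 2) * math.cos(2.0 * math.pi * k * x / T)
    return total / T

# Assumed sample parameters: T = 2 so that T and T^2 are distinguishable.
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, lhs(x, a=0.3, T=2.0), rhs(x, a=0.3, T=2.0))
```

Taking $T \neq 1$ on purpose makes the check sensitive to whether the exponent carries $T$ or $T^2$; the truncation limits are adequate for the values of $a$ used here, but would need to grow roughly like $T/a$ for smaller $a$.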
What good's this? Well, suppose $a$ is very small. Then the right-hand side looks like your exponential sum with every coefficient close to $1$, because the Gaussian factors $e^{-\pi k^2 a^2/T^2}$ decay relatively slowly in $k$: they only become small once $|k|$ is of order $T/a$. Contrast this with what the left-hand side looks like: you have a sequence of very tall, narrow peaks (height $1/a$, width of order $a$), one at each integer multiple of $T$, each with the same area $1$ for any value of $a$.
Hence, in a sense, the right-hand side of (1) turns into the right-hand side of your formula, and the left-hand side of (1) turns into the left-hand side of your formula if we take $a \to 0$. What is this sense? Distributionally! It means that if we multiply the equation by any well-behaved (continuous will do) function and integrate, then the equality will still hold in passing to the limit (whereas a cursory look at the graphs, as noted above, will show that equality does not hold in any pointwise sense). Cf. the explanation of your formula in the Wikipedia article.
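A rough numerical illustration of "distributionally": pair the truncated exponential sum with a test function and compare against what the delta comb would give, namely $\sum_n \varphi(nT)$. This is only a sketch under assumptions: the test function $\varphi(x) = e^{-x^2}$, the period $T = 1$, the integration window, and the trapezoid step are all choices made here, and the function names are hypothetical.

```python
import math

def phi(x):
    """Assumed test function: smooth and rapidly decaying."""
    return math.exp(-x * x)

def partial_sum(x, N, T=1.0):
    """(1/T) * sum of exp(2*pi*i*m*x/T) for |m| <= N, written in real form."""
    return (1.0 + 2.0 * sum(math.cos(2.0 * math.pi * m * x / T)
                            for m in range(1, N + 1))) / T

def paired(N, T=1.0, L=8.0, h=1e-3):
    """Trapezoid approximation of the integral of phi(x) * partial_sum(x, N) over [-L, L]."""
    steps = int(round(2 * L / h))
    total = 0.0
    for j in range(steps + 1):
        x = -L + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * phi(x) * partial_sum(x, N, T)
    return total * h

def comb_value(T=1.0, n_max=20):
    """What the delta comb gives when paired with phi: the sum of phi(n*T)."""
    return sum(phi(n * T) for n in range(-n_max, n_max + 1))

for N in (0, 1, 2, 5):
    print(N, paired(N), comb_value())
```

Even though the partial sums themselves oscillate ever more wildly at each fixed $x$, the integrals against $\varphi$ settle onto $\sum_n \varphi(nT)$ after only a few terms for this particular $\varphi$.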