The "Mutual Information" YouTube channel claims in the video "The Most Important (and Surprising) Result from Information Theory" (at 4:18), and confirms in a comment, that Shannon proved not only that the bit error probability can be made arbitrarily close to 0 at any rate not exceeding the channel's capacity, but also that beyond capacity the error grows linearly with the excess rate. Is that true? I've seen graphs suggesting the contrary.
If it is linear, is there an intuitive argument for it?
Discrete memoryless channels (indeed, even finite-memory channels) were shown to admit "strong converses" by Wolfowitz back in the late 1950s. These results take the following form: call a code an $(n,M,\varepsilon)$ code if it transmits one of $M$ messages over $n$ uses of the channel with probability of error at most $\varepsilon$, and let $C$ be the capacity of the channel. Wolfowitz shows (for DMCs, say) that for any $\varepsilon < 1$, if the probability of incorrect block decoding is $\le \varepsilon$, then $M < 2^{nC + K\sqrt{n}}$ for some constant $K$. Note that the rate of the code in this notation is $\log M / n$.
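As a small numeric sketch, Wolfowitz's bound can be read as a ceiling on rate: $\log_2 M / n < C + K/\sqrt{n}$, which collapses to $C$ as $n$ grows. The crossover probability $p$ and the constant $K$ below are purely illustrative choices, not values derived from Wolfowitz's proof:

```python
from math import log2, sqrt

# Binary symmetric channel with crossover probability p (illustrative choice).
p = 0.11
C = 1 - (-p * log2(p) - (1 - p) * log2(1 - p))  # capacity C = 1 - h(p)

K = 2.0  # illustrative constant; the actual K depends on the channel
for n in (100, 10_000, 1_000_000):
    # rate ceiling log2(M)/n < C + K/sqrt(n), from M < 2^(nC + K*sqrt(n))
    ceiling = C + K / sqrt(n)
    print(f"n={n:>9}  rate ceiling < {ceiling:.4f}  (capacity = {C:.4f})")
```

The point of the $\sqrt{n}$ slack is that it vanishes on a per-channel-use basis, so any rate strictly above $C$ eventually violates the bound.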
In other words, as $n \to \infty$ (large blocklengths, which we need anyway to communicate at rates close to capacity), if the rate of the codebook exceeds capacity by any constant, the error probability must go all the way up to $1$. In fact even more is known: at finite blocklength, to communicate reliably we must undershoot capacity, in the sense that the largest achievable $M$ behaves like $2^{nC - \sqrt{nV}\,Q^{-1}(\varepsilon) + O(\log n)},$ where $V$ is called the channel dispersion, and $Q(x) = \int_x^\infty e^{-u^2/2}\,\mathrm{d}u/\sqrt{2\pi}$ is the Gaussian upper tail (so $Q^{-1}(\varepsilon) \sim \sqrt{2\log(1/\varepsilon)}$ as $\varepsilon \to 0$).
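To make the normal approximation $\log_2 M / n \approx C - \sqrt{V/n}\,Q^{-1}(\varepsilon)$ concrete, here is a hedged sketch for a binary symmetric channel, using the standard expressions $C = 1 - h(p)$ and $V = p(1-p)\bigl(\log_2\frac{1-p}{p}\bigr)^2$; the choices $p = 0.11$ and $\varepsilon = 10^{-3}$ are illustrative:

```python
from math import log2, sqrt
from statistics import NormalDist

def bsc_capacity(p):
    """Capacity of a BSC in bits per channel use: C = 1 - h(p)."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)  # binary entropy
    return 1 - h

def bsc_dispersion(p):
    """Channel dispersion of a BSC: V = p(1-p) * (log2((1-p)/p))^2."""
    return p * (1 - p) * (log2((1 - p) / p)) ** 2

def approx_max_rate(n, eps, p):
    """Normal approximation to the best rate: C - sqrt(V/n) * Qinv(eps)."""
    qinv = NormalDist().inv_cdf(1 - eps)  # Q^{-1}(eps), Q = Gaussian upper tail
    return bsc_capacity(p) - sqrt(bsc_dispersion(p) / n) * qinv

p, eps = 0.11, 1e-3
print(f"capacity = {bsc_capacity(p):.4f} bits/use")
for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}  best rate ~ {approx_max_rate(n, eps, p):.4f}")
```

The backoff term $\sqrt{V/n}\,Q^{-1}(\varepsilon)$ shrinks like $1/\sqrt{n}$, so the achievable rate approaches capacity from below, but at short blocklengths the gap is substantial.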