In Cover & Thomas's "Elements of Information Theory", they prove the channel coding theorem (p.200). They do so by generating "random codes". This randomization is never actually used in practice when people write a channel code. Rather it is merely a theoretical method of constructing a proof.
I find this rather counterintuitive, and I would like to get a better intuition for using randomization/probabilistic methods to solve problems where the randomization isn't inherently part of the problem itself.
Is there a textbook, with exercises, that showcases many such problems and methods?
This is an extended comment, not an answer.
The claim "randomization is never actually used in practice when people write a channel code" is not entirely correct. At least in the (very) large-blocklength regime, one valid approach to finding good channel codes is computer search: have the computer go through randomly generated codes (typically described by a generator or a parity-check matrix), and it will eventually identify a good one. Of course, this search is not completely random but ranges over a somewhat restricted search space, e.g., over parity-check matrices of a certain density (sparsity). This method is explicitly identified as a valid code-construction approach in the standard book on error control coding by Lin & Costello (in the chapter on LDPC codes).

Of course, one would probably attempt to improve such a code by making (slight) sophisticated modifications, but this does not take away from the fact that the code is essentially randomly generated. I believe that some of the best LDPC codes reported in the literature were obtained by a (pseudo-)random approach. However, these codes usually lack other desirable properties, such as efficient encoder implementations.
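To make the "restricted random search" idea concrete, here is a minimal sketch (my own illustration, not a construction from Lin & Costello) of sampling a sparse binary parity-check matrix with a fixed number of ones per column. Real LDPC designs also constrain row weights and short cycles (girth), which this toy ensemble ignores.

```python
import numpy as np

def random_sparse_parity_check(m, n, col_weight, seed=None):
    """Sample a random m x n binary parity-check matrix in which each
    column has exactly `col_weight` ones (a crude sparse ensemble).
    One would then evaluate the resulting code and keep the best draws."""
    rng = np.random.default_rng(seed)
    H = np.zeros((m, n), dtype=np.uint8)
    for j in range(n):
        rows = rng.choice(m, size=col_weight, replace=False)
        H[rows, j] = 1
    return H

H = random_sparse_parity_check(m=6, n=12, col_weight=3, seed=0)
print(H.shape)         # (6, 12)
print(H.sum(axis=0))   # every column has weight 3
```

A code search would draw many such matrices, estimate each code's error performance (e.g., by simulation), and keep the best candidate.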
What is not used in practice is decoding based on joint typicality, as assumed in the proof of Cover & Thomas. Practical decoders operate under the maximum likelihood principle (or an approximation of it, when exact computation is computationally infeasible).
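For intuition, here is a brute-force maximum-likelihood decoder for a binary symmetric channel: ML decoding then reduces to picking the codeword at minimum Hamming distance from the received word. The exhaustive enumeration is exponential in the code dimension, which is exactly why practical decoders only approximate ML. (The [7,4] Hamming code below is just a small illustrative example.)

```python
import itertools
import numpy as np

def ml_decode_bsc(G, y):
    """Brute-force ML decoding over a BSC: enumerate all 2^k codewords
    of the linear code with generator matrix G and return the one at
    minimum Hamming distance from the received word y."""
    k, n = G.shape
    best, best_dist = None, n + 1
    for msg in itertools.product([0, 1], repeat=k):
        c = np.mod(np.array(msg) @ G, 2)     # encode the message
        d = int(np.sum(c != y))              # Hamming distance to y
        if d < best_dist:
            best, best_dist = c, d
    return best

# Generator matrix of the [7,4] Hamming code in systematic form
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
y = np.array([1, 0, 0, 0, 1, 1, 1])  # codeword 1000110 with last bit flipped
print(ml_decode_bsc(G, y))           # -> [1 0 0 0 1 1 0]
```

Since the Hamming code has minimum distance 3, the single bit flip is corrected and the original codeword is recovered.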