> The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible, etc.
Here, redundancy is defined as $1-R$, where $R$ is the relative entropy: the ratio of the entropy of a source to the maximum entropy possible with the same number of symbols. Recall that entropy is defined as
$$H = - \sum_i p_i\log p_i,$$
where the sum is over all $n$ symbols in the alphabet and $p_i$ is the probability of character $i$ occurring. Further, $R$ is defined as
$$ R = \frac{H}{\log n}.$$
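As a sanity check on these definitions, here is a small self-contained computation. The probabilities are made-up values for a toy 4-symbol alphabet, not measured letter frequencies of any real language:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum_i p_i log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy skewed distribution over a 4-symbol alphabet (illustrative only).
probs = [0.5, 0.25, 0.125, 0.125]

H = entropy(probs)              # entropy of the source
H_max = math.log2(len(probs))   # maximum entropy: uniform over n symbols
R = H / H_max                   # relative entropy
redundancy = 1 - R

print(H, H_max, R, redundancy)  # 1.75 2.0 0.875 0.125
```

So this source uses only $87.5\%$ of the capacity of its alphabet, giving a redundancy of $12.5\%$.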
I understand the first part of his comment: it's clear that if the redundancy is $0$, then any 2D grid is a valid crossword, since all character combinations are valid. However, what is the explanation behind the last two parts: that large crossword puzzles are just possible when the redundancy is 50%, and that three-dimensional crossword puzzles should be possible when the redundancy is 33%?
I'd expect the way that "large" is defined to be flexible here, but I assume it means that the words are nontrivially connected.
I'm looking for insight into how to define "large" in this case, and into why the required reduction in redundancy should grow as the dimension increases (in particular the $1/d$ scaling rule he suggests).
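For context on where the $1/d$ might come from, here is a rough first-moment count I can imagine, treating the line constraints as independent (which I take to be the "chaotic and random nature" assumption); I don't know whether this is Shannon's intended "more detailed analysis". Write $D = 1 - R$ for the redundancy. A $d$-dimensional array of side $N$ has $n^{N^d}$ possible fillings, and there are roughly $n^{RN}$ valid texts of length $N$, so a uniformly random line of $N$ letters is valid text with probability about

$$\frac{n^{RN}}{n^{N}} = n^{-DN}.$$

The array contains $dN^{d-1}$ lines (one per row in each of the $d$ axis directions), so the expected number of valid fillings is about

$$n^{N^d} \cdot \left(n^{-DN}\right)^{dN^{d-1}} = n^{N^d(1 - dD)},$$

which tends to infinity when $D < 1/d$ and to zero when $D > 1/d$: the threshold is redundancy $50\%$ for $d = 2$ and $33\%$ for $d = 3$. If this is the right heuristic, I'd still like to understand how (or whether) it can be made rigorous, and what notion of "large" it certifies.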
References
[1] Shannon, Claude Elwood. "A mathematical theory of communication." Bell System Technical Journal 27 (1948): 379–423, 623–656. Reprinted in ACM SIGMOBILE Mobile Computing and Communications Review 5.1 (2001): 3–55. http://signallake.com/innovation/shannon1948.pdf