Abstract alphabets by Kolmogorov?


As a linguist I sometimes have a hard time trying to figure out the sense in which mathematical concepts are, or might be, used in my discipline. It is no wonder that we linguists are fascinated by the invention of the alphabet as a seminal milestone in human history. However, mathematics plays around with the concept in a manner I would like to fully grasp. For instance, I find what I regard as a useful (and probably standard) definition in

Johannes A. Buchmann, Introduction to Cryptography, New York: Springer, 2001, p. 73:

“To write texts, we need symbols from an alphabet. By an alphabet we mean a finite nonempty set $\Sigma$. The length of $\Sigma$ is the number of elements in $\Sigma$. The elements of $\Sigma$ are called symbols or letters.”

“Because alphabets are finite sets, their symbols can be identified with nonnegative integers. If an alphabet has length $m$, then its symbols are identified with the numbers in $\mathbb{Z}_m = \{0,1,\ldots, m-1\}$.”

Then he proceeds to elaborate on that in a fashion which will be too trivial for our audience here.
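As a minimal sketch of Buchmann's identification (the function name is my own, not from the book): a finite alphabet of $m$ symbols can be numbered with the elements of $\mathbb{Z}_m = \{0, \ldots, m-1\}$.

```python
def number_symbols(alphabet):
    """Identify each symbol of a finite alphabet with an element of Z_m,
    where m = len(alphabet). Any fixed ordering works; here we sort."""
    return {symbol: i for i, symbol in enumerate(sorted(alphabet))}

# The Latin alphabet has length 26, so its symbols map onto Z_26.
coding = number_symbols("abcdefghijklmnopqrstuvwxyz")
```

Any bijection between the alphabet and $\mathbb{Z}_m$ would do; sorting just makes the choice deterministic.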

The fact is that, while reading a popular science book in Spanish about Kolmogorov, in the chapter about Kolmogorov complexity, after introducing all the typical concepts (entropy, mutual information and the like), they mentioned that Kolmogorov and his collaborators extended their ideas about finite alphabets (for a linguist, the feature 'finite' seems essential to the definition of an alphabet) to what they called infinite alphabets. I have been searching the web, in English, German and Spanish, but did not find any insightful or useful comment in this regard. Could someone explain to me what an infinite alphabet is and, specifically, how Kolmogorov complexity, epsilon-entropy and so on ought to be understood for infinite alphabets?

Many thanks in advance.


There are 3 best solutions below

On BEST ANSWER

Well, I guess Shannon and Kolmogorov first started by defining the capacity or entropy of binary channels (i.e. messages over the alphabet $\{0,1\}$ with some Bernoulli measure), but, as always, mathematics later took over and generalized this as much as possible while keeping the important properties and results about entropy.

Nowadays Kolmogorov(-Sinai) entropy is a very important tool in dynamical systems and ergodic theory, and the Gauss map is just one example of a dynamical system with many applications, also to number theory. The point is that it has to be treated using a countable partition (simply because the map has countably many surjective branches), and this can be done with the generalized theory of Kolmogorov entropy for countable alphabets.

On

First off, you should be careful not to attach any deep importance to the word "alphabet" here. From a linguistic point of view, the essential (and once revolutionary) feature of an alphabet is as far as I know that its symbols denote sounds rather than discrete words or concepts. This angle is, of course, completely absent in the mathematical use of the word. We've simply co-opted the word "alphabet" to stand for whichever inventory of symbols the strings in our mathematical theory are put together from. If the mathematicians who first formulated the theory had spoken a language written with a different script, the same concept might well have ended up being called a "syllabary", "script" or something like that.

A practical alphabet with symbols that can be reliably and quickly distinguished from each other by a person or a machine can only have finitely many different symbols. (Turing quipped that Chinese "attempts to have a denumerable infinity of symbols", but I don't think he spoke much Chinese).

Thus, most of the theory of formal languages (and its neighboring areas of computability, complexity, automata theory, and so forth) assume by default that the alphabet is a finite set.

It turns out after the fact that some parts of the theory don't use the assumption that the alphabet is finite, so there are some -- but not all! -- of its theorems that are still true even if $\Sigma$ happens to be an infinite set. This subset of the theory can sometimes be technically useful in that we can reason about sequences of, say, numbers as if they were sequences of characters, as long as we take care to keep within the part of the theory that works for infinite alphabets. That doesn't mean that one is claiming that an arbitrary number is actually an indivisibly write-down-able symbol -- just that it can be a useful analogy and intuition to imagine that it is, for some purposes.
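As a hypothetical illustration of the kind of result that survives: an operation like "longest common prefix" never uses the finiteness of $\Sigma$, only equality tests on symbols, so it works just as well for strings over the infinite alphabet $\mathbb{Z}$ as for ordinary character strings.

```python
def longest_common_prefix(u, v):
    # Only equality of symbols is used here, so the "alphabet" may be
    # any set whatsoever -- e.g. the integers. Finiteness of Sigma
    # plays no role in this part of the theory.
    n = 0
    for a, b in zip(u, v):
        if a != b:
            break
        n += 1
    return u[:n]
```

The same function handles a string of characters and a "string" of integers without modification.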

That's really all there is to it: A theory made to deal with finite sets as alphabets turns out to be partially useful when the set in question is infinite, and it is useful for those purposes to keep explicitly track of which parts of the theory still work then.

Now what does this have to do with Kolmogorov complexity in particular? To the best of my knowledge (and I'm a computer scientist with a fairly theoretical slant, so I ought to know), nothing at all. Kolmogorov complexity is one of those areas of the theory that don't keep working if we use an infinite alphabet -- it depends fundamentally on the finiteness of the alphabet. Allowing an infinite alphabet there would either (depending on just how we slice it) collapse everything into having a uniformly small complexity measure, or consider almost everything to have infinite complexity, which isn't very interesting either.
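A sketch of why the collapse happens (the helper names are mine; the encoding is Cantor's classical pairing function): over the infinite alphabet $\mathbb{N}$, any finite string of naturals can be packed bijectively into a single symbol, so every string admits a "description of length one" and the length-based complexity measure becomes uniformly trivial.

```python
import math

def pair(a, b):
    # Cantor pairing function: a bijection N x N -> N.
    return (a + b) * (a + b + 1) // 2 + b

def unpair(z):
    # Inverse of the Cantor pairing function.
    w = (math.isqrt(8 * z + 1) - 1) // 2
    b = z - w * (w + 1) // 2
    return w - b, b

def encode(seq):
    # Pack an entire finite sequence of naturals into ONE natural number:
    # over the alphabet N, every string fits into a single "letter".
    code = 0
    for x in seq:
        code = pair(code, x)
    return pair(len(seq), code)

def decode(z):
    # Recover the sequence, showing the encoding loses nothing.
    n, code = unpair(z)
    out = []
    for _ in range(n):
        code, x = unpair(code)
        out.append(x)
    return list(reversed(out))
```

Since `encode` is injective, a machine allowed to read single symbols from $\mathbb{N}$ can receive any finite object in one symbol, which is exactly the degenerate situation the answer describes.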

(Note that Kolmogorov was quite a polymath; he worked on a lot of things that are not specifically connected to Kolmogorov complexity).

On

Kolmogorov information and entropy do indeed work for countable partitions of the phase space, which can be thought of as a countable alphabet. (This is the only infinite cardinality that needs to be considered for finite measures.)

One standard example is the Gauss map, which has countably many branches on the interval $[0,1]$. If you code the interval of each of those branches by a symbol, you naturally need countably many of them (usually one uses the natural numbers $1,2,3,4,5,\ldots$ as the alphabet in this case).
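A minimal numerical sketch of this coding (floating point, so only the first several symbols are reliable): the branch containing $x$ is indexed by $\lfloor 1/x \rfloor$, so iterating the Gauss map $T(x) = 1/x \bmod 1$ and recording the branch index at each step emits exactly the continued-fraction digits of $x$, a string over the countable alphabet $\{1, 2, 3, \ldots\}$.

```python
import math

def gauss_symbols(x, n):
    # Iterate the Gauss map T(x) = 1/x mod 1 and record, at each step,
    # the branch index floor(1/x): a symbol from the countable
    # alphabet {1, 2, 3, ...}.
    symbols = []
    for _ in range(n):
        if x == 0:
            break  # the orbit of a rational number eventually hits 0
        a = math.floor(1 / x)
        symbols.append(a)
        x = 1 / x - a
    return symbols
```

For instance, $(\sqrt{5}-1)/2$ is a fixed point of the map, so its symbol string is $1,1,1,\ldots$, matching the continued fraction of the golden ratio.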

The question now is to find an invariant measure for this map (the Gauss measure); then we can use Kolmogorov's formalism for countable alphabets (or partitions) to compute the entropy of the Gauss map with respect to this measure.
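As a hedged numerical sketch of what this computation yields: by Rokhlin's formula the entropy is $h_\mu(T) = \int_0^1 \log|T'(x)|\, d\mu(x)$, with $|T'(x)| = 1/x^2$ and the Gauss measure $d\mu = \frac{dx}{(1+x)\ln 2}$, which evaluates to $\pi^2/(6 \ln 2) \approx 2.373$ nats. A crude midpoint-rule check (the function name is mine):

```python
import math

def gauss_entropy_estimate(n=100_000):
    # Midpoint rule for h = integral over [0,1] of
    #   2 * (-ln x) / ((1 + x) * ln 2) dx
    # (Rokhlin's formula against the Gauss measure);
    # the exact value is pi^2 / (6 ln 2).
    total = 0.0
    for k in range(n):
        x = (k + 0.5) / n
        total += -2.0 * math.log(x) / ((1.0 + x) * math.log(2))
    return total / n
```

The integrand has an integrable logarithmic singularity at $0$, but the midpoint rule still lands within about $10^{-5}$ of $\pi^2/(6\ln 2)$ at this resolution.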

As a reference for the Gauss map look at: http://www.maths.manchester.ac.uk/~cwalkden/magic/lecture07.pdf or the book by Brin and Stuck on dynamical systems.

For the general framework, take a look at the book by Petersen on Ergodic Theory (pages 234-240) or the then-unpublished book by Einsiedler, Lindenstrauss and Ward: http://www.math.washington.edu/~solomyak/TEACH/582/11/EiLiWa_draft.pdf