I am aware of at least two types of mathematical information:
Shannon information, which is the negative of entropy (i.e. a loss of entropy of $n$ bits is precisely a gain of $n$ bits of Shannon information). One also has the mutual and conditional information of two random variables, and information theory is named for this type.
Fisher information. To the best of my limited knowledge, this type is very commonly used in statistics. It is also the source of the name for information geometry, since the Fisher information metric serves as its Riemannian metric.
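For concreteness, here is how I understand both quantities for a single Bernoulli($p$) observation; this is just a minimal Python sketch of the textbook formulas (the function names are my own):

```python
import math

def shannon_entropy(probs):
    # Shannon entropy in bits, with the convention 0 * log(0) = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

def fisher_info_bernoulli(p):
    # Fisher information of one Bernoulli(p) observation: 1 / (p(1-p))
    return 1.0 / (p * (1.0 - p))

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit: a fair coin flip
print(fisher_info_bernoulli(0.5))    # 4.0, the minimum over p
```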
My question: Are there any other mathematical concepts of information which I am missing? How many types of mathematical information exist?
Certainly the two you mentioned are important, and perhaps the most widely known.
For categorical data there is Sørensen's similarity index. (Roughly, the probability that a randomly selected individual in my population belongs to the same category as I do.) See the Wikipedia article on categorical data for several more.

In a sense, random data have less structure, or 'information', than data with some nonrandom structure. Runs tests and autocorrelation functions are ways of judging randomness.
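A toy sketch of both ideas in Python (my own illustrative code and function names, not any standard library's API):

```python
import statistics

def sorensen_dice(a, b):
    # Sorensen-Dice index for two sets: 2|A intersect B| / (|A| + |B|)
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

def runs_test(xs):
    # Wald-Wolfowitz runs test above/below the median: too few runs
    # suggest clustering, too many suggest forced alternation
    med = statistics.median(xs)
    signs = [x > med for x in xs if x != med]  # drop ties with the median
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n1 = sum(signs)
    n2 = len(signs) - n1
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1                   # expected runs if random
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return runs, (runs - mu) / var ** 0.5      # observed runs and z-score

print(sorensen_dice({1, 2, 3}, {2, 3, 4}))  # 2*2/6, about 0.667
print(runs_test([0, 1] * 10))               # perfectly alternating: large z
```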
In his famous assertion that it takes about seven shuffles to put a deck of cards into something like random order, Persi Diaconis used the concept of rising sequences as a measure of information that might be exploited by a player. (Rising sequences have been used by magicians doing card tricks for over a century.) Google 'Diaconis shuffle' to find a NYT article on this topic and the paper by Bayer and Diaconis with wonderful mathematical details.

Digression: I will leave the definition of 'rising sequence' to Bayer and Diaconis. A deck in order from 1 through 52 has one rising sequence. One shuffle (cut and riffle) results in two rising sequences, two shuffles usually in four. A randomly permuted deck averages 26.5 rising sequences. The figure below, based on simulations, shows how the distribution of the number of rising sequences comes closer to the distribution for a random deck (lower right) as the number of shuffles increases.
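Counting rising sequences, and simulating the Gilbert-Shannon-Reeds model of a riffle shuffle, takes only a few lines; a rough sketch of the simulations behind the figure might look like:

```python
import random

def rising_sequences(deck):
    # pos[v] = position of card value v in the deck
    pos = {card: i for i, card in enumerate(deck)}
    # card v+1 starts a new rising sequence when it lies before card v
    return 1 + sum(1 for v in range(1, len(deck)) if pos[v + 1] < pos[v])

def riffle(deck, rng):
    # Gilbert-Shannon-Reeds shuffle: binomial cut, then drop cards with
    # probability proportional to the sizes of the remaining packets
    cut = sum(rng.random() < 0.5 for _ in range(len(deck)))
    left, right = deck[:cut], deck[cut:]
    out = []
    while left or right:
        if rng.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out

rng = random.Random(0)
deck = list(range(1, 53))
print(rising_sequences(deck))               # 1 for a sorted deck
print(rising_sequences(riffle(deck, rng)))  # at most 2 after one riffle
# a randomly permuted deck averages about 26.5 rising sequences
print(sum(rising_sequences(rng.sample(deck, 52)) for _ in range(2000)) / 2000)
```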
Code makers and breakers have several measures of the information content of a string of characters. Some of these can be found in readily available books and papers on cryptology. I would not be surprised if some of the most useful ones were discovered by mathematicians at government agencies and are not publicly available.
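One such measure that is publicly documented is Friedman's index of coincidence: the probability that two letters drawn at random (without replacement) from a string match. A short sketch:

```python
from collections import Counter

def index_of_coincidence(text):
    # probability that two letters drawn without replacement match;
    # roughly 0.066 for English text, 1/26 (about 0.038) for uniform noise
    letters = [c for c in text.upper() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

print(index_of_coincidence("attack at dawn"))
```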
There is a sense in which a Markov Chain can contain more information than a sequence of independent random variables. The whole idea of a Markov Chain is that the current observation may be useful in predicting the next one.
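A toy illustration with made-up parameters: in a 'sticky' two-state chain, predicting that the next observation equals the current one beats any guess based on the marginal distribution alone.

```python
import random

def simulate_chain(p_stay, steps, seed=0):
    # two-state chain that stays in its current state with probability p_stay
    rng = random.Random(seed)
    state, states = 0, [0]
    for _ in range(steps):
        if rng.random() > p_stay:
            state = 1 - state
        states.append(state)
    return states

states = simulate_chain(0.9, 100_000)
# predict that the next observation equals the current one
acc = sum(a == b for a, b in zip(states, states[1:])) / (len(states) - 1)
print(acc)  # about 0.9, versus 0.5 from the marginal distribution alone
```

Here the marginal distribution is (0.5, 0.5) by symmetry, so knowing the current state carries genuine extra predictive information.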
This is a list of a few kinds of, measures of, and viewpoints on information that are in practical use. I suppose that some of them can be shown to be related to (perhaps even equivalent to) Fisher or Shannon information, and that some are not. My purpose is not to give you an exhaustive list, but just to suggest that the list of measures of information is quite large and may be endless.
As big data analysis becomes increasingly popular (and one hopes better focused and organized), I suspect that measurement of information content in large datasets will become a substantial field of investigation.