Deriving Probability Theory from Information Theory


Section 3 ("Probability") of the paper "A Philosophical Treatise of Universal Induction" describes three different interpretations of probability theory: frequentist, objectivist, and subjectivist.

I am a proponent of concise definitions of fields. The most elegant definition I know of for mathematics is "the study of patterns." I believe the best such definition for probability is "the study of patterns with incomplete information."

It's all about working with what we know to arrive at a best estimate of systems for which we have incomplete information.

In that light, it seems one could derive probability theory from information theory. I know that, historically, information theory was built on probability theory, but I wonder whether one could reformulate information theory independently of probability theory and then derive probability theory from it.

Has that been attempted? Does it make any sense? Would it be of any value? I'm sorry if this is a malformed question; I'm not a mathematician by training. It simply seems like information is the more fundamental concept, one that probability should have been based on, but for historical reasons the opposite happened.

EDIT (for clarification):

Instead of defining information in terms of probability:

I(m) := -log(Pr(M=m)) // log base 2 for information in bits
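As a concrete illustration of this standard definition, here is a minimal Python sketch that computes the information content of an outcome from its probability. The function name `self_information` is my own choice, not from any library; base 2 gives bits, base e gives nats.

```python
import math


def self_information(p: float, base: float = 2.0) -> float:
    """Information content I(m) = -log_b(Pr(M=m)) of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    # Change of base: log_b(p) = ln(p) / ln(b)
    return -math.log(p) / math.log(base)


# A fair coin flip carries 1 bit; an outcome with probability 1/8 carries 3 bits.
for p in (0.5, 0.125):
    print(f"Pr = {p}: I = {self_information(p):.3f} bits")
```

Rarer outcomes carry more information, and a certain outcome (p = 1) carries none.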

It seems like one could define probability in terms of information (or the lack thereof):

Pr(M=m) := b^(-I(m)) // b=2 for information given in bits
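Read in this reverse direction, the definition only yields a valid probability distribution if the information values are mutually consistent: summing b^(-I(m)) over all outcomes must give 1. When I(m) is the codeword length of a prefix-free binary code, the Kraft inequality guarantees that this sum is at most 1. The sketch below (function names and example values are hypothetical) converts a table of information values back into probabilities and checks that constraint.

```python
import math


def probability_from_information(info_bits: float, base: float = 2.0) -> float:
    """Pr(M=m) = b^(-I(m)), the inverse of I(m) = -log_b(Pr(M=m))."""
    return base ** (-info_bits)


def kraft_sum(infos: dict[str, float], base: float = 2.0) -> float:
    """Sum of b^(-I(m)) over all outcomes; must equal 1 for a full distribution."""
    return sum(probability_from_information(i, base) for i in infos.values())


# Hypothetical information values (e.g. codeword lengths in bits) for outcomes.
consistent = {"a": 1, "b": 2, "c": 3, "d": 3}
inconsistent = {"a": 1, "b": 1, "c": 2}

print({m: probability_from_information(i) for m, i in consistent.items()})
# → {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
print(kraft_sum(consistent))    # → 1.0   (1/2 + 1/4 + 1/8 + 1/8)
print(kraft_sum(inconsistent))  # → 1.25  (exceeds 1, so not a distribution)
```

An assignment like the second one does not describe any probability distribution, so an information-first formulation would presumably need this normalization requirement as one of its starting assumptions.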

If we take the definition of mathematics from above, "the study of patterns," we could build a formal language for describing patterns from a set of symbols (say {0,1}). We could then use that language (as we already do) to build pattern-based models that approximate the mechanics of the world (or of imaginary worlds). Information could be defined as the length of those descriptions in units tied to the alphabet (bits if the alphabet is {0,1}). Then, when dealing with systems for which we have incomplete information, we could derive probability theory to help us infer properties of the system, make optimal decisions with the information at hand, or incorporate new information into our model.
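One way to sketch that last step, assuming hypotheses are scored by description length (a minimum-description-length style choice, not something established here), is to note that Bayes' rule becomes additive in bits: I(h|d) = I(h) + I(d|h) - I(d), which is just -log2 applied to Pr(h|d) = Pr(h) Pr(d|h) / Pr(d). The hypothesis names, description lengths, coin probabilities, and observations below are all made up for illustration.

```python
import math


def bits(p: float) -> float:
    """Information content -log2(p) of an event with probability p > 0."""
    return -math.log2(p)


# Hypothetical hypotheses about a coin: (description length in bits, Pr(heads)).
# Lengths 1, 2, 2 satisfy the Kraft inequality with equality, so 2**-length
# is a normalized prior.
hypotheses = {
    "fair": (1, 0.5),
    "heads-biased": (2, 0.75),
    "tails-biased": (2, 0.25),
}
data = "HHHT"  # hypothetical observations


def likelihood_bits(p_heads: float, observations: str) -> float:
    """I(d|h): bits needed to encode the observations under one hypothesis."""
    return sum(bits(p_heads if o == "H" else 1.0 - p_heads) for o in observations)


# I(h) + I(d|h) = -log2 Pr(h, d): information adds where probabilities multiply.
joint_bits = {h: length + likelihood_bits(p, data) for h, (length, p) in hypotheses.items()}

# I(d) = -log2 of the sum over h of Pr(h, d), i.e. the evidence in bits.
evidence_bits = bits(sum(2.0 ** -j for j in joint_bits.values()))

# Bayes' rule in bits: I(h|d) = I(h) + I(d|h) - I(d)
posterior_bits = {h: j - evidence_bits for h, j in joint_bits.items()}

for h, b in posterior_bits.items():
    print(f"{h:>12}: I(h|d) = {b:.3f} bits, Pr(h|d) = {2.0 ** -b:.3f}")
```

With these made-up numbers the short "fair" description keeps the most posterior mass (about 0.52), the heads-biased hypothesis is close behind (about 0.44), and the tails-biased one drops below 0.05. Working in bits turns the products of Bayes' rule into sums, which is the sense in which priors can be assigned from description lengths.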

That's the basic idea. The benefit, as with other interpretations, is the insight gained from viewing problems from a different perspective. We currently use information only indirectly, for instance when assigning priors; an information-first formulation might allow us to develop a more rigorous approach in practice.