How to measure the information content of a function?


I would say that the function $f(x)=x^2$ has more "information" than the function $f(x)=1$. Is this true, and if so, is there a way to measure the information content of a function?

3 Answers

Answer 1

I would say that information is a relative thing: you need to define a reference space so that you can talk about probabilities. Once you have probabilities, you can measure information.

If your space contains only the two functions mentioned, then a single measurement at $x=0$ decides which function you have been given, since $f(0)=0$ for one and $f(0)=1$ for the other. So the two functions are on an equal footing, and a single bit suffices to describe the space.

You may have a different space, though. For instance, take the same two functions plus a few new ones that are identical to $f(x)=x^2$ on an interval $[a,b]$ but mutually different on the rest of the domain. Then pinpointing $f(x)=x^2$ requires more information than pinpointing $f(x)=1$, because several candidates resemble it, as the sketch below illustrates.
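To make the counting concrete, here is a minimal Python sketch (the particular candidate functions, and the interval $[0,1]$ standing in for $[a,b]$, are my own illustrative choices): with a uniform prior over $n$ candidates, pinpointing one takes $\log_2 n$ bits, and extra probe points are needed when several candidates agree on part of the domain.

```python
import math

# Hypothetical reference space: a few candidate functions.
# With only {x**2, 1}, one probe at x=0 already separates them.
candidates = {
    "x^2":   lambda x: x ** 2,
    "one":   lambda x: 1,
    "x^2+A": lambda x: x ** 2 if 0 <= x <= 1 else x ** 2 + 1,  # agrees with x^2 on [0,1]
    "x^2+B": lambda x: x ** 2 if 0 <= x <= 1 else x ** 2 + 2,  # agrees with x^2 on [0,1]
}

def survivors(cands, probes, target):
    """Candidates indistinguishable from `target` on the given probe points."""
    return [name for name, f in cands.items()
            if all(f(p) == target(p) for p in probes)]

target = candidates["x^2"]
print(survivors(candidates, [0.5], target))       # ['x^2', 'x^2+A', 'x^2+B']: a probe inside [0,1] is not enough
print(survivors(candidates, [0.5, 2.0], target))  # ['x^2']: a probe outside [a,b] pins it down

# Uniform prior over n candidates => log2(n) bits to identify one.
print(math.log2(len(candidates)))  # 2.0 bits
```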

Answer 2

To be concrete, let $f:[-N,N]\rightarrow \mathbb{R}$. We can proceed via the usual definition of the distribution of a random variable (the pushforward): let $X$ be uniform on $[-N,N]$ with $N<\infty$, and set $Z=f(X)$. Then $$\mathbb{P}(Z \in A)=\mathbb{P}(f(X) \in A)=\mathbb{P}(X \in f^{-1}(A)).$$
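For a concrete instance of this pushforward, take $f(x)=x^2$. Then for $0 \le z \le N^2$, $$\mathbb{P}(Z\le z)=\mathbb{P}(X^2\le z)=\mathbb{P}(-\sqrt{z}\le X\le \sqrt{z})=\frac{\sqrt{z}}{N},$$ so $Z$ has density $\frac{1}{2N\sqrt{z}}$ on $(0,N^2]$.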

In this context we can say that the entropy of the function is the entropy of its output as a random variable.

If we choose $f(x)=c$ for some constant $c$, then clearly $H(Z)=H(f)=0$, since all the probability is concentrated at one point.

If $f$ is one-to-one, then the entropy is maximal.

If $f$ is two-to-one, as $f(x)=x^2$ is on $[-N,N]$, then the entropy is smaller.

Since we are measuring the entropy of the function on its domain $[-N,N]$, the uniform measure is the natural choice there, and I took $N<\infty$ precisely so that the uniform distribution exists.
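As a rough numerical check of the claims above, here is a sketch that samples $X$ uniformly, discretizes $Z=f(X)$ into equal-width bins, and computes the Shannon entropy of the histogram; the bin count and sample size are arbitrary choices of mine, and histogram entropy is only a crude discrete stand-in for the entropy discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1.0
X = rng.uniform(-N, N, size=100_000)  # X ~ Uniform[-N, N]

def histogram_entropy(z, bins=64):
    """Shannon entropy (in bits) of z discretized into equal-width bins."""
    counts, _ = np.histogram(z, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

for name, f in [("constant", lambda x: np.ones_like(x)),
                ("identity", lambda x: x),
                ("square",   lambda x: x ** 2)]:
    print(name, round(histogram_entropy(f(X)), 3))
# Expected ordering: constant (0 bits) < square < identity (~log2(64) = 6 bits)
```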

Answer 3

I'd suggest that the Kolmogorov complexity (https://en.wikipedia.org/wiki/Kolmogorov_complexity) of your $f(x)$ is the best way to define/measure its information content: it is the length of the shortest computer program that computes $f(x)$. And then, clearly, $f(x)=1$ can be written as a shorter program (though not by much) than $f(x)=x^2$.
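Kolmogorov complexity is uncomputable in general, but a toy proxy makes the comparison tangible. The sketch below (my own illustration, not a real Kolmogorov measure) compares the lengths of short Python source strings for the two functions; each length is merely an upper bound on the complexity in this one fixed language.

```python
# Short Python expressions for each function; their lengths upper-bound
# the Kolmogorov complexity in this fixed language, up to the
# interpreter's constant overhead.
src_const  = "lambda x:1"
src_square = "lambda x:x*x"

print(len(src_const), len(src_square))  # 10 12 -- the constant wins, though not by much
assert eval(src_const)(3) == 1 and eval(src_square)(3) == 9
```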

This is actually a pretty common interpretation of the information content of a function. A more elaborate discussion is at https://www.mdpi.com/1099-4300/13/3/595/htm (and search for "Kolmogorov complexity vs entropy", or a similar query, for more).