How to calculate entropy from a set of correlated samples?


Calculating the entropy of a set of uncorrelated samples is pretty straightforward: make a histogram, turn the histogram into probabilities, then calculate entropy the usual way: (How to calculate entropy from a set of samples?)
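As a concrete sketch of that procedure (the function name, bin count, and test signal are my own illustrative choices):

```python
import numpy as np

def entropy_from_samples(samples, bins=32):
    """Plug-in (histogram) estimate of Shannon entropy, in bits."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts / counts.sum()      # histogram -> empirical probabilities
    p = p[p > 0]                   # 0 * log(0) is treated as 0
    return -np.sum(p * np.log2(p))

# A large uniform sample should come out near the maximum, log2(32) = 5 bits
rng = np.random.default_rng(0)
h = entropy_from_samples(rng.uniform(size=100_000))
```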

However, doing this, a flat histogram yields the maximum entropy value whether or not the samples were correlated.

Is there a way to calculate the entropy of correlated samples?

One thought I've had is building a Markov chain from the samples and somehow using conditional entropy (https://en.wikipedia.org/wiki/Conditional_entropy) to calculate an entropy value, hoping that the Markov chain would "automatically discover" correlation, at least on small scales (higher-order Markov chains being needed for longer-range correlation!). So far I haven't figured out how to make it actually work, though.

Ideally I'm looking for a way that doesn't require knowledge of how the samples are generated, but if that is not possible, a method that does require knowledge of how they are generated would be helpful too.

Best answer

There is more than one definition of entropy you could be pursuing. It is up to you to define exactly what it is that you want to calculate, or to give intuition on what properties the metric should have.

Interpretation 1

$$H(X) = -\sum_i p_i\log p_i \approx -\sum_i \hat{p}_i\log \hat{p}_i$$

In its basic definition, Shannon entropy does not care whether the sequence of data is correlated; it only cares about the distribution of the data. The only effect of correlation on the entropy estimate is that you may need more samples to obtain a dataset representative of your probability distribution than you would in the i.i.d. case. If you have enough samples to represent the underlying distribution, then it does not matter that they are correlated. You can decorrelate them by simply scrambling them in time, if you wish, but, as I said, the formula for entropy does not even care about their order.

If you do not have enough samples, then your entropy estimate will be wrong simply because the data does not carry sufficient information about the underlying distribution. Knowledge of the underlying correlation can help you estimate how many points you may need to sample, but it does not help improve the actual entropy estimate.
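A small sketch of this point, using the plug-in histogram estimate (the AR(1) process is just one convenient way to generate correlated data): shuffling the samples destroys the correlation, but the histogram, and hence the estimate, is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

# Strongly correlated samples from an AR(1) process: x_t = 0.95 x_{t-1} + noise
n = 50_000
x = np.empty(n)
x[0] = 0.0
noise = rng.normal(size=n)
for t in range(1, n):
    x[t] = 0.95 * x[t - 1] + noise[t]

def plug_in_entropy(samples, bins=32):
    counts, _ = np.histogram(samples, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

# The histogram ignores sample order, so both estimates are identical
h_ordered = plug_in_entropy(x)
h_shuffled = plug_in_entropy(rng.permutation(x))
```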

Interpretation 2

$$H(X | Past) = H(X, Past) - H(Past)$$

The conditional entropy measures the uncertainty about a random variable given additional knowledge. If you want to calculate it using binning, you bin the joint distribution and the conditioning variable, estimate the entropies, and subtract them. In the simplest case, if you have an order-1 Markov chain ($Past = X(t-1)$), the joint distribution $P(X(t), X(t-1))$ is a 2D distribution, and the conditioning variable's distribution $P(X(t-1))$ is a 1D distribution. Now, imagine that, in order to estimate the entropy $H(X)$ of a 1D distribution to good accuracy, you require $N=1000$ datapoints. Unsurprisingly, you would then require roughly $N^2$ points to estimate the conditional entropy at Markov order 1, roughly $N^3$ at Markov order 2, and so on. So, obviously, without further assumptions it is not possible to estimate $H(X|Past)$ for long pasts: the dimensionality grows with the Markov order, and the number of datapoints needed grows exponentially with the dimension. This whole analysis also rests on the assumption that the probability $P(X(t))$ does not explicitly depend on time, but only on the past values of $X$. In other words, if repeated experiments cannot be considered identically distributed, it is not possible to make progress.
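A minimal sketch of the order-1 case, under my own choices of bin count and test process: bin the joint $(X(t), X(t-1))$ and the marginal $X(t-1)$, estimate both entropies, and subtract. For correlated data, conditioning on the past should reduce the estimated uncertainty below the marginal entropy.

```python
import numpy as np

def plug_in_entropy_bits(p):
    """Shannon entropy of a (flattened) probability array, in bits."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def conditional_entropy(x, bins=16):
    """Plug-in estimate of H(X(t) | X(t-1)) = H(X(t), X(t-1)) - H(X(t-1))."""
    xt, xpast = x[1:], x[:-1]
    joint, _, _ = np.histogram2d(xt, xpast, bins=bins)
    h_joint = plug_in_entropy_bits(joint.ravel() / joint.sum())
    marg, _ = np.histogram(xpast, bins=bins)
    h_past = plug_in_entropy_bits(marg / marg.sum())
    return h_joint - h_past

# Correlated AR(1) data: knowing the past reduces the uncertainty
rng = np.random.default_rng(2)
n = 50_000
x = np.empty(n)
x[0] = 0.0
eps = rng.normal(size=n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + eps[t]

marg_counts, _ = np.histogram(x, bins=16)
h_marginal = plug_in_entropy_bits(marg_counts / marg_counts.sum())
h_cond = conditional_entropy(x, bins=16)
```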

Other possible interpretations

In case you actually don't want to calculate entropy, but some other measure that, for example, infers the temporal relationships in your data, then I can advise you further, but you would have to re-state the desired effect of your metric in your question. The simplest such measure is the mutual information between past and present samples.
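As a sketch of that simplest version (the function name and test signals are my own): the plug-in mutual information $I(X(t); X(t-1)) = H(X(t)) + H(X(t-1)) - H(X(t), X(t-1))$ is near zero for independent samples and clearly positive for correlated ones.

```python
import numpy as np

def mi_past_present(x, bins=16):
    """Plug-in mutual information I(X(t); X(t-1)), in bits."""
    xt, xpast = x[1:], x[:-1]

    def h(counts):
        p = counts[counts > 0] / counts.sum()
        return -np.sum(p * np.log2(p))

    joint, _, _ = np.histogram2d(xt, xpast, bins=bins)
    return (h(np.histogram(xt, bins=bins)[0])
            + h(np.histogram(xpast, bins=bins)[0])
            - h(joint.ravel()))

rng = np.random.default_rng(3)
n = 50_000
iid = rng.normal(size=n)          # independent samples
ar = np.empty(n)                  # correlated AR(1) samples
ar[0] = 0.0
eps = rng.normal(size=n)
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + eps[t]

mi_iid = mi_past_present(iid)     # close to zero (small positive bias)
mi_ar = mi_past_present(ar)       # clearly positive
```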

Warning

Estimating entropy from finite data is a notoriously hard problem. In particular, the naive binning method is quite sensitive to the exact number of bins in your histogram, and it is also biased: it consistently underestimates the true entropy. More advanced methods exist, but they are harder to implement. I would recommend using an existing suite for entropy estimation rather than writing one yourself. For further reading on the specifics of estimation, I highly recommend the paper by Liam Paninski.
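As one example of such a correction (the standard Miller-Madow adjustment, which is not from the answer above and only partly removes the bias): add $(\hat{m}-1)/(2N)$ nats to the plug-in estimate, where $\hat{m}$ is the number of occupied bins and $N$ the number of samples.

```python
import numpy as np

def entropy_miller_madow(samples, bins=64):
    """Histogram entropy in bits with the Miller-Madow bias correction."""
    counts, _ = np.histogram(samples, bins=bins)
    n = counts.sum()
    p = counts[counts > 0] / n
    h_plugin = -np.sum(p * np.log2(p))
    m = np.count_nonzero(counts)                 # occupied bins
    correction = (m - 1) / (2 * n * np.log(2))   # (m-1)/(2N) nats, in bits
    return h_plugin + correction

rng = np.random.default_rng(4)
samples = rng.normal(size=10_000)

counts, _ = np.histogram(samples, bins=64)
p = counts[counts > 0] / counts.sum()
h_plugin = -np.sum(p * np.log2(p))
h_corrected = entropy_miller_madow(samples)
# the correction is small and always pushes the estimate upward
```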