I'm a data scientist by trade, and when dealing with truly "big" time-series data (on the order of TBs), I often have to choose between averaging and random sampling. That is, I either sacrifice time by taking the computationally expensive route of averaging all values across a time subset, or sacrifice representativeness by taking the cheap route of drawing a random sample from that subset.
I took a class on Information Theory back in undergrad, and my intuition is that an average is more representative of a subset than a random sample, but I don't know how to quantify that naïve assumption.
I imagine the "information" obtained by those two methods depends on various properties of the data, e.g. the variance, but I am simply not sure.
Answers would be greatly appreciated! Thanks!
NOTE: I suppose it's important to mention that this isn't a strict dichotomy; rather, I could also average any number N of random samples. I gave the two ends of the spectrum just to simplify the problem somewhat.
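To make that spectrum concrete, here's a rough sketch (all names hypothetical, and assuming the subset behaves like i.i.d. draws, which real time series often don't) of how the spread of the "mean of N random samples" estimator shrinks as N grows — the two extremes in the question are just N = 1 and N = the whole subset:

```python
import random
import statistics

random.seed(0)

# Hypothetical "subset" of a large time series: 100k i.i.d. draws.
subset = [random.gauss(10.0, 3.0) for _ in range(100_000)]

def estimator_spread(n_samples, trials=2_000):
    """Std. dev., over many trials, of the mean-of-n-random-samples estimator."""
    estimates = [
        statistics.fmean(random.choices(subset, k=n_samples))
        for _ in range(trials)
    ]
    return statistics.pstdev(estimates)

# The spread shrinks roughly like 1/sqrt(n), interpolating between
# "one random sample" (n=1) and "average everything" (n=len(subset)).
for n in (1, 10, 100, 1000):
    print(n, round(estimator_spread(n), 3))
```

So for estimating the subset mean specifically, averaging N samples buys you roughly a 1/sqrt(N) reduction in error over a single sample.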
The answer to your question depends deeply upon the task at hand. If you're interested in the average absolute difference between two points randomly selected from ${\cal D}$, or in the range of ${\cal D}$, then the mean is useless; you must use other measures, such as the variance. If instead you want to know the average value, well of course the mean is all you need.
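A toy illustration of that task-dependence (the data sets here are made up for the example): two distributions with essentially identical means but wildly different ranges and pairwise differences, so the mean alone cannot distinguish them:

```python
import random
import statistics

random.seed(1)

# Two hypothetical data sets with the same mean but very different spread.
narrow = [random.gauss(5.0, 0.1) for _ in range(10_000)]
wide   = [random.gauss(5.0, 10.0) for _ in range(10_000)]

for name, data in [("narrow", narrow), ("wide", wide)]:
    # Average absolute difference between two randomly selected points.
    mean_abs_diff = statistics.fmean(
        abs(random.choice(data) - random.choice(data)) for _ in range(5_000)
    )
    print(name,
          round(statistics.fmean(data), 2),   # nearly identical means...
          round(max(data) - min(data), 2),    # ...very different ranges
          round(mean_abs_diff, 2))            # ...and pairwise differences
```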
For your specific question about "information," well this too depends upon the distribution. If you have a delta-function distribution (a single point mass), then the mean tells you everything. If instead you have a complicated mixture of distributions, then the mean tells you almost nothing, and you need many more bits to describe the full distribution.
You should look up sufficient statistics (if the distribution is known): knowing the values of the sufficient statistics of your distribution is (provably) all you will ever need to answer any question about your data set. For example, the sufficient statistics of a one-dimensional Gaussian distribution are its mean and variance. Those two numbers tell you everything you'll ever be able to know about that particular distribution, allowing you to calculate any moment, any expected difference between randomly sampled points, etc. The difference in information (in bits) between knowing just the mean and knowing both the mean and the variance is the number of bits used to specify the variance.
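As a sketch of that Gaussian case (the data here are simulated for illustration): from the two sufficient statistics alone you can predict quantities like the expected absolute difference between two independent draws, which for a Gaussian works out to $2\sigma/\sqrt{\pi}$, and the prediction matches brute force on the data:

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical data assumed to come from a one-dimensional Gaussian.
data = [random.gauss(3.0, 2.0) for _ in range(50_000)]

# The sufficient statistics: everything else is derivable from these two.
mu = statistics.fmean(data)
sigma = statistics.stdev(data)

# Expected |X - Y| for two independent draws: X - Y ~ N(0, 2*sigma^2),
# and E|Z| = s*sqrt(2/pi) for Z ~ N(0, s^2), giving 2*sigma/sqrt(pi).
predicted = 2 * sigma / math.sqrt(math.pi)

# Check the model-based prediction against brute force on the actual data.
empirical = statistics.fmean(
    abs(random.choice(data) - random.choice(data)) for _ in range(20_000)
)
print(round(predicted, 3), round(empirical, 3))
```

The point is that once you trust the Gaussian model, storing (mu, sigma) loses nothing relative to storing the whole subset.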