I'm looking for an efficient way to estimate the entropy of a continuous distribution from a large sample (millions of points) drawn from it. I've found a few papers (example: https://www.sciencedirect.com/science/article/pii/S0377042713006006), but the methods they suggest all seem better suited to relatively small samples.
Options tried and rejected:
- Binning the samples and computing the entropy of the induced discrete distribution: the result is very noisy and depends heavily on the binning scheme;
- Estimators based on ranking/spacings (as in the paper above): these scale poorly to large samples, both in running time and in numerical accuracy (differences of order statistics become unstable because adjacent order statistics are too close together).
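For concreteness, here is a minimal sketch of the two rejected approaches: a histogram plug-in estimate and a Vasicek-style m-spacing estimate. The bin counts and the value of m are arbitrary choices, which is exactly the problem with the first approach; the spacing estimator requires sorting and breaks down as soon as spacings hit zero.

```python
import numpy as np

def binned_entropy(x, bins):
    # Plug-in estimate: discretize, compute -sum p*log(p), then add
    # log(bin width) to convert discrete entropy to differential entropy.
    counts, edges = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p)) + np.log(edges[1] - edges[0])

def vasicek_entropy(x, m):
    # Vasicek m-spacing estimator:
    #   H ~= (1/n) * sum_i log( n/(2m) * (x_(i+m) - x_(i-m)) ),
    # with indices clipped at the boundaries.
    n = len(x)
    xs = np.sort(x)
    idx = np.arange(n)
    spacings = xs[np.minimum(idx + m, n - 1)] - xs[np.maximum(idx - m, 0)]
    # A single run of equal order statistics longer than m makes a spacing
    # zero, and log(0) = -inf ruins the whole average.
    return np.mean(np.log(n / (2 * m) * spacings))

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # N(0,1): true entropy = 0.5*log(2*pi*e) ~= 1.4189

print(binned_entropy(x, bins=100))
print(binned_entropy(x, bins=10_000))  # same data, different binning, different answer
print(vasicek_entropy(x, m=10))
```

Both estimators land near the true value on a well-behaved sample like this, but the binned estimate drifts as the bin count changes, and the spacing estimate already requires a full sort of the data.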
A further obstacle is that some of the distributions I'm working with are not truly continuous: they are discrete with huge cardinality (e.g. a direct sum of tens of one-dimensional distributions, each with cardinality in the thousands). As a result, difference-based methods fail outright, since there can be long runs of equal order statistics.
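To illustrate the failure mode (with a hypothetical stand-in for such a distribution, not my actual data — a sum of 30 independent uniforms on {0, ..., 1999}), a large fraction of the m-spacings are exactly zero, so any estimator taking their logarithm blows up:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in: sum of 30 discrete uniforms on {0, ..., 1999}.
# Discrete, but with tens of thousands of reachable values.
x = sum(rng.integers(0, 2000, size=1_000_000) for _ in range(30))

xs = np.sort(x)
m = 10
spacings = xs[2 * m:] - xs[:-2 * m]  # x_(i+m) - x_(i-m) in the bulk
print(np.mean(spacings == 0))        # fraction of spacings that are exactly zero
```

With a million samples concentrated on a few tens of thousands of integer values, runs of identical order statistics longer than 2m are common, so log(spacing) is -inf on a substantial fraction of terms.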
Any insights would be appreciated.