How do you approximate a continuous distribution with a discrete one?


I have a continuous distribution $X$, which I would like to approximate by a discrete distribution. How do I do this? In particular, I would like a set of values of $X$ (which must be finite or at least countable) and a probability associated with each point in that set.

For example, if I had a normal distribution with mean 0 and variance 1, then I could approximate it by the discrete distribution that takes the values -0.5, 0, and 0.5, with probabilities 0.25, 0.5, and 0.25 respectively.

That's obviously just something I made up in my head. What's the "best" way to do this?

The other way I thought of was to simulate lots of samples from $X$ and then group them together in tight intervals, using the frequency of each interval as its probability. To get a particular point, I could then just round each sample to a representative point of its interval.

For example, I could simulate from the normal distribution, group together all samples in the interval [0, 0.001], label this group "0", and give it a probability equal to the number of samples in that interval divided by the total number of samples.
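A minimal sketch of the simulate-and-bin idea, using only the Python standard library. The function name `empirical_discretization`, the bin width of 0.5 and the sample size are illustrative assumptions; each sample is labelled by the left endpoint of its interval, as in the [0, 0.001] labelled "0" example.

```python
import math
import random
from collections import Counter


def empirical_discretization(sampler, num_samples, width, seed=0):
    """Approximate a continuous distribution by binning simulated draws.

    Each draw x falls in the half-open bin [k*width, (k+1)*width) with
    k = floor(x / width) and is reported at the left endpoint k*width.
    Returns a dict mapping bin point -> estimated probability.
    """
    rng = random.Random(seed)
    counts = Counter(math.floor(sampler(rng) / width) for _ in range(num_samples))
    return {k * width: c / num_samples for k, c in sorted(counts.items())}


# Hypothetical example: standard normal, bins of width 0.5.
pmf = empirical_discretization(lambda r: r.gauss(0.0, 1.0), 100_000, 0.5)
for point in (-0.5, 0.0, 0.5):
    print(point, pmf[point])
```

Labelling bins by their left endpoint shifts the mean down by roughly half the bin width; using the midpoint `(k + 0.5) * width` as the label avoids that.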

Is this the way to do it?


Answer 1

1. Choose a constraint on your approximation. For instance, how many points should it be supported on? (If you later want to apply an FFT, for example, a power of two would be a good idea.)

2. Choose a metric that you will use to determine how close you are to the original distribution (e.g., the Kolmogorov-Smirnov statistic or the Kullback-Leibler divergence).

3. Optimize that metric within those constraints.
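As one concrete way to carry out these three steps, suppose the constraint is $n$ equally weighted points and the metric is the KS statistic (both choices are assumptions made here for illustration). Placing the points at the quantiles $F^{-1}((2i-1)/(2n))$ keeps the step CDF within $1/(2n)$ of $F$ everywhere, and no $n$-point distribution can do better: some jump of its CDF must be at least $1/n$, and a continuous $F$ takes a single value at that jump, so it misses one side by at least half the jump. The sketch uses `statistics.NormalDist` from the Python standard library (3.8+); `ks_optimal_points` is a hypothetical name.

```python
from statistics import NormalDist


def ks_optimal_points(inv_cdf, n):
    """n equally weighted points at the quantiles (2i - 1) / (2n), i = 1..n.

    For a continuous target CDF F, the step CDF of this discrete
    distribution differs from F by at most 1 / (2n) everywhere, and no
    discrete distribution on n points can do better in the KS sense.
    """
    points = [inv_cdf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]
    weights = [1.0 / n] * n
    return points, weights


# Example: 4-point approximation of the standard normal.
points, weights = ks_optimal_points(NormalDist(0.0, 1.0).inv_cdf, 4)
print(points, weights)
```

A different metric (or unequal weights) would generally lead to different points, so this is only the answer for the KS choice.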

Answer 2

The KL divergence between two distributions, one discrete and the other continuous, may not give you anything actionable: the discrete distribution puts positive mass on points where the continuous one has none, so the divergence is infinite in either direction. See the discussion here: https://stats.stackexchange.com/questions/69125/is-it-possible-to-apply-kl-divergence-between-discrete-and-continuous-distributi?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

Instead, use the KS statistic: for a given number of points, minimize the maximum difference between the CDFs. In other words, find the best $n$-step CDF (for an $n$-state discrete distribution), the one that minimizes the maximum distance between it and the CDF you wish to approximate.
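To make the KS criterion concrete, here is a small sketch that computes the maximum CDF distance between a continuous CDF and a finite discrete distribution. Because $F$ is continuous and the step CDF is constant between support points, the supremum is reached just before or at a jump, so only those values need checking. The name `ks_distance` is hypothetical; the example compares the question's made-up three-point distribution with three equally weighted points at the normal quantiles 1/6, 1/2 and 5/6, using `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist


def ks_distance(points, weights, cdf):
    """Sup-norm distance between a continuous CDF and a discrete step CDF.

    The supremum is attained as t approaches a support point from the
    left or at the point itself, so it suffices to compare F(x) with the
    step CDF just before and just after each jump.
    """
    below = 0.0  # step CDF just to the left of the current point
    worst = 0.0
    for x, w in sorted(zip(points, weights)):
        fx = cdf(x)
        above = below + w  # step CDF at (and just to the right of) x
        worst = max(worst, abs(fx - below), abs(fx - above))
        below = above
    return worst


std = NormalDist(0.0, 1.0)
ad_hoc = ks_distance([-0.5, 0.0, 0.5], [0.25, 0.5, 0.25], std.cdf)
quantile_pts = [std.inv_cdf((2 * i - 1) / 6) for i in range(1, 4)]
quantile = ks_distance(quantile_pts, [1 / 3] * 3, std.cdf)
print(round(ad_hoc, 4))    # 0.3085
print(round(quantile, 4))  # 0.1667
```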

Answer 3

How about using the fact that: $$n^{-1}\lfloor nX\rfloor\leq X\leq n^{-1}\lceil nX\rceil$$ where: $$n^{-1}\lceil nX\rceil-n^{-1}\lfloor nX\rfloor\leq n^{-1}$$

This holds for every positive integer $n$, and the gap shrinks as $n$ grows, so take $n$ large. Here $n^{-1}\lfloor nX\rfloor$ and $n^{-1}\lceil nX\rceil$ are both discrete random variables.
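Since $\lfloor nX\rfloor = k$ exactly when $k \le nX < k+1$, the probabilities of $n^{-1}\lfloor nX\rfloor$ come straight from the CDF $F$ of $X$, with no simulation needed: $P(n^{-1}\lfloor nX\rfloor = k/n) = F((k+1)/n) - F(k/n)$. The sketch below assumes a truncation range `[lo, hi]` (the grid is countably infinite for a distribution like the normal) and reports the leftover tail mass; `floor_discretization` is a hypothetical name, and `statistics.NormalDist` is the standard-library normal distribution.

```python
import math
from statistics import NormalDist


def floor_discretization(cdf, n, lo, hi):
    """Exact pmf of Y = floor(n X) / n restricted to grid points in [lo, hi].

    P(Y = k/n) = P(k <= nX < k + 1) = F((k + 1)/n) - F(k/n).
    Grid points outside the range are dropped; the mass they would carry
    is returned separately as `tail` so it can be checked or reassigned.
    """
    k_lo, k_hi = math.floor(n * lo), math.floor(n * hi)
    pmf = {k / n: cdf((k + 1) / n) - cdf(k / n) for k in range(k_lo, k_hi + 1)}
    tail = 1.0 - sum(pmf.values())
    return pmf, tail


# Example: standard normal on a grid of spacing 1/10 between -6 and 6.
pmf, tail = floor_discretization(NormalDist(0.0, 1.0).cdf, 10, -6.0, 6.0)
print(len(pmf), tail)
```

For the ceiling version, $P(n^{-1}\lceil nX\rceil = k/n) = F(k/n) - F((k-1)/n)$.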