Why should one use a Poisson distribution instead of maximum entropy?


I'm trying to understand the meaning of the Poisson distribution and when it should be applicable. However, I'm confused by some of Wikipedia's examples of applications.

For concreteness, consider this example

For instance, a call center receives an average of 180 calls per hour, 24 hours a day. The calls are independent; receiving one does not change the probability of when the next one will arrive. The number of calls received during any minute has a Poisson probability distribution with mean 3: the most likely numbers are 2 and 3 but 1 and 4 are also likely and there is a small probability of it being as low as zero and a very small probability it could be 10.

My immediate instinct when reading this example was to look for the discrete probability distribution that maximizes entropy, subject to the constraint that the average should be $\lambda = 180$ calls per hour. This can be done by maximizing the quantity $$- \sum_n p_n \log p_n - \alpha \left(\sum_n p_n - 1\right) - \beta \left(\sum_n n p_n - \lambda\right),$$ where $\alpha$ and $\beta$ are Lagrange multipliers that will later be fixed by the normalization and average constraints. The resulting distribution is then $$p_n = \frac{1}{1+\lambda}\left(1 + \frac{1}{\lambda}\right)^{-n}, \tag{1}$$ which is not the Poisson distribution. Plotting it against a Poisson distribution for $\lambda = 5$ makes the difference clear:

[Figure: comparison of the distribution of Eq. (1) with a Poisson distribution for $\lambda = 5$. Eq. (1) has its peak at zero and then decays, whereas the Poisson distribution peaks near $\lambda$.]
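The comparison in the figure can be reproduced numerically. Here is a minimal Python sketch (the function names are mine, chosen for illustration) evaluating both pmfs for $\lambda = 5$:

```python
import math

def poisson_pmf(n, lam):
    # Poisson distribution: p_n = e^{-lam} * lam^n / n!
    return math.exp(-lam) * lam ** n / math.factorial(n)

def maxent_pmf(n, lam):
    # Eq. (1): p_n = (1/(1+lam)) * (1 + 1/lam)^{-n},
    # i.e. a geometric distribution with mean lam.
    return (1.0 / (1.0 + lam)) * (1.0 + 1.0 / lam) ** (-n)

lam = 5.0
for n in range(11):
    print(f"{n:2d}  Poisson: {poisson_pmf(n, lam):.4f}  Eq. (1): {maxent_pmf(n, lam):.4f}")
```

The printout shows the qualitative difference described above: Eq. (1) is largest at $n = 0$ and decays monotonically, while the Poisson pmf rises to a peak around $n = \lambda$.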

Later on in the Wikipedia article, it is stated that

The Poisson distribution is an appropriate model if the following assumptions are true:

  • $k$ is the number of times an event occurs in an interval and $k$ can take values 0, 1, 2, ….
  • The occurrence of one event does not affect the probability that a second event will occur. That is, events occur independently.
  • The average rate at which events occur is independent of any occurrences. For simplicity, this is usually assumed to be constant, but may in practice vary with time.
  • Two events cannot occur at exactly the same instant; instead, at each very small sub-interval, either exactly one event occurs, or no event occurs.

If these conditions are true, then $k$ is a Poisson random variable, and the distribution of $k$ is a Poisson distribution.
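One way to see what these assumptions add beyond the mean is to simulate them directly: split an interval into many sub-intervals, allow at most one event per sub-interval with a fixed small probability, independent of everything else, and count. A rough Python sketch (the parameter values are illustrative, not taken from the question):

```python
import random

random.seed(42)

lam = 3.0      # mean events per interval (calls per minute in the example)
m = 200        # sub-intervals per interval; at most one event in each
p = lam / m    # per-sub-interval event probability, independent of history
trials = 2000

# Count events per interval by flipping m independent coins of bias p.
counts = [sum(random.random() < p for _ in range(m)) for _ in range(trials)]

mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(f"sample mean {mean:.2f}, sample variance {var:.2f}")  # both near 3
```

The counting process encoded by the bullet points forces the variance to (approximately) equal the mean. A distribution of the form of Eq. (1), being geometric, instead has variance $\lambda(1+\lambda)$ (which is $12$ for $\lambda = 3$), so the assumptions carry information about fluctuations that the mean alone does not supply.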

but I do not understand how the distribution in Eq. (1) fails to satisfy these requirements.

In short, the point of this post boils down to the two related questions:

  1. In problems such as the call center one, how could one determine that the Poisson distribution is better suited than the one in Eq. (1)? In other words, what sort of information is being provided besides the average of the distribution?
  2. How does Eq. (1) fail to satisfy any of the requirements that Wikipedia states as pretty much defining a Poisson distribution? Do any of them encode information that is not given by the maximum entropy procedure?

Derivation of Eq. (1)

For completeness, let me sketch the derivation of Eq. (1). Since we want to maximize $$- \sum_n p_n \log p_n - \alpha \left(\sum_n p_n - 1\right) - \beta \left(\sum_n n p_n - \lambda\right),$$ we differentiate it with respect to each $p_n$ and set each of those derivatives to zero. This leads to $$- \log p_n - 1 - \alpha - n \beta = 0,$$ and hence $$p_n = \exp(- 1 - \alpha - n \beta).$$

We may then impose the normalization and average conditions $\sum_n p_n = 1$ and $\sum_n n p_n = \lambda$ to fix the constants $\alpha$ and $\beta$. I did the computation in Mathematica and, after simplifying the expression, arrived at Eq. (1).
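As a sanity check on that computation, both constraints can be verified numerically by truncating the sums; the cutoff below is arbitrary, and the geometric tail of Eq. (1) decays like $(\lambda/(1+\lambda))^n$, so the truncation error is negligible:

```python
lam = 5.0

# Eq. (1): p_n = (1/(1+lam)) * (1 + 1/lam)^{-n}, a geometric distribution
# with success probability 1/(1+lam).
p = [(1.0 / (1.0 + lam)) * (1.0 + 1.0 / lam) ** (-n) for n in range(2000)]

total = sum(p)                              # normalization: should be ~1
mean = sum(n * pn for n, pn in enumerate(p))  # average: should be ~lam
print(total, mean)
```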