Why is the Poisson distribution used for the chocolate-chip cookie problem?

I've noticed that problems about the probability of finding at least $k$ of $m$ chocolate chips in one of $n$ cookies are very often formulated using the Poisson distribution. Why exactly is the Poisson distribution suitable for this problem?
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
In short, because the number of chips in a cookie is a good example of a Poisson process. There's no a priori physical reason why this should be so; in fact, quality-control processes might push the process away from being Poisson.
The Poisson distribution is the canonical distribution of a Poisson process, in which the rate of events does not depend on the events that have already happened (radioactive decay is the textbook example). It's a safe bet that whether a chip lands in a given dollop of delicious cookie batter does not depend on how many chips have already made it in.
Not all counting processes follow the Poisson distribution, however. Sometimes your process is more clustered, or more regular, than a Poisson process. For example, the number of Starbucks stores in a ZIP code is not Poisson: the location of a new store depends distinctly on the existing stores in the region, and supply and demand usually end up generating clusters of events. Hence, cities have a higher Starbucks density than rural areas.
On the other hand, some processes tend to spread out. For example, bobcats are very territorial, so their spatial distribution tends to be more evenly distributed and less clustered. Here, subsequent events (a new bobcat moving in) depend distinctly on prior events (a resident bobcat).
Clustered and distributed counts can be approximated with the negative binomial and binomial distributions, respectively. Specifically, the Poisson, binomial, and negative binomial distributions can all be generated from the recurrence $(r+1)\,P(r+1) = f(r)\,P(r)$ with $f(r) = a + br$.
For the Poisson distribution, $b = 0$; for clustered processes, $b > 0$; and for distributed processes, $b < 0$.
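A minimal sketch of this family, assuming $f(r) = a + br$ is read as the ratio $(r+1)P(r+1)/P(r)$ (the so-called Katz family; that reading is an assumption about what the answer intends). The hypothetical helper `katz_pmf` builds the probabilities from $(a, b)$ and $P(0)$, and `dispersion_index` reports variance/mean, which is $1$ for Poisson, above $1$ for clustered counts and below $1$ for regular ones. All parameter values are made up for illustration.

```python
import math


def katz_pmf(a, b, p0, r_max):
    """Return [P(0), ..., P(r_max)] from the recurrence (r+1) P(r+1) = (a + b r) P(r).

    Hypothetical helper: p0 must be the correct P(0) for the chosen (a, b).
    Once a + b*r drops to zero or below (binomial case), later terms are zero.
    """
    probs = [p0]
    for r in range(r_max):
        factor = max(a + b * r, 0.0) / (r + 1)
        probs.append(probs[-1] * factor)
    return probs


def dispersion_index(pmf):
    """Variance-to-mean ratio: 1 for Poisson, > 1 clustered, < 1 regular."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return var / mean


R = 60  # truncation point; the tail mass beyond it is negligible here

# Poisson(lam): a = lam, b = 0
lam = 5.0
poisson = katz_pmf(lam, 0.0, math.exp(-lam), R)

# Binomial(N, p): b = -p/(1-p) < 0, a "distributed" (regular) process
N, p = 10, 0.3
binomial = katz_pmf(N * p / (1 - p), -p / (1 - p), (1 - p) ** N, R)

# Negative binomial with P(r) = C(r+s-1, r) q^r (1-q)^s: b = q > 0, clustered
s, q = 3, 0.4
neg_binomial = katz_pmf(s * q, q, (1 - q) ** s, R)

for name, pmf in [("Poisson", poisson), ("binomial", binomial),
                  ("negative binomial", neg_binomial)]:
    print(f"{name:>17}: variance/mean = {dispersion_index(pmf):.4f}")
```

The three ratios should come out as $1$, $1 - p = 0.7$ and $1/(1 - q) \approx 1.667$, matching the sign of $b$ in each case.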
This yields a family of distributions that does a fairly good job of approximating the number of discrete events in an interval, whether a time interval (events per day), a spatial interval (Starbucks per square mile), or some other interval (chocolate chips per cookie).
Since we have no reason to expect the chocolate chips to be interdependent (two chips have no interactions with each other that we care about), a Poisson process is the most suitable model.
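The independence argument can be checked with a quick simulation. This is a hypothetical sketch: it assumes each of `m` chips is dropped into one of `n` cookies uniformly at random and independently of every other chip, then compares the observed share of cookies holding exactly `k` chips with the Poisson probability for mean `m / n`. The function names and parameter values are illustrative only.

```python
import math
import random
from collections import Counter


def simulate_chip_counts(m, n, trials, seed=0):
    """Scatter m chips over n cookies, each chip choosing a cookie uniformly at
    random and independently of the others; repeat `trials` times.

    Returns {k: fraction of all simulated cookies that received exactly k chips}.
    """
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        counts = [0] * n
        for _ in range(m):
            counts[rng.randrange(n)] += 1  # no chip "sees" where the others went
        tally.update(counts)  # count how many cookies got 0, 1, 2, ... chips
    total = n * trials
    return {k: v / total for k, v in tally.items()}


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


m, n = 1000, 200  # hypothetical batch: 1000 chips, 200 cookies
freqs = simulate_chip_counts(m, n, trials=500)
lam = m / n  # average chips per cookie
for k in range(11):
    print(f"k={k:2d}  simulated={freqs.get(k, 0.0):.4f}  Poisson={poisson_pmf(k, lam):.4f}")
```

With independent placement the two columns should agree to within sampling noise; introducing dependence between chips, for instance capping the number of chips per cookie, would break that agreement.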
It is a problem about Bernoulli trials. The binomial distribution converges to the Poisson distribution in the limit of a large number of trials and a small success probability, with the mean number of successes held fixed. Let me solve the problem step by step so you can see what I mean.
First of all, you need to calculate the probability of finding exactly $k$ chocolate chips in the first** cookie you take, given that your grandma used $m$ chocolate chips in order to make $n$ cookies.
Let's count:
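The probability for a single chocolate chip to end up in your cookie is $\frac{1}{n}$ (each chip lands in any of the $n$ cookies with equal probability, independently of the other chips). This is the "success probability". So the probability of any one specific configuration, in which a particular set of $k$ chips lands in your cookie and the remaining $m-k$ do not, is $(\frac{1}{n})^k (1-\frac{1}{n})^{m-k}$.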
The number of different ways of choosing which $k$ of the $m$ chips end up in your cookie is $\binom{m}{k}$. Each of these configurations is equally probable, with the probability given above.
Therefore, you will find exactly $k$ chocolate chips in the first cookie you take with probability $P(k) = \binom{m}{k} (\frac{1}{n})^k (1-\frac{1}{n})^{m-k}$. This is the binomial distribution with $m$ trials and success probability $\frac{1}{n}$.
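The formula for $P(k)$ translates directly into code. A minimal sketch; the function name `binomial_chip_pmf` and the small example values are hypothetical.

```python
import math


def binomial_chip_pmf(k, m, n):
    """P(exactly k of the m chips land in one given cookie), assuming each chip
    independently lands in any of the n cookies with probability 1/n."""
    p = 1 / n  # success probability for a single chip
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)


# Small hypothetical case: 3 chips shared among 2 cookies
for k in range(4):
    print(k, binomial_chip_pmf(k, m=3, n=2))
```

This prints $0.125, 0.375, 0.375, 0.125$ for $k = 0, 1, 2, 3$, the same $\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}$ you get by listing all $2^3 = 8$ equally likely placements of the chips.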
If your teacher explicitly asks you to solve the problem using the Poisson distribution (with an average of $\frac{m}{n}$ chips per cookie), then you have to make sure that the exercise fulfills the conditions for it, namely that the success probability is small, $\frac{1}{n} \ll 1$, and that the number of trials is large, $m \gg 1$. Exercises usually give $n > 100$ and $m > 100$.
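To see why those conditions matter, here is the standard limit argument, writing $\lambda = \frac{m}{n}$ for the average number of chips per cookie (so $\frac{1}{n} = \frac{\lambda}{m}$) and holding $k$ and $\lambda$ fixed while $m, n \to \infty$:

$$
\binom{m}{k}\left(\frac{1}{n}\right)^k\left(1-\frac{1}{n}\right)^{m-k}
= \frac{m(m-1)\cdots(m-k+1)}{m^k}\cdot\frac{\lambda^k}{k!}\cdot\left(1-\frac{\lambda}{m}\right)^{m}\left(1-\frac{\lambda}{m}\right)^{-k}
\;\longrightarrow\; 1\cdot\frac{\lambda^k}{k!}\cdot e^{-\lambda}\cdot 1 .
$$

So when $m$ and $n$ are both large and $\frac{m}{n}$ is moderate, $P(k) \approx e^{-\lambda}\frac{\lambda^k}{k!}$, which is the Poisson distribution.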
Finally, for completeness, the probability of finding at least $k$ chocolate chips is $1 - P(0) - P(1) - \cdots - P(k-1)$, where $P(k)$ can be either the binomial or the Poisson distribution.
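Putting the pieces together, a minimal sketch of the tail probability asked about in the question, computed both exactly (binomial) and with the Poisson approximation with mean $\frac{m}{n}$. The function names and the example values $m = 500$, $n = 100$, $k = 3$ are hypothetical.

```python
import math


def binomial_pmf(k, m, n):
    p = 1 / n  # each chip lands in a given cookie with probability 1/n
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def at_least(k, pmf):
    """P(X >= k) = 1 - P(0) - ... - P(k-1) for a pmf given as a function of k."""
    return 1 - sum(pmf(j) for j in range(k))


m, n, k = 500, 100, 3  # hypothetical: 500 chips, 100 cookies, at least 3 chips
exact = at_least(k, lambda j: binomial_pmf(j, m, n))
approx = at_least(k, lambda j: poisson_pmf(j, m / n))
print(f"binomial: {exact:.5f}   Poisson: {approx:.5f}")
```

Summing only the first $k$ terms and subtracting from $1$ avoids having to sum the long upper tail; for these values the two answers differ only in the third decimal place.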
** This is a way of saying that the probability is not conditioned on any knowledge of the number of chocolate chips in other cookies. For example, your relatives (or you yourself) may already have eaten many cookies, but YOU don't know how many chocolate chips were in the cookies already taken or in the ones still remaining.