Bonferroni’s Principle discussed in Mining of Massive Data Sets book

1.3k Views Asked by At

I am reading a chapter of book Mining of Massive Data Sets book is available here http://www.mmds.org Chapter 1 http://infolab.stanford.edu/~ullman/mmds/ch1.pdf Now in Section 1.2.3 An example of Bonferroni's principle has been given. I have not been able to understand following text from book

There are one billion people who might be evil-doers. 2. Everyone goes to a hotel one day in 100. 3. A hotel holds 100 people. Hence, there are 100,000 hotels – enough to hold the 1% of a billion people who visit a hotel on any given day. 4. We shall examine hotel records for 1000 days

How do they come to conclusion that there are 100,000 hotels? How are 100,000 hotels enough to hold 1% of a billion people.

Then in the same example they have mentioned following

Thus, the chance that they will visit the same hotel on
one given day is 10−9. The chance that they will visit the same hotel on two
different given days is the square of this number, 10−18. Note that the hotels
can be different on the two days.

This probability calculation is also not clear to me. Any help will great. Thanks

2

There are 2 best solutions below

5
On BEST ANSWER

We are studying $10^9$ people.

On any day, we should expect $1\%$ of them to visit a hotel, which would be $10^9 \cdot 10^{-2}=10^7$ of them deciding to visit a hotel.

Each hotel can hold $100=10^2$ people, to handle then $10^7$ people, we need $\frac{10^7}{10^2}=10^5$ hotels in the market.

Now for your second question, the probability that two particular people would decide to visit some hotel on the same day independently is $(10^{-2})^2=10^{-4}$. Remember that we have $10^5$ hotels, hence the chance of visiting the same hotel on the same day would be $(10^{-5})(10^{-4})=10^{-9}$. The probability that they will visit the same hotel on $2$ different day would then be $(10^{-9})^{2}=10^{-18}$.

2
On

Thanks for posting this question. I too have a related question. I'm clear about needing $100,000$ hotels, as Siong Thye Goh answered, since out of $10^9$ people, we can expect $(10^9)(\frac{1}{100})=10^7$ people to visit any hotel on any day. Since any hotel can hold at most $100$ people, we need $\frac{10^7}{100}=10^5$ hotels.

However, for calculating the probability, if we assume that each person decides to go to any hotel on a particular day independently, and chooses the hotel to go to independently, then, the probability that a particular person goes to a particular hotel on a particular day is $(\frac{1}{100})(\frac{1}{10^5})=10^{-7}$. And given that one person went to a particular hotel on a particular day, another person decides to go to the same hotel on the same day independently, and the probability that two people decide to go to the same hotel on any particular day is therefore $(10^{-7})^2=10^{-14}$. Thus, the probability that two people go to the same hotel on two different days should be $(10^{-14})^2=10^{-28}$, and not $10^{-18}$ as computed in the book. I think the calculation in the book is correct only if the two people choose the hotel together. However, this is not the scenario being considered here, since this calculation in the book follows the assumption:

Suppose, however, that there really are no evil-doers. That is, everyone behaves at random, deciding with probability $0.01$ to visit a hotel on any given day, and if so, choosing one of the $10^5$ hotels at random.

If the probability is indeed $10^{-28}$ as I computed above, "the expected number of events that look like evil-doing" computed subsequently in the book, would be an insignificant number, instead of $250,000$, as computed in the book.

Hope someone could clarify this.


OK, wait, now that I think of it, I understand where I'm wrong. Given that one person went to a particular hotel on a particular day, another person decides to go on the same day independently, but he can no longer choose the hotel, since he has to go to the same hotel as the first person. So the probability that two people go to the same hotel on any given day is $(1/100)(1/10^5)(1/100)=10^{−9}$, just as in the book. My bad.