Probability of two ways of picking an item from a set in a set of sets.


There are so many questions on probability that there's a low probability of me finding a duplicate in the list, if there even is one.

And I can't figure out a good title, so feel free to edit!

Situation: There is a set of sets, and their sizes are different.  For example, a library is a set of books, and each book is a set of pages, but the number of pages varies. Here are two ways to pick a page at random:

  • Construct the set of all pages, i.e., of all possible (book, page) tuples, and then pick one at random from it.
  • Randomly pick a book, then randomly pick one of its pages.

Does every page in the library have the same probability of being chosen in both scenarios? Or does the second give pages in shorter books a greater probability? (Intuitively, the latter seems likely to me, but intuition is often wrong.)


There are 2 best solutions below

1
On BEST ANSWER

Think of a simple example with one two-page book and one one-page book. Call the pages of the two-page book $a$ and $b$, and the page of the one-page book $c$. If you randomly pick a book and then a page, your probability of ending up with $c$ is $1/2$, whereas if you do it the first way the probability of getting page $c$ is $1/3$.

Thus, the first method gives the same chance to every page, while the second gives a greater chance to pages in smaller books.
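The numbers above can be checked exactly with a short Python sketch. The dictionary `library` and the function names `uniform_over_pages` and `book_then_page` are made up for illustration; the only assumption is that each random pick is uniform.

```python
from fractions import Fraction

# Hypothetical toy library from the example: one two-page book, one one-page book.
library = {"book1": ["a", "b"], "book2": ["c"]}

def uniform_over_pages(library):
    """Method 1: pool every page and pick one uniformly."""
    total = sum(len(pages) for pages in library.values())
    return {page: Fraction(1, total)
            for pages in library.values() for page in pages}

def book_then_page(library):
    """Method 2: pick a book uniformly, then one of its pages uniformly."""
    n = len(library)
    return {page: Fraction(1, n) * Fraction(1, len(pages))
            for pages in library.values() for page in pages}

method1 = uniform_over_pages(library)
method2 = book_then_page(library)

for page in sorted(method1):
    print(page, method1[page], method2[page])
# a 1/3 1/4
# b 1/3 1/4
# c 1/3 1/2
```

Both distributions sum to 1, but only the first is uniform over pages.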

2
On

Say there are $n$ books with page counts $k_1, k_2, \ldots, k_n$. Then the probability that a particular page is picked when all pages are laid out together is $$\frac1{\sum_j k_j}$$ And if you pick the book first and then the page, the probability for a page in book $i$ is $$\frac1n\cdot\frac1{k_i}$$ So neither method gives the larger probability across the board. Consider the example of 2 books with $k_1=1$ and $k_2=2$: the page in the first book has a greater chance of being picked under the second method ($1/2$ versus $1/3$), while each page in book 2 has a greater chance under the first method ($1/3$ versus $1/4$). So which method favors a given page depends on the number of pages in its book and the total number of pages in all books.

So the second method favors a page exactly when its book has fewer pages than average: $\frac1{n k_i} > \frac1{\sum_j k_j}$ holds if and only if $k_i < \frac1n\sum_j k_j$.
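As a quick sanity check of the below-average criterion, here is a small sketch that compares the two per-page probabilities for an arbitrary list of page counts. The helper `per_page_probabilities` and the sample counts `[1, 3, 5]` are hypothetical, chosen so that one book sits exactly at the average.

```python
from fractions import Fraction

def per_page_probabilities(page_counts):
    """For each book, return (k, P under method 1, P under method 2) for one of its pages."""
    n = len(page_counts)
    total = sum(page_counts)
    p_first = Fraction(1, total)  # uniform over all pages
    return [(k, p_first, Fraction(1, n * k)) for k in page_counts]

page_counts = [1, 3, 5]  # hypothetical library; average is 3 pages per book
mean = Fraction(sum(page_counts), len(page_counts))

results = per_page_probabilities(page_counts)
for k, p1, p2 in results:
    # The second method wins exactly when the book is shorter than average.
    assert (p2 > p1) == (k < mean)
    print(f"k={k}: method 1 gives {p1}, method 2 gives {p2}")
# k=1: method 1 gives 1/9, method 2 gives 1/3
# k=3: method 1 gives 1/9, method 2 gives 1/9
# k=5: method 1 gives 1/9, method 2 gives 1/15
```

A book with exactly the average number of pages gets the same per-page probability under both methods.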