Possibility of estimating unknown number of items based on observations of repetitions?

Each morning, a man chooses a shirt to wear from his wardrobe at random (each shirt equally likely, independently of previous days). It is unknown how many shirts are in the wardrobe. We observe the man’s shirt choice on 7 different days; in that time, exactly one shirt was worn twice and every other observed shirt was worn once. What is the most likely number of shirts based on these observations?

The way I look at it, we are looking for the probability of picking some shirt twice. The complement, the probability of never picking any shirt twice, appears easier to solve for. With k shirts, the probability that the shirt picked on day 1 has not been worn before is k/k. Day 2: (k-1)/k. Day 7: (k-6)/k.

Total probability of not picking a shirt twice is the product: P = [(k)(k-1)(k-2)(k-3)(k-4)(k-5)(k-6)]/k^7

Complement (probability of at least one repeat): 1 - P

We know k≥6 because six distinct shirts were observed. With exactly six shirts, a repetition over seven days has probability one, but a second repeat also seems intuitively fairly likely, and my formula does not seem able to account for that possibility. Is there some statistical sweet spot where the likelihood of exactly one repeat is maximised?

If k=7, 1 - P = 1 - [(7!)/7^7] = 116,929 / 117,649, or ~99% likelihood of at least one repetition.
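The k=7 arithmetic can be checked exactly with a few lines of Python. This is only a sketch of the product formula above; the function name `prob_no_repeat` is my own, not something from the question.

```python
from fractions import Fraction


def prob_no_repeat(k: int, days: int = 7) -> Fraction:
    """Exact probability that `days` uniform picks from k shirts are all distinct."""
    p = Fraction(1)
    for i in range(days):
        # On day i+1 there are k - i shirts that have not been worn yet.
        p *= Fraction(k - i, k)
    return p


p = prob_no_repeat(7)
print(p)      # 720/117649
print(1 - p)  # 116929/117649
```

Note that 1 - P covers any number of repeats, not only the case where a single shirt is worn exactly twice.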

An expansion of the question: The number of observations increases to 20 days, in which he is observed to wear three shirts twice. What is the most likely number of shirts in the wardrobe?

There are 1 best solutions below

Suppose that there are $N$ shirts. Then the probability that exactly one shirt is worn twice (and every other observed shirt once) over the course of $7$ days, given the value of $N$, is $$ P(\text{one shirt worn twice}\mid N)=\frac{N\cdot\binom{7}{2}\cdot (N-1)(N-2)(N-3)(N-4)(N-5)}{N^7}. $$ Why? There are $N$ ways to choose which shirt gets repeated, and $\binom{7}{2}$ ways to choose the two days on which it is worn. Then, for each of the $5$ remaining days in turn, we choose a shirt that has not been worn yet, which gives $(N-1)(N-2)(N-3)(N-4)(N-5)$ ways. This count is then divided by the $N^7$ total ways to choose one of $N$ shirts on each of $7$ days.

Viewing this as a function of $N$, you get the likelihood function. So... go maximize it!
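One way to do the maximization is a brute-force scan over integer $N \ge 6$. The sketch below evaluates the formula above with exact fractions; the names `likelihood` and `most_likely` and the search cap of 200 are my own assumptions, chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def likelihood(n_shirts: int) -> Fraction:
    """P(exactly one shirt worn twice, five others once | n_shirts) over 7 days."""
    falling = 1
    for i in range(6):
        # N(N-1)...(N-5): the repeated shirt, then 5 further distinct shirts.
        falling *= n_shirts - i
    return Fraction(comb(7, 2) * falling, n_shirts ** 7)


def most_likely(max_shirts: int = 200) -> int:
    """Integer N in [6, max_shirts] maximizing the likelihood (cap is an assumption)."""
    return max(range(6, max_shirts + 1), key=likelihood)


best = most_likely()
print(best, float(likelihood(best)))  # best is 19
```

The factor $\binom{7}{2}$ does not depend on $N$, so it has no effect on where the maximum lies. The same scan should carry over to the 20-day variant by using $N(N-1)\cdots(N-16)/N^{20}$, since three shirts worn twice over 20 days means 17 distinct shirts.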