Expected number of samples to estimate the mode of categorical distribution

132 Views Asked by At

Suppose I have a categorical distribution with pmf $(p_1, \dots, p_n)$. What is the expected number of iid samples $\mathcal{S} = \{x_1, \dots, x_m\}$ I have to samples for the empirical mode to be equal to the actual mode $\mathrm{mode}(\mathcal{S}) = \arg\max_i p_i$.

That seems like a very standard question but I can't find anything about it online. Clearly, if the mode has probability 1 then the expected number of samples is 1 and when the mode goes to $0$ then the number of samples has to goes $\infty$. Intuitively the number of samples is related $\frac{1}{\max_i p_i}$, although the actual number probably depends on the pmf and not only on the probability of the mode.

Has this expected number of samples been studied ? Is the intuition of the number of samples being related to $\frac{1}{\max_i p_i}$ true and if so what is the formal relation ?

1

There are 1 best solutions below

1
On BEST ANSWER

Dutta and Goswami (2010) discussed just this problem, and proved several results. From one of these results (Theorem 4), it follows that if the distribution is unimodal with $n$ being the number of distinct values, then a sample size of— $$m = \left\lceil \frac{(4\epsilon+3)(\ln(\frac{n}{\delta})+\ln(2))}{6\epsilon^2} \right\rceil, $$suffices to correctly estimate the mode with probability $1-\delta$, where $\epsilon > 0$ is less than one half of the smallest possible difference between the mode's probability and the next most frequent item's probability.

References

  • Dutta, Santanu, and Alok Goswami. "Mode estimation for discrete distributions." Mathematical Methods of Statistics 19, no. 4 (2010): 374-384