Studying for my upcoming statistics exam I tried to solve the following:
In some population, each individual likes exactly one out of 30 possible music genres. In some survey, n people are drawn randomly from the above population and are asked what music genre they like. Denote by Pi the true proportion of people in the population that like genre number i. What is the smallest number n, such that with a probability of 97%, or higher, for all genres, the proportion of the people in the survey who answered they like music genre i is at most 5% of difference from Pi? Use Hoeffdings inequality to prove your claim.
I plugged in the probabilities, and managed to figure that for some given genre music i, one would to draw at least 840 individuals for the following to hold. And figured that since this holds for all genres, that 840 is actually the minimal sample size for this to hold for all genres. I thought I got it right, but then I thought that the number n that i got would hold for any number of genres for the same problem (which seems to me a bit unlikely, for example, that we would need the same sample size for the for the following problem and the same problem but with only 2 possible genres to pick from)
Can anyone elaborate on this?
Let $i \in \{1,\dots,k\}, k \,(=30)$ be the distinct categories, let $I_1, \dots I_N$ be the population with $N < \infty$ unknown and let $p_i = \sum_{I_j : I_j \text{ likes } i} 1 / N, i =1\dots, k$ be the unknown fractions you want to estimate.
For what I understood, the solution to your problem goes like this, let us fix the category $i \in \{1,\dots,k\}$, then you sample $n$ individuals, let $X_1,\dots,X_n$ represent the random variables associated to the category individual $j \in \{1,\dots,n\}$ likes. Thus you can compute an unbiased estimate of $p_i$ by computing $\bar{p}_i = 1/n\sum_{j=1}^n \unicode{x1D7D9}[X_j = i]$ where $\unicode{x1D7D9}[\cdot]$ is the indicator function. Now you need to apply the Hoeffding bound, this can be done since $X_1,\dots,X_n$ are independent r.v. Thus, let $\epsilon \in (0,1)$, $$\mathbb{P}[|\frac{1}{n}\sum_{j=1}^n \unicode{x1D7D9}[X_j = i] - p_i| \ge \epsilon p_i] \le 2 e^{-n \epsilon^2 p_i^2}$$
Now you can define $E_i := |\frac{1}{n}\sum_{j=1}^n \unicode{x1D7D9}[X_j = i] - p_i| \ge \epsilon p_i, i=1,\dots,k$, you can conclude by applying the union bound to $\mathbb{P}(\cup_i E_i)$ and setting the final probability to be less than $\delta$.