Lets suppose an election with $n$ people voting for $k$ candidates. An exit poll asks people what they voted for. Lets also suppose that they always say the truth and that we just care to find the candidate with the max votes.
Under regular assumptions about sampling, could we parameterize the probability of finding the winner after asking only a portion of the voters? Intuitively, asymptotically the probability goes to and reaches 1 when we question the whole set of voters.
The motivation behind this question is the following: We have a large set of data and an algorithm running on every single one of them. This takes a bit too long though. Given that this method outputs an integer, by taking $k$ samples maybe it is enough to make a guess for the maximum at least. (the most frequent number that pops up).
From related question, this was proposed to me: Using s to estimate $\sigma$ when finding the sample size in confidence interval questions, but I think it is not exactly equivalent. Could you help me model this?
I have the following proposition. Could you comment if you see any flow in the logic?
Let's suppose that the candidates were voted by a density of $p_1,p_2,p_3$ etc people $\in [0,1]$. Suppose we interview only 1 person. Then the probability that we get the winner is $\frac{p_1*n}{n}$. If we interview 3 people then we need 2 that have voted for the winner so: $\frac{p_1*n}{n} \frac{p_1*n-1}{n-1}$ $3\choose{2}$.
I think generalising this (let the following fraction $v_1$) we have $\frac{\frac{(p_1*n)!}{(p_1*n -k/2)!}}{\frac{n!}{(n-k/2)!}} $ $k\choose{k/2+1}$ and that is only for the exactly (k/2)+1 majority winner case. We need also for k/2+2 k/2+3... k winner observations.
The final formula is $(\sum_1^{k/2} v_i)$ * $n\choose k$
If we suppose that $p_i$ is known then we need to solve that formula to be greater than .99 if we want to be 99% sure.
Is the reasoning correct? Can you propose how the computation could be simplified? Or a lower bound for the sum maybe?