My boss recently asked me to run some statistical analysis on our office's college football pool. He asked some specific questions and also left it open-ended, so I can provide additional analysis if I find anything interesting.
The assumptions are as follows:
- The pool consists of 60 participants
- Each week, each participant picks the winners of 10 pre-set football games
- The pool is 15 weeks long
- Each person has a 50% chance of correctly picking each game
And the question I am struggling with is:
- With 60 participants, what is the probability that the best participant hits a season average of at least 55%? 57.5%? 60%? 62.5%?
All of the other questions are pretty straightforward normal distribution questions that I was able to answer. Can anyone help me solve this one, please? My issue is that I can't think of how to attack the problem conceptually. It seems that the most logical way to solve it would be to run simulations, but it has been a while since the college stats classes in which I was taught this. Any guidance or help would be much appreciated. I have been using Excel for this analysis.
The rest of the questions he asked, I left below:
- For an individual in a given week, what is the probability of getting 10 right, 9, 8, etc…
- Over the season, what can an individual expect out of 15 weeks- __ weeks of 6 right, __ weeks of 5 right, __ weeks of 4 right, etc.
- For the season, what is the chance an individual might get 55% right? 57.5%? 60%? 62.5%?
- For all participants for all weeks, how many 10-win weeks can we expect? 9-win weeks? 8? For example, could we expect that 2 people hit 10 wins sometime over the course of the season?
- What is the standard deviation of the group’s overall performance and some statistics on where the group might come out for say 1-standard deviation, 2 sd?
I think this will work. Someone please let me know if I've had a brain-o. (ETA: It does—kind of. But see the comments.)
With each player picking $150$ games, the distribution of the number of correctly guessed games is approximately normal with a mean of $150/2 = 75$, and a variance of $150/4 = 75/2$, and therefore a standard deviation of $\sqrt{75/2} \approx 6.12$.
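As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes, as the question does, that each of the $150$ picks is an independent coin flip with success probability $0.5$; the variable names are only illustrative.

```python
import math

n_games = 10 * 15   # 10 games per week for 15 weeks
p = 0.5             # assumed chance of picking any single game correctly

# Mean, variance and standard deviation of a Binomial(n_games, p) count.
mean = n_games * p
variance = n_games * p * (1 - p)
sd = math.sqrt(variance)

print(mean, variance, round(sd, 2))  # 75.0 37.5 6.12
```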
What you're looking for is the distribution of the $k$th order statistic of $n$ samples, with $k = n = 60$ (that is, the maximum). We can get it by finding this order statistic for a uniform distribution (effectively, each player's percentile) and mapping it onto the normal distribution. The maximum of $60$ independent uniform variables is at most $m$ exactly when all $60$ are, so $P(M \leq m) = m^{60}$. In other words, this random variable $M$ (for Maximum) has a Beta distribution:
$$ M \sim \text{Beta}(60, 1) $$
which has a PDF of
$$ f_M(m) = 60 m^{59} \qquad 0 \leq m \leq 1 $$
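So, let's suppose you want to know the probability that the winner gets at least $60$ percent. This is $90$ of the $150$ games, or $15/6.12 \approx 2.45$ standard deviations above the mean, which puts it at the $99.2847$th percentile of the normal distribution. The winner reaches $90$ games exactly when $M$ exceeds $0.992847$, so we integrate the PDF of $M$ from there to $1$: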
\begin{align} P(\text{winner picks at least $60$ percent}) & = \int_{m=0.992847}^1 60m^{59} \, dm \\ & = \left. m^{60} \right]_{m=0.992847}^1 \\ & \approx 0.35 \end{align}
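The same calculation can be repeated for the other thresholds in the question. The sketch below follows the normal-approximation route used here, with no continuity correction, and uses the closed form $1 - q^{60}$ for the integral, where $q$ is the percentile. The function name `p_best_at_least` is hypothetical, and the $55$, $57.5$ and $62.5$ percent thresholds are taken as fractional game counts ($82.5$, $86.25$, $93.75$), which is an assumption on my part.

```python
from statistics import NormalDist

N_PLAYERS = 60
N_GAMES = 150
MEAN = N_GAMES / 2
SD = (N_GAMES / 4) ** 0.5


def p_best_at_least(pct):
    """Normal-approximation chance that the best of 60 players reaches pct percent."""
    games = pct * N_GAMES / 100           # threshold as a (possibly fractional) game count
    z = (games - MEAN) / SD               # standard deviations above the mean
    percentile = NormalDist().cdf(z)      # one player's chance of falling below the threshold
    return 1 - percentile ** N_PLAYERS    # equals the integral of 60 m^59 from percentile to 1


normal_results = {pct: p_best_at_least(pct) for pct in (55, 57.5, 60, 62.5)}
for pct, prob in normal_results.items():
    print(f"{pct}%: {prob:.3f}")
```

For the $60$ percent case this reproduces the figure of about $0.35$.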
I don't dare trust that figure very closely until I try with some more significant digits. (ETA: I think it's still pretty good—or it would be, if the normal approximation were good enough.) But you get the idea.
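To see how much the normal approximation matters, one option is to replace it with the exact binomial tail and to cross-check with a small simulation. This is only a sketch under the question's assumptions (independent coin-flip picks, $60$ independent players). The function names are hypothetical, and a threshold such as $55$ percent is read as "at least $83$ of $150$ games", since $82.5$ games is not attainable.

```python
import math
import random

N_PLAYERS, N_GAMES = 60, 150


def exact_p_best_at_least(min_games):
    """Exact chance that at least one of 60 players gets min_games or more right."""
    # One player's tail probability under Binomial(150, 0.5).
    tail = sum(math.comb(N_GAMES, k) for k in range(min_games, N_GAMES + 1)) / 2 ** N_GAMES
    # The best player falls short only if every player falls short.
    return 1 - (1 - tail) ** N_PLAYERS


def simulated_p_best_at_least(min_games, trials=5000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each player's season is 150 random bits; the number of 1 bits is the win count.
        best = max(bin(rng.getrandbits(N_GAMES)).count("1") for _ in range(N_PLAYERS))
        if best >= min_games:
            hits += 1
    return hits / trials


exact_results = {}
for pct in (55, 57.5, 60, 62.5):
    min_games = math.ceil(pct * N_GAMES / 100)   # smallest whole game count reaching pct
    exact_results[pct] = exact_p_best_at_least(min_games)
    print(f"{pct}% (>= {min_games} games): {exact_results[pct]:.3f}")

sim_60 = simulated_p_best_at_least(90)
print(f"simulated, >= 90 games: {sim_60:.3f}")
```

The exact and simulated figures should agree to within sampling noise. Any gap between them and the normal-approximation figure reflects the approximation itself, mainly the missing continuity correction.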