Determine the number of samples needed with Bayes' theorem


I have a simple sampling statistics problem, but I am not sure I am solving it properly.

I have to enter many students' marks for a test into an Excel table. After entering all $N$ marks, I want to know the sample size $S$ I should verify to have at least a 95% chance that all my marks are correctly entered in the table.

I tried solving this with Bayes' theorem (which I often struggle to apply, by the way). Here is my attempt:

  • event $E$: the $S$ verified marks have been correctly entered in the table
  • hypothesis $H$: all the marks are correctly entered
  • wanted: $\mathcal{P}(H|E) = 95\%$
  • theorem: $\mathcal{P}(H|E) = \dfrac{\mathcal{P}(E|H)\mathcal{P}(H)}{\mathcal{P}(E)}$

I would then say that $\mathcal{P}(E|H) = 1$, $\mathcal{P}(H) = 75\%$ (let's say I am messy today and may miss 1 mark out of 4), and $\mathcal{P}(E) = \frac{S}{N}$ (I am not sure about this one), so that:

$$S = \frac{75}{95}N$$

So for $100$ marks I should check $79$ marks. What makes me think I am very wrong here is that if I were content with being less sure that I entered the marks correctly (let's say $90\%$), I would have to check a larger sample $S$ ($\frac{75}{90} \times 100 \approx 83.3$, so $84$ marks). Could you help me here?
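The arithmetic of this attempt can be reproduced with a short Python sketch. The function name `question_sample_size` is a hypothetical helper; it encodes the question's own assumption $\mathcal{P}(E) = \frac{S}{N}$ and rounds $S$ up to a whole number of marks. It makes the counterintuitive behaviour concrete: lowering the target confidence raises the required sample.

```python
import math


def question_sample_size(n_marks: int, prior_all_correct: float, target: float) -> int:
    """Sample size under the question's model, which assumes P(E) = S / N.

    Setting P(H|E) = P(E|H) * P(H) / P(E) = 1 * prior / (S / N) equal to the
    target and solving for S gives S = prior / target * N (rounded up here).
    """
    return math.ceil(prior_all_correct / target * n_marks)


print(question_sample_size(100, 0.75, 0.95))  # 79
print(question_sample_size(100, 0.75, 0.90))  # 84: lower target, larger sample
```

A sample that grows as the required certainty shrinks signals that one of the modelling assumptions, here the guess for $\mathcal{P}(E)$, is off.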

Disclaimer: I am not a statistician, so I apologize if I have misused some concepts. I am just curious and want to understand :)




You have the right idea, but you are not writing the probabilities for the events of interest correctly.

Let $p$ be the probability of making an error while entering one student's marks. Assuming that an error in one student's marks is independent of errors in other students' marks,

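$$
\begin{aligned}
\mathcal{P}(H) &= (1 - p)^N \\
\mathcal{P}(E) &= (1 - p)^S \\
\mathcal{P}(H|E) &= \frac{\mathcal{P}(E|H)\,\mathcal{P}(H)}{\mathcal{P}(E)} = \frac{(1 - p)^N}{(1 - p)^S} = (1 - p)^{N - S}
\end{aligned}
$$

where $\mathcal{P}(E|H) = 1$ because, if all marks are correct, the checked ones certainly are.

For this to be at least 0.95 we need $(N - S)\log(1 - p) \geq \log 0.95$. Since $\log(1 - p) < 0$, dividing by it flips the inequality, which gives

$$ S \geq N - \frac{\log 0.95}{\log(1 - p)}. $$

If you are unsure about the above expressions, read about the Bernoulli and binomial distributions.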
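A minimal Python sketch of this result, under the same assumption that each mark is wrong independently with probability $p$. The function names are hypothetical, and the Monte Carlo check is an illustrative addition rather than part of the answer: it verifies the first $S$ marks of each simulated table, which is harmless because, under independence, it does not matter which $S$ marks are checked.

```python
import math
import random


def required_sample_size(n_marks: int, p_error: float, confidence: float = 0.95) -> int:
    """Smallest S with P(H|E) = (1 - p)^(N - S) >= confidence."""
    if p_error <= 0.0:
        return 0  # no mark can be wrong, so nothing needs checking
    if p_error >= 1.0:
        raise ValueError("p_error must be below 1")
    # N - S may be at most log(confidence) / log(1 - p); both logs are negative.
    max_unchecked = math.log(confidence) / math.log(1.0 - p_error)
    return min(max(math.ceil(n_marks - max_unchecked), 0), n_marks)


def posterior_all_correct(n_marks: int, s_checked: int, p_error: float) -> float:
    """P(H|E) = (1 - p)^(N - S): only the unchecked marks can still be wrong."""
    return (1.0 - p_error) ** (n_marks - s_checked)


def simulate_posterior(n_marks: int, s_checked: int, p_error: float,
                       trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(all N marks correct | the S checked marks are correct)."""
    rng = random.Random(seed)
    accepted = all_correct = 0
    for _ in range(trials):
        # True means this mark was entered wrongly.
        errors = [rng.random() < p_error for _ in range(n_marks)]
        if any(errors[:s_checked]):
            continue  # evidence E not observed: discard this table
        accepted += 1
        if not any(errors[s_checked:]):
            all_correct += 1  # hypothesis H holds as well
    return all_correct / accepted if accepted else float("nan")


s = required_sample_size(100, 0.01)
print(s)                                      # 95
print(posterior_all_correct(100, s, 0.01))    # about 0.951
print(simulate_posterior(100, s, 0.01))       # close to the exact value
print(required_sample_size(100, 0.01, 0.90))  # 90
```

With $N = 100$ and $p = 0.01$ this gives $S = 95$, and lowering the target to 90% gives $S = 90$: the required sample now shrinks as the required confidence drops. With $p = 0.25$ (one error in four marks) it gives $S = 100$, i.e. every mark must be checked.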