I am a little confused about which probabilistic model to apply to this.
Say I have an intern who is entering data on a daily basis. I have a clerk who at the end of the week pulls out a random sample of the total data entered and verifies its correctness.
Let's say the intern enters 10K entries per week. The clerk pulls a random sample of 1K per week and verifies it. Say the correctness comes to around 98%.
At the end of the month, the clerk's boss performs a quality check on the clerk's work by randomly pulling out a sample of the entries that have been verified by the clerk, and then assigns a score to the clerk. Let's assume that at the end of the month the boss pulls out 400 (out of the 4K entries verified by the clerk).
Say the score comes to 95%.
Then how should I determine the validated score for the intern in the first place?
So the problem is: the intern enters 10K entries weekly. The clerk pulls a random sample of 1K per week and assigns a correctness score. At the end of the month the boss pulls out 400 (out of the 4K entries verified by the clerk). The boss assigns a correctness score to the clerk.
On the basis of the above method, which probabilistic model should be used to determine the probability that an entry made by the intern is correct in the first place?
I can think of two methods, but I am not sure which one is correct.
The first method is to simply multiply: P(intern) * P(clerk)
The second method is to use Bayes' theorem:
P(intern | clerk) = (P(clerk | intern) * P(intern)) / P(clerk)
If Bayes' theorem is the right approach, what would be the value of P(clerk)? I guess P(clerk | intern) = 0.95 and P(intern) = 0.98. How would I determine the value of P(clerk)?
Which method is more correct?
Legend:
P(intern) = Probability that the data entered by the intern is correct
P(clerk) = Probability that the data verified by the clerk is correct
Define events:
\begin{eqnarray*} I &=& \text{"Intern is correct"} \\ C_1 &=& \text{"Clerk marks it correct"} \\ C_2 &=& \text{"Clerk's mark is correct"} \\ \end{eqnarray*}
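Event $I$ can happen in two mutually exclusive ways: either the clerk marks the entry correct and that mark is right, or the clerk marks it incorrect and that mark is wrong. Taking $P(C_1) = 0.98$ and $P(C_2) = 0.95$ from the question: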
\begin{eqnarray*} P(I) &=& P(C_1\cap C_2) + P(C_1^c\cap C_2^c) \\ && \\ &=& P(C_1)P(C_2) + P(C_1^c)P(C_2^c) \qquad\text{(assuming $C_1,C_2$ are independent)} \\ && \\ &=& 0.98\times 0.95 + 0.02\times 0.05 \\ && \\ &=& 0.932 \end{eqnarray*}
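As a quick numerical check of the calculation above, here is a minimal Python sketch. It assumes, as the derivation does, that $C_1$ and $C_2$ are independent, and it takes 0.98 and 0.95 as the two probabilities. The function names `p_intern_correct` and `simulate` are made up for this illustration.

```python
import random


def p_intern_correct(p_marked_correct, p_mark_is_right):
    """P(I) = P(C1)P(C2) + P(C1^c)P(C2^c), assuming C1 and C2 are independent."""
    return (p_marked_correct * p_mark_is_right
            + (1 - p_marked_correct) * (1 - p_mark_is_right))


def simulate(p_marked_correct, p_mark_is_right, trials=200_000, seed=0):
    """Monte Carlo estimate of P(I) under the same independence assumption."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        c1 = rng.random() < p_marked_correct  # clerk marks the entry correct
        c2 = rng.random() < p_mark_is_right   # clerk's mark is itself correct
        # The intern is correct when both events hold or both fail.
        if c1 == c2:
            hits += 1
    return hits / trials


exact = p_intern_correct(0.98, 0.95)
estimate = simulate(0.98, 0.95)
naive_product = 0.98 * 0.95  # the "simply multiply" method from the question

print(f"exact formula : {exact:.4f}")
print(f"simulation    : {estimate:.4f}")
print(f"plain product : {naive_product:.4f}")
```

The plain product is the first term only; the formula above adds the case where the clerk flags an entry as wrong but the flag itself is mistaken, which is why the two numbers differ by $0.02 \times 0.05 = 0.001$.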