The situation I have is N separate agents each trying to predict a true/false event correctly. Say I have a set of true/false questions based on the same underlying concept. At the start of a round, I ask each agent a question individually and determine whether it answered correctly; say p% of them are correct. Then I ask them all another question and find that X of them say "true" and (N-X) of them say "false". Is it possible, from this information alone, or across multiple trials, to determine the probability that "true" (or "false") is the correct answer?
I say "multiple trials" because I feel like the probability of X of them being correct could be approximated by a normal distribution across trials, but I'm not entirely sure how.
Ideally, I could determine the probability that an answer is correct from each independent trial.
Thanks for any help!
The problem is underdetermined – you need some kind of model of the agents.
In one extreme case, you could assume that the agents all independently have some unknown probability $q$ of providing the correct answer to a question, and you could have some uninformative prior for this probability (e.g. a uniform prior). Then you could get the posterior for this probability from the data for the first question and use that to calculate the probability of the data for the second question.
In the other extreme, you could assume that each agent is either always correct or always wrong, with an unknown proportion $\lambda$ always being correct. Then you could get the exact value of $\lambda$ from the data for the first question, and (unless it’s $\frac12$) this would tell you with probability $1$ which answer to the second question is correct.
In reality, the situation will likely be somewhere in between: The agents will have different reliabilities, and you need to somehow model their mix of reliabilities to get an answer.
In order to nevertheless give you some answer, let’s assume the first model, and to avoid having to choose a prior let’s say you ask so many test questions that the data approximately fully determine the probability $q$ independent of the prior. So now you know $q$ and you want to find the probability that the correct answer is “true” given that $X$ agents answered “true” and $N-X$ answered “false”.
The probability for this data in case the correct answer is “true” is $ \binom NXq^X(1-q)^{N-X}$, and the probability in case the correct answer is “false” is $\binom NX(1-q)^Xq^{N-X}$. Thus, given the data, the probability that the correct answer is “true” is
$$ \frac{\binom NXq^X(1-q)^{N-X}}{\binom NXq^X(1-q)^{N-X}+\binom NX(1-q)^Xq^{N-X}}=\frac{q^{2X}(1-q)^N}{q^{2X}(1-q)^N+(1-q)^{2X}q^N}\;. $$
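As a quick numerical sketch of this formula (the function name and example values are mine, not part of the derivation), here it is in Python for a known $q$:

```python
from math import comb

def p_true_given_votes(q, N, X):
    """Posterior probability that the correct answer is "true", given that
    X of N agents (each independently correct with known probability q)
    answered "true". Assumes a 50/50 prior over the two answers."""
    like_true = comb(N, X) * q**X * (1 - q)**(N - X)   # data likelihood if "true"
    like_false = comb(N, X) * (1 - q)**X * q**(N - X)  # data likelihood if "false"
    return like_true / (like_true + like_false)
```

Note that with $q=\frac12$ the votes carry no information, so the result is $\frac12$ regardless of $X$, and the probabilities for $X$ and $N-X$ "true" votes sum to $1$ by symmetry.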
To show how it would work with a prior, let’s assume a uniform prior for $q$ over $[0,1]$, and that you ask only one test question. Say $k$ agents correctly answered the test question (so $p\%=\frac kN$). The probability for this, given $q$, is $\binom Nkq^k(1-q)^{N-k}$, so the posterior probability density for $q$ is
$$ \frac{\binom Nkq^k(1-q)^{N-k}}{\int_0^1\binom Nkx^k(1-x)^{N-k}\mathrm dx}=(N+1)\binom Nkq^k(1-q)^{N-k}\;. $$
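This is a $\mathrm{Beta}(k+1,\,N-k+1)$ density. As a sanity check (with arbitrary example values $N=8$, $k=5$), the normalization constant $(N+1)\binom Nk$ does make it integrate to $1$:

```python
from math import comb

def posterior_pdf(q, N, k):
    """Posterior density of q under a uniform prior on [0,1], after
    observing k correct answers out of N: a Beta(k+1, N-k+1) density."""
    return (N + 1) * comb(N, k) * q**k * (1 - q)**(N - k)

# Midpoint-rule check that the density integrates to 1
# (N=8, k=5 are arbitrary example values).
M = 100_000
total = sum(posterior_pdf((i + 0.5) / M, 8, 5) for i in range(M)) / M
```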
Then the probability for $X$ agents to answer “true” and $N-X$ to answer “false” is
$$ \int_0^1(N+1)\binom Nkq^k(1-q)^{N-k}\binom NXq^X(1-q)^{N-X}\mathrm dq=\frac{(N+1)\binom Nk\binom NX}{(2N+1)\binom{2N}{X+k}} $$
if the correct answer is “true” and
$$ \int_0^1(N+1)\binom Nkq^k(1-q)^{N-k}\binom NX(1-q)^Xq^{N-X}\mathrm dq=\frac{(N+1)\binom Nk\binom NX}{(2N+1)\binom{2N}{N+X-k}} $$
if the correct answer is “false”, so the probability that the correct answer is “true” is
$$ \frac{\frac1{\binom{2N}{X+k}}}{\frac1{\binom{2N}{X+k}}+\frac1{\binom{2N}{N+X-k}}}=\frac{(X+k)!(2N-X-k)!}{(X+k)!(2N-X-k)!+(N+X-k)!(N+k-X)!}\;. $$
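A sketch of this final formula in Python (function name and test values are mine). Note the sanity checks: with $k=\frac N2$ the test question was uninformative and the result is $\frac12$, and the probabilities for $X$ and $N-X$ "true" votes sum to $1$.

```python
from math import factorial

def p_true(N, k, X):
    """Probability the correct answer is "true", given that k of N agents
    answered the test question correctly and X of N answered "true",
    under a uniform prior on the agents' common accuracy q."""
    a = factorial(X + k) * factorial(2 * N - X - k)      # "true" term
    b = factorial(N + X - k) * factorial(N + k - X)      # "false" term
    return a / (a + b)
```

For example, if all $N=10$ agents passed the test question and all vote "true", the formula gives a probability very close to $1$, as it should.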