We have a movie review site. Every new movie has an 80% chance of being good and a 20% chance of being bad.
There is an arbitrarily large number of reviewers. They all have good taste, but also individual variation: there is an 80% chance that a reviewer rates a movie correctly and a 20% chance that they rate it incorrectly (seeing a good movie as bad, or a bad movie as good).
The goal of all reviewers is to conclusively sort all movies into good or bad. They can do it in two ways.
First method: every reviewer leaves a review according to his personal feelings about the movie. Even though there will always be wrong reviews, with an arbitrarily large number of reviewers the split between good and bad reviews will approach 80/20, reflecting the quality of the movie. The probability of misjudging a movie by majority vote is arbitrarily close to zero.
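As a sanity check, here is a small Monte Carlo sketch of the first method (the names `simulate_majority`, `P_GOOD`, and `P_CORRECT` are mine, not from the problem statement): honest independent reviews, decided by simple majority.

```python
import random

random.seed(0)

P_GOOD = 0.8     # prior probability that a new movie is good
P_CORRECT = 0.8  # probability that a reviewer's impression is correct

def simulate_majority(n_reviewers, trials=10_000):
    """Fraction of movies judged correctly by a majority of honest reviews."""
    correct = 0
    for _ in range(trials):
        good = random.random() < P_GOOD
        # a review is positive w.p. 0.8 for a good movie, 0.2 for a bad one
        p_positive = P_CORRECT if good else 1 - P_CORRECT
        positives = sum(random.random() < p_positive for _ in range(n_reviewers))
        verdict_good = 2 * positives > n_reviewers
        correct += verdict_good == good
    return correct / trials

print(simulate_majority(1))    # a single review: right about 80% of the time
print(simulate_majority(101))  # 101 honest reviews: essentially always right
```

The accuracy climbs toward 100% as reviewers are added, exactly as the 80/20 split argument says.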
Second method: every reviewer leaves a review according to his own best subjective probability estimate of whether the movie is good or bad. Here the problems begin.
Suppose a good movie is released, but the first three reviews are negative. This has a 0.2 × 0.2 × 0.2 = 0.8% probability.
The next reviewer correctly sees the movie as good. But given the prior and the three bad reviews, Bayes' theorem says the movie is more likely to be bad than good, even after weighing in his own positive impression. The fourth reviewer therefore leaves a negative review. From then on, it doesn't matter what the next reviewers see: the correct review to leave is completely determined by the previous reviews, regardless of whether the movie is good or bad, so the Bayesian evidence of all further reviews is zero. The movie remains misjudged forever.
So.
The first method gives correct results with ~100% probability.
The second method gives incorrect results in at least 0.8% of cases (and probably more, since there are more ways for this to go wrong).
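To put a number on "probably more", here is one way to formalize the cascade as a random walk. The modeling choices are mine: public evidence is tracked in units of log(0.8/0.2), where the prior contributes +1 unit and each informative vote contributes ±1 unit; a reviewer whose posterior is exactly 1/2 is assumed to vote their own impression; and a vote stops carrying information once the public evidence alone outweighs any single private signal.

```python
import random

random.seed(1)

def cascade_verdict(movie_good, p_correct=0.8):
    """Sequential voting until a cascade locks in; True means verdict Good.

    `net` counts the units contributed by informative votes; the 80% prior
    adds one unit, hence the `1 + net` below.
    """
    net = 0
    while True:
        if 1 + net >= 2:
            return True   # every later reviewer votes Good regardless of signal
        if 1 + net <= -2:
            return False  # every later reviewer votes Bad regardless of signal
        # otherwise the vote tracks the reviewer's own signal
        p_good_signal = p_correct if movie_good else 1 - p_correct
        net += 1 if random.random() < p_good_signal else -1

trials = 100_000
wrong = sum(not cascade_verdict(movie_good=True) for _ in range(trials)) / trials
print(wrong)  # around 0.012: above the 0.8% lower bound, as expected
```

Under this formalization the exact answer is a gambler's-ruin computation: conditional on a good movie, the walk hits the Bad barrier before the Good one with probability $1-\frac{1-(1/4)^3}{1-(1/4)^4}=\frac{3}{255}\approx 1.18\%$.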
Why does using all available Bayesian evidence leave you with less accuracy than leaving most of the evidence on the table and going only with the first piece of evidence you see?
To summarize the discussion in the comments:
There is no paradox here: the second method simply discards information, so of course it is less reliable.
To see this, consider the fourth reviewer, having seen three Bad reviews. If the fourth reviewer likes the film they would do the obvious Bayesian computation and determine that the probability that the film was actually good is $$\frac {.8\times.2^3\times .8}{.2\times .8^3\times .2+.8\times.2^3\times .8}\approx .2$$ and therefore switch their vote to Bad. And, of course, if their honest vote were Bad then seeing three Bad votes in a row doesn't inspire them to switch to Good. Thus it makes no difference what the fourth reviewer actually thinks, they will vote Bad in all cases.
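The arithmetic can be checked directly (plain Python, no assumptions beyond the numbers already in the problem):

```python
# P(good | 3 Bad reviews, then own Good impression), with prior P(good) = 0.8
# and per-reviewer accuracy 0.8.
p_good, acc = 0.8, 0.8
num = p_good * (1 - acc) ** 3 * acc              # good movie: 3 wrong signals, 1 right
den = num + (1 - p_good) * acc ** 3 * (1 - acc)  # + bad movie: 3 right signals, 1 wrong
print(num / den)  # 0.2, up to floating-point rounding
```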
And, of course, it only gets more extreme after that.
Thus, this scheme rejects all the information present in the later reviewers' personal judgments. Indeed, the later reviewers are just doing a simple bit of algebra that we could have done without their help.
As a variant: if each reviewer considered their own vote and reported the resulting Bayesian posterior (using the last reported posterior as their prior), then the method would be fine. Indeed, we could just reconstruct the sequence of votes this way.
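A sketch of that variant (the names are mine): each reviewer takes the last reported posterior as their prior, folds in their own impression, and reports the new posterior. The private signals — and hence the honest votes — can be read back off, since each report rises exactly when that reviewer's signal was Good.

```python
ACC = 0.8  # per-reviewer accuracy, as in the setup

def update(prior_good, signal_good):
    """One Bayesian update of P(good) on a single honest impression."""
    like_good = ACC if signal_good else 1 - ACC
    like_bad = 1 - ACC if signal_good else ACC
    return prior_good * like_good / (
        prior_good * like_good + (1 - prior_good) * like_bad
    )

# The unlucky start from the question: three Bad impressions, then Good ones.
signals = [False, False, False, True, True, True, True]

posterior, reported = 0.8, []  # 0.8 is the prior that a new movie is good
for s in signals:
    posterior = update(posterior, s)
    reported.append(posterior)

# Each private signal is recoverable: the reported posterior rose iff Good.
recovered = [b > a for a, b in zip([0.8] + reported, reported)]
assert recovered == signals

print(reported)  # dips to ~0.06 after three Bads, then climbs back toward 1
```

This is the same Bayes computation as before, run in public; the difference from the broken scheme is that reports are probabilities rather than thresholded votes, so no information is destroyed and an unlucky start gets overturned.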
As a general matter: any scheme which aims to get at the truth in an uncertain situation can sometimes lead to the wrong conclusion. The only antidote to that is to gather more evidence. If your scheme is a good one (and Bayes is quite good) then, eventually, the preponderance of evidence will allow you to reverse a false conclusion imposed by an unlucky streak near the start.