How is Bayesian inference better than classical inference on small samples?

I have heard many times that Bayesian inference is better than classical inference for small samples.

Why is it the case? In what kind of problems can I see this difference?


This is necessarily an opinion-based question. There is a vast variety of opinion among statisticians about the value of the Bayesian approach to statistical analysis, ranging from dismissively hostile to mindlessly evangelical. I am mainly an applied statistician, so I am willing to use whatever works, and in some settings I feel the Bayesian approach works better.

More directly to your question, the assertion that Bayesian inference works better than classical frequentist inference for small samples probably arises from the fact that Bayesian inference allows prior experience and expert opinion to be used in formulating a prior distribution. Both the prior distribution and the data are used to get the final result. The prior information may be especially important when there is not much data.

Almost all experiments and surveys arise in a context where there is some prior knowledge. Otherwise, planning the details of the study would be difficult.

  • If items (specimens or subjects) are to be weighed, we usually know whether we will use a laboratory balance, a postal scale, a bathroom scale, or a truck scale to do the weighing.

  • A preliminary power computation to plan how many items to use requires educated guesses (a) of variability and (b) of how large an effect needs to be in order to have practical importance.

  • Also, one may have some idea in advance whether data will be normal, and that is important information.

So even frequentist statisticians make some use of prior experience when planning an experiment. The main distinction is that a Bayesian study will begin with a formal statement of some of that prior knowledge in terms of a prior probability distribution.


Here is a specific (overly simple) example. Suppose we are seeking a preliminary guess as to whether a particular ballot proposition will win in an upcoming election. At this early juncture, suppose we have been able to poll only $n = 20$ randomly chosen people.

Suppose 4 are in favor and 16 are opposed. A traditional 95% frequentist confidence interval based on that information would be something like $(0.05, 0.40)$ (with slight differences, depending on whether a Wald or an Agresti-Coull CI is used).
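To see how much the two frequentist intervals differ for 4 in favor out of 20, here is a minimal sketch using only the Python standard library. The function names are illustrative, and $z = 1.96$ is assumed as the normal quantile for a 95% interval.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def agresti_coull_interval(successes, n, z=1.96):
    """Agresti-Coull interval: add z^2/2 successes and z^2/2 failures,
    then apply the Wald formula to the adjusted counts."""
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width

wald = wald_interval(4, 20)
agresti_coull = agresti_coull_interval(4, 20)
print("Wald:          (%.3f, %.3f)" % wald)
print("Agresti-Coull: (%.3f, %.3f)" % agresti_coull)
```

The Wald interval comes out at roughly $(0.02, 0.38)$ and the Agresti-Coull interval at roughly $(0.07, 0.42)$, which brackets the "something like" figure quoted above.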

However, very few US elections result in such a one-sided outcome. Propositions do not get on the ballot without many signatures on petitions, and thus they must have at least minimal support. Even with limited knowledge of the specific circumstances, one might feel comfortable using the prior distribution $\mathsf{Beta}(3,3)$ for the proportion $\theta$ in favor. This distribution puts about 95% of its probability between $0.15$ and $0.85$.
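The claim about where $\mathsf{Beta}(3,3)$ puts its probability can be checked directly. The sketch below is one way to do it without external libraries: it relies on the identity that, for integer shape parameters $a$ and $b$, the Beta CDF equals a binomial tail probability. The helper name `beta_cdf_int` is hypothetical, and the shortcut does not apply to non-integer parameters.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integer a and b.

    Uses the identity P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

# Prior probability that theta lies between 0.15 and 0.85 under Beta(3, 3)
prior_mass = beta_cdf_int(0.85, 3, 3) - beta_cdf_int(0.15, 3, 3)
print("P(0.15 < theta < 0.85) = %.4f" % prior_mass)
```

The result is a little under 0.95, consistent with "about 95%".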

This prior distribution, together with the likelihood function for the data (4 in favor out of 20), gives the posterior distribution $\mathsf{Beta}(3+4,\, 3+16) = \mathsf{Beta}(7, 19)$, because the beta prior is conjugate to the binomial likelihood. Its 95% Bayesian probability interval is $(0.12, 0.45)$. There is so little information in the "survey" of 20 people that the prior distribution has had a major effect on the resulting interval estimate. I would feel more comfortable with the more optimistic Bayesian interval estimate than with the frequentist one. I have about as much faith in the general knowledge about US elections as in the information from 20 subjects.
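The Bayesian interval can be reproduced with a short sketch. A $\mathsf{Beta}(a, b)$ prior combined with binomial data gives a $\mathsf{Beta}(a + \text{successes},\, b + \text{failures})$ posterior, and the interval is read off from its 2.5% and 97.5% quantiles. The helper names are hypothetical, an equal-tailed interval is assumed (the answer does not say which kind of interval it used), and the CDF shortcut works only because the shape parameters here are integers.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) for positive integer a, b, via a binomial tail sum."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def beta_quantile_int(q, a, b, tol=1e-10):
    """Quantile of Beta(a, b) found by bisection on the (increasing) CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Conjugate update: Beta(3, 3) prior, 4 in favor and 16 opposed
prior_a, prior_b = 3, 3
in_favor, opposed = 4, 16
post_a, post_b = prior_a + in_favor, prior_b + opposed  # Beta(7, 19)

posterior_interval = (
    beta_quantile_int(0.025, post_a, post_b),
    beta_quantile_int(0.975, post_a, post_b),
)
print("Posterior: Beta(%d, %d)" % (post_a, post_b))
print("95%% posterior interval: (%.3f, %.3f)" % posterior_interval)
```

Under these assumptions the interval is approximately $(0.12, 0.45)$. Changing `prior_a` and `prior_b` shows how strongly the result depends on the prior when only 20 observations are available.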

Note: There is an important difference in interpretation between frequentist CIs and Bayesian PIs. The former 'confidence' intervals speak only of the process used to derive them, without directly addressing the survey at hand. The Bayesian 'posterior probability' intervals directly address the current survey, partly because the prior distribution is considered relevant. However, this difference between frequentist and Bayesian inference is quite general and is not restricted to the small sample sizes of your question.