Probability distribution of data and likelihood in Bayesian and frequentist statistics

I have recently been studying Bayesian as well as frequentist statistics (mostly null hypothesis significance testing) and am confused about the meaning of the distribution of the likelihood and of the observed data in each. As I understand it, the Bayesian school of thought treats the distribution of the sampled data as the same as the likelihood of the data conditioned on the hypothesis (for example, if the data follow a normal distribution, then the likelihood function is also normal with the same mean and variance), whereas in the frequentist school the likelihood, or null distribution, is basically the distribution followed by the test statistic, which is different from the distribution followed by the data. Could someone explain whether my understanding is correct?
Your terminology is not exactly on the right track. Here is an example showing what the likelihood function is and how it is used in frequentist and Bayesian estimation.
Bayesian and Frequentist Inference for Bernoulli Data
Suppose we have a random sample $X_1, \dots, X_n$ from a certain population. In both the frequentist and Bayesian approaches, the likelihood has the same form as the JOINT density function of $X_1, \dots, X_n$, not the density function of the population (or of a single observation).
$Likelihood.$ In its role as a likelihood function, the joint density is viewed as a function of (one or more) unknown parameters, with observed data values $x_1, \dots, x_n$ (often now written in lower case) regarded as fixed known values. Another convention for the likelihood function is to say that it is defined only up to a constant multiple.
For example, suppose the $X_i$ are Bernoulli with parameter $\theta$ (success probability). Then the joint 'density' function is
$$f(\mathbf{x}|\theta) = \prod_{i=1}^n \theta^{x_i}(1 - \theta)^{1 - x_i} = \theta^s(1-\theta)^{n-s},$$ where $s = \sum_{i=1}^n x_i.$ We often write $f(\mathbf{x}|\theta) \propto \theta^s(1-\theta)^{n-s},$ where the proportionality symbol $\propto$ indicates that we may leave out factors that do not involve the parameter of interest.
The corresponding random total $S = \sum_{i=1}^n X_i$ has distribution $S \sim Binom(n, \theta).$ Specifically, if $n = 1000$ and we observe data with total $s = 620,$ then $f(s|\theta) = {1000 \choose 620}\theta^{620}(1-\theta)^{380} \propto \theta^{620}(1-\theta)^{380}.$
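As a quick numerical illustration (a minimal sketch in R; the grid of $\theta$ values and the variable names are my own), one can evaluate the log of this likelihood over a grid, holding the data fixed, and confirm that it peaks at $s/n$:

    # Log-likelihood of theta for n = 1000 Bernoulli trials with s = 620 successes;
    # the data are held fixed and theta varies.
    n <- 1000; s <- 620
    theta <- seq(0.001, 0.999, by = 0.001)
    log.lik <- s * log(theta) + (n - s) * log(1 - theta)  # log of theta^s (1-theta)^(n-s)
    theta[which.max(log.lik)]  # 0.62, which is s/n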
$Frequentist.$ From a frequentist point of view, we find the maximum likelihood estimator $\hat \theta = s/n = 620/1000 = 0.62.$ (The point at which $f(s|\theta)$ reaches its maximum is $s/n.$)
Various confidence intervals have been proposed. A common 95% CI (based on normal and other approximations and not so good for small $n$) is $\hat \theta \pm 1.96\sqrt{\hat \theta(1-\hat \theta)/n}.$ For the data of our example, this is $(0.590, 0.650).$
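In R, this interval can be computed directly (a sketch; the object names are mine, but the numbers match those just quoted):

    # Wald 95% CI for theta: MLE plus/minus 1.96 estimated standard errors
    n <- 1000; s <- 620
    theta.hat <- s / n                                  # 0.62
    me <- 1.96 * sqrt(theta.hat * (1 - theta.hat) / n)  # margin of error
    c(theta.hat - me, theta.hat + me)                   # (0.590, 0.650)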
The usual frequentist interpretations are that $\hat\theta$ has good properties as an estimator of $\theta$: it is unbiased because $E(\hat \theta) = \theta,$ and it has minimum variance among unbiased estimators. The CI is interpreted in terms of the procedure that produced it: over the long run, 95% of intervals from such experiments will cover the fixed, but unknown, constant $\theta.$
$Bayesian.$ A Bayesian analysis views $\theta$ as a random variable with a prior distribution. Because $\theta$ must lie in the unit interval, a member of the Beta family of distributions is a common choice for the prior distribution of $\theta:$ perhaps $Beta(1/2, 1/2)$ or $Beta(1,1)$ if we have no previous experience or strong personal opinion about $\theta;$ perhaps something like $Beta(330, 270)$ if we think $\theta$ is likely to lie between 0.51 and 0.59, based on previous experiments, polls, or hunches about the population.
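One can check that $Beta(330, 270)$ matches such an opinion by computing its central 95% interval in R (a quick check, not part of the original derivation):

    qbeta(c(.025, .975), 330, 270)  # roughly (0.510, 0.590)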
Then Bayes' Theorem comes into play. It states that the posterior distribution of $\theta$ based on the prior distribution and the likelihood function is $$f(\theta|s) \propto f(\theta) \times f(s|\theta),$$
where the (last suggested) prior distribution has 'kernel' $f(\theta) \propto \theta^{330-1}(1 - \theta)^{270-1}$ and the likelihood (from above) is $f(s|\theta) \propto \theta^{620}(1-\theta)^{380}.$
By the 'kernel' of a distribution we mean the factors that involve the parameter of concern. Thus the kernel of the posterior distribution is $f(\theta|s) \propto \theta^{950-1}(1-\theta)^{650-1},$ and the posterior distribution is obviously $Beta(950, 650).$ One possible 95% Bayesian probability interval for $\theta$ cuts 2.5% from each tail of this distribution to give $(0.570, 0.618),$ which can be obtained from the R code qbeta(c(.025, .975), 950, 650). A Bayesian point estimate might be the mean of the posterior distribution, $950/(950+650) = 0.594$ (or its mode, which is essentially the same in this case). The Bayesian interpretation of these results is that, for the particular survey or experiment at hand, the most likely value of $\theta$ is 59.4% and there is 95% probability that the true value of $\theta$ lies between 57.0% and 61.8%. (Of course, these statements are subject to believing the prior distribution.)
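The entire conjugate update can be written in a few lines of R (a sketch with my own variable names; the Beta-Bernoulli updating rule $Beta(a, b) \to Beta(a + s,\, b + n - s)$ itself is standard):

    # Conjugate update: a Beta(a, b) prior with s successes in n trials
    # gives a Beta(a + s, b + n - s) posterior.
    a <- 330; b <- 270                     # prior Beta(330, 270)
    n <- 1000; s <- 620                    # observed data
    a.post <- a + s; b.post <- b + n - s   # posterior Beta(950, 650)
    a.post / (a.post + b.post)             # posterior mean, about 0.594
    qbeta(c(.025, .975), a.post, b.post)   # 95% interval, about (0.570, 0.618)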
If our prior distribution had been $Beta(1,1) = Unif(0,1)$ (a 'non-informative' prior), the 95% probability interval would have been $(0.589, 0.650),$ numerically similar to the frequentist interval above.
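Again this is a one-line check in R (the flat $Beta(1,1)$ prior plus the data gives a $Beta(621, 381)$ posterior):

    qbeta(c(.025, .975), 1 + 620, 1 + 380)  # approximately (0.589, 0.650)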
Notes: (1) For simplicity, I have used 'density' for both discrete and continuous distributions; for the former, I might have used 'probability mass function'. (2) Bayesian statisticians often use $p$ instead of $f$ for density and likelihood functions. (3) Finding the posterior distribution was very simple in the example given because of the algebraic compatibility of the prior and the likelihood; in such cases we say the prior and likelihood are 'conjugate'. Finding probability intervals for some posterior distributions requires intensive computation.