How to compare likelihood of different distributions from data

Question

201 Views Asked by Bumbble Comm At 12 Apr 2026 - 8:51

Suppose you have a set of data for some observed quantity and two hypotheses, $f(x)$ and $g(x)$, for its probability density function, and suppose for simplicity that you know one of these two models accurately represents the distribution. For example, maybe $f(x)$ is a Gaussian distribution, whereas $g(x)$ is the sum of $f(x)$ and another normal distribution with a different mean and smaller amplitude. This could correspond to measuring people's heights, where the null hypothesis is that heights are normally distributed and the alternative hypothesis is that heights have two peaks, one for men and one for women.

How would one go about comparing how likely it is that the data come from one model rather than the other? What would you do if you do not start from the assumption that the single normal distribution is the "more likely" one, but your data are not significant enough to rule out a normal distribution completely?

This seems like a pretty fundamental question so I'm open to reference links.

**Bumbble Comm** · Answer 1 · 2020-09-19 21:36:31

Judging Bimodality of Mixture Distributions.

A couple of examples to illustrate my comment:

First example. Means of the two normal components are several standard deviations apart.

set.seed(2020)
x1 = rnorm(100, 60, 2);  x2 = rnorm(100, 70, 2)
x = c(x1,x2)

Combined data fail a Shapiro-Wilk test of normality.

shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.93835, p-value = 1.651e-07

Histogram shows bimodality. (Also, tick marks along the horizontal axis, at individual observations, show two clusters.)

hist(x, prob=TRUE, col="skyblue2",
     main="Bimodal Mixture")
rug(x)

A normality plot of the 200 values is not anywhere near linear. The 'S' shape in this plot is typical of bimodal data.

qqnorm(x)    # qqnorm() draws the plot itself; wrapping it in plot() redraws it
qqline(x, col="red", lwd=2)

Second example. The means of the two components are closer together, and the standard deviations are a little different. [Very roughly speaking, perhaps with a little exaggeration of the differences, this may be more like the mixture distribution of adult human heights (men and women together).]

set.seed(919)
y1 = rnorm(100, 65, 2.5);  y2 = rnorm(100, 71, 3.5)
y = c(y1,y2)

Shapiro-Wilk test still detects non-normality (at 5% level).

shapiro.test(y)

        Shapiro-Wilk normality test

data:  y
W = 0.98212, p-value = 0.01201
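A single P-value near 0.01 reflects one particular sample. As a rough check, one could estimate how often the Shapiro-Wilk test rejects at the 5% level for samples drawn from this same mixture. The sketch below is an illustration only: the seed, the number of replications, and the name `reject` are arbitrary choices, not part of the example above.

```r
# Hypothetical sketch: estimate the rejection rate (power) of the
# Shapiro-Wilk test at the 5% level for the second mixture.
set.seed(1)                      # arbitrary seed for this sketch
reject <- replicate(2000, {      # 2000 replications is an arbitrary choice
  s <- c(rnorm(100, 65, 2.5), rnorm(100, 71, 3.5))
  shapiro.test(s)$p.value < 0.05
})
power_hat <- mean(reject)        # proportion of samples declared non-normal
print(power_hat)
```

A proportion well below 1 would indicate that a sample of 200 from this mixture often passes as normal.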

However, a histogram does not clearly show bimodality.

hist(y, prob=TRUE, col="skyblue2",
     main="Mixture, but not Obviously Bimodal")

Judging normality by looking at histograms is subjective. A histogram of the same data with different binning might give a slightly different impression. If there are not too many observations, it may help to put tick marks along the horizontal axis to show individual observations. The histogram with more bins seems 'uneven', but I see no evidence of bimodal clustering among the tick marks.

hist(y, prob=TRUE, col="skyblue2", breaks=20,
     main="Mixture, but not Obviously Bimodal")
rug(y)
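Because the impression depends on binning, a kernel density estimate can serve as a complementary view, although it trades the choice of bins for a choice of bandwidth. The following is a minimal sketch, assuming the same simulated data as above; the helper `count_modes` and the `adjust` values are hypothetical choices made for illustration.

```r
# Hypothetical sketch: overlay a kernel density estimate and count its
# local maxima at several bandwidths.
set.seed(919)
y <- c(rnorm(100, 65, 2.5), rnorm(100, 71, 3.5))   # same simulation as above

hist(y, freq = FALSE, col = "skyblue2", breaks = "FD",
     main = "Histogram (Freedman-Diaconis bins) with density estimate")
lines(density(y), lwd = 2)       # default bandwidth
rug(y)

# Count local maxima of an estimated density: a maximum is where the
# slope changes sign from positive to negative.
count_modes <- function(d) sum(diff(sign(diff(d$y))) == -2)

# 'adjust' rescales the default bandwidth; smaller values give a rougher curve.
adjust_values <- c(0.5, 1, 2)
n_modes <- sapply(adjust_values,
                  function(a) count_modes(density(y, adjust = a)))
print(data.frame(adjust = adjust_values, modes = n_modes))
```

If the number of modes changes with the bandwidth, the apparent shape is sensitive to smoothing, just as a histogram is sensitive to binning.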

Here a normal probability plot does not clearly show bimodality, but it is not as close to linear as one would expect for normal data. Roughly speaking, this may be the reason for failing the Shapiro-Wilk test.

qqnorm(y)    # qqnorm() draws the plot itself; wrapping it in plot() redraws it
qqline(y, col="red", lwd=2)
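The plots and tests above assess normality; they do not compare the likelihoods of the two models directly. One possible way to do that, sketched here under stated assumptions, is to fit both models by maximum likelihood and compare maximized log-likelihoods with a penalty for the extra parameters (AIC or BIC). This is not the method used above: the hand-written EM loop, its starting values, the fixed 500 iterations, and all variable names are assumptions made for illustration.

```r
# Hypothetical sketch: single normal (model f) versus a two-component
# normal mixture (model g), compared by log-likelihood, AIC, and BIC.
set.seed(919)
y <- c(rnorm(100, 65, 2.5), rnorm(100, 71, 3.5))   # same simulation as above
n <- length(y)

# Model f: the MLEs are the sample mean and the divide-by-n standard deviation.
mu_hat <- mean(y)
sd_hat <- sqrt(mean((y - mu_hat)^2))
ll_f   <- sum(dnorm(y, mu_hat, sd_hat, log = TRUE))

# Model g: simple EM loop. Starting values (assumed): split at the median.
p <- 0.5
m <- c(mean(y[y <= median(y)]), mean(y[y > median(y)]))
s <- rep(sd(y), 2)
for (iter in 1:500) {
  d1 <- p * dnorm(y, m[1], s[1])
  d2 <- (1 - p) * dnorm(y, m[2], s[2])
  w  <- d1 / (d1 + d2)                       # E-step: P(component 1 | y_i)
  p  <- mean(w)                              # M-step: mixing weight
  m  <- c(sum(w * y) / sum(w), sum((1 - w) * y) / sum(1 - w))
  s  <- c(sqrt(sum(w * (y - m[1])^2) / sum(w)),
          sqrt(sum((1 - w) * (y - m[2])^2) / sum(1 - w)))
}
ll_g <- sum(log(p * dnorm(y, m[1], s[1]) + (1 - p) * dnorm(y, m[2], s[2])))

# f has 2 free parameters; g has 5 (two means, two SDs, one weight).
aic <- c(f = -2 * ll_f + 2 * 2,      g = -2 * ll_g + 2 * 5)
bic <- c(f = -2 * ll_f + log(n) * 2, g = -2 * ll_g + log(n) * 5)
print(c(logLik_f = ll_f, logLik_g = ll_g))
print(rbind(AIC = aic, BIC = bic))           # smaller values are preferred
```

The raw likelihood ratio should be read with care: the single normal lies on the boundary of the mixture family, so the usual chi-squared reference distribution for the likelihood-ratio statistic does not apply. A parametric bootstrap of the statistic under the fitted single normal is one common workaround.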
