Distinguishing between unimodal and bimodal normal data

Question

3k Views Asked by Bumbble Comm At 05 May 2026 - 9:40

I have a large number of data sets that have either a unimodal normal distribution or a bimodal normal distribution. I'm not a statistician by any means, so I'm quite limited in my experience.

For the bimodal data sets, I have implemented (through a library) the Expectation-Maximization method for identifying the distributions of the two constituents, and that works great. The only problem is that when the algorithm is fed a unimodal distribution, it doesn't converge to a single distribution (or to two very close ones). The number I'm mostly interested in is the delta between the two means, so in the case of a unimodal distribution the delta-mean is overestimated.

So my question is: Is there a good test for identifying bimodal distributions? Sometimes the means are quite close to one another, in the sense that there is no "dip" between the two means.

Example images:

Bimodal: it works great in this case, identifying the two peaks

Unimodal: it identifies two peaks that aren't really there; I would want the two means to be (much) closer

Close Bimodal: it identifies this one just fine, I would not want this to be considered unimodal

There are 2 best solutions below

Bumbble Comm On 05 May 2016 - 2:32

Normal distributions are always unimodal. It looks like by "bimodal normal data" you mean a Gaussian Mixture Model (GMM) with 2 components (i.e. the PDF of the data is a convex combination of Gaussian PDFs).

There are several techniques to estimate the number of components in a Gaussian Mixture Model -- Bayesian Information Criterion, Akaike Information Criterion, Calinski-Harabasz, etc. which you can find by searching for "Model Selection for Gaussian Mixture Models".

Many libraries have model selection for GMMs built in. For example, scikit-learn's GMM provides BIC and AIC (and its documentation demos the use of BIC); in current releases the class is `GaussianMixture`, with `bic()` and `aic()` methods. scikit-learn also offers generalizations of GMMs (e.g. DPGMMs).
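As a rough illustration of BIC-based selection between one and two components, here is a minimal base-R sketch. It is not the asker's library or scikit-learn's implementation: the helper names (`loglik_one`, `em_two`, `bic`, `select_components`), the median-split starting values, and the simulated parameters are all assumptions made for the example. When BIC prefers a single component, the sketch reports a delta-mean of 0 instead of forcing two peaks.

```r
# Sketch: choose between 1 and 2 Gaussian components by BIC (base R only).
# All function names below are hypothetical helpers, not from any package.

# Log-likelihood of a single normal fitted by maximum likelihood.
loglik_one <- function(x) {
  mu <- mean(x)
  sigma <- sqrt(mean((x - mu)^2))  # ML estimate (divides by n, not n - 1)
  sum(dnorm(x, mu, sigma, log = TRUE))
}

# Plain EM for a two-component univariate Gaussian mixture.
em_two <- function(x, max_iter = 500, tol = 1e-8) {
  med <- median(x)                 # crude start: split the data at the median
  lo <- x[x <= med]
  hi <- x[x > med]
  w <- 0.5
  mu <- c(mean(lo), mean(hi))
  sigma <- c(sd(lo), sd(hi))
  mix_loglik <- function() {
    sum(log(w * dnorm(x, mu[1], sigma[1]) + (1 - w) * dnorm(x, mu[2], sigma[2])))
  }
  ll_old <- mix_loglik()
  for (i in seq_len(max_iter)) {
    # E-step: responsibility of component 1 for each observation.
    d1 <- w * dnorm(x, mu[1], sigma[1])
    d2 <- (1 - w) * dnorm(x, mu[2], sigma[2])
    r <- d1 / (d1 + d2)
    # M-step: weighted updates of weight, means, and standard deviations.
    w <- mean(r)
    mu <- c(weighted.mean(x, r), weighted.mean(x, 1 - r))
    sigma <- c(sqrt(weighted.mean((x - mu[1])^2, r)),
               sqrt(weighted.mean((x - mu[2])^2, 1 - r)))
    ll <- mix_loglik()
    if (ll - ll_old < tol) break   # stop once the log-likelihood stalls
    ll_old <- ll
  }
  list(loglik = ll, weight = w, mu = mu, sigma = sigma)
}

# BIC in the "lower is better" convention; k = number of free parameters.
bic <- function(loglik, k, n) -2 * loglik + k * log(n)

select_components <- function(x) {
  n <- length(x)
  fit2 <- em_two(x)
  bic1 <- bic(loglik_one(x), k = 2, n = n)   # mean, sd
  bic2 <- bic(fit2$loglik, k = 5, n = n)     # weight, 2 means, 2 sds
  two <- bic2 < bic1
  list(bic1 = bic1, bic2 = bic2,
       components = if (two) 2L else 1L,
       delta_mean = if (two) abs(diff(fit2$mu)) else 0)
}

# Simulated data (assumed parameters, for illustration only).
set.seed(1)
n <- 1000
from_first <- rbinom(n, 1, 0.5) == 1
x_bi  <- ifelse(from_first, rnorm(n, 100, 15), rnorm(n, 150, 20))
x_uni <- rnorm(n, 100, 15)

res_bi  <- select_components(x_bi)
res_uni <- select_components(x_uni)
print(res_bi)
print(res_uni)
```

BIC penalizes the three extra parameters of the two-component fit by `3 * log(n)`, so a second component is only accepted when it improves the likelihood substantially. Mixtures whose components nearly coincide may still be classified as unimodal by this rule.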

**Bumbble Comm** · Accepted Answer

The bimodal data you have may be a mixture of normal components, but that mixture is not normal. Thus it may be enough for you to use ordinary tests of normality. Most software packages incorporate such tests. I will show you briefly how R statistical software can be used for this purpose.

First, some bimodal data. I am simulating a mixture of two normal distributions $Norm(\mu = 100, \sigma = 15)$ and $Norm(\mu = 150, \sigma = 20).$

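 n = 1000
 mix = sample(0:1, n, replace=TRUE)    # 1 = first component, 0 = second
 # draw n values per component so that no values are recycled
 x = mix*rnorm(n, 100, 15) + (1-mix)*rnorm(n, 150, 20)
 mean(x);  sd(x)                       # no seed is set, so results vary by run
 ## 126.7677
 ## 30.89346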

Usually when the means of the components are more than a few standard deviations apart, bimodality is apparent in a histogram.
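To make the "dip" concrete, the following sketch counts the local maxima of the theoretical mixture density on a grid. The helper names `mix_density` and `count_modes` and the grid are assumptions for illustration. The first parameter set is the mixture simulated above; the second uses components $Norm(\mu = 100, \sigma = 15)$ and $Norm(\mu = 115, \sigma = 20)$ with weights 0.8 and 0.2, where the means are closer than either standard deviation.

```r
# Density of a two-component normal mixture (weight w on the first component).
mix_density <- function(x, w, mu1, s1, mu2, s2) {
  w * dnorm(x, mu1, s1) + (1 - w) * dnorm(x, mu2, s2)
}

# Count local maxima of a curve sampled on a grid:
# a maximum is a point where the slope changes from positive to non-positive.
count_modes <- function(y) {
  d <- diff(y)
  sum(d[-length(d)] > 0 & d[-1] <= 0)
}

grid <- seq(0, 250, by = 0.05)

# Means 50 apart: the density has a dip between two peaks.
modes_far <- count_modes(mix_density(grid, 0.5, 100, 15, 150, 20))

# Means 15 apart: the mixture has two components but only one peak.
modes_near <- count_modes(mix_density(grid, 0.8, 100, 15, 115, 20))

print(c(far = modes_far, near = modes_near))
```

A two-component mixture can therefore be unimodal in the strict sense, which is why a test for "two components" and a test for "two peaks" can disagree.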

A popular test for normality is the Shapiro-Wilk test. The null hypothesis is that the sample is normal and the alternative is that the data are not from a normal distribution.

 shapiro.test(x)

 ##        Shapiro-Wilk normality test

 ## data:  x 
 ## W = 0.9703, p-value = 2.043e-13

A P-value below .05 is often taken as an indication that the data are not consistent with sampling from a normal population. Here the P-value is very much smaller than .05.

One difficulty in your case is that for large datasets this test will 'reject' the null hypothesis even for data that are only slightly non-normal. For example, if the data are mainly from one normal distribution and occasionally from another very similar normal distribution, you may not see the non-normality in a histogram, and the data may be 'near enough' to normal for practical purposes, yet the Shapiro-Wilk test may still detect the slight departure from normality.

Here is such an example:

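 # mix = 1 with probability .80, so about 80% of values come from Norm(100, 15)
 mix = sample(0:1, 1000, replace=TRUE, prob=c(.20, .80))
 # draw 1000 values per component so that no values are recycled
 y = mix*rnorm(1000, 100, 15) + (1-mix)*rnorm(1000, 115, 20)
 shapiro.test(y)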

 ##   Shapiro-Wilk normality test

 ## data:  y 
 ## W = 0.9966, p-value = 0.02869

It is unclear what criterion you have in mind for bimodality. Your last example certainly looks borderline. If bimodality is just what you *recognize by eye* as bimodal, you may have trouble finding a formal test that emulates your eyeball.

One simple possibility is that your examples of bimodal data have negative excess kurtosis (a normal distribution has excess kurtosis 0). Perhaps you could compute the kurtosis of your data and see whether there is a threshold value that lets you discriminate as you like.

If you want to refine your question, giving an explanation of why it is important to you to detect 'bimodality' of a certain degree, that might be helpful. For that revision you might get a better answer on Cross Validated, our sister statistics site.
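A minimal sketch of that kurtosis check in base R follows. The function name `excess_kurtosis`, the sample size, and the simulated parameters are assumptions for illustration, and any cutoff would have to be tuned on your own data. Note that mixtures with very unequal weights can have positive excess kurtosis, so this heuristic is best suited to roughly balanced components.

```r
# Sample excess kurtosis: fourth central moment over squared variance, minus 3.
# A normal distribution has excess kurtosis 0.
excess_kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3
}

# Simulated data (assumed parameters, for illustration only).
set.seed(2)
n <- 5000
x_uni <- rnorm(n, 100, 15)
pick  <- rbinom(n, 1, 0.5) == 1
x_bi  <- ifelse(pick, rnorm(n, 100, 15), rnorm(n, 150, 20))

k_uni <- excess_kurtosis(x_uni)   # expected to be near 0
k_bi  <- excess_kurtosis(x_bi)    # expected to be clearly negative
print(c(unimodal = k_uni, bimodal = k_bi))
```

For this equal-weight mixture the theoretical excess kurtosis is about -0.86, so the two cases separate well. As the component means move closer together the value approaches 0 and the separation shrinks.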
