What is the best way to measure similarity between two histograms

Question


273 Views Asked by Bumbble Comm At 02 Apr 2026 - 8:37

What is the best way to measure the similarity between two histograms? For example, in the following pictures, how can I tell if the distributions are similar enough?

I now have 2 lists of values, and I've normalized them to fall between 0 and 1.

I've tried several statistical tests, including Pearson, Spearman, and Kolmogorov-Smirnov, and Spearman looks like the best one to use. However, Spearman is not always consistent: it sometimes gives me a high "s" value even though the shapes of the distributions are not similar enough. In theory, a higher (positive) "s" means the values are strongly correlated. Am I even on the right track using correlation to measure similarity? Are there any other tests that can be used for this?

corr0.89_1 = [10.7441, 8.9568, 11.0018, 9.29803, 8.92043, 8.78492, 13.5503, 6.74334, 6.14392, 5.75271, 28.851, 26.8173, 6.52642, 6.56071, 5.7169, 7.3095, 6.36379, 5.74984, 7.10243, 5.87364, 11.2827, 2.94984, 2.84836, 22.8551, 24.8372, 10.6571, 9.7891, 11.3021, 5.89328, 10.1372, 24.0525, 3.49401, 2.16394, 11.2825, 11.6859, 7.9918, 13.2742, 11.1194, 2.49575, 16.733, 27.918, 3.27145, 14.3346, 20.4979, 13.0808, 13.6282, 14.1474, 25.0414, 8.06032, 280.803, 22.0135, 18.2725, 12.9601, 7.64593]

corr0.89_2 = [9.14167, 6.30561, 7.7479, 8.05475, 7.14188, 7.62774, 9.18454, 1.48037, 1.5912, 2.07612, 21.302, 22.7082, 2.67858, 2.25732, 1.74804, 2.04191, 2.03539, 1.78882, 2.57568, 1.6512, 8.62473, 2.99236, 3.13484, 13.014, 16.2016, 9.17172, 7.97379, 9.12539, 4.8298, 8.42477, 16.0582, 2.68252, 1.92429, 5.6744, 4.70516, 5.20169, 11.0945, 9.10398, 2.68375, 13.6299, 17.3429, 3.19181, 9.41762, 12.2805, 9.92005, 11.5985, 11.7269, 17.4832, 6.66996, 60.8647, 13.9616, 14.9909, 10.4712, 6.13891]

corr1.0_1 = [0.00905783, 0, 0.0075662, 0, 0.00583336, 0, 0.0101741, 0.00617847, 0.00474902, 0, 0.0243326, 0.0300779, 0.0062144, 0.00581433, 0, 0.00712057, 0.00703617, 0, 0, 0, 0.0101258, 0.00844863, 0.014988, 0.0248553, 0, 0.00680134, 0.00762619, 0.00701553, 0.0106525, 0.00425654, 0.0160354, 0, 0, 0, 0, 0, 0.0110151, 0.00874536, 0, 0.0182528, 0.0291939, 0, 0.0426431, 0.0141304, 0.0139076, 0.0182638, 0.0177141, 0.021119, 0, 12.3977, 0.0121492, 0.016053, 0.0148212, 0.00767271]

corr1.0_2 = [0.128504, 0.119172, 0.0403692, 0.148327, 0.132162, 0.0366454, 0.139191, 0.0464803, 0.0235099, 0.0333772, 0.0427275, 0.0510047, 0.0278845, 0.0202271, 0.0918039, 0.129276, 0.0266636, 0.0399166, 0.693549, 0.131911, 0.134276, 0, 0, 0.248764, 0.17239, 0.0450586, 0.0932654, 0.0671463, 0.239433, 0.102551, 0.378029, 0.031807, 0.0181028, 0.107356, 0.145449, 0.0735069, 0.788291, 0.496569, 0.0209139, 0.0983066, 0.0530917, 0.0755444, 0.25198, 0.550969, 0.172254, 0.104131, 0.113987, 0.548016, 0.302768, 126.145, 0.886364, 0.107977, 0.4037, 1.23249]


**Bumbble Comm** · Accepted Answer

First of all, you need to define unambiguously what you mean by similarity for the problem you are solving; otherwise each of these coefficients will correctly measure similarity according to its own definition. Suppose you define similarity as how closely the shape of one distribution matches the shape of the other. In that case, if you rescale both histograms to the same scale and place one on top of the other, perfectly similar histograms will overlap completely; otherwise, one or both will have parts that do not overlap.
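The overlap idea above can be sketched in a few lines. This is only an illustration under stated assumptions: both histograms are taken to share the same bins, "same scale" is interpreted as rescaling each histogram to sum to 1, and the overlap score is the sum of bin-wise minima (often called histogram intersection). The function names `normalize` and `overlap` and the sample heights are hypothetical, not part of the question's data.

```python
def normalize(heights):
    """Rescale bin heights to sum to 1 (assumes non-negative heights)."""
    total = sum(heights)
    if total == 0:
        raise ValueError("histogram has no mass")
    return [h / total for h in heights]


def overlap(hist_a, hist_b):
    """Shared area of two histograms after rescaling.

    Returns a value close to 1.0 when the shapes coincide and
    0.0 when the histograms have no bins in common.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must use the same bins")
    a, b = normalize(hist_a), normalize(hist_b)
    # Per bin, only the smaller of the two heights is shared by both.
    return sum(min(x, y) for x, y in zip(a, b))


# Hypothetical bin heights, chosen only for illustration.
same_shape = overlap([1, 2, 3, 2, 1], [10, 20, 30, 20, 10])
different_shape = overlap([1, 2, 3, 2, 1], [5, 1, 1, 1, 1])
print(same_shape, different_shape)
```

Any value below 1 corresponds to the non-overlapping parts described above; how low is "not similar enough" still depends on the threshold you choose for your problem.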

For this definition, Spearman's coefficient will not work, since it only measures the correlation between the rank order of the class intervals. Two distributions can have identical rank ordering of their intervals while the heights of those intervals differ vastly. In theory, there are infinitely many distributions whose Spearman's coefficient is 1 but whose shapes are all different. On the other hand, if the Pearson coefficient is 1, the bin heights of one histogram are an exact increasing linear function of the other, so once both are rescaled to the same scale their shapes will be exactly the same and there will be no non-overlapping parts.
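A small sketch can make the difference concrete. It uses plain-Python implementations of the two coefficients (in practice a statistics library would normally be used) and made-up bin heights, so the names `pearson`, `ranks`, `spearman`, and the `heights_*` lists are illustrative assumptions rather than anything from the question.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Ranks from 1 to n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical bin heights, chosen only for illustration.
heights_a = [1, 2, 3, 4, 5]
heights_b = [1, 2, 3, 4, 500]     # same rank order as heights_a, very different shape
heights_c = [10, 20, 30, 40, 50]  # heights_a multiplied by 10

print("Spearman a vs b:", spearman(heights_a, heights_b))
print("Pearson  a vs b:", pearson(heights_a, heights_b))
print("Pearson  a vs c:", pearson(heights_a, heights_c))
```

Here Spearman cannot tell `heights_a` and `heights_b` apart because their rank orders match, while Pearson drops noticeably for that pair and stays at (numerically) 1 only for the scaled copy.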

For more rigorous measures, refer to this link: Similarity measure between multiple distributions
