test if observed distribution of labels is significantly different from background

23 Views Asked by Bumbble Comm At 02 Apr 2026 - 1:24

Apologies if this is a very basic question, but I'm finding it hard to answer without a bit of help.

I have a set of labels (n=7162) that are classified in different categories (n=30). This is the background distribution of labels and looks like this:

Then I have a sample in which not all the labels (and classes) are observed. The number on top of each bar is the percentage of the class's background labels that were observed; a bar with no number means all the labels for that class were observed, i.e. 100%:

What I would like to understand is:

is the distribution of observed labels per class significantly different from the background distribution of labels in classes? (ie, is the sample biased towards or against some classes)
is any class significantly under- or over-represented in the sample? (for example, class 'X' contains 2275 possible labels, but only 1760 (77.4%) were observed; is that significant? What about class 'F', which contained only 2 possible labels, neither of which was observed?)

This is the data used:

category,background,sample
A,7,7.0
B,383,318.0
C,53,53.0
D,19,18.0
E,26,26.0
F,2,0.0
G,234,231.0
H,94,87.0
I,4,4.0
J,76,76.0
K,180,175.0
L,177,177.0
M,10,10.0
N,553,538.0
O,1171,1082.0
P,252,210.0
Q,48,36.0
R,130,130.0
S,79,76.0
T,428,384.0
U,1,1.0
V,6,6.0
W,12,6.0
X,2275,1760.0
Y,510,504.0
Z,11,9.0
AA,207,202.0
AB,7,3.0
AC,24,24.0
AD,183,178.0

Total labels in categories: 7162

Observed labels in categories: 6331
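
One way to frame both questions, offered as a rough sketch rather than a definitive answer: assume the sample is a random draw, without replacement, of 6331 of the 7162 background labels. Under that assumption the observed count in each class follows a hypergeometric distribution, which gives per-class tail probabilities, and the overall question can be checked with a Monte Carlo test on a Pearson-type statistic (simulation is used here because classes as small as 'F' make the usual chi-square approximation doubtful). The names `hypergeom_tails`, `pearson_stat`, `per_class` and `p_overall`, the 500 simulations, the fixed seed and the Bonferroni threshold are all illustrative choices, not part of the question.

```python
import math
import random

# (category, background count, observed count), copied from the table in the question
DATA = [
    ("A", 7, 7), ("B", 383, 318), ("C", 53, 53), ("D", 19, 18), ("E", 26, 26),
    ("F", 2, 0), ("G", 234, 231), ("H", 94, 87), ("I", 4, 4), ("J", 76, 76),
    ("K", 180, 175), ("L", 177, 177), ("M", 10, 10), ("N", 553, 538),
    ("O", 1171, 1082), ("P", 252, 210), ("Q", 48, 36), ("R", 130, 130),
    ("S", 79, 76), ("T", 428, 384), ("U", 1, 1), ("V", 6, 6), ("W", 12, 6),
    ("X", 2275, 1760), ("Y", 510, 504), ("Z", 11, 9), ("AA", 207, 202),
    ("AB", 7, 3), ("AC", 24, 24), ("AD", 183, 178),
]

N = sum(bg for _, bg, _ in DATA)    # background labels in total
n = sum(obs for _, _, obs in DATA)  # observed labels in total
DENOM = math.comb(N, n)             # number of possible samples of size n


def hypergeom_tails(k, K):
    """Return (P(X <= k), P(X >= k)) for a class holding K of the N labels.

    Assumes n - i >= 0 for every i up to K, which holds for this data set.
    """
    pmf = [math.comb(K, i) * math.comb(N - K, n - i) / DENOM for i in range(K + 1)]
    return sum(pmf[: k + 1]), sum(pmf[k:])


def pearson_stat(observed):
    """Sum of (observed - expected)^2 / expected, with expected = K * n / N."""
    return sum((o - K * n / N) ** 2 / (K * n / N) for (_, K, _), o in zip(DATA, observed))


# Question 2: per-class tail probabilities under random sampling without replacement.
per_class = {name: hypergeom_tails(k, K) for name, K, k in DATA}

# Question 1: Monte Carlo test of the overall pattern (assumed design, fixed seed).
rng = random.Random(0)
labels = [idx for idx, (_, K, _) in enumerate(DATA) for _ in range(K)]
stat_obs = pearson_stat([k for _, _, k in DATA])
n_sims = 500
hits = 0
for _ in range(n_sims):
    missing = [0] * len(DATA)
    for idx in rng.sample(labels, N - n):  # the labels left out of a simulated sample
        missing[idx] += 1
    simulated = [K - m for (_, K, _), m in zip(DATA, missing)]
    if pearson_stat(simulated) >= stat_obs:
        hits += 1
p_overall = (hits + 1) / (n_sims + 1)

print(f"overall Monte Carlo p-value: {p_overall:.4f}")
alpha = 0.05 / len(DATA)  # Bonferroni-corrected threshold, one possible choice
for name, K, k in DATA:
    lower, upper = per_class[name]
    flag = "under" if lower < alpha else "over" if upper < alpha else ""
    print(f"{name:>2} {k:>5}/{K:<5} P(X<=k)={lower:.3g} P(X>=k)={upper:.3g} {flag}")
```

Under this model the probability of missing both labels of class 'F' is 831·830 / (7162·7161) ≈ 0.013, so whether 'F' counts as under-represented depends on how the 30 per-class tests are corrected for multiple comparisons. If the sample was not drawn at random from the background labels, the hypergeometric model does not apply and a different null model would be needed.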
