What is the equivalent of a z-statistic for a textual variable containing discrete values?

24 Views Asked by Bumbble Comm At 28 Mar 2026 - 2:10

I have a variable in my data which contains discrete values which have no canonical order, e.g. Apple, Orange, Pear.

These values appear with a certain frequency in my base sample. I have a subset of my sample which contains the same variable, and I would like to provide a measure of the similarity of the Fruit variable between the subset and the overall sample.

For continuous variables I use the z-stat and Kolmogorov-Smirnov, and I am looking for something equivalent for my Fruit variable.

I have considered ordering the values in the original sample by their frequency of occurrence and faking a CDF and using K-S, but that feels like a hack. Well, it would be a hack...

I could also invent something that takes a weighted difference of the populations, but I would rather use a conventional statistic if such a thing exists.


There is 1 best solution below

Bumbble Comm On 12 Sep 2022 - 10:51

For categorical data, you can use the multinomial test, with the null-hypothesis probabilities set to the category frequencies observed in the base sample.

Its p-value tells you how "unusual" your subset would be if it had been drawn at random from the base sample.
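As a rough illustration, here is a minimal pure-Python sketch of an exact multinomial test. The fruit counts are made up for the example, and the helper names (multinomial_pmf, compositions, exact_multinomial_test) are hypothetical, not from any library. It enumerates every possible outcome, so it is only practical for small subsets with few categories.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` given category probabilities `probs`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # stays an integer at every step
    return coef * prod(p ** c for p, c in zip(probs, counts))


def compositions(n, k):
    """Yield every way to split n observations into k ordered non-negative counts."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def exact_multinomial_test(observed, probs):
    """p-value: total probability of all outcomes no more likely than the observed one."""
    p_obs = multinomial_pmf(observed, probs)
    n, k = sum(observed), len(observed)
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return sum(
        p
        for p in (multinomial_pmf(c, probs) for c in compositions(n, k))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts, invented purely for illustration.
base_counts = {"Apple": 50, "Orange": 30, "Pear": 20}
subset_counts = {"Apple": 2, "Orange": 3, "Pear": 5}

total = sum(base_counts.values())
probs = [base_counts[fruit] / total for fruit in base_counts]
observed = [subset_counts[fruit] for fruit in base_counts]

p_value = exact_multinomial_test(observed, probs)
print(f"p-value = {p_value:.4f}")
```

A small p-value suggests the subset's mix of fruit differs from the base sample's. For larger subsets the enumeration becomes too slow, and Pearson's chi-square goodness-of-fit test (for example scipy.stats.chisquare, with expected counts derived from the base frequencies) is the usual approximation.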
