I have a variable in my data that takes discrete values with no canonical order, e.g. Apple, Orange, Pear.
These values appear with a certain frequency in my base sample. I have a subset of my sample which contains the same variable, and I would like to provide a measure of the similarity of the Fruit variable between the subset and the overall sample.
For continuous variables I use the z-stat and Kolmogorov-Smirnov, and I am looking for something equivalent for my Fruit variable.
I have considered ordering the values in the original sample by their frequency of occurrence, faking a CDF, and applying K-S to that, but it feels like a hack. Well, it would be a hack...
I could also invent something that takes a weighted difference of the populations, but I would rather use a conventional statistic if such a thing exists.
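For categorical data, you can use the multinomial test, with the category probabilities under the null hypothesis set to the relative frequencies observed in the base sample.
The resulting p-value tells you how "unusual" your subset's counts would be if the subset had been drawn at random from a population with the base sample's frequencies.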
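
As a rough illustration, here is a minimal sketch of an exact multinomial test in plain Python. The function names and the fruit counts are made up for the example and are not from any library. It enumerates every possible outcome of the same total size and sums the probabilities of those no more likely than the observed one, which is one common definition of the exact p-value.

```python
from math import exp, lgamma, log


def multinomial_pmf(counts, probs):
    """Probability of seeing exactly `counts` under category probabilities `probs`."""
    log_p = lgamma(sum(counts) + 1)
    for k, p in zip(counts, probs):
        if k > 0 and p == 0:
            return 0.0  # impossible outcome under the null
        log_p -= lgamma(k + 1)
        if k > 0:
            log_p += k * log(p)
    return exp(log_p)


def compositions(n, k):
    """Yield every way to split n items across k ordered categories."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def exact_multinomial_test(observed, base_probs):
    """P-value: total probability of outcomes no more likely than `observed`."""
    p_obs = multinomial_pmf(observed, base_probs)
    n, k = sum(observed), len(observed)
    tol = 1e-12  # relative tolerance so ties survive floating-point noise
    return sum(
        p
        for p in (multinomial_pmf(c, base_probs) for c in compositions(n, k))
        if p <= p_obs * (1 + tol)
    )


# Hypothetical data: counts of Apple, Orange, Pear.
base_counts = [50, 30, 20]   # overall sample
subset_counts = [2, 3, 5]    # subset to compare

base_probs = [c / sum(base_counts) for c in base_counts]
p_value = exact_multinomial_test(subset_counts, base_probs)
print(f"exact multinomial p-value: {p_value:.4f}")
```

Full enumeration is only practical for small subsets and a handful of categories, since the number of outcomes grows quickly. For larger samples, the chi-square goodness-of-fit test is the usual approximation to the same null hypothesis.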