How to test if a sequence of items was selected from a set with uniform distribution?


Let's assume there is a fixed set of items from which items are supposedly selected using a PRNG, with no item weights, i.e.,

items[prng.nextInt(items.length)]

Many items are sampled this way, about samples.length == 10 * items.length.

If I tally the occurrences of items, some occur more frequently than the average (1.0/items.length) and some occur relatively less frequently.

I have a suspicion that the items may have a weight associated with them, making it so that some items are more likely to be selected and others are less likely by default.

How can I test the sample sequence to detect such hidden bias?

(Background: I play a popular online game where types of legendary items are supposed to be dropped by mobs in a uniform fashion, yet the data on thousands of drops collected by me and some other players appear to indicate not just non-uniformity, but per-player preference. For example, I use a flaming sword and I seem to get more flaming swords than frost swords, yet another player gets crossbows and barely any kinds of swords, both in line with our playstyles.)

On BEST ANSWER

I assume that there is only a finite number of items (or types of items), so you can use Pearson's chi-squared test (you can read about it here: https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test). Suppose you have $k$ items, you suspect that the probability of getting each one is $\frac{1}{k}$, and the data tell you that the $i$-th item appeared $n_i$ times. With $N=n_1+n_2+\dots+n_k$, compute the number $$\chi^2=\frac{k}{N}\left(\left(n_1-\frac{N}{k}\right)^2+\left(n_2-\frac{N}{k}\right)^2+\dots+\left(n_k-\frac{N}{k}\right)^2\right)$$ If the items really are selected uniformly, this number should approximately follow a chi-squared distribution with $k-1$ degrees of freedom, whose mean is $k-1$. The approximation is usually considered reasonable when the expected count $\frac{N}{k}$ is at least about 5, which holds for roughly 10 samples per item. So if you observe that this number is much higher than $k-1$, there is a strong suspicion that the data are not uniformly distributed.

If you want to know exactly how likely it is, you can use the WolframAlpha/Mathematica function:
$$\textrm{CDF[ChiSquareDistribution[$k-1$],$\chi^2$]}$$ where you substitute both the number of items and the value computed above. You should get a value from $[0,1]$ - the higher it is, the stronger the suspicion that the values are not uniformly distributed. You can use $0.95$ or $0.99$ as your cutoff - if the value is greater than the cutoff, you can conclude that the data are probably not uniformly distributed.
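
For anyone who would rather script this than use Mathematica, here is a minimal Python sketch of the same computation using only the standard library. The function names (`chi_squared_statistic`, `chi2_cdf`, `uniformity_test`) and the drop data are made up for illustration. The CDF is evaluated with a textbook power series for the regularized incomplete gamma function, which should be fine for a quick check; if SciPy is available, `scipy.stats.chisquare` is probably the better choice.

```python
import math
from collections import Counter


def chi_squared_statistic(counts):
    """Pearson statistic for 'all k categories are equally likely'."""
    k = len(counts)
    expected = sum(counts) / k  # N / k
    return sum((n - expected) ** 2 for n in counts) / expected


def chi2_cdf(x, df):
    """CDF of the chi-squared distribution with df degrees of freedom.

    Uses the series P(a, y) = y^a e^(-y) / Gamma(a+1) * sum_n y^n / ((a+1)...(a+n))
    with a = df / 2 and y = x / 2.
    """
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:
        n += 1
        term *= y / (a + n)
        total += term
    if math.isinf(total):
        # The sum only overflows when x is far above df, where the CDF is 1
        # to machine precision.
        return 1.0
    log_p = a * math.log(y) - y - math.lgamma(a + 1.0) + math.log(total)
    return min(1.0, math.exp(log_p))


def uniformity_test(samples, items):
    """Return (chi-squared statistic, CDF value) for the observed samples."""
    tally = Counter(samples)
    counts = [tally[item] for item in items]  # items never drawn count as 0
    stat = chi_squared_statistic(counts)
    return stat, chi2_cdf(stat, len(items) - 1)


# Hypothetical drop log: 40 drops over 4 item types (made-up numbers).
items = ["flaming sword", "frost sword", "crossbow", "shield"]
drops = (["flaming sword"] * 20 + ["frost sword"] * 8
         + ["crossbow"] * 6 + ["shield"] * 6)

stat, cdf = uniformity_test(drops, items)
print(stat, cdf)  # the statistic is 13.6 here, well above k - 1 = 3
```

In this made-up example the CDF value comes out above the $0.99$ cutoff, so the uniformity hypothesis would be rejected. To look for per-player preference, one option is to run the same check separately on each player's drops.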