Is there a better alternative to using run tables for testing for randomness in a sample?
Currently, for statistical process control charting, I am testing the number of runs identified in a sample (essentially the number of times the sample crosses the average, plus one) against a standard table of significantly high or low run counts given a particular sample size:
    n    Low  High
    10    2    9
    11    2    9
    12    2   11
    13    2   11
    14    3   13
    ...
    46   15   32
    ...
    50   17   34
For example, if a sample of size 11 has fewer than 2 runs or more than 9 runs, then that sample can be said to exhibit special cause variation (i.e. it is not random).
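For reference, counting runs this way is straightforward to do in code. A minimal Python sketch (the function name `count_runs` is my own, and I ignore points that fall exactly on the mean):

```python
from statistics import mean

def count_runs(sample):
    """Number of runs about the mean: the number of times the
    sequence crosses its own average, plus one."""
    avg = mean(sample)
    # Classify each point as above/below; drop points exactly on the mean.
    above = [x > avg for x in sample if x != avg]
    crossings = sum(1 for a, b in zip(above, above[1:]) if a != b)
    return crossings + 1

print(count_runs([1, 2, 5, 6, 2, 2]))  # → 3 (below, above, below the mean of 3)
```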
I'm trying to approach this programmatically, and having a formulaic approach rather than attempting to look up values in a finite table would be a huge help.
Well, this test appears to check for autocorrelation of the variable. Rather than using a table like this, I am aware that most places have rules that flag the following as out of control:
9 adjacent samples that are all increasing or decreasing
14 adjacent samples that alternate between increasing and decreasing
This would check for similar cyclic/autocorrelated behavior in a way that is easy to verify, but it is obviously not identical to the run-count test.
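Those two rules are easy to implement directly. A sketch in Python, assuming strictly increasing/decreasing and treating ties as breaking a streak (the function names and the tie handling are my own choices):

```python
def _diff_signs(sample):
    """Signs of successive differences: +1 up, -1 down, 0 tie."""
    return [(b > a) - (b < a) for a, b in zip(sample, sample[1:])]

def monotone_rule(sample, length=9):
    """Flag `length` adjacent points that are all increasing or all
    decreasing, i.e. length-1 consecutive diffs with the same sign."""
    streak, prev = 0, 0
    for d in _diff_signs(sample):
        streak = streak + 1 if d != 0 and d == prev else (1 if d != 0 else 0)
        prev = d
        if streak >= length - 1:
            return True
    return False

def alternating_rule(sample, length=14):
    """Flag `length` adjacent points that alternate up/down,
    i.e. length-1 consecutive diffs with alternating signs."""
    streak, prev = 0, 0
    for d in _diff_signs(sample):
        if d != 0 and prev != 0 and d == -prev:
            streak += 1
        else:
            streak = 1 if d != 0 else 0
        prev = d
        if streak >= length - 1:
            return True
    return False
```

With the defaults, `monotone_rule` needs 8 same-sign differences (9 points) and `alternating_rule` needs 13 alternating differences (14 points).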
I believe that the table can be approximated by assuming that the samples are not autocorrelated and therefore have an equal probability (50%) of crossing the mean at each sample (besides the first). I can then calculate the predicted number of runs $r$ for a sample of size $n$, still defining the number of runs as the number of times the sample crosses the mean, plus one. The probability of there being exactly $r$ runs is $$P(r,n) = \frac{(n-1)!}{(n-r)!(r-1)!\,2^{n-1}}$$
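Under that assumption the crossing count is binomial — $r-1$ crossings out of $n-1$ independent 50/50 gaps, so $P(r,n)=\binom{n-1}{r-1}/2^{n-1}$. A quick Python check (the function name `p_runs` is mine):

```python
from math import comb

def p_runs(r, n):
    """P(exactly r runs in n samples), assuming each of the n-1
    gaps crosses the mean independently with probability 1/2."""
    return comb(n - 1, r - 1) / 2 ** (n - 1)

# Sanity check: the probabilities over r = 1..n sum to 1.
print(sum(p_runs(r, 11) for r in range(1, 12)))  # → 1.0
```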
You can then calculate $l$ (the lower limit) and $u$ (the upper limit) by picking a threshold probability $P_t$ and finding the largest $l$ and smallest $u$ such that each tail probability stays at or below the threshold:
$$\sum_{j=1}^{l-1}\frac{(n-1)!}{(n-j)!(j-1)!\,2^{n-1}} \le P_t$$ $$\sum_{k=u+1}^{n}\frac{(n-1)!}{(n-k)!(k-1)!\,2^{n-1}} \le P_t$$
I get similar numbers to the table above using $P_t = 0.005$, but it doesn't work very well for $n=11$ and $n=13$: my calculations (and intuition) indicate that those upper limits should be 10 and 12 respectively, rather than 9 and 11. I suspect the long-run trend would be closer, but I need more data to confirm this.
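As a sketch of the whole procedure, under my reading that a sample is flagged when $r < l$ or $r > u$ and each tail is kept at or below $P_t$ (function names are mine):

```python
from math import comb

def p_runs(r, n):
    # P(exactly r runs) = C(n-1, r-1) / 2^(n-1)
    return comb(n - 1, r - 1) / 2 ** (n - 1)

def run_limits(n, p_t=0.005):
    """Largest l with P(r < l) <= p_t and smallest u with
    P(r > u) <= p_t, under the independent-crossing model."""
    l = 1
    while sum(p_runs(j, n) for j in range(1, l + 1)) <= p_t:
        l += 1
    u = n
    while sum(p_runs(k, n) for k in range(u, n + 1)) <= p_t:
        u -= 1
    return l, u

print(run_limits(10))  # → (2, 9), matching the table
print(run_limits(11))  # → (2, 10): the upper limit disagrees with the table's 9
```

For $n=10$ this reproduces the table's limits exactly, while for $n=11$ it gives an upper limit of 10 rather than the table's 9, consistent with my calculations above.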