First of all, can you please check my understanding?
With 4 pairs in Spearman's rank correlation, there are 4! = 24 ways of arranging the ranks. Only 1 arrangement gives a perfect score of +1 and only 1 gives a perfect score of -1. As the score moves toward 0, there are more arrangements that could have produced it by chance. Is this correct?
What do the tables actually show, in layman's terms?
Secondly, tables such as http://web.anglia.ac.uk/numbers/biostatistics/spearman/local_folder/critical_values.html and https://mrphillipsibgeog.wikispaces.com/file/view/geog_SpearmanExplained.pdf/410697184/geog_SpearmanExplained.pdf give different critical values. Why is this?
Thank you.
Normal approximations. In an earlier Comment (now deleted) I speculated that one reason for differences among tables might be that some use a normal approximation and some do not. In checking around, it looks as if this may be true in some cases, but it seems that most tables do not use normal approximations until $n$ is near 30, or greater. The Wikipedia article on 'Spearman rank correlation' gives a readable account of normal approximations; some computation based on, but beyond, the formulas there would be required to get a table of critical values for various significance levels.
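As a rough illustration of what such a computation might look like, here is a minimal Python sketch (not the method behind any particular table). It assumes the simplest large-sample approximation, that under independence $r_S$ is roughly normal with mean 0 and variance $1/(n-1)$; the function name `normal_approx_critical` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_critical(n, tail_prob):
    """Approximate upper critical value of r_S for one tail of size tail_prob.

    Assumption: under independence r_S is roughly Normal(0, 1/(n-1)),
    which is only reasonable for fairly large n.
    """
    z = NormalDist().inv_cdf(1.0 - tail_prob)  # standard normal quantile
    return z / sqrt(n - 1)


if __name__ == "__main__":
    for n in (10, 30, 50):
        print(n, round(normal_approx_critical(n, 0.05), 3))
```

For $n = 10$ and 5% in the upper tail this gives about 0.548, noticeably different from the tabled 0.564, which is consistent with tables avoiding the approximation for small $n.$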
Simulation. I also speculated in my former Comment that some tables might be based on simulation results with too few iterations. Again, this might be true in some cases, but in simulations of my own I found that for small $n$ there may be several legitimate interpretations of simulation results. I discuss this difficulty below.
Permutation distribution of $r_S$ under the null hypothesis of independence. If we knew the distribution of $r_S$ for data $(X_i, Y_i),$ where $X$ and $Y$ are independent, then the critical values of a test of $H_0: \rho_S = 0$ against the two-sided alternative at the 10% level would be found by cutting 5% from each tail of the null distribution. For $n = 10,$ several tables I looked at have the critical values $\pm 0.564.$
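For very small $n$ the null distribution can be listed exactly rather than simulated. The Python sketch below does this for the $n = 4$ case raised in the question, assuming no ties, so that under independence every one of the $n!$ rank arrangements is equally likely. The function name `exact_null_distribution` is invented for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def exact_null_distribution(n):
    """Return {r_S: probability} under independence, assuming no ties."""
    ranks = range(1, n + 1)
    denom = n * (n * n - 1)
    counts = Counter()
    for perm in permutations(ranks):
        # Sum of squared rank differences for this arrangement.
        d2 = sum((x - y) ** 2 for x, y in zip(ranks, perm))
        counts[1 - Fraction(6 * d2, denom)] += 1
    total = sum(counts.values())  # equals n!
    return {r: Fraction(c, total) for r, c in sorted(counts.items())}


if __name__ == "__main__":
    for r, p in exact_null_distribution(4).items():
        print(f"r_S = {float(r):+.1f}   probability = {p}")
```

With $n = 4$ each of $r_S = \pm 1$ has probability $1/24 \approx 0.042,$ so the two together have probability $2/24 \approx 0.083.$ In plain terms, a table entry is the smallest $|r_S|$ that independent data would reach or exceed only rarely, with "rarely" defined by the chosen significance level.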
In order to simulate the null distribution when $n = 10,$ we take two samples of size 10 from $\mathrm{Unif}(0, 1),$ find their ranks, and then find the Pearson correlation of the two vectors of ranks. If we do this 100,000 times, we get a good approximation of the null distribution of $r_S$ and hence of the critical value. (This is the 'permutation procedure' mentioned in the Wikipedia article.) The code for this simulation in R statistical software is shown below.
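For readers without R, the same procedure can be sketched in plain Python. This is an illustrative re-creation of the steps just described, not the original R code; the helper names (`ranks`, `pearson`, `simulate_null`, `empirical_quantile`) and the seed are assumptions made for the example, and the quantile rule is deliberately simple.

```python
import random


def ranks(values):
    """Ranks 1..n of the values (ties have probability zero for continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_null(n, m, seed=None):
    """m simulated values of r_S for independent Unif(0, 1) samples of size n."""
    rng = random.Random(seed)
    sims = []
    for _ in range(m):
        x = [rng.random() for _ in range(n)]
        y = [rng.random() for _ in range(n)]
        sims.append(pearson(ranks(x), ranks(y)))
    return sims


def empirical_quantile(values, q):
    """A simple empirical quantile: the sorted value at position floor(q * m)."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]


if __name__ == "__main__":
    r_s = simulate_null(n=10, m=100_000, seed=2024)
    print("approx. upper 5% point:", round(empirical_quantile(r_s, 0.95), 3))
    print("distinct values:", len({round(r, 6) for r in r_s}))
```

Because the sum of squared rank differences is always an even number between 0 and 330 when $n = 10,$ the simulated $r_S$ can take at most 166 distinct values.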
These simulated critical values $\pm 0.552$ are in the same ballpark as the tabled values $\pm 0.564,$ but the tabled values are not within the expected margin of simulation error. What's wrong?
Compensating for discreteness. The difficulty stems from the fact that the ranks are discrete, and this discreteness is inherited by $r_S.$ The R code `length(unique(r.s))` returns 162. That is, among the $m = 100,000$ simulated values of $r_S$ there are only 162 distinct values. This makes it difficult to pin down precise critical values. So if we were to use 0.552 to cut "5%" from the upper tail of the null distribution, we would actually cut somewhere between 4.9% and 5.3%. To be sure that less than 5% lies above the critical value, we have to go up to the next higher one of the 162 unique values of $r_S,$ which is 0.564.
So, because of the discreteness of the null distribution of $r_S,$ we cannot test at exactly the $2(0.05) = 10$% level. Instead, we need to use critical values $\pm 0.564,$ as in the printed table, and the test is then actually carried out at the $2(0.049) = 9.8$% level.
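The rule "move up to the next attainable value so that no more than the nominal tail probability is cut off" can be written down directly. The Python sketch below applies it to the exact null distribution obtained by enumerating all rank arrangements, assuming no ties. The function names are hypothetical, and $n = 7$ is used in the demo only to keep the run time short: $n = 10$ works the same way but needs all $10! = 3{,}628{,}800$ arrangements.

```python
from collections import Counter
from itertools import permutations


def exact_null_counts(n):
    """Count rank arrangements by D = sum of squared rank differences (no ties)."""
    ranks = range(1, n + 1)
    counts = Counter()
    for perm in permutations(ranks):
        counts[sum((x - y) ** 2 for x, y in zip(ranks, perm))] += 1
    return counts


def conservative_upper_critical(n, tail_prob):
    """Smallest attainable r_S whose exact upper-tail probability is <= tail_prob.

    Returns (critical value, attained tail probability), or None when even
    r_S = 1 is more likely than tail_prob.
    """
    counts = exact_null_counts(n)
    total = sum(counts.values())
    denom = n * (n * n - 1)
    best = None
    cumulative = 0
    for d2 in sorted(counts):  # small D corresponds to large r_S
        cumulative += counts[d2]
        if cumulative / total > tail_prob:
            break
        best = (1 - 6 * d2 / denom, cumulative / total)
    return best


if __name__ == "__main__":
    crit, attained = conservative_upper_critical(7, 0.05)
    print(f"n = 7: critical value {crit:.3f}, attained upper tail {attained:.4f}")
```

The attained tail probability is generally below the nominal one, which is the same effect as testing at 9.8% instead of 10% above. A `None` result (for example with $n = 4$ and a 1% tail) means no attainable value of $r_S$ is extreme enough for that level.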
Similarly, for a two-sided test at the 2% level, the table gives critical values $\pm 0.745,$ while the simulation gives $\pm 0.721.$ The exact significance level using $\pm 0.745$ is about $1.74$%, not $2.0$%.
Finally, I believe that the main discrepancies among printed critical values are due to rounding and discreteness, and would not disappear with larger-scale simulations.
The graph below shows the simulated null distribution of $r_S$ for $n=10.$ More iterations would make a slightly smoother graph, but would not change the critical values. The vertical red lines show approximate critical values for a two-sided test at the 10% level.