Kendall tau calculation


Can someone explain how Kendall's tau works? I can't seem to find a good explanation/tutorial/example. I've been running corr(x,y,'kendall') from Matlab's Statistics Toolbox, but beyond the output itself it doesn't give me any good intuition. I've been stepping through with the debugger, but it gets confusing at times. I know that when x and y are matrices they must have the same number of rows, but that's about it. Is there a simple example that will illuminate what the Kendall p-value and Kendall tau really are?


There are 3 best solutions below


I can't help with specific details on the Kendall tau rank correlation coefficient, but with respect to the Matlab implementation details, type edit corr in your Matlab command window. Below the help you'll see these

References:
   [1] Gibbons, J.D. (1985) Nonparametric Statistical Inference,
       2nd ed., M. Dekker.
   [2] Hollander, M. and D.A. Wolfe (1973) Nonparametric Statistical
       Methods, Wiley.
   [3] Kendall, M.G. (1970) Rank Correlation Methods, Griffin.
   [4] Best, D.J. and D.E. Roberts (1975) "Algorithm AS 89: The Upper
       Tail Probabilities of Spearman's rho", Applied Statistics,
       24:377-379.

and further down you'll find these explanations, which might be helpful

Spearman's rho is equivalent to the linear (Pearson's) correlation
between ranks.  Kendall's tau is equivalent to the linear correlation
between the "concordances" sign(x(i)-x(j))*sign(y(i)-y(j)), i<j, with
an adjustment for ties.  This is often referred to as tau-b.

Kendall's tau-b is identical to the standard tau (or tau-a) when there
are no ties.  However, tau-b includes an adjustment for ties in the
normalizing constant.

Spearman's rho and Kendall's tau are discrete-valued statistics, and
their distributions have positive probability at 1 and -1.  For small
sample sizes, CORR uses the exact permutation distributions, and thus,
the on-diagonal p-values from CORR(X,X) in those cases.

When there are ties in the data, the null distribution of Spearman's
rho and Kendall's tau may not be symmetric.  Computing a two-tailed
p-value in such cases is not well-defined.  CORR computes p-values for
the two-tailed test by doubling the smaller of the one-tailed p-values.

And of course all of the code is available for perusing...
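To make the tau-b description quoted above concrete, here is a minimal sketch in Python (used only for illustration; this is not Matlab's actual code, and the name `kendall_tau_b` is made up for this example). It takes the pairwise signs sign(x(i)-x(j)) and sign(y(i)-y(j)) for i<j and computes their normalized product sum, which is the usual tau-b formula: pairs tied in x or in y contribute a zero sign, so they shrink the normalizing constant.

```python
from itertools import combinations
from math import sqrt


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_b(x, y):
    """Kendall's tau-b from pairwise signs (illustrative sketch only)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = list(combinations(range(len(x)), 2))
    sx = [sign(x[i] - x[j]) for i, j in pairs]
    sy = [sign(y[i] - y[j]) for i, j in pairs]
    # Numerator: (concordant pairs) - (discordant pairs); ties contribute 0.
    num = sum(a * b for a, b in zip(sx, sy))
    # Denominator: sqrt((pairs not tied in x) * (pairs not tied in y)).
    den = sqrt(sum(a * a for a in sx) * sum(b * b for b in sy))
    if den == 0:
        raise ValueError("tau-b is undefined when all x or all y are tied")
    return num / den


x = [1, 2, 2, 3]
y = [1, 2, 3, 3]
print(kendall_tau_b(x, y))  # 4 / sqrt(5 * 5) = 0.8
```

Without ties, every sign is +1 or -1, the denominator becomes the total number of pairs, and the result coincides with tau-a.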


Kendall's $\tau$ gives you a nonparametric correlation measure between two variables $x$ and $y$. Nonparametric here means that it only uses the ordering of the values, so the association between $x$ and $y$ doesn't have to be, say, linear (unlike Pearson's correlation coefficient, which measures linear association).

It is given by (see the Wikipedia article on the Kendall rank correlation coefficient) $$\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\frac{1}{2} n (n-1) }.$$ Imagine you have $n$ observations of data, each a tuple $(x_i,y_i)$ of the two variables. Take any two of these observations; if "both values go in the same direction", the pair is concordant. Formally, two observations $i,j$ are concordant if either $x_i>x_j$ and $y_i>y_j$, or $x_i<x_j$ and $y_i<y_j$. They are discordant when $x$ and $y$ "go in different directions", that is, if either $x_i>x_j$ and $y_i<y_j$, or $x_i<x_j$ and $y_i>y_j$. The numerator of the above equation just subtracts the number of discordant pairs from the number of concordant pairs. The denominator, $\frac{1}{2} n (n-1)$, is the total number of pairs; it is just a normalization, so that $\tau$ is always in the interval $[-1,1]$, like the linear correlation coefficient.

The intuition in short: Kendall's $\tau$ is just the number of observation pairs where both variables go in the same direction, minus the number of observation pairs where the variables go in opposite directions, divided by the number of possible pairs. Hence, the more often both variables go in the same direction, the higher $\tau$.
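As a small worked example of the pair counting, here is a sketch in Python (for illustration only; the function name `kendall_tau_a` and the data are made up, and it assumes there are no ties in either variable).

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau by direct pair counting (sketch; assumes no ties)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: x and y move in the same direction for this pair.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau_a(x, y))  # 8 concordant, 2 discordant of 10 pairs -> 0.6
```

Here only the pairs (1st, 2nd) and (3rd, 4th) are discordant, so $\tau = (8-2)/10 = 0.6$.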


Came across this post and after using Kendall's $\tau$ a lot in the past few months, here is a resource that helped me.

Pearson's $r$ measures linear correlation, and Spearman's $\rho$ is Pearson's correlation applied to the ranks, but Kendall's $\tau$ is much more like a $\chi^2$ test in that it just measures the strength of a relationship. It kind of measures monotonicity; if the values of x and y "go" in the same direction more often than they "go" in different directions, expect a high $\tau$.

Once you have a $\tau$ and a p-value, you can think about your data in the following way.

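Let $P_c$ and $P_d$ be the probabilities that a randomly selected pair is concordant or discordant. Assuming no ties, $P_c + P_d = 1$ and $\tau = P_c - P_d$, which gives

$P_c/P_d = (1+\tau)/(1-\tau)$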

So if you have $\tau=0.6$, then a randomly selected pair of observations $(x_i, y_i), (x_j, y_j)$ is 4 times more likely to be concordant than discordant, or 4 times more likely to "go" together than "go" apart. The p-value (in the frequentist perspective) is the probability of seeing a $\tau$ at least as extreme as yours if $x$ and $y$ were actually unrelated, so a small p-value suggests that your $\tau$ is unlikely to be a false positive.
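To see what that p-value means for a tiny sample, here is a sketch in Python (for illustration only; the names `pair_score` and `exact_two_sided_p` are made up, and this is not how any particular library computes it). It assumes no ties and a small $n$, since it enumerates all $n!$ orderings of $y$: under the null hypothesis that $x$ and $y$ are unrelated, every ordering is equally likely, and the two-sided p-value is the fraction of orderings whose $\tau$ is at least as extreme as the observed one.

```python
from itertools import combinations, permutations


def pair_score(x, y):
    """Return (concordant pairs) - (discordant pairs), assuming no ties."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)
    return score


def exact_two_sided_p(x, y):
    """Exact permutation p-value for Kendall's tau (sketch; small n, no ties)."""
    observed = abs(pair_score(x, y))
    extreme = total = 0
    for shuffled in permutations(y):
        total += 1
        # Comparing integer scores is equivalent to comparing |tau|,
        # because tau is the score divided by a constant.
        if abs(pair_score(x, shuffled)) >= observed:
            extreme += 1
    return extreme / total


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # tau = 0.6 for this data
print(exact_two_sided_p(x, y))  # 28 of 120 orderings -> about 0.233
```

So with only five observations, a $\tau$ of 0.6 would arise by chance roughly 23% of the time, which is why the same $\tau$ can be unconvincing for a small sample and highly significant for a large one.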