I'm studying an article about similarity between trees from classification procedures, and I cannot understand why the author uses one of the terms in his formula:
$$\frac{1}{M(M-1)} \sum_{1 \le i < j \le M} d(A_i, A_j)$$
In his words:
An heuristic motivation for this expression comes from its relationship with the alternate expression for the variance of a set of observations $\{X_1, \ldots, X_n\}$ given, for example, in Serfling (1980):
$$ n^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n(n-1)} \sum_{1 \le i < j \le n} (X_i - X_j)^2$$
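To get a feel for the quoted relationship, here is a minimal numerical sketch. It assumes the summand on the right-hand side is the squared difference $(X_i - X_j)^2$; the data values and the helper name `pairwise_sq_sum` are made up for illustration and do not come from the article.

```python
from itertools import combinations
from statistics import pvariance, variance


def pairwise_sq_sum(xs):
    """Sum of (x_i - x_j)^2 over all unordered pairs i < j."""
    return sum((a - b) ** 2 for a, b in combinations(xs, 2))


# Made-up observations, purely for illustration.
xs = [2.0, 4.0, 4.0, 7.0, 9.0]
n = len(xs)
s = pairwise_sq_sum(xs)

# Pairwise sum scaled by 1 / (n (n - 1)).
scaled_n_n_minus_1 = s / (n * (n - 1))

# Pairwise sum scaled by 1 / n^2.
scaled_n_squared = s / n**2

print(scaled_n_n_minus_1, variance(xs))   # compare with the (n - 1)-denominator variance
print(scaled_n_squared, pvariance(xs))    # compare with the n-denominator variance
```

In this sketch the $\frac{1}{n(n-1)}$ scaling agrees with the variance that uses the $n-1$ denominator, while a $\frac{1}{n^2}$ scaling agrees with the $n^{-1}$ form. This follows from $\sum_{i<j}(X_i - X_j)^2 = n \sum_i (X_i - \bar{X})^2$.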
I understand the sum: in the general case it is the sum of all pairwise distances, and in my case the distance is a dissimilarity (a type of distance).
However, why is the factor $\frac{1}{n(n-1)}$ used? I would like an easy, intuitive explanation.
(I know this is probably one of the easiest things in the world.)