Consider the following function: $$ P = \frac{1}{n(n - 1)} \sum_{j=1}^k n_{j} (n_{j} - 1) $$ where $n = \sum_{j=1}^k n_{j}$.
Intuitively, this function measures how concentrated the values of the vector $(n_1, \dots, n_k)$ are. Take the edge cases:
Values concentrated: $\exists j, n_j = n$ (in other words $\forall i \neq j, n_i = 0$) $\Rightarrow P = 1.0$
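Least concentration (uniform distribution): $n_1 = n_2 = \dots = n_k = n/k \Rightarrow P = \frac{n/k - 1}{n - 1}$, which is roughly $1/k$ for large $n$ and equals $0.0$ only when every $n_j = 1$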
The formula looks relatively simple, but its interpretation is not obvious. I am looking for better/simpler ways to explain it to people who are not mathematically sophisticated (say, colleagues in my psychology department who know the basics of statistics such as mean and variance). I would appreciate any suggestions.
Suppose we have sets $A_1, \dots, A_k$ that are pairwise disjoint, and for each $j$ let $n_j = |A_j|$.
Choose two distinct elements at random from $A = A_1\cup A_2\cup \dots \cup A_k$ (that is, draw twice without replacement). Then the probability that they come from the same set is the expression you wrote.
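To see why, count ordered pairs: there are $n(n-1)$ ways to draw two distinct elements from $A$, and $n_j(n_j-1)$ of them have both elements in $A_j$. If it helps to show this to a non-mathematical audience, here is a minimal Python sketch that checks the formula against a brute-force count of pairs. The function names and the example counts `[3, 2, 1]` are made up for illustration and are not part of the question.

```python
from fractions import Fraction
from itertools import combinations


def same_group_probability(counts):
    """Formula from the question: sum of n_j (n_j - 1), divided by n (n - 1)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two elements to draw a pair")
    return Fraction(sum(c * (c - 1) for c in counts), n * (n - 1))


def same_group_probability_by_enumeration(counts):
    """Brute force: list every unordered pair and count those sharing a group."""
    # Label each element with the index of the group it belongs to.
    labels = [j for j, c in enumerate(counts) for _ in range(c)]
    pairs = list(combinations(range(len(labels)), 2))
    same = sum(1 for a, b in pairs if labels[a] == labels[b])
    return Fraction(same, len(pairs))


counts = [3, 2, 1]  # hypothetical group sizes n_1, n_2, n_3
print(same_group_probability(counts))                 # 4/15
print(same_group_probability_by_enumeration(counts))  # 4/15
```

Both functions return exact fractions, so they can be compared directly. With everything in one group the result is 1, and with every element in its own group it is 0.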