Interpreting an agreement function


Consider the following function: $$ P = \frac{1}{n(n - 1)} \sum_{j=1}^k n_{j} (n_{j} - 1) $$ where $n = \sum_{j=1}^k n_{j}$.

Intuitively, this function measures how concentrated the values in the vector $(n_1, \dots, n_k)$ are. Take the edge cases:

  • Values concentrated: $\exists j, n_j = n$ (in other words $\forall i \neq j, n_i = 0$) $\Rightarrow P = 1.0$

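  • Least concentration (uniform distribution): $n_1 = n_2 = \dots = n_k = n/k \Rightarrow P = \frac{n - k}{k(n - 1)}$, which equals $0.0$ only when every $n_j = 1$ and is close to $1/k$ when $n$ is large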

The formula looks relatively simple, but it is not obvious how to interpret it. I am looking for better or simpler ways to explain it to people who are not mathematically sophisticated (say, colleagues in my psychology department who know basic statistics such as mean and variance). I would appreciate any suggestions.


There are 3 best solutions below

0
On BEST ANSWER

Suppose we have sets $A_1, \dots, A_k$ which are pairwise disjoint, and for each $j$ let $n_j = |A_j|$.

We randomly choose two distinct elements from $A = A_1 \cup A_2 \cup \dots \cup A_k$ (that is, we draw without replacement). Then the probability that they come from the same set is the expression you wrote.
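A quick way to convince a non-mathematical audience of this reading is to enumerate every pair by brute force and compare with the formula. The following is only a sketch: the function names `agreement` and `same_set_probability` and the set sizes `[3, 2, 1]` are made up for illustration and do not come from the question.

```python
from fractions import Fraction
from itertools import combinations


def agreement(counts):
    """P = sum_j n_j (n_j - 1) / (n (n - 1)), returned as an exact fraction."""
    n = sum(counts)
    return Fraction(sum(c * (c - 1) for c in counts), n * (n - 1))


def same_set_probability(counts):
    """Enumerate all unordered pairs of distinct elements and count the
    pairs whose two members belong to the same set."""
    # One label per element: the index of the set it belongs to.
    labels = [j for j, c in enumerate(counts) for _ in range(c)]
    pairs = list(combinations(labels, 2))
    same = sum(1 for a, b in pairs if a == b)
    return Fraction(same, len(pairs))


counts = [3, 2, 1]  # illustrative set sizes (assumption, not from the post)
print(agreement(counts), same_set_probability(counts))
```

Both values agree for these sizes, and the same check can be rerun with any counts whose total is at least 2.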

0
On

You can write your $P$ in terms of $C$, and vice versa, where $C=\sum_{i=1}^k (n_i-n/k)^2/(n/k)$ is the conventional chi-square statistic for testing whether the vector of $n_i$ values comes from the flat multinomial model in which all categories are equally likely. So, for fixed $n$ and $k$, $P$ tells you no more and no less than $C$ does, but your audience might be familiar with the chi-square statistic as it is commonly used to measure unevenness of category counts.
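One way to make the correspondence explicit, offered as a sketch worked out from the two definitions rather than something stated in the answer: expanding the square gives $C = \frac{k}{n}\sum_{i=1}^k n_i^2 - n$, while $\sum_{i=1}^k n_i(n_i - 1) = \sum_{i=1}^k n_i^2 - n$. Eliminating $\sum_i n_i^2$ between the two yields $$ P = \frac{C + n - k}{k(n - 1)}, \qquad C = k(n - 1)P - (n - k). $$ For fixed $n$ and $k$ this is an increasing linear relationship, so ranking samples by $P$ is the same as ranking them by $C$. As a check, when all $n$ observations fall in one category, $C = n(k - 1)$ and the formula gives $P = 1$.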

0
On

Probability of drawing an element of category $j$ twice in two draws without replacement: $$ p_j=\left(\frac{n_j}{n}\right)\left(\frac{n_j-1}{n-1}\right)=\frac{n_j(n_j-1)}{n(n-1)} $$ Since we do not care which $j$ it is, and these events are mutually exclusive, we sum over $j$: $$ p=\sum_j p_j=\frac{1}{n(n-1)}\sum_j n_j(n_j-1) $$

which is the probability that both draws come from the same category, whichever category that is.
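A small worked example may help when presenting this. The counts $(n_1, n_2, n_3) = (3, 2, 1)$ are made up for illustration, so $n = 6$: $$ p_1 = \frac{3 \cdot 2}{6 \cdot 5} = \frac{6}{30}, \qquad p_2 = \frac{2 \cdot 1}{6 \cdot 5} = \frac{2}{30}, \qquad p_3 = \frac{1 \cdot 0}{6 \cdot 5} = 0, $$ $$ p = p_1 + p_2 + p_3 = \frac{8}{30} = \frac{4}{15} \approx 0.27. $$ A category with a single element contributes nothing, because it cannot be drawn twice without replacement.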