Expected number of $1$s in a "similarity matrix"

41 Views Asked by At

There are $p$ points given (the $n$th point is called $p_n$). These points are split into exactly $n$ clusters, where $1\le n\le p$. Of course there are many possibilities for the cluster sizes.

Now there is a matrix $M$ (size: $p\times p$) for which this statement is true: $M_{x,y}$ is $1$ if $p_x$ is in the same cluster as $p_y$, otherwise the value is $0$.

My question is now: What is the expected number of $1$s in the matrix $M$ given the facts that there are $p$ points and $n$ clusters?

Thank you

1

There are 1 best solutions below

0
On

You have to say more about the way points are assigned to clusters. Let $S_k$ be the size of the $k$-th cluster. Obviously $\sum S_k = M$, and you require $S_k\ge 1$. Your number of $1$s is $\sum_k S_k^2$, and can be as large as $ (M-(n-1))^2+n-1$ (if you assign $n-1$ of your points to singleton clusters and the rest to the same, big, cluster) and approximately as small as $n(M/n)^2 = M^2/n$ if you make the clusters have equal size. (This last is a lower bound which can be achieved only if $M$ is divisible by $n$.)