Counting sets by their connectedness

393 Views Asked by At

Let $U = \{u_1, u_2, \ldots , u_m \}$ where each $u_i$ is an $r$-subset of $[n]$ and $\,\bigcup u_i \!=\! [n]$. Construct the intersection graph of $U$. That is, let node $i$ correspond to $u_i$ and add edge $(i,j)$ if $u_i \cap u_j \neq \emptyset$. For example, suppose $n=8$ and $U = \{123, 124, 234, 567, 568\}$. The intersection graph is

$\qquad \qquad \qquad \qquad \qquad \qquad \quad $ enter image description here

We notice there are two connected components, ie. $\mathcal{C}(U)=2$. Given $n,r,m$ how many sets have exactly $c$ connected components?

$$ \begin{array}{ll} T_{n,r,m}(c) &= \displaystyle \left[x^c\right] \sum_{\left|U\right| = m} x^{\mathcal{C}(U)} \\ &= \displaystyle \left[y^m x^c\right] \sum_U y^{\left|U\right|} x^{\mathcal{C}(U)} \end{array} $$

Can we come up with an expression for $T_{n,r,m}(c)$ or at least some way to efficiently compute it? I would be happy with an algorithm where $n \le 50, r \le 10$ is tractable. I've enumerated all the nonzero values for $n\le6$ to use for verification.


I've posted a partial answer that reduces the problem to counting $T_{n,r,m}(1)$, the sets with a fully connected graph.

2

There are 2 best solutions below

7
On BEST ANSWER

New Notations: Always fix $r$. Suppose $L=(c_1, c_2, \ldots, c_l)$ is a list of non-decreasing natural numbers. Denote by $|L|$ ($=l$) the length of the list and $\sum L = c_1 + \ldots + c_l$ the sum of entries in $L$. Let $T(n, L)$ count the number of sets $U = \{u_1, \ldots, u_{m}\}$, where $m=\sum L$, each $u_i$ is an $r$-subset of $[n]$ and $\cup U = [n]$, whose intersection graph have exactly $l$ connected components with sizes $c_1, c_2, \ldots, c_l$.

Relation with the Old Notation: $$T_{n,m}(c)=\sum_{|L| = c, \sum{L}=m}T(n, L).$$

Inductive Property:

  1. If $L = L_1+L_2$ (list concatenation) such that $L_1 < L_2$ (i.e., all entries in $L_1$ is less than the ones in $L_2$), then $$T(n,L)=\sum_{n_1+n_2=n} {n\choose n_1} T(n_1, L_1)T(n_2, L_2).$$
  2. Suppose $L = (c,c,\ldots, c) =: (c)^l$, where $l=|L|$. We have $$T(n, L)=\sum_k\sum_{l_1 + \ldots + l_k = l}\frac{1}{l_1!\ldots l_k!}\sum_{\ n_1 < \ldots < n_k}\frac{n!}{(n_1!)^{l_1}\ldots(n_k!)^{l_k}}1_{l_1n_1 + \ldots + l_kn_k=n}T(n_1, (c))^{l_1}\ldots T(n_k, (c))^{l_k}.$$

Boundary Condition: $T(n, L)=0$ if $n < r$ or $n > r|L|$.

Restriction Formula: Note that the total number of sets $U = \{u_1, \ldots, u_m\}$ each $u_i$ is an $r$-subset of $[n]$ is equal to $${{n\choose r}\choose m}.$$ On the other hand, one can count it using $T_{n,m}(c)$ according to $\cup U$. We get the total number of $U$ is equal to $$\sum_{n_0\leq n}{n\choose n_0}\sum_cT_{n_0, m}(c) = \sum_{n_0\leq n}{n\choose n_0}\sum_l\sum_{|L| = l, \sum{L}=m}T(n_0, L).$$ Therefore we have $$\sum_{n_0\leq n}{n\choose n_0}\sum_l\sum_{|L| = l, \sum{L}=m}T(n_0, L) = {{n\choose r}\choose m}.$$ We rewrite it as $$T(n,(m)) = {{n\choose r}\choose m} - \sum_{n_0 < n}{n\choose n_0}\sum_l\sum_{|L| = l, \sum{L}=m}T(n_0, L) - \sum_{l>1}\sum_{|L| = l, \sum{L}=m}T(n, L)$$

Punch Line: Define $(n_0, L_0) < (n, L)$ if $n_0 < n$ or $n_0 = n$ but $|L_0| > |L|$. Clearly, one can use the inductive properties, the boundary condition and the restriction formula to compute $T(n,L)$ once $T(n_0,L_0)$ are given for all $(n_0, L_0)< (n,L)$ (call $(n,L)$ complexity of $T(n,L)$) since in all formulas involved the complexity on the right hand side is always less than the complexity on the left hand side.

Update: For computational reasons (as OP mentioned in the comment), one may want to replace the inductive property 2 by a simpler recursion $$T(n,(c)^l)=\sum_{k}{n-1\choose k-1}T(k,(c))T(n-k,(c)^{l-1}).$$ Again the idea in the inductive properties is indeed based on OP's answer.

6
On

Here is a partial answer, which reduces the problem to counting $T_{n,r,m}(1)$, that is, counting the number of sets that have a connected intersection graph ($c=1$). The reduction occurs through a recurrence, with $c=1$ corresponding to the base cases. First let me change the notation. Let $r$ be given as before. Since $r$ is held fixed, I suppress it as an index.

$$ \; $$

  • $T_{n,m}^{c,\ell}$ : This is the same as the old notation $T_{n,r,m}(c)$ with the added constraint that each connected component has less than $\ell$ nodes.

  • $T_{n}^{c\,\times\,\ell}$ : constrains the graph to have exactly $c$ connected components of size $\ell$. The number of nodes is suppressed as an index since it is forced to equal $c\ell$.

  • $T_{n,m} = T_{n,m}^{1,\, \ell_{ > m}} = T_{n}^{1 \, \times \, m}$ : counts the number of connected graphs on $m$ nodes.

$$ \; $$

Remember that for all three of these, each node corresponds to an $r$-subset and their union must equal $[n]$. To compute $T_{n,m}^{c,\ell}$, let $\gamma$ denote the size of the largest connected component, let $\varsigma$ be the number of components of size $\gamma$, let $\kappa$ be the the number of elements covered by such components, and then recurse.

$$ T_{n,m}^{c,\ell} = \sum_{\gamma \,<\, \ell} \; \sum_{\varsigma \,\le\, c} \; \sum_{\kappa \, \le \, n} {n \choose \kappa}\; T_{\kappa}^{\varsigma \, \times \, \gamma} \; T_{n-\kappa,\,m-\gamma\varsigma}^{c-\varsigma,\,\gamma} $$

Note the summation limits can often be pruned. The ${n \choose \kappa}$ picks the elements of $[n]$ covered by the large components. To compute $T_{n}^{c \, \times \ell}$ for $c>1$ we enforce that the "first" component to be the one that covers element "1" and we suppose that it covers $\kappa$ total elements. Then we recurse.

$$ T_{n}^{c \, \times \, \ell} = \sum_{\kappa \, < \, n} {n-1 \choose \kappa - 1} \; T_{\kappa, \ell} \; T_{n-\kappa}^{(c-1) \, \times \, \ell} $$

Thus the problem has been reduced to counting the connected graphs $T_{n,m}$.

Update @roy's restriction formula: by summing over $c$ and picking out the term $T_{n,m}$ we get the expression

$$ T_{n,m} \; = \; {{n \choose r} \choose m} \; - \; \sum_{\kappa \, < \, n} \sum_{c} {n \choose \kappa} T_{\kappa,m}^{c, \infty} \; - \; \sum_{c \, > \, 1} T_{n,m}^{c,\infty}$$

note the third term on the RHS does not produce a $T_{n,m}$ (so there is no cancellation).