Size of a union of sets when only their sizes and those of pairwise intersections are given

157 Views Asked by At

Consider a finite set $X$ and a set of subsets $X_i \subset X$ with $\bigcup_i X_i = X$.

Assume that only the sizes $|X_i|$ and $|X_i \cap X_j|$ are given.

Obviously $\sum_i |X_i| - \sum_{ij}|X_i \cap X_j| \leq |X| \leq \sum_i |X_i|$.

Are there tighter lower and upper bounds for $|X|$?

Edit: To be more specific, assume that a matrix $A = \{\alpha_{ij}\}$ is given, with $\alpha_{ij} = |X_i \cap X_j|$, esp. $\alpha_{ii} = |X_i|$. Is there a formulaic matrix invariant $F(A)$ with $F(A) = |\bigcup_i X_i|$?

Here are some examples, describing families of four 4-element sets:

$F\left(\begin{array}{cccc}4&0&0&0\\0&4&0&0\\0&0&4&0\\0&0&0&4\end{array}\right) = 16\ \ \ \ \ \ \ \ \ \ F\left(\begin{array}{cccc}4&1&0&0\\1&4&2&0\\0&2&4&1\\0&0&1&4\end{array}\right) = 12\ \ \ \ \ \ \ \ \ \ F\left(\begin{array}{cccc}4&2&1&2\\2&4&2&1\\1&2&4&2\\2&1&2&4\end{array}\right) = 9$

$F\left(\begin{array}{cccc}4&1&1&1\\1&4&1&1\\1&1&4&1\\1&1&1&4\end{array}\right) = 13\ \ \ \ \ \ \ \ \ \ F\left(\begin{array}{cccc}4&2&2&2\\2&4&2&2\\2&2&4&2\\2&2&2&4\end{array}\right) = 10\ \ \ \ \ \ \ \ \ \ F\left(\begin{array}{cccc}4&1&0&1\\1&4&1&0\\0&1&4&1\\1&0&1&4\end{array}\right) = 12$

1

There are 1 best solutions below

1
On BEST ANSWER

There are sharper upper bounds than the one you wrote, at least when some of the intersections $X_i\cap X_j$ are non empty (if they are all empty, then of course your lower and upper bounds are the same, and sharp). If $P$ is any set of disjoint pairs of the index set$~I$, then $|X|\leq \sum_{i\in I}-\sum_{\{i,j\}\in P}|X_i\cap X_j|$, since if you give any $x\in X$ a weight $|\{\,i\in I\mid x\in X_i\,\}|-|\{\,\{i,j\}\in P\mid x\in X_i\cap X_j\,\}|$, then all weights are${}\geq1$ and the sum of all weights is an upper bound for $|X|$. The partition of this type that gives you the sharpest upper bound depends on your data, and might not be easy to find.

But one can do even better, I think. If you allow for the set $P$ of pairs to be the edges of any forest on the the index set (with a single tree being the best option) then you still get that any $x\in X$ gets a weight${}\geq1$, since the weight of any $x\in X$ is the number of vertices minus the number of edges in the induced subgraph on $\{\,i\in I\mid x\in X_i\,\}$, and since that subgraph is a forest, the difference is at least$~1$.

The best upper bound of this type would be the one associated to a maximum spanning tree on $I$ for the edge weights $|X_i\cap X_j|$, which can be found as a minor variation of the problem of finding a minimum spanning tree; that problem apparently has been extensively studied.