Can one nontrivially bound intersection size with a union of sets given only pairwise intersection sizes?

348 Views Asked by At

Say you have some finite, non-empty sets $S_1 \ldots S_n$. You are not able to inspect the contents of these sets directly, or even produce their unions or intersections; the only way you can learn about them is through the function $P$, defined as: $$ P(S_i, S_j)=\frac{|S_i \cap S_j|}{|S_i|} $$

In other words, $P$ tells you what fraction of $S_i$ is also contained in $S_j$—i.e., it captures precision.

What I'd ideally like to be able to do is compute $$1 - P\left(S_n, \bigcup_{i=1}^{n-1} S_i\right),$$ AKA how much of $S_n$ is novel with respect to $S_1 \ldots S_{n-1}$.

Of course, this is impossible given only the pairwise $P$ function; an exact result would also need to account for $k$-way intersections between the various sets.

The question, though, is whether it's possible to compute some nontrivial bounds on this value. Intuitively, it seems like all these pairwise intersection sizes should give you a decent amount of information about size of the union of $S_1 \ldots S_{n-1}$. What are the tightest bounds available on the expression above?

Relatedly, does it change anything if we are able to access $|S_i|$? For example, I've worked out (tentatively) that in the case of $n=3$, $$ P(S_3, S_1 \cup S_2) \geq \frac{|S_3 \cap S_1| + |S_3 \cap S_2| - \min(|S_1 \cap S_2|, \max(|S_1 \cap S_3|, |S_2 \cap S_3|))}{|S_3|} $$

And given the $|S_i|$s, all of these values should be computable from the $P(S_i, S_j)$. But I'm still not sure how to generalize this to larger $n$, and in particular whether the bound would become tighter or looser as $n$ grows.

1

There are 1 best solutions below

1
On

I'm not a set expert, so there may be a tighter bound. But one nontrivial bound (if you can assume all sets are the same size, if not then I'm not sure about a possible bound) is $$1 - \sum_{i=1}^{n-1} P(S_n,S_i)$$

As an example, assume there are 3 sets. Set 1 and Set 2 have a 1/3 overlap and Set 1 and Set 3 have a 1/4 overlap. in the worst case, Set 2 and Set 3 are disjoint, so the novel part of Set 1 must be equal to or less than $1- (\frac{1}{4} + \frac{1}{3})$. $(1-\frac{7}{12}) = \frac{5}{12}$. So $\frac{1}{3}\leq 1 - P(S_1,\bigcup_{i=2}^3 S_i) \leq \frac{5}{12}$ for this example. Since the overlap cannot be negative, any value less than 0 would be calculated as 0 or greater.

You could also calculate a tighter bound recursively. If you know that $P(S_2,S_3)$ is .9, then even if $P(S_1, S_3)$ is greater than .1, it can only add at most .1 to the value of $P(S_1, S_2)$.

Sorry if the bound isn't tight enough, but hopefully this bound will help. And it shows there is a nontrivial solution, if an additional assumption is made.