Estimate size of union of non distinct finite sets which size is only known as an aggregate

31 Views Asked by At

I have $n$ unknown finite sets $A_1,A_2,...,A_n$. I only know:

  • $|A_n| = X_n$
  • $ |\cup^n_{x=1}A_x| = X$
  • $ |A_x \cup A_y| \leq |A_x| + |A_y|\ with\ x,y \in \{1,...,n\}$

Can I estimate the size of $ \cup^6_{x=0}A_x $ better than using the inequality above.

My current approach would be running a simulation but I would expect a mathematical approach that is simpler and probably a less ad-hoc.

2

There are 2 best solutions below

6
On BEST ANSWER

Are you looking for a probabalistic answer?

Without any additional knowledge about the sets, we assume the elements are random. If all the sets were disjoint then we would have $|\cup_{x=1}^n A_x|=\sum_{x=1}^n |A_x|:=Y$, that is $X=Y$. But if we have non-empty intersections, then $X<Y$. Hence, we could say, that the probably a random element from a random set will repeat in another set is $\frac{Y-X}{Y}$.

This would then appear to form a binomial calculation based on how large $n$ is and how large each set is. For example, given a set $A$, how much of $A$ will repeat in another set $B$. Well for each element in $A$ the odds it will repeat in any of the sets is $\frac{Y-X}{Y}$, so the odds it will repeat specifically in $B$ is $\frac{Y-X}{Y}\frac{|B|}{Y-|A|}$.

Continuing in this fashion we could find what the expected size of the union of two random sets is. Similarly we could go through the calculations for 3 random sets unioning, and 4, etc.

2
On

Since you mentioned using computer simulation, I assume your sets are finite.

The last condition you have does not give any information, since for any sets we have $$ | A \cup B | + | A \cap B | = | A | + | B | $$ so your inequality is trivial and you don't have any information about sets relation at all. The best you can do is to bound $$ \max\left\{\max_{i \leq 6}|A_i|, X - \sum_{i=7}^n|A_i|\right\}\leq|\bigcup_{i = 1}^6 A_i|\leq \min\left\{\sum_{i =1}^6|A_i|, X\right\}. $$

If $n > 6$ you can easily find examples which show that those bounds are the best you can get.