Union of sets with pairwise intersection having half of the elements

1k Views Asked by At

Consider $k$ sets $S_1, S_2, \ldots, S_k$, of the following properties:

  • For every $i$, $\left|S_i\right| = p$
  • For every pair of $i$ and $j$, $\left|S_i \cap S_j\right| = \frac{p}{2}$

Now I need to find the minimum of $\left|\bigcup S_i\right|$ in terms of $k$ and $p$.

(Assume $p$ is large enough to be divisible by any small enough integers)

Via trial and error I suspect the answer to be $$\frac{\left(2^{m + 1} - 1\right)p}{2^m}$$ where $m = \lfloor\log_2 k\rfloor$, but cannot prove it.

That's also why I give that assumption about $p$, so that one do not need to worry about the situation where $\frac{p}{2^m}$ is not an integer.


Edition: actually, I verified that my suspected answer is wrong. For example, when $k = 5$, I managed to construct the sets using $\frac{5p}{6}$ elements. Now I am not sure what the actual answer should be. :(


Correction: I think I had mistaken $p$ with another variable I used in a larger problem where this proof is required, when giving the $\frac{5p}{6}$ lower bound in the edition above. It should be $\frac{5p}{3}$.

An example would be when $k = 5$ and $p = 6$ (to be divisible by both $2$ and $3$), then my $5$ sets would be $$\left\{1, 2, 3, 4, 5, 6\right\}, \left\{1, 2, 3, 7, 8, 9\right\}, \left\{1, 4, 6, 7, 9, 10\right\}, \left\{2, 4, 5, 7, 8, 10\right\}, \left\{3, 5, 6, 8, 9, 10\right\}$$ using $10$ elements. They are obtained basically as I try to "average" the occurrences of each number (in this example, specifically, letting each number appear $3$ times). Maybe we can go from here, but how to prove in general that this average is reachable?

2

There are 2 best solutions below

0
On

Not a full answer (or perhaps: answering a different question)

Discussion: For shorthand, write $U = |\bigcup S_i|$. Let $f(k,p)$ denote the minimum $U$, as a function of both $k$ (integer) and $p$ (even integer). There is no decomposition $f(k,p) = g(k) \cdot p$ that holds exactly for all $k,p$. (E.g. if $p=2, k=5$, the only solution I can think of has $U = 6$ where every set has 1 element unique to itself and 1 element common to every set, so $g(5) = 6/2=3$. Meanwhile the OP showed an example for $p=6, k=5, U=10$ so $g(5) = 10/6 = 5/3$.) For that reason, I also find it strange to say lets "assume" $p$ is large and highly composite... unless what you are actually interested in minimizing the ratio $U/p$, e.g. for large $p$, or for optimal $p$. E.g., were you perhaps interested in $h(k) = \min_p {f(k,p) \over p}$, i.e. the minimum ratio (that holds for some $p$, i.e. minimized over all possible even integers $p$)?

Anyway, depending on whether you want to minimize universe size $U$, or minimize the ratio $U/p$, the answers may be different. The rest of this post shows 2 separate families of solutions, one better at minimizing $U$ and the other better at minimizing $U/p$.

Family 1: based on $d$-bit binary vectors

Consider $d$-bit binary vectors $v \in \{0,1\}^d$. Let $D=\{1, 2, ..., d\}$ denote the bit positions. Let $T$ be a non-empty subset of $D$. There are $2^d - 1$ such subsets, which we name $T_1, T_2, ..., T_{2^d -1}$.

  • Define $S_i = \{v \in \{0,1\}^d: \sum_{t \in T_i} v_t = 1 \pmod 2\}$. I.e., for any vector $v$, sum up (modulo 2) all the bits $v_t$ at positions $t \in T_i$, and if the sum is $1$ then the vector belongs in $S_i$, otherwise it doesnt. E.g. for $T_i = \{2,3,5\}$, any vector $(*,1,1,*,1,*,*) \in S_i$ but any vector $(*,1,0,*,1,*,*) \notin S_i$, where $*$ denotes wildcard ($0$ or $1$).

  • For any $i$, it is obvious that exactly 1/2 of all vectors are in $S_i$, i.e. $|S_i| = 2^{d-1} = p$.

  • I think it's also true (although I cannot think of a short proof) that for any $i, j$, exactly 1/4 of all vectors are in $S_i \cap S_j$, i.e. $|S_i \cap S_j| = 2^{d-2} = p/2$.

Thus the sets $S_i$ meet the OP's requirements. There are $2^d - 1$ such sets, and of course you don't have to use all of them. In other words we have constructed examples, parameterized by $d$, where:

  • individual set size $p = 2^{d-1}$
  • no. of sets $k \le 2^d - 1$
  • universe size $U = 2^d - 1$, because every vector is in the union EXCEPT the zero vector
  • ratio $U/p = {2^d - 1 \over 2^{d-1}}$, akin to the original formula the OP suggested.

Family 2: based on $m$-choose-$n$

Let me now exhibit a different family of solutions generalizing the OP's $k=5, p=6$ example.

Suppose you have $m$ distinct objects $\{1, 2, ..., m\}$ and you choose $n<m$ of them. There are of course $L={m \choose n}$ ways to do this, i.e. there are $L$ such size-$n$ subsets. Order these $L$ subsets lexicographically and call them $T_1, T_2, ... T_L$. E.g. if $n=3$ then $T_1 = \{1,2,3\}, T_2 = \{1,2,4\}, ..., T_L = \{m-2, m-1, m\}$, etc. Now form the following $m\times L$ matrix $A$:

  • $A_{ij}=1$ if object $i \in T_j$.
  • $A_{ij}=0$ otherwise.

I.e. the $j$th column of $A$ is the Boolean vector for membership in $T_j$. Now switch your perspective and...

  • Identify the rows of $A$ as the OP's $S_i$ sets.

  • Identify the set of columns as the universe $\bigcup S_i$, and in particular, $L = |\bigcup S_i| = U$.

The following statements hold:

  • Each row contains the same number of $1$s: indeed, $\forall i, \sum_j A_{ij} = {m-1 \choose n-1}$, independent of $i$. This is the number of subsets $T_j$ that contains $i$, which equals the number of ways to choose the remaining $n-1$ elements out of the remaining $m-1$ possibilities. Since we identify each row $i$ as the OP's set $S_i$, this means $|S_i| = p = {m-1 \choose n-1}$.

  • For any two rows $i, i'$, their intersection contains the same number of $1$s: indeed, $\forall i, i', \sum_j (A_{ij}A_{i'j}) = {m-2 \choose n-2}$, independent of $i, i'$. This is the number of subsets $T_j$ that contains both $i$ and $i'$, which equals the number of ways to choose the remaining $n-2$ elements out of the remaining $m-2$ possibilities. Since we identify rows $i, i'$ as the OP's sets $S_i, S_{i'}$, this means $|S_i \cap S_{i'}| = p/2 = {m-2 \choose n-2}$.

  • Now we solve: ${m-2 \choose n-2} = {1 \over 2} {m-1 \choose n-1} \implies n = 1 + {m-1 \over 2} = {m+1 \over 2}$.

In other words, parameterized by odd integer $m$, we can define $n = {m+1 \over 2}$ and we have exhibited a collection of sets (rows of $A$) where:

  • individual set size $p={m-1 \choose n-1}$
  • no. of sets $k \le m$
  • universe size $U = L = {m \choose n}$, the no. of columns of $A$
  • ratio $U/p = {m \choose n} / {m-1 \choose n-1} = m/n = {2m \over m+1}$

Examples and comparisons:

The follow table lists, for several choices of $k$, the "best" examples offered by each family and the resulting $p, U, U/p$. (In each family, the best is obtained by taking the smallest $d$ or $m$ possible.)

                Family 1             Family 2
                --------             --------
     k     d   p   U   U/p      m   n   p   U   U/p
   ---     ---------------      -------------------
     3     2   2   3   3/2      3   2   2   3   3/2
   4,5     3   4   7   7/4      5   3   6  10   5/3
   6,7     3   4   7   7/4      7   4  20  35   7/4
   8,9     4   8  15  15/8      9   5  70 126   9/5
  10,11    4   8  15  15/8     11   6 252 462  11/6
  12,13    4   8  15  15/8     13   7   .   .  13/7
  14,15    4   8  15  15/8     15   8   .   .  15/8

As can be seen, Family 1 generally has much smaller $U$ (except equality when $k=3$), but Family 2 has smaller $U/p$ ratio (except equality when $k=3, 6,7, 14, 15$, and if I may extrapolate: $k= 30, 31, 62, 63,$ etc)

2
On

Some estimation:

Let $$M:=\bigcup S_i = \{1,2,...,n\}$$ Let $d_i$ be a number of sets that $i\in M$ is in them. Then we have $$\sum_{i=1}^n d_i =k\cdot p$$ and $${k\choose 2}{p\over 2} = \sum _{i=1}^n{d_i\choose 2} $$

By Jensen we have:

$$\sum _{i=1}^n{d_i\choose 2} \geq {{1\over n}(\sum d_i)^2-(\sum d_i)\over 2}$$

so we have $$ {k\choose 2}{p\over 2}\geq {{1\over n}(kp)^2-(kp)\over 2}$$ and thus $${k-1\over 2}\geq {1\over n}kp-1$$ so $$n\geq {2kp\over k+1}$$


But I don't know how to find a configuration with $n=\Big[{2kp\over k+1}\Big]$. However, for odd $k$ equality is achieved if $d_1=d_2=...=d_n = {k+1\over 2}$