Minimum size of set intersection

111 Views Asked by At

There was a survey about 5 kinds of common diseases and 100 people took the survey. 72 people have the first kind of disease, and 68, 66, 59, 82 for other 4 diseases respectively. People with at least 3 diseases took another more in-depth survey. What is the minimum number of participants who took another survey? Four possible answers are as follows.

A.48

B.49

C.50

D.51

I tried to achieve the answer but was not even close. Here's my reasoning.

Let set $A$ be all the 100 people, i.e., $|A|$=100. And let set $A_i$ be the people who have the $i^{th}$ disease, where $A_i \subset A, i \in \{1,2,3,4,5\} $, $|A_1|$=72, $|A_2|$=68, $|A_3|$=66, $|A_4|$=59 and $|A_5|$=82. Then we want the minimum size of $\bigcup_{i,j,k\in\{1,2,3,4,5\}}^{i \ne j \ne k, i \ne k} A_i \cap A_j \cap A_k$.

Notice that $|A_i \cap A_j| = |A_i| + |A_j| - |A_i \cup A_j| \ge |A_i| + |A_j| - |A|$, so similarly $|A_i \cap A_j \cap A_k| \ge |A_i| + |A_j| + |A_k| - 2|A|$.

So $|\bigcup_{i,j,k\in\{1,2,3,4,5\}}^{i \ne j \ne k, i \ne k} A_i \cap A_j \cap A_k| \ge |A_1 \cap A_2 \cap A_5| \ge 22$. But it's just a weak conclusion.

2

There are 2 best solutions below

7
On

$72 + 68 + 66 +59 + 82 = 347$ : Total number (with duplicates) of diseases is 347

$5*48+2*(100-48)=344$ : If only 48 people have more than 2 diseases, maximum total number (with duplicates) of diseases is 344

$344$ is less than $347$, so $48$ is not enough.

$5*49+2*(100-49)=347$

So $49$ should be enough, and it is easy to check that $49$ is enough.

4
On

You can solve the problem via integer linear programming, with a nonnegative integer decision variable $x_j$ for each of the $2^5=32$ possible subsets and linear constraints to enforce the given cardinalities. For your example, the problem is to minimize $$x_{7} + x_{11} + x_{13} + x_{14} + x_{15} + x_{19} + x_{21} + x_{22} + x_{23} + x_{25} + x_{26} + x_{27} + x_{28} + x_{29} + x_{30} + x_{31}$$ subject to linear constraints \begin{align} x_{1} + x_{3} + x_{5} + x_{7} + x_{9} + x_{11} + x_{13} + x_{15} + x_{17} + x_{19} + x_{21} + x_{23} + x_{25} + x_{27} + x_{29} + x_{31} &= 72 \\ x_{2} + x_{3} + x_{6} + x_{7} + x_{10} + x_{11} + x_{14} + x_{15} + x_{18} + x_{19} + x_{22} + x_{23} + x_{26} + x_{27} + x_{30} + x_{31} &= 68 \\ x_{4} + x_{5} + x_{6} + x_{7} + x_{12} + x_{13} + x_{14} + x_{15} + x_{20} + x_{21} + x_{22} + x_{23} + x_{28} + x_{29} + x_{30} + x_{31} &= 66 \\ x_{8} + x_{9} + x_{10} + x_{11} + x_{12} + x_{13} + x_{14} + x_{15} + x_{24} + x_{25} + x_{26} + x_{27} + x_{28} + x_{29} + x_{30} + x_{31} &= 59 \\ x_{16} + x_{17} + x_{18} + x_{19} + x_{20} + x_{21} + x_{22} + x_{23} + x_{24} + x_{25} + x_{26} + x_{27} + x_{28} + x_{29} + x_{30} + x_{31} &= 82 \\ \sum_{j=0}^{31} x_j &= 100 \end{align} The minimum turns out to be $49$, achieved by taking $$x_6 = 8, x_{10} = 1, x_{12} = 9, x_{17} = 23, x_{18} = 10, x_{31} = 49,$$ and all other $x_j=0$.


It turns out that the linear programming relaxation also has minimum value $49$, and the corresponding dual variables provide a short certificate of optimality as follows. Let $w_j$ be the Hamming weight of $j$, and let $\pi_j = (2- w_j \pmod 3)/3$. Now multiply the first five equality constraints by $1/3$, the last equality constraint by $-2/3$, the lower bounds $x_j \ge 0$ by $\pi_j$, and add everything up, yielding \begin{align} &\quad x_{7} + x_{11} + x_{13} + x_{14} + x_{15} + x_{19} + x_{21} + x_{22} + x_{23} + x_{25} + x_{26} + x_{27} + x_{28} + x_{29} + x_{30} + x_{31} \\ &\ge \frac{1}{3}\left(72 + 68 + 66 + 59 + 82\right) - \frac{2}{3} \cdot 100 \\ &= 49 \end{align}