Consider $k$ sets $S_1, S_2, \ldots, S_k$, of the following properties:
- For every $i$, $\left|S_i\right| = p$
- For every pair of $i$ and $j$, $\left|S_i \cap S_j\right| = \frac{p}{2}$
Now I need to find the minimum of $\left|\bigcup S_i\right|$ in terms of $k$ and $p$.
(Assume $p$ is large enough to be divisible by any small enough integers)
Via trial and error I suspect the answer to be $$\frac{\left(2^{m + 1} - 1\right)p}{2^m}$$ where $m = \lfloor\log_2 k\rfloor$, but cannot prove it.
That's also why I give that assumption about $p$, so that one do not need to worry about the situation where $\frac{p}{2^m}$ is not an integer.
Edition: actually, I verified that my suspected answer is wrong. For example, when $k = 5$, I managed to construct the sets using $\frac{5p}{6}$ elements. Now I am not sure what the actual answer should be. :(
Correction: I think I had mistaken $p$ with another variable I used in a larger problem where this proof is required, when giving the $\frac{5p}{6}$ lower bound in the edition above. It should be $\frac{5p}{3}$.
An example would be when $k = 5$ and $p = 6$ (to be divisible by both $2$ and $3$), then my $5$ sets would be $$\left\{1, 2, 3, 4, 5, 6\right\}, \left\{1, 2, 3, 7, 8, 9\right\}, \left\{1, 4, 6, 7, 9, 10\right\}, \left\{2, 4, 5, 7, 8, 10\right\}, \left\{3, 5, 6, 8, 9, 10\right\}$$ using $10$ elements. They are obtained basically as I try to "average" the occurrences of each number (in this example, specifically, letting each number appear $3$ times). Maybe we can go from here, but how to prove in general that this average is reachable?
Not a full answer (or perhaps: answering a different question)
Discussion: For shorthand, write $U = |\bigcup S_i|$. Let $f(k,p)$ denote the minimum $U$, as a function of both $k$ (integer) and $p$ (even integer). There is no decomposition $f(k,p) = g(k) \cdot p$ that holds exactly for all $k,p$. (E.g. if $p=2, k=5$, the only solution I can think of has $U = 6$ where every set has 1 element unique to itself and 1 element common to every set, so $g(5) = 6/2=3$. Meanwhile the OP showed an example for $p=6, k=5, U=10$ so $g(5) = 10/6 = 5/3$.) For that reason, I also find it strange to say lets "assume" $p$ is large and highly composite... unless what you are actually interested in minimizing the ratio $U/p$, e.g. for large $p$, or for optimal $p$. E.g., were you perhaps interested in $h(k) = \min_p {f(k,p) \over p}$, i.e. the minimum ratio (that holds for some $p$, i.e. minimized over all possible even integers $p$)?
Anyway, depending on whether you want to minimize universe size $U$, or minimize the ratio $U/p$, the answers may be different. The rest of this post shows 2 separate families of solutions, one better at minimizing $U$ and the other better at minimizing $U/p$.
Family 1: based on $d$-bit binary vectors
Consider $d$-bit binary vectors $v \in \{0,1\}^d$. Let $D=\{1, 2, ..., d\}$ denote the bit positions. Let $T$ be a non-empty subset of $D$. There are $2^d - 1$ such subsets, which we name $T_1, T_2, ..., T_{2^d -1}$.
Define $S_i = \{v \in \{0,1\}^d: \sum_{t \in T_i} v_t = 1 \pmod 2\}$. I.e., for any vector $v$, sum up (modulo 2) all the bits $v_t$ at positions $t \in T_i$, and if the sum is $1$ then the vector belongs in $S_i$, otherwise it doesnt. E.g. for $T_i = \{2,3,5\}$, any vector $(*,1,1,*,1,*,*) \in S_i$ but any vector $(*,1,0,*,1,*,*) \notin S_i$, where $*$ denotes wildcard ($0$ or $1$).
For any $i$, it is obvious that exactly 1/2 of all vectors are in $S_i$, i.e. $|S_i| = 2^{d-1} = p$.
I think it's also true (although I cannot think of a short proof) that for any $i, j$, exactly 1/4 of all vectors are in $S_i \cap S_j$, i.e. $|S_i \cap S_j| = 2^{d-2} = p/2$.
Thus the sets $S_i$ meet the OP's requirements. There are $2^d - 1$ such sets, and of course you don't have to use all of them. In other words we have constructed examples, parameterized by $d$, where:
Family 2: based on $m$-choose-$n$
Let me now exhibit a different family of solutions generalizing the OP's $k=5, p=6$ example.
Suppose you have $m$ distinct objects $\{1, 2, ..., m\}$ and you choose $n<m$ of them. There are of course $L={m \choose n}$ ways to do this, i.e. there are $L$ such size-$n$ subsets. Order these $L$ subsets lexicographically and call them $T_1, T_2, ... T_L$. E.g. if $n=3$ then $T_1 = \{1,2,3\}, T_2 = \{1,2,4\}, ..., T_L = \{m-2, m-1, m\}$, etc. Now form the following $m\times L$ matrix $A$:
I.e. the $j$th column of $A$ is the Boolean vector for membership in $T_j$. Now switch your perspective and...
Identify the rows of $A$ as the OP's $S_i$ sets.
Identify the set of columns as the universe $\bigcup S_i$, and in particular, $L = |\bigcup S_i| = U$.
The following statements hold:
Each row contains the same number of $1$s: indeed, $\forall i, \sum_j A_{ij} = {m-1 \choose n-1}$, independent of $i$. This is the number of subsets $T_j$ that contains $i$, which equals the number of ways to choose the remaining $n-1$ elements out of the remaining $m-1$ possibilities. Since we identify each row $i$ as the OP's set $S_i$, this means $|S_i| = p = {m-1 \choose n-1}$.
For any two rows $i, i'$, their intersection contains the same number of $1$s: indeed, $\forall i, i', \sum_j (A_{ij}A_{i'j}) = {m-2 \choose n-2}$, independent of $i, i'$. This is the number of subsets $T_j$ that contains both $i$ and $i'$, which equals the number of ways to choose the remaining $n-2$ elements out of the remaining $m-2$ possibilities. Since we identify rows $i, i'$ as the OP's sets $S_i, S_{i'}$, this means $|S_i \cap S_{i'}| = p/2 = {m-2 \choose n-2}$.
Now we solve: ${m-2 \choose n-2} = {1 \over 2} {m-1 \choose n-1} \implies n = 1 + {m-1 \over 2} = {m+1 \over 2}$.
In other words, parameterized by odd integer $m$, we can define $n = {m+1 \over 2}$ and we have exhibited a collection of sets (rows of $A$) where:
Examples and comparisons:
The follow table lists, for several choices of $k$, the "best" examples offered by each family and the resulting $p, U, U/p$. (In each family, the best is obtained by taking the smallest $d$ or $m$ possible.)
As can be seen, Family 1 generally has much smaller $U$ (except equality when $k=3$), but Family 2 has smaller $U/p$ ratio (except equality when $k=3, 6,7, 14, 15$, and if I may extrapolate: $k= 30, 31, 62, 63,$ etc)