Good evening,
As the title explains, I am looking for an algorithm (I'd implement it in Python) to generate all of the combinations for a given basket of X stocks taken Y at a time (order is not relevant and no repetitions allowed) and not sharing more than Z element between one and another.
I have added the extra restriction because my base case is a universe of 50 stocks and combinations of (i.e basket) 10 stocks, which would generate 10 272 278 170 possible baskets. Since I would have to run further analysis on these baskets, it would be computationally impossible.
As such, I have thought about applying a further restriction which would drastically reduce the number of baskets. In practice, for the same concrete example given above (universe of 50 and baskets of 10) I would like to only generate those baskets of 10, that share a maximum of 5 stocks with any single other basket.
I have tried to come up with a generic algorithm to do so (universe of X, basket of Y and minimum different components of Z) but I haven't managed to come up with a proper solution. Maybe this type of restricted combination does have an actual name in specific literature but my google search have been fruitless.
Hope someone has faced and solved that problem successfully or can come up with a great solution for it!
Thanks!
What you are looking for in the literature is known as a constant weight binary code (Wikipedia). Specifically, if $A(n,d,w)$ is the maximum number of binary vectors, each with $w$ ones and $n-w$ zeroes, such that the Hamming distance between any two codewords is $d$, then the largest possible size of your set of sets would be $A(X,2(Y-Z) ,Y)$. In the special case you mentioned, you want $A(50,10,10)$.
Not much is known about the exact optimal values of $A(n,d,w)$. The main ways to find constant weight binary codes usually involve a lot of computational effort. This table lists the values of $A(n,10,w)$ for $n\ge 32$ and $w\le 16$, but the highest relevant value for your problem is $A(32,10,10)\ge 500$. This means that even when you restrict yourself to only using the first $32$ elements of your $50$ element set, you can get $500$ sets. If you click on that link, they actually provide the list of codewords.
The question you need to consider is; since there is little hope of finding the maximal collection of subsets of a size of $50$, each with size $10$ and at most $5$ elements in common, how many subsets do you realistically need? Is $500$ enough?
Solution for $315$ subsets of $\{1,\dots,29\}$ of size $5$ where each subset has at most two in common with each other: