Can anyone please help frame this problem theoretically and/or guide me to its solution?
I think it might be partly related to 'set cover' type of problems, but the requirements are a bit different.
Suppose you have a total of $M$ features ($M \ge 3$), and you have the set of all objects possessing (each) exactly $n$ of these features ($2 \le n \le M-1$).
E.g. the features could be the first $M=5$ uppercase letters of the Latin alphabet: $\{A,B,C,D,E\}$.
For $n=2$, you have a set of objects $\{(A,B); (A,C); (A,D); (A,E); (B,C); ...; (D,E)\}$.
The number of such objects is of course $M \choose n$; in this example, ${5 \choose 2} =10$.
I need to calculate the minimal number of objects from this set that I am required to select, such that in their union each of the $M$ features is represented at least $k$ times ($1 \le k \le M-1)$.
E.g. in the above example ($M=5,n=2$), if I want each feature to be represented at least twice ($k=2$), I need at least $5$ objects, which could be $\{(A,B);(C,D);(A,E);(C,E);(B,D)\}$.
Note that I don't particularly need to know the objects themselves (as there are probably several degenerate solutions): just how many they are.
Supposing that we want to have each feature repeated $k=3$ times, the required number of objects is of course larger ($8$ for $M=5,n=2,k=3$).
Conversely, if we increase the number of features each object can have, the required number of objects decreases (e.g. $4$ for $M=5,n=3,k=2$)
I have tried framing this in a matrix format, to figure out a formula, meaning a function of $M,n,k$ that gives me the desired number, but I can't see it.
Any suggestions?
Thanks
PS
Find below an R script that calculates the number using linear programming
# calculate the minimal number of objects, each containing n features out of a total of M features
# such that in the set of all objects each feature is represented at least k times
M = 5
n = 2
k = 2
# make the sparse matrix of all objects containing n features out of M
# objects are in columns, and are as many as choose(M,n)
# features are in rows, and are M
N.objects <- choose(M,n)
obj.mat <- sparseMatrix(j = rep(1:N.objects,each = n), i = unlist(combn(1:M, m = n, simplify = F)), x = 1)
# the constraint is that each row of the matrix sums to at least k
dir <- rep(">=",M)
rhs <- rep(k,M)
# the objective is all 1's (we want to count and minimise the number of objects used)
obj <- rep(1,N.objects)
# solve the task by linear programming
if (length(find.package(package="Rsymphony",quiet=TRUE))==0) install.packages("Rsymphony")
require(Rsymphony)
lp.out <- Rsymphony_solve_LP(obj,obj.mat,dir,rhs,types="B",max=FALSE)
cat("\nN objects needed =",lp.out$objval)
cat("\n\n")
print(obj.mat[,as.logical(lp.out$solution)])
I am going to attempt a justification of the result we worked out with Paul's help.
Suppose the solution is a binary matrix with the $N$ objects in columns and the $M$ features in rows.
E.g. for $M=7, n=4, k=3$, where $6$ objects are necessary:
It is required to represent each of the $M$ features at least $k$ times, thus each row must contain at least $k$ 1's.
Each object has exactly $n$ distinct features, thus each column contains exactly $n$ 1's.
Therefore the total number of 1's ($N_1$) in the matrix, counting by rows, is $N_1 \ge M \cdot k$; counting by columns, it is $N_1=N \cdot n$.
Substituting $N_1$ from the equation into the inequality, we have:
$N \cdot n \ge M \cdot k$
i.e.:
$N \ge \frac {M \cdot k} {n}$
As all variables are positive integers, and we want the minimal $N$:
$N = \lceil \frac {M \cdot k} {n} \rceil$
I hope this makes sense and I haven't overlooked anything.