I have a program that needs to count each set of staff attending a given meeting/events - such as the group of Alice and Bob being in 8/10 of meetings, the group of John, Jim and Jeff being in 6/10, and the group all staff attending 5/10 meetings.
My goal is to count each possible set of two or more people in a group across dozens of meetings/events with as many as 30 attendees.
Currently, my plan is to calculate each powerset of a given meetings attendees and then count the occurrences of each subset (minus the empty set and subsets with one attendee), but that computation seems infeasible given larger numbers of staff.
Is it possible to simply record the meetings attendees for each meeting, then calculate the occurrences of each subset across all meetings?
Update
Say I have three meetings with the following attendees:
- Alice, Bob, Jeff
- Alice, Bob
- Alice, Bob, Jeff, John
I want to know specifically that the following sets were found and how many times they occurred in total:
- Alice, Bob -- 3 times
- Alice, Jeff -- 2 times
- Alice, John -- 1 time
- Bob, Jeff -- 2 times
- Bob, John -- 1 time
- Jeff, John -- 1 time
- Alice, Bob, Jeff -- 2 times
- Alice, Jeff, John -- 1 time
- Alice, Bob, John -- 1 time
- Bob, Jeff, John -- 1 time
- Alice, Bob, Jeff, John -- 1 time
You are correct that your powerset approach is likely to become intractable in practice, as the number of attendees presumably grows faster than the logarithm of the number of meetings.
Rather than the powerset, you could iterate through the subsets that actually occur. Something like this:
For each meeting, take all the subsets (of size $\ge 2$). If they're not in your list already, add them with tally $1$. If they are in the list, increment the tally (by $1$). By the end, you've got tallies for all the pairings that occurred and anything not in your list has a tally of $0$.
So in your example, it works like this:
Iterating through the $2^n-n-1$ desired subsets of a meeting of $n$ people ($2^n$ subsets total, $1$ of size $0$, $n$ of size $1$) can be done programmatically by iterating through all subsets (a built-in function or a known problem) and adding a conditional to skip subsets of size $\le 1$.
At the end of the process, if you had meetings of sizes $n_1,n_2,\dots,n_k$ then the sum of the tallies will be $\sum_{i=0}^k (2^{n_i}-n_i-1)$.