Apologies for poor use of terms, I do not understand enough of the problem to even ask the right questions.
My main question is, what domain of mathematics is this problem and is it solved problem which has provably perfect solution or is it fundamentally hard problem.
Consider I have list of items describing pair of source and destination, let's imagine for example bus route from a source city to a destination city. And we want to express the list with fewest possible lines without losing information.
- S1 => D1
- S1 => D2
- S2 => D1
- S2 => D2
- D1 => S1
- D1 => S2
- D2 => S1
- D2 => S2
Naive compression algorithm could be:
step1, group by Bside
- S1 => [D1, D2]
- S2 => [D1, D2]
- D1 => [S1, S2]
- D2 => [S1, S2]
step2, group by Aside
- [S1, S2] => [D1, D2]
- [D1, D2] => [S1, D2]
step3, collapse direction
- [S1, D2] <=> [D1, D2]
However this algorithm doesn't guarantee shortest possible list for all possible lists. Because depending on list we have, the order of operations might need to be different for best result or we may need to not compress maximally in earlier step to have better overall compression in the end.
adding another dimension to the list
Now what if the list items have additional dimension, such as colour:
- S1 => D1 (red)
- S1 => D1 (blue)
- S2 => D1 (red)
- S2 => D1 (blue)
- S1 => D1 (green)
Would compress to:
- [S1, S2] => D1 [red, blue]
- S1 => D1 green
Now if we have perfect solution to the 1st problem, the naive approach for the 2nd problem is to group the list by colour, yielding colourless lists. Apply perfect solution to each colour separately then collapse colours if same lines appear in multiple colours. But this approach too suffers from the problem that we might not want to compress earlier perfectly to have best overall compression.
Only solution I have is to either not try to do perfect solution or brute-force perfect solution by trying everything.
Consider a list where, for every item $S_n \Rightarrow D_m$ in the list, the converse item $D_m \Rightarrow S_n$ is also in the list.
In this instance, your 'set or list compression' problem is equivalent to the problem of covering the edge set of a bipartite graph with as few complete bipartite subgraphs as possible. This problem is known as the biclique cover problem and is NP complete.