I have a large set $S$ of items, but the set is not exactly known. All I know are the cardinal numbers of categories i.e. a number of disjoint subsets,
$ \vert{S_1}\vert \dots \vert S_n\vert$ with $\bigcup S_i = S$.
Thus I also know the cardinal number $\vert S\vert = \sum\vert S_i\vert$.
I then want to apply a predicate $p$ and estimate the cardinal number of the subset of $T \subset S$ which satisfies $p$, i.e.
$\vert T\vert = \vert \lbrace x\in S \mid p(x)\rbrace\vert$.
The predicate $p$ is precisely known and so are the predicates $p_i$ which need to be satisfied to fall into one of the categories $S_i$ (the "meanings" of $S_i$ are known).
In other words, I want to describe a large set with a few numbers $ \vert{S_1}\vert \dots \vert S_n\vert$, which still contain enough information, so I can estimate the cardinal number of the result of the filtering with $p$. I am willing to do a costly transformation on $p$ as long as the final computation of $\vert T\vert$ is fast.
I have the feeling that this is a textbook problem, but I don't know where to look for ideas.
- I don't know how to do this at all, and
- I don't know what is the best way to define $p_i$.
- What happens when my $S_i$ are not disjoint? Can I still say something about $\vert S\vert$ and $\vert T\vert$?
- I suppose the more $p_i$s I have, the better the estimate will be, but I'd like to be be able to say something quantitatively about the accuracy.