Fast way to estimate cardinal number of subset

137 Views Asked by At

I have a large set $S$ of items, but the set is not exactly known. All I know are the cardinal numbers of categories i.e. a number of disjoint subsets,

$ \vert{S_1}\vert \dots \vert S_n\vert$ with $\bigcup S_i = S$.

Thus I also know the cardinal number $\vert S\vert = \sum\vert S_i\vert$.

I then want to apply a predicate $p$ and estimate the cardinal number of the subset of $T \subset S$ which satisfies $p$, i.e.

$\vert T\vert = \vert \lbrace x\in S \mid p(x)\rbrace\vert$.

The predicate $p$ is precisely known and so are the predicates $p_i$ which need to be satisfied to fall into one of the categories $S_i$ (the "meanings" of $S_i$ are known).

In other words, I want to describe a large set with a few numbers $ \vert{S_1}\vert \dots \vert S_n\vert$, which still contain enough information, so I can estimate the cardinal number of the result of the filtering with $p$. I am willing to do a costly transformation on $p$ as long as the final computation of $\vert T\vert$ is fast.

I have the feeling that this is a textbook problem, but I don't know where to look for ideas.

  • I don't know how to do this at all, and
  • I don't know what is the best way to define $p_i$.
  • What happens when my $S_i$ are not disjoint? Can I still say something about $\vert S\vert$ and $\vert T\vert$?
  • I suppose the more $p_i$s I have, the better the estimate will be, but I'd like to be be able to say something quantitatively about the accuracy.