Given a group number and an potentially infinite number of groups, how would you calculate the predicted total number of groups?

68 Views Asked by At

Sorry for the confusing title, I’ll elaborate with an example.

You have been assigned group #4, and you are curious how many groups in total there are. Assuming groups are numbered sequentially starting at 1 and group assignment is random, you can assume there are at least 4 groups.

So far, I know that the most probable number of groups is your group number. You are more likely to be in group 4 if there are 4 groups, than if there were 1000. There is some kind of decay model where the larger the guess, the less probable it is, but I can’t figure it out. Whatever the probability model is, the area under it should equal 1.

2

There are 2 best solutions below

0
On

You need prior probabilities for this to have an answer.

Suppose $T$, the total number of groups, is a discrete random variable taking one of the values $1,2,3,\cdots$ with probabilities $p_1,p_2,p_3,\cdots$ and $G$, the group you're assigned, is a discrete random variable uniformly drawn from the numbers $1,2,\cdots,T$.

By design, we have the conditional probability

$$ \mathbb{P}(G=k\mid T=n)=\begin{cases}1/n & 1\le k\le n \\ 0 & \textrm{otherwise} \end{cases} $$

You want to know the conditional probability $\mathbb{P}(T=n\mid G=4)$, which Bayes rule says is

$$ \mathbb{P}(T=n\mid G=4)=\frac{\mathbb{P}(G=4\mid T=n)\mathbb{P}(T=n)}{\mathbb{P}(G=4)} $$

$$ = \frac{p_n/n}{p_4/4+p_5/5+p_6/6+\cdots}. $$

Under the assumption $p_k$ is nonincreasing with $k$, that means $\mathbb{P}(T=n\mid G=4)$ is decreasing with $n$. But notice this can fail otherwise, for example it is possible for $T$ to have prior distribution with $p_4=0$ and $p_5>0$, which makes $T=5$ more likely than $T=4$ given $G=4$.

Anyway, the conditional expected value is

$$ \mathbb{E}(T\mid G=4)=\sum_{n\ge4} \mathbb{P}(T=n\mid G=4)n=\frac{p_4+p_5+p_6+\cdots}{p_4/4+p_5/5+p_6/6+\cdots} $$

0
On

This question fits into the classical theory of statistical estimators. You have observed information (your group number $N$) and want to find a probability distribution for the total number of groups ($G$). Unfortunately, estimation theory tells us that your problem is subtly underspecified and cannot be solved as written.

Specifically, you haven't told us what your goal is.

For example, suppose you want to choose a probability distribution $\mathrm{dist}{(G)}$ that maximizes the probability that you fall into group $n$. Well, temporarily fix $G=g$. Assuming you have an equal chance of being placed in any given group, the probability that you are in group $n$ is $$\mathbb{P}[{N=n}|{G=g}]=\frac{1_{1\leq n\leq g}}{g}=\begin{cases}\frac{1}{g}&1\leq n\leq g\\0&\text{otherwise}\end{cases}$$ Varying $g$, $$\mathbb{P}[{N=n}]=\sum_{g=n}^{\infty}{\frac{\mathbb{P}[{G=g}]}{g}}$$ To maximize this probability, we should set $\mathbb{P}[{G=g}]=1_{G=n}$: it is most likely that you are in the group of largest index, and giving probability to any other group-count does worse.

Of course, that isn't the sort of decay model you're expecting, because you haven't written down your expectations as prior probabilities; the other answer here explains how to proceed from there.