How to take into account uncertainty on number of events


Suppose I generate a set of events $X_{i}$ for $i = 1, 2, \dots, N$, and suppose every event is either a success or a failure, i.e. $X_{i} \in \{0, 1\}$. If $N$ is fixed, the MLE for the probability of success is just $$\hat{p} = \frac{1}{N}\sum_{i = 1}^{N} X_{i}$$ and the variance of the MLE can be estimated as $$V(\hat{p}) = \frac{\hat{p}(1 - \hat{p})}{N}.$$ But now suppose that $$N \sim \text{Poisson}(\lambda_N),$$ where $\lambda_{N}$ is fairly large, so if we need to, this could be approximated as $$N \sim \text{Normal}\left(\mu = \lambda_{N}, \sigma^2 = \lambda_{N}\right).$$ The MLE is still just $$\hat{p} = \frac{1}{N}\sum_{i = 1}^{N} X_{i},$$ but the variance is increased. What I want to know is how to calculate the new variance of $\hat{p}$, taking into account the uncertainty on $N$. I tried some error propagation on $N$, but I can't quite reproduce my numerical results.
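For concreteness, the effect can be seen in a quick R simulation (a sketch; the values $p = 0.3$ and $\lambda_N = 50$ are illustrative, not from the question). Iterations with $N = 0$ are dropped, since $\hat{p}$ is undefined there.

```r
# Simulated sampling distribution of p-hat when N ~ Poisson(lambda_N).
# Illustrative values: p = 0.3, lambda_N = 50.
set.seed(1)
m <- 1e5; p <- 0.3; lam <- 50
phat <- replicate(m, {
  n <- rpois(1, lam)
  if (n == 0) NA_real_ else mean(rbinom(n, 1, p))  # drop N = 0 cases
})
var(phat, na.rm = TRUE)   # empirical variance of p-hat
p * (1 - p) / lam         # fixed-N formula, for comparison
```

The empirical variance comes out slightly larger than the fixed-$N$ value $p(1-p)/\lambda_N$, which is the excess the question asks about.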

BEST ANSWER

The formula for the variance of a 'random sum of random variables' is given in many probability texts. It is derived by conditioning on $N$, as suggested by @Augustin. If $X_1, \dots, X_N$ are iid and $N$ is independent of the $X$'s, then the sum $S = \sum_{i=1}^N X_i$ has variance $$V(S) = E(N)V(X) + V(N)[E(X)]^2.$$ Roughly speaking, the second term expresses the additional variability due to random $N$ as compared with fixed $n$. As you might expect, $E(S) = E(N)E(X).$

Notice the units in the formula for $V(S).$ Terms for $N$ are pure numbers. Terms for $X$ have squared units to match units of $S$.

In many cases, a normal approximation for $S$ gives good results. If the $X$'s are binomial, as in your case, normal approximation should work fairly well if $E(N)$ is moderately large. (But I would not trust it if $X$'s are extremely skewed with long tails, as for exponential, unless $E(N)$ is large.)

In practical cases, simulation is an easily programmed option--if only as a check on the accuracy of a normal approximation. (Depending on the language used, you might have to make explicit accommodation for cases where $N = 0$.)

Here is a simulation in R of 100,000 experiments with $N \sim \text{Pois}(10)$ and $X_i \sim \text{Binom}(1, 1/2) = \text{Bernoulli}(1/2).$ As the histogram shows, the normal fit is not excellent because the Poisson mean is relatively small.

NOTE: The dark dots atop the histogram bars are for $Pois(5)$, which is the exact distribution of $S$ in this particular case. Here $S$ is simply $N$ with half of its realizations filtered out at random. This result can be established analytically using moment generating functions. (Imagine thin lead foil blocking half of the particles emitted from a chunk of uranium. What gets through is still Poisson.)

 m = 10^5;  s = numeric(m)
 for (i in 1:m) {
   n = rpois(1, 10)
   s[i] = sum(rbinom(n, 1, .5)) }
 mean(s);  sd(s)
 ##  4.99118  # approx E(S) = 5
 ##  2.231150 # approx SD(S), where V(S) = 5
 sqrt(10*.25 + 10*.25)
 ##  2.236068 # exact SD(S)

 mean(s < 5)
 ## 0.4425    # simulated P(S < 5)
 pnorm(4.5, 5, sqrt(5))
 ## 0.4115316 # norm approx P(S < 5)
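The Poisson-thinning claim in the NOTE above can also be checked numerically. Here is a self-contained sketch that re-runs the simulation and compares the empirical distribution of $S$ with the $\text{Pois}(5)$ pmf:

```r
# Check that S, a sum of Bernoulli(1/2) over N ~ Pois(10) draws,
# is distributed as Pois(5): compare empirical pmf with dpois(k, 5).
set.seed(42)
m <- 1e5
s <- replicate(m, sum(rbinom(rpois(1, 10), 1, 0.5)))
emp <- tabulate(s + 1, nbins = 16) / m   # empirical P(S = 0), ..., P(S = 15)
max(abs(emp - dpois(0:15, 5)))           # small if the thinning claim holds
```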

[Figure: histogram of simulated $S$ with the normal approximation overlaid and exact $\text{Pois}(5)$ points on the bars]

ADDENDUM: The OP seems interested in the distribution of $S/N$ (see Comments). Here is a histogram from code slightly altered to retain the values of both $S$ and $N$ at each iteration.

[Figure: histogram of simulated $S/N$]
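A minimal version of that altered simulation might look as follows (a sketch; iterations with $N = 0$ are dropped, since $S/N$ is undefined there):

```r
# Same simulation, but keeping both S and N so that S/N can be examined.
set.seed(7)
m <- 1e5
n <- rpois(m, 10)
s <- sapply(n, function(k) sum(rbinom(k, 1, 0.5)))
keep <- n > 0                 # S/N undefined when N = 0
ratio <- s[keep] / n[keep]
mean(ratio); sd(ratio)        # mean should be near 1/2
hist(ratio, breaks = 30)
```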


Variance of the Sum of $\boldsymbol{n}$ Items Where $\boldsymbol{n}$ is Variable

To complete the discussion, I think it would be nice to include a derivation of the formula for the variance when $n$ is variable, for those who do not have access to an appropriate text. $\newcommand{\E}{\operatorname{E}}\newcommand{\Var}{\operatorname{Var}}$

For a fixed $n$, we have $$ \E\!\left[\sum_{k=1}^nX_k\right]=n\E[X]\tag{1} $$ and, using $\Var[X]=\E\!\left[X^2\right]-\E[X]^2$, we get $$ \begin{align} \E\left[\left(\sum_{k=1}^nX_k\right)^2\right] &=n\E\!\left[X^2\right]+(n^2-n)\E[X]^2\\ &=n\Var[X]+n^2\E[X]^2\tag{2} \end{align} $$ Taking the expectation of $(1)$ in $n$ yields $$ \E\!\left[\sum_{k=1}^nX_k\right]=\E[n]\E[X]\tag{3} $$ and, using the fact that $\E\!\left[n^2\right]=\Var[n]+\E[n]^2$, taking the expectation of $(2)$ in $n$ gives $$ \begin{align} \E\left[\left(\sum_{k=1}^nX_k\right)^2\right] &=\E[n]\Var[X]+\E\!\left[n^2\right]\E[X]^2\\ &=\E[n]\Var[X]+\Var[n]\E[X]^2+\E[n]^2\E[X]^2\tag{4} \end{align} $$ Thus, $$ \begin{align} \mathrm{Var}\left[\sum_{k=1}^nX_k\right] &=\E\left[\left(\sum_{k=1}^nX_k\right)^2\right]-\E\!\left[\sum_{k=1}^nX_k\right]^2\\[6pt] &=\bbox[5px,border:2px solid #C0A000]{\E[n]\Var[X]+\Var[n]\E[X]^2}\tag{5} \end{align} $$
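Formula $(5)$ can be verified numerically for a non-Bernoulli case as well. A sketch with the illustrative choice $n \sim \text{Pois}(20)$ and $X \sim \text{Exp}(1)$, so that $\E[n]\Var[X] + \Var[n]\E[X]^2 = 20 \cdot 1 + 20 \cdot 1 = 40$:

```r
# Numerical check of formula (5): n ~ Pois(20), X ~ Exp(1),
# predicted Var[S] = E[n]*Var[X] + Var[n]*E[X]^2 = 20 + 20 = 40.
set.seed(3)
m <- 1e5
s <- replicate(m, sum(rexp(rpois(1, 20), rate = 1)))  # sum of 0 draws is 0
mean(s)   # should be near E[n]*E[X] = 20
var(s)    # should be near 40
```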


Variance of the Mean of $\boldsymbol{n}$ Items Where $\boldsymbol{n}$ is Variable

I think I misunderstood the question before the last edit. It appears that you are looking for the variance of $\frac1n\sum\limits_{k=1}^nX_k$ where $n$ is variable. If that is the case, we can follow the same approach as above:

For a fixed $n$, we have $$ \E\!\left[\frac1n\sum_{k=1}^nX_k\right]=\E[X]\tag{6} $$ and, using $\Var[X]=\E\!\left[X^2\right]-\E[X]^2$, we get $$ \begin{align} \E\left[\left(\frac1n\sum_{k=1}^nX_k\right)^2\right] &=\frac1n\E\!\left[X^2\right]+\left(1-\frac1n\right)\E[X]^2\\ &=\frac1n\Var[X]+\E[X]^2\tag{7} \end{align} $$ Taking the expectation of $(6)$ in $n$ yields $$ \E\!\left[\frac1n\sum_{k=1}^nX_k\right]=\E[X]\tag{8} $$ and taking the expectation of $(7)$ in $n$ gives $$ \E\left[\left(\frac1n\sum_{k=1}^nX_k\right)^2\right] =\E\!\left[\frac1n\right]\Var[X]+\E[X]^2\tag{9} $$ Thus, $$ \begin{align} \mathrm{Var}\!\left[\frac1n\sum_{k=1}^nX_k\right] &=\E\left[\left(\frac1n\sum_{k=1}^nX_k\right)^2\right]-\E\!\left[\frac1n\sum_{k=1}^nX_k\right]^2\\[6pt] &=\bbox[5px,border:2px solid #C0A000]{\E\!\left[\frac1n\right]\Var[X]}\tag{10} \end{align} $$ Note that by Cauchy-Schwarz, we have $$ \E\!\left[\frac1n\right]\ge\frac1{\E[n]}\tag{11} $$ Since a Poisson distribution allows $n=0$ and the mean of $0$ items is not well-defined, we will let $n-1$ have a Poisson distribution with mean $N-1$ so that we still have $\mathrm{E}[n]=N$. $$ \begin{align} \E\!\left[\frac1n\right] &=e^{-N+1}\sum_{k=0}^\infty\frac{(N-1)^k}{(k+1)\,k!}\\ &=\frac{e^{-N+1}}{N-1}\sum_{k=0}^\infty\frac{(N-1)^{k+1}}{(k+1)!}\\ &=\frac{1-e^{-N+1}}{N-1}\\[6pt] &\sim\frac1{N-1}\tag{12} \end{align} $$