Distribution of Sum of Bernoulli Distributed Samples


Suppose I have a Bernoulli random variable whose success probability is conditioned on an exponentially distributed random variable, $t$.

\begin{equation} t \sim \text{Exp}(\tau_{0}) \end{equation}

\begin{equation} Z \sim \text{Bernoulli}(p = (1 - e^{-t/\tau_{1}})) \end{equation}

By Bayes' theorem, we can write the conditional probability distribution for $t$ given a successful trial, $Z=1$, as:

\begin{equation} P(Z=1|t)f(t) = P(Z=1)f(t|Z=1) \end{equation}

\begin{equation} P(Z=1|t) = (1-e^{-t/\tau_{1}}) \end{equation}

\begin{equation} f(t) = \frac{e^{-t/\tau_{0}}}{\tau_{0}} \end{equation}

\begin{equation} P(Z=1) = \int^{\infty}_{0} P(Z=1|t)f(t)\,dt = \frac{\tau_{0}}{\tau_{1} + \tau_{0}} \end{equation}
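As a quick numerical sanity check (an aside; `tau0` and `tau1` take the example values used in the script at the end of this question), the integral can be evaluated with `scipy.integrate.quad`:

```python
import numpy as np
from scipy.integrate import quad

tau0, tau1 = 7.5, 20.0  # example values, matching the simulation below

# integrand: P(Z=1|t) * f(t)
integrand = lambda t: (1 - np.exp(-t / tau1)) * np.exp(-t / tau0) / tau0
numeric, _ = quad(integrand, 0, np.inf)
analytic = tau0 / (tau0 + tau1)
print(numeric, analytic)  # both ≈ 0.2727
```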

Thus the posterior density $f(t|Z=1)$ may be written as:

\begin{equation} f(t|Z=1) = \frac{\tau_{0} + \tau_{1}}{\tau_{0}^{2}}(1-e^{-t/\tau_{1}}) e^{-t/\tau_{0}} \end{equation}
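An aside: this density can be checked by rejection sampling from the prior (using the same example $\tau$ values as the simulation script below) and comparing the accepted samples against the analytic form:

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)
tau0, tau1 = 7.5, 20.0  # example values, matching the simulation below

# sample t from the prior, accept each with probability P(Z=1|t)
t = rng.exponential(tau0, size=500_000)
accept = rng.random(t.size) < 1 - np.exp(-t / tau1)
t_post = t[accept]  # approximate samples from f(t|Z=1)

# analytic posterior mean for comparison
posterior = lambda u: (tau0 + tau1) / tau0**2 * (1 - np.exp(-u / tau1)) * np.exp(-u / tau0)
mean_analytic, _ = quad(lambda u: u * posterior(u), 0, np.inf)
print(t_post.mean(), mean_analytic)  # both ≈ 13
```

The acceptance rate of this sampler also recovers the marginal $P(Z=1)$ from the previous step.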

Suppose I then sample this distribution $N$ times independently, where $N\sim \text{Pois}(\mu)$ and calculate the sum of the random variables. Let us call this quantity $S$.

Define the sum of $N$ uncorrelated trials as $S = \sum^{N}_{i=1} Z_{i}$.

How do I calculate the distribution of $t$ given $N$ and $S$, i.e. $f(t|N, S)$?

Secondly, what is the probability $P(S=s|N)$?

ATTEMPT AT A SOLUTION:

The distribution of the sum of $t$ over $s$ uncorrelated successful trials ought to be the $s$-fold convolution of the distribution we have just calculated:
\begin{equation} f(t|S=s) = \left(f(t|Z=1)\right)^{\circledast s} \end{equation}

where $\circledast s$ denotes the $s$-fold auto-convolution of the density.
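An illustrative numerical sketch of this step (the grid and $\tau$ values are assumptions matching the example below): discretise $f(t|Z=1)$ on a grid and convolve it with itself, multiplying by the grid spacing at each step so the result remains a density.

```python
import numpy as np

tau0, tau1 = 7.5, 20.0  # example values, matching the simulation below
dt = 0.05
t = np.arange(0, 200, dt)

# the posterior density f(t|Z=1) derived above, sampled on the grid
f = (tau0 + tau1) / tau0**2 * (1 - np.exp(-t / tau1)) * np.exp(-t / tau0)

def auto_convolve(f, s, dt):
    """s-fold auto-convolution of a sampled density (s >= 1)."""
    g = f.copy()
    for _ in range(s - 1):
        # truncate back to the original grid; the factor dt keeps the normalisation
        g = np.convolve(g, f)[: len(f)] * dt
    return g

g3 = auto_convolve(f, 3, dt)
print(g3.sum() * dt)  # ≈ 1, up to grid/truncation error
```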

I can then marginalise over $S$ to obtain $f(t|N)$ by adding each density, weighted by the probability that it occurs:

\begin{equation} f(t|N) = \sum^{N}_{s=0} P(S=s|N)f(t|S=s) \end{equation}

This is not very useful without the weights $P(S=s|N)$.

I also know that $N$ is Poisson distributed:

\begin{equation} P(N=n) = \frac{\mu^{n} e^{-\mu}}{n!} \end{equation}

By the binomial theorem, I think these probabilities sum to one:

\begin{equation} \sum^{N}_{s=0} P(S=s|N) = \left(P(Z=1) + (1 - P(Z=1))\right)^{N} = 1 \end{equation}

So, for N=2:

Probability the sum is $0$: \begin{equation} P(S=0|N=2) = (1-P(Z=1))^{2} \end{equation}

Probability the sum is $1$: \begin{equation} P(S=1|N=2) = 2(1-P(Z=1))P(Z=1) \end{equation}

Probability the sum is $2$: \begin{equation} P(S=2|N=2) = P(Z=1)^{2} \end{equation}

This should easily allow me to determine $P(S>0|N)$; for instance:

\begin{equation} P(S>0|N=2) = 1-(1-P(Z=1))^{2} \end{equation}
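A quick cross-check of the $N=2$ algebra with `scipy.stats.binom` (here $p$ is taken to be the marginal $P(Z=1)=\tau_{0}/(\tau_{0}+\tau_{1})$ from above):

```python
from scipy.stats import binom

tau0, tau1 = 7.5, 20.0  # example values, matching the simulation below
p = tau0 / (tau0 + tau1)  # marginal success probability P(Z=1)

manual = 1 - (1 - p) ** 2  # P(S>0|N=2) from the expansion above
via_binom = binom.sf(0, n=2, p=p)  # survival function: P(S > 0)
print(manual, via_binom)  # agree
```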

And would in fact imply that the sum of the Bernoulli trials follows a binomial distribution:

\begin{equation} P(S=s|N) = \text{Binom}\left(n=N,\, k=s,\, p = 1 - e^{-t/\tau_{1}}\right) \end{equation}

I can demonstrate that the above statement is true with simulation, but I am stuck, because I assumed that $p = \frac{\tau_{0}}{\tau_{0} + \tau_{1}}$, which is certainly not the same.

Please find below a Python script simulating the problem I am facing. The program calculates $P(S>0|N)$ for each $N$, up to $N=10$.

from tqdm.auto import tqdm
import numpy as np
from scipy.stats import binom, expon, poisson

tau0 = 7.5
tau1 = 20.0
N_event = int(1e6)

# N should be independent of the count distribution
N = poisson.rvs(mu=1, size=N_event)

S_0 = []  # sum of Bernoulli trials, each with its own t
S_1 = []  # binomial draw using a single shared t
for _n in tqdm(N):
    _s_0 = 0
    _s_1 = 0
    if _n > 0:
        _t = expon.rvs(scale=tau0, size=_n)  # scale=, not the loc= positional argument
        _s_0 = np.sum(binom.rvs(n=1, p=1 - np.exp(-_t / tau1), size=_n))
        _s_1 = binom.rvs(n=_n, p=1 - np.exp(-_t[0] / tau1))

    S_0.append(_s_0)
    S_1.append(_s_1)

S_0 = np.array(S_0)
S_1 = np.array(S_1)

for i in range(10):
    total_ni = np.sum(N == i)
    if total_ni == 0:
        continue
    total_s0ni = np.sum((S_0 > 0) & (N == i))
    total_s1ni = np.sum((S_1 > 0) & (N == i))
    print("P(S0>0|N={}) = {:3.3f}, P(S1>0|N={}) = {:3.3f}".format(
        i, total_s0ni / total_ni, i, total_s1ni / total_ni))