What is an estimator for the "number of trials" given observed successes and the success probability?


The binomial distribution with $n$ trials, $k$ successes and success probability $p$ is given by

$$P(k;n,p) = \binom{n}{k} p^k (1-p)^{(n-k)}, \quad k \in \{0,...,n\}$$

Suppose that we observe $k$ successes and know $p$ but we do not know $n$. Observe that now $k$ and $p$ are fixed whereas $n$ is stochastic. So if $k=6$ and $p=0.4$,

$$P(k=6; n ,p=0.4) = \binom{n}{6} 0.4^6 (0.6)^{(n-6)}, \quad n \in \{6,...,\infty\}.$$ This is, however (remark by @Xiaomi), not a valid probability mass function, as it does not sum to one over its support. Is there a probability mass function for $n$? What is a useful (unbiased, consistent) estimator for its parameter $n$?
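As a quick sanity check (a Python sketch, not part of the original question), summing this expression over $n$ gives $1/p$ rather than one:

```python
from math import comb

# Illustrative check: C(n, 6) * 0.4**6 * 0.6**(n-6), read as a function of n,
# does NOT sum to one over n = 6, 7, 8, ...
p, k = 0.4, 6
total = sum(comb(n, k) * p**k * (1 - p)**(n - k) for n in range(k, 500))
print(total)  # ~2.5, i.e. 1/p, not 1
```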

3 Answers

BEST ANSWER

As noted in Xiaomi's answer, the probability distribution

$P(k=6; n ,p=0.4) = \binom{n}{6} 0.4^6 (0.6)^{(n-6)}, \quad n \in \{6,...,\infty\}.$

fails to be a valid probability mass function. The problem is that it assumes all six successes occur randomly among the $n$ attempts, but this is not the case: for $n$ to be the outcome, the sixth success must occur exactly on attempt $n$. Only the first five successes fall randomly, and they are restricted to the first $n-1$ attempts (though the fifth success need not occur exactly at attempt $n-1$). The correct probability distribution with these characteristics is

$$P(k=6; n ,p=0.4) = \binom{n-1}{5} 0.4^5 (0.6)^{((n-1)-5)}\color{blue}{(0.4)}, \quad n \in \{6,...,\infty\}.$$

where the blue factor forces a success on trial $n$ and the rest of the expression accounts for the proper random occurrence of the other five successes. This simplifies to

$$P(k=6; n ,p=0.4) = \binom{n-1}{5} 0.4^6 (0.6)^{(n-6)}, \quad n \in \{6,...,\infty\}.$$

which now does normalize properly: it is the negative binomial (Pascal) distribution of the trial number on which the sixth success occurs, and it sums to one over $n \in \{6, 7, \dots\}$.
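A short numerical check (an illustrative Python sketch, using the question's $p=0.4$ and six required successes) confirms the normalization and recovers the familiar expected trial count $k/p$:

```python
from math import comb

# Illustrative check: the corrected distribution
# P(n) = C(n-1, 5) * 0.4**6 * 0.6**(n-6) sums to one over n = 6, 7, ...
p, r = 0.4, 6  # success probability and required number of successes
pmf = {n: comb(n - 1, r - 1) * p**r * (1 - p)**(n - r) for n in range(r, 500)}
total = sum(pmf.values())
mean = sum(n * w for n, w in pmf.items())
print(total)  # ~1.0
print(mean)   # ~15.0 = r/p, the expected trial of the 6th success
```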

ANSWER

First of all, what you've stated is not the distribution function of $n$; it's the distribution function of $X$ given the parameters $n$ and $p$. You cannot simply interchange the roles of $n$ and $k$: if it were the PMF of $n$, it would sum to $1$ over all values of $n$, and it clearly does not. To answer your question...

In the (very unrealistic) situation where we have a Binomial random variable $X$, the number of successes out of $n$ trials, and we know $p$ in advance, we can estimate $n$ as simply as

$$\hat{n} = \frac{X}{p}$$

The basic idea here is that we observe $X$ successes, so to get back to $n$ we rescale by $1/p$. However, this entire thought process is a bit nonsensical, as a binomial random variable is characterised as the number of successes out of some fixed and known number of trials $n$.

An interesting question is whether this estimator is consistent. Clearly it is unbiased, since

$$E[X/p] = np/p = n$$

But for the variance, we have

$$Var(\hat{n}) = Var(X/p) = Var(X)/p^2 = np(1-p)/p^2 = n(1-p)/p$$

This grows linearly in $n$ instead of shrinking, so our estimator is clearly not consistent.
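A small simulation sketch (not from the original answer; the values $n=50$, $p=0.4$ are arbitrary choices) confirms both the unbiasedness and the non-vanishing variance:

```python
import random

# Simulate n_hat = X / p for X ~ Binomial(n, p) with known p and check that
# its mean is n (unbiased) while its variance matches n(1-p)/p.
random.seed(0)
n, p, trials = 50, 0.4, 100_000
estimates = []
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))  # one draw of X
    estimates.append(x / p)

mean = sum(estimates) / trials
var = sum((e - mean) ** 2 for e in estimates) / trials
print(mean)  # ~50 = n
print(var)   # ~75 = n(1-p)/p, which grows with n
```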

ANSWER

The binomial distribution gives the probability of $s$ successes in $n$ trials, given that the probability of success in each trial is $p$ and the outcomes of the trials are i.i.d. (Bernoulli trials).

The parameter $n$ is given, so with respect to it the distribution is a conditional probability, and we can write $$ P\left( {s\,\left| {\,n} \right.} \right) = \binom{n}{s} p^{\,s} q^{\,n - s} = {{P\left( {s \wedge n} \right)} \over {P(n)}} $$

We want to determine the complementary conditional probability $$ P\left( {n\,\left| {\,s} \right.} \right) = {{P\left( {s \wedge n} \right)} \over {P(s)}} $$ which is a totally licit question, provided that we know $P(n)$.

Assume that $n$ is uniformly distributed over the integers $\{0, 1, \ldots, N\}$.
Thus $P(n)= 1/(N+1)$, and we get $$ P\left( {s \wedge n} \right) = {{\left[ {0 \le n \le N} \right]} \over {N + 1}}\binom{n}{s}p^{\,s} q^{\,n - s} $$ where $[\,\cdot\,]$ denotes the Iverson bracket.

Note that the sum of the bivariate distribution $$ \eqalign{ & \sum\limits_{0\, \le \,n\,\left( { \le \,N} \right)} {\sum\limits_{0\, \le \,s\,\left( { \le \,n} \right)} {P\left( {s \wedge n} \right)} } = {1 \over {N + 1}}\sum\limits_{0\, \le \,n\,\left( { \le \,N} \right)} {\left[ {0 \le n \le N} \right]\sum\limits_{0\, \le \,s\,\left( { \le \,n} \right)} { \binom{n}{s} p^{\,s} q^{\,n - s} } } = \cr & = {1 \over {N + 1}}\sum\limits_{0\, \le \,n\,\left( { \le \,N} \right)} {\left[ {0 \le n \le N} \right]} = 1 \cr} $$ is indeed equal to $1$, as required.

Then the marginal distribution in $s$ will be $$ P(s) = \sum\limits_{0\, \le \,n\,\left( { \le \,N} \right)} {P\left( {s \wedge n} \right)} = {{p^{\,s} q^{\, - s} } \over {N + 1}}\sum\limits_{0\, \le \,n\, \le \,N} {\binom{n}{s}q^{\,n} } $$ and we arrive at $$ P\left( {n\,\left| {\,s} \right.} \right) = {{P\left( {s \wedge n} \right)} \over {P(s)}} = \left[ {0 \le n \le N} \right]{{\binom{n}{s}q^{\,n} } \over {\sum\limits_{0\, \le \,n\, \le \,N} {\binom{n}{s}q^{\,n} } }} $$

In the limit for $N \to \infty$ the expression above converges to $$ \bbox[lightyellow] { P\left( {n\,\left| {\,s} \right.} \right) = \binom{n}{s} \, q^{\,n - s} p^{\,s + 1} }$$

The expected value and the variance of $n$ are: $$ \bbox[lightyellow] { \eqalign{ & E\left( {n\left| {\,s} \right.} \right) = \sum\limits_{0\, \le \,n\,} {n\binom{n}{s}q^{\,n - s} p^{\,s + 1} } = {{1 - p} \over p} + {1 \over p}s \cr & \sigma ^{\,2} = \sum\limits_{0\, \le \,n\,} {\left( {n - {{1 - p + s} \over p}} \right)^{\,2} \binom{n}{s}q^{\,n - s} p^{\,s + 1} } = {{\left( {1 - p} \right)\left( {s + 1} \right)} \over {p^{\,2} }} \cr} }$$
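Both boxed formulas can be verified numerically. The following sketch (using the question's values $s=6$, $p=0.4$ as an example) checks the normalization, the mean, and the variance of the limiting posterior:

```python
from math import comb

# Illustrative check of the limiting posterior
# P(n | s) = C(n, s) * q**(n-s) * p**(s+1), with s = 6, p = 0.4.
p, s = 0.4, 6
q = 1 - p
pmf = {n: comb(n, s) * q**(n - s) * p**(s + 1) for n in range(s, 800)}
total = sum(pmf.values())
mean = sum(n * w for n, w in pmf.items())
var = sum((n - mean) ** 2 * w for n, w in pmf.items())
print(total)  # ~1.0
print(mean)   # ~16.5  = (1-p)/p + s/p
print(var)    # ~26.25 = (1-p)(s+1)/p**2
```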