Binomial distribution where successes increase the number of trials


I'm trying to calculate the probability of $k$ successes when the number of trials ($n$) increases by $6$ each time a success occurs (starting at $20$):

$$ P(X=k) = \ ? \qquad\text{where}\qquad p=.05, \quad n=20. $$

I can't just punch in $n=20+6k$, because that includes impossible scenarios, like $X>0$ with no successes in the first $20$ trials. The run ends once you've reached the $(n+5k)$th loss.

Edit: Apologies for the lack of clarity. Hopefully these steps will help:

  1. Begin run
  2. Perform $20$ trials
  3. Perform check to determine # of successes
  4. Perform $6$ additional trials for each success ($2$ successes = $12 $ add'l trials, totaling $32$ trials)
  5. Perform check to determine # of successes since previous check
  6. Perform $6$ additional trials for each success since previous check
  7. Repeat steps $5$ and $6$ until # of successes since previous check is $0$
  8. End run
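The steps above can be simulated directly. A minimal Monte Carlo sketch (function and variable names are illustrative): it grants the six extra trials immediately at each success rather than at the next check, which yields the same total number of trials and successes either way.

```python
import random
from collections import Counter

def run(p=0.05, start=20, bonus=6, rng=random):
    """Simulate one run; return the total number of successes."""
    remaining, successes = start, 0
    while remaining > 0:
        remaining -= 1            # this trial is used up
        if rng.random() < p:      # success: grant six more trials
            successes += 1
            remaining += bonus
    return successes

random.seed(0)
dist = Counter(run() for _ in range(100_000))
# dist[k] / 100_000 estimates P(X = k);
# e.g. dist[0] / 100_000 should be close to 0.95**20 ≈ 0.3585
```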

See below for an attempted solution and tell me why it's wrong

$$\frac{20}{20 + 6k} \binom{20 + 6k}{k} \frac{19^{20+5k}}{20^{20+6k}}$$

There are 4 answers below.

Answer 1

Thank you to the person who pointed out the flaw in my first approach. After some thinking, I believe the following works.

Let's try to make all of our trials the same, so that this can be a random walk. Set $X_i \thicksim Bin(2,0.05)$, a block of $2$ trials; since we start with $20$ trials, every run uses at least $10$ of the $X_i$. Now, we want a random walk that tracks how many trials remain until we hit $0$. Start at $20$, and let $Y_i$ be the change in the number of trials remaining after the $i^{th}$ block: each failure uses up one trial, and each success uses one trial but grants six more, for a net of $+5$. The p.m.f. of $Y_i$ is then $$Y_i = \begin{cases} -2 & \text{if $X_i$ = 0} \\ +4 & \text{if $X_i$ = 1} \\ +10 & \text{if $X_i$ = 2} \end{cases}$$ Adding up the $Y_i$ tracks the number of trials we have remaining. A run with exactly $k$ successes lasts $20+6k$ trials, i.e. $10+3k$ blocks, so to find $\mathbb{P}(X = k)$ we want $$\mathbb{P}\Big(20 + \sum_{i=1}^{10+3k}Y_i = 0\Big)$$ under the condition that the running sum never reaches $0$ before the final block. (Grouping trials in pairs is safe: after an odd number of trials the remaining count is odd, hence nonzero, so the process can only stop at a block boundary.)
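As a sanity check, the blocked walk can be simulated. In the sketch below (names are illustrative), a block of two trials changes the remaining-trial count by $-2$, $+4$, or $+10$: each failure consumes one trial, and each success consumes one trial while granting six more.

```python
import random

def block_run(p=0.05, start=20, rng=random):
    """Simulate one run two trials at a time; return the total successes."""
    step = {0: -2, 1: +4, 2: +10}   # net change per block, by successes in it
    n, successes = start, 0
    while n > 0:
        x = sum(rng.random() < p for _ in range(2))  # X_i ~ Bin(2, p)
        n += step[x]
        successes += x
    return successes

random.seed(1)
runs = [block_run() for _ in range(100_000)]
est_p0 = sum(r == 0 for r in runs) / len(runs)
# est_p0 should be close to 0.95**20 ≈ 0.3585
```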

Answer 2

I interpret the problem to intend that (for example) in order to have $~k = 2~$ successes (in what would have to be $~32~$ trials), you must have at least one success in the first $~20~$ trials, and at least two successes in the first $~26~$ trials.

So, when $~k = 2,~$ trials 27 through 32, inclusive, must all fail.

$$\text{Let} ~~f(n,r) ~~\text{denote} ~~\binom{n}{r}(0.05)^r(0.95)^{n-r}.$$

In order to explicitly compute the probability of exactly $~k = 2~$ successes, you first reserve the factor $~f(6,0),~$ which accounts for the necessary failures on trials 27 through 32, inclusive. Then, either

  • Compute $~f(26,2)~$ and subtract from this $~[f(20,0) \times f(6,2)].$
    This complementary approach signifies that both successes must occur in the first $~26~$ trials, but you must reject the situation where both successes occurred after trial 20.

  • Compute $~[f(20,2) \times f(6,0)] + [f(20,1) \times f(6,1)].$
    This direct approach signifies that either both successes occurred on or before trial 20, or exactly one of the two successes occurred on or before trial 20.


I used $~k = 2~$ as an example to facilitate describing the gymnastics that will be required for larger values of $~k,~$ regardless of whether you use the complementary or direct approach.
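Both routes are easy to sanity-check numerically. A sketch of the $k=2$ computation above ($p$ and the trial counts follow the answer; `f` is the binomial p.m.f. defined earlier):

```python
from math import comb

p = 0.05

def f(n, r):
    """Binomial p.m.f.: P(exactly r successes in n independent trials)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Two successes somewhere in the first 26 trials, at least one in the first 20:
complementary = f(26, 2) - f(20, 0) * f(6, 2)
direct = f(20, 2) * f(6, 0) + f(20, 1) * f(6, 1)

# Either way, multiply by the reserved factor f(6, 0) for the forced
# failures on trials 27-32 to get P(exactly 2 successes in the run).
prob_k2 = complementary * f(6, 0)
```

Both expressions reduce to $310\,p^2(1-p)^{24}$, so the two approaches agree.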

For example, consider $~k = 5.~$ Then, there will be $~20 + (6 \times 5) = 50~$ trials, and you must have:

  • No more than $~5~$ successes ever.

  • At least 5 successes on or before trial 44.

  • At least 4 successes on or before trial 38.

  • At least 3 successes on or before trial 32.

  • At least 2 successes on or before trial 26.

  • At least 1 success on or before trial 20.
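For small $k$ these cumulative constraints can be verified by brute force. A sketch for $k=3$ (kept small for speed; $38$ trials, with at least $1$ success by trial $20$, at least $2$ by trial $26$, and at least $3$ by trial $32$), counting the valid placements of the three successes:

```python
from itertools import combinations
from math import comb

n, k, step = 20, 3, 6
total = n + step * k  # 38 trials when k = 3

valid = 0
for pos in combinations(range(1, total + 1), k):
    # pos is sorted: the (j+1)-th success must occur by trial n + step*j
    if all(pos[j] <= n + step * j for j in range(k)):
        valid += 1

print(valid)  # matches (n / total) * comb(total, k)
```

The count agrees with the ballot-style formula $\frac{n}{n+6k}\binom{n+6k}{k}$ given in the final answer.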


Edit
I also consider recursion to be no walk in the park here. For example, let $~P(k)~$ denote the probability of exactly $~k~$ successes, and consider trying to use the computation of $~P(2)~$ to compute $~P(3).~$

The constraints on $~P(3)~$ are very similar to the constraints on $~P(2).~$ However, the changes to the constraints are:

  • The maximum number of allowable successes is $~3,~$ rather than $~2.$

  • You have the added constraint that you must have at least $~3~$ successes on or before move $~32.$

The difficulty in attempting to use $~P(2)~$ to assist in computing $~P(3)~$ is that $~P(2)~$ (in effect) has the constraint that you must have exactly $~2~$ successes on or before trial 26, while $~P(3)~$ allows the third success to occur anywhere in the range of trials 1 through 32.

Answer 3

Let $g(m,k)$ be the probability of $k$ successes after $m$ sets of six, without yet having reached the end. I think the following recursion holds. $$g(0,k)={20\choose k}p^k(1-p)^{20-k}\text{ if }k\gt0\\ g(m,k)=0 \text{ if }k\le m\\ g(m,k)=\sum_{i=0}^6g(m-1,k-i){6\choose i}p^i(1-p)^{6-i}\text{ if }k\gt m$$

Then $\Pr(X=k)=g(k-1,k)(1-p)^6$ for $k\ge 1$ (for $k=0$, simply $\Pr(X=0)=(1-p)^{20}$).
The recursion is easy enough to calculate for small $k$ and specific $p$, but I can't see how to get a general formula.
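For specific $p$, the recursion is a few lines of code. A memoized sketch (`f`, `g`, and `prob` are illustrative names; `f` is the binomial p.m.f. as in the other answers, and the result holds for $k \ge 1$):

```python
from functools import lru_cache
from math import comb

p = 0.05

def f(n, r):
    """Binomial p.m.f.: P(exactly r successes in n trials)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

@lru_cache(maxsize=None)
def g(m, k):
    """P(k successes after m sets of six, without yet reaching the end)."""
    if k <= m:
        return 0.0
    if m == 0:
        return f(20, k)        # k > 0 on this branch
    return sum(g(m - 1, k - i) * f(6, i) for i in range(7))

def prob(k):
    """P(X = k) for k >= 1: the final set of six must all fail."""
    return g(k - 1, k) * (1 - p)**6
```

For instance, `prob(1)` reduces to $20p(1-p)^{25}$, matching the closed form in the answer below.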

Answer 4

Answer: $$P(X=k)=\frac{n}{n+6k}\binom{n+6k}{k}p^k(1-p)^{n+5k}$$

This is exactly what you guessed in the question (upon setting $n=20$ and $p=1/20$).

Proof: Here is an equivalent phrasing of the problem. You maintain a counter, $N$, representing the number of trials remaining at any point of the process. Initially, $N=n$. Then,

  • Each time there is a failure, decrease $N$ by one.

  • Each time there is a success, increase $N$ by five. Each success grants six additional trials, and we subtract one from six to account for the trial containing the success.

This continues until we hit $N=0$, at which point the process stops. Please convince yourself that this is equivalent to the list of steps you stated.

You want to find the probability of exactly $k$ successes. This implies that there were $n + 5k$ failures. To do this, we count the number of valid sequences of length $n+6k$ which consist of $k$ successes and $n+5k$ failures, and multiply this number by the probability of each sequence occurring. The probability of such a sequence is $$ p^{k}(1-p)^{n+5k}. $$

All that remains is to count the number of valid sequences. A sequence is valid if and only if it consists of $k$ successes and $n+5k$ failures, and the running total of trials remaining, $N$, never reaches zero until the very end. In other words, if we let $k_t$ be the number of successes up to time $t$, then it must be true that $$ n+5k_t - (t-k_t)>0 \qquad \text{for all $t\in \{0,1,\dots,n+6k-1\}$}. $$

Let us rearrange this inequality as follows: $$ n+5k-(t-k_t)>5(k-k_t) $$ If we interpret this inequality in words, we are saying that for all $t$, the number of failures AFTER time $t$ is more than $5$ times the number of successes AFTER time $t$. Note that $n+5k-(t-k_t)$ is the number of failures after time $t$ (as $t-k_t$ is the number of failures up to time $t$), and $k-k_t$ is the number of successes after time $t$.

Therefore, if we reverse a valid sequence, we see that each initial segment of the reversed sequence has more than five times as many failures as successes. This shows that counting valid sequences is exactly solved by the following generalization of Bertrand's ballot problem.

Theorem Let $r\ge 1$ be an integer. Suppose in an election, candidate $A$ receives $a$ votes, and candidate $B$ receives $b$ votes, where $a\ge rb$. The number of ways to arrange these $a+b$ ballots in a sequence such that the running number of votes for $A$ is more than $r$ times the running number of votes for $B$ at all times is $$\frac{a-rb}{a+b}\binom{a+b}{a}.$$
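The theorem is easy to spot-check by enumeration for small values. A sketch with $a=11$, $b=2$, $r=5$ (these particular numbers are just an illustration):

```python
from itertools import combinations
from math import comb

def ballot_count(a, b, r):
    """Count arrangements of a A-votes and b B-votes in which the running
    A-count strictly exceeds r times the running B-count after every ballot."""
    count = 0
    for b_positions in map(set, combinations(range(a + b), b)):
        a_run = b_run = 0
        ok = True
        for t in range(a + b):
            if t in b_positions:
                b_run += 1
            else:
                a_run += 1
            if a_run <= r * b_run:
                ok = False
                break
        count += ok
    return count

a, b, r = 11, 2, 5
print(ballot_count(a, b, r), (a - r*b) * comb(a + b, a) // (a + b))  # → 6 6
```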

Applying this theorem with $a=n+5k$, $b=k$, and $r=5$, we see that the number of valid sequences is $\frac{n}{n+6k} \binom{n+6k}{k}$. Therefore, multiplying this by the probability of each sequence, $p^k(1-p)^{n+5k}$, we arrive at the answer advertised at the beginning.
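Putting it all together, a sketch of the closed form (`prob` is an illustrative name); summing over $k$ gives $\approx 1$, consistent with the process terminating almost surely (the expected drift per trial is $6p - 1 < 0$ for $p = 0.05$):

```python
from math import comb

def prob(k, n=20, p=0.05):
    """P(X = k) = n/(n+6k) * C(n+6k, k) * p^k * (1-p)^(n+5k)."""
    return n / (n + 6*k) * comb(n + 6*k, k) * p**k * (1 - p)**(n + 5*k)

print(prob(0))                            # equals (1-p)**n: no successes at all
print(sum(prob(k) for k in range(100)))   # ≈ 1.0 (tail beyond k=100 is negligible)
```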

The following source contains four proofs of the theorem above, along with some information about the history of the problem:

Renault, Marc. “Four Proofs of the Ballot Theorem.” Mathematics Magazine, vol. 80, no. 5, 2007, pp. 345–52. JSTOR, http://www.jstor.org/stable/27643060.