logic behind the proof for forgetfulness of geometric distributions


I am currently trying to prove the forgetfulness (memoryless) property of geometric distributions (in neutral language: modelling the number of tails before the first heads in a sequence of coin flips) by showing the following: P(X = n + k | X >= n) = P(X = k). I have already proved the property (for a geometric distribution where P(X = n) = p^n(1-p)), but I am having trouble wrapping my head around the logic of one of the steps: replacing P(X >= n) with p^n.

P(X = n + k | X >= n) = P(X = n + k, X >= n) / P(X >= n)

= P(X = n + k)/P(X >= n)

= p^(n+k) * (1-p) / P(X >= n)

= p^(n+k) * (1-p) / [sum(p^i * (1-p)), i=n to infinity]

= p^(n+k) * (1-p) / p^n

= p^k * (1-p)

= P(X = k)
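As a sanity check on the derivation above, here is a minimal Python sketch that verifies the identity with exact rational arithmetic. The function names (`pmf`, `tail`, `conditional`) and the choice p = 2/3 are illustrative assumptions, not part of the proof. P(X >= n) is computed as 1 - P(X < n), so the value p^n is never assumed.

```python
from fractions import Fraction


def pmf(k, p):
    """P(X = k) = p^k * (1 - p): k tails (probability p each), then the first head."""
    return p**k * (1 - p)


def tail(n, p):
    """P(X >= n), computed as 1 - P(X < n) so that p^n is not assumed."""
    return 1 - sum(pmf(i, p) for i in range(n))


def conditional(n, k, p):
    """P(X = n + k | X >= n) = P(X = n + k) / P(X >= n)."""
    return pmf(n + k, p) / tail(n, p)


p = Fraction(2, 3)  # illustrative probability of tails
for n in range(6):
    assert tail(n, p) == p**n  # the step in question
    for k in range(6):
        assert conditional(n, k, p) == pmf(k, p)  # forgetfulness
print("identity holds exactly for n, k in 0..5")
```

Because `Fraction` arithmetic is exact, the assertions test equality rather than closeness, but only for the finitely many values of n and k tried.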

I understand that P(X >= n) is just the sum of an infinite geometric series, so you can calculate it and obtain p^n (this is straightforward). However, could someone explain to me, in words, why the sum of the probabilities for X >= n is p^n, the probability of obtaining n tails in a row? I just don't understand why this is the case. Someone told me that it is p^n because all events in X >= n consist of sequences that start off with n tails, but this explanation doesn't make much sense to me. Thank you.


There are 2 best solutions below

On BEST ANSWER

In any sequence of heads and tails, one cannot simultaneously have any two of

  • $n$ tails then first head
  • $n+1$ tails then the first head
  • $n+2$ tails then the first head
  • ...

because each pair of these differ. That is, these are disjoint events -- if you find yourself in a universe described by one of these, you are not simultaneously in a universe described by another of these. This means we add the probabilities of these outcomes to find the total probability that any of these outcomes occurs. (If these were not disjoint, we would need to perform an inclusion-exclusion analysis on their overlaps so that we would not overcount the total probability.) These are $P(X = n)$, $P(X = n+1)$, ... and we have argued $$ P(X \geq n) = \sum_{i=n}^{\infty} P(X=i) \text{.}$$

The one thing these all have in common is they start with $n$ tails in a row. In fact, every possible way to start with $n$ tails in a row is in that list. No sequence of coin flip outcomes starting with fewer than $n$ tails in a row is on that list. So collectively, these count the probability of all the ways to start with $n$ tails and then do anything afterwards, which is to say, these count the probability of all the ways to start with $n$ tails. Since each tail independently has probability $p$, we have $$ P(X \geq n) = p^n \text{.} $$
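The claim that "$X \geq n$" and "the first $n$ flips are all tails" describe the same outcomes can be checked empirically. The following is a rough simulation sketch, not a proof. The helper names, the seed, and the trial count are arbitrary choices made for illustration. Each trial flips until the first head, and at least $n$ times, then records both descriptions of the event.

```python
import random


def flip_until_head(p, rng, min_flips):
    """Flip a coin (tails with probability p) until a head has appeared
    and at least min_flips flips have been made."""
    flips = []
    while "H" not in flips or len(flips) < min_flips:
        flips.append("T" if rng.random() < p else "H")
    return flips


def compare(n, p, trials=50_000, seed=0):
    """Return (frequency of X >= n, frequency of 'first n flips are tails', p**n)."""
    rng = random.Random(seed)
    count_x_ge_n = 0
    count_starts_with_tails = 0
    for _ in range(trials):
        flips = flip_until_head(p, rng, n)
        x = flips.index("H")  # number of tails before the first head
        count_x_ge_n += x >= n
        count_starts_with_tails += flips[:n] == ["T"] * n
    return count_x_ge_n / trials, count_starts_with_tails / trials, p**n


freq_x, freq_start, exact = compare(n=3, p=0.5)
print(freq_x, freq_start, exact)
```

The first two numbers agree exactly, because the two conditions pick out the same trials, and both should land near $p^n$ up to sampling noise.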

0
On

$\mathsf P(X=n) ~=~p^n(1-p)\mathbf 1_{n\in\Bbb N}$ where $p$ is the rate of "tails" results among a sequence of iid coin tosses, means that $X$ is the count of tails before the first head.

However, could someone explain to me, in words, why the sum of the probabilities for $X \geqslant n$ is $p^n$, the probability of obtaining n tails in a row?

If you will obtain $n$ tails in the first $n$ tosses, then it shall take at least those $n$ tosses before the first head.

If it will take at least $n$ tosses before the first head, then the first $n$ tosses shall all be tails.

The descriptions are equivalent; they are the same event.

So the probability for taking at least $n$ tosses before the first head is $p^n$, since this is the probability for obtaining $n$ consecutive tails among the first $n$ tosses.


In symbols, it is due to the Geometric Series: $$\begin{aligned}\sum_{k=n}^\infty p^k(1-p) &= p^n(1-p)\sum_{j=0}^\infty p^j\\ &= p^n(1-p)\tfrac1{1-p}\\ &= p^n\end{aligned}$$
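The series identity can also be watched converging numerically. This is a small sketch in which the function name `partial_sum` and the values $p = 0.5$, $n = 3$ are illustrative assumptions. A finite partial sum up to index $N$ equals $p^n - p^{N+1}$, so the gap to $p^n$ shrinks geometrically.

```python
def partial_sum(n, upper, p):
    """Sum of p^k * (1 - p) for k = n, ..., upper (inclusive)."""
    return sum(p**k * (1 - p) for k in range(n, upper + 1))


p, n = 0.5, 3
for upper in (3, 5, 10, 20, 50):
    s = partial_sum(n, upper, p)
    # Gap to the limit p^n; in exact arithmetic it equals p^(upper + 1).
    print(upper, s, p**n - s)
```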