I am currently trying to prove the forgetfulness (memoryless) property of geometric distributions (in neutral language: modelling the number of tails before the first heads in a sequence of coin flips, where each flip is tails with probability p) by showing the following: P(X = n + k | X >= n) = P(X = k). I have already proved the property (for a geometric distribution where P(X = n) = p^n(1-p)) but I am having trouble wrapping my head around the logic of one of the steps (the step that replaces P(X >= n) with p^n).
P(X = n + k | X >= n) = P(X = n + k, X >= n) / P(X >= n)
= P(X = n + k)/P(X >= n)
= p^(n+k) * (1-p) / P(X >= n)
= p^(n+k) * (1-p) / [sum(p^i * (1-p)), i=n to infinity]
= p^(n+k) * (1-p) / p^n
= p^k * (1-p)
= P(X = k)
I understand that P(X >= n) is just the sum of an infinite geometric series, so you can calculate it and obtain p^n (this is straightforward). However, could someone explain to me, in words, why the sum of the probabilities for X >= n is p^n, the probability of obtaining n tails in a row? I just don't understand why this is the case. Someone told me that it is p^n because all events in X >= n contain sequences that start off with n tails, but this explanation doesn't make much sense to me. Thank you.
In any sequence of heads and tails, one cannot simultaneously have any two of the outcomes "exactly $n$ tails before the first head", "exactly $n+1$ tails before the first head", "exactly $n+2$ tails before the first head", and so on, because each pair of these differs. That is, these are disjoint events -- if you find yourself in a universe described by one of these, you are not simultaneously in a universe described by another of these. This means we add the probabilities of these outcomes to find the total probability that any of these outcomes occurs. (If these were not disjoint, we would need to perform an inclusion-exclusion analysis on their overlaps so that we would not overcount the total probability.)

The probabilities of these outcomes are $P(X = n)$, $P(X = n+1)$, ... and we have argued $$ P(X \geq n) = \sum_{i=n}^{\infty} P(X=i) \text{.}$$
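For reference, here is how that sum evaluates when $P(X = i) = p^i(1-p)$ as in the question. This is only the standard geometric series computation, assuming $0 \leq p < 1$; the substitution $j = i - n$ is introduced here for convenience. $$ \sum_{i=n}^{\infty} p^i (1-p) = (1-p)\, p^n \sum_{j=0}^{\infty} p^j = (1-p)\, p^n \cdot \frac{1}{1-p} = p^n \text{.}$$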
The one thing these all have in common is they start with $n$ tails in a row. In fact, every possible way to start with $n$ tails in a row is in that list. No sequence of coin flip outcomes starting with fewer than $n$ tails in a row is on that list. So collectively, these count the probability of all the ways to start with $n$ tails and then do anything afterwards, which is to say, these count the probability of all the ways to start with $n$ tails. Since each tail independently has probability $p$, we have $$ P(X \geq n) = p^n \text{.} $$
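The two routes to $P(X \geq n)$ described above (adding up the disjoint probabilities, and asking only whether the first $n$ flips are tails) can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names, the truncation length `terms`, the trial count, and the seed are all illustrative assumptions, and $p$ is the probability of tails as in the question.

```python
import random

def pmf(k, p):
    """P(X = k): k tails (probability p each) followed by one head."""
    return p**k * (1 - p)

def tail_by_sum(n, p, terms=2000):
    """Approximate P(X >= n) by adding the disjoint probabilities
    P(X = i) for i = n, ..., n + terms - 1 (a truncated infinite sum)."""
    return sum(pmf(i, p) for i in range(n, n + terms))

def sample_x(rng, p):
    """Flip until the first head and return the number of tails seen."""
    tails = 0
    while rng.random() < p:  # this flip is tails with probability p
        tails += 1
    return tails

def tail_by_simulation(n, p, trials=200_000, seed=0):
    """Estimate P(X >= n) as the fraction of simulated runs with X >= n,
    i.e. runs whose first n flips were all tails."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if sample_x(rng, p) >= n)
    return hits / trials

if __name__ == "__main__":
    p, n = 0.6, 3
    print("p^n           :", p**n)
    print("truncated sum :", tail_by_sum(n, p))
    print("simulation    :", tail_by_simulation(n, p))
```

The three printed values should agree closely, the simulated one only up to sampling noise. The same `pmf` can also be used to check the memoryless identity from the question, since `pmf(n + k, p) / p**n` should match `pmf(k, p)` up to floating-point rounding.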