Assignment of initial probability values


Suppose a coin is tossed until a head is observed for the first time. It is given that the coin lands heads with probability $p$ and tails with probability $1-p$. Based on only this information, can we rigorously model a discrete probability space without any further assumptions?

Here's an attempt to do so. We consider all possible finite-length outcomes of this experiment, $$\{H, TH, TTH, TTTH, TTTTH, \dots\}$$ as well as the single infinite-length outcome $TTTTTTTT\dots$

The outcomes form a countable set. Now the problem is assigning the initial probability values to the outcomes. The only information we are given is that the coin lands heads with probability $p$, but this is irrelevant to our experiment here because the sample space is not $\{H,T\}$. Is it true that we have to make an assumption that the given probability is a conditional probability? And then we calculate

$$\begin{eqnarray*}P(\text{outcome starts with }H) &=& P(\text{1st toss is heads}) = p\\ P(\text{outcome starts with }T) &=& P(\text{1st toss is tails}) = 1-p\\ P(\text{outcome starts with }TH) &=& P(\text{2nd toss is heads | outcome starts with }T)\\&&\times P(\text{outcome starts with }T)\\ &=& p(1-p)\\ \end{eqnarray*}$$

We can determine that $P(TH) = P(\text{outcome starts with }TH) = p(1-p)$, since it is the only outcome that starts with $TH$. By induction we can find the initial probabilities of all the finite-length outcomes. Since these probabilities sum to $1$, the probability of the single infinite-length outcome must be $0$. So we have completely modeled our experiment.
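A quick numerical check of that last step (a sketch only; the choice $p = 1/3$ and the cutoff $n = 50$ are arbitrary): the probabilities $p(1-p)^{k-1}$ of the finite-length outcomes form a geometric series whose partial sums are $1 - (1-p)^n$, so the leftover mass, which is all that can go to the all-tails outcome, vanishes as $n \to \infty$.

```python
from fractions import Fraction

def partial_sum(p, n):
    """Sum of P(T^{k-1}H) = p*(1-p)^(k-1) for k = 1..n, in exact arithmetic."""
    return sum(p * (1 - p) ** (k - 1) for k in range(1, n + 1))

p = Fraction(1, 3)   # arbitrary head probability, for illustration
n = 50

s = partial_sum(p, n)
# Closed form of the geometric partial sum:
assert s == 1 - (1 - p) ** n

# The unassigned mass 1 - s = (1-p)^n shrinks to 0, which is exactly
# the probability forced onto the single outcome TTTT...
print(float(1 - s))
```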

But is this assuming a lot? We are only given that a coin lands heads with probability $p$ in an experiment with sample space $\{H,T\}$. From here we assume, for all $k$ and for all length-$(k-1)$ strings $s$ not containing $H$, that $$P(k\text{th toss is heads | outcome starts with $s$}) = p.$$ Is this assumption a convention, or am I doing something wrong? How exactly do we interpret "the coin lands heads with probability $p$ and tails with probability $1-p$"? What exactly is going on here?

EDIT: I just realized that the original experiment had uncountable sample space, so I changed the experiment. But the gist of the question still remains.


Best answer:

Yes, the conditional probability that the coin lands heads on the $k$-th flip, given the results of the previous $k - 1$ flips, is simply the probability that it lands heads on the $k$-th flip, since the outcomes of the flips are taken to be mutually independent (the coin has no memory, so the outcomes of previous flips can in no way influence what happens on the next flip).

When you ask what probability should be assigned to an outcome $T \ldots TH$, in the context of your question this reduces to:

What is the probability that $k$ flips of a coin produce the sequence $T \ldots TH$?

Whenever, in a mathematical context (e.g. in a book on probability), one encounters something like "a coin is flipped $k$ times", it is shorthand for "let us consider the finite probability space $$\mathcal{B}_k = (\{H, T\}^k, 2^{\{H, T\}^k}, P_k),$$ where the probability measure $P_k$ is given by $P_k (x_1 \ldots x_k) = P_1 (x_1) \cdots P_1 (x_k)$, with $P_1 (H) = 1 - P_1 (T) = p$." If $$E_i (x) =\{x_1 \ldots x_k \in \{H, T\}^k \mid x_i = x\}$$ is the event that the $i$-th flip produced the outcome $x$, you can show that $$P_k (E_i (H)) = p, \hspace{1em} i = 1, \ldots, k,$$ $$P_k (E_{i_1} (x_1) \cap \cdots \cap E_{i_r} (x_r)) = P_k (E_{i_1} (x_1)) \cdots P_k (E_{i_r} (x_r)), \hspace{1em} 1 \leqslant i_1 < \ldots < i_r \leqslant k,$$ which means that the events $E_1 (x_1), \ldots, E_k (x_k)$ are mutually independent. It is easy to see that $P_k$ is the only such measure, and this is why we have to set $P (T \ldots TH) = (1 - p)^{k - 1} p$.

If, on the other hand, the book says "let us consider a sequence of infinitely many coin tosses", that is shorthand for "let us consider the probability space $$\mathcal{B}_{\infty} = (\{H, T\}^{\mathbb{N}}, \mathcal{B}(\{H, T\}^{\mathbb{N}}), P_{\infty}),$$ where $\{H, T\}^{\mathbb{N}}$ is the (uncountable) set of all infinite sequences of elements of $\{H, T\}$, and $P_{\infty}$ is a probability measure, defined on the Borel $\sigma$-algebra $\mathcal{B}(\{H, T\}^{\mathbb{N}})$ of measurable sets, such that $$P_{\infty} (E_i (H)) = p, \hspace{1em} i = 1, 2, \ldots,$$ $$P_{\infty} (E_{i_1} (x_1) \cap \cdots \cap E_{i_r} (x_r)) = P_{\infty} (E_{i_1} (x_1)) \cdots P_{\infty} (E_{i_r} (x_r)), \hspace{1em} 1 \leqslant i_1 < i_2 < \cdots < i_r,$$

where $E_i (x) =\{x_1 x_2 \ldots \in \{H, T\}^{\mathbb{N}} \mid x_i = x\}$ is again the event that the $i$-th flip produces the outcome $x$." We have now entered the realm of measure theory, and everything becomes much more delicate. No such measure exists on the set of all subsets of $\{H, T\}^{\mathbb{N}}$, so we have to restrict our attention to the Borel $\sigma$-algebra $\mathcal{B}(\{H, T\}^{\mathbb{N}})$ of measurable sets; once we do so, one can show that there is exactly one such measure $P_{\infty}$.

Equipped with this space $\mathcal{B}_{\infty}$, we can identify each member $T \ldots TH$ of the sample space in your original question with the set $$E_{T \ldots TH} =\{T \ldots TH\, x_{k + 1} x_{k + 2} \ldots \mid x_{k + 1} x_{k + 2} \ldots \in \{H, T\}^{\mathbb{N}} \}$$ of all sequences starting with $T \ldots TH$. In effect, we are grouping together all possible continuations of the coin-tossing experiment after it has been stopped by the appearance of the first head; in this way we partition the sample space $\{H, T\}^{\mathbb{N}}$ into the sets $E_H, E_{TH}, E_{TTH}, \ldots$ and $\{TTT \ldots\}$. Since $$E_{T \ldots TH} = E_1 (T) \cap E_2 (T) \cap \cdots \cap E_k (H),$$ we directly get $P_{\infty} (E_{T \ldots TH}) = (1 - p)^{k - 1} p$.
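This identification can also be illustrated by simulation: generate (lazily) sequences from $\{H, T\}^{\mathbb{N}}$, stop each at its first head, and compare the empirical frequency of landing in the cylinder set $E_{T \ldots TH}$ with $(1-p)^{k-1} p$. A sketch, with $p = 0.3$, the sample size, and the seed all chosen arbitrarily:

```python
import random

random.seed(0)
p = 0.3            # arbitrary head probability
n = 200_000        # number of simulated sequences

def first_head_time():
    """Toss until the first head and return how many tosses it took.

    Each run conceptually draws an element of {H,T}^N, but we only ever
    inspect its prefix up to the first H, i.e. which of the cylinder
    sets E_H, E_TH, E_TTH, ... the sequence falls into.
    """
    k = 1
    while random.random() >= p:   # tail, with probability 1 - p
        k += 1
    return k

counts = {}
for _ in range(n):
    k = first_head_time()
    counts[k] = counts.get(k, 0) + 1

# Empirical frequency of E_{T...TH} vs. the exact value (1-p)^(k-1) * p
for k in range(1, 6):
    empirical = counts.get(k, 0) / n
    theoretical = (1 - p) ** (k - 1) * p
    assert abs(empirical - theoretical) < 0.01
```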