Law of total probability explanation about sample space


[Diagram: arrows from $X=0$ and $X=1$ to $Y=0$ and $Y=1$, labeled with conditional probabilities]

In the diagram above, $P(Y=0)+P(Y=1)=1$. The arrows represent the conditional probabilities $P(Y=y\mid X=x)$ for $x,y\in\{0,1\}$.

To use the law of total probability to find $P(Y=0)$, I know we need to sum the probabilities of the intersections of $Y=0$ with the events of a partition of the sample space. Why exactly does that partition include $X=0$ and $X=1$? Isn't the sample space containing $Y=0$ made up only of $Y=0$ and $Y=1$, since $P(Y=0)+P(Y=1)=1$?


4 Answers

---

Simply put, $X$ and $Y$ are dependent random variables. To calculate the total probability, one needs the transition (conditional) probabilities as well as the a priori probabilities of $X$, combined as in the law of total probability; the same combination appears in the denominator of Bayes' rule.

---

The law of total probability states that $\mathsf P(Y=y) = \sum\limits_{x} \mathsf P(Y=y\mid X=x)\;\mathsf P(X=x)$.

Since the support of $X$ is $\{0,1\}$:

$$\mathsf P(Y=y) \;=\; \mathsf P(Y=y\mid X=0)\,\mathsf P(X=0)+\mathsf P(Y=y\mid X=1)\,\mathsf P(X=1)$$
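The expansion above can be checked numerically. The following is a minimal sketch; the prior and conditional probabilities are made-up example values, not taken from the diagram.

```python
# Numerical sketch of the law of total probability for binary X and Y.
# All probabilities below are hypothetical illustration values.

p_x = {0: 0.4, 1: 0.6}          # prior P(X=x)
p_y_given_x = {                 # conditional P(Y=y | X=x); keys are (y, x)
    (0, 0): 0.9, (1, 0): 0.1,
    (0, 1): 0.2, (1, 1): 0.8,
}

def p_y(y):
    """P(Y=y) = sum over x of P(Y=y|X=x) * P(X=x)."""
    return sum(p_y_given_x[(y, x)] * p_x[x] for x in p_x)

print(p_y(0))           # ≈ 0.48  (= 0.9*0.4 + 0.2*0.6)
print(p_y(0) + p_y(1))  # ≈ 1.0, as required
```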


> Why exactly does the sample space include $X=0$ and $X=1$? Isn't the sample space containing $Y=0$ made up only of $Y=0$ and $Y=1$, since $P(Y=0)+P(Y=1)=1$?

The event $Y=0$ is also the event $(X,Y)\in\{(0,0),(1,0)\}$. That is, $(Y=0)\;=\;(Y=0,X=0)\cup (Y=0,X=1)$.

The sample space is thus a Cartesian product: it consists of $(X,Y)\in\{0,1\}\times\{0,1\}$, or equivalently $(X,Y)\in \{(0,0), (0,1), (1,0), (1,1)\}$.

So you have $$1 = \mathsf P(Y=0)\;+\;\mathsf P(Y=1) \\ = \mathsf P(Y=0,X=0)+\mathsf P(Y=0,X=1)\;+\;\mathsf P(Y=1,X=0)+\mathsf P(Y=1,X=1) \\ = \mathsf P(X=0)\;+\;\mathsf P(X=1)$$
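The joint sample space and both marginalizations can be verified with a small sketch; the joint probabilities below are hypothetical values chosen only so that they sum to $1$.

```python
# Sketch: the joint sample space {0,1} x {0,1} with made-up joint probabilities.
from itertools import product

p_joint = {(0, 0): 0.36, (0, 1): 0.04,   # keys are (x, y); hypothetical values
           (1, 0): 0.12, (1, 1): 0.48}

sample_space = list(product([0, 1], [0, 1]))
assert set(p_joint) == set(sample_space)  # four joint outcomes

# Event (Y=0) is the union of the disjoint events (X=0,Y=0) and (X=1,Y=0).
p_y0 = sum(p_joint[(x, 0)] for x in [0, 1])

# Marginalizing over Y recovers P(X=0) + P(X=1) = 1.
p_x0 = sum(p_joint[(0, y)] for y in [0, 1])
p_x1 = sum(p_joint[(1, y)] for y in [0, 1])
print(p_y0)         # ≈ 0.48
print(p_x0 + p_x1)  # ≈ 1.0
```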

---

$Y=0$ could have happened in two different ways, meaning you can partition it like $$P(Y=0) = P(Y=0\cap X=0)+P(Y=0\cap X=1).$$ But you don't have a specific value for each term. For example, you don't know the exact value of $$P(Y=0\cap X=0).$$ So you take one more step and condition on $X$: $$P(Y=0) = P(Y=0|X=0)P(X=0)+P(Y=0|X=1)P(X=1).$$ Hence the expression that appears as the denominator in Bayes' rule.
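The "two different ways" can also be seen by simulation. This is a rough Monte Carlo sketch with assumed values $P(X=0)=0.4$, $P(Y=0\mid X=0)=0.9$, and $P(Y=0\mid X=1)=0.2$; these numbers are illustrative, not from the question.

```python
# Monte Carlo sketch: simulate (X, Y) and check that the two ways Y=0 can
# happen together account for P(Y=0). Probabilities are made-up examples.
import random

random.seed(0)
N = 100_000
count_y0 = 0
count_y0_and_x = {0: 0, 1: 0}   # counts of (Y=0, X=x) for each x

for _ in range(N):
    x = 0 if random.random() < 0.4 else 1    # assumed P(X=0) = 0.4
    p_y0 = 0.9 if x == 0 else 0.2            # assumed P(Y=0 | X=x)
    y = 0 if random.random() < p_y0 else 1
    if y == 0:
        count_y0 += 1
        count_y0_and_x[x] += 1

# Empirically, P(Y=0) ≈ P(Y=0, X=0) + P(Y=0, X=1) = 0.9*0.4 + 0.2*0.6 = 0.48
print(count_y0 / N)                                # close to 0.48
print(sum(count_y0_and_x.values()) == count_y0)    # True by construction
```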

---

More concisely: if $\{A_i\}$ is any partition of your sample space $\Omega$ and $E$ is an event (which could itself be one of the $A_i$), then $P(E)=\sum_{j}P(E|A_j)P(A_j)$.

In your case, it is true that $A_1=\{\omega\in\Omega:Y(\omega)=0\}$ and $A_2=\{\omega\in\Omega:Y(\omega)=1\}$ form a partition of $\Omega$.

Then, for example, $P(A_1)=P(A_1|A_1)P(A_1)+P(A_1|A_2)P(A_2)$. But $P(A_1|A_1)=1$ and $P(A_1|A_2)=0$, because once $A_2$ has happened, $A_1$ cannot happen, since $\{A_1,A_2\}$ is a partition. Thus $P(A_1)=P(A_1)$, and no information is gained.

It is for that reason that you should take a partition $\{A_i\}$ such that $E\neq A_i$ for all $i$. In your case, you could consider $E=\{\omega\in\Omega:Y(\omega)=0\}$, $A_1=\{\omega\in\Omega:X(\omega)=0\}$, and $A_2=\{\omega\in\Omega:X(\omega)=1\}$.
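The contrast between the trivial partition (by $Y$ itself) and the useful one (by $X$) can be sketched numerically; the joint probabilities below are hypothetical illustration values.

```python
# Sketch: partitioning by the event itself gives only P(E) = P(E), while
# partitioning by X genuinely decomposes P(E). Made-up joint probabilities.

p_joint = {(0, 0): 0.36, (0, 1): 0.04,   # keys are (x, y); hypothetical values
           (1, 0): 0.12, (1, 1): 0.48}

def prob(event):
    """P(event), where `event` is a predicate on (x, y)."""
    return sum(p for (x, y), p in p_joint.items() if event(x, y))

p_E = prob(lambda x, y: y == 0)          # E = {Y=0}

# Partition {A1, A2} = {Y=0, Y=1}: P(E|A1)=1 and P(E|A2)=0, so the law of
# total probability collapses to the uninformative identity P(E) = P(E).
trivial = 1.0 * prob(lambda x, y: y == 0) + 0.0 * prob(lambda x, y: y == 1)
assert abs(trivial - p_E) < 1e-9

# Partition {X=0, X=1}: each term P(E|X=x)P(X=x) = P(E, X=x) is nontrivial.
total = 0.0
for x0 in (0, 1):
    p_A = prob(lambda x, y: x == x0)                            # P(X=x0)
    p_E_given_A = prob(lambda x, y: x == x0 and y == 0) / p_A   # P(E | X=x0)
    total += p_E_given_A * p_A
assert abs(total - p_E) < 1e-9
print(p_E)  # ≈ 0.48
```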