Subscript notation for combining random variables (Cover proof of theorem 2.7.3)


This question probably falls into the embarrassing category, but it is also one that is very hard to google for (I did try!). What does the notation $X_\theta$ mean if $X_1$, $X_2$, and $\theta$ are random variables?

A bit more context: in the alternative proof of Theorem 2.7.3 in Cover & Thomas, *Elements of Information Theory*, they assume random variables $X_1$ and $X_2$ (with discrete distributions $p_1$ and $p_2$, respectively) over some set of values $A$. Independently of these, they introduce a random variable $\theta$ as follows:

$$ \theta = \begin{cases} 1 & \text{with probability } \lambda\\ 2 & \text{with probability } 1 - \lambda \end{cases} $$

and define a variable $X_\theta$ with distribution $\lambda p_1 + (1-\lambda) p_2$. I am also confused as to why this would be a distribution at all, given that $p_1$ and $p_2$ are apparently unrelated discrete distributions.

The sentence in the book is formulated as if $X_\theta$ were the most obvious standard construction ever, yet it does not seem to be introduced anywhere in the text. While I can proceed through the proof without putting a name to $X_\theta$, I would greatly appreciate a hint on what they mean.


There are 2 best solutions below

On BEST ANSWER

If $X_1$, $X_2$ are random variables and $\theta$ is a random variable that only takes values in $\{1,2\}$, then - provided that all random variables are defined on the same probability space - $X_{\theta}$ is a random variable prescribed by: $$\omega\mapsto X_{\theta(\omega)}(\omega)$$
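This pointwise definition can be made concrete by simulation: first draw $\theta$, then observe whichever of $X_1$, $X_2$ it selects. Here is a minimal sketch with made-up distributions $p_1$, $p_2$ over a hypothetical value set $A = \{0, 1, 2\}$ (all names and numbers are illustrative, not from the book):

```python
import random

# Hypothetical example: X1 and X2 take values in A = {0, 1, 2}
# with pmfs p1 and p2; theta picks which one is observed.
A = [0, 1, 2]
p1 = [0.7, 0.2, 0.1]   # distribution of X1 over A
p2 = [0.1, 0.3, 0.6]   # distribution of X2 over A
lam = 0.4              # P(theta = 1)

def sample_X_theta():
    """One draw of X_theta: first draw theta, then draw X_{theta}.

    This mirrors the pointwise definition omega -> X_{theta(omega)}(omega):
    the same random outcome determines both which variable is read
    and its value.
    """
    theta = 1 if random.random() < lam else 2
    p = p1 if theta == 1 else p2
    return random.choices(A, weights=p)[0]
```

Repeating `sample_X_theta()` many times, the empirical frequency of each value $a$ approaches $\lambda p_1(a) + (1-\lambda) p_2(a)$, which is exactly the mixture distribution derived below.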

We can find its distribution by applying the law of total probability:$$P(X_\theta\in B)=P(X_\theta\in B\mid \theta=1)P(\theta=1)+P(X_\theta\in B\mid \theta=2)P(\theta=2)$$$$=P(X_1\in B\mid \theta=1)P(\theta=1)+P(X_2\in B\mid \theta=2)P(\theta=2)$$

If $\theta$ is independent of $X_1$ and $X_2$, then $P(X_i\in B\mid \theta=i)=P(X_i\in B)$ for $i=1,2$, so we end up with:$$P(X_\theta\in B)=P(X_1\in B)P(\theta=1)+P(X_2\in B)P(\theta=2)$$

Setting $P(\theta=1)=\lambda$ we get:$$P(X_\theta\in B)=P(X_1\in B)\lambda+P(X_2\in B)(1-\lambda)$$
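This also answers why $\lambda p_1 + (1-\lambda) p_2$ is a valid distribution: each $p_i$ sums to $1$, so the convex combination has total mass $\lambda \cdot 1 + (1-\lambda) \cdot 1 = 1$, with every term non-negative. A small numerical check, using made-up pmfs over a hypothetical common set $A = \{0, 1, 2\}$:

```python
# Hypothetical discrete distributions over a common set A = {0, 1, 2};
# the specific numbers are illustrative only.
p1 = {0: 0.7, 1: 0.2, 2: 0.1}
p2 = {0: 0.1, 1: 0.3, 2: 0.6}
lam = 0.4  # P(theta = 1)

# Mixture distribution of X_theta: lambda * p1 + (1 - lambda) * p2,
# taken pointwise over the common value set A.
p_mix = {a: lam * p1[a] + (1 - lam) * p2[a] for a in p1}

# Every entry is non-negative, and the total mass is
# lam * 1 + (1 - lam) * 1 = 1, so p_mix is itself a pmf.
assert all(p >= 0 for p in p_mix.values())
assert abs(sum(p_mix.values()) - 1.0) < 1e-12
```

Note that $p_1$ and $p_2$ need not be related in any way; all that matters is that they are distributions over the same set $A$.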


The distribution of $X_\theta$ is interpreted as the probability of obtaining a particular value of $X$ given a value of $\theta$, which in turn depends on $\lambda$. Hope that helps.