This question probably falls into the embarrassing category, but also one that is very hard to google for (I did!). What does the notation $X_\theta$ mean if $X_1$, $X_2$ and $\theta$ are random variables?
A bit more context. In the alternative proof of Thm. 2.7.3 in Cover & Thomas's *Elements of Information Theory*, they assume random variables $X_1$ and $X_2$ (with discrete distributions $p_1$ and $p_2$ respectively) over some set of values $A$. They then introduce, independently of $X_1$ and $X_2$, a random variable $\theta$ as follows:
$$ \theta = \begin{cases} 1 & \text{with probability } \lambda\\ 2 & \text{with probability } 1 - \lambda \end{cases} $$
and define a variable $X_\theta$ with distribution $\lambda p_1 + (1-\lambda) p_2$. I am even confused as to why this would be a distribution at all, given that $p_1$ and $p_2$ are apparently unrelated discrete distributions.
The sentence in the book is formulated as if $X_\theta$ were the most obvious standard construction ever, yet it does not seem to be introduced anywhere in the text. While I can proceed through the proof without putting a name on $X_\theta$, I would greatly appreciate a hint on what they mean.
If $X_1$, $X_2$ are random variables and $\theta$ is a random variable that only takes values in $\{1,2\}$, then, provided all three are defined on the same probability space, $X_{\theta}$ is the random variable defined pointwise by: $$\omega\mapsto X_{\theta(\omega)}(\omega)$$
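A minimal simulation sketch of this pointwise definition (the function names and the particular pmfs below are illustrative, not from the book): each draw plays the role of one $\omega$, realizing $\theta$, $X_1$, and $X_2$ jointly and then reporting whichever of $X_1(\omega)$, $X_2(\omega)$ the coin $\theta(\omega)$ selects.

```python
import random

def sample_X_theta(lam, draw_X1, draw_X2, rng):
    """One draw of X_theta: realize X1, X2 and theta on the same 'omega',
    then report X_{theta(omega)}(omega)."""
    x1 = draw_X1(rng)                        # X1(omega)
    x2 = draw_X2(rng)                        # X2(omega)
    theta = 1 if rng.random() < lam else 2   # theta(omega), independent of X1, X2
    return x1 if theta == 1 else x2

rng = random.Random(0)
# Illustrative discrete distributions p1 and p2 over A = {0, 1, 2}
draw_p1 = lambda r: r.choices([0, 1, 2], weights=[0.7, 0.2, 0.1])[0]
draw_p2 = lambda r: r.choices([0, 1, 2], weights=[0.1, 0.3, 0.6])[0]

samples = [sample_X_theta(0.4, draw_p1, draw_p2, rng) for _ in range(10)]
```

With many draws, the empirical frequencies of `samples` approach the mixture $\lambda p_1 + (1-\lambda) p_2$ derived below.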
We can find its distribution by applying the law of total probability:
$$P(X_\theta\in B)=P(X_\theta\in B\mid \theta=1)P(\theta=1)+P(X_\theta\in B\mid \theta=2)P(\theta=2)$$
$$=P(X_1\in B\mid \theta=1)P(\theta=1)+P(X_2\in B\mid \theta=2)P(\theta=2)$$
If $\theta$ is independent of $(X_1, X_2)$, then $P(X_i\in B\mid \theta=i)=P(X_i\in B)$ for $i=1,2$, so that we end up with:$$P(X_\theta\in B)=P(X_1\in B)P(\theta=1)+P(X_2\in B)P(\theta=2)$$
Setting $P(\theta=1)=\lambda$ (so that $P(\theta=2)=1-\lambda$), we get:$$P(X_\theta\in B)=P(X_1\in B)\lambda+P(X_2\in B)(1-\lambda)$$