How to model the probability function in a “weighted” sum of conditional means?


Assume $X$, $Y_1$ and $Y_2$ are normally distributed random variables. Define $Z$ as $Y_1+Y_2$, where only one of the two terms is actually observed: in other words, $Y_1$ and $Y_2$ are mutually exclusive and there is uncertainty about which of the two $Z$ is. I want to compute $E[X|Z]$.

The conditional mean should be:
\begin{align} E[X|Z]&=E[X]+\frac{\operatorname{Cov}[X,Z]}{\operatorname{Var}[Z]}(Z-E[Z])\\&=p(Z)E[X|Z=Y_1]+(1-p(Z))E[X|Z=Y_2]\\&=p(Z)\left(E[X]+\frac{\operatorname{Cov}[X,Y_1]}{\operatorname{Var}[Y_1]}(Y_1-E[Y_1])\right)+(1-p(Z))\left(E[X]+\frac{\operatorname{Cov}[X,Y_2]}{\operatorname{Var}[Y_2]}(Y_2-E[Y_2])\right) \end{align}

My only unknown is the probability $p(Z)$: how do I model it? The probability $p(Z)$ depends on the realization of $Z$. For example, is it a linear function in $Z$? Why or why not?

I want to capture the idea that there is a probability $p$ of observing $Y_1$ and a probability $1-p$ of observing $Y_2$, but $p$ itself depends on the “size” of the observation $Z$ (e.g., the larger $Z$ is, the more likely it is that $Z$ is $Y_1$ rather than $Y_2$, or similar lines of reasoning). What is the simplest way to model this idea?

2 Answers

Best answer:

I'm not sure this will answer your question, but I hope you can get some hints on how to formulate your problem properly and on how to tackle it.

Let's start by introducing some notation: let $X\sim \mathcal{N}(\mu_X, \sigma^2_X)$, i.e. $X$ is a normal random variable with mean $\mu_X$ and variance $\sigma_X^2$. Similarly, $Y_1\sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $Y_2\sim \mathcal{N}(\mu_2, \sigma_2^2)$. Also, let $\rho_{X,1}$ and $\rho_{X,2}$ be the correlation coefficients between $X$ and $Y_1$ and between $X$ and $Y_2$, respectively, so that, for instance, $\operatorname{Cov}[X, Y_1]=\rho_{X,1}\sigma_X\sigma_1$.

Next, we have a new random variable $Z$, which we define as $$ Z = BY_1 + (1-B)Y_2 $$ where $B\in\{0,1\}$. Although I'm not completely sure, what I understand from your comment is that you'd like a way to infer the value of $B$ from the observation of $Z$ - let's see what we can do.
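For concreteness, the model $Z=BY_1+(1-B)Y_2$ can be simulated in a few lines. This is only a sketch; all parameter values are illustrative assumptions, not taken from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not from the question)
mu1, mu2, sigma = 2.0, -1.0, 1.0   # Y1 ~ N(mu1, sigma^2), Y2 ~ N(mu2, sigma^2)
p = 0.7                            # Pr[B = 1]

n = 100_000
B = rng.random(n) < p              # Bernoulli(p) selector, independent of Y1, Y2
Y1 = rng.normal(mu1, sigma, n)
Y2 = rng.normal(mu2, sigma, n)
Z = np.where(B, Y1, Y2)            # Z = B*Y1 + (1 - B)*Y2

# E[Z] is the mixture mean p*mu1 + (1 - p)*mu2
print(Z.mean())
```

Note that $Z$ built this way is a two-component Gaussian mixture, not a Gaussian.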

A general framework for this type of problem, in case you want to look into it, is that of hypothesis testing and detection theory. We start by defining two hypotheses, the null hypothesis $H_0$ and the alternative hypothesis $H_1$, as \begin{align} H_0 &: B=0 \text{ or, equivalently, } Z=Y_2 \\ H_1 &: B=1 \text{ or, equivalently, } Z=Y_1. \end{align} Then, after observing $Z$, we'd like to make an educated guess as to whether $H_0$ or $H_1$ is true. Keep in mind that, for this problem to make sense, we're also assuming that the distributions of $Y_1$ and $Y_2$ are different - to focus on a simple practical case, let's say that $\mu_1 > \mu_2$ but $\sigma_1^2 = \sigma_2^2=\sigma^2$.

Case 1: $B$ is a parameter

In the first case we are going to consider, $B$ is just a parameter, meaning that it is fixed to either $0$ or $1$, although we don't know its value. In this case, a common approach is to look at the likelihood function \begin{align} L(b|Z=z)=p_{Z;B=b}(z) \end{align} where $p_{Z;B=b}(z)$ is the probability density function (pdf) of $Z$ when $B=b$. Important: note that the likelihood is a function of $b$, while $z$ is just a parameter - in other words, for each value of $z$, you have a different (likelihood) function of $b$. In our case, this function takes the values \begin{align} L(0|Z=z) &= p_{Z;B=0}(z) = p_{Y_2}(z) = \frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\Bigl(-\frac{1}{2\sigma_2^2}(z-\mu_2)^2\Bigr) \\ L(1|Z=z) &= p_{Z;B=1}(z) = p_{Y_1}(z) = \frac{1}{\sqrt{2\pi\sigma_1^2}}\exp\Bigl(-\frac{1}{2\sigma_1^2}(z-\mu_1)^2\Bigr). \end{align} If $L(0|Z=z) > L(1|Z=z)$, we decide that $H_0$ is the true hypothesis and that $Z=Y_2$; otherwise we take $H_1$ as the true hypothesis and assume $Z=Y_1$. In practice, we can speed things up and look at the sign of the log-likelihood ratio (recall we are assuming $\mu_1>\mu_2$ and $\sigma_1^2=\sigma_2^2=\sigma^2$) $$ \log \frac{L(1|Z=z)}{L(0|Z=z)} = \frac{\mu_1-\mu_2}{\sigma^2}\Bigl(z-\frac{\mu_1+\mu_2}{2}\Bigr) $$ which tells us that we should pick $H_1$ and $Z=Y_1$ when $z > (\mu_1+\mu_2)/2$.
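The maximum likelihood rule above can be sketched in a few lines of code (parameter values are illustrative assumptions):

```python
import numpy as np

# Illustrative parameters (assumptions): mu1 > mu2, equal variances
mu1, mu2, sigma = 2.0, -1.0, 1.0

def log_likelihood_ratio(z):
    """log L(1|Z=z) - log L(0|Z=z); simplifies for equal variances."""
    return (mu1 - mu2) / sigma**2 * (z - (mu1 + mu2) / 2)

def ml_decide(z):
    """Return 1 (accept H1, i.e. Z = Y1) if the LLR is positive, else 0."""
    return int(log_likelihood_ratio(z) > 0)

# The rule is equivalent to thresholding z at the midpoint (mu1 + mu2)/2
print(ml_decide(1.0), ml_decide(0.0))
```

With these numbers the midpoint is $0.5$, so $z=1.0$ is attributed to $Y_1$ and $z=0.0$ to $Y_2$.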

As a side note, in this "parametric" case, we have \begin{align} E[X|Z]&=\begin{cases} E[X|Y_1] &\text{if $B=1$} \\ E[X|Y_2] &\text{if $B=0$} \end{cases} \\ &=\begin{cases} \mu_X + \rho_{X,1}\frac{\sigma_X}{\sigma_1}(Y_1-\mu_1) &\text{if $B=1$} \\ \mu_X + \rho_{X,2}\frac{\sigma_X}{\sigma_2}(Y_2-\mu_2) &\text{if $B=0$.} \end{cases} \end{align}

Case 2: $B$ is a Bernoulli random variable

For the second case, we assume $B$ is a Bernoulli random variable independent of $X$, $Y_1$ and $Y_2$, with $\Pr[B=1]=p$ and $\Pr[B=0]=q=1-p$. Then, we can still apply the maximum likelihood approach from before (now, even though the final expression is the same, it is common to use the notation $p_{Z|B=b}(z)$ instead of $p_{Z;B=b}(z)$, just to emphasize that we are looking at the pdf of $Z$ conditioned on the random variable $B$, as opposed to fixing the value of a parameter). However, by doing this, we would not take into account what we know about $B$, and a maximum a posteriori (MAP) approach may be a better choice: we decide that $H_1$ is the correct hypothesis and $B=1$ after observing $Z=z$ if $$ \Pr[B=1|Z=z] > \Pr[B=0|Z=z]. \tag{1}\label{1} $$ The a posteriori probabilities are nothing but the probabilities of $B$ being either $0$ or $1$ conditioned on the observation $Z=z$. Luckily, Bayes' theorem allows us to compute them very easily: $$ \Pr[B=b|Z=z]=\frac{p_{Z|B=b}(z)\Pr[B=b]}{p_Z(z)}. $$ Computing the marginal pdf $p_Z(z)$ may be difficult but, as you can see, it appears on both sides of (\ref{1}), so our MAP test can be written as $$ p_{Z|B=1}(z)\Pr[B=1] > p_{Z|B=0}(z)\Pr[B=0] $$ or, rearranging and taking the logarithm, $$ \log \frac{p_{Z|B=1}(z)}{p_{Z|B=0}(z)}=\frac{\mu_1-\mu_2}{\sigma^2}\Bigl(z-\frac{\mu_1+\mu_2}{2}\Bigr) > \log\frac{q}{p} = \log\frac{\Pr[B=0]}{\Pr[B=1]}. $$ In other words, we decide for $H_1$ (that is, $B=1$ and $Z=Y_1$) if $$ z > \frac{\mu_1+\mu_2}{2} + \frac{\sigma^2}{\mu_1-\mu_2}\log\frac{q}{p}. $$ When $p=q=1/2$, this approach is equivalent to the previous one. On the contrary, when $p\ne q$, it moves the threshold to the left (if $q<p$) or to the right (if $q>p$) - can you see why?
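The MAP threshold can likewise be sketched in code (parameter values are illustrative assumptions):

```python
import numpy as np

# Illustrative parameters (assumptions)
mu1, mu2, sigma = 2.0, -1.0, 1.0
p = 0.8            # prior Pr[B = 1]
q = 1 - p

# MAP threshold: the ML midpoint shifted by a prior log-odds term
threshold = (mu1 + mu2) / 2 + sigma**2 / (mu1 - mu2) * np.log(q / p)

def map_decide(z):
    """Return 1 (accept H1, i.e. Z = Y1) if z exceeds the MAP threshold."""
    return int(z > threshold)

# With p > q the threshold moves left of the ML midpoint 0.5: the prior
# favours H1, so less evidence is needed to decide Z = Y1.
print(threshold)
```

Here, for instance, $z=0.1$ would be attributed to $Y_2$ by the ML rule (it lies below the midpoint $0.5$) but to $Y_1$ by the MAP rule, because the prior favours $B=1$.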

For completeness, it's worth mentioning here too that \begin{align} E[X|Z] &= \Pr[B=1|Z]E[X|Z, B=1] + \Pr[B=0|Z]E[X|Z, B=0] \\ &= \Pr[B=1|Z]\Bigl(\mu_X + \rho_{X,1}\frac{\sigma_X}{\sigma_1}(Z-\mu_1)\Bigr) + \Pr[B=0|Z]\Bigl(\mu_X + \rho_{X,2}\frac{\sigma_X}{\sigma_2}(Z-\mu_2)\Bigr) \end{align} where, since $Z$ depends on $B$, the correct weights are the posterior probabilities $\Pr[B=b|Z]$ given by Bayes' theorem above, not the prior $p$ and $q$. This is very similar to what you had already derived, with $\Pr[B=1|Z]$ playing the role of your $p(Z)$.

Last remarks

  • This is just a very quick introduction to hypothesis testing. Other questions arise from here - for instance, what's the probability of making the correct decision? Or what's the probability of accepting $H_1$ when $H_0$ is the true hypothesis (false alarm)? I invite you to look into the subject.
  • Our initial, simple assumptions about $Y_1$ and $Y_2$ are that they are both normal, with the same variance but with $E[Y_1] > E[Y_2]$. Starting from there, we showed, with two different approaches, that we can pick a threshold and guess that $Z=Y_1$ when the observed value $z$ of $Z$ is larger than the threshold. If this is not enough for you, you can make the model more complex by having different variances $\sigma_1^2 \ne \sigma_2^2$ or by assuming that $B$ is not independent of $Y_1$ and $Y_2$.
  • Note that, in any case, the problem of inferring $H_0$ or $H_1$ based on the observation $Z=z$ is disjoint from computing $E[X|Z]$, which only depends on the starting model. Also, just in case, note that the problem of expressing $E[X|Z]$ (a random variable) is different from the one of computing $E[X|Z=z]$ (a deterministic quantity), which does depend on the observation - but you still need a good initial model to work it out.
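Regarding the false-alarm probability mentioned in the first bullet: in this Gaussian setting it has a closed form, which a quick Monte Carlo confirms. A minimal sketch, with illustrative parameters matching the equal-variance case above:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)

# Illustrative parameters (assumptions)
mu1, mu2, sigma = 2.0, -1.0, 1.0
t = (mu1 + mu2) / 2                  # ML threshold

def Q(x):
    """Gaussian tail probability Pr[N(0,1) > x]."""
    return 0.5 * erfc(x / sqrt(2))

# False alarm: deciding H1 (Z = Y1) while H0 (Z = Y2) is true
pfa_exact = Q((t - mu2) / sigma)

# Monte Carlo estimate under H0, i.e. Z ~ N(mu2, sigma^2)
z = rng.normal(mu2, sigma, 200_000)
pfa_sim = (z > t).mean()

print(pfa_exact, pfa_sim)
```

The exact value here is $Q(1.5)\approx 0.067$; shifting the threshold (as the MAP rule does) trades false alarms against missed detections.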

Update

To complete the answer by satisfying this request: under the model of case 2, $B$ and $Z$ are not independent, so the weights are the posterior probabilities from Bayes' theorem above. Writing $$ w(z)=\Pr[B=1|Z=z]=\frac{p\,p_{Y_1}(z)}{p\,p_{Y_1}(z)+q\,p_{Y_2}(z)} $$ - which is precisely the $p(Z)$ asked about in the question - we have \begin{align} E[X|Z=z] &= w(z)E[X|Z=z, B=1] + (1-w(z))E[X|Z=z, B=0] \\ &= w(z) E[X|Y_1=z] + (1-w(z)) E[X|Y_2=z] \\ &= \mu_X + w(z) \rho_{X,1}\frac{\sigma_X}{\sigma_1}(z-\mu_1) + (1-w(z))\rho_{X,2}\frac{\sigma_X}{\sigma_2}(z-\mu_2). \end{align} For the variance, write $m_i(z)=E[X|Y_i=z]$ and $v_i=\operatorname{Var}[X|Y_i=z]=(1-\rho_{X,i}^2)\sigma_X^2$; conditioned on $Z=z$, $X$ is a two-component mixture, so the law of total variance gives \begin{align} \operatorname{Var}[X|Z=z] &= w(z)v_1 + (1-w(z))v_2 + w(z)(1-w(z))\bigl(m_1(z)-m_2(z)\bigr)^2, \end{align} the last term accounting for the spread between the two component means. This follows from well-known results about the conditional distribution of normal random variables: $$ (X|Y_i=z) \sim \mathcal{N}\Bigl(\mu_X + \rho_{X,i}\frac{\sigma_X}{\sigma_i}(z-\mu_i),\ (1-\rho_{X,i}^2)\sigma_X^2\Bigr). $$
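A numerical sketch of these conditional-mean and conditional-variance formulas, with the posterior weight $w(z)=\Pr[B=1\mid Z=z]$ obtained from Bayes' theorem (since $B$ and $Z$ are dependent, the weight at a given observation is the posterior, not the prior $p$). Conditioning on $Z=z$ is approximated by a small Monte Carlo window $|Z-z_0|<\varepsilon$; all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters (assumptions)
mu_x, sig_x = 0.0, 1.0
mu1, sig1, rho1 = 2.0, 1.0, 0.8
mu2, sig2, rho2 = -1.0, 1.0, -0.5
p = 0.7
q = 1 - p

def phi(z, mu, sig):
    """Normal pdf with mean mu and standard deviation sig."""
    return np.exp(-0.5 * ((z - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def posterior_w(z):
    """w(z) = Pr[B = 1 | Z = z] via Bayes' theorem."""
    return p * phi(z, mu1, sig1) / (p * phi(z, mu1, sig1) + q * phi(z, mu2, sig2))

def cond_mean(z):
    """E[X | Z = z]: posterior-weighted mixture of the two regression lines."""
    w = posterior_w(z)
    m1 = mu_x + rho1 * sig_x / sig1 * (z - mu1)
    m2 = mu_x + rho2 * sig_x / sig2 * (z - mu2)
    return w * m1 + (1 - w) * m2

def cond_var(z):
    """Var[X | Z = z] by the law of total variance for a two-component mixture."""
    w = posterior_w(z)
    m1 = mu_x + rho1 * sig_x / sig1 * (z - mu1)
    m2 = mu_x + rho2 * sig_x / sig2 * (z - mu2)
    v1 = (1 - rho1**2) * sig_x**2
    v2 = (1 - rho2**2) * sig_x**2
    # within-component variance + between-component spread
    return w * v1 + (1 - w) * v2 + w * (1 - w) * (m1 - m2) ** 2

# Monte Carlo check near z0 (jointly normal draws, B independent of everything)
n, z0, eps = 1_000_000, 0.5, 0.1
u0, u1, u2 = rng.standard_normal((3, n))
X = mu_x + sig_x * u0
Y1 = mu1 + sig1 * (rho1 * u0 + np.sqrt(1 - rho1**2) * u1)
Y2 = mu2 + sig2 * (rho2 * u0 + np.sqrt(1 - rho2**2) * u2)
B = rng.random(n) < p
Z = np.where(B, Y1, Y2)
sel = np.abs(Z - z0) < eps

print(cond_mean(z0), X[sel].mean())
print(cond_var(z0), X[sel].var())
```

With these numbers the two component densities happen to coincide at $z_0=0.5$, so $w(z_0)=p$ there; away from that point the posterior weight moves toward $1$ (large $z$) or $0$ (small $z$), which is exactly the behaviour the question asked about.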

Another answer:

That $Y_1$ and $Y_2$ are normally distributed and mutually exclusive is contradictory. Splitting the sure event according to whether each variable is zero or not, $$1=P(Y_1\in \mathbb{R} \cap Y_2\in \mathbb{R})=P\bigl((Y_1=0\cup Y_1\neq 0) \cap (Y_2=0\cup Y_2\neq 0)\bigr)=$$ $$=P(Y_1=0 \cap Y_2=0)+P(Y_1=0 \cap Y_2\neq 0)+P(Y_1\neq 0 \cap Y_2=0)+ P(Y_1\neq 0 \cap Y_2\neq 0)=0,$$ since the first three terms vanish (a continuous random variable takes any fixed value with probability zero) and the last term vanishes by mutual exclusivity - a contradiction.