What assumptions are needed to "un-marginalize" this conditional distribution?


I want to prove the identity: $$p(y|D,x)=\int p(y|W,x)p(W|D)dW$$ using the conditional independences given by one of the following two graphical models:

[Figure: graphical models (A) and (B)]

I believe I have a proof that assumes model (A).

$$p(y|D,x)=\int p(y,W|D,x)dW=\int \frac{p(y,W,D,x)}{p(D,x)}dW$$ $$=\int \frac{p(y|W,D,x)}{p(D,x)}p(W,D,x)dW=\int p(y|W,D,x)p(W|D,x)dW$$ $$\stackrel{(A)}{=}\int p(y|W,x)p(W|D)dW$$

Is this line of reasoning valid? Is the identity provable using model (B)?


1. Your reasoning seems valid. For a more concise argument, note that

$$ q(Y, W) = q(Y \mid W) q(W) $$

holds for any distribution $q$ for $(Y, W)$. So by choosing $q(\cdot) = p(\cdot \mid D, X)$, it follows that

\begin{align*} p(Y, W \mid D, X) &= p(Y \mid D, X, W) p(W \mid D, X) \\ &= p(Y \mid X, W) p(W \mid D), \end{align*}

where model (A) is used in the last step, namely $p(Y \mid D, X, W) = p(Y \mid X, W)$ and $p(W \mid D, X) = p(W \mid D)$. Marginalizing both sides with respect to $W$ now proves the desired equality.
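As a numerical sanity check of this argument, here is a minimal sketch on small discrete variables. The figure is not reproduced here, so the sketch assumes a joint of the form $p(D)\,p(X)\,p(W \mid D)\,p(Y \mid W, X)$, which is one factorization satisfying the two conditional independences used in the last step. The table sizes, the seed, and all names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_conditional(*shape):
    """Random probability table, normalized over its last axis."""
    table = rng.random(shape)
    return table / table.sum(axis=-1, keepdims=True)


# Cardinalities of the discrete variables (arbitrary, for illustration only).
n_d, n_x, n_w, n_y = 3, 2, 4, 3

p_d = random_conditional(n_d)                  # p(D),      indexed [d]
p_x = random_conditional(n_x)                  # p(X),      indexed [x]
p_w_given_d = random_conditional(n_d, n_w)     # p(W|D),    indexed [d, w]
p_y_given_wx = random_conditional(n_w, n_x, n_y)  # p(Y|W,X), indexed [w, x, y]

# Assumed factorization: p(D, X, W, Y) = p(D) p(X) p(W|D) p(Y|W,X),
# stored with index order [d, x, w, y].
joint = np.einsum("d,x,dw,wxy->dxwy", p_d, p_x, p_w_given_d, p_y_given_wx)

# Left-hand side: p(Y | D, X), computed directly from the joint.
p_dxy = joint.sum(axis=2)                      # marginalize out W
lhs = p_dxy / p_dxy.sum(axis=-1, keepdims=True)

# Right-hand side: sum over W of p(Y | W, X) p(W | D).
rhs = np.einsum("wxy,dw->dxy", p_y_given_wx, p_w_given_d)

identity_holds = np.allclose(lhs, rhs)
print("identity holds:", identity_holds)
```

The integral over $W$ becomes a finite sum here, which is why a single `einsum` contraction over the `w` index plays the role of the marginalization.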


2. However, the identity cannot be proved using (B), because counterexamples exist.

Suppose, to the contrary, that the proposed identity holds for every distribution $p(D, W, X, Y)$ compatible with model (B). Then it must also hold in the extreme case where the arrow $W\to Y$ is absent, so that $p(Y \mid W, X) = p(Y \mid X)$.

[Figure: model (B) with the arrow $W \to Y$ removed]

That is, we should have

$$ p(Y \mid D, X) \stackrel{?}= \int p(Y \mid X) p(W \mid D) \, \mathrm{d}W = p(Y \mid X). $$

However, this equality cannot hold in general: it would force $Y$ to be conditionally independent of $D$ given $X$, which model (B) does not guarantee, so a suitable counterexample can be chosen.
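To make the counterexample concrete, here is a hedged sketch on small discrete variables. The figure for model (B) is not reproduced here, so the structure below is hypothetical: it assumes a joint of the form $p(W)\,p(X)\,p(Y \mid X)\,p(D \mid W, Y)$, in which $W$ has no arrow into $Y$ but $D$ depends on $Y$. Whether this matches model (B) exactly is an assumption; the sketch only shows that $p(Y \mid W, X) = p(Y \mid X)$ can hold while the proposed identity fails.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_conditional(*shape):
    """Random probability table, normalized over its last axis."""
    table = rng.random(shape)
    return table / table.sum(axis=-1, keepdims=True)


# Cardinalities of the discrete variables (arbitrary, for illustration only).
n_d, n_x, n_w, n_y = 3, 2, 4, 3

p_w = random_conditional(n_w)                  # p(W),      indexed [w]
p_x = random_conditional(n_x)                  # p(X),      indexed [x]
p_y_given_x = random_conditional(n_x, n_y)     # p(Y|X),    indexed [x, y]
p_d_given_wy = random_conditional(n_w, n_y, n_d)  # p(D|W,Y), indexed [w, y, d]

# Hypothetical factorization: p(W) p(X) p(Y|X) p(D|W,Y),
# stored with index order [d, x, w, y].
joint = np.einsum("w,x,xy,wyd->dxwy", p_w, p_x, p_y_given_x, p_d_given_wy)

# Left-hand side: p(Y | D, X), computed directly from the joint.
p_dxy = joint.sum(axis=2)
lhs = p_dxy / p_dxy.sum(axis=-1, keepdims=True)

# p(Y | W, X) from the joint; with no arrow W -> Y it reduces to p(Y | X).
p_xwy = joint.sum(axis=0)                      # indexed [x, w, y]
p_y_given_xw = p_xwy / p_xwy.sum(axis=-1, keepdims=True)
w_irrelevant = np.allclose(p_y_given_xw, p_y_given_x[:, None, :])

# p(W | D) from the joint.
p_dw = joint.sum(axis=(1, 3))                  # indexed [d, w]
p_w_given_d = p_dw / p_dw.sum(axis=-1, keepdims=True)

# Right-hand side: sum over W of p(Y | W, X) p(W | D), which equals p(Y | X).
rhs = np.einsum("xwy,dw->dxy", p_y_given_xw, p_w_given_d)

identity_fails = not np.allclose(lhs, rhs)
print("p(Y|W,X) equals p(Y|X):", w_irrelevant)
print("identity fails:", identity_fails)
```

In this sketch the right-hand side does not vary with $D$, while $p(Y \mid D, X)$ does, because observing $D$ carries information about $Y$.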