How to show that $p(t|x,\mathbf x,\mathbf t)= \int p(t|x,\mathbf w)p(\mathbf w|\mathbf x, \mathbf t)d\mathbf w $


The following paragraph is paraphrased from Bishop's book, Pattern Recognition and Machine Learning.

In the curve fitting problem, we have training data $\mathbf x$ and $\mathbf t$, along with a new test point $x$, and our goal is to predict the value of $t$. We therefore wish to evaluate the predictive distribution $p(t|x,\mathbf x,\mathbf t)$, which is given by

$$p(t|x,\mathbf x,\mathbf t)= \int p(t|x,\mathbf w)p(\mathbf w|\mathbf x, \mathbf t)d\mathbf w $$
where $\mathbf w$ is the weight vector of the polynomial.

My question is: how can I prove the above relation? I know Bayes' rule and the sum and product rules of probability. Specifically, I know, for example, that $p(a)=\int p(a,b)db=\int p(a|b)p(b)db$.
Thanks.

There is 1 best solution below

Putting $a=t$, $b=\mathbf w$, $c=(x,\mathbf x, \mathbf t)$ into $p(a|c)=\int p(a|b,c)p(b|c)db$ yields:

$$ p(t|x,\mathbf x,\mathbf t)=\int p(t|\mathbf w, x, \mathbf x, \mathbf t)p(\mathbf w|x,\mathbf x,\mathbf t)d\mathbf w\tag1$$
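For completeness, the conditional identity $p(a|c)=\int p(a|b,c)p(b|c)db$ is just the sum and product rules from the question with every distribution additionally conditioned on $c$. A short sketch of the two steps, assuming all the conditional densities involved exist:

$$\begin{aligned} p(a|c) &= \int p(a,b|c)\,db && \text{(sum rule, conditioned on } c\text{)}\\ &= \int p(a|b,c)\,p(b|c)\,db && \text{(product rule, conditioned on } c\text{)} \end{aligned}$$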

If we assume that $(x,t)$ is conditionally independent of $(\mathbf x,\mathbf t)$ given $\mathbf w$, then it follows that $$p(t|\mathbf w, x, \mathbf x, \mathbf t)=p(t|x,\mathbf w)\;.$$ It remains to explain how $p(\mathbf w|x,\mathbf x,\mathbf t)$ became $p(\mathbf w|\mathbf x,\mathbf t)$. This does not hold in general: it requires the additional assumption that $\mathbf w$ is conditionally independent of $x$ given $(\mathbf x,\mathbf t)$. We can finesse the issue in this case because we are treating $x$ as a fixed test point, i.e., we are not putting a distribution on $x$, so one can just as well omit $x$ from all the equations.
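As a sanity check, the relation can be verified numerically on a toy model. The sketch below is only an illustration under stated assumptions, not Bishop's setup: the weight $\mathbf w$ is a scalar on a small finite grid (so the integral becomes a sum), the prior is uniform, the target is binary with $p(t=1|x,w)=\sigma(wx)$, and all inputs are treated as fixed. The grid, data values, and function names are hypothetical.

```python
import math

def likelihood(t, x, w):
    """p(t | x, w) for a binary target: Bernoulli with mean sigmoid(w * x)."""
    p1 = 1.0 / (1.0 + math.exp(-w * x))
    return p1 if t == 1 else 1.0 - p1

# Hypothetical toy setup: finite grid for w, uniform prior, fixed inputs.
w_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
prior = [1.0 / len(w_grid)] * len(w_grid)
X_train = [-1.0, 0.5, 2.0]
T_train = [0, 1, 1]
x_new = 1.5

def train_likelihood(w):
    """p(T_train | X_train, w), assuming targets are independent given w."""
    return math.prod(likelihood(t, x, w) for t, x in zip(T_train, X_train))

def predictive_via_posterior(t):
    """Right-hand side: sum over w of p(t | x, w) * p(w | X, T)."""
    unnorm = [p * train_likelihood(w) for p, w in zip(prior, w_grid)]
    evidence = sum(unnorm)
    posterior = [u / evidence for u in unnorm]
    return sum(likelihood(t, x_new, w) * pw for w, pw in zip(w_grid, posterior))

def predictive_via_joint(t):
    """Left-hand side: p(t, T | x, X) / p(T | X), marginalising w in the joint.

    Because x is fixed and the prior on w does not depend on it,
    the denominator p(T | x, X) equals p(T | X).
    """
    joint = sum(p * likelihood(t, x_new, w) * train_likelihood(w)
                for p, w in zip(prior, w_grid))
    evidence = sum(p * train_likelihood(w) for p, w in zip(prior, w_grid))
    return joint / evidence

for t in (0, 1):
    print(t, predictive_via_posterior(t), predictive_via_joint(t))
```

The two functions agree up to floating-point rounding, which reflects the fact that, once the conditional independence assumption is built into the model, the relation is an algebraic rearrangement of the joint distribution.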

Alternatively, we can bypass the need for a proof by using (1) to motivate the definition of the predictive distribution as $$ p(t|x,\mathbf x,\mathbf t):= \int p(t|x,\mathbf w)p(\mathbf w|\mathbf x, \mathbf t)d\mathbf w\;. $$

If you jump ahead in Bishop, you'll see that a similar equation (3.57) in Section 3.3.2 omits $x$ from the definition of the predictive distribution: $$p(t|\mathbf t,\alpha,\beta)=\int p(t|\mathbf w,\beta)p(\mathbf w|\mathbf t,\alpha,\beta)\,d\mathbf w$$