Proof of expected loss in linear regression


While reading section 1.5.5: Loss functions for regression in PRML, I came across the following derivation that I need help with.

Notation: $$ \mathbb{E}[L] = \int \int L(t,y(x))p(x,t) \; dx \; dt$$ and $L(t, y(x))$ is chosen as $\{y(x) - t\}^2$.

From the calculus-of-variations approach, we already know that the loss is minimized when $y(x) = \mathbb{E}[t|x]$. By adding and subtracting $\mathbb{E}[t|x]$ inside the square, we can write the loss as $L(t, y(x)) = \{y(x) - \mathbb{E}[t|x] + \mathbb{E}[t|x] - t\}^2$.
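Expanding this square splits the loss into three pieces, the middle one being the cross term in question:

$$\{y(x) - t\}^2 = \{y(x) - \mathbb{E}[t|x]\}^2 + 2\{y(x) - \mathbb{E}[t|x]\}\{\mathbb{E}[t|x] - t\} + \{\mathbb{E}[t|x] - t\}^2.$$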

The textbook further mentions that after substituting this form of $L(t,y(x))$ into $\mathbb{E}[L]$ and integrating over $t$, the cross term vanishes. This is where I'm stuck; I've expanded the cross term as follows.

\begin{align*} \int \int \{y(x)-\mathbb{E}[t|x]\}\{\mathbb{E}[t|x]-t\} p(t,x) \; dt \;dx &= \int\int y(x)\mathbb{E}[t|x] p(t,x)\;dt\;dx - \int\int t \; y(x) p(t,x)\;dt\;dx \\ & - \int\int \mathbb{E}[t|x]^2 p(t,x)\;dt\;dx + \int\int t \; y(x) p(t,x)\;dt\;dx \\ &= \int y(x)\mathbb{E}[t|x]\; p(x) dx - \int\int t \; y(x) p(t,x)\;dt\;dx \\ & - \int \mathbb{E}[t|x]^2 p(x)\;dx + \int\int t \; y(x) p(t,x)\;dt\;dx \end{align*}

But I don't see how the cross term vanishes. Could someone help me understand what I'm missing? TIA.

Edit

Here the aim is to prove that $y(x) = \mathbb{E}[t|x]$ is an optimal choice without using calculus. I have also attached below the exact reference from the textbook.

[Excerpt from PRML, Section 1.5.5: Loss Functions for Regression]

So the author states that the cross term vanishes after substituting for the loss and integrating over $t$, and that this does not yet require $y(x) = \mathbb{E}[t|x]$. Only after examining the two remaining terms do we conclude that the loss is minimized by choosing $y(x) = \mathbb{E}[t|x]$.
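For reference, once the cross term is gone, the two remaining terms give the decomposition

$$\mathbb{E}[L] = \int \{y(x) - \mathbb{E}[t|x]\}^2 p(x)\,\mathrm{d}x + \int\int \{\mathbb{E}[t|x] - t\}^2 p(x,t)\,\mathrm{d}x\,\mathrm{d}t.$$

Only the first term depends on $y$, and it is minimized (to zero) by choosing $y(x) = \mathbb{E}[t|x]$; the second term is the intrinsic noise in $t$ and cannot be reduced by any choice of $y$.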


Best Answer

You are almost there. Note, however, that the fourth term of your expansion should be $\int\int t \, \mathbb{E}[t|\mathbf{x}] \, p(t,\mathbf{x}) \; \mathrm{d}t \; \mathrm{d}\mathbf{x}$ rather than $\int\int t \, y(\mathbf{x}) \, p(t,\mathbf{x}) \; \mathrm{d}t \; \mathrm{d}\mathbf{x}$: multiplying out $\{y(\mathbf{x})-\mathbb{E}[t|\mathbf{x}]\}\{\mathbb{E}[t|\mathbf{x}]-t\}$, the product of $-\mathbb{E}[t|\mathbf{x}]$ and $-t$ contributes $+t\,\mathbb{E}[t|\mathbf{x}]$. With this correction, no assumption about $y(\mathbf{x})$ is needed for the cross term to vanish.

For each of the four terms, we rewrite the joint distribution $p(t, \mathbf{x})$ as $p(t|\mathbf{x})p(\mathbf{x})$ and move the integral over $t$ inside; we get the following:

$$ \begin{aligned} &\int\int y(\mathbf{x})\mathbb{E}[t|\mathbf{x}] p(t,\mathbf{x}) \; \mathrm{d}t \; \mathrm{d}\mathbf{x} - \int\int t y(\mathbf{x}) p(t,\mathbf{x}) \; \mathrm{d}t \; \mathrm{d}\mathbf{x} \\ - & \int\int \mathbb{E}[t|\mathbf{x}]^2 p(t,\mathbf{x}) \; \mathrm{d}t \; \mathrm{d}\mathbf{x} + \int\int t \mathbb{E}[t|\mathbf{x}] p(t,\mathbf{x}) \; \mathrm{d}t \; \mathrm{d}\mathbf{x} \\ = & \int y(\mathbf{x})\mathbb{E}[t|\mathbf{x}] p(\mathbf{x})\left\{\int p(t|\mathbf{x}) \; \mathrm{d}t\right\} \mathrm{d}\mathbf{x} - \int y(\mathbf{x}) p(\mathbf{x})\left\{\int t p(t|\mathbf{x}) \; \mathrm{d}t\right\} \mathrm{d}\mathbf{x} \\ - & \int \mathbb{E}[t|\mathbf{x}]^2 p(\mathbf{x}) \left\{\int p(t|\mathbf{x}) \; \mathrm{d}t\right\} \mathrm{d}\mathbf{x} + \int \mathbb{E}[t|\mathbf{x}] p(\mathbf{x}) \left\{\int t p(t|\mathbf{x}) \; \mathrm{d}t\right\} \mathrm{d}\mathbf{x} \\ =& \int y(\mathbf{x})\mathbb{E}[t|\mathbf{x}] p(\mathbf{x}) \; \mathrm{d}\mathbf{x} - \int y(\mathbf{x}) \mathbb{E}[t|\mathbf{x}] p(\mathbf{x}) \; \mathrm{d}\mathbf{x} \\ -& \int \mathbb{E}[t|\mathbf{x}]^2 p(\mathbf{x}) \; \mathrm{d}\mathbf{x} + \int \mathbb{E}[t|\mathbf{x}]^2 p(\mathbf{x}) \; \mathrm{d}\mathbf{x}. \end{aligned} $$ The last equality shows that the four terms cancel pairwise, so the cross term vanishes without any assumption on $y(\mathbf{x})$.
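As a quick numerical sanity check (not from the book), the cancellation can be verified by Monte Carlo on a toy joint distribution of my own choosing, where $t = \sin(x) + \varepsilon$ so that $\mathbb{E}[t|x] = \sin(x)$ is known in closed form, and $y(x)$ is an arbitrary predictor deliberately different from the conditional mean:

```python
import numpy as np

# Toy joint distribution: x ~ Uniform(0, 2*pi), t = sin(x) + Gaussian noise,
# so the conditional mean E[t|x] is exactly sin(x).
rng = np.random.default_rng(0)
n = 500_000
x = rng.uniform(0.0, 2.0 * np.pi, size=n)
t = np.sin(x) + rng.normal(0.0, 0.5, size=n)

cond_mean = np.sin(x)   # E[t|x], known in closed form for this toy model
y = 0.5 * x             # an arbitrary predictor, deliberately != E[t|x]

# Monte Carlo estimate of the cross term E[{y(x) - E[t|x]}{E[t|x] - t}]
cross = np.mean((y - cond_mean) * (cond_mean - t))
print(cross)  # close to 0 even though y(x) != E[t|x]
```

The estimate is close to zero (up to Monte Carlo error) even though $y(x) \ne \mathbb{E}[t|x]$, consistent with the pairwise cancellation above.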