[From Bishop's PRML, p. 46]
The average, or expected, loss is given by $$E[L] = \iint (y(x)-t)^2 p(x,t)\,dx\,dt,$$
where the loss function is $L = (y(x)-t)^2$ for a given input $x$ and the corresponding output $t$.
To obtain the $y(x)$ that minimizes the expected loss, $\frac {\partial E[L]} {\partial y(x)}$ is calculated and set to $0$. The derivative is:
$$\frac {\partial E[L]} {\partial y(x)}=2\int\{y(x)-t\}p(x,t)dt$$
I am having difficulty doing this differentiation myself. Could anyone help?
Many thanks.
It is inconvenient to use the same variable $x$ both as a parameter and as an integration variable. When taking the variation $\frac{\delta E[L]}{\delta y(x)}$, we should rename the integration variable and write $E[L]$ as
$$E[L] = \iint (y(x')-t)^2p(x',t)dx'dt$$
To compute this, we start with the rule
$$\frac{\delta y(x')}{\delta y(x)}=\delta(x-x')$$
where $\delta$ is the Dirac delta function. Since the variation acts just like ordinary differentiation, we have
$$\frac{\delta [(y(x')-t)^2p(x',t)]}{\delta y(x)}=2(y(x')-t)p(x',t)\frac{\delta y(x')}{\delta y(x)} = 2(y(x')-t)p(x',t)\delta(x-x')$$
where we used the chain rule. The delta function (a distribution) acts under an integral as $\int \delta(x-x')g(x')dx' = g(x)$, and this gives us
$$\begin{aligned}\frac{\delta E[L]}{\delta y(x)} &= \iint \frac{\delta[(y(x')-t)^2p(x',t)]}{\delta y(x)}dx'dt \\ &= \iint 2(y(x')-t)p(x',t) \delta(x-x')dx'dt = 2\int(y(x)-t)p(x,t) dt\end{aligned}$$
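
If it helps to see the result numerically, here is a minimal sketch that checks it on a grid. Everything specific in it is an assumption made for illustration and does not come from the book: the grid sizes, the density `p`, and the trial function `y`. On a grid the integral over $x'$ becomes a sum with weight `dx`, so the delta function $\delta(x-x')$ turns into a Kronecker delta divided by `dx`. The functional derivative is therefore the ordinary partial derivative with respect to `y[i]`, divided by `dx`.

```python
import math

# Hypothetical grid and joint density, chosen only for illustration.
nx, nt = 11, 21
xs = [-1.0 + 2.0 * i / (nx - 1) for i in range(nx)]
ts = [-2.0 + 4.0 * j / (nt - 1) for j in range(nt)]
dx = xs[1] - xs[0]
dt = ts[1] - ts[0]

# Unnormalized p(x, t); normalization does not affect the identity being checked.
p = [[math.exp(-(t - math.sin(x)) ** 2 - x ** 2) for t in ts] for x in xs]


def expected_loss(y):
    """Riemann-sum approximation of E[L] = double integral of (y(x') - t)^2 p(x', t)."""
    total = sum((y[i] - ts[j]) ** 2 * p[i][j]
                for i in range(nx) for j in range(nt))
    return total * dx * dt


y = [math.cos(x) for x in xs]  # arbitrary trial function y(x)

# Right-hand side of the result: 2 * integral of (y(x) - t) p(x, t) dt at each grid point.
analytic = [2.0 * sum((y[i] - ts[j]) * p[i][j] for j in range(nt)) * dt
            for i in range(nx)]

# Discrete functional derivative: (1 / dx) * dE/dy_i, via central differences.
eps = 1e-6
numeric = []
for i in range(nx):
    y_plus = list(y)
    y_plus[i] += eps
    y_minus = list(y)
    y_minus[i] -= eps
    diff = expected_loss(y_plus) - expected_loss(y_minus)
    numeric.append(diff / (2 * eps * dx))

max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_err)
```

Because the discretized $E[L]$ is quadratic in each `y[i]`, the central difference carries no truncation error, so `max_err` should differ from zero only by floating-point rounding.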