In Bishop's book [1] it is shown that the optimal $y(x)$ w.r.t. the squared-error loss function
$$E[L]=\int \int \{y(x)-t\}^2p(x,t)dxdt $$
is given by a conditional expectation $y(x) = E_t[t|x]$. However, in the derivation of this result, they just say "using the calculus of variations we have":
$$ \frac{\delta E[L]}{\delta y(x)} = 2 \int \{y(x)-t\}p(x,t)dt $$
How does one use the calculus of variations to arrive at this result? Are there simple rules to apply, like those for ordinary differentiation with respect to variables?
Thanks!
[1] Pattern Recognition and Machine Learning (p. 46)
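As for the "simple rules" part: when the functional contains no derivatives of $y$, i.e. it has the form $E[y]=\iint G(y(x),t,x)\,dt\,dx$, the Euler–Lagrange machinery degenerates to taking the ordinary partial derivative of the integrand with respect to $y$:
$$\frac{\delta E}{\delta y(x)} = \int \frac{\partial G}{\partial y}\Big|_{y=y(x)}\,dt.$$
With $G = \{y(x)-t\}^2 p(x,t)$ this immediately gives $2\int\{y(x)-t\}p(x,t)\,dt$, which is exactly Bishop's expression.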
The book mentions that $y(x)$ is "completely flexible", so I take $y(x)$ to be a minimizer of the functional over all continuous functions. Let $\phi(x)$ be a continuous test function; then $y(x)+s\phi(x)$ is also in the admissible set (i.e., it is again a continuous function).
Hence $g(s)=\int \int \{y(x)+s\phi(x)-t\}^2p(x,t)dxdt$ attains a local minimum at $s=0$. Differentiating w.r.t. $s$,
$$g'(s)=\int \int 2\{y(x)+s\phi(x)-t\}p(x,t)\phi(x)\,dx\,dt$$ $$=\int \left[\int 2\{y(x)+s\phi(x)-t\}p(x,t)\,dt\right]\phi(x)\,dx$$
Hence $$0=g'(0)=\int \left[\int 2\{y(x)-t\}p(x,t)\,dt\right]\phi(x)\,dx$$ for all continuous functions $\phi(x)$.
Since this holds for every continuous $\phi$, the fundamental lemma of the calculus of variations gives $\int 2\{y(x)-t\}p(x,t)\,dt=0$ for each $x$. Writing $p(x,t)=p(t|x)p(x)$, this is $y(x)p(x)=\int t\,p(x,t)\,dt$, and hence $$y(x)=\frac{1}{p(x)}\int t\,p(x,t)\,dt=E_t[t|x].$$
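To make this concrete, here is a small numerical sanity check (my own illustration, not from the book): discretize a joint distribution $p(x,t)$ on a grid, compute the conditional mean $y^*(x)=E[t|x]$, and verify that random perturbations of $y^*$ never decrease the expected squared-error loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete joint distribution p(x, t) on a small grid (purely illustrative):
# t is concentrated around sin(pi * x).
xs = np.linspace(-1.0, 1.0, 5)
ts = np.linspace(-2.0, 2.0, 41)
P = np.exp(-0.5 * (ts[None, :] - np.sin(np.pi * xs[:, None])) ** 2)
P /= P.sum()  # normalize so P is a joint pmf over the grid

def expected_loss(y):
    """E[L] = sum over x, t of (y(x) - t)^2 p(x, t)."""
    return ((y[:, None] - ts[None, :]) ** 2 * P).sum()

# Conditional expectation y*(x) = E[t | x] = (sum_t t p(x,t)) / p(x).
p_x = P.sum(axis=1)
y_star = (P * ts[None, :]).sum(axis=1) / p_x

# Any perturbation y* + 0.1 * phi should not decrease the expected loss.
base = expected_loss(y_star)
for _ in range(100):
    phi = rng.standard_normal(len(xs))
    assert expected_loss(y_star + 0.1 * phi) >= base
print(f"loss at conditional mean: {base:.4f}")
```

Because the loss is quadratic in $y(x)$ for each $x$, the loss of the perturbed predictor exceeds the minimum by exactly $s^2\int \phi(x)^2 p(x)\,dx$ (here a sum over the grid), mirroring the $g'(0)=0$ argument above.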