I came across a problem where I have to minimize a functional $E[L]$ using the calculus of variations, but I'm not sure what procedure to follow.
The functional is the expected loss:
$$E[L] = \iint L(t, y(x))\,p(x,t)\,\mathrm dx\,\mathrm dt$$
and we want to choose $y(x)$ to minimize $E[L]$, so as to obtain the following result for $L = (y(x) - t)^2$:
$$y(x) = \frac{\int tp(x,t)dt}{p(x)}$$
but in doing so, I think the variation of $p(x,t)$ is being ignored. What is the proper Lagrangian for this functional, and how do I proceed with the minimization when there is an additional integration (here, over $t$)?
By the way, I'm used to seeing Lagrangians of the form $L(q(t), q'(t), t)$, where $t$ is the independent variable, and we get something along the lines of (informally):
$$\delta \displaystyle \int L(q(t), q'(t),t) dt= \displaystyle \int\left(\displaystyle \frac{ \partial L}{\partial q}\delta q + \displaystyle \frac{\partial L }{\partial q'}\delta q'\right) dt$$
An additional question: why are we using a double integral in the expected loss? Thinking about it in discrete terms, the equivalent would be:
$$E[L] = \sum_{i} \sum_{j} (y(x_{i}) - t_{j})^{2}p(x_{i},t_{j})$$
which seems a bit odd in the context of regression, since we are taking into account cross terms, right? So I guess that at some point we are requiring a minimum distance, for example, from a point $y(x_{2})$ to $t_{1}$.
UPDATE:
I found this paper, in which there is a minimization using a different functional, but it isn't explained why they ignore $p(x)$.
Thanks in advance!
Despite appearances, this is really a one-dimensional variational problem. The unknown $y$ is only a function of $x$, so we can treat the integral in $t$ as a "black box", that is, just some ordinary function $F(x,y) := \int L(t,y)p(x,t)\,\mathrm dt$. Then your functional becomes $$E[L]=\iint L\big(t,y(x)\big)p(x,t)\,\mathrm dx\,\mathrm dt=\int F\big(x,y(x)\big)\,\mathrm dx,$$ and you can apply the usual Euler-Lagrange equation in $x$ and $y(x)$.
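Spelling out what the Euler-Lagrange equation gives you here: since $F$ has no dependence on $y'(x)$, the derivative term drops out entirely,
$$\frac{\partial F}{\partial y}-\frac{\mathrm d}{\mathrm dx}\frac{\partial F}{\partial y'}=0 \quad\Longrightarrow\quad \frac{\partial F}{\partial y}=0,$$
so the "proper Lagrangian" you asked about is just $F(x,y)$ itself, and the stationarity condition is an ordinary pointwise equation in $y$.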
In this specific case, we can write $F$ out explicitly. Observe that $$L(t,y)p(x,t)=(y-t)^2p(x,t)=y^2p(x,t)-2typ(x,t)+t^2p(x,t),$$ so $$F(x,y)=\int L(t,y)p(x,t)\,\mathrm dt=y^2\int p(x,t)\,\mathrm dt-2y\int tp(x,t)\,\mathrm dt+\int t^2p(x,t)\,\mathrm dt\\=y^2q(x)-2yr(x)+s(x),$$ where $q(x)=\int p(x,t)\,\mathrm dt$, $r(x)=\int tp(x,t)\,\mathrm dt$, and $s(x)=\int t^2p(x,t)\,\mathrm dt$.
Heck, $F\big(x,y(x)\big)$ doesn't involve any derivatives of $y$, so we don't even need any variational calculus at all. The value of $F\big(x,y(x)\big)$ at any point $x$ is independent of the value of $y$ at any other point, so you can just choose $y(x)$ at each $x$ independently to minimize $F\big(x,y(x)\big)$ at that point. That is to say, when you find $y(1)$ to minimize $F\big(1,y(1)\big)$, you don't have to worry about what $y(2)$ or $y(1.1)$ or $y(42)$ is going to be. That makes things ridiculously easy: $$\frac{\partial}{\partial y}F(x,y)=2yq(x)-2r(x)=0\implies y=\frac{r(x)}{q(x)}=\frac{\int tp(x,t)\,\mathrm dt}{\int p(x,t)\,\mathrm dt}.$$
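You can check this numerically. Here's a small sketch with a made-up joint density (a Gaussian in $t$ centered at $\sin(2\pi x)$ — purely an assumption for illustration): at each grid point $x$ it brute-forces the minimizer of $F(x,y)=\int(y-t)^2 p(x,t)\,\mathrm dt$ and compares it against $r(x)/q(x)$.

```python
import numpy as np

# Hypothetical joint density p(x,t) on a grid, used only to illustrate
# the pointwise minimization of F(x, y) = y^2 q(x) - 2 y r(x) + s(x).
xs = np.linspace(0.0, 1.0, 5)
ts = np.linspace(-1.0, 1.0, 201)
dt = ts[1] - ts[0]

# An arbitrary (unnormalized) density: t concentrated near sin(2*pi*x).
p = np.exp(-0.5 * ((ts[None, :] - np.sin(2 * np.pi * xs[:, None])) / 0.3) ** 2)

q = p.sum(axis=1) * dt                  # q(x) = ∫ p(x,t) dt
r = (p * ts[None, :]).sum(axis=1) * dt  # r(x) = ∫ t p(x,t) dt
y_opt = r / q                           # claimed minimizer y(x) = r(x)/q(x)

# Brute-force check at each x: scan candidate y values and confirm the
# minimizer of F(x, y) = ∫ (y - t)^2 p(x,t) dt agrees with y_opt(x).
ys = np.linspace(-1.5, 1.5, 3001)
for i, x in enumerate(xs):
    F = ((ys[:, None] - ts[None, :]) ** 2 * p[i]).sum(axis=1) * dt
    y_brute = ys[np.argmin(F)]
    assert abs(y_brute - y_opt[i]) < 2e-3
print("pointwise minimizers match r(x)/q(x)")
```

Note that the loop treats each $x$ in complete isolation, which is exactly the "choose $y(x)$ at each $x$ independently" argument above.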
Anyway, I guess what you really want to understand is what the original objective $E[L]=\iint \big(y(x)-t\big)^2p(x,t)\,\mathrm dx\,\mathrm dt$ is all about. Well, clearly the term $L(t,y)=\big(y(x)-t\big)^2$ says that you want $y(x)-t$ to be small, that is, you want $y(x)$ to be close to $t$. But $t$ could be anything, while $y$ only depends on $x$! So how can we ever hope to get $y(x)$ to be close to $t$, whatever $t$ might be? The key is that for any given $x$, some values of $t$ are more likely than others — that's what $p(x,t)$ tells you. So if you know the value of $x$, then you know what values of $t$ are more likely at that $x$, and you can pick $y(x)$ to be somewhere in the middle of those values.
That's all that's really going on here. Don't think of the $p(x,t)$'s as "cross terms", think of them as weights in a weighted average (or a weighted regression, maybe). After all, the optimal value of $y(x)$ turns out to be nothing but the expected value of $t$ conditioned on $x$.
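To address the "cross terms" worry in your discrete sum directly: each $y(x_i)$ appears only in its own inner sum over $j$, so the double sum decomposes coordinate-wise, and each coordinate is minimized by the $p$-weighted average of the $t_j$'s. A sketch with an arbitrary made-up joint distribution (the numbers are assumptions, only the structure matters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up discrete joint distribution p(x_i, t_j), normalized to sum to 1.
p = rng.random((3, 4))
p /= p.sum()
ts = np.array([0.0, 1.0, 2.0, 3.0])

def expected_loss(y):
    # E[L] = sum_i sum_j (y_i - t_j)^2 p_ij  -- the double sum from the question.
    return np.sum((y[:, None] - ts[None, :]) ** 2 * p)

# Conditional mean of t given x_i: weighted average with weights p_ij / p_i.
y_star = (p * ts[None, :]).sum(axis=1) / p.sum(axis=1)

# Perturbing any single coordinate of y_star can only increase the loss,
# independently of the other coordinates -- no genuine cross constraints.
base = expected_loss(y_star)
for i in range(3):
    for eps in (-0.1, 0.1):
        y = y_star.copy()
        y[i] += eps
        assert expected_loss(y) > base
print("conditional means minimize the double sum, coordinate by coordinate")
```

So the $p(x_i,t_j)$'s really do act as weights: the farther-away $t_j$'s simply count for less at a given $x_i$.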