Calculus of Variations (Pattern Recognition and Machine Learning)


According to Paul Sinclair this answer is incorrect.

Can anyone explain how to use the calculus of variations to show that given

$$E[L]=\int \int (y(\textbf{x})-t)^2 p(\textbf{x},t) d\textbf{x} dt$$

we have

$$ \frac{\delta E[L]}{\delta y(\textbf{x})} = 2\int(y(\textbf{x})-t)p(\textbf{x},t)dt$$

Best answer

Let $F(y) = \iint (y(x)-t)^2p(x,t)\,dxdt$. (I'm dropping the boldface $\textbf{x}$ notation for convenience. Just remember that $x$ is a vector, not a number.)

$F$ is not a function from $\Bbb R \to \Bbb R$ that is being composed with $y$. Its definition requires that the variable $y$ represent a function, not a number. Instead, $F$ is an operator on functions. So we cannot just blithely talk about "$\frac{\partial F}{\partial y}$". What does that even mean for operators like $F$? Note that the book does not use $d$ or $\partial$. Instead it uses $\delta$. In the calculus of variations, this indicates what is elsewhere called a "directional derivative".

$\frac{\delta F}{\delta y}$ when taken at a specific function $y$ is not a single number, but instead a linear operator indicating how $F$ changes when one leaves $y$ in the direction of various other functions. If $h(x)$ is an arbitrary function, then

$$\frac{\delta F}{\delta y}(h) := \lim_{\epsilon \to 0} \dfrac{F(y + \epsilon h) - F(y)}\epsilon$$

It will give different values depending on which $h$ is picked.

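Now $$\begin{align}F(y + \epsilon h) &= \iint (y(x)+\epsilon h(x)-t)^2p(x,t)\,dxdt\\&= \iint \left[(y - t)^2 + 2\epsilon(y-t)h + \epsilon^2h^2\right]p(x,t)\,dxdt\end{align}$$

Subtracting $F(y)$ removes the $(y-t)^2$ term, and dividing by $\epsilon$ leaves $2\iint (y-t)h\,p\,dxdt + \epsilon\iint h^2p\,dxdt$. The last term vanishes as $\epsilon \to 0$, which means $$\frac{\delta F}{\delta y}(h) = 2\iint (y(x)-t)h(x)p(x,t)\,dxdt$$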
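As a sanity check, the limit definition and the closed form can be compared numerically. The following is only a minimal sketch: it assumes a scalar $x$, a made-up Gaussian-shaped density `p` normalised on a grid, and arbitrary choices of `y` and `h`, none of which come from the book.

```python
import numpy as np

# Grid over x and t (assumption: x is one-dimensional, purely for illustration).
x = np.linspace(-3.0, 3.0, 201)
t = np.linspace(-3.0, 3.0, 201)
dx, dt = x[1] - x[0], t[1] - t[0]
X, T = np.meshgrid(x, t, indexing="ij")

# Hypothetical joint density p(x, t): a correlated Gaussian shape, normalised on the grid.
p = np.exp(-0.5 * (X**2 - 1.2 * X * T + T**2))
p /= p.sum() * dx * dt

def F(y_vals):
    """Riemann-sum approximation of F(y) = double integral of (y(x) - t)^2 p(x, t)."""
    return np.sum((y_vals[:, None] - T) ** 2 * p) * dx * dt

y = np.sin(x)        # arbitrary trial function y(x)
h = np.exp(-x**2)    # arbitrary direction h(x)

# Finite-difference version of the limit definition.
eps = 1e-6
finite_diff = (F(y + eps * h) - F(y)) / eps

# Closed form: 2 * double integral of (y(x) - t) h(x) p(x, t).
closed_form = 2.0 * np.sum((y[:, None] - T) * h[:, None] * p) * dx * dt

print(finite_diff, closed_form)
```

The two printed numbers should agree up to a term of order `eps`, which is the leftover $\epsilon\iint h^2p\,dxdt$ contribution. Trying a different `h` changes both values together, in line with the remark that the derivative depends on the direction picked.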

The desired function $y$ is the one for which $\frac{\delta F}{\delta y}(h)$ is $0$ for all $h$, which requires that $$\iint (y(x)-t)h(x)p(x,t)\,dxdt = 0$$

This must hold for all $h(x)$, which includes* $h(x) = \delta(x - x_0)$, the Dirac delta function about $x_0$. Since $$\int (y(x)-t)\delta(x - x_0)p(x,t)\,dx = (y(x_0)-t)p(x_0,t)$$ the stationary $y$ satisfies $$\int (y(x_0) - t)p(x_0,t)\,dt = 0$$ for every point $x_0$. Drop the $_0$ subscript, which I introduced for clarity, and you have the result.
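To tie this back to the formula in the question, here is a short sketch that assumes the order of integration may be exchanged. Collecting the $t$-integral inside the $x$-integral gives

$$\frac{\delta F}{\delta y}(h) = \int \left[\,2\int (y(x)-t)p(x,t)\,dt\,\right] h(x)\,dx$$

The bracketed factor multiplying $h(x)$ is what the book writes as $\frac{\delta E[L]}{\delta y(\textbf{x})}$. Requiring the integral to vanish for every $h$ forces that bracket to vanish at every $x$, which is the same condition as the one obtained with the delta function.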


* $\delta(x)$ is not a true function, but there are sequences of actual functions $\delta_n(x)$ such that $\lim_{n\to\infty} \int \delta_n(x)f(x)\,dx = f(0)$, which is sufficient.
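The footnote can be illustrated numerically. This is a sketch under assumptions: the Gaussian family `delta_n` is just one common choice of approximating sequence, and `f = cos` is an arbitrary continuous test function with $f(0) = 1$.

```python
import numpy as np

def delta_n(x, n):
    """Gaussian of standard deviation 1/n with unit integral (one possible delta_n)."""
    return n / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (n * x) ** 2)

f = np.cos  # continuous test function, f(0) = 1

# Fine grid so that even the narrowest Gaussian below is well resolved.
x = np.linspace(-10.0, 10.0, 400001)
dx = x[1] - x[0]

# Riemann-sum approximation of the integral of delta_n(x) f(x) for growing n.
approximations = [np.sum(delta_n(x, n) * f(x)) * dx for n in (1, 10, 100)]
print(approximations)
```

For this particular pair the integral is known in closed form, $e^{-1/(2n^2)}$, so the printed values should climb towards $f(0) = 1$ as $n$ grows.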