Minimization of Expected Value


I'd like to know how to minimize, with respect to $\hat{y}(x)$, $$ \DeclareMathOperator{\Tr}{Tr} \mathbb{E}_{p(x,y)}[(\hat{y}(x)-y)^2 + (\hat{y}(x)-y)\Tr(\nabla^2_x\hat{y}(x)) + ||\nabla_x\hat{y}(x)||^2_2], $$ where $x$ is a vector and $y$ and $\hat{y}(x)$ are scalars.

I googled functional derivatives and read a little about the calculus of variations. I now know how to minimize functionals of the form $$ \theta(y(t)) = \int_0^T F(t,y(t),y'(t))\,dt $$ using the Euler–Lagrange equation $$ F_y - \frac{d}{dt} F_{y'} = 0. $$ The problem is that my expression has a completely different form: $$ \int_Y\left[\int_X p(x,y)\,F(y,\hat{y}(x),\nabla_x\hat y(x),\nabla_x^2\hat y(x))\,dx\right] dy. $$ There is a gradient, a Hessian, and $x$ is a vector!

Is there a general method to minimize functionals like this?
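For what it's worth, the one-dimensional Euler–Lagrange equation above can be checked mechanically. Here is a short sketch (my own example, not from the book) using SymPy's `euler_equations` helper: for the arc-length functional $F = \sqrt{1 + y'(t)^2}$ the equation should force $y''(t) = 0$, i.e. straight lines.

```python
# Sanity check of the 1-D Euler-Lagrange equation F_y - d/dt F_{y'} = 0,
# using SymPy's euler_equations helper (my own sketch, not from the book).
import sympy as sp
from sympy.calculus.euler import euler_equations

t, a, b = sp.symbols('t a b')
y = sp.Function('y')

# Arc-length functional: F = sqrt(1 + y'(t)^2)
F = sp.sqrt(1 + y(t).diff(t)**2)
(eq,) = euler_equations(F, y(t), t)
print(eq)  # the resulting equation reduces to y''(t) = 0

# Any straight line y = a*t + b should satisfy it identically.
residual = eq.lhs.subs(y(t), a*t + b).doit()
assert sp.simplify(residual) == 0
```

So the stationary curves are exactly the straight lines, as expected for shortest paths in the plane.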

Note: I found this problem in a book about deep learning (see page 216).

Best answer

We can minimize the functional by computing its Gâteaux derivative (a generalization of the directional derivative) along an arbitrary perturbation $h(x)$ and setting it to zero. Note that the derivative must vanish for *every* choice of $h(x)$.

Writing the functional with the coefficient $\nu$ from the book, $$ \DeclareMathOperator{\Tr}{Tr} \theta(\hat y) = \mathbb{E}_{p(x,y)}\big[(\hat y - y)^2+\nu(\hat y-y)\Tr(\nabla_x^2\hat y) + \nu||\nabla_x\hat y||^2\big], $$ the Gâteaux derivative in the direction $h$ is $$ d_h\theta(\hat y) = \left.\frac{d}{d\alpha}\theta(\hat y + \alpha h)\right|_{\alpha=0} = $$ $$ \left.\frac{d}{d\alpha}\mathbb{E}\big[(\hat y+\alpha h-y)^2 + \nu(\hat y + \alpha h-y)\Tr(\nabla_x^2\hat y + \alpha\nabla_x^2 h) + \nu||\nabla_x\hat y + \alpha\nabla_x h||^2\big]\right|_{\alpha=0}. $$ Differentiating and setting $\alpha=0$, every term carrying a factor of $\nu$ can be lumped into $O(\nu)$: $$ d_h\theta(\hat y) = \mathbb{E}\big[2(\hat y-y)h + O(\nu)\big] = 0 \iff \mathbb{E}\big[(\hat y - y)h + O(\nu)\big] = 0. $$ Let's rewrite the expectation more explicitly (integrating the $\nu$-terms by parts so that $h(x)$ factors out; their precise form does not matter at order $\nu$): $$ \int_X\int_Y p(x,y)\big((\hat y(x)-y)h(x) + O(\nu)\big)\,dy\, dx = $$ $$ \int_X h(x)\left[\int_Y p(x,y)\big(\hat y(x)-y + O(\nu)\big)\,dy\right]dx = 0. $$ Since $h$ is arbitrary, the bracketed inner integral must vanish for (almost) every $x$ — this is the fundamental lemma of the calculus of variations: $$ \int_Y p(x,y)\big(\hat y(x)-y + O(\nu)\big)\,dy = 0. $$ Factoring $p(x,y) = p(x)\,p(y|x)$ and rearranging, $$ p(x)\big(\hat y(x) + O(\nu)\big)\int_Y p(y|x)\,dy = p(x)\int_Y p(y|x)\,y\, dy, $$ and since $\int_Y p(y|x)\,dy = 1$, $$ \hat y(x) = \mathbb{E}_{p(y|x)}[y] + O(\nu), $$ which is the result in the book.
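A quick numerical sanity check of the limiting case (my own sketch, not from the book): for $\nu = 0$ the objective reduces to $\mathbb{E}[(\hat y(x)-y)^2]$, whose minimizer is the conditional mean $\mathbb{E}_{p(y|x)}[y]$. With simulated data $y = x^2 + \varepsilon$, the predictor $\hat y(x) = x^2$ should beat any perturbed predictor:

```python
# Monte Carlo check that the conditional mean minimizes E[(yhat(x) - y)^2]
# (the nu = 0 case of the derivation above; my own sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)
y = x**2 + rng.normal(0.0, 0.5, x.size)    # E[y | x] = x^2, noise variance 0.25

def loss(yhat):
    """Monte Carlo estimate of E[(yhat - y)^2]."""
    return np.mean((yhat - y)**2)

best = loss(x**2)                          # loss at the conditional mean
for eps in (0.1, -0.2, 0.5):
    assert loss(x**2 + eps) > best         # shifted predictors do worse
    assert loss(x**2 + eps * x) > best     # tilted predictors do worse
print(f"loss at conditional mean: {best:.3f}")  # close to the noise variance
```

The loss at the conditional mean approaches the noise variance ($0.25$ here), and every perturbation strictly increases it, consistent with $\hat y(x) = \mathbb{E}_{p(y|x)}[y] + O(\nu)$.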