Loss function for regression


In Bishop, it is explained that the decision stage for a regression problem consists of finding a suitable function $y(\textbf{x})$ associated with the target $t$. To do this we consider a loss function $L(t,y(\textbf{x}))$, and we then minimize its expected value

$$\mathbb{E}[L] = \iint L(t,y(\textbf{x}))\,p(\textbf{x},t)\,d\textbf{x}\,dt$$

Here, why do we consider the joint distribution $p(\textbf{x},t)$? I was thinking that $t$ should not be a random variable, since it is a deterministic value.

Also, choosing the loss $L(t,y(\textbf{x})) = \{y(\textbf{x})-t\}^2$, how exactly do we obtain

$$\frac{\partial\mathbb{E}[L]}{\partial y(\textbf{x})} = 2 \int \{y(\textbf{x})-t\}p(\textbf{x},t) \, dt $$ ??

Thanks!

EDIT:

OK, based on what I've read about variational calculus, the variation of a functional $S$ along a direction $g$ can be defined as

$$\frac{d}{d\varepsilon}S[y(x)+\varepsilon g(x)]\Big|_{\varepsilon=0}$$

where $g$ is any function satisfying the relevant boundary conditions. I then tried to apply this definition to the expected loss, getting

\begin{align} \frac{d}{d\varepsilon}\mathbb{E}[L]\big[y(\textbf{x})+\varepsilon g(\textbf{x})\big]\Big|_{\varepsilon = 0} &= \iint \frac{d}{d\varepsilon} \{y(\textbf{x})+\varepsilon g(\textbf{x})-t\}^2\Big|_{\varepsilon=0}\,p(\textbf{x},t) \, d\textbf{x}\,dt \\ &= \iint 2\{y(\textbf{x})-t\}g(\textbf{x})\,p(\textbf{x},t)\,d\textbf{x}\,dt \end{align}

but this of course is not equal to what it should be. Can you please help me through these steps?

Answer:

The value of $t$ for a given ${\bf x}$ is not assumed to be fixed. For instance, see page 28 where for a given $x$, $t$ is assumed to be Gaussian with certain parameters depending on $x$. The book notes in Section 1.2.3 that it takes a Bayesian approach, where probabilities provide information about uncertainty. So, for instance, in the digit classification problem, $p(C_k|{\bf x})$ is interpreted as the level of certainty that the image ${\bf x}$ should be classified as belonging to the digit class $C_k$. Certainly there is a "correct" classification based on the intentions of the person who drew the digit, but as outside observers we only have degrees of certainty/uncertainty about classification (for instance, how certain are you in your own classification of these images?).
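A quick numerical illustration of this point (a sketch under my own assumptions: the toy model $t\mid x \sim \mathcal{N}(\sin x,\,0.3^2)$ and all variable names below are illustrative choices, not from the book). For a fixed input $x$, the target is random, and the prediction $y$ that minimizes the empirical expected squared loss lands on the conditional mean $\mathbb{E}[t\mid x]$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model of Bishop's setting: for a fixed input x, the target t is
# random, here t | x ~ N(sin(x), 0.3^2), so even a "correct" t per x is
# only known up to a distribution.
x = 1.0
t = rng.normal(np.sin(x), 0.3, size=100_000)

# Empirical expected squared loss E[{y - t}^2] for a grid of candidate
# predictions y, for this fixed x.
candidates = np.linspace(0.0, 1.5, 301)
losses = [np.mean((y - t) ** 2) for y in candidates]
best = candidates[np.argmin(losses)]

# The minimizer sits near the conditional mean E[t | x] = sin(x).
print(best, np.sin(x))
```

The minimizing candidate agrees with $\sin(1) \approx 0.84$ up to the grid spacing and Monte Carlo noise, which is the conditional-mean result that the squared loss leads to.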

For your second question, there is a note next to the equation to see Appendix D which explains calculus of variations and functional derivatives. You left out the $p({\bf x},t)$ term - perhaps that's where your confusion lies?
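To spell out that step (my reconstruction of the Appendix D argument, not a quote from the book): the variation computed in the edit is correct; rewriting it so that the arbitrary function $g$ is isolated,

$$\frac{d}{d\varepsilon}\mathbb{E}[L]\big[y+\varepsilon g\big]\Big|_{\varepsilon=0} = \int g(\textbf{x})\left[2\int \{y(\textbf{x})-t\}\,p(\textbf{x},t)\,dt\right]d\textbf{x},$$

the factor multiplying $g(\textbf{x})$ is by definition the functional derivative $\partial\mathbb{E}[L]/\partial y(\textbf{x})$, which is exactly the expression in the question. Requiring it to vanish for every $\textbf{x}$, and writing $p(\textbf{x},t)=p(t|\textbf{x})\,p(\textbf{x})$, then gives the minimizer $y(\textbf{x})=\int t\,p(t|\textbf{x})\,dt=\mathbb{E}[t|\textbf{x}]$, the conditional mean.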