Derivative of Mean Square Error Function with respect to output


I'm trying to understand the gradient derivation for the back-propagation algorithm.

I'm having trouble computing the explicit derivative of the mean-squared-error loss with respect to the output value in a regression setting. I have only one output neuron.

Let,

  • $n$ be the number of training examples
  • $ y_i $ be the predicted target for training example $x_i$
  • $ t_i $ be the actual target value (from train data) for training example $x_i$
  • $ L_i $ be the loss for sample $i$

I'm using the following definition of the loss function,

$$ E = \frac{1}{n} \sum_{i=1}^{n} L_i = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} ( y_i - t_i)^2 = \frac{1}{2n} \sum_{i=1}^{n} ( y_i - t_i)^2 $$

How do I compute $\frac{\partial E}{\partial y}$?
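To make the question concrete, here is a finite-difference sanity check I wrote (all names are my own, and I'm assuming the per-component answer is $\frac{\partial E}{\partial y_i} = (y_i - t_i)/n$ for the averaged loss, which is exactly what I'd like to understand how to derive):

```python
import numpy as np

# Averaged loss: E(y) = 1/(2n) * sum_i (y_i - t_i)^2.
# If dE/dy_i = (y_i - t_i)/n, a central finite difference should agree.
def loss(y, t):
    n = len(y)
    return np.sum((y - t) ** 2) / (2 * n)

rng = np.random.default_rng(0)
n = 5
y = rng.normal(size=n)  # pretend network outputs
t = rng.normal(size=n)  # pretend targets

eps = 1e-6
numeric = np.array([
    (loss(y + eps * e, t) - loss(y - eps * e, t)) / (2 * eps)
    for e in np.eye(n)  # perturb one output component at a time
])
analytic = (y - t) / n
print(np.max(np.abs(numeric - analytic)))  # tiny, so the formula checks out numerically
```

So numerically the derivative behaves as expected, but I don't see how to justify it symbolically.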

This is in a neural-network setting, so $E$ is a function of the weights $w$. In Bishop's book, equation (5.11) is, as far as I can see, the same expression except that it is not divided by $n$:

$$ E(w) = \frac{1}{2} \sum_{i=1}^n (y(x_i, w) - t_i)^2 $$

So here $y$ is a function that depends on $x_i$ and $w$. Writing

$$ \frac{\partial E}{\partial y} $$

then means differentiating with respect to a function?

And yet Bishop does exactly this in equation (5.19):

$$ \frac{\partial E}{\partial y_k} = y_k - t_k $$

where $y_k$ is the output of the $k$-th output neuron and $t_k$ the corresponding target value. But where have the training instances gone? They've disappeared from the equation! $y_k$ is the prediction for a single input $x$!
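I also checked Bishop's unaveraged form numerically (again, the variable names are mine, just for this check):

```python
import numpy as np

# Bishop's (unaveraged) loss: E = 1/2 * sum_k (y_k - t_k)^2.
# Equation (5.19) claims dE/dy_k = y_k - t_k.
def loss_bishop(y, t):
    return 0.5 * np.sum((y - t) ** 2)

y = np.array([0.2, -1.3, 0.7])  # outputs for one input x
t = np.array([0.0, -1.0, 1.0])  # targets for that input

eps = 1e-6
numeric = np.array([
    (loss_bishop(y + eps * e, t) - loss_bishop(y - eps * e, t)) / (2 * eps)
    for e in np.eye(len(y))
])
print(numeric)  # ≈ y - t, matching equation (5.19)
```

So equation (5.19) holds numerically, which only deepens my confusion about what object $y_k$ is.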

I don't understand the nature of $y$ here, or why it is legal to differentiate $E$ with respect to it.

Thanks for any help.