Error in usage of chain rule


I am currently trying to understand the math used in training neural networks, in which gradient descent is used to minimize the error between the target and the network output. In particular, I am trying to understand how the error term is computed for the output layer and for the inner layers. This is the source

The error between the neural network output and the actual output is computed as

\begin{align} c &= \frac{1}{2}\|y_N - t\|^2 \end{align}

$t$: the target

$y_N$: the neural network output

$c$: the squared-norm error

They then define

\begin{align} \delta_n = \frac{\partial c}{\partial x_n} \end{align}

$\delta_n$: the error gradient at layer $n$.

$x_n = w_{n-1}y_{n-1}$: the input vector at layer $n$
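For concreteness, these definitions correspond to a forward pass of the form (with $f$ the activation function that appears in their derivation):

\begin{align} y_n &= f(x_n) \\ x_{n+1} &= w_n y_n \end{align}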

So how can this definition be made? The only partial derivatives that seem possible to take are those with respect to $y_N$ or $t$.

Their expanded derivation:

\begin{align} \delta_n &= \frac{\partial c}{\partial x_n} \\ &= \frac{\partial c}{\partial x_{n+1}} \frac{\partial x_{n+1}}{\partial x_n} \\ &= \delta_{n+1} \frac{\partial x_{n+1}}{\partial x_n} \\ &= \delta_{n+1} \frac{\partial w_n y_n}{\partial x_n} \\ &= \delta_{n+1} \frac{\partial w_n y_n}{\partial y_n} \frac{\partial y_n}{\partial x_n} \\ &= \delta_{n+1} \frac{\partial w_n y_n}{\partial y_n} \frac{\partial f(x_n)}{\partial x_n} \\ &= \delta_{n+1} w_n f'(x_n) \end{align}

It seems to me that line 3 should not be able to become line 4.

So what form of black magic are they performing?
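To make my confusion concrete, here is a minimal numeric sketch I wrote (my own toy scalar network with $f = \tanh$; the weights, target, and input are made up) that checks the claimed recursion $\delta_n = \delta_{n+1} w_n f'(x_n)$ against finite differences of $c$ with respect to $x_n$:

```python
# Toy scalar chain: x_{k+1} = w_k * y_k, y_k = f(x_k), c = 0.5*(y_N - t)^2
import math

f = math.tanh
fprime = lambda x: 1.0 - math.tanh(x) ** 2

w = [0.7, -1.3, 0.5]   # weights w_0..w_2 (arbitrary)
t = 0.4                # target (arbitrary)
x0 = 0.9               # input to layer 0 (arbitrary)

def forward(x_first, start=0):
    """Forward pass from the input of layer `start`; returns (cost, [x_start..x_N])."""
    xs = [x_first]
    for k in range(start, len(w)):
        xs.append(w[k] * f(xs[-1]))   # x_{k+1} = w_k * y_k
    y_N = f(xs[-1])
    return 0.5 * (y_N - t) ** 2, xs

c0, xs = forward(x0)   # xs = [x_0, ..., x_N] with N = len(w)

# At the output layer: delta_N = dc/dx_N = (y_N - t) * f'(x_N)
delta = (f(xs[-1]) - t) * fprime(xs[-1])

eps = 1e-6
for n in reversed(range(len(w))):
    delta = delta * w[n] * fprime(xs[n])          # the recursion in question
    c_plus, _ = forward(xs[n] + eps, start=n)     # perturb x_n, re-run forward
    c_minus, _ = forward(xs[n] - eps, start=n)
    numeric = (c_plus - c_minus) / (2 * eps)      # central-difference dc/dx_n
    print(f"layer {n}: recursion={delta:.8f}  finite-diff={numeric:.8f}")
```

When I run this, the recursion matches the finite-difference gradient at every layer, so numerically the step seems valid; I just cannot see why it is justified algebraically.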