I am currently trying to understand the math used in training neural networks, where gradient descent is used to minimize the error between the target and the network output. In particular, I am trying to understand how the error term is computed for the output layer and for the inner layers. This is the source
The error between the neural network output and the actual output is computed as
\begin{align} c &= \frac{1}{2}(||y_N - t||)^2 \end{align}
$t$: the target
$y_N$: the neural network output
$c$: the squared-norm error
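To make the symbols concrete, here is a minimal numeric sketch of the cost (the values of $y_N$ and $t$ are made up, not from the source):

```python
import numpy as np

y_N = np.array([0.8, 0.2])   # hypothetical network output
t = np.array([1.0, 0.0])     # hypothetical target
c = 0.5 * np.linalg.norm(y_N - t) ** 2  # squared-norm error
print(c)  # 0.5 * (0.2**2 + 0.2**2) = 0.04
```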
They then define
\begin{align} \delta_n = \frac{\partial c}{\partial x_n} \end{align}
$\delta_n$: the error gradient at layer $n$
$x_n = w_{n-1}y_{n-1}$: the input vector to layer $n$
How can this definition be made at all? The only partial derivatives that seem possible to take are with respect to $y_N$ or $t$.
Their expanded derivation:
\begin{align} \delta_n &= \frac{\partial c}{\partial x_n} \\ &= \frac{\partial c}{\partial x_{n+1}} \frac{\partial x_{n+1}}{\partial x_n} \\ &= \delta_{n+1} \frac{\partial x_{n+1}}{\partial x_n} \\ &= \delta_{n+1} \frac{\partial w_n y_n}{\partial x_n} \\ &= \delta_{n+1} \frac{\partial w_n y_n}{\partial y_n} \frac{\partial y_n}{\partial x_n} \\ &= \delta_{n+1} \frac{\partial w_n y_n}{\partial y_n} \frac{\partial f(x_n)}{\partial x_n} \\ &= \delta_{n+1} w_n f'(x_n) \end{align}
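To see what the recursion $\delta_n = \delta_{n+1}\, w_n f'(x_n)$ is claiming, here is a scalar two-layer sketch that checks it against a finite-difference estimate of $\partial c/\partial x_1$ (the sigmoid activation and all numeric values are my own assumptions, not from the source):

```python
import numpy as np

def f(x):
    """Sigmoid activation (an assumed choice of f)."""
    return 1.0 / (1.0 + np.exp(-x))

def fprime(x):
    s = f(x)
    return s * (1.0 - s)

# Hypothetical scalar chain: x2 = w1 * f(x1), y2 = f(x2), c = 0.5*(y2 - t)^2
w1, t = 0.7, 1.0

def cost_from_x1(x1):
    # Forward pass starting from the layer-1 input x1
    y1 = f(x1)
    x2 = w1 * y1
    y2 = f(x2)
    return 0.5 * (y2 - t) ** 2

x1 = 0.3
y1 = f(x1)
x2 = w1 * y1
y2 = f(x2)

# Output-layer error gradient: delta_2 = dc/dx2 = (y2 - t) * f'(x2)
delta2 = (y2 - t) * fprime(x2)
# Backpropagated per the recursion: delta_1 = delta_2 * w1 * f'(x1)
delta1 = delta2 * w1 * fprime(x1)

# Central-difference check of dc/dx1
eps = 1e-6
numeric = (cost_from_x1(x1 + eps) - cost_from_x1(x1 - eps)) / (2 * eps)
print(delta1, numeric)  # the two values should agree
```

The two printed numbers match, so whatever the definition means formally, the recursion does reproduce the numerical sensitivity of $c$ to $x_n$.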
Specifically, I do not see how line 3 can become line 4.
So what form of black magic are they performing?