I was studying neural network backpropagation from
http://jeremykun.com/2012/12/09/neural-networks-and-backpropagation/
I did not catch how he arrived at this part.
Could someone explain it to me?
Thanks!
The goal is to change the weights so as to bring the loss function $E(\omega)$ to a minimum; mathematically this is described by the gradient-descent update rule
$$\omega\rightarrow \omega - \eta\nabla E(\omega)$$
with the gradient given by
$$\nabla E(\omega) = \left ( \frac{\partial E}{\partial \omega_1}, \frac{\partial E}{\partial \omega_2} , ..., \frac{\partial E}{\partial \omega_n} \right )$$
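As a minimal sketch of this update rule in code (NumPy is assumed; `gradient_descent_step`, `eta`, and the numbers are names and values I made up for illustration):

```python
import numpy as np

def gradient_descent_step(weights, grad, eta=0.1):
    """One update: move the weights against the gradient of E."""
    return weights - eta * grad

# Illustrative usage with made-up numbers
w = np.array([0.5, -0.3])
g = np.array([0.2, 0.1])              # pretend this is nabla E(w)
print(gradient_descent_step(w, g))    # [ 0.48 -0.31]
```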
As stated by the author in the link, the loss function (a sum over the $n$ training pairs $(x_j, y_j)$) is given by
$$E(\omega) = \frac{1}{2}\sum\limits_{j=1}^n (y_j-f(x_j))^2$$
so the gradient of this is
$$\frac{\partial }{\partial \omega_i} E(\omega) = \frac{1}{2}\sum\limits_{j=1}^n \frac{\partial }{\partial \omega_i}(y_j-f(x_j))^2 = \frac{1}{2}\sum\limits_{j=1}^n \frac{\partial }{\partial \omega_i}\left(y_j^2-2y_j f(x_j) + f(x_j)^2\right)$$
$$... = \frac{1}{2}\sum\limits_{j=1}^n \left(-2y_j \frac{\partial }{\partial \omega_i}f(x_j) + 2 f(x_j)\frac{\partial }{\partial \omega_i} f(x_j)\right) = - \sum\limits_{j=1}^n \left(y_j - f(x_j)\right) \frac{\partial }{\partial \omega_i} f(x_j)$$
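That last expression holds for any differentiable $f$. Here is a small numerical sketch of it with made-up data and my own function names, estimating $\partial f(x_j)/\partial \omega_i$ by finite differences (a linear unit is used only to have something concrete to run):

```python
import numpy as np

def f(w, x):
    return w @ x          # any differentiable model would do; a linear unit keeps it simple

def E(w, X, y):
    return 0.5 * sum((y_j - f(w, x_j)) ** 2 for x_j, y_j in zip(X, y))

def df_dwi(w, x, i, eps=1e-6):
    """Central-difference estimate of df(x)/dw_i."""
    e = np.zeros_like(w); e[i] = eps
    return (f(w + e, x) - f(w - e, x)) / (2 * eps)

w = np.array([0.5, -0.3])
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 0.0])
i = 0
e = np.zeros_like(w); e[i] = 1e-6
dE_numeric = (E(w + e, X, y) - E(w - e, X, y)) / 2e-6
dE_formula = -sum((y_j - f(w, x_j)) * df_dwi(w, x_j, i) for x_j, y_j in zip(X, y))
print(dE_numeric, dE_formula)   # the two should agree closely
```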
For a single linear unit, $f(x_j) = \sum_l \omega_l\, x_{j,l}$, so the remaining derivative turns into
$$ \frac{\partial }{\partial \omega_i} f(x_j) = \sum\limits_{l} \frac{\partial \omega_l}{\partial \omega_i}\, x_{j,l} = \sum\limits_{l} \delta_{li}\, x_{j,l} = x_{j,i}$$
since the $i$-th term is the only one that survives the partial derivative ($\delta_{li}$ is the Kronecker delta).
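A quick check of that last step, comparing a finite-difference estimate of $\partial f(x_j)/\partial\omega_i$ with $x_{j,i}$ for a linear unit (names and numbers are again just illustrative):

```python
import numpy as np

def f(w, x):
    return w @ x                       # f(x_j) = sum_l w_l * x_{j,l}

w = np.array([0.5, -0.3])
x_j = np.array([3.0, 4.0])
for i in range(len(w)):
    e = np.zeros_like(w); e[i] = 1e-6
    df_numeric = (f(w + e, x_j) - f(w - e, x_j)) / 2e-6
    print(i, df_numeric, x_j[i])       # df/dw_i should equal x_{j,i}
```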
Now it's just a matter of putting all these components back into the update rule at the top to get the final result: for each weight,
$$\omega_i \rightarrow \omega_i + \eta \sum\limits_{j=1}^n \left(y_j - f(x_j)\right) x_{j,i}$$
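As a closing sketch, here is the whole thing as a tiny gradient-descent loop on made-up data (the function name `train`, the learning rate, and the data are all just illustrative):

```python
import numpy as np

def train(X, y, eta=0.005, steps=500):
    """Minimize E(w) = 0.5 * sum_j (y_j - w . x_j)^2 by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residuals = y - X @ w        # y_j - f(x_j)
        grad = -X.T @ residuals      # dE/dw_i = -sum_j (y_j - f(x_j)) * x_{j,i}
        w = w - eta * grad           # the update rule from the top of the answer
    return w

# Data generated from known weights, so the result is easy to check
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0])
print(train(X, y))                   # should approach [ 2. -1.]
```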
Hope it helps!