Understanding backpropagation

I was studying neural network backpropagation from

http://jeremykun.com/2012/12/09/neural-networks-and-backpropagation/

I did not catch how he arrived at this part.

Could someone explain it to me?

[image from the linked post showing the step in question]

Thanks!

There is 1 answer below.


The goal is to adjust the weights so as to drive the loss function $E(\omega)$ towards a (local) minimum; mathematically this is done with the gradient-descent update rule

$$\omega\rightarrow \omega - \eta\nabla E(\omega)$$

where $\eta > 0$ is the learning rate,

with the gradient given by

$$\nabla E(\omega) = \left ( \frac{\partial E}{\partial \omega_1}, \frac{\partial E}{\partial \omega_2} , ..., \frac{\partial E}{\partial \omega_n} \right )$$
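For concreteness, here is a minimal numerical sketch of that update rule; the toy quadratic loss, learning rate, and starting point are my own illustrative choices, not something from the post:

```python
import numpy as np

# Toy quadratic loss E(w) = 0.5 * ||w - target||^2, chosen only to show the update rule.
target = np.array([1.0, -2.0, 0.5])

def E(w):
    return 0.5 * np.sum((w - target) ** 2)

def grad_E(w):
    # dE/dw_i = w_i - target_i
    return w - target

eta = 0.1               # learning rate (arbitrary choice)
w = np.zeros(3)         # initial weights
for _ in range(100):
    w = w - eta * grad_E(w)   # the rule: w -> w - eta * grad E(w)

print(E(w), w)          # loss near 0, w near target
```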

As stated by the author in the link, the loss function is given by

$$E(\omega) = \frac{1}{2}\sum\limits_{j=1}^{m} (y_j-f(x_j))^2$$

where the sum runs over the $m$ training pairs $(x_j, y_j)$,

so the $i$-th component of the gradient is

$$\frac{\partial }{\partial \omega_i} E(\omega) = \frac{1}{2}\sum\limits_{j=1}^{m} \frac{\partial }{\partial \omega_i}(y_j-f(x_j))^2 = \frac{1}{2}\sum\limits_{j=1}^{m} \frac{\partial }{\partial \omega_i}\left(y_j^2-2y_j f(x_j) + f(x_j)^2\right)$$

$$\ldots = \frac{1}{2}\sum\limits_{j=1}^{m} \left(-2y_j \frac{\partial }{\partial \omega_i}f(x_j) + 2f(x_j)\frac{\partial }{\partial \omega_i} f(x_j)\right) = - \sum\limits_{j=1}^{m} (y_j - f(x_j)) \frac{\partial }{\partial \omega_i} f(x_j)$$

For a single linear unit, $f(x_j) = \sum_{l=0}^{n}\omega_l\, x_{j,l}$, where $x_{j,l}$ is the $l$-th component of the input $x_j$, so the remaining derivative becomes

$$ \frac{\partial }{\partial \omega_i} f(x_j) = \sum\limits_{l=0}^n \frac{\partial \omega_l}{\partial \omega_i}\, x_{j,l} = \sum\limits_{l=0}^n \delta_{li}\, x_{j,l} = x_{j,i},$$

since the $i$-th term is the only one that survives the partial derivative ($\delta_{li}$ is the Kronecker delta, equal to $1$ when $l=i$ and $0$ otherwise).
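A quick finite-difference check makes this concrete; the weights and input below are arbitrary values I made up for illustration:

```python
import numpy as np

def f(w, x):
    # Single linear unit: f(x) = sum_l w_l * x_l
    return np.dot(w, x)

rng = np.random.default_rng(0)
w = rng.normal(size=4)   # arbitrary weights
x = rng.normal(size=4)   # arbitrary input x_j

eps = 1e-6
for i in range(len(w)):
    w_bump = w.copy()
    w_bump[i] += eps
    numeric = (f(w_bump, x) - f(w, x)) / eps   # finite-difference estimate of df/dw_i
    print(i, numeric, x[i])                    # both columns agree: df/dw_i = x_{j,i}
```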

Now it's just a matter of putting these pieces back into the update rule to get the final result,

$$\omega_i \rightarrow \omega_i + \eta \sum\limits_{j=1}^{m} (y_j - f(x_j))\, x_{j,i},$$

as in the sketch below.
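Here is a small sketch of that learning rule for a single linear unit; the synthetic data, learning rate, and iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: m examples with n features, labels produced by a known linear unit.
m, n = 50, 3
X = rng.normal(size=(m, n))          # rows are the inputs x_j
true_w = np.array([2.0, -1.0, 0.5])  # the weights we hope to recover
y = X @ true_w                       # y_j

eta = 0.01                           # learning rate (arbitrary)
w = np.zeros(n)
for _ in range(500):
    residual = y - X @ w             # (y_j - f(x_j)) for every example
    grad = -X.T @ residual           # dE/dw_i = -sum_j (y_j - f(x_j)) * x_{j,i}
    w = w - eta * grad               # w -> w - eta * grad E(w)

print(w)                             # close to true_w, where the squared loss is minimal
```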

Hope it helps!