Backpropagation: why partial derivative, not full derivative?


After studying backpropagation for neural networks, I have a question: why can't we use full (total) derivatives for backpropagation? I understand why partial derivatives work in backpropagation. However, I wonder why we cannot (or should not) use full derivatives instead.

It is because you ultimately want to find in which direction you should change each of the network's parameters (biases and weights), one by one, in order to minimize the loss. So you look at each parameter, tweak it slightly while holding all the other parameters fixed, and see whether the loss increases or decreases. Varying one parameter with the others held fixed is, by definition, a partial derivative:

$$\frac{\partial \mathcal{L}}{\partial b^{(l)}_j} \quad \text{or} \quad \frac{\partial \mathcal{L}}{\partial w^{(l)}_{ij}}$$

where $(l)$ is the layer the bias or weight refers to.
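The "tweak one parameter, hold the rest fixed" idea can be checked numerically with finite differences. Here is a minimal sketch using a made-up one-parameter-per-kind loss $\mathcal{L}(w, b) = (wx + b - y)^2$; the data point and parameter values are purely illustrative.

```python
# Illustrative loss for a single sample (x, y): L(w, b) = (w*x + b - y)^2
def loss(w, b, x=2.0, y=1.0):
    return (w * x + b - y) ** 2

w, b, eps = 0.5, 0.1, 1e-6

# Partial derivative w.r.t. w: perturb w only, hold b fixed.
dL_dw = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
# Partial derivative w.r.t. b: perturb b only, hold w fixed.
dL_db = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)

# Analytic partials for comparison:
# dL/dw = 2*(w*x + b - y)*x = 0.4,  dL/db = 2*(w*x + b - y) = 0.2
print(dL_dw, dL_db)
```

Each of these numbers tells you how the loss responds to that one parameter alone, which is exactly the quantity gradient descent needs.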

To compute these quantities you use backpropagation (which is just the chain rule applied to neural networks), and this involves partial derivatives. A "full" (total) derivative would not even be well defined here: the loss is a function of many independent parameters, and a total derivative would require specifying how all of them vary together. What you actually want is the gradient, i.e. the vector collecting all the partial derivatives, each measuring the effect of one parameter in isolation.
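To make the chain-rule computation concrete, here is a sketch of backpropagation through a toy network with one sigmoid neuron per layer. The architecture, parameter values, and data point are all invented for illustration; each backward-pass line is one factor of the chain rule, i.e. one partial derivative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy network: a1 = sigmoid(w1*x + b1), a2 = sigmoid(w2*a1 + b2),
# loss L = (a2 - y)^2. All values are illustrative.
x, y = 1.0, 0.0
w1, b1, w2, b2 = 0.3, 0.1, -0.4, 0.2

# Forward pass
z1 = w1 * x + b1
a1 = sigmoid(z1)
z2 = w2 * a1 + b2
a2 = sigmoid(z2)
L = (a2 - y) ** 2

# Backward pass: chain rule, one partial derivative at a time
dL_da2 = 2 * (a2 - y)            # dL/da2
da2_dz2 = a2 * (1 - a2)          # sigmoid derivative
dL_dz2 = dL_da2 * da2_dz2
dL_dw2 = dL_dz2 * a1             # partial L / partial w2
dL_db2 = dL_dz2                  # partial L / partial b2
dL_da1 = dL_dz2 * w2             # propagate back to layer 1
dL_dz1 = dL_da1 * a1 * (1 - a1)
dL_dw1 = dL_dz1 * x              # partial L / partial w1
dL_db1 = dL_dz1                  # partial L / partial b1
```

Every quantity computed in the backward pass is a partial derivative of the loss with respect to one intermediate value or parameter, with everything else held fixed; chaining them gives the gradient used to update the weights.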