What calculus theory I should study to understand back-propagation in general form?


I understand the mnemonic form of the chain rule. But, for example, if I have some error function $E$ and I want to find its first derivative with respect to some matrix $W$, or if I have some vector-valued function $V$ and I also want its derivative with respect to a matrix $W$, what kind of entities would these be?

$$\frac{\partial E}{\partial W} = ?$$ $$\frac{\partial V}{\partial W} = ?$$

Are these tensors? Or multi-dimensional arrays? What operation connects the entities in a derivative chain: matrix multiplication, or a dot product? What should I study to understand these entities: tensor algebra, differential geometry?
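For concreteness, here is a minimal numpy sketch (my own illustration, with made-up example functions) that probes these shapes by central finite differences: the derivative of a scalar $E$ with respect to a matrix $W$ has the same shape as $W$, while the derivative of a vector-valued $V$ with respect to $W$ is a third-order array, i.e. a tensor.

```python
import numpy as np

# Hypothetical example functions, chosen only to illustrate shapes.
x = np.array([1.0, 2.0, 3.0])        # fixed input vector (n = 3)
W = np.random.rand(2, 3)             # weight matrix (m x n = 2 x 3)

E = lambda W: np.sum((W @ x) ** 2)   # scalar-valued "error" E(W)
V = lambda W: W @ x                  # vector-valued function V(W)

def num_grad(f, W, eps=1e-6):
    """Central-difference derivative of f with respect to every entry of W."""
    out_shape = np.shape(f(W))       # () for scalars, (m,) for vectors
    g = np.zeros(out_shape + W.shape)
    for idx in np.ndindex(W.shape):
        Wp = W.copy(); Wp[idx] += eps
        Wm = W.copy(); Wm[idx] -= eps
        g[(...,) + idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

print(num_grad(E, W).shape)   # (2, 3): dE/dW is a matrix, same shape as W
print(num_grad(V, W).shape)   # (2, 2, 3): dV/dW is a third-order tensor
```

So a scalar-by-matrix derivative stays a matrix, and each extra output dimension adds one order to the resulting array.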

I know that there are plenty of materials on the web that avoid this question completely, or introduce clumsy notation like $\partial w_{ij}$ instead of $\partial W$, but I am tired of following that. I want to see the general form, and to operate on these entities as they are.

1 Answer

I recommend reading the relevant parts of Tom Mitchell's book:

T. M. Mitchell, Machine learning, ser. McGraw Hill series in computer science. McGraw-Hill, 1997.

I made a short summary in my own words in my bachelor's thesis.

Things to understand for gradient descent in neural networks:

The Gradient

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a function:

$$f(x_1, x_2, \dots, x_n) = (F_1(x_1, x_2, \dots, x_n), F_2(x_1, x_2, \dots, x_n), \dots, F_m(x_1, x_2, \dots, x_n))$$

Then the gradient of $f$ is denoted by $\nabla f$ (for $m > 1$ this matrix of partial derivatives is usually called the Jacobian) and $$\nabla f = \begin{pmatrix} \frac{\partial F_1}{\partial x_1} & \frac{\partial F_1}{\partial x_2} & \dots & \frac{\partial F_1}{\partial x_n}\\ \frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & \dots & \frac{\partial F_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial F_m}{\partial x_1} & \frac{\partial F_m}{\partial x_2} & \dots & \frac{\partial F_m}{\partial x_n} \end{pmatrix}$$

You can see that the problem decomposes by output neuron: row $i$ of this matrix involves only the component $F_i$.
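As a sanity check, here is a short numpy sketch (my own illustration, not from Mitchell's book) that builds this matrix of partials numerically for a concrete example $f: \mathbb{R}^3 \rightarrow \mathbb{R}^2$ and can be compared entry by entry against the analytic derivatives:

```python
import numpy as np

# Example f: R^3 -> R^2, chosen only for illustration.
# F_1 = x1 * x2, F_2 = x2 + x3^2
def f(x):
    x1, x2, x3 = x
    return np.array([x1 * x2, x2 + x3 ** 2])

def numerical_gradient(f, x, eps=1e-6):
    """m x n matrix with entry (i, j) = dF_i/dx_j, via central differences."""
    m, n = len(f(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

x = np.array([1.0, 2.0, 3.0])
print(numerical_gradient(f, x))
# Analytic: [[x2, x1, 0], [0, 1, 2*x3]] = [[2, 1, 0], [0, 1, 6]]
```

Note that each row of the result is computed from one component $F_i$ alone, which is exactly the decomposition by output neuron described above.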

The chain rule

(To be continued - I need to go to work now)