I understand the mnemonic of the chain rule. But, for example, if I have some error function E and I want to find its first derivative with respect to some matrix W, or if I have some vector-valued function V and I also want to find its derivative with respect to some matrix W, what kind of entities would these be?
$$\frac{\partial E}{\partial W} = ?$$ $$\frac{\partial V}{\partial W} = ?$$
Are these tensors? Or multi-dimensional arrays? What operation connects the entities in a derivative chain: matrix multiplication, or a dot product? What should I study to understand these entities, tensor algebra or differential geometry?
I know there are plenty of materials on the web that avoid this question completely, or introduce clumsy notation like $$\partial w_{ij}$$ instead of $$\partial W$$, but I am tired of following that. I want to see the general form, and to operate on and see the entities as they are.
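To make the shapes concrete, here is a small numerical sketch (my own illustration, not from any particular textbook): for a scalar error $E(W)$, the derivative with respect to a matrix $W$ has the same shape as $W$; for a vector-valued $V(W)$, it is a higher-rank array. The functions `E`, `V`, and `num_grad` below are hypothetical names chosen for this example.

```python
import numpy as np

# Hypothetical examples: a scalar "error" E(W) and a vector-valued V(W),
# both functions of a 2x3 matrix W.
def E(W):
    return np.sum(W ** 2)                   # scalar output

def V(W):
    return W @ np.array([1.0, 2.0, 3.0])    # output in R^2

def num_grad(f, W, eps=1e-6):
    """Finite-difference derivative of f with respect to every entry of W."""
    base = np.asarray(f(W), dtype=float)
    out = np.empty(base.shape + W.shape)    # one slot per (output, input) pair
    for idx in np.ndindex(W.shape):
        Wp = W.copy()
        Wp[idx] += eps
        out[(...,) + idx] = (np.asarray(f(Wp)) - base) / eps
    return out

W = np.arange(6, dtype=float).reshape(2, 3)
print(num_grad(E, W).shape)  # (2, 3): dE/dW has the same shape as W
print(num_grad(V, W).shape)  # (2, 2, 3): dV/dW is a rank-3 array
```

So the scalar case stays a matrix, while the vector case is genuinely a multi-dimensional array (a tensor in the loose machine-learning sense).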
I recommend reading the relevant parts of Tom Mitchell's book *Machine Learning*.
I made a short summary in my own words in my bachelor's thesis.
Things to understand for gradient descent in neural networks:
The Gradient
Let $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a function:
$$f(x_1, x_2, \dots, x_n) = (F_1(x_1, x_2, \dots, x_n), F_2(x_1, x_2, \dots, x_n), \dots, F_m(x_1, x_2, \dots, x_n))$$
Then the gradient of $f$ (for $m > 1$, more precisely the Jacobian matrix of $f$) is denoted by $\nabla f$ and $$\nabla f = \begin{pmatrix} \frac{\partial F_1}{\partial x_1} & \frac{\partial F_1}{\partial x_2} & \dots & \frac{\partial F_1}{\partial x_n}\\ \frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & \dots & \frac{\partial F_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial F_m}{\partial x_1} & \frac{\partial F_m}{\partial x_2} & \dots & \frac{\partial F_m}{\partial x_n} \end{pmatrix}$$
You can see that the problem decomposes by output neuron: row $i$ of this matrix is the gradient of the single scalar component $F_i$.
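The definition above can be sketched numerically (my own example, not from the thesis): take $f: \mathbb{R}^3 \rightarrow \mathbb{R}^2$, $f(x) = (x_1 x_2,\; x_2 + x_3)$, and build the matrix column by column via finite differences. The function names here are assumptions for illustration.

```python
import numpy as np

# Example function f: R^3 -> R^2, f(x) = (x1*x2, x2 + x3).
def f(x):
    return np.array([x[0] * x[1], x[1] + x[2]])

def jacobian(f, x, eps=1e-6):
    """Numerical m x n matrix of partials: row i is the gradient of F_i."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (f(xp) - fx) / eps   # column j: partials w.r.t. x_j
    return J

x = np.array([1.0, 2.0, 3.0])
print(jacobian(f, x))
# Analytically: [[x2, x1, 0], [0, 1, 1]] = [[2, 1, 0], [0, 1, 1]]
```

Each row can be computed independently, which is exactly the decomposition by output neuron mentioned above.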
TODO: Explain $\frac{\partial f}{\partial x_1}$ notation
http://www.markusengelhardt.com/skripte/grad-div-rot.pdf (German)
The chain rule
(To be continued; I need to go to work now.)