Backpropagation and chain rule (notation and calculus check)


Let $z^l$ and $a^l$ denote the affine map and the activation function at layer $l$, and let $z^l(x)$, $a^l(x)$ denote their values in the forward pass at layer $l$:

$$x \in R^n\stackrel{z^{2}}{\longrightarrow} z^{2}(x) \stackrel{a^{2}/\sigma}{\longrightarrow} a^2(x) \stackrel{z^{3}}{\longrightarrow} \ldots \stackrel{a^{L-1}}{\longrightarrow} a^{L-1}(x) \in R^{n_{L-1}} \stackrel{z^{L}}{\longrightarrow}z^L(x) \in R^{n_L} \stackrel{a^{L}/SM}{\longrightarrow}a^L(x) \in R^{n_L} \stackrel{C}{\longrightarrow}C(x)\in R$$

Is the following correct?

  1. Input layer: $$x=a^1$$

  2. Affine layer: $$z^{l}=w^{l} a^{l-1}+b^{l} \leftrightarrow J_{a}z^l\left(a^{l-1}\right) \in R^{n_l \times n_{l-1}}$$

  3. Activation: $$\left.\begin{array}{ll} a^{l}=\sigma\left(z^{l}\right) & \leftrightarrow J_{z}\sigma\left(z^{l}\right)\\ a^{L}=SM(z^L) & \leftrightarrow J_{z}SM\left(z^{L}\right) \end{array}\right\} \equiv J_za^l(z^l) \in R^{n_l \times n_l}$$

  4. Cost:

$$C=C(a^{L}) \leftrightarrow J_{a} C(a^{L}) \in R^{1 \times n_L}$$

and: $$\delta_{j}^{l} = \frac{\partial C}{\partial z_{j}^{l}} \leftrightarrow \delta^{l}= J_{z}C(z^l)\in R^{1 \times n_l}$$

The chain rule holds:

$\delta^{L}:=J_{z}C(z^L)=J_{a} C(a^{L}) J_{z}SM\left(z^{L}\right)$ and

$\begin{cases}\delta^{l-1}:=J_zC(z^{l-1})&= J_zC(z^{l})J_az^{l}(a^{l-1})J_za^{l-1}(z^{l-1}) \\ \delta^{l-1} &=\delta^l w^l J_z\sigma(z^{l-1}) \end{cases}$
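As a shape check on this recursion, here is a sketch that assumes the row-vector convention for $\delta^l$ and that $\sigma$ is applied componentwise (so that $J_z\sigma(z^{l-1})$ is the diagonal matrix with entries $\sigma'(z_j^{l-1})$):

$$\underbrace{\delta^{l-1}}_{1 \times n_{l-1}} = \underbrace{\delta^{l}}_{1 \times n_{l}}\; \underbrace{J_a z^{l}(a^{l-1})}_{n_{l} \times n_{l-1}}\; \underbrace{J_z\sigma(z^{l-1})}_{n_{l-1} \times n_{l-1}}, \qquad J_a z^{l}(a^{l-1}) = w^{l}$$

The inner dimensions match only if the activation Jacobian is square, of size $n_{l-1} \times n_{l-1}$.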

$ \frac{\partial C}{\partial w_{j k}^{l}}=a_{k}^{l-1} \delta_{j}^{l}$

$ \frac{\partial C}{\partial b^l}=\delta^l$
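One way to test these formulas is to compare them with finite differences on a tiny network. The sketch below is only an illustration under assumptions that are not stated in the question: a sigmoid for $\sigma$, a cross-entropy cost $C(a^L) = -\sum_j y_j \log a_j^L$, layer sizes 4, 5, 3, and helper names (`forward`, `backprop`, `numerical_grad`) that are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def forward(x, Ws, bs):
    """Return the lists of z^l and a^l; a[0] is the input x = a^1."""
    a, zs = [x], []
    for i, (W, b) in enumerate(zip(Ws, bs)):
        z = W @ a[-1] + b
        zs.append(z)
        is_last = i == len(Ws) - 1
        a.append(softmax(z) if is_last else sigmoid(z))
    return zs, a

def cost(x, y, Ws, bs):
    # Assumed cost: cross-entropy between the target y and a^L.
    _, a = forward(x, Ws, bs)
    return -np.sum(y * np.log(a[-1]))

def backprop(x, y, Ws, bs):
    zs, a = forward(x, Ws, bs)
    aL = a[-1]
    J_C = -(y / aL)[None, :]                 # J_a C(a^L), shape (1, n_L)
    J_SM = np.diag(aL) - np.outer(aL, aL)    # J_z SM(z^L), shape (n_L, n_L)
    delta = J_C @ J_SM                       # delta^L as a row vector
    gW, gb = [None] * len(Ws), [None] * len(Ws)
    for i in reversed(range(len(Ws))):
        gW[i] = np.outer(delta.ravel(), a[i])  # dC/dw_jk = a_k^{l-1} delta_j^l
        gb[i] = delta.ravel().copy()           # dC/db^l = delta^l
        if i > 0:
            s = sigmoid(zs[i - 1])
            J_sigma = np.diag(s * (1.0 - s))   # square, n_{l-1} x n_{l-1}
            delta = delta @ Ws[i] @ J_sigma    # delta^{l-1} = delta^l w^l J_z sigma
    return gW, gb

def numerical_grad(x, y, Ws, bs, i, eps=1e-6):
    """Central finite differences of the cost with respect to Ws[i]."""
    g = np.zeros_like(Ws[i])
    for j in range(Ws[i].shape[0]):
        for k in range(Ws[i].shape[1]):
            old = Ws[i][j, k]
            Ws[i][j, k] = old + eps
            c_plus = cost(x, y, Ws, bs)
            Ws[i][j, k] = old - eps
            c_minus = cost(x, y, Ws, bs)
            Ws[i][j, k] = old
            g[j, k] = (c_plus - c_minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
sizes = [4, 5, 3]  # n_1, n_2, n_3 (arbitrary choice for the check)
Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(2)]
bs = [rng.standard_normal(sizes[i + 1]) for i in range(2)]
x = rng.standard_normal(sizes[0])
y = np.array([0.0, 1.0, 0.0])  # one-hot target

gW, gb = backprop(x, y, Ws, bs)
max_err = max(
    np.abs(gW[i] - numerical_grad(x, y, Ws, bs, i)).max() for i in range(2)
)
print("largest gap between backprop and finite differences:", max_err)
```

If the recursion is right, the printed gap should be tiny compared with the gradient entries. With this particular cost and a one-hot $y$, the product $J_a C(a^L)\, J_z SM(z^L)$ also collapses to $a^L - y$, which gives a second sanity check on $\delta^L$.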