Understanding the backpropagation algorithm


I am currently trying to implement backpropagation as described in the Wikipedia article.

It defines the gradient of the weights in layer $l$ as: $$\delta^l (a^{l-1})^T$$

where $a^{l}$ is the output of layer $l$.
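For reference, here is a minimal NumPy sketch of the product $\delta^l (a^{l-1})^T$ with the two vectors written as columns. The layer sizes (3 and 5) are arbitrary choices for illustration, not from the article:

```python
import numpy as np

# Hypothetical sizes: layer l has 3 nodes, layer l-1 has 5 nodes.
delta_l = np.random.randn(3, 1)  # delta^l: one entry per node in layer l
a_prev = np.random.randn(5, 1)   # a^{l-1}: one entry per node in layer l-1

# delta^l (a^{l-1})^T: a (3, 1) times a (1, 5) matrix gives a (3, 5) matrix.
grad_W = delta_l @ a_prev.T
print(grad_W.shape)  # (3, 5)
```

So the result is a matrix with one row per node in layer $l$ and one column per node in layer $l-1$.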

The article says:

Note that $\delta^l$ is a vector, of length equal to the number of nodes in level $l$; [...]

The number of entries of the vector $a^{l}$ equals the number of nodes in layer $l$. But how can one calculate $\delta^l (a^{l-1})^T$ if layers $l-1$ and $l$ have different numbers of nodes?