Consider a Convolutional Neural Network with the following architecture:
\begin{align} \text{Input} \longrightarrow C_1 \longrightarrow P_1 \longrightarrow C_2 \longrightarrow P_2 \longrightarrow \text{Softmax} \end{align}
Here $C_i$ refers to the $i^{th}$ convolutional layer and $P_i$ refers to the $i^{th}$ mean pooling layer. Each layer produces an output. Let $\delta^{P_j}$ denote the error at the output of layer $P_j$ (and similarly for $C_j$).
$\delta^{P_2}$ can be calculated easily using the normal backpropagation equations, since $P_2$ is fully connected to the softmax layer. $\delta^{C_2}$ can then be obtained simply by upsampling $\delta^{P_2}$ appropriately (and multiplying element-wise by the derivative of the activation of $C_2$), since we are using mean pooling.
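To make concrete what I mean by "upsampling", here is a small sketch of how I currently implement this step (in Python/NumPy rather than MATLAB, but the idea is the same; please correct me if this part is already wrong). For mean pooling, each pooled error is spread evenly over its pooling block:

```python
import numpy as np

def upsample_mean_pool(delta_pooled, pool=2):
    # Spread each pooled error evenly over its pool x pool block:
    # np.kron tiles every entry into a pool x pool patch of copies,
    # and dividing by pool*pool gives each unit an equal share.
    return np.kron(delta_pooled, np.ones((pool, pool))) / (pool * pool)

# Toy 2x2 error map standing in for delta^{P_2}
delta_P2 = np.array([[0., 1.],
                     [2., 3.]])
delta_up = upsample_mean_pool(delta_P2)  # 4x4; each entry is delta/4
```

The full $\delta^{C_2}$ would then be `delta_up * fprime(z_C2)`, where `fprime` is the derivative of the activation function (names here are my own, for illustration).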
How do we propagate error from the output of $C_2$ to the output of $P_1$? In other words, how do we find $\delta^{P_1}$ from $\delta^{C_2}$?
Stanford's Deep Learning tutorial uses the following equation to do this:
\begin{align} \delta_k^{(l)} = \text{upsample}\left((W_k^{(l)})^T \delta_k^{(l+1)}\right) \bullet f'(z_k^{(l)}) \end{align}
However I am facing the following problems in using this equation:
My $W_k^{(l)}$ has size $2 \times 2$ and $\delta_k^{(l+1)}$ has size $6 \times 6$ (I am using valid convolution; the output of $P_1$ has size $13 \times 13$ and the output of $P_2$ has size $6 \times 6$). The inner matrix multiplication $(W_k^{(l)})^T \delta_k^{(l+1)}$ does not even make sense in my case.
The equation assumes that the number of channels is the same in both layers. Again, this is not true for me: the output of $P_1$ has 64 channels while the output of $C_2$ has 96 channels.
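For reference, here is a quick sanity check of the shapes I quoted above (again NumPy; the sizes are the ones from my network, nothing new):

```python
import numpy as np

# Shapes from my setup: P_1 output is 13x13, C_2 kernel is 2x2,
# convolution mode is 'valid', pooling is 2x2 mean pooling.
p1_size = 13
kernel_size = 2
pool = 2

# Valid convolution output size: n - k + 1
c2_size = p1_size - kernel_size + 1   # 12
p2_size = c2_size // pool             # 6 after 2x2 mean pooling

# So W_k^{(l)} is 2x2 while delta_k^{(l+1)} is 6x6: the matrix
# product (W^T)(delta) in the tutorial's equation is not defined here.
print(c2_size, p2_size)
```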
What am I doing wrong here? Can anybody please explain how to propagate errors through a convolutional layer?
A simple MATLAB example would be highly appreciated.