Where are the kernel size and stride specified in a convolution?


I am currently a bit in doubt about the definition of a convolution...

I have just been reading a paper which describes convolution as a way of transforming

input image -> "convolution" -> Convolution map

But the way they describe it in the math seems a bit off to me: they seem to describe it as if it were a neural network computing input * weight + bias.

This is how they describe it mathematically:

$$q_{j,m} =\sigma\left(\sum_{c = 0}^{\text{total}~ \text{number} ~ \text{columns}} \sum_{r = 0}^{\text{total}~\text{number}~\text{rows}} \text{img}(r,c)\, W_{c,j,r} + W_{0,j}\right)$$

And the complete convolution map $Q_j$, is computed as:

$$Q_j = \sigma \left( \sum_{c = 0}^{\text{total}~\text{number} ~\text{columns}} \text{img}(:,c) \ast W_{c,j} \right) $$

$\text{img}(r,c)$ is the pixel value stored at row $r$ and column $c$; $\text{img}(:,c)$ is the vector of pixel values in column $c$ of the image.

And W is a weight matrix of size $\left( (C \cdot R) \times J\right)$

$$\begin{bmatrix} w_{111} & w_{121} & w_{131} & \dots & w_{1J1} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ w_{C11} & w_{C21} & w_{C31} & \dots & w_{CJ1} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ w_{C12} & w_{C22} & w_{C32} & \dots & w_{CJ2} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ w_{C1R} & w_{C2R} & w_{C3R} & \dots & w_{CJR} \\ \end{bmatrix}$$

$W_{c,j,r}$ is one entry of the weight matrix, and $W_{c,j}$ is a 1-D vector of length $R$ (the entries $W_{c,j,1},\dots,W_{c,j,R}$).
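The formulas above can be sketched numerically. A minimal NumPy sketch, assuming $W_{c,j}$ denotes the length-$R$ slice of the weight matrix for column $c$ and feature map $j$, and using `np.convolve` for the 1-D convolution (all names, shapes, and random values here are illustrative, not from the paper):

```python
import numpy as np

def sigma(x):
    return np.tanh(x)

# Hypothetical sizes: an R x C image and J feature maps.
R, C, J = 8, 5, 3
rng = np.random.default_rng(0)
img = rng.standard_normal((R, C))
W = rng.standard_normal((C, J, R))   # W[c, j] is the 1-D vector W_{c,j} of length R

def feature_map(img, W, j):
    # Q_j = sigma( sum_c  img(:, c) * W_{c, j} ),  '*' = full 1-D convolution
    acc = np.zeros(2 * R - 1)        # length of a full 1-D convolution
    for c in range(C):
        acc += np.convolve(img[:, c], W[c, j])
    return sigma(acc)

Q = np.stack([feature_map(img, W, j) for j in range(J)])
print(Q.shape)   # (3, 15)
```

Written this way, each feature map $Q_j$ is the sum over columns of a 1-D convolution, squashed through the nonlinearity — exactly the structure of the displayed equation.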

Where is the actual convolution being stated? Where are the kernel size and the stride specified? I guess the formula is a standard convolution, but where in the formula are the stride and kernel size defined?

I guess they are using multiple convolutions, which $J$ describes, so the weights associated with one convolution are the weight vector $W_{c,j}$.
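For contrast, here is a sketch of a plain 1-D convolution where kernel size and stride appear explicitly as parameters (the function name and the example values are made up for illustration):

```python
import numpy as np

def conv1d(x, h, stride=1):
    """Valid 1-D convolution: the kernel size is len(h), and the stride
    is the step between successive window positions."""
    K = len(h)
    out_len = (len(x) - K) // stride + 1
    # Convolution flips the kernel, hence h[::-1]
    return np.array([np.dot(x[m * stride : m * stride + K], h[::-1])
                     for m in range(out_len)])

x = np.arange(10.0)
h = np.array([1.0, 0.0, -1.0])
print(conv1d(x, h, stride=1))   # kernel size 3, stride 1 -> 8 outputs
print(conv1d(x, h, stride=2))   # same kernel,   stride 2 -> 4 outputs
```

Note that neither parameter shows up as an explicit symbol in the paper's formula: the kernel size is hidden in the length of the weight vector, and a stride of 1 is implicit in writing the sum over every position.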

About weight sharing (see the attached figure):

Given an image, a convolution is applied which reduces the depth of the image. Is the weight connecting the convolved image to an output neuron the same or different at each position?

Meaning: will the weights be updated as one shared unit, or each as an individual unit?

1 Answer

In a general neural network, two consecutive layers $u \in \mathbb{R}^N$, $v\in \mathbb{R}^M$ are related by $v = f(Mu)$ (or $v = f(Mu+b)$ if you want a bias), where $M$ is a linear operator $\mathbb{R}^N\to \mathbb{R}^M$ (an $M \times N$ matrix) and, for instance, $f(x)_i = \tanh(x_i)$. The weights are the entries $M_{i,j}$.

In a convolutional layer, it is the same except $Mu$ is replaced by $u \ast h\ \ $ (or $d(u\ast h)$, where $d$ is a decimation operator). This answers the question: the kernel size is simply the length of the filter $h$, and the stride is the decimation factor of $d$ — a stride-$s$ convolution is the full convolution followed by keeping every $s$-th sample.
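This equivalence can be checked directly: a (strided) convolution is the general layer $v = f(Mu)$ where $M$ is a banded matrix whose rows are shifted copies of the kernel. A sketch with made-up sizes:

```python
import numpy as np

# Assumed sizes for illustration only.
N, K, stride = 8, 3, 2
rng = np.random.default_rng(1)
u = rng.standard_normal(N)
h = rng.standard_normal(K)

# Build M explicitly: row m computes sum_k u[m*stride + k] * h[K-1-k],
# i.e. each row is the (flipped) kernel shifted by the stride.
out_len = (N - K) // stride + 1
M = np.zeros((out_len, N))
for m in range(out_len):
    M[m, m * stride : m * stride + K] = h[::-1]

v_matrix = np.tanh(M @ u)                                 # general-layer form
v_conv = np.tanh(np.convolve(u, h, 'valid')[::stride])    # convolution + decimation
print(np.allclose(v_matrix, v_conv))   # True
```

The matrix $M$ has only $K$ distinct values repeated across its rows — that repetition is exactly the weight sharing asked about above.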

So a convolutional layer is just a particular case of the general layer; the only differences are the number of weights, how they are shared across the layer, and, accordingly, how the gradient is computed during backpropagation.
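Regarding the weight-sharing part of the question: because every output position reuses the same kernel entries, the gradient of the loss with respect to each shared weight is the sum of the per-position contributions, so the weights update as one shared unit. A sketch verifying this against finite differences (all values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.standard_normal(6)
h = rng.standard_normal(3)
K = len(h)

def loss(h):
    v = np.convolve(u, h, 'valid')   # every output v[m] uses all of h
    return 0.5 * np.sum(v ** 2)

# Analytic gradient: dL/dh[k] = sum_m v[m] * u[m + K - 1 - k]
# -- one summed gradient per shared weight, not one per position.
v = np.convolve(u, h, 'valid')
grad = np.array([np.sum(v * u[K - 1 - k : K - 1 - k + len(v)])
                 for k in range(K)])

# Check against central finite differences.
eps = 1e-6
num = np.array([(loss(h + eps * np.eye(K)[k]) - loss(h - eps * np.eye(K)[k]))
                / (2 * eps) for k in range(K)])
print(np.allclose(grad, num, atol=1e-5))   # True
```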